HuggingFace Guard

Sentinel HuggingFace Guard checks model repositories before download or promotion, focusing on unsafe artifacts, provenance gaps, secret exposure, and model-bundle attack paths.

Definition

Sentinel HuggingFace Guard is a pre-download and pre-release intake workflow for model repositories. It helps teams decide whether a model repository can be mirrored, loaded, or promoted by checking files, metadata, manifests, secrets, and artifact risk signals.

Model intake gate

Treat external model repositories like third-party code. The first question is not whether the model works; it is whether the repository can be trusted enough to enter your build or inference environment.

Operational checklist
  • Scan before download when possible
  • Mirror approved repositories internally
  • Record source, commit, license, and hash
  • Block executable artifact formats unless explicitly approved
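The "record source, commit, license, and hash" step can be sketched as a small intake record; the field names below are illustrative assumptions, not a Sentinel schema.

```python
import datetime
import hashlib


def sha256_file(path: str) -> str:
    """Stream a file through SHA-256 so large checkpoints don't need to fit in memory."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()


def intake_record(source_url: str, commit: str, license_id: str, paths: list[str]) -> dict:
    # Illustrative record structure for the mirror/approval log.
    return {
        "source": source_url,
        "commit": commit,
        "license": license_id,
        "recorded_at": datetime.date.today().isoformat(),
        "hashes": {p: sha256_file(p) for p in paths},
    }
```

A record like this gives later promotion steps something concrete to verify against.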

Signals to review

The most important intake signals are executable serialization formats, unsigned sidecar files, suspicious metadata, large archive expansion, and secrets in model cards or examples.

Operational checklist
  • Pickle, PyTorch checkpoint, joblib, and custom code files
  • Safetensors metadata and missing integrity checks
  • README, config, and example notebooks with credentials
  • External data references and archive traversal

Promotion policy

Approved models should be promoted with a signed manifest, reproducible scan output, and a clear owner. A clean scan should be attached to the release record.

Operational checklist
  • CRITICAL/HIGH findings block mirroring
  • MEDIUM findings require owner review
  • Manifest includes hash, source URL, model version, and scan date
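The gate above can be expressed as a small policy function. The severity-to-action mapping mirrors the checklist; the function and decision names are a sketch, not the Sentinel implementation.

```python
# Mirrors the checklist: CRITICAL/HIGH block, MEDIUM needs owner review.
SEVERITY_ACTION = {
    "CRITICAL": "block",
    "HIGH": "block",
    "MEDIUM": "owner-review",
    "LOW": "allow",
    "INFO": "allow",
}


def promotion_decision(severities: list[str]) -> str:
    """Fold a scan's finding severities into one promotion decision.

    Unknown severities default to owner review rather than silent approval.
    """
    actions = {SEVERITY_ACTION.get(sev, "owner-review") for sev in severities}
    if "block" in actions:
        return "blocked"
    if "owner-review" in actions:
        return "needs-review"
    return "approved"
```

Defaulting unknown severities to review keeps a new scanner rule from silently passing the gate.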

Supply chain poisoning — OWASP LLM03:2025 / MITRE ATLAS AML.T0010

OWASP LLM03:2025 identifies model supply chain poisoning as a top threat. MITRE ATLAS AML.T0010 (ML Supply Chain Compromise) documents attack techniques including adversarial examples embedded in training data, backdoored model checkpoints, and malicious repository contributions. Attackers have demonstrated trojan checkpoints that behave normally on standard inputs but activate on embedded trigger patterns.

Operational checklist
  • Trojan checkpoints: backdoor triggers activate only on specific attacker-controlled inputs
  • MITRE ATLAS AML.T0010: ML Supply Chain Compromise — covers training data, model files, and pipeline components
  • Safetensors reduces code-execution risk but does not prevent metadata manipulation or provenance gaps
  • Model card provenance: verify SHA-256 hash, source organization, training data lineage, and license
  • Sightline DB: community-maintained vulnerability database for AI/ML supply chain issues
  • datasig (Trail of Bits): cryptographic fingerprinting of training datasets for AIBOM integrity
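The hash-verification step in the checklist can be sketched as a comparison against a pinned manifest, assuming the manifest itself is distributed and signed out of band; names below are illustrative.

```python
import hashlib


def verify_against_manifest(files: dict[str, bytes], manifest: dict[str, str]) -> list[tuple[str, str]]:
    """Compare downloaded file bytes against pinned SHA-256 digests.

    `files` maps filename -> bytes, `manifest` maps filename -> expected hex
    digest. Missing files and mismatches are both provenance failures.
    """
    failures = []
    for name, expected in manifest.items():
        data = files.get(name)
        if data is None:
            failures.append((name, "missing"))
        elif hashlib.sha256(data).hexdigest() != expected:
            failures.append((name, "hash-mismatch"))
    return failures
```

A trojaned checkpoint swapped in after approval shows up here as a hash mismatch, regardless of how normal its behavior looks.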

Commands

sentinel hf-guard org/model-name
sentinel hf-scan org/model-name
sentinel artifact ./models/org-model-name/ -f sarif -o sentinel.sarif

Expected output

Output should carry the rule ID, severity, affected surface, evidence, and release decision in a form other teams can act on without rerunning the scan.

repo: org/model-name
status: blocked
findings:
  - PICKLE-EXEC CRITICAL model.pkl GLOBAL os.system
  - MODEL-SECRET-API-KEY CRITICAL README.md exposed token pattern
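Assuming the plain-text layout shown above, the findings block can be parsed into records for downstream gating; this is a sketch against the sample, not a stable output contract.

```python
def parse_findings(report: str) -> list[dict]:
    """Parse '- RULE SEVERITY PATH EVIDENCE...' lines into structured records.

    Assumes the layout of the sample output above; everything after the
    third space is kept as free-form evidence.
    """
    findings = []
    for raw in report.splitlines():
        line = raw.strip()
        if not line.startswith("- "):
            continue
        rule, severity, path, evidence = line[2:].split(" ", 3)
        findings.append({"rule": rule, "severity": severity,
                         "path": path, "evidence": evidence})
    return findings
```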

FAQ

Should every public model be blocked?

No. Public models need intake evidence, not blanket blocking. The policy should distinguish executable artifacts, missing provenance, and hygiene findings.

What is the safest first format?

Safetensors is generally preferred for tensor storage because it is designed to avoid pickle-style code execution, but metadata and integrity still need validation.

What is a trojan checkpoint?

A trojan checkpoint is a model that has been modified during training to produce attacker-desired outputs when presented with a specific trigger input, while appearing normal otherwise. Detection requires hash verification against a signed source, provenance review, and ideally behavioral testing.

Eresus support

Turn the finding into an action your team can actually close.

If you need exploit evidence, prioritization, remediation direction, and retesting for HuggingFace Guard, Eresus can help scope the work with your team.
