New preprint: AI_Bleeding — inference cost amplification via OOD linguistic payload

TL;DR: send queries in Grecanico or Farsi to an LLM endpoint → TTFT +59.8%, compute cost +2.8%, statistically significant. No vuln, no volumetric signature, evades all standard detection.

Worst case: exposed unauthenticated Ollama instance with num_predict=4096 + keep_alive=300s → Amplification Factor 17.56 Wh/KB. 3KB of attacker bandwidth → enough energy to charge a phone 5%.

Especially nasty for:

  • PA/judicial chatbots on fixed budgets
  • Pay-per-use API deployments with client-side exposed keys
  • PNRR-funded public sector AI with zero inference monitoring

Four scenarios: EDoS, browser JS distribution, Ollama open-proxy relay, frontier providers as involuntary relays.

All tests on self-hosted Ollama, no commercial endpoints touched.

Paper (CC BY 4.0): https://doi.org/10.13140/RG.2.2.26767.96166

#llmsecurity #infosec #threatmodeling #ollama #ood #AI #AIResearch #aisecurity