Recommendations to bolster authentication for AI service access points.
The Common Crawl web archive, widely used as a large language model (LLM) training dataset, has been found to contain almost 12,000 live API keys and passwords hardcoded into the web pages it stores.
According to Truffle Security's analysis of 2.76 billion web pages in Common Crawl, which is used to train DeepSeek and other LLMs, 63 percent of the live secrets it found appeared on more than one page.
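Dedicated secret scanners combine many provider-specific detectors with a check against the provider to confirm a key is still live. The minimal Python sketch below illustrates only the first, pattern-matching step. The patterns (an AWS access key ID, a classic GitHub personal access token, and a quoted password assignment) and the names `PATTERNS`, `Finding` and `scan_text` are illustrative assumptions, not Truffle Security's method, and the sketch does not test whether a match is live.

```python
import re
from typing import Iterator, NamedTuple

# Illustrative detectors only (hypothetical set); real scanners ship far more.
PATTERNS = {
    # AWS access key IDs: "AKIA" followed by 16 uppercase letters or digits.
    "aws_access_key_id": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    # Classic GitHub personal access tokens: "ghp_" plus 36 alphanumerics.
    "github_pat": re.compile(r"\bghp_[A-Za-z0-9]{36}\b"),
    # A quoted literal of 6+ characters assigned to password/passwd/pass.
    "password_assignment": re.compile(
        r"""(?i)\bpass(?:word|wd)?\s*[:=]\s*["']([^"']{6,})["']"""
    ),
}


class Finding(NamedTuple):
    kind: str     # which detector matched
    line_no: int  # 1-based line number within the page
    match: str    # the matched text (a candidate secret, not a verified one)


def scan_text(text: str) -> Iterator[Finding]:
    """Yield candidate hardcoded secrets found in one page's text."""
    for line_no, line in enumerate(text.splitlines(), start=1):
        for kind, pattern in PATTERNS.items():
            for m in pattern.finditer(line):
                yield Finding(kind, line_no, m.group(0))


if __name__ == "__main__":
    page = '''<script>
const cfg = { key: "AKIAABCDEFGHIJKLMNOP" };
var password = "hunter22";
</script>'''
    for finding in scan_text(page):
        print(finding)
```

Pattern matching alone produces false positives (test keys, placeholders), which is why confirming that a key still authenticates matters when reporting "live" secrets.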
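Exposed keys like these can be abused for LLMjacking, in which attackers use stolen credentials to access hosted AI models. Aside from causing substantial financial losses through unauthorised AI usage, LLMjacking also enables the covert creation of malicious content.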
Stephen Kowski, field CTO for email security at SlashNext, said this should prompt organisations' security teams to strengthen authentication for AI service access points, restrict permissions, implement comprehensive logging and analytics of AI model usage, monitor for suspicious API calls and configuration changes, and set up billing alerts.
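Of those recommendations, usage monitoring and billing alerts are the easiest to automate. The Python sketch below is a hypothetical illustration, not any vendor's feature: it assumes per-key hourly token counts have already been exported from a provider's usage logs, and `UsageRecord`, `flag_anomalies` and its thresholds are invented names. It flags any hour that exceeds a fixed budget, or that exceeds the key's recent mean plus `k` standard deviations.

```python
from dataclasses import dataclass
from statistics import mean, pstdev
from typing import Dict, Iterable, List, Optional, Tuple


@dataclass
class UsageRecord:
    api_key_id: str  # hypothetical identifier for the credential
    hour: int        # hour index within the exported period
    tokens: int      # tokens consumed by that key in that hour


def flag_anomalies(
    records: Iterable[UsageRecord],
    baseline_hours: int = 24,
    k: float = 3.0,
    hard_limit: Optional[int] = None,
) -> List[Tuple[str, int, int]]:
    """Return (api_key_id, hour, tokens) for suspicious hours.

    An hour is flagged if it exceeds hard_limit, or if it exceeds the mean
    plus k population standard deviations of the key's previous
    baseline_hours readings (at least two readings are needed).
    """
    # Group records per key, in hour order.
    by_key: Dict[str, List[UsageRecord]] = {}
    for r in sorted(records, key=lambda r: (r.api_key_id, r.hour)):
        by_key.setdefault(r.api_key_id, []).append(r)

    alerts: List[Tuple[str, int, int]] = []
    for key_id, rows in by_key.items():
        history: List[int] = []
        for r in rows:
            window = history[-baseline_hours:]
            over_budget = hard_limit is not None and r.tokens > hard_limit
            spike = False
            if len(window) >= 2:
                threshold = mean(window) + k * pstdev(window)
                spike = r.tokens > threshold
            if over_budget or spike:
                alerts.append((key_id, r.hour, r.tokens))
            # Simplification: flagged hours also feed the baseline.
            history.append(r.tokens)
    return alerts


if __name__ == "__main__":
    recs = [UsageRecord("k1", h, t) for h, t in enumerate([100, 110, 90, 105, 95, 5000])]
    print(flag_anomalies(recs, hard_limit=1000))
```

A fixed budget catches slow, steady abuse that a statistical baseline would absorb, while the baseline catches sudden spikes on low-volume keys; combining both is a judgment call rather than a standard.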
Written by
Dan Raywood
Senior Editor
SC Media UK
Dan Raywood is a B2B journalist with more than 20 years of experience, including covering cybersecurity for the past 16 years. He has extensively covered topics from Advanced Persistent Threats and nation-state hackers to major data breaches and regulatory changes.
He has spoken at events including 44CON, Infosecurity Europe, RANT Conference, BSides Scotland, Steelcon and ESET Security Days.
Outside work, Dan enjoys supporting Tottenham Hotspur, managing mischievous cats, and sampling craft beers.