
Common Crawl LLM Training Dataset Exposes Thousands of API Keys, Passwords

Recommendations to bolster authentication for AI service access points.


The large language model (LLM) training dataset Common Crawl has been found to contain almost 12,000 live hardcoded API keys and passwords.

According to analysis from Truffle Security, Common Crawl, which is used to train DeepSeek and other LLMs, also included 2.76 million web pages containing live secrets, and 63 percent of those secrets appeared on more than one page.
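For readers who want a feel for how a scan of this kind works, the following is a minimal sketch in Python. It is not Truffle Security's method: the two regular expressions are simplified, illustrative patterns, the function names are hypothetical, and a pattern match only yields a candidate, since confirming that a secret is live means checking it with the issuing service.

```python
import re
from collections import defaultdict

# Simplified, illustrative patterns (assumptions for this sketch).
# Real scanners use many more detectors and verify every candidate.
PATTERNS = {
    "aws_access_key_id": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    "mailchimp_api_key": re.compile(r"\b[0-9a-f]{32}-us[0-9]{1,2}\b"),
}


def find_candidates(pages):
    """Map each candidate secret to the set of page URLs it appears on.

    `pages` is a dict of {url: page_text}.
    """
    seen = defaultdict(set)
    for url, body in pages.items():
        for kind, pattern in PATTERNS.items():
            for match in pattern.findall(body):
                seen[(kind, match)].add(url)
    return seen


def reuse_share(seen):
    """Fraction of distinct candidates that appear on more than one page."""
    if not seen:
        return 0.0
    reused = sum(1 for urls in seen.values() if len(urls) > 1)
    return reused / len(seen)


# Made-up pages using placeholder keys, not real credentials.
pages = {
    "https://a.example/": 'const key = "AKIAIOSFODNN7EXAMPLE";',
    "https://b.example/": (
        'aws = "AKIAIOSFODNN7EXAMPLE"; '
        'mc = "0123456789abcdef0123456789abcdef-us6"'
    ),
}
seen = find_candidates(pages)
share = reuse_share(seen)
```

Counting the pages per secret, rather than the secrets per page, is what allows a reuse figure like the one above to be calculated.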

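Exposed credentials like these can be used for LLMjacking, in which attackers use stolen keys to access AI services at the victim's expense. Aside from causing substantial financial losses through unauthorised AI usage, LLMjacking also facilitates the covert creation of malicious content.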

SlashNext's email security field CTO Stephen Kowski said this should prompt organisations' security teams to strengthen authentication for AI service access points, restrict permissions, make extensive use of AI model usage logging and analytics, monitor for suspicious API calls and configuration changes, and set up billing alerts.
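Two of those recommendations, tracking suspicious API calls and creating billing alerts, can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the `UsageRecord` fields, the `review_usage` function, the per-key list of known IP addresses and the daily budget do not correspond to any particular AI provider's logging or billing API.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class UsageRecord:
    """One logged AI API call (hypothetical log format)."""
    api_key_id: str   # identifier of the credential used, not the secret itself
    source_ip: str    # address the call came from
    cost_usd: float   # estimated cost of the call


def review_usage(records, known_ips, daily_budget_usd):
    """Return alerts for calls from unfamiliar IPs and for keys over budget."""
    alerts = []
    spend = defaultdict(float)
    for rec in records:
        spend[rec.api_key_id] += rec.cost_usd
        # Flag any call whose source is not on the key's expected list.
        if rec.source_ip not in known_ips.get(rec.api_key_id, set()):
            alerts.append(
                f"unknown source {rec.source_ip} for key {rec.api_key_id}"
            )
    # Billing alert: total spend per key against a fixed daily budget.
    for key_id, total in spend.items():
        if total > daily_budget_usd:
            alerts.append(
                f"key {key_id} spent ${total:.2f}, "
                f"over ${daily_budget_usd:.2f} budget"
            )
    return alerts


# Made-up records using documentation-range IP addresses.
records = [
    UsageRecord("k1", "192.0.2.10", 6.0),
    UsageRecord("k1", "203.0.113.9", 5.5),
    UsageRecord("k2", "192.0.2.20", 1.0),
]
known_ips = {"k1": {"192.0.2.10"}, "k2": {"192.0.2.20"}}
alerts = review_usage(records, known_ips, daily_budget_usd=10.0)
```

In practice, checks like these would more likely be configured in a cloud provider's own monitoring and billing tools than written by hand, but the logic is the same: compare each key's activity with what is expected and raise an alert when it deviates.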



Dan Raywood

Dan Raywood is a B2B journalist with 25 years of experience, including covering cybersecurity for the past 17 years. He has extensively covered topics from Advanced Persistent Threats and nation-state hackers to major data breaches and regulatory changes.

He has spoken at events including 44CON, Infosecurity Europe, RANT Forum, BSides Scotland, Steelcon and the National Cyber Security Show, and served as editor of SC Media UK, Infosecurity Magazine and IT Security Guru. He was also an analyst with 451 Research and a product marketing lead at Tenable.

