Common Crawl LLM Training Dataset Exposes Thousands of API Keys, Passwords

Recommendations to bolster authentication for AI service access points.


The large language model (LLM) training dataset Common Crawl has been found to contain almost 12,000 live API keys and passwords hardcoded into web pages.

According to analysis from Truffle Security, Common Crawl, which is used to train DeepSeek and other LLMs, also had 2.76 million web pages with live secrets, and 63 percent of those secrets appeared on more than one page.
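To illustrate how hardcoded secrets can be spotted in crawled text, and how the share repeated across pages might be measured, here is a minimal Python sketch. It is not Truffle Security's method: the function names are hypothetical, the two patterns are illustrative assumptions based on commonly seen key formats, and a regex match only yields a candidate, since confirming a secret is live requires checking it against the issuing service.

```python
import re
from collections import defaultdict

# Illustrative patterns only (assumptions); real scanners use many more
# detectors and verify each candidate against the issuing service.
SECRET_PATTERNS = {
    "aws_access_key_id": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    "mailchimp_style_key": re.compile(r"\b[0-9a-f]{32}-us[0-9]{1,2}\b"),
}


def find_candidate_secrets(pages):
    """Map each (pattern name, matched string) to the set of page URLs containing it."""
    seen = defaultdict(set)
    for url, text in pages.items():
        for name, pattern in SECRET_PATTERNS.items():
            for match in pattern.findall(text):
                seen[(name, match)].add(url)
    return seen


def share_repeated(seen):
    """Fraction of distinct candidates that appear on more than one page."""
    if not seen:
        return 0.0
    repeated = sum(1 for urls in seen.values() if len(urls) > 1)
    return repeated / len(seen)
```

Passing a dictionary of page URL to page text into `find_candidate_secrets` and the result into `share_repeated` gives a rough equivalent of the "present across several pages" figure, computed over unverified candidates rather than confirmed live secrets.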

Aside from causing substantial financial losses through unauthorised AI usage, LLMjacking, in which attackers use stolen credentials to access AI services, also facilitates the covert creation of malicious content.

SlashNext's email security field CTO Stephen Kowski advised that this should prompt organisations' security teams to bolster authentication for AI service access points, restrict permissions, log and analyse AI model usage extensively, track suspicious API calls and configuration changes, and create billing alerts.
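As a rough illustration of the logging, suspicious-call tracking and billing-alert recommendations, the Python sketch below reviews a batch of usage records. Everything in it is an assumption made for illustration: the `ApiCall` record, the `review_usage` function, the per-key call limit, the allow-list of source IPs and the budget figure are all hypothetical, and in practice most providers expose their own logging and billing-alert features that would be used instead.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class ApiCall:
    key_id: str      # identifier of the credential used, never the secret itself
    source_ip: str   # address the call came from
    cost_usd: float  # estimated cost of the call


def review_usage(calls, known_ips, max_calls_per_key, budget_usd):
    """Return alert strings for unusual volume, unrecognised sources and overspend."""
    alerts = []

    # Volume check: flag any key that made more calls than the allowed limit.
    calls_per_key = Counter(call.key_id for call in calls)
    for key_id, count in sorted(calls_per_key.items()):
        if count > max_calls_per_key:
            alerts.append(f"volume: key {key_id} made {count} calls")

    # Source check: flag calls from IPs not on the key's allow-list.
    unknown = sorted(
        {
            (call.key_id, call.source_ip)
            for call in calls
            if call.source_ip not in known_ips.get(call.key_id, set())
        }
    )
    for key_id, ip in unknown:
        alerts.append(f"source: key {key_id} used from unrecognised IP {ip}")

    # Billing check: flag total spend above the budget.
    total = sum(call.cost_usd for call in calls)
    if total > budget_usd:
        alerts.append(
            f"billing: spend {total:.2f} USD exceeds budget {budget_usd:.2f} USD"
        )
    return alerts
```

A stolen key used for LLMjacking would typically show up in more than one of these checks at once: a burst of calls, from an unfamiliar address, with a matching jump in spend.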



Dan Raywood, Senior Editor, SC Media UK

Dan Raywood is a B2B journalist with more than 20 years of experience, including covering cybersecurity for the past 16 years. He has extensively covered topics from Advanced Persistent Threats and nation-state hackers to major data breaches and regulatory changes.

He has spoken at events including 44CON, Infosecurity Europe, RANT Conference, BSides Scotland, Steelcon and ESET Security Days.

Outside work, Dan enjoys supporting Tottenham Hotspur, managing mischievous cats, and sampling craft beers.
