Common Crawl LLM Training Dataset Exposes Thousands of API Keys, Passwords

Recommendations to bolster authentication for AI service access points.

The large language model (LLM) training dataset Common Crawl has been found to contain almost 12,000 live, hardcoded API keys and passwords.

According to analysis from Truffle Security, Common Crawl, which is used to train DeepSeek and other LLMs, also had 2.76 million web pages containing live secrets, and 63 percent of those secrets appeared on more than one page.
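To make the idea of hardcoded secrets repeated across pages concrete, here is a minimal sketch in Python. It is not Truffle Security's method: the two patterns, the `find_candidate_secrets` and `reuse_rate` helpers, and the page data are all hypothetical, and the sketch only finds candidate strings. It does not check whether a credential is live, which requires testing it against the issuing service.

```python
import re
from collections import defaultdict

# Hypothetical detectors for illustration only. Real scanners ship many more
# patterns and verify each candidate against the issuing service.
PATTERNS = {
    # AWS access key IDs commonly start with "AKIA" followed by 16 characters.
    "aws_access_key_id": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    # Assumed generic form: api_key = "..." or api-key: '...'
    "generic_api_key": re.compile(
        r"api[_-]?key\s*[:=]\s*['\"]([A-Za-z0-9_\-]{20,})['\"]",
        re.IGNORECASE,
    ),
}


def find_candidate_secrets(pages):
    """Map each (detector, candidate secret) to the set of page URLs it appears on."""
    seen = defaultdict(set)
    for url, text in pages.items():
        for name, pattern in PATTERNS.items():
            for match in pattern.finditer(text):
                # Use the capture group when the pattern has one, else the whole match.
                secret = match.group(1) if pattern.groups else match.group(0)
                seen[(name, secret)].add(url)
    return seen


def reuse_rate(seen):
    """Fraction of distinct candidates that appear on more than one page."""
    if not seen:
        return 0.0
    repeated = sum(1 for urls in seen.values() if len(urls) > 1)
    return repeated / len(seen)
```

Given a dictionary of page URL to page text, `reuse_rate(find_candidate_secrets(pages))` returns the share of distinct candidates seen on more than one page, which is the kind of figure the 63 percent statistic describes.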

Aside from causing substantial financial losses through unauthorised AI usage, LLMjacking, in which attackers use stolen credentials to access AI services, also facilitates the covert creation of malicious content.

SlashNext email security field CTO Stephen Kowski advised that the findings should prompt organisations' security teams to bolster authentication for AI service access points, restrict permissions, enable extensive logging and analytics of AI model usage, track suspicious API calls and configuration changes, and set up billing alerts.
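As a rough illustration of two of those recommendations, tracking suspicious API calls and raising billing alerts, the following Python sketch reviews one day of usage records. Everything in it is an assumption made for the example: the `ApiCall` record, the `review_usage` function, the per-key list of known IP addresses and the daily budget are hypothetical and do not correspond to any vendor's API or log format.

```python
from collections import Counter
from typing import NamedTuple


class ApiCall(NamedTuple):
    """One AI service request (hypothetical log record)."""
    key_id: str
    source_ip: str
    model: str
    cost_usd: float


def review_usage(calls, known_ips, daily_budget_usd):
    """Return human-readable alerts for one day's call log."""
    alerts = []
    spend = Counter()
    flagged = set()
    for call in calls:
        spend[call.key_id] += call.cost_usd
        # Flag each (key, IP) pair once when the IP is not on the key's allow list.
        pair = (call.key_id, call.source_ip)
        if call.source_ip not in known_ips.get(call.key_id, set()) and pair not in flagged:
            flagged.add(pair)
            alerts.append(f"{call.key_id}: call from unrecognised IP {call.source_ip}")
    # Billing alert: total spend per key compared with an assumed daily budget.
    for key_id, total in spend.items():
        if total > daily_budget_usd:
            alerts.append(
                f"{key_id}: spend ${total:.2f} exceeds daily budget ${daily_budget_usd:.2f}"
            )
    return alerts
```

In practice these checks would usually run on the provider's own usage logs and billing tools rather than on hand-built records. The sketch only shows the shape of the logic: compare each call against what is expected for that key, and compare total spend against a threshold.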

Written by

Dan Raywood

Dan Raywood is a B2B journalist with 25 years of experience, including covering cybersecurity for the past 17 years. He has extensively covered topics from Advanced Persistent Threats and nation-state hackers to major data breaches and regulatory changes.

He has spoken at events including 44CON, Infosecurity Europe, RANT Forum, BSides Scotland, Steelcon and the National Cyber Security Show, and served as editor of SC Media UK, Infosecurity Magazine and IT Security Guru. He was also an analyst with 451 Research and a product marketing lead at Tenable.

Artificial Intelligence. Published: 3 March.
