Ultimate Block List to Stop AI Bots


This content originally appeared on Perishable Press and was authored by Jeff Starr

More than you might think, AI (Artificial Intelligence) and ML (Machine Learning) bots are crawling your site and scraping your content. They are collecting and using your data to train software like ChatGPT, OpenAI, DeepSeek, and thousands of other AI creations. Whether you or anyone approves of all this is not my concern for this post. This post is for website owners who want to stop AI bots from crawling their web pages, as much as possible. To help with this, I’ve been collecting data and researching AI bots for many months, and have put together a “Mega Block List” to help stop AI bots from devouring your content.

The ultimate block list for stopping AI bots from crawling your site.

If you can edit a file, you can block a ton of AI bots.

Thanks: Special Thanks to Kristina Ponting for help with researching AI bots and sharing with the community. Find Kristina at Teskedsgumman and on Github.

Block AI Bots via robots.txt

The easiest way for most website owners to block AI bots is to append the following list to their site’s robots.txt file. There are many resources explaining the robots.txt file, and I encourage anyone not familiar with it to take a few moments to learn more.

In a nutshell, robots.txt is a plain-text file that contains rules for bots to obey. You can add rules that limit where bots can crawl, whether individual pages or the entire site. Once you have added some rules, upload the file to the public root directory of your website. For example, here is my robots.txt for Perishable Press.

Using WordPress? Block bad bots automatically with my free plugin, Blackhole for Bad Bots. Trap bad bots in a virtual black hole :)

To block AI bots via your site’s robots.txt file, append the following rules. Understand that bots are not required to obey robots.txt rules; the rules are merely suggestions. Good bots will follow the rules, while bad bots will ignore them and do whatever they want. To force compliance, you can add blocking rules via Apache/.htaccess. With that in mind, here are the robots rules to block AI bots.

Blocks over 100 AI bots and user agents.

Block list for robots.txt

# Ultimate AI Block List v1.0 20250211
# https://perishablepress.com/ultimate-ai-block-list/

User-agent: Agent GPT
User-agent: AgentGPT
User-agent: AIBot
User-agent: AI2Bot
User-agent: AISearchBot
User-agent: AlexaTM
User-agent: Alpha AI
User-agent: AlphaAI
User-agent: Amazon Bedrock
User-agent: Amazon Lex
User-agent: Amazonbot
User-agent: Amelia
User-agent: anthropic-ai
User-agent: AnyPicker
User-agent: Applebot
User-agent: AutoGPT
User-agent: AwarioRssBot
User-agent: AwarioSmartBot
User-agent: Brave Leo AI
User-agent: Bytespider
User-agent: CatBoost
User-agent: CC-Crawler
User-agent: CCBot
User-agent: ChatGPT
User-agent: Chinchilla
User-agent: Claude-Web
User-agent: ClaudeBot
User-agent: cohere-ai
User-agent: cohere-training-data-crawler
User-agent: Common Crawl
User-agent: commoncrawl
User-agent: Crawlspace
User-agent: crew AI
User-agent: crewAI
User-agent: DALL-E
User-agent: DataForSeoBot
User-agent: DeepMind
User-agent: DeepSeek
User-agent: DepolarizingGPT
User-agent: DialoGPT
User-agent: Diffbot
User-agent: DuckAssistBot
User-agent: FacebookBot
User-agent: Firecrawl
User-agent: Flyriver
User-agent: FriendlyCrawler
User-agent: Gemini
User-agent: Gemma
User-agent: GenAI
User-agent: Google Bard AI
User-agent: Google-CloudVertexBot
User-agent: Google-Extended
User-agent: GoogleOther
User-agent: GPT-2
User-agent: GPT-3
User-agent: GPT-4
User-agent: GPTBot
User-agent: GPTZero
User-agent: Grok
User-agent: Hugging Face
User-agent: iaskspider
User-agent: ICC-Crawler
User-agent: ImagesiftBot
User-agent: img2dataset
User-agent: IntelliSeek.ai
User-agent: ISSCyberRiskCrawler
User-agent: Kangaroo
User-agent: LeftWingGPT
User-agent: LLaMA
User-agent: magpie-crawler
User-agent: Meltwater
User-agent: Meta AI
User-agent: Meta Llama
User-agent: Meta.AI
User-agent: Meta-AI
User-agent: Meta-ExternalAgent
User-agent: Meta-ExternalFetcher
User-agent: MetaAI
User-agent: Mistral
User-agent: OAI-SearchBot
User-agent: OAI SearchBot
User-agent: omgili
User-agent: Open AI
User-agent: OpenAI
User-agent: PanguBot
User-agent: peer39_crawler
User-agent: PerplexityBot
User-agent: PetalBot
User-agent: RightWingGPT
User-agent: Scrapy
User-agent: SearchGPT
User-agent: SemrushBot
User-agent: Sidetrade
User-agent: Stability
User-agent: The Knowledge AI
User-agent: Timpibot
User-agent: VelenPublicWebCrawler
User-agent: WebChatGPT
User-agent: Webzio
User-agent: Whisper
User-agent: x.AI
User-agent: xAI
User-agent: YouBot
User-agent: Zero GTP
Disallow: /

Important: Whenever making changes to your robots.txt file, take a few moments to validate the rules using a free online robots checker.
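
As an offline complement to an online checker, the rules can also be sanity-checked with Python's standard `urllib.robotparser` module. This is only a minimal sketch and not part of the block list itself: the `RULES` string is a shortened excerpt of the list above, `is_blocked` is a hypothetical helper name, and `example.com` is a placeholder. Keep in mind that this parser matches names by case-insensitive substring, which is a detail of Python's implementation; real crawlers may match names differently.

```python
from urllib.robotparser import RobotFileParser

# Shortened excerpt of the block list above (assumption: in practice
# you would paste or load your full robots.txt here).
RULES = """\
# Ultimate AI Block List (excerpt)
User-agent: ChatGPT
User-agent: ClaudeBot
User-agent: GPTBot
Disallow: /
"""


def is_blocked(rules: str, bot_name: str, url: str = "https://example.com/") -> bool:
    """Return True if the given rules disallow bot_name from fetching url."""
    parser = RobotFileParser()
    parser.parse(rules.splitlines())
    return not parser.can_fetch(bot_name, url)


if __name__ == "__main__":
    # Listed names, a mixed-case name, a longer variant, and an unlisted bot.
    for name in ("GPTBot", "claudebot", "ChatGPT-User", "Googlebot"):
        verdict = "blocked" if is_blocked(RULES, name) else "allowed"
        print(f"{name}: {verdict}")
```

Pass the bot's short name (for example `GPTBot`) rather than a full browser-style user-agent string, because this parser only looks at the part of the name before the first slash.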

Block AI Bots via Apache/.htaccess

To actually enforce the “Ultimate AI Block List”, you can add the following rules to your Apache configuration or main .htaccess file. Like many others, I’ve written extensively on Apache and .htaccess. So if you’re unfamiliar, there are plenty of great resources, including my book .htaccess made easy.

In a nutshell, you can add rules via Apache/.htaccess to customize the functionality of your website. For example, you can add directives that help control traffic, optimize caching, improve performance, and even block bad bots. And these rules operate at the server level. So while bots may ignore rules added via robots.txt, they can’t ignore rules added via Apache/.htaccess.

Using Apache? Check out my free, open-source 8G Firewall. 8G is lightweight, fast, and protects your site against a wide range of threats.

To block AI bots via Apache/.htaccess, add the following rules to either your server configuration file or the main (public root) .htaccess file. Before making any changes, be on the safe side and make a backup of your files, so that if something unexpected happens you can easily roll back. With that in mind, here are the Apache rules to block AI bots.

Blocks over 100 AI bots and user agents.

Block list for Apache/.htaccess

# Ultimate AI Block List v1.1 20250211
# https://perishablepress.com/ultimate-ai-block-list/

<IfModule mod_rewrite.c>
	
	RewriteEngine On
	
	RewriteCond %{HTTP_USER_AGENT} (Agent\ GPT|AgentGPT|AIBot|AI2Bot|AISearchBot|AlexaTM|Alpha\ AI|AlphaAI|Amazon\ Bedrock|Amazon\ Lex|Amazonbot)             [NC,OR]
	RewriteCond %{HTTP_USER_AGENT} (Amelia|anthropic-ai|AnyPicker|Applebot|AutoGPT|AwarioRssBot|AwarioSmartBot|Brave\ Leo\ AI|Bytespider|CatBoost)            [NC,OR]
	RewriteCond %{HTTP_USER_AGENT} (CC-Crawler|CCBot|ChatGPT|Chinchilla|Claude-Web|ClaudeBot|cohere-ai|cohere-training-data-crawler|Common\ Crawl)            [NC,OR]
	RewriteCond %{HTTP_USER_AGENT} (commoncrawl|Crawlspace|crew\ AI|crewAI|DALL-E|DataForSeoBot|DeepMind|DeepSeek|DepolarizingGPT|DialoGPT|Diffbot)           [NC,OR]
	RewriteCond %{HTTP_USER_AGENT} (DuckAssistBot|FacebookBot|Firecrawl|Flyriver|FriendlyCrawler|Gemini|Gemma|GenAI|Google\ Bard\ AI|Google-CloudVertexBot)   [NC,OR]
	RewriteCond %{HTTP_USER_AGENT} (Google-Extended|GoogleOther|GPT-2|GPT-3|GPT-4|GPTBot|GPTZero|Grok|Hugging\ Face|iaskspider|ICC-Crawler|ImagesiftBot)      [NC,OR]
	RewriteCond %{HTTP_USER_AGENT} (img2dataset|IntelliSeek\.ai|ISSCyberRiskCrawler|Kangaroo|LeftWingGPT|LLaMA|magpie-crawler|Meltwater|Meta\ AI|Meta\ Llama) [NC,OR]
	RewriteCond %{HTTP_USER_AGENT} (Meta\.AI|Meta-AI|Meta-ExternalAgent|Meta-ExternalFetcher|MetaAI|Mistral|OAI-SearchBot|OAI\ SearchBot|omgili|Open\ AI)     [NC,OR]
	RewriteCond %{HTTP_USER_AGENT} (OpenAI|PanguBot|peer39_crawler|PerplexityBot|PetalBot|RightWingGPT|Scrapy|SearchGPT|SemrushBot|Sidetrade|Stability)       [NC,OR]
	RewriteCond %{HTTP_USER_AGENT} (The\ Knowledge\ AI|Timpibot|VelenPublicWebCrawler|WebChatGPT|Webzio|Whisper|x\.AI|xAI|YouBot|Zero\ GTP)                   [NC]
	
	RewriteRule (.*) - [F,L]
	
</IfModule>

Important: Remember to test well before going live. You can use a free user-agent request tool to make requests posing as various AI bots.
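
If you prefer a script over an online tool, a request posing as a bot can be sketched with Python's standard library. This is a rough illustration under stated assumptions: `build_request` and `status_for` are hypothetical helper names, and the URL in the usage comment is a placeholder for a site you control. Because the rules above use the `[F]` flag, a blocked user agent should receive a 403 Forbidden response.

```python
import urllib.error
import urllib.request


def build_request(url: str, user_agent: str) -> urllib.request.Request:
    """Build a GET request that identifies itself with the given User-Agent."""
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


def status_for(url: str, user_agent: str, timeout: float = 10.0) -> int:
    """Send the request and return the HTTP status code (403 means blocked)."""
    request = build_request(url, user_agent)
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        # urllib raises on 4xx/5xx responses; the status code is still
        # exactly the information we are after.
        return err.code


# Example usage against a site you control (placeholder URL, not run here):
#   status_for("https://example.com/", "GPTBot/1.0")
#   status_for("https://example.com/", "Mozilla/5.0")
```

Comparing the status for a bot name against the status for an ordinary browser string helps confirm that the rules block only what you intend.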

Notes

Note: The two block lists above (robots.txt and Apache/.htaccess) are synchronized and include/block the same AI bots.
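
If you modify either list, one way to keep the two in sync is to extract the names from each and compare them. The following is a minimal sketch, assuming the same layout as the lists above (one `User-agent:` line per name, and parenthesized alternations in each `RewriteCond`). The helper names and the short excerpts are illustrative only.

```python
import re

# Short excerpts standing in for the full lists (assumption).
ROBOTS_EXCERPT = """\
User-agent: Agent GPT
User-agent: IntelliSeek.ai
User-agent: GPTBot
Disallow: /
"""

APACHE_EXCERPT = r"""
RewriteCond %{HTTP_USER_AGENT} (Agent\ GPT|IntelliSeek\.ai) [NC,OR]
RewriteCond %{HTTP_USER_AGENT} (GPTBot)                     [NC]
"""


def robots_names(text: str) -> set[str]:
    """Collect the lowercased value of every User-agent line."""
    names = set()
    for line in text.splitlines():
        field, _, value = line.partition(":")
        if field.strip().lower() == "user-agent":
            names.add(value.strip().lower())
    return names


def apache_names(text: str) -> set[str]:
    """Collect lowercased names from each RewriteCond alternation group."""
    names = set()
    groups = re.findall(r"RewriteCond\s+%\{HTTP_USER_AGENT\}\s+\((.*?)\)\s+\[", text)
    for group in groups:
        for alternative in group.split("|"):
            # Drop the backslash escapes used for spaces and dots.
            names.add(re.sub(r"\\(.)", r"\1", alternative).lower())
    return names


if __name__ == "__main__":
    only_robots = robots_names(ROBOTS_EXCERPT) - apache_names(APACHE_EXCERPT)
    only_apache = apache_names(APACHE_EXCERPT) - robots_names(ROBOTS_EXCERPT)
    print("Only in robots.txt:", sorted(only_robots))
    print("Only in Apache rules:", sorted(only_apache))
```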

Note: Both block lists are case-insensitive. The robots.txt rules are case-insensitive by default, and the Apache rules are case-insensitive due to the inclusion of the [NC] flag. So don’t worry about mixed-case bot names: their user agents will be blocked, whether uppercase, lowercase, or mIxeD cAsE.

Learn more: According to Google documentation, the value of the user-agent line (in robots.txt) is case-insensitive.

Note: Numerous user agents are omitted from the block lists because the names are matched in wild-card fashion. Here is a list showing removed/redundant bots:

AI2Bot            // included
AI2Bot-Dolma      // removed

Applebot          // included
Applebot-Extended // removed

ChatGPT           // included
ChatGPT-User      // removed

GoogleOther       // included
GoogleOther-Image // removed
GoogleOther-Video // removed

omgili            // included
omgilibot         // removed

OpenAI            // included
OpenAI GPT        // removed

Webzio            // included
Webzio-Extended   // removed
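
For the Apache rules, the wild-card behavior can be illustrated with a regular expression. This is a sketch under stated assumptions: `PATTERN` is a shortened excerpt of the pattern above, Python's `re` module stands in for Apache's regex engine (the two behave the same for simple alternations like these), and `re.IGNORECASE` plays the role of the `[NC]` flag. For robots.txt, how names are matched depends on the individual crawler.

```python
import re

# Shortened excerpt of the Apache pattern; only the "included" names.
PATTERN = re.compile(
    r"(AI2Bot|Applebot|ChatGPT|GoogleOther|omgili|OpenAI|Webzio)",
    re.IGNORECASE,
)

# The names listed above as removed/redundant.
REMOVED = [
    "AI2Bot-Dolma",
    "Applebot-Extended",
    "ChatGPT-User",
    "GoogleOther-Image",
    "GoogleOther-Video",
    "omgilibot",
    "OpenAI GPT",
    "Webzio-Extended",
]


def is_matched(user_agent: str) -> bool:
    """Return True if any included name appears anywhere in the string."""
    return PATTERN.search(user_agent) is not None


if __name__ == "__main__":
    for name in REMOVED:
        print(f"{name}: {'matched' if is_matched(name) else 'not matched'}")
```

Because the pattern is unanchored, each shorter name matches anywhere inside a longer user-agent string, which is why the longer variants do not need their own entries.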

Changelog

Robots.txt

  • Version 1.0 – 2025/02/11 – Initial release.

Apache/.htaccess

  • Version 1.0 – 2025/02/11 – Initial release.
  • Version 1.1 – 2025/02/11 – Replaces REQUEST_URI with HTTP_USER_AGENT.

Disclaimer

The information shared on this page is provided “as-is”, with the intention of helping people protect their sites against AI bots. The two block lists (robots.txt and Apache/.htaccess) are open-source and free to use and modify without condition. By using either block list, you assume all risk and responsibility for anything that happens. So use wisely, test thoroughly, and enjoy the benefits of my work :)

Support my work

I spend countless hours digging through server logs, researching user agents, and compiling block lists to stop AI and other unwanted bots. I share my work freely with the hope that it will help make the Web a more secure place for everyone.

If you benefit from my work and want to show support, please make a donation or buy one of my books, such as .htaccess made easy. You’ll get a complete guide to .htaccess and a ton of awesome techniques for optimizing and securing your site.

Of course, tweets, likes, links, and shares also are super helpful and very much appreciated. Your generous support enables me to continue developing AI block lists and other awesome resources for the community. Thank you kindly :)

Show support! Donate via PayPal, Stripe, or your favorite digital coin »

Feedback

Got more? Leave a comment below with your favorite AI bots to block. Or send privately via my contact form. Cheers! :)



APA

Jeff Starr | Sciencx (2025-02-11T17:28:52+00:00) Ultimate Block List to Stop AI Bots. Retrieved from https://www.scien.cx/2025/02/11/ultimate-block-list-to-stop-ai-bots/
