robots.txt
A file telling search-engine crawlers what they can and cannot crawl.
robots.txt sits at the root of your site (yourdomain.com/robots.txt) and tells crawlers which URLs they should or shouldn't visit. Each rule group starts with a `User-agent` line naming a crawler (Googlebot, ChatGPT-User, etc.), followed by `Allow`/`Disallow` path patterns.
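A minimal sketch of that structure (the paths are illustrative, not a recommendation):

```
# Rules for one specific crawler
User-agent: Googlebot
Disallow: /private/
Allow: /private/public-page.html

# Fallback rules for every other crawler
User-agent: *
Disallow: /tmp/
```

A crawler uses the most specific `User-agent` group that matches it; `Allow` can carve exceptions out of a broader `Disallow`.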
Crucially, robots.txt is a crawling directive, not an indexing one. A Disallowed page can still appear in search results if other sites link to it. To prevent indexing, use a `<meta name="robots" content="noindex">` tag instead — and note that the page must remain crawlable, since a crawler blocked by robots.txt will never see the tag.
A modern robots.txt should also point to your sitemap (`Sitemap: https://yourdomain.com/sitemap.xml`). With the rise of AI search, many sites now explicitly allow GPTBot, OAI-SearchBot, ClaudeBot, PerplexityBot, and Google-Extended.
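Putting both pieces together, a file that welcomes the AI crawlers named above might look like this (whether to allow them is a policy choice, not a requirement):

```
# Explicitly allow common AI crawlers to access everything
User-agent: GPTBot
User-agent: OAI-SearchBot
User-agent: ClaudeBot
User-agent: PerplexityBot
User-agent: Google-Extended
Allow: /

# Default rules for all other crawlers
User-agent: *
Disallow: /admin/

Sitemap: https://yourdomain.com/sitemap.xml
```

The `Sitemap` line is independent of any user-agent group and can appear anywhere in the file.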
Related terms