Robots.txt Generator & Tester
Build a valid robots.txt file from scratch or test an existing one to see which URLs are allowed or blocked.
What Is robots.txt?
A robots.txt file is a plain-text file that lives at the root of your website (e.g. https://example.com/robots.txt). It tells search engine crawlers which URLs on your site they are allowed to access and which they should skip. Every major search engine — Google, Bing, Yahoo, and others — checks this file before crawling your site.
The file follows the Robots Exclusion Protocol, a standard that has been in use since 1994. It does not require any special server configuration; you simply place the file in your site's public root directory and crawlers will find it automatically.
How to Use This Tool
Generator: Choose a preset template or build custom rules. Add user-agent groups, set Allow/Disallow paths, include your sitemap URL, and optionally set a crawl delay. The live preview updates instantly. Copy the output and save it as robots.txt in your site's root directory.
Tester: Paste an existing robots.txt file, enter a URL path and user-agent, then hit Test. The tool checks whether that URL would be allowed or blocked and shows the matching rule. It also validates syntax and warns you about common mistakes.
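The core allowed/blocked check the Tester performs can be approximated with Python's standard-library robots.txt parser. A sketch with made-up rules and paths — note one caveat: `urllib.robotparser` applies rules in file order (first match wins), whereas Google uses longest-path matching, so keep more specific `Allow` lines above broader `Disallow` lines when using it:

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules; Allow is listed first because this parser
# honors the first matching rule rather than the longest match.
rules = """\
User-agent: *
Allow: /admin/public/
Disallow: /admin/
"""

parser = RobotFileParser()
parser.parse(rules.splitlines())

print(parser.can_fetch("*", "https://example.com/admin/secret"))    # False: blocked
print(parser.can_fetch("*", "https://example.com/admin/public/x"))  # True: Allow applies
print(parser.can_fetch("*", "https://example.com/blog/post"))       # True: no rule matches
```

This mirrors what a tester does internally: parse the rule groups for the given user-agent, then match the URL path against each pattern.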
Robots.txt Syntax Reference
| Directive | Purpose | Example |
|---|---|---|
| User-agent | Which crawler the rules apply to (* = all) | User-agent: Googlebot |
| Disallow | Block a path from being crawled | Disallow: /admin/ |
| Allow | Override a Disallow for a sub-path | Allow: /admin/public/ |
| Sitemap | Tell crawlers where your sitemap is | Sitemap: https://example.com/sitemap.xml |
| Crawl-delay | Seconds between requests (not honored by Google) | Crawl-delay: 10 |
Google also supports the * wildcard (matches any sequence of characters) and $ end-of-URL anchor in path patterns.
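As an illustration, a hypothetical rule group combining both pattern features (the paths are made up for the example):

```
User-agent: Googlebot
# Block any URL containing a query string
Disallow: /*?
# Block all PDFs anywhere on the site ($ anchors the match to the end of the URL)
Disallow: /*.pdf$
# The longest matching rule wins, so this specific PDF stays crawlable
Allow: /downloads/brochure.pdf$
```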
Common Robots.txt Configurations
WordPress: Block /wp-admin/ (but allow /wp-admin/admin-ajax.php for front-end functionality), block /wp-includes/, and include your XML sitemap. Don't block /wp-content/ — that's where your images and CSS live, and Google needs access to render your pages.
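Put together, a file following these WordPress recommendations might look like the following (example.com stands in for your domain):

```
User-agent: *
Disallow: /wp-admin/
Allow: /wp-admin/admin-ajax.php
Disallow: /wp-includes/

Sitemap: https://example.com/sitemap.xml
```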
Shopify: Shopify auto-generates a robots.txt file that blocks checkout, cart, and internal search pages. Since June 2021, you can customize it by editing the robots.txt.liquid theme template.
Static sites: Most static sites only need a simple Allow All rule plus a Sitemap directive. There's typically no admin area or dynamic search to block.
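Such a minimal file can be as short as this (example.com is a placeholder):

```
User-agent: *
# An empty Disallow value means nothing is blocked
Disallow:

Sitemap: https://example.com/sitemap.xml
```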
Single-page applications (SPAs): Ensure crawlers can access your JavaScript bundles and API endpoints needed for rendering. Blocking /static/ or /assets/ can prevent Google from rendering your pages properly.
What Robots.txt Does NOT Do
- It is not a security measure. Malicious bots and scrapers can and will ignore robots.txt entirely. Never use it to hide sensitive content.
- It is not guaranteed to be followed. The Robots Exclusion Protocol is voluntary. Well-behaved crawlers (Googlebot, Bingbot) obey it; others may not.
- It does not remove pages from search results. If other sites link to a blocked URL, Google can still index that URL (showing it without a snippet). To remove a page from the index entirely, use a noindex meta tag or X-Robots-Tag HTTP header instead.
- Blocking CSS/JS can hurt your rankings. Google needs to render your page to understand its content and layout. Blocking stylesheets or scripts via robots.txt can lead to mobile-usability errors and lower rankings.
Best Practices
- Keep it simple. The fewer rules you have, the less likely you are to accidentally block important content.
- Don't block CSS and JavaScript. Google's rendering engine needs these resources to properly assess your page.
- Always include a Sitemap directive. This helps crawlers discover your pages more efficiently and understand your site structure.
- Test before deploying. Use the tester above or the robots.txt report in Google Search Console to verify your rules before pushing to production.
- Use the most specific User-agent when possible. If you only want to block a specific crawler, name it explicitly rather than using the wildcard.
- Review regularly. As your site grows and changes, your robots.txt rules may need updating. Audit it at least once a quarter.
Frequently Asked Questions
Where do I put my robots.txt file?
The file must be placed at the root of your domain — for example, https://example.com/robots.txt. Search engines look for it at this exact location. It won't work if placed in a subdirectory. For Next.js sites, place it in the public/ folder. For WordPress, it's usually auto-generated, but you can create or edit it with a plugin like Yoast SEO.
Does robots.txt block pages from appearing in Google?
Not necessarily. Robots.txt prevents crawling, not indexing. If other websites link to a URL that you've blocked via robots.txt, Google may still index that URL — it just won't have a cached version or snippet. To truly prevent a page from appearing in search results, use a noindex meta tag or X-Robots-Tag HTTP header.
What's the difference between robots.txt and noindex?
robots.txt controls crawling — whether a bot is allowed to fetch a page. noindex (as a meta tag or HTTP header) controls indexing — whether a page appears in search results. They serve different purposes and can be used together, but be careful: if you block a page with robots.txt, Google can't see the noindex tag on it. For pages you want deindexed, allow crawling but add noindex.
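For instance, a page you want deindexed can carry either of these (illustrative snippets) while remaining crawlable:

```
<!-- Option 1: meta tag inside the page's <head> -->
<meta name="robots" content="noindex">

<!-- Option 2: HTTP response header (also works for non-HTML files such as PDFs) -->
X-Robots-Tag: noindex
```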
Should I block /admin or /wp-admin?
Yes, blocking admin areas is a common best practice. These pages have no SEO value and shouldn't appear in search results. For WordPress, block /wp-admin/ but allow /wp-admin/admin-ajax.php since many themes and plugins use AJAX calls from the front end. This is the default configuration WordPress uses.
Can robots.txt hurt my SEO?
Yes, if misconfigured. The most common mistake is accidentally blocking important pages, CSS files, or JavaScript that Google needs to render your content. Blocking your sitemap URL, images, or stylesheets can directly hurt your rankings. Always test your robots.txt after making changes and monitor Google Search Console for crawl errors.