Customize robots.txt
This documentation applies only to instances deployed on Alokai@Edge.
The robots.txt file is not a security feature. It only provides guidance to crawlers; it does not prevent access to restricted content. Sensitive or private data must be protected using proper authentication and authorization mechanisms.
The robots.txt file tells search engine crawlers which parts of your site they are allowed to access. It is commonly used to control indexing, protect private or duplicate content, and guide SEO behavior.
This feature lets you create and manage a custom robots.txt file directly in the Console.
Configuration
General behavior
- You can enable or disable serving the robots.txt file at any time.
- The file content is defined as plain text directly in the Console.
- When enabled, the file is served from the CDN edge under the path:
https://your-domain.com/robots.txt
Examples
Correct usage
User-agent: *
Disallow: /checkout/
Disallow: /cart/
Allow: /
Sitemap: https://www.example.com/sitemap.xml
Incorrect usage
- Empty file enabled (no effect).
- Blocking all crawlers unintentionally:
User-agent: *
Disallow: /
(This prevents your entire site from being indexed.)
Best practices
- Disallow non-public or duplicate sections
Block crawlers from indexing staging paths, checkout flows, or temporary pages.
- Allow only what is needed
Keep your robots.txt file minimal and explicit to avoid accidental disallow rules that might block search engines entirely.
- Reference a sitemap
Add a Sitemap: line to help crawlers discover valid URLs quickly and reduce unnecessary crawling.
Common mistakes
- Over-blocking assets needed for rendering
Disallowing /static/, /assets/, or /api/ can break rendering or CLS metrics. Block only truly non-indexable areas.
- Incorrect pattern usage
robots.txt supports simple prefix matches; wildcard support is limited and differs between crawlers. Rules like Disallow: */private/* may not behave as expected; prefer explicit path prefixes (e.g., Disallow: /private/). See the pattern example after this list.
- Path vs. host confusion
Rules apply per host. If you serve multiple domains, provide a correct robots.txt for each host instead of mixing hostnames inside one file. To handle this case, serve a robots.txt file customized for each domain from your application (see the sketch after this list).
- Case and trailing slash mismatches
/Admin and /admin can be different on some systems; /folder vs. /folder/ may match different sets. Be consistent with your site's actual URLs.
- Whitespace and encoding issues
Non-ASCII characters or trailing spaces in lines can cause parsers to ignore rules. Keep lines ASCII-only and trim whitespace.
- Unclear user-agent targeting
Place specific user-agent sections before the generic User-agent: * block to ensure crawler-specific rules are applied correctly (see the ordering example after this list).
- Forgetting environment scoping
Reusing production rules on staging (or vice versa) can lead to unintended indexing or over-blocking. Review rules during deploys.
- Assuming immediate effect
Changes may be cached by CDNs and crawlers. Allow time for propagation and re-crawl after updates; consider serving the file with cache headers suitable for your workflow.
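Pattern example
To illustrate the pattern-usage point above, prefer an explicit path prefix over a wildcard rule whose handling differs between crawlers. The /private/ path is only a placeholder:

User-agent: *
# Avoid: Disallow: */private/*  (wildcard handling varies between crawlers)
Disallow: /private/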
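Per-host sketch
For the multi-domain case above, the application can return a robots.txt tailored to the requesting host. This is a minimal sketch assuming a plain Express-based Node app (not a specific Alokai API); the hostnames, paths, and sitemap URLs are placeholders.

import express from "express";

const app = express();

// Hypothetical per-host rules; replace with your own domains and paths.
const robotsByHost: Record<string, string> = {
  "www.example.com": [
    "User-agent: *",
    "Disallow: /checkout/",
    "Sitemap: https://www.example.com/sitemap.xml",
  ].join("\n"),
  "www.example.de": [
    "User-agent: *",
    "Disallow: /kasse/",
    "Sitemap: https://www.example.de/sitemap.xml",
  ].join("\n"),
};

app.get("/robots.txt", (req, res) => {
  // req.hostname is derived from the Host header (or X-Forwarded-Host behind a proxy).
  // Unrecognized hosts get a conservative default here; adjust to your needs.
  const body = robotsByHost[req.hostname] ?? "User-agent: *\nDisallow: /";
  res.type("text/plain").send(body);
});

app.listen(3000);

Each host receives only its own rules and sitemap, so hostnames are never mixed inside a single file.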
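User-agent ordering example
Following the targeting advice above, the crawler-specific section comes before the generic block; Googlebot and the paths are only examples:

User-agent: Googlebot
Disallow: /search/

User-agent: *
Disallow: /checkout/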