Customize robots.txt
This documentation applies only to instances deployed on Alokai@Edge.
The robots.txt file is not a security feature. It only provides guidance to crawlers; it does not prevent access to restricted content. Sensitive or private data must be protected using proper authentication and authorization mechanisms.
The robots.txt file tells search engine crawlers which parts of your site they are allowed to access. It is commonly used to control indexing, protect private or duplicate content, and guide SEO behavior.
This feature lets you create and manage a custom robots.txt file directly in the Console.
Configuration
General behavior
- You can enable or disable serving the robots.txt file at any time.
- The file content is defined as plain text directly in the Console.
- When enabled, the file is served from the CDN edge under the path:
https://your-domain.com/robots.txt
Examples
Correct usage
User-agent: *
Disallow: /checkout/
Disallow: /cart/
Allow: /
Sitemap: https://www.example.com/sitemap.xml
Incorrect usage
- Empty file enabled (no effect).
- Blocking all crawlers unintentionally:
User-agent: *
Disallow: /
(This prevents your entire site from being indexed.)
Best practices
- Disallow non-public or duplicate sections
Block crawlers from indexing staging paths, checkout flows, or temporary pages.
- Allow only what is needed
Keep your robots.txt file minimal and explicit to avoid accidental disallow rules that might block search engines entirely.
- Reference a sitemap
Add a Sitemap: line to help crawlers discover valid URLs quickly and reduce unnecessary crawling.
Common mistakes
- Over-blocking assets needed for rendering
Disallowing /static/, /assets/, or /api/ can break rendering or CLS metrics. Block only truly non-indexable areas.
- Incorrect pattern usage
robots.txt supports simple prefix matches; wildcard support is limited and differs between crawlers. Rules like Disallow: */private/* may not behave as expected; prefer explicit path prefixes (e.g., Disallow: /private/). See the pattern example after this list.
- Path vs. host confusion
Rules apply per host. If you serve multiple domains, provide a correct robots.txt for each host instead of mixing hostnames inside one file. To handle this case, serve a robots.txt file customized for each domain from your application (see the sketch after this list).
- Case and trailing slash mismatches
/Admin and /admin can be different on some systems; /folder vs. /folder/ may match different sets. Be consistent with your site's actual URLs.
- Whitespace and encoding issues
Non-ASCII characters or trailing spaces in lines can cause parsers to ignore rules. Keep lines ASCII-only and trim whitespace.
- Unclear user-agent targeting
Place specific user-agent sections before the generic User-agent: * block to ensure crawler-specific rules are applied correctly (see the ordering example after this list).
- Forgetting environment scoping
Reusing production rules on staging (or vice versa) can lead to unintended indexing or over-blocking. Review rules during deploys.
- Assuming immediate effect
Changes may be cached by CDNs and crawlers. Allow time for propagation and re-crawl after updates; consider serving the file with cache headers suitable for your workflow.
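Pattern example
To illustrate the pattern-usage point above, prefer an explicit path prefix over a wildcard rule whose handling differs between crawlers. The /private/ path is only a placeholder:

User-agent: *
# Avoid: Disallow: */private/*  (wildcard handling varies between crawlers)
Disallow: /private/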
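Per-host sketch
For the multi-domain case above, the application can return a robots.txt tailored to the requesting host. This is a minimal sketch assuming a plain Express-based Node app (not a specific Alokai API); the hostnames, paths, and sitemap URLs are placeholders.

import express from "express";

const app = express();

// Hypothetical per-host rules; replace with your own domains and paths.
const robotsByHost: Record<string, string> = {
  "www.example.com": [
    "User-agent: *",
    "Disallow: /checkout/",
    "Sitemap: https://www.example.com/sitemap.xml",
  ].join("\n"),
  "www.example.de": [
    "User-agent: *",
    "Disallow: /kasse/",
    "Sitemap: https://www.example.de/sitemap.xml",
  ].join("\n"),
};

app.get("/robots.txt", (req, res) => {
  // req.hostname is derived from the Host header (or X-Forwarded-Host behind a proxy).
  // Unrecognized hosts get a conservative default here; adjust to your needs.
  const body = robotsByHost[req.hostname] ?? "User-agent: *\nDisallow: /";
  res.type("text/plain").send(body);
});

app.listen(3000);

Each host receives only its own rules and sitemap, so hostnames are never mixed inside a single file.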
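User-agent ordering example
Following the targeting advice above, the crawler-specific section comes before the generic block; Googlebot and the paths are only examples:

User-agent: Googlebot
Disallow: /search/

User-agent: *
Disallow: /checkout/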