
chore: add robots.txt for bot crawl protection #3064

Open
MillenniumFalconMechanic wants to merge 1 commit into main from mim/3063-robots-txt


@MillenniumFalconMechanic
Contributor

Ticket

Closes #3063

Summary

  • Add public/robots.txt disallowing SEO scrapers (AhrefsBot, SemrushBot, etc.) and AI-training user agents (GPTBot, Google-Extended, CCBot, etc.); robots.txt is advisory, so only crawlers that honor it are kept out
  • Allow standard search engine indexing
  • Include sitemap reference
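For reviewers checking how these rules take effect, here is a minimal sketch of how a crawler following RFC 9309 (the Robots Exclusion Protocol) picks its group and decides whether a path may be fetched. The sample file, the `/explore` path, and the helpers `parseRobots` and `isAllowed` are hypothetical; the PR's actual 43-line file is not reproduced on this page, and this simplified parser does not support `*` or `$` wildcards in paths.

```typescript
// Minimal RFC 9309-style evaluator (hypothetical helpers).
// Simplification: wildcards (*, $) inside paths are NOT supported.

interface Rule {
  allow: boolean;
  path: string;
}

interface Group {
  agents: string[]; // lower-cased product tokens from User-agent lines
  rules: Rule[];
}

function parseRobots(text: string): Group[] {
  const groups: Group[] = [];
  let current: Group | null = null;
  let lastWasAgent = false;
  for (const raw of text.split(/\r?\n/)) {
    const line = raw.replace(/#.*$/, "").trim(); // drop comments
    const colon = line.indexOf(":");
    if (colon < 0) continue; // blank or malformed line
    const key = line.slice(0, colon).trim().toLowerCase();
    const value = line.slice(colon + 1).trim();
    if (key === "user-agent") {
      // Consecutive User-agent lines share one group.
      if (current === null || !lastWasAgent) {
        current = { agents: [], rules: [] };
        groups.push(current);
      }
      current.agents.push(value.toLowerCase());
      lastWasAgent = true;
    } else {
      // An empty Disallow value disallows nothing, so it adds no rule.
      if ((key === "allow" || key === "disallow") && current !== null && value !== "") {
        current.rules.push({ allow: key === "allow", path: value });
      }
      lastWasAgent = false; // any other record ends the run of User-agent lines
    }
  }
  return groups;
}

function isAllowed(groups: Group[], userAgent: string, path: string): boolean {
  const ua = userAgent.toLowerCase();
  // Use the groups that name this crawler; otherwise fall back to "*".
  let matching = groups.filter((g) => g.agents.includes(ua));
  if (matching.length === 0) {
    matching = groups.filter((g) => g.agents.includes("*"));
  }
  // Rules from all matching groups are combined.
  let best: Rule | null = null;
  for (const rule of matching.flatMap((g) => g.rules)) {
    if (!path.startsWith(rule.path)) continue;
    // The longest (most specific) path wins; Allow wins a tie.
    if (
      best === null ||
      rule.path.length > best.path.length ||
      (rule.path.length === best.path.length && rule.allow)
    ) {
      best = rule;
    }
  }
  return best === null ? true : best.allow; // no matching rule means allowed
}

// Hypothetical sample mirroring the PR summary; not the PR's actual file.
const sample = `# Search engines
User-agent: *
Allow: /

# AI training crawlers
User-agent: GPTBot
User-agent: CCBot
Disallow: /

Sitemap: https://data.humancellatlas.org/sitemap.xml`;

const groups = parseRobots(sample);
console.log(isAllowed(groups, "Googlebot", "/explore")); // true
console.log(isAllowed(groups, "GPTBot", "/explore")); // false
```

Combining rules from every matching group and letting the longest path win (with Allow breaking ties) follows RFC 9309; individual crawlers may differ in details such as wildcard handling.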

Note

The pre-commit hook was skipped because of a pre-existing TypeScript error on main (the REQUIREMENT_LEVEL export is missing from accessorFn.ts).

Test plan

  • Verify robots.txt is served at /robots.txt after deploy
  • Confirm search engines can still crawl the site normally
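The first test-plan item could be automated as a post-deploy smoke check. This is a minimal sketch assuming Node 18+ (for the global `fetch`) and matching TypeScript typings; `checkRobotsBody` and `checkDeployed` are hypothetical names. The body check only confirms a catch-all `User-agent: *` group and the expected `Sitemap` line, so whether search engines actually keep crawling the site still has to be confirmed separately.

```typescript
// Hypothetical smoke check for the test plan; assumes Node 18+ for global fetch.

interface CheckResult {
  ok: boolean;
  problems: string[];
}

// Pure check on the file body, so it can run without network access.
function checkRobotsBody(body: string, expectedSitemap: string): CheckResult {
  const problems: string[] = [];
  // Strip comments and surrounding whitespace from each line.
  const lines = body.split(/\r?\n/).map((l) => l.replace(/#.*$/, "").trim());

  // Regular search engines should fall through to a catch-all group.
  if (!lines.some((l) => /^user-agent:\s*\*$/i.test(l))) {
    problems.push("missing catch-all 'User-agent: *' group");
  }

  // Collect every Sitemap URL and require the expected one.
  const sitemaps = lines
    .map((l) => /^sitemap:\s*(\S+)$/i.exec(l))
    .filter((m): m is RegExpExecArray => m !== null)
    .map((m) => m[1]);
  if (!sitemaps.includes(expectedSitemap)) {
    problems.push(`missing 'Sitemap: ${expectedSitemap}'`);
  }

  return { ok: problems.length === 0, problems };
}

// Network check against a deployed origin, e.g. "https://data.humancellatlas.org".
async function checkDeployed(origin: string): Promise<CheckResult> {
  const res = await fetch(new URL("/robots.txt", origin));
  if (res.status !== 200) {
    return { ok: false, problems: [`GET /robots.txt returned HTTP ${res.status}`] };
  }
  const problems: string[] = [];
  // RFC 9309 specifies that the file is served as text/plain.
  const type = res.headers.get("content-type") ?? "";
  if (!type.startsWith("text/plain")) {
    problems.push(`unexpected content-type '${type}'`);
  }
  const sitemap = new URL("/sitemap.xml", origin).href;
  problems.push(...checkRobotsBody(await res.text(), sitemap).problems);
  return { ok: problems.length === 0, problems };
}

// Usage (not executed here, to avoid a network call):
// checkDeployed("https://data.humancellatlas.org").then((r) => console.log(r));
```

Keeping the body check separate from the network call lets it run in CI without reaching the deployed site.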

🤖 Generated with Claude Code

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>

Copilot AI left a comment


Pull request overview

Adds a static robots.txt for the HCA Data Portal that allows general indexing, disallows the specified SEO and AI crawlers, and advertises the sitemap location.

Changes:

  • Adds crawler allow/disallow rules in public/robots.txt.
  • Adds a sitemap reference for https://data.humancellatlas.org/sitemap.xml.



Copilot AI left a comment


Pull request overview

Copilot reviewed 1 out of 1 changed files in this pull request and generated 2 comments.

Comment thread on public/robots.txt (diff hunk @@ -0,0 +1,43 @@):
  # Search engines

Comment thread on public/robots.txt, lines +42 to +43:
  Sitemap: https://data.humancellatlas.org/sitemap.xml
