[Reference 1] robots.txt Analysis
Content Analysis
- The provided robots.txt already has a decent structure:
  - General crawlers (`User-Agent: *`) are allowed everywhere except a set of sensitive/development/internal folders.
  - Major AI bots (OpenAI, Anthropic, Claude) are explicitly blocked.
- The `Host:` and `Sitemap:` directives are misplaced (they use the placeholder `your-domain.com`).
- The blocking of `api`, `debug-click`, `demo`, `test-*`, `_next`, and `private` is appropriate and aligned with typical best practice. `Allow: /` is unnecessary, since `/` is not disallowed anywhere above it.
- The block on AI bots is a conscious choice; adjust this policy based on whether you want your content included in LLM training or AI crawling.
Issues/Improvements
- Domain-Specific Endpoints: `Host:` and `Sitemap:` should refer to the correct domain.
- Consistency: The structure is sound, but remove unnecessary lines and replace placeholders.
- Intent: If you want to allow OpenAI, Gemini, Anthropic, etc. to crawl for LLMs, modify their policies (see below).
- Location: robots.txt MUST be placed at the domain root, i.e., https://image-tools.wenjunjiang.com/robots.txt
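Crawlers always resolve robots.txt against the domain root and ignore any deeper path. A quick illustration of that resolution, using only Python's standard library (the page URL is just an example):

```python
from urllib.parse import urlsplit

def robots_url(page_url: str) -> str:
    # Crawlers ignore the path entirely: robots.txt is always
    # fetched from the scheme + host root.
    parts = urlsplit(page_url)
    return f"{parts.scheme}://{parts.netloc}/robots.txt"

print(robots_url("https://image-tools.wenjunjiang.com/demo/some-page"))
# → https://image-tools.wenjunjiang.com/robots.txt
```

This is why a robots.txt placed at, say, `/demo/robots.txt` is never consulted.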
Optimized robots.txt Example
```
# robots.txt for https://image-tools.wenjunjiang.com

User-Agent: *
Disallow: /api/
Disallow: /debug-click/
Disallow: /demo/
Disallow: /test-*
Disallow: /_next/
Disallow: /private/

User-Agent: GPTBot
Disallow: /

User-Agent: ChatGPT-User
Disallow: /

User-Agent: CCBot
Disallow: /

User-Agent: anthropic-ai
Disallow: /

User-Agent: Claude-Web
Disallow: /

Host: https://image-tools.wenjunjiang.com
Sitemap: https://image-tools.wenjunjiang.com/sitemap.xml
```
🟢 Place this at:
https://image-tools.wenjunjiang.com/robots.txt
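Before deploying, the rules can be sanity-checked offline with Python's standard-library `urllib.robotparser`, here against a trimmed copy of the file above. (Note: the standard parser matches `Disallow` paths literally, so wildcard lines like `/test-*` are not expanded.)

```python
from urllib.robotparser import RobotFileParser

# Trimmed-down copy of the robots.txt rules above.
ROBOTS_TXT = """\
User-Agent: *
Disallow: /api/
Disallow: /private/

User-Agent: GPTBot
Disallow: /

Sitemap: https://image-tools.wenjunjiang.com/sitemap.xml
"""

rp = RobotFileParser()
rp.parse(ROBOTS_TXT.splitlines())

print(rp.can_fetch("Googlebot", "/"))        # True: general crawlers allowed
print(rp.can_fetch("Googlebot", "/api/v1"))  # False: internal endpoint blocked
print(rp.can_fetch("GPTBot", "/"))           # False: AI bot blocked site-wide
print(rp.site_maps())                        # parsed Sitemap entries (Python 3.8+)
```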
[Reference 2] llms.txt Analysis
Content Analysis
- The llms.txt file provides descriptive, structured metadata for LLM-based indexing and summarization:
- Title, description, and core categories.
- Keywords for intent and entity discovery.
- Accessibility and structured data support are called out.
- The SEO section notes that robots.txt lives at the default root path: `/robots.txt`
- Mirrors best practices for LLM meta-consumption: explicit, bullet-pointed, highly discoverable info.
Issues/Improvements
- Domain Consistency: The listed domain and homepage point to `image-tools-eta.vercel.app`, but your robots.txt is for `image-tools.wenjunjiang.com`. Pick one canonical domain.
- Clear Metadata: The structure is clear, but you may want to add more specific features or endpoints as the app evolves.
Optimized llms.txt Example
```
# Image Tools Beta

> Image Tools is an online platform for AI-powered image editing, analysis, and data extraction.

### Metadata
title: Image Tools Beta | AI-Powered Online Image Processing Suite
description: Edit, enhance, and analyze images online using advanced AI tools. Features include object detection, OCR, data extraction, and developer APIs.
domain: image-tools.wenjunjiang.com
language: en
category: Image Processing, AI Tools, Computer Vision, OCR, ML APIs, Developer Tools
keywords: Image editing, AI enhancement, image analysis, computer vision, OCR, object detection, ML API, developer tools, online image utilities

### Core Pages
- [Homepage](https://image-tools.wenjunjiang.com): Access all tools, documentation, and support.

### Features
- AI Image Editing and Enhancement
- Image Metadata & Object Detection
- Optical Character Recognition (OCR)
- Developer APIs
- Integrated Results Visualization

### Accessibility
alt_text_present: true
structured_data: true
mobile_friendly: true

### SEO
robots_txt: /robots.txt
```
🟢 Place this at:
https://image-tools.wenjunjiang.com/llms.txt
(or /llms.txt at the domain root; link to it in documentation or site footer if you wish LLMs to discover it).
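llms.txt has no single official schema, so tooling that consumes it tends to be ad hoc. A hypothetical helper (`parse_llms_metadata` is an illustrative name, not a standard API) that extracts the `key: value` pairs from the Metadata section of a file like the one above could look like:

```python
def parse_llms_metadata(text: str) -> dict:
    """Collect key: value pairs from the '### Metadata' section of an llms.txt file.

    Hypothetical sketch: llms.txt has no official schema, so this only
    handles the simple layout used in the example above.
    """
    meta = {}
    in_meta = False
    for line in text.splitlines():
        stripped = line.strip()
        if stripped.startswith("#"):
            # A heading starts or ends the Metadata section.
            in_meta = stripped.lstrip("#").strip().lower() == "metadata"
            continue
        if in_meta and ":" in stripped:
            key, _, value = stripped.partition(":")
            meta[key.strip()] = value.strip()
    return meta

sample = """# Image Tools Beta
### Metadata
title: Image Tools Beta | AI-Powered Online Image Processing Suite
domain: image-tools.wenjunjiang.com
### Core Pages
- [Homepage](https://image-tools.wenjunjiang.com)
"""
print(parse_llms_metadata(sample))
```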
Summary & Implementation Guidance
- Where to put the files?
- robots.txt → https://image-tools.wenjunjiang.com/robots.txt
- llms.txt → https://image-tools.wenjunjiang.com/llms.txt
- What if they’re missing?
- Use the examples given above as starting templates—they are SEO/AI-optimized and ready for production.
- Always ensure both files point to your actual (canonical) domain.
- If you want AI inclusion?
  - In robots.txt, replace blocks like `User-Agent: GPTBot` / `Disallow: /` with `User-Agent: GPTBot` / `Allow: /`.
  - For even broader AI inclusion, you may use a policy like your provided starter.
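The effect of swapping `Disallow` for `Allow` in a bot's group can be verified offline with the same standard-library parser:

```python
from urllib.robotparser import RobotFileParser

# Opt-in variant of the GPTBot group: Allow instead of Disallow.
rp = RobotFileParser()
rp.parse("""\
User-Agent: GPTBot
Allow: /
""".splitlines())

print(rp.can_fetch("GPTBot", "/any/page"))  # True: GPTBot may crawl everything
```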
🟢 Actionable Checklist
- At the domain root, add/overwrite `robots.txt` and `llms.txt`.
- Double-check domain canonicalization in all meta/data files.
- Maintain and update these files as your platform’s features and endpoints change.
- Monitor indexing/crawling in Google Search Console and other tools.
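If the site is a Next.js app (the blocked `/_next/` path suggests it may be), both files can simply be placed in the `public/` directory, which Next.js serves from the domain root:

```
public/
  robots.txt   # served at https://image-tools.wenjunjiang.com/robots.txt
  llms.txt     # served at https://image-tools.wenjunjiang.com/llms.txt
```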
If you need sample content for missing files, or have further domains to optimize, just provide them!