Sitemap Crawler

Provide a sitemap.xml URL and SnapSharp parses it, filters its URLs by regex pattern, and captures a screenshot of every matching page. The crawl runs as an asynchronous job, and the results are delivered as a ZIP archive.

Sitemap crawling requires the Starter plan or above. Page limits by plan: Starter 50, Growth 200, Business 1,000, Enterprise 5,000.

Endpoint

POST /v1/sitemap

Request body

{
  "sitemap_url": "https://example.com/sitemap.xml",
  "max_pages": 100,
  "include_pattern": "/blog/",
  "exclude_pattern": "/tag/",
  "format": "png",
  "width": 1280,
  "full_page": true,
  "callback_url": "https://your-server.com/webhook"
}

Parameters

Parameter        Type     Default    Description
sitemap_url      string   required   URL of the sitemap.xml (or sitemap index)
max_pages        integer  50         Maximum number of pages to process
include_pattern  string   (none)     Regex; only URLs matching this pattern are captured
exclude_pattern  string   (none)     Regex; URLs matching this pattern are skipped
width            integer  1280       Viewport width in pixels
height           integer  720        Viewport height in pixels
format           string   "png"      Output format: png, jpeg, or webp
quality          integer  80         Image quality for jpeg and webp output
full_page        boolean  false      Capture the full scrollable page, not just the viewport
dark_mode        boolean  false      Render pages in dark mode
block_ads        boolean  false      Block ads
stealth          boolean  false      Enable stealth mode
callback_url     string   (none)     Webhook URL called when the job completes
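Because filtering happens server-side, it can help to preview which URLs a pair of patterns would keep before starting a crawl. A rough sketch, assuming a URL is kept only if it matches include_pattern (when set) and does not match exclude_pattern (when set), and that max_pages takes the first matches in sitemap order; these details are assumptions, not documented behavior. grep -E implements POSIX extended regular expressions, which may differ from the dialect SnapSharp uses (for example, \d is not part of POSIX ERE), so treat the output as an approximation. preview_filter is a hypothetical name.

```shell
# Hypothetical preview helper: reads one URL per line on stdin and prints
# the URLs that would be kept, stopping after max_pages of them.
# Assumed semantics: keep a URL only if it matches include_pattern (when set)
# and does not match exclude_pattern (when set); truncate in input order.
preview_filter() {
  local include="${1:-}" exclude="${2:-}" max_pages="${3:-50}"
  local url count=0
  while IFS= read -r url || [ -n "$url" ]; do
    [ -z "$url" ] && continue
    # grep -E is POSIX extended regex; the server's regex dialect may differ.
    if [ -n "$include" ] && ! printf '%s\n' "$url" | grep -Eq -- "$include"; then
      continue
    fi
    if [ -n "$exclude" ] && printf '%s\n' "$url" | grep -Eq -- "$exclude"; then
      continue
    fi
    printf '%s\n' "$url"
    count=$((count + 1))
    if [ "$count" -ge "$max_pages" ]; then
      break
    fi
  done
  return 0
}
```

Feed it a URL list, one per line, for example preview_filter '/blog/' '/tag/' 100 < urls.txt (urls.txt is a hypothetical file).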

Response (202 Accepted)

{
  "job_id": "4fb96a75-6828-5673-c4ad-3d074a77bab7",
  "status": "pending",
  "sitemap_url": "https://example.com/sitemap.xml",
  "max_pages": 100,
  "poll_url": "/v1/jobs/4fb96a75...",
  "download_url": "/v1/jobs/4fb96a75.../download"
}
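poll_url and download_url are paths, not full URLs. A minimal sketch, assuming they are relative to the API host https://api.snapsharp.dev used in the cURL example: the hypothetical helper job_urls reads the 202 response on stdin and prints both absolute URLs with jq. API_BASE is likewise a hypothetical override variable.

```shell
# Hypothetical helper: reads the 202 response JSON on stdin and prints
# the absolute poll URL and download URL, one per line.
# Assumes both paths are relative to the API base URL.
API_BASE="${API_BASE:-https://api.snapsharp.dev}"

job_urls() {
  jq -r --arg base "$API_BASE" '$base + .poll_url, $base + .download_url'
}
```

Pipe the POST response into it and read the first line as the poll URL and the second as the download URL.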

ZIP contents

The downloaded archive contains:

  • One image per captured page, with names like 001_blog_my-first-post.png
  • index.json: maps each original URL to its filename, with a success or error status
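The schema of index.json is not documented on this page. A sketch, assuming hypothetically that it is a JSON array of objects with url, filename, and status fields, where a successful capture has status "success": failed_pages lists the URLs that did not succeed, and status_counts tallies entries per status. Both function names are hypothetical; adjust the jq filters to the real field names.

```shell
# Hypothetical: print the URL of every entry whose status is not "success".
# The array-of-objects shape and the field names (url, filename, status)
# are ASSUMPTIONS, not a documented schema.
failed_pages() {
  jq -r '.[] | select(.status != "success") | .url' "$1"
}

# Hypothetical: count entries per status, one "status count" line each,
# sorted by status name.
status_counts() {
  jq -r 'group_by(.status)[] | "\(.[0].status) \(length)"' "$1"
}
```

After unzip -q sitemap-screenshots.zip -d shots, run failed_pages shots/index.json (assuming index.json sits at the archive root).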

cURL example

# Start crawl
JOB=$(curl -s -X POST https://api.snapsharp.dev/v1/sitemap \
  -H "Authorization: Bearer sk_live_..." \
  -H "Content-Type: application/json" \
  -d '{"sitemap_url":"https://example.com/sitemap.xml","max_pages":50}' \
  | jq -r '.job_id')

# Poll
curl https://api.snapsharp.dev/v1/jobs/$JOB \
  -H "Authorization: Bearer sk_live_..."

# Download when complete
curl https://api.snapsharp.dev/v1/jobs/$JOB/download \
  -H "Authorization: Bearer sk_live_..." \
  -o sitemap-screenshots.zip
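The example above polls once by hand. A sketch of a polling loop, assuming the job resource returns a status field like the 202 response does. Only "pending" appears on this page; the terminal values "completed" and "failed", the 5-second interval, and the 120-attempt cap are assumptions, and SNAPSHARP_API_KEY is a hypothetical environment variable holding your API key.

```shell
# Hypothetical polling helper for GET /v1/jobs/{job_id}.
# ASSUMPTIONS: the response has a "status" field; "completed" and "failed"
# are the terminal values; any other value (e.g. "pending") means still running.
# SNAPSHARP_API_KEY is a hypothetical variable holding your sk_live_ key.
wait_for_job() {
  local job_id="$1" interval="${2:-5}" max_attempts="${3:-120}"
  local attempt status
  for ((attempt = 1; attempt <= max_attempts; attempt++)); do
    status=$(curl -s "https://api.snapsharp.dev/v1/jobs/$job_id" \
      -H "Authorization: Bearer $SNAPSHARP_API_KEY" | jq -r '.status')
    case "$status" in
      completed) echo "$status"; return 0 ;;
      failed)    echo "$status"; return 1 ;;
    esac
    sleep "$interval"
  done
  echo "timed out waiting for job $job_id" >&2
  return 1
}
```

Chain it with the download step, e.g. wait_for_job "$JOB" && curl ... -o sitemap-screenshots.zip, so the archive is fetched only after a successful run. As an alternative to polling, callback_url has SnapSharp call your webhook when the job completes.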

Sitemap index support

If sitemap_url points to a sitemap index, SnapSharp automatically follows the sub-sitemaps it references, including nested indexes (up to 10 sub-sitemaps). Use include_pattern and exclude_pattern to narrow the crawl to the pages you need.
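To see how many sub-sitemaps an index lists before submitting it, a rough sketch that extracts <loc> values from a downloaded file with grep and sed. It assumes the sitemaps.org layout (a sitemapindex root containing sitemap/loc elements, or a urlset root containing url/loc elements) with each <loc> element on a single line, and it does not decode entities such as &amp; or handle CDATA; a real XML parser is safer beyond a quick check. sitemap_locs and check_sitemap_index are hypothetical names.

```shell
# Hypothetical: print every <loc> value in a sitemap or sitemap index file.
# grep -o emits each match on its own line, even for single-line XML.
sitemap_locs() {
  grep -o '<loc>[^<]*</loc>' "$1" | sed -e 's/^<loc>//' -e 's/<\/loc>$//'
}

# Hypothetical: report whether a file is a sitemap index and how many
# entries it lists; warn when an index exceeds the 10 sub-sitemaps followed.
check_sitemap_index() {
  local file="$1" count
  count=$(sitemap_locs "$file" | wc -l | tr -d ' ')
  if grep -q '<sitemapindex' "$file"; then
    echo "sitemap index, sub-sitemaps: $count"
    if [ "$count" -gt 10 ]; then
      echo "warning: more than 10 sub-sitemaps; SnapSharp follows up to 10" >&2
    fi
  else
    echo "urlset, URLs: $count"
  fi
}
```

For example, curl -s https://example.com/sitemap.xml -o sitemap.xml && check_sitemap_index sitemap.xml.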

Plan limits

Plan        Max pages
Free        ✗ Not available
Starter     50
Growth      200
Business    1,000
Enterprise  5,000
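A client can compare max_pages with these limits before submitting. A minimal sketch mirroring the table; the function names are hypothetical, and because this page does not say whether the API rejects or silently caps a max_pages above the plan limit, capping on the client side is an assumption about what you want.

```shell
# Hypothetical lookup of the per-plan page limits listed in the table.
# Prints nothing and returns 1 for plans without sitemap crawling.
plan_page_limit() {
  case "$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')" in
    starter)    echo 50 ;;
    growth)     echo 200 ;;
    business)   echo 1000 ;;
    enterprise) echo 5000 ;;
    *)          return 1 ;;   # free or unknown plan
  esac
}

# Hypothetical: cap a requested max_pages at the plan limit.
clamp_max_pages() {
  local plan="$1" requested="$2" limit
  limit=$(plan_page_limit "$plan") || return 1
  if [ "$requested" -gt "$limit" ]; then
    echo "$limit"
  else
    echo "$requested"
  fi
}
```

For instance, clamp_max_pages growth 500 prints 200, while clamp_max_pages free 10 prints nothing and fails.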