Async Sitemap Crawler

Provide the URL of a sitemap.xml and SnapSharp parses it, filters its pages by URL pattern, and captures a screenshot of each matching page. The job runs asynchronously, and the results are delivered as a ZIP archive.

Sitemap crawling requires the Starter plan or above. Page limits by plan: Starter 50, Growth 200, Business 1,000, Enterprise 5,000.

Endpoint

POST /v1/sitemap

Request body

{
  "sitemap_url": "https://example.com/sitemap.xml",
  "max_pages": 100,
  "include_pattern": "/blog/",
  "exclude_pattern": "/tag/",
  "format": "png",
  "width": 1280,
  "full_page": true,
  "callback_url": "https://your-server.com/webhook"
}

Parameters

Parameter	Type	Default	Description
`sitemap_url`	`string`	required	URL of the sitemap.xml file or sitemap index
`max_pages`	`integer`	`50`	Maximum number of pages to process (subject to your plan's page limit)
`include_pattern`	`string`	none	Regular expression; only matching URLs are captured
`exclude_pattern`	`string`	none	Regular expression; matching URLs are skipped
`width`	`integer`	`1280`	Viewport width in pixels
`height`	`integer`	`720`	Viewport height in pixels
`format`	`string`	`"png"`	Image format: `png`, `jpeg`, or `webp`
`quality`	`integer`	`80`	Image quality (JPEG and WebP only)
`full_page`	`boolean`	`false`	Capture the full scrollable page instead of only the viewport
`dark_mode`	`boolean`	`false`	Render pages in dark mode
`block_ads`	`boolean`	`false`	Block ads on captured pages
`stealth`	`boolean`	`false`	Enable stealth mode
`callback_url`	`string`	none	Webhook URL called when the job completes
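Because `include_pattern` and `exclude_pattern` are regular expressions, it can help to preview locally which URLs a pair of patterns keeps before starting a job. This is a hypothetical sketch (`filter_urls` is not part of SnapSharp) that assumes the include pattern is applied before the exclude pattern and that a pattern may match anywhere in the full URL. It uses `grep -E` (POSIX extended regex), whose syntax may differ from the regex flavor SnapSharp uses server-side, so treat the output as an approximation.

```shell
# Hypothetical helper: read URLs (one per line) on stdin and print those kept
# by an include pattern and an exclude pattern. Either pattern may be empty.
# Assumes unanchored matching against the full URL, include before exclude.
filter_urls() {
  local include="$1" exclude="$2" urls
  urls=$(cat)
  if [ -n "$include" ]; then
    # keep only URLs matching the include pattern (grep exits 1 on no match)
    urls=$(printf '%s\n' "$urls" | grep -E -- "$include" || true)
  fi
  if [ -n "$exclude" ]; then
    # drop URLs matching the exclude pattern
    urls=$(printf '%s\n' "$urls" | grep -Ev -- "$exclude" || true)
  fi
  if [ -n "$urls" ]; then
    printf '%s\n' "$urls"
  fi
}
```

With the patterns from the request body example, `printf '%s\n' https://example.com/blog/my-first-post https://example.com/blog/tag/news https://example.com/about | filter_urls '/blog/' '/tag/'` keeps only `https://example.com/blog/my-first-post`.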

Response (202 Accepted)

{
  "job_id": "4fb96g75-6828-5673-c4gd-3d074g77bgb7",
  "status": "pending",
  "sitemap_url": "https://example.com/sitemap.xml",
  "max_pages": 100,
  "poll_url": "/v1/jobs/4fb96g75...",
  "download_url": "/v1/jobs/4fb96g75.../download"
}

ZIP contents

The downloaded archive contains:

001_blog_my-first-post.png: one image per captured page, named with a numeric prefix and saved in the requested format
index.json: maps each original URL to its filename, with a success or error status for each capture

cURL example

# Start the crawl and capture the job ID
JOB=$(curl -s -X POST https://api.snapsharp.dev/v1/sitemap \
  -H "Authorization: Bearer sk_live_..." \
  -H "Content-Type: application/json" \
  -d '{"sitemap_url":"https://example.com/sitemap.xml","max_pages":50}' \
  | jq -r '.job_id')

# Check the job status (repeat until the job has finished)
curl -s "https://api.snapsharp.dev/v1/jobs/$JOB" \
  -H "Authorization: Bearer sk_live_..."

# Download the ZIP once the job is complete
# (-f fails on HTTP errors instead of saving the error body as the archive)
curl -f "https://api.snapsharp.dev/v1/jobs/$JOB/download" \
  -H "Authorization: Bearer sk_live_..." \
  -o sitemap-screenshots.zip
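The status check above runs once; in a script you would usually repeat it until the job finishes. A minimal polling sketch follows. The helper name `wait_for_job` and the `SNAPSHARP_KEY` environment variable are hypothetical, and only the `pending` status is documented on this page: treating `processing` as another waiting state is an assumption, as is treating any other value as final.

```shell
API="https://api.snapsharp.dev"

# Hypothetical helper: poll GET /v1/jobs/{job_id} until the job leaves a
# waiting state, then print its final status.
# Assumes $SNAPSHARP_KEY holds your API key (e.g. sk_live_...) and that
# "pending" and "processing" are the only waiting states (assumption).
wait_for_job() {
  local job_id="$1" interval="${2:-5}" max_polls="${3:-120}" i=0 status
  while [ "$i" -lt "$max_polls" ]; do
    status=$(curl -s "$API/v1/jobs/$job_id" \
      -H "Authorization: Bearer $SNAPSHARP_KEY" | jq -r '.status // empty')
    case "$status" in
      pending|processing) ;;                  # still waiting: poll again
      "") return 1 ;;                         # request failed or no status field
      *) printf '%s\n' "$status"; return 0 ;; # treat anything else as final
    esac
    i=$((i + 1))
    sleep "$interval"
  done
  return 1                                    # gave up after max_polls polls
}
```

For example, `status=$(wait_for_job "$JOB")` blocks until the job stops reporting a waiting state. Check that `$status` is the success value your jobs actually report before running the download command.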

Sitemap index support

SnapSharp automatically follows sitemap indexes, fetching up to 10 nested sub-sitemaps. Use `include_pattern` and `exclude_pattern` to limit the crawl to the pages you need.
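Before submitting a job, you may want to check locally whether a sitemap URL points to an index and how many entries it lists, since SnapSharp follows at most 10 sub-sitemaps. The functions below are hypothetical helpers and a rough text check, not an XML parser: they assume each `<loc>` value sits on one line without CDATA wrapping or surrounding whitespace, they leave XML entities such as `&amp;` unescaped, and they do not handle gzipped sitemaps.

```shell
# Print every <loc> value in a sitemap file, one per line.
sitemap_locs() {
  grep -o '<loc>[^<]*</loc>' "$1" | sed -e 's/^<loc>//' -e 's/<\/loc>$//'
}

# Print "index" for a sitemap index, "urlset" otherwise.
sitemap_kind() {
  if grep -q '<sitemapindex' "$1"; then echo index; else echo urlset; fi
}

# Print the kind and entry count; warn if an index exceeds the 10 sub-sitemaps
# SnapSharp follows.
check_sitemap() {
  local kind count
  kind=$(sitemap_kind "$1")
  count=$(sitemap_locs "$1" | wc -l | tr -d ' ')
  echo "$kind $count"
  if [ "$kind" = index ] && [ "$count" -gt 10 ]; then
    echo "warning: only up to 10 sub-sitemaps are followed" >&2
  fi
}
```

Usage: `curl -s https://example.com/sitemap.xml -o sitemap.xml && check_sitemap sitemap.xml`.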

Plan limits

Plan	Max pages
Free	✗ Not available
Starter	50
Growth	200
Business	1,000
Enterprise	5,000
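This page does not say whether a `max_pages` value above your plan's limit is rejected or silently lowered, so a client can cap the value itself before sending the request. A minimal sketch using the numbers from the table above; the lowercase plan identifiers and both function names are hypothetical, not API values.

```shell
# Per-plan max_pages caps, taken from the plan limits table.
plan_page_cap() {
  case "$1" in
    starter)    echo 50 ;;
    growth)     echo 200 ;;
    business)   echo 1000 ;;
    enterprise) echo 5000 ;;
    *)          return 1 ;;  # free plan (no sitemap crawling) or unknown plan
  esac
}

# Cap a requested max_pages at the plan's limit (client-side only).
clamp_max_pages() {
  local requested="$1" cap
  cap=$(plan_page_cap "$2") || return 1
  if [ "$requested" -gt "$cap" ]; then echo "$cap"; else echo "$requested"; fi
}
```

For example, `clamp_max_pages 500 growth` prints `200`, and `clamp_max_pages 10 free` fails because the Free plan cannot use sitemap crawling.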