Sitemap Crawler
Provide a sitemap.xml URL and SnapSharp will parse it, filter pages by pattern, and capture screenshots of all matching URLs. Results are delivered as a ZIP archive.
Sitemap crawling requires the Starter plan or above. Page limits per plan: Starter 50, Growth 200, Business 1,000, Enterprise 5,000.
Endpoint
POST /v1/sitemap
Request body
{
"sitemap_url": "https://example.com/sitemap.xml",
"max_pages": 100,
"include_pattern": "/blog/",
"exclude_pattern": "/tag/",
"format": "png",
"width": 1280,
"full_page": true,
"callback_url": "https://your-server.com/webhook"
}
Parameters
| Parameter | Type | Default | Description |
|---|---|---|---|
| sitemap_url | string | required | URL to the sitemap.xml (or sitemap index) |
| max_pages | integer | 50 | Maximum pages to process |
| include_pattern | string | - | Regex: only URLs matching this pattern |
| exclude_pattern | string | - | Regex: exclude URLs matching this pattern |
| width | integer | 1280 | Viewport width |
| height | integer | 720 | Viewport height |
| format | string | "png" | png, jpeg, or webp |
| quality | integer | 80 | JPEG/WebP quality |
| full_page | boolean | false | Full scrollable page |
| dark_mode | boolean | false | Dark mode |
| block_ads | boolean | false | Block ads |
| stealth | boolean | false | Stealth mode |
| callback_url | string | - | Webhook called when job completes |
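To make the interplay of include_pattern, exclude_pattern, and max_pages more concrete, here is a minimal Python sketch of how such a filter could behave. It assumes the patterns use regex search semantics (a match anywhere in the URL), that exclusion takes precedence over inclusion, and that max_pages keeps the first matches in sitemap order. None of these details are confirmed by the table above, and `filter_urls` is a hypothetical helper for previewing a crawl locally, not part of SnapSharp.

```python
import re
from typing import Iterable, List, Optional


def filter_urls(
    urls: Iterable[str],
    include_pattern: Optional[str] = None,
    exclude_pattern: Optional[str] = None,
    max_pages: int = 50,
) -> List[str]:
    """Hypothetical local preview of which URLs a crawl might capture.

    Assumptions (not confirmed by the API docs): regex *search* semantics,
    exclusion wins over inclusion, max_pages keeps the first N matches.
    """
    include = re.compile(include_pattern) if include_pattern else None
    exclude = re.compile(exclude_pattern) if exclude_pattern else None

    selected: List[str] = []
    for url in urls:
        if len(selected) >= max_pages:
            break  # page budget used up
        if include and not include.search(url):
            continue  # does not match the include pattern
        if exclude and exclude.search(url):
            continue  # matches the exclude pattern
        selected.append(url)
    return selected


sample = [
    "https://example.com/",
    "https://example.com/blog/my-first-post",
    "https://example.com/blog/tag/news",
    "https://example.com/pricing",
]
# With the patterns from the request body above, only the blog post remains.
print(filter_urls(sample, include_pattern="/blog/", exclude_pattern="/tag/"))
```

Running a check like this against your own sitemap before submitting a job can help confirm that the patterns select the pages you expect.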
Response (202 Accepted)
{
"job_id": "4fb96g75-6828-5673-c4gd-3d074g77bgb7",
"status": "pending",
"sitemap_url": "https://example.com/sitemap.xml",
"max_pages": 100,
"poll_url": "/v1/jobs/4fb96g75...",
"download_url": "/v1/jobs/4fb96g75.../download"
}
ZIP contents
The downloaded archive contains:
- 001_blog_my-first-post.png: one file per captured page
- index.json: mapping of original URLs to filenames with success/error status
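As a rough illustration of the job lifecycle, the Python sketch below polls the job endpoint and then opens the downloaded archive. The paths `/v1/jobs/{job_id}` and `/v1/jobs/{job_id}/download` and the Bearer header come from this page. The terminal status names `"completed"` and `"failed"` are assumptions, since only `"pending"` is documented here. The helper names (`http_get`, `wait_for_job`, `read_archive`) are hypothetical. The structure of index.json is not specified, so the sketch returns it as parsed JSON without interpreting it.

```python
import io
import json
import time
import urllib.request
import zipfile
from typing import Any, Callable, Dict, List, Tuple

API_BASE = "https://api.snapsharp.dev"


def http_get(path: str, api_key: str) -> bytes:
    """GET an API path using the Bearer header shown in this page's cURL example."""
    request = urllib.request.Request(
        API_BASE + path, headers={"Authorization": f"Bearer {api_key}"}
    )
    with urllib.request.urlopen(request) as response:
        return response.read()


def wait_for_job(
    job_id: str,
    fetch: Callable[[str], bytes],
    terminal_statuses: Tuple[str, ...] = ("completed", "failed"),  # assumed names
    interval_seconds: float = 5.0,
    max_attempts: int = 120,
) -> Dict[str, Any]:
    """Poll /v1/jobs/{job_id} until its status is one of terminal_statuses."""
    for _ in range(max_attempts):
        job = json.loads(fetch(f"/v1/jobs/{job_id}"))
        if job.get("status") in terminal_statuses:
            return job
        time.sleep(interval_seconds)
    raise TimeoutError(f"job {job_id} did not finish after {max_attempts} polls")


def read_archive(zip_bytes: bytes) -> Tuple[Any, List[str]]:
    """Return the parsed index.json and the names of the other archive members."""
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        index = json.loads(archive.read("index.json"))
        captures = [name for name in archive.namelist() if name != "index.json"]
    return index, captures
```

A caller would typically pass `lambda path: http_get(path, api_key)` as `fetch`, wait for the job, and then hand the bytes from the download path to `read_archive`. Keeping `fetch` injectable makes the polling logic easy to exercise without network access.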
cURL example
# Start crawl
JOB=$(curl -s -X POST https://api.snapsharp.dev/v1/sitemap \
-H "Authorization: Bearer sk_live_..." \
-H "Content-Type: application/json" \
-d '{"sitemap_url":"https://example.com/sitemap.xml","max_pages":50}' \
| jq -r '.job_id')
# Poll
curl https://api.snapsharp.dev/v1/jobs/$JOB \
-H "Authorization: Bearer sk_live_..."
# Download when complete
curl https://api.snapsharp.dev/v1/jobs/$JOB/download \
-H "Authorization: Bearer sk_live_..." \
-o sitemap-screenshots.zip
Sitemap index support
SnapSharp automatically follows nested sitemap indexes (up to 10 sub-sitemaps). Use include_pattern / exclude_pattern to focus on the pages you need.
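For readers unfamiliar with sitemap indexes, the following Python sketch shows one way such traversal can work using the standard sitemaps.org XML format. It is not SnapSharp's implementation. `collect_urls` and its `fetch` argument are hypothetical, and treating the limit of 10 as a total budget across all nesting levels is an assumption about how the cap is applied.

```python
import xml.etree.ElementTree as ET
from typing import Callable, List

SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}
MAX_SUB_SITEMAPS = 10  # limit stated above; assumed to be a total budget


def collect_urls(sitemap_url: str, fetch: Callable[[str], str]) -> List[str]:
    """Return page URLs from a sitemap or sitemap index.

    `fetch` is a caller-supplied function returning the XML text for a URL.
    """
    urls: List[str] = []
    queue = [sitemap_url]
    followed = 0  # number of sub-sitemaps queued so far

    while queue:
        root = ET.fromstring(fetch(queue.pop(0)))
        if root.tag.endswith("sitemapindex"):
            # An index lists further sitemaps; queue them until the budget runs out.
            for loc in root.findall("sm:sitemap/sm:loc", SITEMAP_NS):
                if followed >= MAX_SUB_SITEMAPS:
                    break
                if loc.text:
                    queue.append(loc.text.strip())
                    followed += 1
        else:
            # A regular <urlset> lists page URLs directly.
            urls.extend(
                loc.text.strip()
                for loc in root.findall("sm:url/sm:loc", SITEMAP_NS)
                if loc.text
            )
    return urls
```

On a large site, a cap on sub-sitemaps means some pages may never be considered, which is one reason to point sitemap_url at the most specific sitemap available.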
Plan limits
| Plan | Max pages |
|---|---|
| Free | ✗ Not available |
| Starter | 50 |
| Growth | 200 |
| Business | 1,000 |
| Enterprise | 5,000 |
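A client can check a requested max_pages against this table before submitting a job. The sketch below is a hypothetical client-side pre-check. Whether the API itself clamps or rejects an oversized max_pages is not stated here, and the lowercase plan keys are an assumption made for illustration.

```python
# Page limits copied from the table above; plan keys are illustrative.
PLAN_PAGE_LIMITS = {
    "starter": 50,
    "growth": 200,
    "business": 1000,
    "enterprise": 5000,
}


def effective_max_pages(plan: str, requested: int) -> int:
    """Clamp a requested max_pages to the plan's limit (client-side check only)."""
    limit = PLAN_PAGE_LIMITS.get(plan.lower())
    if limit is None:
        raise ValueError(f"sitemap crawling is not available on the {plan!r} plan")
    if requested < 1:
        raise ValueError("max_pages must be at least 1")
    return min(requested, limit)
```

For example, `effective_max_pages("Growth", 500)` yields 200, while passing `"free"` raises a `ValueError`.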