Screenshots for AI Agents
Give your AI agents eyes. SnapSharp turns any URL into a pixel-perfect screenshot that vision models can reason about — with one API call.
Why AI Agents Need Screenshots
Large language models are increasingly multimodal — GPT-4o, Claude 3.5, and Gemini can all process images alongside text. But they can't browse the web directly. A screenshot API bridges that gap: the agent sends a URL, receives an image, and uses its vision capabilities to understand the page layout, read rendered text, detect UI patterns, and make decisions.
This is fundamentally different from HTML scraping. Raw HTML doesn't tell you what the user sees — CSS transforms, JavaScript rendering, lazy-loaded images, and responsive layouts all change the visual output. A screenshot captures the final rendered result, exactly as a human would see it.
SnapSharp's API is designed for agent workflows: fast response times (2–4s), OpenAPI spec for auto-discovery, structured error responses, and support for full-page capture, dark mode, and element targeting — all features that agents need to gather complete visual context.
OpenAPI for Auto-Discovery
SnapSharp publishes a full OpenAPI 3.1 specification at api.snapsharp.dev/openapi.json. This means AI platforms that support tool/function discovery (ChatGPT GPT Actions, Claude MCP, AutoGPT) can automatically register SnapSharp's endpoints without manual configuration.
The spec includes parameter descriptions, example values, and response schemas — giving the LLM enough context to construct correct API calls without extra prompting.
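To illustrate what a discovery platform extracts, here is a pure-Python sketch that walks a trimmed copy of the `/v1/screenshot` operation from the spec and lists each operation with its parameters. No network call is made, and `list_operations` is an illustrative helper, not part of any SDK:

```python
# Trimmed copy of the spec's /v1/screenshot operation, inlined for illustration.
spec = {
    "openapi": "3.1.0",
    "paths": {
        "/v1/screenshot": {
            "get": {
                "operationId": "takeScreenshot",
                "summary": "Capture a website screenshot",
                "parameters": [
                    {"name": "url", "in": "query", "required": True},
                    {"name": "width", "in": "query"},
                    {"name": "full_page", "in": "query"},
                ],
            }
        }
    },
}

def list_operations(spec):
    """Enumerate (operationId, method, path, parameter names) for each operation."""
    ops = []
    for path, methods in spec["paths"].items():
        for method, op in methods.items():
            params = [p["name"] for p in op.get("parameters", [])]
            ops.append((op["operationId"], method.upper(), path, params))
    return ops

for op_id, method, path, params in list_operations(spec):
    print(f"{op_id}: {method} {path} params={params}")
```

This operationId-plus-parameters view is essentially the tool signature the LLM sees after auto-discovery.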
Integration Examples
ChatGPT GPT Actions
Register SnapSharp as a GPT Action so ChatGPT can capture any URL on demand.
// OpenAPI spec fragment for GPT Actions
{
  "openapi": "3.1.0",
  "info": { "title": "SnapSharp Screenshot", "version": "1.0" },
  "servers": [{ "url": "https://api.snapsharp.dev" }],
  "paths": {
    "/v1/screenshot": {
      "get": {
        "operationId": "takeScreenshot",
        "summary": "Capture a website screenshot",
        "parameters": [
          { "name": "url", "in": "query", "required": true, "schema": { "type": "string" } },
          { "name": "width", "in": "query", "schema": { "type": "integer", "default": 1280 } },
          { "name": "full_page", "in": "query", "schema": { "type": "boolean" } }
        ],
        "responses": {
          "200": {
            "description": "PNG image",
            "content": { "image/png": {} }
          }
        }
      }
    }
  }
}

Claude MCP (Model Context Protocol)
Expose SnapSharp as an MCP tool so Claude can see web pages during conversations.
// MCP tool definition (tools/screenshot.json)
{
  "name": "screenshot",
  "description": "Capture a screenshot of a web page",
  "inputSchema": {
    "type": "object",
    "properties": {
      "url": { "type": "string", "description": "URL to capture" },
      "width": { "type": "integer", "default": 1280 },
      "full_page": { "type": "boolean", "default": false }
    },
    "required": ["url"]
  }
}
// MCP server handler (Node.js)
import { SnapSharp } from "snapsharp";

const snap = new SnapSharp(process.env.SNAPSHARP_API_KEY);

// Assumes an MCP server instance named `server` has already been created.
server.tool("screenshot", async ({ url, width, full_page }) => {
  const image = await snap.screenshot(url, {
    width: width ?? 1280,
    fullPage: full_page ?? false,
  });
  return {
    content: [{
      type: "image",
      data: Buffer.from(image).toString("base64"),
      mimeType: "image/png",
    }],
  };
});

LangChain Tool
Create a custom LangChain tool that gives your agent visual web access.
from langchain.tools import BaseTool
from snapsharp import SnapSharp
import base64, os
class ScreenshotTool(BaseTool):
    name: str = "screenshot"
    description: str = "Capture a screenshot of a website. Input: URL string."

    def _run(self, url: str) -> str:
        snap = SnapSharp(os.environ["SNAPSHARP_API_KEY"])
        image = snap.screenshot(url, width=1280)
        b64 = base64.b64encode(image).decode()
        return f"Screenshot captured ({len(image)} bytes). Base64: {b64[:100]}..."
# Usage with an agent
from langchain.agents import initialize_agent, AgentType
from langchain_openai import ChatOpenAI

llm = ChatOpenAI(model="gpt-4o", temperature=0)
tools = [ScreenshotTool()]
agent = initialize_agent(tools, llm, agent=AgentType.OPENAI_FUNCTIONS)
agent.run("Take a screenshot of https://example.com and describe the layout")

CrewAI Tool
Add visual web capture to your CrewAI agent crew.
from crewai.tools import BaseTool
from snapsharp import SnapSharp
import os
class WebScreenshotTool(BaseTool):
    name: str = "Web Screenshot"
    description: str = "Captures a screenshot of a website URL and returns the image data."

    def _run(self, url: str) -> str:
        snap = SnapSharp(os.environ["SNAPSHARP_API_KEY"])
        image = snap.screenshot(url, width=1280, full_page=True)
        path = f"/tmp/screenshot_{hash(url)}.png"
        with open(path, "wb") as f:
            f.write(image)
        return f"Screenshot saved to {path} ({len(image)} bytes)"
# Add to your crew
from crewai import Agent, Task, Crew

researcher = Agent(
    role="Web Researcher",
    goal="Analyze website designs and layouts",
    tools=[WebScreenshotTool()],
    llm="gpt-4o",
)
task = Task(
    description="Screenshot and analyze the design of https://linear.app",
    agent=researcher,
)
crew = Crew(agents=[researcher], tasks=[task])
crew.kickoff()

n8n HTTP Request Node
Use SnapSharp in n8n workflows with a simple HTTP Request node.
// n8n workflow node configuration
{
  "nodes": [
    {
      "name": "Capture Screenshot",
      "type": "n8n-nodes-base.httpRequest",
      "parameters": {
        "url": "https://api.snapsharp.dev/v1/screenshot",
        "method": "GET",
        "authentication": "genericCredentialType",
        "queryParameters": {
          "parameters": [
            { "name": "url", "value": "={{ $json.website_url }}" },
            { "name": "width", "value": "1280" },
            { "name": "full_page", "value": "true" }
          ]
        },
        "options": {
          "response": { "response": { "responseFormat": "file" } }
        }
      }
    }
  ]
}
Quick Start
The simplest integration is a single HTTP call. Any agent framework that supports tool/function calling can use this:
curl "https://api.snapsharp.dev/v1/screenshot?url=https://example.com&width=1280" \
  -H "Authorization: Bearer sk_live_..." \
  -o screenshot.png
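The same request can be built from Python with the standard library. This sketch only constructs the URL and headers (no request is sent); note that `urlencode` escapes the target URL so its own `?` and `&` characters don't break the query string:

```python
from urllib.parse import urlencode

# Build the screenshot request URL; urlencode percent-escapes the target URL.
params = {"url": "https://example.com", "width": 1280}
request_url = "https://api.snapsharp.dev/v1/screenshot?" + urlencode(params)

# The Authorization header carries the API key, as in the curl example.
headers = {"Authorization": "Bearer sk_live_..."}
```

Pass `request_url` and `headers` to any HTTP client your agent framework already uses.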
For agents that process images inline, request base64 output:
curl "https://api.snapsharp.dev/v1/screenshot?url=https://example.com&response_type=base64" \
  -H "Authorization: Bearer sk_live_..."
The API returns a JSON object with a data field containing the base64-encoded image — ready to pass directly to a vision model.
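Handling that response takes only a few lines of standard-library Python. This sketch assumes the JSON body is `{"data": "<base64>"}` as described above, and uses a stand-in payload rather than a real PNG:

```python
import base64
import json

# Stand-in for the API's JSON response body: {"data": "<base64-encoded image>"}.
raw = json.dumps({"data": base64.b64encode(b"\x89PNG...image bytes").decode()})

body = json.loads(raw)
image_bytes = base64.b64decode(body["data"])

# The base64 string can be passed straight to a vision model, e.g. as an
# OpenAI-style image content part using a data URL:
message_part = {
    "type": "image_url",
    "image_url": {"url": "data:image/png;base64," + body["data"]},
}
```

Because the image arrives already base64-encoded, the agent never needs to touch the filesystem before handing it to the model.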