Use Case

Screenshots for AI Agents

Give your AI agents eyes. SnapSharp turns any URL into a pixel-perfect screenshot that vision models can reason about — with one API call.

Why AI Agents Need Screenshots

Large language models are increasingly multimodal — GPT-4o, Claude 3.5, and Gemini can all process images alongside text. But they can't browse the web directly. A screenshot API bridges that gap: the agent sends a URL, receives an image, and uses its vision capabilities to understand the page layout, read rendered text, detect UI patterns, and make decisions.

This is fundamentally different from HTML scraping. Raw HTML doesn't tell you what the user sees — CSS transforms, JavaScript rendering, lazy-loaded images, and responsive layouts all change the visual output. A screenshot captures the final rendered result, exactly as a human would see it.

SnapSharp's API is designed for agent workflows: fast response times (2–4s), OpenAPI spec for auto-discovery, structured error responses, and support for full-page capture, dark mode, and element targeting — all features that agents need to gather complete visual context.
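The loop described above is small enough to sketch end to end. A minimal Python example using only the standard library; the endpoint and query parameter names (`url`, `width`, `full_page`) come from the API spec shown later on this page, and `sk_live_...` is a placeholder key:

```python
import base64
import urllib.parse
import urllib.request

API_KEY = "sk_live_..."  # placeholder: your SnapSharp API key

def screenshot_url(page_url: str, width: int = 1280, full_page: bool = False) -> str:
    """Build the GET URL for /v1/screenshot from the documented query params."""
    params = {"url": page_url, "width": width, "full_page": str(full_page).lower()}
    return "https://api.snapsharp.dev/v1/screenshot?" + urllib.parse.urlencode(params)

def capture(page_url: str) -> bytes:
    """Fetch the rendered page as raw PNG bytes."""
    req = urllib.request.Request(
        screenshot_url(page_url),
        headers={"Authorization": f"Bearer {API_KEY}"},
    )
    with urllib.request.urlopen(req) as resp:
        return resp.read()

if __name__ == "__main__":
    png = capture("https://example.com")
    # Base64-encode for a vision model's inline image input
    b64 = base64.b64encode(png).decode()
```

From here the agent passes the encoded image to whatever vision model it uses.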

OpenAPI for Auto-Discovery

SnapSharp publishes a full OpenAPI 3.1 specification at api.snapsharp.dev/openapi.json. This means AI platforms that support tool/function discovery (ChatGPT GPT Actions, Claude MCP, AutoGPT) can automatically register SnapSharp's endpoints without manual configuration.

The spec includes parameter descriptions, example values, and response schemas — giving the LLM enough context to construct correct API calls without extra prompting.
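To see what auto-discovery consumes, you can fetch the spec yourself and walk its `paths` object. A short standard-library sketch that lists every `operationId` the spec exposes:

```python
import json
import urllib.request

def list_operations(spec: dict) -> list[str]:
    """Collect every operationId declared under the spec's paths object."""
    ops = []
    for methods in spec.get("paths", {}).values():
        for op in methods.values():
            if isinstance(op, dict) and "operationId" in op:
                ops.append(op["operationId"])
    return ops

if __name__ == "__main__":
    with urllib.request.urlopen("https://api.snapsharp.dev/openapi.json") as resp:
        spec = json.load(resp)
    print(list_operations(spec))
```

This is essentially what GPT Actions or an MCP client does at registration time: enumerate operations, then read each one's parameters and response schema.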

Integration Examples

ChatGPT GPT Actions

Register SnapSharp as a GPT Action so ChatGPT can capture any URL on demand.

// OpenAPI spec fragment for GPT Actions
{
  "openapi": "3.1.0",
  "info": { "title": "SnapSharp Screenshot", "version": "1.0" },
  "servers": [{ "url": "https://api.snapsharp.dev" }],
  "paths": {
    "/v1/screenshot": {
      "get": {
        "operationId": "takeScreenshot",
        "summary": "Capture a website screenshot",
        "parameters": [
          { "name": "url", "in": "query", "required": true, "schema": { "type": "string" } },
          { "name": "width", "in": "query", "schema": { "type": "integer", "default": 1280 } },
          { "name": "full_page", "in": "query", "schema": { "type": "boolean" } }
        ],
        "responses": {
          "200": {
            "description": "PNG image",
            "content": { "image/png": {} }
          }
        }
      }
    }
  }
}

Claude MCP (Model Context Protocol)

Expose SnapSharp as an MCP tool so Claude can see web pages during conversations.

// MCP tool definition (tools/screenshot.json)
{
  "name": "screenshot",
  "description": "Capture a screenshot of a web page",
  "inputSchema": {
    "type": "object",
    "properties": {
      "url": { "type": "string", "description": "URL to capture" },
      "width": { "type": "integer", "default": 1280 },
      "full_page": { "type": "boolean", "default": false }
    },
    "required": ["url"]
  }
}

// MCP server handler (Node.js)
// Assumes `server` is an already-initialized MCP server instance
// (e.g. created with the @modelcontextprotocol/sdk package).
import { SnapSharp } from "snapsharp";

const snap = new SnapSharp(process.env.SNAPSHARP_API_KEY);

server.tool("screenshot", async ({ url, width, full_page }) => {
  const image = await snap.screenshot(url, {
    width: width ?? 1280,
    fullPage: full_page ?? false,
  });
  return {
    content: [{
      type: "image",
      data: Buffer.from(image).toString("base64"),
      mimeType: "image/png",
    }],
  };
});

LangChain Tool

Create a custom LangChain tool that gives your agent visual web access.

from langchain.tools import BaseTool
from snapsharp import SnapSharp
import base64, os

class ScreenshotTool(BaseTool):
    name: str = "screenshot"
    description: str = "Capture a screenshot of a website. Input: URL string."

    def _run(self, url: str) -> str:
        snap = SnapSharp(os.environ["SNAPSHARP_API_KEY"])
        image = snap.screenshot(url, width=1280)
        b64 = base64.b64encode(image).decode()
        return f"Screenshot captured ({len(image)} bytes). Base64: {b64[:100]}..."

# Usage with an agent
from langchain.agents import initialize_agent, AgentType
from langchain_openai import ChatOpenAI

llm = ChatOpenAI(model="gpt-4o", temperature=0)
tools = [ScreenshotTool()]
agent = initialize_agent(tools, llm, agent=AgentType.OPENAI_FUNCTIONS)
agent.run("Take a screenshot of https://example.com and describe the layout")

CrewAI Tool

Add visual web capture to your CrewAI agent crew.

from crewai.tools import BaseTool
from snapsharp import SnapSharp
import hashlib, os

class WebScreenshotTool(BaseTool):
    name: str = "Web Screenshot"
    description: str = "Captures a screenshot of a website URL and returns the image data."

    def _run(self, url: str) -> str:
        snap = SnapSharp(os.environ["SNAPSHARP_API_KEY"])
        image = snap.screenshot(url, width=1280, full_page=True)
        # Use a stable digest for the filename: Python's built-in hash()
        # is salted per process, so paths would differ between runs.
        digest = hashlib.sha256(url.encode()).hexdigest()[:12]
        path = f"/tmp/screenshot_{digest}.png"
        with open(path, "wb") as f:
            f.write(image)
        return f"Screenshot saved to {path} ({len(image)} bytes)"

# Add to your crew
from crewai import Agent, Task, Crew

researcher = Agent(
    role="Web Researcher",
    goal="Analyze website designs and layouts",
    tools=[WebScreenshotTool()],
    llm="gpt-4o",
)

task = Task(
    description="Screenshot and analyze the design of https://linear.app",
    agent=researcher,
)

crew = Crew(agents=[researcher], tasks=[task])
crew.kickoff()

n8n HTTP Request Node

Use SnapSharp in n8n workflows with a simple HTTP Request node.

// n8n workflow node configuration
{
  "nodes": [
    {
      "name": "Capture Screenshot",
      "type": "n8n-nodes-base.httpRequest",
      "parameters": {
        "url": "https://api.snapsharp.dev/v1/screenshot",
        "method": "GET",
        "authentication": "genericCredentialType",
        "queryParameters": {
          "parameters": [
            { "name": "url", "value": "={{ $json.website_url }}" },
            { "name": "width", "value": "1280" },
            { "name": "full_page", "value": "true" }
          ]
        },
        "options": {
          "response": { "response": { "responseFormat": "file" } }
        }
      }
    }
  ]
}

What Agents Do with Screenshots

Visual QA & Regression Testing
AI agents can compare before/after screenshots to detect visual regressions, broken layouts, or missing elements — faster than manual review.
Competitive Intelligence
An agent monitors competitor websites, captures screenshots on a schedule, and generates reports on design changes, pricing updates, or new features.
Content Verification
After deploying content changes, an AI agent captures the live page and verifies that headings, images, and CTAs render correctly.
Accessibility Auditing
Agents capture screenshots with different color schemes and viewport sizes, then analyze the visual output for accessibility issues like low contrast or truncated text.
Lead Enrichment
Sales agents capture screenshots of prospect websites to understand their tech stack, design maturity, and brand identity before outreach.
Web Scraping with Visual Context
When structured data extraction fails, agents fall back to screenshots and use vision models (GPT-4o, Claude) to extract information from the visual layout.
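The visual QA case above reduces to a pixel diff plus a threshold before any model gets involved. A minimal sketch of that check in pure Python; the image-loading step in the main block assumes Pillow is installed, but any library that yields pixel sequences works:

```python
def changed_fraction(before, after) -> float:
    """Fraction of positions whose pixel values differ between two
    equal-length pixel sequences (e.g. from Pillow's Image.getdata())."""
    if len(before) != len(after):
        raise ValueError("screenshots must share the same dimensions")
    if not before:
        return 0.0
    return sum(1 for a, b in zip(before, after) if a != b) / len(before)

if __name__ == "__main__":
    # Loading step assumes Pillow; swap in any pixel source you prefer.
    from PIL import Image
    before = list(Image.open("before.png").convert("RGB").getdata())
    after = list(Image.open("after.png").convert("RGB").getdata())
    if changed_fraction(before, after) > 0.01:  # flag >1% changed pixels
        print("Visual regression detected")
```

A cheap numeric gate like this lets the agent skip the vision-model call entirely when nothing changed, and escalate to the model only for pages that did.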

How It Works

Your AI Agent (ChatGPT / Claude / LangChain)
    → GET /v1/screenshot →
SnapSharp API (headless Chromium cluster)
    → PNG / WebP image →
Vision Model (GPT-4o / Claude 3.5 / Gemini)

Quick Start

The simplest integration is a single HTTP call. Any agent framework that supports tool/function calling can use this:

curl "https://api.snapsharp.dev/v1/screenshot?url=https://example.com&width=1280" \
  -H "Authorization: Bearer sk_live_..." \
  -o screenshot.png

For agents that process images inline, request base64 output:

curl "https://api.snapsharp.dev/v1/screenshot?url=https://example.com&response_type=base64" \
  -H "Authorization: Bearer sk_live_..."

The API returns a JSON object with a data field containing the base64-encoded image — ready to pass directly to a vision model.
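Putting the base64 flow together: fetch the JSON variant, then wrap the data field in an inline-image chat message. The message shape below follows the data-URL image format that OpenAI-style chat completion APIs accept for GPT-4o; the data field name comes from the response described above:

```python
import json
import urllib.request

def vision_message(b64_png: str, prompt: str) -> list:
    """Wrap a base64 PNG in the inline data-URL message shape that
    OpenAI-style chat completion APIs accept for image input."""
    return [{
        "role": "user",
        "content": [
            {"type": "text", "text": prompt},
            {"type": "image_url",
             "image_url": {"url": f"data:image/png;base64,{b64_png}"}},
        ],
    }]

if __name__ == "__main__":
    req = urllib.request.Request(
        "https://api.snapsharp.dev/v1/screenshot"
        "?url=https://example.com&response_type=base64",
        headers={"Authorization": "Bearer sk_live_..."},
    )
    with urllib.request.urlopen(req) as resp:
        payload = json.load(resp)
    messages = vision_message(payload["data"], "Describe this page's layout.")
    # Hand `messages` to your vision model client of choice.
```

Claude and Gemini use slightly different image-message shapes, but the same decode-and-embed step applies.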