Turn any website into clean, AI-ready data.
A Model Context Protocol (MCP) server that exposes Firecrawl's API for scraping, crawling, mapping, searching, and extracting structured data from websites.
The Firecrawl MCP Server provides powerful web data extraction capabilities:
Perfect for:
Verifies the server is operational. No authentication required.
Inputs:
None.

Output:

Scrapes a single URL and returns the content in one or more formats. Supports JavaScript rendering, mobile emulation, and HTML tag filtering.
Inputs:
- `url` (string, required) — The URL to scrape
- `formats` (string, optional) — Comma-separated output formats: markdown, html, rawHtml, json, screenshot, links, images, summary, audio, branding, changeTracking (default: markdown)
- `only_main_content` (bool, optional) — Extract only main content, excluding headers/footers/navs (default: true)
- `include_tags` (string, optional) — Comma-separated HTML tags to include in output
- `exclude_tags` (string, optional) — Comma-separated HTML tags to exclude from output
- `wait_for_selector` (string, optional) — CSS selector to wait for before scraping
- `timeout_ms` (int, optional) — Request timeout in milliseconds (1000–300000, default: 30000)
- `mobile` (bool, optional) — Emulate mobile device (default: false)
- `skip_tls_verification` (bool, optional) — Skip TLS certificate verification (default: true)
- `proxy` (string, optional) — Proxy type: basic, enhanced, or auto (default: auto)
- `block_ads` (bool, optional) — Block ads and cookie popups (default: true)
- `remove_base64_images` (bool, optional) — Remove base64 encoded images from output (default: true)

Output:

    {
      "success": true,
      "data": {
        "markdown": "# Page Title\n...",
        "metadata": { "title": "...", "url": "..." }
      }
    }

Starts a crawl job from a base URL, following links and scraping all discovered pages. Returns a job ID for async processing.
Inputs:
- `url` (string, required) — Base URL to start crawling from
- `prompt` (string, optional) — Natural language prompt to generate crawler options
- `exclude_paths` (string, optional) — Comma-separated regex patterns for URLs to exclude
- `include_paths` (string, optional) — Comma-separated regex patterns for URLs to include
- `max_discovery_depth` (int, optional) — Maximum crawl depth from the start URL
- `sitemap` (string, optional) — Sitemap mode: skip, include, or only (default: include)
- `ignore_query_parameters` (bool, optional) — Don't re-scrape same path with different query params (default: false)
- `limit` (int, optional) — Maximum number of pages to crawl (default: 10000)
- `crawl_entire_domain` (bool, optional) — Follow sibling and parent URLs (default: false)
- `allow_external_links` (bool, optional) — Follow links to external domains (default: false)
- `allow_subdomains` (bool, optional) — Follow links to subdomains (default: false)
- `delay` (float, optional) — Delay in seconds between requests
- `max_concurrency` (int, optional) — Maximum concurrent scrapes
- `formats` (string, optional) — Output formats, comma-separated (default: markdown)
- `only_main_content` (bool, optional) — Extract main content only (default: true)
- `zero_data_retention` (bool, optional) — Enable zero data retention (default: false)

Output:

    {
      "success": true,
      "id": "crawl-job-uuid",
      "url": "https://api.firecrawl.dev/v2/crawl/crawl-job-uuid"
    }

Discovers and lists all URLs found on a website. Useful for site auditing and understanding site structure before crawling.
Inputs:
- `url` (string, required) — Base URL to start mapping from
- `search` (string, optional) — Filter and rank results by relevance to this query
- `sitemap` (string, optional) — Sitemap mode: skip, include, or only (default: include)
- `include_subdomains` (bool, optional) — Include subdomains (default: true)
- `ignore_query_parameters` (bool, optional) — Exclude URLs with query parameters (default: true)
- `ignore_cache` (bool, optional) — Bypass sitemap cache for fresh results (default: false)
- `limit` (int, optional) — Maximum URLs to return (max: 100000, default: 5000)
- `timeout_ms` (int, optional) — Timeout in milliseconds
- `country` (string, optional) — ISO 3166-1 alpha-2 country code (e.g., US, DE)
- `languages` (string, optional) — Comma-separated preferred languages (e.g., en-US,de-DE)

Output:

    {
      "success": true,
      "links": ["https://example.com/", "https://example.com/about", "..."]
    }

Searches the web using a query and optionally scrapes the full content of result pages.
Inputs:
- `query` (string, required) — Search query (max 500 characters)
- `limit` (int, optional) — Number of results to return (1–100, default: 5)
- `sources` (string, optional) — Comma-separated sources: web, images, news (default: web)
- `categories` (string, optional) — Comma-separated filters: github, research, pdf
- `tbs` (string, optional) — Time filter: qdr:h (hour), qdr:d (day), qdr:w (week), qdr:m (month), qdr:y (year)
- `location` (string, optional) — Geographic location (e.g., San Francisco,California,United States)
- `country` (string, optional) — ISO country code for geo-targeting (default: US)
- `timeout` (int, optional) — Timeout in milliseconds (1000–300000, default: 60000)
- `ignore_invalid_urls` (bool, optional) — Exclude invalid URLs from results (default: false)
- `formats` (string, optional) — Scrape output formats, comma-separated (default: markdown)
- `mobile` (bool, optional) — Emulate mobile device when scraping (default: false)
- `proxy` (string, optional) — Proxy type: basic, enhanced, or auto (default: auto)
- `block_ads` (bool, optional) — Block ads and cookie popups (default: true)

Output:

    {
      "success": true,
      "data": [
        { "url": "https://...", "markdown": "...", "metadata": { "title": "..." } }
      ]
    }

Starts an autonomous agent that navigates websites and extracts data based on a natural language prompt. Returns a job ID. Poll with `agent_status` for results.
Inputs:
- `prompt` (string, required) — Natural language description of what data to extract (max 10000 characters)
- `urls` (string, optional) — Comma-separated URLs to constrain the agent to
- `schema` (string, optional) — JSON schema string to structure extracted data
- `max_credits` (float, optional) — Maximum credits to spend (default: 2500)
- `strict_constrain_to_urls` (bool, optional) — Only visit URLs listed in the urls param (default: false)
- `model` (string, optional) — Model to use: spark-1-mini (default, cheaper) or spark-1-pro (higher accuracy)

Output:

    {
      "success": true,
      "jobId": "agent-job-uuid"
    }

Polls the status of an agent job started by the `agent` tool. Poll every 15–30 seconds for up to 2–3 minutes.
Inputs:
- `job_id` (string, required) — Agent job ID returned by the agent tool (UUID format)

Output:

    {
      "success": true,
      "status": "completed",
      "data": { "extracted": "..." }
    }

Starts an async job to extract structured data from one or more URLs using LLMs and an optional schema. Returns a job ID. Poll with `extract_status` for results.
Inputs:
- `urls` (string, required) — Comma-separated URLs to extract from (glob format supported)
- `prompt` (string, optional) — Custom prompt to guide the extraction
- `schema` (string, optional) — JSON schema string for structured output
- `enable_web_search` (bool, optional) — Use web search for additional context (default: false)
- `ignore_sitemap` (bool, optional) — Ignore sitemap.xml files (default: false)
- `include_subdomains` (bool, optional) — Include subdomains in scanning (default: true)
- `show_sources` (bool, optional) — Include extraction sources in response (default: false)
- `ignore_invalid_urls` (bool, optional) — Skip invalid URLs instead of failing (default: true)
- `formats` (string, optional) — Scrape output formats, comma-separated (default: markdown)
- `only_main_content` (bool, optional) — Extract main content only (default: true)
- `mobile` (bool, optional) — Emulate mobile device (default: false)
- `proxy` (string, optional) — Proxy type: basic, enhanced, or auto (default: auto)
- `block_ads` (bool, optional) — Block ads and cookie popups (default: true)

Output:

    {
      "success": true,
      "id": "extract-job-uuid"
    }

Polls the status of an extraction job started by the `extract` tool.
Inputs:
- `job_id` (string, required) — Extraction job ID returned by the extract tool (UUID format)

Output:

    {
      "success": true,
      "status": "completed",
      "data": { "field": "extracted value" }
    }

All scraping tools accept a comma-separated `formats` parameter:
- `markdown` — Clean markdown (default)
- `html` — Cleaned HTML
- `rawHtml` — Raw page HTML
- `json` — Structured JSON extraction
- `screenshot` — Page screenshot
- `links` — All links found on the page
- `images` — All image URLs
- `summary` — AI-generated page summary

The `proxy` parameter accepts one of:

- `basic` — Standard proxy for general use
- `enhanced` — Advanced proxy for bot-protected sites
- `auto` — Automatically selects the best proxy (default)

The `crawl`, `agent`, and `extract` tools are asynchronous:
1. Call the tool and note the returned `job_id`.
2. Poll `agent_status` or `extract_status` with the `job_id`.
3. Stop polling once `status` is `completed`, `failed`, or `cancelled`.

Recommended polling interval: every 15–30 seconds for at least 2–3 minutes before considering a job failed.

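The polling steps above can be sketched as a small loop. This is a minimal illustration, not the server's own client: `poll_job` and `fetch_status` are hypothetical names, `fetch_status` stands in for whatever call reaches `agent_status` or `extract_status`, and the non-terminal status value (`"processing"` in the usage note) is an assumption. Only the three terminal states and the 15-second interval and roughly 3-minute budget come from the text.

```python
import time
from typing import Any, Callable, Dict

# Terminal states named in the polling steps above.
TERMINAL_STATUSES = {"completed", "failed", "cancelled"}


def poll_job(
    fetch_status: Callable[[str], Dict[str, Any]],
    job_id: str,
    interval_s: float = 15.0,
    max_wait_s: float = 180.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Dict[str, Any]:
    """Poll a job until it reaches a terminal status or the wait budget runs out.

    fetch_status is a hypothetical callable wrapping agent_status or
    extract_status; it takes a job ID and returns the parsed JSON response.
    """
    waited = 0.0
    while True:
        result = fetch_status(job_id)
        if result.get("status") in TERMINAL_STATUSES:
            return result
        if waited >= max_wait_s:
            raise TimeoutError(f"job {job_id} still running after {waited:.0f}s")
        sleep(interval_s)
        waited += interval_s
```

Passing `sleep` as a parameter keeps the loop easy to exercise without real delays, for example `poll_job(lambda job_id: {"status": "processing"}, "job-1", max_wait_s=0, sleep=lambda s: None)` raises `TimeoutError` after a single status check.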
Use the tbs parameter in search to filter results by recency:
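- `qdr:h` — Past hour
- `qdr:d` — Past day
- `qdr:w` — Past week
- `qdr:m` — Past month
- `qdr:y` — Past year

If a request fails, check the following:

- The `Authorization: Bearer YOUR_API_KEY` and `X-Mewcp-Credential-Id: CREDENTIAL-ID` headers are present.
- The `X-Mewcp-Credential-Id` header carries the correct value.
- Parameters are within range (`timeout_ms` must be 1000–300000).
- The request path follows `{server-name}/mcp/{tool-name}`.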
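
The header, range, and path requirements above can be checked before a request is sent. The sketch below is illustrative only: `build_tool_request` is a hypothetical helper, and the base URL, the JSON request body, the `Content-Type` header, and the server and tool names used in the usage note are assumptions. Only the two header names, the `{server-name}/mcp/{tool-name}` path shape, and the `timeout_ms` range come from the text.

```python
import json
from typing import Any, Dict, Tuple

# Range stated for timeout_ms in the text.
TIMEOUT_MS_MIN = 1000
TIMEOUT_MS_MAX = 300000


def build_tool_request(
    base_url: str,          # assumption: the gateway host is not given in the text
    server_name: str,
    tool_name: str,
    api_key: str,
    credential_id: str,
    arguments: Dict[str, Any],
) -> Tuple[str, Dict[str, str], str]:
    """Return (url, headers, body) for a tool call, validating timeout_ms first."""
    timeout_ms = arguments.get("timeout_ms")
    if timeout_ms is not None and not (TIMEOUT_MS_MIN <= timeout_ms <= TIMEOUT_MS_MAX):
        raise ValueError(
            f"timeout_ms must be {TIMEOUT_MS_MIN}-{TIMEOUT_MS_MAX}, got {timeout_ms}"
        )

    # Path shape from the text: {server-name}/mcp/{tool-name}
    url = f"{base_url.rstrip('/')}/{server_name}/mcp/{tool_name}"
    headers = {
        "Authorization": f"Bearer {api_key}",
        "X-Mewcp-Credential-Id": credential_id,
        "Content-Type": "application/json",  # assumption: JSON request body
    }
    body = json.dumps(arguments)
    return url, headers, body
```

For example, `build_tool_request("https://gateway.example.com", "firecrawl", "scrape", "YOUR_API_KEY", "CREDENTIAL-ID", {"url": "https://example.com", "timeout_ms": 30000})` returns a URL ending in `firecrawl/mcp/scrape`, where the host, server name, and tool name are placeholders rather than documented values.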