# ScrapingAnt

ScrapingAnt is a web scraping API service that enables data extraction from websites through headless Chrome browsers, rotating proxies, CAPTCHA/Cloudflare bypass, LLM-ready markdown output, and AI-powered structured data extraction.

- **Category:** ai web scraping
- **Auth:** API_KEY
- **Composio Managed App Available?** N/A
- **Tools:** 10
- **Triggers:** 0
- **Slug:** `SCRAPINGANT`
- **Version:** 20260312_00

## Tools

### Extract Content as Markdown

**Slug:** `SCRAPINGANT_EXTRACT_CONTENT_AS_MARKDOWN`

This tool extracts content from a given URL and converts it into Markdown format. It is particularly useful for preparing text for large language models (LLMs) and Retrieval-Augmented Generation (RAG) systems. It supports GET, POST, PUT, and DELETE methods.

#### Input Parameters

| Parameter | Type | Required | Description |
|-----------|------|----------|-------------|
| `url` | string | Yes | The URL of the web page to scrape and convert to Markdown. |
| `method` | string ("get" \| "post" \| "put" \| "delete") | No | HTTP method to use for the request. |
| `browser` | boolean | No | Enables the use of a headless browser for scraping. Default is true. |
| `cookies` | string | No | Cookies to include with the request. |
| `js_snippet` | string | No | Base64-encoded JavaScript to execute on the page after it loads. |
| `proxy_type` | string | No | Specifies the type of proxy to use. |
| `proxy_country` | string | No | Specifies the country for the proxy (e.g., US, GB). |
| `block_resource` | array | No | List of resource types to block (e.g., image, script, stylesheet, font, media, websocket, other). |
| `wait_for_selector` | string | No | CSS selector to wait for before returning the result. |
| `return_page_source` | boolean | No | Returns the raw HTML as received from the server, without JavaScript rendering. Default is false. |
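
The `js_snippet` parameter expects Base64-encoded JavaScript, so the script has to be encoded before it is passed in. The following is a minimal sketch of assembling the arguments in Python. The helper name `build_markdown_args` and the example URL are hypothetical; only the parameter names come from the table above, and how the dictionary is handed to the tool depends on the client you use.

```python
import base64


def build_markdown_args(url, js_source=None, **options):
    """Assemble an arguments dict for SCRAPINGANT_EXTRACT_CONTENT_AS_MARKDOWN.

    Hypothetical helper: it only mirrors the parameter names in the table.
    """
    args = {"url": url, **options}
    if js_source is not None:
        # js_snippet must carry the script as Base64 text, not raw JavaScript.
        encoded = base64.b64encode(js_source.encode("utf-8"))
        args["js_snippet"] = encoded.decode("ascii")
    return args


args = build_markdown_args(
    "https://example.com",
    js_source="window.scrollTo(0, document.body.scrollHeight);",
    wait_for_selector="article",
    block_resource=["image", "font"],
)
```
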

#### Output

| Parameter | Type | Required | Description |
|-----------|------|----------|-------------|
| `data` | string | Yes | Data from the action execution |
| `error` | string | No | Error if any occurred during the execution of the action |
| `successful` | boolean | Yes | Whether the action execution was successful |
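
Every tool in this document returns the same three output fields, so a caller can check `successful` once and surface `error` when it is set. The sketch below assumes the result arrives as a plain dictionary with those keys; `unwrap_result` and `ToolExecutionError` are hypothetical names introduced for illustration.

```python
class ToolExecutionError(RuntimeError):
    """Raised when a tool result reports successful == False."""


def unwrap_result(result):
    """Return result['data'], or raise with the reported error message."""
    if not result.get("successful", False):
        # 'error' is optional in the output schema, so fall back to a generic message.
        raise ToolExecutionError(result.get("error") or "unknown error")
    return result["data"]


# Illustrative result dictionary; the content is made up for the example.
markdown = unwrap_result({"data": "# Example Domain", "successful": True})
```
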

### Extract Data with AI

**Slug:** `SCRAPINGANT_EXTRACT_DATA_WITH_AI`

This tool allows you to extract structured data from a web page using ScrapingAnt's AI-powered extraction capabilities. You provide a URL and an AI query (prompt) describing what data you want to extract, and the tool returns the extracted data in a structured format. It supports additional parameters for browser rendering, proxies, and cookies to handle dynamic content and localization.

#### Input Parameters

| Parameter | Type | Required | Description |
|-----------|------|----------|-------------|
| `url` | string | Yes | The URL of the page to extract data from. |
| `cookies` | string | No | Cookies to use for the request. (e.g. cookie1=value1; cookie2=value2) |
| `proxy_type` | string | No | Proxy type to use for the request. (datacenter, residential) |
| `return_text` | boolean | No | Return text content of the page. (default: false) |
| `proxy_country` | string | No | Proxy country to use for the request. (e.g. US, GB, DE) |
| `enable_javascript` | boolean | No | Enable browser rendering. (default: true) |
| `wait_for_selector` | string | No | Wait for a specific selector to appear on the page before extracting data. |
| `extract_properties` | string | Yes | A free-form text describing the data you want to extract. |
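
Because `url` and `extract_properties` are both required, it can help to validate the arguments before calling the tool. This is a minimal sketch under the assumption that arguments are passed as a dictionary; `build_ai_extract_args` is a hypothetical helper, and the example prompt and URL are illustrative only.

```python
# Optional parameter names taken from the table above.
OPTIONAL_AI_PARAMS = {
    "cookies",
    "proxy_type",
    "return_text",
    "proxy_country",
    "enable_javascript",
    "wait_for_selector",
}


def build_ai_extract_args(url, extract_properties, **options):
    """Assemble arguments for SCRAPINGANT_EXTRACT_DATA_WITH_AI (hypothetical helper)."""
    if not url or not extract_properties:
        raise ValueError("url and extract_properties are both required")
    unknown = set(options) - OPTIONAL_AI_PARAMS
    if unknown:
        raise ValueError(f"unsupported parameters: {sorted(unknown)}")
    return {"url": url, "extract_properties": extract_properties, **options}


ai_args = build_ai_extract_args(
    "https://example.com/products/1",
    "product title, price, and average rating",
    proxy_type="residential",
    proxy_country="US",
)
```
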

#### Output

| Parameter | Type | Required | Description |
|-----------|------|----------|-------------|
| `data` | string | Yes | Data from the action execution |
| `error` | string | No | Error if any occurred during the execution of the action |
| `successful` | boolean | Yes | Whether the action execution was successful |

### Get API Credits Usage

**Slug:** `SCRAPINGANT_GET_API_CREDITS_USAGE`

This tool retrieves the current API credit usage status for the authenticated ScrapingAnt account. It enables users to monitor their consumption of API credits, check their current usage against the subscription limits, and manage their API credits effectively.

#### Output

| Parameter | Type | Required | Description |
|-----------|------|----------|-------------|
| `data` | string | Yes | Data from the action execution |
| `error` | string | No | Error if any occurred during the execution of the action |
| `successful` | boolean | Yes | Whether the action execution was successful |

### Get V1 Usage (Deprecated)

**Slug:** `SCRAPINGANT_GET_V1_USAGE`

[DEPRECATED - Use v2] Tool to get the current subscription status and API credits usage information. This is the legacy v1 endpoint which is no longer actively maintained.

#### Output

| Parameter | Type | Required | Description |
|-----------|------|----------|-------------|
| `data` | string | Yes | Data from the action execution |
| `error` | string | No | Error if any occurred during the execution of the action |
| `successful` | boolean | Yes | Whether the action execution was successful |

### Scrape Web Page

**Slug:** `SCRAPINGANT_SCRAPE_WEB_PAGE`

This tool scrapes a web page using the ScrapingAnt API. It fetches the HTML content of the specified URL. Users can customize the scraping behavior by enabling a headless browser, using proxies, waiting for specific elements, executing JavaScript, passing cookies, and blocking certain resources.

#### Input Parameters

| Parameter | Type | Required | Description |
|-----------|------|----------|-------------|
| `url` | string | Yes | URL of the web page to scrape. |
| `browser` | boolean | No | Enable to use a headless browser for scraping. Defaults to True. If False, JavaScript will not be rendered. |
| `cookies` | string | No | Cookies to pass with the scraping request. |
| `js_snippet` | string | No | Base64 encoded JavaScript snippet to execute on the page. Requires headless browser. |
| `proxy_type` | string ("datacenter" \| "residential") | No | Specifies the type of proxy to use. |
| `proxy_country` | string | No | Specifies the country for the proxy. |
| `block_resource` | array | No | List of resource types to block. Requires headless browser. |
| `wait_for_selector` | string | No | CSS selector to wait for before returning the result. Requires headless browser. |
| `return_page_source` | boolean | No | Enable to return the raw HTML from the server without JavaScript rendering. Requires headless browser. Defaults to False. |

#### Output

| Parameter | Type | Required | Description |
|-----------|------|----------|-------------|
| `data` | string | Yes | Data from the action execution |
| `error` | string | No | Error if any occurred during the execution of the action |
| `successful` | boolean | Yes | Whether the action execution was successful |

### Scrape Webpage via POST

**Slug:** `SCRAPINGANT_SCRAPE_WEBPAGE_POST`

Tool to perform a POST request through ScrapingAnt's proxy to scrape a webpage. Use when you need to scrape pages that require POST method, such as form submissions or APIs that only accept POST requests. Data is forwarded transparently to the target web page.

#### Input Parameters

| Parameter | Type | Required | Description |
|-----------|------|----------|-------------|
| `url` | string | Yes | URL of the target web page to scrape using POST request. |
| `browser` | boolean | No | Enable to use a headless browser for scraping. Defaults to True. If False, JavaScript will not be rendered. |
| `cookies` | string | No | Cookies to pass with the scraping request to the target site. Format: cookie_name1=cookie_value1;cookie_name2=cookie_value2 |
| `post_data` | object | No | POST data to send to the target web page. This will be forwarded transparently to the target. Useful for form submissions and APIs requiring POST requests. |
| `js_snippet` | string | No | Base64 encoded JavaScript snippet to execute once the page is loaded. Requires browser=True. |
| `proxy_type` | string ("datacenter" \| "residential") | No | Specifies the type of proxy to use for the request. Defaults to datacenter. |
| `proxy_country` | string | No | Specifies the country for the proxy. If not specified, a random country will be used. |
| `block_resource` | array | No | List of resource types to block. Prevents cloud browser from loading specified resource types. Requires browser=True. |
| `wait_for_selector` | string | No | CSS selector to wait for before returning the result. Requires browser=True. |
| `return_page_source` | boolean | No | Enable to return the raw HTML from the server without JavaScript rendering. Requires browser=True. Defaults to False. |
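
The `cookies` parameter takes a single string in `name=value;name=value` form, while `post_data` is an object. A minimal sketch of building both in Python follows; `format_cookies`, the target URL, and the form fields are hypothetical and only illustrate the formats described in the table.

```python
def format_cookies(cookies):
    """Join a dict of cookies into the 'name1=value1;name2=value2' format."""
    return ";".join(f"{name}={value}" for name, value in cookies.items())


post_args = {
    "url": "https://example.com/search",
    # post_data is forwarded to the target page as the POST payload.
    "post_data": {"query": "laptops", "page": "1"},
    "cookies": format_cookies({"session": "abc123", "lang": "en"}),
    # Skip the headless browser when the endpoint returns static content.
    "browser": False,
}
```
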

#### Output

| Parameter | Type | Required | Description |
|-----------|------|----------|-------------|
| `data` | string | Yes | Data from the action execution |
| `error` | string | No | Error if any occurred during the execution of the action |
| `successful` | boolean | Yes | Whether the action execution was successful |

### Scrape Webpage with PUT

**Slug:** `SCRAPINGANT_SCRAPE_WEBPAGE_PUT`

Tool to perform a PUT request through ScrapingAnt's proxy to scrape a webpage that requires PUT method. Use when the target webpage requires PUT method for data submission. Data is forwarded transparently to the target web page.

#### Input Parameters

| Parameter | Type | Required | Description |
|-----------|------|----------|-------------|
| `url` | string | Yes | URL of the web page to scrape using PUT method. |
| `browser` | boolean | No | Enable to use a headless browser for scraping. Defaults to True. If False, JavaScript will not be rendered. |
| `cookies` | string | No | Cookies to pass with the scraping request to the target site. Format: cookie_name1=cookie_value1;cookie_name2=cookie_value2 |
| `js_snippet` | string | No | Base64 encoded JavaScript snippet to execute on the page once loaded. Requires headless browser. |
| `proxy_type` | string ("datacenter" \| "residential") | No | Specifies the type of proxy to use. Defaults to datacenter. |
| `content_type` | string | No | Content-Type header to use for the PUT request. Will be sent as Ant-Content-Type header. |
| `request_body` | string | No | Request body data to send with the PUT request. This data will be forwarded transparently to the target web page. |
| `proxy_country` | string | No | Specifies the country for the proxy. If not specified, request will be made from a random country. |
| `block_resource` | array | No | List of resource types to block from loading. Requires headless browser. |
| `wait_for_selector` | string | No | CSS selector to wait for before returning the result. Requires headless browser. |
| `return_page_source` | boolean | No | Enable to return the raw HTML from the server without JavaScript rendering. Requires headless browser. Defaults to False. |
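
Unlike `post_data` in the POST tool, `request_body` here is a string, so structured payloads need to be serialized first and paired with a matching `content_type`. The sketch below assumes a JSON payload; the URL and body fields are made up for illustration.

```python
import json

payload = {"name": "widget", "price": 9.99}

put_args = {
    "url": "https://example.com/api/items/42",
    # Per the table, this value is sent as the Ant-Content-Type header.
    "content_type": "application/json",
    # request_body must be a string, so the dict is serialized to JSON text.
    "request_body": json.dumps(payload),
    "browser": False,
}
```
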

#### Output

| Parameter | Type | Required | Description |
|-----------|------|----------|-------------|
| `data` | string | Yes | Data from the action execution |
| `error` | string | No | Error if any occurred during the execution of the action |
| `successful` | boolean | Yes | Whether the action execution was successful |

### Scrape Webpage (v1 POST - Deprecated)

**Slug:** `SCRAPINGANT_SCRAPE_WEBPAGE_V1_POST`

[DEPRECATED - Use v2] Tool to scrape a webpage using POST method with ScrapingAnt's v1 API. Returns JSON with content, cookies, and status_code. This is the legacy v1 endpoint which is no longer actively maintained. Use the v2 endpoints for new implementations.

#### Input Parameters

| Parameter | Type | Required | Description |
|-----------|------|----------|-------------|
| `url` | string | Yes | URL of the web page to scrape. |
| `browser` | boolean | No | Enables using headless browser for scraping. Default: true. If false, JavaScript will not be rendered. |
| `cookies` | string | No | Cookie data to include with the scraping request. Format: cookie_name1=cookie_value1;cookie_name2=cookie_value2 |
| `js_snippet` | string | No | Base64 encoded JavaScript snippet to run once the page is loaded. Requires browser=true. |
| `proxy_type` | string ("datacenter" \| "residential") | No | Proxy classification to use. Default is datacenter. Options: datacenter or residential. |
| `return_text` | boolean | No | Enables returning text only content from the page. Default: false. If true, returns plain text instead of HTML. |
| `proxy_country` | string | No | Geographic location for proxy requests. If not specified, a random country will be used. |
| `wait_for_selector` | string | No | CSS selector of the element our service will wait for before returning the result. Requires browser=true. |

#### Output

| Parameter | Type | Required | Description |
|-----------|------|----------|-------------|
| `data` | string | Yes | Data from the action execution |
| `error` | string | No | Error if any occurred during the execution of the action |
| `successful` | boolean | Yes | Whether the action execution was successful |

### Scrape with Extended JSON Output

**Slug:** `SCRAPINGANT_SCRAPE_WITH_EXTENDED_JSON_OUTPUT`

Scrapes a web page and returns comprehensive data including HTML content, plain text, cookies, HTTP headers, XHR/Fetch requests, and iframe content. This tool uses ScrapingAnt's extended endpoint which provides much richer data than standard scraping:

- Full HTML and extracted plain text content
- All cookies and HTTP response headers from the target page
- Captured XHR/Fetch API requests made by the page (useful for finding hidden APIs)
- Content from embedded iframes

Best used when you need more than just the HTML - such as analyzing cookies, headers, or JavaScript API calls made by a page. For simple HTML scraping, consider using the basic scrape tool instead for lower API credit usage.

#### Input Parameters

| Parameter | Type | Required | Description |
|-----------|------|----------|-------------|
| `url` | string | Yes | The full URL of the web page to scrape (must include protocol, e.g., https://). |
| `browser` | boolean | No | Enable or disable headless browser rendering. When True (default), JavaScript is executed and dynamic content is loaded. Set to False for faster scraping of static pages. |
| `cookies` | string | No | Custom cookies to send with the request. Format: 'name1=value1; name2=value2'. |
| `timeout` | integer | No | Maximum time in seconds to wait for the page to load. Must be between 5 and 60. Default is 60 seconds. |
| `proxy_type` | string | No | Type of proxy to use: 'datacenter' (faster, cheaper) or 'residential' (better for anti-bot sites). Default is 'datacenter'. |
| `proxy_country` | string | No | Two-letter country code (ISO 3166-1 alpha-2) for geographic proxy location (e.g., 'US', 'GB', 'DE'). |
| `wait_for_selector` | string | No | CSS selector to wait for before returning the page content. Useful for pages with dynamic content that loads after initial page load. |
| `return_page_source` | boolean | No | When True, returns the raw HTML from the server without JavaScript rendering. Useful for faster scraping when JS execution is not needed. |
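
Several of these parameters carry constraints (a protocol in `url`, a `timeout` between 5 and 60 seconds, a two-letter `proxy_country`) that can be checked locally before spending API credits. The following is a minimal sketch of such checks; `build_extended_args` is a hypothetical helper, and the service may apply additional validation of its own.

```python
def build_extended_args(url, timeout=60, proxy_country=None, **options):
    """Validate and assemble arguments for SCRAPINGANT_SCRAPE_WITH_EXTENDED_JSON_OUTPUT."""
    if not url.startswith(("http://", "https://")):
        raise ValueError("url must include the protocol, e.g. https://")
    if not 5 <= timeout <= 60:
        raise ValueError("timeout must be between 5 and 60 seconds")
    args = {"url": url, "timeout": timeout, **options}
    if proxy_country is not None:
        # Only the shape of the code is checked here, not whether the country exists.
        if len(proxy_country) != 2 or not proxy_country.isalpha():
            raise ValueError("proxy_country must be a two-letter country code")
        args["proxy_country"] = proxy_country.upper()
    return args


extended_args = build_extended_args(
    "https://example.com",
    timeout=30,
    proxy_country="de",
    proxy_type="residential",
)
```
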

#### Output

| Parameter | Type | Required | Description |
|-----------|------|----------|-------------|
| `data` | string | Yes | Data from the action execution |
| `error` | string | No | Error if any occurred during the execution of the action |
| `successful` | boolean | Yes | Whether the action execution was successful |