# Web Scraper (webscraper.io)

Web Scraper (webscraper.io) is a web data extraction tool. This integration uses the Web Scraper Cloud API to create and manage sitemaps, schedule and list scraping jobs, and read account information.

- **Category:** developer tools
- **Auth:** API_KEY
- **Composio Managed App Available?** N/A
- **Tools:** 10
- **Triggers:** 0
- **Slug:** `WEBSCRAPER_IO`
- **Version:** 00000000_00

## Tools

### Create Sitemap

**Slug:** `WEBSCRAPER_IO_CREATE_SITEMAP`

Tool to create a new sitemap configuration for web scraping. Use when you need to define, for a website, which pages scraping starts from (start URLs) and what data to extract (selector rules).

#### Input Parameters

| Parameter | Type | Required | Description |
|-----------|------|----------|-------------|
| `startUrl` | array | Yes | Array of starting URLs where scraping begins. At least one URL is required. |
| `selectors` | array | Yes | Array of selector objects defining data extraction rules. Minimum one selector required. |
| `sitemap_id` | string | Yes | Unique name for the sitemap. May contain only alphanumeric characters and hyphens (e.g., 'webscraper-io-landing'). Unlike the other tools, which take a numeric sitemap ID, this is a string. |
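As a minimal sketch, the following helper validates the three required inputs before the tool is called. The function name `build_create_sitemap_args` is hypothetical, and the selector object is an assumption based on Web Scraper's sitemap JSON format as we understand it (`id`, `type`, `parentSelectors`, `selector`, `multiple`); verify the fields against a sitemap exported from the Web Scraper browser extension.

```python
import re

# Per the parameter description: letters, digits, and hyphens only.
SITEMAP_ID_PATTERN = re.compile(r"[A-Za-z0-9-]+")


def build_create_sitemap_args(sitemap_id: str, start_urls: list[str], selectors: list[dict]) -> dict:
    """Validate inputs and return an argument dict for WEBSCRAPER_IO_CREATE_SITEMAP (hypothetical helper)."""
    if not SITEMAP_ID_PATTERN.fullmatch(sitemap_id):
        raise ValueError(f"invalid sitemap_id: {sitemap_id!r}")
    if not start_urls:
        raise ValueError("at least one start URL is required")
    if not selectors:
        raise ValueError("at least one selector is required")
    return {"sitemap_id": sitemap_id, "startUrl": list(start_urls), "selectors": list(selectors)}


# Assumed selector shape: extract the text of each <h1> on the start page.
title_selector = {
    "id": "title",
    "type": "SelectorText",
    "parentSelectors": ["_root"],
    "selector": "h1",
    "multiple": False,
}

args = build_create_sitemap_args(
    "webscraper-io-landing",
    ["https://webscraper.io/"],
    [title_selector],
)
```

Validating locally catches malformed IDs and empty arrays before a request is spent on them; the service remains the authority on what it accepts.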

#### Output

| Parameter | Type | Required | Description |
|-----------|------|----------|-------------|
| `data` | string | Yes | Data returned by the action |
| `error` | string | No | Error message, if an error occurred during execution |
| `successful` | boolean | Yes | Whether the action executed successfully |
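Every tool in this integration returns the same envelope (`data`, `error`, `successful`). A minimal sketch for unwrapping it, assuming the result arrives as a Python dict with those keys. Whether `data` carries a JSON document serialized as a string is an assumption (the table only says "string"), so the helper falls back to the raw value. `unwrap_result` and `ToolError` are hypothetical names.

```python
import json
from typing import Any


class ToolError(RuntimeError):
    """Raised when a tool result reports successful=False."""


def unwrap_result(result: dict) -> Any:
    """Return the (possibly JSON-decoded) `data` field, or raise ToolError on failure."""
    if not result.get("successful"):
        raise ToolError(result.get("error") or "unknown error")
    data = result["data"]
    # Assumption: `data` may be a JSON string; if it is not valid JSON
    # (or not a string at all), return it unchanged.
    try:
        return json.loads(data)
    except (TypeError, json.JSONDecodeError):
        return data
```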

### Delete Sitemap

**Slug:** `WEBSCRAPER_IO_DELETE_SITEMAP`

Tool to permanently delete a sitemap configuration from your Web Scraper Cloud account. Use when you need to remove a sitemap that is no longer needed.

#### Input Parameters

| Parameter | Type | Required | Description |
|-----------|------|----------|-------------|
| `sitemap_id` | integer | Yes | The unique identifier of the sitemap to delete |

#### Output

| Parameter | Type | Required | Description |
|-----------|------|----------|-------------|
| `data` | string | Yes | Data returned by the action |
| `error` | string | No | Error message, if an error occurred during execution |
| `successful` | boolean | Yes | Whether the action executed successfully |

### Disable Sitemap Scheduler

**Slug:** `WEBSCRAPER_IO_DISABLE_SITEMAP_SCHEDULER`

Tool to disable automatic scheduling for a sitemap. Use when you need to stop automated scraping jobs from running on a schedule.

#### Input Parameters

| Parameter | Type | Required | Description |
|-----------|------|----------|-------------|
| `sitemap_id` | integer | Yes | The unique identifier of the sitemap whose scheduler should be disabled |

#### Output

| Parameter | Type | Required | Description |
|-----------|------|----------|-------------|
| `data` | string | Yes | Data returned by the action |
| `error` | string | No | Error message, if an error occurred during execution |
| `successful` | boolean | Yes | Whether the action executed successfully |

### Enable Sitemap Scheduler

**Slug:** `WEBSCRAPER_IO_ENABLE_SITEMAP_SCHEDULER`

Tool to enable and configure automatic scheduling of scraping jobs for a sitemap. Use when you need scraping jobs to run at specific times defined by a cron expression, with configurable request interval, page load delay, driver type, and proxy settings.

#### Input Parameters

| Parameter | Type | Required | Description |
|-----------|------|----------|-------------|
| `proxy` | string | No | Proxy configuration. Use the format 'datacenter-{country_code}' (e.g., 'datacenter-us') or 'residential-{country_code}' (e.g., 'residential-us'); '0' for no proxy; '1' to use a proxy; or, for Scale plan users, a proxy ID |
| `driver` | string (`"fast"` or `"fulljs"`) | Yes | Scraper driver type. `fast` does not execute JavaScript and extracts data from the raw HTML; `fulljs` is the full driver, which executes JavaScript |
| `cron_day` | string | Yes | Day-of-month field of the cron expression. Use '*' for any day, a range (e.g., '1-31'), or specific values |
| `cron_hour` | string | Yes | Hour field of the cron expression. Use '*' for any hour, a range (e.g., '0-23'), or specific values such as '9,17' |
| `cron_month` | string | Yes | Month field of the cron expression. Use '*' for any month, a range (e.g., '1-12'), or specific values |
| `sitemap_id` | integer | Yes | The unique identifier of the sitemap to enable scheduling for |
| `cron_minute` | string | Yes | Minute field of the cron expression. Use '*' for any minute, '*/10' for every 10 minutes, or specific values such as '0,15,30,45' |
| `cron_weekday` | string | Yes | Day-of-week field of the cron expression. Use '*' for any weekday, a range (e.g., '0-6', where 0 = Sunday), or specific values |
| `cron_timezone` | string | Yes | Timezone for the cron schedule, as a tz database name (e.g., 'Europe/Riga', 'America/New_York', 'Asia/Tokyo') |
| `page_load_delay` | integer | Yes | Time in milliseconds the scraper waits for a page to load before extracting data. Default is 2000 ms (2 seconds) |
| `request_interval` | integer | Yes | Delay in milliseconds between page requests during scraping. Default is 2000 ms (2 seconds) |
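The five `cron_*` parameters are the five fields of a standard cron expression (minute, hour, day of month, month, day of week). A minimal sketch that splits a cron string into those parameters and checks `driver` and `proxy` against the values listed above. `build_scheduler_args` is a hypothetical helper, the numeric-only cron check is a deliberate simplification (names such as `MON` are rejected), the proxy pattern is inferred from the table, the timezone is passed through unvalidated, and `123` is a placeholder sitemap ID.

```python
import re
from typing import Optional

CRON_FIELD = re.compile(r"[0-9*,/-]+")  # numeric fields only (simplification)
PROXY = re.compile(r"(datacenter|residential)-[a-z]+|\d+")  # inferred from the proxy description
DRIVERS = {"fast", "fulljs"}


def build_scheduler_args(
    sitemap_id: int,
    cron: str,
    timezone: str,
    driver: str = "fast",
    request_interval: int = 2000,
    page_load_delay: int = 2000,
    proxy: Optional[str] = None,
) -> dict:
    """Split a five-field cron string into WEBSCRAPER_IO_ENABLE_SITEMAP_SCHEDULER arguments."""
    fields = cron.split()
    if len(fields) != 5:
        raise ValueError("expected 5 cron fields: minute hour day month weekday")
    for field in fields:
        if not CRON_FIELD.fullmatch(field):
            raise ValueError(f"unsupported cron field: {field!r}")
    if driver not in DRIVERS:
        raise ValueError(f"driver must be one of {sorted(DRIVERS)}")
    if proxy is not None and not PROXY.fullmatch(proxy):
        raise ValueError(f"unrecognized proxy value: {proxy!r}")
    minute, hour, day, month, weekday = fields
    args = {
        "sitemap_id": sitemap_id,
        "cron_minute": minute,
        "cron_hour": hour,
        "cron_day": day,
        "cron_month": month,
        "cron_weekday": weekday,
        "cron_timezone": timezone,  # tz database name; not validated here
        "driver": driver,
        "request_interval": request_interval,
        "page_load_delay": page_load_delay,
    }
    if proxy is not None:  # proxy is optional, so omit it unless given
        args["proxy"] = proxy
    return args


# Monday to Friday (1-5, with 0 = Sunday) at 09:00 New York time.
args = build_scheduler_args(123, "0 9 * * 1-5", "America/New_York")
```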

#### Output

| Parameter | Type | Required | Description |
|-----------|------|----------|-------------|
| `data` | string | Yes | Data returned by the action |
| `error` | string | No | Error message, if an error occurred during execution |
| `successful` | boolean | Yes | Whether the action executed successfully |

### Get Account Info

**Slug:** `WEBSCRAPER_IO_GET_ACCOUNT_INFO`

Tool to retrieve account information, such as the account email and page credits. Use when you need to check account details or available credits. This tool takes no input parameters.

#### Output

| Parameter | Type | Required | Description |
|-----------|------|----------|-------------|
| `data` | string | Yes | Data returned by the action |
| `error` | string | No | Error message, if an error occurred during execution |
| `successful` | boolean | Yes | Whether the action executed successfully |

### Get Scraping Jobs

**Slug:** `WEBSCRAPER_IO_GET_SCRAPING_JOBS`

Tool to retrieve the scraping jobs for the account, with optional filtering and pagination. Use when you need to list scraping jobs, check job status, or filter jobs by sitemap or tag.

#### Input Parameters

| Parameter | Type | Required | Description |
|-----------|------|----------|-------------|
| `tag` | string | No | Filter jobs by tag name. Use to retrieve jobs with a specific tag. |
| `page` | integer | No | Page number for pagination. Use to retrieve a specific page of results. |
| `sitemap_id` | integer | No | Filter jobs by specific sitemap ID. Use to retrieve jobs for a particular sitemap. |
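Because results are paginated through `page`, listing every job means requesting pages until the last one. A minimal sketch, assuming the decoded `data` payload contains a `data` list of items and a `last_page` integer; that response shape is an assumption, not something this reference documents. `fetch_page` is a hypothetical callable that would invoke `WEBSCRAPER_IO_GET_SCRAPING_JOBS` with `{"page": page}` plus any `sitemap_id` or `tag` filter.

```python
from typing import Callable, Iterator


def iter_all_pages(fetch_page: Callable[[int], dict]) -> Iterator[dict]:
    """Yield every item across all pages, starting at page 1.

    Assumption: each payload looks like {"data": [...], "last_page": N}.
    If "last_page" is missing, only the first page is read.
    """
    page = 1
    while True:
        payload = fetch_page(page)
        yield from payload.get("data", [])
        if page >= payload.get("last_page", page):
            break
        page += 1


# Stand-in for real tool calls: two pages of fake jobs.
FAKE_PAGES = {
    1: {"data": [{"id": 1}, {"id": 2}], "last_page": 2},
    2: {"data": [{"id": 3}], "last_page": 2},
}
job_ids = [job["id"] for job in iter_all_pages(FAKE_PAGES.__getitem__)]
```

The same loop applies to Get Sitemaps, which is paginated the same way.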

#### Output

| Parameter | Type | Required | Description |
|-----------|------|----------|-------------|
| `data` | string | Yes | Data returned by the action |
| `error` | string | No | Error message, if an error occurred during execution |
| `successful` | boolean | Yes | Whether the action executed successfully |

### Get Sitemap

**Slug:** `WEBSCRAPER_IO_GET_SITEMAP`

Tool to retrieve a specific sitemap configuration by ID. Use when you need to inspect or reference an existing sitemap's configuration.

#### Input Parameters

| Parameter | Type | Required | Description |
|-----------|------|----------|-------------|
| `sitemap_id` | integer | Yes | The numeric identifier of the sitemap to retrieve |

#### Output

| Parameter | Type | Required | Description |
|-----------|------|----------|-------------|
| `data` | string | Yes | Data returned by the action |
| `error` | string | No | Error message, if an error occurred during execution |
| `successful` | boolean | Yes | Whether the action executed successfully |

### Get Sitemaps

**Slug:** `WEBSCRAPER_IO_GET_SITEMAPS`

Tool to retrieve the sitemaps for the authenticated account, with optional pagination (`page`) and filtering by tag name (`tag`). Use when you need to list available sitemaps or find those with a specific tag.

#### Input Parameters

| Parameter | Type | Required | Description |
|-----------|------|----------|-------------|
| `tag` | string | No | Filter sitemaps by tag name to retrieve only sitemaps with a specific tag. |
| `page` | integer | No | Page number for pagination (e.g., 2 for the second page). |

#### Output

| Parameter | Type | Required | Description |
|-----------|------|----------|-------------|
| `data` | string | Yes | Data returned by the action |
| `error` | string | No | Error message, if an error occurred during execution |
| `successful` | boolean | Yes | Whether the action executed successfully |

### Get Sitemap Scheduler

**Slug:** `WEBSCRAPER_IO_GET_SITEMAP_SCHEDULER`

Tool to retrieve the scheduler configuration for a sitemap. Use when you need to check its scheduling settings, such as the cron configuration and proxy.

#### Input Parameters

| Parameter | Type | Required | Description |
|-----------|------|----------|-------------|
| `sitemap_id` | integer | Yes | The unique identifier of the sitemap |

#### Output

| Parameter | Type | Required | Description |
|-----------|------|----------|-------------|
| `data` | string | Yes | Data returned by the action |
| `error` | string | No | Error message, if an error occurred during execution |
| `successful` | boolean | Yes | Whether the action executed successfully |

### Update Sitemap

**Slug:** `WEBSCRAPER_IO_UPDATE_SITEMAP`

Tool to update an existing sitemap configuration, including its start URLs and selectors. Use when you need to modify a sitemap.

#### Input Parameters

| Parameter | Type | Required | Description |
|-----------|------|----------|-------------|
| `_id` | string | Yes | String identifier (name) stored in the sitemap configuration, e.g. 'webscraper-io-landing'; distinct from the numeric `sitemap_id` |
| `startUrl` | array | Yes | Array of URLs where scraping begins |
| `selectors` | array | Yes | Array of selector objects defining data extraction rules |
| `sitemap_id` | integer | Yes | The unique identifier of the sitemap to update |
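All four inputs are required, so changing one field still means resending the others. A minimal sketch that merges changes into an existing configuration, assuming `current` is the sitemap JSON retrieved with Get Sitemap and that it contains `_id`, `startUrl`, and `selectors` (an assumption about that tool's `data` payload). `build_update_sitemap_args`, the sitemap values, and the ID `123` are hypothetical.

```python
import copy
from typing import Optional


def build_update_sitemap_args(
    sitemap_id: int,
    current: dict,
    start_urls: Optional[list] = None,
    selectors: Optional[list] = None,
) -> dict:
    """Build WEBSCRAPER_IO_UPDATE_SITEMAP arguments, keeping unchanged fields from `current`."""
    return {
        "sitemap_id": sitemap_id,
        "_id": current["_id"],
        "startUrl": list(start_urls if start_urls is not None else current["startUrl"]),
        # Deep-copy so later edits to the new args do not mutate `current`.
        "selectors": copy.deepcopy(selectors if selectors is not None else current["selectors"]),
    }


current = {
    "_id": "example-sitemap",
    "startUrl": ["https://example.com/"],
    "selectors": [
        {"id": "title", "type": "SelectorText", "parentSelectors": ["_root"], "selector": "h1", "multiple": False}
    ],
}
# Add a second start URL; selectors are carried over unchanged.
args = build_update_sitemap_args(123, current, start_urls=["https://example.com/", "https://example.com/page/2"])
```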

#### Output

| Parameter | Type | Required | Description |
|-----------|------|----------|-------------|
| `data` | string | Yes | Data returned by the action |
| `error` | string | No | Error message, if an error occurred during execution |
| `successful` | boolean | Yes | Whether the action executed successfully |