Web Data Extraction

Extracting data from websites by hand is a tedious task for most website owners and developers.

UiPath's Web Scraping tool helps you build an automation that extracts data from websites in a short amount of time.


1. Accessing the Web Scraping Wizard

You can access the wizard by clicking Web Scraping in the Design menu.


2. Getting Started

Before running the Web Scraping wizard, make sure the website you want to scrape is already open.

 

3. Running the Web Scraping Wizard

The wizard asks you for two elements from the website you want to scrape. The scraping process is based on the pattern formed by the data you select. Consider the sample image below, taken from an eBay results page. Suppose we want to scrape the name and the price of each listed item. The wizard asks you to click the first and second item in the list to build a pattern of what needs to be scraped.

[Image: eBay scraping pattern]

 

4. Selecting an Element in the Page

At this stage, the Web Scraping wizard identifies what type of page you are trying to extract data from. If the page is in a tabular format, such as Google Contacts, the wizard is able to detect it. The wizard lets you select the items you need to scrape. Follow the step-by-step instructions in the wizard, and the Workflow is created automatically once you are done.

The first step is to select the first element. If we want to scrape the item titles in the sample image, the first element is the listing title of the first item on the eBay results page, and the second element is the listing title of the second item. The same approach is used for scraping the prices.

[Image: selecting the first element]

The important thing to remember is that the first and second items you select define the pattern: all other data in the same category is scraped based on that pattern.
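To make the idea of a two-sample pattern more concrete, here is a minimal Python sketch. It is only an illustration and does not claim to reproduce how the wizard works internally. It assumes each element can be described by a hypothetical path of (tag, index) steps; the names `generalize` and `matches` and the sample paths are invented for this example.

```python
# Illustrative only: this is NOT UiPath's internal algorithm.
# Each element is described by a hypothetical path of (tag, index) steps.

def generalize(first, second):
    """Build a pattern from two samples; steps whose index differs become None."""
    if len(first) != len(second):
        raise ValueError("sample elements must sit at the same depth")
    pattern = []
    for (tag_a, idx_a), (tag_b, idx_b) in zip(first, second):
        if tag_a != tag_b:
            raise ValueError("sample elements must share the same tag structure")
        # Keep the index when both samples agree, otherwise treat it as a wildcard.
        pattern.append((tag_a, idx_a if idx_a == idx_b else None))
    return pattern


def matches(pattern, path):
    """Return True if a path fits the pattern; None matches any index."""
    return len(pattern) == len(path) and all(
        p_tag == tag and (p_idx is None or p_idx == idx)
        for (p_tag, p_idx), (tag, idx) in zip(pattern, path)
    )


# Hypothetical paths for the first and second listing titles.
first_title = [("ul", 1), ("li", 1), ("h3", 1)]
second_title = [("ul", 1), ("li", 2), ("h3", 1)]
title_pattern = generalize(first_title, second_title)

# Any other title in the list fits the pattern; a price element does not.
tenth_title = [("ul", 1), ("li", 10), ("h3", 1)]
first_price = [("ul", 1), ("li", 1), ("span", 1)]
print(matches(title_pattern, tenth_title))
print(matches(title_pattern, first_price))
```

The two samples differ only in the position of the list item, so that step becomes a wildcard and every other item in the list is picked up by the same pattern.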

 

5. Renaming the Column Headers

Once you've selected the first and second items, your data will be saved in a CSV file. The wizard lets you customize the header names so that the content is easier to identify and manage.


You can enable the Extract URL option if it's available. 
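If you need to rename the headers of an already exported CSV file outside the wizard, a small script can do the same job. The sketch below uses only Python's standard `csv` module; the default header names `Column1` and `Column2`, the sample row, and the function name `rename_headers` are assumptions made for illustration, not values taken from the wizard.

```python
import csv
import io


def rename_headers(csv_text, new_names):
    """Return CSV text whose header row is renamed according to new_names."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    if not rows:
        return csv_text
    # Replace only the headers listed in the mapping; leave the others unchanged.
    rows[0] = [new_names.get(name, name) for name in rows[0]]
    out = io.StringIO()
    csv.writer(out, lineterminator="\n").writerows(rows)
    return out.getvalue()


# Made-up sample data with hypothetical default header names.
sample = "Column1,Column2\nVintage camera,$25.00\n"
renamed = rename_headers(sample, {"Column1": "Title", "Column2": "Price"})
print(renamed)
```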

6. Extract Additional Data

Once you have reached this step, you should have already selected the first and second element for your first item. The wizard shows you a preview of the data to be extracted. If you want to select another set of items from the same website, use the Extract Correlated Data button. This repeats the process you followed for the first item: the wizard again asks you for the first and second element.
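Conceptually, correlated data means that the n-th value of one set belongs to the same item as the n-th value of the other set. The following Python sketch shows that pairing under the assumption that both sets were extracted in page order; the function name `correlate`, the column names, and the sample values are made up for illustration.

```python
def correlate(titles, prices):
    """Pair the n-th title with the n-th price to form one row per item."""
    if len(titles) != len(prices):
        raise ValueError("both columns must contain the same number of items")
    return [{"Title": t, "Price": p} for t, p in zip(titles, prices)]


# Made-up sample values standing in for two extracted columns.
rows = correlate(["Vintage camera", "Film roll"], ["$25.00", "$4.50"])
for row in rows:
    print(row)
```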

 

7. Data Spanning Multiple Pages

Sometimes the data spans multiple pages. The Web Scraping wizard can extract such data once you indicate the Next button on the webpage.
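The logic behind following a Next button can be sketched as a simple loop: collect the items on the current page, move to the page the Next button points to, and stop when there is no next page. The Python example below simulates this with an in-memory dictionary instead of real web pages; the page names, the `items` and `next` keys, and the function `scrape_all_pages` are hypothetical.

```python
def scrape_all_pages(pages, start):
    """Collect items from every page by following 'next' until it is None."""
    items = []
    seen = set()  # guards against a Next link that loops back to an earlier page
    current = start
    while current is not None and current not in seen:
        seen.add(current)
        page = pages[current]
        items.extend(page["items"])
        current = page.get("next")
    return items


# Simulated result pages; in a real run each entry would be a loaded web page.
pages = {
    "page1": {"items": ["Item A", "Item B"], "next": "page2"},
    "page2": {"items": ["Item C"], "next": "page3"},
    "page3": {"items": ["Item D"], "next": None},
}
all_items = scrape_all_pages(pages, "page1")
print(all_items)
```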

 

8. Pulling up the Spreadsheet

The wizard creates the Workflow for you. You can then run the Workflow to produce the output CSV file. The CSV file is saved in the same folder as the Workflow. To access it easily, go to the Workspace panel, right-click the Workflow file, and click Open Containing Folder. You now have a solution for extracting a wide variety of data from web pages.