Web Scraping Structured Data.Get News

Task

This sample gets the first 3 news stories from Mashable in a category chosen by the user.

Steps to automate

  1. Display a list of categories.
  2. Extract the first 3 news stories from the selected category.
  3. Write them into a text file.
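
Outside the workflow designer, the data flow of these three steps could be sketched in Python as follows. This is only an illustration under stated assumptions: the category names, the `https://` scheme, the helper names, and the layout of the text file (title, content, blank line) are hypothetical and not part of the sample; only the `mashable.com/Category/` URL pattern comes from the Solution section.

```python
from pathlib import Path

# Hypothetical category list; "None" is the opt-out choice mentioned in the sample.
CATEGORIES = ["None", "Tech", "Business", "Entertainment"]


def category_url(category: str) -> str:
    """Build the category page address (mashable.com/Category/); the scheme is assumed."""
    return f"https://mashable.com/{category}/"


def first_stories(stories: list, limit: int = 3) -> list:
    """Keep only the first `limit` stories, in page order."""
    return stories[:limit]


def write_stories(stories: list, path: str) -> None:
    """Write each (title, content) pair to a text file, separated by a blank line."""
    lines = []
    for title, content in stories:
        lines.append(title)
        lines.append(content)
        lines.append("")  # blank line between stories
    Path(path).write_text("\n".join(lines), encoding="utf-8")
```

For example, `write_stories(first_stories(stories), "news.txt")` would save at most three stories, where `stories` is a list of `(title, content)` tuples obtained by whatever scraping method is in use.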

Solution

  1. Use an "Input Dialog" activity to let the user choose a news category.
  2. Use a "FlowDecision" activity to check whether the user chose "None". If so, display a friendly message.
  3. Otherwise, take the selected category and open the corresponding page (mashable.com/Category/).
  4. Once the page has opened, extract the title and URL of each story. Close the tab.
  5. For each extracted story, navigate to its URL and extract the content (without photos). Use Design->Screen Scraping Wizard and indicate on screen the region to be extracted. Save the result in a text document, which will contain the title and the content of each news story.
  6. Allow the user to select a news category again.
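
The title and URL extraction in step 4 can be illustrated with Python's standard `html.parser` module. This is a rough sketch, not the mechanism the workflow uses: the class name `story-link` and the function and class names are hypothetical, since the real page markup is not described here and would need to be inspected first.

```python
from html.parser import HTMLParser


class StoryLinkParser(HTMLParser):
    """Collect (title, url) pairs from <a> elements carrying an assumed CSS class."""

    def __init__(self, link_class: str = "story-link"):  # hypothetical class name
        super().__init__()
        self.link_class = link_class
        self.links = []      # collected (title, url) pairs, in page order
        self._href = None    # href of the story link currently open, if any
        self._text = []      # text fragments seen inside that link

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attr = dict(attrs)
        classes = (attr.get("class") or "").split()
        if self.link_class in classes and attr.get("href"):
            self._href = attr["href"]
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            # Collapse runs of whitespace inside the link text.
            title = " ".join("".join(self._text).split())
            if title:
                self.links.append((title, self._href))
            self._href = None


def extract_story_links(html: str, limit: int = 3) -> list:
    """Return the first `limit` (title, url) pairs found in the page source."""
    parser = StoryLinkParser()
    parser.feed(html)
    parser.close()
    return parser.links[:limit]
```

Each returned URL would then be opened in turn to pull the story text, which is the part the Screen Scraping Wizard handles in the workflow.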