Thursday, 19 January 2012

Extract Data from a Web Page into an Excel Spreadsheet

While surfing the Web, you may have come across interesting data that you wanted to use offline, only to face the tedious task of copying and pasting it all, row by row and column by column. OutWit Hub’s “Data” views can do this for you automatically.
In this tutorial, we are going to learn how to grab structured data from a Web page with the “Table” view and export it to an Excel spreadsheet.

1. Launch OutWit Hub
If you haven’t installed OutWit Hub yet, please refer to the Getting Started with OutWit Hub tutorial.
Begin by launching OutWit Hub from Firefox: open Firefox, then click the OutWit button in the toolbar.
If the icon is not visible, go to the menu bar and select Tools -> OutWit -> OutWit Hub.
OutWit Hub will open, displaying the Web page currently loaded in Firefox.
2. Go to the Desired Web Page
In the address bar, type the URL of the website. You can also type any search term, and OutWit Hub will look it up using the search engine selected as preferred in Firefox.
Today, let’s use this website, which contains detailed information on the 2008 Olympic medals: http://simon.forsyth.net/olympics.html
Go to the “Page” view where you can see that OutWit Hub displays the Web page as it would appear in a traditional browser.
Now, select “Data” from the view list and then select “Table,” the first view of the “Data” section.

In the “Data” section, OutWit Hub displays and structures all the data that it recognizes from the current Web page in the following views: tables, lists, guess and scraper.
If the “Table” view is blank, reload the page.
The “Table” view analyzes the source code of the page and extracts the data contained in the HTML tables.
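To give a rough idea of what extracting “the data contained in the HTML tables” involves, here is a minimal Python sketch built on the standard library’s html.parser. It is not OutWit Hub’s implementation: it assumes well-formed markup with explicit closing tags and does not handle nested tables. The names TableExtractor and extract_tables are hypothetical, and the sample markup is made up.

```python
from html.parser import HTMLParser


class TableExtractor(HTMLParser):
    """Collect the text of <td>/<th> cells, grouped into rows and tables.

    Illustrative only: assumes explicit closing tags and no nested tables.
    """

    def __init__(self):
        super().__init__()
        self.tables = []   # one entry per <table>; each is a list of rows
        self._row = None   # cells of the <tr> currently being read
        self._cell = None  # text fragments of the <td>/<th> being read

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            self.tables.append([])
        elif tag == "tr" and self.tables:
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            # Join the text fragments and collapse runs of whitespace.
            self._row.append(" ".join("".join(self._cell).split()))
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.tables[-1].append(self._row)
            self._row = None

    def handle_data(self, data):
        # Text outside a cell (e.g. whitespace between tags) is ignored.
        if self._cell is not None:
            self._cell.append(data)


def extract_tables(html):
    """Return every table in `html` as a list of rows of cell strings."""
    parser = TableExtractor()
    parser.feed(html)
    parser.close()
    return parser.tables


if __name__ == "__main__":
    # Made-up sample markup, not taken from the Olympics page.
    sample = """
    <table>
      <tr><th>Country</th><th>Gold</th></tr>
      <tr><td>Country A</td><td>10</td></tr>
    </table>
    """
    print(extract_tables(sample))
    # [[['Country', 'Gold'], ['Country A', '10']]]
```

Real pages often omit closing tags or nest tables for layout, which is one reason a dedicated tool, or the “Guess” view, can give better results than a sketch like this.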
If you do not get the desired results in this view, try clicking “Guess.” OutWit Hub will attempt to recognize the data in the page even if it is not properly structured in tables. Another option is to create a scraper in the “Source” view; a separate tutorial explains how to create one.
3. Export the Table into an Excel Spreadsheet
The data in the “Table” view can be edited, filtered, sorted and moved to the “Catch” or exported directly into an Excel file.
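In OutWit Hub, filtering and sorting are done interactively in the “Table” view. If you want to repeat the same kind of steps on rows you have already extracted, a plain Python sketch could look like the following. It assumes the first row is a header and that the sort column holds numbers; filter_and_sort is a hypothetical helper, not part of OutWit Hub, and the sample values are made up.

```python
def filter_and_sort(rows, keep, sort_col, descending=True):
    """Keep the header row, filter the body with `keep`, sort by a numeric column.

    Assumes rows[0] is a header and that column `sort_col` parses as a number.
    """
    header, body = rows[0], rows[1:]
    kept = [row for row in body if keep(row)]
    kept.sort(key=lambda row: float(row[sort_col]), reverse=descending)
    return [header] + kept


if __name__ == "__main__":
    # Hypothetical medal-style table with made-up values.
    table = [["Country", "Gold"], ["A", "3"], ["B", "10"], ["C", "0"]]
    # Drop rows with no gold medals, then sort by gold count, highest first.
    print(filter_and_sort(table, lambda row: int(row[1]) > 0, sort_col=1))
    # [['Country', 'Gold'], ['B', '10'], ['A', '3']]
```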
Let’s export the current table, so we can work directly on the Excel spreadsheet.
Select the rows you want.
To select several rows, hold down the Ctrl key (Cmd on a Mac) and click each row you want. To select all rows, press Ctrl-A (Cmd-A on a Mac).
In the menu bar, select “File” then “Export Selection As…”, or press Ctrl-E (Cmd-E on a Mac). Choose the destination folder and click “OK.”
Depending on your operating system and your version of Office, Excel may warn you when opening the spreadsheet that the file’s format differs from the one specified by its file extension. This is normal; click “OK” to open it.
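OutWit Hub handles the export itself. To reproduce this step in a script, one option is to write the selected rows to a CSV file, which Excel can open. The following is a minimal sketch under that assumption, not OutWit Hub’s own exporter; selection_to_csv and export_selection are hypothetical names, and `selected` stands for the indices of the rows you picked.

```python
import csv
import io


def selection_to_csv(rows, selected):
    """Return the selected rows (by index, kept in table order) as CSV text."""
    buffer = io.StringIO()
    writer = csv.writer(buffer)  # quotes cells that contain commas or quotes
    for i in sorted(set(selected)):
        writer.writerow(rows[i])
    return buffer.getvalue()


def export_selection(rows, selected, path):
    """Write the selected rows to `path` as a CSV file that Excel can open."""
    # newline="" stops Python from translating the csv module's "\r\n" line
    # endings; "utf-8-sig" adds a byte-order mark that helps Excel recognize
    # the file as UTF-8 (useful for accented names).
    with open(path, "w", newline="", encoding="utf-8-sig") as f:
        f.write(selection_to_csv(rows, selected))
```

For example, `export_selection(table, [0, 2], "medals.csv")` would write the header row and the third row of `table` to `medals.csv`. Sorting the indices keeps the rows in table order regardless of the order in which they were clicked.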
4. Use the Catch Panel
If you want to save several tables from different Web pages in the same Excel file, you can use the “Catch” to collect the data and then export it directly to Excel. To export the data from the “Catch,” select all, then right-click and select “Export Selection As…”
Table results can be dragged and dropped into the “Catch,” or caught automatically by selecting “Catch Selection.” Note that if the “Empty” box is checked, the existing contents of the “Catch” will be replaced when you load a new Web page. The tutorial Getting Started with OutWit Hub gives an overview of how to use the “Catch.”
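Collecting tables from several pages into one spreadsheet, as the “Catch” lets you do, comes down to appending their rows in order. Here is a minimal Python sketch of that idea, assuming each page’s table is already a list of rows. combine_tables is a hypothetical helper, and the extra first column recording the source page is a choice made in this sketch so that rows stay traceable, not a documented OutWit Hub behavior.

```python
def combine_tables(tables_by_source):
    """Append the rows of several tables into one list, in insertion order.

    Each output row starts with the key (e.g. a page URL) it came from, so
    rows from different pages stay distinguishable in a single spreadsheet.
    """
    combined = []
    for source, rows in tables_by_source.items():
        for row in rows:
            combined.append([source, *row])
    return combined


if __name__ == "__main__":
    # Hypothetical page names and made-up rows.
    pages = {
        "page-1": [["A", "3"], ["B", "10"]],
        "page-2": [["C", "7"]],
    }
    for row in combine_tables(pages):
        print(row)
```

Python dictionaries preserve insertion order, so the rows come out in the order the pages were added.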
5. Application Examples
HTML tables are common in Web pages and simple to extract with OutWit Hub’s “Table” view. To get better acquainted with this feature, see the following link:

