Author Topic: Scraping Data into Google Sheets  (Read 13590 times)

dungoofed

  • Pencil Stache
  • ****
  • Posts: 661
Scraping Data into Google Sheets
« on: November 10, 2015, 04:13:04 PM »
Hi - was wondering if anyone can help me scrape some data into Google Sheets.

I've just started getting into this whole "scraping" thing (yeah, only five years late). For listed stocks I can use GOOGLEFINANCE= and the Sheet populates nicely, but for things like warrants and bonds which aren't available I need another way to get the data.

An example would be from the following page:

http://www.asx.com.au/asx/markets/interestRateSecurityPrices.do?type=GOVERNMENT_BOND

If you go down to GSBK16, the value in the Valuation Price column (currently around 103.675) is the one I want to scrape. The way I've been trying to do it is using the formula:

=IMPORTXML("http://www.asx.com.au/asx/markets/interestRateSecurityPrices.do?type=GOVERNMENT_BOND","//*[@id='content']/table[1]/tbody/tr[2]/td[13]")

Google Sheets thinks for a moment then gives me a "#N/A Error - Imported Contents is Empty" error.

The way I got the Xpath was to open the page in Opera, select and inspect the element, then right click and select "Copy Xpath". I also changed the double-quotes around "content" to single quotes, and put the entire Xpath string in double-quotes.

Pulling my hair out. Anyone with more experience able to see why this isn't working?

BarkyardBQ

  • Pencil Stache
  • ****
  • Posts: 666
Re: Scraping Data into Google Sheets
« Reply #1 on: November 10, 2015, 04:17:36 PM »
I don't have the googles or a public spreadsheet to try this on, but you can try changing the wrapper quotes to single, and use double where they belong.

=IMPORTXML("http://www.asx.com.au/asx/markets/interestRateSecurityPrices.do?type=GOVERNMENT_BOND",'//*[@id="content"]/table[1]/tbody/tr[2]/td[13]')

And you might also try escaping the double quotes, https://www.google.com/webhp?hl=en#hl=en&q=xpath+escaping+double+quotes

Reading the docs... https://support.google.com/docs/answer/3093342?hl=en

Looks like the quotes might help. Also, are you getting the newest version of sheets in AU?
A maximum of 50 IMPORTXML calls are supported on a single spreadsheet. This limit is removed in the new version of Google Sheets.
« Last Edit: November 10, 2015, 04:23:27 PM by BackyarBQ »

dungoofed

  • Pencil Stache
  • ****
  • Posts: 661
Re: Scraping Data into Google Sheets
« Reply #2 on: November 10, 2015, 04:25:20 PM »
Thanks, checking the above....

dungoofed

  • Pencil Stache
  • ****
  • Posts: 661
Re: Scraping Data into Google Sheets
« Reply #3 on: November 10, 2015, 04:39:29 PM »
Hm can't seem to get anything to work. Have tried:

\"
CHAR(34)

plus all combinations of single, double and no quotes.

Regarding AU version, this is the only place I'm using IMPORTXML in the sheet at the moment.

BarkyardBQ

  • Pencil Stache
  • ****
  • Posts: 666
Re: Scraping Data into Google Sheets
« Reply #4 on: November 10, 2015, 04:53:04 PM »
I'm limited without a google sheet to try with... but I used to love building scraping apps with xpath.

try the importhtml() function.

https://productforums.google.com/forum/#!topic/docs/I1jR-b8wg48

dungoofed

  • Pencil Stache
  • ****
  • Posts: 661
Re: Scraping Data into Google Sheets
« Reply #5 on: November 10, 2015, 04:54:22 PM »
Thanks, will check that too.

I tried to scrape the same data from a completely different page but same result. I'm thinking something might be wrong with the Xpath that Opera creates, so I might try and fire up Chrome and see if that makes a difference.

BarkyardBQ

  • Pencil Stache
  • ****
  • Posts: 666
Re: Scraping Data into Google Sheets
« Reply #6 on: November 10, 2015, 04:56:19 PM »
Thanks, will check that too.

I tried to scrape the same data from a completely different page but same result. I'm thinking something might be wrong with the Xpath that Opera creates, so I might try and fire up Chrome and see if that makes a difference.

This is totally possible, I was using Chrome to validate your query. However, the reliability of xpath queries depends on the quality of the HTML on the page. I used to have to scrub and clean (query further out, and string/find replace/remove) some badly written pages to get the data I wanted.
« Last Edit: November 10, 2015, 04:58:50 PM by BackyarBQ »

dungoofed

  • Pencil Stache
  • ****
  • Posts: 661
Re: Scraping Data into Google Sheets
« Reply #7 on: November 10, 2015, 04:58:32 PM »
Hmm... it might be =ImportHTML that I need? Checking this too...

BarkyardBQ

  • Pencil Stache
  • ****
  • Posts: 666
Re: Scraping Data into Google Sheets
« Reply #8 on: November 10, 2015, 04:59:48 PM »
Hmm... it might be =ImportHTML that I need? Checking this too...

I'd try that... cause if the HTML is bad, xpath could be inaccurate, where importhtml() might read it differently.
« Last Edit: November 10, 2015, 05:01:39 PM by BackyarBQ »

dungoofed

  • Pencil Stache
  • ****
  • Posts: 661
Re: Scraping Data into Google Sheets
« Reply #9 on: November 10, 2015, 05:11:10 PM »
Hm.. ImportHTML doesn't even manage to return the entire table. I'm thinking there is something incompatible between the page and Google Sheets

BarkyardBQ

  • Pencil Stache
  • ****
  • Posts: 666
Re: Scraping Data into Google Sheets
« Reply #10 on: November 10, 2015, 05:13:04 PM »
Using either function see if you can import any cell in the example table. Assuming W3 would have perfect HTML and the table is small.

http://www.w3schools.com/html/html_tables.asp

dungoofed

  • Pencil Stache
  • ****
  • Posts: 661
Re: Scraping Data into Google Sheets
« Reply #11 on: November 10, 2015, 05:24:38 PM »
I think it's an issue with the page. Firefox, Opera and Chrome give me significantly different Xpaths

BarkyardBQ

  • Pencil Stache
  • ****
  • Posts: 666
Re: Scraping Data into Google Sheets
« Reply #12 on: November 10, 2015, 05:29:54 PM »
Bummer.

I don't know how determined you are...

http://webscraper.io
http://scrapy.org

dungoofed

  • Pencil Stache
  • ****
  • Posts: 661
Re: Scraping Data into Google Sheets
« Reply #13 on: November 10, 2015, 06:16:07 PM »
Thanks, will have another look in a bit.

dungoofed

  • Pencil Stache
  • ****
  • Posts: 661
Re: Scraping Data into Google Sheets
« Reply #14 on: November 11, 2015, 02:49:36 AM »
Still haven't solved it. I might have found a clue though.

As per the screenshot I can get some data with
Code: [Select]
=IMPORTXML(A9,"//html") (where A9 has my URL). The thing that is odd is that it appears three columns over. Not sure if this is a clue or not.

edit: the data I want is actually in that F-column cell further down, but I couldn't show it and take a screenshot at the same time.
« Last Edit: November 11, 2015, 02:51:09 AM by dungoofed »

BarkyardBQ

  • Pencil Stache
  • ****
  • Posts: 666
Re: Scraping Data into Google Sheets
« Reply #15 on: November 11, 2015, 09:55:42 AM »
Is it possible that Google doesn't want to take the parameter directly and wants it from a cell?

Try putting the path in a cell where A9 is the URL and A10 is the xpath.
=importxml(A9,A10)

dungoofed

  • Pencil Stache
  • ****
  • Posts: 661
Re: Scraping Data into Google Sheets
« Reply #16 on: November 11, 2015, 12:10:49 PM »
Actually most of the time I have been doing it that was : (

CoderNate

  • 5 O'Clock Shadow
  • *
  • Posts: 38
  • Age: 35
  • Location: Madison, WI
Re: Scraping Data into Google Sheets
« Reply #17 on: November 11, 2015, 06:38:18 PM »
If you're familiar with programming (or want to learn) you could use something like PHP to do an http request to the page you want to scrape and manually parse out the data. I've done this several times in the past.

dungoofed

  • Pencil Stache
  • ****
  • Posts: 661
Re: Scraping Data into Google Sheets
« Reply #18 on: November 11, 2015, 09:25:22 PM »
Willing to give that a go too, but I'd like to use the data in Google Sheets if possible. Let me check if there are any PHP add-ons for Google Sheets.

paulblack2025

  • 5 O'Clock Shadow
  • *
  • Posts: 1
Re: Scraping Data into Google Sheets
« Reply #19 on: June 13, 2016, 02:21:28 AM »
For what you want to extract from the web page you provided. I think Octoparse http://www.octoparse.com/ may interest you.

It's easy to use. just clicking and pointing the data you want from the websites, and Octoparse will get the data for you automatically. It won't let you down.

Download it now:  http://www.octoparse.com/download

Domus

  • 5 O'Clock Shadow
  • *
  • Posts: 1
Re: Scraping Data into Google Sheets
« Reply #20 on: April 12, 2019, 07:52:54 AM »
If you don't want to download software or write code to scrape websites you can use online GrabzIt's Web Scraper to scrape the information you need. If you export the data as XLSX or CSV you will be able to open it in Google Sheets.

MustacheAndaHalf

  • Walrus Stache
  • *******
  • Posts: 6633
Re: Scraping Data into Google Sheets
« Reply #21 on: April 13, 2019, 09:33:00 AM »
@ Domus - The post before yours is from 2016, so this might not be relevant anymore.

solon

  • Handlebar Stache
  • *****
  • Posts: 2359
  • Age: 1823
  • Location: OH
Re: Scraping Data into Google Sheets
« Reply #22 on: April 13, 2019, 12:20:06 PM »
I'm having the same trouble. Doe anyone know how =importxml treats a password-protected page? I'm logged in to the site, but it's giving me the same error OP got.

seattlecyclone

  • Walrus Stache
  • *******
  • Posts: 7254
  • Age: 39
  • Location: Seattle, WA
    • My blog
Re: Scraping Data into Google Sheets
« Reply #23 on: April 14, 2019, 10:24:06 AM »
I'm having the same trouble. Doe anyone know how =importxml treats a password-protected page? I'm logged in to the site, but it's giving me the same error OP got.

Google will send a request from their server to the URL you give, and they don't have your password. You can only get publicly-accessible pages using Google Sheets.

cjmiranda

  • 5 O'Clock Shadow
  • *
  • Posts: 1
  • Location: Toronto
    • Company Website
Re: Scraping Data into Google Sheets
« Reply #24 on: April 21, 2021, 09:54:46 AM »
Kinda late to the party (more than 120 days late..)

just came across this post and thought id share,
Parsehub has a tutorial on how to scrape data into google sheets
https://www.parsehub.com/blog/scrape-web-content-into-google-sheets/

just gotta put in the formula =importadata() in a1 cell once you're done