Website Scraping – VBA Code To Import Data From Website To Excel

I am going to show you 2 ways of scraping data from a website. One with the standard approach and 1 with JSON.

For this example, I want the stock numbers from yahoo in my worksheet.

The site is https://finance.yahoo.com/quote/%5EGSPC/history?period1=1520575200&period2=1552111200&interval=1d&filter=history&frequency=1d

This data is rendered in a table format so I can use the following code to bring it back to my worksheet:

Sub ScrapeSite()
    '3/8/18 - this is really laggy and slow!
    Dim appIE As Object
    Set appIE = CreateObject("InternetExplorer.Application")
    
    'navigate to the site
    With appIE
        .navigate "https://finance.yahoo.com/quote/%5EGSPC/history?p=%5EGSPC"
        '.Visible = True
    End With
    
    ' Wait while the page is loading
    Do While appIE.Busy
        DoEvents
    Loop

    Dim Table As Object
    Dim tRows As Object
    Dim tCols As Object
    
    Dim tRow As Object
    Dim tCol As Object
        
    Dim intExcelCol As Integer
    Dim intExcelRow As Integer
    
    Set Table = appIE.document.getElementsByTagName("table")
    
    
    'these debug statements are to tell us where we are in the process.
    Debug.Print "Set table..."
    
    'Get the rows
    'set tRows = Table(0).getElementsByTagName("tr")
    
    'sometimes the code stalls and receives an error, but just check your internet connection and rerun the procedure.
    
    'Debug.Print "Got Rows..."
    
    'Get the column headings which use the "td" tag
    Set tCols = Table(0).getElementsByTagName("td")
    
    Debug.Print "Got Columns..."
    
    intExcelRow = 2
    intExcelCol = 1
    
    For Each tCol In tCols
        'after filling in 8 columns move to the next row
        If intExcelCol <= 7 Then
            ' Output the contents of the cell to the spreadsheet
            Debug.Print tCol.innerText
            Sheet1.Cells(intExcelRow, intExcelCol) = tCol.innerText
        
            'Increase the intExcelCol value so the next time around the data will output to the next column
            intExcelCol = intExcelCol + 1
        Else
            intExcelRow = intExcelRow + 1
            intExcelCol = 0
            
            Debug.Print "Row: " & intExcelRow
            
        End If
    Next

    
    
    appIE.Quit
    
    Set tRow = Nothing
    Set tRows = Nothing
    Set tCol = Nothing
    Set tCols = Nothing
    
    Set appIE = Nothing

    MsgBox "Done"
    
End Sub

This method worked buy was very clunky and slow.

One of the big things with web scraping involves getting the tables and rows in the table, and then returning the items back to your sheet, and that’s what the example above does.

This slow and sketchy method wouldn’t work for me. So I needed another way.

The data I want is really just a JSON string.

JSON supposedly replaced XML as a standard way of transferring data.

Anyway that’s fine, but Yahoo is speedily rendering stock quotes via a JSON string.

So what we are looking for is the data after the “HistoricalPriceStore” text. If you want to know if you are dealing with a proper JSON string, paste your string into http://json.parser.online.fr/

So, yes, that assessment is correct. So you can use the following faster code to scrape the information:

Sub ParseMultipleRowsJSON()

    Dim http As New XMLHTTP 'ms xml object library 3.0
    Dim html As New HTMLDocument 'ms html object library
    
    Dim varResponse As Variant
    Dim dict As Dictionary  'reference ms scripting runtime for this
    
    Dim intJSONRows As Integer
    Dim intJSONCols As Integer
    
    Dim strDate As String
    
    strurl = "https://finance.yahoo.com/quote/%5EGSPC/history?p=%5EGSPC"
    
    'json online parser at: http://json.parser.online.fr/ (to see if your text is a valid json string)
    
    With http
        .Open "GET", strurl, False
        .send
        'varResponse(1) has the json string
        varResponse = Split(.responseText, "HistoricalPriceStore"":")
        
    End With
    
    'this is using the JsonConverter.bas found at https://github.com/VBA-tools/VBA-JSON (just copy and paste into a module)
    Set dict = ParseJson(varResponse(1))
    
    'initialize the excel destination vars
    intExcelRow = 2
    intExcelCol = 1
    
    Set JSONRowsCollection = dict("prices")
    For intJSONRows = 1 To JSONRowsCollection.Count
        
        Set JSONColsCollection = JSONRowsCollection(intJSONRows)
        'For intJSONCols = 1 To JSONColsCollection.Count
            'If intExcelCol > 7 Then
            '    intExcelCol = 1
            'Else
             
             
             '(((A1/60)/60)/24)+DATE(1970,1,1) - convert the unix date to regular number 43532.60417

             strdate1 = JSONColsCollection("date") / 60 / 60 / 24
             strDate = strdate1 + DateValue("1/1/1970")
             'now format the date as date
             
             
            Sheet1.Cells(intExcelRow, 1) = strDate 'Unix Timestamp date - need to convert
            Sheet1.Cells(intExcelRow, 2) = JSONColsCollection("open")
            Sheet1.Cells(intExcelRow, 3) = JSONColsCollection("high")
            Sheet1.Cells(intExcelRow, 4) = JSONColsCollection("low")
            Sheet1.Cells(intExcelRow, 5) = JSONColsCollection("close")
            Sheet1.Cells(intExcelRow, 6) = JSONColsCollection("volume")
            Sheet1.Cells(intExcelRow, 7) = JSONColsCollection("adjclose")
                'Debug.Print JSONColsCollection("date")
            'End If
            
            'intExcelCol = intExcelCol + 1
            
        'Next intJSONCols
        
        intExcelRow = intExcelRow + 1
    Next intJSONRows
    
    'clean up
    Set http = Nothing
    Set html = Nothing

End Sub

Your end product should look like this:

(no, I didn’t author all of this code, I got the ideas from various places on the internet, and put all the “pieces” together 🙂 . So kudos to all those wonderful people out there, which share their knowledge with us. Thanks!)

Let me know if you have any questions.

****************************************************


Posted

in

by

Tags: