Ken Andersen's blog about technology-related subjects.
Thursday, November 2, 2017
Yahoo Finance API Deprecation
Yahoo! has killed its Finance API, which many people used in free stock-quote software. Years ago I wrote a series of scripts that downloaded a CSV file of locally relevant stocks through the Yahoo! Finance API. The scripts then parsed the data and published it on a private webpage so on-air announcers could read off the local stock figures. The simple query let you tack on your desired symbols and looked like this:
wget 'http://download.finance.yahoo.com/d/quotes.csv?s=^GSPC,^IXIC,CLAR,PG,CLRO,CAG,FC,HUN,MMSI,NUE,NUS,OA,OSTK,D,SKYW,USNA,ZION&f=nl1c1p2&e=.csv'
At about 4:00 p.m. MDT on November 1, 2017, every call to this quotes.csv service started returning the following:
Sorry, Unable to process request at this time -- error 999.
As of this writing, calls to quotes.csv respond with an HTTP 403 status code and this text:
It has come to our attention that this service is being used in violation of the Yahoo Terms of Service. As such, the service is being discontinued. For all future markets and equities data research, please refer to finance.yahoo.com.
It was sure nice of them to give users a warning. Luckily, I had already found a way to scrape the data from their webpages back when they stopped allowing the Dow Jones Index to be downloaded through the quotes.csv API.
All I needed was the basic price data that the quotes API provided; I didn't need historical data to go along with it. I noticed that the data I needed appears in the title of the page.
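Pulling the title out of a saved HTML file takes only a few lines. As a rough illustration, here is a minimal sketch that uses only Python's standard-library `html.parser`. The `TitleParser` class and `page_title` function are hypothetical names made up for this example. BeautifulSoup, which is used later in this post, does the same job more conveniently. The sketch only sees the title that is literally in the HTML, so on a raw download it returns whatever title the server sent before any JavaScript ran.

```python
from html.parser import HTMLParser


class TitleParser(HTMLParser):
    """Collect the text of the first <title> element in a document."""

    def __init__(self):
        super().__init__()
        self.title = ''
        self._in_title = False
        self._done = False

    def handle_starttag(self, tag, attrs):
        # Start collecting at the first <title>; ignore any later ones
        if tag == 'title' and not self._done:
            self._in_title = True

    def handle_endtag(self, tag):
        if tag == 'title' and self._in_title:
            self._in_title = False
            self._done = True

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def page_title(html):
    """Return the trimmed <title> text of an HTML string ('' if none)."""
    parser = TitleParser()
    parser.feed(html)
    parser.close()
    return parser.title.strip()
```

For example, `page_title(open('ostk.html', encoding='utf-8').read())` would return the title string of a saved page.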
There had to be a way to scrape this data and store it in my own CSV. I tried simple wget queries to the desired symbol URLs, but that didn't work. The Yahoo! stock pages run a lot of JavaScript that manipulates elements on the page, so the desired data never showed up in the title of the page fetched with wget. The title simply read:
OSTK : Summary for Overstock.com, Inc. - Yahoo Finance
I needed a way to run the page's JavaScript from a script. Research pointed me directly to PhantomJS. The description on its site says that "PhantomJS is a headless WebKit scriptable with a JavaScript API." That's exactly what I needed: PhantomJS goes to a page, runs its JavaScript, and lets me save the resulting HTML to a file. You describe what you want it to do in a .js file and call PhantomJS with that file as a parameter. To get the Overstock HTML from Yahoo!, I set up a .js file like this:
//render_ostk.js
var page = require('webpage').create(),
    address = 'http://finance.yahoo.com/q?s=OSTK',
    output = '/var/www/localhost/htdocs/ostk.html',
    fs = require('fs');

page.open(address, function (status) {
    if (status !== 'success') {
        console.log('Unable to load the address!');
        phantom.exit(1);
    } else {
        // Give the page's JavaScript 20 seconds to fill in the quote
        window.setTimeout(function () {
            // Save the rendered HTML (not a screenshot) to the output file
            fs.write(output, page.content, 'w');
            phantom.exit();
        }, 20000);
    }
});
You then call PhantomJS with render_ostk.js as its only parameter:
phantomjs render_ostk.js
The HTML page is then saved wherever the output variable points. Next, use Python to parse the HTML. I came across a "beautiful" Python library called BeautifulSoup. It makes screen scraping very easy: it turns elements on the page into objects you can work with in Python. Here's my Python code for scraping the ostk.html page:
#yahoo_stock_ostk.py
import datetime

from bs4 import BeautifulSoup

page_name = '/var/www/localhost/htdocs/ostk.html'

# Parse the HTML that PhantomJS saved after the page's JavaScript ran
with open(page_name, encoding='utf-8') as local_page:
    soup = BeautifulSoup(local_page, "html.parser")

# Split the rendered title on whitespace:
# values[1] = price, values[2] = change, values[3] = percent change
values = soup.find('title').text.split()
pricet = values[1].replace(',', '')  # drop thousands separators
change = values[2]
pchange = values[3]

# Prefix gains with '+' (losses already carry a '-')
if float(change) > 0:
    change = '+' + change
    pchange = '+' + pchange

daystring = datetime.datetime.now().strftime('%Y-%m-%d %H:%M:%S')
# Print one CSV row; building a single string avoids the spaces print() puts between arguments
print('"Overstock.com, Inc.",' + pricet + ',' + change + ',"' + pchange + '","' + daystring + '"')
This Python code finds the title of the webpage, splits the title text into separate values, and saves them in variables. BeautifulSoup returns these values as strings, which is why you see the float() conversion: I needed a numeric comparison to decide whether to tack a plus sign onto the change values before exporting them to the CSV file (negative values already carry a minus sign). There's probably a better way to do this, but it worked.
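One possibly tidier variant is sketched below. It moves the splitting and sign handling into a function and lets Python's `csv` module handle the quoting. This is a minimal sketch, not the script above. `quote_row` and `to_csv_line` are hypothetical helper names, and the code assumes the same title layout the script relies on: after splitting on whitespace, fields 1 to 3 are the price, the change and the percent change.

```python
import csv
import datetime
import io


def quote_row(title, name, now=None):
    """Turn a rendered quote-page title into a list of CSV fields.

    Assumes (as the script above does) that after splitting on
    whitespace, fields 1-3 are price, change and percent change.
    """
    values = title.split()
    price = values[1].replace(',', '')  # drop thousands separators
    change, pchange = values[2], values[3]
    # Add an explicit '+' to gains unless one is already there
    if float(change) > 0 and not change.startswith('+'):
        change, pchange = '+' + change, '+' + pchange
    now = now or datetime.datetime.now()
    return [name, price, change, pchange, now.strftime('%Y-%m-%d %H:%M:%S')]


def to_csv_line(row):
    """Serialize one row; csv quotes fields that contain commas."""
    buf = io.StringIO()
    csv.writer(buf, lineterminator='\n').writerow(row)
    return buf.getvalue()
```

With a made-up title such as `'OSTK 1,023.45 0.15 0.64% : Overstock.com, Inc. - Yahoo Finance'`, `to_csv_line(quote_row(title, 'Overstock.com, Inc.'))` yields a line beginning `"Overstock.com, Inc.",1023.45,+0.15,+0.64%,`. Only the name is quoted, because the `csv` module quotes only the fields that need it.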
I then made a shell script that calls PhantomJS for each symbol I want, runs a Python script to parse the data for each symbol, and appends the results to the CSV.
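#!/bin/bash
# Run from the script's own directory so cron can find the .js and .py files
cd "$(dirname "$0")" || exit 1

# Render every symbol's page in parallel, then wait for all of them to finish
phantomjs render_oa.js &
phantomjs render_ostk.js &
phantomjs render_d.js &
wait

# Parse each saved page and append one CSV row per symbol
python yahoo_stock_oa.py >> quotes.csv
python yahoo_stock_ostk.py >> quotes.csv
python yahoo_stock_d.py >> quotes.csv
At this point, I set up a cron job to run the script every hour to keep the CSV file updated. From there, the CSV is imported into a private webpage, and the values all appear together on one convenient page for our announcers.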
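The shell script appends to quotes.csv with `>>`, so each hourly run adds more rows. If the announcers' page should only ever show the latest quotes, one option is to rebuild the file on every run and swap it into place in one step. The following is a minimal sketch of that idea, assuming the rows have already been collected as text lines. `replace_csv` is a hypothetical helper, and the default `0o644` permission is an assumption about what the web server needs.

```python
import os
import tempfile


def replace_csv(path, lines, mode=0o644):
    """Atomically replace the file at path with the given text lines.

    The lines are written to a temporary file in the same directory and
    then swapped in with os.replace(), so a reader sees either the old
    file or the complete new one, never a half-written file.
    """
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix='.tmp')
    try:
        with os.fdopen(fd, 'w', encoding='utf-8', newline='') as tmp:
            tmp.writelines(lines)
        # mkstemp creates the file readable only by its owner; widen the
        # permissions (assumed default 0o644) so a web server can read it
        os.chmod(tmp_path, mode)
        os.replace(tmp_path, path)
    except BaseException:
        # Remove the temporary file if anything went wrong
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise
```

For example, the hourly job could capture the output of each yahoo_stock_*.py script into a list and pass it as `lines`. The temporary file lives in the same directory as the target because `os.replace()` is only atomic within a single filesystem.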
Hopefully someone can benefit from this as they work through what to do about the Yahoo! Finance API deprecation.
© The Ramblings of Ken