Ken Andersen's blog about technology-related subjects.
Thursday, November 2, 2017
Yahoo Finance API Deprecation
Years ago I wrote a series of scripts that downloaded a CSV file of quotes for stocks of local interest through the Yahoo! Finance API. The scripts then parsed the data and published it on a private webpage so our on-air announcers could read off the local stock data. The simple query let you tack on your desired symbols and looked like this:
wget 'http://download.finance.yahoo.com/d/quotes.csv?s=^GSPC,^IXIC,CLAR,PG,CLRO,CAG,FC,HUN,MMSI,NUE,NUS,OA,OSTK,D,SKYW,USNA,ZION&f=nl1c1p2&e=.csv'
At about 4:00 p.m. MDT on November 1, 2017, every call to this quotes.csv service started responding with the following:
Sorry, Unable to process request at this time -- error 999.
As of this writing, calls to quotes.csv respond with an HTTP 403 status code and this text:
It has come to our attention that this service is being used in violation of the Yahoo Terms of Service. As such, the service is being discontinued. For all future markets and equities data research, please refer to finance.yahoo.com.
It sure was nice of them to give users a warning. Luckily, I had already found a way to scrape the data from their webpages back when they stopped allowing the Dow Jones Index to be downloaded through the quotes.csv API.
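For reference, the f=nl1c1p2 parameter in the original query asked for four columns per symbol: name (n), last trade price (l1), change (c1), and percent change (p2). Since the service is gone, the sketch below only illustrates reading a response of that shape with Python's csv module; the sample line and the parse_quotes name are hypothetical.

```python
import csv
import io

# Column order requested by f=nl1c1p2 in the old quotes.csv query
FIELDS = ['name', 'last_trade', 'change', 'percent_change']


def parse_quotes(csv_text):
    """Parse a quotes.csv-style response into one dict per symbol."""
    return [dict(zip(FIELDS, row)) for row in csv.reader(io.StringIO(csv_text)) if row]


# Hypothetical response line, for illustration only
sample = '"Overstock.com, Inc.",25.10,+0.45,"+1.83%"\n'
for quote in parse_quotes(sample):
    print(quote['name'], quote['last_trade'], quote['change'], quote['percent_change'])
```

These four columns are the same values the scraping approach below has to recover from each stock's page.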
All I needed was the basic price data that came from the quotes API; I didn't need historical data to go along with it. I noticed that the data I needed was in the title of each stock's page.
There had to be a way to scrape this data and store it in my own CSV. I tried simple wget requests to the desired symbol URLs, but that didn't work. The Yahoo! stock pages run a lot of JavaScript that manipulates elements on the page, so the desired data wasn't in the title when I fetched a page with wget. The title simply showed this:
OSTK : Summary for Overstock.com, Inc. - Yahoo Finance
I needed a way to execute a page's JavaScript from a script. Research pointed me directly to PhantomJS. The description on their site says that "PhantomJS is a headless WebKit scriptable with a JavaScript API." That's exactly what I needed. PhantomJS loads a page, runs its JavaScript, and lets you save the resulting HTML to a file. You describe what you want it to do in a .js file and call PhantomJS with that .js file as a parameter. To get the Overstock HTML from Yahoo! I set up a .js file like this:
//render_ostk.js
var page = require('webpage').create(),
    address = 'http://finance.yahoo.com/q?s=OSTK',
    output = '/var/www/localhost/htdocs/ostk.html',
    fs = require('fs');
page.open(address, function (status) {
    if (status !== 'success') {
        console.log('Unable to load the address!');
        phantom.exit(1);
    } else {
        // Give the page's JavaScript 20 seconds to update the title before saving
        window.setTimeout(function () {
            // Save the rendered DOM (after the JavaScript ran) as HTML
            fs.write(output, page.content, 'w');
            phantom.exit();
        }, 20000);
    }
});
You then simply call PhantomJS with render_ostk.js as the single parameter:
phantomjs render_ostk.js
The HTML page is then saved wherever the output variable points. Now use Python to parse the HTML. I came across a "beautiful" Python library called BeautifulSoup. It makes screen scraping very easy by turning elements on the page into objects you can work with in Python. Here's my Python code for scraping the ostk.html page:
#yahoo_stock_ostk.py
import datetime
from bs4 import BeautifulSoup

page_name = '/var/www/localhost/htdocs/ostk.html'

# Parse the HTML that PhantomJS saved after the page's JavaScript ran
with open(page_name, encoding='utf-8') as local_page:
    soup = BeautifulSoup(local_page, "html.parser")

# The rendered title holds the symbol, price, change, and percent change
title = soup.find('title')
titlet = title.text
values = titlet.split()
pricet = values[1].replace(',', "")  # drop thousands separators from the price
change = values[2]
change_flt = float(change)
pchange = values[3]
# Add a plus sign to positive changes; negative values already start with '-'
if change_flt > 0:
    change = '+' + change
    pchange = '+' + pchange
daystring = datetime.datetime.now().strftime('%Y-%m-%d %H:%M:%S')
# Emit one CSV row on stdout; the shell script appends it to quotes.csv
print('"Overstock.com, Inc.",',pricet,""",""",change,""",""",'"',pchange,'"',""",""",'"',daystring,'"')
This Python code finds the title of the webpage, splits the title text into separate variables, and prints them as one CSV row. BeautifulSoup returns these values as strings, which is why you see the type conversion: I needed to compare the change against zero to tack a plus sign onto gains before exporting to the CSV file. There's probably a better way to do this, but it worked.
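As a sketch of that "better way", Python's standard csv module can handle the quoting, so the row no longer has to be assembled from hand-placed quote characters (and print() no longer inserts stray spaces between fields). This assumes the rendered title starts with the symbol followed by the price, change, and percent change, as the indexing in the script implies; the sample title and the format_quote_row name are hypothetical.

```python
import csv
import datetime
import io


def format_quote_row(company, title_text, when):
    """Turn a rendered Yahoo! title into one CSV line (hypothetical title layout)."""
    # Assumed layout: "<SYMBOL> <price> <change> <percent change> ..."
    values = title_text.split()
    price = values[1].replace(',', '')  # drop thousands separators
    change = values[2]
    pchange = values[3]
    # Positive changes get an explicit '+'; negative values already carry '-'
    if float(change) > 0:
        change = '+' + change
        pchange = '+' + pchange
    buffer = io.StringIO()
    # The default QUOTE_MINIMAL quotes only fields that need it, like a name containing a comma
    csv.writer(buffer, lineterminator='\n').writerow(
        [company, price, change, pchange, when.strftime('%Y-%m-%d %H:%M:%S')]
    )
    return buffer.getvalue()


if __name__ == '__main__':
    # Hypothetical title text, for illustration only
    sample = 'OSTK 25.10 0.45 1.83%'
    print(format_quote_row('Overstock.com, Inc.', sample,
                           datetime.datetime(2017, 11, 2, 10, 0, 0)), end='')
```

With the sample title this prints "Overstock.com, Inc.",25.10,+0.45,+1.83%,2017-11-02 10:00:00, with only the company name quoted because it contains a comma.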
I made a script that calls PhantomJS for each symbol I want, then runs a Python script for each symbol to parse its data and append it to the CSV.
#!/bin/bash
# Render each symbol's page in parallel
phantomjs render_oa.js &
phantomjs render_ostk.js &
phantomjs render_d.js &
# Wait for every PhantomJS job to finish before parsing
wait
# Start quotes.csv fresh so repeated runs don't keep appending duplicate rows
: > quotes.csv
python yahoo_stock_oa.py >> quotes.csv
python yahoo_stock_ostk.py >> quotes.csv
python yahoo_stock_d.py >> quotes.csv
At this point I set up a cron job to run this script every hour to keep the CSV file updated. From there, the CSV is imported into a private webpage, and the values are all together on one convenient page for our announcers.
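One possible consolidation, sketched under assumptions rather than taken from my setup: a single Python script could read every rendered page and rewrite quotes.csv in one pass. To stay dependency-free it swaps BeautifulSoup for the standard library's html.parser, and it writes to a temporary file and then calls os.replace() so the webpage importing the CSV never sees a half-written file. The file paths, the write_quotes and read_title names, and the title layout (symbol, price, change, percent change) are hypothetical.

```python
import csv
import os
import tempfile
from html.parser import HTMLParser


class TitleParser(HTMLParser):
    """Collect the text inside the page's <title> element."""

    def __init__(self):
        super().__init__()
        self.in_title = False
        self.title = ''

    def handle_starttag(self, tag, attrs):
        if tag == 'title':
            self.in_title = True

    def handle_endtag(self, tag):
        if tag == 'title':
            self.in_title = False

    def handle_data(self, data):
        if self.in_title:
            self.title += data


def read_title(path):
    parser = TitleParser()
    with open(path, encoding='utf-8') as page:
        parser.feed(page.read())
    return parser.title.strip()


def write_quotes(pages, csv_path):
    """pages maps a company name to its rendered HTML file (hypothetical paths)."""
    directory = os.path.dirname(os.path.abspath(csv_path))
    # Temp file in the same directory, so os.replace() swaps it in atomically
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix='.csv')
    try:
        with os.fdopen(fd, 'w', newline='', encoding='utf-8') as out:
            writer = csv.writer(out)
            for company, html_path in pages.items():
                # Assumed title layout: "<SYMBOL> <price> <change> <percent> ..."
                values = read_title(html_path).split()
                writer.writerow([company, values[1].replace(',', ''), values[2], values[3]])
        os.chmod(tmp_path, 0o644)  # mkstemp creates owner-only files; let the web server read it
        os.replace(tmp_path, csv_path)
    except Exception:
        os.unlink(tmp_path)
        raise
```

Cron would then run the PhantomJS jobs followed by one call such as write_quotes({'Overstock.com, Inc.': '/var/www/localhost/htdocs/ostk.html'}, 'quotes.csv'), with one dictionary entry per symbol.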
Hopefully someone can benefit from this as they work through what to do about the Yahoo! Finance API deprecation.
Monday, March 6, 2017
Dell OMSA Email Alerts via PowerShell
The SMTP server at my work has become quite unreliable. That's why I wasn't receiving the email alerts from my Dell servers that I mentioned in my previous post. Students also weren't getting email alerts they should have been receiving from a web application we host. We switched that web application to Office 365's SMTP server, and its email alerts became reliable again. On the Dell servers, I was using an application I found called DellSMTPNotify.exe. It worked great when our own SMTP server was reliable. It was time to take advantage of the more reliable SMTP server in our hosted Office 365 Exchange setup, but Office 365 requires TLS encryption over SMTP. As far as I can tell, the DellSMTPNotify program does not support SSL or TLS encryption over SMTP. It was time to investigate other solutions.
I found a PowerShell script called OMSA-Notify that someone had already written. You can, indeed, use PowerShell to send email over an encrypted connection. I just needed to figure out how to modify the OMSA-Notify script to suit my needs.
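For readers outside PowerShell, here is a minimal Python sketch of the same idea: sending an alert over an encrypted SMTP connection with the standard library's smtplib. It assumes Office 365's client submission endpoint, smtp.office365.com on port 587 with STARTTLS, and uses placeholder addresses and credentials; build_alert, send_alert, and the subject format are hypothetical, not part of OMSA-Notify.

```python
import smtplib
import ssl
from email.message import EmailMessage


def build_alert(alert_type, body, computer_name, sender, recipient):
    """Assemble an alert email; the subject format is a hypothetical choice."""
    msg = EmailMessage()
    msg['Subject'] = f'{computer_name}: {alert_type}'
    msg['From'] = sender
    msg['To'] = recipient
    msg.set_content(body)
    return msg


def send_alert(msg, username, password,
               host='smtp.office365.com', port=587):
    """Connect, upgrade the connection to TLS with STARTTLS, log in, and send."""
    context = ssl.create_default_context()  # verifies the server certificate
    with smtplib.SMTP(host, port, timeout=30) as smtp:
        smtp.starttls(context=context)
        smtp.login(username, password)
        smtp.send_message(msg)


if __name__ == '__main__':
    alert = build_alert('Fan failure', 'OMSA reported a fan failure.',
                        'SERVER01', 'alerts@example.com', 'team@example.com')
    print(alert['Subject'])
    # send_alert(alert, 'alerts@example.com', 'app-password')  # needs network access
```

The timeout makes a connection that can't get out (for example, one stuck behind a proxy) fail after 30 seconds instead of sitting idle.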
I got everything configured, or so I thought, but the script just wasn't working. PowerShell seemed to sit idle and would eventually give a timeout error. That's when I realized I was trying to connect to an SMTP server on the public internet from a server that sits behind a proxy server. I tried configuring PowerShell to send my SMTP commands through the proxy server, but that didn't work either. My research led me to conclude that SMTP commands can't be sent over an HTTP proxy without installing some sort of relay program. I was at a dead end.
Later on, however, it hit me that I could set up a remote PowerShell session to one of our servers that has a public IP address in front of the proxy server. My co-worker had already set up something similar in a PowerShell script that connects to our Microsoft Data Protection Manager server to set up automatic backups from individual client machines. To set up a server to accept remote PowerShell sessions, you simply need to run "Enable-PSRemoting" from a PowerShell session running as Administrator. That should do it.
After getting remote PowerShell set up, my script was still failing. The smtp.send command wouldn't work because the mail message wasn't being passed to the remote session properly. I had to pass parameters into the remote PowerShell session and construct the mail message inside the remote session itself. You use the "-ArgumentList" parameter to send local variables to the remote session, as seen in the Invoke-Command line in the script:
Invoke-Command -Session $RoutableServer -ScriptBlock $ScriptBlockContent -ArgumentList $body, $AlertType, $env:COMPUTERNAME
This was a fun project to figure out. I'm really happy to have server alerts coming to my inbox again. Here's a pastebin of the full modified OMSANotify.ps1: http://pastebin.com/wvvkNaVk
Wednesday, February 22, 2017
I'm So Glad That RAID Exists
I just discovered at work that a drive in one of our production servers failed about a month ago. The server kept humming along like nothing was wrong at all.
I had email reporting set up to alert our team when there is a failure. Apparently it isn't working anymore, so I'll have to look into the alerting system.
With this post I just want to express my gratitude that RAID exists. The acronym RAID stands for Redundant Array of Independent Disks. It essentially lets you combine many hard disk drives into one large drive. However, that isn't all: if you pick the right configuration, RAID has redundancy built in. In a striped RAID array your data is spread across all of the disks, which lets them work together for better performance. In parity-based levels, parity data is also written across the disks, and that is the critical part. Parity is calculated from the data blocks in each stripe and stored alongside them on the other drives. Should a drive fail, its data can be reconstructed from the data and parity on the surviving drives.
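The reconstruction idea in the paragraph above can be sketched with single XOR parity, the scheme RAID 5 uses. This is a simplification: RAID 6 adds a second, independently computed parity block (commonly Reed-Solomon coding), which is what lets it survive two failures, and that part isn't shown here. The xor_blocks function and the sample blocks are purely illustrative.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR equal-length byte blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# One stripe: three data blocks, one per drive (illustrative contents)
data = [b'ABCD', b'EFGH', b'IJKL']
parity = xor_blocks(data)  # stored on a fourth drive

# Simulate losing drive 1, then rebuild its block from the survivors plus parity
survivors = [data[0], data[2], parity]
rebuilt = xor_blocks(survivors)
print(rebuilt)  # b'EFGH'
```

Because XOR is its own inverse, the same function both computes parity and rebuilds a lost block. A degraded array is doing this kind of calculation on every read that touches the failed drive.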
I won't go into detail on how RAID works in this post; there are plenty of places on the web that explain it. Suffice it to say that we configured our array as RAID level 6. This essentially creates double parity across all of the drives, which lets the array lose any two drives and still keep the data intact.
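We have a replacement drive coming tomorrow. I'll get that installed, and once the array rebuilds onto it, the array will be back in an optimal state. Right now it is running degraded: whenever data from the failed drive is requested, the array has to recalculate it from the remaining data and parity.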
Our users don't even have any idea that a drive in our server has been offline for over a month. I'm really glad RAID exists.
© The Ramblings of Ken