This is a quick tutorial on how to fetch stock price data from Yahoo Finance, import it into a Pandas DataFrame and then plot it.
If you're new to data science with Python I highly recommend reading A modern guide to getting started with Data Science and Python. I also recommend working with the Anaconda Python distribution.
First, visit Yahoo Finance and search for a ticker. For this tutorial I used the S&P 500 ETF, SPY.
After you've searched for the ticker, click the Historical Prices link.
Scroll to the bottom of the page and find the Download to Spreadsheet link. Right-click it and copy the link address to your clipboard. The link conveniently points to a .CSV file with historical data going back to 1993 (in the case of SPY).
Let's write some Python code. First, import the necessary libraries.
import numpy as np
import matplotlib.pyplot as pp
import pandas as pd
import seaborn
import urllib.request
Instruct Python to show our plots inline on the screen. In a Jupyter notebook this is done with the %matplotlib inline magic command.
Use urllib to fetch the .CSV data file from the link above.
import urllib.request

urllib.request.urlretrieve(
    'http://real-chart.finance.yahoo.com/table.csv?s=SPY&d=1&e=12&f=2016&g=d&a=0&b=29&c=1993',
    'spy.csv'
)
Inspect the first 10 lines of the data file.
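If you'd rather stay in plain Python for this step, here is a minimal sketch of one way to peek at the top of the file. The `head` helper is a name I made up for illustration, and the sketch assumes `spy.csv` sits in the current working directory.

```python
import os
from itertools import islice


def head(path, n=10):
    """Return the first n lines of a text file, without trailing newlines."""
    with open(path, "r", encoding="utf-8") as f:
        return [line.rstrip("\n") for line in islice(f, n)]


# Only print if the download step has already produced the file.
if os.path.exists("spy.csv"):
    for line in head("spy.csv"):
        print(line)
```

Because `islice` stops after `n` lines, the whole file is never loaded into memory, which is handy for larger downloads.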
Import the price data into a Pandas DataFrame using the read_csv function. The first column contains the trading date so tell Pandas to look for dates and parse them into the correct datetime64 data type.
spy = pd.read_csv('spy.csv', parse_dates=['Date'])
Inspect the data types of the SPY DataFrame. Notice how Pandas automatically parsed the columns into the correct data types.
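To see what the type check looks like without downloading anything, here is a small sketch that runs read_csv on a couple of made-up rows and prints the `dtypes` attribute. The column layout and the values are my assumptions for illustration, not real quotes, and `sample` is just a stand-in name for the real DataFrame.

```python
import io

import pandas as pd

# Made-up sample rows; the column layout is an assumption about the CSV.
sample_csv = io.StringIO(
    "Date,Open,High,Low,Close,Volume,Adj Close\n"
    "2016-02-11,182.34,184.10,181.09,182.86,217500000,182.86\n"
    "2016-02-10,186.41,188.34,185.12,185.27,148200000,185.27\n"
)

# parse_dates turns the Date column into a datetime64 column instead of strings.
sample = pd.read_csv(sample_csv, parse_dates=["Date"])

# One dtype per column: datetime for Date, floats for prices, integer for Volume.
print(sample.dtypes)
```

Without `parse_dates`, the Date column would come back as plain text and date-based sorting, indexing and truncating would not behave as expected.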
Now let's inspect the first 5 lines of the Pandas DataFrame using the head() function. Notice how Yahoo gave us the data in reverse chronological order. The most recent data is at the beginning. This is backwards from what we need in order to plot the data.
Fix the sort order using the sort_values() function.
spy = spy.sort_values(by='Date')
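Here is a tiny sketch of the same idea on three made-up rows, so you can see the order flip. The values are invented for illustration and `sample` stands in for the real DataFrame.

```python
import io

import pandas as pd

# Made-up rows, newest first, mimicking the order of the downloaded file.
csv_text = io.StringIO(
    "Date,Close\n"
    "2016-02-11,182.86\n"
    "2016-02-10,185.27\n"
    "2016-02-09,185.43\n"
)
sample = pd.read_csv(csv_text, parse_dates=["Date"])

print(sample.head())  # most recent date is on top

# sort_values returns a new, sorted DataFrame, so assign the result back.
sample = sample.sort_values(by="Date")

print(sample.head())  # oldest date is now on top
print(sample["Date"].is_monotonic_increasing)
```

Note that the old row labels travel with the rows, so after sorting the integer index runs backwards. That stops mattering once the date becomes the index.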
Let's make the trading date the index for the Pandas DataFrame using the set_index() function.
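A minimal sketch of what indexing by date might look like, using pandas' `set_index` on a few made-up rows. The values and the name `sample` are illustrative assumptions, not the real SPY data.

```python
import pandas as pd

# Made-up rows, already sorted oldest to newest.
sample = pd.DataFrame(
    {
        "Date": pd.to_datetime(["2016-02-09", "2016-02-10", "2016-02-11"]),
        "Close": [185.43, 185.27, 182.86],
    }
)

# Move the Date column into the index; the result is a DatetimeIndex.
sample = sample.set_index("Date")
print(sample.index)

# Rows can now be looked up by trading date.
print(sample.loc[pd.Timestamp("2016-02-10"), "Close"])
```

With dates in the index, plotting puts time on the x-axis automatically and date-based selection becomes straightforward.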
Plot the closing price of SPY over the entire date range in our DataFrame.
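One way to draw this kind of plot is the `plot()` method that Pandas puts on every column. The sketch below runs it on a few made-up closing prices; the values, the title and the name `sample` are all illustrative. I also select the non-interactive Agg backend so the snippet runs as a plain script, which you should drop when working inline in a notebook.

```python
import matplotlib

matplotlib.use("Agg")  # assumption: no display available; remove in a notebook

import pandas as pd

# Made-up closing prices indexed by trading date.
sample = pd.DataFrame(
    {"Close": [185.43, 185.27, 182.86]},
    index=pd.to_datetime(["2016-02-09", "2016-02-10", "2016-02-11"]),
)
sample.index.name = "Date"

# Series.plot draws a line chart with the index (dates) on the x-axis.
ax = sample["Close"].plot(title="SPY closing price (sample rows)")
ax.set_ylabel("Close")
```

The returned `ax` is a regular matplotlib Axes object, so labels, limits and annotations can be adjusted afterwards.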
Use the truncate() function to remove data prior to January 1st, 2015 and plot, i.e. plot only data from January 1st, 2015 to the present. Notice that the truncate function doesn't modify the original DataFrame, which is good because you'll most likely want all the data intact for later plots and analysis.
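To make the "doesn't modify the data" point concrete, here is a short sketch on three made-up rows. The values and the names `sample` and `recent` are illustrative assumptions.

```python
import pandas as pd

# Made-up closing prices on a sorted date index (truncate needs a sorted index).
sample = pd.DataFrame(
    {"Close": [205.54, 201.02, 182.86]},
    index=pd.to_datetime(["2014-12-31", "2015-01-02", "2016-02-11"]),
)

# Keep only rows on or after January 1st, 2015; the result is a new DataFrame.
recent = sample.truncate(before="2015-01-01")

print(len(recent))  # rows from 2015 onward
print(len(sample))  # the original still has every row
```

Since `truncate` returns a new object, you can chain `.plot()` straight onto the call and leave the full DataFrame untouched for later analysis.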