Acquiring data, the first step towards using machine learning for stock trading (drl4t-01)

Xiaoguang Li
Mar 17, 2023


Applying machine learning to stock trading is an exciting idea, but it is also a tough challenge. In a series of posts, I will introduce the fundamentals that beginners need and eventually build a prototype machine learning model for stock trading. Some basic knowledge of Python, machine learning and stock trading is assumed.

Machine learning always starts with data. To get stock trading data, yfinance is a good place to start.

What is yfinance?

yfinance is an open-source Python library developed by Ran Aroussi that offers access to financial data on Yahoo Finance through Yahoo’s publicly available APIs. Note that the library is intended for research and educational purposes and is not affiliated with, endorsed by, or vetted by Yahoo, Inc.

yfinance provides a simple, free and easy-to-use interface, with customizable parameters, for accessing a wide range of historical financial data, including stocks, ETFs, currencies and cryptocurrencies. Because it is built on top of Pandas, yfinance returns data as DataFrames, which integrate seamlessly with other Python data analysis tools such as NumPy and Matplotlib.

Overall, its ease of use, comprehensive data and customizable parameters make it a popular choice for many Python users.

Install yfinance

Assuming Jupyter Notebook is installed, we can install yfinance with pip from a notebook cell (the leading ! runs a shell command; omit it when installing from a terminal):

!pip install yfinance

Get Stock Object

First, import the yfinance library into our Python script:

import yfinance as yf

To get a stock object, we create an instance of the Ticker class by passing in the stock symbol. For example, “AAPL” stands for Apple Inc.

stock = yf.Ticker('AAPL')

Get Stock Metadata

With the stock object, yfinance offers a very simple way to get a stock’s metadata:

stock.info

Here is the metadata we got for Apple’s stock (a snapshot; most of these values change over time):

{'symbol': 'AAPL',
'dividendDate': 1676505600,
'twoHundredDayAverageChangePercent': 0.09453065,
'averageAnalystRating': '2.0 - Buy',
'fiftyTwoWeekLowChangePercent': 0.30756223,
'language': 'en-US',
'preMarketChangePercent': -0.043118577,
'regularMarketDayRange': '161.271 - 162.47',
'earningsTimestampEnd': 1682942400,
'epsForward': 6.59,
'regularMarketDayHigh': 162.47,
'twoHundredDayAverageChange': 14.022446,
'askSize': 13,
'twoHundredDayAverage': 148.33755,
'bookValue': 3.581,
'marketCap': 2619694907392,
'fiftyTwoWeekHighChange': -16.130005,
'fiftyTwoWeekRange': '124.17 - 178.49',
'fiftyDayAverageChange': 11.489807,
'exchangeDataDelayedBy': 0,
'averageDailyVolume3Month': 68887365,
'firstTradeDateMilliseconds': 345479400000,
'trailingAnnualDividendRate': 0.91,
'fiftyTwoWeekLow': 124.17,
'market': 'us_market',
'regularMarketVolume': 44935963,
'quoteSourceName': 'Nasdaq Real Time Price',
'messageBoardId': 'finmb_24937',
'priceHint': 2,
'regularMarketDayLow': 161.271,
'sourceInterval': 15,
'exchange': 'NMS',
'region': 'US',
'shortName': 'Apple Inc.',
'fiftyDayAverageChangePercent': 0.07615691,
'preMarketTime': 1680268283,
'fullExchangeName': 'NasdaqGS',
'earningsTimestampStart': 1682506740,
'financialCurrency': 'USD',
'displayName': 'Apple',
'gmtOffSetMilliseconds': -14400000,
'regularMarketOpen': 161.53,
'regularMarketTime': '4:00PM EDT',
'regularMarketChangePercent': 0.9889882,
'trailingAnnualDividendYield': 0.00566026,
'quoteType': 'EQUITY',
'averageDailyVolume10Day': 64810750,
'fiftyTwoWeekLowChange': 38.190002,
'fiftyTwoWeekHighChangePercent': -0.09036923,
'typeDisp': 'Equity',
'trailingPE': 28.138649,
'tradeable': False,
'currency': 'USD',
'preMarketPrice': 162.29,
'sharesOutstanding': 15821899776,
'regularMarketPreviousClose': 160.77,
'fiftyTwoWeekHigh': 178.49,
'exchangeTimezoneName': 'America/New_York',
'regularMarketChange': 1.5899963,
'bidSize': 18,
'priceEpsCurrentYear': 27.19598,
'cryptoTradeable': False,
'fiftyDayAverage': 150.8702,
'exchangeTimezoneShortName': 'EDT',
'epsCurrentYear': 5.97,
'marketState': 'PRE',
'regularMarketPrice': 162.36,
'customPriceAlertConfidence': 'HIGH',
'preMarketChange': -0.070007324,
'forwardPE': 24.63733,
'earningsTimestamp': 1675375200,
'ask': 162.41,
'epsTrailingTwelveMonths': 5.77,
'bid': 162.1,
'priceToBook': 45.33929,
'triggerable': True,
'longName': 'Apple Inc.',
'trailingPegRatio': 2.6876}
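Several fields in this output are raw Unix timestamps rather than readable dates: dividendDate and earningsTimestamp appear to be seconds since the epoch, while firstTradeDateMilliseconds is, as its name says, in milliseconds. The sketch below converts a few of them with Python's standard datetime module. The values are copied from the output above; the helper name epoch_to_utc is purely illustrative and not part of yfinance.

```python
from datetime import datetime, timezone

# A few raw values copied from the stock.info output above.
info = {
    "dividendDate": 1676505600,                  # seconds since 1970-01-01 UTC
    "earningsTimestamp": 1675375200,             # seconds since 1970-01-01 UTC
    "firstTradeDateMilliseconds": 345479400000,  # milliseconds since 1970-01-01 UTC
    "gmtOffSetMilliseconds": -14400000,          # exchange offset from UTC, in ms
}


def epoch_to_utc(value, milliseconds=False):
    """Convert a Unix timestamp (seconds, or milliseconds if flagged) to a UTC datetime."""
    seconds = value / 1000 if milliseconds else value
    return datetime.fromtimestamp(seconds, tz=timezone.utc)


dividend_date = epoch_to_utc(info["dividendDate"])
earnings_time = epoch_to_utc(info["earningsTimestamp"])
first_trade = epoch_to_utc(info["firstTradeDateMilliseconds"], milliseconds=True)
offset_hours = info["gmtOffSetMilliseconds"] / 3_600_000  # ms per hour

print(dividend_date.date())  # 2023-02-16
print(earnings_time)         # 2023-02-02 22:00:00+00:00
print(first_trade)           # 1980-12-12 14:30:00+00:00
print(offset_hours)          # -4.0
```

Decoded this way, firstTradeDateMilliseconds lands on December 12, 1980, Apple's first trading day, and the -4 hour offset matches the 'EDT' value of exchangeTimezoneShortName in the same snapshot.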

Since stock.info is a Python dictionary, individual pieces of metadata can be retrieved by key, for example:

stock.info['marketCap']
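Indexing with square brackets raises a KeyError when a key is absent, and the set of keys may differ between tickers (an ETF, for instance, may lack EPS fields). The sketch below reads fields defensively with dict.get() and cross-checks a few derived values against the snapshot above: trailing P/E should equal price divided by trailing-twelve-month EPS, and regularMarketChangePercent should equal the move from the previous close, in percent. The ratio() helper is hypothetical, not part of yfinance.

```python
# A subset of the AAPL stock.info snapshot shown above.
info = {
    "regularMarketPrice": 162.36,
    "regularMarketPreviousClose": 160.77,
    "regularMarketChangePercent": 0.9889882,
    "epsTrailingTwelveMonths": 5.77,
    "trailingPE": 28.138649,
    "epsForward": 6.59,
    "forwardPE": 24.63733,
}


def ratio(info, numerator_key, denominator_key):
    """Return info[numerator_key] / info[denominator_key], or None if either is missing or zero."""
    numerator = info.get(numerator_key)
    denominator = info.get(denominator_key)
    if numerator is None or not denominator:
        return None
    return numerator / denominator


trailing_pe = ratio(info, "regularMarketPrice", "epsTrailingTwelveMonths")
forward_pe = ratio(info, "regularMarketPrice", "epsForward")
# Change versus the previous close, expressed in percent like the reported field.
change_pct = (ratio(info, "regularMarketPrice", "regularMarketPreviousClose") - 1) * 100

print(round(trailing_pe, 4), info["trailingPE"])  # 28.1386 28.138649
print(round(forward_pe, 4), info["forwardPE"])    # 24.6373 24.63733
print(round(change_pct, 4), info["regularMarketChangePercent"])
print(info.get("dividendYield"))                  # None: key not in this subset
```

The same price-over-previous-close calculation is what a daily return feature for a model will look like later, just applied to a whole column of closing prices.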

Get Historical Trading Data

To get historical trading data, we can call the history() method with a start date and an end date, both in the format “YYYY-MM-DD”. According to the yfinance documentation, the end date is exclusive.

stock.history(start='2022-01-01', end='2022-12-31')

This will return a Pandas DataFrame containing the historical trading data for the given time period.
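Because the dates are plain “YYYY-MM-DD” strings and the end date is exclusive, it can help to generate them rather than type them. The sketch below uses only the standard library; the helpers year_window and last_n_days are hypothetical names, and they assume the exclusive-end behaviour described above.

```python
from datetime import date, timedelta


def year_window(year):
    """Return (start, end) "YYYY-MM-DD" strings covering all of `year`.

    Assumes `end` is exclusive, so it is set to January 1 of the next year.
    """
    return date(year, 1, 1).isoformat(), date(year + 1, 1, 1).isoformat()


def last_n_days(n, today=None):
    """Return (start, end) strings for the n calendar days ending on `today`, inclusive."""
    today = today or date.today()
    start = today - timedelta(days=n - 1)
    end = today + timedelta(days=1)  # exclusive end, so `today` itself is included
    return start.isoformat(), end.isoformat()


print(year_window(2022))                         # ('2022-01-01', '2023-01-01')
print(last_n_days(30, today=date(2023, 3, 17)))  # ('2023-02-16', '2023-03-18')
# Usage: start, end = year_window(2022); stock.history(start=start, end=end)
```

date.isoformat() always produces zero-padded “YYYY-MM-DD”, which avoids hand-typed mistakes such as “2022-1-1”.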

Alternatively, we can pass the period parameter to get the most recent historical trading data. The valid periods are: “1d”, “5d”, “1mo”, “3mo”, “6mo”, “1y”, “2y”, “5y”, “10y”, “ytd”, “max”.

stock.history(period='1y')
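Since history() accepts either a period or a start/end range, a small wrapper can catch a typo such as “1w” before any request is made. The sketch below is a hypothetical helper, not part of yfinance: the list of periods is copied from the paragraph above, and the rule that period and start/end must not be mixed is a design choice made here so that each request has one unambiguous window.

```python
# Valid values for `period`, as listed above.
VALID_PERIODS = ("1d", "5d", "1mo", "3mo", "6mo", "1y", "2y", "5y", "10y", "ytd", "max")


def history_kwargs(period=None, start=None, end=None):
    """Build keyword arguments for stock.history() from either a period or a date range."""
    if period is not None:
        if start is not None or end is not None:
            raise ValueError("pass either period or start/end, not both")
        if period not in VALID_PERIODS:
            raise ValueError(f"invalid period {period!r}; expected one of {VALID_PERIODS}")
        return {"period": period}
    if start is None:
        raise ValueError("start is required when no period is given")
    kwargs = {"start": start}
    if end is not None:
        kwargs["end"] = end
    return kwargs


print(history_kwargs(period="1y"))                           # {'period': '1y'}
print(history_kwargs(start="2022-01-01", end="2023-01-01"))  # {'start': '2022-01-01', 'end': '2023-01-01'}
# Usage: stock.history(**history_kwargs(period="1y"))
```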

There are some other parameters we can specify, such as:

  • interval: for example, “1d” for daily data and “1h” for hourly data
  • actions: whether to include dividends and splits, default is True

Below is an example of how to obtain daily historical data for the past two years, excluding dividends and splits:

stock.history(period='2y', interval='1d', actions=False)

Running this returns a DataFrame indexed by date, with Open, High, Low, Close and Volume columns.
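A daily DataFrame like this is the usual starting point for machine learning features, and daily returns are often the first one. The sketch below computes simple and log returns with pandas. So that it runs offline, it uses a tiny hand-made DataFrame with the same column layout; the prices, volumes and dates are made up and are not real AAPL data. In practice the DataFrame would come from stock.history(period='2y', interval='1d', actions=False).

```python
import numpy as np
import pandas as pd

# Toy stand-in for the history() result: same columns, made-up values.
prices = pd.DataFrame(
    {
        "Open": [100.0, 101.0, 102.5],
        "High": [101.0, 102.5, 103.0],
        "Low": [99.0, 100.5, 99.5],
        "Close": [100.0, 102.0, 99.96],
        "Volume": [1_000_000, 1_200_000, 900_000],
    },
    index=pd.to_datetime(["2023-03-13", "2023-03-14", "2023-03-15"]),
)

# Simple return: Close_t / Close_(t-1) - 1; the first row has no previous close (NaN).
prices["Return"] = prices["Close"].pct_change()
# Log return: log(Close_t) - log(Close_(t-1)); log returns add up across days.
prices["LogReturn"] = np.log(prices["Close"]).diff()

# Drop the first row, which has no return, before using the columns as features.
features = prices.dropna()
print(features[["Close", "Return", "LogReturn"]])
```

Log returns are shown alongside simple returns because they sum over consecutive days, which is convenient when aggregating features over windows.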

The valid intervals are: “1m”, “2m”, “5m”, “15m”, “30m”, “60m”, “90m”, “1h”, “1d”, “5d”, “1wk”, “1mo”, “3mo”. Note that “1m” data is only available for the last 7 days, and intraday data (any interval shorter than “1d”) is only available for the last 60 days.
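These lookback limits are easy to trip over when requesting intraday data, so a quick check before calling history() can save a confusing empty result. The sketch below encodes the intervals and limits stated above (7 days for “1m”, 60 days for other intraday intervals). The helper names are hypothetical, and treating exactly 60 days back as still allowed is an assumption about the boundary.

```python
from datetime import date

# Valid values for `interval`, as listed above.
VALID_INTERVALS = ("1m", "2m", "5m", "15m", "30m", "60m", "90m", "1h",
                   "1d", "5d", "1wk", "1mo", "3mo")
# Intervals shorter than one day.
INTRADAY_INTERVALS = {"1m", "2m", "5m", "15m", "30m", "60m", "90m", "1h"}


def max_lookback_days(interval):
    """Days of history available for `interval`, or None if no limit is stated."""
    if interval not in VALID_INTERVALS:
        raise ValueError(f"invalid interval {interval!r}")
    if interval == "1m":
        return 7
    if interval in INTRADAY_INTERVALS:
        return 60
    return None


def check_window(interval, start, today=None):
    """Raise ValueError if `start` lies further back than `interval` data reaches."""
    today = today or date.today()
    limit = max_lookback_days(interval)
    if limit is not None and (today - start).days > limit:
        raise ValueError(f"{interval!r} data only covers the last {limit} days")


check_window("1d", date(2020, 1, 1))                            # no stated limit for daily data
check_window("15m", date(2023, 2, 1), today=date(2023, 3, 17))  # 44 days back: allowed
```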


Xiaoguang Li

Master of Science in Computational Data Analytics from Georgia Tech, Senior IT Consultant at Morgan Stanley