finds.readers.taq

Class and methods to process TAQ trade and quotes tick data

  • NYSE Daily TAQ: Master, NBBO, Trades

  • market microstructure: bid-ask spreads, trade conditions, tick test

Copyright 2022, Terence Lim

MIT License

class finds.readers.taq.TAQ(taq_file: str, index_file: str = '', symbols_file: str = '')

Bases: object

Base class to manipulate a daily TAQ .csv.gz file

Parameters:
  • taq_file – raw .csv.gz input data file name

  • index_file – name of new (csv.gz) file to write indexed-gzip index

  • symbols_file – name of new (csv.gz) file to write symbols index

Notes:

  • NYSE historical samples: ftp://ftp.nyxdata.com/Historical%20Data%20Samples/

  • Uses indexed_gzip package for random access into gzip file

  • Implements 3 methods to access raw daily TAQ csv.gz files, e.g.:

    • trade(n) - next n csv lines

    • iter(trade) - iterable, by chunk with same stock symbol

    • trade['AAPL'] - getitem, by symbol
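
The symbols index can be pictured as a map from each symbol to the byte offset and size of its block of rows. The following is a minimal sketch of that idea on uncompressed bytes, not the class's actual implementation: the helper names, the '|' separator and the 'Symbol' column name are assumptions for illustration, and it assumes rows are sorted so that each symbol's rows are contiguous.

```python
from typing import Dict, Tuple


def build_symbol_index(raw: bytes, column: str = "Symbol",
                       sep: str = "|") -> Dict[str, Tuple[int, int]]:
    """Map each symbol to (byte offset, byte size) of its contiguous rows."""
    lines = raw.splitlines(keepends=True)
    header = lines[0].decode().rstrip("\r\n").split(sep)
    col = header.index(column)
    index: Dict[str, Tuple[int, int]] = {}
    offset = len(lines[0])  # data rows start right after the header line
    for line in lines[1:]:
        symbol = line.decode().rstrip("\r\n").split(sep)[col]
        start, size = index.get(symbol, (offset, 0))
        index[symbol] = (start, size + len(line))
        offset += len(line)
    return index


def read_symbol(raw: bytes, index: Dict[str, Tuple[int, int]],
                symbol: str) -> str:
    """Slice out the rows of one symbol using its (offset, size) entry."""
    start, size = index[symbol]
    return raw[start:start + size].decode()


# Hypothetical sample data, sorted by symbol
SAMPLE = (b"Time|Symbol|Price\n"
          b"093000|AAPL|150.0\n"
          b"093001|AAPL|150.1\n"
          b"093000|MSFT|300.0\n")
INDEX = build_symbol_index(SAMPLE)
```

With a seekable gzip reader, the same (offset, size) pair would let a lookup by symbol seek directly to the block instead of scanning the whole file.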

__call__(symbol: str) → Series | None

Return symbol location and size in daily taq gzip file

__getitem__(symbol: str) → DataFrame

Get chunk of all rows for the input symbol as a data frame

close()

Close getitem file handle

index_symbols(index_file: str = '', symbols_file: str = '')

Generate indexed_gzip and symbols index files

open(taq_file: str = '')

Open with context manager protocol for sequential line reads

read()

Read entire file as a DataFrame

finds.readers.taq.align_trades(ct: DataFrame, cq: DataFrame, open_t: Timestamp = Timestamp('1900-01-01 09:30:00'), inplace: bool = False) → DataFrame | None

Align each trade with prevailing and forward quotes

Parameters:
  • ct – Input dataframe of trades

  • cq – Input dataframe of nbbo quotes

  • open_t – drop quotes prior to open time

  • inplace – whether to overwrite trades dataframe or return as new copy

Returns:

DataFrame of trades with additional columns if not inplace, else None

Notes:

  • prevailing quote at -1ns, forward quote at +5m, drop quotes before open_t

  • See Holden and Jacobsen (2014), “Liquidity Measurement”

  • Prevailing_Mid: midquote prevailing before each trade

  • Forward_Mid: midquote prevailing 5 minutes after trade

  • Tick_Test: Whether trade price above, below or equals previous trade
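
The alignment rules in the notes can be illustrated with a pure-Python sketch. This is not the function's implementation: the helper names are hypothetical, and timestamps are assumed to be integers (nanoseconds since midnight) with quote times sorted ascending. The prevailing quote is the last one strictly before the trade, matching the -1ns rule, and the forward quote is the one in effect at the trade time plus the horizon.

```python
from bisect import bisect_left, bisect_right
from typing import List, Optional, Sequence

FIVE_MINUTES_NS = 5 * 60 * 10**9


def prevailing_mid(quote_times: Sequence[int], mids: Sequence[float],
                   trade_time: int) -> Optional[float]:
    """Midquote of the last quote strictly before trade_time, else None."""
    pos = bisect_left(quote_times, trade_time)  # quotes before pos are earlier
    return mids[pos - 1] if pos > 0 else None


def forward_mid(quote_times: Sequence[int], mids: Sequence[float],
                trade_time: int,
                horizon: int = FIVE_MINUTES_NS) -> Optional[float]:
    """Midquote in effect at trade_time + horizon, else None."""
    pos = bisect_right(quote_times, trade_time + horizon)
    return mids[pos - 1] if pos > 0 else None


def tick_test(prices: Sequence[float]) -> List[Optional[int]]:
    """+1 if above the previous trade price, -1 if below, 0 if equal."""
    ticks: List[Optional[int]] = []
    for i, price in enumerate(prices):
        if i == 0:
            ticks.append(None)  # no previous trade to compare against
        else:
            prev = prices[i - 1]
            ticks.append((price > prev) - (price < prev))
    return ticks


# Hypothetical sample quotes
QUOTE_TIMES = [0, 100, 200]
MIDS = [10.0, 10.1, 10.2]
```

A quote stamped at exactly the trade time is deliberately skipped by prevailing_mid, since it may not have been visible when the trade executed.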

finds.readers.taq.bin_quotes(cq: DataFrame, value: int = 15, unit: str = 'm', open_t: Timestamp = Timestamp('1900-01-01 09:30:00'), close_t: Timestamp = Timestamp('1900-01-01 16:00:00')) → DataFrame

Resample quotes into time interval bins

Parameters:
  • cq – Input dataframe of nbbo quote

  • value – number of time units per bin width

  • unit – time unit in {'h', 'm', 's', 'ms', 'us', 'ns'}

  • open_t – exclusive left bound of first bin

  • close_t – inclusive right bound of last bin

Returns:

DataFrame of resampled derived quote liquidity metrics

Notes:

  • quoted: time-weighted quoted half-spread

  • depth: time-weighted average of average bid and offer sizes

  • offersize: time-weighted average offer size

  • bidsize: time-weighted average bid size

  • mid: last midquote

  • firstmid: first midquote

  • maxmid: max midquote

  • minmid: min midquote

  • retq: last midquote-to-last midquote return
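
As an illustration of time weighting, the sketch below averages the quoted half-spread within one bin, weighting each quote by how long it stays in effect (until the next quote or the end of the bin). It is a sketch under assumptions, not the function's code: the helper name is hypothetical, times are plain numbers in seconds, and the half-spread is taken in dollars; the source does not say whether the metric is in dollars or relative to the midquote.

```python
from typing import List, Optional, Tuple

Quote = Tuple[float, float, float]  # (time, bid price, offer price)


def time_weighted_half_spread(quotes: List[Quote],
                              bin_end: float) -> Optional[float]:
    """Average (offer - bid) / 2, weighted by each quote's duration.

    quotes must be sorted by time and fall inside the bin.
    """
    weighted = 0.0
    total_time = 0.0
    for i, (time, bid, offer) in enumerate(quotes):
        # a quote stays in effect until the next quote, or the bin end
        next_time = quotes[i + 1][0] if i + 1 < len(quotes) else bin_end
        duration = next_time - time
        weighted += duration * (offer - bid) / 2
        total_time += duration
    return weighted / total_time if total_time > 0 else None


# Hypothetical bin: 2-cent spread for 60s, then 4-cent spread for 30s
QUOTES = [(0.0, 10.00, 10.02), (60.0, 10.00, 10.04)]
RESULT = time_weighted_half_spread(QUOTES, bin_end=90.0)
```

The same weighting scheme applies to the depth, bidsize and offersize metrics, replacing the half-spread with the relevant size.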

Examples:

>>> resample(closed='left', label='right')
>>> agg(Series.sum, min_count=1)
>>> result[result.index > open_t], result[result.index <= close_t]

finds.readers.taq.bin_trades(ct: DataFrame, value: int = 5, unit: str = 'm', open_t: Timestamp = Timestamp('1900-01-01 09:30:00'), close_t: Timestamp = Timestamp('1900-01-01 16:00:00')) → DataFrame

Resample trades into time interval bins

Parameters:
  • ct – Input dataframe of trades

  • value – number of time units per bin width

  • unit – time unit in {'h', 'm', 's', 'ms', 'us', 'ns'}

  • open_t – exclusive left bound of first bin

  • close_t – inclusive right bound of last bin

Returns:

DataFrame of resampled derived trade liquidity metrics

Notes:

  • counts: number of trades in bin

  • last: last trade price in bin (ffill if none)

  • first: first trade price in bin

  • maxtrade: max trade price in bin

  • mintrade: min trade price in bin

  • ret: last-to-last trade price return

  • vwap: volume weighted average trade price

  • effective: volume-weighted effective relative half-spread

    (trade price divided by prevailing midquote, minus 1)

  • realized: volume-weighted realized relative half-spread

    (trade price divided by 5-minute forward midquote, minus 1)

  • impact: volume-weighted relative price impact

    (5-minute forward midquote divided by prevailing, minus 1)
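
The three spread measures can be sketched for a single bin as follows. This is an illustration rather than the function's code: the helper name and tuple layout are hypothetical, and each ratio is multiplied by a trade direction (+1 buy, -1 sell), which is the usual convention in the microstructure literature; the source does not state how trades are signed.

```python
from typing import Dict, List, Optional, Tuple

# (price, volume, prevailing mid, forward mid, direction)
Trade = Tuple[float, float, float, float, int]


def volume_weighted_spreads(trades: List[Trade]) -> Optional[Dict[str, float]]:
    """Volume-weighted effective, realized and impact measures for one bin."""
    total_volume = sum(volume for _, volume, _, _, _ in trades)
    if total_volume == 0:
        return None
    effective = realized = impact = 0.0
    for price, volume, prevailing, forward, direction in trades:
        weight = volume / total_volume
        effective += weight * direction * (price / prevailing - 1)
        realized += weight * direction * (price / forward - 1)
        impact += weight * direction * (forward / prevailing - 1)
    return {"effective": effective, "realized": realized, "impact": impact}


# Hypothetical bin: a buy above the mid, then a larger sell below the mid
TRADES = [(10.01, 100, 10.00, 10.02, +1),
          (9.99, 300, 10.00, 9.98, -1)]
SPREADS = volume_weighted_spreads(TRADES)
```

In this sample both trades move the midquote in their own direction, so the price impact exceeds the effective spread and the realized spread is negative.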

Examples:

>>> resample(closed='left', label='right')
>>> agg(Series.sum, min_count=1)
>>> result[result.index > open_t], result[result.index <= close_t]

finds.readers.taq.clean_nbbo(df: DataFrame | None, keep: List[str] = ['Best_Bid_Price', 'Best_Bid_Size', 'Best_Offer_Price', 'Best_Offer_Size']) → DataFrame | None

Remove bad quotes

Parameters:
  • df – Dataframe containing one day’s nbbo quotes of a stock

  • keep – List of columns to keep

Notes:

  • requires prices and sizes > 0, and offer price > bid price

  • spread <= $5

  • cancel correction != 'B'

  • condition in ['A', 'B', 'H', 'O', 'R', 'W']

  • keep largest sequence number if same time stamp

  • drop duplicated records
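
The quote filters listed above can be sketched on plain dictionaries. This is a minimal illustration, not the function's implementation: the keys 'Time', 'Sequence', 'Cancel_Correction' and 'Condition' and the helper names are assumed for the example, and applying the validity filter before the same-timestamp rule is also an assumption.

```python
from typing import Dict, List

VALID_CONDITIONS = {"A", "B", "H", "O", "R", "W"}
MAX_SPREAD = 5.0  # dollars


def is_good_quote(q: Dict) -> bool:
    """Apply the price, size, spread, correction and condition filters."""
    bid, offer = q["Best_Bid_Price"], q["Best_Offer_Price"]
    return (bid > 0 and offer > 0
            and q["Best_Bid_Size"] > 0 and q["Best_Offer_Size"] > 0
            and offer > bid
            and offer - bid <= MAX_SPREAD
            and q["Cancel_Correction"] != "B"
            and q["Condition"] in VALID_CONDITIONS)


def clean_quotes(quotes: List[Dict]) -> List[Dict]:
    """Keep good quotes; per timestamp keep the largest sequence number."""
    latest: Dict = {}
    for q in filter(is_good_quote, quotes):
        prev = latest.get(q["Time"])
        if prev is None or q["Sequence"] > prev["Sequence"]:
            latest[q["Time"]] = q
    return [latest[t] for t in sorted(latest)]


def make_quote(time, seq, bid, offer, cancel="A", cond="R") -> Dict:
    """Build a hypothetical quote record for the example."""
    return {"Time": time, "Sequence": seq,
            "Best_Bid_Price": bid, "Best_Bid_Size": 1,
            "Best_Offer_Price": offer, "Best_Offer_Size": 1,
            "Cancel_Correction": cancel, "Condition": cond}


QUOTES = [make_quote(1, 1, 10.00, 10.02),
          make_quote(1, 2, 10.01, 10.02),   # same time, later sequence
          make_quote(2, 3, 10.05, 10.02),   # crossed: offer below bid
          make_quote(3, 4, 10.00, 16.00),   # spread above $5
          make_quote(4, 5, 10.00, 10.02, cancel="B")]
CLEANED = clean_quotes(QUOTES)
```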

finds.readers.taq.clean_trade(df: DataFrame | None, open_t: Timestamp = Timestamp('1900-01-01 09:30:00'), close_t: Timestamp = Timestamp('1900-01-01 16:00:00'), cond: str = 'MOZBTLGWJK145789') → DataFrame | None

Remove bad trades

Parameters:
  • df – Dataframe containing one day’s trades of a stock

  • open_t – Exclude records on or before this opening time

  • close_t – Exclude records after this closing time

  • cond – condition chars to exclude

Notes:

  • Requires correction code = 0, price and volume > 0

  • Sale Conditions to exclude by default:

    • M = Market Center Close Price

    • O = Market Center Opening Trade

    • Z = Sold (Out of Sequence)

    • L = Sold Last (Late Reporting)

    • B = Bunched Trade

    • G = Bunched Sold Trade

    • W = Average Price Trade

    • 4 = Derivatively Priced

    • 5 = Re-opening Prints

    • 7 = Qualified Contingent Trade

    • 8 = Placeholder for 611 Exempt

    • 9 = Corrected Consolidated Close Price per the Listing Market

    • K = Rule 127 (NYSE only) or Rule 155 Trade (NYSE MKT only)

    • T = Extended Hours Trade
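
The trade filters can be sketched the same way. The dictionary keys and helper name below are hypothetical, times are assumed to be seconds since midnight, and the sale condition is assumed to be a short string in which each character is one condition code; a trade is dropped if any of its characters appears in cond.

```python
from typing import Dict, List

EXCLUDE = "MOZBTLGWJK145789"  # default cond from the signature above
OPEN_T = 9 * 3600 + 30 * 60   # 09:30:00 in seconds since midnight
CLOSE_T = 16 * 3600           # 16:00:00 in seconds since midnight


def is_good_trade(t: Dict, open_t: int = OPEN_T, close_t: int = CLOSE_T,
                  cond: str = EXCLUDE) -> bool:
    """Apply the correction, price, volume, time and condition filters."""
    return (t["correction"] == 0
            and t["price"] > 0 and t["volume"] > 0
            and open_t < t["time"] <= close_t        # after open, up to close
            and not set(t["sale_condition"]) & set(cond))


# Hypothetical sample trades
TRADES: List[Dict] = [
    {"time": OPEN_T + 1, "price": 10.0, "volume": 100,
     "correction": 0, "sale_condition": "@ F"},   # kept
    {"time": OPEN_T, "price": 10.0, "volume": 100,
     "correction": 0, "sale_condition": "@"},     # on the open: dropped
    {"time": OPEN_T + 2, "price": 10.0, "volume": 100,
     "correction": 0, "sale_condition": "@T"},    # extended hours: dropped
    {"time": OPEN_T + 3, "price": 10.0, "volume": 100,
     "correction": 1, "sale_condition": "@"},     # corrected: dropped
]
GOOD = [t for t in TRADES if is_good_trade(t)]
```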

finds.readers.taq.itertaq(trades: TAQ, quotes: TAQ, master: DataFrame, open_t: Timestamp = Timestamp('1900-01-01 09:30:00'), close_t: Timestamp = 0, cusips: List[str] = [], symbols: List[str] = [], verbose=1, has_shares: bool = True)

Iterates over and filters daily taq trades and quotes by symbol

Parameters:
  • trades – Instance of TAQ trades object

  • quotes – Instance of TAQ nbbo quotes object

  • master – Reference table from Master file, indexed by symbol

  • cusips – List of cusips to select

  • symbols – List of symbols (space separated security class) to select.

  • open_t – Earliest Timestamp of valid trades and quotes

  • close_t – Latest Timestamp to keep trades and quotes, inclusive

  • verbose – Whether to echo messages for debugging

  • has_shares – If True, require 'Shares_Outstanding' > 0 in master table

finds.readers.taq.opentaq(date, taqdir: str)

Helper to initialize the master dataframe and the trade and quote objects

finds.readers.taq.plot_taq(left1: DataFrame, right1: DataFrame | None = None, left2: DataFrame | None = None, right2: DataFrame | None = None, num: int | None = None, title: str = '', open_t: Timestamp | None = None, close_t: Timestamp | None = None)

Convenience method for 1x2 primary/secondary-y subplots of tick data

finds.readers.taq.taq_from_csv(chunk: str, columns: List[str] = []) → DataFrame

Convert csv from TAQ to dataframe with correct dtypes

Parameters:
  • chunk – A chunk of csv text

  • columns – List of column names, else in first line of chunk

Returns:

DataFrame with correct dtypes and column names

Notes:

  • column names (provided or parsed from first line) indicate the corresponding known list of dtypes for nbbo, trade or mast