finds.readers.taq
Class and methods to process TAQ trade and quotes tick data
NYSE Daily TAQ: Master, NBBO, Trades
market microstructure: bid-ask spreads, trade conditions, tick test
Copyright 2022, Terence Lim
MIT License
- class finds.readers.taq.TAQ(taq_file: str, index_file: str = '', symbols_file: str = '')
Bases: object
Base class to manipulate a daily TAQ .csv.gz file
- Parameters:
taq_file – raw .csv.gz input data file name
index_file – name of new (csv.gz) file to write indexed-gzip index
symbols_file – name of new (csv.gz) file to write symbols index
Notes:
NYSE historical samples: ftp://ftp.nyxdata.com/Historical%20Data%20Samples/
Uses indexed_gzip package for random access into gzip file
Implements 3 methods to access raw daily TAQ csv.gz files, e.g.:
trade(n) - next n csv lines
iter(trade) - iterable, by chunk with same stock symbol
trade['AAPL'] - getitem, by symbol
- __call__(symbol: str) → Series | None
Return symbol location and size in daily taq gzip file
- __getitem__(symbol: str) → DataFrame
Get chunk of all rows for the input symbol as a data frame
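The symbol-index idea can be sketched in a few lines. This is a minimal illustration, not the class's actual implementation: it uses an uncompressed in-memory stream in place of an indexed gzip file, and it assumes pipe-delimited rows with the symbol in the first column and all rows of a symbol stored contiguously. The names `build_symbol_index` and `read_symbol` are hypothetical.

```python
import io

def build_symbol_index(stream, sep=b"|"):
    """Map each symbol to (byte offset, byte length) of its contiguous rows."""
    index = {}
    stream.seek(0)
    stream.readline()          # skip the header line
    offset = stream.tell()     # byte position of the first data row
    for line in iter(stream.readline, b""):
        symbol = line.split(sep, 1)[0].decode()
        start, size = index.get(symbol, (offset, 0))
        index[symbol] = (start, size + len(line))
        offset += len(line)
    return index

def read_symbol(stream, index, symbol):
    """Seek directly to a symbol's rows and return them as decoded lines."""
    start, size = index[symbol]
    stream.seek(start)
    return stream.read(size).decode().splitlines()

# Toy data (assumed layout, not the real TAQ column set)
raw = (b"Symbol|Time|Price\n"
       b"AAPL|093001|150.0\n"
       b"AAPL|093002|150.1\n"
       b"IBM|093001|130.0\n")
stream = io.BytesIO(raw)
index = build_symbol_index(stream)
aapl_rows = read_symbol(stream, index, "AAPL")
```

With a seekable gzip reader, the same offsets would let a lookup by symbol jump straight to the relevant rows instead of decompressing the whole file.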
- finds.readers.taq.align_trades(ct: DataFrame, cq: DataFrame, open_t: Timestamp = Timestamp('1900-01-01 09:30:00'), inplace: bool = False) → DataFrame | None
Align each trade with prevailing and forward quotes
- Parameters:
ct – Input dataframe of trades
cq – Input dataframe of nbbo quotes
open_t – drop quotes prior to open time
inplace – whether to overwrite trades dataframe or return as new copy
- Returns:
DataFrame of trades with additional columns if not inplace, else None
Notes:
The prevailing quote is taken 1 ns before each trade and the forward quote 5 minutes after it; quotes before open_t are dropped
See Holden and Jacobsen (2014), “Liquidity Measurement”
Prevailing_Mid: midquote prevailing before each trade
Forward_Mid: midquote prevailing 5 minutes after trade
Tick_Test: Whether trade price above, below or equals previous trade
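The alignment described in these notes can be approximated with `pandas.merge_asof`. The sketch below is an assumption about how such a join could be done, not the function's actual code: the column names `Time`, `Price` and `Mid` are made up for the example, and excluding exact timestamp matches stands in for the "-1ns" rule.

```python
import numpy as np
import pandas as pd

def times(*hms):
    """Nanosecond timestamps on the placeholder date used in the signature."""
    stamps = pd.to_datetime(["1900-01-01 " + s for s in hms])
    return stamps.astype("datetime64[ns]")

# Toy inputs; the column names Time, Price and Mid are assumptions
quotes = pd.DataFrame({"Time": times("09:30:00", "09:31:00", "09:36:30"),
                       "Mid": [100.0, 100.5, 101.0]})
trades = pd.DataFrame({"Time": times("09:30:30", "09:31:00", "09:32:00"),
                       "Price": [100.1, 100.4, 100.4]})

# Prevailing quote: last quote strictly before the trade time
prevailing = pd.merge_asof(trades, quotes, on="Time",
                           allow_exact_matches=False)

# Forward quote: last quote at or before trade time + 5 minutes
shifted = pd.DataFrame({"Time": trades["Time"] + pd.Timedelta(minutes=5)})
forward = pd.merge_asof(shifted, quotes, on="Time")

aligned = trades.copy()
aligned["Prevailing_Mid"] = prevailing["Mid"].to_numpy()
aligned["Forward_Mid"] = forward["Mid"].to_numpy()
# Tick test: +1 uptick, -1 downtick, 0 unchanged (or first trade)
aligned["Tick_Test"] = np.sign(trades["Price"].diff()).fillna(0).astype(int)
```

Note that the second trade shares its timestamp with a quote update, so it is matched to the earlier quote rather than the simultaneous one.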
- finds.readers.taq.bin_quotes(cq: DataFrame, value: int = 15, unit: str = 'm', open_t: Timestamp = Timestamp('1900-01-01 09:30:00'), close_t: Timestamp = Timestamp('1900-01-01 16:00:00')) → DataFrame
Resample quotes into time interval bins
- Parameters:
cq – Input dataframe of nbbo quote
value – number of time units per bin width
unit – time unit in {'h', 'm', 's', 'ms', 'us', 'ns'}
open_t – exclusive left bound of first bin
close_t – inclusive right bound of last bin
- Returns:
DataFrame of resampled derived quote liquidity metrics
Notes:
quoted: time-weighted quoted half-spread
depth: time-weighted average of average bid and offer sizes
offersize: time-weighted average offer size
bidsize: time-weighted average bid size
mid: last midquote
firstmid: first midquote
maxmid: max midquote
minmid: min midquote
retq: last-to-last midquote return
Examples:
>>> resample(closed='left', label='right')
>>> agg(Series.sum, min_count=1)
>>> result[result.index > open_t], result[result.index <= close_t]
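To illustrate what "time-weighted" means for the quote metrics above, here is a minimal sketch for a single bin. It assumes each quote stays in force until the next quote (or the end of the bin) and weights each value by that duration in seconds. The helper `time_weighted_mean` is hypothetical and is not taken from the module.

```python
import pandas as pd

def time_weighted_mean(values: pd.Series, end: pd.Timestamp) -> float:
    """Average a step function, weighting each value by its time in force."""
    stamps = list(values.index) + [end]
    # Seconds each quote was in force: until the next quote, or until `end`
    seconds = [(b - a).total_seconds() for a, b in zip(stamps[:-1], stamps[1:])]
    weighted = sum(v * s for v, s in zip(values.to_numpy(), seconds))
    return float(weighted / sum(seconds))

# Toy half-spreads: 0.02 in force for 10 minutes, then 0.04 for 5 minutes
index = pd.to_datetime(["1900-01-01 09:30:00", "1900-01-01 09:40:00"])
half_spread = pd.Series([0.02, 0.04], index=index)
quoted = time_weighted_mean(half_spread, pd.Timestamp("1900-01-01 09:45:00"))
```

The wider spread lasts half as long as the narrower one, so the result sits closer to 0.02 than a simple average would.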
- finds.readers.taq.bin_trades(ct: DataFrame, value: int = 5, unit: str = 'm', open_t: Timestamp = Timestamp('1900-01-01 09:30:00'), close_t: Timestamp = Timestamp('1900-01-01 16:00:00')) → DataFrame
Resample trades into time interval bins
- Parameters:
ct – Input dataframe of trades
value – number of time units per bin width
unit – time unit in {'h', 'm', 's', 'ms', 'us', 'ns'}
open_t – exclusive left bound of first bin
close_t – inclusive right bound of last bin
- Returns:
DataFrame of resampled derived trade liquidity metrics
Notes:
counts: number of trades in bin
last: last trade price in bin (ffill if none)
first: first trade price in bin
maxtrade: max trade price in bin
mintrade: min trade price in bin
ret: last-to-last trade price return
vwap: volume weighted average trade price
effective: volume-weighted effective relative half-spread
(trade price divided by prevailing midquote, minus 1)
realized: volume-weighted realized relative half-spread
(trade price divided by 5-minute forward midquote, minus 1)
impact: volume-weighted relative price impact
(5-minute forward midquote divided by prevailing midquote, minus 1)
Examples:
>>> resample(closed='left', label='right')
>>> agg(Series.sum, min_count=1)
>>> result[result.index > open_t], result[result.index <= close_t]
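A rough sketch of the price-based columns, using the resampling convention from the example above. The input column names `Price` and `Volume` and the helper `bin_sketch` are assumptions for illustration; the spread metrics are omitted because they also need the aligned midquotes.

```python
import pandas as pd

stamps = pd.to_datetime(["1900-01-01 09:30:10",
                         "1900-01-01 09:32:00",
                         "1900-01-01 09:36:00"])
# Toy trades; Price and Volume are assumed column names
trades = pd.DataFrame({"Price": [10.0, 11.0, 12.0],
                       "Volume": [100, 300, 200]}, index=stamps)

def bin_sketch(ct: pd.DataFrame, rule: str = "5min") -> pd.DataFrame:
    """Resample trades into bins labeled by their right edge."""
    kwargs = dict(closed="left", label="right")
    bins = ct.resample(rule, **kwargs)
    notional = (ct["Price"] * ct["Volume"]).resample(rule, **kwargs)
    out = pd.DataFrame({
        "counts": bins["Price"].count(),
        "first": bins["Price"].first(),
        "last": bins["Price"].last(),
        # min_count=1 keeps empty bins as NaN instead of 0
        "vwap": notional.sum(min_count=1) / bins["Volume"].sum(min_count=1),
    })
    out["last"] = out["last"].ffill()                # carry last price forward
    out["ret"] = out["last"] / out["last"].shift(1) - 1
    return out

binned = bin_sketch(trades)
```

Here the first bin is labeled 09:35 and holds two trades, so its vwap is (10 × 100 + 11 × 300) / 400 = 10.75.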
- finds.readers.taq.clean_nbbo(df: DataFrame | None, keep: List[str] = ['Best_Bid_Price', 'Best_Bid_Size', 'Best_Offer_Price', 'Best_Offer_Size']) → DataFrame | None
Remove bad quotes
- Parameters:
df – Dataframe containing one day’s nbbo quotes of a stock
keep – List of columns to keep
Notes:
requires prices and sizes > 0 and offer price > bid price
spread <= $5
cancel correction != 'B'
condition in ['A', 'B', 'H', 'O', 'R', 'W']
keep largest sequence number if same time stamp
drop duplicated records
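The price, size and spread rules can be expressed as a boolean mask. This is a sketch under assumptions, not the function's source: it uses only the four column names from the `keep` default, and it leaves out the condition, cancel-correction and sequence-number rules because their column names are not given here. `basic_quote_filter` is a hypothetical name.

```python
import pandas as pd

# Toy quotes: row 1 duplicates row 0, row 2 has a zero bid,
# row 3 is crossed (bid > offer), row 4 has a spread above $5
quotes = pd.DataFrame({
    "Best_Bid_Price":   [10.00, 10.00, 0.00, 10.05, 10.00],
    "Best_Bid_Size":    [1, 1, 2, 3, 1],
    "Best_Offer_Price": [10.02, 10.02, 10.02, 10.01, 16.00],
    "Best_Offer_Size":  [2, 2, 1, 1, 1],
})

def basic_quote_filter(df: pd.DataFrame, max_spread: float = 5.0) -> pd.DataFrame:
    """Keep quotes with positive prices and sizes and a sane, uncrossed spread."""
    bid, offer = df["Best_Bid_Price"], df["Best_Offer_Price"]
    ok = (
        (bid > 0) & (offer > 0)
        & (df["Best_Bid_Size"] > 0) & (df["Best_Offer_Size"] > 0)
        & (offer > bid)
        & (offer - bid <= max_spread)
    )
    return df[ok].drop_duplicates()

cleaned = basic_quote_filter(quotes)
```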
- finds.readers.taq.clean_trade(df: DataFrame | None, open_t: Timestamp = Timestamp('1900-01-01 09:30:00'), close_t: Timestamp = Timestamp('1900-01-01 16:00:00'), cond: str = 'MOZBTLGWJK145789') → DataFrame | None
Remove bad trades
- Parameters:
df – Dataframe containing one day’s trades of a stock
open_t – Exclude records on or before this opening time
close_t – Exclude records after this closing time
cond – condition chars to exclude
Notes:
Requires correction code = 0, price and volume > 0
Sale Conditions to exclude by default:
M = Market Center Close Price
O = Market Center Opening Trade
Z = Sold (Out of Sequence)
L = Sold Last (Late Reporting)
B = Bunched Trade
G = Bunched Sold Trade
W = Average Price Trade
4 = Derivatively Priced
5 = Re-opening Prints
7 = Qualified Contingent Trade
8 = Placeholder for 611 Exempt
9 = Corrected Consolidated Close Price per the Listing Market
K = Rule 127 (NYSE only) or Rule 155 Trade (NYSE MKT only)
T = Extended Hours Trade
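The trade filter above might look roughly like the following. This is a sketch only: the column names `Correction`, `Sale_Condition`, `Price` and `Volume` are assumptions, as is the reading that a trade is excluded when any character of its sale-condition string appears in `cond`. `basic_trade_filter` is a hypothetical name.

```python
import pandas as pd

COND = "MOZBTLGWJK145789"   # default taken from the signature above

def basic_trade_filter(df, open_t, close_t, cond=COND):
    """Keep uncorrected, positive trades in (open_t, close_t] with no excluded condition."""
    flagged = df["Sale_Condition"].apply(lambda s: any(c in cond for c in s))
    ok = (
        (df["Correction"] == 0)
        & (df["Price"] > 0) & (df["Volume"] > 0)
        & ~flagged
        & (df.index > open_t) & (df.index <= close_t)
    )
    return df[ok]

stamps = pd.to_datetime(["1900-01-01 " + s for s in
                         ["09:30:00", "09:31:00", "09:32:00",
                          "09:33:00", "16:00:00", "16:00:01"]])
# Toy trades; all column names are assumptions
trades = pd.DataFrame({
    "Correction":     [0, 0, 0, 1, 0, 0],
    "Sale_Condition": ["", "F", "FT", "", "", ""],
    "Price":          [10.0, 10.1, 10.2, 10.3, 10.4, 10.5],
    "Volume":         [100, 100, 100, 100, 100, 100],
}, index=stamps)

kept = basic_trade_filter(trades,
                          pd.Timestamp("1900-01-01 09:30:00"),
                          pd.Timestamp("1900-01-01 16:00:00"))
```

In this toy data the trade stamped exactly at the open is dropped, while the one stamped exactly at the close is kept, matching the parameter descriptions.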
- finds.readers.taq.itertaq(trades: TAQ, quotes: TAQ, master: DataFrame, open_t: Timestamp = Timestamp('1900-01-01 09:30:00'), close_t: Timestamp = 0, cusips: List[str] = [], symbols: List[str] = [], verbose=1, has_shares: bool = True)
Iterates over and filters daily taq trades and quotes by symbol
- Parameters:
trades – Instance of TAQ trades object
quotes – Instance of TAQ nbbo quotes object
master – Reference table from Master file, indexed by symbol
cusips – List of cusips to select
symbols – List of symbols to select (any security-class suffix is separated from the root by a space)
open_t – Earliest Timestamp of valid trades and quotes
close_t – Latest Timestamp to keep trades and quotes, inclusive
verbose – Whether to echo messages for debugging
has_shares – If True, require ‘Shares_Outstanding’ > 0 in master table
- finds.readers.taq.opentaq(date, taqdir: str)
Helper to initialize all master dataframe, trade and quote objects
- finds.readers.taq.plot_taq(left1: DataFrame, right1: DataFrame | None = None, left2: DataFrame | None = None, right2: DataFrame | None = None, num: int | None = None, title: str = '', open_t: Timestamp | None = None, close_t: Timestamp | None = None)
Convenience method for 1x2 primary/secondary-y subplots of tick data
- finds.readers.taq.taq_from_csv(chunk: str, columns: List[str] = []) → DataFrame
Convert csv from TAQ to dataframe with correct dtypes
- Parameters:
chunk – A chunk of csv text
columns – List of column names, else in first line of chunk
- Returns:
DataFrame with correct dtypes and column names
Notes:
The column names (provided, or parsed from the first line) determine which known list of dtypes applies: nbbo, trade or master
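A minimal sketch of the conversion, assuming a pipe delimiter and a made-up dtype table. The real function selects among its own dtype lists for nbbo, trade and master files; the `DTYPES` mapping and the helper `csv_chunk_to_frame` below are hypothetical.

```python
import io
import pandas as pd

# Hypothetical dtype table; the real column sets are not shown here
DTYPES = {"Symbol": str, "Price": float, "Volume": int}

def csv_chunk_to_frame(chunk: str, columns=None, sep: str = "|") -> pd.DataFrame:
    """Parse a chunk of delimited text, then cast known columns."""
    header = None if columns else 0      # no names given: first line is the header
    df = pd.read_csv(io.StringIO(chunk), sep=sep, names=columns or None,
                     header=header, dtype=str)
    known = {col: typ for col, typ in DTYPES.items() if col in df.columns}
    return df.astype(known)

with_header = csv_chunk_to_frame("Symbol|Price|Volume\nAAPL|150.5|100\n")
without_header = csv_chunk_to_frame("IBM|130.0|50\n",
                                    columns=["Symbol", "Price", "Volume"])
```

Reading everything as text first and casting afterwards keeps symbol-like fields from being misread as numbers.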