finds.structured.structured

Base class for structured datasets

  • CRSP (daily, monthly, names, delistings, distributions, shares outstanding)

  • S&P/CapitalIQ Compustat (Annual, Quarterly, Key Development, customers)

  • IBES Summary

Notes:

  • Optionally cache SQL query results to Redis store

Copyright 2022-2024, Terence Lim

MIT License

class finds.structured.structured.Lookup(sql: SQL, source: str, target: str, date_field: str, table: str, fillna: Any)[source]

Bases: object

Loads dated identifier mappings into memory, for lookup by date

Parameters:
  • sql – SQL connection instance

  • source – Name of source identifier key

  • target – Name of target identifier key to return

  • date_field – Name of date field in database table

  • table – Physical SQL table name containing identifier mappings

  • fillna – Value to return if not found

__call__(label: List | str, date: int = 99999999) → Any[source]

Return target identifiers matched to source as of date

__getitem__(labels)[source]
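The as-of lookup semantics can be sketched in plain pandas (the `lookup_asof` helper, column names, and permno values below are illustrative, not the class internals): keep all dated (source, target) rows and return the target whose mapping date is the latest on or before the query date, or `fillna` when nothing matches.

```python
import pandas as pd

# Illustrative dated mappings from a source key to a target key
mappings = pd.DataFrame({'source': ['AAPL', 'AAPL', 'MSFT'],
                         'target': [14593, 14594, 10107],
                         'date': [19801212, 20050101, 19860313]})

def lookup_asof(mappings, label, date=99999999, fillna=None):
    """Return target mapped to label as of date, or fillna if no match"""
    rows = mappings[(mappings['source'] == label) & (mappings['date'] <= date)]
    if rows.empty:
        return fillna
    return rows.sort_values('date')['target'].iloc[-1]

print(lookup_asof(mappings, 'AAPL', date=19990101))   # 14593: mapping in effect in 1999
print(lookup_asof(mappings, 'IBM', fillna=0))         # 0: no mapping found
```

The default `date=99999999` therefore returns the most recent mapping on record.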
class finds.structured.structured.Structured(sql: SQL, bd: BusDay, tables: Dict[str, Table], identifier: str, name: str, rdb: RedisDB | None = None, verbose: int = 0)[source]

Bases: object

Base class for interface to structured datasets, stored in SQL

Parameters:
  • sql – Connection instance to MySQL database

  • bd – Custom business calendar instance

  • tables – SQLAlchemy Table objects and names defined for this dataset group

  • identifier – Name of field of unique identifier key

  • name – Display name for this dataset group

  • rdb – Connector to Redis cache store, if desired

Variables:
  • identifier – Field name of identifier key used by this dataset group

  • name – Display name for this dataset group

__getitem__(dataset: str) → Table[source]

Return the table object corresponding to a dataset name

build_lookup(source: str, target: str, date_field: str, dataset: str, fillna: Any) → Any[source]

Helper to build lookup of target from source identifiers

Parameters:
  • source – Name of source identifier key

  • target – Name of target identifier key to return

  • date_field – Name of date field in database table

  • dataset – Internal name of table containing identifier mappings

  • fillna – Value to return if not found

drop_all()[source]

Drop all associated tables from SQL database

get_linked(dataset: str, fields: List[str], date_field: str, link_perm: str, link_date: str, where: str = '', limit: int | str | None = None) → DataFrame[source]

Query a dataset, joining the ‘links’ table, to return data with permnos

Parameters:
  • dataset – Name of internal Table to query data from

  • fields – Data fields to retrieve

  • date_field – Name of date field in data table

  • link_date – Name of link date field in ‘links’ table

  • link_perm – Name of permno field in ‘links’ table

  • where – Where clause (optional)

  • limit – Maximum rows to return (optional)

Returns:

DataFrame containing result of query
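The as-of join that get_linked performs can be sketched with pandas.merge_asof: each data row is matched to the link whose link date is the latest on or before the row's date. The gvkey/linkdt/lpermno names and all values below are hypothetical stand-ins for the actual data and ‘links’ table columns.

```python
import pandas as pd

# Hypothetical dataset keyed by gvkey, and a 'links' table mapping gvkey -> permno
data = pd.DataFrame({'gvkey': [1001, 2002, 1001],
                     'datadate': [20200131, 20200131, 20210131],
                     'sale': [10.0, 7.0, 12.0]})
links = pd.DataFrame({'gvkey': [1001, 2002, 1001],
                      'linkdt': [19900101, 19950101, 20201001],
                      'lpermno': [11111, 33333, 22222]})

# Match each data row to the link whose linkdt is latest on or before datadate
merged = pd.merge_asof(data.sort_values('datadate'),
                       links.sort_values('linkdt'),
                       left_on='datadate', right_on='linkdt',
                       by='gvkey', direction='backward')
merged['lpermno'] = merged['lpermno'].fillna(0).astype(int)  # missing links set to 0
```

Note that the second 1001 row picks up the newer 20201001 link, while the earlier rows keep the links that prevailed at their dates.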

get_permnos(keys: List[str], date: int, link_perm: str, link_date: str, permno: str) → DataFrame[source]

Returns matching permnos as of a prevailing date from ‘links’ table

Parameters:
  • keys – Input list of identifiers to lookup

  • date – Prevailing date of link

  • link_perm – Name of permno field in ‘links’ table

  • link_date – Name of link date field in ‘links’ table

  • permno – Name of field to output permnos to

Returns:

DataFrame of linked permnos, as of prevailing date; missing links are set to 0
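A minimal sketch of these semantics, assuming a ‘links’ table with hypothetical ticker/linkdt/lpermno columns: for each input key, take the permno from the latest link on or before the prevailing date, with unmatched keys set to 0.

```python
import pandas as pd

def permnos_asof(keys, date, links, source='ticker',
                 link_date='linkdt', link_perm='lpermno'):
    """For each key, permno from the latest link on/before date; missing -> 0"""
    valid = links[links[link_date] <= date].sort_values(link_date)
    best = valid.groupby(source)[link_perm].last()   # latest prevailing link per key
    return pd.DataFrame({source: keys,
                         'permno': [int(best.get(k, 0)) for k in keys]})

links = pd.DataFrame({'ticker': ['AAPL', 'AAPL', 'MSFT'],
                      'linkdt': [19801212, 20050101, 19860313],
                      'lpermno': [14593, 14594, 10107]})
out = permnos_asof(['AAPL', 'IBM'], date=19990101, links=links)
```

Here 'AAPL' resolves through its 19801212 link (the 20050101 link is not yet in effect), and 'IBM' has no link, so it maps to 0.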

load_csv(dataset: str, csvfile: str, drop: Dict[str, List[Any]] = {}, keep: Dict[str, List[Any]] = {}, replace: Dict[str, Tuple[Any, Any]] = {}, sep: str = ',', encoding: str = 'latin-1', header: Any = 0, low_memory: bool = False, na_filter: bool = False, **kwargs) → DataFrame[source]

Insert ignore into SQL table from csvfile, and return as DataFrame

Parameters:
  • dataset – dataset name

  • csvfile – csv file name

  • keep – {column: values} keep rows whose columns have any of values

  • drop – {column: values} drop rows with any given value in column

  • replace – {column: (old, new)} specifies values to replace in column

  • sep – args for pd.read_csv

  • encoding – args for pd.read_csv

  • header – args for pd.read_csv

  • low_memory – args for pd.read_csv

  • na_filter – args for pd.read_csv

Returns:

DataFrame containing loaded data

Notes:

  • Creates the table, if it does not exist, using the associated schema

  • Records with duplicate keys are dropped (INSERT IGNORE is used)
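The keep/drop/replace row filters can be sketched as plain pandas operations (the `apply_filters` helper and the CRSP-style column values below are illustrative, not the actual load_csv internals):

```python
import pandas as pd

def apply_filters(df, drop=None, keep=None, replace=None):
    """Apply load_csv-style row filters before inserting into SQL"""
    for col, values in (keep or {}).items():
        df = df[df[col].isin(values)]          # keep rows with any of these values
    for col, values in (drop or {}).items():
        df = df[~df[col].isin(values)]         # drop rows with any of these values
    df = df.copy()
    for col, (old, new) in (replace or {}).items():
        df[col] = df[col].replace(old, new)    # substitute old value with new
    return df

df = pd.DataFrame({'shrcd': [10, 11, 73], 'exchcd': [1, 2, 3],
                   'ret': ['C', 0.01, 0.02]})
out = apply_filters(df, keep={'shrcd': [10, 11]}, replace={'ret': ('C', 0.0)})
```

In this sketch, only the share-code 10 and 11 rows survive, and the non-numeric 'C' return code is replaced by 0.0.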

load_dataframe(df: DataFrame, table: Table, to_replace: Any = None, value: Any = None, overwrite: bool = True) → int[source]

Load dataframe to SQLAlchemy table object using associated schema

Parameters:
  • df – DataFrame to load from

  • table – Destination Table object

  • to_replace – Original value or list of values to replace

  • value – Value to replace with

  • overwrite – Whether to overwrite or append

Returns:

Number of rows loaded

Notes:

  • DataFrame should contain the same column names as the Table

  • DataFrame column types are converted to the Table column types

  • Duplicate primary fields are removed, keeping first
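The notes above can be sketched as a pandas preprocessing step (the `prepare_rows` helper and the schema below are hypothetical): select the Table's columns, convert their types, and drop duplicate primary keys keeping the first.

```python
import pandas as pd

def prepare_rows(df, columns, dtypes, primary_keys):
    """Align DataFrame to table columns/types; drop duplicate keys, keep first"""
    out = df[columns].astype(dtypes)                        # convert column types
    return out.drop_duplicates(subset=primary_keys, keep='first')

df = pd.DataFrame({'permno': ['14593', '14593', '10107'],   # strings from a raw file
                   'date': [20210101, 20210101, 20210101],
                   'ret': ['0.01', '0.02', '0.03'],
                   'extra': [1, 2, 3]})                     # column not in the table
out = prepare_rows(df, columns=['permno', 'date', 'ret'],
                   dtypes={'permno': int, 'date': int, 'ret': float},
                   primary_keys=['permno', 'date'])
```

The second (14593, 20210101) row is discarded as a duplicate primary key, and the 'extra' column never reaches the table.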

read_dataframe(table: str, where: str = '') → DataFrame[source]

Read signal values from SQL and return as a DataFrame

Parameters:
  • table – Table to read from

  • where – Where clause str for sql select

Returns:

DataFrame of query results
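A minimal sketch of the read semantics, using an in-memory SQLite table in place of the MySQL connection (the 'signals' table and its columns are made up for illustration):

```python
import sqlite3
import pandas as pd

# Stand-in for the MySQL connection and a pre-populated table
conn = sqlite3.connect(':memory:')
pd.DataFrame({'permno': [14593, 10107, 11111],
              'mom': [0.05, -0.02, 0.03]}).to_sql('signals', conn, index=False)

def read_dataframe(table: str, where: str = '') -> pd.DataFrame:
    """Select all rows from table, optionally filtered by a where clause"""
    query = f"SELECT * FROM {table}" + (f" WHERE {where}" if where else "")
    return pd.read_sql(query, conn)

df = read_dataframe('signals', where='mom > 0')
```

With the where clause above, only the two rows with positive momentum are returned; omitting it returns the full table.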