finds.structured.structured

Base class for structured datasets

  • CRSP (daily, monthly, names, delistings, distributions, shares outstanding)

  • S&P/CapitalIQ Compustat (Annual, Quarterly, Key Development, customers)

  • IBES Summary

Notes:

  • Optionally cache SQL query results to Redis store

Copyright 2022-2024, Terence Lim

MIT License

class finds.structured.structured.Lookup(sql: SQL, source: str, target: str, date_field: str, table: str, fillna: Any)[source]

Bases: object

Loads dated identifier mappings into memory, for lookup by date

Parameters:
  • sql – SQL connection instance

  • source – Name of source identifier key

  • target – Name of target identifier key to return

  • date_field – Name of date field in database table

  • table – Physical SQL table name containing identifier mappings

  • fillna – Value to return if not found

__call__(label: List | str, date: int = 99999999) → Any[source]

Return target identifiers matched to source as of date

__getitem__(labels)[source]
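The as-of lookup semantics can be sketched in plain pandas (the `lookup_asof` helper, column names, and permno values below are illustrative, not the class internals): keep all dated (source, target) rows and return the target whose mapping date is the latest on or before the query date, or `fillna` when nothing matches.

```python
import pandas as pd

# Illustrative dated mappings from a source key to a target key
mappings = pd.DataFrame({'source': ['AAPL', 'AAPL', 'MSFT'],
                         'target': [14593, 14594, 10107],
                         'date': [19801212, 20050101, 19860313]})

def lookup_asof(mappings, label, date=99999999, fillna=None):
    """Return target mapped to label as of date, or fillna if no match"""
    rows = mappings[(mappings['source'] == label) & (mappings['date'] <= date)]
    if rows.empty:
        return fillna
    return rows.sort_values('date')['target'].iloc[-1]

print(lookup_asof(mappings, 'AAPL', date=19990101))   # 14593: mapping in effect in 1999
print(lookup_asof(mappings, 'IBM', fillna=0))         # 0: no mapping found
```

The default `date=99999999` therefore returns the most recent mapping on record.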
class finds.structured.structured.Structured(sql: SQL, bd: BusDay, tables: Dict[str, Table], identifier: str, name: str, rdb: RedisDB | None = None, verbose: int = 0)[source]

Bases: object

Base class for interface to structured datasets, stored in SQL

Parameters:
  • sql – Connection instance to MySQL database

  • bd – Custom business calendar instance

  • tables – SQLAlchemy Table objects and names defined for this dataset group

  • identifier – Name of field of unique identifier key

  • name – Display name for this dataset group

  • rdb – Connector to Redis cache store, if desired

Variables:
  • identifier – Field name of identifier key used by this dataset group

  • name – Display name for this dataset group

__getitem__(dataset: str) → Table[source]

Return the table object corresponding to a dataset name

build_lookup(source: str, target: str, date_field: str, dataset: str, fillna: Any) → Any[source]

Helper to build lookup of target from source identifiers

Parameters:
  • source – Name of source identifier key

  • target – Name of target identifier key to return

  • date_field – Name of date field in database table

  • dataset – Internal name of table containing identifier mappings

  • fillna – Value to return if not found

drop_all()[source]

Drop all associated tables from SQL database

get_linked(dataset: str, fields: List[str], date_field: str, link_perm: str, link_date: str, where: str = '', limit: int | str | None = None) → DataFrame[source]

Query a dataset, joining the ‘links’ table, to return data with permnos

Parameters:
  • dataset – Name of internal Table to query data from

  • fields – Data fields to retrieve

  • date_field – Name of date field in data table

  • link_date – Name of link date field in ‘links’ table

  • link_perm – Name of permno field in ‘links’ table

  • where – Where clause (optional)

  • limit – Maximum rows to return (optional)

Returns:

DataFrame containing result of query
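The as-of join that get_linked performs can be sketched with pandas.merge_asof: each data row is matched to the link whose link date is the latest on or before the row's date. The gvkey/linkdt/lpermno names and all values below are hypothetical stand-ins for the actual data and ‘links’ table columns.

```python
import pandas as pd

# Hypothetical dataset keyed by gvkey, and a 'links' table mapping gvkey -> permno
data = pd.DataFrame({'gvkey': [1001, 2002, 1001],
                     'datadate': [20200131, 20200131, 20210131],
                     'sale': [10.0, 7.0, 12.0]})
links = pd.DataFrame({'gvkey': [1001, 2002, 1001],
                      'linkdt': [19900101, 19950101, 20201001],
                      'lpermno': [11111, 33333, 22222]})

# Match each data row to the link whose linkdt is latest on or before datadate
merged = pd.merge_asof(data.sort_values('datadate'),
                       links.sort_values('linkdt'),
                       left_on='datadate', right_on='linkdt',
                       by='gvkey', direction='backward')
merged['lpermno'] = merged['lpermno'].fillna(0).astype(int)  # missing links set to 0
```

Note that the second 1001 row picks up the newer 20201001 link, while the earlier rows keep the links that prevailed at their dates.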

get_permnos(keys: List[str], date: int, link_perm: str, link_date: str, permno: str) → DataFrame[source]

Returns matching permnos as of a prevailing date from ‘links’ table

Parameters:
  • keys – Input list of identifiers to lookup

  • date – Prevailing date of link

  • link_perm – Name of permno field in ‘links’ table

  • link_date – Name of link date field in ‘links’ table

  • permno – Name of field to output permnos to

Returns:

DataFrame of linked permnos, as of prevailing date; missing links are set to 0
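A minimal sketch of these semantics, assuming a ‘links’ table with hypothetical ticker/linkdt/lpermno columns: for each input key, take the permno from the latest link on or before the prevailing date, with unmatched keys set to 0.

```python
import pandas as pd

def permnos_asof(keys, date, links, source='ticker',
                 link_date='linkdt', link_perm='lpermno'):
    """For each key, permno from the latest link on/before date; missing -> 0"""
    valid = links[links[link_date] <= date].sort_values(link_date)
    best = valid.groupby(source)[link_perm].last()   # latest prevailing link per key
    return pd.DataFrame({source: keys,
                         'permno': [int(best.get(k, 0)) for k in keys]})

links = pd.DataFrame({'ticker': ['AAPL', 'AAPL', 'MSFT'],
                      'linkdt': [19801212, 20050101, 19860313],
                      'lpermno': [14593, 14594, 10107]})
out = permnos_asof(['AAPL', 'IBM'], date=19990101, links=links)
```

Here 'AAPL' resolves through its 19801212 link (the 20050101 link is not yet in effect), and 'IBM' has no link, so it maps to 0.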

load_csv(dataset: str, csvfile: str, drop: Dict[str, List[Any]] = {}, keep: Dict[str, List[Any]] = {}, replace: Dict[str, Tuple[Any, Any]] = {}, sep: str = ',', encoding: str = 'latin-1', header: Any = 0, low_memory: bool = False, na_filter: bool = False, **kwargs) → DataFrame[source]

Insert ignore into SQL table from csvfile, and return as DataFrame

Parameters:
  • dataset – dataset name

  • csvfile – csv file name

  • keep – {column: values} keep rows whose columns have any of values

  • drop – {column: values} drop rows with any given value in column

  • replace – {column: (old, new)} specifies values to replace in column

  • sep – args for pd.read_csv

  • encoding – args for pd.read_csv

  • header – args for pd.read_csv

  • low_memory – args for pd.read_csv

  • na_filter – args for pd.read_csv

Returns:

DataFrame containing loaded data

Notes:

  • Creates the table, if it does not exist, using the associated schema

  • Records with duplicate keys are dropped (INSERT IGNORE is used)
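The keep/drop/replace row filters can be sketched as plain pandas operations (the `apply_filters` helper and the CRSP-style column values below are illustrative, not the actual load_csv internals):

```python
import pandas as pd

def apply_filters(df, drop=None, keep=None, replace=None):
    """Apply load_csv-style row filters before inserting into SQL"""
    for col, values in (keep or {}).items():
        df = df[df[col].isin(values)]          # keep rows with any of these values
    for col, values in (drop or {}).items():
        df = df[~df[col].isin(values)]         # drop rows with any of these values
    df = df.copy()
    for col, (old, new) in (replace or {}).items():
        df[col] = df[col].replace(old, new)    # substitute old value with new
    return df

df = pd.DataFrame({'shrcd': [10, 11, 73], 'exchcd': [1, 2, 3],
                   'ret': ['C', 0.01, 0.02]})
out = apply_filters(df, keep={'shrcd': [10, 11]}, replace={'ret': ('C', 0.0)})
```

In this sketch, only the share-code 10 and 11 rows survive, and the non-numeric 'C' return code is replaced by 0.0.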

load_dataframe(df: DataFrame, table: Table, to_replace: Any = None, value: Any = None, overwrite: bool = True) → int[source]

Load dataframe to SQLAlchemy table object using associated schema

Parameters:
  • df – DataFrame to load from

  • table – Destination Table object

  • to_replace – Original value or list of values to replace

  • value – Value to replace with

  • overwrite – Whether to overwrite or append

Returns:

Number of rows loaded

Notes:

  • DataFrame should contain the same column names as the Table

  • DataFrame column types are converted to the Table column types

  • Duplicate primary fields are removed, keeping first
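The notes above can be sketched as a pandas preprocessing step (the `prepare_rows` helper and the schema below are hypothetical): select the Table's columns, convert their types, and drop duplicate primary keys keeping the first.

```python
import pandas as pd

def prepare_rows(df, columns, dtypes, primary_keys):
    """Align DataFrame to table columns/types; drop duplicate keys, keep first"""
    out = df[columns].astype(dtypes)                        # convert column types
    return out.drop_duplicates(subset=primary_keys, keep='first')

df = pd.DataFrame({'permno': ['14593', '14593', '10107'],   # strings from a raw file
                   'date': [20210101, 20210101, 20210101],
                   'ret': ['0.01', '0.02', '0.03'],
                   'extra': [1, 2, 3]})                     # column not in the table
out = prepare_rows(df, columns=['permno', 'date', 'ret'],
                   dtypes={'permno': int, 'date': int, 'ret': float},
                   primary_keys=['permno', 'date'])
```

The second (14593, 20210101) row is discarded as a duplicate primary key, and the 'extra' column never reaches the table.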

read_dataframe(table: str, where: str = '') → DataFrame[source]

Read signal values from SQL and return as a DataFrame

Parameters:
  • table – Table to read from

  • where – Where clause str for sql select

Returns:

DataFrame of query results
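A minimal sketch of the read semantics, using an in-memory SQLite table in place of the MySQL connection (the 'signals' table and its columns are made up for illustration):

```python
import sqlite3
import pandas as pd

# Stand-in for the MySQL connection and a pre-populated table
conn = sqlite3.connect(':memory:')
pd.DataFrame({'permno': [14593, 10107, 11111],
              'mom': [0.05, -0.02, 0.03]}).to_sql('signals', conn, index=False)

def read_dataframe(table: str, where: str = '') -> pd.DataFrame:
    """Select all rows from table, optionally filtered by a where clause"""
    query = f"SELECT * FROM {table}" + (f" WHERE {where}" if where else "")
    return pd.read_sql(query, conn)

df = read_dataframe('signals', where='mom > 0')
```

With the where clause above, only the two rows with positive momentum are returned; omitting it returns the full table.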