finds.structured.structured
Base class for structured data sets
CRSP (daily, monthly, names, delistings, distributions, shares outstanding)
S&P/CapitalIQ Compustat (Annual, Quarterly, Key Development, customers)
IBES Summary
Notes:
Optionally cache SQL query results to Redis store
Copyright 2022-2024, Terence Lim
MIT License
- class finds.structured.structured.Lookup(sql: SQL, source: str, target: str, date_field: str, table: str, fillna: Any)[source]
Bases:
object
Loads dated identifier mappings into memory, for lookup by date
- Parameters:
sql – SQL connection instance
source – Name of source identifier key
target – Name of target identifier key to return
date_field – Name of date field in database table
table – Physical SQL table name containing identifier mappings
fillna – Value to return if not found
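Example (a minimal sketch; the connection setup, the 'names' table, and the call pattern for resolving a value are assumptions, not part of this reference):
from finds.structured.structured import Lookup

# sql = ...                      # an existing SQL connection instance (construction not shown)
tic2permno = Lookup(sql=sql,
                    source='ticker',      # source identifier key
                    target='permno',      # target identifier key to return
                    date_field='date',    # date field in the mappings table
                    table='names',        # hypothetical physical table of identifier mappings
                    fillna=0)             # value returned when no match is found
# Assumed call pattern: resolve a source identifier as of a date
# permno = tic2permno('IBM', 20231229)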
- class finds.structured.structured.Structured(sql: SQL, bd: BusDay, tables: Dict[str, Table], identifier: str, name: str, rdb: RedisDB | None = None, verbose: int = 0)[source]
Bases:
object
Base class for interfaces to structured datasets stored in SQL
- Parameters:
sql – Connection instance to MySQL database
bd – Custom business calendar instance
tables – SQLAlchemy Tables and their names defined for this dataset group
identifier – Name of the unique identifier key field
name – Display name for this dataset group
rdb – Connector to Redis cache store, if desired
- Variables:
identifier – Field name of the identifier key used by this dataset group
name – Display name for this dataset group
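Example (a minimal construction sketch, assuming already-created SQL, BusDay, and sqlalchemy Table objects; the internal table name, identifier, and display name below are hypothetical):
from finds.structured.structured import Structured

# sql, bd = ...                  # existing SQL connection and business-day calendar instances
# daily_table = ...              # a sqlalchemy Table object defined elsewhere
crsp = Structured(sql=sql,
                  bd=bd,
                  tables={'daily': daily_table},   # internal name -> sqlalchemy Table
                  identifier='permno',             # unique identifier key field
                  name='CRSP (demo)',              # display name for this dataset group
                  rdb=None,                        # no Redis caching of query results
                  verbose=1)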
- build_lookup(source: str, target: str, date_field: str, dataset: str, fillna: Any) Any [source]
Helper to build lookup of target from source identifiers
- Parameters:
source – Name of source identifier key
target – Name of target identifier key to return
date_field – Name of date field in database table
dataset – Internal name of table containing identifier mappings
fillna – Value to return if not found
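Example (a sketch using the instance from the construction example above; the 'names' dataset and the call pattern of the returned lookup are assumptions):
to_permno = crsp.build_lookup(source='ticker',     # source identifier key
                              target='permno',     # target identifier key to return
                              date_field='date',   # date field in the mappings table
                              dataset='names',     # hypothetical internal dataset of mappings
                              fillna=0)            # value returned when no match is found
# permno = to_permno('IBM', 20231229)              # assumed call pattern: by identifier and date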
- get_linked(dataset: str, fields: List[str], date_field: str, link_perm: str, link_date: str, where: str = '', limit: int | str | None = None) DataFrame [source]
Query a dataset, and join the ‘links’ table to return data with permnos
- Parameters:
dataset – Name of internal Table to query data from
fields – Data fields to retrieve
date_field – Name of date field in data table
link_date – Name of link date field in ‘links’ table
link_perm – Name of permno field in ‘links’ table
where – Where clause (optional)
limit – Maximum rows to return (optional)
- Returns:
DataFrame containing result of query
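Example (a hypothetical query of an internal 'annual' dataset joined to the 'links' table; the dataset, field, and link-column names are assumptions):
df = crsp.get_linked(dataset='annual',
                     fields=['revt', 'at'],           # data fields to retrieve
                     date_field='datadate',           # date field in the data table
                     link_perm='lpermno',             # permno field in the 'links' table
                     link_date='linkdt',              # link date field in the 'links' table
                     where='datadate >= 20200101',    # optional where clause
                     limit=100)                       # optional maximum number of rows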
- get_permnos(keys: List[str], date: int, link_perm: str, link_date: str, permno: str) DataFrame [source]
Returns matching permnos as of a prevailing date from ‘links’ table
- Parameters:
keys – Input list of identifiers to lookup
date – Prevailing date of link
link_perm – Name of permno field in ‘links’ table
link_date – Name of link date field in ‘links’ table
permno – Name of field to output permnos to
- Returns:
DataFrame of linked permnos, as of prevailing date; unmatched keys set to 0
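Example (a hypothetical lookup of permnos for a list of identifiers as of a date; the keys and 'links' column names are assumptions):
permnos = crsp.get_permnos(keys=['001690', '012141'],   # identifiers to look up
                           date=20231229,               # prevailing date of the link
                           link_perm='lpermno',         # permno field in the 'links' table
                           link_date='linkdt',          # link date field in the 'links' table
                           permno='permno')             # name of the output permno column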
- load_csv(dataset: str, csvfile: str, drop: Dict[str, List[Any]] = {}, keep: Dict[str, List[Any]] = {}, replace: Dict[str, Tuple[Any, Any]] = {}, sep: str = ',', encoding: str = 'latin-1', header: Any = 0, low_memory: bool = False, na_filter: bool = False, **kwargs) DataFrame [source]
Insert (with INSERT IGNORE) into SQL table from csv file, and return as DataFrame
- Parameters:
dataset – Internal dataset name
csvfile – Name of csv file to read
keep – {column: values} keep rows whose columns have any of values
drop – {column: values} drop rows with any given value in column
replace – {column: (old, new)} specifies values to replace in column
sep – Passed through to pd.read_csv
encoding – Passed through to pd.read_csv
header – Passed through to pd.read_csv
low_memory – Passed through to pd.read_csv
na_filter – Passed through to pd.read_csv
- Returns:
DataFrame containing loaded data
Notes:
Creates the table, if it does not exist, using the associated schema
New records with duplicate keys are dropped (INSERT IGNORE is used)
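Example (a hypothetical load of a delisting csv file into an internal 'delist' dataset; the file name, column names, and filter values are assumptions):
df = crsp.load_csv(dataset='delist',
                   csvfile='delistings.csv',
                   drop={'permno': [0]},              # drop rows where permno is 0
                   replace={'dlstcd': (None, 0)},     # replace missing delisting codes with 0
                   sep=',',
                   encoding='latin-1')
print(len(df), 'rows read; duplicate keys were ignored on insert')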
- load_dataframe(df: DataFrame, table: Table, to_replace: Any = None, value: Any = None, overwrite: bool = True) int [source]
Load dataframe to SQLAlchemy table object using associated schema
- Parameters:
df – DataFrame to load from
table – Destination Table object
to_replace – Original value or list of values to replace
value – Value to replace with
overwrite – Whether to overwrite or append
- Returns:
Number of rows loaded
Notes:
DataFrame should contain the same column names as the Table
DataFrame column types are converted to the Table's column types
Rows with duplicate primary keys are removed, keeping the first
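Example (a sketch that appends a small DataFrame to the hypothetical daily_table from the construction example above; the column names are assumptions and must match the Table's schema):
import pandas as pd

df = pd.DataFrame({'permno': [10001, 10002],
                   'date': [20231229, 20231229],
                   'ret': [0.01, -0.02]})
n = crsp.load_dataframe(df,
                        table=daily_table,    # destination sqlalchemy Table object
                        to_replace=None,      # no value substitution
                        value=None,
                        overwrite=False)      # append rather than replace
print(n, 'rows loaded')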