finds.database.sql
SQL class wrapper, with convenience methods for pandas DataFrames
Copyright 2022-2024, Terence Lim
MIT License
- class finds.database.sql.SQL(user: str, password: str, host: str = 'localhost', port: str = '3306', database: str = '', autocommit: str = 'true', charset: str = 'utf8', temp: str = 'temp3658', **kwargs)[source]
- Bases: - Database- Interface to sqlalchemy, with convenience functions for dataframes - classmethod Index(*args) Index[source]
- Wraps sqlalchemy.Index() with auto-generated index name from args 
 - Table(key: str, *args, **kwargs) Table[source]
- Wraps sqlalchemy.Table() after removing key from metadata 
 - static create_database(user: str, password: str, host: str = 'localhost', port: str = '3306', database: str = '', **kwargs)[source]
- Create new database using this user’s credentials 
 - load_dataframe(table: str, df: DataFrame, index_label: str = '', to_sql: bool = True, replace: bool = False)[source]
- Load dataframe into sql table, ignoring duplicate primary keys - Parameters:
- table – Physical name of table to insert into 
- df – Source dataframe 
- index_label – Column name to load index as, None (default) to ignore 
- to_sql – first attempt pandas.to_sql(), which may fail if duplicate keys; then/else insert ignore from temp table instead. 
- replace – set True to overwrite table, else append (default) 
 
 
 - load_infile(table: str, csvfile: str, options: str = '')[source]
- Load table from csv file, using mysql’s load data local infile - Parameters:
- table – Physical name of table to load into 
- csvfile – CSV filename 
- options – String appended to SQL load infile query 
 
 
 - pivot(table: str, index: str, columns: str, values: str, where: str = '', limit: int | None = None, chunksize: int | None = None) DataFrame[source]
- Return sql query result as pivoted data frame - Parameters:
- table – Physical name of table to retrieve from 
- index – Field name to select as dataframe index 
- columns – Field name to select as column labels 
- values – Field name to select as values 
- where – Where clause, optional 
- limit – Maximum optional number of rows or chunks to return 
- chunksize – To optionally buildup results in chunks of this size 
 
- Returns:
- Query result as a pivoted (wide) DataFrame 
 
 - read_dataframe(q: str)[source]
- Return sql query result as data frame - Parameters:
- q – query string or SQLAlchemy Selectable 
- Returns:
- DataFrame of results 
- Raises:
- RuntimeError – Failed to run query 
 
 - run(q) Dict | None[source]
- Execute sql command - Parameters:
- q – query string 
- Returns:
- The result set {‘data’, ‘columns’}, or None. 
- Raises:
- RuntimeError – failed to run query 
 - Examples - >>> sql.run("show databases") >>> sql.run("show tables") >>> sql.run('select * from testing') >>> sql.run('select distinct permno from benchmarks') >>> sql.run("show create table _") >>> sql.run("describe _") >>> sql.run("truncate table _", fetch=False) 
 - summary(table: str, val: str, key: str = '') DataFrame[source]
- Return summary statistics for a field, optionally grouped-by key - Parameters:
- table – Physical name of table 
- val – Field name to summarise 
- key – Field to group by 
 
- Returns:
- DataFrame with columns (count, average, max, min) 
 - Examples - >>> sql.summary('annual', 'revt', 'sic') 
 
- finds.database.sql.as_dtypes(df: DataFrame, columns: Dict, drop_duplicates: List[str] = [], sort_values: List[str] = [], keep: str = 'first', replace: Dict[str, Tuple[Any, Any]] = {}) DataFrame[source]
- Convert DataFrame dtypes to the given sqlalchemy Column types - Parameters:
- df – Input DataFrame to apply new data types from target columns 
- columns – Target sqlalchemy column types as dict of {column: type} 
- sort_values – List of column names to sort by 
- drop_duplicates – list of fields if all duplicated to drop rows 
- keep – ‘first’ or ‘last’ row to keep if drop duplicates 
- replace – dict of {column label: tuple(old, replacement) values} 
 
- Returns:
- DataFrame with columns and rows transformed 
 - Notes: - Columns of DataFrame are dropped if not specified in columns input 
- If input is None, then return empty DataFrame with given column types 
- Blank values in boolean and int fields are set to False/0. 
- Invalid/blank values in double field are coerced to NaN. 
- Invalid values in int field are coerced to 0