finds.database.sql
SQL class wrapper, with convenience methods for pandas DataFrames
Copyright 2022-2024, Terence Lim
MIT License
- class finds.database.sql.SQL(user: str, password: str, host: str = 'localhost', port: str = '3306', database: str = '', autocommit: str = 'true', charset: str = 'utf8', temp: str = 'temp3658', **kwargs)[source]
Bases:
Database
Interface to sqlalchemy, with convenience functions for dataframes
- classmethod Index(*args) Index [source]
Wraps sqlalchemy.Index() with auto-generated index name from args
- Table(key: str, *args, **kwargs) Table [source]
Wraps sqlalchemy.Table() after removing key from metadata
- static create_database(user: str, password: str, host: str = 'localhost', port: str = '3306', database: str = '', **kwargs)[source]
Create new database using this user’s credentials
- load_dataframe(table: str, df: DataFrame, index_label: str = '', to_sql: bool = True, replace: bool = False)[source]
Load dataframe into sql table, ignoring duplicate primary keys
- Parameters:
table – Physical name of table to insert into
df – Source dataframe
index_label – Column name to load index as, None (default) to ignore
to_sql – first attempt pandas.to_sql(), which may fail if duplicate keys; then/else insert ignore from temp table instead.
replace – set True to overwrite table, else append (default)
- load_infile(table: str, csvfile: str, options: str = '')[source]
Load table from csv file, using mysql’s load data local infile
- Parameters:
table – Physical name of table to load into
csvfile – CSV filename
options – String appended to SQL load infile query
- pivot(table: str, index: str, columns: str, values: str, where: str = '', limit: int | None = None, chunksize: int | None = None) DataFrame [source]
Return sql query result as pivoted data frame
- Parameters:
table – Physical name of table to retrieve from
index – Field name to select as dataframe index
columns – Field name to select as column labels
values – Field name to select as values
where – Where clause, optional
limit – Maximum optional number of rows or chunks to return
chunksize – To optionally buildup results in chunks of this size
- Returns:
Query result as a pivoted (wide) DataFrame
- read_dataframe(q: str)[source]
Return sql query result as data frame
- Parameters:
q – query string or SQLAlchemy Selectable
- Returns:
DataFrame of results
- Raises:
RuntimeError – Failed to run query
- run(q) Dict | None [source]
Execute sql command
- Parameters:
q – query string
- Returns:
The result set {‘data’, ‘columns’}, or None.
- Raises:
RuntimeError – failed to run query
Examples
>>> sql.run("show databases") >>> sql.run("show tables") >>> sql.run('select * from testing') >>> sql.run('select distinct permno from benchmarks') >>> sql.run("show create table _") >>> sql.run("describe _") >>> sql.run("truncate table _", fetch=False)
- summary(table: str, val: str, key: str = '') DataFrame [source]
Return summary statistics for a field, optionally grouped-by key
- Parameters:
table – Physical name of table
val – Field name to summarise
key – Field to group by
- Returns:
DataFrame with columns (count, average, max, min)
Examples
>>> sql.summary('annual', 'revt', 'sic')
- finds.database.sql.as_dtypes(df: DataFrame, columns: Dict, drop_duplicates: List[str] = [], sort_values: List[str] = [], keep: str = 'first', replace: Dict[str, Tuple[Any, Any]] = {}) DataFrame [source]
Convert DataFrame dtypes to the given sqlalchemy Column types
- Parameters:
df – Input DataFrame to apply new data types from target columns
columns – Target sqlalchemy column types as dict of {column: type}
sort_values – List of column names to sort by
drop_duplicates – list of fields if all duplicated to drop rows
keep – ‘first’ or ‘last’ row to keep if drop duplicates
replace – dict of {column label: tuple(old, replacement) values}
- Returns:
DataFrame with columns and rows transformed
Notes:
Columns of DataFrame are dropped if not specified in columns input
If input is None, then return empty DataFrame with given column types
Blank values in boolean and int fields are set to False/0.
Invalid/blank values in double field are coerced to NaN.
Invalid values in int field are coerced to 0