finds.database.sql

SQL class wrapper, with convenience methods for pandas DataFrames

MIT License

class finds.database.sql.SQL(user: str, password: str, host: str = 'localhost', port: str = '3306', database: str = '', autocommit: str = 'true', charset: str = 'utf8', temp: str = 'temp3658', **kwargs)[source]

Bases: Database

Interface to sqlalchemy, with convenience functions for dataframes

classmethod Index(*args) → Index[source]: Wraps sqlalchemy.Index() with auto-generated index name from args

Table(key: str, *args, **kwargs) → Table[source]: Wraps sqlalchemy.Table() after removing key from metadata

create_all()[source]: Create all tables in metadata

static create_database(user: str, password: str, host: str = 'localhost', port: str = '3306', database: str = '', **kwargs)[source]: Create new database using this user’s credentials

create_engine()[source]: Call and store sqlalchemy.create_engine() and MetaData()

load_dataframe(table: str, df: DataFrame, index_label: str = '', to_sql: bool = True, replace: bool = False)[source]

Load dataframe into sql table, ignoring duplicate primary keys

Parameters:

table – Physical name of table to insert into
df – Source dataframe
index_label – Column name to load index as, None (default) to ignore
to_sql – first attempt pandas.to_sql(), which may fail if duplicate keys; then/else insert ignore from temp table instead.
replace – set True to overwrite table, else append (default)

load_infile(table: str, csvfile: str, options: str = '')[source]

Load table from csv file, using mysql’s load data local infile

Parameters:

table – Physical name of table to load into
csvfile – CSV filename
options – String appended to SQL load infile query

pivot(table: str, index: str, columns: str, values: str, where: str = '', limit: int | None = None, chunksize: int | None = None) → DataFrame[source]

Return sql query result as pivoted data frame

Parameters:

table – Physical name of table to retrieve from
index – Field name to select as dataframe index
columns – Field name to select as column labels
values – Field name to select as values
where – Where clause, optional
limit – Maximum optional number of rows or chunks to return
chunksize – To optionally buildup results in chunks of this size

Returns:

Query result as a pivoted (wide) DataFrame

read_dataframe(q: str)[source]

Return sql query result as data frame

Parameters:: q – query string or SQLAlchemy Selectable
Returns:: DataFrame of results
Raises:: RuntimeError – Failed to run query

remove(key: str)[source]: Remove a table by key name from metadata instance

rollback()[source]: Call sessionmaker() to rollback current transaction in progress

run(q) → Dict | None[source]

Execute sql command

Parameters:: q – query string
Returns:: The result set {‘data’, ‘columns’}, or None.
Raises:: RuntimeError – failed to run query

Examples

>>> sql.run("show databases")
>>> sql.run("show tables")
>>> sql.run('select * from testing')
>>> sql.run('select distinct permno from benchmarks')
>>> sql.run("show create table _")
>>> sql.run("describe _")
>>> sql.run("truncate table _", fetch=False)

summary(table: str, val: str, key: str = '') → DataFrame[source]

Return summary statistics for a field, optionally grouped-by key

Parameters:

table – Physical name of table
val – Field name to summarise
key – Field to group by

Returns:

DataFrame with columns (count, average, max, min)

Examples

>>> sql.summary('annual', 'revt', 'sic')

finds.database.sql.as_dtypes(df: DataFrame, columns: Dict, drop_duplicates: List[str] = [], sort_values: List[str] = [], keep: str = 'first', replace: Dict[str, Tuple[Any, Any]] = {}) → DataFrame[source]

Convert DataFrame dtypes to the given sqlalchemy Column types

Parameters:

df – Input DataFrame to apply new data types from target columns
columns – Target sqlalchemy column types as dict of {column: type}
sort_values – List of column names to sort by
drop_duplicates – list of fields if all duplicated to drop rows
keep – ‘first’ or ‘last’ row to keep if drop duplicates
replace – dict of {column label: tuple(old, replacement) values}

Returns:

DataFrame with columns and rows transformed

Notes:

Columns of DataFrame are dropped if not specified in columns input
If input is None, then return empty DataFrame with given column types
Blank values in boolean and int fields are set to False/0.
Invalid/blank values in double field are coerced to NaN.
Invalid values in int field are coerced to 0