finds.database.sql

SQL class wrapper, with convenience methods for pandas DataFrames

Copyright 2022-2024, Terence Lim

MIT License

class finds.database.sql.SQL(user: str, password: str, host: str = 'localhost', port: str = '3306', database: str = '', autocommit: str = 'true', charset: str = 'utf8', temp: str = 'temp3658', **kwargs)[source]

Bases: Database

Interface to sqlalchemy, with convenience functions for dataframes

classmethod Index(*args) Index[source]

Wraps sqlalchemy.Index() with auto-generated index name from args

Table(key: str, *args, **kwargs) Table[source]

Wraps sqlalchemy.Table() after removing key from metadata

create_all()[source]

Create all tables in metadata

static create_database(user: str, password: str, host: str = 'localhost', port: str = '3306', database: str = '', **kwargs)[source]

Create new database using this user’s credentials

create_engine()[source]

Call and store sqlalchemy.create_engine() and MetaData()

load_dataframe(table: str, df: DataFrame, index_label: str = '', to_sql: bool = True, replace: bool = False)[source]

Load dataframe into sql table, ignoring duplicate primary keys

Parameters:
  • table – Physical name of table to insert into

  • df – Source dataframe

  • index_label – Column name to load index as, None (default) to ignore

  • to_sql – first attempt pandas.to_sql(), which may fail if duplicate keys; then/else insert ignore from temp table instead.

  • replace – set True to overwrite table, else append (default)

load_infile(table: str, csvfile: str, options: str = '')[source]

Load table from csv file, using mysql’s load data local infile

Parameters:
  • table – Physical name of table to load into

  • csvfile – CSV filename

  • options – String appended to SQL load infile query

pivot(table: str, index: str, columns: str, values: str, where: str = '', limit: int | None = None, chunksize: int | None = None) DataFrame[source]

Return sql query result as pivoted data frame

Parameters:
  • table – Physical name of table to retrieve from

  • index – Field name to select as dataframe index

  • columns – Field name to select as column labels

  • values – Field name to select as values

  • where – Where clause, optional

  • limit – Maximum optional number of rows or chunks to return

  • chunksize – To optionally buildup results in chunks of this size

Returns:

Query result as a pivoted (wide) DataFrame

read_dataframe(q: str)[source]

Return sql query result as data frame

Parameters:

q – query string or SQLAlchemy Selectable

Returns:

DataFrame of results

Raises:

RuntimeError – Failed to run query

remove(key: str)[source]

Remove a table by key name from metadata instance

rollback()[source]

Call sessionmaker() to rollback current transaction in progress

run(q) Dict | None[source]

Execute sql command

Parameters:

q – query string

Returns:

The result set {‘data’, ‘columns’}, or None.

Raises:

RuntimeError – failed to run query

Examples

>>> sql.run("show databases")
>>> sql.run("show tables")
>>> sql.run('select * from testing')
>>> sql.run('select distinct permno from benchmarks')
>>> sql.run("show create table _")
>>> sql.run("describe _")
>>> sql.run("truncate table _", fetch=False)
summary(table: str, val: str, key: str = '') DataFrame[source]

Return summary statistics for a field, optionally grouped-by key

Parameters:
  • table – Physical name of table

  • val – Field name to summarise

  • key – Field to group by

Returns:

DataFrame with columns (count, average, max, min)

Examples

>>> sql.summary('annual', 'revt', 'sic')
finds.database.sql.as_dtypes(df: DataFrame, columns: Dict, drop_duplicates: List[str] = [], sort_values: List[str] = [], keep: str = 'first', replace: Dict[str, Tuple[Any, Any]] = {}) DataFrame[source]

Convert DataFrame dtypes to the given sqlalchemy Column types

Parameters:
  • df – Input DataFrame to apply new data types from target columns

  • columns – Target sqlalchemy column types as dict of {column: type}

  • sort_values – List of column names to sort by

  • drop_duplicates – list of fields if all duplicated to drop rows

  • keep – ‘first’ or ‘last’ row to keep if drop duplicates

  • replace – dict of {column label: tuple(old, replacement) values}

Returns:

DataFrame with columns and rows transformed

Notes:

  • Columns of DataFrame are dropped if not specified in columns input

  • If input is None, then return empty DataFrame with given column types

  • Blank values in boolean and int fields are set to False/0.

  • Invalid/blank values in double field are coerced to NaN.

  • Invalid values in int field are coerced to 0