finds.recipes.econs

Economics tools

  • Bai and Ng (2002), McCracken and Ng (2015, 2020) factors-EM algorithm

Copyright 2022, Terence Lim

MIT License

finds.recipes.econs.approximate_factors(X: DataFrame, kmax: int = 0, p: int = 2, max_iter: int = 50, tol: float = 1e-12, verbose: int = 1) DataFrame[source]

Fill in missing values with the factor-model EM algorithm of Bai and Ng (2002)

Parameters:
  • X – T observations/samples in rows, N variables/features in columns

  • kmax – Maximum number of factors. If 0, set to rank from SVD minus 1

  • p – If 0, the number of factors is fixed at kmax. Otherwise, selects which of the three information criteria in Bai & Ng (2002) is used to choose the number of factors automatically

Returns:

DataFrame with missing values imputed with factor EM algorithm
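The iteration described above can be sketched as follows. This is a minimal illustration with NumPy only, not the library's implementation: the name `em_factor_impute` is hypothetical, the number of factors `k` is fixed rather than auto-selected, and the convergence measure is an assumption.

```python
import numpy as np

def em_factor_impute(X, k=1, max_iter=50, tol=1e-12):
    """Impute NaNs in a T x N array with a fixed k-factor EM iteration.

    Hypothetical sketch; not the code behind approximate_factors.
    """
    X = np.asarray(X, dtype=float)
    missing = np.isnan(X)
    # Start by filling missing entries with the column means of observed values
    filled = np.where(missing, np.nanmean(X, axis=0), X)
    for _ in range(max_iter):
        # Standardize using the current filled-in data
        mean, std = filled.mean(axis=0), filled.std(axis=0)
        Z = (filled - mean) / std
        # Rank-k approximation from the SVD (factors times loadings),
        # mapped back to the original scale
        u, s, vt = np.linalg.svd(Z, full_matrices=False)
        fitted = (u[:, :k] * s[:k]) @ vt[:k, :] * std + mean
        # Replace only the missing entries; observed values are kept as-is
        updated = np.where(missing, fitted, X)
        # Relative squared change between successive iterates (assumed stopping rule)
        change = np.sum((updated - filled) ** 2) / np.sum(filled ** 2)
        filled = updated
        if change < tol:
            break
    return filled
```

Observed values pass through unchanged; only the entries that were NaN are overwritten with the low-rank fit on each pass.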

finds.recipes.econs.fillna_em(X: ndarray, add_intercept: bool = True, tol: float = 1e-12, maxiter: int = 200, verbose: int = 1) Tuple[ndarray, DataFrame][source]

Fill missing data using the EM algorithm under a Normal distribution

finds.recipes.econs.fstats(x: Series | ndarray, tail: float = 0.15) ndarray[source]

Helper to compute F-stats at all candidate break points

Parameters:
  • x – Input Series

  • tail – Fraction of observations at each tail to skip when computing F-stats

Returns:

Array of F-stats at each candidate break point
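As an illustration of the idea, the sketch below computes an F-stat for a single break in the mean at every interior candidate point. The regression model actually used by `fstats` is not stated here, so the mean-shift model, the name `break_fstats`, and the NaN padding in the skipped tails are all assumptions.

```python
import numpy as np

def break_fstats(x, tail=0.15):
    """F-stats for a one-time shift in mean at each candidate break point.

    Hypothetical sketch under a mean-shift model; entries in the skipped
    tails are left as NaN.
    """
    x = np.asarray(x, dtype=float)
    n = len(x)
    lo = int(np.floor(tail * n))
    hi = int(np.ceil((1 - tail) * n))
    # Restricted model: a single mean over the whole sample
    ssr_r = np.sum((x - x.mean()) ** 2)
    f = np.full(n, np.nan)
    for b in range(max(lo, 1), min(hi, n - 1)):
        left, right = x[:b], x[b:]
        # Unrestricted model: separate means before and after the break
        ssr_u = np.sum((left - left.mean()) ** 2) + np.sum((right - right.mean()) ** 2)
        # One restriction, n - 2 residual degrees of freedom
        f[b] = (ssr_r - ssr_u) / (ssr_u / (n - 2))
    return f
```

The largest value, for example via `np.nanargmax`, points to the most likely break date under this simple model.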

finds.recipes.econs.integration_order(df: Series, verbose: bool = True, max_order: int = 5, pvalue: float = 0.05, lags: str | int = 'AIC') int[source]

Returns order of integration by iteratively testing for unit root

Parameters:
  • df – Input Series

  • verbose – Whether to display results

  • max_order – Maximum number of orders to test

  • pvalue – Required p-value to reject Dickey-Fuller unit root

  • lags – Method to automatically determine the lag length, or the maximum lag: one of {“AIC”, “BIC”, “t-stat”}, an int (maxlag), or 0 (12*(nobs/100)^{1/4})

Returns:

Integration order, or -1 if max_order exceeded

finds.recipes.econs.least_squares(data: DataFrame, y: List[str] = ['y'], x: List[str] = ['x'], add_constant: bool = True, stdres: bool = False) Series | DataFrame[source]

Compute least-squares coefficients; can be used with groupby().apply

Parameters:
  • data – DataFrame with x and y series in columns

  • x – List of x columns

  • y – List of y columns

  • add_constant – Whether to add intercept as first column

  • stdres – Whether to output residual stdev

Returns:

Series (if only single y input) of regression coefficients, or DataFrame (multiple y) with coeffs, and optionally stdres, in columns
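The groupby pattern can be sketched as below, assuming a single y column. The helper `ols_coefs` and the coefficient label `const` are hypothetical stand-ins for illustration, not the library's own names.

```python
import numpy as np
import pandas as pd

def ols_coefs(data, y="y", x=("x",), add_constant=True):
    """Return OLS coefficients of column y on columns x as a Series.

    Hypothetical sketch of a function suitable for groupby().apply.
    """
    X = data[list(x)].to_numpy(dtype=float)
    names = list(x)
    if add_constant:
        # The intercept goes in the first column
        X = np.column_stack([np.ones(len(X)), X])
        names = ["const"] + names
    beta, *_ = np.linalg.lstsq(X, data[y].to_numpy(dtype=float), rcond=None)
    return pd.Series(beta, index=names)

df = pd.DataFrame({
    "g": ["a"] * 4 + ["b"] * 4,
    "x": [0.0, 1.0, 2.0, 3.0] * 2,
})
# Group "a" follows y = 1 + 2x; group "b" follows y = -1 + 0.5x
df["y"] = np.where(df["g"] == "a", 1 + 2 * df["x"], -1 + 0.5 * df["x"])

# One row of coefficients per group, indexed by the group key
coefs = df.groupby("g")[["x", "y"]].apply(ols_coefs)
```

Because the helper returns a Series, the grouped result is a DataFrame with one row per group and one column per coefficient.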

finds.recipes.econs.mrsq(X: DataFrame, kmax: int) DataFrame[source]

Return marginal R2 of each variable from incrementally adding factors

Parameters:
  • X – T observations/samples in rows, N variables/features in columns

  • kmax – Maximum number of factors. If 0, set to rank from SVD

Returns:

DataFrame with marginal R2 with component in each column

Notes:

Based on the MATLAB code of Bai and Ng (2002) and McCracken, available at

https://research.stlouisfed.org/econ/mccracken/fred-databases/
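One way to compute a marginal R2 per component is sketched below: the variance that each principal component alone contributes to each standardized variable. The name `marginal_r2` is hypothetical, and the sketch returns a plain array rather than a DataFrame.

```python
import numpy as np

def marginal_r2(X, kmax):
    """Variance share of each standardized variable explained by each component.

    Hypothetical sketch; returns an N x kmax array.
    """
    X = np.asarray(X, dtype=float)
    T = X.shape[0]
    # Standardize so that each column has mean 0 and variance 1
    Z = (X - X.mean(axis=0)) / X.std(axis=0)
    u, s, vt = np.linalg.svd(Z, full_matrices=False)
    # Component k contributes u[:, k] * s[k] * vt[k, j] to variable j.
    # The columns of u have unit norm (and mean zero for positive singular
    # values), so the variance of that contribution is s[k]^2 * vt[k, j]^2 / T.
    return ((s[:kmax, None] * vt[:kmax, :]) ** 2 / T).T
```

When all components are kept (with N no larger than T), each row sums to 1, since the components together reproduce the standardized variable exactly.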

finds.recipes.econs.select_baing(X: DataFrame, kmax: int = 0, p: int = 2) int[source]

Determine number of factors based on Bai & Ng (2002) info criterion

Parameters:
  • X – T observations/samples in rows, N variables/features in columns

  • p – int in [1, 2, 3] to use PCp1 or PCp2 or PCp3 penalty

  • kmax – Maximum number of factors. If 0, set to rank from SVD

Returns:

Best number of factors based on ICp{p} criterion, or 0 if not determined

Notes:

  • The residual variance after adding each component is computed directly from the eigenvalues, so no projections need to be computed

  • The IC curve appears to have multiple minima: the first “local” minimum is selected, so a prior bound on the number of factors may be desirable.
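The two notes above can be sketched as follows, using the ICp penalties from Bai & Ng (2002). The name `select_num_factors` is hypothetical, and capping kmax at one below the number of singular values (to avoid taking the log of a zero residual variance) is an assumption of this sketch rather than the library's rule.

```python
import numpy as np

def select_num_factors(X, kmax=0, p=2):
    """Pick the number of factors at the first local minimum of ICp{p}.

    Hypothetical sketch; returns 0 if no local minimum is found.
    """
    X = np.asarray(X, dtype=float)
    T, N = X.shape
    Z = (X - X.mean(axis=0)) / X.std(axis=0)
    s = np.linalg.svd(Z, compute_uv=False)
    # Eigenvalues scaled so that they sum to the total variance (1 here)
    eigvals = s ** 2 / (N * T)
    # Assumption: cap kmax so the residual variance never reaches zero
    kmax = min(kmax or len(s) - 1, len(s) - 1)
    ks = np.arange(kmax + 1)
    # Residual variance after k components is the tail sum of eigenvalues
    V = np.array([eigvals[k:].sum() for k in ks])
    NT, NpT, C2 = N * T, N + T, min(N, T)
    penalty = {
        1: NpT / NT * np.log(NT / NpT),
        2: NpT / NT * np.log(C2),
        3: np.log(C2) / C2,
    }[p]
    ic = np.log(V) + ks * penalty
    # Select the first local minimum of the IC curve
    for k in range(kmax):
        if ic[k] < ic[k + 1]:
            return int(k)
    return 0
```

Stopping at the first local minimum keeps the choice conservative when the curve dips again at a much larger number of factors.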