finds.recipes.econs

Economics tools

Bai and Ng (2002), McCracken and Ng (2015, 2020) factors-EM algorithm

MIT License

finds.recipes.econs.approximate_factors(X: DataFrame, kmax: int = 0, p: int = 2, max_iter: int = 50, tol: float = 1e-12, verbose: int = 1) → DataFrame

Fill in missing values with the factor-model EM algorithm of Bai and Ng (2002)

Parameters:

X – T observations/samples in rows, N variables/features in columns
kmax – Maximum number of factors. If 0, set to rank from SVD minus 1
p – If 0, the number of factors is fixed at kmax. Otherwise, selects which of the three information criteria in Bai & Ng (2002) is used to choose the number of factors automatically

Returns:

DataFrame with missing values imputed with factor EM algorithm
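The imputation loop described above can be sketched as follows. This is a minimal illustration rather than the library's implementation: the helper name `em_factor_impute` is hypothetical, the number of factors `k` is fixed instead of being selected by an information criterion, and the convergence rule (relative change in the filled matrix) is an assumption.

```python
import numpy as np

def em_factor_impute(X, k, max_iter=50, tol=1e-12):
    """Hypothetical helper: impute NaNs in X (T x N) with a k-factor model."""
    X = np.asarray(X, dtype=float)
    missing = np.isnan(X)
    # Start by filling missing entries with the column means of observed values
    filled = np.where(missing, np.nanmean(X, axis=0), X)
    for _ in range(max_iter):
        # Standardize the current completed matrix
        mean, std = filled.mean(axis=0), filled.std(axis=0)
        Z = (filled - mean) / std
        # Fit k principal components and map the fit back to original units
        U, s, Vt = np.linalg.svd(Z, full_matrices=False)
        fitted = (U[:, :k] * s[:k]) @ Vt[:k] * std + mean
        # Only missing entries are replaced; observed values are kept as-is
        updated = np.where(missing, fitted, X)
        change = np.sum((updated - filled) ** 2) / np.sum(filled ** 2)
        filled = updated
        if change < tol:  # assumed stopping rule
            break
    return filled
```

Calling `em_factor_impute(X.to_numpy(), k=3)` on a DataFrame `X` would return a completed array with the observed entries unchanged.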

finds.recipes.econs.fillna_em(X: ndarray, add_intercept: bool = True, tol: float = 1e-12, maxiter: int = 200, verbose: int = 1) → Tuple[ndarray, DataFrame]

Fill missing data with the EM algorithm under a Normal distribution

finds.recipes.econs.fstats(x: Series | ndarray, tail: float = 0.15) → ndarray

Helper to compute F-stats at all candidate break points

Parameters:

x – Input Series
tail – Fraction of observations at each end of the sample to exclude from the candidate break points

Returns:

Array of F-stats at each candidate break point
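As an illustration of scanning candidate break points, the sketch below computes an F-statistic for a shift in the mean at every admissible split. The regression model is not specified above, so the mean-shift model, the helper name `mean_shift_fstats`, and the NaN padding in the skipped tails are all assumptions.

```python
import numpy as np

def mean_shift_fstats(x, tail=0.15):
    """Hypothetical helper: F-stat for a mean shift at each candidate break."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    # Restricted model: a single mean over the whole sample
    ssr_restricted = np.sum((x - x.mean()) ** 2)
    skip = max(int(np.floor(tail * n)), 1)
    stats = np.full(n, np.nan)  # tails stay NaN (assumed output convention)
    for b in range(skip, n - skip):
        left, right = x[:b], x[b:]
        # Unrestricted model: separate means before and after the break
        ssr_unrestricted = (np.sum((left - left.mean()) ** 2)
                            + np.sum((right - right.mean()) ** 2))
        # One restriction, n - 2 residual degrees of freedom
        stats[b] = (ssr_restricted - ssr_unrestricted) / (ssr_unrestricted / (n - 2))
    return stats
```

The index of the largest statistic, `np.nanargmax(stats)`, is then a natural estimate of the break date.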

finds.recipes.econs.integration_order(df: Series, verbose: bool = True, max_order: int = 5, pvalue: float = 0.05, lags: str | int = 'AIC') → int

Return the order of integration by iteratively testing for a unit root

Parameters:

df – Input Series
verbose – Whether to display results
max_order – Maximum order of integration to test
pvalue – p-value threshold below which the Dickey-Fuller unit-root null is rejected
lags – Method for automatically determining the lag length, one of {“AIC”, “BIC”, “t-stat”}; or an int giving maxlag, where 0 means 12*(nobs/100)^{1/4}

Returns:

Integration order, or -1 if max_order exceeded
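The iterate-and-difference logic can be sketched with a simplified test. This is not the library's procedure: it uses a plain (non-augmented) Dickey-Fuller regression with a constant and compares the t-statistic with the asymptotic 5% critical value of about -2.86, instead of computing a p-value with automatic lag selection. The names `df_tstat` and `order_of_integration` are hypothetical.

```python
import numpy as np

DF_CRITICAL_5PCT = -2.86  # asymptotic 5% critical value, constant-only case

def df_tstat(y):
    """t-statistic on y[t-1] in a regression of diff(y) on a constant and y[t-1]."""
    y = np.asarray(y, dtype=float)
    dy, lag = np.diff(y), y[:-1]
    Z = np.column_stack([np.ones_like(lag), lag])
    beta, *_ = np.linalg.lstsq(Z, dy, rcond=None)
    resid = dy - Z @ beta
    sigma2 = resid @ resid / (len(dy) - 2)
    cov = sigma2 * np.linalg.inv(Z.T @ Z)
    return beta[1] / np.sqrt(cov[1, 1])

def order_of_integration(y, max_order=5, critical=DF_CRITICAL_5PCT):
    """Difference until the unit-root null is rejected; -1 if never rejected."""
    y = np.asarray(y, dtype=float)
    for order in range(max_order + 1):
        if df_tstat(y) < critical:  # reject unit root: series looks stationary
            return order
        y = np.diff(y)  # otherwise difference once more and retest
    return -1
```

A white-noise series should return 0 and a random walk with drift should typically return 1, since its first difference is stationary.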

finds.recipes.econs.least_squares(data: DataFrame, y: List[str] = ['y'], x: List[str] = ['x'], add_constant: bool = True, stdres: bool = False) → Series | DataFrame

Compute least-squares coefficients; supports groupby().apply

Parameters:

data – DataFrame with x and y series in columns
x – List of x columns
y – List of y columns
add_constant – Whether to add intercept as first column
stdres – Whether to output residual stdev

Returns:

Series (if only single y input) of regression coefficients, or DataFrame (multiple y) with coeffs, and optionally stdres, in columns
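The groupby().apply pattern mentioned above might look like the following sketch. The helper `ols_coefficients` is a hypothetical stand-in for a single-y regression, and the `const` label for the intercept is an assumption, not necessarily the name the library uses.

```python
import numpy as np
import pandas as pd

def ols_coefficients(data, y="y", x=("x",), add_constant=True):
    """Hypothetical helper: OLS coefficients of one y column on the x columns."""
    names = list(x)
    X = data[names].to_numpy(dtype=float)
    if add_constant:
        # Intercept goes in the first column
        X = np.column_stack([np.ones(len(X)), X])
        names = ["const"] + names  # assumed label
    beta, *_ = np.linalg.lstsq(X, data[y].to_numpy(dtype=float), rcond=None)
    return pd.Series(beta, index=names)

# Two groups with different exact linear relationships
xs = np.arange(10.0)
df = pd.DataFrame({
    "g": ["a"] * 5 + ["b"] * 5,
    "x": xs,
    "y": np.where(xs < 5, 1.0 + 2.0 * xs, -1.0 + 0.5 * xs),
})

# One row of coefficients per group
coefs = df.groupby("g")[["x", "y"]].apply(ols_coefficients)
```

Because the function returns a Series with the same index for every group, pandas assembles the per-group results into a DataFrame with one row per group.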

finds.recipes.econs.mrsq(X: DataFrame, kmax: int) → DataFrame

Return marginal R2 of each variable from incrementally adding factors

Parameters:

X – T observations/samples in rows, N variables/features in columns
kmax – maximum number of factors. If 0, set to rank from SVD

Returns:

DataFrame of marginal R2, with one component per column

Notes:

Based on MATLAB code; see Bai and Ng (2002) and McCracken at: https://research.stlouisfed.org/econ/mccracken/fred-databases/
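One way to compute marginal R2 is from the SVD of the standardized data: because principal components are orthogonal, the share of a variable's variation added by component k is its squared loading scaled by the squared singular value. The sketch below assumes this construction; the helper name `marginal_r2` and the plain-array return type are hypothetical.

```python
import numpy as np

def marginal_r2(X, kmax=0):
    """Hypothetical helper: marginal R2 of each variable for each component."""
    X = np.asarray(X, dtype=float)
    Z = (X - X.mean(axis=0)) / X.std(axis=0)
    U, s, Vt = np.linalg.svd(Z, full_matrices=False)
    k = kmax or np.linalg.matrix_rank(Z)  # kmax = 0 falls back to the rank
    # Sum of squares of variable j explained by component i: (s[i] * Vt[i, j])**2
    explained = (s[:k, None] * Vt[:k]) ** 2
    total = np.sum(Z ** 2, axis=0)
    # Rows are variables, columns are components
    return (explained / total).T
```

When all components are kept, each row sums to one, which is a quick sanity check on the decomposition.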

finds.recipes.econs.select_baing(X: DataFrame, kmax: int = 0, p: int = 2) → int

Determine number of factors based on Bai & Ng (2002) info criterion

Parameters:

X – T observations/samples in rows, N variables/features in columns
p – int in [1, 2, 3] to use PCp1 or PCp2 or PCp3 penalty
kmax – Maximum number of factors. If 0, set to rank from SVD

Returns:

best number of factors based on ICp{p} criterion, or 0 if not determined

Notes:

The residual variance from adding components is computed directly from the eigenvalues, so no projections need to be computed.
The IC curve appears to have multiple minima: the first “local” minimum is selected. A prior bound on the number of factors may be desirable.
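The two notes above can be sketched as follows, using the ICp criteria of Bai and Ng (2002): the residual variance V(k) is the sum of the eigenvalues beyond the first k, and the chosen k is the first local minimum of the IC curve. The helper names `baing_ic` and `first_local_minimum`, the standardization step, and the exact local-minimum rule are assumptions for illustration; `kmax` must be smaller than min(N, T) here.

```python
import numpy as np

def baing_ic(X, kmax, p=2):
    """Hypothetical helper: ICp{p} values for k = 0, ..., kmax factors."""
    X = np.asarray(X, dtype=float)
    T, N = X.shape
    Z = (X - X.mean(axis=0)) / X.std(axis=0)
    # Eigenvalues of Z'Z / (N*T), from the singular values of Z
    eig = np.linalg.svd(Z, compute_uv=False) ** 2 / (N * T)
    # V(k): residual variance after k factors = sum of the remaining eigenvalues
    V = np.array([eig[k:].sum() for k in range(kmax + 1)])
    c2 = min(N, T)
    penalty = {
        1: (N + T) / (N * T) * np.log(N * T / (N + T)),
        2: (N + T) / (N * T) * np.log(c2),
        3: np.log(c2) / c2,
    }[p]
    return np.log(V) + np.arange(kmax + 1) * penalty

def first_local_minimum(ic):
    """First interior k with ic[k] below both neighbors, or 0 if none."""
    for k in range(1, len(ic) - 1):
        if ic[k] < ic[k - 1] and ic[k] < ic[k + 1]:
            return k
    return 0
```

For example, `first_local_minimum(baing_ic(X, kmax=8, p=2))` returns the selected number of factors under these assumptions, or 0 when the curve has no interior local minimum.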