finds.recipes.econs
Economics tools
Bai and Ng (2002) and McCracken and Ng (2015, 2020) factor-model EM algorithm
Copyright 2022, Terence Lim
MIT License
- finds.recipes.econs.approximate_factors(X: DataFrame, kmax: int = 0, p: int = 2, max_iter: int = 50, tol: float = 1e-12, verbose: int = 1) → DataFrame
Fill in missing values with the factor-model EM algorithm of Bai and Ng (2002)
- Parameters:
X – T observations/samples in rows, N variables/features in columns
kmax – Maximum number of factors. If 0, set to the rank from SVD minus 1
p – If 0, the number of factors is fixed at kmax. Otherwise (p in {1, 2, 3}), selects one of the three information criteria of Bai & Ng (2002) to choose the number of factors automatically
- Returns:
DataFrame with missing values imputed by the factor EM algorithm
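The EM iteration behind this function can be sketched as follows, assuming a fixed number of factors r (the automatic selection via p is omitted) and the usual recipe of starting from column means, then alternately fitting r principal components to the standardized, completed panel and replacing only the missing cells with the fitted common component. The name `factor_em_fill`, its arguments, and the relative-change stopping rule are hypothetical, not the finds API.

```python
import numpy as np
import pandas as pd


def factor_em_fill(X: pd.DataFrame, r: int, max_iter: int = 200,
                   tol: float = 1e-10) -> pd.DataFrame:
    """Hypothetical sketch: impute NaNs in X with an r-factor EM iteration."""
    x = X.to_numpy(dtype=float)
    missing = np.isnan(x)
    # Initialize missing cells with the column means of the observed data
    filled = np.where(missing, np.nanmean(x, axis=0), x)
    prev_fit = None
    for _ in range(max_iter):
        # Standardize the current completed panel (assumes no constant columns)
        mu, sd = filled.mean(axis=0), filled.std(axis=0)
        z = (filled - mu) / sd
        # Rank-r common component from the first r principal components,
        # mapped back to the original units
        u, s, vt = np.linalg.svd(z, full_matrices=False)
        fit = (u[:, :r] * s[:r]) @ vt[:r] * sd + mu
        # Only missing cells are replaced; observed data are kept as-is
        filled = np.where(missing, fit, x)
        if prev_fit is not None:
            change = np.sum((fit - prev_fit) ** 2) / np.sum(prev_fit ** 2)
            if change < tol:
                break
        prev_fit = fit
    return pd.DataFrame(filled, index=X.index, columns=X.columns)
```

Observed entries are never overwritten, so only the imputed cells move between iterations; the fitted values are recomputed from the whole completed panel each time.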
- finds.recipes.econs.fillna_em(X: ndarray, add_intercept: bool = True, tol: float = 1e-12, maxiter: int = 200, verbose: int = 1) → Tuple[ndarray, DataFrame]
Fill in missing data with the EM algorithm, assuming a multivariate Normal distribution
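A minimal sketch of EM imputation under a multivariate Normal model, assuming rows are i.i.d.: the E-step fills each missing cell with its conditional mean given the observed cells in the same row, and the M-step re-estimates the mean vector and covariance matrix, adding back the conditional covariance of the imputed cells. The function `normal_em_fill` and its return value are hypothetical; the `add_intercept` option of `fillna_em` is not reproduced.

```python
import numpy as np


def normal_em_fill(x: np.ndarray, max_iter: int = 200, tol: float = 1e-12):
    """Hypothetical sketch: EM imputation assuming rows are i.i.d. multivariate Normal."""
    x = np.asarray(x, dtype=float)
    T, N = x.shape
    miss = np.isnan(x)
    mu = np.nanmean(x, axis=0)
    filled = np.where(miss, mu, x)
    sigma = np.cov(filled, rowvar=False, bias=True)
    for _ in range(max_iter):
        extra = np.zeros((N, N))  # sum of conditional covariances of imputed cells
        for t in range(T):
            m = miss[t]
            if not m.any():
                continue
            o = ~m
            if not o.any():  # row entirely missing: use the marginal distribution
                filled[t] = mu
                extra += sigma
                continue
            # E-step: conditional mean of missing cells given observed cells
            s_oo = sigma[np.ix_(o, o)]
            s_mo = sigma[np.ix_(m, o)]
            k = np.linalg.solve(s_oo, s_mo.T).T  # equals s_mo @ inv(s_oo)
            filled[t, m] = mu[m] + k @ (x[t, o] - mu[o])
            extra[np.ix_(m, m)] += sigma[np.ix_(m, m)] - k @ s_mo.T
        # M-step: mean and covariance of the completed data, plus the correction
        new_mu = filled.mean(axis=0)
        d = filled - new_mu
        new_sigma = (d.T @ d + extra) / T
        done = (np.abs(new_mu - mu).max() < tol
                and np.abs(new_sigma - sigma).max() < tol)
        mu, sigma = new_mu, new_sigma
        if done:
            break
    return filled, mu, sigma
```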
- finds.recipes.econs.fstats(x: Series | ndarray, tail: float = 0.15) → ndarray
Helper to compute F-statistics at all candidate break points
- Parameters:
x – Input Series
tail – Fraction of the sample at each end to exclude from the candidate break points
- Returns:
Array of F-statistics at each candidate break point
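The regression behind the F-statistics is not stated here. Assuming the simplest case, a one-time break in the mean, each candidate break point b splits the sample into x[:b] and x[b:], and the Chow-type statistic is F(b) = (SSR_r - SSR_u) / (SSR_u / (T - 2)), where SSR_r uses one common mean and SSR_u uses separate means before and after b. Break points within the tail fraction of either end are skipped. `mean_break_fstats` is a hypothetical name, and it also returns the break indices for convenience.

```python
import numpy as np


def mean_break_fstats(x, tail: float = 0.15):
    """Hypothetical sketch: Chow F-statistic for a mean shift at each candidate break."""
    x = np.asarray(x, dtype=float)
    T = len(x)
    trim = max(int(np.floor(tail * T)), 1)    # keep both segments non-empty
    breaks = np.arange(trim, T - trim + 1)    # break b splits x[:b] | x[b:]
    ssr_r = np.sum((x - x.mean()) ** 2)       # restricted: one common mean
    fstats = np.empty(len(breaks))
    for j, b in enumerate(breaks):
        a, c = x[:b], x[b:]
        ssr_u = np.sum((a - a.mean()) ** 2) + np.sum((c - c.mean()) ** 2)
        # 1 restriction; 2 parameters in the unrestricted model
        fstats[j] = (ssr_r - ssr_u) / (ssr_u / (T - 2))
    return breaks, fstats
```

The location of the largest statistic is the natural estimate of the break date.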
- finds.recipes.econs.integration_order(df: Series, verbose: bool = True, max_order: int = 5, pvalue: float = 0.05, lags: str | int = 'AIC') → int
Return the order of integration by iteratively differencing and testing for a unit root
- Parameters:
df – Input Series
verbose – Whether to display results
max_order – Maximum order of integration to test
pvalue – p-value threshold for rejecting the Dickey-Fuller unit-root null
lags – Method to automatically determine the lag length, one of {“AIC”, “BIC”, “t-stat”}; or an int (maxlag); or 0 to use maxlag = 12*(nobs/100)^{1/4}
- Returns:
Integration order, or -1 if max_order exceeded
- finds.recipes.econs.least_squares(data: DataFrame, y: List[str] = ['y'], x: List[str] = ['x'], add_constant: bool = True, stdres: bool = False) → Series | DataFrame
Compute least-squares coefficients; supports use with groupby().apply
- Parameters:
data – DataFrame with x and y series in columns
x – List of x columns
y – List of y columns
add_constant – Whether to add intercept as first column
stdres – Whether to also output the residual standard deviation
- Returns:
Series of regression coefficients if there is a single y column; otherwise a DataFrame with the coefficients, and optionally stdres, in columns
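A minimal sketch of the groupby().apply pattern, assuming NumPy's `lstsq` for the fit; `ols_coefs` and the `_intercept` label are hypothetical, and the stdres option is omitted. Because it returns a Series when there is a single y column, applying it per group yields one row of coefficients per group.

```python
import numpy as np
import pandas as pd


def ols_coefs(data: pd.DataFrame, y=("y",), x=("x",), add_constant: bool = True):
    """Hypothetical sketch: least-squares coefficients of each y column on the x columns."""
    X = data[list(x)].to_numpy(dtype=float)
    if add_constant:
        X = np.column_stack([np.ones(len(X)), X])  # intercept as the first column
    Y = data[list(y)].to_numpy(dtype=float)
    b, *_ = np.linalg.lstsq(X, Y, rcond=None)      # b has one column per y
    names = (["_intercept"] if add_constant else []) + list(x)
    coefs = pd.DataFrame(b.T, index=list(y), columns=names)
    # Single y: return a Series so groupby().apply stacks groups as rows
    return coefs.iloc[0] if len(y) == 1 else coefs


# Usage: one regression per group
df = pd.DataFrame({"g": ["a"] * 5 + ["b"] * 5, "x": list(range(5)) * 2})
df["y"] = np.where(df["g"] == "a", 1 + 2 * df["x"], -1 + 0.5 * df["x"])
coef_by_group = df.groupby("g")[["y", "x"]].apply(ols_coefs)
```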
- finds.recipes.econs.mrsq(X: DataFrame, kmax: int) → DataFrame
Return the marginal R² of each variable from incrementally adding factors
- Parameters:
X – T observations/samples in rows, N variables/features in columns
kmax – Maximum number of factors. If 0, set to rank from SVD
- Returns:
DataFrame of marginal R², with one row per variable and one column per component
Notes:
- Adapted from the MATLAB code of Bai and Ng (2002) and of McCracken, available at
https://research.stlouisfed.org/econ/mccracken/fred-databases/
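Because principal-component factors are mutually orthogonal, the R² from regressing a standardized variable z_i on factor k alone reduces to (s_k v_ik)² / ||z_i||², where s_k is the k-th singular value and v_ik the loading of variable i. The sketch below uses that shortcut, assuming column standardization with population standard deviations; `marginal_rsq` is a hypothetical name. Summed over all components, each variable's marginal R² values add up to 1, which makes a convenient check.

```python
import numpy as np
import pandas as pd


def marginal_rsq(X: pd.DataFrame, kmax: int) -> pd.DataFrame:
    """Hypothetical sketch: R^2 of each variable on each principal-component factor."""
    z = ((X - X.mean()) / X.std(ddof=0)).to_numpy()
    u, s, vt = np.linalg.svd(z, full_matrices=False)
    # Factor k is u[:, k]; its inner product with column i is s[k] * vt[k, i]
    explained = (s[:kmax, None] * vt[:kmax]) ** 2    # shape (kmax, N)
    rsq = explained / np.sum(z ** 2, axis=0)        # divide by each column's total SS
    return pd.DataFrame(rsq.T, index=X.columns,
                        columns=[f"F{k + 1}" for k in range(kmax)])
```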
- finds.recipes.econs.select_baing(X: DataFrame, kmax: int = 0, p: int = 2) → int
Determine the number of factors using the Bai & Ng (2002) information criterion
- Parameters:
X – T observations/samples in rows, N variables/features in columns
kmax – Maximum number of factors. If 0, set to rank from SVD
p – int in [1, 2, 3] selecting the penalty of the ICp1, ICp2, or ICp3 criterion
- Returns:
Best number of factors based on the ICp{p} criterion, or 0 if not determined
Notes:
Simplified the calculation of residual variance from adding components: it is given directly by the eigenvalues, so there is no need to compute projections.
The IC curve can have multiple local minima: the first local minimum is selected, so a prior bound on the number of factors may be advisable.
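A sketch of the ICp criteria using the eigenvalue shortcut from the Notes: with standardized data, the residual variance after k factors is V(k) = (λ_{k+1} + ... + λ_N)/(NT), and IC_p(k) = ln V(k) + k * g_p(N, T), with penalties g1 = ((N+T)/(NT)) ln(NT/(N+T)), g2 = ((N+T)/(NT)) ln C²_NT, and g3 = ln(C²_NT)/C²_NT, where C²_NT = min(N, T), as in Bai and Ng (2002). Following the Notes, the first local minimum over k = 0..kmax is returned. `select_factors_icp` is a hypothetical name, and it assumes kmax is below the rank of X so that V(k) stays positive.

```python
import numpy as np
import pandas as pd


def select_factors_icp(X: pd.DataFrame, kmax: int, p: int = 2) -> int:
    """Hypothetical sketch: Bai-Ng ICp factor count via eigenvalues of standardized X."""
    z = ((X - X.mean()) / X.std(ddof=0)).to_numpy()
    T, N = z.shape
    lam = np.linalg.svd(z, compute_uv=False) ** 2   # eigenvalues of z'z
    NT, NT1, C2 = N * T, N + T, min(N, T)
    penalty = {1: NT1 / NT * np.log(NT / NT1),
               2: NT1 / NT * np.log(C2),
               3: np.log(C2) / C2}[p]
    # V(k): residual variance after k factors = remaining eigenvalues / (NT)
    V = np.array([lam[k:].sum() for k in range(kmax + 1)]) / NT
    ic = np.log(V) + np.arange(kmax + 1) * penalty
    for k in range(kmax):   # first local minimum, as described in the Notes
        if ic[k] <= ic[k + 1]:
            return k
    return 0                # not determined within kmax
```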