Approximate Factor Models#

It is better to be vaguely right than precisely wrong - Carveth Read

Approximate factor models provide a simplified yet effective way to represent time series data by identifying common factors that explain variation across observed variables. A critical step in time series analysis is ensuring stationarity, typically assessed using tests like the Augmented Dickey-Fuller (ADF) test. Transformations such as differencing and logarithmic scaling are often applied to remove non-stationary components. Principal Component Analysis (PCA) is commonly used for factor extraction, and the optimal number of components can be selected using criteria like the Bayesian Information Criterion (BIC). To handle missing data, imputation methods such as the Expectation-Maximization (EM) algorithm are used.

# By: Terence Lim, 2020-2025 (terence-lim.github.io)
import numpy as np
import pandas as pd
from pandas import DataFrame, Series
import matplotlib.pyplot as plt
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler
from finds.readers import Alfred, fred_md, fred_qd
from finds.recipes import (approximate_factors, mrsq, select_baing, integration_order,
                           remove_outliers, is_outlier)
from secret import credentials
VERBOSE = 0
#%matplotlib qt
alf = Alfred(api_key=credentials['fred']['api_key'], verbose=VERBOSE)
## Retrieve recession periods from FRED
vspans = alf.date_spans('USREC')
DataFrame(vspans, columns=['Start', 'End'])
Start End
0 1854-12-31 1854-12-31
1 1857-06-30 1858-12-31
2 1860-10-31 1861-06-30
3 1865-04-30 1867-12-31
4 1869-06-30 1870-12-31
5 1873-10-31 1879-03-31
6 1882-03-31 1885-05-31
7 1887-03-31 1888-04-30
8 1890-07-31 1891-05-31
9 1893-01-31 1894-06-30
10 1895-12-31 1897-06-30
11 1899-06-30 1900-12-31
12 1902-09-30 1904-08-31
13 1907-05-31 1908-06-30
14 1910-01-31 1912-01-31
15 1913-01-31 1914-12-31
16 1918-08-31 1919-03-31
17 1920-01-31 1921-07-31
18 1923-05-31 1924-07-31
19 1926-10-31 1927-11-30
20 1929-08-31 1933-03-31
21 1937-05-31 1938-06-30
22 1945-02-28 1945-10-31
23 1948-11-30 1949-10-31
24 1953-07-31 1954-05-31
25 1957-08-31 1958-04-30
26 1960-04-30 1961-02-28
27 1969-12-31 1970-11-30
28 1973-11-30 1975-03-31
29 1980-01-31 1980-07-31
30 1981-07-31 1982-11-30
31 1990-07-31 1991-03-31
32 2001-03-31 2001-11-30
33 2007-12-31 2009-06-30
34 2020-02-29 2020-04-30

Integration order#

Augmented Dickey-Fuller test#

The Augmented Dickey-Fuller (ADF) test is one of the most widely used methods for detecting unit roots in a time series. It operates by performing an ordinary least squares (OLS) regression, where the first difference of the series is regressed on its lagged level, along with deterministic components and lagged differences. The general form of the ADF regression is:

\[ \Delta Y_t = \gamma Y_{t-1} + (\delta_0 + \delta_1 t) + \lambda_1 \Delta Y_{t-1} + ... + \lambda_p \Delta Y_{t-p} \]

If \(\gamma = 0 \), then \( Y_t \) follows a random walk and is non-stationary, implying the presence of a unit root. The alternative hypothesis, \( \gamma < 0 \), suggests that \( Y_t \) is covariance-stationary. This is a one-sided test, as positive values of \( \gamma \) would indicate an autoregressive coefficient greater than one, leading to instability.

When the test fails to reject the null hypothesis of a unit root, differencing is required to transform the series into a stationary form. Best practice involves repeatedly applying the ADF test to the differenced data until stationarity is achieved.

qd_df, qd_codes = fred_qd() # 202004
md_df, md_codes = fred_md() # 201505
qd_date = max(qd_df.index)
md_date = max(md_df.index)
FRED-QD vintage: quarterly/current.csv
FRED-MD vintage: monthly/current.csv

Transformations#

The FRED-MD and FRED-QD datasets include transformation codes that indicate appropriate preprocessing steps, such as applying logarithmic transformations or differencing, to ensure stationarity. The number of differences required depends on the presence of unit roots.

print(f"Number of series by suggested tcode transformations ({md_date}):")
tcodes = DataFrame.from_dict({i: alf.tcode[i] for i in range(1, 8)},
                             orient='index')\
                  .join(qd_codes['transform'].value_counts().rename('fred-qd'))\
                  .join(md_codes['transform'].value_counts().rename('fred-md'))\
                  .fillna(0)\
                  .astype(int)\
                  .rename_axis(index='tcode')
tcodes
Number of series by suggested tcode transformations (20250131):
diff log pct_change fred-qd fred-md
tcode
1 0 0 0 22 11
2 1 0 0 32 19
3 2 0 0 0 0
4 0 1 0 0 10
5 1 1 0 140 52
6 2 1 0 49 33
7 1 0 1 1 1
# For each series, compare fitted integration order with tcode
out = {}
stationary_out = {}
for label, df, transforms in [['md', md_df, md_codes['transform']],
                              ['qd', qd_df, qd_codes['transform']]]:
    stationary = dict()
    for series_id, tcode in transforms.items():
        # apply transformation if series tcode is valid
        if tcode in range(1, 8):
            # take logs of series if specified by tcode
            s = np.log(df[series_id]) if tcode in [4, 5, 6] else df[series_id]

            # estimate integration order
            order = integration_order(s.dropna(), pvalue=0.05)

            # expected order specified by tcode
            expected_order = 2 if tcode == 7 else ((tcode - 1) % 3)

            # accumulate results for this series
            stationary[series_id] = {'tcode': tcode,
                                     'I(p)': order,
                                     'different': order - expected_order,
                                     'title': alf.header(series_id)}
#            print(series_id, tcode, expected_order, order)

    # collect results for display
    stationary = DataFrame.from_dict(stationary, orient='index')
    stationary = stationary.sort_values(stationary.columns.to_list())
    c = stationary.groupby(['tcode','I(p)'])['title'].count().reset_index()
    out[label] = c.pivot(index='tcode', columns='I(p)', values='title')\
                  .fillna(0).astype(int)
    out[label].columns=[f"I({p})" for p in out[label].columns]
    stationary_out[label] = stationary[stationary['different'] > 0]
print('Series by tcode, transformations and estimated order of integration:')
results = pd.concat([tcodes.drop(columns='fred-md'),
                     out['qd'],
                     tcodes['fred-md'],
                     out['md']], axis=1).fillna(0).astype(int)
print('Integration order by transformation')
results
Series by tcode, transformations and estimated order of integration:
Integration order by transformation
diff log pct_change fred-qd I(0) I(1) I(2) fred-md I(0) I(1) I(2)
tcode
1 0 0 0 22 17 5 0 11 11 0 0
2 1 0 0 32 11 19 2 19 4 15 0
3 2 0 0 0 0 0 0 0 0 0 0
4 0 1 0 0 0 0 0 10 7 3 0
5 1 1 0 140 30 107 3 52 14 38 0
6 2 1 0 49 0 29 20 33 0 30 3
7 1 0 1 1 0 1 0 1 0 1 0
print('FRED-MD Series with unit root after transformation')
stationary_out['md']
FRED-MD Series with unit root after transformation
tcode I(p) different title
PERMITMW 4 1 1 New Privately-Owned Housing Units Authorized i...
HOUSTMW 4 1 1 New Privately-Owned Housing Units Started: Tot...
HOUSTNE 4 1 1 New Privately-Owned Housing Units Started: Tot...
print('FRED-QD Series with unit root after transformation')
stationary_out['qd']
FRED-QD Series with unit root after transformation
tcode I(p) different title
GS1TB3M 1 1 1 *** GS1TB3M ***
NWPI 1 1 1 *** NWPI ***
TLBSNNBBDI 1 1 1 *** TLBSNNBBDI ***
TLBSNNCBBDI 1 1 1 *** TLBSNNCBBDI ***
HWI 1 1 1 Help Wanted Index for United States
GFDEBTN 2 2 1 Federal Debt: Total Public Debt
S&P div yield 2 2 1 S&P's Composite Common Stock: Dividend Yield
CES2000000008 5 2 1 Average Hourly Earnings of Production and Nons...
TLBSHNO 5 2 1 Households and Nonprofit Organizations; Total ...
OPHMFG 5 2 1 Manufacturing Sector: Labor Productivity (Outp...

Factor selection#

Bai and Ng (2002) provide a systematic approach for extracting and determining the optimal number of factors in an approximate factor model of time series, and imputing missing or outlier observations in the dataset.

Principal component analysis (PCA)#

Principal component analysis finds a low-dimensional representation of a data set that contains as much as possible of the variation of its \(p\) features. Each principal component is a linear combination of the original features, capturing the most significant patterns in the data.

The first principal component is the normalized linear combination of features:

\[ Z_1 = \phi_{11} X_1 + \phi_{21} X_2 + ... + \phi_{p1} X_p \]

which maximizes variance. Subsequent principal components, such as \( Z_2 \) and \( Z_3 \), are determined iteratively, ensuring that each is uncorrelated with the previously computed components while explaining the next highest level of variance.

BIC criterion#

Determining the appropriate number of factors is a crucial step in approximate factor modeling. The Bayesian Information Criterion (BIC) selects the optimal number of components by balancing model fit and complexity. Lower BIC values indicate a better trade-off between variance explanation and model parsimony.

Data imputation#

When dealing with missing data in time series, the Expectation-Maximization (EM) algorithm can be employed to estimate missing values from factors iteratively. After each imputation iteration, factors are recovered as projections from PCA, ensuring that the underlying structure of the data is captured.

# Verify BaiNg implemention on original FRED-MD and FRED-QD
qd_df, qd_codes = fred_qd('assets/FRED-QD_2020m04.csv', url='')
md_df, md_codes = fred_md('assets/FRED-MD_2015m5.csv', url='')
for freq, df, transforms in [['monthly', md_df, md_codes['transform']],
                             ['quarterly', qd_df, qd_codes['transform']]]:    
    # Apply tcode transformations
    transformed = []
    for col in df.columns:
        transformed.append(alf.transform(df[col],
                                         tcode=transforms[col],
                                         freq=freq[0]))
    data = pd.concat(transformed, axis=1).iloc[2:]
    cols = list(data.columns)
    sample = data.index[((np.count_nonzero(np.isnan(data), axis=1)==0)
                         | (data.index <= 20141231))
                        & (data.index >= 19600301)]

    # set missing and outliers to NaN
    x = data.loc[sample]
    x = remove_outliers(x)    # default fence 'iq10' is 10 times IQ 

    # compute factors EM and auto select number of components, r
    Z = approximate_factors(x, p=2, verbose=VERBOSE)
    r = select_baing(Z, p=2)

    # show marginal R2's of series to each component
    mR2 = mrsq(Z, r).to_numpy()
    print(f"FRED-{freq[0].upper()}D {freq} series:")
    print(DataFrame({'selected': r,
                     'variance explained': np.sum(np.mean(mR2[:, :r], axis=0)),
                     'start': min(sample),
                     'end': max(sample),
                     'obs': Z.shape[0],
                     'series': Z.shape[1]},
                    index=[f'factors']))

    for k in range(r):
        args = np.argsort(-mR2[:, k])
        print(f"Factor:{1+k} Variance Explained={np.mean(mR2[:,k]):.4f}")
        print(DataFrame.from_dict({mR2[arg, k].round(4):
                                   {'series': cols[arg],
                                    'description': alf.header(cols[arg])}
                                   for arg in args[:10]},
                                  orient='index'))
FRED-QD vintage: assets/FRED-QD_2020m04.csv
FRED-MD vintage: assets/FRED-MD_2015m5.csv
FRED-MD monthly series:
         selected  variance explained     start       end  obs  series
factors         8            0.485832  19600331  20141231  658     134
Factor:1 Variance Explained=0.1613
           series                                 description
0.7424     USGOOD              All Employees, Goods-Producing
0.7235     PAYEMS                All Employees, Total Nonfarm
0.7002     MANEMP                All Employees, Manufacturing
0.6565       NAPM                                *** NAPM ***
0.6540  IPMANSICS  Industrial Production: Manufacturing (SIC)
0.6513    DMANEMP                All Employees, Durable Goods
0.6314     INDPRO          Industrial Production: Total Index
0.6037    NAPMNOI                             *** NAPMNOI ***
0.6026     NAPMPI                              *** NAPMPI ***
0.5601     CUMFNS   Capacity Utilization: Manufacturing (SIC)
Factor:2 Variance Explained=0.0703
          series                                        description
0.6259   T10YFFM  10-Year Treasury Constant Maturity Minus Feder...
0.6164    AAAFFM  Moody's Seasoned Aaa Corporate Bond Minus Fede...
0.5856    BAAFFM  Moody's Seasoned Baa Corporate Bond Minus Fede...
0.5832    T5YFFM  5-Year Treasury Constant Maturity Minus Federa...
0.4785  TB6SMFFM     6-Month Treasury Bill Minus Federal Funds Rate
0.4779  TB3SMFFM     3-Month Treasury Bill Minus Federal Funds Rate
0.4322    T1YFFM  1-Year Treasury Constant Maturity Minus Federa...
0.2367  COMPAPFF            3-Month Commercial Paper Minus FEDFUNDS
0.2258    BUSINV                         Total Business Inventories
0.1896   NAPMPRI                                    *** NAPMPRI ***
Factor:3 Variance Explained=0.0652
                 series                                        description
0.7192      CUSR0000SAC  Consumer Price Index for All Urban Consumers: ...
0.7066  DNDGRG3M086SBEA  Personal consumption expenditures: Nondurable ...
0.6700         CPIAUCSL  Consumer Price Index for All Urban Consumers: ...
0.6392    CUSR0000SA0L5  Consumer Price Index for All Urban Consumers: ...
0.6115    CUUR0000SA0L2  Consumer Price Index for All Urban Consumers: ...
0.5923            PCEPI  Personal Consumption Expenditures: Chain-type ...
0.5907         CPITRNSL  Consumer Price Index for All Urban Consumers: ...
0.5482         CPIULFSL  Consumer Price Index for All Urban Consumers: ...
0.4863           PPIFCG  Producer Price Index by Commodity for Finished...
0.4779           PPIITM  Producer Price Index by Commodity Intermediate...
Factor:4 Variance Explained=0.0547
        series                                        description
0.4294     GS1  Market Yield on U.S. Treasury Securities at 1-...
0.4228     GS5  Market Yield on U.S. Treasury Securities at 5-...
0.4123     AAA          Moody's Seasoned Aaa Corporate Bond Yield
0.3995   TB6MS  6-Month Treasury Bill Secondary Market Rate, D...
0.3893    GS10  Market Yield on U.S. Treasury Securities at 10...
0.3621     BAA          Moody's Seasoned Baa Corporate Bond Yield
0.3179   TB3MS  3-Month Treasury Bill Secondary Market Rate, D...
0.3139    CP3M        3-Month AA Financial Commercial Paper Rates
0.2681   HOUST  New Privately-Owned Housing Units Started: Tot...
0.2562  HOUSTW  New Privately-Owned Housing Units Started: Tot...
Factor:5 Variance Explained=0.0425
          series                                        description
0.2481   PERMITW  New Privately-Owned Housing Units Authorized i...
0.2374    PERMIT  New Privately-Owned Housing Units Authorized i...
0.2316    HOUSTW  New Privately-Owned Housing Units Started: Tot...
0.2225       GS5  Market Yield on U.S. Treasury Securities at 5-...
0.2221       GS1  Market Yield on U.S. Treasury Securities at 1-...
0.2146     HOUST  New Privately-Owned Housing Units Started: Tot...
0.2028      GS10  Market Yield on U.S. Treasury Securities at 10...
0.1951  PERMITMW  New Privately-Owned Housing Units Authorized i...
0.1949    T1YFFM  1-Year Treasury Constant Maturity Minus Federa...
0.1932     TB6MS  6-Month Treasury Bill Secondary Market Rate, D...
Factor:6 Variance Explained=0.0365
          series                                        description
0.2186   IPCONGD              Industrial Production: Consumer Goods
0.1793   ISRATIO         Total Business: Inventories to Sales Ratio
0.1736    NAPMEI                                     *** NAPMEI ***
0.1628  IPDCONGD  Industrial Production: Durable Consumer Goods:...
0.1577   IPFINAL              Industrial Production: Final Products
0.1545  TB6SMFFM     6-Month Treasury Bill Minus Federal Funds Rate
0.1425      NAPM                                       *** NAPM ***
0.1416    NAPMII                                     *** NAPMII ***
0.1414    ACOGNO          Manufacturers' New Orders: Consumer Goods
0.1373   IPFPNSS  Industrial Production: Final Products and Noni...
Factor:7 Variance Explained=0.0292
               series                                        description
0.5165        S&P 500          S&P's Common Stock Price Index: Composite
0.5159    S&P: indust        S&P's Common Stock Price Index: Industrials
0.4002  S&P div yield       S&P's Composite Common Stock: Dividend Yield
0.2764   S&P PE ratio  S&P's Composite Common Stock: Price-Earnings R...
0.2564        UMCSENT         University of Michigan: Consumer Sentiment
0.1030        IPCONGD              Industrial Production: Consumer Goods
0.1019         EXCAUS  Canadian Dollars to U.S. Dollar Spot Exchange ...
0.0728        IPFINAL              Industrial Production: Final Products
0.0644       IPDCONGD  Industrial Production: Durable Consumer Goods:...
0.0565        IPFPNSS  Industrial Production: Final Products and Noni...
Factor:8 Variance Explained=0.0262
               series                                        description
0.5375       TWEXMMTH  Nominal Major Currencies U.S. Dollar Index (Go...
0.2309         EXSZUS     Swiss Francs to U.S. Dollar Spot Exchange Rate
0.2111         EXUSUK  U.S. Dollars to U.K. Pound Sterling Spot Excha...
0.1497         EXJPUS     Japanese Yen to U.S. Dollar Spot Exchange Rate
0.1332         SRVPRD                   All Employees, Service-Providing
0.1199  CES0600000008  Average Hourly Earnings of Production and Nons...
0.1156         ACOGNO          Manufacturers' New Orders: Consumer Goods
0.1128  CES3000000008  Average Hourly Earnings of Production and Nons...
0.1104         USGOVT                          All Employees, Government
0.1010        USTRADE                        All Employees, Retail Trade
FRED-QD quarterly series:
         selected  variance explained     start       end  obs  series
factors         7              0.4981  19600331  20191231  240     248
Factor:1 Variance Explained=0.2014
           series                                        description
0.8382     USPRIV                       All Employees, Total Private
0.8184     USGOOD                     All Employees, Goods-Producing
0.8165      OUTMS  Manufacturing Sector: Real Sectoral Output for...
0.8124  IPMANSICS         Industrial Production: Manufacturing (SIC)
0.8120     PAYEMS                       All Employees, Total Nonfarm
0.8057     INDPRO                 Industrial Production: Total Index
0.7758     MANEMP                       All Employees, Manufacturing
0.7717     HOANBS  Nonfarm Business Sector: Hours Worked for All ...
0.7667    DMANEMP                       All Employees, Durable Goods
0.7651     UNRATE                                  Unemployment Rate
Factor:2 Variance Explained=0.0824
               series                                        description
0.4995         AAAFFM  Moody's Seasoned Aaa Corporate Bond Minus Fede...
0.4761         T5YFFM  5-Year Treasury Constant Maturity Minus Federa...
0.4622         PERMIT  New Privately-Owned Housing Units Authorized i...
0.4404         BUSINV                         Total Business Inventories
0.4231          HOUST  New Privately-Owned Housing Units Started: Tot...
0.4055        PERMITS  New Privately-Owned Housing Units Authorized i...
0.3925  S&P div yield       S&P's Composite Common Stock: Dividend Yield
0.3849            TCU                  Capacity Utilization: Total Index
0.3674      CPF3MTB3M  3-Month Commercial Paper Minus 3-Month Treasur...
0.3633       GS10TB3M                                   *** GS10TB3M ***
Factor:3 Variance Explained=0.0727
                 series                                        description
0.7569    CUSR0000SA0L2  Consumer Price Index for All Urban Consumers: ...
0.7405      CUSR0000SAC  Consumer Price Index for All Urban Consumers: ...
0.7368  DGDSRG3Q086SBEA  Personal consumption expenditures: Goods (chai...
0.7212          PCECTPI  Personal Consumption Expenditures: Chain-type ...
0.7065         CPITRNSL  Consumer Price Index for All Urban Consumers: ...
0.6963  DNDGRG3Q086SBEA  Personal consumption expenditures: Nondurable ...
0.6798    CUSR0000SA0L5  Consumer Price Index for All Urban Consumers: ...
0.6712         CPIAUCSL  Consumer Price Index for All Urban Consumers: ...
0.6472          WPSID61  Producer Price Index by Commodity: Intermediat...
0.6352         CPIULFSL  Consumer Price Index for All Urban Consumers: ...
Factor:4 Variance Explained=0.0467
               series                                        description
0.4044          IMFSL    Institutional Money Market Funds (DISCONTINUED)
0.3457  CES9093000001                    All Employees, Local Government
0.3259  CES9092000001                    All Employees, State Government
0.2484         EXUSEU            U.S. Dollars to Euro Spot Exchange Rate
0.2482         USGOVT                          All Employees, Government
0.2364        GFDEBTN                    Federal Debt: Total Public Debt
0.2251        REVOLSL    Revolving Consumer Credit Owned and Securitized
0.2202         USSERV                      All Employees, Other Services
0.2128        COMPRMS  Manufacturing Sector: Real Hourly Compensation...
0.2126           NWPI                                       *** NWPI ***
Factor:5 Variance Explained=0.0375
             series                                        description
0.3664       OPHMFG  Manufacturing Sector: Labor Productivity (Outp...
0.3093          HWI                Help Wanted Index for United States
0.2989         NWPI                                       *** NWPI ***
0.2961       AWHMAN  Average Weekly Hours of Production and Nonsupe...
0.2703       OPHPBS  Business Sector: Labor Productivity (Output pe...
0.2358       OPHNFB  Nonfarm Business Sector: Labor Productivity (O...
0.2307     UNRATELT                                   *** UNRATELT ***
0.2189      UNLPNBS  Nonfarm Business Sector: Unit Nonlabor Payment...
0.2131       ULCMFG  Manufacturing Sector: Unit Labor Costs for All...
0.1989  TLBSNNCBBDI                                *** TLBSNNCBBDI ***
Factor:6 Variance Explained=0.0303
                 series                                        description
0.2669           CONSPI    Nonrevolving consumer credit to Personal Income
0.2388            ULCBS  Business Sector: Unit Labor Costs for All Workers
0.2379           ULCNFB  Nonfarm Business Sector: Unit Labor Costs for ...
0.2173         CONSUMER               Consumer Loans, All Commercial Banks
0.2016           EXUSEU            U.S. Dollars to Euro Spot Exchange Rate
0.1979           AHETPI  Average Hourly Earnings of Production and Nons...
0.1865         NONREVSL  Nonrevolving Consumer Credit Owned and Securit...
0.1613          TOTALSL        Total Consumer Credit Owned and Securitized
0.1453  B020RE1Q156NBEA  Shares of gross domestic product: Exports of g...
0.1361          UMCSENT         University of Michigan: Consumer Sentiment
Factor:7 Variance Explained=0.0270
             series                                        description
0.2663   USEPUINDXM  Economic Policy Uncertainty Index for United S...
0.1954     TNWBSHNO  Households and Nonprofit Organizations; Net Wo...
0.1889      TABSHNO  Households and Nonprofit Organizations; Total ...
0.1872      S&P 500          S&P's Common Stock Price Index: Composite
0.1856  S&P: indust        S&P's Common Stock Price Index: Industrials
0.1808    TFAABSHNO  Households and Nonprofit Organizations; Total ...
0.1756    NASDAQCOM                             NASDAQ Composite Index
0.1623     GS10TB3M                                   *** GS10TB3M ***
0.1394       OPHPBS  Business Sector: Labor Productivity (Output pe...
# Compute approximate factors from current FRED-MD
data, t = fred_md()    # fetch dataframe of current fred-md and transform codes
data.index = pd.to_datetime(data.index, format='%Y%m%d')
transforms = t['transform']
data = pd.concat([alf.transform(data[col], tcode=transforms[col], freq='m')
                  for col in data.columns],  axis=1)

# remove outliers and impute using Bai-Ng approach
r = 8   # fix number of factors = 8
X = remove_outliers(data, method='farout')
X = approximate_factors(X, kmax=r, p=0, verbose=VERBOSE)
FRED-MD vintage: monthly/current.csv

Recover factors as the projections from PCA

# Extract factors from PCA projections on imputed data
y = StandardScaler().fit_transform(X)
pca = PCA(n_components=r).fit(y)
factors = DataFrame(pca.transform(y), index=data.index, columns=range(1, 1+r))
# Plot extracted factors
fig, axes = plt.subplots(nrows=4, ncols=2, figsize=(10, 12), sharex=True, sharey=True)
axes = axes.flatten()
for col, ax in zip(factors.columns, axes):
    flip = -np.sign(max(factors[col]) + min(factors[col])) # try match sign
    (flip*factors[col]).plot(ax=ax, color=f"C{col}")
    for a,b in vspans:
        if b >= min(factors.index):
            ax.axvspan(max(a, min(factors.index)),
                       min(b, max(factors.index)),
                       alpha=0.4,
                       color='grey')
    ax.spines["top"].set_visible(False)
    ax.spines["right"].set_visible(False)
    ax.legend([f"Factor {col} Estimate", 'NBER Recession'])
plt.tight_layout(rect=[0, 0.03, 1, 0.95])
plt.suptitle(f"Factor Estimates {factors.index[0]:%b-%Y}:"
             f"{factors.index[-1]:%b-%Y}", fontsize='x-large')
plt.show()
_images/ae6a308fbcc54fba70b4e184d03ca2f1bfe3f7a665ff1cd70b2376df62f63691.png

References:

Bai, Jushan and Ng, Serena, 2002, Determining the Number of Factors in Approximate Factor Models, Econometrica 70:1, 191-2211

McCracken, Michael W., and Serena Ng, 2016. FRED-MD: A monthly database for macroeconomic research, Journal of Business & Economic Statistics, 34(4), 574-589.

Michael W. McCracken, Serena Ng, 2020, FRED-QD: A Quarterly Database for Macroeconomic Research.