Economic Indicators#

What we learn from history is that people don’t learn from history - Warren Buffett

Economic data is fundamental to financial analysis, policymaking, and investment strategies. However, many economic indicators are subject to revision: initial estimates change over time as more complete data becomes available. Understanding these revisions is crucial for interpreting past economic conditions, refining forecasting models, and making informed decisions. We explore retrieving data from online sources such as Federal Reserve Economic Data (FRED), its archival counterpart (ALFRED), and key derived datasets such as FRED-MD and FRED-QD. We also examine the impact of data revisions on closely watched indicators such as Total Nonfarm Payrolls (PAYEMS), and methods for detecting outliers in historical data.

# By: Terence Lim, 2020-2025 (terence-lim.github.io)
import numpy as np
import pandas as pd
from pandas import DataFrame, Series
import matplotlib.pyplot as plt
import textwrap
from finds.readers import Alfred, fred_md, fred_qd
from finds.utils import plot_date, plot_groupbar
from finds.recipes import is_outlier
from datetime import datetime
from pprint import pprint
from secret import credentials
VERBOSE = 0
# %matplotlib qt

FRED#

Federal Reserve Economic Data (FRED) is a widely used online database maintained by the Federal Reserve Bank of St. Louis, providing access to hundreds of thousands of economic data series from national and international sources. Users can retrieve data via the website, an Excel add-in, or API calls.

Retrieving data from websites#

Economic data can be retrieved from the web through several methods:

  1. Downloading structured files – Many websites provide data in formats like CSV, Excel, or JSON, making it easy to import into analytical tools.

  2. Web scraping – Extracting information directly from web pages by identifying specific HTML tags or text patterns.

  3. Using APIs – Some platforms, including FRED, offer APIs that allow developers to automate data retrieval via structured queries.

Download structured files#

Many economic data providers allow users to download pre-structured files containing historical and current data. These files often include metadata, timestamps, and adjustment information.

# This URL is the location of the FRED-MD csv file from the St Louis FRED
url = 'https://www.stlouisfed.org/-/media/project/frbstl/stlouisfed/research/fred-md/monthly/current.csv'
# Pandas has several built-in readers for csv, xml, json, excel and even html files          
df = pd.read_csv(url, header=0)
df
sasdate RPI W875RX1 DPCERA3M086SBEA CMRMTSPLx RETAILx INDPRO IPFPNSS IPFINAL IPCONGD ... DNDGRG3M086SBEA DSERRG3M086SBEA CES0600000008 CES2000000008 CES3000000008 UMCSENTx DTCOLNVHFNM DTCTHFNM INVEST VIXCLSx
0 Transform: 5.000 5.0 5.000 5.000000e+00 5.00000 5.0000 5.0000 5.0000 5.0000 ... 6.000 6.000 6.00 6.00 6.00 2.0 6.00 6.00 6.0000 1.0000
1 1/1/1959 2583.560 2426.0 15.188 2.766768e+05 18235.77392 21.9616 23.3868 22.2620 31.6664 ... 18.294 10.152 2.13 2.45 2.04 NaN 6476.00 12298.00 84.2043 NaN
2 2/1/1959 2593.596 2434.8 15.346 2.787140e+05 18369.56308 22.3917 23.7024 22.4549 31.8987 ... 18.302 10.167 2.14 2.46 2.05 NaN 6476.00 12298.00 83.5280 NaN
3 3/1/1959 2610.396 2452.7 15.491 2.777753e+05 18523.05762 22.7142 23.8459 22.5651 31.8987 ... 18.289 10.185 2.15 2.45 2.07 NaN 6508.00 12349.00 81.6405 NaN
4 4/1/1959 2627.446 2470.0 15.435 2.833627e+05 18534.46600 23.1981 24.1903 22.8957 32.4019 ... 18.300 10.221 2.16 2.47 2.08 NaN 6620.00 12484.00 81.8099 NaN
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
788 8/1/2024 20007.209 16322.1 121.052 1.530317e+06 710038.00000 103.0135 100.9825 100.9803 102.2118 ... 119.653 128.291 31.26 35.81 27.97 67.9 551667.22 933066.90 5327.6461 19.6750
789 9/1/2024 20044.142 16333.7 121.690 1.541305e+06 716388.00000 102.5969 100.3826 100.0630 101.9696 ... 119.220 128.682 31.44 36.00 28.11 70.1 553347.06 934283.59 5368.5818 17.6597
790 10/1/2024 20128.752 16397.9 121.948 1.539382e+06 720393.00000 102.0854 99.5434 98.9267 101.3127 ... 119.064 129.169 31.55 36.22 28.14 70.5 554377.25 937299.96 5407.3304 19.9478
791 11/1/2024 20161.687 16432.8 122.519 1.544190e+06 725925.00000 102.2549 99.8216 99.4970 101.7893 ... 119.112 129.375 31.61 36.21 28.29 71.8 555000.61 938899.31 5382.4019 15.9822
792 12/1/2024 20184.060 16457.8 123.013 NaN 729191.00000 103.1942 100.5351 100.1302 102.2582 ... 119.689 129.760 31.73 36.44 28.34 74.0 NaN NaN 5370.6184 15.6997

793 rows × 127 columns
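As the output shows, the first data row of the file holds a "Transform:" code for each series rather than an observation, so it is usually separated out before analysis. The following is a minimal sketch of that step, assuming a tiny inline CSV that imitates the file layout (the values are copied from the output above; the variable names are illustrative only):

```python
import io
import pandas as pd

# Small stand-in for the FRED-MD layout: header row, "Transform:" row, then data
sample_csv = """sasdate,RPI,INDPRO
Transform:,5,5
1/1/1959,2583.560,21.9616
2/1/1959,2593.596,22.3917
3/1/1959,2610.396,22.7142
"""

raw = pd.read_csv(io.StringIO(sample_csv), header=0)

# The first row holds the transformation code of each series
transform_codes = raw.iloc[0, 1:].astype(int)

# The remaining rows are observations; parse month/day/year strings into a date index
observations = raw.iloc[1:].copy()
observations['sasdate'] = pd.to_datetime(observations['sasdate'], format='%m/%d/%Y')
observations = observations.set_index('sasdate').astype(float)

print(transform_codes.to_dict())
print(observations)
```

The same two steps should apply to the full file once it has been read with `pd.read_csv(url, header=0)`.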

Web scraping#

Web scraping involves extracting data from unstructured web pages by identifying patterns in the HTML structure. This method is useful when structured data files are unavailable, but it requires compliance with website policies.

# URL that displays the most popular series on the FRED economic data website
url = "https://fred.stlouisfed.org/tags/series?ob=pv&pageID=1"
# use requests package to retrieve the web page
import requests
data = requests.get(url)
data    # a response code of 200 indicates the request has succeeded
<Response [200]>
# the content is just a byte-string that you can parse with Python string (or other) methods
data.content[:200]
b'<!DOCTYPE html>\n<html lang="en">\n<head>\n    <meta http-equiv="X-UA-Compatible" content="IE=edge">\n    <meta charset="utf-8">\n                <title>Economic Data Series by Tag | FRED | St. Louis Fed</'
# use the BeautifulSoup package to parse html formats
from bs4 import BeautifulSoup
soup = BeautifulSoup(data.content, 'lxml')

# based on this snippet, we want to extract the href property of the series-title class tag
print(soup.decode()[39000:40000])
s="series-title pager-series-title-gtm" href="/series/T10Y2Y" id="titleLink" style="font-size:1.2em; padding-bottom: 2px">10-Year Treasury Constant Maturity Minus 2-Year Treasury Constant Maturity</a></h3>
</div>
<div class="display-results-popularity-bar d-none d-sm-block col-sm-2">
<span aria-label="popularity 100% popular" class="popularity-bar-span-parent" data-target="popularity-bar-span-T10Y2Y" tabindex="0" title="100% popular">
<span aria-hidden="true" class="popularity_bar" style="padding-top: 3px; padding-left:60px;"> </span> <span aria-hidden="true" class="popularity_bar_background" id="popularity-bar-span-T10Y2Y"> </span></span>
</div>
</td>
</tr>
<tr class="series-pager-attr">
<td colspan="2">
<div class="series-meta series-group-meta">
<span class="attributes">Percent, Not Seasonally Adjusted</span>
<br class="clear"/>
</div>
<div class="series-meta">
<input aria-labelledby="unitLinkT10Y2Y" class="pager-item-checkbox pager-check-series-gtm" name="sids[0]" type="checkbox" v
# identify all the <a> tags that have 'series-title' among their classes
tags = soup.find_all(name='a', attrs={'class': 'series-title'})
tags[0]   # show first tag found
<a class="series-title pager-series-title-gtm" href="/series/T10Y2Y" id="titleLink" style="font-size:1.2em; padding-bottom: 2px">10-Year Treasury Constant Maturity Minus 2-Year Treasury Constant Maturity</a>
# extract desired substring (which is a data series mnemonic) from the href property
details = [tag.get('href').split('/')[-1] for tag in tags]  # only want substring after last '/'
details[0]  # show first mnemonic string found
'T10Y2Y'
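The same tag-matching idea can be sketched with only the Python standard library, which may help when BeautifulSoup is not installed. This is an illustrative alternative, not the approach used above: `SeriesLinkParser` is a hypothetical helper, and the HTML snippet is a made-up sample modeled on the tag shown earlier.

```python
from html.parser import HTMLParser

class SeriesLinkParser(HTMLParser):
    """Collect the last path component of hrefs in <a> tags with class 'series-title'."""

    def __init__(self):
        super().__init__()
        self.mnemonics = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        classes = (attrs.get('class') or '').split()   # class attribute is space-separated
        if tag == 'a' and 'series-title' in classes and attrs.get('href'):
            # only keep the substring after the last '/'
            self.mnemonics.append(attrs['href'].rstrip('/').split('/')[-1])

# Hypothetical sample page fragment (only the first link appears in the real output above)
sample_html = '''
<h3><a class="series-title pager-series-title-gtm" href="/series/T10Y2Y">10-Year minus 2-Year</a></h3>
<h3><a class="series-title pager-series-title-gtm" href="/series/PAYEMS">All Employees</a></h3>
<a class="other-link" href="/categories/11">Not a series link</a>
'''

parser = SeriesLinkParser()
parser.feed(sample_html)
print(parser.mnemonics)   # ['T10Y2Y', 'PAYEMS']
```

Matching on individual class tokens, rather than on the whole attribute string, keeps the filter working when a tag carries several classes.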

Using APIs#

APIs (Application Programming Interfaces) enable direct communication with data servers, allowing data to be retrieved on demand and programmatically. Many economic research institutions, including the St. Louis Fed, offer APIs for accessing macroeconomic data.

# an API call is simply a URL string containing your parameters for the request
url = "{root}?series_id={series_id}&file_type={file_type}&api_key={api_key}".format(
    root="https://api.stlouisfed.org/fred/series", # base url of the API call
    series_id=details[0],                          # mnemonic of the data series to retrieve
    file_type='json',                              # request data be returned in json format
    api_key=credentials['fred']['api_key'])        # private api key (obtain from FRED for free)
# make the API call to retrieve the data
data = requests.get(url)
data.content
b'{"realtime_start":"2025-02-28","realtime_end":"2025-02-28","seriess":[{"id":"T10Y2Y","realtime_start":"2025-02-28","realtime_end":"2025-02-28","title":"10-Year Treasury Constant Maturity Minus 2-Year Treasury Constant Maturity","observation_start":"1976-06-01","observation_end":"2025-02-28","frequency":"Daily","frequency_short":"D","units":"Percent","units_short":"%","seasonal_adjustment":"Not Seasonally Adjusted","seasonal_adjustment_short":"NSA","last_updated":"2025-02-28 16:02:07-06","popularity":100,"notes":"Starting with the update on June 21, 2019, the Treasury bond data used in calculating interest rate spreads is obtained directly from the U.S. Treasury Department (https:\\/\\/www.treasury.gov\\/resource-center\\/data-chart-center\\/interest-rates\\/Pages\\/TextView.aspx?data=yield).\\r\\nSeries is calculated as the spread between 10-Year Treasury Constant Maturity (BC_10YEAR) and 2-Year Treasury Constant Maturity (BC_2YEAR). Both underlying series are published at the U.S. Treasury Department (https:\\/\\/www.treasury.gov\\/resource-center\\/data-chart-center\\/interest-rates\\/Pages\\/TextView.aspx?data=yield)."}]}'
# use the json package to convert byte-string data content
import json
v = json.loads(data.content)
v
{'realtime_start': '2025-02-28',
 'realtime_end': '2025-02-28',
 'seriess': [{'id': 'T10Y2Y',
   'realtime_start': '2025-02-28',
   'realtime_end': '2025-02-28',
   'title': '10-Year Treasury Constant Maturity Minus 2-Year Treasury Constant Maturity',
   'observation_start': '1976-06-01',
   'observation_end': '2025-02-28',
   'frequency': 'Daily',
   'frequency_short': 'D',
   'units': 'Percent',
   'units_short': '%',
   'seasonal_adjustment': 'Not Seasonally Adjusted',
   'seasonal_adjustment_short': 'NSA',
   'last_updated': '2025-02-28 16:02:07-06',
   'popularity': 100,
   'notes': 'Starting with the update on June 21, 2019, the Treasury bond data used in calculating interest rate spreads is obtained directly from the U.S. Treasury Department (https://www.treasury.gov/resource-center/data-chart-center/interest-rates/Pages/TextView.aspx?data=yield).\r\nSeries is calculated as the spread between 10-Year Treasury Constant Maturity (BC_10YEAR) and 2-Year Treasury Constant Maturity (BC_2YEAR). Both underlying series are published at the U.S. Treasury Department (https://www.treasury.gov/resource-center/data-chart-center/interest-rates/Pages/TextView.aspx?data=yield).'}]}
# Pandas can create a DataFrame directly from a dict data structure
df = DataFrame(v['seriess'])
df
id realtime_start realtime_end title observation_start observation_end frequency frequency_short units units_short seasonal_adjustment seasonal_adjustment_short last_updated popularity notes
0 T10Y2Y 2025-02-28 2025-02-28 10-Year Treasury Constant Maturity Minus 2-Yea... 1976-06-01 2025-02-28 Daily D Percent % Not Seasonally Adjusted NSA 2025-02-28 16:02:07-06 100 Starting with the update on June 21, 2019, the...
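The request-and-parse pattern above can be sketched without network access. The example below assumes a hypothetical helper `build_fred_url` and a placeholder key `YOUR_API_KEY`; the JSON payload is a trimmed copy of the response shown earlier, not a live result.

```python
import json
from urllib.parse import urlencode

def build_fred_url(endpoint, **params):
    """Assemble a FRED API URL; urlencode escapes any special characters in the values."""
    root = "https://api.stlouisfed.org/fred/"
    return f"{root}{endpoint}?{urlencode(params)}"

url = build_fred_url("series",
                     series_id="T10Y2Y",       # mnemonic of the series
                     file_type="json",         # ask for a JSON response
                     api_key="YOUR_API_KEY")   # placeholder, not a real key
print(url)

# Trimmed copy of the response body shown above
payload = (b'{"realtime_start":"2025-02-28","realtime_end":"2025-02-28",'
           b'"seriess":[{"id":"T10Y2Y","frequency":"Daily","units":"Percent"}]}')
meta = json.loads(payload)["seriess"][0]       # metadata is a list under the key "seriess"
print(meta["id"], meta["frequency"], meta["units"])
```

Building the query string with `urlencode` rather than string formatting avoids malformed URLs when a parameter value contains spaces or reserved characters.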

ALFRED (Archival FRED)#

ALFRED extends FRED’s functionality by preserving historical versions of economic data. This allows researchers to track how data revisions impact economic narratives over time.

today = int(datetime.today().strftime('%Y%m%d'))
alf = Alfred(api_key=credentials['fred']['api_key'], verbose=VERBOSE)

FRED series categories#

One of the most closely watched FRED series is Total Nonfarm Payroll Employment (PAYEMS), a key labor market indicator. This series belongs to broader employment-related categories.

# Retrieve grandparent,  parent and siblings of series                               
series_id, freq = 'PAYEMS', 'M'
category = alf.categories(series_id).iloc[0]
grand_category = alf.get_category(category['parent_id'])
parent_category = alf.get_category(category['id'])
category.to_frame().T
id name parent_id
PAYEMS 32305 Total Nonfarm 11
print(f"Super category {grand_category['id']}: {grand_category['name']}")
if 'notes' in grand_category:
    print(textwrap.fill(grand_category['notes']))
Super category 11: Current Employment Statistics (Establishment Survey)
The establishment survey provides data on employment, hours, and
earnings by industry.  Numerous conceptual and methodological
differences between the current population (household) and
establishment surveys result in important distinctions in the
employment estimates derived from the surveys. Among these are:   The
household survey includes agricultural workers, the self- employed,
unpaid family workers, and private household workers among the
employed. These groups are excluded from the establishment survey.
The household survey includes people on unpaid leave among the
employed. The establishment survey does not.   The household survey is
limited to workers 16 years of age and older. The establishment survey
is not limited by age.   The household survey has no duplication of
individuals, because individuals are counted only once, even if they
hold more than one job. In the establishment survey, employees working
at more than one job and thus appearing on more than one payroll are
counted separately for each appearance.   For more information, visit
http://www.bls.gov/news.release/empsit.tn.htm.
print("Parent categories:")
for child in grand_category['children']:
    node = alf.get_category(child['id'])
    if node:
        print(f" {node['id']}: {node['name']} "
              f" (children={len(node['children'])}, series={len(node['series'])})")
Parent categories:
 32305: Total Nonfarm  (children=0, series=5)
 32306: Total Private  (children=0, series=27)
 32307: Goods-Producing  (children=0, series=27)
 32326: Service-Providing  (children=0, series=1)
 32308: Private Service-Providing  (children=0, series=27)
 32309: Mining and Logging  (children=0, series=39)
 32310: Construction  (children=0, series=41)
 32311: Manufacturing  (children=0, series=31)
 32312: Durable Goods  (children=0, series=63)
 32313: Nondurable Goods  (children=0, series=55)
 32314: Trade, Transportation, and Utilities  (children=0, series=27)
 32315: Wholesale Trade  (children=0, series=33)
 32316: Retail Trade  (children=0, series=55)
 32317: Transportation and Warehousing  (children=0, series=47)
 32318: Utilities  (children=0, series=27)
 32319: Information  (children=0, series=39)
 32320: Financial Activities  (children=0, series=51)
 32321: Professional and Business Services  (children=0, series=55)
 32322: Education and Health Services  (children=0, series=51)
 32323: Leisure and Hospitality  (children=0, series=41)
 32324: Other Services  (children=0, series=33)
 32325: Government  (children=0, series=23)
print("Sibling series:")
for child in parent_category['series']:
    if child['id'] == series_id:
        node = child
    print(f"  {child['id']}: {child['title']} {child['seasonal_adjustment']}"
          f" (popularity={child['popularity']})")
Sibling series:
  CES0000000010: Women Employees, Total Nonfarm Seasonally Adjusted (popularity=4)
  CES0000000039: Women Employees-To-All Employees Ratio: Total Nonfarm Seasonally Adjusted (popularity=16)
  CEU0000000010: Women Employees, Total Nonfarm Not Seasonally Adjusted (popularity=1)
  PAYEMS: All Employees, Total Nonfarm Seasonally Adjusted (popularity=83)
  PAYNSA: All Employees, Total Nonfarm Not Seasonally Adjusted (popularity=47)
print(f"{node['id']}: {node['title']} {node['seasonal_adjustment']}",
      f" ({node['observation_start']}-{node['observation_end']})")
print()
print(textwrap.fill(node['notes']))
PAYEMS: All Employees, Total Nonfarm Seasonally Adjusted  (1939-01-01-2025-01-01)

All Employees: Total Nonfarm, commonly known as Total Nonfarm Payroll,
is a measure of the number of U.S. workers in the economy that
excludes proprietors, private household employees, unpaid volunteers,
farm employees, and the unincorporated self-employed. This measure
accounts for approximately 80 percent of the workers who contribute to
Gross Domestic Product (GDP).  This measure provides useful insights
into the current economic situation because it can represent the
number of jobs added or lost in an economy. Increases in employment
might indicate that businesses are hiring which might also suggest
that businesses are growing. Additionally, those who are newly
employed have increased their personal incomes, which means (all else
constant) their disposable incomes have also increased, thus fostering
further economic expansion.  Generally, the U.S. labor force and
levels of employment and unemployment are subject to fluctuations due
to seasonal changes in weather, major holidays, and the opening and
closing of schools. The Bureau of Labor Statistics (BLS) adjusts the
data to offset the seasonal effects to show non-seasonal changes: for
example, women's participation in the labor force; or a general
decline in the number of employees, a possible indication of a
downturn in the economy. To closely examine seasonal and non-seasonal
changes, the BLS releases two monthly statistical measures: the
seasonally adjusted All Employees: Total Nonfarm (PAYEMS) and All
Employees: Total Nonfarm (PAYNSA), which is not seasonally adjusted.
The series comes from the 'Current Employment Statistics
(Establishment Survey).'  The source code is: CES0000000001

Revisions and vintage dates#

Economic data revisions occur as new information becomes available, improving the accuracy of initial estimates. The Bureau of Labor Statistics (BLS), for instance, releases an initial estimate of Total Nonfarm Payroll Employment (PAYEMS) with its monthly employment report, typically on the first Friday of the following month. This first figure is preliminary, since it is based on incomplete survey responses, and it is revised in subsequent months as more firm-level data is collected.

These revisions can be significant, sometimes altering economic assessments. ALFRED, the archival FRED tool, allows users to compare initial estimates with later revisions. For the monthly values of PAYEMS in 2023, we examine the total amount of changes at each subsequent revision.
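To make the idea of a vintage concrete, the sketch below looks up the value that was current on a given date. It uses the January 2023 PAYEMS figures and their real-time periods as reported by ALFRED in this section; `value_as_of` is a hypothetical helper written for illustration and is not part of the `Alfred` class.

```python
import pandas as pd

# Vintages of the January 2023 PAYEMS value (thousands of employees):
# each row was the published value from realtime_start through realtime_end (YYYYMMDD)
vintages = pd.DataFrame({
    'PAYEMS':         [155073,   155039,   155007,   154773,   154780],
    'realtime_start': [20230203, 20230310, 20230407, 20240202, 20250207],
    'realtime_end':   [20230309, 20230406, 20240201, 20250206, 99991231],
})

def value_as_of(vintages, date):
    """Return the value that was current on a YYYYMMDD date, or None if unpublished."""
    known = vintages[(vintages['realtime_start'] <= date) &
                     (vintages['realtime_end'] >= date)]
    return int(known['PAYEMS'].iloc[0]) if len(known) else None

print(value_as_of(vintages, 20230215))   # 155073, the first release
print(value_as_of(vintages, 20231231))   # 155007, after two monthly revisions
print(value_as_of(vintages, 20250301))   # 154780, the latest vintage
print(value_as_of(vintages, 20230101))   # None, not yet published

# Size of each successive revision, in thousands
revisions = vintages['PAYEMS'].diff().dropna().astype(int)
print(revisions.tolist())                # [-34, -32, -234, 7]
```

In this example the largest change, -234 thousand, arrives with the vintage dated 20240202, well after the first two monthly revisions.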

start, end = 20230101, 20231231
data = {}
print(f"{alf.header(series_id)} (retrieved {today}):")
latest = alf(series_id, start=start, end=end, freq=freq, realtime=True)
latest
All Employees, Total Nonfarm (retrieved 20250302):
PAYEMS realtime_start realtime_end
date
20230131 154780 20250207 99991231
20230228 155086 20250207 99991231
20230331 155171 20250207 99991231
20230430 155387 20250207 99991231
20230531 155614 20250207 99991231
20230630 155871 20250207 99991231
20230731 156019 20250207 99991231
20230831 156176 20250207 99991231
20230930 156334 20250207 99991231
20231031 156520 20250207 99991231
20231130 156661 20250207 99991231
20231231 156930 20250207 99991231
print("First Release:")
data[0] = alf(series_id, release=1, start=start, end=end, freq=freq, realtime=True)
data[0]
First Release:
PAYEMS realtime_start realtime_end
date
20230131 155073 20230203 20230309
20230228 155350 20230310 20230406
20230331 155569 20230407 20230504
20230430 155673 20230505 20230601
20230531 156105 20230602 20230706
20230630 156204 20230707 20230803
20230731 156342 20230804 20230831
20230831 156419 20230901 20231005
20230930 156874 20231006 20231102
20231031 156923 20231103 20231207
20231130 157087 20231208 20240104
20231231 157232 20240105 20240201
print("Second Release:")
data[1] = alf(series_id, release=2, start=start, end=end, freq=freq, realtime=True)
data[1]
Second Release:
PAYEMS realtime_start realtime_end
date
20230131 155039 20230310 20230406
20230228 155333 20230407 20230504
20230331 155420 20230505 20230601
20230430 155766 20230602 20230706
20230531 155995 20230707 20230803
20230630 156155 20230804 20230831
20230731 156232 20230901 20231005
20230831 156538 20231006 20231102
20230930 156773 20231103 20231207
20231031 156888 20231208 20240104
20231130 157016 20240105 20240201
20231231 157347 20240202 20240307
print("Third Release:")
data[2] = alf(series_id, release=3, start=start, end=end, freq=freq, realtime=True)
data[2]
Third Release:
PAYEMS realtime_start realtime_end
date
20230131 155007 20230407 20240201
20230228 155255 20230505 20240201
20230331 155472 20230602 20240201
20230430 155689 20230707 20240201
20230531 155970 20230804 20240201
20230630 156075 20230901 20240201
20230731 156311 20231006 20240201
20230831 156476 20231103 20240201
20230930 156738 20231208 20240201
20231031 156843 20240105 20240201
20231130 157014 20240202 20250206
20231231 157304 20240308 20250206
print("Fourth Release:")
data[3] = alf(series_id, release=4, start=start, end=end, freq=freq, realtime=True)
data[3]
Fourth Release:
PAYEMS realtime_start realtime_end
date
20230131 154773 20240202 20250206
20230228 155060 20240202 20250206
20230331 155206 20240202 20250206
20230430 155484 20240202 20250206
20230531 155787 20240202 20250206
20230630 156027 20240202 20250206
20230731 156211 20240202 20250206
20230831 156421 20240202 20250206
20230930 156667 20240202 20250206
20231031 156832 20240202 20250206
20231130 156661 20250207 99991231
20231231 156930 20250207 99991231
df = pd.concat([(data[i][series_id] - data[i-1][series_id]).rename(f"Revision {i}")
                for i in range(1, len(data))], axis=1)
labels = pd.concat([data[i]['realtime_start'].rename(f"Revision {i}")
                    for i in range(1, len(data))], axis=1).fillna(0).astype(int)
DataFrame(df.sum(axis=0).rename("Total revisions ('000)"))
Total revisions ('000)
Revision 1 -349
Revision 2 -348
Revision 3 -2095
#df = pd.concat([data[i][series_id].rename(f"Revision {i}")
#                for i in range(1, len(data))], axis=1)
#labels = pd.concat([data[i]['realtime_start'].rename(f"Revision {i}")
#                    for i in range(1, len(data))], axis=1).fillna(0).astype(int)
fig, ax = plt.subplots(figsize=(12, 6))
plot_groupbar(df, labels=labels, ax=ax)
plt.legend()
plt.ylabel(f'Change in {series_id}')
plt.title(f'Revisions and vintage dates of {series_id}')
plt.tight_layout()
plt.show()
[Figure: Revisions and vintage dates of PAYEMS, grouped bar chart of the change in PAYEMS at each revision]

FRED-MD and FRED-QD#

FRED-MD (Monthly Database) and FRED-QD (Quarterly Database) are curated datasets that streamline access to macroeconomic indicators. These datasets mimic the coverage of macroeconomic datasets used in the research literature and are updated in real time, relieving users of the task of incorporating data changes and revisions. Historical monthly snapshots of the datasets are also available.

Release dates#

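The timing of data releases is crucial for market participants and policymakers, since the latest month of a dataset fills in only gradually as each series is published. Using ALFRED, we look up the first release date of the most recent observation of each series in FRED-MD.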
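Because dates in this notebook are stored as YYYYMMDD integers, measuring a release lag requires converting them to proper dates first. Here is a minimal sketch of that calculation, using the first-release dates of PAYEMS for the first three months of 2023 from the previous section; the variable names are illustrative.

```python
import pandas as pd

# Observation month-end -> date of first release (both as YYYYMMDD integers)
first_release = pd.Series({20230131: 20230203,
                           20230228: 20230310,
                           20230331: 20230407}, name='realtime_start')

# Convert both sides to datetimes before subtracting; integer subtraction
# would be wrong across month boundaries (e.g. 20230203 - 20230131 = 72)
obs_date = pd.to_datetime(first_release.index.astype(str), format='%Y%m%d')
rel_date = pd.to_datetime(first_release.astype(str).to_numpy(), format='%Y%m%d')

lag_days = pd.Series((rel_date - obs_date).days,
                     index=first_release.index, name='lag_days')
print(lag_days)
```

For these three months the lags are 3, 10, and 7 calendar days.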

md_df, md_transform = fred_md()
end = md_df.index[-1]
out = {}
for i, title in enumerate(md_df.columns):
    out[title] = alf(series_id=title,
                     release=1,
                     start=end, # within 4 days of monthend
                     end=end,
                     realtime=True)
    if title.startswith('S&P'):  # stock market data available same day close
        out[title] = Series({end: end}, name='realtime_start').to_frame()
    elif title in alf.splice_:   # these series were renamed or spliced
        if isinstance(Alfred.splice_[title], str):  # if renamed
            out[title] = alf(series_id=Alfred.splice_[title],
                             release=1,
                             start=end-4,  # within 4 days of monthend
                             end=end,
                             realtime=True)
        else:  # if FRED-MD series was spliced
            out[title] = pd.concat([alf(series_id=sub,
                                        release=1,
                                        start=end-4,  # within 4 days of monthend
                                        end=end,
                                        realtime=True)
                                    for sub in Alfred.splice_[title][1:]])
FRED-MD vintage: monthly/current.csv
# date convention of Consumer Sentiment
df = alf('UMCSENT', release=1, realtime=True)
out['UMCSENT'] = df[df['realtime_start'] > end - 4].iloc[:1]
# weekly averages of Claims
df = alf('ICNSA', release=1, realtime=True)
out['CLAIMS'] = df[df['realtime_start'] > end - 4].iloc[:1]
# Plot release dates of series in FRED-MD
release = Series({k: str(min(v['realtime_start'])) if v is not None and len(v)
                  else None  for k,v in out.items()}).sort_values()
fig, ax = plt.subplots(clear=True, num=1, figsize=(13, 5))
ax.plot(pd.to_datetime(release, errors='coerce'))
ax.axvline(release[~release.isnull()].index[-1], c='r')
ax.set_title(f"Current ({end}) FRED-MD series, retrieved {today}")
ax.set_ylabel('First Release Date')
ax.set_xticks(np.arange(len(release)))
ax.set_xticklabels(release.index, rotation=90, fontsize='xx-small')
plt.tight_layout()
[Figure: first release dates of the series in the current FRED-MD vintage, sorted by release date]
# Check if recently released data available to update latest FRED-MD                         
md_missing = md_df.iloc[-1]
md_missing = md_missing[md_missing.isnull()]
print("Recent values available to update missing in current FRED-MD")
for series_id in md_missing.index:
    print(alf.splice(series_id).iloc[-3:])
Recent values available to update missing in current FRED-MD
date
20241031    1538666.0
20241130    1544822.0
20241231    1555153.0
Name: CMRMTSPL, dtype: float64
date
20241031    7839
20241130    8156
20241231    7600
Name: HWI, dtype: int64
date
20241130    1.145345
20241231    1.103689
20250131         NaN
Name: HWIURATIO, dtype: float64
date
20241031    248120.0
20241130    248160.0
20241231    248851.0
Name: ACOGNO, dtype: float64
date
20241031    2585582.0
20241130    2588757.0
20241231    2584314.0
Name: BUSINV, dtype: float64
date
20241031    1.37
20241130    1.37
20241231    1.35
Name: ISRATIO, dtype: float64
date
20241031    3736897.53
20241130    3745366.76
20241231    3763355.59
Name: NONREVSL, dtype: float64
date
20241130    149.697308
20241231    149.793644
20250131           NaN
Name: CONSPI, dtype: float64
date
20241231    37.90
20250131    37.66
20250228    37.53
Name: S&P PE ratio, dtype: float64
date
20241031    554951.25
20241130    556075.09
20241231    558854.68
Name: DTCOLNVHFNM, dtype: float64
date
20241031    938525.34
20241130    941204.79
20241231    946489.00
Name: DTCTHFNM, dtype: float64
# Show the most recently released series, which may now be available to update the current FRED-MD
Series(release.values, index=[(s, alf.header(s)) for s in release.index])\
    .tail(len(md_missing))
(W875RX1, Real personal income excluding current transfer receipts)                              20250131
(ACOGNO, Manufacturers' New Orders: Consumer Goods)                                              20250204
(HWI, Help Wanted Index for United States)                                                       20250204
(NONREVSL, Nonrevolving Consumer Credit Owned and Securitized)                                   20250207
(CONSPI, Nonrevolving consumer credit to Personal Income)                                        20250207
(BUSINV, Total Business Inventories)                                                             20250214
(ISRATIO, Total Business: Inventories to Sales Ratio)                                            20250214
(CMRMTSPL, Real Manufacturing and Trade Industries Sales)                                        20250228
(DTCOLNVHFNM, Consumer Motor Vehicle Loans Owned by Finance Companies, Level)                    20250228
(DTCTHFNM, Total Consumer Loans and Leases Owned and Securitized by Finance Companies, Level)    20250228
(COMPAPFF, 3-Month Commercial Paper Minus FEDFUNDS)                                                  None
dtype: object

Outliers#

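Two common rules flag outliers using the interquartile range (IQR), the difference between the third and first quartiles (Q3 - Q1):

  1. Interquartile Range (IQR) Approach – Flags an observation as an outlier if it deviates from the median by more than 10 times the interquartile range, so that only the most extreme values are removed.

  2. Tukey’s Rule – Proposed by John Tukey, this method classifies a data point as an “outlier” if it lies more than 1.5 times the IQR below the first quartile or above the third quartile, that is, outside [Q1 - 1.5(Q3-Q1), Q3 + 1.5(Q3-Q1)], and as “far out” if it lies more than 3 times the IQR beyond those quartiles.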
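The rules above can be sketched in a few lines of NumPy. This is an illustrative implementation under the definitions given in the list, not the code of `finds.recipes.is_outlier`; the function name `flag_outliers` and the sample values are assumptions made for the example.

```python
import numpy as np

def flag_outliers(x, method='tukey'):
    """Return a boolean mask of outliers: 'tukey' (1.5 IQR), 'farout' (3 IQR), 'iq10'."""
    x = np.asarray(x, dtype=float)
    q1, median, q3 = np.percentile(x, [25, 50, 75])
    iqr = q3 - q1
    if method == 'iq10':
        # more than 10 interquartile ranges away from the median
        return np.abs(x - median) > 10 * iqr
    multiple = {'tukey': 1.5, 'farout': 3.0}[method]
    # outside [Q1 - multiple*IQR, Q3 + multiple*IQR]
    return (x < q1 - multiple * iqr) | (x > q3 + multiple * iqr)

# Made-up sample with Q1 = 0, median = 2, Q3 = 4, so IQR = 4
sample = np.array([-50, -1, 0, 1, 2, 3, 4, 12, 20])
for method in ['tukey', 'farout', 'iq10']:
    mask = flag_outliers(sample, method)
    print(method, sample[mask].tolist())
# tukey [-50, 12, 20]
# farout [-50, 20]
# iq10 [-50]
```

The three rules are nested on this sample: each stricter threshold flags a subset of the points flagged by the looser one.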

# monthly change in log PAYEMS, i.e. the month-over-month growth rate
payems = alf('PAYEMS', freq=freq, realtime=True, diff=1, log=1).dropna().iloc[:,0]
payems
date
19390228    0.005898
19390331    0.005962
19390430   -0.006162
19390531    0.006789
19390630    0.006678
              ...   
20240930    0.001517
20241031    0.000278
20241130    0.001647
20241231    0.001934
20250131    0.000899
Name: PAYEMS, Length: 1032, dtype: float64
for method in ['tukey', 'farout', 'iq10']:
    print(f"Outliers fraction detected by {method}:", np.mean(is_outlier(payems, method=method)).round(4))
payems.iloc[is_outlier(payems, method='iq10')]
Outliers fraction detected by tukey: 0.0969
Outliers fraction detected by farout: 0.0329
Outliers fraction detected by iq10: 0.0029
date
19450930   -0.049622
20200430   -0.145794
20200630    0.034217
Name: PAYEMS, dtype: float64

Box-and-whiskers plot

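A box plot draws a box spanning the first to third quartiles of the data, with a line at the median. The whiskers extend to the most extreme observations that lie within some multiple of the interquartile range (IQR) beyond the first and third quartiles; points outside that range are plotted individually as “outliers”. The plot below sets the whiskers at 3 times the IQR, matching the “far out” threshold.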
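The whisker placement can be worked out numerically before plotting. The sketch below assumes the usual convention that each whisker ends at the most extreme data point inside the fence, not at the fence itself; `whisker_ends` and the sample values are hypothetical and chosen for easy arithmetic.

```python
import numpy as np

def whisker_ends(x, whis=1.5):
    """Return (lower whisker, upper whisker, fliers) for a box plot of x."""
    x = np.asarray(x, dtype=float)
    q1, q3 = np.percentile(x, [25, 75])
    iqr = q3 - q1
    lower_fence, upper_fence = q1 - whis * iqr, q3 + whis * iqr
    inside = x[(x >= lower_fence) & (x <= upper_fence)]   # points the whiskers may reach
    fliers = x[(x < lower_fence) | (x > upper_fence)]     # points drawn individually
    return inside.min(), inside.max(), fliers

# Made-up sample with Q1 = 0 and Q3 = 4, so IQR = 4
sample = [-50, -1, 0, 1, 2, 3, 4, 12, 20]

low, high, fliers = whisker_ends(sample, whis=3)      # fences at -12 and 16
print(low, high, fliers.tolist())                     # -1.0 12.0 [-50.0, 20.0]

low15, high15, fliers15 = whisker_ends(sample, whis=1.5)   # fences at -6 and 10
print(low15, high15, fliers15.tolist())               # -1.0 4.0 [-50.0, 12.0, 20.0]
```

Widening the multiple from 1.5 to 3 lengthens the upper whisker from 4 to 12 and leaves fewer points marked as outliers.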

import seaborn as sns
fig, ax = plt.subplots(figsize=(12, 6))
sns.boxplot(payems, ax=ax, orient='h', whis=3) # whiskers at 3xIQR
<Axes: xlabel='PAYEMS'>
[Figure: horizontal box plot of PAYEMS monthly changes, with whiskers at 3 times the IQR]

References:

https://fred.stlouisfed.org/

https://www.stlouisfed.org/research/economists/mccracken/fred-databases

McCracken, M. W., & Ng, S. (2016). FRED-MD: A Monthly Database for Macroeconomic Research. Journal of Business & Economic Statistics, 34(4), 574–589.

McCracken, M. W., & Ng, S. (2020). FRED-QD: A Quarterly Database for Macroeconomic Research. Federal Reserve Bank of St. Louis Working Paper 2020-005.

Stierholz, K. (2018). Economic Data Revisions: What They Are and Where to Find Them. https://journals.ala.org/index.php/dttp/article/view/6383/8404