Reinforcement Learning#

I have not failed. I’ve just found 10,000 ways that won’t work - Thomas A. Edison

We combine financial modeling with reinforcement learning (RL) to evaluate static and adaptive retirement spending strategies. Traditional approaches, such as the 4% rule, assume fixed withdrawal rates and asset allocations that may not adapt well to changing market conditions. Leveraging historical economic data, simulations, and deep learning techniques, we apply RL to learn dynamic strategies that adjust asset allocations based on financial conditions, minimizing the risk of retirees outliving their savings.

# By: Terence Lim, 2020-2025 (terence-lim.github.io)
from pandas import DataFrame, Series
import pandas as pd
import numpy as np
import math
import random
import matplotlib.pyplot as plt
import seaborn as sns
from tqdm import tqdm
import gymnasium as gym
from gymnasium import spaces
from stable_baselines3 import DQN
from typing import List, Tuple
from finds.database import SQL
from finds.structured import BusDay
from finds.utils import Store, subplots, set_xticks
import torch
from secret import credentials, paths
store = Store(paths['scratch'])
#pd.set_option('display.max_rows', None)
VERBOSE = 0
gym.logger.min_level = gym.logger.ERROR  # Suppress warnings
# open connections
sql = SQL(**credentials['sql'], verbose=VERBOSE)
bd = BusDay(sql, verbose=VERBOSE)
outdir = paths['scratch'] / 'RL'

Retirement spending policy#

A retirement spending policy guides how retirees withdraw funds from their savings to sustain their lifestyle while minimizing the risk of depleting assets. The key elements include:

  • the withdrawal strategy, which defines the initial withdrawal rate (e.g., the 4% rule) and its adjustments over time (e.g., inflation-linked or dynamic withdrawals);

  • asset allocation, which balances stocks, bonds, and other investments to trade off growth and risk;

  • the time horizon, representing the expected duration of withdrawals, typically spanning 20-30 years or more; and

  • market conditions, including interest rates, inflation, and asset returns, which influence portfolio sustainability.

Effective policies balance spending needs, longevity risk, and market fluctuations to minimize the risk of retirees outliving their assets.

Benz, Ptak, and Rekenthaler (2022) found that “For retirees who seek a fixed real withdrawal from their portfolio in retirement, a starting withdrawal rate of 3.8% is safe in Morningstar’s model over a 30-year time horizon, assuming a 90% success rate (defined here as a 90% likelihood of not running out of funds) and a balanced portfolio.”
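To make the withdrawal mechanics concrete, here is a minimal sketch (all dollar and inflation figures are hypothetical) of how a 4% rule withdrawal is set once from initial wealth and then indexed to realized inflation, independent of subsequent portfolio performance:

# Minimal sketch of an inflation-adjusted fixed withdrawal (hypothetical figures)
initial_wealth = 1_000_000
rate = 0.04                           # 4% rule: first withdrawal is 4% of initial wealth
inflation_path = [0.03, 0.05, 0.02]   # assumed annual inflation for years 2-4
spend = initial_wealth * rate         # year 1 withdrawal: 40,000
print(f"year 1: withdraw {spend:,.0f}")
for year, cpi in enumerate(inflation_path, start=2):
    spend *= 1 + cpi                  # index the dollar withdrawal to realized inflation
    print(f"year {year}: withdraw {spend:,.0f}")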

SBBI data#

The Stocks, Bonds, Bills, and Inflation (SBBI) dataset provides historical monthly, quarterly, and yearly total returns and yields for major U.S. asset classes, including large-cap stocks, small-cap stocks, corporate bonds, government bonds, and inflation. This data, which dates back to 1926, is commonly used for retirement portfolio simulations.

# Read sbbi data
sbbi_file = paths['data'] / 'SBBI/stocks-bonds-bills-and-inflation-data.xlsx'
df = pd.read_excel(sbbi_file,
                   sheet_name=0,
                   skiprows=list(range(9)) + [10],
                   header=0,
                   usecols='A,B,G,P',
                   index_col=0)
columns_official = df.columns.tolist()
df.columns = ['stocks', 'bonds', 'inflation']
df.index = bd.to_date(df.index)
df
            stocks     bonds  inflation
19260131  0.000000  0.013756   0.000000
19260228 -0.038462  0.006313   0.000000
19260331 -0.057471  0.004129  -0.005587
19260430  0.025305  0.007589   0.005618
19260531  0.017918  0.001412  -0.005587
...            ...       ...        ...
20240831  0.024257       NaN   0.000814
20240930  0.021357  0.100107   0.001604
20241031 -0.009069 -0.046509   0.001151
20241130  0.058701  0.068325  -0.000542
20241231 -0.023838 -0.044901   0.000355

1188 rows × 3 columns

Scenario generator#

To create realistic economic simulations, we extract overlapping 30-year retirement periods from the nearly 100-year span of historical data. These scenarios help model how varying economic conditions impact retirement outcomes.

# Scenario generator: each episode backtests one historical sample path
class Episodes:
    """Generate overlapping T-year sample paths of annual log returns"""
    def __init__(self, data: DataFrame, T: int, num_loops: int = 1):
        self.data = np.log(1 + data)   # log returns are additive over time
        self.high = list(self.data.max() + self.data.std()*.1)  # observation bounds
        self.low = list(self.data.min() - self.data.std()*.1)
        self.T = T              # number of years per episode
        self.M = (T + 1) * 12   # number of monthly observations per episode
        self.num_loops = num_loops

    def __len__(self):
        return (len(self.data) - self.M + 1) * self.num_loops
        
    def __iter__(self):
        rows = []
        for i in range(self.num_loops):
            rows += np.random.permutation(len(self.data) - self.M + 1).tolist()
        for t in rows:
            df = self.data.iloc[t:(t + self.M), :].reset_index(drop=True)
            yield df.groupby(df.index // 12)\
                    .sum()\
                    .set_index(self.data.index[(t + 11):(t + self.M):12])

Summary statistics of all 30-year episodes:

T = 30
means = {s: [] for s in df.columns}
episodes = Episodes(df, T=T)
for episode in iter(episodes):
    for s in df.columns:
        means[s].append(episode[s].mean())
print('30-year sample periods:')
DataFrame({s: np.mean(x) for s, x in means.items()} | {'N': len(episodes)},
          index=['annualized mean'])
30-year sample periods:
                   stocks    bonds  inflation    N
annualized mean  0.105185  0.05502    0.03604  817

We assess the effectiveness of a fixed annual withdrawal strategy by analyzing its performance under different conditions. Key inputs for such strategies include: portfolio asset allocation (e.g., stock/bond mix); market environment (e.g., past returns and inflation rates); and expected duration of withdrawals.

For example, a typical rule of a fixed 4% withdrawal rate (adjusted for inflation) for a 50% stock / 50% bond portfolio can be evaluated across rolling 30-year periods to estimate the likelihood that a retiree’s savings will last the full duration.

The Base model is a simple buy-and-hold strategy where funds are invested in a fixed allocation at the start of retirement and are not rebalanced, even if allocation weights drift over time. Withdrawals remain fixed as a percentage of initial wealth, adjusted for inflation.

class BaseModel:
    """Buy-and-hold allocation model where assets weights drift from initial"""
    
    name = 'Buy-and-Hold'
    def __init__(self, T: int, W: List[float]):
        assert W[-1] > 0         # spend must be positive
        self.initial = dict(T=int(T), W=np.array(W))
        
    def reset(self, market_changes: List[float]):
        self.T = int(self.initial["T"])
        self.W = np.array(self.initial["W"])
        return list(market_changes)

    def step(self, action: List, market_changes: List) -> Tuple:
        assert self.T > 0
        self.W[:-1] = np.array(action).flatten() * self.W[:-1].sum() # rebalance
        self.W = self.W * np.exp(np.array(market_changes))           # price changes
        self.T = self.T - 1
        wealth = sum(self.W[:-1])  # remaining wealth
        if wealth < self.W[-1]:
            truncated = True          # wealth is insufficient for the spend
            self.W[:-1] = 0.0
        else:
            truncated = False         # wealth is sufficient for the spend
            spend = self.W[-1] * self.W[:-1] / wealth  # deduct spending pro rata
            self.W[:-1] = self.W[:-1] - spend
        terminated = not self.T or truncated
        return terminated, truncated

    def predict(self, obs: List) -> List:
        """Allow initial asset allocation to drift"""
        return self.W[:-1] / sum(self.W[:-1])

The Fixed model adds annual rebalancing, ensuring that the portfolio maintains a constant stock/bond allocation throughout retirement.

class FixedModel(BaseModel):
    """Allocation model where assets are rebalanced to fixed weights"""
    
    name = 'Annual-Rebalance'
    def predict(self, obs: List) -> List:
        """Action to rebalance asset allocation to fixed initial weight"""
        return self.initial['W'][:-1] / sum(self.initial['W'][:-1])

To assess the risk of retirees running out of money, we simulate rolling 30-year periods and track shortfalls. Our starting scenarios assume a 50/50 stock-bond allocation and a 4% inflation-adjusted withdrawal rule. The probability of shortfall measures the fraction of simulations where assets were depleted before reaching 30 years.

alloc = 50   # 50-50 stocks/bonds initial allocation
rule = 4.0   # 4 percent spending policy
model = BaseModel(T=T, W=[alloc, 100-alloc, rule])
result = {}
for n, episode in enumerate(iter(episodes)):
    obs = model.reset(episode.iloc[0])

    for year in episode.index[1:]:
        obs = episode.loc[year].to_list()
        action = model.predict(obs)
        terminated, truncated = model.step(action, obs)
        if truncated:
            break
    result[episode.index[0]] = model.T
prob = np.mean(np.array(list(result.values())) != 0)
print('Number of 30-year scenarios:', len(episodes))
print('Probability of shortfall:   ', round(prob, 4))
Number of 30-year scenarios: 817
Probability of shortfall:    0.0747

The graph illustrates the years in which retirees ran out of money, with bar heights representing the number of years their assets fell short of the goal. Additionally, plots of compounded asset returns and inflation highlight how periods of lower investment returns and higher inflation increased the risk of shortfall.

fails = Series(result).sort_index().rename('shortfall_years')
market = df.reindex(fails.index).cumsum()
fails.index = fails.index.astype(str).str.slice(0,4)
market.index = fails.index
ax = market.plot()
ax.set_title(f"Retirement Shortfalls: {alloc}/{100-alloc} allocation, {rule}% spend")
ax.set_ylabel('cumulative returns of asset classes and CPI')
ax.set_xlabel('Year of retirement')
ax.legend(loc='upper left')
##
bx = ax.twinx()
fails.plot(kind='bar', width=1.0, color='C4', ax=bx)
bx.set_ylabel('Number of shortfall years')
bx.legend(['shortfall years'], loc='lower right')
set_xticks(ax=ax, nskip=23, rotation=90)

plt.tight_layout()
_images/35e74e4bccf0658ef6beee494201d31d1aa5033cff49cd7d96c9f429dd1fedf6.png

Historical simulations#

Next, we expand the analysis to a broader range of asset allocations (0% to 100% in stocks) and withdrawal rates (3% to 5%) to evaluate their impact on portfolio longevity.

# range of spending policy rules
rules = np.arange(3, 5.1, 0.1)

# simulate fixed and initial equity allocations from 0 to 100%
allocs = np.arange(0, 105, 5)
TAIL = 0.95
tail = int(100 * (1 - TAIL))
def compute_shortfall(x, q):
    """Return the q-th quantile of shortfall years and the average beyond it"""
    x = sorted(x)        # each simulation's years of shortfall (0 if none)
    q = int(q * len(x))  # index of the tail quantile
    return x[q], np.mean(x[q:])
for num, Model in enumerate([BaseModel, FixedModel]):
    fail = DataFrame(columns=rules, index=allocs, dtype=float)
    shortfall = DataFrame(columns=rules, index=allocs, dtype=float)
    quantile = DataFrame(columns=rules, index=allocs, dtype=float)

    for alloc in tqdm(allocs):
        for rule in rules:

            # Evaluate for this allocation strategy and spending policy
            model = Model(T=T, W=[alloc, 100-alloc, rule])
            result = []
            for n, episode in enumerate(iter(episodes)): # for every 30-year sample
                obs = model.reset(episode.iloc[0])
                for year in episode.index[1:]:
                    obs = episode.loc[year].to_list()
                    action = model.predict(obs)
                    terminated, truncated = model.step(action, obs)
                    if truncated:
                        break
                result.append(model.T)
            fail.loc[alloc, rule] = np.mean(np.array(result) != 0)
            quantile.loc[alloc, rule], shortfall.loc[alloc, rule] = compute_shortfall(result, TAIL)
    store[model.name] = dict(fail=fail, shortfall=shortfall, quantile=quantile)
100%|██████████| 21/21 [07:33<00:00, 21.57s/it]
100%|██████████| 21/21 [07:32<00:00, 21.57s/it]

We compare the fraction of allocation and spending-rule combinations where the Base buy-and-hold strategy achieved a lower shortfall probability than the Fixed strategy with annual rebalancing, and vice versa.

print('Buy-and-hold outperformed Annual-Rebalance:', 
      round(np.mean(store[FixedModel.name]['fail'] > store[BaseModel.name]['fail']), 3))
print('Annual-Rebalance outperformed Buy-and-Hold:', 
      round(np.mean(store[FixedModel.name]['fail'] < store[BaseModel.name]['fail']), 3))
Buy-and-hold outperformed Annual-Rebalance: 0.374
Annual-Rebalance outperformed Buy-and-Hold: 0.297

Risk of spending shortfall#

To measure the likelihood and severity of depleting funds before the end of retirement, we use two key metrics:

  • Probability of shortfall: The percentage of simulations where retirees outlived their assets.

  • Expected shortfall period: This metric quantifies the severity of shortfalls by estimating how many years retirees would be without funds in the worst-case scenarios (e.g., the worst 5% of simulations).
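As a quick sanity check of these definitions, the compute_shortfall helper defined earlier can be applied to a toy result list (hypothetical values, with a 90% tail level for illustration):

# Toy example (hypothetical): 18 successes and shortfalls of 3 and 5 years among 20 runs
toy = [0] * 18 + [3, 5]
print('probability of shortfall:', np.mean(np.array(toy) != 0))   # 0.1
q, es = compute_shortfall(toy, 0.90)
print('90% tail quantile (years):', q)           # 3
print('expected shortfall years in tail:', es)   # 4.0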

for model in [BaseModel, FixedModel]:
    fail = store[model.name]['fail'].iloc[::-1, :]
    plt.figure(figsize=(8, 6))
    sns.heatmap(fail, annot=True, cbar=True, fmt='.2f', annot_kws=dict(fontsize='xx-small'),
                xticklabels=np.round(fail.columns, 1), yticklabels=fail.index)

    # Labels and title
    plt.xlabel("Spending Rule")
    plt.ylabel("Allocation to Stocks")
    plt.title(f"Probability of Shortfall: with {model.name} strategy")
    plt.show()
_images/0154e90e4b93daf92e6150950d16505cc3cbef93e6b81d367617a4a98f50ff77.png _images/91408338cfcd2a4d95568016510b8d358ce31c731423f0c4a48beea00a7d81f5.png

Probability of shortfall:

Contour lines in the probability of shortfall graph show the combinations of asset allocation and withdrawal rates that result in the same likelihood of running out of money.

def plot_contour(Z, levels, title, label, name):
    """Helper to plot contour lines at given levels"""
    X, Y = np.meshgrid(rules, allocs)
    fig, ax = plt.subplots(figsize=(10, 6))
    cp = ax.contour(X, Y, Z, levels=levels, cmap='cool')
    ax.set_title(f"{title} with {name} strategy")
    ax.set_xticks(rules)
    ax.set_xlabel('spending policy (%)')
    ax.set_yticks(allocs)
    ax.set_ylabel(f'{name} equity allocation (%)')
    ax.grid(which='both')
    fig.colorbar(cp, label=label)
    plt.tight_layout()
for model in [BaseModel, FixedModel]:
    fail = store[model.name]['fail']
    plot_contour(fail, levels=[0.0, .01, .05, .1, .15],
                 title="Probability of Shortfall",
                 label="contour levels: probability of shortfall",
                 name=model.name)
_images/d06a0f38432c316699443d04af5f493074cd65bf6d4e3f69fd29f606e9132440.png _images/c66c319d9dac9c3bc4e2fc606ddcf7f1bf1d74477a0e4269efa9586e082eb5a4.png

Expected shortfall period:

This metric estimates the potential duration of financial shortfalls, which helps retirees plan for worst-case scenarios.

print('Buy-and-hold outperformed Annual-Rebalance:', 
      round(np.mean(store[FixedModel.name]['shortfall'] > store[BaseModel.name]['shortfall']), 3))
print('Annual-Rebalance outperformed Buy-and-Hold:', 
      round(np.mean(store[FixedModel.name]['shortfall'] < store[BaseModel.name]['shortfall']), 3))
Buy-and-hold outperformed Annual-Rebalance: 0.059
Annual-Rebalance outperformed Buy-and-Hold: 0.61
for model in [BaseModel, FixedModel]:
    shortfall = store[model.name]['shortfall']
    plot_contour(shortfall, levels=[0, 1, 2, 3, 4, 6, 8, 10],
                 title=f"Expected Years of Shortfall in {tail}% tail",
                 label="contour levels: number of years",
                 name=model.name)
_images/d9d919db489496e89a4bfbf9b70bc5a7a9e9eea3d7dc746cdff627ce0a43476f.png _images/02d68c23c4872ef8fb45678cbb122274e9d25d6295fd5d6d2fb60c0085fd2b3e.png

Deep reinforcement learning#

Unlike static asset allocation models, reinforcement learning (RL) can learn optimal strategies that adapt to market conditions and remaining wealth.

Gymnasium environment#

The Gymnasium (formerly OpenAI Gym) library provides a Pythonic interface for reinforcement learning problems. Stable Baselines3 (SB3) implements a set of RL algorithms in PyTorch on top of the Gymnasium API.

https://gymnasium.farama.org/index.html

DLR-RM/stable-baselines3
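The core Gymnasium contract is a reset/step loop: reset returns an initial observation, and step returns the next observation, a reward, and termination flags. A minimal sketch on the built-in CartPole-v1 task with a random policy (illustrative only; the custom environment below implements the same interface):

# Minimal sketch of the Gymnasium reset/step loop (random policy, for illustration)
import gymnasium as gym

env = gym.make("CartPole-v1")
obs, info = env.reset(seed=0)
terminated = truncated = False
total_reward = 0.0
while not (terminated or truncated):
    action = env.action_space.sample()   # sample a random action
    obs, reward, terminated, truncated, info = env.step(action)
    total_reward += reward
env.close()
print('episode return:', total_reward)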

State space#

During training, the RL model learns to choose allocation actions based on the current financial state. The state space includes:

  • current wealth and asset allocation

  • recent market (equity and bond) and inflation changes

  • spending amount

  • years remaining in the planning horizon
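In the environment defined below, these elements are concatenated into a flat state vector: years remaining, the three annual log changes (stocks, bonds, inflation), and the three wealth components (stock wealth, bond wealth, and the spending amount). A sketch of the layout with illustrative values:

# Sketch of the observation vector assembled in CustomEnv (illustrative values)
T = 30                          # years remaining in the planning horizon
deltas = [0.08, 0.03, 0.02]     # annual log changes: stocks, bonds, inflation
W = [50.0, 50.0, 4.0]           # stock wealth, bond wealth, annual spend
obs = [T] + deltas + W          # 7-dimensional state vector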

Actions#

The agent's action sets the portfolio's equity allocation; in the environment below, the action space is discrete with 21 levels, mapping to equity weights from 0% to 100% in 5% increments. During training, the agent balances two modes:

Exploitation: selects the action with the highest estimated value learned so far (as in Q-learning or SARSA).

Exploration: tries alternative actions to discover better long-term policies.
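SB3's DQN trades these off with an epsilon-greedy rule: take a random action with probability ε (annealed over training), otherwise the greedy action with the highest estimated Q-value. A minimal sketch of the selection rule:

# Minimal sketch of epsilon-greedy action selection
import numpy as np

def epsilon_greedy(q_values: np.ndarray, eps: float, rng: np.random.Generator) -> int:
    """Explore with probability eps, otherwise exploit the greedy action"""
    if rng.random() < eps:
        return int(rng.integers(len(q_values)))   # explore: uniform random action
    return int(np.argmax(q_values))               # exploit: highest estimated value

rng = np.random.default_rng(0)
q_values = np.array([0.1, 0.9, 0.3])
print(epsilon_greedy(q_values, eps=0.1, rng=rng))   # usually 1, occasionally random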

Reward function#

  • If assets are depleted, the model applies a severe penalty of \(-100\) times the square of remaining years: \(-(100 T^2)\)

  • If wealth remains sufficient, the reward is the square root of the wealth-to-spending coverage ratio: \(\sqrt{\frac{W}{S(T+1)}}\), where \(T\) here counts the years remaining after the step
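A standalone sketch of this reward, mirroring the computation in CustomEnv.step below, where W, S, and T are the wealth, spend, and years remaining captured before the annual step (so S·T equals S·(T'+1) for the post-step years remaining T'):

# Sketch of the reward used in CustomEnv.step (W, S, T are pre-step values)
import math

FAIL = 100   # failure penalty factor, matching the constant defined below

def reward_fn(W: float, S: float, T: int, truncated: bool) -> float:
    if truncated:
        return -(T * T * FAIL)       # severe penalty, growing with years remaining
    return math.sqrt(W / (S * T))    # square root of wealth-to-spending coverage

print(reward_fn(W=100.0, S=4.0, T=25, truncated=False))   # 1.0
print(reward_fn(W=0.0, S=4.0, T=10, truncated=True))      # -10000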

FAIL = 100    # failure reward factor
class CustomEnv(gym.Env):
    """Custom gymnasium environment, using Episodes scenario generator"""
    def __init__(self, model: BaseModel, episodes: Episodes):
        super().__init__()
        self.model = model            # for stepping through a 30-year episode
        self.episodes = episodes      # for generating a sample 30-year episode
        self.iterator = iter(self.episodes)  # iterator to reset a 30-year sample
        T = self.model.initial['T']
        W = self.model.initial['W']
        low = np.array([0] + episodes.low + [0]*len(W))
        high = np.array([T] + episodes.high + [T]*len(W))
        self.observation_space = spaces.Box(low=low, high=high)
        self.action_space = spaces.Discrete(21)  # equity weight from 0% to 100% in 5% steps
        #self.action_space = spaces.Box(low=0.0, high=1.0)
        
    def reset(self, seed=0):
        super().reset(seed=seed)

        # generate a fresh 30-year episode
        self.episode = next(self.iterator, None)
        if self.episode is None:
            self.iterator = iter(self.episodes)
            self.episode = next(self.iterator)
        self.n = 0

        # return initial observations
        deltas = self.model.reset(self.episode.iloc[self.n])
        obs = [self.model.T] + list(deltas) + self.model.W.tolist()
        return obs, {}


    def step(self, action):
        S = self.model.W[-1]            # amount to spend at t-1
        W = np.sum(self.model.W[:-1])   # wealth at t-1
        T = self.model.T                # year remaining till termination

        # Convert action to asset allocation weights
        action = action * 0.05
        action = np.array([action, 1-action])
        
        # Grab next market move at time t
        self.n = self.n + 1
        deltas = self.episode.iloc[self.n].tolist()

        # Apply rebalance wealth and market move (t=1)
        terminated, truncated = self.model.step(action, deltas)

        # Calculate reward (t=1)
        reward = -(T*T*FAIL) if truncated else math.sqrt(W / (S*T))

        # Return as next observation
        obs = [T] + list(deltas) + self.model.W.tolist()
        return obs, reward, terminated, truncated, {}

A helper to evaluate the trained model:

def evaluate(env, model, episodes):
    """Return success likelihood, shortfalls and asset allocation actions"""
    result = []
    actions = {t: [] for t in range(episodes.T + 1)}  # to store predicted actions
    for n, episode in enumerate(iter(episodes)):
        obs, info = env.reset()
        terminated = False
        while not terminated:
            action, _states = model.predict(np.array(obs))
            actions[env.model.T].append(float(action))  # keyed by years remaining
            obs, rewards, terminated, truncated, info = env.step(action)
        result.append(env.model.T if truncated else 0)
    return np.mean(np.array(result) != 0), *compute_shortfall(result, TAIL), actions

Deep Q-Network (DQN)#

We use Deep Q-Networks (DQN) from Stable Baselines3, which is designed for discrete action spaces. This approach employs deep reinforcement learning to maximize wealth longevity while adapting asset allocation strategies dynamically.

TIMESTEPS = int(5e5)
initial_alloc = 50
fail, actions, shortfall, quantile = {}, {}, {}, {}
for rule in tqdm(rules):  # train a model for each spending rule

    # define and train model for this spending rule
    name = str(round(rule, 1))
    W = [initial_alloc, 100-initial_alloc, rule]
    env = CustomEnv(model=BaseModel(T=T, W=W), episodes=episodes)
    clf = DQN('MlpPolicy', env, verbose=VERBOSE)
    clf.learn(total_timesteps=TIMESTEPS)
    clf.save(f"{outdir}/{name}")

    # evaluate model
    test_clf = clf.load(f"{outdir}/{name}", env=None)
    fail[name], quantile[name], shortfall[name], actions[name] =\
        evaluate(env, test_clf, episodes)

    store['dqn'] = dict(fail=fail, shortfall=shortfall,
                        quantile=quantile, actions=actions)
100%|██████████| 21/21 [2:00:19<00:00, 343.79s/it]
print(DataFrame(fail, index=["Deep RL"]).round(2).to_string())
         3.0  3.1  3.2  3.3   3.4  3.5  3.6   3.7   3.8   3.9   4.0   4.1   4.2   4.3  4.4   4.5   4.6   4.7   4.8   4.9   5.0
Deep RL  0.0  0.0  0.0  0.0  0.01  0.0  0.0  0.01  0.02  0.02  0.03  0.06  0.06  0.08  0.1  0.11  0.14  0.12  0.16  0.21  0.19
# Plot average allocations over time
labels = list(np.arange(T) + 1)
fig, axs = subplots(nrows=2, ncols=2, figsize=(10, 8), sharex=True, sharey=True)
plt.suptitle('Deep RL average equity allocations, by year after retirement')
for ax, rule in zip(axs, ["3.5", "4.0", "4.5", "5.0"]):
    y = [[a*5 for a in actions[rule][T + 1 - i]] for i in labels]  # map years elapsed to years remaining
    mean = [np.mean(a) for a in y]
    median = [np.median(a) for a in y]
    ax.plot(labels, mean, label='mean')
    ax.plot(labels, median, label='median')
    ax.set_title(f"Spending policy: {rule}%")
    ax.set_xlabel('years after retirement')
    ax.set_ylabel('equity allocation (%)')
    ax.legend()
plt.tight_layout()
_images/69932e5e34a9cb7c4529d3b77b5e8adfb9a5ad9d2429d1469b721f41f9841ba8.png

References:

Richard S. Sutton and Andrew G. Barto, 2018, “Reinforcement Learning: An Introduction”, MIT Press.

Christine Benz, Jeffrey Ptak, and John Rekenthaler, Dec. 12, 2022, “The State of Retirement Income: 2022. A look at how higher bond yields, lower equity valuations, and inflation affect starting safe withdrawal rates”, Morningstar Portfolio and Planning Research.