
Add calibration targets for retirement contributions #553

@PavelMakarchuk


Summary

Most retirement contribution variables are significantly underestimated relative to administrative benchmarks. Most lack calibration targets entirely, and the two that have targets (traditional_ira_contributions and roth_ira_contributions) are flawed. The traditional IRA target comes from back-of-envelope math and overshoots the deduction actually claimed on returns. The Roth IRA target is ineffective because a bug in the CPS allocation logic sets roth_ira_contributions to $0 for every record.

This matters for any reform that expands the AGI base to include retirement contributions (e.g., the CRFB AGI surtax reform on the surtax_reform branch in policyengine-us).

Current Model Values vs Benchmarks (2026 simulation)

| Variable | Model (2026) | Benchmark | Gap | Calibrated? |
|---|---|---|---|---|
| traditional_ira_contributions | $26.8B | $13.2B | 2x over | Yes ($25B target, too high) |
| traditional_401k_contributions | $245.4B | $567.9B | -57% under | No |
| traditional_403b_contributions | $0.0B | (bundled in 401(k) benchmark) | N/A | No |
| self_employed_pension_contribution_ald | $5.9B | $29.5B | -80% under | No |
| self_employed_pension_contributions (input) | $15.4B | N/A | N/A | No |
| roth_ira_contributions | $0.0B | ~$39B | Broken | Yes ($39B target, ineffective) |
| roth_401k_contributions | $0.7B | Unknown | N/A | No |
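The Gap column is the model value relative to the benchmark, minus one. A minimal sketch reproducing it from the figures in the table (rows with no benchmark or an unknown benchmark are skipped; the names `ROWS` and `relative_gap` are illustrative only):

```python
# Figures from the table above, in billions of dollars: (model 2026, benchmark).
ROWS = {
    "traditional_ira_contributions": (26.8, 13.2),
    "traditional_401k_contributions": (245.4, 567.9),
    "self_employed_pension_contribution_ald": (5.9, 29.5),
}


def relative_gap(model: float, benchmark: float) -> float:
    """Model relative to benchmark, minus one (negative = underestimate)."""
    return model / benchmark - 1.0


for name, (model, benchmark) in ROWS.items():
    ratio = model / benchmark
    print(f"{name}: {relative_gap(model, benchmark):+.0%} ({ratio:.2f}x)")
```

This prints +103% (2.03x), -57% (0.43x) and -80% (0.20x), matching the "2x over", "-57%" and "-80%" entries.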

Benchmark Sources

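IRS SOI Publication 1304, Table 1.4 (Tax Year 2022)

  • IRA payments deduction claimed: $13.2B
  • Self-employed retirement plan (Keogh) deduction claimed: $29.5B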

BEA/FRED National Income Accounts

  • Total DC employer + employee contributions: $815.4B — FRED series Y351RC1A027NBEA
  • Employer DC contributions only: $247.5B — FRED series W351RC0A144NBEA
  • Employee DC contributions (derived): $815.4B - $247.5B = $567.9B
  • This covers 401(k), 403(b), 457, and TSP elective deferrals.

Proposed Calibration Changes in HARD_CODED_TOTALS (loss.py)

1. Fix traditional_ira_contributions: $25B → $13.2B

The current $25B target comes from the SOI IRA accumulation tables, which count total contributions, including non-deductible ones. Since traditional_ira_contributions flows directly into the ALD with no deductibility logic in policyengine-us, the target should instead match the deduction actually claimed on returns: $13.2B from SOI 1304.

2. Add traditional_401k_contributions: target ~$568B

Not currently calibrated. The variable is a plain input that flows directly into pre_tax_contributions.yaml (subtracted from wages). The BEA employee DC figure ($567.9B) is the right benchmark since it represents actual elective deferrals.

3. Add self_employed_pension_contribution_ald: target ~$29.5B

Not currently calibrated. Unlike the other variables, this one has a formula: min(contributions, self_employment_income). The SOI 1304 Keogh figure ($29.5B) represents the actual deduction claimed, which is the right target for the ALD variable. Note: calibrating the ALD directly may be more effective than calibrating the input (self_employed_pension_contributions), since the SE income cap is binding for many filers.
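A minimal sketch of why the cap matters for calibration, using only the min() formula stated above on three made-up records. It applies exactly that min() and does not handle negative SE income; the function name is a hypothetical stand-in for the policyengine-us variable formula, not its actual code:

```python
import numpy as np


def self_employed_pension_ald(contributions: np.ndarray, se_income: np.ndarray) -> np.ndarray:
    """ALD as described above: min(contributions, self-employment income), per person."""
    return np.minimum(contributions, se_income)


# Hypothetical records: contributions vs. self-employment income, in dollars.
contributions = np.array([10_000.0, 20_000.0, 5_000.0])
se_income = np.array([50_000.0, 8_000.0, 0.0])

ald = self_employed_pension_ald(contributions, se_income)
print(ald)
print(contributions.sum(), ald.sum())
```

Here the input totals $35,000 but the ALD totals only $18,000: the second and third records are pinned at their SE income. So a target on the input does not translate one-for-one into the ALD, which is the argument for targeting the ALD directly.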

4. Remove the roth_ira_contributions target ($39B)

The CPS allocation logic (cps.py:713-728) allocates traditional_ira_contributions first, up to the full IRA limit, then sets roth_ira_limit = limit_ira - traditional_ira_contributions. This guarantees $0 for Roth IRA in every case: either traditional IRA exhausts the limit (roth_ira_limit = 0) or it exhausts the remaining pool (remaining = 0). The $39B target is therefore dead weight. Fixing the allocation logic is a separate issue.
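The four changes can be summarized as a diff on the targets dictionary. This is a hypothetical sketch: it assumes HARD_CODED_TOTALS in loss.py is a flat dict mapping variable name to a national annual total in dollars, and shows only the retirement entries (all other entries are omitted). `apply_proposal` is an illustrative helper, not part of loss.py.

```python
# Hypothetical sketch: assumes HARD_CODED_TOTALS in loss.py is a flat dict
# mapping variable name -> national annual total in dollars.
# Only the retirement entries are shown; all other entries are omitted.
HARD_CODED_TOTALS = {
    "traditional_ira_contributions": 25e9,  # current target (too high)
    "roth_ira_contributions": 39e9,         # current target (ineffective)
}

# Changes 1-3 proposed above.
NEW_OR_UPDATED = {
    "traditional_ira_contributions": 13.2e9,           # SOI 1304 IRA deduction
    "traditional_401k_contributions": 567.9e9,         # BEA employee DC deferrals
    "self_employed_pension_contribution_ald": 29.5e9,  # SOI 1304 Keogh deduction
}
# Change 4.
REMOVED = {"roth_ira_contributions"}


def apply_proposal(totals: dict, updates: dict, removed: set) -> dict:
    """Return a new targets dict with removals dropped and updates applied."""
    result = {name: value for name, value in totals.items() if name not in removed}
    result.update(updates)
    return result


targets = apply_proposal(HARD_CODED_TOTALS, NEW_OR_UPDATED, REMOVED)
print(targets)
```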

How Variables Flow Into AGI

| Variable | Mechanism | Deductibility logic |
|---|---|---|
| traditional_ira_contributions | ALD (deductions.yaml) | None: raw value IS the deduction |
| traditional_401k_contributions | Pre-tax payroll (pre_tax_contributions.yaml) | None: raw value subtracted from wages |
| traditional_403b_contributions | Pre-tax payroll (same file) | None: raw value subtracted from wages |
| self_employed_pension_contribution_ald | ALD via formula min(contributions, SE_income) | Yes: capped at SE income |
| roth_ira_contributions | Does not reduce AGI | N/A (post-tax) |
| roth_401k_contributions | Does not reduce AGI | N/A (post-tax) |
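The table reduces to a simple rule for how much a person's contributions lower AGI. A minimal sketch of that rule, assuming the mechanisms exactly as listed (no other deductibility tests); the `Contributions` dataclass and `agi_reduction` function are hypothetical, with field names abbreviating the variables above:

```python
from dataclasses import dataclass


@dataclass
class Contributions:
    """Hypothetical per-person bundle; field names abbreviate the variables above."""
    traditional_ira: float = 0.0
    traditional_401k: float = 0.0
    traditional_403b: float = 0.0
    self_employed_pension: float = 0.0
    roth_ira: float = 0.0
    roth_401k: float = 0.0


def agi_reduction(c: Contributions, se_income: float) -> float:
    """Amount by which these contributions lower AGI, per the table above."""
    # Pre-tax payroll: raw values subtracted from wages.
    pre_tax_payroll = c.traditional_401k + c.traditional_403b
    # Traditional IRA ALD: raw value is the deduction (no deductibility test).
    ira_ald = c.traditional_ira
    # Self-employed pension ALD: capped at self-employment income.
    se_ald = min(c.self_employed_pension, se_income)
    # Roth IRA and Roth 401(k) are post-tax, so they are deliberately ignored.
    return pre_tax_payroll + ira_ald + se_ald


example = Contributions(
    traditional_401k=10_000, traditional_ira=3_000,
    self_employed_pension=5_000, roth_ira=4_000,
)
print(agi_reduction(example, se_income=2_000))  # 15000.0
```

Only the $2,000 of SE income caps the self-employed pension deduction, and the $4,000 Roth IRA contribution has no effect on AGI.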

CPS Allocation Context

All retirement contributions originate from a single CPS variable: RETCB_VAL. The allocation waterfall in cps.py:620-728 assigns it in this order:

  1. Self-employed pension (if person has SE income) — full amount
  2. Traditional 401(k) — up to annual limit
  3. Roth 401(k) — up to annual limit from remainder
  4. Traditional IRA — up to IRA limit from remainder
  5. Roth IRA — remainder within IRA limit (structurally $0, see above)

There is no 403(b) or 457 allocation (comment at line 631: "Assume no 403(b) or 457 contributions for now").
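A minimal sketch of the five-step waterfall as listed, for one person. It is a simplified reading, not the cps.py code: the 2024 limits ($23,000 401(k), $7,000 IRA) are assumptions with age-50+ catch-up ignored, Roth 401(k) follows the list literally ("up to annual limit from remainder"), and the `Allocation` dataclass and `waterfall` function are hypothetical names.

```python
from dataclasses import dataclass

# Assumed 2024 limits, ignoring age-50+ catch-up contributions.
LIMIT_401K = 23_000.0
LIMIT_IRA = 7_000.0


@dataclass
class Allocation:
    self_employed_pension: float = 0.0
    traditional_401k: float = 0.0
    roth_401k: float = 0.0
    traditional_ira: float = 0.0
    roth_ira: float = 0.0


def waterfall(retcb_val: float, has_se_income: bool) -> Allocation:
    """Simplified reading of the old sequential allocation of RETCB_VAL."""
    a = Allocation()
    if has_se_income:
        # Step 1: people with SE income get the full amount as SE pension.
        a.self_employed_pension = retcb_val
        return a
    remaining = retcb_val
    # Step 2: traditional 401(k), up to the annual limit.
    a.traditional_401k = min(remaining, LIMIT_401K)
    remaining -= a.traditional_401k
    # Step 3: Roth 401(k), up to the annual limit, from what is left.
    a.roth_401k = min(remaining, LIMIT_401K)
    remaining -= a.roth_401k
    # Step 4: traditional IRA takes up to the full IRA limit first...
    a.traditional_ira = min(remaining, LIMIT_IRA)
    remaining -= a.traditional_ira
    # Step 5: ...so Roth IRA is left with either no pool or no limit,
    # and always gets 0.
    roth_ira_limit = LIMIT_IRA - a.traditional_ira
    a.roth_ira = min(remaining, roth_ira_limit)
    return a


for amount in (5_000.0, 30_000.0, 60_000.0):
    print(amount, waterfall(amount, has_se_income=False))
```

A typical contributor under $23,000 ends up entirely in traditional 401(k); traditional IRA only receives money above $46,000 in this sketch, and Roth IRA never does.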

Related

  • CRFB AGI surtax reform (surtax_reform branch in policyengine-us) needs accurate retirement contribution data
  • Roth IRA allocation bug should be tracked separately

Microdata Sources

| Variable | CPS | PUF | Source field |
|---|---|---|---|
| traditional_ira_contributions | Yes (from RETCB_VAL waterfall) | No | CPS ASEC RETCB_VAL, allocated after 401(k) in priority |
| traditional_401k_contributions | Yes (from RETCB_VAL waterfall) | No | CPS ASEC RETCB_VAL, first allocation for wage earners |
| traditional_403b_contributions | Not allocated | No | CPS comment: "Assume no 403(b) or 457 contributions for now" |
| self_employed_pension_contributions | Yes (from RETCB_VAL waterfall) | No | CPS ASEC RETCB_VAL, allocated first if person has SE income |
| roth_ira_contributions | Yes (from RETCB_VAL waterfall) | No | CPS ASEC RETCB_VAL, allocated last (structurally $0) |
| roth_401k_contributions | Yes (from RETCB_VAL waterfall) | No | CPS ASEC RETCB_VAL, allocated after traditional 401(k) |

All retirement contributions originate from a single CPS variable, RETCB_VAL (person.RETCB_VAL in cps.py:682). The PUF does not report retirement contributions separately; they are embedded in the AGI calculation.

Pre-Calibration Values (extended_cps_2024, full weights)

| Variable | Pre-cal value | Target | Ratio |
|---|---|---|---|
| traditional_ira_contributions | $0.0B | $13.2B | 0.00x |
| traditional_401k_contributions | $441.1B | $567.9B | 0.78x |
| self_employed_pension_contribution_ald | $13.7B | $29.5B | 0.46x |

Important: traditional_ira_contributions is $0 for every record in the extended CPS because the RETCB_VAL waterfall allocates contributions to 401(k) first, exhausting them before IRA is reached. Calibration cannot fix a variable that is $0 for every record; the allocation logic (cps.py:694-728) must be fixed first.
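The "calibration cannot fix it" point follows from how reweighting works. A minimal sketch, assuming calibration only adjusts record weights: a weighted total of an all-zero column is zero for every weight vector, and its gradient with respect to the weights is the column itself, so an optimizer gets no signal. `weighted_total` is a hypothetical helper.

```python
import numpy as np


def weighted_total(weights: np.ndarray, values: np.ndarray) -> float:
    """National total implied by record weights."""
    return float(weights @ values)


rng = np.random.default_rng(seed=0)
n_records = 1_000
traditional_ira = np.zeros(n_records)  # $0 for every record, as in the extended CPS

# Try many arbitrary positive reweightings; none can move the total off zero.
for _ in range(100):
    weights = rng.uniform(1.0, 10_000.0, size=n_records)
    assert weighted_total(weights, traditional_ira) == 0.0

# The gradient of the total with respect to the weights is just the values,
# so a gradient-based reweighting step receives no signal at all.
gradient = traditional_ira
print(np.count_nonzero(gradient))  # 0
```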

CPS RETCB_VAL Documentation

From the 2024 CPS ASEC Data Dictionary (p. 47):

RETCB_VAL — Retirement contribution, amount
Values: 0 = none or NIU; 1–99999 = amount contributed
Universe: RETCB_YN = 1

RETCB_YN — Retirement contribution, y/n
Values: 0 = NIU; 1 = yes; 2 = no
Universe: All people 15 years and over

RETCB_VAL is a single bundled total with no account-type breakdown. Census asks "how much did you contribute to retirement accounts?" but not "to which type?" The distribution variables (RINT_SC1/SC2) have source codes (401k, 403b, Roth IRA, Regular IRA, Keogh, SEP) but RETCB_VAL does not.
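A minimal sketch of reading these two fields, following the dictionary entries above: amounts are kept only inside the universe (RETCB_YN = 1). The masking is defensive, since the dictionary already codes NIU as 0. The output is still one bundled number per person, so any account-type split has to be imputed afterwards. The function name is hypothetical.

```python
import numpy as np


def retirement_contributions(retcb_yn: np.ndarray, retcb_val: np.ndarray) -> np.ndarray:
    """Bundled contribution amounts from the CPS ASEC fields.

    RETCB_VAL is only meaningful in its universe (RETCB_YN == 1); everyone else
    (RETCB_YN == 0 for NIU, 2 for "no") is set to 0. Nothing here identifies
    which account type the money went to.
    """
    in_universe = retcb_yn == 1
    return np.where(in_universe, retcb_val, 0).astype(float)


retcb_yn = np.array([1, 2, 0, 1])
retcb_val = np.array([6_500, 0, 0, 22_000])
print(retirement_contributions(retcb_yn, retcb_val))
```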

Root Cause: Sequential Waterfall

The old allocation waterfall (cps.py:682–728) gave 401(k) first priority, consuming nearly all of RETCB_VAL before reaching IRA. Since most CPS respondents report RETCB_VAL under the $23K 401(k) limit, IRA always received $0. The Roth IRA allocation was also mathematically guaranteed to produce $0 (traditional IRA either exhausted the limit or the remaining pool first).
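The Roth IRA result can be written out in two cases. Let $R \ge 0$ be the amount left after the 401(k) steps and $L > 0$ the IRA limit. Traditional IRA takes $T = \min(R, L)$, and Roth IRA receives $\min(R - T,\, L - T)$, i.e. the smaller of the remaining pool and `roth_ira_limit`:

```math
\begin{aligned}
R \le L &\implies T = R \implies \text{Roth} = \min(0,\; L - R) = 0 \\
R > L   &\implies T = L \implies \text{Roth} = \min(R - L,\; 0) = 0
\end{aligned}
```

Both branches give 0 regardless of $R$, so no choice of inputs or limits can produce a positive Roth IRA amount.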

Fix: Proportional Split (PR #554)

Replace the waterfall with a proportional allocation using administrative shares:

| Split | Traditional (or DC) | Roth (or IRA) | Source |
|---|---|---|---|
| DC vs IRA | 90.8% DC | 9.2% IRA | BEA/FRED + IRS SOI Tables 5 & 6 |
| Within DC | 85% traditional | 15% Roth | Vanguard How America Saves 2024, PSCA 67th Annual Survey |
| Within IRA | 39.2% traditional | 60.8% Roth | IRS SOI Tables 5 & 6 (TY 2022): trad $22.5B / roth $35.0B |
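A minimal sketch of the proportional split using the shares in the table. Assumptions (not taken from PR #554 itself): each person's RETCB_VAL is split fractionally and deterministically, with no annual limits and no self-employed pension step; `proportional_split` is a hypothetical name. The DC share is also cross-checked against the dollar totals cited in this issue ($567.9B DC vs $22.5B + $35.0B IRA).

```python
import math

# Shares from the table above.
DC_SHARE = 0.908        # DC vs IRA
TRAD_SHARE_DC = 0.85    # traditional within DC
TRAD_SHARE_IRA = 0.392  # traditional within IRA

# The DC share is consistent with the dollar benchmarks cited in this issue.
assert round(567.9 / (567.9 + 22.5 + 35.0), 3) == DC_SHARE


def proportional_split(retcb_val: float) -> dict[str, float]:
    """Split one person's RETCB_VAL by fixed administrative shares.

    Simplified sketch: no annual limits and no self-employed pension step.
    """
    dc = retcb_val * DC_SHARE
    ira = retcb_val - dc
    return {
        "traditional_401k_contributions": dc * TRAD_SHARE_DC,
        "roth_401k_contributions": dc * (1 - TRAD_SHARE_DC),
        "traditional_ira_contributions": ira * TRAD_SHARE_IRA,
        "roth_ira_contributions": ira * (1 - TRAD_SHARE_IRA),
    }


split = proportional_split(10_000.0)
assert math.isclose(sum(split.values()), 10_000.0)
for name, amount in split.items():
    print(f"{name}: ${amount:,.2f}")
```

For a $10,000 RETCB_VAL this gives roughly $7,718 traditional 401(k), $1,362 Roth 401(k), $361 traditional IRA and $559 Roth IRA. Unlike the waterfall, every account type receives a positive share, so both IRA variables become non-zero and calibratable. Whether the PR splits fractionally per person like this or assigns account types in some other way is not stated here.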
