Summary
Most retirement contribution variables are significantly underestimated relative to administrative benchmarks. Most lack calibration targets entirely, and the two that have targets (traditional_ira_contributions and roth_ira_contributions) have problems: the traditional IRA target is sourced from back-of-envelope math and overshoots the actual deduction, and the Roth IRA target is structurally ineffective because of a bug in the CPS allocation logic.
This matters for any reform that expands the AGI base to include retirement contributions (e.g., the CRFB AGI surtax reform on the surtax_reform branch in policyengine-us).
Current Model Values vs Benchmarks (2026 simulation)
| Variable | Model (2026) | Benchmark | Gap | Calibrated? |
|---|---|---|---|---|
| traditional_ira_contributions | $26.8B | $13.2B | 2x over | Yes ($25B, too high) |
| traditional_401k_contributions | $245.4B | $567.9B | 57% under | No |
| traditional_403b_contributions | $0.0B | (bundled in 401k) | N/A | No |
| self_employed_pension_contribution_ald | $5.9B | $29.5B | 80% under | No |
| self_employed_pension_contributions (input) | $15.4B | - | - | No |
| roth_ira_contributions | $0.0B | ~$39B | Broken | Yes ($39B, ineffective) |
| roth_401k_contributions | $0.7B | - | Unknown | No |
Benchmark Sources
IRS SOI Publication 1304, Table 1.4 (Tax Year 2022)
- IRA payments (deduction): $13,166,590,000 (2.4M filers)
- Payments to a Keogh plan (deduction): $29,483,344,000 (972K filers)
- Source: IRS SOI Tax Stats - Individual Information Return, Table 1.4, file 22in14ar.xls, row "All returns, total", columns 124 and 116 respectively.
BEA/FRED National Income Accounts
- Total DC employer + employee contributions: $815.4B — FRED series Y351RC1A027NBEA
- Employer DC contributions only: $247.5B — FRED series W351RC0A144NBEA
- Employee DC contributions (derived): $815.4B - $247.5B = $567.9B
- This covers 401(k), 403(b), 457, and TSP elective deferrals.
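The derived employee figure and the gap percentages can be reproduced with a few lines of arithmetic. This is a minimal sketch that uses only the dollar figures quoted in this issue; the dictionary layout and the `gap_pct` helper are illustrative and do not come from the codebase.

```python
# All figures are in billions of dollars, copied from this issue.
total_dc = 815.4     # FRED Y351RC1A027NBEA: employer + employee DC contributions
employer_dc = 247.5  # FRED W351RC0A144NBEA: employer DC contributions only
employee_dc = round(total_dc - employer_dc, 1)  # derived employee deferrals

model = {
    "traditional_ira_contributions": 26.8,
    "traditional_401k_contributions": 245.4,
    "self_employed_pension_contribution_ald": 5.9,
}
benchmark = {
    "traditional_ira_contributions": 13.2,
    "traditional_401k_contributions": employee_dc,
    "self_employed_pension_contribution_ald": 29.5,
}


def gap_pct(name):
    """Percent difference of the model total from the benchmark (hypothetical helper)."""
    return 100 * (model[name] / benchmark[name] - 1)


for name in model:
    print(f"{name}: {gap_pct(name):+.0f}%")
```

A positive value means the model overshoots the benchmark; a negative value means it falls short.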
Proposed Calibration Changes in HARD_CODED_TOTALS (loss.py)
1. Fix traditional_ira_contributions: $25B → $13B
The current $25B target is from SOI IRA accumulation tables (total contributions including non-deductible). Since traditional_ira_contributions flows directly into the ALD with no deductibility logic in policyengine-us, the target should match the actual deduction claimed on returns: $13.2B from SOI 1304.
2. Add traditional_401k_contributions: target ~$568B
Not currently calibrated. The variable is a plain input that flows directly into pre_tax_contributions.yaml (subtracted from wages). The BEA employee DC figure ($567.9B) is the right benchmark since it represents actual elective deferrals.
3. Add self_employed_pension_contribution_ald: target ~$29.5B
Not currently calibrated. Unlike the other variables, this one has a formula: min(contributions, self_employment_income). The SOI 1304 Keogh figure ($29.5B) represents the actual deduction claimed, which is the right target for the ALD variable. Note: calibrating the ALD directly may be more effective than calibrating the input (self_employed_pension_contributions), since the SE income cap is binding for many filers.
4. Remove the roth_ira_contributions target: $39B → none
The CPS allocation logic (cps.py:713-728) gives traditional_ira_contributions the full IRA limit first, then sets roth_ira_limit = limit_ira - traditional_ira_contributions. This is mathematically guaranteed to produce $0 for Roth IRA in all cases: either the traditional IRA step uses up the whole limit (roth_ira_limit = 0) or it uses up the remaining pool (remaining = 0). The $39B target is therefore dead weight. Fixing the allocation logic is a separate issue.
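Taken together, the four changes amount to the following edits to the target set. This is only a sketch: the real shape of HARD_CODED_TOTALS in loss.py is not shown in this issue, so a plain dictionary named `current_targets` stands in for it here as an assumption.

```python
# Hypothetical stand-in for the relevant entries of HARD_CODED_TOTALS.
# The actual structure in loss.py may differ.
current_targets = {
    "traditional_ira_contributions": 25e9,
    "roth_ira_contributions": 39e9,
}

proposed_targets = dict(current_targets)

# 1. Lower the traditional IRA target to the SOI 1304 deduction.
proposed_targets["traditional_ira_contributions"] = 13.2e9
# 2. Add a 401(k) target from the BEA employee DC figure.
proposed_targets["traditional_401k_contributions"] = 567.9e9
# 3. Add a target on the ALD itself, from the SOI 1304 Keogh figure.
proposed_targets["self_employed_pension_contribution_ald"] = 29.5e9
# 4. Drop the Roth IRA target, which cannot be met while the variable is $0.
del proposed_targets["roth_ira_contributions"]

for name, value in sorted(proposed_targets.items()):
    print(f"{name}: ${value / 1e9:.1f}B")
```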
How Variables Flow Into AGI
| Variable | Mechanism | Deductibility Logic |
|---|---|---|
| traditional_ira_contributions | ALD (deductions.yaml) | None: raw value is the deduction |
| traditional_401k_contributions | Pre-tax payroll (pre_tax_contributions.yaml) | None: raw value subtracted from wages |
| traditional_403b_contributions | Pre-tax payroll (same file) | None: raw value subtracted from wages |
| self_employed_pension_contribution_ald | ALD via formula min(contributions, SE_income) | Yes: capped at SE income |
| roth_ira_contributions | Does not reduce AGI | N/A (post-tax) |
| roth_401k_contributions | Does not reduce AGI | N/A (post-tax) |
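The SE income cap is the reason the ALD total can sit well below the input total. Here is a minimal sketch of the min(contributions, SE_income) formula on three made-up records; the arrays are toy values, not microdata, and NumPy is used only for convenience.

```python
import numpy as np

# Toy records for illustration only.
contributions = np.array([10_000.0, 20_000.0, 5_000.0])
se_income = np.array([4_000.0, 50_000.0, 0.0])
weights = np.array([1.0, 1.0, 1.0])

# The deduction is the contribution capped at self-employment income.
ald = np.minimum(contributions, se_income)

input_total = float((weights * contributions).sum())
ald_total = float((weights * ald).sum())

print(f"input total: {input_total:,.0f}")
print(f"ALD total:   {ald_total:,.0f}")
```

Because the cap binds on the first and third records, hitting a target on the input total would not pin down the ALD total. That is the argument for placing the target on the ALD variable.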
CPS Allocation Context
All retirement contributions originate from a single CPS variable, RETCB_VAL. The allocation waterfall in cps.py:620-728 assigns it in this order:
- Self-employed pension (if person has SE income) — full amount
- Traditional 401(k) — up to annual limit
- Roth 401(k) — up to annual limit from remainder
- Traditional IRA — up to IRA limit from remainder
- Roth IRA — remainder within IRA limit (structurally $0, see above)
No 403(b) or 457 allocation (line 631 comment: "Assume no 403(b) or 457 contributions for now").
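The waterfall steps listed above can be sketched for a single person as follows. This is a simplified reconstruction from the description in this issue, not the code in cps.py. The function name, the scalar interface, and the default limits are assumptions ($23,000 for 401(k) and $7,000 for IRA are used as illustrative 2024 values, ignoring catch-up amounts).

```python
def allocate_waterfall(retcb, has_se_income, limit_401k=23_000, limit_ira=7_000):
    """Sequentially allocate one person's RETCB_VAL (hypothetical helper)."""
    remaining = float(retcb)

    # Self-employed pension takes the full amount when SE income is present.
    se_pension = remaining if has_se_income else 0.0
    remaining -= se_pension

    trad_401k = min(remaining, limit_401k)
    remaining -= trad_401k

    roth_401k = min(remaining, limit_401k)
    remaining -= roth_401k

    trad_ira = min(remaining, limit_ira)
    remaining -= trad_ira

    # Roth IRA only gets what is left of the IRA limit after traditional IRA.
    roth_ira_limit = limit_ira - trad_ira
    roth_ira = min(remaining, roth_ira_limit)

    return {
        "self_employed_pension_contributions": se_pension,
        "traditional_401k_contributions": trad_401k,
        "roth_401k_contributions": roth_401k,
        "traditional_ira_contributions": trad_ira,
        "roth_ira_contributions": roth_ira,
    }


for amount in (5_000, 50_000, 60_000):
    print(amount, allocate_waterfall(amount, has_se_income=False))
```

In this sketch the Roth IRA amount is zero for every input. If the traditional IRA step hits the limit, roth_ira_limit is 0. If it does not, the remaining pool is already 0.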
Related
- CRFB AGI surtax reform (surtax_reform branch in policyengine-us) needs accurate retirement contribution data
- Roth IRA allocation bug should be tracked separately
Microdata Sources
| Variable | CPS | PUF | Source Field |
|---|---|---|---|
| traditional_ira_contributions | Yes (from RETCB_VAL waterfall) | No | CPS ASEC RETCB_VAL, allocated after 401k in priority |
| traditional_401k_contributions | Yes (from RETCB_VAL waterfall) | No | CPS ASEC RETCB_VAL, first allocation for wage earners |
| traditional_403b_contributions | Not allocated | No | CPS comment: 'Assume no 403(b) or 457 contributions for now' |
| self_employed_pension_contributions | Yes (from RETCB_VAL waterfall) | No | CPS ASEC RETCB_VAL, allocated first if person has SE income |
| roth_ira_contributions | Yes (from RETCB_VAL waterfall) | No | CPS ASEC RETCB_VAL, allocated last; structurally $0 |
| roth_401k_contributions | Yes (from RETCB_VAL waterfall) | No | CPS ASEC RETCB_VAL, allocated after traditional 401k |
All of these variables are allocated from the single CPS variable RETCB_VAL (person.RETCB_VAL in cps.py:682). The PUF does not report retirement contributions separately; they are embedded in the AGI calculation.
Pre-Calibration Values (extended_cps_2024, full weights)
| Variable | Pre-Cal Value | Target | Ratio |
|---|---|---|---|
| traditional_ira_contributions | $0.0B | $13.2B | 0.00x |
| traditional_401k_contributions | $441.1B | $567.9B | 0.78x |
| self_employed_pension_contribution_ald | $13.7B | $29.5B | 0.46x |
Important: traditional_ira_contributions is $0 for every record in the extended CPS because the RETCB_VAL allocation waterfall exhausts each person's contributions on 401(k) before reaching the IRA step. Calibration cannot fix a variable that is $0 for every record, since reweighting zeros still sums to zero; the allocation logic (cps.py:694-728) needs fixing first.
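The point about calibration can be checked directly. This sketch assumes that calibration works by adjusting record weights while leaving record values unchanged; the arrays are synthetic.

```python
import numpy as np

# A variable that is $0 on every record (synthetic, five records).
values = np.zeros(5)

rng = np.random.default_rng(0)
totals = []
for _ in range(3):
    # Try arbitrary non-negative weight vectors, as a reweighting step might.
    weights = rng.uniform(0, 1e6, size=values.size)
    totals.append(float(weights @ values))

print(totals)
```

Every weighted total is zero, so no choice of weights can move the aggregate toward a $13.2B target.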
CPS RETCB_VAL Documentation
From the 2024 CPS ASEC Data Dictionary (p. 47):
RETCB_VAL — Retirement contribution, amount
Values: 0 = none or NIU; 1–99999 = amount contributed
Universe: RETCB_YN = 1
RETCB_YN — Retirement contribution, y/n
Values: 0 = NIU; 1 = yes; 2 = no
Universe: All people 15 years and over
RETCB_VAL is a single bundled total with no account-type breakdown. Census asks "how much did you contribute to retirement accounts?" but not "to which type?" The distribution variables (RINT_SC1/SC2) have source codes (401k, 403b, Roth IRA, Regular IRA, Keogh, SEP) but RETCB_VAL does not.
Root Cause: Sequential Waterfall
The old allocation waterfall (cps.py:682–728) gave 401(k) first priority, consuming nearly all of RETCB_VAL before reaching IRA. Since most CPS respondents report RETCB_VAL under the $23K 401(k) limit, IRA always received $0. The Roth IRA allocation was also mathematically guaranteed to produce $0 (traditional IRA either exhausted the limit or the remaining pool first).
Fix: Proportional Split (PR #554)
Replace the waterfall with a proportional allocation using administrative shares:
| Split | First share | Second share | Source |
|---|---|---|---|
| DC vs IRA | 90.8% DC | 9.2% IRA | BEA/FRED + IRS SOI Tables 5 & 6 |
| Within DC | 85% traditional | 15% Roth | Vanguard How America Saves 2024, PSCA 67th Annual Survey |
| Within IRA | 39.2% traditional | 60.8% Roth | IRS SOI Tables 5 & 6 (TY 2022): traditional $22.5B, Roth $35.0B |
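Applied to one person's RETCB_VAL, the shares in the table give a split like the one below. This is a minimal sketch, not the implementation in PR #554. The function name is hypothetical, and the sketch ignores contribution limits and the self-employed pension step, whose handling is not described here.

```python
# Shares taken from the table above.
DC_SHARE = 0.908
IRA_SHARE = 0.092
DC_TRADITIONAL_SHARE = 0.85
IRA_TRADITIONAL_SHARE = 0.392


def allocate_proportional(retcb):
    """Split one RETCB_VAL amount by administrative shares (hypothetical helper)."""
    dc = retcb * DC_SHARE
    ira = retcb * IRA_SHARE
    return {
        "traditional_401k_contributions": dc * DC_TRADITIONAL_SHARE,
        "roth_401k_contributions": dc * (1 - DC_TRADITIONAL_SHARE),
        "traditional_ira_contributions": ira * IRA_TRADITIONAL_SHARE,
        "roth_ira_contributions": ira * (1 - IRA_TRADITIONAL_SHARE),
    }


split = allocate_proportional(10_000)
for name, value in split.items():
    print(f"{name}: {value:,.2f}")
```

Unlike the waterfall, every account type receives a positive amount whenever RETCB_VAL is positive, so the IRA variables become calibratable. As a consistency check, the 90.8% DC share matches 567.9 / (567.9 + 22.5 + 35.0) using the figures quoted in this issue, although the exact derivation used in the PR is not stated here.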