Calibrate retirement contributions: targets, SS reconciliation, and QRF imputation #554
Draft
PavelMakarchuk wants to merge 18 commits into main
When PUF imputation replaces social_security values, the sub-components (retirement, disability, survivors, dependents) were left unchanged, creating a mismatch. This caused a base-year discontinuity where projected years had ~3x more SS recipients than the base year, producing artificial 9-point Gini swings. The new reconcile_ss_subcomponents() function rescales sub-components proportionally after PUF imputation. New recipients (CPS had zero SS but PUF imputed positive) default to retirement. Fixes #551 Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
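The proportional rescaling described in this commit can be sketched as follows. This is a minimal illustration, not the PR's `reconcile_ss_subcomponents()` itself: the function name `rescale_subcomponents`, its signature, and the dict-of-arrays layout are assumptions made for the example.

```python
import numpy as np

def rescale_subcomponents(old_total, new_total, subs):
    """Hypothetical sketch: rescale SS sub-components to match a new total.

    old_total: CPS social_security before PUF imputation
    new_total: social_security after PUF imputation
    subs: dict of sub-component arrays (must include "retirement")
    """
    old_total = np.asarray(old_total, dtype=float)
    new_total = np.asarray(new_total, dtype=float)
    had_ss = old_total > 0

    # Ratio is new/old for existing recipients and 0 elsewhere (avoids 0/0).
    ratio = np.divide(
        new_total, old_total, out=np.zeros_like(new_total), where=had_ss
    )
    out = {k: np.asarray(v, dtype=float) * ratio for k, v in subs.items()}

    # New recipients (CPS zero, PUF positive) default to retirement.
    new_recipient = ~had_ss & (new_total > 0)
    out["retirement"] = np.where(new_recipient, new_total, out["retirement"])
    return out

example = rescale_subcomponents(
    old_total=[10_000, 0, 0],
    new_total=[15_000, 8_000, 0],
    subs={
        "retirement": [6_000, 0, 0],
        "disability": [4_000, 0, 0],
    },
)
```

In this example the first record keeps its 60/40 split at the new total, the second record is a new recipient assigned entirely to retirement, and the third stays at zero, so the sub-components sum to the imputed total for every record.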
New PUF recipients (CPS had zero SS) now get assigned based on age: >= 62 -> retirement, < 62 -> disability. Matches the CPS fallback logic. Falls back to retirement if age is unavailable. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
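A minimal sketch of the age rule for new recipients, assuming a hypothetical helper `assign_new_recipients` (the name and return layout are illustrative, not taken from the PR):

```python
import numpy as np

def assign_new_recipients(new_total, age=None, min_retirement_age=62):
    """Hypothetical sketch: route a new recipient's SS to one sub-component.

    age >= 62 -> retirement, age < 62 -> disability.
    If age is unavailable, everything falls back to retirement.
    """
    new_total = np.asarray(new_total, dtype=float)
    if age is None:
        is_retirement = np.ones(new_total.shape, dtype=bool)
    else:
        is_retirement = np.asarray(age) >= min_retirement_age
    return {
        "retirement": np.where(is_retirement, new_total, 0.0),
        "disability": np.where(is_retirement, 0.0, new_total),
    }

by_age = assign_new_recipients([1_000, 2_000, 3_000], age=[70, 62, 45])
no_age = assign_new_recipients([1_000, 2_000, 3_000])
```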
Instead of the simple age >= 62 heuristic, train a QRF on CPS records (where the reason-code split is known) to predict shares for new PUF recipients. Uses age, gender, marital status, and other demographics. Falls back to age heuristic when microimpute is unavailable or training data is insufficient (< 100 records).

14 tests covering:
- Proportional rescaling (existing recipients)
- Age heuristic fallback (4 tests)
- QRF share prediction (4 tests including sum-to-one and elderly-predicted-as-retirement)
- Edge cases (zero imputation, missing subs, no SS)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
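Two pieces of this commit lend themselves to a short sketch: the training-size check that triggers the fallback, and the sum-to-one property the tests verify. The QRF itself is omitted here. The helper names, the column order, and the all-retirement default for rows with no positive prediction are assumptions for illustration only.

```python
import numpy as np

MIN_TRAINING_RECORDS = 100  # threshold stated in the commit message

def use_qrf(n_training_records, min_records=MIN_TRAINING_RECORDS):
    """Use the QRF only when enough CPS training records exist."""
    return n_training_records >= min_records

def normalize_shares(raw_shares):
    """Clip raw predicted shares at zero and rescale each row to sum to one.

    Columns are assumed to be ordered
    (retirement, disability, survivors, dependents).
    Rows with no positive prediction default to all-retirement
    (an assumption made for this sketch).
    """
    shares = np.clip(np.asarray(raw_shares, dtype=float), 0.0, None)
    row_sums = shares.sum(axis=1, keepdims=True)
    fallback = np.zeros_like(shares)
    fallback[:, 0] = 1.0
    safe_sums = np.where(row_sums > 0, row_sums, 1.0)
    return np.where(row_sums > 0, shares / safe_sums, fallback)

shares = normalize_shares(
    [[0.6, 0.6, 0.0, 0.0], [-0.1, 0.5, 0.0, 0.0], [0.0, 0.0, 0.0, 0.0]]
)
```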
CPS-PUF link is statistical (not identity-based), so the paired CPS record's sub-component split is just one noisy draw. A QRF trained on all CPS SS recipients gives a better expected prediction. Also removes unnecessary try/except ImportError guard for microimpute. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Fixes CI failure: DE TANF deficit_rate parameter started at 2024-10-01 in 1.570.7, causing ParameterNotFoundError for Jan 2024 simulations. Fixed in newer releases (start date corrected to 2011-10-01). Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Correct traditional_ira_contributions: $25B → $13.2B (SOI 1304 Table 1.4 actual deduction, not total contributions)
- Add traditional_401k_contributions: $567.9B (BEA/FRED employee DC contributions)
- Add self_employed_pension_contribution_ald: $29.5B (SOI 1304 Table 1.4 Keogh plan deduction)
- Remove roth_ira_contributions: structurally $0 due to CPS allocation bug (#553)
- Update both loss.py and etl_national_targets.py

Closes #553

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Replace sequential waterfall with proportional allocation based on administrative data. The old waterfall gave 401(k) first priority, consuming all of RETCB_VAL and leaving IRA contributions at $0 for every record. The Roth IRA allocation was also mathematically guaranteed to produce $0.

The new approach splits RETCB_VAL proportionally:
- DC vs IRA: 90.8% / 9.2% (BEA/FRED vs IRS SOI)
- Within DC: 85% traditional / 15% Roth (Vanguard/PSCA)
- Within IRA: 39.2% traditional / 60.8% Roth (IRS SOI Tables 5 & 6)

All fractions are stored in imputation_parameters.yaml with sources. Contribution limits are still enforced.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
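The proportional split can be sketched with the fractions quoted above. How the real code enforces contribution limits is not shown in this commit message, so the capping here (one cap on the combined DC amount, one on the combined IRA amount, both passed in by the caller) is an assumption, as is the function name `allocate_retcb`.

```python
import numpy as np

# Fractions quoted in the commit message.
DC_SHARE = 0.908            # DC vs IRA: 90.8% / 9.2%
ROTH_SHARE_OF_DC = 0.15     # within DC: 85% traditional / 15% Roth
TRAD_SHARE_OF_IRA = 0.392   # within IRA: 39.2% traditional / 60.8% Roth

def allocate_retcb(retcb_val, dc_limit, ira_limit):
    """Hypothetical sketch: split RETCB_VAL proportionally, then cap."""
    retcb = np.asarray(retcb_val, dtype=float)
    # Assumed capping scheme: limits apply to the DC and IRA totals.
    dc = np.minimum(retcb * DC_SHARE, dc_limit)
    ira = np.minimum(retcb * (1 - DC_SHARE), ira_limit)
    return {
        "traditional_401k_contributions": dc * (1 - ROTH_SHARE_OF_DC),
        "roth_401k_contributions": dc * ROTH_SHARE_OF_DC,
        "traditional_ira_contributions": ira * TRAD_SHARE_OF_IRA,
        "roth_ira_contributions": ira * (1 - TRAD_SHARE_OF_IRA),
    }

# Limits below are illustrative inputs, not values read from the repository.
alloc = allocate_retcb([10_000, 100_000], dc_limit=23_000, ira_limit=7_000)
```

Unlike the waterfall, every record with a positive RETCB_VAL now receives a nonzero amount in all four variables, which is what lets the IRA and Roth targets be matched at all.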
- Split 401(k) target: $567.9B total employee DC deferrals (BEA/FRED) into traditional $482.7B (85%) and Roth $85.2B (15%) using Vanguard How America Saves 2024 dollar share estimate
- Add roth_ira_contributions target: $35.0B from IRS SOI Accumulation Tables 5 & 6 (TY 2022), a direct administrative source
- Update etl_national_targets.py in parallel

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
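As a quick arithmetic check of the split quoted in this commit, the two 401(k) targets follow from the total and the 15% Roth dollar share. The helper name `split_dc_target` is illustrative only.

```python
TOTAL_DC_DEFERRALS_BN = 567.9   # BEA/FRED total, in $ billions
ROTH_DOLLAR_SHARE = 0.15        # Vanguard dollar-share estimate

def split_dc_target(total_bn, roth_share):
    """Return (traditional, roth) targets in $ billions, rounded to 0.1."""
    roth = round(total_bn * roth_share, 1)
    traditional = round(total_bn * (1 - roth_share), 1)
    return traditional, roth

traditional_bn, roth_bn = split_dc_target(TOTAL_DC_DEFERRALS_BN, ROTH_DOLLAR_SHARE)
```

The rounded parts, $482.7B and $85.2B, add back up to the $567.9B total.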
Fixes CI failure caused by DE TANF deficit_rate parameter missing history before 2024-10-01. The fix was merged in policyengine-us PR #7170 but uv.lock was pinned at 1.570.7. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Variables with formulas in policyengine-us are recomputed by the simulation engine, so storing them wastes space and can mislead validation. This removes 9 such variables from the extended CPS output (saving ~15MB).

Also adds tests verifying:
- No formula variables are stored (except person_id)
- Stored input values match what the simulation computes
- SS sub-components sum to total social_security per person

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Update _drop_formula_variables and tests to catch variables that use `adds` or `subtracts` (e.g. social_security), not just explicit formulas. These are also recomputed by the simulation engine. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
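The rule described here (a variable is recomputed if it has a formula, or if it is defined through `adds` or `subtracts`) can be sketched with a stand-in class. `VariableSpec` is hypothetical and only mimics the attributes mentioned in the commit message; it is not the policyengine-us variable class, and `drop_computed_variables` is not the PR's `_drop_formula_variables`.

```python
from dataclasses import dataclass, field

@dataclass
class VariableSpec:
    """Hypothetical stand-in for a policyengine-us variable definition."""
    name: str
    has_formula: bool = False
    adds: list = field(default_factory=list)
    subtracts: list = field(default_factory=list)

def is_computed(var):
    """True if the simulation engine would recompute this variable."""
    return var.has_formula or bool(var.adds) or bool(var.subtracts)

def drop_computed_variables(data, specs, keep=("person_id",)):
    """Drop stored values for computed variables, except those in `keep`."""
    computed = {s.name for s in specs if is_computed(s)}
    return {k: v for k, v in data.items() if k in keep or k not in computed}

specs = [
    VariableSpec("person_id", has_formula=True),
    VariableSpec("employment_income"),
    VariableSpec("adjusted_gross_income", has_formula=True),
    VariableSpec("social_security", adds=["social_security_retirement"]),
]
data = {s.name: [0] for s in specs}
stored = drop_computed_variables(data, specs)
```

Checking `adds` and `subtracts` matters because a variable like social_security has no explicit formula yet is still derived from its sub-components.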
baogorek
approved these changes
Feb 26, 2026
Collaborator
baogorek
left a comment
Looks good, just please update this one bullet point in the PR body (if this is indeed correct):
Says:
- Remove roth_ira_contributions target ($39B) — structurally $0 due to CPS allocation bug where traditional IRA always exhausts the IRA limit first
But:
code actually keeps roth_ira_contributions and updates it from $39B to $35.0B in both loss.py and etl_national_targets.py (which it can do be
…om/PolicyEngine/policyengine-us-data into calibrate-retirement-contributions # Conflicts: # uv.lock
PUF clones previously copied CPS retirement contributions blindly, so a record with $0 wages could have $50k in 401(k) contributions. Train a QRF on CPS data (which has realistic income-to-contribution relationships) and predict onto the PUF half using PUF-imputed income. Post-prediction constraints enforce contribution caps, zero-out rules for records with no wages/SE income, and non-negativity.

- Remove traditional_ira_contributions from IMPUTED_VARIABLES
- Add CPS_RETIREMENT_VARIABLES, RETIREMENT_PREDICTORS constants
- Add _get_retirement_limits() with year-specific 401k/IRA caps
- Add _impute_retirement_contributions() following _impute_weeks_unemployed pattern
- Integrate into puf_clone_dataset() variable routing loop
- Add 34 unit tests covering constraints, routing, and limits

Closes #561

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
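The post-prediction constraints named in this commit (caps, zero-out, non-negativity) might look roughly like the sketch below. The mapping from each variable to the income it requires, the per-variable capping, and the function name `apply_retirement_constraints` are assumptions; the actual rules live in `_impute_retirement_contributions()` and are not spelled out here. The limits are passed in by the caller, and the values used in the example are the 2024 IRS base limits without catch-up contributions.

```python
import numpy as np

def apply_retirement_constraints(pred, wages, se_income, limit_401k, limit_ira):
    """Hypothetical sketch of post-prediction constraints on QRF output."""
    wages = np.asarray(wages, dtype=float)
    se_income = np.asarray(se_income, dtype=float)
    has_wages = wages > 0
    has_se = se_income > 0
    has_earned = has_wages | has_se

    out = {}
    for name, values in pred.items():
        v = np.clip(np.asarray(values, dtype=float), 0.0, None)  # non-negativity
        if "401k" in name:
            # Assumed rule: 401(k) deferrals require wages.
            v = np.where(has_wages, np.minimum(v, limit_401k), 0.0)
        elif "ira" in name:
            # Assumed rule: IRA contributions require wages or SE income.
            v = np.where(has_earned, np.minimum(v, limit_ira), 0.0)
        elif name == "self_employed_pension_contributions":
            # Assumed rule: SE pension contributions require SE income.
            v = np.where(has_se, v, 0.0)
        out[name] = v
    return out

constrained = apply_retirement_constraints(
    pred={
        "traditional_401k_contributions": [50_000, 5_000, 30_000],
        "traditional_ira_contributions": [8_000, -50, 9_000],
        "self_employed_pension_contributions": [2_000, 2_000, 2_000],
    },
    wages=[0, 60_000, 200_000],
    se_income=[0, 0, 10_000],
    limit_401k=23_000,
    limit_ira=7_000,
)
```

The first record reproduces the motivating case: a prediction of $50k in 401(k) contributions on $0 wages is zeroed out.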
Add income predictors (interest, dividends, pension, SS) to the retirement contribution QRF, matching issue #561's recommendation. Split RETIREMENT_PREDICTORS into demographic and income sublists so the test side correctly sources income from PUF imputations. Also add validation/validate_retirement_imputation.py for post-build constraint checking and aggregate comparison against calibration targets. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Summary
Combines three related pieces of work for retirement contribution calibration (issue #561):
1. Calibration targets (original PR #554)
2. SS sub-component reconciliation (from PR #552)
social_security_retirement, reconcile DI/survivor/dependent sub-components
3. QRF imputation for retirement contributions on PUF clone half (new)
Problem: When building the Extended CPS, PUF clone records get PUF-imputed income via QRF, but retirement contribution variables were just blindly duplicated from the CPS donor. This means a PUF clone with $0 wages could have $50k in 401(k) contributions — there's no model linking contributions to income.
Solution: Train a QRF on the CPS half (which has realistic income↔contribution relationships) and predict retirement contributions onto the PUF clone half using PUF-imputed income as input. Follows the exact pattern of the existing _impute_weeks_unemployed().
Variables modeled:
- traditional_401k_contributions
- roth_401k_contributions
- traditional_ira_contributions
- roth_ira_contributions
- self_employed_pension_contributions
Predictors (13 total):
Post-prediction constraints:
Data flow: CPS half keeps original values; PUF half gets CPS-trained QRF predictions. The traditional_ira_contributions variable was moved out of IMPUTED_VARIABLES (PUF-based QRF) into the CPS-trained model for consistency with the other retirement variables.
Test plan
Validation
A post-build validation script is provided at validation/validate_retirement_imputation.py. After a full build, run:
It checks:
Files changed
- policyengine_us_data/calibration/puf_impute.py: _get_retirement_limits(), _impute_retirement_contributions(), routing in puf_clone_dataset()
- policyengine_us_data/tests/test_calibration/test_retirement_imputation.py
- validation/validate_retirement_imputation.py
- conftest.py (root)
- policyengine_us_data/tests/test_calibration/conftest.py
Closes #561
🤖 Generated with Claude Code