Bug
When PUF imputation replaces social_security values, the sub-components (social_security_retirement, social_security_disability, social_security_survivors, social_security_dependents) are not updated to match. This creates a discontinuity when policyengine-us crosses the base year boundary.
Root cause
In puf_impute.py, social_security is in the IMPUTED_VARIABLES list (line 38) and gets replaced with PUF-imputed values. But the sub-components (set from CPS source codes in cps.py lines 521-556) remain unchanged.
For the base year (2024), social_security uses the PUF-imputed input:
- 2,281 nonzero records, weighted to 33.6M people
For projected years (2025+), social_security falls through to its formula, which sums the 4 sub-components. Those sub-components still hold the original CPS values:
- 7,197 nonzero records, weighted to 67.8M people
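The mismatch between the two paths can be illustrated with a toy sketch. The five records and all dollar amounts below are made up for illustration and are not real CPS or PUF values, and `formula_total` is a hypothetical stand-in for the social_security formula rather than the policyengine-us code:

```python
import numpy as np

# Hypothetical toy data for five records (not real CPS or PUF values).
cps_components = {
    "social_security_retirement": np.array([12_000.0, 0.0, 8_000.0, 0.0, 0.0]),
    "social_security_disability": np.array([0.0, 9_000.0, 0.0, 0.0, 0.0]),
    "social_security_survivors": np.array([0.0, 0.0, 0.0, 4_000.0, 0.0]),
    "social_security_dependents": np.array([0.0, 0.0, 0.0, 0.0, 0.0]),
}

# PUF-imputed total stored as the base-year input. It was imputed
# independently, so it disagrees with the CPS sub-components above.
puf_social_security = np.array([15_000.0, 0.0, 0.0, 0.0, 0.0])


def formula_total(components):
    """Sum the sub-components, as the formula path does in projected years."""
    return sum(components.values())


# Base year reads the input directly; projected years use the formula.
base_year_nonzero = int(np.count_nonzero(puf_social_security))
projected_nonzero = int(np.count_nonzero(formula_total(cps_components)))

print(base_year_nonzero, projected_nonzero)  # 1 4
```

Comparing the two nonzero counts (optionally weighted) on the real dataset is one way to confirm whether the discontinuity is present.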
Impact
This causes roughly 3x more nonzero Social Security records (7,197 vs 2,281), or about 2x more recipients on a weighted basis (67.8M vs 33.6M), to appear in projected years than in the base year. In microsimulation, this artificially reduces the Gini coefficient by shifting 7.7% of households from zero to positive market income (the share with zero market income falls from 10.3% to 2.6%).
Net income Gini: 0.6502 (2024) → 0.5625 (2025), an unrealistic drop of about 0.088 (nearly 9 Gini points) driven by this artifact.
Proposed fix
When PUF imputation replaces social_security, split the imputed value across the sub-components proportionally (using the CPS source code ratios). This ensures consistency between the total and sub-components for households with PUF-imputed SS values.
For records where the CPS has sub-component splits:
- social_security_retirement = puf_ss * (cps_retirement / cps_total_ss)
- Same for disability, survivors, dependents
For records where PUF adds new SS income (no CPS split available):
- Default to retirement (the most common category)
This ensures the formula path (sum of sub-components) matches the input path (social_security directly) in both level and distribution.
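A minimal sketch of the proposed split, assuming the imputed totals and CPS sub-components are available as aligned NumPy arrays. The function name `split_imputed_social_security` and its signature are hypothetical and not part of puf_impute.py; the sketch also assumes non-negative CPS amounts:

```python
import numpy as np

SUB_COMPONENTS = (
    "social_security_retirement",
    "social_security_disability",
    "social_security_survivors",
    "social_security_dependents",
)


def split_imputed_social_security(puf_ss, cps_components):
    """Split PUF-imputed totals across sub-components (hypothetical helper).

    Records with a CPS split keep their CPS proportions; records with no
    CPS Social Security default entirely to retirement.
    """
    puf_ss = np.asarray(puf_ss, dtype=float)
    cps = {
        name: np.asarray(cps_components[name], dtype=float)
        for name in SUB_COMPONENTS
    }
    cps_total = sum(cps.values())
    has_split = cps_total > 0
    # Avoid division by zero for records without any CPS Social Security.
    safe_total = np.where(has_split, cps_total, 1.0)

    result = {}
    for name in SUB_COMPONENTS:
        share = np.where(has_split, cps[name] / safe_total, 0.0)
        if name == "social_security_retirement":
            # No CPS split available: assign the full amount to retirement.
            share = np.where(has_split, share, 1.0)
        result[name] = puf_ss * share
    return result


# Toy example with made-up values for three records.
puf_ss = np.array([10_000.0, 5_000.0, 0.0])
cps_components = {
    "social_security_retirement": np.array([6_000.0, 0.0, 3_000.0]),
    "social_security_disability": np.array([2_000.0, 0.0, 0.0]),
    "social_security_survivors": np.array([0.0, 0.0, 0.0]),
    "social_security_dependents": np.array([0.0, 0.0, 0.0]),
}
split = split_imputed_social_security(puf_ss, cps_components)

# The sub-components now sum back to the imputed total for every record.
assert np.allclose(sum(split.values()), puf_ss)
```

In this toy example the first record keeps its 75/25 retirement/disability ratio, the second record (new PUF income with no CPS split) goes entirely to retirement, and the third record's sub-components are zeroed because the imputed total is zero.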