
PUF imputation overwrites social_security but not sub-components, causing base-year discontinuity #551

@MaxGhenis


Bug

When PUF imputation replaces social_security values, the sub-components (social_security_retirement, social_security_disability, social_security_survivors, social_security_dependents) are not updated to match, so the total no longer equals the sum of its parts. This creates a discontinuity in policyengine-us between the base year and the projected years.

Root cause

In puf_impute.py, social_security is in the IMPUTED_VARIABLES list (line 38) and gets replaced with PUF-imputed values. But the sub-components (set from CPS source codes in cps.py lines 521-556) remain unchanged.
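A minimal sketch of the mismatch, using a hypothetical three-record extract with made-up values (only the column names come from this issue). It overwrites only the total, as the imputation step does, and then checks each record against the sum of its sub-components.

```python
import pandas as pd

SUBCOMPONENTS = [
    "social_security_retirement",
    "social_security_disability",
    "social_security_survivors",
    "social_security_dependents",
]

# Hypothetical CPS records; the total starts out consistent with its parts.
cps = pd.DataFrame(
    {
        "social_security_retirement": [12_000.0, 0.0, 0.0],
        "social_security_disability": [0.0, 9_000.0, 0.0],
        "social_security_survivors": [0.0, 0.0, 0.0],
        "social_security_dependents": [0.0, 0.0, 0.0],
    }
)
cps["social_security"] = cps[SUBCOMPONENTS].sum(axis=1)

# Simulate PUF imputation replacing only the total (values are made up).
imputed = cps.copy()
imputed["social_security"] = pd.Series([15_000.0, 0.0, 8_000.0])

# Records where the stored total no longer equals the sum of the sub-components.
mismatch = imputed["social_security"] != imputed[SUBCOMPONENTS].sum(axis=1)
print(mismatch.tolist())  # [True, True, True]
```

Record 0 changes level, record 1 loses its PUF total but keeps CPS disability income, and record 2 gains a PUF total with no sub-components behind it.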

For the base year (2024), social_security uses the PUF-imputed input:

  • 2,281 nonzero records, weighted to 33.6M people

For projected years (2025+), social_security falls through to its formula (the sum of the four sub-components), and those sub-components still hold the original CPS values:

  • 7,197 nonzero records, weighted to 67.8M people
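The year-dependent behavior above can be sketched as follows. `resolve_social_security` and `weighted_recipients` are hypothetical helpers, the `weight` column and all values are made up, and uprating in projected years is ignored. The point is only that the input path and the formula path count different recipients when the sub-components are stale.

```python
import pandas as pd

BASE_YEAR = 2024  # base year named in the issue

SUBCOMPONENTS = [
    "social_security_retirement",
    "social_security_disability",
    "social_security_survivors",
    "social_security_dependents",
]

# Hypothetical records: PUF-imputed totals alongside stale CPS sub-components.
df = pd.DataFrame(
    {
        "social_security": [15_000.0, 0.0, 8_000.0, 0.0],
        "social_security_retirement": [12_000.0, 0.0, 0.0, 6_000.0],
        "social_security_disability": [0.0, 9_000.0, 0.0, 0.0],
        "social_security_survivors": [0.0, 0.0, 0.0, 0.0],
        "social_security_dependents": [0.0, 0.0, 0.0, 0.0],
        "weight": [100.0, 200.0, 50.0, 300.0],
    }
)


def resolve_social_security(data: pd.DataFrame, year: int) -> pd.Series:
    """Simplified stand-in for how the total is resolved (ignores uprating)."""
    if year == BASE_YEAR:
        # Base year: the stored PUF-imputed input is used directly.
        return data["social_security"]
    # Projected years: the formula sums the sub-components, which are still CPS-based.
    return data[SUBCOMPONENTS].sum(axis=1)


def weighted_recipients(data: pd.DataFrame, year: int) -> float:
    """Weighted count of records with positive Social Security in `year`."""
    ss = resolve_social_security(data, year)
    return float(data.loc[ss > 0, "weight"].sum())


print(weighted_recipients(df, 2024))  # 150.0
print(weighted_recipients(df, 2025))  # 600.0
```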

Impact

This roughly doubles the weighted number of Social Security recipients in projected years relative to the base year (33.6M → 67.8M), with about 3x as many nonzero records (2,281 → 7,197). In microsimulation, this artificially reduces the Gini coefficient by shifting 7.7% of households from zero to positive market income (the share with zero market income falls from 10.3% to 2.6%).

Net income Gini: 0.6502 (2024) → 0.5625 (2025), an unrealistic drop of nearly 9 points driven by this artifact.
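To illustrate why moving zero-income units to positive income pulls the Gini down, here is a sketch of a weighted Gini coefficient (1 minus twice the area under the Lorenz curve, via the trapezoid rule). `weighted_gini` is a hypothetical helper, and PolicyEngine's own Gini calculation may differ in details. The incomes are made up.

```python
import numpy as np


def weighted_gini(values, weights) -> float:
    """Weighted Gini coefficient: 1 minus twice the area under the Lorenz curve."""
    values = np.asarray(values, dtype=float)
    weights = np.asarray(weights, dtype=float)
    order = np.argsort(values)
    v, w = values[order], weights[order]
    # Cumulative population and income shares; the Lorenz curve starts at (0, 0).
    pop_share = np.concatenate(([0.0], np.cumsum(w) / w.sum()))
    inc_share = np.concatenate(([0.0], np.cumsum(v * w) / (v * w).sum()))
    # Trapezoid rule for the area under the Lorenz curve.
    area = np.sum(np.diff(pop_share) * (inc_share[1:] + inc_share[:-1]) / 2.0)
    return float(1.0 - 2.0 * area)


weights = [1.0, 1.0, 1.0]
gini_before = weighted_gini([0.0, 0.0, 100.0], weights)
# One zero-income unit now receives a small Social Security amount.
gini_after = weighted_gini([0.0, 20.0, 100.0], weights)
print(round(gini_before, 4), round(gini_after, 4))  # 0.6667 0.5556
```

A single spurious benefit in a three-unit toy population moves the Gini by about 11 points, so spurious recipients on the scale described above can plausibly account for a large drop.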

Proposed fix

When PUF imputation replaces social_security, split the imputed value across the sub-components proportionally (using the CPS source code ratios). This ensures consistency between the total and sub-components for households with PUF-imputed SS values.

For records where the CPS has sub-component splits:

  • social_security_retirement = puf_ss * (cps_retirement / cps_total_ss)
  • Same for disability, survivors, dependents

For records where PUF adds new SS income (no CPS split available):

  • Default to retirement (the most common category)
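A minimal sketch of the proposed split, assuming the CPS sub-components and the PUF-imputed totals are available as aligned pandas objects. `split_imputed_social_security` is a hypothetical name, not an existing function in puf_impute.py, and the data are made up.

```python
import pandas as pd

SUBCOMPONENTS = [
    "social_security_retirement",
    "social_security_disability",
    "social_security_survivors",
    "social_security_dependents",
]


def split_imputed_social_security(cps: pd.DataFrame, puf_ss: pd.Series) -> pd.DataFrame:
    """Set the total to the PUF value and rescale the sub-components to match it.

    Records with CPS sub-component income keep their CPS shares; records where
    the PUF adds Social Security with no CPS split get it all as retirement.
    """
    out = cps.copy()
    cps_total = cps[SUBCOMPONENTS].sum(axis=1)
    has_split = cps_total > 0
    # Denominator is NaN where there is no CPS split, so those shares become 0.
    denom = cps_total.where(has_split)
    for col in SUBCOMPONENTS:
        share = (cps[col] / denom).fillna(0.0)
        out[col] = puf_ss * share
    # No CPS split available: default the whole imputed amount to retirement.
    out.loc[~has_split, "social_security_retirement"] = puf_ss[~has_split]
    out["social_security"] = puf_ss
    return out


cps = pd.DataFrame(
    {
        "social_security_retirement": [6_000.0, 0.0, 0.0],
        "social_security_disability": [0.0, 9_000.0, 0.0],
        "social_security_survivors": [2_000.0, 0.0, 0.0],
        "social_security_dependents": [0.0, 0.0, 0.0],
    }
)
puf_ss = pd.Series([16_000.0, 0.0, 5_000.0])
result = split_imputed_social_security(cps, puf_ss)
print(result["social_security_retirement"].tolist())  # [12000.0, 0.0, 5000.0]
```

Because each record's shares sum to 1 (or the record defaults to retirement), the sub-components add back up to the imputed total. A record where the CPS reported Social Security but the PUF imputes zero ends up with all four sub-components at zero.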

This ensures the formula path (sum of sub-components) matches the input path (social_security directly) in both level and distribution.
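One way to verify this after the fix is to compare both paths on the same records. `compare_paths` is a hypothetical check, the `weight` column is an assumption, and both frames below are made up. One frame is consistent and the other has stale sub-components.

```python
import numpy as np
import pandas as pd

SUBCOMPONENTS = [
    "social_security_retirement",
    "social_security_disability",
    "social_security_survivors",
    "social_security_dependents",
]


def compare_paths(df: pd.DataFrame) -> dict:
    """Compare the input path (stored total) with the formula path (sum of parts)."""
    input_total = df["social_security"]
    formula_total = df[SUBCOMPONENTS].sum(axis=1)
    w = df["weight"]
    return {
        # Level and distribution agree if every record agrees.
        "records_match": bool(np.isclose(input_total, formula_total).all()),
        "input_recipients": float(w[input_total > 0].sum()),
        "formula_recipients": float(w[formula_total > 0].sum()),
        "input_total": float((w * input_total).sum()),
        "formula_total": float((w * formula_total).sum()),
    }


def make_frame(ret, dis, surv, total):
    return pd.DataFrame(
        {
            "social_security_retirement": ret,
            "social_security_disability": dis,
            "social_security_survivors": surv,
            "social_security_dependents": [0.0, 0.0],
            "social_security": total,
            "weight": [100.0, 50.0],
        }
    )


consistent = make_frame([12_000.0, 5_000.0], [0.0, 0.0], [4_000.0, 0.0], [16_000.0, 5_000.0])
stale = make_frame([6_000.0, 0.0], [0.0, 9_000.0], [2_000.0, 0.0], [16_000.0, 0.0])
print(compare_paths(consistent)["records_match"])  # True
print(compare_paths(stale)["records_match"])  # False
```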
