Open
Description
Problem
When building the extended CPS, the PUF clone half receives CPS-only variables (like retirement contributions) either by:
- Direct duplication from the CPS donor record (`else` branch in `puf_clone_dataset`)
- PUF override via `OVERRIDDEN_IMPUTED_VARIABLES` (e.g. `pre_tax_contributions`)
Neither approach preserves the relationship between these variables and income. A PUF clone with $0 wages can end up with $50k in 401(k) contributions, because there's no model linking contributions to the income variables that are common between CPS and PUF.
This creates implausible records and makes calibration harder — you can't calibrate away a structural data quality issue.
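To make the problem measurable, a quick diagnostic can count how many clone records carry contributions larger than their earned income. The sketch below is only illustrative: the `Record` class and the `implausible_contribution_share` helper are hypothetical and not part of the existing codebase, and the sample values are made up to mirror the $0 wages / $50k contributions case described above.

```python
from dataclasses import dataclass


@dataclass
class Record:
    """Hypothetical flat record; field names mirror variables named in this issue."""
    employment_income: float
    self_employment_income: float
    pre_tax_contributions: float


def implausible_contribution_share(records):
    """Return the share of records whose contributions exceed earned income."""
    if not records:
        return 0.0
    flagged = sum(
        1
        for r in records
        if r.pre_tax_contributions > r.employment_income + r.self_employment_income
    )
    return flagged / len(records)


# Made-up PUF clone records for illustration only.
clones = [
    Record(0.0, 0.0, 50_000.0),        # the failure case: no earnings, large contributions
    Record(120_000.0, 0.0, 10_000.0),  # plausible
    Record(40_000.0, 5_000.0, 0.0),    # plausible
    Record(0.0, 0.0, 0.0),             # plausible
]
share = implausible_contribution_share(clones)
print(f"Share of implausible clone records: {share:.0%}")
```

Tracking this share before and after any change to the clone step would show whether the structural issue is actually shrinking.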
Proposed solution
For variables that exist in CPS but not PUF, train predictive models using the CPS half to predict these variables from features common to both CPS and PUF:
Common predictors (available in both datasets):
- Wages/salary income (`employment_income`)
- Self-employment income
- Interest/dividend income
- Age
- Filing status
- Number of dependents
- Social Security income (sub-components)
- Pension/retirement income
Variables to model (examples):
- `pre_tax_contributions` (retirement contributions; see Add calibration targets for retirement contributions #553)
- `traditional_401k_contributions`
- `traditional_ira_contributions`
- `roth_401k_contributions`
- `self_employed_pension_contributions`
- Other CPS-only variables currently in `OVERRIDDEN_IMPUTED_VARIABLES` that should respect income relationships
Approach
- On the CPS half (which has both income variables and CPS-only variables), train lightweight models (e.g. quantile regression, gradient boosting) predicting each CPS-only variable from the common features
- Apply these models to the PUF clone half, using the PUF-derived income values as inputs
- This ensures that a PUF clone with high wages gets plausible retirement contributions, and one with $0 wages gets $0 contributions
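The train-on-CPS, apply-to-PUF flow can be sketched as follows. This is a minimal, standard-library-only stand-in, not the proposed implementation: instead of quantile regression or gradient boosting it uses a binned empirical conditional distribution on a single predictor (wages). The bin edges, the function names (`wage_bin`, `fit_conditional_quantiles`, `impute`), and the sample rows are all assumptions made for illustration.

```python
import bisect
import random

# Hypothetical wage bin edges; a real model would condition on all common predictors.
WAGE_EDGES = [1.0, 25_000.0, 75_000.0, 150_000.0]


def wage_bin(wage):
    """Index of the wage bin; bin 0 holds wages below $1 (effectively zero)."""
    return bisect.bisect_right(WAGE_EDGES, wage)


def fit_conditional_quantiles(cps_rows):
    """'Training' step: collect sorted CPS contributions within each wage bin."""
    by_bin = {}
    for wage, contribution in cps_rows:
        by_bin.setdefault(wage_bin(wage), []).append(contribution)
    return {b: sorted(values) for b, values in by_bin.items()}


def impute(model, puf_wage, rng):
    """Draw a contribution from the empirical distribution of the matching bin."""
    donors = model.get(wage_bin(puf_wage))
    if not donors:
        return 0.0  # assumption: a bin with no CPS donors imputes zero
    u = rng.random()  # random quantile in [0, 1)
    return donors[min(int(u * len(donors)), len(donors) - 1)]


# Made-up CPS rows: (employment_income, pre_tax_contributions).
cps_rows = [
    (0.0, 0.0), (0.0, 0.0),
    (20_000.0, 0.0), (20_000.0, 1_000.0),
    (60_000.0, 3_000.0), (60_000.0, 6_000.0),
    (120_000.0, 10_000.0), (120_000.0, 19_000.0),
    (200_000.0, 23_000.0),
]
model = fit_conditional_quantiles(cps_rows)

# Apply to PUF clones using their PUF-derived wages as the input.
rng = random.Random(0)
puf_wages = [0.0, 130_000.0]
imputed = [impute(model, w, rng) for w in puf_wages]
```

In this toy data the zero-wage clone can only receive $0, because every zero-wage CPS donor has $0 contributions, while the high-wage clone draws from contributions observed among similarly paid CPS records. Drawing a random quantile rather than predicting a conditional mean keeps the spread of the imputed variable, which is the same motivation for using quantile-based models in the approach above.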
Related
- #553 (Add calibration targets for retirement contributions): retirement contribution calibration targets (symptoms of this structural issue)
- #554 (Calibrate retirement contributions: targets, SS reconciliation, and QRF imputation): proportional retirement contribution allocation (fixes CPS-side allocation)
- #552 (Reconcile SS sub-components after PUF imputation): SS sub-component reconciliation and formula variable cleanup