Description
Problem
Currently the FRS only records individuals who are actively making student loan repayments via PAYE. This means the base dataset has no representation of borrowers who hold a loan but earn below their repayment threshold — a significant population, particularly for newer plan types.
Impact on projections
When uprate_student_loan_plans() in policyengine-uk projects forward, it assigns new plan holders by drawing from tertiary-educated people whose plan is NONE in eligible cohort/age bands. For Plan 5 (started uni 2023+), the pool of NONE people with uni_start_year >= 2023 who are aged 21+ is very small in the 2023-24 FRS base, because the first Plan 5 borrowers were only ~20 years old at the time of the survey. Even a high take-up rate applied to a near-empty pool therefore produces a severe undercount in projected years.
The same structural issue affects Plan 2, where the FRS captures only active PAYE repayers, missing the stock of below-threshold borrowers who nonetheless hold loans. This is the undercount identified in #237.
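The pool-size constraint can be made concrete with a small arithmetic sketch. All figures below are hypothetical placeholders chosen for illustration, not FRS or SLC values, and `projected_holders` is an illustrative helper rather than a function in policyengine-uk.

```python
def projected_holders(pool_weight: float, take_up: float) -> float:
    """Weighted number of new plan holders drawn from an eligible pool."""
    return pool_weight * take_up


# Hypothetical figures, for illustration only.
pool_weight = 5_000.0   # weighted NONE people in the eligible Plan 5 band
take_up = 0.9           # assumed share of the pool assigned a plan
target = 100_000.0      # assumed true number of plan holders

assigned = projected_holders(pool_weight, take_up)
coverage = assigned / target  # share of the target that can be reached

print(f"assigned={assigned:.0f}, coverage={coverage:.1%}")
```

However high the take-up rate is set, the assigned count cannot exceed the weighted size of the pool, so a near-empty pool caps the projection well below the target.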
Proposed fix
Add an imputation step to the base FRS dataset (alongside the existing wealth, consumption, and income imputations) that probabilistically assigns student_loan_plan to tertiary-educated individuals who:
- Fall in the correct cohort band for their age (pre-2012 → Plan 1, 2012–2022 → Plan 2, 2023+ → Plan 5)
- Are not already flagged as active repayers
- Are in the plausible age range for holding a loan (e.g. 21–55)
The calibration target for this imputation should be SLC's "liable to repay" count (total loan holders regardless of whether they earn above threshold), rather than "above repayment threshold" — since this imputation specifically targets the below-threshold stock.
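A minimal sketch of what such an imputation step could look like, assuming a simple per-plan assignment probability equal to the shortfall against the "liable to repay" target divided by the weighted eligible pool. The `Person` record, the `PLAN_1`/`PLAN_2`/`PLAN_5` labels, the `weight` field, and the `impute_plans` function are hypothetical names for illustration; they are not the existing policyengine-uk-data API, and the real step might well use a fitted model rather than a flat probability.

```python
import random
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Person:
    age: int
    uni_start_year: Optional[int]  # None if not tertiary-educated
    student_loan_plan: str         # "NONE" or a plan label (hypothetical labels)
    weight: float                  # survey weight


def cohort_plan(uni_start_year: int) -> str:
    """Map a university start year to a plan, using the cohort bands above."""
    if uni_start_year < 2012:
        return "PLAN_1"
    if uni_start_year <= 2022:
        return "PLAN_2"
    return "PLAN_5"


def is_eligible(p: Person, min_age: int, max_age: int) -> bool:
    """Tertiary-educated, not already an active repayer, in the age range."""
    return (
        p.uni_start_year is not None
        and p.student_loan_plan == "NONE"
        and min_age <= p.age <= max_age
    )


def impute_plans(
    people: List[Person],
    liable_targets: Dict[str, float],  # assumed "liable to repay" counts per plan
    min_age: int = 21,
    max_age: int = 55,
    seed: int = 0,
) -> Dict[str, float]:
    """Assign plans in place; return the assignment probability used per plan."""
    rng = random.Random(seed)
    existing: Dict[str, float] = {}  # weighted active repayers per plan
    pool: Dict[str, float] = {}      # weighted eligible non-holders per plan
    for p in people:
        if p.student_loan_plan != "NONE":
            plan = p.student_loan_plan
            existing[plan] = existing.get(plan, 0.0) + p.weight
        elif is_eligible(p, min_age, max_age):
            plan = cohort_plan(p.uni_start_year)
            pool[plan] = pool.get(plan, 0.0) + p.weight

    probs: Dict[str, float] = {}
    for plan, target in liable_targets.items():
        # Only the below-threshold stock is missing, so impute the shortfall.
        shortfall = max(target - existing.get(plan, 0.0), 0.0)
        pool_w = pool.get(plan, 0.0)
        probs[plan] = min(shortfall / pool_w, 1.0) if pool_w > 0 else 0.0

    for p in people:
        if is_eligible(p, min_age, max_age):
            plan = cohort_plan(p.uni_start_year)
            if rng.random() < probs.get(plan, 0.0):
                p.student_loan_plan = plan
    return probs
```

Subtracting the already-recorded repayers from the target keeps the imputation focused on the below-threshold stock. The probability is capped at 1, so if the eligible pool is smaller than the shortfall the target will still be missed, which is the same pool-size constraint described under "Impact on projections".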
Relationship to existing work
- #237 (Add student loan repayment calibration targets) adds calibration targets for borrowers above the threshold; this issue covers the complementary below-threshold population
- PR #277 (feat: add SLC student loan calibration targets) partially corrects the Plan 2 above-threshold undercount via reweighting; this imputation would address the root cause rather than compensating through weights
- The uprating logic in policyengine-uk would then have a correctly populated pool to draw Plan 5 holders from as cohorts age forward