Student Welfare Optimization in College Matching

Student-Optimal Deferred Acceptance: Theory and Empirical Evidence

Theoretical Foundation

The Gale-Shapley Deferred Acceptance (DA) algorithm (1962) produces a student-optimal stable matching when students propose: each student receives their most-preferred partner consistent with stability. The key properties:

Stability: No student-college pair mutually prefers each other over their assigned match
Strategy-proofness: Truthful preference reporting is a dominant strategy for the proposing side (students)
Optimality within stability: The student-proposing DA yields the student-optimal stable matching; every student weakly prefers it to every other stable matching
Lattice structure: The set of stable matchings forms a lattice, with student-optimal and college-optimal matchings at opposite extremes
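The properties above can be made concrete with a minimal sketch of student-proposing DA for many-to-one matching, together with a blocking-pair check for stability. The function names, the dictionary-based inputs, and the three-student example are illustrative assumptions, not part of any cited implementation.

```python
from collections import deque

def deferred_acceptance(student_prefs, college_prefs, capacity):
    """Student-proposing DA. Preference lists are ordered best-first.
    Returns dict student -> college (None if unmatched)."""
    # Lower rank number means the college likes the student more.
    rank = {c: {s: i for i, s in enumerate(p)} for c, p in college_prefs.items()}
    next_choice = {s: 0 for s in student_prefs}  # index of next college to try
    held = {c: [] for c in college_prefs}        # tentative admits per college
    free = deque(student_prefs)                  # students without a tentative seat
    while free:
        s = free.popleft()
        if next_choice[s] >= len(student_prefs[s]):
            continue                             # list exhausted: stays unmatched
        c = student_prefs[s][next_choice[s]]
        next_choice[s] += 1
        if s not in rank[c]:                     # college finds s unacceptable
            free.append(s)
            continue
        held[c].append(s)
        if len(held[c]) > capacity[c]:
            # Acceptance is deferred: an earlier admit can still be displaced.
            worst = max(held[c], key=lambda x: rank[c][x])
            held[c].remove(worst)
            free.append(worst)
    match = {s: None for s in student_prefs}
    for c, students in held.items():
        for s in students:
            match[s] = c
    return match

def blocking_pairs(match, student_prefs, college_prefs, capacity):
    """Return (student, college) pairs that would both rather be together."""
    rank = {c: {s: i for i, s in enumerate(p)} for c, p in college_prefs.items()}
    admits = {c: [s for s, m in match.items() if m == c] for c in college_prefs}
    pairs = []
    for s, prefs in student_prefs.items():
        current = prefs.index(match[s]) if match[s] in prefs else len(prefs)
        for c in prefs[:current]:                # colleges s strictly prefers
            if s not in rank[c]:
                continue
            has_seat = len(admits[c]) < capacity[c]
            displaces = any(rank[c][s] < rank[c][t] for t in admits[c])
            if has_seat or displaces:
                pairs.append((s, c))
    return pairs

# Illustrative instance: two one-seat colleges, three students.
student_prefs = {"a": ["X", "Y"], "b": ["X", "Y"], "c": ["Y", "X"]}
college_prefs = {"X": ["b", "a", "c"], "Y": ["a", "c", "b"]}
capacity = {"X": 1, "Y": 1}
da_match = deferred_acceptance(student_prefs, college_prefs, capacity)
unstable = {"a": "X", "b": "Y", "c": None}
```

In this instance DA assigns a to Y and b to X and leaves c unmatched, and `blocking_pairs` returns an empty list for that outcome. The hand-built `unstable` matching is flagged because b and X prefer each other to their assignments.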

Empirical Implementations

NYC High School Match (2003)

Replaced an uncoordinated system in which ~30,000 students per year received no offer from any school on their list and were assigned administratively
Adopted student-proposing DA with single tiebreaking
Reduced the number of such students from ~30,000 to ~3,000 in the first year
Abdulkadiroglu, Pathak, and Roth found that simulations with field data favor single tiebreaking (breaking ties the same way at every school) for efficiency

Boston School Choice (2005)

Boston School Committee replaced the "Boston mechanism" (immediate acceptance) with DA
Under the old Boston mechanism, sophisticated parents strategically misrepresented preferences while unsophisticated parents (disproportionately low-income and minority) reported truthfully and were penalized
The switch to strategy-proof DA eliminated the "gaming advantage" of informed families
Abdulkadiroglu, Pathak, Roth, and Sonmez documented both sophisticated and unsophisticated strategic behavior, establishing fairness as a rationale for strategy-proof mechanisms

NRMP Medical Residency Match

Roth (1984) showed that NRMP had independently converged on a DA-equivalent algorithm
The match has operated stably since 1952, with periodic refinements (the 1998 Roth-Peranson redesign switched to an applicant-proposing algorithm and improved the handling of couples)

Known Limitations

Not Pareto efficient: DA's outcome can be Pareto dominated for students by an unstable matching, so stability carries an efficiency cost. Abdulkadiroglu, Pathak, and Roth showed that the inefficiency "can potentially be severe," and empirical findings from the NYC match corroborated this
Proposer advantage: Students get the student-optimal stable matching, but this can still be far from their first choices at highly selective institutions
No mechanism is always both stable and efficient: For some preference profiles, no stable matching is Pareto efficient for students (Roth, 1982). Any Pareto improvement over the student-optimal stable matching creates justified envy for some other student
Tiebreaking matters: When colleges are indifferent among students, different tiebreaking rules lead to different matchings with different welfare properties
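To illustrate the tiebreaking point, the sketch below turns coarse priority classes into strict rankings in two ways: one lottery shared by every school (single tiebreaking) or an independent lottery per school (multiple tiebreaking). The function names and the `priority_class` input layout are assumptions made for this example.

```python
import random

def single_tiebreak(priority_class, students, rng):
    """One lottery number per student, reused at every school."""
    lottery = {s: rng.random() for s in students}
    return {c: sorted(students, key=lambda s: (cls[s], lottery[s]))
            for c, cls in priority_class.items()}

def multiple_tiebreak(priority_class, students, rng):
    """Each school draws its own independent lottery."""
    strict = {}
    for c, cls in priority_class.items():
        lottery = {s: rng.random() for s in students}
        strict[c] = sorted(students, key=lambda s: (cls[s], lottery[s]))
    return strict

students = ["s1", "s2", "s3", "s4"]
# Lower class number = higher priority. At X, s1 has (say) sibling priority;
# at Y, all four students are tied.
priority_class = {
    "X": {"s1": 0, "s2": 1, "s3": 1, "s4": 1},
    "Y": {"s1": 1, "s2": 1, "s3": 1, "s4": 1},
}
single = single_tiebreak(priority_class, students, random.Random(0))
multiple = multiple_tiebreak(priority_class, students, random.Random(0))

# Under single tiebreaking, tied students appear in the same relative order
# at every school.
tied = ["s2", "s3", "s4"]
order_at_x = [s for s in single["X"] if s in tied]
order_at_y = [s for s in single["Y"] if s in tied]
```

Either set of strict rankings can then be passed to a DA routine. Comparing the resulting matchings across many lottery draws is one way to study the welfare differences noted above.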

Relevance to College Admissions

U.S. college admissions does not use DA. Instead, it operates as a decentralized market with:

Students applying to multiple colleges simultaneously
Colleges making independent admission decisions
Multiple rounds (ED, EA, RD) creating a sequential matching structure
No centralized clearinghouse

This decentralized structure introduces information frictions, strategic complexity, and welfare losses that a centralized DA mechanism would partially address.

Alternative Mechanisms for Student Welfare

Mechanism Comparison

Mechanism	Strategy-Proof	Stable	Pareto Efficient	Used Where
Student-proposing DA	Yes (for students)	Yes	No	NYC schools, Boston, NRMP
College-proposing DA	No (for students)	Yes	No	Theoretical
Top Trading Cycles (TTC)	Yes	No	Yes	Theoretical; some kidney exchange variants
Boston/Immediate Acceptance	No	No	No	Pre-2005 Boston, China (variants)
Serial Dictatorship	Yes	N/A	Yes	Simple assignment problems
Decentralized (current U.S.)	N/A	No	No	U.S. college admissions

Top Trading Cycles (TTC)

Pareto efficient and strategy-proof for students
Students can form "trading cycles" to swap assignments, leading to efficiency gains over DA
Not stable: can produce justified envy (a student prefers another school that would prefer them)
Abdulkadiroglu and Sonmez (2003) proposed TTC for school choice; it was considered but not adopted in Boston or NYC due to perceived fairness concerns about justified envy
When priority structures satisfy both strong acyclicity and Kesten-acyclicity, TTC and the Boston mechanism produce equivalent outcomes
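A minimal sketch of TTC for school choice follows, assuming complete preference and priority lists. The instance is chosen so that TTC trades two students into their first choices while leaving a third with justified envy. All names here are illustrative.

```python
def top_trading_cycles(student_prefs, priorities, capacity):
    """TTC with school capacities. Assumes every student ranks every school
    and every school ranks every student (a simplifying assumption)."""
    seats = dict(capacity)
    match = {s: None for s in student_prefs}
    remaining = set(student_prefs)
    while remaining and any(n > 0 for n in seats.values()):
        # Each student points to their favorite school that still has a seat.
        s_points = {s: next(c for c in student_prefs[s] if seats[c] > 0)
                    for s in remaining}
        # Each open school points to its highest-priority remaining student.
        c_points = {c: next(s for s in priorities[c] if s in remaining)
                    for c, n in seats.items() if n > 0}
        # Follow pointers until a student repeats; that closes a cycle.
        path, seen = [], set()
        cur = min(remaining)
        while cur not in seen:
            seen.add(cur)
            path.append(cur)
            cur = c_points[s_points[cur]]
        cycle = path[path.index(cur):]
        # Everyone in the cycle receives the school they point to.
        for s in cycle:
            match[s] = s_points[s]
            seats[s_points[s]] -= 1
            remaining.remove(s)
    return match

student_prefs = {"s1": ["B", "A"], "s2": ["A", "B"], "s3": ["A", "B"]}
priorities = {"A": ["s1", "s3", "s2"], "B": ["s2", "s1", "s3"]}
capacity = {"A": 1, "B": 1}
ttc_match = top_trading_cycles(student_prefs, priorities, capacity)
```

Here s1 and s2 trade their top priorities and both receive their first choice, whereas student-proposing DA on the same instance would give s1 school A and s2 school B. The cost is that s3 prefers A and has higher priority at A than s2, which is exactly the justified envy that TTC permits.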

Boston/Immediate Acceptance Mechanism

Students rank schools; in each round, schools permanently accept top applicants up to capacity
Not strategy-proof: Parents must strategically rank "realistic" choices first, not true preferences
Sophisticated families game the system; unsophisticated families are harmed
Research on China's parallel college admissions mechanism (a hybrid between the Boston mechanism and DA) found significant gender, rural-urban, and ethnic gaps in mismatching explained by risk aversion and information disadvantage
Some theoretical work suggests the Boston mechanism may produce higher aggregate welfare when all agents are fully strategic, but this assumption fails empirically
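The sketch below implements immediate acceptance and shows why truthful reporting can hurt: a student who ranks a contested school first can lose their second choice to someone who ranked it first. The function name and the example instance are assumptions for illustration.

```python
def immediate_acceptance(student_prefs, priorities, capacity):
    """Boston / immediate acceptance mechanism. In round k every unassigned
    student applies to the k-th school on their submitted list, and schools
    permanently accept applicants in priority order up to remaining capacity."""
    seats = dict(capacity)
    match = {s: None for s in student_prefs}
    unassigned = set(student_prefs)
    rounds = max(len(p) for p in student_prefs.values())
    for k in range(rounds):
        applicants = {}
        for s in unassigned:
            if k < len(student_prefs[s]):
                applicants.setdefault(student_prefs[s][k], []).append(s)
        for c, apps in applicants.items():
            apps.sort(key=priorities[c].index)
            admitted = apps[:seats[c]]
            for s in admitted:
                match[s] = c          # final: never revisited in later rounds
                unassigned.discard(s)
            seats[c] -= len(admitted)
    return match

priorities = {"X": ["b", "a", "c"], "Y": ["a", "c", "b"]}
capacity = {"X": 1, "Y": 1}
true_prefs = {"a": ["X", "Y"], "b": ["X", "Y"], "c": ["Y", "X"]}

# Everyone reports truthfully: a loses X to b, and Y is already gone.
truthful = immediate_acceptance(true_prefs, priorities, capacity)

# a strategically ranks the "realistic" school Y first.
strategic_prefs = dict(true_prefs, a=["Y", "X"])
strategic = immediate_acceptance(strategic_prefs, priorities, capacity)
```

With truthful reports student a ends up unassigned, while misreporting secures Y. A family that does not know to make this adjustment bears the cost.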

Consistent Pareto Improvement over DA

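Theoretical work (Erdil and Ergin, 2008; Tang and Yu, 2014) proposes mechanisms that achieve Pareto improvements over student-optimal DA, but at the cost of strategy-proofness: Abdulkadiroglu, Pathak, and Roth showed that no strategy-proof mechanism Pareto dominates DA with tiebreaking
Erdil and Ergin's approach finds "stable improvement cycles" (groups of students who can swap assignments while maintaining stability) when priorities contain ties; Tang and Yu's approach instead gains efficiency by relaxing stability with students' consent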
Practical significance: even small efficiency gains can matter at scale

Implications for Simulation

The decentralized U.S. college admissions market is none of these mechanisms -- it lacks strategy-proofness, stability, and efficiency. This creates space for modeling:

How much welfare is lost vs. a centralized DA mechanism?
How does information asymmetry compound these losses?
Which students bear disproportionate welfare costs?

Existing ABM Simulations of College Admissions

Reardon, Kasman, Klasik, and Baker (2016) -- Stanford CEPA

"Agent-Based Simulation Models of the College Sorting Process" Published in Journal of Artificial Societies and Social Simulation (JASSS), Vol. 19, Issue 1.

Model Architecture:

8,000 students, 40 colleges, 150 seats per college (6,000 seats in total, enough for 75% of students)
Two student attributes: "resources" (socioeconomic capital) and "caliber" (academic achievement), bivariate normal with correlation 0.3
One college attribute: "quality" (running average of enrolled student caliber)
Three-stage annual cycle: application, admission, enrollment
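As a rough illustration of this architecture, the following sketch draws a student population with correlated resources and caliber using only the standard library. Standardized (mean 0, variance 1) units and the function names are assumptions; the published model may use different scales.

```python
import math
import random

N_STUDENTS, N_COLLEGES, SEATS_PER_COLLEGE = 8000, 40, 150

def generate_students(n, rho=0.3, seed=0):
    """Draw (resources, caliber) pairs from a standard bivariate normal
    with correlation rho."""
    rng = random.Random(seed)
    students = []
    for _ in range(n):
        resources = rng.gauss(0.0, 1.0)
        # Mix in independent noise so that corr(resources, caliber) = rho.
        caliber = rho * resources + math.sqrt(1 - rho ** 2) * rng.gauss(0.0, 1.0)
        students.append((resources, caliber))
    return students

def correlation(pairs):
    """Sample Pearson correlation of a list of (x, y) pairs."""
    n = len(pairs)
    mx = sum(x for x, _ in pairs) / n
    my = sum(y for _, y in pairs) / n
    cov = sum((x - mx) * (y - my) for x, y in pairs)
    vx = sum((x - mx) ** 2 for x, _ in pairs)
    vy = sum((y - my) ** 2 for _, y in pairs)
    return cov / math.sqrt(vx * vy)

students = generate_students(N_STUDENTS)
total_seats = N_COLLEGES * SEATS_PER_COLLEGE
seat_share = total_seats / N_STUDENTS      # 6000 / 8000 = 0.75
sample_rho = correlation(students)         # should be close to 0.3
```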

Key Parameters:

Parameter	Value	Source
Resource-caliber correlation	0.3	ELS:2002
Quality reliability	0.7 + 0.1 x resources	Plausible estimate
Caliber enhancement	+0.1 x resources	Test prep literature
Application count	4 + 0.5 x resources	ELS:2002

Information Model:

Students observe college quality with noise; noise decreases with resources
Students observe their own caliber with some error
Information quality = 0.7 + 0.1 x resources (wealthy students have better information)
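One way to turn the reliability formula into perception noise is the classical test-theory identity, reliability = true variance / (true variance + noise variance). Applying that identity here is an assumption, as are the clamping bounds and function names; the sketch only shows how better-resourced students would end up with less noisy perceptions.

```python
import math
import random

def reliability(resources, base=0.7, slope=0.1, lo=0.05, hi=0.99):
    """Information quality 0.7 + 0.1 x resources, clamped to (0, 1)
    so the noise formula below stays defined (clamping is an assumption)."""
    return min(hi, max(lo, base + slope * resources))

def noise_sd(resources, true_sd=1.0):
    """Noise standard deviation implied by reliability r:
    r = var_true / (var_true + var_noise)  =>  var_noise = var_true * (1 - r) / r."""
    r = reliability(resources)
    return true_sd * math.sqrt((1 - r) / r)

def perceive(true_value, resources, rng):
    """A student's noisy perception of a college's quality."""
    return true_value + rng.gauss(0.0, noise_sd(resources))

rng = random.Random(0)
low_resource_view = perceive(1.0, -2.0, rng)   # reliability 0.5, noise sd 1.0
high_resource_view = perceive(1.0, 2.0, rng)   # reliability 0.9, noise sd about 0.33
```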

Admission Model:

Colleges rank by observed caliber and admit based on expected yield
Yield estimated from 3-year running average
Colleges adjust admission volume to fill seats
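The admission step can be sketched as follows: estimate yield from the last three years, size the offer pool so that expected enrollment fills the seats, and admit the top applicants by observed caliber. The function names, the default yield for a college with no history, and the floor on the yield estimate are assumptions.

```python
import math

def expected_yield(yield_history, window=3, default=0.5):
    """Running average of the most recent `window` yields."""
    recent = yield_history[-window:]
    return sum(recent) / len(recent) if recent else default

def admit(applicants, seats, yield_history):
    """applicants: list of (student_id, observed_caliber).
    Returns the ids of admitted students, best observed caliber first."""
    y = max(expected_yield(yield_history), 0.01)   # guard against division by zero
    n_offers = min(len(applicants), math.ceil(seats / y))
    ranked = sorted(applicants, key=lambda a: a[1], reverse=True)
    return [sid for sid, _ in ranked[:n_offers]]

# 400 applicants whose observed caliber happens to equal their id.
applicants = [(i, float(i)) for i in range(400)]
offers = admit(applicants, seats=150, yield_history=[0.5, 0.6, 0.4, 0.5])
```

With a three-year average yield of 0.5, the college needs 300 offers to expect 150 enrollees, so it admits the 300 applicants with the highest observed caliber.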

Key Findings:

Resource-caliber correlation is the dominant driver of sorting inequality (eliminating it reduced the 90th-10th percentile gap from 20x to 4x)
Information disparities, application enhancement, application count inequality, and utility preferences each produce modest individual effects but collectively create "non-trivial" stratification
Model reached equilibrium by year 10-20
Validated against IPEDS 2010-2011 data: selectivity and yield patterns matched real institutional data

Relevance: This is the closest published model to the college-sim project architecture. Key differences from our simulator: Reardon et al. use continuous distributions rather than archetype-based student generation, and a simpler two-attribute student model.

Assayed and Maheshwari (2023) -- Jordan Medical Colleges

"Agent-Based Simulation for University Students Admission: Medical Colleges in Jordan Universities"

Built in NetLogo v6.3
Two agents: high school students, medical colleges
Parameters: family income, high school GPA
Focused on seat allocation fairness
Found that high-ranking universities consistently set high GPA cutoffs
Simulated both partially centralized (each university sets cutoffs) and fully centralized (central authority allocates) scenarios

Assayed and Al-Sayed (2025) -- Survey Paper

"Student Behaviors in College Admissions: A Survey of Agent-Based Models" Published in International Journal of Emerging Multidisciplinaries.

Comprehensive survey of ABM approaches to college admissions
Identified common patterns: two agent types (students, colleges), three-stage matching (application, admission, enrollment)
Highlighted how family resources impact application strategy and outcomes
Emphasized the role of ABM in studying fairness and equity

Sirolly (2023) -- Toy Model

"A Toy Model of College Admissions"

50 colleges, 100 seats each
Students modeled with normally distributed ability W ~ N(0,1)
Noisy signals sent to colleges
Utility function: u_i(k) = I_k^(-beta) + gamma(K - k)
Students solve portfolio optimization: maximize expected utility minus application costs
Found application volume concentrates at selective colleges; information cascades amplify competitive pressure
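The portfolio problem can be illustrated with a generic brute-force sketch. It does not reproduce the toy model's specific utility function; instead it takes utilities and admission probabilities as given, assumes independent admission decisions, and charges a flat cost per application. All names and numbers are hypothetical.

```python
from itertools import combinations

def expected_utility(portfolio, utility, admit_prob, cost):
    """Expected utility of attending the best college that admits the
    student, minus application costs."""
    total, p_rejected_by_better = 0.0, 1.0
    for k in sorted(portfolio, key=lambda k: utility[k], reverse=True):
        # The student attends k only if every better option said no.
        total += utility[k] * admit_prob[k] * p_rejected_by_better
        p_rejected_by_better *= 1 - admit_prob[k]
    return total - cost * len(portfolio)

def best_portfolio(utility, admit_prob, cost):
    """Exhaustive search over all subsets; fine for a handful of colleges."""
    colleges = list(utility)
    candidates = (set(combo) for r in range(len(colleges) + 1)
                  for combo in combinations(colleges, r))
    return max(candidates,
               key=lambda p: expected_utility(p, utility, admit_prob, cost))

utility = {"reach": 10.0, "target": 6.0, "safety": 3.0}
admit_prob = {"reach": 0.2, "target": 0.6, "safety": 0.95}

cheap = best_portfolio(utility, admit_prob, cost=0.5)
costly = best_portfolio(utility, admit_prob, cost=1.0)
```

With cheap applications the student applies everywhere; when the per-application cost doubles, the safety school is dropped because its marginal contribution no longer covers the cost.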

Other Notable Models

Reardon et al. (2015) extended the base ABM to study affirmative action policy effects, simulating race-based and socioeconomic-based policies
Matching Impacts of School Admission Mechanisms (ResearchGate, 2016): compared DA, Boston, and TTC mechanisms using agent-based simulation, measuring mismatch and welfare outcomes
Lee et al. (2023, Cornell): used learned admission-prediction models as a replacement for standardized tests; calibration-focused approach

Undermatching / Mismatch Literature

Hoxby and Avery (2012) -- The Foundational Paper

"The Missing 'One-Offs': The Hidden Supply of High-Achieving, Low-Income Students" NBER Working Paper 18586.

Key Findings:

25,000-35,000 low-income students annually have SAT/ACT scores and GPAs in the top 10% nationally
The vast majority do not apply to any selective college, despite being admissible
These students are geographically dispersed ("one-offs") in small towns, not concentrated in urban areas where selective colleges recruit
Selective institutions would often cost them less than non-selective alternatives due to generous financial aid
High schools serving these students have overworked counselors unfamiliar with selective admissions

Student Typology:

"Achievement-typical" low-income students: application behavior mirrors high-income peers with similar achievement (only 8% of high-achieving low-income students)
"Income-typical" low-income students: application behavior mirrors other low-income students regardless of achievement (about 53% in Hoxby and Avery's classification; the remaining ~39% follow other, idiosyncratic application patterns)

Hoxby and Turner (2013) -- The ECO Intervention

"Expanding College Opportunities for High-Achieving, Low-Income Students"

Intervention Design:

Low-cost information packet sent to 39,682 high-achieving, low-income students (2010-2012)
Included: application guidance, financial aid information, fee waivers, college resource/graduation data
Cost: approximately $6 per student

Results:

Treated students were 46% more likely to enroll at peer-quality institutions matching their abilities
Institutions attended had graduation rates 15.1% higher on average
Instructional spending was 21.5% higher at enrolled institutions
Benefit-to-cost ratio was "extremely high, even under the most conservative assumptions"
Impact was 275x greater than equivalent spending on in-person counseling

Implication: An information intervention alone substantially reduces undermatching, which suggests the problem is largely informational rather than purely financial or academic.

Dillon and Smith (2013) -- Determinants of Mismatch (NBER Working Paper 19286)

Key Findings:

Mismatch is driven primarily by student application and enrollment decisions, not college admission decisions
Most mismatched students either never applied to well-matched schools or were accepted but chose differently
Financial constraints, information access, and public college options all affect mismatch probability
More information is associated with less mismatch; students from lower socioeconomic backgrounds tend to have less information and are therefore more likely to undermatch

Lincove and Cortes (2016) -- Automatic Admissions

"Match or Mismatch? Automatic Admissions and College Preferences of Low- and High-Income Students" NBER Working Paper 22559.

Studied Texas top 10% automatic admissions policy
Low-income students still undermatch even with guaranteed admission
Preferences, not access, drive much of the remaining mismatch

Bastedo and Flaster (2014) -- Methodological Critique

"Conceptual and Methodological Problems in Research on College Undermatch"

Challenged assumptions in undermatching research
Argued that definitions of "match" are often arbitrary
Questioned whether attending a more selective institution is always welfare-improving
Important caveat for simulation design: how we define "optimal match" matters

Mizala et al. (2026) -- International Evidence

"Bright but Poor: Undermatching in the Access to Postsecondary Education" American Educational Research Journal.

Extended undermatching analysis to international contexts
Confirmed that socioeconomic status is a persistent predictor of undermatching across different educational systems

Welfare Consequences of Undermatching

Empirical evidence on outcomes:

Graduation rates: Students who undermatch graduate at lower rates than peers at better-matched institutions
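Earnings: Dale and Krueger (2014) find that, once students' application behavior is controlled for, attending a more selective institution has little effect on earnings for most students but a sizable positive effect for Black and Hispanic students and for students whose parents have less education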
Graduate school access: Selective college attendance increases probability of graduate/professional school enrollment
Network effects: Peer quality, alumni networks, and institutional resources compound over careers

Key Parameters for Simulation

Based on the literature, these are the critical parameters for modeling student welfare in a college admissions simulation:

Student-Side Parameters

Parameter	Literature Value	Source
Resource-caliber correlation	0.3	Reardon et al. (ELS:2002)
Information quality (low-resource)	0.7 base	Reardon et al.
Information quality (high-resource)	0.7 + 0.1 x resources	Reardon et al.
Application count (low-resource)	4 applications	Reardon et al. (ELS:2002)
Application count (high-resource)	4 + 0.5 x resources (up to ~7)	Reardon et al. (ELS:2002)
Caliber enhancement from resources	+0.1 x resources	Test prep literature
Undermatching rate (low-income, high-achieving)	~8% achievement-typical, ~53% income-typical, remainder other patterns	Hoxby & Avery (2012)
Information intervention effect	+46% peer enrollment	Hoxby & Turner (2013)

College-Side Parameters

Parameter	Literature Value	Source
Yield estimation window	3-year running average	Reardon et al.
Admission volume adjustment	Based on prior year fill rate	Reardon et al.
Quality metric	Weighted average enrolled caliber	Reardon et al.
ED yield boost	Binding commitment ~90%+ yield	Common knowledge

System-Level Parameters

Parameter	Description	Typical Range
Stability	% of matched pairs with no blocking pair	85-95% in decentralized markets
Pareto efficiency	% of students who cannot improve without harming others	DA achieves ~85-90% of optimal
Undermatching rate	% of students at institutions below their caliber	20-40% depending on definition
Strategic behavior prevalence	% of students who misrepresent preferences	10-30% under non-strategy-proof mechanisms

Information Asymmetry Parameters

Student knowledge of own caliber: How accurately students assess their competitiveness (signal noise)
Student knowledge of college quality: How well students perceive fit and resources (correlated with SES)
College knowledge of student quality: Admissions offices observe noisy signals (GPA, SAT, essays) of true ability
Strategic sophistication: Proportion of students who optimize application portfolios (higher in high-SES)

Recommendations for College Simulator

1. Add Information Asymmetry Layer

The current simulator uses deterministic scoring. The literature strongly suggests adding:

Student perception noise: Students should have imperfect knowledge of their admission probability at each college, with noise inversely correlated with socioeconomic status
Application portfolio optimization: Students should choose where to apply based on perceived probability x perceived utility, not perfect knowledge
Counselor quality: High school counselor quality (varying by school type) should influence which colleges students consider

Implementation suggestion: Add a perceptionNoise parameter to each student archetype. Elite prep school students get low noise (0.05-0.1); rural/under-resourced students get high noise (0.3-0.5). This single parameter captures much of the Reardon et al. information asymmetry finding.
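A minimal sketch of how such a parameter might be applied is shown below. The archetype keys and helper names are hypothetical, and the two noise values are midpoints of the ranges suggested above; the simulator's actual data model is not assumed.

```python
import random

# Hypothetical archetype keys; values are midpoints of the suggested ranges.
PERCEPTION_NOISE = {
    "elite_prep": 0.075,             # from the 0.05-0.1 range
    "rural_under_resourced": 0.4,    # from the 0.3-0.5 range
}

def perceived_probability(true_prob, perception_noise, rng):
    """Student's noisy belief about their admission chance, kept in [0, 1]."""
    noisy = true_prob + rng.gauss(0.0, perception_noise)
    return min(1.0, max(0.0, noisy))

def mean_abs_error(archetype, true_prob=0.5, draws=2000, seed=1):
    """Average gap between perceived and true admission probability."""
    rng = random.Random(seed)
    noise = PERCEPTION_NOISE[archetype]
    errors = (abs(perceived_probability(true_prob, noise, rng) - true_prob)
              for _ in range(draws))
    return sum(errors) / draws

elite_error = mean_abs_error("elite_prep")
rural_error = mean_abs_error("rural_under_resourced")
```

Feeding the perceived rather than the true probability into the application decision is what lets misperception translate into undermatching or overreach.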

2. Model Undermatching Explicitly

Based on Hoxby and Avery:

Income-typical behavior: About 53% of high-achieving low-income students should exhibit application patterns matching their income cohort, not their achievement cohort
Achievement-typical behavior: Only 8% of such students apply like high-achieving high-income peers; the remaining ~39% follow other, idiosyncratic patterns
Geographic isolation: Students at rural or under-resourced high schools should have shorter college consideration lists biased toward local/state options

Implementation suggestion: When generating application lists for students from under-resourced high schools, apply a "consideration set filter" that removes colleges the student has never heard of (probability based on distance, marketing reach, and school counselor quality).
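One possible shape for such a filter is sketched below. The functional form is purely an assumption: a student is treated as aware of a college if any one of three channels (proximity, college marketing, counselor) reaches them, combined as a noisy-OR. The field names and the distance scale are hypothetical.

```python
import math
import random

def awareness_probability(distance_km, marketing_reach, counselor_quality,
                          scale_km=500.0):
    """Probability the student has heard of the college.
    marketing_reach and counselor_quality are on a 0-1 scale."""
    proximity = math.exp(-distance_km / scale_km)   # 1.0 next door, decays with distance
    p_unaware = (1 - proximity) * (1 - marketing_reach) * (1 - counselor_quality)
    return 1 - p_unaware

def consideration_set(colleges, counselor_quality, rng):
    """colleges: list of dicts with 'name', 'distance_km', 'marketing_reach'.
    Returns the names of colleges that survive the filter."""
    return [c["name"] for c in colleges
            if rng.random() < awareness_probability(
                c["distance_km"], c["marketing_reach"], counselor_quality)]

colleges = [
    {"name": "State U", "distance_km": 50, "marketing_reach": 0.6},
    {"name": "Distant Selective", "distance_km": 2000, "marketing_reach": 0.1},
]
considered = consideration_set(colleges, counselor_quality=0.2,
                               rng=random.Random(0))
```

Under this form, a distant college with little marketing reach is rarely considered unless counselor quality is high, which matches the geographic-isolation pattern described above.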

3. Track Welfare Metrics

Add post-simulation welfare analysis:

Match quality: For each student, compute the gap between their enrolled college's tier and their "optimal" placement based on academic index
Undermatching rate: Percentage of students enrolled at colleges 1+ tiers below their academic qualification
Overmatching rate: Percentage enrolled 1+ tiers above (these students face academic mismatch risk)
Welfare by demographic: Break down match quality by student archetype, high school type, hook status
Counterfactual DA comparison: Run the same student population through a centralized DA mechanism and compare aggregate welfare
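The first four metrics could be computed along these lines. The record fields (`archetype`, `qualified_tier`, `enrolled_tier`) and the convention that tier 1 is the most selective are assumptions made for this sketch.

```python
from collections import defaultdict

def welfare_report(students):
    """students: list of dicts with 'archetype', 'qualified_tier',
    'enrolled_tier'. Tier 1 is the most selective, so a positive gap
    (enrolled tier number above qualified tier number) means undermatching."""
    groups = defaultdict(list)
    for s in students:
        groups[s["archetype"]].append(s["enrolled_tier"] - s["qualified_tier"])
    report = {}
    for archetype, gaps in groups.items():
        n = len(gaps)
        report[archetype] = {
            "undermatch_rate": sum(g >= 1 for g in gaps) / n,
            "overmatch_rate": sum(g <= -1 for g in gaps) / n,
            "mean_gap": sum(gaps) / n,
        }
    return report

sample = [
    {"archetype": "rural", "qualified_tier": 1, "enrolled_tier": 3},
    {"archetype": "rural", "qualified_tier": 2, "enrolled_tier": 2},
    {"archetype": "prep", "qualified_tier": 2, "enrolled_tier": 1},
    {"archetype": "prep", "qualified_tier": 1, "enrolled_tier": 1},
]
report = welfare_report(sample)
```

Grouping by high school type or hook status only requires changing the grouping key.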

4. Implement Yield Management Feedback

Colleges should adjust behavior over simulation runs:

Track acceptance rate vs. target enrollment
Adjust number of offers based on historical yield
This creates the dynamic feedback loop that Reardon et al. found drives equilibrium convergence (10-20 iterations)
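The feedback loop can be sketched for a single college as below. The true yield is held fixed and enrollment is computed deterministically as a stand-in for student choices; both are simplifying assumptions, as are the function name and the initial yield guess.

```python
import math
from collections import deque

def simulate_yield_feedback(seats, true_yield, years, window=3,
                            initial_guess=0.5):
    """Each year the college sizes its offer pool from a running average
    of observed yields, then records the yield it actually experienced."""
    history = deque(maxlen=window)     # keeps only the last `window` yields
    enrolled_by_year = []
    for _ in range(years):
        estimate = sum(history) / len(history) if history else initial_guess
        offers = math.ceil(seats / estimate)
        enrolled = round(offers * true_yield)   # stand-in for student decisions
        history.append(enrolled / offers)
        enrolled_by_year.append(enrolled)
    return enrolled_by_year

enrollment = simulate_yield_feedback(seats=150, true_yield=0.4, years=6)
```

The college under-enrolls in the first year because its initial guess of 0.5 is too optimistic, then corrects once observed yield enters the running average. In a full simulation the true yield itself shifts as other colleges adjust, which is why convergence takes more iterations.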

5. Model Strategic Behavior Heterogeneity

Not all students are equally strategic:

Sophisticated applicants (high-SES, well-counseled): optimize application portfolios, use ED strategically, apply to safety/target/reach spread
Naive applicants (low-SES, poorly counseled): apply to too few schools, skip safeties, miss ED advantages, or apply only to local/familiar options
The Boston mechanism literature shows this heterogeneity causes the most welfare damage under non-strategy-proof mechanisms

6. Consider Adding a DA Benchmark Mode

For research validity, implement an optional mode where:

All students submit truthful preference rankings
All colleges submit preference rankings
A centralized DA algorithm produces the student-optimal stable matching
Compare this benchmark to the decentralized simulation outcome

This would allow measuring the "price of decentralization" in student welfare terms.
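One simple way to express that comparison is the difference in mean preference rank between the two outcomes. The metric, the penalty for unmatched students, and the example matchings below are all assumptions for illustration; the decentralized outcome is a made-up stand-in for simulator output.

```python
def mean_rank(match, student_prefs):
    """Average position of each student's assignment on their true list
    (1 = first choice). Unmatched students score len(list) + 1."""
    total = 0
    for s, prefs in student_prefs.items():
        c = match.get(s)
        total += (prefs.index(c) + 1) if c in prefs else (len(prefs) + 1)
    return total / len(student_prefs)

def price_of_decentralization(decentralized, benchmark, student_prefs):
    """Positive values mean students fare worse than under the benchmark."""
    return mean_rank(decentralized, student_prefs) - mean_rank(benchmark, student_prefs)

student_prefs = {"a": ["X", "Y"], "b": ["X", "Y"], "c": ["Y", "X"]}
da_benchmark = {"a": "Y", "b": "X", "c": None}        # e.g. output of a DA routine
decentralized = {"a": None, "b": "Y", "c": "X"}       # hypothetical simulator outcome

price = price_of_decentralization(decentralized, da_benchmark, student_prefs)
```

Mean rank treats every step down the list as equally costly; a cardinal utility measure would be a natural alternative if the simulator already assigns utilities to colleges.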

7. Calibration Targets

Validate the simulation against known empirical patterns:

Acceptance rate vs. yield rate correlation should match IPEDS data
Proportion of students within 1 tier of their "match" should be 60-80%
Low-SES undermatching rate should be 2-4x higher than high-SES
ED acceptance rate advantage should be 2-3x regular admission at top schools
Hook multiplier effects should produce demographic compositions matching published CDS data

References

Gale, D. & Shapley, L.S. (1962). "College Admissions and the Stability of Marriage." American Mathematical Monthly, 69(1), 9-15.
Roth, A.E. (2008). "Deferred Acceptance Algorithms: History, Theory, Practice, and Open Questions." International Journal of Game Theory, 36, 537-569.
Abdulkadiroglu, A., Pathak, P.A., & Roth, A.E. (2005). "The New York City High School Match." American Economic Review P&P, 95(2), 364-367.
Abdulkadiroglu, A., Pathak, P.A., Roth, A.E., & Sonmez, T. (2006). "Changing the Boston School Choice Mechanism." NBER Working Paper 11965.
Abdulkadiroglu, A. & Sonmez, T. (2003). "School Choice: A Mechanism Design Approach." American Economic Review, 93(3), 729-747.
Hoxby, C.M. & Avery, C. (2012). "The Missing 'One-Offs': The Hidden Supply of High-Achieving, Low-Income Students." NBER Working Paper 18586.
Hoxby, C.M. & Turner, S. (2013). "Expanding College Opportunities for High-Achieving, Low-Income Students." Stanford Institute for Economic Policy Research Discussion Paper 12-014.
Reardon, S.F., Kasman, M., Klasik, D., & Baker, R. (2016). "Agent-Based Simulation Models of the College Sorting Process." Journal of Artificial Societies and Social Simulation, 19(1), 8.
Abdulkadiroglu, A., Pathak, P.A., & Roth, A.E. (2009). "Strategy-Proofness versus Efficiency in Matching with Indifferences: Redesigning the NYC High School Match." American Economic Review, 99(5), 1954-1978.
Erdil, A. & Ergin, H. (2008). "What's the Matter with Tie-Breaking? Improving Efficiency in School Choice." American Economic Review, 98(3), 669-689.
Bastedo, M.N. & Flaster, A. (2014). "Conceptual and Methodological Problems in Research on College Undermatch." Educational Researcher, 43(2), 93-99.
Assayed, S.K. & Maheshwari, P. (2023). "Agent-Based Simulation for University Students Admission: Medical Colleges in Jordan Universities."
Assayed, S.K. & Al-Sayed, S. (2025). "Student Behaviors in College Admissions: A Survey of Agent-Based Models." International Journal of Emerging Multidisciplinaries.
Kloosterman, A. & Troyan, P. (2020). "School choice with asymmetric information: Priority design and the curse of acceptance." Theoretical Economics.
Mizala, A. et al. (2026). "Bright but Poor: Undermatching in the Access to Postsecondary Education." American Educational Research Journal.