ABM & College Admissions: Literature Context


Comprehensive literature review for calibrating and grounding the college-sim agent-based model.


1. Gale-Shapley & Matching Market Theory

1.1 Gale & Shapley (1962) — Foundational Paper

Citation: Gale, D. & Shapley, L.S. (1962). "College Admissions and the Stability of Marriage." The American Mathematical Monthly, 69(1), 9–15.

Key concepts:

  • Two-sided matching: students have preferences over colleges, and colleges have preferences over students
  • Deferred acceptance (DA) algorithm: one side proposes, the other tentatively accepts or rejects; proposals cascade until stable
  • Stability: no student-college pair both prefer each other over their current match
  • The student-proposing DA yields the student-optimal stable matching; the college-proposing DA yields the college-optimal one
  • Result: a stable matching always exists in the college admissions problem
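The deferred acceptance procedure described above can be sketched in a few lines. This is a minimal illustration of student-proposing DA with college capacities, not code from college-sim; the function names, the three hypothetical students (a, b, c), and the two hypothetical colleges (X, Y) are assumptions made for the example.

```python
def deferred_acceptance(student_prefs, college_prefs, capacity):
    """Student-proposing deferred acceptance (many-to-one).

    student_prefs: dict student -> list of colleges, most preferred first
    college_prefs: dict college -> list of acceptable students, most preferred first
    capacity:      dict college -> number of seats
    Returns dict college -> list of admitted students.
    """
    # Lower rank number means the college likes the student more.
    rank = {c: {s: i for i, s in enumerate(p)} for c, p in college_prefs.items()}
    held = {c: [] for c in college_prefs}
    next_choice = {s: 0 for s in student_prefs}  # index of next college to try
    free = list(student_prefs)

    while free:
        s = free.pop()
        if next_choice[s] >= len(student_prefs[s]):
            continue                       # list exhausted: s stays unmatched
        c = student_prefs[s][next_choice[s]]
        next_choice[s] += 1
        if s not in rank[c]:
            free.append(s)                 # college finds s unacceptable
            continue
        held[c].append(s)                  # tentative acceptance
        held[c].sort(key=rank[c].__getitem__)
        if len(held[c]) > capacity[c]:
            free.append(held[c].pop())     # reject the least preferred held student
    return held


def blocking_pairs(match, student_prefs, college_prefs, capacity):
    """List (student, college) pairs that both prefer each other to their match."""
    assigned = {s: c for c, students in match.items() for s in students}
    pairs = []
    for s, prefs in student_prefs.items():
        current = assigned.get(s)
        # Colleges s strictly prefers to its match (all listed ones if unmatched).
        better = prefs[:prefs.index(current)] if current in prefs else prefs
        for c in better:
            if s not in college_prefs[c]:
                continue
            r = college_prefs[c].index(s)
            has_seat = len(match[c]) < capacity[c]
            displaces = any(college_prefs[c].index(t) > r for t in match[c])
            if has_seat or displaces:
                pairs.append((s, c))
    return pairs


# Hypothetical toy market: three students, two one-seat colleges.
student_prefs = {"a": ["X", "Y"], "b": ["X", "Y"], "c": ["Y", "X"]}
college_prefs = {"X": ["b", "a", "c"], "Y": ["a", "c", "b"]}
capacity = {"X": 1, "Y": 1}

match = deferred_acceptance(student_prefs, college_prefs, capacity)
print(match)
print(blocking_pairs(match, student_prefs, college_prefs, capacity))
```

An empty list from `blocking_pairs` is what stability means in the definition above: no student and college would both rather be matched to each other.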

Relevance to college-sim:

  • Our simulation uses a sequential round structure (ED → EA → RD) rather than DA, reflecting real-world institutional design
  • Real college admissions are NOT a pure stable matching: colleges use holistic review (noisy signals), students have incomplete information, and binding ED creates strategic asymmetries
  • The Gale-Shapley framework is the theoretical benchmark against which to understand deviations

1.2 Roth (2008) — DA History, Theory, Practice

Citation: Roth, A.E. (2008). "Deferred Acceptance Algorithms: History, Theory, Practice, and Open Questions." International Journal of Game Theory, 36(3), 537–569.

Key insights:

  • DA underpins real matching markets: NRMP (medical residencies), NYC/Boston school choice
  • Three properties a market needs: thickness (enough participants), congestion management (handle the volume), safety (incentive-compatible, so truthful reporting is optimal)
  • Student-proposing DA is strategy-proof for students but not for colleges
  • In practice, colleges' capacity constraints and strategic behavior mean pure DA doesn't describe elite admissions

Relevance: Our model captures congestion (application volume limits) and thickness (20 high schools × 55 colleges), but uses stochastic holistic review rather than strict preference orderings.

1.3 Abdulkadiroğlu & Sönmez (2003) — School Choice as Mechanism Design

Citation: Abdulkadiroğlu, A. & Sönmez, T. (2003). "School Choice: A Mechanism Design Approach." American Economic Review, 93(3), 729–747.

Key contributions:

  • Formalized K–12 school choice as a matching problem
  • Showed that the Boston mechanism (immediate acceptance) is manipulable: families have an incentive to misrepresent preferences
  • Proposed two alternatives: student-proposing DA (stable, strategy-proof for students) and top trading cycles (Pareto efficient, strategy-proof)
  • Stability vs. efficiency tradeoff: a stable matching need not be Pareto efficient for students, so no mechanism can guarantee both properties in every market (Roth 1982)

Relevance: The stability-efficiency tradeoff directly applies to our simulation. ED/binding commitments trade off student welfare (can't compare offers) for institutional yield certainty. Our model could be extended to test alternative mechanisms.

1.4 Stability vs. Efficiency in College Admissions

  • No mechanism guarantees an outcome that is both stable and Pareto efficient for students in every market (the tradeoff noted in Section 1.3)
  • In simulations, DA is efficient in "thick" markets where every student gets placed somewhere, but manipulation incentives appear in markets with unmatched students
  • Real college admissions add complications: financial aid packages, legacy preferences, athletic recruitment, and multiple rounds — all deviations from pure DA

2. Empirical Foundations for ABM Calibration

2.1 Chetty, Deming & Friedman (2023) — "Diversifying Society's Leaders?"

Citation: Chetty, R., Deming, D.J. & Friedman, J.N. (2023). "Diversifying Society's Leaders? The Determinants and Causal Effects of Admission to Highly Selective Private Colleges." NBER Working Paper 31492.

Dataset: 2.4 million students × 139 colleges, tax records linked to admissions data (Opportunity Insights), 2010–2015 cohorts.

Key quantitative findings:

| Finding | Statistic |
|---|---|
| Top-1% income kids vs. middle-class (same SAT) at Ivy+ | 2× more likely to attend |
| Source: higher admit rates (same scores) | 2/3 of the gap |
| Source: differential application/matriculation | 1/3 of the gap |
| Legacy admission advantage | 5–6× higher admit rate (same credentials) |
| Share of advantage from legacy | 46% |
| Share from athletic recruitment | 24% |
| Share from non-academic ratings (essays, recs) | 30% |
| Causal effect of Ivy+ on reaching top 1% earnings | +50% vs. flagship public |
| Causal effect on elite grad school | ~2× |
| Causal effect on prestigious firm employment | ~3× |

Critical finding for simulation calibration:

  • The three preference factors (legacy, athlete, non-academic) are uncorrelated or negatively correlated with post-college outcomes
  • Academic credentials (SAT/ACT) are highly predictive of post-college success
  • This validates our model's use of academic index as the primary signal, with hooks as admission multipliers that don't reflect academic quality

Relevance to college-sim: Directly informs hook multipliers (legacy 5–6×, athlete preference ~24% of advantage), income-SAT correlation, and yield differences by income bracket. Our chetty_yield_by_college.json is derived from this dataset.

2.2 Arcidiacono, Kinsler & Ransom (2019/2022) — Legacy and Athlete Preferences at Harvard

Citation: Arcidiacono, P., Kinsler, J. & Ransom, T. (2022). "Legacy and Athlete Preferences at Harvard." Journal of Labor Economics, 40(1). (NBER WP 26316, 2019.)

Key quantitative findings (Harvard Classes of 2014–2019):

| Category | Statistic |
|---|---|
| White admits who are ALDC | 43% |
| Non-white admits who are ALDC | <16% each group |
| Athlete admit rate | 86% |
| Non-ALDC admit rate | <5.5% |
| White ALDC who'd be rejected without preference | ~75% |
| Asian-American avg SAT advantage over white | +24.9 points |
| Hypothetical Asian-American share (academics only) | 43% |

ALDC = Athletes, Legacies, Dean's interest list, Children of faculty/staff

Relevance to college-sim: These are the most granular hook multiplier estimates available. Our model's hook system (athlete 3.5×, donor 4×, legacy 2.5×, first-gen 1.4×) should be validated against these empirical rates. The 86% athlete admit rate at Harvard implies an enormous multiplier relative to the ~5.5% base rate (~15.6× raw ratio, though controlling for academic quality reduces this).
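The raw-ratio arithmetic above, and the difference between a multiplier on the admit probability and a multiplier on the admit odds, can be checked with a short sketch. Whether college-sim applies its hook multipliers to probabilities or to odds is not stated here, so both variants are shown as assumptions; the helper names are hypothetical.

```python
def odds(p):
    """Convert a probability to odds."""
    return p / (1.0 - p)


def apply_odds_multiplier(p, m):
    """Scale the odds of admission by m and convert back to a probability."""
    o = odds(p) * m
    return o / (1.0 + o)


athlete_rate = 0.86   # Harvard athlete admit rate quoted above
base_rate = 0.055     # approximate non-ALDC admit rate quoted above

raw_ratio = athlete_rate / base_rate                  # ratio of admit rates
odds_ratio = odds(athlete_rate) / odds(base_rate)     # ratio of admit odds

print(f"raw ratio  ~ {raw_ratio:.1f}x")
print(f"odds ratio ~ {odds_ratio:.1f}x")

# The same 3.5x hook read two ways (both readings are assumptions):
as_probability = min(1.0, base_rate * 3.5)            # multiplier on the rate
as_odds = apply_odds_multiplier(base_rate, 3.5)       # multiplier on the odds
print(f"3.5x on rate: {as_probability:.3f}, 3.5x on odds: {as_odds:.3f}")
```

A multiplier on the odds can never push the probability above 1, which is one reason a logistic admission model usually treats hooks as odds multipliers; a multiplier on the rate needs an explicit cap.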

2.3 Avery & Levin (2010) — Early Admissions Signaling

Citation: Avery, C. & Levin, J. (2010). "Early Admissions at Selective Colleges." American Economic Review, 100(5), 2125–56.

Key findings:

  • ED/EA provides a signaling mechanism: students demonstrate genuine interest
  • ED advantage: 20–30 percentage points higher admit rate, equivalent to ~100 SAT points
  • Colleges value ED because it reduces uncertainty about yield
  • Strategic asymmetry: wealthier students can "afford" to commit early (less need for financial aid comparison)

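Relevance: Supports our model's ED multiplier system. Note that the 20–30 pp advantage is a difference in admit rates, whereas our empirical ED multipliers (e.g., Dartmouth 3.5×, Columbia 3.4×) are ratios, so the two are directly comparable only at a given regular-decision admit rate.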
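To relate a percentage-point advantage to a rate multiplier, the conversion below may help. It assumes the ED multiplier is defined as ED admit rate divided by RD admit rate, which is an assumption about college-sim rather than something stated here; the function names are hypothetical and the RD rates are derived, not observed.

```python
def pp_advantage(rd_rate, multiplier):
    """Percentage-point gap implied by ED rate = multiplier * RD rate (capped at 100%)."""
    ed_rate = min(1.0, rd_rate * multiplier)
    return 100.0 * (ed_rate - rd_rate)


def rd_rate_for_gap(gap_pp, multiplier):
    """RD admit rate at which a rate multiplier produces a given percentage-point gap."""
    return gap_pp / 100.0 / (multiplier - 1.0)


for m in (3.5, 3.4):
    low = rd_rate_for_gap(20.0, m)
    high = rd_rate_for_gap(30.0, m)
    print(f"{m}x multiplier gives a 20-30 pp gap for RD rates of {low:.1%} to {high:.1%}")
```

Under this reading, a 3.5× multiplier matches the 20–30 pp range only when the RD admit rate is roughly 8–12%; at lower RD rates the same multiplier implies a smaller percentage-point gap.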

2.4 Dale & Krueger (2002, 2014) — Returns to College Selectivity

Citation: Dale, S.B. & Krueger, A.B. (2002). "Estimating the Payoff to Attending a More Selective College." Quarterly Journal of Economics, 117(4), 1491–1527.

Key findings:

  • After controlling for where students applied (revealed ambition), the average earnings premium from attending a more selective college is statistically indistinguishable from zero
  • Exception: low-income students benefit significantly (~8% earnings increase per 200-point SAT increase in college average)
  • Implication: selection bias explains most of the apparent selectivity premium

Relevance: Challenges simple prestige-maximizing utility functions in ABMs. Our model's student utility function should perhaps weight financial fit more heavily for low-income agents, and prestige less.

2.5 Hoxby & Avery (2013) — The Missing "One-Offs"

Citation: Hoxby, C. & Avery, C. (2013). "The Missing 'One-Offs': The Hidden Supply of High-Achieving, Low-Income Students." Brookings Papers on Economic Activity, Spring 2013, 1–65.

Key findings:

  • Most high-achieving low-income students never apply to selective colleges
  • Instead they apply to resource-poor local institutions that would actually cost them MORE than a selective college would after financial aid
  • Two types: "achievement-typical" (apply like high-income peers) and "income-typical" (apply only locally)
  • Income-typical students are geographically dispersed and not in feeder school networks
  • Standard recruiting (campus visits, college fairs) misses them entirely

Relevance to college-sim: Our model should capture differential application behavior by income/school type. The archetype-based application count system partially handles this. Feeder-school students apply broadly; isolated students under-apply. Application volume and information quality are among the sorting mechanisms examined in the Reardon et al. ABM (Section 3.1), although their estimated effects there are smaller than that of the resource-caliber correlation.

2.6 Avery, Glickman, Hoxby & Metrick (2013) — Revealed Preference Rankings

Citation: Avery, C., Glickman, M.E., Hoxby, C. & Metrick, A. (2013). "A Revealed Preference Ranking of U.S. Colleges and Universities." Quarterly Journal of Economics, 128(1), 425–467.

Key insights:

  • Constructed college rankings from 3,240 students' actual enrollment choices (which offer they accepted)
  • Uses tournament-style statistical model (Elo-like)
  • Rankings align roughly with selectivity but diverge from U.S. News in interesting ways
  • Provides empirical student utility ordering, useful for calibrating our prestige weights
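The tournament idea can be illustrated with a plain Elo update, where each enrollment decision counts as a "win" for the chosen college over every other college that admitted the student. This is a sketch of the Elo-like intuition only, not the paper's estimation procedure; the function name, the K-factor, the number of passes, and the colleges A, B, and C are all hypothetical.

```python
from collections import defaultdict


def elo_ratings(choices, k=16.0, base=1500.0, passes=20):
    """choices: list of (chosen_college, [other colleges that also admitted the student])."""
    rating = defaultdict(lambda: base)
    for _ in range(passes):                       # repeat so ratings settle
        for winner, losers in choices:
            for loser in losers:
                # Probability the current ratings assign to the observed choice.
                expected = 1.0 / (1.0 + 10 ** ((rating[loser] - rating[winner]) / 400.0))
                delta = k * (1.0 - expected)      # larger update for surprising choices
                rating[winner] += delta
                rating[loser] -= delta
    return dict(rating)


# Hypothetical enrollment decisions among admitted students.
choices = [
    ("A", ["B", "C"]),   # admitted to A, B, C; enrolled at A
    ("A", ["B"]),
    ("B", ["C"]),
    ("B", ["A"]),
]

ratings = elo_ratings(choices)
ranking = sorted(ratings, key=ratings.get, reverse=True)
print(ranking)
```

Ratings of this kind could serve as one input to prestige weights, since they reflect which offers students accept rather than which colleges reject the most applicants.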

Relevance: Could inform the prestige ranking and utility calculation in buildCollegeLists().

2.7 CommonApp Annual Data (2024–2025 Season)

Source: Common Application End-of-Season Report, 2024–2025.

| Metric | Value |
|---|---|
| Total applicants | ~1.5 million |
| Member institutions | 1,097 |
| Applications per applicant | 6.80 (up 2% from 6.64) |
| YoY applicant growth | +5% |
| Fastest-growing demographics | Latinx (+15%), Black (+12%) |
| Top state by applicant count | Texas (overtook NY, CA) |
| International applicants | -1% (first decline since 2019–20) |

Relevance: Our model uses 6.8 apps/student as the baseline — this exactly matches the 2024–25 CommonApp data. The growth trends inform archetype distribution calibration.

2.8 College Board SAT Validity Research

Source: College Board (2024). "SAT Score Relationships with College GPA." (111,899 students, 4-year tracking.)

Key findings:

  • SAT adds 15% more predictive power over HSGPA alone for first-year college GPA
  • For STEM: SAT adds 38% more predictive power than HSGPA alone
  • SAT predicts cumulative GPA across all 4 years, not just freshman year
  • Predictive validity holds across demographic subgroups

Relevance: Validates using SAT + GPA as the academic index in our admission scoring model. The higher STEM prediction aligns with potential major-specific modeling.

2.9 ACT College Readiness Benchmarks

Source: ACT Research & Policy (2017). "What Are the ACT College Readiness Benchmarks?"

  • Benchmarks represent the score at which a student has a 50% probability of earning a B or higher, or about a 75% probability of earning a C or higher, in the corresponding college courses
  • 84% of students meeting all four benchmarks graduate within 6 years
  • Hierarchical logistic models used for institution-specific predictions

Relevance: Provides external validation for our academic index → admission probability mapping.


3. Agent-Based Models of College Admissions (2010–2025)

3.1 Reardon, Kasman, Klasik & Baker (2016) — The Key ABM Paper

Citation: Reardon, S.F., Kasman, M., Klasik, D. & Baker, R. (2016). "Agent-Based Simulation Models of the College Sorting Process." Journal of Artificial Societies and Social Simulation (JASSS), 19(1), 8.

This is the most directly relevant paper to our project.

Model Architecture

| Component | Detail |
|---|---|
| Student agents | 8,000 per simulation run |
| College agents | 40 institutions |
| Stages per year | Application → Admission → Enrollment |
| Simulation duration | 30 years to equilibrium |
| Runs per condition | 100 (stochastic noise mitigation) |

Agent Decision Rules

Students:

  • Characterized by "resources" (SES) and "caliber" (academic quality)
  • Resource-caliber correlation: r = 0.3 (from ELS:2002)
  • Perceived caliber: C*_s = C_s + c_s + e_s (true + enhancement + noise)
  • Quality reliability: 0.7 + 0.1 × resources (information advantage for wealthy)
  • Application portfolio: maximize E[utility] = P(admission) × utility(college quality)
  • Caliber distribution: N(1000, 200), which matches College Board data

Colleges:

  • Rank applicants by perceived caliber
  • Admit top-ranked to fill expected enrollment (based on historical yield)
  • Yield rates: initial = 0.2 + 0.06 × quality_percentile
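The student and college rules above can be sketched as follows. Several details are assumptions of this sketch rather than facts from the paper: resources are treated as a standard normal variable, reliability is clamped to [0.5, 0.9], reliability is mapped to noise through the classical relation reliability = true variance / observed variance, the enhancement term is a hypothetical linear function of resources, and the quality measure in the yield formula is assumed to run from 0 to 10 (a 0–100 percentile would push the yield above 1).

```python
import math
import random

CALIBER_MEAN, CALIBER_SD = 1000.0, 200.0   # N(1000, 200), as quoted above
RESOURCE_CALIBER_R = 0.3                   # resource-caliber correlation


def reliability(resources):
    """0.7 + 0.1 * resources; the clamp to [0.5, 0.9] is an assumption of this sketch."""
    return min(0.9, max(0.5, 0.7 + 0.1 * resources))


def draw_student(rng, enhancement_per_resource=0.0):
    resources = rng.gauss(0.0, 1.0)                       # standardized SES (assumed)
    # Build a standard-normal draw correlated with resources at r = 0.3.
    z = (RESOURCE_CALIBER_R * resources
         + math.sqrt(1.0 - RESOURCE_CALIBER_R ** 2) * rng.gauss(0.0, 1.0))
    caliber = CALIBER_MEAN + CALIBER_SD * z               # C_s
    enhancement = enhancement_per_resource * resources    # c_s (hypothetical form)
    rel = reliability(resources)
    noise_sd = CALIBER_SD * math.sqrt((1.0 - rel) / rel)  # assumed mapping to noise
    noise = rng.gauss(0.0, noise_sd)                      # e_s
    return {
        "resources": resources,
        "caliber": caliber,
        "perceived": caliber + enhancement + noise,       # C*_s = C_s + c_s + e_s
    }


def initial_yield(quality_scale):
    """0.2 + 0.06 * quality; quality assumed to lie on a 0-10 scale."""
    return 0.2 + 0.06 * quality_scale


rng = random.Random(42)
student = draw_student(rng, enhancement_per_resource=20.0)
print(student)
print(initial_yield(0), initial_yield(10))
```

Because the noise shrinks as reliability rises, higher-resource students in this sketch perceive caliber more accurately, which is the information advantage the bullet list describes.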

Five SES → College Sorting Mechanisms

| Mechanism | Effect Size | Description |
|---|---|---|
| Resource-caliber correlation | Dominant (reduces 90-10 gap from ~50% to ~20% when removed) | SES linked to academic quality |
| Application enhancement | 3–6 pp | Test prep, essay coaching boost perceived credentials |
| Information quality | 2–5 pp | Wealthy students know college quality and own caliber better |
| Application volume | 1–2 pp | More apps for wealthier students |
| Utility preferences | Negligible | Differential valuation of prestige |

Key Results

  • 90th percentile resources: ~93% college enrollment
  • 10th percentile resources: ~55% college enrollment
  • 90th percentile students ~20× more likely at top-10% colleges vs. 10th percentile
  • Model output matches IPEDS data on applications/admissions/yield by tier
  • Latin Hypercube sampling used for sensitivity analysis (10 combos per 5D space)
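Latin Hypercube sampling, as mentioned in the last bullet, spreads a small number of runs evenly over each parameter's range. The sketch below draws 10 points in a 5-dimensional unit cube using only the standard library; it illustrates the sampling scheme in general and is not the paper's code. The function name is hypothetical.

```python
import random


def latin_hypercube(n_samples, n_dims, rng):
    """Return n_samples points in [0, 1)^n_dims with one point per stratum on each axis."""
    columns = []
    for _ in range(n_dims):
        # One draw inside each of the n_samples equal-width bins, then shuffle the bins
        # so the pairing across dimensions is random.
        col = [(i + rng.random()) / n_samples for i in range(n_samples)]
        rng.shuffle(col)
        columns.append(col)
    return [tuple(col[i] for col in columns) for i in range(n_samples)]


rng = random.Random(1)
design = latin_hypercube(10, 5, rng)
for point in design:
    print(tuple(round(x, 2) for x in point))
```

Each unit-cube coordinate would then be rescaled to the range of the corresponding model parameter before a run. SciPy's `scipy.stats.qmc.LatinHypercube` provides the same kind of design if SciPy is available.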

Methodological Notes

  • Fast algorithm (Appendix D): recursive portfolio selection avoids combinatorial explosion
  • Equilibrium emergence: yield rates and application patterns stabilize after ~15–20 simulated years
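To see why portfolio selection needs care, note that the value of an application depends on what else is in the portfolio. The sketch below is a simple greedy heuristic, not the Appendix D algorithm: it assumes independent admission outcomes, a student who attends the highest-utility college that admits them, and an outside option worth 0. The function names and the three example colleges are hypothetical.

```python
def expected_utility(portfolio):
    """portfolio: list of (p_admit, utility) pairs; the student attends the best admit."""
    eu, p_none_better = 0.0, 1.0
    for p, u in sorted(portfolio, key=lambda c: -c[1]):
        eu += p_none_better * p * u      # admitted here and rejected everywhere better
        p_none_better *= 1.0 - p
    return eu


def greedy_portfolio(colleges, max_apps, app_cost=0.0):
    """Add the college with the largest marginal gain until the cap or no net gain."""
    chosen, remaining = [], list(colleges)
    while remaining and len(chosen) < max_apps:
        base = expected_utility(chosen)
        best = max(remaining, key=lambda c: expected_utility(chosen + [c]))
        if expected_utility(chosen + [best]) - base <= app_cost:
            break
        chosen.append(best)
        remaining.remove(best)
    return chosen


# Hypothetical reach, match, and safety colleges as (p_admit, utility).
colleges = [(0.1, 100), (0.5, 60), (0.9, 30)]
portfolio = greedy_portfolio(colleges, max_apps=2)
print(portfolio, expected_utility(portfolio))
```

In this example the second application goes to the safety rather than the reach, because the safety adds more expected utility once the match college is already in the portfolio.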

Relevance to college-sim: Our model shares the same three-stage structure but adds: (1) multiple admission rounds (ED/EA/RD), (2) hook multipliers, (3) logistic admission model instead of rank-cutoff, (4) real college data instead of synthetic quality distributions. We should validate that our output patterns match theirs for comparable parameter settings.

3.2 Assayed & Maheshwari (2023) — Review of ABMs for University Admissions

Citation: Assayed, S.K. & Maheshwari, P. (2023). "A Review of Agent-based Simulation for University Students Admission." Computer Science & Engineering: An International Journal (CSEIJ), 13(2).

Survey findings:

  • Reviewed ABMs deployed by international admission offices
  • Models classified by educational attainment level and university selection behaviors
  • Common platforms: NetLogo (dominant), some Python/Mesa
  • Parameters across models: GPA, test scores, family income, geographic proximity
  • Gap: most models focus on K–12 or single-country systems; few model elite U.S. admissions specifically

3.3 Assayed & Al-Sayed (2025) — Student Behaviors Survey

Citation: Assayed, S.K. & Al-Sayed, S. (2025). "Student Behaviors in College Admissions: A Survey of Agent-Based Models." Int. J. Emerging Multidisciplinaries: CS & AI.

  • Explores ABM techniques for secondary education pathways and admissions
  • Focuses on equitable practices and complex decision-making
  • Reviews behavioral models including peer effects and information asymmetries

3.4 Daemen & Leoni (2025) — Netherlands Tertiary Education ABM

Citation: Daemen & Leoni (2025). "Simulating Tertiary Educational Decision Dynamics: An Agent-Based Model for the Netherlands." Journal of Economic Interaction and Coordination.

Key features:

  • Models economic motivations (wages, financial constraints) alongside sociological and psychological factors (peer effects, personality, geography)
  • Evaluates policy impacts: student grants vs. loans on enrollment by SES
  • Counter-intuitive finding: greater parental emphasis on achievement doesn't consistently raise district achievement
  • Different institutional context (Netherlands) but similar agent architecture

3.5 Sirolly (2023) — Toy Model of College Admissions

Citation: Sirolly, A. (2023). "A Toy Model of College Admissions." Blog post.

Model setup:

  • 50 colleges × 100 capacity, 5,000 applicants
  • Applicant ability: W_i ~ N(0, 1²), noisy signal W̃_i ~ N(W_i, 0.1²)
  • Utility: u_i(k) = I_k^(-β) + γ(K-k)
  • Belief shrinkage: P_α(admit | W_i) = (1-α)P(...) + α × I_k (weight on public signal)
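The utility and belief-shrinkage formulas above translate directly into code. In this sketch I_k is read as college k's public admit rate, and the private admission estimate (the elided P(...) term) is passed in as an argument rather than computed; both choices, along with the function and parameter names, are assumptions.

```python
def utility(admit_rate, k, K, beta, gamma):
    """u_i(k) = I_k^(-beta) + gamma * (K - k): lower admit rates read as higher value."""
    return admit_rate ** (-beta) + gamma * (K - k)


def shrunk_belief(p_private, admit_rate, alpha):
    """P_alpha = (1 - alpha) * private estimate + alpha * public admit rate."""
    return (1.0 - alpha) * p_private + alpha * admit_rate


# A strong applicant (private estimate 0.6) looking at a college with a 10% admit rate.
for alpha in (0.0, 0.5, 1.0):
    print(alpha, shrunk_belief(0.6, 0.10, alpha))
```

With α = 0 the applicant trusts only their private estimate, and with α = 1 they believe only the published admit rate, which is the pessimism channel in the mechanism that follows.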

Application inflation mechanism:

  1. Applicants become pessimistic (weight public admit rates over private info)
  2. Apply to more colleges as hedging behavior
  3. Colleges see lower admit rates
  4. Public signal becomes more pessimistic → repeat
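The four-step loop can be reproduced in a toy simulation. Everything numeric here is a hypothetical choice for illustration, not a value from the blog post: identical applicants, a fixed number of offers per cycle, and applicants who send just enough applications to expect at least one admit with 95% probability.

```python
import math


def apps_needed(belief, target=0.95):
    """Smallest n with 1 - (1 - belief)**n >= target (independent outcomes assumed)."""
    return math.ceil(math.log(1.0 - target) / math.log(1.0 - belief))


def simulate(cycles, alpha, p_private=0.5, public_rate=0.5,
             n_students=5000, total_offers=10000):
    """Return a list of (applications per student, public admit rate) per cycle."""
    history = []
    for _ in range(cycles):
        belief = (1.0 - alpha) * p_private + alpha * public_rate   # step 1
        n_apps = apps_needed(belief)                               # step 2
        public_rate = total_offers / (n_students * n_apps)         # steps 3 and 4
        history.append((n_apps, public_rate))
    return history


for alpha in (0.0, 0.8):
    print(f"alpha = {alpha}")
    for n_apps, rate in simulate(8, alpha):
        print(f"  apps per student: {n_apps:2d}   public admit rate: {rate:.3f}")
```

With α = 0 the application count stays flat because the public rate is ignored; with a large α the count ratchets upward until the extra hedging no longer changes the rounded number of applications.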

Relevance: Captures the application volume spiral that drives real-world trends. Our model's archetype-based application count implicitly models this, but could be extended with dynamic belief updating.


4. Structural/Equilibrium Models (Non-ABM but Relevant)

4.1 Epple, Romano & Sieg (2006) — Equilibrium in Higher Education Markets

Citation: Epple, D., Romano, R. & Sieg, H. (2006). "Admission, Tuition, and Financial Aid Policies in the Market for Higher Education." Econometrica, 74(4), 885–928.

Model features:

  • Equilibrium model predicting: student sorting, financial aid, educational expenditures, outcomes
  • Strict quality hierarchy emerges endogenously
  • Higher-ranked colleges: need-based aid (can attract top students)
  • Lower-ranked colleges: merit-based aid (must compete for good students)

Relevance: Provides theoretical backing for our college tier system and could inform financial aid modeling extensions.

4.2 Chao Fu (2014) — Equilibrium in the College Market

Citation: Fu, C. (2014). "Equilibrium Tuition, Applications, Admissions, and Enrollment in the College Market." Journal of Political Economy, 122(2), 225–281.

Key features:

  • Structural model: students with heterogeneous abilities/preferences, application costs, uncertainty
  • Colleges: observe noisy signals, set tuition + admissions cutoffs
  • Estimated on NLSY97 data
  • Joint equilibrium: tuition, apps, admissions, enrollment all endogenous

Relevance: Our model treats tuition/financial aid as exogenous (via net cost data). Fu's framework shows how these could be endogenized in future extensions.


5. Policy Simulation Work

5.1 CEPA / Reardon et al. — SES-Based Affirmative Action Simulation

Citation: Reardon, S.F., Baker, R. & Kasman, M. (2017). "Can Socioeconomic Status Substitute for Race in Affirmative Action College Admissions Policies?" CEPA Working Paper 15-04.

Key findings:

  • Neither SES-based affirmative action nor race-targeted recruiting alone matches diversity of race-based affirmative action
  • Combined SES + race-targeted recruiting can achieve comparable diversity
  • Three policy levers with largest effects:
    1. Reducing credential enhancement inequality (test prep gap)
    2. Improving information quality for low-resource students
    3. Subsidizing application volume for low-income students

Relevance: Our model could run these policy counterfactuals. The three mechanisms map directly to parameters in our student generation and application decision logic.

5.2 SFFA v. Harvard — Simulation Evidence in Litigation

Key simulation results from trial evidence:

  • Simulation D (removing race + ALDC preferences): African-American representation drops from 14% → 5%
  • Without race-conscious admissions: African-American admits fall ~7pp, Hispanic ~4pp
  • Asian-American admits increase ~3pp, white admits increase 6–8pp
  • These simulations used Harvard's own admissions model with parameter modifications

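Relevance: Demonstrates the real-world stakes of simulation-based admissions counterfactuals. These results came from modifying parameters of an admissions model rather than from an ABM, but our simulation could replicate the same counterfactuals.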


6. Available ABM Code & Platforms

6.1 NetLogo Models

| Model | Source | Focus |
|---|---|---|
| School_Choice_ABM | NetLogo Community Models Library | Chilean school choice with information signals |
| Medical College Admission (Jordan) | Assayed & Maheshwari, NetLogo 6.3 | Income + GPA → medical school |
| Matching mechanisms comparison | COMSES.net (codebase 4407) | Serial dictatorship, Boston, Chinese Parallel |
| School choice with information asymmetries | Academia.edu / ResearchGate | Santiago schools, income-based info gaps |

6.2 Python / Mesa

  • Mesa 3 (2025): Modern Python ABM framework, could be used for a Python port of college-sim
  • No publicly available Mesa model specifically for U.S. elite college admissions found
  • Our JS-based simulation is unique in combining: real college data, multiple admission rounds, hook multipliers, and D3 visualization in a single self-contained file

6.3 Reardon et al. Code

  • The JASSS 2016 paper references code but it does not appear to be publicly available in a standard repository
  • Their fast portfolio optimization algorithm (Appendix D) is described in sufficient detail to reimplement

7. Key Parameters for Simulation Calibration — Cross-Study Summary

| Parameter | Value | Source |
|---|---|---|
| Resource-caliber (SES-SAT) correlation | r = 0.3 | Reardon et al. 2016 / ELS:2002 |
| Avg applications per student | 6.8 | CommonApp 2024–25 |
| SAT income gap (bottom vs. top quintile) | ~206 points | College Board / our sat_by_income.json |
| Legacy admit advantage | 5–6× (same credentials) | Chetty et al. 2023 |
| Athlete admit rate (Harvard) | 86% vs. 5.5% base | Arcidiacono et al. 2022 |
| ALDC share of white admits (Harvard) | 43% | Arcidiacono et al. 2022 |
| ED admit advantage | +20–30 pp (~100 SAT equiv.) | Avery & Levin 2010 |
| Top-1% → Ivy+ attendance (same SAT) | 2× vs. middle class | Chetty et al. 2023 |
| Admit rate → perception → app volume feedback | α ∈ [0,1] shrinkage | Sirolly 2023 |
| Information quality advantage (SES) | 0.7 + 0.1 × resources | Reardon et al. 2016 |
| College quality hierarchy | Strict ordering emerges | Epple et al. 2006 |

8. Gaps in Literature & Opportunities for College-Sim

  1. No existing ABM combines real institutional data with multiple admission rounds and hook multipliers — our model fills this gap
  2. Post-SFFA simulation: most ABMs predate the 2023 ruling; our model can simulate race-neutral alternatives
  3. Application volume dynamics: the feedback loop (lower rates → more apps → lower rates) is described theoretically but rarely modeled with real college parameters
  4. Financial aid as a strategic variable: Epple et al. model it theoretically but no ABM integrates Chetty net-cost-by-income data
  5. Waitlist dynamics: rarely modeled in ABMs despite being a real mechanism; our model includes waitlist processing
  6. Geographic/feeder school networks: Hoxby & Avery's "missing one-offs" suggest network effects in application behavior that could be added to our high school archetypes