Reardon et al. (2016) — Agent-Based Simulation Models of the College Sorting Process
Detailed Research Notes
1. Full Citation
Reardon, S.F., Kasman, M., Klasik, D., & Baker, R. (2016). Agent-Based Simulation Models of the College Sorting Process. Journal of Artificial Societies and Social Simulation, 19(1), 8.
- DOI: https://doi.org/10.18564/jasss.2993
- URL: https://www.jasss.org/19/1/8.html
- Accepted: 02-Dec-2015
- Published: 31-Jan-2016
- Working paper: Available via Stanford CEPA at https://cepa.stanford.edu/content/agent-based-simulation-models-college-sorting-process
Author Affiliations
| Author | Affiliation (at publication) |
|---|---|
| Sean F. Reardon | Stanford University, Dept. of Education & Sociology |
| Matt Kasman | Brookings Institution, Center on Social Dynamics and Policy |
| Daniel Klasik | George Washington University, Graduate School of Education |
| Rachel Baker | UC Irvine, School of Education |
Correspondence: Rachel Baker ([email protected])
2. Abstract (Summary)
The paper develops an agent-based model to explore how dynamic processes related to socioeconomic inequality operate to sort students among colleges of varying selectivity. The model simulates a stylized two-sided matching process between students and colleges through three stages (application, admission, enrollment), iterated over 30 annual cohorts. Five mechanisms linking socioeconomic background to college sorting are examined: (1) the correlation between family resources and academic achievement, (2) the ability of high-resource students to enhance their apparent caliber, (3) unequal information quality about colleges and one's own competitiveness, (4) differential valuation of college quality, and (5) the number of applications submitted. The authors find that the resources-achievement relationship explains much of student sorting by resources, but other factors also have non-trivial influences on stratification.
3. Research Question and Motivation
Central Question
Why are students from high-income families substantially more likely to attend selective colleges than low-income peers — and which specific mechanisms drive this stratification?
The paper notes that "students from families in the top income decile are 8 times more likely to enroll in top-tier colleges than students in the lowest decile."
Why ABM?
The authors argue that agent-based modeling is the right approach for several reasons:
- Two-sided matching complexity: College enrollment results from a "complex, two-sided matching process" involving student applications, college admissions decisions, and student enrollment choices. Traditional regression approaches cannot capture the interactive, sequential nature of this process.
- Dynamic co-evolution: Students learn from prior cohort outcomes (e.g., estimated admission probabilities), and colleges update their quality based on enrolled students. These feedback loops create emergent dynamics that static models miss.
- Multi-step interactions: The application-admission-enrollment pipeline involves sequential decisions where each stage depends on prior outcomes. ABM naturally handles this temporal structure.
- Mechanism isolation: ABM allows researchers to turn individual mechanisms on and off to test their relative importance — something impossible with observational data where all mechanisms operate simultaneously.
- Counterfactual exploration: The model can test policy interventions (e.g., improving information access) that cannot be experimentally manipulated at scale in reality.
- Emergent patterns: The college sorting distribution emerges from individual-level decisions rather than being imposed top-down. This reveals how micro-level mechanisms produce macro-level stratification.
Distinction from Prior Work
The paper distinguishes itself from prior college enrollment models by focusing on which college students attend (sorting) rather than whether they attend (access). This reframes the problem as a two-sided matching market rather than a simple decision model.
4. Model Overview (ODD-like Structure)
The paper does not use the formal ODD (Overview, Design concepts, Details) protocol explicitly, but provides an equivalent level of detail across its main text and four appendices (A through D).
4.1 Purpose
To simulate the college sorting process and assess the relative importance of five resource-linked mechanisms in producing socioeconomic stratification across colleges of varying selectivity.
4.2 Entities, State Variables, and Scales
Temporal scale: 30 annual cohorts (years), iterated sequentially. Each year has 3 stages: application, admission, enrollment.
Spatial scale: Abstract; no geographic component. All students can apply to all colleges.
Student agents (N = 8,000 per cohort):
- caliber (C): continuous, drawn from N(1000, 200) — represents composite academic quality (GPA, test scores, essay quality, ECs, talents)
- resources (R): continuous, drawn from N(0, 1) — represents composite socioeconomic capital (income, parental education, social networks, information access)
- Corr(C, R) = 0.3 at baseline (adjustable parameter r)
- apparent_caliber: C + enhancement(R) + noise — what colleges actually observe
- perceived_own_caliber: C + enhancement(R) + noise_self — student's self-estimate
- perceived_college_quality: Q + noise_info — student's estimate of each college
- num_applications: 4 + 0.5 * R (clipped to be >= 1)
- utility_function: parameterized by resources
College agents (J = 40):
- quality (Q): continuous, initialized from N(1070, 130) — average caliber of enrolled students
- seats (m): 150 per college (total seats = 6,000 for 8,000 students; ratio 4:3)
- yield_rate: estimated from prior cohorts, function of quality percentile
- Quality updates each year based on enrolled student caliber
4.3 Process Overview and Scheduling
Each annual cycle:
Year t:
1. Generate 8,000 new students (C, R drawn from bivariate normal)
2. APPLICATION STAGE
- Students observe college quality with noise
- Students estimate own caliber with noise
- Students estimate admission probability from prior 5 years
- Students select optimal application portfolio
3. ADMISSION STAGE
- Colleges observe applicant caliber with noise
- Colleges rank applicants by perceived caliber
- Colleges admit top s_c applicants (based on predicted yield)
4. ENROLLMENT STAGE
- Students enroll in highest-utility college that admitted them
5. UPDATE
- College quality: Q'_c = 0.9 * Q_c + 0.1 * mean(enrolled caliber)
- Store admission outcomes for future probability estimation
Repeat for year t+1
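The annual cycle can be sketched as a minimal scaffold. This is a reconstruction in Python, not the authors' code: the three middle stages are stubbed out, and the stand-in for mean enrolled caliber is purely illustrative; only the cohort draw and the quality update follow the parameters stated above.

```python
import numpy as np

rng = np.random.default_rng(0)
N, J = 8_000, 40                      # students per cohort, colleges
MU_C, SD_C, R_CORR = 1000.0, 200.0, 0.3

def draw_cohort(rng, n=N):
    """Draw (caliber, resources) from a bivariate normal with corr 0.3."""
    cov = [[SD_C**2, R_CORR * SD_C], [R_CORR * SD_C, 1.0]]
    cr = rng.multivariate_normal([MU_C, 0.0], cov, size=n)
    return cr[:, 0], cr[:, 1]         # caliber ~ N(1000, 200), resources ~ N(0, 1)

quality = rng.normal(1070.0, 130.0, size=J)   # initial college quality
for year in range(30):
    caliber, resources = draw_cohort(rng)
    # --- application / admission / enrollment stages would run here ---
    # Illustrative stand-in: pretend each college enrolls students whose
    # mean caliber sits near its current quality.
    enrolled_mean = quality + rng.normal(0.0, 20.0, size=J)
    quality = 0.9 * quality + 0.1 * enrolled_mean   # 90% persistence update
```

The 0.9/0.1 weighting makes each college's quality a slow-moving average of its enrollment history, which is what creates the path dependence noted in Section 15.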
4.4 Design Concepts
Adaptation: Colleges adjust yield estimates based on historical enrollment data. Students estimate admission probabilities from prior 5 years of outcomes.
Learning: Indirect — students use aggregate historical admission data (not individual feedback). Colleges learn from recent yield.
Prediction: Students predict admission probability via logit regression on prior cohort data.
Stochasticity: Noise in quality perception, caliber perception, and the bivariate normal draws for C and R. 100 iterations per experimental condition to account for random variation.
Observation: Students observe college quality and own caliber imperfectly; noise magnitude inversely related to resources.
Emergence: College quality ordering, stratification patterns, admission rates, and yield rates all emerge from agent interactions.
5. Agent Types and Counts
Students
- 8,000 per cohort (new cohort each of 30 years)
- Total students simulated per run: 240,000 (though results typically focus on later cohorts after model stabilization)
- Drawn from a bivariate normal distribution of (caliber, resources)
- Caliber: N(1000, 200) — scaled to roughly match SAT-like distributions
- Resources: N(0, 1) — standardized socioeconomic index
Colleges
- 40 colleges with 150 seats each
- Total seats: 6,000 (75% of the 8,000 student cohort)
- Quality initialized from N(1070, 130) — initially slightly above mean student caliber
- Quality updates annually: Q' = 0.9Q + 0.1mean(enrolled_caliber)
- This means college quality is a weighted moving average, with 90% persistence
Key Ratio
- Student-to-seat ratio: 8,000 / 6,000 = 4:3
- This means 25% of students will not be enrolled at any college — matching the reality that not all applicants matriculate at four-year institutions
6. Student Agent Decision Rules
6.1 How Students Form College Preferences (Perceived Utility)
Students evaluate each college based on a utility function that depends on perceived college quality and the student's resources:
U*_cs = a_s + b_s * Q*_cs
Where:
- a_s = baseline utility from attending any college (can depend on resources)
- b_s = marginal utility from college quality (can depend on resources)
- Q*_cs = student s's perception of college c's quality
Perception of college quality:
Q*_cs = Q_c + u_cs
where u_cs ~ N(0, tau_s)
The noise parameter tau_s is a decreasing function of resources:
reliability_s = 0.7 + 0.1 * R_s (bounded between 0.5 and 0.9), so tau_s shrinks as resources rise
Higher-resource students perceive college quality more accurately. In the baseline model with all mechanisms active:
- a_s = -250 + d where d = -500 for the utility differential mechanism
- b_s = 1 + e where e = 0.5 for high-resource students
This means high-resource students place higher value on college quality and have a higher threshold for attending any college.
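One plausible rendering of this utility specification follows. Two pieces are my assumptions, not spelled out above: the conversion of a reliability into a noise SD (via sd = sigma_Q * sqrt(1/reliability - 1)) and the use of R > 0 as the high-resource cutoff for the d and e adjustments.

```python
import numpy as np

SIGMA_Q = 130.0   # college quality SD from the baseline parameters

def perceived_utility(Q, R, rng, d=-500.0, e=0.5):
    """U*_cs = a_s + b_s * Q*_cs for one student across all colleges.

    Assumptions of this sketch: the reliability-to-noise-SD mapping and
    the R > 0 high-resource cutoff are illustrative choices.
    """
    reliability = float(np.clip(0.7 + 0.1 * R, 0.5, 0.9))
    noise_sd = SIGMA_Q * np.sqrt(1.0 / reliability - 1.0)
    Q_star = Q + rng.normal(0.0, noise_sd, size=np.shape(Q))  # perceived quality
    high = R > 0.0
    a = -250.0 + (d if high else 0.0)   # higher bar for attending at all
    b = 1.0 + (e if high else 0.0)      # stronger taste for quality
    return a + b * Q_star

rng = np.random.default_rng(1)
quality = rng.normal(1070.0, 130.0, size=40)
u_rich = perceived_utility(quality, R=1.5, rng=rng)
u_poor = perceived_utility(quality, R=-1.5, rng=rng)
```

Note how the same college list produces different utility orderings for the two students: the low-resource student's larger perception noise scrambles the quality ranking more.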
6.2 How Students Decide Where to Apply (Portfolio Selection)
This is the most technically sophisticated part of the model. Students select an optimal application portfolio maximizing total expected utility, accounting for:
- Perceived utility of each college
- Estimated probability of admission to each college
- Portfolio interactions (applying to a safety school has diminishing returns if you already have another safety)
Admission probability estimation:
P_cs = f(C*_s - Q*_cs)
Where f is a logistic function estimated from prior 5 years of admission outcomes via logit regression. The student uses their perceived own caliber minus perceived college quality as the predictor.
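The estimation step can be sketched as a two-parameter logistic fit. The paper specifies only a logit regression on prior outcomes, so the Newton-Raphson fitting routine and the simulated training data here are assumptions standing in for the five years of stored admission records.

```python
import numpy as np

def fit_logit(x, y, iters=25):
    """Fit P(admit) = 1 / (1 + exp(-(b0 + b1 * x))) by Newton-Raphson (IRLS)."""
    X = np.column_stack([np.ones_like(x), x])
    beta = np.zeros(2)
    for _ in range(iters):
        p = 1.0 / (1.0 + np.exp(-X @ beta))
        W = p * (1.0 - p)                       # logistic variance weights
        grad = X.T @ (y - p)
        hess = X.T @ (X * W[:, None])
        beta = beta + np.linalg.solve(hess, grad)
    return beta

# Hypothetical training data standing in for 5 prior years of outcomes;
# the predictor is the perceived caliber-quality gap, C*_s - Q*_cs.
rng = np.random.default_rng(2)
gap = rng.normal(0.0, 150.0, size=5_000)
true_p = 1.0 / (1.0 + np.exp(-(0.5 + 0.01 * gap)))   # assumed "true" process
admitted = (rng.random(5_000) < true_p).astype(float)
b0, b1 = fit_logit(gap, admitted)

def admit_prob(gap):
    """Student-side estimate of P(admit) given the caliber-quality gap."""
    return 1.0 / (1.0 + np.exp(-(b0 + b1 * gap)))
```

In the full model each student would evaluate `admit_prob` at their own perceived gap for every college when building an application portfolio.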
Perceived own caliber:
C*_s = C_s + c_s + e_s
where:
c_s = enhancement (0.1 * R_s standard deviations)
e_s ~ N(0, sigma_s)
reliability of self-perception = 0.7 + 0.1 * R_s (bounded 0.5-0.9), so sigma_s shrinks as resources rise
High-resource students:
- Have higher apparent caliber (via enhancement from test prep, essay coaches, etc.)
- Know their own caliber more precisely (less noise)
Optimal portfolio algorithm (Appendix D):
Rather than evaluating all C(40, n) combinations (which for 40 colleges and ~4 applications would be 91,390 combinations), the model uses a recursive expected-utility algorithm:
For student i with application set A_i containing n colleges ordered by increasing utility (U_1 < U_2 < ... < U_n):
E_i[A_i] = P_in * U_n + (1 - P_in) * E_i[A_i \ {a_n}]
This recursion says: the expected utility of the portfolio is the probability of getting into the best college times its utility, plus the probability of NOT getting in times the expected utility of the remaining portfolio.
This reduces computation from C(J, n) to n*(J - (n-1)/2) evaluations. For 40 colleges and 4 applications: 154 calculations vs. 91,390.
The algorithm works by:
1. Ordering colleges by utility
2. For each possible "top" college, computing the expected utility of the best (n-1)-college set below it
3. Selecting the n-college set with the highest total expected utility
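Read as a dynamic program, the recursion gives an O(n * J) algorithm. Below is a sketch of that idea (my reconstruction of the Appendix D approach, not the authors' code; the utility of not enrolling anywhere is normalized to zero):

```python
import numpy as np

def best_portfolio(utility, p_admit, n):
    """Best n-application portfolio by expected utility, in O(n * J) time.

    V[k][j] = best expected utility of a k-college portfolio chosen from
    the j lowest-utility colleges; non-enrollment utility is 0.
    """
    order = np.argsort(utility)               # ascending utility
    u, p = utility[order], p_admit[order]
    J = len(u)
    V = np.zeros((n + 1, J + 1))
    take = np.zeros((n + 1, J + 1), dtype=bool)
    for k in range(1, n + 1):
        for j in range(1, J + 1):
            # Either skip college j, or make it the portfolio's top choice:
            skip_ev = V[k][j - 1]
            take_ev = p[j - 1] * u[j - 1] + (1.0 - p[j - 1]) * V[k - 1][j - 1]
            V[k][j], take[k][j] = max((skip_ev, False), (take_ev, True))
    picks, k, j = [], n, J                    # backtrack to recover the set
    while k > 0 and j > 0:
        if take[k][j]:
            picks.append(int(order[j - 1]))
            k -= 1
        j -= 1
    return float(V[n][J]), sorted(picks)
```

For J = 40 and n = 4 this fills at most 160 table cells, the same order of work as the 154-evaluation count quoted above, versus 91,390 brute-force portfolios.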
6.3 How Students Choose Among Acceptances (Enrollment)
Simple rule: Students enroll in the admitted college with the highest perceived utility.
Enrolled_college = argmax_{c in admitted_set} U*_cs
There is no financial aid consideration, no campus visit, no yield management from the student side — just pure utility maximization over perceived quality.
7. College Agent Decision Rules
7.1 How Colleges Evaluate Applications
Colleges observe each applicant's apparent caliber with noise:
C**_cs = C_s + c_s + w_cs
where w_cs ~ N(0, phi_s)
Note: c_s is the enhancement term (already baked into what the student presents), so colleges see the enhanced caliber plus their own measurement noise. The enhancement IS visible to colleges — it represents real things like polished essays and strong ECs that wealthy students can produce.
Colleges rank all applicants by perceived caliber (C**_cs), from highest to lowest. There is no holistic review, no hook system, no demographic preferences — purely caliber-based ranking.
7.2 How Colleges Set Admission Thresholds
Colleges do NOT have a fixed threshold. Instead, they admit enough students to fill their seats, accounting for expected yield:
Admit_count_c = m / Yield_c
Where:
- m = 150 (seats)
- Yield_c = estimated yield rate for college c
Yield estimation formula:
Yield_c = 0.2 + 0.06 * (college quality percentile, as a fraction from 0 to 1)
This means:
- Lowest-quality college (percentile 0): yield = 20%
- Median college (percentile 0.5): yield = 23%
- Highest-quality college (percentile ~1): yield = 26%
So a low-quality college admits m/0.20 = 750 students for 150 seats, while a high-quality college admits m/0.26 = 577 students.
The yield formula itself updates based on actual enrollment outcomes from prior years (not just the formula above as a fixed rule — the formula represents the initial approximation, and colleges learn).
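The admit-count rule above can be sketched as follows (illustrative only: tie handling and capping at the size of the applicant pool are my choices, not the paper's):

```python
import numpy as np

def admit(apparent_caliber, quality_percentile, seats=150):
    """Rank applicants by apparent caliber; admit seats / expected_yield of them.

    quality_percentile is a fraction in [0, 1]. Capping at the applicant-pool
    size is an assumption for small pools.
    """
    expected_yield = 0.2 + 0.06 * quality_percentile
    n_admit = min(len(apparent_caliber), int(round(seats / expected_yield)))
    ranked = np.argsort(apparent_caliber)[::-1]   # highest caliber first
    return ranked[:n_admit]

rng = np.random.default_rng(3)
applicants = rng.normal(1000.0, 200.0, size=900)
admitted_low = admit(applicants, quality_percentile=0.0)   # 150/0.20 = 750 admits
admitted_top = admit(applicants, quality_percentile=1.0)   # 150/0.26 = 577 admits
```

In the full model, `apparent_caliber` would already include the enhancement term plus the college's own observation noise (C**_cs from Section 7.1).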
7.3 Yield Targets
Colleges implicitly target filling exactly m = 150 seats. They over-admit based on expected yield and accept the stochastic outcome. There is no waitlist mechanism. If a college under-enrolls or over-enrolls in a given year, it simply adjusts its yield estimate for the next cohort.
8. Key Parameters (Complete List)
Table 1: Baseline Model Parameters
| Parameter | Symbol | Baseline Value | Source/Rationale |
|---|---|---|---|
| Students per cohort | N | 8,000 | Computational choice |
| Number of colleges | J | 40 | Computational choice |
| Seats per college | m | 150 | Derived to create realistic ratios |
| Student-to-seat ratio | N/(J*m) | 4:3 | Approximates real enrollment rates |
| College quality mean | mu_Q | 1070 | Set above student caliber mean |
| College quality SD | sigma_Q | 130 | Creates realistic quality spread |
| Student caliber mean | mu_C | 1000 | Normalized scale |
| Student caliber SD | sigma_C | 200 | Creates realistic achievement spread |
| Student resources mean | mu_R | 0 | Standardized |
| Student resources SD | sigma_R | 1 | Standardized |
| Resources-caliber correlation | r | 0.3 | ELS:2002 data |
| Quality perception reliability | rho_Q | 0.7 + 0.1*R | Assumed; bounded [0.5, 0.9] |
| Own caliber perception reliability | rho_C | 0.7 + 0.1*R | Assumed; bounded [0.5, 0.9] |
| Enhancement effect | c | 0.1*R SD | SAT prep research (~25 SAT pts) |
| Number of applications | K | 4 + 0.5*R | ELS:2002 data |
| Utility intercept adjustment | d | -500 | Mechanism 4 (utility valuation) |
| Utility slope adjustment | e | 0.5 | Mechanism 4 (utility valuation) |
| College quality persistence | alpha | 0.9 | Assumed (slow quality evolution) |
| Yield rate baseline | Y_0 | 0.2 | Approximation from IPEDS |
| Yield rate slope | Y_slope | 0.06 | Approximation from IPEDS |
| Admission probability lookback | - | 5 years | Model design choice |
| Cohort iterations | T | 30 years | Allow model stabilization |
| Runs per condition | - | 100 | Reduce stochastic variation |
Table 2: Resource Pathway Parameters by Experiment
| Model | r (calib-res) | c (enhance) | rho (info) | K (apps) | d (intercept) | e (slope) |
|---|---|---|---|---|---|---|
| 1: No resources | 0 | 0 | constant | constant | 0 | 0 |
| 2: All pathways | 0.3 | 0.1*R | varies | 4+0.5R | -500 | 0.5 |
| 3: No correlation | 0 | 0.1*R | varies | 4+0.5R | -500 | 0.5 |
| 4: No enhancement | 0.3 | 0 | varies | 4+0.5R | -500 | 0.5 |
| 5: No info diff | 0.3 | 0.1*R | constant | 4+0.5R | -500 | 0.5 |
| 6: No app volume | 0.3 | 0.1*R | varies | constant | -500 | 0.5 |
| 7: No utility diff | 0.3 | 0.1*R | varies | 4+0.5R | 0 | 0 |
| 8: Only correlation | 0.3 | 0 | constant | constant | 0 | 0 |
9. Calibration Data Sources
Primary Data: Education Longitudinal Study (ELS:2002)
The ELS:2002 is a nationally representative longitudinal study from the U.S. Department of Education tracking approximately 15,000 students from 10th grade (2002) through postsecondary outcomes. It provides:
- Resources-caliber correlation (r = 0.3): Derived from the correlation between SES composite and standardized test scores in ELS:2002
- Application count by SES: Used to calibrate K = 4 + 0.5*R
- College enrollment rates by SES: Used for validation of model outputs
Enhancement Calibration
- Based on SAT preparation research showing that coaching produces roughly 0.1 SD improvement in scores
- Corresponds to approximately 25 SAT points on the old 1600-point scale
- Sources cited include work on test preparation effects and extracurricular enrichment by SES
Yield Rate Calibration
- Derived from IPEDS (Integrated Postsecondary Education Data System) data
- Shows that more selective colleges have slightly higher yield rates
- Baseline formula: Yield = 0.2 + 0.06 * quality_percentile
Validation Data: IPEDS
Model outputs were compared against IPEDS institutional data for:
- The relationship between college selectivity and application volume
- The relationship between college selectivity and admission rates
- The relationship between college selectivity and yield rates
- Enrollment patterns across the SES distribution
10. Simulation Results — Key Patterns
10.1 Baseline Realism (Model 2: All Pathways Active)
With empirically-grounded parameters, the model produces several realistic emergent patterns:
Enrollment stratification:
- Students at the 90th percentile of resources: >90% college enrollment rate
- Students at the 10th percentile of resources: ~55% college enrollment rate
- Gap: approximately 35-40 percentage points

Selective college access:
- Students at the 90th percentile of resources: ~20 times more likely to attend top-10% colleges than 10th-percentile students
- This is directionally consistent with, though larger than, the empirical 8x difference in top-tier enrollment by income decile

Emergent institutional patterns:
- Higher-quality colleges receive more applications (positive correlation between quality and volume)
- Higher-quality colleges have lower admission rates (more selective)
- Higher-quality colleges have higher yield rates (students prefer them)
- These patterns match IPEDS data qualitatively
10.2 Model 1: No Resource Influence (Control)
When all five resource mechanisms are turned off (r = 0, no enhancement, equal information, equal applications, equal utility):
- Resources show zero correlation with enrollment outcomes
- All resource levels have equal probability of attending any quality tier
- Confirms that stratification is not an artifact of the model structure
10.3 Mechanism Experiments (Models 3-8)
Removing the resources-caliber correlation (Model 3):
- The single most impactful mechanism
- Enrollment gap (90th vs 10th percentile) drops from ~50pp to ~20pp
- Selective college gap drops from ~20x to ~4x
- Explains roughly half of observed stratification

Removing application enhancement (Model 4):
- Effect size: ~3pp for any-college enrollment, up to 6pp for selective college enrollment
- Primarily affects top-resource students' access to elite colleges
- Smaller effect than the caliber correlation but still meaningful

Removing the information quality differential (Model 5):
- Minimal effect on any-college enrollment
- 2-5pp effects on selective college enrollment
- Effects concentrated among middle- and upper-resource students
- Better information helps mid-range students target appropriate colleges

Removing the application quantity differential (Model 6):
- Small but observable effects at the bottom of the resource distribution
- Low-resource students benefit most from submitting more applications
- Effect: improved access for the bottom quartile by a few percentage points

Removing the utility differential (Model 7):
- Negligible effects on all outcomes
- Suggests that differential valuation of quality, conditional on other factors, matters little

Only the resources-caliber correlation (Model 8):
- Results closely resemble the full baseline (Model 2)
- The four other mechanisms combined produce effects similar in magnitude to the achievement gap alone
- Key insight: the mechanisms are not additive; there are interactions
10.4 Latin Hypercube Sensitivity Analysis
The authors used Latin Hypercube sampling to vary the five mechanism parameters, each across 10 evenly-spaced values, combined so that each value of each parameter appears in exactly one of the 10 sampled combinations. Results were analyzed via regression of outcome gaps on parameter values.
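A Latin Hypercube design of this shape can be sketched in a few lines of numpy. The parameter ranges below are placeholders of my own, not values taken from the paper:

```python
import numpy as np

rng = np.random.default_rng(4)

def latin_hypercube(n, d, rng):
    """n stratified samples in [0, 1]^d: one point per row-stratum per column."""
    sample = (np.arange(n)[:, None] + rng.random((n, d))) / n
    for col in range(d):                       # independently permute each column
        sample[:, col] = sample[rng.permutation(n), col]
    return sample

unit = latin_hypercube(10, 5, rng)             # 10 design points in [0, 1]^5
# Placeholder ranges for (r, c, rho, K, d) -- NOT the paper's actual bounds.
lower = np.array([0.0, 0.0, 0.5, 2.0, -500.0])
upper = np.array([0.6, 0.2, 0.9, 8.0, 0.0])
params = lower + unit * (upper - lower)        # 10 x 5 design matrix
```

Each row of `params` would seed one experimental run; regressing outcome gaps on these columns recovers the sensitivity coefficients reported in Tables 3-5.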
College Enrollment Gaps (Table 3 — coefficients predicting 90-10 gap):
| Mechanism | 90-10 gap | 90-50 gap | 50-10 gap |
|---|---|---|---|
| Resources-caliber (r) | 0.58*** | 0.09* | 0.49*** |
| Enhancement (c) | 0.58*** | 0.08 | 0.50*** |
| Information (rho) | 0.40*** | -0.06 | 0.46*** |
| Application volume (K) | 0.07* | 0.01 | 0.06* |
| Utility differential | -0.06 | -0.06 | 0.01 |
Three mechanisms significantly predict college enrollment gaps: caliber correlation, enhancement, and information quality. Effects are primarily driven by disadvantage at the bottom of the resource distribution (50-10 gap).
Selective College Enrollment Gaps (Table 4):
| Mechanism | 90-10 gap | 90-50 gap | 50-10 gap |
|---|---|---|---|
| Resources-caliber (r) | 0.24*** | 0.27*** | -0.03 |
| Enhancement (c) | 0.23*** | 0.27*** | -0.04 |
| Information (rho) | 0.24*** | 0.28*** | -0.04 |
| Application volume (K) | 0.01 | 0.03 | -0.02 |
| Utility differential | -0.05 | -0.07 | 0.02 |
For selective colleges, effects are concentrated at the top (90-50 gap), meaning the advantage accrues to the most resourced students.
College Quality Gaps (Table 5 — point differences):
| Mechanism | 90-10 gap | 90-50 gap | 50-10 gap |
|---|---|---|---|
| Resources-caliber (r) | 192*** | 90*** | 102*** |
| Enhancement (c) | 141*** | 82*** | 59** |
| Information (rho) | 105*** | 68** | 37* |
| Application volume (K) | 16** | 7 | 9 |
| Utility differential | 11 | 2 | 9 |
All four non-utility mechanisms significantly predict quality gaps, with effects distributed across the resource spectrum.
11. Policy Experiments
The paper does not run explicit "policy intervention" experiments (e.g., simulating free SAT prep or college counseling programs). Instead, the mechanism removal experiments serve as implicit policy tests:
- Eliminating application enhancement (simulates equalizing SAT prep / essay coaching): 3-6pp reduction in stratification
- Equalizing information (simulates universal college counseling): 2-5pp reduction in selective enrollment gaps
- Equalizing application counts (simulates fee waivers / application support): Small improvements for bottom quartile
- Equalizing utility valuation (simulates changing attitudes about college value): Negligible effect
The authors conclude that "student- or institution-level policies (such as application coaching and college information provision to students in low-income schools or encouraging affirmative action-like policies for dimensions other than race/ethnicity) could have notable impacts on how students sort into colleges."
The model was later extended in Reardon et al. (2018) — "What Levels of Racial Diversity Can Be Achieved with Socioeconomic-Based Affirmative Action?" (JPAM, 37: 630-657) — which added race as an attribute and tested SES-based affirmative action as a substitute for race-based policies.
12. Validation Method
Internal Validation
- 100 iterations per experimental condition to establish stable means and confidence intervals
- 30-year burn-in: Model runs for 30 annual cohorts; results from later cohorts used to ensure system has reached approximate equilibrium
- Sensitivity analysis via Latin Hypercube design (systematic parameter variation)
External Validation Against IPEDS
The authors validate model outputs against Integrated Postsecondary Education Data System (IPEDS) institutional data. Specifically:
- Application volume vs. selectivity: Model reproduces the positive correlation between college quality and number of applications received
- Admission rate vs. selectivity: Model reproduces the negative correlation (higher-quality colleges have lower admit rates)
- Yield vs. selectivity: Model reproduces the positive correlation (higher-quality colleges have higher yield rates)
- Enrollment patterns by SES: Model-generated enrollment rates by resource decile qualitatively match ELS:2002 patterns
Validation Caveats
- Validation is qualitative (pattern-matching) rather than quantitative (exact calibration to specific schools)
- The model is deliberately abstract — 40 generic colleges rather than real institutions
- The authors acknowledge that "the model is a stylized representation of the college sorting process" and is not intended to precisely replicate any specific admissions system
13. Limitations Explicitly Noted
The authors explicitly acknowledge several limitations:
- No financial aid or tuition: The model ignores college costs entirely. In reality, net price is a major factor in enrollment decisions, especially for low-SES students.
- No race/ethnicity: The 2016 model does not include race as a student attribute or affirmative action as a college policy. (This was addressed in the 2018 extension.)
- No network effects: Students do not share information with peers, parents, or counselors. In reality, social networks strongly influence college knowledge and application behavior.
- Rational utility maximization: Students are modeled as expected-utility maximizers. The authors note this "oversimplifies real behavior" and likely understates actual stratification, because real low-SES students face additional behavioral barriers (complexity aversion, present bias, etc.).
- No geographic dimension: All students can apply to all colleges equally. In reality, distance is a major factor, especially for low-income students.
- Abstract colleges: 40 generic institutions rather than real colleges with specific characteristics, programs, locations, or financial aid policies.
- No strategic behavior by colleges: Colleges do not engage in yield management, merit aid, marketing, or recruitment targeting.
- Simple utility function: Two-parameter linear utility in perceived quality. Real preferences involve major, location, size, culture, peer effects, etc.
- No post-admission negotiation: No waitlists, no gap years, no transfer students.
- Single college quality dimension: Colleges vary only in "quality" (average enrolled caliber). No specialization, program strength, or fit matching.
14. What the Model Does NOT Include
This section catalogs features present in real college admissions but absent from the Reardon model, organized by relevance to the college-sim project:
Admissions Process Features NOT Modeled
- No Early Decision (ED) / Early Action (EA) rounds: All admissions occur in a single round per year. There is no binding ED, no strategic early application, and no ED yield boost.
- No Regular Decision vs. rolling admissions distinction: Single application-admission-enrollment cycle.
- No waitlists: Colleges make a single admit/deny decision. No deferred admission.
- No hooks: No legacy, athlete, donor, first-generation, or URM preferences in admissions.
- No holistic review: Admissions is a pure caliber ranking. No consideration of extracurriculars, essays, recommendations, or interviews beyond what is captured in the single "caliber" composite.
- No demonstrated interest: Colleges do not track or reward campus visits, information sessions, or other interest signals.
- No test-optional policies: All students have a caliber score that is always visible.
Student Features NOT Modeled
- No race/ethnicity: Students have no racial identity. No affirmative action. (Added in 2018 extension.)
- No geographic location: No in-state/out-of-state distinction, no distance preference.
- No major/program preferences: Students care only about overall college quality, not specific departments.
- No family college knowledge: Beyond the generic "resources" attribute, there is no distinction between first-generation and legacy students.
- No peer effects in application: Students do not observe or mimic peers' application strategies.
- No financial constraints: Students can apply to and enroll in any college regardless of cost.
Institutional Features NOT Modeled
- No financial aid: No merit scholarships, no need-based aid, no tuition discounting.
- No enrollment management: No strategic use of merit aid to attract high-caliber students or meet revenue targets.
- No college marketing/recruitment: Colleges do not actively recruit students.
- No selectivity gaming: Colleges do not manipulate application volume or admit rates for rankings purposes.
- No capacity constraints beyond seats: No housing, faculty, or budget constraints.
- No consortium/matching agreements: No Common Application, no shared deadlines.
Market Features NOT Modeled
- No application fees: Applications are costless (students choose portfolio size based on resources, but no per-application cost).
- No Common Application effects: No mechanism for the observed increase in applications per student over time.
- No temporal trends: Parameters are fixed over the 30-year simulation (no growing inequality, changing test policies, etc.).
- No multiple college types: No distinction between research universities, liberal arts colleges, public vs. private, etc.
15. Code Availability and Replication
Original Code
The authors do not provide their original simulation code in the paper or as supplementary material. The JASSS publication and Stanford CEPA page offer only the paper PDF. No data repository, no GitHub link, no supplementary code files are provided.
Correspondence about the model is directed to Rachel Baker ([email protected]).
Third-Party Replication
A replication of the extended 2018 model (which builds on the 2016 model) exists:
- Repository: https://github.com/lbeziaud/mosaic (GPL-3.0 license)
- Author: Louis Beziaud (2022)
- Language: Python 3.9+
- Paper replicated: Reardon et al. (2018) "What Levels of Racial Diversity Can Be Achieved with Socioeconomic-Based Affirmative Action?" — JPAM 37: 630-657
- Usage: `from model import run; colleges, students, outcomes = run()`
- Computational requirements: ~10 GB output, ~30 GB RAM for parallel runs, ~20 minutes on AMD Ryzen 7
A companion replication repository may exist at https://github.com/lbeziaud/re-reardon2018.
An associated replication paper was published: Allard, T., Beziaud, L., & Gambs, S., via INRIA HAL (hal-04328511), discussing reproducibility of the Reardon model.
Replication Feasibility
The paper provides sufficient mathematical detail (especially in Appendices A-D) for independent reimplementation. Key algorithmic challenges:
- The optimal portfolio algorithm (Appendix D) requires careful implementation of the recursive expected-utility computation
- The logit-based admission probability estimation requires tracking 5 years of admission outcomes
- The college quality update rule creates path dependence that affects convergence
16. Comparison with Our College-Sim Model
Features Our Model Has That Reardon Lacks
| Feature | Reardon 2016 | Our college-sim |
|---|---|---|
| ED/EA/EDII/RD rounds | No (single round) | Yes (5 rounds) |
| Named real colleges | No (40 abstract) | Yes (30 real colleges) |
| Hook multipliers | No | Yes (athlete, donor, legacy, first-gen) |
| Race/demographics | No | Yes (from College Scorecard) |
| Financial aid / net cost | No | Yes (by income bracket) |
| Geographic dimension | No | Partial (high school types) |
| Waitlist mechanism | No | Yes |
| ED yield boost | No | Yes (per-college ED multipliers) |
| Holistic review | No | Yes (EC/essay + hooks + randomness) |
| D3.js visualization | No | Yes (bezier arcs, tier colors) |
| Multiple college tiers | Implicit (by quality) | Explicit (HYPSM, Ivy+, Near-Ivy, etc.) |
| Archetype-based students | No | Yes (8 archetypes per school) |
Features Reardon Has That Our Model Lacks
| Feature | Reardon 2016 | Our college-sim |
|---|---|---|
| Optimal portfolio algorithm | Yes (EU maximization) | No (utility threshold + lognormal K) |
| Dynamic college quality | Yes (0.9Q + 0.1enrolled) | No (fixed quality) |
| Learning/adaptation | Yes (yield estimation, admission prob) | No (fixed parameters) |
| Multiple cohort iteration | Yes (30 years) | No (single cohort) |
| Admission probability estimation | Yes (logit on prior data) | No (sigmoid on academic index) |
| Information asymmetry by SES | Yes (noise inversely related to resources) | No |
| Application enhancement by SES | Yes | No (EC/essay are archetype-based) |
| Rigorous sensitivity analysis | Yes (Latin Hypercube) | No |
| Validation against IPEDS | Yes | No (calibrated to published admit rates) |
| Counterfactual mechanism isolation | Yes (8 experimental conditions) | No |
Key Design Differences
- Matching approach: Reardon uses a multi-round, market-clearing process where agents learn over time. Our model uses a single-cohort, Gale-Shapley-inspired sequential round system.
- Student decision-making: Reardon's students optimize an expected-utility portfolio. Our students use a utility threshold with hook-adjusted admission probability to build lists.
- College decision-making: Reardon's colleges rank by caliber and admit based on yield prediction. Our colleges use a logistic admission score with hook multipliers and stochastic Bernoulli trials.
- Abstraction level: Reardon is deliberately abstract (generic students and colleges) to isolate mechanisms. Our model is deliberately concrete (real colleges, real stats) for practitioner insight.
- Primary purpose: Reardon tests sociological mechanisms driving stratification. Our model simulates individual-level admissions outcomes for educational planning.
17. Mathematical Appendix — Detailed Formulations
Appendix A: Full Model Equations
Initialization:
- J colleges, each with m seats
- N students per cohort
- College quality: Q_j ~ N(mu_Q, sigma_Q) for j = 1, ..., J
- Student resources: R_i ~ N(mu_R, sigma_R)
- Student caliber: C_i drawn jointly with R_i from a bivariate normal (means mu_C, mu_R; SDs sigma_C, sigma_R; correlation r) for i = 1, ..., N
Application Submodel:
- Student i perceives college j's quality:
  Q*_ij = Q_j + u_ij, u_ij ~ N(0, tau_i)
  where tau_i = sigma_Q * sqrt((1 - rho_i) / rho_i) and rho_i = min(0.9, max(0.5, 0.7 + 0.1 * R_i))
- Student i's utility from college j:
  U*_ij = a_i + b_i * Q*_ij
  where a_i = a_0 + d * R_i and b_i = 1 + e * R_i (or a constant for all students, depending on which mechanism is being tested)
- Student i perceives own caliber:
  C*_i = C_i + c_i + e_i
  where c_i = c * R_i * sigma_C (enhancement) and e_i ~ N(0, sigma_self_i), with
  sigma_self_i = sigma_C * sqrt((1 - rho_self_i) / rho_self_i) and rho_self_i = min(0.9, max(0.5, 0.7 + 0.1 * R_i))
- Student i estimates admission probability at college j:
  P_ij = logit^{-1}(beta_0 + beta_1 * (C*_i - Q*_ij))
  where beta_0 and beta_1 are estimated by logistic regression on aggregated admission/rejection outcomes from the prior 5 cohorts.
- Expected utility of application portfolio A_i:
  EU_i(A_i) = sum over enrollment scenarios, weighted by their probabilities; solved via the recursive algorithm in Appendix D.
- Number of applications:
  K_i = max(1, round(4 + 0.5 * R_i))
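The student-side quantities above can be sketched directly from the equations. This is a minimal illustration, not the authors' code: the parameter values (d, e, c, beta_0, beta_1), the sigma defaults taken from the N(1070, 130) quality distribution, and the assumption that R_i is standardized are all mine.

```python
import numpy as np

rng = np.random.default_rng(0)

def application_signals(C_i, R_i, Q, sigma_Q=130.0, sigma_C=130.0,
                        a0=0.0, d=0.1, e=0.1, c=0.1,
                        beta0=0.0, beta1=0.01):
    """Perceived quality, utility, perceived own caliber, and estimated
    admission probability for one student across all colleges Q (array).
    Parameter values are illustrative, not the paper's calibration."""
    # Information reliability rises with resources R_i (assumed standardized)
    rho = min(0.9, max(0.5, 0.7 + 0.1 * R_i))
    tau = sigma_Q * np.sqrt((1 - rho) / rho)

    # Perceived college quality: true quality plus resource-dependent noise
    Q_star = Q + rng.normal(0.0, tau, size=Q.shape)

    # Utility: intercept and slope both (optionally) tied to resources
    U_star = (a0 + d * R_i) + (1 + e * R_i) * Q_star

    # Perceived own caliber: enhancement c*R_i*sigma_C plus self-perception noise
    sigma_self = sigma_C * np.sqrt((1 - rho) / rho)
    C_star = C_i + c * R_i * sigma_C + rng.normal(0.0, sigma_self)

    # Estimated admission probability from the fitted logit
    P = 1.0 / (1.0 + np.exp(-(beta0 + beta1 * (C_star - Q_star))))
    return Q_star, U_star, C_star, P
```

In the full model, beta_0 and beta_1 would come from a logistic regression on the prior five cohorts rather than being fixed defaults.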
Admission Submodel:
- College j observes applicant i's caliber:
  C**_ij = C_i + c_i + w_ij, w_ij ~ N(0, phi)
- College j ranks applicants by C**_ij and admits the top s_j:
  s_j = m / Y_j, where Y_j is the estimated yield rate
- Yield estimation:
  Y_j = 0.2 + 0.06 * percentile_rank(Q_j), updated based on actual yield from prior years.
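The college-side step, as sketched below, follows the equations above; the default phi and the rounding of the offer count s_j = m / Y_j up to an integer are my assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)

def admit(applicant_calibers, applicant_enhancements, m, yield_rate, phi=30.0):
    """Rank applicants by noisily observed (enhanced) caliber and make
    enough offers to expect m enrollees at the estimated yield rate.
    Returns indices of admitted applicants."""
    C = np.asarray(applicant_calibers, dtype=float)
    # Observed caliber: true caliber + enhancement + college-side noise
    observed = (C + np.asarray(applicant_enhancements, dtype=float)
                + rng.normal(0.0, phi, size=C.shape))
    # Offers needed so that expected enrollment fills m seats
    n_admit = min(len(C), int(np.ceil(m / yield_rate)))
    # Indices of the top-n_admit applicants by observed caliber
    return np.argsort(observed)[::-1][:n_admit]
```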
Enrollment Submodel:
- Student i enrolls in:
enrolled_j = argmax_{j in admitted_set_i} U*_ij
Quality Update:
Q_j(t+1) = 0.9 * Q_j(t) + 0.1 * mean(C_i : i enrolled at j in year t)
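The enrollment choice and quality update are simple enough to state in a few lines. In this sketch the dict-based interface and the behavior when a college enrolls nobody (quality carries over unchanged) are my assumptions.

```python
def enroll(admitted_utilities):
    """Student enrolls at the admitted college with the highest perceived
    utility; admitted_utilities maps college id -> U*_ij. Returns None
    if the student was admitted nowhere."""
    if not admitted_utilities:
        return None
    return max(admitted_utilities, key=admitted_utilities.get)

def update_quality(Q_j, enrolled_calibers, weight=0.9):
    """Next year's quality is a weighted average of current quality and
    the mean caliber of this year's enrollees: Q' = 0.9*Q + 0.1*mean."""
    if not enrolled_calibers:
        return Q_j  # assumption: quality unchanged if nobody enrolls
    mean_caliber = sum(enrolled_calibers) / len(enrolled_calibers)
    return weight * Q_j + (1 - weight) * mean_caliber
```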
Appendix C: Initialization Details
The model runs 30 years to reach approximate equilibrium. Key initialization:
- Year 1: Colleges start with random quality draws from N(1070, 130)
- Year 1: No prior admission data, so students use crude estimates
- Years 2-5: Admission probability estimation improves as data accumulates
- Years 6+: Full 5-year lookback available
- Analysis typically uses years 26-30 (post-stabilization)
Appendix D: Optimal Portfolio Algorithm
The recursive algorithm for finding the optimal K-college application portfolio from J colleges:
Key insight: If we order colleges by perceived utility, the expected utility of a portfolio depends only on which colleges are in it and their individual admission probabilities.
For a portfolio A = {a_1, a_2, ..., a_K} ordered by utility (U_1 < U_2 < ... < U_K):
EU(A) = P_K * U_K + (1 - P_K) * EU(A \ {a_K})
Base case: EU({a_1}) = P_1 * U_1
This recursive structure means we can evaluate any K-college subset in K steps once we know EU of all (K-1)-college subsets.
Full algorithm:
1. Compute U_ij and P_ij for all 40 colleges
2. Sort colleges by U_ij
3. For each possible "top" college k (from K to J):
   - Find the best (K-1)-college subset from colleges 1 to (k-1)
   - Compute the EU of the K-college set that includes college k
4. Return the K-college set with the highest EU
Complexity: K * (J - (K-1)/2) subset evaluations. For K=4, J=40: 4 * (40 - 1.5) = 154 evaluations (vs. C(40,4) = 91,390 for brute force).
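Because adding a college to a portfolio only matters through its utility rank, the recursion supports a dynamic program: process colleges in ascending utility order, so any college added to a portfolio is that portfolio's top choice. The sketch below is one way to realize this; the empty-portfolio utility of 0 is my assumption (consistent with the base case EU({a_1}) = P_1 * U_1).

```python
def optimal_portfolio(utilities, probs, K):
    """Best K-college application portfolio under the recursion
    EU(A) = P_top * U_top + (1 - P_top) * EU(A minus top),
    where 'top' is the highest-utility college in A.
    Returns (best expected utility, list of chosen college indices)."""
    order = sorted(range(len(utilities)), key=lambda j: utilities[j])
    NEG = float("-inf")
    # dp[t] = (best EU, chosen indices) for a t-college portfolio drawn
    # from the colleges processed so far; empty portfolio has EU 0
    dp = [(0.0, [])] + [(NEG, [])] * K
    for j in order:  # ascending utility: j is the top of any set it joins
        U, P = utilities[j], probs[j]
        # Iterate t downward so each college is used at most once
        for t in range(K, 0, -1):
            prev_eu, prev_set = dp[t - 1]
            if prev_eu == NEG:
                continue
            cand = P * U + (1 - P) * prev_eu
            if cand > dp[t][0]:
                dp[t] = (cand, prev_set + [j])
    return dp[K]
```

With utilities [10, 20] and probabilities [0.5, 0.5], the best two-college portfolio has EU = 0.5 * 20 + 0.5 * (0.5 * 10) = 12.5, matching the recursion by hand.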
18. Related Work and Follow-Up Papers
Direct Extensions by the Same Authors
- Reardon, Baker, Kasman, Klasik, & Townsend (2018). "What Levels of Racial Diversity Can Be Achieved with Socioeconomic-Based Affirmative Action? Evidence from a Simulation Model." Journal of Policy Analysis and Management, 37(3), 630-657.
  - Adds race as a student attribute
  - Tests whether SES-based admissions preferences can achieve racial diversity comparable to race-based affirmative action
  - Uses the same 2016 ABM framework
- Baker, Klasik, & Reardon (2018). "Race and Stratification in College Enrollment Over Time." AERA Open, 4(1).
  - Uses the ABM framework to study temporal trends in racial stratification
Earlier Working Paper Version
- Reardon et al. (2014). "Simulation Models of the Effects of Race- and Socioeconomic-Based Affirmative Action Policies on Elite College Enrollment Patterns." SREE conference paper (ERIC ED562944).
- Earlier version of the affirmative action extension
Third-Party Replication and Extension
- Beziaud (2022). MOSAIC: Simulating Socioeconomic Based Affirmative Action. GitHub repository. https://github.com/lbeziaud/mosaic
  - Python replication of Reardon et al. (2018)
  - GPL-3.0 licensed
- Allard, Beziaud, & Gambs. Replication study published via INRIA HAL (hal-04328511).
  - Discusses reproducibility of the Reardon model
  - Notes challenges in replication without original code
19. Key Takeaways for the College-Sim Project
What We Can Learn from Reardon
- The recursive portfolio algorithm is elegant and computationally efficient. Our utility-based list building could be improved by adopting a similar expected-utility optimization framework instead of the current threshold-based approach.
- Dynamic college quality is an interesting feature for multi-year simulations. If we ever extend college-sim to run multiple cohorts, the Q' = 0.9Q + 0.1 * mean(enrolled caliber) update rule is simple and effective.
- Information asymmetry is a powerful and realistic mechanism we don't model. Low-SES students having noisier perceptions of both their own competitiveness and college quality is well documented and would add realism.
- Validation methodology: Comparing model outputs to IPEDS institution-level data (application volumes, admit rates, yield rates by selectivity) is a straightforward validation approach we could adopt.
- Latin Hypercube sensitivity analysis is a more rigorous approach to parameter sensitivity than ad-hoc testing. Worth considering for our simulation.
What Our Model Does Better
- Round structure: Our ED/EA/EDII/RD pipeline captures an important strategic dimension (ED yield boost, REA restrictions) that the single-round Reardon model completely misses.
- Hooks and holistic review: Legacy, athlete, donor, and first-gen preferences are major features of real admissions that Reardon's pure-caliber ranking ignores.
- Real colleges and data: Calibrating to actual Harvard/Yale/MIT stats rather than abstract "college quality" makes the simulation directly useful for counseling and planning.
- Financial considerations: Our integration of Chetty yield data and net-cost-by-income adds an economic dimension entirely absent from Reardon.
- Visualization: D3.js interactive visualization makes the simulation accessible to non-technical users; Reardon provides no interactive component.
Potential Integration Ideas
- Noise-by-SES: Add information noise inversely proportional to family income when students build college lists (could reduce list quality for low-SES archetypes)
- Enhancement by SES: Could model SAT prep effects, e.g. high-income students gaining roughly +25 SAT points
- Multi-cohort mode: Add an optional mode that runs multiple years with college quality updating
- Validation: Compare our output distributions (admit rates by tier, enrollment by archetype) against IPEDS and ELS:2002 patterns
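The noise-by-SES idea could be prototyped as a small hook in list building. Everything in this sketch (the function name, the income scaling, and the noise constants) is hypothetical and would need calibration against our archetype data.

```python
import random

def perceived_fit(true_fit, family_income, base_noise=0.15,
                  reference_income=100_000, rng=random):
    """Hypothetical college-sim hook: a student's perceived fit for a
    college gets noisier as family income falls, echoing Reardon's
    information-asymmetry mechanism. All constants are illustrative."""
    # Noise SD shrinks linearly toward base_noise as income approaches
    # the reference level, then stays flat above it
    income_ratio = min(1.0, max(0.0, family_income / reference_income))
    noise_sd = base_noise * (2.0 - income_ratio)
    return true_fit + rng.gauss(0.0, noise_sd)
```

Feeding this perceived fit (rather than the true fit) into list building would let low-SES archetypes occasionally misjudge reaches and safeties, as in Reardon's mechanism (3).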
20. References Cited in the Paper
The paper cites 45 references. Key ones for the college-sim project:
- Avery, C. & Hoxby, C.M. (2004). "Do and should financial aid packages affect students' college choices?" In College Choices: The Economics of Where to Go, When to Go, and How to Pay for It.
- Bowen, W.G. & Bok, D. (1998). The Shape of the River: Long-Term Consequences of Considering Race in College and University Admissions. Princeton University Press.
- Espenshade, T.J. & Radford, A.W. (2009). No Longer Separate, Not Yet Equal: Race and Class in Elite College Admission and Campus Life. Princeton University Press.
- Hoxby, C.M. & Avery, C. (2013). "The Missing 'One-Offs': The Hidden Supply of High-Achieving, Low-Income Students." Brookings Papers on Economic Activity.
- Hoxby, C.M. & Turner, S. (2013). "Expanding College Opportunities for High-Achieving, Low Income Students." SIEPR Discussion Paper.
- Pallais, A. (2015). "Small Differences that Matter: Mistakes in Applying to College." Journal of Labor Economics, 33(2), 493-520.
- Roth, A.E. (2008). "Deferred acceptance algorithms: history, theory, practice, and open questions." International Journal of Game Theory, 36, 537-569.
- US Department of Education (2006). Education Longitudinal Study of 2002: First Follow-up. National Center for Education Statistics.
Notes compiled: March 2026
Source: https://www.jasss.org/19/1/8.html
Working paper: https://cepa.stanford.edu/content/agent-based-simulation-models-college-sorting-process