Reardon et al. (2016) — Agent-Based Simulation Models of the College Sorting Process
Detailed Research Notes
1. Full Citation
Reardon, S.F., Kasman, M., Klasik, D., & Baker, R. (2016). Agent-Based Simulation Models of the College Sorting Process. Journal of Artificial Societies and Social Simulation, 19(1), 8.
- DOI: https://doi.org/10.18564/jasss.2993
- URL: https://www.jasss.org/19/1/8.html
- Accepted: 02-Dec-2015
- Published: 31-Jan-2016
- Working paper: Available via Stanford CEPA at https://cepa.stanford.edu/content/agent-based-simulation-models-college-sorting-process
Author Affiliations
| Author | Affiliation (at publication) |
|---|---|
| Sean F. Reardon | Stanford University, Dept. of Education & Sociology |
| Matt Kasman | Brookings Institution, Center on Social Dynamics and Policy |
| Daniel Klasik | George Washington University, Graduate School of Education |
| Rachel Baker | UC Irvine, School of Education |
Correspondence: Rachel Baker ([email protected])
2. Abstract (Summary)
The paper develops an agent-based model to explore how dynamic processes related to socioeconomic inequality operate to sort students among colleges of varying selectivity. The model simulates a stylized two-sided matching process between students and colleges through three stages (application, admission, enrollment), iterated over 30 annual cohorts. Five mechanisms linking socioeconomic background to college sorting are examined: (1) the correlation between family resources and academic achievement, (2) the ability of high-resource students to enhance their apparent caliber, (3) unequal information quality about colleges and one's own competitiveness, (4) differential valuation of college quality, and (5) the number of applications submitted. The authors find that the resources-achievement relationship explains much of student sorting by resources, but other factors also have non-trivial influences on stratification.
3. Research Question and Motivation
Central Question
Why are students from high-income families substantially more likely to attend selective colleges than low-income peers — and which specific mechanisms drive this stratification?
The paper notes that "students from families in the top income decile are 8 times more likely to enroll in top-tier colleges than students in the lowest decile."
Why ABM?
The authors argue that agent-based modeling is the right approach for several reasons:
- Two-sided matching complexity: College enrollment results from a "complex, two-sided matching process" involving student applications, college admissions decisions, and student enrollment choices. Traditional regression approaches cannot capture the interactive, sequential nature of this process.
- Dynamic co-evolution: Students learn from prior cohort outcomes (e.g., estimated admission probabilities), and colleges update their quality based on enrolled students. These feedback loops create emergent dynamics that static models miss.
- Multi-step interactions: The application-admission-enrollment pipeline involves sequential decisions where each stage depends on prior outcomes. ABM naturally handles this temporal structure.
- Mechanism isolation: ABM allows researchers to turn individual mechanisms on and off to test their relative importance — something impossible with observational data where all mechanisms operate simultaneously.
- Counterfactual exploration: The model can test policy interventions (e.g., improving information access) that cannot be experimentally manipulated at scale in reality.
- Emergent patterns: The college sorting distribution emerges from individual-level decisions rather than being imposed top-down. This reveals how micro-level mechanisms produce macro-level stratification.
Distinction from Prior Work
The paper distinguishes itself from prior college enrollment models by focusing on which college students attend (sorting) rather than whether they attend (access). This reframes the problem as a two-sided matching market rather than a simple decision model.
4. Model Overview (ODD-like Structure)
The paper does not use the formal ODD (Overview, Design concepts, Details) protocol explicitly, but provides an equivalent level of detail across its main text and four appendices (A through D).
4.1 Purpose
To simulate the college sorting process and assess the relative importance of five resource-linked mechanisms in producing socioeconomic stratification across colleges of varying selectivity.
4.2 Entities, State Variables, and Scales
Temporal scale: 30 annual cohorts (years), iterated sequentially. Each year has 3 stages: application, admission, enrollment.
Spatial scale: Abstract; no geographic component. All students can apply to all colleges.
Student agents (N = 8,000 per cohort):
- caliber (C): continuous, drawn from N(1000, 200) — represents composite academic quality (GPA, test scores, essay quality, ECs, talents)
- resources (R): continuous, drawn from N(0, 1) — represents composite socioeconomic capital (income, parental education, social networks, information access)
- Corr(C, R) = 0.3 at baseline (adjustable parameter r)
- apparent_caliber: C + enhancement(R) + noise — what colleges actually observe
- perceived_own_caliber: C + enhancement(R) + noise_self — student's self-estimate
- perceived_college_quality: Q + noise_info — student's estimate of each college
- num_applications: 4 + 0.5 * R (clipped to be >= 1)
- utility_function: parameterized by resources
College agents (J = 40):
- quality (Q): continuous, initialized from N(1070, 130) — average caliber of enrolled students
- seats (m): 150 per college (total seats = 6,000 for 8,000 students; ratio 4:3)
- yield_rate: estimated from prior cohorts, function of quality percentile
- Quality updates each year based on enrolled student caliber
4.3 Process Overview and Scheduling
Each annual cycle:
Year t:
1. Generate 8,000 new students (C, R drawn from bivariate normal)
2. APPLICATION STAGE
- Students observe college quality with noise
- Students estimate own caliber with noise
- Students estimate admission probability from prior 5 years
- Students select optimal application portfolio
3. ADMISSION STAGE
- Colleges observe applicant caliber with noise
- Colleges rank applicants by perceived caliber
- Colleges admit top s_c applicants (based on predicted yield)
4. ENROLLMENT STAGE
- Students enroll in highest-utility college that admitted them
5. UPDATE
- College quality: Q'_c = 0.9 * Q_c + 0.1 * mean(enrolled caliber)
- Store admission outcomes for future probability estimation
Repeat for year t+1
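The annual cycle can be sketched as a minimal scaffold. This is a reconstruction in Python, not the authors' code: the three middle stages are stubbed out, and the stand-in for mean enrolled caliber is purely illustrative; only the cohort draw and the quality update follow the parameters stated above.

```python
import numpy as np

rng = np.random.default_rng(0)
N, J = 8_000, 40                      # students per cohort, colleges
MU_C, SD_C, R_CORR = 1000.0, 200.0, 0.3

def draw_cohort(rng, n=N):
    """Draw (caliber, resources) from a bivariate normal with corr 0.3."""
    cov = [[SD_C**2, R_CORR * SD_C], [R_CORR * SD_C, 1.0]]
    cr = rng.multivariate_normal([MU_C, 0.0], cov, size=n)
    return cr[:, 0], cr[:, 1]         # caliber ~ N(1000, 200), resources ~ N(0, 1)

quality = rng.normal(1070.0, 130.0, size=J)   # initial college quality
for year in range(30):
    caliber, resources = draw_cohort(rng)
    # --- application / admission / enrollment stages would run here ---
    # Illustrative stand-in: pretend each college enrolls students whose
    # mean caliber sits near its current quality.
    enrolled_mean = quality + rng.normal(0.0, 20.0, size=J)
    quality = 0.9 * quality + 0.1 * enrolled_mean   # 90% persistence update
```

The 0.9/0.1 weighting makes each college's quality a slow-moving average of its enrollment history, which is what creates the path dependence noted in Section 15.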
4.4 Design Concepts
Adaptation: Colleges adjust yield estimates based on historical enrollment data. Students estimate admission probabilities from prior 5 years of outcomes.
Learning: Indirect — students use aggregate historical admission data (not individual feedback). Colleges learn from recent yield.
Prediction: Students predict admission probability via logit regression on prior cohort data.
Stochasticity: Noise in quality perception, caliber perception, and the bivariate normal draws for C and R. 100 iterations per experimental condition to account for random variation.
Observation: Students observe college quality and own caliber imperfectly; noise magnitude inversely related to resources.
Emergence: College quality ordering, stratification patterns, admission rates, and yield rates all emerge from agent interactions.
5. Agent Types and Counts
Students
- 8,000 per cohort (new cohort each of 30 years)
- Total students simulated per run: 240,000 (though results typically focus on later cohorts after model stabilization)
- Drawn from a bivariate normal distribution of (caliber, resources)
- Caliber: N(1000, 200) — scaled to roughly match SAT-like distributions
- Resources: N(0, 1) — standardized socioeconomic index
Colleges
- 40 colleges with 150 seats each
- Total seats: 6,000 (75% of the 8,000 student cohort)
- Quality initialized from N(1070, 130) — initially slightly above mean student caliber
- Quality updates annually: Q' = 0.9Q + 0.1mean(enrolled_caliber)
- This means college quality is a weighted moving average, with 90% persistence
Key Ratio
- Student-to-seat ratio: 8,000 / 6,000 = 4:3
- This means 25% of students will not be enrolled at any college — matching the reality that not all applicants matriculate at four-year institutions
6. Student Agent Decision Rules
6.1 How Students Form College Preferences (Perceived Utility)
Students evaluate each college based on a utility function that depends on perceived college quality and the student's resources:
U*_cs = a_s + b_s * Q*_cs
Where:
- a_s = baseline utility from attending any college (can depend on resources)
- b_s = marginal utility from college quality (can depend on resources)
- Q*_cs = student s's perception of college c's quality
Perception of college quality:
Q*_cs = Q_c + u_cs
where u_cs ~ N(0, tau_s)
The noise parameter tau_s is a decreasing function of resources:
reliability_s = 0.7 + 0.1 * R_s (bounded between 0.5 and 0.9), so tau_s shrinks as resources rise
Higher-resource students perceive college quality more accurately. In the baseline model with all mechanisms active:
- a_s = -250 + d where d = -500 for the utility differential mechanism
- b_s = 1 + e where e = 0.5 for high-resource students
This means high-resource students place higher value on college quality and have a higher threshold for attending any college.
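One plausible rendering of this utility specification follows. Two pieces are my assumptions, not spelled out above: the conversion of a reliability into a noise SD (via sd = sigma_Q * sqrt(1/reliability - 1)) and the use of R > 0 as the high-resource cutoff for the d and e adjustments.

```python
import numpy as np

SIGMA_Q = 130.0   # college quality SD from the baseline parameters

def perceived_utility(Q, R, rng, d=-500.0, e=0.5):
    """U*_cs = a_s + b_s * Q*_cs for one student across all colleges.

    Assumptions of this sketch: the reliability-to-noise-SD mapping and
    the R > 0 high-resource cutoff are illustrative choices.
    """
    reliability = float(np.clip(0.7 + 0.1 * R, 0.5, 0.9))
    noise_sd = SIGMA_Q * np.sqrt(1.0 / reliability - 1.0)
    Q_star = Q + rng.normal(0.0, noise_sd, size=np.shape(Q))  # perceived quality
    high = R > 0.0
    a = -250.0 + (d if high else 0.0)   # higher bar for attending at all
    b = 1.0 + (e if high else 0.0)      # stronger taste for quality
    return a + b * Q_star

rng = np.random.default_rng(1)
quality = rng.normal(1070.0, 130.0, size=40)
u_rich = perceived_utility(quality, R=1.5, rng=rng)
u_poor = perceived_utility(quality, R=-1.5, rng=rng)
```

Note how the same college list produces different utility orderings for the two students: the low-resource student's larger perception noise scrambles the quality ranking more.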
6.2 How Students Decide Where to Apply (Portfolio Selection)
This is the most technically sophisticated part of the model. Students select an optimal application portfolio maximizing total expected utility, accounting for:
- Perceived utility of each college
- Estimated probability of admission to each college
- Portfolio interactions (applying to a safety school has diminishing returns if you already have another safety)
Admission probability estimation:
P_cs = f(C*_s - Q*_cs)
Where f is a logistic function estimated from prior 5 years of admission outcomes via logit regression. The student uses their perceived own caliber minus perceived college quality as the predictor.
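The estimation step can be sketched as a two-parameter logistic fit. The paper specifies only a logit regression on prior outcomes, so the Newton-Raphson fitting routine and the simulated training data here are assumptions standing in for the five years of stored admission records.

```python
import numpy as np

def fit_logit(x, y, iters=25):
    """Fit P(admit) = 1 / (1 + exp(-(b0 + b1 * x))) by Newton-Raphson (IRLS)."""
    X = np.column_stack([np.ones_like(x), x])
    beta = np.zeros(2)
    for _ in range(iters):
        p = 1.0 / (1.0 + np.exp(-X @ beta))
        W = p * (1.0 - p)                       # logistic variance weights
        grad = X.T @ (y - p)
        hess = X.T @ (X * W[:, None])
        beta = beta + np.linalg.solve(hess, grad)
    return beta

# Hypothetical training data standing in for 5 prior years of outcomes;
# the predictor is the perceived caliber-quality gap, C*_s - Q*_cs.
rng = np.random.default_rng(2)
gap = rng.normal(0.0, 150.0, size=5_000)
true_p = 1.0 / (1.0 + np.exp(-(0.5 + 0.01 * gap)))   # assumed "true" process
admitted = (rng.random(5_000) < true_p).astype(float)
b0, b1 = fit_logit(gap, admitted)

def admit_prob(gap):
    """Student-side estimate of P(admit) given the caliber-quality gap."""
    return 1.0 / (1.0 + np.exp(-(b0 + b1 * gap)))
```

In the full model each student would evaluate `admit_prob` at their own perceived gap for every college when building an application portfolio.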
Perceived own caliber:
C*_s = C_s + c_s + e_s
where:
c_s = enhancement (0.1 * R_s standard deviations)
e_s ~ N(0, sigma_s)
reliability of self-perception = 0.7 + 0.1 * R_s (bounded 0.5-0.9), so sigma_s shrinks as resources rise
High-resource students:
- Have higher apparent caliber (via enhancement from test prep, essay coaches, etc.)
- Know their own caliber more precisely (less noise)
Optimal portfolio algorithm (Appendix D):
Rather than evaluating all C(40, n) combinations (which for 40 colleges and ~4 applications would be 91,390 combinations), the model uses a recursive expected-utility algorithm:
For student i with application set A_i containing n colleges ordered by increasing utility (U_1 < U_2 < ... < U_n):
E_i[A_i] = P_in * U_n + (1 - P_in) * E_i[A_i \ {a_n}]
This recursion says: the expected utility of the portfolio is the probability of getting into the best college times its utility, plus the probability of NOT getting in times the expected utility of the remaining portfolio.
This reduces computation from C(J, n) to n*(J - (n-1)/2) evaluations. For 40 colleges and 4 applications: 154 calculations vs. 91,390.
The algorithm works by:
1. Ordering colleges by utility
2. For each possible "top" college, computing the expected utility of the best (n-1)-college set below it
3. Selecting the n-college set with the highest total expected utility
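Read as a dynamic program, the recursion gives an O(n * J) algorithm. Below is a sketch of that idea (my reconstruction of the Appendix D approach, not the authors' code; the utility of not enrolling anywhere is normalized to zero):

```python
import numpy as np

def best_portfolio(utility, p_admit, n):
    """Best n-application portfolio by expected utility, in O(n * J) time.

    V[k][j] = best expected utility of a k-college portfolio chosen from
    the j lowest-utility colleges; non-enrollment utility is 0.
    """
    order = np.argsort(utility)               # ascending utility
    u, p = utility[order], p_admit[order]
    J = len(u)
    V = np.zeros((n + 1, J + 1))
    take = np.zeros((n + 1, J + 1), dtype=bool)
    for k in range(1, n + 1):
        for j in range(1, J + 1):
            # Either skip college j, or make it the portfolio's top choice:
            skip_ev = V[k][j - 1]
            take_ev = p[j - 1] * u[j - 1] + (1.0 - p[j - 1]) * V[k - 1][j - 1]
            V[k][j], take[k][j] = max((skip_ev, False), (take_ev, True))
    picks, k, j = [], n, J                    # backtrack to recover the set
    while k > 0 and j > 0:
        if take[k][j]:
            picks.append(int(order[j - 1]))
            k -= 1
        j -= 1
    return float(V[n][J]), sorted(picks)
```

For J = 40 and n = 4 this fills at most 160 table cells, the same order of work as the 154-evaluation count quoted above, versus 91,390 brute-force portfolios.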
6.3 How Students Choose Among Acceptances (Enrollment)
Simple rule: Students enroll in the admitted college with the highest perceived utility.
Enrolled_college = argmax_{c in admitted_set} U*_cs
There is no financial aid consideration, no campus visit, no yield management from the student side — just pure utility maximization over perceived quality.
7. College Agent Decision Rules
7.1 How Colleges Evaluate Applications
Colleges observe each applicant's apparent caliber with noise:
C**_cs = C_s + c_s + w_cs
where w_cs ~ N(0, phi_s)
Note: c_s is the enhancement term (already baked into what the student presents), so colleges see the enhanced caliber plus their own measurement noise. The enhancement IS visible to colleges — it represents real things like polished essays and strong ECs that wealthy students can produce.
Colleges rank all applicants by perceived caliber (C**_cs), from highest to lowest. There is no holistic review, no hook system, no demographic preferences — purely caliber-based ranking.
7.2 How Colleges Set Admission Thresholds
Colleges do NOT have a fixed threshold. Instead, they admit enough students to fill their seats, accounting for expected yield:
Admit_count_c = m / Yield_c
Where:
- m = 150 (seats)
- Yield_c = estimated yield rate for college c
Yield estimation formula:
Yield_c = 0.2 + 0.06 * (college quality percentile, as a fraction from 0 to 1)
This means:
- Lowest-quality college (percentile 0): yield = 20%
- Median college (percentile 0.5): yield = 23%
- Highest-quality college (percentile ~1): yield = 26%
So a low-quality college admits m/0.20 = 750 students for 150 seats, while a high-quality college admits m/0.26 = 577 students.
The yield formula itself updates based on actual enrollment outcomes from prior years (not just the formula above as a fixed rule — the formula represents the initial approximation, and colleges learn).
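The admit-count rule above can be sketched as follows (illustrative only: tie handling and capping at the size of the applicant pool are my choices, not the paper's):

```python
import numpy as np

def admit(apparent_caliber, quality_percentile, seats=150):
    """Rank applicants by apparent caliber; admit seats / expected_yield of them.

    quality_percentile is a fraction in [0, 1]. Capping at the applicant-pool
    size is an assumption for small pools.
    """
    expected_yield = 0.2 + 0.06 * quality_percentile
    n_admit = min(len(apparent_caliber), int(round(seats / expected_yield)))
    ranked = np.argsort(apparent_caliber)[::-1]   # highest caliber first
    return ranked[:n_admit]

rng = np.random.default_rng(3)
applicants = rng.normal(1000.0, 200.0, size=900)
admitted_low = admit(applicants, quality_percentile=0.0)   # 150/0.20 = 750 admits
admitted_top = admit(applicants, quality_percentile=1.0)   # 150/0.26 = 577 admits
```

In the full model, `apparent_caliber` would already include the enhancement term plus the college's own observation noise (C**_cs from Section 7.1).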
7.3 Yield Targets
Colleges implicitly target filling exactly m = 150 seats. They over-admit based on expected yield and accept the stochastic outcome. There is no waitlist mechanism. If a college under-enrolls or over-enrolls in a given year, it simply adjusts its yield estimate for the next cohort.
8. Key Parameters (Complete List)
Table 1: Baseline Model Parameters
| Parameter | Symbol | Baseline Value | Source/Rationale |
|---|---|---|---|
| Students per cohort | N | 8,000 | Computational choice |
| Number of colleges | J | 40 | Computational choice |
| Seats per college | m | 150 | Derived to create realistic ratios |
| Student-to-seat ratio | N/(J*m) | 4:3 | Approximates real enrollment rates |
| College quality mean | mu_Q | 1070 | Set above student caliber mean |
| College quality SD | sigma_Q | 130 | Creates realistic quality spread |
| Student caliber mean | mu_C | 1000 | Normalized scale |
| Student caliber SD | sigma_C | 200 | Creates realistic achievement spread |
| Student resources mean | mu_R | 0 | Standardized |
| Student resources SD | sigma_R | 1 | Standardized |
| Resources-caliber correlation | r | 0.3 | ELS:2002 data |
| Quality perception reliability | rho_Q | 0.7 + 0.1*R | Assumed; bounded [0.5, 0.9] |
| Own caliber perception reliability | rho_C | 0.7 + 0.1*R | Assumed; bounded [0.5, 0.9] |
| Enhancement effect | c | 0.1*R SD | SAT prep research (~25 SAT pts) |
| Number of applications | K | 4 + 0.5*R | ELS:2002 data |
| Utility intercept adjustment | d | -500 | Mechanism 4 (utility valuation) |
| Utility slope adjustment | e | 0.5 | Mechanism 4 (utility valuation) |
| College quality persistence | alpha | 0.9 | Assumed (slow quality evolution) |
| Yield rate baseline | Y_0 | 0.2 | Approximation from IPEDS |
| Yield rate slope | Y_slope | 0.06 | Approximation from IPEDS |
| Admission probability lookback | - | 5 years | Model design choice |
| Cohort iterations | T | 30 years | Allow model stabilization |
| Runs per condition | - | 100 | Reduce stochastic variation |
Table 2: Resource Pathway Parameters by Experiment
| Model | r (calib-res) | c (enhance) | rho (info) | K (apps) | d (intercept) | e (slope) |
|---|---|---|---|---|---|---|
| 1: No resources | 0 | 0 | constant | constant | 0 | 0 |
| 2: All pathways | 0.3 | 0.1*R | varies | 4+0.5R | -500 | 0.5 |
| 3: No correlation | 0 | 0.1*R | varies | 4+0.5R | -500 | 0.5 |
| 4: No enhancement | 0.3 | 0 | varies | 4+0.5R | -500 | 0.5 |
| 5: No info diff | 0.3 | 0.1*R | constant | 4+0.5R | -500 | 0.5 |
| 6: No app volume | 0.3 | 0.1*R | varies | constant | -500 | 0.5 |
| 7: No utility diff | 0.3 | 0.1*R | varies | 4+0.5R | 0 | 0 |
| 8: Only correlation | 0.3 | 0 | constant | constant | 0 | 0 |
9. Calibration Data Sources
Primary Data: Education Longitudinal Study (ELS:2002)
The ELS:2002 is a nationally representative longitudinal study from the U.S. Department of Education tracking approximately 15,000 students from 10th grade (2002) through postsecondary outcomes. It provides:
- Resources-caliber correlation (r = 0.3): Derived from the correlation between SES composite and standardized test scores in ELS:2002
- Application count by SES: Used to calibrate K = 4 + 0.5*R
- College enrollment rates by SES: Used for validation of model outputs
Enhancement Calibration
- Based on SAT preparation research showing that coaching produces roughly 0.1 SD improvement in scores
- Corresponds to approximately 25 SAT points on the old 1600-point scale
- Sources cited include work on test preparation effects and extracurricular enrichment by SES
Yield Rate Calibration
- Derived from IPEDS (Integrated Postsecondary Education Data System) data
- Shows that more selective colleges have slightly higher yield rates
- Baseline formula: Yield = 0.2 + 0.06 * quality_percentile
Validation Data: IPEDS
Model outputs were compared against IPEDS institutional data for:
- The relationship between college selectivity and application volume
- The relationship between college selectivity and admission rates
- The relationship between college selectivity and yield rates
- Enrollment patterns across the SES distribution
10. Simulation Results — Key Patterns
10.1 Baseline Realism (Model 2: All Pathways Active)
With empirically-grounded parameters, the model produces several realistic emergent patterns:
Enrollment stratification:
- Students at the 90th percentile of resources: >90% college enrollment rate
- Students at the 10th percentile of resources: ~55% college enrollment rate
- Gap: approximately 35-40 percentage points

Selective college access:
- Students at the 90th percentile of resources: ~20 times more likely to attend top-10% colleges than 10th-percentile students
- This is directionally consistent with, though larger than, the empirical 8x difference in top-tier enrollment by income decile

Emergent institutional patterns:
- Higher-quality colleges receive more applications (positive correlation between quality and volume)
- Higher-quality colleges have lower admission rates (more selective)
- Higher-quality colleges have higher yield rates (students prefer them)
- These patterns match IPEDS data qualitatively
10.2 Model 1: No Resource Influence (Control)
When all five resource mechanisms are turned off (r = 0, no enhancement, equal information, equal applications, equal utility):
- Resources show zero correlation with enrollment outcomes
- All resource levels have equal probability of attending any quality tier
- Confirms that stratification is not an artifact of the model structure
10.3 Mechanism Experiments (Models 3-8)
Removing the resources-caliber correlation (Model 3):
- The single most impactful mechanism
- Enrollment gap (90th vs 10th percentile) drops from ~50pp to ~20pp
- Selective college gap drops from ~20x to ~4x
- Explains roughly half of observed stratification

Removing application enhancement (Model 4):
- Effect size: ~3pp for any-college enrollment, up to 6pp for selective college enrollment
- Primarily affects top-resource students' access to elite colleges
- Smaller effect than the caliber correlation but still meaningful

Removing the information quality differential (Model 5):
- Minimal effect on any-college enrollment
- 2-5pp effects on selective college enrollment
- Effects concentrated among middle- and upper-resource students
- Better information helps mid-range students target appropriate colleges

Removing the application quantity differential (Model 6):
- Small but observable effects at the bottom of the resource distribution
- Low-resource students benefit most from submitting more applications
- Effect: improved access for the bottom quartile by a few percentage points

Removing the utility differential (Model 7):
- Negligible effects on all outcomes
- Suggests that differential valuation of quality, conditional on other factors, matters little

Only the resources-caliber correlation (Model 8):
- Results closely resemble the full baseline (Model 2)
- The four other mechanisms combined produce effects similar in magnitude to the achievement gap alone
- Key insight: the mechanisms are not additive; there are interactions
10.4 Latin Hypercube Sensitivity Analysis
The authors used Latin Hypercube sampling to vary the five mechanism parameters, each across 10 evenly-spaced values, combined so that each value of each parameter appears in exactly one of the 10 sampled combinations. Results were analyzed via regression of outcome gaps on parameter values.
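A Latin Hypercube design of this shape can be sketched in a few lines of numpy. The parameter ranges below are placeholders of my own, not values taken from the paper:

```python
import numpy as np

rng = np.random.default_rng(4)

def latin_hypercube(n, d, rng):
    """n stratified samples in [0, 1]^d: one point per row-stratum per column."""
    sample = (np.arange(n)[:, None] + rng.random((n, d))) / n
    for col in range(d):                       # independently permute each column
        sample[:, col] = sample[rng.permutation(n), col]
    return sample

unit = latin_hypercube(10, 5, rng)             # 10 design points in [0, 1]^5
# Placeholder ranges for (r, c, rho, K, d) -- NOT the paper's actual bounds.
lower = np.array([0.0, 0.0, 0.5, 2.0, -500.0])
upper = np.array([0.6, 0.2, 0.9, 8.0, 0.0])
params = lower + unit * (upper - lower)        # 10 x 5 design matrix
```

Each row of `params` would seed one experimental run; regressing outcome gaps on these columns recovers the sensitivity coefficients reported in Tables 3-5.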
College Enrollment Gaps (Table 3 — coefficients predicting 90-10 gap):
| Mechanism | 90-10 gap | 90-50 gap | 50-10 gap |
|---|---|---|---|
| Resources-caliber (r) | 0.58*** | 0.09* | 0.49*** |
| Enhancement (c) | 0.58*** | 0.08 | 0.50*** |
| Information (rho) | 0.40*** | -0.06 | 0.46*** |
| Application volume (K) | 0.07* | 0.01 | 0.06* |
| Utility differential | -0.06 | -0.06 | 0.01 |
Three mechanisms significantly predict college enrollment gaps: caliber correlation, enhancement, and information quality. Effects are primarily driven by disadvantage at the bottom of the resource distribution (50-10 gap).
Selective College Enrollment Gaps (Table 4):
| Mechanism | 90-10 gap | 90-50 gap | 50-10 gap |
|---|---|---|---|
| Resources-caliber (r) | 0.24*** | 0.27*** | -0.03 |
| Enhancement (c) | 0.23*** | 0.27*** | -0.04 |
| Information (rho) | 0.24*** | 0.28*** | -0.04 |
| Application volume (K) | 0.01 | 0.03 | -0.02 |
| Utility differential | -0.05 | -0.07 | 0.02 |
For selective colleges, effects are concentrated at the top (90-50 gap), meaning the advantage accrues to the most resourced students.
College Quality Gaps (Table 5 — point differences):
| Mechanism | 90-10 gap | 90-50 gap | 50-10 gap |
|---|---|---|---|
| Resources-caliber (r) | 192*** | 90*** | 102*** |
| Enhancement (c) | 141*** | 82*** | 59** |
| Information (rho) | 105*** | 68** | 37* |
| Application volume (K) | 16** | 7 | 9 |
| Utility differential | 11 | 2 | 9 |
All four non-utility mechanisms significantly predict quality gaps, with effects distributed across the resource spectrum.
11. Policy Experiments
The paper does not run explicit "policy intervention" experiments (e.g., simulating free SAT prep or college counseling programs). Instead, the mechanism removal experiments serve as implicit policy tests:
- Eliminating application enhancement (simulates equalizing SAT prep / essay coaching): 3-6pp reduction in stratification
- Equalizing information (simulates universal college counseling): 2-5pp reduction in selective enrollment gaps
- Equalizing application counts (simulates fee waivers / application support): Small improvements for bottom quartile
- Equalizing utility valuation (simulates changing attitudes about college value): Negligible effect
The authors conclude that "student- or institution-level policies (such as application coaching and college information provision to students in low-income schools or encouraging affirmative action-like policies for dimensions other than race/ethnicity) could have notable impacts on how students sort into colleges."
The model was later extended in Reardon et al. (2018) — "What Levels of Racial Diversity Can Be Achieved with Socioeconomic-Based Affirmative Action?" (JPAM, 37: 630-657) — which added race as an attribute and tested SES-based affirmative action as a substitute for race-based policies.
12. Validation Method
Internal Validation
- 100 iterations per experimental condition to establish stable means and confidence intervals
- 30-year burn-in: Model runs for 30 annual cohorts; results from later cohorts used to ensure system has reached approximate equilibrium
- Sensitivity analysis via Latin Hypercube design (systematic parameter variation)
External Validation Against IPEDS
The authors validate model outputs against Integrated Postsecondary Education Data System (IPEDS) institutional data. Specifically:
- Application volume vs. selectivity: Model reproduces the positive correlation between college quality and number of applications received
- Admission rate vs. selectivity: Model reproduces the negative correlation (higher-quality colleges have lower admit rates)
- Yield vs. selectivity: Model reproduces the positive correlation (higher-quality colleges have higher yield rates)
- Enrollment patterns by SES: Model-generated enrollment rates by resource decile qualitatively match ELS:2002 patterns
Validation Caveats
- Validation is qualitative (pattern-matching) rather than quantitative (exact calibration to specific schools)
- The model is deliberately abstract — 40 generic colleges rather than real institutions
- The authors acknowledge that "the model is a stylized representation of the college sorting process" and is not intended to precisely replicate any specific admissions system
13. Limitations Explicitly Noted
The authors explicitly acknowledge several limitations:
- No financial aid or tuition: The model ignores college costs entirely. In reality, net price is a major factor in enrollment decisions, especially for low-SES students.
- No race/ethnicity: The 2016 model does not include race as a student attribute or affirmative action as a college policy. (This was addressed in the 2018 extension.)
- No network effects: Students do not share information with peers, parents, or counselors. In reality, social networks strongly influence college knowledge and application behavior.
- Rational utility maximization: Students are modeled as expected-utility maximizers. The authors note this "oversimplifies real behavior" and likely understates actual stratification, because real low-SES students face additional behavioral barriers (complexity aversion, present bias, etc.).
- No geographic dimension: All students can apply to all colleges equally. In reality, distance is a major factor, especially for low-income students.
- Abstract colleges: 40 generic institutions rather than real colleges with specific characteristics, programs, locations, or financial aid policies.
- No strategic behavior by colleges: Colleges do not engage in yield management, merit aid, marketing, or recruitment targeting.
- Simple utility function: Two-parameter linear utility in perceived quality. Real preferences involve major, location, size, culture, peer effects, etc.
- No post-admission negotiation: No waitlists, no gap years, no transfer students.
- Single college quality dimension: Colleges vary only in "quality" (average enrolled caliber). No specialization, program strength, or fit matching.
14. What the Model Does NOT Include
This section catalogs features present in real college admissions but absent from the Reardon model, organized by relevance to the college-sim project:
Admissions Process Features NOT Modeled
- No Early Decision (ED) / Early Action (EA) rounds: All admissions occur in a single round per year. There is no binding ED, no strategic early application, and no ED yield boost.
- No Regular Decision vs. rolling admissions distinction: Single application-admission-enrollment cycle.
- No waitlists: Colleges make a single admit/deny decision. No deferred admission.
- No hooks: No legacy, athlete, donor, first-generation, or URM preferences in admissions.
- No holistic review: Admissions is a pure caliber ranking. No consideration of extracurriculars, essays, recommendations, or interviews beyond what is captured in the single "caliber" composite.
- No demonstrated interest: Colleges do not track or reward campus visits, information sessions, or other interest signals.
- No test-optional policies: All students have a caliber score that is always visible.
Student Features NOT Modeled
- No race/ethnicity: Students have no racial identity. No affirmative action. (Added in 2018 extension.)
- No geographic location: No in-state/out-of-state distinction, no distance preference.
- No major/program preferences: Students care only about overall college quality, not specific departments.
- No family college knowledge: Beyond the generic "resources" attribute, there is no distinction between first-generation and legacy students.
- No peer effects in application: Students do not observe or mimic peers' application strategies.
- No financial constraints: Students can apply to and enroll in any college regardless of cost.
Institutional Features NOT Modeled
- No financial aid: No merit scholarships, no need-based aid, no tuition discounting.
- No enrollment management: No strategic use of merit aid to attract high-caliber students or meet revenue targets.
- No college marketing/recruitment: Colleges do not actively recruit students.
- No selectivity gaming: Colleges do not manipulate application volume or admit rates for rankings purposes.
- No capacity constraints beyond seats: No housing, faculty, or budget constraints.
- No consortium/matching agreements: No Common Application, no shared deadlines.
Market Features NOT Modeled
- No application fees: Applications are costless (students choose portfolio size based on resources, but no per-application cost).
- No Common Application effects: No mechanism for the observed increase in applications per student over time.
- No temporal trends: Parameters are fixed over the 30-year simulation (no growing inequality, changing test policies, etc.).
- No multiple college types: No distinction between research universities, liberal arts colleges, public vs. private, etc.
15. Code Availability and Replication
Original Code
The authors do not provide their original simulation code in the paper or as supplementary material. The JASSS publication and Stanford CEPA page offer only the paper PDF. No data repository, no GitHub link, no supplementary code files are provided.
Correspondence about the model is directed to Rachel Baker ([email protected]).
Third-Party Replication
A replication of the extended 2018 model (which builds on the 2016 model) exists:
- Repository: https://github.com/lbeziaud/mosaic (GPL-3.0 license)
- Author: Louis Beziaud (2022)
- Language: Python 3.9+
- Paper replicated: Reardon et al. (2018) "What Levels of Racial Diversity Can Be Achieved with Socioeconomic-Based Affirmative Action?" — JPAM 37: 630-657
- Usage: `from model import run; colleges, students, outcomes = run()`
- Computational requirements: ~10 GB output, ~30 GB RAM for parallel runs, ~20 minutes on AMD Ryzen 7
A companion replication repository may exist at https://github.com/lbeziaud/re-reardon2018.
An associated replication paper was published: Allard, T., Beziaud, L., & Gambs, S., via INRIA HAL (hal-04328511), discussing reproducibility of the Reardon model.
Replication Feasibility
The paper provides sufficient mathematical detail (especially in Appendices A-D) for independent reimplementation. Key algorithmic challenges:
- The optimal portfolio algorithm (Appendix D) requires careful implementation of the recursive expected-utility computation
- The logit-based admission probability estimation requires tracking 5 years of admission outcomes
- The college quality update rule creates path dependence that affects convergence
16. Comparison with Our College-Sim Model
Features Our Model Has That Reardon Lacks
| Feature | Reardon 2016 | Our college-sim |
|---|---|---|
| ED/EA/EDII/RD rounds | No (single round) | Yes (5 rounds) |
| Named real colleges | No (40 abstract) | Yes (30 real colleges) |
| Hook multipliers | No | Yes (athlete, donor, legacy, first-gen) |
| Race/demographics | No | Yes (from College Scorecard) |
| Financial aid / net cost | No | Yes (by income bracket) |
| Geographic dimension | No | Partial (high school types) |
| Waitlist mechanism | No | Yes |
| ED yield boost | No | Yes (per-college ED multipliers) |
| Holistic review | No | Yes (EC/essay + hooks + randomness) |
| D3.js visualization | No | Yes (bezier arcs, tier colors) |
| Multiple college tiers | Implicit (by quality) | Explicit (HYPSM, Ivy+, Near-Ivy, etc.) |
| Archetype-based students | No | Yes (8 archetypes per school) |
Features Reardon Has That Our Model Lacks
| Feature | Reardon 2016 | Our college-sim |
|---|---|---|
| Optimal portfolio algorithm | Yes (EU maximization) | No (utility threshold + lognormal K) |
| Dynamic college quality | Yes (0.9Q + 0.1enrolled) | No (fixed quality) |
| Learning/adaptation | Yes (yield estimation, admission prob) | No (fixed parameters) |
| Multiple cohort iteration | Yes (30 years) | No (single cohort) |
| Admission probability estimation | Yes (logit on prior data) | No (sigmoid on academic index) |
| Information asymmetry by SES | Yes (noise inversely related to resources) | No |
| Application enhancement by SES | Yes | No (EC/essay are archetype-based) |
| Rigorous sensitivity analysis | Yes (Latin Hypercube) | No |
| Validation against IPEDS | Yes | No (calibrated to published admit rates) |
| Counterfactual mechanism isolation | Yes (8 experimental conditions) | No |
Key Design Differences
- Matching approach: Reardon uses a multi-round, market-clearing process where agents learn over time. Our model uses a single-cohort, Gale-Shapley-inspired sequential round system.
- Student decision-making: Reardon's students optimize an expected-utility portfolio. Our students use a utility threshold with hook-adjusted admission probability to build lists.
- College decision-making: Reardon's colleges rank by caliber and admit based on yield prediction. Our colleges use a logistic admission score with hook multipliers and stochastic Bernoulli trials.
- Abstraction level: Reardon is deliberately abstract (generic students and colleges) to isolate mechanisms. Our model is deliberately concrete (real colleges, real stats) for practitioner insight.
- Primary purpose: Reardon tests sociological mechanisms driving stratification. Our model simulates individual-level admissions outcomes for educational planning.
17. Mathematical Appendix — Detailed Formulations
Appendix A: Full Model Equations
Initialization:
- J colleges, each with m seats
- N students per cohort
- College quality: Q_j ~ N(mu_Q, sigma_Q) for j = 1, ..., J
- Student resources: R_i ~ N(mu_R, sigma_R)
- Student caliber: C_i drawn jointly with R_i from a bivariate normal (means mu_C, mu_R; SDs sigma_C, sigma_R; correlation r) for i = 1, ..., N
Application Submodel:
- Student i perceives college j's quality:
  Q*_ij = Q_j + u_ij, u_ij ~ N(0, tau_i)
  where tau_i = sigma_Q * sqrt((1 - rho_i) / rho_i) and rho_i = min(0.9, max(0.5, 0.7 + 0.1 * R_i))
- Student i's utility from college j:
  U*_ij = a_i + b_i * Q*_ij
  where a_i = a_0 + d * R_i and b_i = 1 + e * R_i (or a constant for all students, depending on which mechanism is being tested)
- Student i perceives own caliber:
  C*_i = C_i + c_i + e_i
  where c_i = c * R_i * sigma_C (enhancement) and e_i ~ N(0, sigma_self_i), with
  sigma_self_i = sigma_C * sqrt((1 - rho_self_i) / rho_self_i) and rho_self_i = min(0.9, max(0.5, 0.7 + 0.1 * R_i))
- Student i estimates admission probability at college j:
  P_ij = logit^{-1}(beta_0 + beta_1 * (C*_i - Q*_ij))
  where beta_0 and beta_1 are estimated by logistic regression on aggregated admission/rejection outcomes from the prior 5 cohorts.
- Expected utility of application portfolio A_i:
  EU_i(A_i) = sum over enrollment scenarios, weighted by their probabilities; solved via the recursive algorithm in Appendix D.
- Number of applications:
  K_i = max(1, round(4 + 0.5 * R_i))
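The student-side quantities above can be sketched directly from the equations. This is a minimal illustration, not the authors' code: the parameter values (d, e, c, beta_0, beta_1), the sigma defaults taken from the N(1070, 130) quality distribution, and the assumption that R_i is standardized are all mine.

```python
import numpy as np

rng = np.random.default_rng(0)

def application_signals(C_i, R_i, Q, sigma_Q=130.0, sigma_C=130.0,
                        a0=0.0, d=0.1, e=0.1, c=0.1,
                        beta0=0.0, beta1=0.01):
    """Perceived quality, utility, perceived own caliber, and estimated
    admission probability for one student across all colleges Q (array).
    Parameter values are illustrative, not the paper's calibration."""
    # Information reliability rises with resources R_i (assumed standardized)
    rho = min(0.9, max(0.5, 0.7 + 0.1 * R_i))
    tau = sigma_Q * np.sqrt((1 - rho) / rho)

    # Perceived college quality: true quality plus resource-dependent noise
    Q_star = Q + rng.normal(0.0, tau, size=Q.shape)

    # Utility: intercept and slope both (optionally) tied to resources
    U_star = (a0 + d * R_i) + (1 + e * R_i) * Q_star

    # Perceived own caliber: enhancement c*R_i*sigma_C plus self-perception noise
    sigma_self = sigma_C * np.sqrt((1 - rho) / rho)
    C_star = C_i + c * R_i * sigma_C + rng.normal(0.0, sigma_self)

    # Estimated admission probability from the fitted logit
    P = 1.0 / (1.0 + np.exp(-(beta0 + beta1 * (C_star - Q_star))))
    return Q_star, U_star, C_star, P
```

In the full model, beta_0 and beta_1 would come from a logistic regression on the prior five cohorts rather than being fixed defaults.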
Admission Submodel:
- College j observes applicant i's caliber:
  C**_ij = C_i + c_i + w_ij, w_ij ~ N(0, phi)
- College j ranks applicants by C**_ij and admits the top s_j:
  s_j = m / Y_j, where Y_j is the estimated yield rate
- Yield estimation:
  Y_j = 0.2 + 0.06 * percentile_rank(Q_j), updated based on actual yield from prior years.
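The college-side step, as sketched below, follows the equations above; the default phi and the rounding of the offer count s_j = m / Y_j up to an integer are my assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)

def admit(applicant_calibers, applicant_enhancements, m, yield_rate, phi=30.0):
    """Rank applicants by noisily observed (enhanced) caliber and make
    enough offers to expect m enrollees at the estimated yield rate.
    Returns indices of admitted applicants."""
    C = np.asarray(applicant_calibers, dtype=float)
    # Observed caliber: true caliber + enhancement + college-side noise
    observed = (C + np.asarray(applicant_enhancements, dtype=float)
                + rng.normal(0.0, phi, size=C.shape))
    # Offers needed so that expected enrollment fills m seats
    n_admit = min(len(C), int(np.ceil(m / yield_rate)))
    # Indices of the top-n_admit applicants by observed caliber
    return np.argsort(observed)[::-1][:n_admit]
```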
Enrollment Submodel:
- Student i enrolls in:
enrolled_j = argmax_{j in admitted_set_i} U*_ij
Quality Update:
Q_j(t+1) = 0.9 * Q_j(t) + 0.1 * mean(C_i : i enrolled at j in year t)
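The enrollment choice and quality update are simple enough to state in a few lines. In this sketch the dict-based interface and the behavior when a college enrolls nobody (quality carries over unchanged) are my assumptions.

```python
def enroll(admitted_utilities):
    """Student enrolls at the admitted college with the highest perceived
    utility; admitted_utilities maps college id -> U*_ij. Returns None
    if the student was admitted nowhere."""
    if not admitted_utilities:
        return None
    return max(admitted_utilities, key=admitted_utilities.get)

def update_quality(Q_j, enrolled_calibers, weight=0.9):
    """Next year's quality is a weighted average of current quality and
    the mean caliber of this year's enrollees: Q' = 0.9*Q + 0.1*mean."""
    if not enrolled_calibers:
        return Q_j  # assumption: quality unchanged if nobody enrolls
    mean_caliber = sum(enrolled_calibers) / len(enrolled_calibers)
    return weight * Q_j + (1 - weight) * mean_caliber
```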
Appendix C: Initialization Details
The model runs 30 years to reach approximate equilibrium. Key initialization:
- Year 1: Colleges start with random quality draws from N(1070, 130)
- Year 1: No prior admission data, so students use crude estimates
- Years 2-5: Admission probability estimation improves as data accumulates
- Years 6+: Full 5-year lookback available
- Analysis typically uses years 26-30 (post-stabilization)
Appendix D: Optimal Portfolio Algorithm
The recursive algorithm for finding the optimal K-college application portfolio from J colleges:
Key insight: If we order colleges by perceived utility, the expected utility of a portfolio depends only on which colleges are in it and their individual admission probabilities.
For a portfolio A = {a_1, a_2, ..., a_K} ordered by utility (U_1 < U_2 < ... < U_K):
EU(A) = P_K * U_K + (1 - P_K) * EU(A \ {a_K})
Base case: EU({a_1}) = P_1 * U_1
This recursive structure means we can evaluate any K-college subset in K steps once we know EU of all (K-1)-college subsets.
Full algorithm:
1. Compute U_ij and P_ij for all 40 colleges
2. Sort colleges by U_ij
3. For each possible "top" college k (from K to J):
   - Find the best (K-1)-college subset from colleges 1 to (k-1)
   - Compute the EU of the K-college set that includes college k
4. Return the K-college set with the highest EU
Complexity: K * (J - (K-1)/2) subset evaluations. For K=4, J=40: 4 * (40 - 1.5) = 154 evaluations (vs. C(40,4) = 91,390 for brute force).
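Because adding a college to a portfolio only matters through its utility rank, the recursion supports a dynamic program: process colleges in ascending utility order, so any college added to a portfolio is that portfolio's top choice. The sketch below is one way to realize this; the empty-portfolio utility of 0 is my assumption (consistent with the base case EU({a_1}) = P_1 * U_1).

```python
def optimal_portfolio(utilities, probs, K):
    """Best K-college application portfolio under the recursion
    EU(A) = P_top * U_top + (1 - P_top) * EU(A minus top),
    where 'top' is the highest-utility college in A.
    Returns (best expected utility, list of chosen college indices)."""
    order = sorted(range(len(utilities)), key=lambda j: utilities[j])
    NEG = float("-inf")
    # dp[t] = (best EU, chosen indices) for a t-college portfolio drawn
    # from the colleges processed so far; empty portfolio has EU 0
    dp = [(0.0, [])] + [(NEG, [])] * K
    for j in order:  # ascending utility: j is the top of any set it joins
        U, P = utilities[j], probs[j]
        # Iterate t downward so each college is used at most once
        for t in range(K, 0, -1):
            prev_eu, prev_set = dp[t - 1]
            if prev_eu == NEG:
                continue
            cand = P * U + (1 - P) * prev_eu
            if cand > dp[t][0]:
                dp[t] = (cand, prev_set + [j])
    return dp[K]
```

With utilities [10, 20] and probabilities [0.5, 0.5], the best two-college portfolio has EU = 0.5 * 20 + 0.5 * (0.5 * 10) = 12.5, matching the recursion by hand.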
18. Related Work and Follow-Up Papers
Direct Extensions by the Same Authors
- Reardon, Baker, Kasman, Klasik, & Townsend (2018). "What Levels of Racial Diversity Can Be Achieved with Socioeconomic-Based Affirmative Action? Evidence from a Simulation Model." Journal of Policy Analysis and Management, 37(3), 630-657.
  - Adds race as a student attribute
  - Tests whether SES-based admissions preferences can achieve racial diversity comparable to race-based affirmative action
  - Uses the same 2016 ABM framework
- Baker, Klasik, & Reardon (2018). "Race and Stratification in College Enrollment Over Time." AERA Open, 4(1).
  - Uses the ABM framework to study temporal trends in racial stratification
Earlier Working Paper Version
- Reardon et al. (2014). "Simulation Models of the Effects of Race- and Socioeconomic-Based Affirmative Action Policies on Elite College Enrollment Patterns." SREE conference paper (ERIC ED562944).
- Earlier version of the affirmative action extension
Third-Party Replication and Extension
- Beziaud (2022). MOSAIC: Simulating Socioeconomic Based Affirmative Action. GitHub repository. https://github.com/lbeziaud/mosaic
  - Python replication of Reardon et al. (2018)
  - GPL-3.0 licensed
- Allard, Beziaud, & Gambs. Replication study published via INRIA HAL (hal-04328511).
  - Discusses reproducibility of the Reardon model
  - Notes challenges in replication without original code
19. Key Takeaways for the College-Sim Project
What We Can Learn from Reardon
- The recursive portfolio algorithm is elegant and computationally efficient. Our utility-based list building could be improved by adopting a similar expected-utility optimization framework instead of the current threshold-based approach.
- Dynamic college quality is an interesting feature for multi-year simulations. If we ever extend college-sim to run multiple cohorts, the Q' = 0.9Q + 0.1 * mean(enrolled caliber) update rule is simple and effective.
- Information asymmetry is a powerful and realistic mechanism we don't model. Low-SES students having noisier perceptions of both their own competitiveness and college quality is well documented and would add realism.
- Validation methodology: Comparing model outputs to IPEDS institution-level data (application volumes, admit rates, yield rates by selectivity) is a straightforward validation approach we could adopt.
- Latin Hypercube sensitivity analysis is a more rigorous approach to parameter sensitivity than ad-hoc testing. Worth considering for our simulation.
What Our Model Does Better
- Round structure: Our ED/EA/EDII/RD pipeline captures an important strategic dimension (ED yield boost, REA restrictions) that the single-round Reardon model completely misses.
- Hooks and holistic review: Legacy, athlete, donor, and first-gen preferences are major features of real admissions that Reardon's pure-caliber ranking ignores.
- Real colleges and data: Calibrating to actual Harvard/Yale/MIT stats rather than abstract "college quality" makes the simulation directly useful for counseling and planning.
- Financial considerations: Our integration of Chetty yield data and net-cost-by-income adds an economic dimension entirely absent from Reardon.
- Visualization: D3.js interactive visualization makes the simulation accessible to non-technical users; Reardon provides no interactive component.
Potential Integration Ideas
- Noise-by-SES: Add information noise inversely proportional to family income when students build college lists (could reduce list quality for low-SES archetypes)
- Enhancement by SES: Could model SAT prep effects, e.g. high-income students gaining roughly +25 SAT points
- Multi-cohort mode: Add an optional mode that runs multiple years with college quality updating
- Validation: Compare our output distributions (admit rates by tier, enrollment by archetype) against IPEDS and ELS:2002 patterns
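The noise-by-SES idea could be prototyped as a small hook in list building. Everything in this sketch (the function name, the income scaling, and the noise constants) is hypothetical and would need calibration against our archetype data.

```python
import random

def perceived_fit(true_fit, family_income, base_noise=0.15,
                  reference_income=100_000, rng=random):
    """Hypothetical college-sim hook: a student's perceived fit for a
    college gets noisier as family income falls, echoing Reardon's
    information-asymmetry mechanism. All constants are illustrative."""
    # Noise SD shrinks linearly toward base_noise as income approaches
    # the reference level, then stays flat above it
    income_ratio = min(1.0, max(0.0, family_income / reference_income))
    noise_sd = base_noise * (2.0 - income_ratio)
    return true_fit + rng.gauss(0.0, noise_sd)
```

Feeding this perceived fit (rather than the true fit) into list building would let low-SES archetypes occasionally misjudge reaches and safeties, as in Reardon's mechanism (3).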
20. References Cited in the Paper
The paper cites 45 references. Key ones for the college-sim project:
- Avery, C. & Hoxby, C.M. (2004). "Do and should financial aid packages affect students' college choices?" In College Choices: The Economics of Where to Go, When to Go, and How to Pay for It.
- Bowen, W.G. & Bok, D. (1998). The Shape of the River: Long-Term Consequences of Considering Race in College and University Admissions. Princeton University Press.
- Espenshade, T.J. & Radford, A.W. (2009). No Longer Separate, Not Yet Equal: Race and Class in Elite College Admission and Campus Life. Princeton University Press.
- Hoxby, C.M. & Avery, C. (2013). "The Missing 'One-Offs': The Hidden Supply of High-Achieving, Low-Income Students." Brookings Papers on Economic Activity.
- Hoxby, C.M. & Turner, S. (2013). "Expanding College Opportunities for High-Achieving, Low Income Students." SIEPR Discussion Paper.
- Pallais, A. (2015). "Small Differences that Matter: Mistakes in Applying to College." Journal of Labor Economics, 33(2), 493-520.
- Roth, A.E. (2008). "Deferred acceptance algorithms: history, theory, practice, and open questions." International Journal of Game Theory, 36, 537-569.
- US Department of Education (2006). Education Longitudinal Study of 2002: First Follow-up. National Center for Education Statistics.
Notes compiled: March 2026
Source: https://www.jasss.org/19/1/8.html
Working paper: https://cepa.stanford.edu/content/agent-based-simulation-models-college-sorting-process