High School to College Enrollment: Feeder School Data Sources


National Student Clearinghouse

The National Student Clearinghouse (NSC) is the most comprehensive source of student-level postsecondary enrollment data in the U.S., covering ~3,600 institutions enrolling 97%+ of all students.

What's Publicly Available

  • High School Benchmarks Report: Annual report on HS graduates' postsecondary enrollment, persistence, and completion. Tracks first-fall college enrollment by graduating class. Available as downloadable data dashboards.

  • Current Term Enrollment Estimates: Aggregate enrollment trends released 3x/year (November preliminary, January and June comprehensive).

  • Research Reports: Published analyses on enrollment patterns, persistence, transfer rates.

What's NOT Publicly Available

  • StudentTracker: The core product that links individual HS students to their college enrollment outcomes. Available only to subscribing high schools and districts — not to the general public. Schools upload student rosters and receive back which colleges their graduates enrolled in.

  • Student-level microdata: Not available for download. Researchers can apply for access through the NSC Research Center.

Simulation Relevance

NSC aggregate reports provide national baselines (e.g., what % of HS grads enroll in 4-year vs 2-year, persistence rates by school type) but do NOT provide school-to-school linkage data publicly. The HS-to-college mapping lives inside StudentTracker, which is paywalled.


State-Level Data Sources

Several states publish high-school-level college-going rates that can serve as proxies for feeder patterns.

California Department of Education (CDE)

  • College-Going Rate (CGR) Data: Downloadable CSV files showing college-going rates at state, county, district, and individual school level, disaggregated by race/ethnicity and student group.

  • 12-Month CGR Files: Track % of HS completers enrolling in postsecondary within 12 months.

  • 16-Month CGR Files: Extended tracking window.

  • Limitation: Shows aggregate college-going rates per HS, but does NOT break down which specific colleges students enrolled in.
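
A minimal sketch of how a per-school CGR file might be reduced to one college-going rate per school. The column names (`school_name`, `completers`, `enrolled_12mo`) and the sample rows are hypothetical placeholders, not the actual CDE schema; the real layout should be checked against the file's own documentation before use.

```python
import csv
import io

# Hypothetical sample standing in for a real CDE download.
# Column names are illustrative assumptions, not the CDE file layout.
SAMPLE_CSV = """school_name,completers,enrolled_12mo
Example High A,400,360
Example High B,250,150
Example High C,0,0
"""


def college_going_rates(csv_text):
    """Return {school_name: rate}, where rate = enrolled / completers."""
    rates = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        completers = int(row["completers"])
        if completers == 0:
            # Skip schools with no completers to avoid dividing by zero.
            continue
        rates[row["school_name"]] = int(row["enrolled_12mo"]) / completers
    return rates


rates = college_going_rates(SAMPLE_CSV)
```

The result is still only a rate per high school; as the limitation above notes, it says nothing about which colleges those students attended.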

Texas Education Agency (TEA)

  • Student Data: Statewide enrollment by grade, race/ethnicity, gender, economic status, program participation.

  • College-going data available through TEA's accountability system but less granular than California's.

Illinois

  • NCES Digest State Dashboard for Illinois provides aggregate postsecondary enrollment data.

Key Limitation

State databases generally track whether HS graduates go to college, not which specific college. The HS-to-specific-college linkage requires NSC StudentTracker or institution-specific data.


Research Papers with Feeder Data

Chetty, Deming, Friedman (2023) — "Diversifying Society's Leaders?"

  • NBER Working Paper 31492 | Full PDF

  • Used anonymized admissions data from Ivy-Plus colleges (Ivy League + Stanford, MIT, Duke, UChicago) linked to tax records and SAT/ACT scores.

  • Key findings on HS type:

      • Children from top-1% families are 2x as likely to attend Ivy-Plus as middle-class students with comparable test scores.

      • The rich-kid advantage in non-academic ratings is "almost entirely driven by the fact that they are much more likely to attend elite private high schools."

      • Three drivers of high-income admissions advantage: (1) legacy preferences, (2) non-academic credential weighting, (3) athletic recruitment.

      • Children from high-income families have no admissions advantage at flagship public colleges.

  • Data availability: Opportunity Insights data portal provides downloadable CSV files with college mobility statistics by institution and birth cohort.

Arcidiacono, Kinsler, Ransom — "Legacy and Athlete Preferences at Harvard"

  • NBER Working Paper 26316 | PDF

  • Used Harvard admissions microdata from the SFFA v. Harvard trial (Classes of 2014-2019).

  • Key findings:

      • 43% of white admits were ALDCs (recruited Athletes, Legacies, applicants on the Dean's interest list, Children of faculty/staff).

      • ~75% of white ALDC admits would have been rejected had they been treated as white non-ALDC applicants.

      • 68%+ of recruited athletes, legacies, and dean's interest list applicants are white vs. <41% of typical applicants.

      • Admit rates: athletes 86%, legacy 33%, dean's interest list 42%, faculty children 47%, vs. ~6% overall.

  • Data: Harvard admissions data includes demographic, geographic, academic measures, internal Harvard ratings (academic, extracurricular, athletic, personal), plus HS counselor/teacher letter ratings. Data is not publicly downloadable — available only through court records.
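
For calibration purposes, the admit rates listed above can be turned into raw multipliers relative to the overall rate. The sketch below is only that arithmetic: it does not control for applicant strength, so the ratios are descriptive and should not be read as the preference effects the paper estimates. The function and key names are hypothetical.

```python
# Admit rates (percent) as quoted in the findings above.
ADMIT_RATES = {
    "athlete": 86,
    "legacy": 33,
    "deans_list": 42,
    "faculty_child": 47,
}
OVERALL_RATE = 6  # "~6% overall", treated here as exactly 6 for illustration


def relative_admit_multipliers(rates, baseline):
    """Divide each group's admit rate by the baseline admit rate."""
    return {group: rate / baseline for group, rate in rates.items()}


multipliers = relative_admit_multipliers(ADMIT_RATES, OVERALL_RATE)
for group, m in multipliers.items():
    print(f"{group}: {m:.1f}x the overall admit rate")
```

Because the baseline is itself approximate, these multipliers are best used as rough bounds rather than point estimates.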

Arcidiacono — "What the SFFA Cases Reveal About Racial Preferences"

  • PDF

  • Companion analysis of the SFFA trial data showing admit rate differentials by race.

  • Reports admit rates for African American applicants ~4x those of comparable white applicants, and ~2.4x for Hispanic applicants.

Chetty et al. — "Mobility Report Cards" (2017)

  • Paper | Data

  • Children with parents in top 1% are 77x more likely to attend Ivy-Plus than children from bottom 20%.

  • Income segregation across colleges comparable to income segregation across census tracts.

  • Downloadable data: College-level mobility statistics (parent income distributions, student earnings outcomes) available as CSV at opportunityinsights.org/data.

Mulhern — "Changing College Choices with Personalized Admissions Information at Scale"

  • Paper | PNAS version

  • Uses Naviance data to study how showing HS students past admission outcomes from their own school affects application behavior.

  • Demonstrates that personalized HS-to-college outcome data reduces undermatching.

Glasener — "Shaping Elite College Pathways: Mapping the Field of Feeder Schools"


Public Datasets

Opportunity Insights (Best Available)

  • URL: opportunityinsights.org/data

  • What's there: College-level data on parent income distributions, student earnings, mobility rates. Linkable to IPEDS college identifiers.

  • Format: CSV downloads with readme documentation.

  • Limitation: College-level aggregates, not HS-to-college linkage. But provides the income/mobility context that shapes feeder dynamics.

IPEDS (Integrated Postsecondary Education Data System)

  • URL: nces.ed.gov/ipeds

  • What's there: Annual survey data from every Title IV institution — enrollment, graduation rates, finances, student demographics. 12 interrelated survey components.

  • Limitation: College-side data only. No HS origin information.

College Scorecard

  • URL: collegescorecard.ed.gov/data

  • What's there: Institution-level and field-of-study-level data going back to 1997. Includes earnings outcomes, debt, completion rates.

  • Format: Downloadable CSV files.

  • Limitation: No HS-to-college linkage.

Common Data Set (CDS)

  • URL: commondataset.org

  • What's there: Standardized self-reported data from colleges including acceptance rates, enrolled student profiles (SAT/ACT ranges, GPA distributions), financial aid.

  • Limitation: College-reported aggregates. No HS-level breakdown.

Kaggle Datasets

| Dataset | URL | Notes |
|---|---|---|
| Elite College Admissions | kaggle.com/datasets/mexwell/elite-college-admissions | Admissions data for selective colleges |
| College Admissions (Qian) | kaggle.com/datasets/samsonqian/college-admissions | General college admissions data |
| US College Data | kaggle.com/datasets/yashgpt/us-college-data | Institutional characteristics |
| US Schools Dataset | kaggle.com/datasets/andrewmvd/us-schools-dataset | K-12 school data |

Note: None of the Kaggle datasets provide direct HS-to-college linkage. They are primarily college-side or student-attribute datasets.

GitHub Repositories

Naviance

  • Used by 10,000+ high schools. Contains HS-specific scattergrams (GPA/test scores vs. admit/deny at specific colleges).

  • Not publicly accessible — requires authenticated school login.

  • Data threshold: scattergrams shown only if HS has 5+ applicants to that college.

  • Academic researchers have obtained access for studies (see Mulhern paper above).
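
The 5-applicant threshold matters for any simulation of what students can see: HS-college pairs with sparse history have no scattergram at all. A minimal sketch of that visibility rule, using hypothetical application records (this is not Naviance's implementation, only the threshold stated above):

```python
from collections import Counter

MIN_APPLICANTS = 5  # display threshold stated above


def visible_scattergrams(applications):
    """Return the (high_school, college) pairs that meet the threshold.

    `applications` is an iterable of (high_school, college) tuples,
    one per past applicant. The record format is a hypothetical choice.
    """
    counts = Counter(applications)
    return {pair for pair, n in counts.items() if n >= MIN_APPLICANTS}


# Hypothetical history: 5 applicants to College X, 4 to College Y.
history = [("HS 1", "College X")] * 5 + [("HS 1", "College Y")] * 4
visible = visible_scattergrams(history)
```

Here only the College X pair would be shown to students at the hypothetical "HS 1".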


Journalism Sources

Harvard Crimson Feeder School Investigation (2024) — Most Detailed Source

  • Interactive: "Most Schools Dream of Sending Students to Harvard. These 21 Expect To."

  • Data Widget

  • Analysis Article

  • Methodology: Analyzed Freshman Register data for 15 matriculated classes (2009-2024).

  • Key data points:

      • 21 schools sent 2,200+ students to Harvard over 15 years

      • 1 in 11 accepted students comes from these 21 schools

      • Top feeders (100+ students each, 2009-2024): Boston Latin, Phillips Academy Andover, Stuyvesant, Phillips Exeter

      • 5% of freshmen come from just 7 schools: Boston Latin, Phillips Andover, Stuyvesant, Noble & Greenough, Phillips Exeter, Trinity (NYC), Lexington HS

      • Of 21 schools: 12 private (avg tuition ~$64K), 9 public (4 selective magnet, 4 affluent suburban, 1 local)

      • Private school students make up roughly 25-38% of undergraduate classes at the Ivies cited (25.5% Harvard, 27% Princeton, 32.4% Brown, 37.9% Cornell)
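
A quick arithmetic check on the headline figures, as a sketch. The entering class size of 1,650 is an assumption introduced here for illustration and does not come from the Crimson analysis; the source also speaks of "accepted" students while the counts are of matriculants, so the comparison is only approximate.

```python
SCHOOLS = 21
STUDENTS = 2200   # "2,200+", treated as a lower bound
YEARS = 15

# Average number of students each of the 21 schools sends per year.
per_school_per_year = STUDENTS / SCHOOLS / YEARS

# Share implied by "1 in 11".
share_one_in_eleven = 1 / 11

# Assumption (not from the source): roughly 1,650 students per entering class.
ASSUMED_CLASS_SIZE = 1650
implied_share = STUDENTS / (ASSUMED_CLASS_SIZE * YEARS)

print(f"Per school per year: {per_school_per_year:.1f}")
print(f"'1 in 11' share:     {share_one_in_eleven:.1%}")
print(f"Implied share:       {implied_share:.1%}")
```

Under that class-size assumption the two shares land within a fraction of a percentage point of each other, which suggests the reported figures are internally consistent.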

Harvard Crimson (2013) — Historical Feeder Analysis

Prep Review — Multi-University Feeder Rankings

  • Harvard Feeders | Yale Feeders | Princeton Feeders | MIT Feeders

  • Methodology: Top 30 feeder schools per university, ranked by % of graduates matriculating to that university over past 5 years.

  • Eligibility: College-prep schools with grade 12/PG, minimum 40-student average graduating class.

  • Provides per-university feeder rankings — useful for cross-referencing patterns.
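
The ranking rule described above (eligibility filter, then sort by share of graduates matriculating over five years) can be sketched as follows. The record layout, field names, and sample schools are hypothetical, and the college-prep and grade 12/PG criteria are not modeled; this illustrates the stated methodology and is not Prep Review's own code.

```python
from dataclasses import dataclass

MIN_AVG_CLASS = 40   # minimum average graduating class size
WINDOW_YEARS = 5     # ranking window in years


@dataclass
class SchoolRecord:
    name: str
    graduates_5yr: int     # total graduates over the window
    matriculants_5yr: int  # graduates who matriculated to the target university


def rank_feeders(records, top_n=30):
    """Return [(name, matriculation_rate)] for eligible schools, best first."""
    eligible = [
        r for r in records
        if r.graduates_5yr / WINDOW_YEARS >= MIN_AVG_CLASS
    ]
    ranked = sorted(
        eligible,
        key=lambda r: r.matriculants_5yr / r.graduates_5yr,
        reverse=True,
    )
    return [(r.name, r.matriculants_5yr / r.graduates_5yr) for r in ranked[:top_n]]


# Hypothetical schools; "School C" averages 30 graduates a year and is excluded.
sample = [
    SchoolRecord("School B", 1000, 60),
    SchoolRecord("School A", 500, 50),
    SchoolRecord("School C", 150, 30),
]
ranking = rank_feeders(sample)
```

Ranking by rate rather than raw count favors small schools, which is why the minimum class size filter carries real weight in this methodology.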

Chicardgo School — LA Private School Elite College Placement Index

  • LA Private School Rankings

  • Blog/Methodology

  • Uses proprietary Elite College Placement Index (LA-ECPI) based on matriculation to T25 National and T15 Liberal Arts colleges (70% weighting).

  • Top LA schools: Harvard-Westlake (50% elite placement), Polytechnic (40%), Marlborough (40%).

  • Data modeled from publicly available school profiles and matriculation lists.

Crimson Education / Rise

NPR (2023)


Most Useful for Simulation

Tier 1: Directly Usable Data

| Source | What It Provides | Format | Access |
|---|---|---|---|
| Harvard Crimson 2024 feeder data | Named schools, counts (100+ per school over 15 years), public/private split, tuition | Structured article with data widget | Free, web |
| Prep Review rankings | Top 30 feeders per HYPSM university, % matriculation rates | Web tables | Free, web |
| Chicardgo ECPI | LA private school placement rates into T25 colleges | Web tables | Free, web |
| California CDE CGR files | Per-school college-going rates, demographic breakdowns | CSV download | Free, govt |
| Opportunity Insights data | College-level parent income distributions, mobility rates | CSV download | Free |

Tier 2: Contextual / Calibration Data

| Source | What It Provides | Format | Access |
|---|---|---|---|
| Arcidiacono SFFA data (via papers) | ALDC admit rates, HS type effects on ratings | Published tables/figures | Free (papers) |
| Chetty "Diversifying" paper | Private HS advantage quantified, non-academic rating gaps | Published tables/figures | Free (paper) |
| CDS / IPEDS / College Scorecard | College-side acceptance rates, SAT/GPA ranges, enrollment | CSV download | Free, govt |
| NSC High School Benchmarks | National HS-to-college enrollment baselines | Report/dashboard | Free |

Tier 3: Restricted but Valuable

| Source | What It Provides | Why Restricted |
|---|---|---|
| NSC StudentTracker | Actual HS-to-college enrollment linkage | Subscription required |
| Naviance | Per-HS scattergrams of admits/denies at specific colleges | School login required |
| SFFA trial microdata | Harvard admissions records with HS identifiers | Court records, not easily accessible |

  1. Use Harvard Crimson 2024 data to define feeder school archetypes: elite boarding (Exeter, Andover), selective magnet (Stuyvesant, Boston Latin, TJHSST), affluent suburban (Lexington, Scarsdale, Brookline), elite day school (Noble & Greenough, Trinity).

  2. Use Prep Review to cross-reference feeder patterns across HYPSM — some schools are Harvard feeders but not MIT feeders, etc.

  3. Use Chetty/Arcidiacono papers to calibrate the private-school advantage multiplier: the ~2x Ivy-Plus attendance advantage of top-1% families at comparable test scores, driven by non-academic ratings, legacy, and athletics, can serve as a proxy for the private-school effect.

  4. Use CDE CGR data to set realistic college-going rate baselines for different school types (affluent suburban ~90%+, average public ~65%, low-income ~40%).

  5. Use Opportunity Insights data to calibrate income-to-enrollment relationships: children from top-1% families are 77x more likely to attend Ivy-Plus than children from bottom-20% families.

  6. Derive feeder school parameters: For the simulation's 20 high schools, map each to an archetype with calibrated feeder rates:

     • Elite boarding school: ~15-20% of grads to HYPSM, ~40-50% to T20

     • Selective magnet: ~8-12% to HYPSM, ~30-40% to T20

     • Affluent suburban: ~5-8% to HYPSM, ~20-30% to T20

     • Average public: ~0.5-1% to HYPSM, ~5-10% to T20

     • Under-resourced public: ~0.1% to HYPSM, ~2-5% to T20
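
The archetype rates above could be encoded as simulation parameters along these lines. This is a sketch under stated assumptions: the identifiers are hypothetical, and using the midpoint of each range as the expected rate is a simplification chosen for illustration, not something the sources prescribe.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Archetype:
    hypsm: tuple  # (low, high) share of graduates enrolling at HYPSM
    t20: tuple    # (low, high) share of graduates enrolling at T20 colleges


# Ranges copied from the archetype list above.
ARCHETYPES = {
    "elite_boarding": Archetype((0.15, 0.20), (0.40, 0.50)),
    "selective_magnet": Archetype((0.08, 0.12), (0.30, 0.40)),
    "affluent_suburban": Archetype((0.05, 0.08), (0.20, 0.30)),
    "average_public": Archetype((0.005, 0.01), (0.05, 0.10)),
    "under_resourced_public": Archetype((0.001, 0.001), (0.02, 0.05)),
}


def midpoint(bounds):
    """Midpoint of a (low, high) range; an assumed point estimate."""
    low, high = bounds
    return (low + high) / 2


def expected_counts(archetype_name, class_size):
    """Expected HYPSM and T20 enrollees for one graduating class."""
    a = ARCHETYPES[archetype_name]
    return {
        "hypsm": class_size * midpoint(a.hypsm),
        "t20": class_size * midpoint(a.t20),
    }


# Hypothetical graduating class of 200 at an elite boarding school.
example = expected_counts("elite_boarding", 200)
```

Keeping the full ranges rather than only midpoints leaves room to sample a rate per school later, so that two schools of the same archetype need not behave identically.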