Protocol - Race/Ethnic Residential Segregation - U.S. Census


The protocol is based on extracting data from the U.S. Census Bureau on a set of variables related to the concept of residential segregation. Residential segregation describes the distribution of different race/ethnic groups across smaller areal units (e.g., census tracts) within larger areas (e.g., counties or metropolitan statistical areas [MSAs]). The Dissimilarity Index is one of the most commonly used race/ethnic residential segregation measures. All the relevant variables are available from the decennial censuses or the American Community Survey (ACS) 5-year estimates. Once the data are extracted, the proposed measure, the Dissimilarity Index, can be calculated.

Which data set should be used?

Users interested in using measures of residential segregation in conjunction with the Neighborhood Concentrated Disadvantage protocol should use data from the ACS 5-year estimates for consistency of data sources. Users who want 100% count data rather than sample-based estimates, or who want to compare residential segregation in metropolitan areas over time (e.g., 1990 vs. 2010), should use data from the decennial censuses. The protocol here describes how to use Summary File 1 (SF1) files (i.e., 100% count data) from the decennial censuses to calculate the Dissimilarity Index. Users interested in using the ACS data should refer to the American Community Survey protocol for this measure.

Specific Instructions

If information on current address (see PhenX Demographics domain, Current Address measure) has been collected for a study respondent, geocoding can be used to link the participant's address to his or her local neighborhood (or to another, larger geographic unit).

It is necessary to extract data for smaller units (e.g., census tracts) to calculate the Dissimilarity Index for each larger unit. To aid comparability between studies, the Social Environment Working Group recommends that researchers set the smaller area to the census tract and the larger area to the metropolitan statistical area.
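Because D is computed for each larger unit from the tracts it contains, the tract-level extract has to be grouped by MSA before any calculation. The sketch below is a minimal illustration under stated assumptions: it builds the standard 11-character tract GEOID (2-digit state FIPS + 3-digit county FIPS + 6-digit tract code) and, since current MSAs are delineated from whole counties, looks up each tract's MSA through the county part of its GEOID. The `county_to_msa` crosswalk, the MSA label, and the counts are hypothetical.

```python
from collections import defaultdict


def tract_geoid(state_fips, county_fips, tract_code):
    """Build the 11-character tract GEOID: state FIPS (2) + county FIPS (3) + tract (6)."""
    geoid = str(state_fips).zfill(2) + str(county_fips).zfill(3) + str(tract_code).zfill(6)
    if len(geoid) != 11 or not geoid.isdigit():
        raise ValueError(f"malformed tract identifier: {geoid!r}")
    return geoid


def group_tracts_by_msa(tract_rows, county_to_msa):
    """Group tract records under the MSA that contains them.

    tract_rows: iterable of dicts with 'state', 'county', 'tract' keys plus
        whatever group counts were extracted for the tract.
    county_to_msa: hypothetical crosswalk {5-digit county FIPS: MSA label}.
    Tracts in counties outside any MSA are left out.
    """
    groups = defaultdict(list)
    for row in tract_rows:
        geoid = tract_geoid(row["state"], row["county"], row["tract"])
        msa = county_to_msa.get(geoid[:5])  # first 5 characters = state + county FIPS
        if msa is not None:
            groups[msa].append({**row, "geoid": geoid})
    return dict(groups)


# Hypothetical tract records and crosswalk entry.
rows = [
    {"state": "6", "county": "37", "tract": "101110", "x": 120, "y": 880},
    {"state": "6", "county": "37", "tract": "101122", "x": 300, "y": 700},
    {"state": "6", "county": "9", "tract": "000100", "x": 5, "y": 95},
]
crosswalk = {"06037": "MSA-A"}
by_msa = group_tracts_by_msa(rows, crosswalk)
# by_msa["MSA-A"] holds the two county-037 tracts; the county-009 tract is dropped.
```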

Additionally, researchers can use the census variables to calculate more basic diversity scores at the census tract level such as the entropy index.
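The protocol does not give a formula for the entropy index. A commonly used form (associated with Theil, 1972, in the reference list) is E = -Σ π_m ln π_m over the M groups in a tract, often divided by ln M so that the score runs from 0 (one group only) to 1 (all M groups equally represented). The function below is a minimal sketch of that form; the group grouping and counts in the example are hypothetical.

```python
import math


def entropy_index(counts, normalize=True):
    """Entropy (diversity) score for one census tract.

    counts: population counts for mutually exclusive groups in the tract
        (e.g., the Table P5 non-Hispanic race groups plus Hispanic).
    Returns E = -sum(p_m * ln p_m) over groups with p_m > 0. If normalize is
    True, E is divided by ln(M), where M is the number of groups supplied,
    so the score lies between 0 and 1.
    """
    total = sum(counts)
    if total <= 0:
        raise ValueError("tract has no population")
    e = -sum((c / total) * math.log(c / total) for c in counts if c > 0)
    if normalize:
        if len(counts) < 2:
            raise ValueError("need at least two groups to normalize")
        e /= math.log(len(counts))
    return e


# Hypothetical tract: [NH White, NH Black, NH Asian, NH other/multiple, Hispanic]
score = entropy_index([420, 310, 90, 40, 140])
```

Note that the normalized score depends on how many groups are supplied, so tracts should be scored with the same set of groups if they are to be compared.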

The most common conceptualization of residential segregation is based on the dimension of evenness (Taeuber & Taeuber, 1965; White, 1986; Massey & Denton, 1988; Reardon & O’Sullivan, 2004), and the most widely used measure of residential segregation is the Dissimilarity Index, sometimes referred to as D. This measure is straightforward to calculate from Census data. Although the index of dissimilarity was originally applied to compare two population groups (most often Whites and Blacks), recent papers have extended it to the case of multiple race/ethnic groups (Reardon & Firebaugh, 2002), and others have extended both the two-group and multigroup measures to incorporate the spatial dimension, using data from adjacent or proximate census units weighted accordingly (see White, 1983; Wong, 1993; Reardon & O’Sullivan, 2004; Reardon et al., 2008).




The Dissimilarity Index is based on U.S. Census Bureau data. This protocol describes how to make the calculations using the decennial census Summary File 1 (SF1). The SF1 contains the data from the short-form questions asked of everyone in the U.S. Census; these data are also referred to as 100% data. These calculations can be made using the 1990, 2000, and 2010 decennial censuses.

The 2000 and 2010 SF1 data can be downloaded at https://census.gov.

2010 SF1 MS Access data: https://www.census.gov/data/datasets/2010/dec/summary-file-1.html

2000 SF1 MS Access data: https://www.census.gov/data/datasets/2000/dec/summary-file-1.html

Web version for 2010 data: https://data.census.gov/cedsci/table?q=p5&tid=DECENNIALSF12010.P5

A repository of resources for decennial Census data is available from the U.S. Census Bureau at https://www.census.gov/programs-surveys/decennial-census/data.html. Users not familiar with Census data should consult the technical materials. The technical documentation for the 2010 Census is available at https://www2.census.gov/programs-surveys/decennial/2010/technical-documentation/complete-tech-docs/summary-file/sf1.pdf. Technical documentation for the 1990 and 2000 Census SF1 data is provided in the references section.

This protocol focuses on the race and ethnicity data in the 2010 SF1 file. The key race/ethnicity data in the 2010 Census are found in "Table P5: Hispanic or Latino Origin by Race." This table is preferred over the other available race and race/ethnicity tables because it provides data on the main race/ethnic groups in the United States and explicitly incorporates data on Hispanic or Latino populations, which are not available in the race-only tables.


Variable Name

Total:
  Not Hispanic or Latino:
    White alone
    Black or African American alone
    American Indian and Alaska Native alone
    Asian alone
    Native Hawaiian and Other Pacific Islander alone
    Some other race alone
    Two or more races
  Hispanic or Latino:
    White alone
    Black or African American alone
    American Indian and Alaska Native alone
    Asian alone
    Native Hawaiian and Other Pacific Islander alone
    Some other race alone
    Two or more races

The race/ethnic data are available for all small census geographies (e.g., census blocks, block groups, and tracts) and can be easily extracted for almost any geographic level.

Researchers can use the data in this table to easily calculate basic variables (e.g., the percentage of any race and/or ethnicity group) or to combine groups (e.g., all minorities).
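As an illustration of these basic variables, the sketch below turns one tract's Table P5 counts into group percentages and a combined group. The dictionary keys are hypothetical labels for the P5 categories, not official SF1 field codes, and "all minorities" is assumed here to mean everyone other than non-Hispanic White alone; other combinations can be built the same way. The function also checks that the non-Hispanic detail plus total Hispanic adds up to the total population.

```python
# Hypothetical labels for the Table P5 categories used in this protocol;
# they are illustrative names, NOT the official SF1 field codes.
NH_GROUPS = ["nh_white", "nh_black", "nh_aian", "nh_asian",
             "nh_nhpi", "nh_other", "nh_two_plus"]


def tract_percentages(row):
    """Percentage of a tract's population in each group, plus a combined group.

    row: dict of counts keyed by NH_GROUPS, "hispanic", and "total".
    """
    total = row["total"]
    # The non-Hispanic detail plus total Hispanic should equal the total population.
    if sum(row[g] for g in NH_GROUPS) + row["hispanic"] != total:
        raise ValueError("group counts do not add up to the total population")
    if total == 0:
        return {}
    pct = {g: 100.0 * row[g] / total for g in NH_GROUPS + ["hispanic"]}
    # Combined group (assumed definition): everyone other than non-Hispanic White alone.
    pct["all_minorities"] = 100.0 * (total - row["nh_white"]) / total
    return pct


# Hypothetical tract counts.
example = {"total": 1000, "nh_white": 600, "nh_black": 200, "nh_aian": 10,
           "nh_asian": 50, "nh_nhpi": 0, "nh_other": 5, "nh_two_plus": 35,
           "hispanic": 100}
pct = tract_percentages(example)
```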

The Dissimilarity Index characterizes larger areas (e.g., metropolitan statistical areas) using data for the smaller units they contain (e.g., census tracts).

The most common conceptualization of residential segregation is based on the dimension of evenness. Evenness refers to the differential distribution of the subject population across neighborhoods in a larger area (e.g., a metropolitan area). The Dissimilarity Index (D), the most widely used measure of evenness, ranges from 0 (complete integration) to 1 (complete segregation) and indicates the proportion of a group’s population that would have to change residence for each neighborhood to have the same percentage of that group as the metropolitan area overall. It is computed as:


$$D = \frac{1}{2}\sum_{i=1}^{n}\left|\frac{x_i}{X} - \frac{y_i}{Y}\right|$$

where

$n$ is the number of tracts in the larger area (e.g., a metropolitan area),

$x_i$ is the population size of the minority group of interest in tract $i$,

$X$ is the population of the minority group in the larger area (e.g., metropolitan area) as a whole,

$y_i$ is the population of the reference group (usually non-Hispanic Whites) in tract $i$, and

$Y$ is the population of the reference group in the larger area (e.g., metropolitan area) as a whole.
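As an illustration with hypothetical counts, consider a larger area made up of two tracts: tract 1 has $x_1 = 80$ minority-group and $y_1 = 20$ reference-group residents, and tract 2 has $x_2 = 20$ and $y_2 = 80$, so $X = Y = 100$:

$$D = \frac{1}{2}\left(\left|\frac{80}{100} - \frac{20}{100}\right| + \left|\frac{20}{100} - \frac{80}{100}\right|\right) = \frac{1}{2}\left(0.6 + 0.6\right) = 0.6$$

In this example, 60 of the 100 minority-group residents (60%) would have to move from tract 1 to tract 2 for both tracts to match the area-wide composition of 50% minority.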

The calculation requires computing the total for each group across all subareas within the larger region (e.g., all census tracts within a county), each subarea's share of each group's regional total ($x_i/X$ and $y_i/Y$), the absolute difference between those two shares in each subarea, and the sum of the absolute differences. The sum is multiplied by 0.5 so that the result falls between 0.0 and 1.0. A value of 0.0 indicates that every subarea has the same proportions of minority and reference group members as the region as a whole. If every subarea contains members of only one of the two groups (i.e., there is no co-residence), D equals 1.0, indicating complete segregation.
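The steps above can be sketched in a few lines of Python. This is a minimal illustration, not a replacement for established software; it assumes the tract counts for one larger area have already been extracted and grouped, and the example lists (labeled here as non-Hispanic Black and non-Hispanic White) are hypothetical.

```python
def dissimilarity_index(x, y):
    """Two-group Dissimilarity Index: D = 0.5 * sum_i |x_i/X - y_i/Y|.

    x: counts of the group of interest in each tract of one larger area.
    y: counts of the reference group (e.g., non-Hispanic White) in the same
       tracts, in the same order as x.
    """
    if len(x) != len(y):
        raise ValueError("x and y must list the same tracts")
    X, Y = sum(x), sum(y)
    if X == 0 or Y == 0:
        raise ValueError("both groups must be present in the larger area")
    # Each tract's share of each group's area-wide total, differenced and summed.
    return 0.5 * sum(abs(xi / X - yi / Y) for xi, yi in zip(x, y))


# Hypothetical tract counts for one metropolitan area.
black_nh = [120, 450, 30, 400]   # x_i
white_nh = [880, 150, 970, 100]  # y_i
d = dissimilarity_index(black_nh, white_nh)
```

With two tracts holding 80/20 and 20/80 minority/reference residents, `dissimilarity_index([80, 20], [20, 80])` returns approximately 0.6 (floating-point arithmetic may add a tiny rounding error).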

Extending the Dissimilarity Index: The Multigroup Analog

While much early research on segregation looked at two groups (e.g., Black and White, or majority and minority), today’s society is multiethnic. Two-group measures are useful but limited for describing complex patterns of segregation. The choice to use a two-group or multigroup D depends on the specific question of interest. In a region where the population is composed of three groups (e.g., White non-Hispanic, Black non-Hispanic, and Hispanic), we may be interested in

a) segregation between two specific groups (e.g., How segregated are White from Black residents?); or

b) segregation among all three groups (e.g., How segregated are White non-Hispanic, Black non-Hispanic, and Hispanic residents from each other?).

The two-group measure can still be applied by comparing all possible pairs of population groups (Morrill, 1995), but pairwise comparisons are not comprehensive because the groups are not treated simultaneously. Addressing segregation among multiple groups requires a multigroup analog to D (Morgan et al., 1975; Sakoda, 1981). The multigroup analog describes the extent to which two or more population groups are similarly distributed among subareas. The formula for multigroup dissimilarity (from Reardon & Firebaugh, 2002) is:


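$$D = \sum_{m=1}^{M}\sum_{j=1}^{J}\frac{t_j}{2TI}\left|\pi_{jm} - \pi_m\right|$$

where

$T$ is the total population,

$M$ is the number of groups (indexed by $m$),

$J$ is the number of subareas or units (indexed by $j$),

$t_j$ is the number of individuals in subarea $j$,

$\pi_m$ is the proportion of the total population in group $m$,

$\pi_{jm}$ is the proportion of the population of unit $j$ that is in group $m$, and

$I$ is Simpson’s Interaction Index, given by

$$I = \sum_{m=1}^{M}\pi_m\left(1 - \pi_m\right)$$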

The interpretation of multigroup D (sometimes labeled D(m)) is the same as that of the two-group D (see Wong, 1993).
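A minimal Python sketch of the multigroup formula follows, intended only to make the computation concrete (established tools such as the Stata seg command are the safer choice for production work). It assumes each subarea reports counts for the same mutually exclusive groups in a fixed order; the three-group tract counts are hypothetical. With exactly two groups the formula reduces algebraically to the two-group D, which is a convenient check.

```python
def multigroup_dissimilarity(counts):
    """Multigroup Dissimilarity Index (Reardon & Firebaugh, 2002).

    counts: one row per subarea j (e.g., tract), each row giving the counts of
        the same M mutually exclusive groups in a fixed order.
    Returns D = sum_m sum_j t_j * |pi_jm - pi_m| / (2 * T * I),
    where I = sum_m pi_m * (1 - pi_m) is Simpson's interaction index.
    """
    if not counts:
        raise ValueError("no subareas supplied")
    M = len(counts[0])
    if any(len(row) != M for row in counts):
        raise ValueError("every subarea must report the same groups")
    T = sum(sum(row) for row in counts)
    if T == 0:
        raise ValueError("total population is zero")
    pi = [sum(row[m] for row in counts) / T for m in range(M)]  # pi_m
    I = sum(p * (1 - p) for p in pi)
    if I == 0:
        raise ValueError("only one group is present; D is undefined")
    total = 0.0
    for row in counts:
        t_j = sum(row)
        if t_j == 0:
            continue  # an empty subarea contributes nothing
        total += t_j * sum(abs(row[m] / t_j - pi[m]) for m in range(M))
    return total / (2 * T * I)


# Hypothetical tracts with counts for three groups:
# [non-Hispanic White, non-Hispanic Black, Hispanic]
tracts = [
    [700, 100, 200],
    [150, 600, 250],
    [300, 50, 650],
]
d_multi = multigroup_dissimilarity(tracts)
```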

In the Stata statistical software package, the command seg (installed by typing "ssc install seg" from within Stata) will compute D (Reardon, 2002).

Researchers have extended segregation measures by incorporating the spatial dimension (see White, 1983; Wong, 1993; Reardon & O’Sullivan, 2004). There are spatially modified versions of the D index (see Wong, 1993).

Personnel and Training Required

Knowledge of Census data products and websites, such as the Census website (https://data.census.gov), and/or publicly available data portals such as https://www.nhgis.org/, and/or commercial geospatial data products, such as that provided by vendors like GeoLytics (https://www.geolytics.com) or Social Explorer (https://www.socialexplorer.com/).

The extracted data need to be manipulated, and the Index of Dissimilarity needs to be calculated.

Equipment Needs

Access to a desktop or laptop computer with internet access to download raw data from the U.S. Census Bureau's data.census.gov website (https://data.census.gov), which replaced American FactFinder, and statistical packages (e.g., SPSS, SAS) for data manipulation.

Requirement Category | Required
Major equipment | No
Specialized training | No
Specialized requirements for biospecimen collection | No
Average time of greater than 15 minutes in an unaffected individual | No
Mode of Administration

Secondary Data Analysis


Infant, Toddler, Child, Adolescent, Adult, Senior, Pregnancy


Not applicable: Derived from publicly available secondary data

Selection Rationale

The PhenX Social Environments Working Group preferred an objective measure of racial/ethnic residential segregation using U.S. Census Bureau data. A questionnaire that relies on subjective judgment based on retrospective ascertainment is likely to be unreliable.



Logical Observation Identifiers Names and Codes (LOINC) | Race - ethnic resid segregation proto | 63038-4 | LOINC
caDSR Form | PhenX PX211402 - Race Ethnic Residential Segregation Us Census | 6912513 | caDSR Form
Derived Variables


Process and Review

The Social Determinants of Health-X (SDoH-X) WG reviewed this protocol in May 2022.

Guidance from the SDoH-X WG includes:

• Replaced or Updated protocol

Expert Review Panel #2 reviewed the measures in the Demographics, Social Environments, and Environmental Exposures domains.

Guidance from the ERP includes:

• Updated protocol

• New Data Dictionary

Back-compatible: there are changes to the Data Dictionary; the previous version of the Data Dictionary and the variable mapping are available in the Toolkit archive.

Protocol Name from Source

U.S. Census Bureau, Census, 1990, 2000, 2010


Recommended data sources include:

U.S. Census Bureau decennial Census (1990, 2000, and 2010), available from https://www.census.gov/programs-surveys/decennial-census/data.html.

U.S. Census Bureau. (1991). 1990 Census of Population and Housing, Summary Tape File 1, Technical Documentation. Available from https://www2.census.gov/programs-surveys/decennial/1990/technical-documentation/complete-tech-docs/summary-files/d1-d90-s100-14-tech.zip.

U.S. Census Bureau. (2001). Census 2000, Summary File 1, Technical Documentation. Available from https://www2.census.gov/programs-surveys/decennial/2000/technical-documentation/complete-tech-docs/summary-files/sf1.pdf.

U.S. Census Bureau. (2012). 2010 Census Summary File 1, Technical Documentation. Available from https://www2.census.gov/programs-surveys/decennial/2010/technical-documentation/complete-tech-docs/summary-file/sf1.pdf.

Census website: https://data.census.gov.

Note that several online sources provide Dissimilarity Index scores for selected metropolitan statistical areas, counties, and school districts (and across Census years). See, for example, the American Communities Project and the School Segregation Project, hosted at Brown University and at the Lewis Mumford Center at the University at Albany (https://www.brown.edu/academics/spatial-structures-in-social-sciences/).

General References

Iceland, J., & Douzet, F. (2006). Measuring racial and ethnic segregation. Hérodote, 122(3), 25-43.

Iceland, J., Weinberg, D. H., & Steinmetz, E. (2002). Racial and ethnic residential segregation in the United States: 1980-2000 (U.S. Census Bureau, Series CENSR-3). Washington, DC: U.S. Government Printing Office.

Massey, D. S., & Denton, N. A. (1988). The dimensions of residential segregation. Social Forces, 67, 281-315.

Morgan, P.M., Murphy, R.F., Willis, R.A., Hubbard, D.W., & Norton, J.M. (1975). Dental health of Louisiana residents based on the ten-state nutrition survey. Public Health Reports, 90(2), 173-178.

Morrill, R.L. (1995). Aging in place, age specific migration and natural decrease. The Annals of Regional Science, 29(1), 41-66.

Reardon, S. F. (2006). A conceptual framework for measuring segregation and its associations with population outcomes. In J. M. Oakes & J. S. Kaufman (Eds.), Methods in social epidemiology (pp. 169-192). San Francisco, CA: Wiley and Sons/Jossey-Bass.

Reardon, S. F., & Firebaugh, G. (2002). Measures of multi-group segregation. Sociological Methodology, 32, 33-67.

Reardon, S. F., Matthews, S. A., O’Sullivan, D., Lee, B. A., Firebaugh, G., Farrell, C. R., & Bischoff, K. (2008). The geographic scale of metropolitan racial segregation. Demography, 45(3), 489-514.

Reardon, S. F., & O’Sullivan, D. (2004). Measures of spatial segregation. Sociological Methodology, 34, 121-162.

Sakoda, J.M. (1981). A generalized index of dissimilarity. Demography, 18(2), 245-50.

Taeuber, K. E., & Taeuber, A. F. (1965). Negroes in cities: Residential segregation and neighborhood change. Chicago, IL: Aldine.

Theil, H. (1972). Statistical decomposition analysis (Vol. 14). Amsterdam, The Netherlands: North-Holland.

White, M. J. (1983). The measurement of spatial segregation. American Journal of Sociology, 88, 1008-1018.

White, M. J. (1986). Segregation and diversity measures in population distribution. Population Index, 52, 198-221.

Wong, D. S. (1993). Spatial indices of segregation. Urban Studies, 30, 559-572.

Protocol ID


Export Variables
Variable ID | Variable Description | dbGaP Mapping
PX211402100000 | Hispanic or Latino | Variable Mapping
PX211402140000 | Hispanic or Latino - Asian alone | N/A
PX211402120000 | Hispanic or Latino - Black or African American alone
PX211402150000 | Hispanic or Latino - Native Hawaiian and Other Pacific Islander alone
PX211402130000 | Hispanic or Latino - American Indian and Alaska Native alone
PX211402160000 | Hispanic or Latino - Some other race alone | N/A
PX211402170000 | Hispanic or Latino - Two or more races alone | N/A
PX211402110000 | Hispanic or Latino - White alone | N/A
PX211402020000 | Not Hispanic or Latino | Variable Mapping
PX211402060000 | Not Hispanic or Latino - Asian alone | N/A
PX211402040000 | Not Hispanic or Latino - Black or African American alone
PX211402070000 | Not Hispanic or Latino - Native Hawaiian and Other Pacific Islander alone
PX211402050000 | Not Hispanic or Latino - American Indian and Alaska Native alone
PX211402080000 | Not Hispanic or Latino - Some other race alone | N/A
PX211402090000 | Not Hispanic or Latino - Two or more races alone | N/A
PX211402030000 | Not Hispanic or Latino - White alone | N/A
PX211402010000 | Total Population | N/A
Social Environments
Measure Name

Race/Ethnic Residential Segregation

Release Date

October 8, 2010


Race/Ethnic Residential Segregation is a measure of neighborhood race/ethnic residential segregation, based on data from the U.S. Census Bureau.


This measure examines various population characteristics to determine the degree of race/ethnic residential segregation, that is, the degree to which various groups reside in different neighborhoods (Iceland & Douzet, 2006). Race/ethnic residential segregation, particularly when resulting from discrimination, can have negative consequences for minority group members. Race/ethnic residential segregation can limit residential choice, constrain economic and educational opportunities by limiting people’s access to good schools and jobs, concentrate poverty in disadvantaged neighborhoods, and contribute to social exclusion and alienation (Massey & Denton, 1988). Residential segregation also affects the nature and quality of intergroup relations in society: segregation reduces contact between groups and is usually thought to both cause and reflect polarization across communities (Reardon, 2006). Following Reardon (2006), a region is segregated to the extent that individuals of different groups live in different neighborhoods within it. That is, the term segregation applies not to individual neighborhoods but to larger regions (e.g., school districts, counties, metropolitan statistical areas).


ACS, American Community Survey, neighborhood, Neighborhood Disadvantage, Residential Segregation, Social Determinants of Health, U.S. Census, environmental health disparities, neighborhood built environment

Measure Protocols
Protocol ID | Protocol Name
211402 | Race/Ethnic Residential Segregation - U.S. Census
211403 | Race/Ethnic Residential Segregation - American Community Survey
211404 | Race/Ethnic Residential Segregation - Separation (S) Index, Unbiased

There are no publications listed for this protocol.