Protocol - Race/Ethnic Residential Segregation - Separation (S) Index, Unbiased


The protocol is based on extracting data from the U.S. Census Bureau on a set of variables related to the concept of residential segregation. Residential segregation describes the distribution of different race/ethnic groups across smaller areal units (e.g., census tracts) within larger areas (e.g., counties or metropolitan statistical areas [MSAs]). The Separation Index (also known as eta squared) is one of the most commonly used race/ethnic residential segregation measures. All the relevant variables are available from the decennial censuses or the American Community Survey (ACS) 5-year estimates. Once the data are extracted, the Separation Index can be calculated.

Which data set should be used?

Users interested in using measures of residential segregation in conjunction with the Neighborhood Concentrated Disadvantage protocol should use data from the ACS 5-year estimates for consistency of data sources. However, users should be aware that segregation index values calculated using sample data will be inflated in comparison with scores calculated using 100% count data. The reason for this is that measures of uneven distribution register deviations from “parity,” and these will be more common when using sample data because of the effect of sampling error. 
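The inflation described above can be illustrated with a small simulation. This is a minimal sketch, not part of the protocol: it assumes a simple independent 10% sample of persons (the actual ACS sample design is more complex), uses a common variance-ratio form of the two-group S, and all function names and counts are hypothetical.

```python
import random

def separation_index(units):
    """Two-group S (variance ratio) from (n1, n2) counts per areal unit."""
    units = [(a, b) for a, b in units if a + b > 0]  # ignore empty units
    n1 = sum(a for a, _ in units)
    n2 = sum(b for _, b in units)
    total = n1 + n2
    p = n1 / total  # reference-group proportion in the larger area
    num = sum((a + b) * (a / (a + b) - p) ** 2 for a, b in units)
    return num / (total * p * (1 - p))

def draw_sample(units, rate, rng):
    """Assumed design: keep each person independently with probability `rate`."""
    sampled = []
    for a, b in units:
        kept_a = sum(1 for _ in range(a) if rng.random() < rate)
        kept_b = sum(1 for _ in range(b) if rng.random() < rate)
        sampled.append((kept_a, kept_b))
    return sampled

rng = random.Random(0)
full_count = [(800, 200)] * 50            # perfectly even distribution, so S = 0
sample = draw_sample(full_count, 0.10, rng)

s_full = separation_index(full_count)
s_sample = separation_index(sample)       # positive purely because of sampling error
print(s_full, s_sample)
```

Because every tract in the full count has exactly the area-wide composition, any nonzero value from the sample reflects sampling error alone.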

Users who are interested in using the 100% count data rather than estimates or making comparisons of residential segregation in metropolitan areas across time (e.g., 1990 vs. 2010) should use data from the decennial censuses. The protocol here describes the process using 5-year estimates from the ACS. Users interested in using the decennial census data should refer to the alternate protocol.

Specific Instructions

Assuming that information on current address (see PhenX Demographics domain, Current Address measure) has been collected for a study respondent, it is possible to use geocoding to link the address of a study participant to his or her local neighborhood (or other larger geographical unit).

It is necessary to extract data for smaller units (e.g., census tracts) to calculate the Separation Index (and the Dissimilarity Index, for comparison) for each larger unit. To aid comparability between studies, the Social Environment Working Group recommends that researchers set the smaller area to the census tract and the larger area to the metropolitan statistical area.
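Grouping smaller units under their larger unit can be sketched as follows. This example relies on the fact that an 11-digit census tract GEOID begins with the 5-digit state and county FIPS code. The optional county-to-area lookup (for example, county to MSA), the GEOIDs, and the counts are hypothetical placeholders that a researcher would replace with an actual delineation file and extracted data.

```python
from collections import defaultdict

def group_tracts(tract_counts, county_to_area=None):
    """Group tract-level (n1, n2) counts by larger area.

    tract_counts: dict mapping 11-digit tract GEOID -> (n1, n2).
    county_to_area: optional dict mapping 5-digit county FIPS -> area label
        (hypothetical lookup, e.g. county -> MSA); if omitted, tracts are
        grouped by county.
    """
    grouped = defaultdict(list)
    for geoid, counts in tract_counts.items():
        if len(geoid) != 11 or not geoid.isdigit():
            raise ValueError(f"not an 11-digit tract GEOID: {geoid!r}")
        county = geoid[:5]  # state (2 digits) + county (3 digits)
        area = county_to_area.get(county) if county_to_area else county
        if area is not None:  # skip tracts outside any listed area
            grouped[area].append(counts)
    return dict(grouped)

# Illustrative GEOIDs and counts only; not real data.
tracts = {
    "48201100000": (900, 100),
    "48201210000": (400, 600),
    "48113000100": (700, 300),
}
by_county = group_tracts(tracts)
by_area = group_tracts(tracts, {"48201": "Area A", "48113": "Area B"})
```

Each resulting list of tract counts can then be passed to whichever index calculation is used for the larger unit.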

Additionally, researchers can use the census variables to calculate more basic diversity scores at the census-tract level, such as the entropy index.
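A tract-level entropy score can be sketched as below. This assumes the common definition E = sum over groups of share × ln(1/share), with empty groups contributing zero. The optional normalization by ln(M) is a convenience added here, not something the protocol specifies.

```python
import math

def entropy_score(group_counts, normalize=False):
    """Entropy diversity score for one tract from a list of group counts."""
    total = sum(group_counts)
    if total == 0:
        return 0.0
    score = 0.0
    for n in group_counts:
        if n > 0:  # a group with no members contributes nothing
            share = n / total
            score += share * math.log(1 / share)
    if normalize and len(group_counts) > 1:
        score /= math.log(len(group_counts))  # scale to the 0-1 range
    return score

even_two_groups = entropy_score([500, 500])   # equals ln(2), the two-group maximum
single_group = entropy_score([1000, 0, 0])    # equals 0, no diversity
```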

The most common conceptualization of residential segregation is based on the dimension of evenness (Massey & Denton, 1988; Reardon & O’Sullivan, 2004; Taeuber & Taeuber, 1965; White, 1986). The most widely used measure of residential segregation is the Dissimilarity Index, sometimes referred to as D. This measure is computationally straightforward to calculate from census data, and although D was originally applied in a comparison of two population groups (most often whites and blacks), recent papers have extended this measure to the multiple race/ethnic group case (Reardon & Firebaugh, 2002). Others have extended the two populations and multigroup measure by incorporating the spatial dimension using data from adjacent or proximate census units and weighting accordingly (Reardon et al., 2008; Reardon & O’Sullivan, 2004; White, 1983; Wong, 1993). 

The WG recommends that investigators calculate values of the Separation Index (S) to supplement and compare with values of D. S has been used extensively in previous research but under a variety of names (e.g., the variance ratio, eta squared, Zoloth’s S, Coleman’s r, and more). S consistently fares better than D in reviews on technical criteria for segregation measures (Fossett, 2017; Reardon & Firebaugh, 2002; White, 1986; Zoloth, 1976) and is far less susceptible than D to the problem of index bias (Fossett, 2017; Winship, 1977). 

Massey and Denton (1988) state that “residential segregation is the degree to which two or more groups live separately from one another, in different parts of the urban environment.” Based on this definition, it is useful to calculate and compare values of S and D because values of S provide a more certain indication that uneven distribution involves separation of groups into ethnically homogeneous areas. In contrast, values of D can be high even when the groups in the comparison live together in areas that differ only modestly on ethnic composition (Fossett, 2017). This can occur because D responds strongly to small departures from parity that do not involve separation of groups into ethnically homogeneous areas. The only way to identify this pattern is to calculate values of D and S and compare them (Fossett, 2017). 
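The contrast between D and S can be seen in a small constructed example. The sketch below uses common two-group forms of each index (D as half the summed absolute differences in unit shares, S as the variance ratio), which are assumed here to correspond to the protocol's standard formulas. The tract counts are invented for illustration: a small comparison group lives in only half of the tracts, yet every tract is at least 98% reference group.

```python
def dissimilarity_index(units):
    """Two-group D from (n1, n2) counts per areal unit."""
    n1 = sum(a for a, _ in units)
    n2 = sum(b for _, b in units)
    return 0.5 * sum(abs(a / n1 - b / n2) for a, b in units)

def separation_index(units):
    """Two-group S (variance ratio / eta squared) from (n1, n2) counts."""
    n1 = sum(a for a, _ in units)
    n2 = sum(b for _, b in units)
    total = n1 + n2
    p = n1 / total
    num = sum((a + b) * (a / (a + b) - p) ** 2 for a, b in units if a + b > 0)
    return num / (total * p * (1 - p))

# Five tracts are 98% reference group, five are 100% reference group.
tracts = [(980, 20)] * 5 + [(1000, 0)] * 5
d_value = dissimilarity_index(tracts)   # about 0.505
s_value = separation_index(tracts)      # about 0.010
print(round(d_value, 3), round(s_value, 3))
```

Here D is above 0.5 although no tract differs from the area-wide composition by more than one percentage point, while S stays near zero because the groups are not separated into homogeneous areas.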

When comparing the standard S-index to the standard D-index, if the calculated numbers are similar, either calculation can be used. However, if the numbers differ, it is recommended to use the standard S-index formula. Further comparison can be made between the standard S-index and unbiased S-index. If the calculated numbers are similar, either calculation can be used. However, if the numbers differ, it is recommended to use the unbiased S-index formula. 


The ACS data used in this protocol can be accessed by using Excel to read the Summary Files or using the “Download Center” at the U.S. Census Bureau’s American FactFinder portal at http://factfinder.census.gov. Users can find additional information on these tools at the following locations:

Using Excel to Access Summary Files: http://www2.census.gov/programs-surveys/acs/summary_file/2014/documentation/tech_docs/ACS_SF_Excel_Import_Tool.pdf

Using the Download Center: http://www2.census.gov/programs-surveys/acs/summary_file/2014/documentation/tech_docs/How_to_Access_ACS_Estimates_AFF.pdf

The technical documentation for the ACS summary files is available online at http://www.census.gov/programs-surveys/acs/technical-documentation.html. Select the “Summary File Documentation” link, and then select the data set of interest. Users not familiar with Census data should consult the technical materials.

The key race/ethnicity data in the ACS are found in "Table B03002: Hispanic or Latino by Race." This table is preferred over other possible race and race/ethnic tables available, as it provides data on the main race/ethnic groups in the United States and explicitly incorporates data on Hispanic or Latino populations, otherwise not available in the race-only tables.

Variable Code

Variable Name




  Not Hispanic or Latino:
    White alone
    Black or African American alone
    American Indian and Alaska Native alone
    Asian alone
    Native Hawaiian and Other Pacific Islander alone
    Some other race alone
    Two or more races:
      Two races including Some other race
      Two races excluding Some other race, and three or more races
  Hispanic or Latino:
    White alone
    Black or African American alone
    American Indian and Alaska Native alone
    Asian alone
    Native Hawaiian and Other Pacific Islander alone
    Some other race alone
    Two or more races:
      Two races including Some other race
      Two races excluding Some other race, and three or more races

The race/ethnic data are available for all small census geographies—such as census block, census block group, and census tract—and can be easily extracted for almost any geographic level. Note: Although block group data have long been available from the Census File Transfer Protocol site, the Census Bureau did not make block groups available for download at American FactFinder until the release of the 2009-2013 ACS. Information about accessing block group data for earlier years is available at http://www.census.gov/library/video/acs_block_group.html.

Researchers can use the data in this table to easily calculate basic variables (e.g., the percentage of any race and/or ethnicity group) or to combine groups (e.g., all minorities).
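A minimal sketch of these basic calculations for one tract follows. The dictionary keys are hypothetical shorthand for rows of Table B03002 (they are not Census variable codes), the counts are invented, and treating everyone other than non-Hispanic whites as the combined "minority" group is one possible definition assumed for illustration.

```python
def group_percentages(tract):
    """Percentage of each group in one tract, plus a combined minority share.

    tract: dict of hypothetical group labels -> population counts.
    """
    total = sum(tract.values())
    if total == 0:
        return {}
    shares = {group: 100.0 * n / total for group, n in tract.items()}
    # Assumed definition: all groups other than non-Hispanic white.
    shares["all_minority"] = 100.0 * (total - tract.get("nh_white", 0)) / total
    return shares

# Invented counts for a single tract.
example_tract = {"nh_white": 600, "nh_black": 200, "nh_other": 50, "hispanic": 150}
percentages = group_percentages(example_tract)
```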

Unbiased Versions of S via Difference of Means Calculations

Index Score = $\bar{Y}_1 - \bar{Y}_2 = \frac{1}{N_1}\sum_i n_{1i}\, y_i - \frac{1}{N_2}\sum_i n_{2i}\, y_i$

where

n1i and n2i are the counts for the reference and comparison groups, respectively, in spatial unit i,

N1 and N2 are the counts for the reference and comparison groups, respectively, for the larger area as a whole,

yi is a score for “scaled contact with the reference group” assigned on the basis of an index-specific function of the reference group proportion in the population of spatial unit i given by pi = n1i/(n1i+n2i), and

$\bar{Y}_1$ and $\bar{Y}_2$ are the group means for “scaled contact with the reference group”.

In the case of S, the function for assigning scores on scaled contact with the reference group (yi) based on the reference group proportion in the population of spatial unit i (pi) is simple and easy to implement.

For S, yi = pi. Accordingly, S registers the simple group difference in average contact with the reference group. 

S takes a value of 0 when the two groups have identical levels of contact with the reference group. This occurs when the two groups live together in smaller areas in the same proportions seen for the larger area as a whole. S takes a value of 1 when the comparison group has no contact with the reference group and the reference group has contact only with itself. This occurs when the two groups live apart in areas that are homogeneous.

These formulations of S are mathematically equivalent to the “standard” formulas for S given earlier (derivations are provided in Fossett, 2017). They therefore yield scores identical to those obtained using the standard formulas and carry the same bias components.
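The difference of means calculation for S, and its agreement with a standard form, can be sketched as follows. The variance-ratio function is a common expression for S and is assumed here to match the protocol's standard formula. Function names and tract counts are illustrative.

```python
def s_difference_of_means(units):
    """S as the group difference in mean contact with the reference group.

    units: list of (n1, n2) counts, reference group first.
    """
    n1_total = sum(a for a, _ in units)
    n2_total = sum(b for _, b in units)
    mean1 = 0.0  # average y_i experienced by reference-group members
    mean2 = 0.0  # average y_i experienced by comparison-group members
    for a, b in units:
        if a + b == 0:
            continue
        y = a / (a + b)  # for S, y_i = p_i
        mean1 += a * y / n1_total
        mean2 += b * y / n2_total
    return mean1 - mean2

def s_variance_ratio(units):
    """S as the variance ratio (eta squared)."""
    n1_total = sum(a for a, _ in units)
    total = n1_total + sum(b for _, b in units)
    p = n1_total / total
    num = sum((a + b) * (a / (a + b) - p) ** 2 for a, b in units if a + b > 0)
    return num / (total * p * (1 - p))

tracts = [(980, 20)] * 5 + [(1000, 0)] * 5   # invented counts
s_dom = s_difference_of_means(tracts)
s_var = s_variance_ratio(tracts)
```

Both functions return the same score for any set of units with both groups present, including 0 for an even distribution and 1 for complete separation.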

Obtaining Unbiased Index Scores for S

Bias is eliminated from S by calculating the value of pi as follows: 

for members of the reference group, pi = (n1i−1)/(n1i+n2i−1), and

for members of the comparison group, pi = (n1i−0)/(n1i+n2i−1).

The resulting adjusted values of pi are applied as before. The values of S obtained using the adjusted values of pi in the difference of means formula will be free of bias (Fossett 2017).1
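The adjusted calculation can be sketched as below. Two details are assumptions made for this illustration rather than statements from the protocol: units containing a single person are skipped because the adjusted pi is undefined for them, and the group totals used as denominators exclude those skipped individuals. The example counts are constructed to mimic what random pairing of two equal-sized groups would produce on average in two-person units.

```python
def unbiased_separation_index(units):
    """Unbiased S via difference of means with self-contact removed.

    units: list of (n1, n2) counts, reference group first.
    """
    sum1 = sum2 = 0.0
    count1 = count2 = 0
    for n1, n2 in units:
        others = n1 + n2 - 1          # neighbors, excluding the individual
        if others <= 0:
            continue                  # empty unit or singleton: pi undefined
        sum1 += n1 * (n1 - 1) / others   # reference members: (n1-1)/(n1+n2-1)
        sum2 += n2 * n1 / others         # comparison members: n1/(n1+n2-1)
        count1 += n1
        count2 += n2
    if count1 == 0 or count2 == 0:
        raise ValueError("both groups must be present after exclusions")
    return sum1 / count1 - sum2 / count2

def standard_separation_index(units):
    """Unadjusted S via difference of means, for comparison."""
    n1_total = sum(a for a, _ in units)
    n2_total = sum(b for _, b in units)
    mean1 = sum(a * a / (a + b) for a, b in units if a + b > 0) / n1_total
    mean2 = sum(b * a / (a + b) for a, b in units if a + b > 0) / n2_total
    return mean1 - mean2

# One (2,0) unit, two (1,1) units, one (0,2) unit.
random_like = [(2, 0), (1, 1), (1, 1), (0, 2)]
s_standard = standard_separation_index(random_like)   # 0.5
s_unbiased = unbiased_separation_index(random_like)   # 0.0
```

In this constructed case the standard score of 0.5 arises entirely from self-contact in very small units, and the adjusted score is 0. With large, fully homogeneous units both versions return 1.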

The adjustment to pi shown above removes the impact of self-contact on the value of pi. In so doing, it completely eliminates index bias at the point of initial measurement. The basis for this welcome result is simple. The expected value of contact with the reference group among neighbors (excluding the individual under consideration) is unbiased; it is the same for both groups. But the expected value of contact with the reference group arising from self-contact is biased; it is always positive for members of the reference group (larger in value when the counts involved are small) and always zero for members of the comparison group.

Extending the Dissimilarity Index and the Separation Index: The Multigroup Analog

While much early research on segregation looked at two groups (e.g., black and white, or majority and minority), today’s society is multiethnic. Two-group measures are useful but limited for describing complex patterns of segregation. The choice to use a two-group or multigroup D or S depends on the specific question of interest. In a region where the population is composed of three groups (e.g., white non-Hispanic, black non-Hispanic, and Hispanic), we may be interested in

a) segregation between two specific groups (e.g., How segregated are white from black residents?); or

b) segregation among all three groups (e.g., How segregated are white non-Hispanic, black non-Hispanic, and Hispanic residents from each other?).

The two-group measure can still be used by comparing all possible pairs of population groups (Morrill, 1995), but these are not comprehensive, and multiple groups are not treated simultaneously. To address segregation among multiple groups requires a multigroup analog to D (Morgan et al., 1975; Sakoda, 1981). The multigroup analog describes the extent to which two or more population groups are similarly distributed among subareas. The formulas for multigroup dissimilarity (D) and multigroup separation (S) (from Reardon & Firebaugh, 2002) are:


T is total population,

M is the number of groups m,

J is the number of subareas or units j,

tj is number of individuals in subarea j,

πm is the proportion in group m,

πjm is the proportion in group m, of those in unit j, and

I is Simpson’s Interaction Index, given by $I = \sum_{m=1}^{M} \pi_m (1 - \pi_m)$.
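Using the symbols defined above, the multigroup calculations can be sketched in code. The expression for multigroup D follows the form in Reardon & Firebaugh (2002). For multigroup separation, this sketch assumes their normalized exposure index, which is one of the equivalent formulations of S that the protocol's footnote attributes to the seg program; that identification is an assumption here. Function and variable names are illustrative.

```python
def multigroup_indices(units):
    """Return (multigroup D, normalized exposure) for per-unit group counts.

    units: list of equal-length lists, one count per group m in each unit j.
    """
    n_groups = len(units[0])
    t_j = [sum(u) for u in units]                 # unit populations
    total = sum(t_j)                              # T
    pi_m = [sum(u[m] for u in units) / total for m in range(n_groups)]
    interaction = sum(p * (1 - p) for p in pi_m)  # Simpson's I
    if interaction == 0:
        raise ValueError("at least two groups must be present")
    d_index = 0.0
    exposure = 0.0
    for u, t in zip(units, t_j):
        if t == 0:
            continue
        for m in range(n_groups):
            diff = u[m] / t - pi_m[m]             # pi_jm - pi_m
            d_index += t / (2 * total * interaction) * abs(diff)
            exposure += (t / total) * diff ** 2 / (1 - pi_m[m])
    return d_index, exposure

# Three groups living entirely apart: both indices equal 1.
d_apart, s_apart = multigroup_indices([[100, 0, 0], [0, 100, 0], [0, 0, 100]])
# With two groups, the results reduce to the two-group D and S.
d_two, s_two = multigroup_indices([[980, 20]] * 5 + [[1000, 0]] * 5)
```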

In the Stata statistical software package, the command seg (installed by typing "ssc install seg" from within Stata) will compute both two-group and multigroup versions of S (Reardon & Firebaugh, 2002).2

Researchers have extended segregation measures by incorporating the spatial dimension (White, 1983; Wong, 1993; Reardon & O’Sullivan, 2004). Fossett (2017) introduces spatial formulations of S and other popular measures of uneven distribution.

Unbiased versions of multigroup indices have not been developed.

1 There is one further adjustment. Singleton individuals (individuals who happen to be the only member of either group residing in the spatial unit) are excluded from the calculations because the adjusted calculation of pi is undefined for them. In practice, this is a rare occurrence.

2 The seg program calculates S under multiple mathematically equivalent formulations including the “normalized exposure index” and the “squared coefficient of variation index”.



Personnel and Training Required

Knowledge of census data products and websites, such as American FactFinder and/or publicly available data portals (e.g., https://nhgis.org/) and/or commercial geospatial data products, such as those provided by vendors like GeoLytics or Social Explorer.

The extracted data need to be manipulated, and the Separation Index needs to be calculated.

Equipment Needs

Access to a desktop or laptop computer with Internet access to download raw data from the U.S. Census Bureau’s American FactFinder website. Statistical packages (e.g., SPSS, SAS) for data manipulation.

Requirement Category: Required
Major equipment: No
Specialized training: No
Specialized requirements for biospecimen collection: No
Average time of greater than 15 minutes in an unaffected individual: No

Mode of Administration

Secondary Data Analysis

Life Stage

Infant, Toddler, Child, Adolescent, Adult, Senior, Pregnancy


Not applicable; derived from publicly available secondary data

Selection Rationale

The Separation Index (S) provides an objective measure of racial/ethnic residential segregation using U.S. Census Bureau data. A questionnaire that relies on subjective judgment based on retrospective ascertainment is likely to be unreliable.

Winship (1977) established that the Dissimilarity Index (D), and to a lesser extent S, are potentially subject to non-negligible upward bias under certain circumstances. The bias component of D can be large and create misleadingly high values when areal units have small population counts for one or both groups under even distribution. The problem is well known to researchers and has prevented them from assessing segregation involving small groups or from assessing segregation in smaller communities where segregation would need to be assessed using block data with small population counts.

“After-the-fact” adjustments to remove the unwanted impact of bias on index scores have been proposed (e.g., Carrington & Troske, 1997; Winship, 1977), but they do not perform well in practical applications (Fossett, 2017) and have not gained wide use. 

The unbiased version of S is obtained by measuring segregation with the “difference of means” framework introduced in Fossett (2017). This framework casts all widely used measures of uneven distribution as a difference between two group means on scaled contact with the reference group.

A recent methodological study (Fossett, 2017) has introduced formulas for calculating a refined version of D that is “unbiased”; that is, the formulas for the unbiased version yield scores for D that are free of the potentially serious problem of upward index bias discussed in Winship (1977). When index bias is not a problem, they yield scores identical to scores obtained using “standard” formulas. When index bias is a problem, they yield scores that are appropriately lower because upward bias has been eliminated.

Researchers should calculate values of S to supplement and compare with values of D. S has been used extensively in previous research but under a variety of names (e.g., the variance ratio, eta squared, Zoloth’s S, Coleman’s r, and more). S consistently fares better than D in reviews on technical criteria for segregation measures (Fossett, 2017; Reardon & Firebaugh, 2002; White, 1986; Zoloth, 1976) and is far less susceptible than D to the problem of index bias (Fossett, 2017; Winship, 1977). 



caDSR Common Data Elements (CDE): Social Environment Race/Ethnic Residential Segregation Assessment Score (3151013)

Derived Variables


Process and Review

Not applicable

Protocol Name from Source

American Community Survey (ACS), 5-year estimates


Brown University. (2019). Spatial structures in social sciences. Retrieved May 28, 2019, from https://www.brown.edu/academics/spatial-structures-in-social-sciences/

Fossett, M. (2017). New methods for measuring and analyzing segregation. Springer. 

U.S. Census Bureau. (2019). American Community Survey (ACS) data products (5-year estimates). Retrieved May 28, 2019, from http://www.census.gov/programs-surveys/acs

U.S. Census Bureau. (2019). American FactFinder. Retrieved May 28, 2019, from http://factfinder.census.gov

General References

Carrington, W. J., & Troske, K. R. (1997). On measuring segregation in samples with small units. Journal of Business & Economic Statistics, 15(4), 402–409.

Fossett, M. (2017). New methods for measuring and analyzing segregation. Springer.

Iceland, J., & Douzet, F. (2006). Measuring racial and ethnic segregation. Herodote, 122(3), 25–43.

Iceland, J., Weinberg, D. H., & Steinmetz, E. (2002). Racial and ethnic residential segregation in the United States: 1980-2000 (U.S. Census Bureau, Series CENSR-3). Washington, DC: U.S. Government Printing Office.

James, D. R., & Taeuber, K. E. (1985). Measures of segregation. Sociological Methodology, 15, 1–32.

Massey, D. S., & Denton, N. A. (1988). The dimensions of residential segregation. Social Forces, 67, 281–315.

Morgan, P. M., Murphy, R. F., Willis, R. A., Hubbard, D. W., & Norton, J. M. (1975). Dental health of Louisiana residents based on the ten-state nutrition survey. Public Health Reports, 90(2), 173–178.

Morrill, R. L. (1995). Aging in place, age specific migration and natural decrease. Annals of Regional Science, 29(1), 41–66.

Reardon, S. F. (2006). A conceptual framework for measuring segregation and its associations with population outcomes. In J. M. Oakes & J. S. Kaufman (Eds.), Methods in social epidemiology (pp. 169–192). San Francisco, CA: Wiley and Sons/Jossey-Bass.

Reardon, S. F., & Firebaugh, G. (2002). Measures of multi-group segregation. Sociological Methodology, 32, 33–67.

Reardon, S. F., Matthews, S. A., O’Sullivan, D., Lee, B. A., Firebaugh, G., Farrell, C. R., & Bischoff, K. (2008). The geographic scale of metropolitan racial segregation. Demography, 45(3), 489–514.

Reardon, S. F., & O’Sullivan, D. (2004). Measures of spatial segregation. Sociological Methodology, 34, 121–162.

Sakoda, J. M. (1981). A generalized index of dissimilarity. Demography, 18(2), 245–250.

Taeuber, K. E., & Taeuber, A. F. (1965). Negroes in cities: Residential segregation and neighborhood change. Chicago, IL: Aldine.

Theil, H. (1972). Statistical decomposition analysis (vol. 14). Amsterdam, Netherlands: North-Holland.

White, M. J. (1983). The measurement of spatial segregation. American Journal of Sociology, 88, 1008–1018.

White, M. J. (1986). Segregation and diversity measures in population distribution. Population Index, 52, 198–221.

Winship, C. (1977). A revaluation of indexes of residential segregation. Social Forces, 55(4), 1058–1066.

Wong, D. S. (1993). Spatial indices of segregation. Urban Studies, 30, 559–572.

Zoloth, B. S. (1976). Alternative measures of school segregation. Land Economics, 52(3), 278–298.

Protocol ID


Social Environments
Measure Name

Race/Ethnic Residential Segregation

Release Date

May 11, 2020


Race/Ethnic Residential Segregation is a measure of neighborhood race/ethnic residential segregation, based on data from the U.S. Census Bureau.


This measure examines various population characteristics to determine the degree of race/ethnic residential segregation, that is, the degree to which various groups reside in different neighborhoods (Iceland & Douzet, 2006). Race/ethnic residential segregation, particularly when resulting from discrimination, can have negative consequences for minority group members. Race/ethnic residential segregation can limit residential choice, constrain economic and educational opportunities by limiting people’s access to good schools and jobs, serve to concentrate poverty in disadvantaged neighborhoods, and contribute to social exclusion and alienation (Massey & Denton, 1988). Residential segregation also affects the nature and quality of intergroup relations in society: segregation reduces contact between groups and is usually thought to cause and reflect polarization across communities (Reardon, 2006). Following Reardon (2006), a region is segregated to the extent that individuals of different groups live in different neighborhoods within the region. That is, the term segregation does not apply to individual neighborhoods but to larger regions (e.g., school districts, counties, metropolitan statistical areas).


ACS, American Community Survey, Neighborhood, Neighborhood Disadvantage, Residential Segregation, Social Determinants of Health, U.S. Census