Overview & State Estimates Table of Contents

State Estimates of Uninsured Children, January 1998

Technical Appendix

In this Technical Appendix we document the methodology employed in creating the state estimates presented in this report. We begin with a brief overview and then present a step-by-step description of the procedures that we followed.

A. OVERVIEW

The Census Bureau assigned a sample weight to each of the approximately 50,000 households that responded to the March 1998 Current Population Survey (CPS). This weight indicates the number of households in the population that each sample household represents. The weight incorporates a number of factors in addition to a sample household's probability of being selected into the sample. These factors include an adjustment for nonresponding households and a series of corrections designed to bring the sample into closer agreement with independent estimates of the size, age and sex structure, and racial/ethnic composition of the population. The population controls used in weighting the CPS include only one total that is specific to each state: the number of persons age 16 and older. There are no state controls for the size or composition of the child population or the composition of the adult population. Because the remaining controls are applied only at the national level, CPS state estimates of many characteristics are less accurate than they would be if controls were applied at the state level.

Each observation in the CPS sample was selected to represent households in only one state. For example, a sample household from Maryland with a weight of 4,000 represents 4,000 households in Maryland. When we borrow strength across states by the method of reweighting used here, we create 51 new weights for this Maryland household, one for each of the 50 states and the District of Columbia (referred to here as the 51 states), and we distribute the household weight of 4,000 across them. The sample household continues to represent 4,000 households nationally, but it may now represent only 400 households in Maryland and 3,600 households spread across the other states. The other 3,600 Maryland households that this sample household previously represented are now represented by similar households from other states. The sample weights of all 50,000 CPS households are allocated across the 51 states in such a way that a set of state-specific control totals defined and constructed for this application is reproduced in each state.(1)
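The arithmetic of the Maryland example can be sketched in a few lines of Python. This is only an illustration: the function name split_household_weight, the placeholder state codes, and the shares (10 percent kept in Maryland, the rest spread evenly) are hypothetical and are not taken from the actual reweighting, which chooses the shares so that the state control totals are reproduced.

```python
def split_household_weight(national_weight, state_shares):
    """Split one household's weight across states using shares that sum to 1."""
    if abs(sum(state_shares.values()) - 1.0) > 1e-9:
        raise ValueError("state shares must sum to 1")
    return {state: national_weight * share for state, share in state_shares.items()}


# Hypothetical shares: 10% of the weight stays in Maryland and the other 90%
# is spread evenly over 50 placeholder "states" (S00..S49).
shares = {"MD": 0.10}
shares.update({f"S{i:02d}": 0.90 / 50 for i in range(50)})

state_weights = split_household_weight(4000, shares)
national_total = sum(state_weights.values())  # still 4,000, up to rounding
```

Whatever shares are chosen, the household's 51 state weights add back up to its original national weight.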

The step that creates the 51 state weights for each CPS household is in fact the last of 10 steps. While the assignment of 51 weights per sample household may sound complex, the nine steps that precede it constitute the bulk of the work in producing the reweighted database, and most of the detailed description that follows pertains to those nine steps. First, however, we review the basic design decisions that precede these estimation steps.

The specification of control totals is of great importance in determining how well a particular reweighting of a CPS database accomplishes its objective of supporting accurate state estimates. The process of specifying these control totals involves the interaction between what we would like to control, given the tabulations that we intend to produce, and what we are best able to control, given the data that are available.(2) Control totals developed from external sources with little or no sampling error yield the greatest improvement in the accuracy of state estimates, provided that they are relevant to the tabulations that we wish to produce. Such controls allow us to borrow strength across data sources. Clearly, if we had access to error-free counts of uninsured children in each state, we could greatly improve the accuracy of our state tabulations of uninsured children by poverty level and age. That we lack such controls, however, is one reason why we must use other methods of estimation.

The state control totals and the corresponding household-level characteristics to which the controls are applied are displayed in Table A.1. The variables are grouped according to the source of the control totals. For the variables that capture the age, race, and ethnic structure of a state's child population, we use population totals derived from administrative (mainly vital records) and decennial census data. The totals for these Class A variables have essentially no sampling error, which, as noted above, is a highly desirable property. Unfortunately, there are no such totals for the numbers of children in various poverty categories or the numbers of uninsured children. For those totals, we must rely on sample-based estimates. Rather than using direct sample estimates from the CPS, however, we improve their precision by using empirical Bayes shrinkage methods to produce totals for the Class E variables. These methods average direct sample estimates with predictions from regression models. The dependent variable in such a regression is the direct sample estimate, and the predictors are state characteristics measured by decennial census and administrative records data. Examples of such predictors are the poverty rate according to the census; the infant mortality rate, obtained from vital records; and the ratio of children enrolled in Medicaid (according to Medicaid administrative data) to the total population of children (derived from a combination of census and administrative data).

For the last two control variables listed in Table A.1, the Class D variables, we use direct sample estimates of their totals. The main purpose of including these two variables is to place some limits on the borrowing of strength from reweighting. Specifically, for the one state (the District of Columbia) with no households outside large central cities with substantial black or Hispanic populations, no weight is given to a household if it is not from such a central city.(3) Likewise, for the 21 states that have no large central cities with substantial black or Hispanic populations, no weight is given to a household from such a central city in another state. For example, no Wyoming weight is given to a household from New York City.
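The effect of the two Class D controls can be read as an eligibility rule: a state can give weight to a household only if the state itself has households of the same type (inside or outside a qualifying central city). A minimal sketch of that rule follows. The function name can_receive_weight and its boolean arguments are hypothetical; in the actual procedure the restriction arises because the corresponding Class D control total is zero, not from an explicit rule like this one.

```python
def can_receive_weight(state_has_city_households,
                       state_has_noncity_households,
                       household_in_city):
    """Return True if a state may give weight to a household of this type.

    "City" means a large central city with a substantial black or Hispanic
    population, as defined in footnote 3.
    """
    if household_in_city:
        return state_has_city_households
    return state_has_noncity_households


# A state with no qualifying central city (such as Wyoming in the text)
# cannot give weight to a household from such a city (such as New York City).
wyoming_takes_nyc = can_receive_weight(False, True, household_in_city=True)

# The District of Columbia has no households outside such a city, so it
# cannot give weight to a household that is not from one.
dc_takes_noncity = can_receive_weight(True, False, household_in_city=False)
dc_takes_city = can_receive_weight(True, False, household_in_city=True)
```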

TABLE A.1
CONTROL VARIABLES/TOTALS USED IN REWEIGHTING
Household Control Variable | State Control Total
Class A: Variables for which we use Administrative estimates of totals
Number of children age 0 | Population age 0
Number of children ages 1-5 | Population ages 1-5
Number of children ages 6-13 | Population ages 6-13
Number of children ages 14-18 | Population ages 14-18
Number of Hispanic children ages 0-18 | Hispanic population ages 0-18
Number of non-Hispanic black children ages 0-18 | Non-Hispanic black population ages 0-18
Class E: Variables for which we use Empirical Bayes shrinkage estimates of totals
Number of children < 50% FPL | Number of children < 50% FPL
Number of children 50 to < 100% FPL | Number of children 50 to < 100% FPL
Number of children 100 to < 200% FPL | Number of children 100 to < 200% FPL
Number of children 200 to < 350% FPL | Number of children 200 to < 350% FPL
Number of uninsured children < 100% FPL | Number of uninsured children < 100% FPL
Number of uninsured children 100 to < 200% FPL | Number of uninsured children 100 to < 200% FPL
Number of uninsured children 200% FPL or greater | Number of uninsured children 200% FPL or greater
Class D: Variables for which we use Direct sample estimates of totals
Indicator that household is in a large central city with a substantial black or Hispanic population | Number of households in large central cities with substantial black or Hispanic populations
Indicator that household is not in a large central city with a substantial black or Hispanic population | Number of households not in large central cities with substantial black or Hispanic populations

The steps needed to derive control totals can become complex for at least two reasons. First, empirical Bayes estimation is itself complex and may also include steps that entail elaborate operations--such as smoothing estimated variances. Second, if a state control is obtained from an external source or by using empirical Bayes estimation, its introduction as a control is likely to change--and generally improve--the estimates of other totals to which it is related. For example, the CPS does not control the size of the Hispanic population at the state level, and Hispanic children tend to have higher uninsured rates than non-Hispanic children. By introducing estimates of the state Hispanic population as controls, we may improve the precision of the state estimates of uninsured children that we also want to use as controls. Rather than introducing all of the controls simultaneously in one step, it is desirable to introduce them sequentially so that the controls introduced at one step can allow us to obtain better estimates of controls that can be introduced--along with the earlier controls--at a later step.

B. STEP-BY-STEP PROCEDURE

Development of the reweighted database required 10 steps:

  1. Derive estimates of Class A totals
  2. Adjust the weights within each state to reproduce the totals derived in Step 1
  3. Derive direct sample estimates of Class E totals using the weights from Step 2
  4. Select regression models to predict the Class E totals
  5. Derive empirical Bayes shrinkage estimates of Class E totals
  6. Adjust the weights within each state to reproduce the totals from Steps 1 and 5
  7. Derive direct sample estimates of Class D totals using the weights from Step 6
  8. Obtain adjusted totals for the Class A variables pertaining to numbers of Hispanic children and non-Hispanic black children
  9. Adjust the weights within each state to reproduce the totals from Step 1 (for the first four Class A variables), Step 8 (for the last two Class A variables), Step 5 (for the Class E variables), and Step 7 (for the Class D variables)
  10. Reweight the March 1998 CPS database from Step 9 to borrow strength across states, using the control totals from Step 1 for Class A variables, Step 5 for Class E variables, and Step 7 for Class D variables

We describe the 10 steps in detail below.

1. Derive Estimates of Class A Totals

The source of Class A totals was the Census Bureau's state population estimates by age, race, sex, and Hispanic origin. These estimates are based on the most recent decennial census and carried forward by a combination of vital statistics and other administrative data.

The population estimates published by the Census Bureau are sometimes described as "census-level" estimates because they are intended to represent the population counts that would be obtained if a decennial census were conducted. As is well known, there is a net undercount of the population by the census when it is actually conducted, and there would surely be a net undercount if a census were conducted sometime between 1990 and 2000. The Census Bureau's estimate of what the net undercount would have been had a census been conducted in, say, 1997 is the estimated undercount in the 1990 census. Accordingly, the Bureau has developed and published a "net population adjustment matrix" that contains for each state the estimated undercount by single year of age, sex, race, and Hispanic origin. When the Bureau publishes population estimates, the net undercounts are subtracted from the Bureau's best estimates of the actual population totals to obtain the published totals. To develop adjusted population estimates, we "undid" this last step, adding the net undercounts to the published population totals.

The published estimates refer to July 1 of each year. We averaged successive July 1 estimates to obtain estimates for January 1, the end of the reference period for much of the data collected in the March CPS.(4)
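The two adjustments in this step, adding back the net undercount and averaging successive July 1 estimates, amount to simple arithmetic. The sketch below illustrates them for one population cell. The function names and all of the numbers are hypothetical; they are not values from the Census Bureau's estimates or its net population adjustment matrix.

```python
def adjusted_population(published_total, net_undercount):
    """Undo the subtraction of the net undercount from a published total."""
    return published_total + net_undercount


def january_1_estimate(july_1_before, july_1_after):
    """Average two successive July 1 estimates to approximate January 1."""
    return (july_1_before + july_1_after) / 2


# Hypothetical published totals and net undercounts for one state and age group.
july_1997 = adjusted_population(published_total=1_300_000, net_undercount=26_000)
july_1998 = adjusted_population(published_total=1_320_000, net_undercount=26_000)

january_1998 = january_1_estimate(july_1997, july_1998)
```

Averaging works here because January 1 falls midway between the two July 1 reference dates.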

2. Control the Weights to the Totals Derived in Step 1

Applying the controls from Step 1 may alter individual state estimates of uninsured children and low-income children. Before developing empirical Bayes estimates of uninsured children and low-income children, therefore, we "raked" the CPS weights to the Class A totals derived in Step 1. Raking is a widely used procedure for adjusting sample weights. For a specified set of characteristics of the sampled population, it brings weighted sums obtained from the sample into agreement with totals obtained from external sources. The raking was done within each state--that is, there was no borrowing of strength across states at this point.
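For readers unfamiliar with raking, the following Python sketch shows one simple variant: it cycles through the control totals and, for each one, rescales the weights of the households that contribute to that total until all weighted sums agree with the controls. It is an illustration under stated assumptions, not the program used for this report. The function name rake, the two coarse age groups, and the sample data are hypothetical, and the convergence settings are arbitrary.

```python
def rake(weights, household_counts, controls, max_iter=200, tol=1e-8):
    """Iteratively scale weights so weighted sums match each control total."""
    w = list(weights)
    for _ in range(max_iter):
        largest_change = 0.0
        for name, target in controls.items():
            counts = household_counts[name]
            current = sum(wi * ci for wi, ci in zip(w, counts))
            if current <= 0:
                raise ValueError(f"no sample households contribute to {name!r}")
            ratio = target / current
            largest_change = max(largest_change, abs(ratio - 1.0))
            # Only households that contribute to this total are rescaled.
            w = [wi * ratio if ci > 0 else wi for wi, ci in zip(w, counts)]
        if largest_change < tol:
            break
    return w


# Hypothetical state sample: four households with equal starting weights.
start = [1000.0, 1000.0, 1000.0, 1000.0]
counts = {
    "children_0_5": [1, 0, 1, 0],   # children ages 0-5 in each household
    "children_6_18": [0, 2, 1, 0],  # children ages 6-18 in each household
}
controls = {"children_0_5": 2400.0, "children_6_18": 3300.0}

raked = rake(start, counts, controls)
```

The fourth household has no children, so none of these controls touches its weight. The third household contributes to both totals, which is why more than one pass is needed.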

In an effort to avoid extremely large upward adjustments to weights in states with small numbers of Hispanics or blacks, we used four different raking models: (1) rake to all totals except the totals for Hispanics and non-Hispanic blacks, (2) rake to all totals except the total for Hispanics, (3) rake to all totals except the total for non-Hispanic blacks, and (4) rake to all totals. In general, we used the first model if a state has few Hispanics and non-Hispanic blacks. We used the second and third models if a state has relatively few Hispanics or relatively few non-Hispanic blacks, respectively. We used the fourth model for the remaining states.(5)
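The choice among the four raking models can be written as a small decision rule. The sketch below generalizes the example given in footnote 5, which uses a threshold of ten expected CPS households; treating that threshold as applying symmetrically to both groups is an assumption, and the function name choose_raking_model is hypothetical.

```python
def choose_raking_model(expected_hispanic_households,
                        expected_black_households,
                        threshold=10):
    """Pick raking model 1-4 from the expected sample sizes in a state."""
    few_hispanic = expected_hispanic_households < threshold
    few_black = expected_black_households < threshold
    if few_hispanic and few_black:
        return 1  # rake to all totals except Hispanic and non-Hispanic black
    if few_hispanic:
        return 2  # rake to all totals except Hispanic
    if few_black:
        return 3  # rake to all totals except non-Hispanic black
    return 4      # rake to all totals


# The footnote's example: fewer than ten expected households with Hispanic
# children, ten or more with non-Hispanic black children.
model = choose_raking_model(expected_hispanic_households=6.5,
                            expected_black_households=42.0)
```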

3. Derive Direct Sample Estimates of Class E Totals

Using the weights from Step 2, we calculated direct sample estimates of the Class E totals, which are needed in Steps 4 and 5.

We estimated percentages rather than counts (that is, the percentage uninsured rather than the total number) in order to standardize for state population size, which is necessary for the next two steps. For each of the four income variables, the denominator of the percentage is the total number of children. For the three uninsured variables, the denominator is the total number of children in the indicated poverty category.
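The two kinds of denominators can be illustrated with a short weighted-percentage calculation. The sketch below is a simplified example: the function name weighted_percentage and the three sample households are hypothetical, and a single poverty category (below 100 percent of FPL) stands in for the categories in Table A.1.

```python
def weighted_percentage(weights, numerator_counts, denominator_counts):
    """Weighted numerator total as a percentage of the weighted denominator."""
    numerator = sum(w * n for w, n in zip(weights, numerator_counts))
    denominator = sum(w * d for w, d in zip(weights, denominator_counts))
    return 100.0 * numerator / denominator


# Hypothetical households in one state (weights are from the prior raking).
weights = [1000.0, 2000.0, 1500.0]
children = [2, 1, 3]              # all children in the household
poor_children = [2, 0, 1]         # children below 100% FPL
uninsured_poor = [1, 0, 1]        # uninsured children below 100% FPL

# Income variable: the denominator is all children.
pct_poor = weighted_percentage(weights, poor_children, children)

# Uninsured variable: the denominator is children in that poverty category.
pct_uninsured_among_poor = weighted_percentage(weights, uninsured_poor, poor_children)
```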

We estimated variances and covariances for the direct sample estimates using a jackknife estimator, treating the CPS rotation groups as replicate samples. These estimates are required for the calculation of empirical Bayes estimates in Step 5.
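A delete-one-group jackknife is one common way to use replicate samples for variance estimation: the estimate is recomputed with each group left out in turn, and the spread of those replicate estimates is scaled into a variance. The sketch below shows that general technique for a single percentage. It is not necessarily the exact estimator used for this report, it omits covariances, and the function names and the four small groups of records are hypothetical (the CPS sample has eight rotation groups).

```python
def percent_uninsured(records):
    """Weighted percentage uninsured from (weight, uninsured_flag) records."""
    total = sum(w for w, _ in records)
    uninsured = sum(w for w, flag in records if flag)
    return 100.0 * uninsured / total


def jackknife_variance(estimator, records_by_group):
    """Delete-one-group jackknife variance of an estimator."""
    groups = list(records_by_group)
    g = len(groups)
    if g < 2:
        raise ValueError("need at least two replicate groups")
    replicates = []
    for left_out in groups:
        kept = [rec for name in groups if name != left_out
                for rec in records_by_group[name]]
        replicates.append(estimator(kept))
    center = sum(replicates) / g
    return (g - 1) / g * sum((r - center) ** 2 for r in replicates)


# Hypothetical records: (weight, 1 if uninsured else 0), by rotation group.
sample = {
    1: [(1000.0, 1), (1000.0, 0)],
    2: [(1000.0, 0), (1000.0, 0)],
    3: [(1000.0, 1), (1000.0, 1)],
    4: [(1000.0, 0), (1000.0, 1)],
}
variance = jackknife_variance(percent_uninsured, sample)
```

Because the estimate is a ratio of weighted sums, the weights of the retained groups do not need to be rescaled when a group is dropped; a rescaling factor would cancel between numerator and denominator.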

4. Select Regression Models to Predict the Class E Totals

In developing regression models to predict state income distributions and uninsured rates, we considered a wide range of potential predictors, summarized in Table A.2. We selected models based on their predictive abilities. In addition, we checked for and did not find strong evidence of correctable, persistent bias in the predictions for groups of states defined by diverse characteristics, such as population size, percent Hispanic, and the other variables considered as predictors.

Regression models predicting the Class E totals were estimated for the March CPS samples for 1995, 1996, 1997, and 1998 so that we could evaluate the performance of alternative models in different years and select final model specifications based on their fit across the four years.


TABLE A.2

POTENTIAL PREDICTORS EVALUATED IN REGRESSION MODELS FOR POVERTY LEVEL AND UNINSURED RATE
 
Characteristics of and Participation in Social Welfare Programs
Participation in:
Food Stamp Program
National School Lunch Program
Supplemental Security Income
Medicaid
Unemployment Insurance
Fraction of children eligible for Medicaid by poverty category
Age distribution of children enrolled in Medicaid
 
Income and Poverty
Poverty rate among federal tax return filers and the nonfiler rate
Per capita income (from National Income and Product Accounts)
Median household income (from census)
Percentage of population by poverty category (census)
Percentage of child population by poverty category (census)
 
Demographic Characteristics of Population
Population total and population growth
Racial/ethnic distribution--percentage black, percentage Hispanic
Migration--percentage noncitizen (census), net international migration rate
Urban/rural distribution (census)
 
Health and Vital Statistics
Immunization rate
Infant mortality rate and low birth weight rate
Child death rate and teen violent death rate
Teen birth rate
 
Employment and Education
Proportion of jobs by sector (e.g., agriculture, manufacturing, services, government)
Proportion of jobs in small establishments
Proportion of adults who are self-employed (census)
Educational attainment of adults (no HS diploma, at least a BA) (census)
 
Living Arrangements
Percentage of children by number and employment status of parents in household (census)
Percentage of children institutionalized (census)
Percentage of nonelderly persons in nonfamily households (census)
Percentage of households with no children or nonfamily (census)

We selected a single best model for all four of the Class E poverty variables (that is, the percentage of children in each of the poverty classes listed in Table A.1). This model included the following predictors:

For the three Class E insurance coverage variables (the percentage uninsured in each of three poverty classes, which are listed in Table A.1) we identified separate best models; however, all three models included the following predictors:

The best model for the first insurance coverage variable, the percentage uninsured among children below 100 percent of poverty, also included the following predictors:

The best model for the second insurance coverage variable, the percentage uninsured among children between 100 and 200 percent of poverty, included the following predictors (in addition to the three listed above):

The best model for the third insurance coverage variable, the percentage uninsured among children at or above 200 percent of poverty, included the following predictor (in addition to the three that were included in all three models):

The models have reasonable face validity; that is, the predictors have plausible relationships, generally, to the variables being predicted.

5. Derive Empirical Bayes Shrinkage Estimates of Class E Totals

While regression models were estimated for each of four years, empirical Bayes estimates were ultimately needed for just March 1998. We estimated the four poverty variables and three uninsured variables as weighted averages of the direct sample estimates calculated in Step 3 and regression predictions from the models selected in Step 4. The relative weighting of the direct sample estimates and regression predictions varied by state, depending on the state-specific variances of the direct sample estimates and the overall fit of the regression models (which does not vary by state).
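The idea of a state-specific weighted average can be illustrated with a univariate simplification. In the sketch below, the weight on the direct sample estimate is the ratio of the model error variance to the sum of the model error variance and the state's sampling variance, so a noisier direct estimate is pulled more strongly toward the regression prediction. This is only a sketch of the general shrinkage idea: the estimator used for this report also draws on covariances among the variables, and the function name and all numbers here are hypothetical.

```python
def shrinkage_estimate(direct, prediction, sampling_variance, model_variance):
    """Weighted average of a direct estimate and a regression prediction."""
    weight_on_direct = model_variance / (model_variance + sampling_variance)
    return weight_on_direct * direct + (1.0 - weight_on_direct) * prediction


# Hypothetical uninsured percentages for two states that share the same direct
# estimate and regression prediction but differ in sampling variance.
small_sample_state = shrinkage_estimate(direct=20.0, prediction=14.0,
                                        sampling_variance=9.0, model_variance=3.0)
large_sample_state = shrinkage_estimate(direct=20.0, prediction=14.0,
                                        sampling_variance=1.0, model_variance=3.0)
```

In this example the state with the larger sampling variance ends up closer to the regression prediction (15.5) than the state with the smaller sampling variance (18.5).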

We obtained estimated counts from the estimated percentages. We used estimates of the population ages 0 to 18 from Step 1 to convert the empirical Bayes estimates of poverty percentages to poverty counts. We then ratio-adjusted the state counts so that they would sum to direct sample estimates of national totals (from Step 3) for each of the four poverty variables. A ratio adjustment of this kind is standard practice in small area estimation; it is necessary because the counts derived from the empirical Bayes estimates do not necessarily sum to the national totals. We used the adjusted state poverty counts to convert the empirical Bayes estimates of uninsured percentages to uninsured counts, and we ratio-adjusted the state estimates of these three variables to the direct sample estimates of national totals.
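The conversion from percentages to counts and the ratio adjustment to a national total can be sketched as follows. The function names, the two placeholder states, and every number are hypothetical; the example uses one poverty variable, whereas the step described above repeats the adjustment for each of the four poverty variables and then for the three uninsured variables.

```python
def percentages_to_counts(percentages, base_counts):
    """Convert state percentages to counts using state base populations."""
    return {s: percentages[s] / 100.0 * base_counts[s] for s in percentages}


def ratio_adjust(state_counts, national_total):
    """Scale state counts by one factor so they sum to the national total."""
    factor = national_total / sum(state_counts.values())
    return {s: c * factor for s, c in state_counts.items()}


# Hypothetical shrinkage percentages and child populations for two states.
pct_poor = {"A": 20.0, "B": 10.0}
children = {"A": 1_000_000, "B": 500_000}

counts = percentages_to_counts(pct_poor, children)   # A: 200,000; B: 50,000
adjusted = ratio_adjust(counts, national_total=260_000)
```

Every state count is multiplied by the same factor, so the adjustment changes the level of the state estimates but not their relative sizes.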

6. Rake Weights to Totals from Steps 1 and 5

Using the four raking models that we applied in Step 2, but modified to include the Class E variables, we raked the weights within each state to the totals obtained in Steps 1 and 5. This step was necessary so that the Class D controls calculated as direct sample estimates in Step 7 would be consistent with the Class A and Class E controls.

7. Derive Direct Sample Estimates of Class D Totals

Using the weights obtained from Step 6, we calculated direct sample estimates of the two Class D totals in each state.

8. Obtain Adjusted Totals for the Class A Variables

The Class A controls pertaining to the numbers of Hispanic children and non-Hispanic black children could not be applied fully in the raking of Steps 2 and 6 because some states had too few sample observations in one or both of these two groups. At this point, then, the weighted sums of Hispanic children and non-Hispanic black children do not agree with the Step 1 controls at the national level. To correct this problem, we grouped the states in which the Hispanic control could not be applied earlier, and we ratio-adjusted the direct sample estimates of Hispanic children for this group of states as a whole so that the adjusted totals sum to the Step 1 totals for the group. We then repeated the process to obtain adjusted totals of non-Hispanic black children.
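The group-level adjustment can be sketched as a single scaling factor applied to the states in the group. In the sketch below, the function name adjust_group_totals, the placeholder state labels, and the numbers are all hypothetical; states X and Y stand for states where the Hispanic control could not be applied earlier, and state Z is outside the group.

```python
def adjust_group_totals(direct_estimates, step1_totals, group_states):
    """Scale direct estimates for a group of states to the group's Step 1 total."""
    group_direct = sum(direct_estimates[s] for s in group_states)
    group_control = sum(step1_totals[s] for s in group_states)
    factor = group_control / group_direct
    return {s: direct_estimates[s] * factor for s in group_states}


# Hypothetical numbers of Hispanic children.
direct = {"X": 900.0, "Y": 2100.0, "Z": 50_000.0}
step1 = {"X": 1200.0, "Y": 2400.0, "Z": 52_000.0}

adjusted_group = adjust_group_totals(direct, step1, group_states=["X", "Y"])
```

The adjusted totals match the Step 1 total for the group as a whole, but an individual state's adjusted total need not equal its own Step 1 total.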

9. Rake Weights within Each State

Within each state we raked the weights to totals derived from Step 1 (for the first four Class A variables), Step 8 (for the last two Class A variables), Step 5 (for the Class E variables), and Step 7 (for the Class D variables). In this step, there is just one raking model. We do not have to treat states with low percentages of Hispanics or blacks differently because we are raking to the state totals created in Step 8 rather than the Step 1 totals for Hispanic children and non-Hispanic black children.

10. Reweight the March 1998 CPS Database to Borrow Strength Across States

Using the control totals from Step 1 for Class A variables, Step 5 for Class E variables, and Step 7 for Class D variables, we applied the reweighting procedure to obtain 51 state weights for each sample household. With this procedure there are two constraints on the state weights: (1) all control totals must be satisfied for all states and (2) for each household, the national weight given to the household after reweighting--that is, the sum of the household's state weights--must equal the weight given to the household at the conclusion of Step 9. These constraints and the maximum likelihood estimation algorithm are described in detail in Schirm and Zaslavsky (1997).
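The two constraints can be stated as a simple check on a candidate set of state weights. The sketch below verifies them for a toy problem with two households, two states, and a single control variable (number of children). It does not implement the reweighting algorithm itself, and the function name satisfies_constraints and all of the data are hypothetical.

```python
def satisfies_constraints(state_weights, step9_weights, children, controls, tol=1e-6):
    """Check both reweighting constraints for a candidate set of state weights.

    state_weights[h][s] is the weight household h receives in state s.
    """
    # Constraint 2: each household's state weights sum to its Step 9 weight.
    for household, by_state in state_weights.items():
        if abs(sum(by_state.values()) - step9_weights[household]) > tol:
            return False
    # Constraint 1: each state's weighted total reproduces its control total.
    for state, target in controls.items():
        total = sum(state_weights[h][state] * children[h] for h in state_weights)
        if abs(total - target) > tol:
            return False
    return True


# Hypothetical toy problem.
step9_weights = {"h1": 100.0, "h2": 200.0}
children = {"h1": 2, "h2": 1}
controls = {"A": 200.0, "B": 200.0}
candidate = {
    "h1": {"A": 60.0, "B": 40.0},
    "h2": {"A": 80.0, "B": 120.0},
}
ok = satisfies_constraints(candidate, step9_weights, children, controls)
```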


Footnotes:

1. With no other restriction the reweighting problem would not have a unique solution. The original configuration of one weight per sample household could reproduce the control totals, for example. The reweighting algorithm achieves a unique solution by first distributing the sample weights uniformly across the 51 states and then altering these initial weights the least amount that is necessary to reproduce the control totals (Zaslavsky 1988). This solution maximizes the amount of borrowing across states.

2. At a minimum a control variable must have a counterpart that is measured in the CPS. That is, there has to be a corresponding household-level variable to which a control can be applied.

3. A large central city with a substantial black or Hispanic population had, according to 1990 decennial census data, a total population of at least 100,000 and at least 25 percent of the residents classified as black or Hispanic.

4. Because the regression models estimated in Step 4 were to be replicated with four years of data, Steps 1 through 3 had to be performed for these same four years as well--1995 through 1998.

5. More precisely, we calculated for each state the expected number of CPS households with Hispanic children and the expected number with non-Hispanic black children. If, for example, the expected number of CPS households with Hispanic children was less than ten, but the expected number of CPS households with non-Hispanic black children was ten or higher, we used the second raking model.


REFERENCES

Czajka, John L., and Kimball Lewis. "Using National Survey Data to Analyze Children’s Health Insurance Coverage: An Assessment of Issues." Washington, DC: Mathematica Policy Research, May 1999.

Czajka, John L., Margo L. Rosenbach, and Allen L. Schirm. "Uninsured Children in New Jersey: Estimates of Their Number and Characteristics." Washington, DC: Mathematica Policy Research, April 1999.

Irvin, Carol, and John L. Czajka. "Simulation of Medicaid and SCHIP Eligibility: Implications of Findings from 10 States." Washington, DC: Mathematica Policy Research, April 2000.

National Research Council. Small Area Estimates of School Age Children in Poverty, Interim Report 2, Evaluation of Revised 1993 County Estimates for Title I Allocations, edited by Constance F. Citro, Michael L. Cohen, and Graham Kalton. Panel on Estimates of Poverty for Small Geographic Areas, Committee on National Statistics. Washington, DC: National Academy Press, 1998.

Schirm, Allen L., and Alan M. Zaslavsky. "Model-based Microsimulation Estimates for States When State Programs Vary." 1998 Proceedings of the Section on Survey Research Methods. Alexandria, VA: American Statistical Association, 1998.

Schirm, Allen L., and Alan M. Zaslavsky. "Reweighting Households to Develop Microsimulation Estimates for States." 1997 Proceedings of the Section on Survey Research Methods. Alexandria, VA: American Statistical Association, 1997.

Schirm, Allen L., and Cindy Long. "Fund Allocation and Small Area Estimation in the WIC Program." 1995 Proceedings of the Section on Survey Research Methods. Alexandria, VA: American Statistical Association, 1995.

Zaslavsky, Alan. "Representing Local Area Adjustments by Reweighting of Households." Survey Methodology, vol. 14, no. 2, December 1988.


Last updated September 17, 2000