Approaches to Evaluating Welfare Reform:
Lessons from Five State Demonstrations

Chapter 5:
Data Collection

This chapter addresses three questions concerning data collection in impact evaluations of welfare reform:

  1. What types of baseline data (predemonstration data) are needed on each case, and what are the best methods for collecting these data?
  2. What is the role of follow-up surveys in impact evaluations in which administrative data are available on many key outcomes? What issues should they address? What are appropriate sample sizes and follow-up periods?
  3. What are appropriate standards for quality surveys, in terms of survey administration, response rates, and maintaining sample over time in longitudinal surveys? How should state officials monitor the progress of surveys?

This chapter does not consider the appropriate data collection strategies for the analysis of program implementation or for other evaluation objectives. This focus should be kept in mind in reviewing the recommendations made.

A. BASELINE DATA

Baseline data are data on characteristics and experiences of experimental (or demonstration) and control (or comparison) group members, before the intervention occurs for the experimental or demonstration group, and before the comparable follow-up period begins for the control or comparison group. Such data may be obtained either from administrative records or special data collection efforts. Baseline data are critical to nonexperimental evaluations, since they are needed to control for preexisting differences between the demonstration and comparison groups. In an experimental evaluation, random assignment ensures that the experimental and control groups are the same, on average, in their background characteristics, so controlling for background characteristics is not critical in obtaining unbiased impact estimates. However, baseline data collection merits attention in experimental evaluations for several reasons. Baseline data (1) provide a check on the integrity of random assignment, (2) are used in improving the precision of impact estimates, (3) are used to define subgroups for analysis, and (4) are critical to any nonexperimental analyses (for example, of welfare recidivism, which must be analyzed using nonexperimental methods since not all experimental and control group members leave welfare during the follow-up period). Baseline data may also be important sources of contact information for follow-up surveys. Nonetheless, there have been no explicit standards or requirements for baseline data collection in the federal waiver process.
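
As an illustration of the first of these uses, a check on the integrity of random assignment can be as simple as comparing the experimental and control groups on a baseline characteristic. The following Python sketch computes a standardized difference in means. The function name and the prior-year earnings figures are hypothetical and are not drawn from any of the state evaluations; a real check would cover many characteristics and would typically include formal significance tests.

```python
import math
import statistics


def standardized_difference(experimental, control):
    """Difference in group means divided by the pooled standard deviation."""
    mean_e = statistics.mean(experimental)
    mean_c = statistics.mean(control)
    # Average the two sample variances to get a pooled standard deviation.
    pooled_sd = math.sqrt(
        (statistics.variance(experimental) + statistics.variance(control)) / 2
    )
    return (mean_e - mean_c) / pooled_sd


# Hypothetical prior-year earnings (dollars) recorded at baseline.
experimental_earnings = [0, 1200, 3400, 0, 5600, 800, 0, 2500]
control_earnings = [0, 1500, 3000, 0, 5200, 1000, 0, 2700]

diff = standardized_difference(experimental_earnings, control_earnings)
print(f"Standardized difference in prior earnings: {diff:.3f}")
```

A value near zero is what random assignment should produce on average; a large value on several characteristics would suggest that assignment procedures deserve a closer look.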

1. Issues

The rest of this section focuses on baseline data collection in a random-assignment evaluation.(1) Three major issues are involved:

  1. What data should be collected?
  2. When should the data be collected, or (if drawing data from administrative records) what period should the data cover?
  3. What are the relative advantages and disadvantages of collecting baseline data from administrative records, from a special form filled out at intake, or from a survey?

a. Types of Data

The three major types of baseline data are (1) data on background characteristics, (2) identifying information on each individual, and (3) contact information.

Background Characteristics. In an experimental evaluation, detail on background characteristics of sample members is less critical than in a nonexperimental evaluation. However, two kinds of data (which may overlap) generally are very useful: (1) characteristics that define subgroups of interest in the analysis (usually including basic demographic and socioeconomic characteristics), and (2) past histories of the outcomes of interest. In a random-assignment evaluation, multivariate regression models of the outcomes are used largely to reduce the variance of the impact estimates. Variables measuring past histories of the outcomes are generally the most important control variables in such multivariate models because they lead to the largest reductions in the variance of the impact estimate. Data on both case characteristics and past history of the outcome (ideally, over several years) are valuable in assessing the importance of threats to the experimental design, such as crossovers, since they permit assessment of whether the cases that are contaminated or lost from the sample are different from those that remain. Such data may provide enough information to adjust for any experimental-control differences. Other types of background information that are of less central importance include variables that can be used to identify statistical models predicting the effects of program components (see Chapter VI for further discussion). Examples could include data on attitudes toward and knowledge of the welfare system or access to services.
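
The variance reduction from controlling for past outcomes can be illustrated with a small simulation. The Python sketch below is a minimal example under assumed conditions: a single baseline covariate (standing in for a past outcome), a true impact of 0.5, and an outcome that depends strongly on the covariate. All names and parameter values are hypothetical. It compares the simple difference in means with a regression-adjusted estimate, computed here as the difference in means minus the pooled within-group slope times the experimental-control difference in the covariate (which is the coefficient on the treatment indicator in a regression of the outcome on that indicator and the covariate).

```python
import random
import statistics


def simulate_once(rng, n_per_group, true_impact):
    """Return (unadjusted, regression-adjusted) impact estimates for one sample."""
    groups = {}
    for treated in (0, 1):
        # Assumed data-generating process: outcome depends on the past outcome.
        prior = [rng.gauss(0, 1) for _ in range(n_per_group)]
        outcome = [2.0 * x + true_impact * treated + rng.gauss(0, 1) for x in prior]
        groups[treated] = (prior, outcome)
    (x_c, y_c), (x_e, y_e) = groups[0], groups[1]

    unadjusted = statistics.mean(y_e) - statistics.mean(y_c)

    # Pooled within-group slope of the outcome on the baseline covariate.
    sxy = sxx = 0.0
    for xs, ys in ((x_c, y_c), (x_e, y_e)):
        mx, my = statistics.mean(xs), statistics.mean(ys)
        sxy += sum((x - mx) * (y - my) for x, y in zip(xs, ys))
        sxx += sum((x - mx) ** 2 for x in xs)
    slope = sxy / sxx

    adjusted = unadjusted - slope * (statistics.mean(x_e) - statistics.mean(x_c))
    return unadjusted, adjusted


rng = random.Random(0)  # fixed seed so the sketch is reproducible
results = [simulate_once(rng, n_per_group=200, true_impact=0.5) for _ in range(300)]
unadjusted_estimates = [u for u, _ in results]
adjusted_estimates = [a for _, a in results]

sd_unadjusted = statistics.stdev(unadjusted_estimates)
sd_adjusted = statistics.stdev(adjusted_estimates)
print(f"SD of unadjusted estimates: {sd_unadjusted:.3f}")
print(f"SD of adjusted estimates:   {sd_adjusted:.3f}")
```

Both estimators center on the true impact because of random assignment; the adjusted estimator simply varies less from sample to sample. How much less depends on how well the baseline variable predicts the outcome, which is an assumption of this sketch rather than a finding.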

Identifying Information. It is critical to collect enough identifying information at the time of random assignment to ensure that each individual in a case can be tracked in the full range of data systems to be used in the evaluation. In particular, each person's social security number must be carefully entered and verified. This is especially challenging for individuals who are part of applicant cases that are denied benefits or decide to withdraw their application, since eligibility workers have less incentive to obtain full and accurate information on such individuals.

Contact Information. If a follow-up survey is planned, information should be collected at the time of random assignment that will make it easier to contact the case head (the primary adult in the case) for an interview at a later date. Such information includes phone numbers and mailing addresses, as well as names, addresses, and phone numbers for several friends or family members who typically know where the sample member can be reached.

b. Timing of Baseline Data Collection

The issue concerning the timing of baseline data collection is whether it is necessary that data pertain to a period strictly before random assignment or whether the data may cover a period that goes slightly beyond the date of random assignment. If administrative data are being used for baseline characteristics, one concern is that data on new applicants will generally reflect a time slightly after random assignment (for example, if data are extracted only at the end of the quarter), and may thus be affected by the program. If a survey or form filled out at application or redetermination is being used, the issue is whether data must be collected at the time of random assignment (generally no more than a few days or a few hours before random assignment, but with retrospective questions), or whether it is acceptable to collect data within a few days, a few weeks, or even a few months after random assignment.

c. Source of Baseline Data

Good baseline data can be collected either from administrative records or from special forms or surveys, if sufficient planning and resources are devoted to the effort.

Administrative Records. Administrative records are the best source for historical data on outcome variables, especially if the state maintains these records in a consistent format over time. Unemployment Insurance (UI) records data on employment and earnings generally are available. States vary in the quality of these data and in archiving procedures, however, so it may be difficult to obtain these data retrospectively.(2) Administrative data from the AFDC program and related programs from before random assignment generally will be available for ongoing cases, and may be traceable for applicants who participated in the past. Administrative data are less attractive sources for basic demographic data, however. This is because for cases with no previous AFDC history, data as of the end of the month or quarter after random assignment (often, but not always, as entered at application) usually are the only data available. Furthermore, the baseline data entered into the administrative system may be of poorer quality for applicants who are not approved for assistance (if the data are there at all). Administrative data usually are the sources for key identifiers such as social security numbers, but such data again are likely to be of higher quality for approved applicant cases than for denied cases (if data on denied cases are tracked at all); thus, the quality may differ between experimental and control group cases. Finally, administrative data generally are poor sources of contact information.

In principle, administrative systems may be modified to address some of these problems. For example, systems can be modified to record information on denied applicants or to keep certain background variables as recorded at the time of random assignment. Still, such data must be entered by staff for whom they are not immediately useful, and who might also be learning new procedures.

Surveys or Special Forms. Surveys or special forms can be attractive for baseline data collection because they allow collection of information that is not usually in automated data systems. In addition, if timed appropriately, they can be used to obtain data on denied applicant cases. Such special data collection efforts are expensive, but so are modifications to large automated systems. In general, the most useful strategy for collection of baseline data is to collect such data in the program office immediately before random assignment (either in a brief interview with an intake worker or a staff member from the evaluation contractor or through a paper form filled out by the sample member) and then to have staff members review the data for completeness and accuracy.

A telephone survey just after random assignment is a less desirable strategy. Even if sample information is sent to the evaluator quickly, there is often a lag of several months between when the survey begins and when the sample member is located. Surveys several months after random assignment run the risk of lower response rates, contamination by the intervention, and different response rates for experimental and control group members; consequently, they do not provide true baseline data. If the evaluator is not yet chosen and the survey is not yet designed when random assignment begins, there will be further lags before data collection can begin.

If the major reason to do a baseline survey is to obtain contact information for a follow-up survey, and a contact information form was not filled out at application/redetermination, a postcard sent to research sample cases may be acceptable. However, this approach also runs the risk of contamination if there is a differential response rate for treatment and control group members. A small incentive payment to sample members for returning the postcard may help avoid such differences.

2. State Approaches

Of the four state evaluations reviewed that have experimental designs, three are relying primarily on baseline data from administrative records:

Special forms or surveys were not used to collect baseline information in Colorado and Michigan. California supplemented the administrative data with a telephone survey, and Minnesota relied completely on an intake form:

3. Analysis and Recommendations

An ideal evaluation would combine California's pre-program longitudinal data with Minnesota's baseline information form. Not all states have the resources to do this. However, the following steps toward collecting better baseline data should receive priority.

First, we recommend that states conducting random-assignment evaluations collect at least minimal baseline information at intake; DHHS could develop a prototype form to be adapted to each state's needs. The form should be brief and should focus on basic background information, identification information for all family members, and contact information. It should be filled out by the applicant or recipient jointly with a welfare agency staff member just before random assignment and eligibility determination. Use of a baseline form would require recipients to go through random assignment at redetermination (as recommended for other reasons in Chapter IV). If possible, the staff person responsible for these forms should be someone other than the eligibility worker, and obtaining good data on these forms should be designated as a key part of this person's job.

Second, we recommend that states maintain historical data on program participation and benefits in such a way that the data may be linked to create longitudinal files. If feasible, we recommend that states create the longitudinal files. This effort could be linked to the new requirements for lifetime limits on cash assistance, which will require states to move in this direction. When historical administrative data on outcomes are available, states should use these data in their welfare reform evaluations.

B. ROLE OF FOLLOW-UP SURVEYS

Many of the welfare reform changes are likely to have their most immediate impacts on employment, earnings, and public assistance outcomes. Such impacts can be measured through administrative data. Nonetheless, the waiver terms and conditions required states, to the extent feasible, to collect data on outcomes related to family stability and children's welfare. These outcomes are often of considerable policy interest, but cannot readily be measured with administrative data. States have almost always proposed surveys in response to this requirement.

This section considers the appropriate role of follow-up surveys of experimental (or demonstration) and control (or comparison) group members in an impact analysis. Other possible roles for surveys (such as to collect data on program participation as part of a process analysis) are not considered here.

1. Issues

There are four major questions concerning the role of surveys in welfare reform impact evaluations:

  1. Is a follow-up survey needed at all?
  2. If a survey is conducted, what questions should it address?
  3. What standards should be set for sample sizes for a follow-up survey?
  4. What considerations should affect the timing and frequency of follow-up surveys?

This section focuses on designing a survey to meet the goals of the impact analysis; Section C focuses on operational issues related to collecting high-quality data in surveys.

a. Is a Survey Necessary?

If administrative data cover all of the major outcomes of interest in the evaluation, conducting a survey to get at additional outcomes may not be worthwhile. If resources permit, surveys may be used to study the major outcomes in more depth or to obtain data on secondary outcomes. Such surveys, however, may be too expensive for small states or small evaluations to pursue. For a survey to be useful to the impact evaluation, enough resources must be available to obtain high response rates (see Section C).

On the other hand, some interventions primarily target outcomes for which there are no readily available administrative data. Examples include "family cap" provisions (under which no additional benefits are awarded when another child is born to someone on assistance), provisions designed to increase school attendance and immunization rates, and changes in the AFDC-UP program that are intended to promote marriage and family stability. Even in these instances, a survey may not be the only option. There are often less readily available sources of administrative data (such as birth records, school records data, or Medicaid records) that may be more cost-effective or may provide data of better quality than survey data. These alternative data sources have limitations as well, but should be considered carefully. The plan for a survey to be done by the evaluator several years after implementation may take the pressure off state agency staff to obtain other administrative data and, thus, have a counterproductive effect. If alternative sources of administrative data are not planned for early on, important opportunities may be lost, since some of these sources (such as school records) require signed consent forms, and such signatures are most effectively obtained at program intake.

b. Scope and Focus of the Survey

One problem with a survey is that it can become a chance to find out "everything we always wanted to know about the welfare population but have not had the chance to ask." Many stakeholders may wish to pursue their own issues through the survey. Once a survey is undertaken, collecting additional information has relatively low cost (up to a point), so the desire to pursue many issues is understandable. For example, most of the interviewer's time in administering a survey is often used in locating the respondent and gaining cooperation, so the cost of adding another 10 minutes to a 30-minute survey may be modest in comparison. Once an interview goes beyond about 45 minutes, however, maintaining respondent cooperation becomes substantially more difficult.

Nonetheless, the more the survey targets key questions of interest, the easier it is to design the survey to get the best results at the lowest cost. For example, if effects of provisions for UP families are of particular interest, it may be useful to oversample these families. If measuring child care costs is a major concern, it may be useful to stratify the sample by the presence of preschool children. If the major goal of the survey is to obtain information on child care and transportation costs, then questions in this area may need to be quite detailed, even if that implies omitting questions on other interesting but less essential topics.

c. Sample Sizes

Appropriate sample size standards for surveys are also a concern. Because survey data collection is so expensive, there is consensus that sample sizes for surveys need not be as large as in the part of the evaluation based on administrative records. As in selection of the overall sample, however, it is useful to be clear about the precision standard being used and the trade-offs between survey cost and precision.
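
One common way to make the precision standard explicit is to compute the minimum detectable effect for a given survey sample size. The Python sketch below assumes a two-sided test at the 5 percent significance level with 80 percent power, equal-sized experimental and control groups, no regression adjustment, and a binary outcome with a 50 percent prevalence. These conventions and the function name are assumptions for illustration, not standards set in this report.

```python
import math
from statistics import NormalDist


def minimum_detectable_effect(outcome_sd, n_experimental, n_control,
                              alpha=0.05, power=0.80):
    """Smallest true impact detectable with the given power (two-sided test)."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # critical value for the test
    z_power = NormalDist().inv_cdf(power)          # value needed for desired power
    standard_error = outcome_sd * math.sqrt(1 / n_experimental + 1 / n_control)
    return (z_alpha + z_power) * standard_error


# Binary outcome with an assumed 50 percent prevalence (the most conservative case).
binary_sd = math.sqrt(0.5 * 0.5)

mde_small = minimum_detectable_effect(binary_sd, 500, 500)     # 1,000 completes
mde_large = minimum_detectable_effect(binary_sd, 2000, 2000)   # 4,000 completes
print(f"MDE with 1,000 completes: {mde_small:.3f}")
print(f"MDE with 4,000 completes: {mde_large:.3f}")
```

Under these assumptions, 1,000 completed interviews split evenly between groups can detect an impact of roughly 9 percentage points, and quadrupling the sample halves that figure. This makes the trade-off between survey cost and precision concrete.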

d. Timing and Frequency of Surveys

In determining the timing of surveys, the challenge is to strike a balance between ensuring an adequate follow-up period for the full impacts of welfare reform to be realized and preventing the follow-up period from being so long that locating respondents and accurate recall become major problems. Multiple surveys (as opposed to a single follow-up survey) may be useful if the impact of a program is thought to evolve over time or if the plan is to focus on different issues at different points in time (for example, a first survey to focus on program participation/process issues and a second survey to focus on costs of working). Multiple surveys at relatively short intervals also offer the opportunity to update contact information and thus make it easier to locate sample members at later follow-up points.

2. State Approaches

On the basis of the state evaluations we have reviewed, the surveys planned or in progress in current waiver demonstrations are broad in scope, generally include samples of 1,000 to 2,000 cases, and are scheduled to occur either at regular intervals or once relatively late in the demonstration period. Information collected tends to include background information and two types of outcomes: (1) economic outcomes not obtainable from administrative data, such as hours of work, wages, participation in non-JOBS education and training, and costs of work such as child care and transportation; and (2) noneconomic outcomes, such as family structure, fertility, health status, health behaviors, and food security. The large number of outcomes pursued in some instances has led to lengthy and expensive surveys. The sample sizes appear modest (particularly for assessing impacts on outcomes such as family structure) because such impacts are expected to be small and therefore more difficult to detect.

The surveys planned or conducted in the five states are described here:

All of these surveys have a broad focus, and many include collecting more detailed data on outcomes already available to some extent in administrative data.

3. Analysis and Recommendations

In Chapter II, we discussed the usefulness of narrowing or prioritizing the list of outcomes covered in welfare reform evaluations. Many states have proposed follow-up surveys, in large part to respond to the broad array of outcomes they have been required to examine in the terms and conditions for federal waivers. We recommend that surveys be used more judiciously. In particular, we recommend that states consider other sources of administrative data that may be available as alternatives to surveys; examples include vital statistics and school records. Obtaining information from administrative systems outside the welfare agency presents many challenges, including confidentiality; however, such alternatives may provide more reliable data at lower cost. We also recommend that surveys focus on a few selected topics (except in particularly large or important evaluations, where it makes sense to invest the resources for a broader survey). The goals of a survey should be clearly stated and attainable with the resources planned; poorly designed surveys may be costly but yield little reliable information.

DHHS could help to ensure that particular topics are covered in a similar manner in states that are attempting to tackle similar problems; one approach would be to promote joint effort in instrument design. A good example of how the federal government has played such a role is the demonstrations of cashing out Food Stamps in the early 1990s; the Food and Nutrition Service funded development of a common food use instrument for evaluations in three states.

C. ACHIEVING HIGH RESPONSE RATES

In our review of the five state waiver evaluations, we found varying approaches to surveys; these approaches depended in part on the evaluator's experience in surveys of low-income populations. We also noted that some of the surveys have achieved relatively low response rates; DHHS staff report that low response rates have been a concern in welfare reform waiver evaluations in other states as well. Low response rates are a particular concern in using a survey for the impact evaluation, since the lower the response rate, the greater the risk of bias in impact estimates based on respondents alone. Here, we consider appropriate standards for an acceptable response rate for a follow-up survey to be used in estimating impacts, as well as the survey practices that are particularly conducive to achieving high response rates and maintaining sample and data quality over time.

1. Issues

High response rates are critical in surveys that are part of an impact evaluation in order to minimize the potential for nonresponse to bias the impact estimates. Nonresponse may bias the impact estimate because those who do not respond to the survey may experience different program impacts from those who do respond. If nonresponse is not correlated with experimental/control status, estimated impacts are unbiased for those who do complete the survey, but not for the overall population. If nonresponse is correlated with experimental/control status (as, for instance, when experimental group members leave assistance earlier, and are then harder to locate because contact information is more out of date) then the impact estimates will be biased even for respondents.
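
A small worked example may help show how nonresponse can distort an impact estimate. The Python sketch below uses entirely hypothetical figures: 30 percent of controls and 40 percent of experimentals are employed at follow-up (a true impact of 10 percentage points), and employed cases, having left assistance, are harder to locate and respond at 50 percent versus 80 percent for others. Because the experimental group has more employed cases, its overall response rate is lower, and the impact estimated from respondents alone understates the true impact.

```python
def respondent_summary(employed_share, rate_if_employed, rate_if_not_employed):
    """Return (employment rate among respondents, overall response rate)."""
    responding_employed = employed_share * rate_if_employed
    responding_not_employed = (1 - employed_share) * rate_if_not_employed
    response_rate = responding_employed + responding_not_employed
    return responding_employed / response_rate, response_rate


# Hypothetical full-sample employment rates at follow-up.
control_employed = 0.30
experimental_employed = 0.40
true_impact = experimental_employed - control_employed

# Assumed response rates: employed cases are harder to locate.
control_mean, control_rr = respondent_summary(control_employed, 0.50, 0.80)
experimental_mean, experimental_rr = respondent_summary(experimental_employed, 0.50, 0.80)
estimated_impact = experimental_mean - control_mean

print(f"Response rates: control {control_rr:.2f}, experimental {experimental_rr:.2f}")
print(f"True impact: {true_impact:.3f}; respondent-based estimate: {estimated_impact:.3f}")

# If everyone responds at the same rate, respondents mirror the full sample.
equal_control, _ = respondent_summary(control_employed, 0.70, 0.70)
equal_experimental, _ = respondent_summary(experimental_employed, 0.70, 0.70)
equal_rate_estimate = equal_experimental - equal_control
```

The direction and size of the distortion depend on the assumed response pattern; with other assumptions the bias could run the other way. The point is only that it cannot be ruled out from respondent data alone.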

Thus, there are two major issues:

  1. Should there be a minimum standard for an acceptable response rate in a follow-up survey to be used in developing impact estimates and, if so, what should the standard be, both for initial and (if applicable) later rounds of followup?
  2. What knowledge is there in the evaluation community concerning survey practices that are conducive to achieving and maintaining high response rates, and how can federal and state officials promote use of these practices?

In discussing these issues, we draw heavily on our experiences as working evaluators and on the insights of the expert panel convened for this project.

a. Standards for Response Rates

The Office of Management and Budget sets the standard of a minimum 80 percent response rate in surveys funded directly by the federal government. For surveys of low-income populations, achieving high response rates is a particular challenge. Low-income families tend to be more mobile, often do not have telephones, and may be suspicious of outsiders asking them questions because of concern about losing government benefits. In some areas, large subgroups of the low-income population do not speak English. However, the authors and experts consulted for this report believe that response rates in the 75 to 80 percent range are achievable with low-income populations when quality survey methods are used. Response rates may be raised to around 85 percent with ample resources for tracking and repeated interview attempts, but they rarely exceed this level.

b. Survey Methods that Promote High Response Rates

In the evaluation community, a number of factors are known to be important in obtaining high survey response rates in follow-up surveys of low-income populations:

The lack of any of these factors does not necessarily imply in itself that a survey will have poor response rates, but it may be seen as a risk factor that requires careful monitoring. An additional risk factor exists if an organization that lacks a track record in interviewing low-income populations conducts the survey. Experienced survey organizations have staff from the level of survey director to interviewer who are adept in the techniques of reaching low-income respondents, as well as resources to plan and organize survey operations to achieve high response rates within the time frames needed for timely followup.

2. State Approaches

Among the five waiver evaluations reviewed for this report, two (in Wisconsin and Michigan) have not yet started their follow-up surveys. However, we document how each evaluator planned to conduct the survey; in three of the states, we have information on response rates. We first review how well each evaluation did (or expects to do) in following the survey practices discussed earlier, and then the response rates realized.

a. Survey Mode

The University of Colorado is using a mixed-mode approach for follow-up surveys for the CPREP evaluation. They assumed 40 percent of interviews would be in person. Mixed-mode interviewing also is being used in the Minnesota evaluations and is planned in the Michigan evaluation. In California and Wisconsin, only telephone interviews are being conducted. In the Minnesota surveys, RTI is using computer-assisted interviewing both for telephone and in-person interviews (CATI/CAPI).(4) The California survey is a CATI survey; this was the reason given for not using in-person followup. For the Wisconsin evaluation, MAXIMUS originally had planned to do mail surveys with telephone followup, but later switched to doing all surveys by telephone.

b. Respondent Payments

Respondents are being paid in the California and Colorado evaluations, and payments are planned in Michigan. In Minnesota, for the 12-month followup, payments are only being offered to those who, according to Minnesota's AFDC system, are not on assistance or do not have a phone number. For the 36-month followup, all respondents will be paid. There is no mention of respondent payments in the Wisconsin WNW evaluation plan. Payments are countable income in all of these states.

c. Initial Contact Information

Lack of contact information from the time of random assignment has been a major problem for the California and Colorado surveys. In California, the initial baseline interviewing did not begin until a year after the demonstration began, in part because of delays in obtaining sample and in part because the development of the survey instrument was delayed, as many stakeholders requested additions and revisions to the survey.(5) The sampling delays occurred because the sample is selected about two months after intake from the state Medicaid data system, and it then takes about another month before the counties forward initial data on sampled cases to the UC-Berkeley Survey Research Center. These data must be processed and samples selected before interviewing can begin. By the time attempts were made to contact sample members, the contact information from the county case files was about a year old. Despite the use of various tracking methods (discussed more later), the Survey Research Center was at a considerable disadvantage because of the delay before the initial contact was made and the lack of information on friends and relatives (since there was no distinct research sample intake at which such information could be collected).

The situation in Colorado was similar. The evaluator sent out letters introducing the survey and contact information forms to be mailed in as soon as possible. However, because the ongoing case sample was selected before the evaluation contractor was chosen, and because it took time to transfer sample information to the contractor, about four to six months had elapsed between the selection of the ongoing case sample and the mailing of the letters. Only 18 percent of the contact forms were completed and returned (but low response rates are not unusual in mail surveys).

In the Wisconsin WNW evaluation, the evaluator planned to obtain contact information in the survey for the process analysis, which was scheduled to occur about four months after sample intake. However, that survey was delayed because the state data system was going through a major revision, and sample information on recipient cases was not made available to MAXIMUS until nine months after the intervention began. The process analysis survey thus occurred 10 to 12 months after enrollment.

The 12-month survey for the Minnesota MFIP evaluation, in contrast, has had the advantage of drawing contact information from a form filled out at the time of random assignment, and has achieved high response rates, as discussed further later.

d. Follow-Up Contact Information

In Wisconsin, the plan was to conduct annual surveys after the case left assistance; given the two-year WNW time limit on benefits, the first survey would probably occur within 18 months after the initial contact for the process study. Surveys in California and Colorado are planned to occur at 12- to 18-month intervals. In the Michigan and Minnesota evaluations, at least part of the sample would be surveyed only once, as much as three to four years after random assignment. We do not know of any plans to contact sample members during the intervening period.

e. Tracking Methods

Careful use of a range of tracking procedures and of tracking databases can be critical in locating respondents. In California and Colorado, as discussed above, the evaluators were handicapped from the start by delays between when enrollment in the demonstration began and when the survey began. In both California and Colorado, as soon as the survey organizations had the sample information, they sent out requests by mail for contact information to the address in the public assistance records. In California, UC-Berkeley offered a $5.00 incentive for returning the information. The response rate to this request was 30 percent in California and 18 percent in Colorado.

In California, for the Wave I survey, the UC-Berkeley Survey Research Center asked the county welfare departments to check their records on cases they were not able to locate; three of the four counties were able to comply with these requests. In addition, the Survey Research Center used directory assistance and address corrections from the post office, as well as the state Parent Locator system used for child support enforcement. For the Wave II survey, they did not use the Parent Locator system; by that time, however, they had on-line access to credit bureau and motor vehicle registration databases, as well as contact information obtained in the Wave I interview. They did not attempt to contact nonrespondents to Wave I.

In Colorado, the evaluator is relying largely on checks of state automated systems for the AFDC, Food Stamp, and child support enforcement programs, as well as on-line telephone directories. They sometimes have reviewed hard-copy case files for the names of friends or relatives. Although one progress report discusses obtaining credit bureau data, there is no indication this was implemented. The original Colorado survey plans indicated that on each successive wave they would contact only those who had completed the previous wave; this would lead to smaller sample sizes for later waves. There is no indication they have reconsidered this plan at this time, although use of better tracking methods could improve response rates on later waves.

The Wisconsin WNW work plan discusses only directory assistance and public assistance record checks. We have no information on tracking methods planned for the Michigan and Minnesota surveys. However, Abt Associates, who will conduct the Michigan survey, and RTI, who is conducting the Minnesota surveys, are experienced firms with access to a wide range of tracking methods. In Minnesota, MDRC staff reported that cases were tracked and interviewed throughout the state and sometimes even if they had moved to other states.

f. Response Rate Goals and Actual Experience

In the surveys in California and Colorado, response rates have been low enough to put the usefulness of these surveys for the impact analysis in considerable doubt. For the first Colorado follow-up survey, after nearly four months of survey operations, the response rate for the ongoing sample was 41 percent, much lower than the 60 percent goal; the rate was the same for experimentals and controls. Locating respondents was the key problem. In California, the Wave I English/Spanish survey began a year after random assignment and took 10 months to complete for ongoing cases; the response rate was just under 60 percent, but oversampling allowed UC-Berkeley to reach the desired number of completes.(6) Locating respondents again was the major problem. The response rate for Wave II, which began 18 months after Wave I, has been over 80 percent of those reached in Wave I, or about 50 percent of the original sample. No attempt was made in Wave II to contact cases that were not interviewed in Wave I.
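The California figures illustrate how response rates compound across waves when Wave I nonrespondents are not recontacted. The following is a minimal sketch of that arithmetic only. The function name is hypothetical, and the inputs are rounded stand-ins for the "just under 60 percent" and "over 80 percent" rates reported above, not exact survey results.

```python
def cumulative_response_rate(wave1_rate, wave2_conditional_rate):
    """Share of the original sample interviewed in Wave II.

    Assumes Wave II is attempted only with Wave I respondents, so the
    Wave II rate is conditional on having completed Wave I.
    """
    return wave1_rate * wave2_conditional_rate


# Rounded, illustrative inputs (assumptions, not exact reported values).
california = cumulative_response_rate(0.60, 0.80)
print(f"Share of original sample interviewed in Wave II: {california:.0%}")
```

With these rounded inputs the product is a little under half of the original sample, consistent with the "about 50 percent" cited in the text.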

In the Michigan demonstration, the target response rate for the survey is 80 percent. We have no information on actual experience, but the long follow-up period of four years makes this seem an ambitious goal. In Minnesota, the target for the 12-month followup was 85 percent; for the 36-month followup the target is 80 percent. The first follow-up survey in Minnesota achieved a response rate of 84 percent, very close to the goal; MDRC staff members report that they and RTI decided to end the survey slightly below the target to preserve resources for the second follow-up survey. No target response rate was set for the Wisconsin WNW demonstration. The follow-up survey in Wisconsin has not yet occurred.

3. Analysis and Recommendations

We recommend specifying a response rate standard for surveys that are to be used for impact analysis. An appropriate standard would be from 70 to 80 percent (with the lower end of the range for later rounds of followup); such response rates are achievable with the types of practices discussed in Section C.1, including contact information collected at intake and updated regularly. Welfare agencies could also encourage achievement of high response rates by exempting respondent payments from countable income. In addition, it should be standard practice to compare the characteristics of respondents and nonrespondents in the available administrative data, to assess the likely magnitude of response bias.
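One simple way to carry out the respondent/nonrespondent comparison is to compute, for each characteristic available in the administrative data, the difference in group means scaled by a pooled standard deviation. The sketch below is illustrative only: the function name, the choice of a standardized difference as the summary measure, and the data values (months of prior assistance receipt) are all assumptions, not drawn from any of the five evaluations.

```python
from math import sqrt
from statistics import mean, variance


def standardized_difference(respondents, nonrespondents):
    """Difference in means divided by the pooled standard deviation.

    Each argument is a list of values for one administrative-data
    characteristic; each list needs at least two observations.
    """
    pooled_sd = sqrt((variance(respondents) + variance(nonrespondents)) / 2)
    if pooled_sd == 0:
        return 0.0
    return (mean(respondents) - mean(nonrespondents)) / pooled_sd


# Hypothetical months of prior assistance receipt (made-up values).
respondent_months = [12, 24, 18, 30, 6]
nonrespondent_months = [36, 24, 42, 30, 48]

diff = standardized_difference(respondent_months, nonrespondent_months)
print(f"Standardized difference: {diff:.2f}")
```

A large difference on a characteristic related to the outcomes of interest would suggest that respondents are not representative of the full sample and that response bias may be substantial.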

We also recommend that the sponsors of welfare reform evaluations monitor data collection plans carefully to ensure that survey practices needed to achieve high response rates are being used. In particular, we recommend not approving any survey plan that lacks two or more of the "best practices" described in Section C.1. If the state is not able to invest the level of resources implied by these practices, then the survey may not produce data of sufficient quality for an impact analysis. Finally, we recommend close monitoring of surveys that lack any one of these practices or that are conducted by a survey organization that is relatively inexperienced with low-income populations. If such surveys produce low response rates early on, then states should carefully consider whether to add resources to survey operations (for example, by adding field followup to a telephone survey) or whether to discontinue the survey altogether.

Notes

(1) We focus on experimental evaluations because baseline data collection in these evaluations has typically received less attention. Data collection needs are similar in nonexperimental evaluations with comparison site designs. In nonexperimental designs in which a pre-program sample (or pre-program data on the same sample) serves as a comparison group, data needs typically are greater--comparable to those for the demonstration follow-up period.

(2) UI records data are data collected from employers and maintained by state UI agencies to use in determining whether an individual qualifies for UI benefits. These data include information on all jobs individuals hold in a quarter and total earnings in each quarter for each job. These data generally are available to other state agencies for legitimate research (with appropriate confidentiality restrictions).

(3) For example, in the evaluation of the Minority Female Single Parent Demonstration, there were 12-month and 30-month follow-up interviews. By recontacting all sample members for the 30-month interview, regardless of whether they had completed the 12-month interview, response rates for the 30-month interview were increased from 73 to 80 percent (Rangarajan et al. 1992).

(4) In computer-assisted telephone interviewing (CATI), the survey instrument appears before the interviewer on a computer screen, and responses are typed directly into a computer. Skips of particular questions based on particular responses may be programmed into the computer, reducing the scope for interviewer error, and the data entered by interviewers are converted directly into a research database. The availability of portable personal computers has led to recent growth of computer-assisted personal interviewing (CAPI), but this technology is less widely available.

(5) As noted earlier, the impact evaluation in California will rely mostly on baseline data from administrative records.

(6) The Wave I foreign language survey had a higher response rate than the English/Spanish survey (around 70 percent).


Updated 09/24/01