U.S. Department of Health and Human Services
Examination of the Equivalence of Treatment and Control Groups and the Comparability of Baseline Data
Randall S. Brown and Peter A. Mossel
Mathematica Policy Research, Inc.
This report was prepared under contract #HHS-100-80-0157 between the U.S. Department of Health and Human Services (HHS), Office of Social Services Policy (now the Office of Disability, Aging and Long-Term Care Policy) and Mathematica Policy Research, Inc. For additional information about the study, you may visit the DALTCP home page at http://aspe.hhs.gov/daltcp/home.htm or contact the office at HHS/ASPE/DALTCP, Room 424E, H.H. Humphrey Building, 200 Independence Avenue, SW, Washington, DC 20201. The e-mail address is: webmaster.DALTCP@hhs.gov. The DALTCP Project Officer was Robert Clark.
The National Long Term Care Demonstration was established by the U.S. Department of Health and Human Services to evaluate community-based approaches to long term care for the elderly. Specifically, the channeling demonstration is testing two models of organizing community care as alternatives to the current institutionally oriented system. Both offer a central point of intake for individuals in need, systematic assessment of their needs, and ongoing case management to arrange and monitor the provision of services. The basic case management model is designed to manage services currently available to clients; the financial control model is intended to expand the range of publicly financed services available to the client while controlling total costs. Through contracts with the participating states, local agencies in ten communities around the country were selected to implement the demonstration, five implementing each model. The demonstration is designed to determine (1) the impact of these approaches on costs, utilization of services, informal caregivers, and client well-being; (2) the feasibility of implementing future programs like channeling; and (3) its cost-effectiveness.
In designing the evaluation of the demonstration, great care was taken to ensure that the results of that evaluation would not be called into serious doubt because of methodological shortcomings. Thus, an experimental design was used, under which eligible channeling applicants in each of the 10 sites were randomly assigned to the treatment group, which was offered channeling services, or to the control group, which was not. Because of the random assignment, the control group should be very similar to the treatment group, in the aggregate, on both observable and unobservable characteristics, and therefore, their experience should provide the best possible estimate of what would have happened to the treatment group had the demonstration not existed. Under the design, estimates of program impacts are obtained by statistical comparison of the post-randomization experience of the two groups. The integrity of this design was preserved by the implementation of procedures designed to prevent "contamination" of the control group; that is, demonstration staff interacting with or making efforts on the behalf of control group members. These procedures included the use of research interviewers to collect the follow-up data on outcome measures for both groups.
One aspect of the evaluation design which could, however, raise questions about the accuracy of the estimates of channeling impacts that eventually will be obtained is the fact that the baseline data were collected by different types of interviewers for the two groups. The combination of several factors--conflicts between research needs and good case management practices, budget constraints, and the desire to minimize the burden on sample members--led to the decision that baseline data would be collected by channeling staff for members of the treatment group, and by research interviewers for the control group. For a variety of reasons, this difference in data collection could result in differences between the two groups on observed data for some characteristics, when in fact no real differences exist between the two groups on these baseline characteristics. Estimates of channeling impacts that are obtained from regression models which use these baseline data as auxiliary control variables could then be distorted, because these artificial differences between the two groups are treated as real pre-treatment differences that must be accounted for (netted out) by the statistical procedure.
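The distortion described above can be illustrated with a simple worked derivation. This is only a sketch under simplifying assumptions that are not taken from the report: a single control variable, a linear outcome model, and a measurement shift that is a constant for every treatment group member. The symbols (Y, T, X, tau, gamma, delta) are introduced here for illustration.

```latex
% Assumed outcome model: T_i = 1 for treatments, 0 for controls;
% X_i is the true baseline characteristic; \tau is the true impact.
Y_i = \alpha + \tau T_i + \gamma X_i + \varepsilon_i

% Assumed measurement model: channeling staff record the characteristic
% with a constant shift \delta for treatment group members only.
X_i^{\mathrm{obs}} = X_i + \delta T_i

% Substituting X_i = X_i^{\mathrm{obs}} - \delta T_i into the outcome model:
Y_i = \alpha + (\tau - \gamma\delta)\, T_i + \gamma X_i^{\mathrm{obs}} + \varepsilon_i
```

Under these assumptions, a regression of Y on T and the observed baseline variable estimates tau minus gamma times delta rather than tau. The artificial difference delta is netted out as though it were a real pre-treatment difference, and the size of the distortion grows with both delta and the strength of the relationship (gamma) between the control variable and the outcome.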
The purposes of this report are to determine whether the baseline data for treatments and controls are comparable and, if they are not comparable, what should be done to ensure that regression estimates of channeling impacts are not biased by such differences. Reasons why baseline data may differ for the two groups were identified, including:
True differences at randomization due to chance
True differences due to different patterns of attrition between randomization and baseline
Spurious differences due to differences in the length of time between randomization and baseline for the two groups
Spurious differences due to incentives of clients or their proxy respondents to overreport needs and impairments to channeling staff (who will be using the baseline to prepare a care plan for the client), and to underreport ability to pay for needed services
Spurious differences due to differences between research interviewers and channeling staff in how questions are asked (including clarifications and probing), and how answers are recorded
Treatment-induced differences due to anticipated or actual effects of channeling on the treatment group prior to baseline (and known lack of assistance from channeling for the control group)
Spurious differences due to the differential usage of proxy respondents
The first task undertaken was to determine whether there were true differences between treatments and controls, either due to chance or to differential attrition. Comparison of treatment and control groups on screen variables for the full sample indicated virtually no differences outside the range of normal chance variation. We then compared the screen characteristics of treatments and controls for baseline respondents to determine whether attrition at baseline had led to differences between the remaining treatment and control groups. Again, we found very few significant differences between the groups (the exceptions were in the percent black and the percent impaired on eating in basic sites, and in impairment on bathing in financial control sites), and a model of baseline attrition confirmed that only for a few screen variables was the relationship between sample member characteristics and the probability of response significantly different between treatment and control groups.
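A treatment/control comparison on a binary screen characteristic of the kind described above might be sketched as follows. This is a minimal illustration, not the procedure actually used in the report: the function name, the pooled two-proportion z-test, and the counts are all assumptions made for the example.

```python
import math

def two_proportion_z(successes_t, n_t, successes_c, n_c):
    """Return (difference, z statistic, two-sided p-value) for p_t - p_c.

    Uses the pooled-variance normal approximation, which is an assumed
    choice of test here, not one documented in the report.
    """
    p_t = successes_t / n_t
    p_c = successes_c / n_c
    pooled = (successes_t + successes_c) / (n_t + n_c)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_t + 1 / n_c))
    z = (p_t - p_c) / se
    # Two-sided tail probability of the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return p_t - p_c, z, p_value

# Hypothetical counts: sample members impaired on some screen item.
diff, z, p = two_proportion_z(330, 1000, 300, 1000)
print(f"difference={diff:.3f}  z={z:.2f}  p={p:.3f}")
```

When many screen variables are compared this way, a few differences are expected to reach nominal significance by chance alone, which is why the text speaks of differences "outside the range of normal chance variation" rather than of any single significant result.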
Despite the overwhelming evidence, based on screen characteristics, that there were essentially no true treatment/control differences at randomization due to chance, and only minor differences due to differential attrition, we found a substantial number of large and statistically significant differences between the two groups on baseline variables, including some of the same variables for which no differences were found on the screen. Although real differences between the two groups (either due to differential attrition, or to pre-existing differences not detected by screen measures) could not be ruled out entirely, we concluded that differential measurement was largely responsible for the observed baseline differences between treatments and controls. This conclusion was based on several pieces of evidence:
The finding that very few screen variables exhibited statistically significant differences between treatments and controls among baseline respondents
The finding that few screen variables exhibited a significantly different impact on the probability of baseline response for treatments than for controls
The many statistically significant and occasionally large treatment/control differences found on baseline variables, including some for which no difference was found on the screen version of the same variable
The general correspondence of results with a priori expectations about which variables were likely to be affected by noncomparable measurement and the direction of the treatment/control differences
The timing and proxy use differences that are known to exist at baseline and which had obvious effects on some baseline variables and highly probable effects on others
The general correspondence of treatment/control differences at baseline with baseline-reinterview differences observed for a subsample of treatment group members who were given a second baseline by research interviewers
We then showed how regression estimates of channeling impacts would be affected by the use of noncomparable data items as auxiliary control variables in the regression. The expressions for bias induced by noncomparable data suggested two types of tests of baseline variables to determine whether the baseline differences are so large that it is unlikely that they represent true treatment/control differences and therefore might cause significant bias in estimates of channeling impacts, or small enough that they may well be due to chance and are unlikely to affect impact estimates. The two tests--one for baseline variables for which comparable measures are available on the screen, and one for variables that have no such screen counterparts--make use of all of the available information. Thus, for the first type of variable we tested whether the treatment/control differences at baseline were significantly different from the treatment/control differences in the screen version of the variable for the same individuals. For those variables for which the hypothesis of no differential was rejected, the baseline version of the variable was considered noncomparable, and the screen version will be used in future analysis. Variables for which no significant differential was found were considered to be comparably measured at baseline and therefore the baseline version was declared appropriate to include as a control variable in future analyses. The conclusions based on this procedure were then compared to the results obtained from the reinterview sample, which were based on comparison of baseline and reinterview responses on these same questions. The two sets of results were found to be broadly consistent in terms of which variables appeared to be noncomparable, and the direction of the differences.
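The first test described above, for variables with a screen counterpart, can be sketched as a comparison of within-person changes. The code below is an illustrative assumption about how such a test could be set up, not the report's actual computation: the function name, the large-sample z statistic, and the small data set are hypothetical.

```python
import math
from statistics import mean, variance

def differential_test(base_t, screen_t, base_c, screen_c):
    """Test whether the T/C gap at baseline differs from the T/C gap at screen.

    Each list holds one value per person, and the baseline and screen
    lists for a group are aligned person by person. The differential
    (base_t - base_c) - (screen_t - screen_c) equals the difference in
    mean within-person change between the two groups.
    """
    change_t = [b - s for b, s in zip(base_t, screen_t)]
    change_c = [b - s for b, s in zip(base_c, screen_c)]
    differential = mean(change_t) - mean(change_c)
    se = math.sqrt(variance(change_t) / len(change_t)
                   + variance(change_c) / len(change_c))
    z = differential / se
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal tail
    return differential, z, p_value

# Hypothetical 0/1 impairment indicators for eight people per group.
screen_t = [1, 0, 1, 0, 1, 0, 1, 0]
base_t   = [1, 1, 1, 0, 1, 1, 1, 0]
screen_c = [1, 0, 1, 0, 1, 0, 1, 0]
base_c   = [1, 0, 1, 0, 1, 0, 1, 1]

d, z, p = differential_test(base_t, screen_t, base_c, screen_c)
print(f"differential={d:.3f}  z={z:.2f}  p={p:.3f}")
```

Working with each person's change between screen and baseline, rather than with four separate group means, uses the fact that both measures come from the same individuals. A variable would be flagged as noncomparable only when the differential is significantly different from zero.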
For variables that had no screen counterpart, the procedure used was to regress baseline variables on treatment status, site, and the variables selected from the group with screen counterparts, test the hypothesis that the coefficients on the two treatment status variables (for basic and financial control models) were zero, and consider noncomparable those variables for which this hypothesis was rejected. This is a test of whether there are treatment/control differences in these variables beyond what could be explained by the small observed differences at screen in a set of other variables. Under the assumption that any such remaining differences are due to noncomparable data rather than real differences, limiting the set of control variables in future impact analyses to baseline variables which do not exhibit significant treatment/control differences will produce estimates of channeling impacts for which we expect no bias due to noncomparability. Again, the results obtained were found to be broadly consistent with the reinterview sample comparisons of baseline and reinterview responses.
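The second test can be sketched as an ordinary least squares regression of a baseline variable on two treatment indicators, a site indicator, and a screen variable. Everything below is a simplified assumption for illustration: the data are simulated, a single financial-control indicator stands in for the full set of site variables, one screen variable stands in for the selected set, and separate t statistics are reported where a joint test of the two coefficients would be an equally reasonable choice.

```python
import math
import random

def solve(a, b):
    """Solve the linear system a x = b by Gauss-Jordan elimination."""
    n = len(a)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [x - f * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]

def ols(rows, y):
    """Return (coefficients, standard errors) from regressing y on rows."""
    n, k = len(rows), len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    beta = solve(xtx, xty)
    resid = [yi - sum(b * x for b, x in zip(beta, r)) for r, yi in zip(rows, y)]
    sigma2 = sum(e * e for e in resid) / (n - k)
    se = []
    for i in range(k):
        # i-th diagonal element of (X'X)^-1, obtained by solving for a unit vector.
        unit = [1.0 if j == i else 0.0 for j in range(k)]
        se.append(math.sqrt(sigma2 * solve(xtx, unit)[i]))
    return beta, se

# Simulated data (hypothetical): the baseline variable is shifted by 1.0
# for treatments in basic sites and not at all in financial control sites.
rng = random.Random(0)
rows, y = [], []
for i in range(400):
    fin_site = i % 2                      # 1 = financial control site
    treated = 1 if rng.random() < 0.5 else 0
    screen = 1 if rng.random() < 0.4 else 0
    treat_basic = treated * (1 - fin_site)
    treat_fin = treated * fin_site
    rows.append([1.0, fin_site, screen, treat_basic, treat_fin])
    y.append(1.0 + 0.5 * fin_site + 0.8 * screen
             + 1.0 * treat_basic + rng.gauss(0, 0.5))

beta, se = ols(rows, y)
t_basic = beta[3] / se[3]   # t statistic, basic model treatment status
t_fin = beta[4] / se[4]     # t statistic, financial control treatment status
print(f"basic: coef={beta[3]:.2f} t={t_basic:.1f}")
print(f"financial: coef={beta[4]:.2f} t={t_fin:.1f}")
```

In this simulated case the baseline variable would be treated as noncomparable because the coefficient on treatment status in basic sites is far from zero, even though the site and screen variables are held constant.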
The two sets of tests yielded the following conclusions regarding the comparability of the baseline variables that were used as control variables in the preliminary analysis of channeling impacts (Kemper et al., 1984), and whether they should be excluded from future regressions (asterisks indicate that a screen version of the variable exists):
|COMPARABLE BASELINE VARIABLES
|NONCOMPARABLE BASELINE VARIABLES
|Living arrangement (*)
|Nursing home waiting list (*)
|Unmet needs (*)
|Attitude toward nursing home
|Hours of informal care received (per week)
|Hours of formal care received (per week)
|Number of physician visits
|Global life satisfaction
|Self-rating of health
|Restricted days last 2 months
|Hospital days last 2 months
|Nursing home days last 2 months
For noncomparable baseline variables with screen counterparts, the screen version will be used as a control variable in its place. The other noncomparable baseline variables will be excluded from the set of control variables, with the exception of hospital and nursing home days, which will be replaced with information from the screen on whether the sample member was in a hospital or nursing home at screen or referred to channeling by hospital or nursing home staff.
The exclusion of these variables is not likely to cause serious problems for the analysis. Estimates of channeling impacts obtained from regressions with control variables drawn only from the screen were found to be different for some outcome measures from those obtained from regressions using the baseline, as expected, but the standard errors of these impact estimates were virtually unaffected by this difference in regressors. Thus, the argument that increased precision will be obtained when the more complete baseline data are used as control variables does not appear to be borne out in practice here. It is also the case that attrition-induced differences between treatments and controls on excluded characteristics cannot be controlled for in estimating channeling impacts. However, the evidence presented suggests that real differences between the two groups are likely to be considerably smaller than the observed differences in the data. Thus, failure to control for such real differences, if they exist, is expected to cause less bias than attempting to account for attrition biases by using control variables that are not comparably measured for the two groups. We will rely on the attrition bias analysis (i.e., an extension of the analysis presented in Mossel and Brown, 1984) to account for these as well as other unobserved differences between treatment and control groups.
The major problem with the exclusion of some variables is that certain potentially interesting subgroups, such as those defined by mental functioning, IADL, or attitudes toward nursing homes, cannot be examined. Again, this shortcoming is less critical than the problems of interpretation and inference that would be created by using noncomparable data to construct subgroups. An additional but minor problem is that we will have somewhat more missing data for some control variables (e.g., income), since screen imputations will not be possible when the screen version of a variable is itself used as the control variable.
Based on the results and arguments above, we view the decisions presented in this report regarding use of baseline data in future analyses as the best approach. To the extent that future work on the analysis of channeling impacts results in other baseline variables being considered for use as control variables or to define subgroups, such variables must meet these same criteria before being used in the estimation of channeling impacts.