GUIDE TO USING THE U.S. DEPARTMENT OF HEALTH AND HUMAN SERVICES
NATIONAL EVALUATION OF WELFARE-TO-WORK STRATEGIES (NEWWS)
TWO-YEAR CHILD OUTCOMES STUDY PUBLIC USE FILE

This CD-ROM contains a public use analysis file (N2PC1730.TXT) and documentation for research on the two-year experimental impacts on outcomes for children of six welfare-to-work programs. The programs were operated in 3 of the 7 NEWWS sites (Atlanta, Georgia; Grand Rapids, Michigan; and Riverside, California) during the early to mid-1990s. The file contains the sample, original survey items, and outcome measures analyzed for U.S. Department of Health and Human Services and U.S. Department of Education, Impacts on Children and Families Two Years After Enrollment: Findings from the Child Outcomes Study (2000).

The report and this public use file were prepared by Child Trends, which is conducting the Child Outcomes Study under subcontract to the Manpower Demonstration Research Corporation (MDRC). MDRC is conducting the NEWWS Evaluation under a contract with the U.S. Department of Health and Human Services (HHS), funded by HHS under a competitive award, Contract No. HHS-100-89-0030. HHS is also receiving funding for the evaluation from the U.S. Department of Education. The study of one of the sites in the evaluation, Riverside County (California), is also conducted under a contract from the California Department of Social Services (CDSS). CDSS, in turn, is receiving funding from the California State Job Training Coordinating Council, the California Department of Education, HHS, and the Ford Foundation.

I. DATA FILE

N2PC1730.TXT includes 1,380 variables on 14 records and a total of 3,018 sample members. (See the table in Section II for the sample sizes of individual sites.) N2PC1730.TXT contains, in ASCII format, data on survey-based measures of children's developmental outcomes and maternal and family outcomes (for example, maternal psychological well-being and parenting).
It also contains:

   1) Targeted outcome variables
   2) Non-targeted outcome variables
   3) Covariates used in statistical models
   4) Subgroups used in subgroup impacts analyses

See Section V of this memo for more information.

IMPORTANT: Values of some measures have been changed to protect sample members' confidentiality. (See the notes in the codebook.) The data file N2RC1326.TXT contains the original values and is available to researchers (on a restricted basis) at the National Center for Health Statistics. See www.aspe.hhs.gov/hsp/newws/data-info.htm for more information.

II. SAMPLE

   Site              Sample size
   Atlanta                 1,422
   Grand Rapids              646
   Riverside                 950
   TOTAL                   3,018

Members of the Child Outcomes Study (COS) sample were all, as of baseline, single-mother recipients of AFDC with at least one 3- to 5-year-old child. A child aged 3 to 5 was chosen as the COS "focal child" and is the subject of most of the survey questions on this file. In most cases, this was the mother's only or youngest child, except in the Grand Rapids site, where about one-third of families also had a 1- to 2-year-old at baseline.

Sample member identifiers and other information that could be used to identify individuals have been deleted from this file.

All Child Outcomes Study sample members are also members of the NEWWS 2-Year Client Survey sample (N=9,675) and the Full Impact sample (N=44,569). For all research group sample sizes, please refer to the sample size table, SAMPTBL1.TXT, located in the \Tables directory.

III. USING THE DATA FILE

A. Creating a COS analysis file

IMPORTANT: The data on N2PC1730.TXT are intended to be analyzed together with:

   1) Responses by COS sample members to additional survey questions that were asked of all respondents in the Two-Year Client Survey (CD #2).

   2) Additional information on background characteristics; responses to a Private Opinion Survey (POS) administered at baseline; scores on baseline literacy and math tests; and earnings, welfare, and Food Stamp data recorded from administrative records.
      These data were recorded for all members of the NEWWS Full Impact sample (CD #1).

Each record of all 3 files contains a unique sample member IDNUMBER (which ranges from 1 to 44569 on the Full Impact sample). For any sample member, the same IDNUMBER is used on each file. Researchers may build one or several analysis files, depending on their research interests, by identifying the samples (and records) they wish to study and then merging files BY IDNUMBER.

The key subsamples are identified on the Full Impact Sample file SAMPLES record. They have a value of 1 on the following variables:

   FULLSAMP: Full impact sample
   SRV2RESP: 2-Year Survey respondent
   COS2RESP: 2-Year Child Outcomes Study respondent

B. Review documentation and test the file

We strongly suggest that users of this file do the following before conducting any further analyses:

1. Read C2README.TXT, which gives a brief description of all files included on the CD-ROM.

2. Read the report, particularly Chapter 2, which describes the research design, samples, and data sources; Chapter 3, which describes the program models and sites; Chapters 6 and 7, which summarize the impacts of each program on children's developmental outcomes (in the aggregate and in subgroups, respectively); and Chapter 9, which summarizes the impacts of each program on family outcomes (both in the aggregate and in subgroups).

3. Review the following SAS output located within the \OUTPUT subdirectory.

   a) BRAKHLTH.LST presents output showing the mean values for the Bracken Basic Concepts Scale/School Readiness Composite (BBCS/SRC) raw scores for focal children of control group members, and the proportion rated in very good or excellent health, in each of the 3 COS sites. (These means can be found in Table 5.1 in Chapter 5 of McGroder et al., 2000.) Reproducing these means will give the user experience running and interpreting means on both continuous and dichotomous outcomes.
   b) CSEMPLOY.LST presents output showing mean values on selected two-year employment-related outcomes for control group members in each of the 3 COS sites. (These means can be found in Table 8.1 in Chapter 8 of McGroder et al., 2000.) These variables are stored on the Full Impact Sample file (CD #1) and must be merged with the COS data on this file:

         JEMPYN1 is a dummy variable indicating whether the respondent was employed in the month prior to the survey, and was calculated on all COS respondents in each site.

         JHRLYPAY represents the hourly wage received in the job held in the month prior to the survey, and was calculated only for those still employed at the time of the two-year interview (to yield a meaningful wage rate for descriptive purposes).

         JCOVHEAL is a dummy variable indicating whether the respondent had health insurance at her current/most recent job, and was calculated only for those still employed at the interview.

         VFMEDCOV is a dummy variable indicating whether the respondent had used any transitional Medicaid benefits since random assignment, and was calculated only for those reporting any paid work since random assignment.

      Running these means will give the user experience running and interpreting means on both continuous and dichotomous outcomes, and will give the user an understanding of how, for descriptive purposes only, to select the appropriate subsample for whom certain employment-related variables apply. Note, however, that to maintain the experimental comparison for impacts analyses, all sample members are retained (e.g., even those without a job are retained in impacts analyses of wage rate, with zeroes assigned to non-employed respondents).

   c) GLMBRAKN.LST presents output showing two-year impacts of each of the 6 programs on the Bracken assessment of focal children's cognitive school readiness. (These means can be found in Table 6.2 in Chapter 6 of McGroder et al., 2000.)
      Reproducing these impacts will give the user experience running and interpreting impacts on a continuous child outcome measure.

   d) LOGHLTH.LST presents output showing two-year impacts of each of the 6 programs on the proportion of focal children rated by mothers as being in very good or excellent health. (These means can be found in Table 6.2 in Chapter 6 of McGroder et al., 2000.) Reproducing these impacts will give the user experience running and interpreting impacts on a dichotomous child outcome measure.

   e) EDURSKSG.LST presents output showing two-year impacts of each of the 6 programs on the Bracken assessment of focal children's cognitive school readiness, for the subgroup of children in each site whose mothers had no educational risk factors and for the subgroup of children whose mothers had any of three educational risk factors at baseline. (These means can be found in Table 7.1 in Appendix C of McGroder et al., 2000.) Reproducing these impacts will give the user experience running and interpreting impacts on a continuous child outcome measure for a lower- and higher-risk subgroup, defined here according to the presence of any of three educational risks.

   f) CUMRSKSG.LST presents output showing two-year impacts of each of the 6 programs on the proportion of focal children rated by mothers as being in very good or excellent health, for the subgroup of children in each site whose mothers had none or 1 of four composite risks and for the subgroup of children whose mothers had 2-4 of four composite risks at baseline. (These means can be found in Table 7.1 in Appendix C of McGroder et al., 2000.) Reproducing these impacts will give the user experience running and interpreting impacts on a dichotomous child outcome measure for a lower- and higher-risk subgroup, defined here according to the cumulative number of composite family risks.
   NOTE: The standard weight variable used for estimating program impacts for the Two-Year Client Survey and COS Survey samples is called FIELDWGT. It is stored in the Two-Year Client Survey data set (CD #2). The .LST files described above show output from impact calculations that used a slightly different version of FIELDWGT for each site. Weighting by FIELDWGT will produce the same impact results as weighting by these different versions.

4. Review the rest of the documentation on the public use file, including N2PC_CBK.TXT, the COS file codebook, and N2PCVARS.TXT, which provides additional background information on the measures on this file.

5. Replicate the means and frequencies on the CD-ROM. Recreating these results will familiarize the user with the samples, outcome measures, and the regression models used to estimate program impacts. If the user cannot replicate the output, there is a danger of producing inaccurate results that may lead to inappropriate conclusions.

IV. THE IMPACT SAMPLE AND RANDOM ASSIGNMENT DESIGN

To test the effectiveness of welfare-to-work program strategies, NEWWS conducted a random assignment experiment. In each research site, people who were required to participate in the program were assigned, by chance, either to a program group, which had access to employment and training services and whose members were required to participate in the program or risk a reduction in their monthly AFDC grant, or to a control group, which received no services through the program but whose members could seek out such services on their own from the community.

This random assignment design assures that there are no systematic differences between the background characteristics of people in the program and control groups when they enter the study. Thus, any subsequent differences in outcomes between the groups can be attributed with confidence to the effects of the program. These differences are called impacts.
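As a minimal illustration of the definition above, the following Python sketch computes an unadjusted impact as the difference between the weighted mean outcome of a program group and that of a control group. The records, the group labels, and the helper name weighted_mean are assumptions made up for this example; they are not values or variable names from N2PC1730.TXT, and the weights merely stand in for a survey weight such as FIELDWGT.

```python
# Illustrative sketch only: toy records, not data from the public use file.
# Each record is (research group, outcome value, weight).
records = [
    ("program", 1.0, 1.0),
    ("program", 0.0, 2.0),
    ("program", 1.0, 1.0),
    ("control", 0.0, 1.0),
    ("control", 1.0, 1.0),
    ("control", 0.0, 2.0),
]


def weighted_mean(rows, group):
    """Weighted mean of the outcome for one research group."""
    numerator = sum(y * w for g, y, w in rows if g == group)
    denominator = sum(w for g, y, w in rows if g == group)
    return numerator / denominator


program_mean = weighted_mean(records, "program")
control_mean = weighted_mean(records, "control")

# The impact is the program-control difference in (weighted) mean outcomes.
impact = program_mean - control_mean
print(program_mean, control_mean, impact)
```

This shows only the unadjusted difference. The impacts in the report come from regression models that also include baseline covariates, as described in Section VI.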
In the three Child Outcomes Study sites (Atlanta, Grand Rapids, and Riverside), two different types of welfare-to-work programs were operated side by side: a strongly employment-focused approach, called Labor Force Attachment (or LFA), and a strongly education-focused approach, called Human Capital Development (or HCD). Sample members in Atlanta and Grand Rapids were randomly assigned to an LFA group, an HCD group, or a control group.

Riverside implemented different random assignment designs to study the effects of its LFA and HCD programs. Following program intake procedures established by California's welfare department, Riverside determined each sample member's "need for basic education" just prior to random assignment. Those who had a high school diploma or GED certificate, and scored above minimum levels on both the math and the literacy sections of the GAIN Appraisal test, and were proficient in English, were determined not to need basic education. This group was randomly assigned only to the LFA or control group. Those without a high school diploma or GED certificate, or who scored below minimum levels on either section of the GAIN Appraisal test, or who did not speak English, were determined by the program to be in need of basic education. Individuals in this group were randomly assigned to any of the three Riverside research groups, including the HCD group. Thus, the effects of the LFA approach were tested on the entire sample, but the effects of the HCD approach were tested only on sample members determined to need basic education.

Comparisons can be made between outcomes for individuals assigned to each of the program groups and outcomes for those assigned to the control group (LFA versus control; HCD versus control), enabling one to estimate the added benefit of either of these approaches above what the individuals would achieve in the absence of a welfare-to-work program.
Additionally, a direct comparison can be made between outcomes for individuals randomly assigned to the two program groups (LFA versus HCD), enabling one to estimate the relative benefits of one welfare-to-work strategy over the other.

In the Child Outcomes Study sites, random assignment took place at the JOBS office; that is, sample members were randomly assigned as they attended a program orientation. Random assignment for the different sites took place as indicated by the table below. The impact samples used for the two-year child impacts report (McGroder, Zaslow, Moore, & LeMenestrel, 2000), and therefore for the HHS (2001) data file, include the full single-mother impact samples in the three Child Outcomes Study sites.

   Site and Random Assignment Period for Child Outcomes Study Sample

   Site              Random assignment period
   Atlanta           03/12/92-01/27/94
   Grand Rapids      03/25/92-01/31/94
   Riverside         09/03/91-06/30/93

The proportion of sample members randomly assigned to the program and control groups differed across sites. In Atlanta and Grand Rapids, the proportion of sample members in each of the three research groups is roughly equal. Because of Riverside's dual random assignment design, the proportion of sample members in each of the three research groups is not equal.

The Riverside research design has several implications for making comparisons between research groups. First, comparisons between the LFA and the HCD groups in Riverside should include only sample members determined to need basic education as of random assignment (DIPLOMA=0). That is, researchers should select HCDs, LFAs, and control group members determined to be in need of basic education when estimating the impacts of Riverside's HCD program, or when comparing the relative effectiveness of the LFA versus the HCD approach in Riverside.

Second, Riverside's design also affects the comparability of the HCD research groups to other education-focused programs, particularly to the HCD programs in Atlanta and Grand Rapids.
Researchers should select Atlanta and Grand Rapids HCDs and control group members who had not completed high school or received a GED certificate before random assignment (DIPLOMA=0) when comparing results to those of the HCD approach in Riverside.

The Riverside design also has implications for calculating LFA impacts in that site. In Riverside, a sample member determined not to need basic education had a 50-50 chance of becoming an LFA (because those "not in need" were not assigned to the HCD group), whereas a sample member determined to need basic education had only a 1 in 3 chance of becoming an LFA. Therefore, those not in need of basic education are overrepresented among the LFAs and control group members, and outcomes for those determined not to need basic education unduly influence unweighted LFA-control group comparisons. Thus, the most accurate estimates of LFA impacts use weighted averages of the outcomes for LFAs found by program staff to be in need of basic education at baseline and LFAs who were determined not to need basic education. This additional weighting procedure is required even after the sample is weighted by FIELDWGT, the standard weight variable for the Two-Year Client Survey and COS Survey data sets.

V. VARIABLES INCLUDED ON THE FILE

The key variables on the public use file are outlined below:

1. Sample member baseline characteristics

As noted above, each sample member is identified by IDNUMBER. Variables available for all sample members in the Two-Year Client Survey sample and the Full Impact sample (for example, RES2 [research group], ALPHSITE [site]) are also available for the Child Outcomes Study sample. They can be accessed by merging (by IDNUMBER) the Child Outcomes Study data files with the Full Impact Sample and 2-Year Client Survey data files.

Variables available only for Child Outcomes Study (COS) sample members include:

   a) Covariates used in COS impacts models.
      Demographic characteristics are dummy-coded; measures derived from the Private Opinion Survey pertaining to such variables as psychological well-being, social support, and barriers to employment are trichotomized. Covariates created by Child Trends used the following naming convention: CTxxxTRB, with CT indicating a Child Trends-created variable; xxx indicating a shorthand for the particular measure (e.g., DEP for baseline depressive symptoms); TR denoting a trichotomous measure; and B indicating a baseline measure. (See N2PC_CBK.TXT and N2PCVARS.TXT for a list of covariates, with variable names and a brief description. See N2PCCOVA.TXT for how these composited covariates were created.)

   b) Subgroups, for which impacts were run separately. (See N2PC_CBK.TXT and N2PCVARS.TXT for a list of subgroups, with variable names and a brief description. See N2PCCOVA.TXT for how these composited subgroup variables were created.)

For McGroder, Zaslow, Moore, & LeMenestrel (2000), researchers imputed values for sample members with missing data on baseline characteristics and attitudes used in creating covariates for the impacts models and in creating baseline subgroups for examining the impacts of these six programs for lower-risk and higher-risk families. Eight composite subgroup variables were created by combining conceptually related covariates into 8 dichotomous variables. (See N2PC_CBK.TXT and N2PCVARS.TXT for a list of subgroups and N2PCCOVA.TXT for how these composited subgroup variables were created.) Multivariate mean substitution was used for imputing values.
The imputation method and variables used in the mean substitution were identical to the method and variables used for Freedman et al., 2000, though an additional variable (CTRSKTRB) was used to impute baseline covariates, on the premise that the number of family risks at baseline (CTRSKTRB) would increase the precision of the imputation, particularly of baseline attitudinal variables used as covariates in the Child Outcomes Study. (The baseline family risks counted were mothers' lack of a high school degree or GED, mothers' low literacy, mothers' low numeracy, mothers' limited work history, numerous and frequent depressive symptoms, a relatively external locus of control, lacking any of three sources of support, three or more children in the family, welfare receipt exceeding 5 years, and between 4 and 7 (of 7) family barriers to work. CTRSKTRB is coded as follows:

   CTRSKTRB = 0 for families having 0-3 of these risks
   CTRSKTRB = 1 for families having 4 or 5 of these risks
   CTRSKTRB = 2 for families having 6-10 of these risks

See N2PCCOVA.TXT for how this family risk variable was created.)

Covariates that have been imputed have values between 0 and 1 (or, for trichotomous covariates, between 0 and 2), which are stored permanently on the file. Subgroups that have been imputed have values of either 0 (denoting "lower-risk" families) or 1 (denoting "higher-risk" families). Researchers who choose to substitute other values or return the measures to missing can merge (by IDNUMBER) the original values of covariates used in the impacts analyses for both the Child Outcomes Study report (McGroder et al., 2000) and the full NEWWS report (Freedman et al., 2000): namely, MARSTAT, BLACK, GYRADC, YRKREC, YREMP, YREARN, YREARNSQ, YRREC from the Full Impact Sample File (CD #1).

2. Survey-based Outcome Variables

Child Outcome Variables

From the continuous measures pertaining to an assessment of academic school readiness and to maternal ratings of focal child behavior problems, positive behaviors, and overall health, dichotomous measures were created to flag children scoring in the top and bottom 25 percent on these measures. Thus, in addition to examining experimental impacts on mean levels of a child outcome measure, impacts analyses were also conducted on these "distributional" outcomes, to assess whether a program shifted the distribution of scores. (See N2PC_CBK.TXT and N2PCVARS.TXT for a complete list and description of variables.)

Targeted "Intervening Mechanism" Variables

The majority of outcomes targeted by these welfare-to-work programs are included in CD #1 and CD #2. However, two additional outcomes were created (from two of these key targeted outcomes) because of their potential relevance to children's developmental outcomes:

   MINWAGE2 is a dummy variable, coded 1 if the wage reported by mothers for their current/most recent job (JHRLYPAY) was below the minimum wage in the early 1990s ($4.25).

   MANYHRS2 is a dummy variable, coded 1 if the number of weekly hours reportedly worked by mothers in their current/most recent job (JWRK_HRS) was more than 40.

(See N2PC_CBK.TXT and N2PCVARS.TXT.)

Non-Targeted "Intervening Mechanism" Variables

These outcomes were not targeted by these welfare-to-work programs, but they may nevertheless be affected by mothers' involvement in these programs and may have implications for children's developmental outcomes. (See N2PC_CBK.TXT and N2PCVARS.TXT for a complete list and description of variables.)

The user should note that, to retain the experimental design, as many of the 3,018 cases as possible should be included in impacts analyses.
Most importantly, respondents who were appropriately skipped out of questions because the questions did not apply to them should be assigned a "0" on the skipped items, thereby retaining these cases in impacts analyses. For example, variables representing the use of child care while employed last month (named EMPxxxxx) contain missing data for respondents reporting only mother care (BBCHCAR24=1) and/or reporting they were not employed at any point in the month prior to the survey (JEMPYN1=0). The user seeking to run impacts on child care variables should assign the value "0" to child care variables for these cases. Likewise, child support award and amount variables contain missing data for respondents reporting that the focal child's biological father was deceased (CCPACUR=1) or currently living in the household (CCPACUR=2). The user seeking to run impacts on child support variables should assign the value "0" to child support variables for these cases.

In addition, impacts on dichotomous survey-based outcomes with missing values were calculated (i.e., with PROC LOGISTIC; see below) as though these missing values were 0s. Impacts on continuous survey-based outcomes with missing values were calculated (i.e., with PROC GLM; see below) by listwise deleting cases with missing values on the particular outcome. Note that missing values on outcome measures were never "hard-coded" to 0 in the data file.

3. Original Survey Items

Sections of the two-year follow-up survey administered only to Child Outcomes Study families were denoted with a double letter (i.e., modules AA, BB, CC, DD, EE, FF, GG, and HH); variable names for the original items in these sections begin with the appropriate double letter. Variable names for the original items in the interviewer assessment likewise begin with "IA." (See N2PC_CBK.TXT and N2PCVARS.TXT for a complete list and description of variables.)

VI. TUTORIAL: REPRODUCING OUTPUT

All impacts analyses were run separately within site, selecting only the applicable program group (b = HCD group; j = LFA group) and the applicable control group (all Cs in Atlanta, all Cs in Grand Rapids, all Cs when assessing the impacts of Riverside's LFA program, and only "in-need" Cs when assessing the impacts of Riverside's HCD program).

Means and cross-site comparisons of means appearing in Chapter 5 (child outcomes) and Chapter 8 (adult outcomes) are unadjusted but weighted and were obtained using the following SAS programming:

   PROC GLM;
      CLASS SITE;
      MODEL outcome = SITE;   /* "outcome" is a placeholder for the outcome variable being analyzed */
      LSMEANS SITE / PDIFF STDERR;
      WEIGHT FIELDWGT;
   RUN;

Impacts analyses of continuous outcome measures used OLS regression methods (PROC GLM, in SAS), and impacts analyses of dichotomous outcome measures used logistic regression methods (PROC LOGISTIC, in SAS). Logistic regression models are fit iteratively and must converge in order to obtain reliable results. Logistic regression models were allowed up to 100 iterations; models that did not converge were neither interpreted nor reported in impacts tables.

The SAS language used to run all impacts analyses of continuous outcomes, with "b" used in models testing the impact of HCD programs and "j" used in models testing the impact of LFA programs, is shown below.

NOTE that the user will not be able to replicate exactly the impact tables in the report for the following reason: Focal child gender did not become available until the five-year follow-up survey, so for the two-year analyses, child gender was "assigned" by same-race/same-ethnicity raters based on the child's first name. This estimation method was over 90 percent accurate. Nevertheless, now that actual focal child gender is available, the user will want to use this variable (FCGENDER) in subsequent analyses.
   PROC GLM;
      CLASS b;                /* use j in place of b for LFA models */
      MODEL outcome = b marstat ctnchtrb black agep gyradc yrkrec fcgender
                      cthsgrkb ctlitrkb ctnumrkb ctwlftrb ctdeptrb ctloctrb
                      ctsuprkb ctbartrb ctwrkrkb ctrsktrb / SOLUTION;
                              /* "outcome" is a placeholder for the outcome variable */
      LSMEANS b / PDIFF STDERR;
      WEIGHT fieldwgt;
   RUN;

The following SAS language was used to run all impacts analyses, and print means (probabilities, "probxxx") for each program group, on dichotomous outcomes. Note for SAS users: Each model, testing a single outcome, must define a separate, unique "prob" variable; otherwise, the means statement will print means (probabilities) of whichever PROBXXX variable was specified in a prior run.

   PROC LOGISTIC DESCENDING;
      MODEL outcome = black agep gyradc yrkrec fcgender marstat ctnchtrb
                      cthsgrkb ctlitrkb ctnumrkb ctwlftrb ctwrkrkb ctdeptrb
                      ctloctrb ctsuprkb ctbartrb ctrsktrb b / MAXITER=100;
                              /* use j in place of b for LFA models */
      WEIGHT fieldwgt;
      OUTPUT PRED=probxxx;    /* give each model its own unique prob variable name */
   PROC SORT TAGSORT;
      BY b;
   PROC MEANS;
      VAR probxxx;
      BY b;
   RUN;

NOTE:

   1) Calculations for McGroder et al., 2000 include an additional covariate (CHAGERAD: focal child's age at random assignment, in months) that is available only in the restricted access version of this file.

   2) The covariates MARSTAT, BLACK, AGEP, GYRADC, and YRKREC were collected for all members of the Full Impact Sample (N=44,569) and are stored in the Full Impact Sample File (CD #1).

   3) However, their values have been changed slightly to protect sample members' confidentiality. (See NPBCOVER.TXT for details.) For this reason, researchers will obtain slightly different results from those that appear in tables in McGroder et al., 2000.

MISSING VALUES FOR COVARIATES

1) FCGENDER is missing for 17 COS respondents. Researchers will need to impute values for FCGENDER when using it as a covariate; otherwise, these sample members will be dropped from the calculations due to listwise deletion.
   Here are the suggested values for imputing FCGENDER, based on inferences from reading the focal child's first name:

      IDNUMBER    FCGENDER (1=male, 2=female)
          1153    2
          4306    2
          8669    2
         11016    2
         13305    2
         14420    2
         16762    2
         18956    1
         22806    2
         23644    2
         24073    2
         28321    2
         29258    1
         32191    2
         35016    2
         40618    1
         41433    2

2) Some covariates collected for the Full Impact Sample also have missing values for COS respondents. Specifically, in the model used by Child Trends, the measure BLACK is missing for 6 respondents. (MARSTAT, AGEP, GYRADC, and YRKREC do not have missing values for COS respondents.) Researchers will need to impute values for BLACK and, possibly, for other covariates that researchers choose to add. The measure XBLACK contains imputed values (through mean substitution by site and level of educational attainment for the Full Impact Sample). It is stored in CD #1. Child Trends imputed values in a slightly different way (as described above). The number of sample members involved is too small for results to be affected by this minor variation in imputation procedures.
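The suggested FCGENDER values listed under item 1) of MISSING VALUES FOR COVARIATES can be applied with a simple lookup by IDNUMBER. The Python sketch below is one possible way to do this, not the procedure used for the report. It assumes each record is held as a dictionary keyed by variable name, with a missing FCGENDER stored as None; the function name fill_fcgender and the three example rows (including IDNUMBER 500) are hypothetical.

```python
# Suggested FCGENDER values keyed by IDNUMBER (1 = male, 2 = female),
# copied from the list under item 1) of MISSING VALUES FOR COVARIATES.
SUGGESTED_FCGENDER = {
    1153: 2, 4306: 2, 8669: 2, 11016: 2, 13305: 2, 14420: 2,
    16762: 2, 18956: 1, 22806: 2, 23644: 2, 24073: 2, 28321: 2,
    29258: 1, 32191: 2, 35016: 2, 40618: 1, 41433: 2,
}


def fill_fcgender(record):
    """Return a copy of the record with a missing FCGENDER filled in.

    Assumption: missing values are represented as None. Records whose
    IDNUMBER is not in the suggested list are left unchanged.
    """
    filled = dict(record)
    if filled.get("FCGENDER") is None:
        filled["FCGENDER"] = SUGGESTED_FCGENDER.get(filled["IDNUMBER"])
    return filled


# Hypothetical example rows; only the IDNUMBER-to-FCGENDER pairs in the
# lookup table above come from this guide.
rows = [
    {"IDNUMBER": 1153, "FCGENDER": None},
    {"IDNUMBER": 18956, "FCGENDER": None},
    {"IDNUMBER": 500, "FCGENDER": 1},
]
filled_rows = [fill_fcgender(r) for r in rows]
print(filled_rows)
```

Filling these values before fitting a model keeps the 17 affected sample members from being dropped through listwise deletion when FCGENDER is used as a covariate.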