POSITIVE YOUTH DEVELOPMENT
IN THE UNITED STATES:

CHAPTER FOUR:
SUMMARY AND CONCLUSIONS

The main focus of this chapter is to summarize program and evaluation characteristics of well-evaluated programs (n=25). First, however, we will summarize the reasons that programs or evaluations were excluded from the well-evaluated and effective category.

The second section presents the results of our analysis in summary form. The 25 effective programs are analyzed by constructs, domains, strategies, and other elements of successful youth development programs. When appropriate, we compare these to the other 44 evaluated programs to see in what ways strongly evaluated programs that do produce positive effects are different from the field of other evaluated positive youth development programs.

The third section presents the findings on methodological issues surrounding positive youth development programs. This section covers how evaluations of positive youth development programs addressed research design, statistical methods, attrition, outcome measures, and other important aspects of assessment technology.

Finally, we present conclusions and comments concerning future directions for the field, both for positive youth development interventions and for the evaluations they use.

Evaluations Excluded from the Effective Interventions

Seventy-seven evaluations were identified by this review for analysis; however, eight were so limited by missing information in key parts of the evaluation that they had to be removed from the summary analyses of programs. Thus 69 programs (25 well-evaluated programs, and 44 that did not have adequate evaluations) are analyzed. Further, although all 44 excluded programs were included in the summary, not all contained complete information that permitted comparisons with the group of effective programs on each dimension. Therefore, depending on the dimension in question, the number of excluded programs used as the basis for comparison will vary slightly (see Appendix J).

Generally, programs were excluded from the effective category either because weaknesses in the evaluation made it impossible to draw conclusions about the intervention's effects on youth behavior, or because a strong design showed no effects. Thus, four types of problems caused placement of programs in this category:  (1) evaluation design weaknesses; (2) insufficient behavioral outcome measures; (3) outcomes showing no impact for the intervention, or limited to only measurement of knowledge or attitude changes; and (4) lack of methodological information needed to draw conclusions about program effectiveness. More specifically, some programs (n=12) had evaluations which received a "medium confidence" designation. This group fell into two sub-sets:  one set (n=8) had a reasonable design but provided insufficient data in the report to conclude that the comparison groups were solidly equivalent; the second sub-set (n=4) was excluded because there was a stronger evaluation of the same program. Another group of programs (n=19) received the designation of "low confidence" for one of two reasons:  either the description of the comparison group did not establish that the intervention and control groups were equivalent (n=10), or there was no comparison group at all (n=9). In addition to these programs with design or methodological issues, some interventions with very strong designs (n=5) were excluded because they had no significant outcomes (n=3), or because their evaluation measured only attitude and knowledge changes, not behavioral outcomes (n=2) (see Appendix K).

Characteristics of Effective Positive Youth Development Programs

Social Domains

There were eight evaluations (32%) of single domain-focused programs, two based in communities and six based in schools. Eight evaluations (32%) reported programs in two domains:  one of these combined school and community, and seven combined school and family. Nine evaluations (36%) reported programs in three domains:  seven combined school, family, and community; one combined family, church, and community; and one combined school, community, and workplace. Thus the total number reporting multiple-domain interventions was 17 (68%).
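
The domain counts above, and the per-domain totals reported in the sections that follow, can be cross-checked with a short tally. The sketch below is illustrative only: the counts are those reported in the text, while the data layout and function names are assumptions introduced for the example.

```python
# Counts of effective programs by domain combination, as reported in the text.
# The dictionary layout and all names below are illustrative assumptions.
EFFECTIVE_TOTAL = 25

domain_counts = {
    ("community",): 2,
    ("school",): 6,
    ("school", "community"): 1,
    ("school", "family"): 7,
    ("school", "family", "community"): 7,
    ("family", "church", "community"): 1,
    ("school", "community", "workplace"): 1,
}


def count_by_number_of_domains(counts):
    """Sum program counts grouped by how many domains each combination spans."""
    totals = {}
    for domains, n in counts.items():
        totals[len(domains)] = totals.get(len(domains), 0) + n
    return totals


def count_with_domain(counts, domain):
    """Count programs whose domain combination includes the given domain."""
    return sum(n for domains, n in counts.items() if domain in domains)


def percent(n, total=EFFECTIVE_TOTAL):
    """Share of the effective programs, rounded to a whole percentage."""
    return round(100 * n / total)


by_size = count_by_number_of_domains(domain_counts)
for size in sorted(by_size):
    print(f"{size} domain(s): {by_size[size]} programs ({percent(by_size[size])}%)")

for domain in ("school", "family", "community"):
    n = count_with_domain(domain_counts, domain)
    print(f"{domain}: {n} programs ({percent(n)}%)")
```

Each share is computed against the 25 effective programs, so the three domain-count groups together account for the whole set.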

More than half (21, or 53%) of the excluded programs were in a single domain; about a quarter (10, or 25%) were in two domains, six (15%) were in three, and three (8%) were in four domains.

Representation of the School Setting in Positive Youth Development Programs

Across the possible settings in which effective, well-evaluated positive youth development programs were conducted, the school domain was by far the most widely represented, with 22 (88%) programs basing at least some of their components there. Six single-domain programs were based in schools, and 16 multiple-domain programs had a school component. The typical profile of a multiple-domain program that incorporated a school component used the school as a primary base of operations (e.g., for trainings conducted in classrooms), for strategic and consistent access to children, and for access to school resources (e.g., teachers trained to implement the intervention curriculum).

The finding for school domains was similar in excluded programs, with 32 (80%) having some school component.

Representation of the Family in Positive Youth Development Programs

Family domain programs were identified in one of two ways:  if the program used some components based in the physical home setting, or if evaluators used other methods not necessarily in the home setting to involve the family or parents. No evaluations were identified of effective single domain programs operating solely in a family setting. However, among multiple-domain programs, the family component was widely represented. Overall, 15 (60%) of the effective programs used family or parent strategies. Among two- and three-domain programs, almost all addressed some of their program strategies to the family or parents (seven of the eight two-domain programs and eight of the nine three-domain programs).

Only eight (20%) of excluded programs had a family component or operated in a family setting.

Representation of the Community in Positive Youth Development Programs

The community domain was represented in 12 (48%) programs. Two of these programs were based solely in the community, and only one of the two-domain interventions combined community and school strategies. All nine of the three-domain programs incorporated some community-based strategies. The profile of these multiple-domain programs indicated that the community was not typically a primary base of operations. Instead, most programs used the community's resources and physical opportunities to augment or enhance strategies based in the other domains (e.g., volunteering in the community as a way to practice new principles learned in the school domain). However, these programs were often based on program principles that stressed the importance of addressing community risk and/or protective factors as an integral part of producing successful positive youth development outcomes. Communities were incorporated either through using their social, economic, or physical resources, or targeting specific community risk factors, or attempting to influence community-level policies and practices. About half of these three-domain programs (Across Ages, Midwestern Prevention Project, Valued Youth Partnerships, Woodrock) emphasized the development of strategic relationships or partnerships with the community.

Half the excluded programs (20, or 50%) used some community component or operated in the community domain.

Positive Youth Development Constructs

The ways in which interventions addressed "positive youth development constructs" were a primary focus of the analysis. As noted elsewhere in the report, programs did not need to measure these constructs in order to meet the criteria for this review. Had measurement of positive youth development constructs been a criterion, there would have been very few programs to review. Ideally, the program should address these constructs in the intervention, and the evaluation should measure the impact of the intervention on these constructs. Measurement of youth development constructs is one of the most powerful ways to advance the field because of the information it provides on the relationships between the intervention, mediating variables such as positive youth development constructs, and youth outcomes.

Overall Representation of Constructs Across Programs

All of the effective programs in this review addressed a minimum of five positive youth development constructs. Most interventions addressed at least eight constructs, and three-domain programs averaged 10 constructs. Three constructs were addressed in all 25 well-evaluated programs:  competence, self-efficacy, and prosocial norms.

The profile was similar in the excluded programs, with all programs addressing competencies of one or more types. Both self-efficacy and prosocial norms were addressed less often than in the well-evaluated programs, but each was still represented in approximately three-fourths of those interventions.

Competence

Competence was defined as a child's capacity for acquiring developmentally appropriate skills across social, emotional, cognitive, behavioral, and moral dimensions. All 25 (100%) of the effective, well-evaluated programs addressed one or more of these forms of youth competence. In fact, 100% of the effective programs met the criteria for promoting children's competencies on social, cognitive, and behavioral dimensions. Twenty-two programs (88%) met the criteria for promoting emotional competencies, and eight (32%) met the criteria for promoting moral competence. In those cases in which an evaluation measured a positive youth development construct, that construct was most likely to be a form of competence.

Self-Efficacy

Self-efficacy was defined as youth's perception that one can achieve desired goals through one's own action. All 25 (100%) of the effective, well-evaluated programs addressed self-efficacy. During the analysis, significant overlap was noted between those programs meeting the criteria for competence and those for self-efficacy. Typically, most program strategies that promoted a youth's capacity to learn, acquire and master new skills also addressed perceptions of self-efficacy. When an evaluation measured self-efficacy, two things could be noted. First, the self-efficacy measure was typically generated from a self-report index. Second, self-efficacy was almost always grouped with measures of attitudes or beliefs, rather than designated a behavioral measure.

Prosocial Norms

Prosocial norms are defined as healthy standards and clear beliefs. Programs typically addressed these through delivering messages about healthy expectations from peers or adults, or by stressing the importance of knowing how to respond appropriately to negative peer influences. The positive youth development construct of promoting prosocial norms in youth was tied with competence and self-efficacy for the highest representation among all interventions:  25 (100%) of the effective, well-evaluated programs addressed prosocial norms.

Opportunities for Prosocial Involvement

Opportunities for prosocial involvement were defined as events or activities in the intervention that encourage youth in prosocial actions. These programs created, or linked children to, opportunities for positive involvement. The positive youth development construct of promoting children's opportunities for prosocial involvement had the second highest representation among all interventions. Twenty-two (88%) of the effective, well-evaluated programs created and used these opportunities for youth to practice and develop new behaviors and forms of contact with others, including family members, peers, teachers, and other adults.

Among the excluded programs, opportunities for prosocial involvement were less frequently noted, with about half those programs (19, or 49%) receiving that designation.

Recognition for Positive Behavior

This construct was defined as reinforcement or acknowledgement for positive behavior. It tied with opportunities for prosocial involvement as having the second highest representation across programs, with 22 (88%) using some framework for providing acknowledgment, rewards or reinforcement to youth. Most often this recognition was provided in connection with learning a developmentally appropriate skill, task, or challenge, or for supporting appropriate behavioral changes.

Among excluded programs, the recognition construct appeared in fewer than half the programs, at 41% (16).

Bonding

Bonding was defined as a youth's social attachment and commitment to others, including family, peers, school, community, and the culture(s). Bonding had the third highest representation of constructs, present in 19 (76%) programs. A program with a typical bonding component often structured or encouraged direct contact with prosocial adults and peers. Programs also promoted bonding when they sought to strengthen healthy relationships between youth and the people delivering intervention services.

Among excluded programs, bonding was represented in slightly more than half the programs (13, or 55%).

Positive Identity, Self-Determination, Belief in the Future, Resiliency, and Spirituality

Five positive youth development constructs were represented in fewer than 50% of programs. In two cases, belief in the future and spirituality, most programs simply did not address these principles. Spirituality and belief in the future were each addressed in only two (8%) programs. Among the other three constructs, resiliency was the most represented, with 12 programs (48%) identified as addressing the construct. In most instances in which the resiliency construct was identified, it was referred to in the text of the evaluation, often in the theory section. It was generally far less clear how the construct was integrated with the rest of the evaluation or program. Both positive identity and self-determination were rarely identified as constructs by program evaluators. However, nine (36%) programs met the criteria for addressing positive identity in some way. Only four (16%) programs met the criteria for addressing self-determination.

Similar findings occurred for the excluded programs. There were eight (21%) programs that addressed self-determination. However, more of the excluded programs addressed belief in the future (11, or 28%). Ten (25%) programs addressed youth resiliency, and only one of the excluded programs addressed spirituality.

Positive Youth Development Strategies

The original pool of strategies used for the analysis was drawn from a framework developed by Tolan and Guerra (1994). The list was expanded from its original purpose in violence prevention evaluations to encompass techniques or methods linked with forms of positive youth development, health promotion, and competence promotion. This resulted in each intervention being analyzed for 30 possible categories of strategies. These may be generally grouped into two broad categories:  skills focus and environmental/organizational change. Overall, specific strategies that corresponded to social skills or cognitive behavioral skills were represented in the greatest proportions in evaluations of effective positive youth development programs. Twenty-four (96%) of all programs incorporated some skills-based strategies. Leading the category of skills-focused strategies were decision-making and self-management skills (each at 73%), followed by coping skills (62%) and refusal-resistance skills (50%).

One of the most commonly documented forms of environmental strategies was the effort to influence teacher practices in the classroom. Another strategy, the influencing of peer norms and perceptions, was not always described in the report, but many programs met the criteria for this, particularly among the multiple-domain programs.

Again, a similar profile was found for the excluded programs; about three-fourths of these programs used skills-based strategies. Except for the excluded programs with strong designs, it was more difficult to determine how many of these used environmental and organizational strategies, because the information needed for a meaningful analysis was not always available.

Measurement of Positive and Problem Behavior Outcomes

The issue of whether a positive youth development intervention measures, as well as addresses, positive-focused outcomes has important implications for the future of the positive youth development field, and is currently the subject of considerable discussion among practitioners, prevention scientists, and the policy community. The minimum requirement for inclusion was that the evaluation measure either reductions in problem behavior, or increases in positive behavior. Measures based on reductions in problem behavior were widely represented in the well-evaluated effective programs, with 24 (96%) interventions using these to assess intervention outcomes. Nineteen programs (76%) used positive outcome measures in addition to measures of problem reduction. This is higher than was expected, and very good news. There is a need for all positive youth development programs to measure both types of outcomes in order to assess fully the effects of these programs on youth. This integrated measurement approach will provide funders of promotion and prevention programs a greater understanding of program effects on all important youth outcomes. Such an integrated approach to measuring youth outcomes has potential for increased funding, and broader applications of positive youth development strategies.

This analysis could not be completed in a meaningful way with the excluded programs. Only a few of the excluded programs met the behavioral outcome criteria, and almost without exception, these were the earlier iterations of subsequently successful programs that had simply had an inadequate evaluation design or failed to show effects in the first round. What can be said is that, of the medium confidence programs (n=12), four of eight would have been described as having important youth outcomes (e.g., career maturity, academic performance, positive self-concept, improved family relations), had their evaluation designs been much stronger.

Curriculum

Knowing the extent to which programs relied on a structured curriculum or structured activities is critical for program replication. This analysis identified an overlap between a program's use of a curriculum and the likelihood that it incorporated skills-based strategies, the two concepts being closely linked in practical application. Twenty-four (96%) of the well-evaluated effective programs incorporated a curriculum or program of activities. A program such as Big Brothers/Big Sisters, which did not focus on skills-based strategies to build social competence, did not use a curriculum. Most skills-based programs assume that the outcomes are mediated by the opportunities associated with the direct learning and practice of their strategies. In a program such as Big Brothers/Big Sisters, the opposite is assumed:  positive outcomes are mediated by the bonding and other aspects of positive interaction (such as the presumed modeling of effective behavior by the adult) within the mentoring relationship.

Far fewer (20, or 50%, n=40) of the excluded programs incorporated a curriculum or structured program of activities into their intervention. Those that did were mainly the programs in which there was some confidence in their evaluation designs (n=12), or the five programs with excellent evaluation designs which showed no significant behavioral effects.

Program Frequency and Duration

Considerable discussion about adequate "frequency and duration" of an intervention has been generated within the positive youth development field, associated with issues of program length, intensity, periodicity, and "booster" sessions.

The analysis found that twenty (80%) effective, well-evaluated programs were delivered over a period of nine months or more. A number of these, often those operating in a school domain, applied their interventions during the academic year. In the interventions shorter than nine months, programs ranged from 10 to 25 sessions, averaging about 12 sessions per intervention.

By contrast with the well-evaluated programs, fewer than half (17, or 43%, n=40) of those in the excluded category lasted nine months or more. Once again, those that did so tended to be programs with either a reasonable design or a strong design that failed to show effects.

Program Implementation and Assurance of Implementation Quality

Issues of program implementation have recently emerged as some of the most important topics in the positive youth development field. Based on the evidence of many of these evaluations, attention to implementation quality, management and measurement has steadily increased. Among multi-year, well-funded studies, separate evaluations of implementation, in addition to outcomes evaluations, are becoming more common. The science of studying implementation has taken investigators in many different directions, with some evaluations offering supplemental statistical analyses of outcomes based on perceived level of implementation quality (e.g., Gottfredson et al., 1993; Battistich, 1996). In a major, multi-year evaluation such as Midwestern Prevention Project (Pentz et al., 1990), operational definitions of implementation have been offered, organizing types of implementation by adherence, exposure, and reinvention. The term "fidelity" is associated with implementation quality, with evaluators of multi-year evaluations such as Life Skills in 56 New York Public Schools (Botvin et al., 1995; Botvin et al., 1990) reporting outcomes based on analyses of high versus average fidelity of program implementation.

Our analysis showed that the effective positive youth development programs consistently attended to the quality and consistency of program implementation. Twenty-four (96%) evaluations in some way addressed and/or measured how well and how reliably the program implementers delivered the intervention.

Although not as high as the well-evaluated programs, the percentage of programs in the excluded category that addressed or assured implementation quality was fairly high (28, or 70%, n=40).

Ethnographic Data

Roughly three-fourths of the programs each indicated that they had served African-American youth and/or Caucasian youth. Half of the programs reviewed included Hispanic youth and approximately one third of the programs identified Asian youth among their participants. Native-American youth were involved in about 28% of these programs.

Positive youth development programs in the excluded category served youth populations with ethnographic profiles similar to those of the well-evaluated programs.

Age

By definition, the analysis focused on youth between the ages of six and twenty, inclusively. Programs which went beyond these boundaries were included if they met the methodology criteria; however, the examination of their results focused on those youth within the designated age range when the data were presented in a manner which permitted this type of analysis. The majority of participants in these evaluations were in grades four through nine, particularly during their initiation into the program.

The age profile of youth in the excluded programs was similar, although there was a noticeable trend for these programs to target slightly older children. The average range for this category of programs was fourth to 12th grade, with a few addressing children as young as first grade but the majority targeting sixth, seventh and eighth grade and beyond.

Methodological Issues

The methodological successes and challenges of positive youth development studies will be discussed and summarized next. In this section we will describe the relative strengths and weaknesses of the methodology used in the evaluations of the 25 programs. As much as possible, we will provide relevant definitions and explanations to frame the importance of each empirical issue in the larger context of evaluation quality and its implications for the positive youth development field.

The major methodological issues associated with the evaluation of positive youth development interventions involve the quality of the program, the quality of the evaluation design, and how well the evaluation report portrays the important aspects of the study. A well-done evaluation ideally sets up a reliable framework for testing the impact of the program. If the evaluation report is not comprehensive or leaves out important information, then it is not possible to judge the reliability and validity of the results or the viability of the conclusions. The strongest evaluations used an experimental research design with random assignment or, if this was not viable, a quasi-experimental design with well-matched, well-analyzed comparison groups. Within this framework, it was further necessary that evaluators use an acceptable standard of statistical proof, pay attention to reporting key methodological and analytic details, and describe the limitations of their study. In this way, it is possible to gain a clear picture of how evaluators conceptualized and measured the effects of their strategies, and the relative merits of the outcomes.

Overall Quality of Evaluation Research Design

The fundamental question of whether a positive youth development program can reliably demonstrate it had a meaningful effect on the children it targeted is central to issues of evaluation quality. The most reliable design for determining intervention effects is the experimental research design. This method involves randomization of participants to differing conditions or levels of the intervention, thereby allowing the investigator to eliminate systematic differences between the participants in the conditions. This control for individual differences among participants significantly increases the likelihood that the intervention groups will contain subjects of the same average ability. That, in turn, makes it possible to interpret differences between the conditions as those produced by the intervention, allowing for the highest possible level of confidence in the conclusions. Although the experimental method is not the only method through which reliable differences may be discovered, it is the preferred choice because, more than any other research design, it removes or minimizes the uncertainty surrounding the conclusions about whether or not a study had effects. The second most reliable method is a rigorous quasi-experimental design that uses a nonrandomly assigned comparison group. The best quasi-experimental designs seek a comparison group whose participants are closely similar to the program group prior to intervention, and explore many possible sources of pre-intervention differences between the two groups in order to rule out these pre-intervention subject differences as sources of post-intervention differences. The more rigorous this investigation, the greater the confidence that post-intervention differences are due to the intervention and not to preexisting subject differences.
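
As a concrete illustration of random assignment, the minimal sketch below shuffles a list of participants and deals them into conditions. It is not drawn from any of the reviewed evaluations: the function name, the participant identifiers, the two condition labels, and the seed are all hypothetical, and real studies often add stratification or blocking that is omitted here.

```python
import random


def randomly_assign(participant_ids, conditions=("intervention", "control"), seed=None):
    """Shuffle participants, then deal them round-robin into the conditions.

    Because the order is random, no participant characteristic can
    systematically determine which condition a participant lands in.
    """
    rng = random.Random(seed)  # a seed makes the illustration reproducible
    shuffled = list(participant_ids)
    rng.shuffle(shuffled)

    groups = {condition: [] for condition in conditions}
    for index, participant in enumerate(shuffled):
        groups[conditions[index % len(conditions)]].append(participant)
    return groups


# Hypothetical roster of 20 participant identifiers.
groups = randomly_assign(range(1, 21), seed=42)
for condition, members in groups.items():
    print(condition, sorted(members))
```

Dealing round-robin after the shuffle keeps the group sizes equal (or within one of each other), which is a design choice for the sketch rather than a requirement of randomization.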

Experimental Research Studies of Positive Youth Development Programs

Of the 25 effective programs, 16 (64%) used experimental designs with randomization of subjects to varying levels of the intervention. This speaks to the strengths in the evaluations being conducted by youth development investigators in the last 15 years. It is common in reviews of prevention programs to hear the refrain that the state of evaluation is weak and underdeveloped. In fact, over half of this group chose to employ the rigorous approach of using random assignment.

How effective programs evaluate, anticipate and overcome roadblocks to the use of experimental designs needs to be studied. The 16 studies with strong experimental evaluations clearly were able to overcome the objections and obstacles commonly associated with random assignment. Several more studies, those that eventually reported using quasi-experimental designs, said they had begun the design as an experimental method, only to be "forced" to adapt the design because of issues, generally sociopolitical or environmental, that precluded full randomization. It is true that there is a range of practical and human impediments to using random assignment. These include objections from line staff and parents who feel random assignment excludes some children who are in equal need, and issues of access to parental consent or permission. Programs such as Life Skills Training nonetheless managed to conduct rigorous evaluations, with long-term follow-up, for extremely large samples of youth populations. Such programs demonstrated that the various objections to using an experimental design on large-scale projects (and to long-term follow-up) could be overcome. It is possible to see from the strongest evaluations, however, that clear commitment to the principles of random assignment frequently correlated with that evaluation's ability to deliver on its application. Programs such as Quantum Opportunities Program remained firm about randomly selecting youth who met program requirements and then recruiting them, instead of relying on a sample of self-selected youths. Not only did this provide more rigor, but it also provided investigators some insight into issues around program "take up." In the Big Brothers/Big Sisters evaluation, evaluators did not want to withhold mentoring opportunities from research subjects. They therefore used an experimental design in which those who signed up were randomly assigned either to a mentor or to an 18-month wait list, during which data were collected from them but no mentor was provided.

Quasi-Experimental Research Studies of Positive Youth Development Programs

When evaluations are unable for various reasons to use the experimental method to contrast two or more intervention conditions, quasi-experimental designs are often used. Quasi-experimental designs also use comparison groups and pre- and post-measurement to look for program effects, but these designs carry a heavier burden of proof because participants are not randomly assigned to program and comparison groups and thus there may be preexisting differences between groups. Nine (36%) evaluations used strong quasi-experimental designs to compensate for either being unable to use random assignment or, as was true for almost half these interventions, for ending up in the compromise position of "partial" random assignment. These quasi-experimental designs dealt with the absence of random assignment in numerous ways. They began by ensuring the comparability of participants in the program, using methods such as matching individual factors and exploring subject differences before beginning the intervention. These evaluations also analyzed differences noted after the intervention, to rule out sources of erroneously concluding that group differences were produced by the program. They included analysis of dropout between conditions, and exploration of other potential group differences that may have produced the differences observed in outcomes. In addressing the absence of random assignment, these studies were persuasive that the participants in the intervention and comparison conditions were comparable. If, for example, an evaluation indicated that a much higher number of youth in the comparison group dropped out compared with the intervention condition, we required that the evaluation analyze these differences and investigate the effects of differential attrition on its findings. If evaluators investigated these differences and produced evidence that provided confidence in their findings, the study was categorized as a rigorous quasi-experimental design.
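
One simple way to examine differential attrition of the kind described above is to compare the dropout rates of the two conditions with a two-proportion z-test. The sketch below is an illustration under stated assumptions, not the procedure used by any reviewed evaluation: the function name and the dropout counts are hypothetical, and a full attrition analysis would also compare the baseline characteristics of those who dropped out and those who stayed.

```python
import math


def attrition_test(dropped_a, n_a, dropped_b, n_b):
    """Two-proportion z-test comparing dropout rates in two conditions.

    Returns (rate_a, rate_b, z, two_sided_p), using the pooled dropout
    rate and the normal approximation.
    """
    rate_a = dropped_a / n_a
    rate_b = dropped_b / n_b
    pooled = (dropped_a + dropped_b) / (n_a + n_b)
    standard_error = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if standard_error == 0:
        raise ValueError("dropout rates of exactly 0 or 1 cannot be tested")
    z = (rate_a - rate_b) / standard_error
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return rate_a, rate_b, z, p_value


# Hypothetical counts: 20 of 200 intervention youth and 45 of 200
# comparison youth were lost before the post-test.
rate_a, rate_b, z, p_value = attrition_test(20, 200, 45, 200)
print(f"intervention dropout: {rate_a:.1%}, comparison dropout: {rate_b:.1%}")
print(f"z = {z:.2f}, two-sided p = {p_value:.4f}")
```

A small p-value here signals only that the groups lost participants at different rates; whether that difference biases the outcome comparison still has to be argued from who dropped out.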

Unit of Analysis vs. Unit of Assignment

Whether the unit of assignment is also used as the unit of analysis is an extremely important issue. As Biglan and Ary (1985) and Kirby et al. (1995) have said, when units of assignment and analysis are mixed, school or classroom differences may be confounded with program effects on individuals. However, this issue is not as straightforward in programs that last for multiple years, in which participants may move out of the classrooms, schools, communities, or other organizational structures to which they were originally assigned.

Nevertheless, almost half of the effective positive youth development interventions matched unit of analysis and unit of assignment. By far the most common situation involved individual units of assignment and analysis. On the other hand, in very large studies in which classrooms, schools, or even school districts were the unit of assignment, the investigators often chose to use individual subject scores as the unit of analysis. Ideally, evaluations should address the problem by using multi-level analytical techniques such as hierarchical linear modeling.
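To illustrate why analyzing individual scores under group-level assignment can overstate precision, the following is a minimal sketch of the standard design effect for equal-sized clusters, 1 + (m - 1) x ICC, where m is the cluster size and ICC is the intraclass correlation. It is not drawn from any of the reviewed evaluations, and it is not a substitute for a multi-level model. The function names and the example values are assumptions chosen for illustration.

```python
def design_effect(cluster_size: float, icc: float) -> float:
    """Design effect for equal-sized clusters: 1 + (m - 1) * ICC.

    cluster_size: average number of participants per assigned unit
                  (e.g., students per classroom); hypothetical input.
    icc: intraclass correlation, restricted here to the range 0..1.
    """
    if cluster_size < 1:
        raise ValueError("cluster_size must be at least 1")
    if not 0.0 <= icc <= 1.0:
        raise ValueError("icc must be between 0 and 1")
    return 1.0 + (cluster_size - 1.0) * icc


def effective_sample_size(n_individuals: float, cluster_size: float, icc: float) -> float:
    """Number of independent observations the clustered sample is worth."""
    return n_individuals / design_effect(cluster_size, icc)
```

Under these assumptions, 1,000 students assigned in classrooms of 25 with an intraclass correlation of 0.05 carry roughly the information of 455 independently assigned students, which is why treating them as 1,000 independent cases can exaggerate statistical significance.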

Statistical Reporting

Claims for the intervention based upon appropriate statistical analysis make it possible to have greater confidence in the conclusions about the intervention. These claims are more strongly supported when they include sufficient detail for the reader to make judgments regarding the significance of the findings. These data would include at a minimum: the type of statistical test used, the test values generated (e.g., t or F values), the degrees of freedom, the sample size, and the p values (level of statistical significance). Ideally, evaluations would also include effect sizes or odds ratios to help the reader evaluate the strength of findings. While many of the programs reviewed did provide most of this information, they rarely included all of it. Typically, even in strong evaluation reports, some statistics were not reported. An evaluation might list the numerical values for the several strongest findings, then simply provide narrative statements about the other results. While there are space limitations in many scientific journals, readers should be given the essential data necessary to determine the significance level, effect size, and power of the analyses presented in order to independently evaluate the importance of the findings to the field.
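As an illustration of why the basic descriptive statistics matter, the sketch below shows how a reader could recover a standardized effect size (Cohen's d) and an equal-variance t statistic with its degrees of freedom when a report gives only group means, standard deviations, and sample sizes. The function names and any numbers used with them are hypothetical and are not taken from the reviewed programs.

```python
import math


def pooled_sd(sd1: float, n1: int, sd2: float, n2: int) -> float:
    """Pooled standard deviation of two independent groups."""
    return math.sqrt(((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / (n1 + n2 - 2))


def cohens_d(mean1: float, sd1: float, n1: int,
             mean2: float, sd2: float, n2: int) -> float:
    """Standardized mean difference (group 1 minus group 2)."""
    return (mean1 - mean2) / pooled_sd(sd1, n1, sd2, n2)


def t_statistic(mean1: float, sd1: float, n1: int,
                mean2: float, sd2: float, n2: int) -> tuple[float, int]:
    """Equal-variance two-sample t value and its degrees of freedom."""
    d = cohens_d(mean1, sd1, n1, mean2, sd2, n2)
    t = d / math.sqrt(1.0 / n1 + 1.0 / n2)
    return t, n1 + n2 - 2
```

For example, hypothetical group means of 52 and 48, each with a standard deviation of 10 and 100 participants, give d = 0.4 and t of about 2.83 on 198 degrees of freedom. Without the means and standard deviations, neither value can be reconstructed from a narrative statement that a difference was "significant."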

Attrition Issues

Two issues associated with the attrition level in a study are important for the effective evaluation of positive youth development programs. One is a programmatic challenge for investigators, and the other is an important methodological issue. First, in studying populations that are socioeconomically challenged, certain risk factors have an impact on attrition. Neighborhood risk factors such as community disorganization and mobility, as well as family risk factors of severe stress and poor family management practices, all lead to conditions which increase the likelihood of attrition. This presents a programmatic challenge for investigators who want to retain participants through follow-up. Strategies must be conceived to assure adequate subject retention, and it is important that investigators document those strategies. Assuring higher quality levels of implementation monitoring and management generally contributes to higher levels of subject retention. The majority of effective positive youth development program evaluations did an adequate attrition analysis; however, fewer addressed strategies for effective subject retention.

Second, it is essential to analyze the attrition that occurred during the intervention in order to understand whether different intervention group conditions or sub-groups had distinguishing characteristics that affect their presumed equivalence. This is particularly important in the case of strong quasi-experimental research designs, which rely on the ability to demonstrate that their groups were comparable. If an attrition analysis reveals significant, previously undetected differences between members of intervention and comparison groups, these differences need to be controlled for in subsequent analysis; otherwise, they seriously impede the investigators' ability to draw conclusions about the study's effects.
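One simple first step in such an analysis is to test whether dropout rates differ between conditions. The sketch below computes a Pearson chi-square test (1 degree of freedom, no continuity correction) on a 2 x 2 table of condition by dropout status. It is an illustrative assumption about how such a check might be done, not the procedure used by any reviewed evaluation, and a full attrition analysis would also compare the baseline characteristics of those who dropped out and those who stayed. All names and counts are hypothetical.

```python
import math


def differential_attrition(n_start_int: int, n_drop_int: int,
                           n_start_cmp: int, n_drop_cmp: int) -> dict:
    """Compare dropout between intervention and comparison conditions.

    Returns the two attrition rates, the Pearson chi-square statistic
    for the 2 x 2 table, and its p value (1 degree of freedom).
    """
    a, b = n_drop_int, n_start_int - n_drop_int   # intervention: dropped, retained
    c, d = n_drop_cmp, n_start_cmp - n_drop_cmp   # comparison: dropped, retained
    if min(a, b, c, d) < 0:
        raise ValueError("dropouts cannot exceed starting sample")
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    # If a row or column total is zero (e.g., nobody dropped out),
    # the table carries no evidence of a difference.
    chi2 = n * (a * d - b * c) ** 2 / denom if denom else 0.0
    # For 1 degree of freedom, P(X > chi2) = erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return {
        "rate_intervention": a / n_start_int,
        "rate_comparison": c / n_start_cmp,
        "chi2": chi2,
        "p_value": p_value,
    }
```

With hypothetical counts of 20 dropouts out of 200 in the intervention condition and 50 out of 200 in the comparison condition, the attrition rates are 10% and 25%, and the chi-square statistic is about 15.6 (p < .001), a difference large enough that the evaluators would need to examine its effect on their findings.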

Sufficient Sample Size and Power

It is important that evaluations of positive youth development programs undertake their investigation with sufficient sample size. Sample sizes must be large enough that any programmatically significant impact is also statistically significant. When examining the impact of a program, subsamples must often be analyzed, and the size of each of these subsamples must also be large enough to show statistically significant differences where they exist. This is particularly challenging when investigators wish to assign communities, classrooms, schools, or districts to experimental conditions. If they want the unit of assignment to line up with the unit of analysis, they must then contend with the statistical implications of the decision. Such choices almost inevitably produce smaller sample sizes than if they assigned individual participants to conditions. All the effective, well-evaluated programs used samples of sufficient size, ranging from at least 100 per experimental and control group to, in a number of the school-based interventions, more than 1000 per condition. Only a few positive youth development program evaluations analyzed here had total samples of fewer than 200 participants, and only one had a total sample of fewer than 100.
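To give a rough sense of what "sufficient" means, the sketch below uses the common normal-approximation formula for comparing two group means, n per group = 2 x ((z for 1 - alpha/2) + (z for power))^2 / d^2, where d is the standardized effect size the evaluation needs to detect. The defaults of a two-sided alpha of .05 and power of .80 are conventional assumptions, not criteria applied in this review, and the formula assumes individual random assignment with equal group sizes.

```python
import math
from statistics import NormalDist


def n_per_group(effect_size: float, alpha: float = 0.05, power: float = 0.80) -> int:
    """Approximate participants needed per group for a two-group comparison.

    effect_size: smallest standardized mean difference worth detecting.
    alpha: two-sided significance level (assumed default .05).
    power: desired probability of detecting the effect (assumed default .80).
    """
    if effect_size <= 0:
        raise ValueError("effect_size must be positive")
    z_alpha = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    z_power = NormalDist().inv_cdf(power)
    return math.ceil(2.0 * ((z_alpha + z_power) / effect_size) ** 2)
```

Under these assumptions, detecting a medium effect (d = 0.5) takes about 63 participants per group, while a small effect (d = 0.2) takes about 393 per group. Subsample analyses and group-level assignment both push the required totals higher.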

Conclusions

This report addressed three challenges for the field of positive youth development: defining key concepts, documenting evidence of program effectiveness, and better understanding the relationships between predictors of youth behavior and positive youth development outcomes. To address the first challenge, we identified and defined 15 "positive youth development constructs" that appear in the positive youth development literature in studies of child and youth development, psychology, and prevention science. To address the second, we identified, from 77 reviewed programs, 25 programs that demonstrated important youth outcomes at some point after the program was delivered. To address the third challenge, better understanding the relationships between predictors of youth behavior and positive youth development outcomes, we examined the social domains in which the programs conducted their strategies.

The study concluded that a wide range of positive youth development approaches can result in positive youth behavior outcomes and the prevention of youth problem behaviors. Nineteen effective programs showed positive changes in youth behavior, including significant improvements in interpersonal skills, quality of peer and adult relationships, self-control, problem solving, cognitive competencies, self-efficacy, commitment to schooling, and academic achievement. Twenty-four effective programs showed significant improvements in problem behaviors, including drug and alcohol use, school misbehavior, aggressive behavior, violence, truancy, high risk sexual behavior, and smoking. This is good news indeed. Promotion and prevention programs that address positive youth development constructs are definitely making a difference in well-evaluated studies.

Although a broad range of strategies produced these results, the themes common to success involved methods to: strengthen social, emotional, behavioral, cognitive, and moral competencies; build self-efficacy; shape messages from family and community about clear standards for youth behavior; increase healthy bonding with adults, peers, and younger children; expand opportunities and recognition for youth; provide structure and consistency in program delivery; and intervene with youth for nine months or more. Although one third of the effective programs operated in only a single setting, it is important to note that for the other two thirds, combining the resources of the family, the community, and the community's schools was the other ingredient of success.

Future Directions

In addition to the good news about positive youth development programs, we present some concerns related to specific findings, and considerations for the future.

A little more than half of the well-evaluated programs measured outcomes only at the end of the program; in other words, no further follow-up was done or was available at the time of this review. Whether those programs will continue to show positive results is a question that remains unanswered. This is of particular concern since in two instances, programs that reported long-term results were unable to sustain their initial positive findings. It is clearly most desirable, and presents the most compelling evidence, when programs can demonstrate positive long-term outcomes. In the case of the two studies unable to demonstrate long-term results after initial positive effects, the reasons for these findings need additional study, and should be shared with the positive youth development community.

Evaluators of positive youth development programs are encouraged to take action to expand the knowledge gained from evaluations. Consensus on the use of standardized youth outcome measures needs to be reached. Studies should measure changes in both positive and problem behaviors, because doing so is truly representative of the "whole child." Although such outcomes as academic achievement, engagement in the workforce, and income are widely accepted positive outcome measures, there is little consensus on what constitutes a complete set of positive youth development outcomes.

Standardized measures of positive youth development constructs need to be developed and used. While the positive youth development constructs are typically seen as important mediating variables, the field is just beginning to grapple with defining outcomes of positive developmental experiences. Further, measurement of a comprehensive set of predictors of positive and problem outcomes will allow for a better understanding of the processes through which the intervention has an impact on youth outcomes. A complete measurement package (positive and problem behaviors, appropriate and relevant positive youth development constructs, and risk and protective factors) common across promotion and prevention studies would increase our understanding of the processes leading to positive youth development. This will help to establish a shared language and framework.

We call for consensus on the use of structured comparisons in evaluation designs. While it is true that there are many innovative ways to evaluate programs, so far nothing has come close to substituting for the credibility of a strong structured comparison. Admittedly the rigors of experimental designs with the complexities of random assignment are beyond the reach of many programs, but evaluations can be only as credible as the framework they use. A good quasi-experimental design with well-balanced comparison groups can provide acceptable proof of effectiveness.

Finally, we call on all investigators who submit articles to peer reviewed journals to move toward consensus on which information they will report, particularly the quantitative data, and in what forms they will report it. In program reports, particularly in peer reviewed journals but also in unpublished evaluation studies, there must be both sufficient narrative description, and quantitative and statistical detail, to enable an independent assessment of what the program accomplished. Program descriptions should specify which youth constructs they address, and they should specify the relationship between these constructs and the outcomes that the evaluation measures. As a field of youth development specialists, we show surprisingly little agreement on the issue of a common statistical metric in published reports. As long as some studies report such key information as group means and standard deviations, and others do not, we will not give each other the tools to create a viable basis for comparison between studies. Consistency in the presentation of the evidence will truly advance our understanding of program effectiveness.



 