Origins of the Discipline of Evaluation

From EvaluationWiki

The Evaluation Landscape[1]

The original mission of program evaluation in the human services and education fields was to assist in improving the quality of social programs. However, for several reasons, program evaluation has come to focus (both implicitly and explicitly) much more on proving whether a program or initiative works, rather than on improving programs. In our opinion, this has created an imbalance in human service evaluation work—with a heavy emphasis on proving that programs work through the use of quantitative, impact designs, and not enough attention to more naturalistic, qualitative designs aimed at improving programs.


We discuss two reasons for this imbalance:


  • the historical context of program evaluation in the U.S.; and
  • the influence of the dominant research paradigm on human services evaluation.


Historical Context of Evaluation in Human Services

Although human beings have been attempting to solve social problems using some kind of rationale or evidence (e.g., evaluation) for centuries, program evaluation in the United States began with the ambitious, federally funded social programs of the Great Society initiative during the mid- to late-1960s. Resources poured into these programs, but the complex problems they were attempting to address did not disappear. The public grew more cautious, and there was increasing pressure to provide evidence of the effectiveness of specific initiatives in order to allocate limited resources.


During this period, “systematic evaluation [was] increasingly sought to guide operations, to assure legislators and planners that they [were] proceeding on sound lines and to make services responsive to their public”. [2] One lesson we learned from the significant investments made in the 1960s and ’70s was that we didn’t have the resources to solve all of our social problems. We needed to target our investments. But to do this effectively, we needed a basis for deciding where and how to invest. “Program evaluation as a distinct field of professional practice was born of two lessons…: First, the realization that there is not enough money to do all the things that need doing; and second, even if there were enough money, it takes more than money to solve complex human and social problems. As not everything can be done, there must be a basis for deciding which things are worth doing. Enter evaluation.” [3]


Today, we are still influenced by this pressure to demonstrate the effectiveness of our social programs in order to assure funders, government officials, and the public at large that their investments are worthwhile. In fact, since the years of the Great Society, pressure to demonstrate the worth of social programs has increased. Limited resources, increasingly complex and layered social problems, the changing political climate, and a seeming shift in public opinion about the extent to which government and other institutions should support disadvantaged or vulnerable populations have shifted the balance even further to an almost exclusive focus on accountability (prove it works), versus quality (work to improve).

The Scientific Method as the Dominant Evaluation Paradigm

A second factor leading to an emphasis on proving whether a social program works is the influence of the scientific method on human-services evaluation. When most people think about program evaluation, they think of complex experimental designs with treatment and control groups, where evaluators measure the impact of programs based on statistically significant changes in certain outcomes. For example, did the program lead to increased income, improved school performance, or better health-status indicators?


The scientific method is based on hypothetico-deductive methodology. Simply put, this means that researchers/evaluators test hypotheses about the impact of a social initiative using statistical analysis techniques. Perhaps because this way of conducting research is dominant in many highly esteemed fields and because it is backed by rigorous and well-developed statistical theories, it has come to dominate in social, educational, and human-services fields, whose members often find themselves fighting for legitimacy. In addition, this way of doing research and evaluation is well suited to answering the very questions programs/initiatives have historically been most pressured to address: Are they effective? Do they work?
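
To make the idea of testing a hypothesis about impact concrete, the following is a minimal sketch in Python of a two-sided permutation test comparing a treatment group with a control group. It is one simple technique among many and does not come from the handbook this article draws on. The function name permutation_test, the score lists, and the choice of 5,000 permutations are illustrative assumptions, and the data are made up.

```python
import random
from statistics import mean


def permutation_test(treatment, control, n_permutations=5000, seed=0):
    """Two-sided permutation test for a difference in group means.

    Returns the observed difference (treatment minus control) and the share
    of random relabelings whose difference is at least as large in size.
    """
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    observed = mean(treatment) - mean(control)
    pooled = list(treatment) + list(control)
    n_treat = len(treatment)

    extreme = 0
    for _ in range(n_permutations):
        # Shuffle group labels: under the null hypothesis they are arbitrary.
        rng.shuffle(pooled)
        diff = mean(pooled[:n_treat]) - mean(pooled[n_treat:])
        if abs(diff) >= abs(observed):
            extreme += 1
    return observed, extreme / n_permutations


# Hypothetical end-of-year test scores (invented for illustration only).
treatment_scores = [78, 85, 90, 72, 88, 95, 81, 79]
control_scores = [70, 75, 80, 68, 74, 82, 77, 71]

observed_diff, p_value = permutation_test(treatment_scores, control_scores)
print(f"Observed difference in means: {observed_diff:.3f}")
print(f"Approximate two-sided p-value: {p_value:.4f}")
```

A small p-value suggests that the observed difference would be unusual if the program had no effect. Note that a test like this answers only the "did it work?" question; it says nothing about how or why the program works.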


The hypothetico-deductive, natural science model is designed to explain what happened and show causal relationships between certain outcomes and the “treatments” or services aimed at producing these outcomes. If designed and conducted effectively, the experimental or quasi-experimental design can provide important information about the particular impacts of the social program being studied. Did the academic enrichment program lead to improved grades for students? Or increased attendance? Ultimately, was it effective? However, many of the criteria necessary to conduct these evaluations limit their usefulness to primarily single intervention programs in fairly controlled environments. The natural science research model is therefore ill equipped to help us understand complex, comprehensive, and collaborative community initiatives.


Balancing the Call to Prove With the Need to Improve

Both of these factors (the historical growth in the pressure to demonstrate effectiveness, and the dominance of a research philosophy or model that is best suited to measure change) may have led many evaluators, practitioners, government officials, and the public at large to think of program evaluation as synonymous with demonstrating effectiveness or “proving” the worth of programs. As a result, conventional evaluations have not addressed issues of process, implementation, and improvement nearly as well. And they may very well be negatively impacting the more complex, comprehensive community initiatives (like many of those you operate in your communities) because these initiatives are often ignored as unevaluatable, or evaluated in traditional ways that do not come close to capturing the complex and often messy ways in which these initiatives effect change. [4] [5]


Clearly, demonstrating effectiveness and measuring impact are important and valuable; yet we believe that it is equally important to focus on gathering and analyzing data which will help us improve our social initiatives. In fact, when the balance is shifted too far to a focus on measuring statistically significant changes in quantifiable outcomes, we miss important parts of the picture. This ultimately hinders our ability to understand the richness and complexity of contemporary human-services programs—especially the system change reform and comprehensive community initiatives which many of you are attempting to implement.

Consequences of Operating Within a Limited Evaluation Framework

Following are some of the many consequences of operating within a limited evaluation framework:


Consequence 1. We begin to believe that there is only one way to do evaluation. Most people (even those trained in research and evaluation methods) don’t realize that methods employed, such as an experimental design, are part of larger world views or paradigms about research. These paradigms are based on different assumptions about:

  • What is the nature of reality?
  • How do we come to know something?
  • What should be the relationship between the researcher/evaluator and the participants in the evaluation process?

The dominant research paradigm described above (hypothetico-deductive), derived from medical and other natural science disciplines, is one such paradigm, but there are others. When one research paradigm begins to dominate a field, it becomes easier to forget that other paradigms, which address different goals and questions, also exist.


Patton explains the effect of forgetting paradigms in this way: The very dominance of the hypothetico-deductive paradigm, with its quantitative, experimental emphasis, appears to have cut off the great majority of its practitioners from serious consideration of any alternative evaluation research paradigm or methods. The label “research” [or evaluation] has come to mean the equivalent of employing the “scientific method” of working within the dominant paradigm. [3]

In other words, people begin to believe there is only one right way of doing evaluation.


Consequence 2. We do not ask and examine equally important questions. We have already discussed how the dominant research paradigm is suited for addressing certain impact questions—the very questions that, historically, social programs have been pressured to address. However, while it brings certain aspects into focus, it misses other important dimensions of the program.


Here again, research paradigms and philosophies come into play. Even more powerful than the notion that there are different paradigms with different assumptions about the world and how it works (i.e., there is no one right way to do evaluation) is how much our particular paradigms/assumptions influence the questions we ask; what we think is important to know; the evaluation methods we use; the data we collect; even the interpretations and conclusions we make.


If we are unaware that evaluation designs and results are based on a paradigm or set of assumptions about how to do evaluation, it is more difficult to see the questions and issues we are missing. These are questions and issues that would come into focus only if we look at the program through the lens of another paradigm.


For example, conventional research methods don’t tell us how and why programs work, for whom, and in what circumstances, and don’t adequately answer other process and implementation questions. And yet, given the increasingly complex social problems and situations we face today, and the increasingly complex social initiatives and programs developed to solve these problems, these are important questions to address.


Consequence 3. We come up short when attempting to evaluate complex system change and comprehensive community initiatives. This may be the most dangerous consequence of all. In a political and social climate of increasing reluctance to support disadvantaged populations and skepticism about whether any social program works, some of the most promising initiatives are being overlooked and are in danger of being cut off. These are the system change and comprehensive community change initiatives that many know from practice, experience, and even common sense create real change in the lives of children, youth, and families.


However, these initiatives are complex and messy. They do not fit criteria for a “good” quantitative impacts evaluation. There are no simple, uniform goals. There is no standard intervention, or even standard participant/consumer. There is no way to isolate the effects of the intervention because these initiatives focus on integrating multiple interventions.


And since these initiatives are based on multi-source and multi-perspective community collaborations, their goals and core activities/services are constantly changing and evolving to meet the needs and priorities of a variety of community stakeholders. In short, these initiatives are “unevaluatable” using the dominant natural science paradigm. [4]


What does this mean? It means that many of these initiatives are not evaluated at all, making it difficult for communities to provide evidence that they are effective. It means that others are evaluated using traditional methods. This leads either to a narrowing of the project to fit the evaluation design (a problem, if what really works is the breadth and multi-pronged nature of these initiatives), or to a traditional impacts report which shows that the initiative had limited impact (because impacts in these complex initiatives may occur over a much longer time period and because many of the critical interim outcomes which are difficult to quantify are overlooked). And it means that a great deal of resources are being wasted and very little is being learned about how these initiatives really work and what their true potential may be. [4]


Consequence 4. We lose sight of the fact that all evaluation work is political and value laden. When we look at the impacts of a program by using the scientific method only, we miss important contextual factors. This, coupled with the fact that statistical theories can lull us into thinking that we are looking at the neutral and objective truth about the initiative, can mask the fact that evaluation is a political and value-laden process.


Virtually every phase of the evaluation process has political implications which will affect the issues of focus, decisions made, how the outside world perceives the project, and whose interests are advanced and whose are ignored. Evaluators must therefore understand the implications of their actions during all phases of the evaluation and must be sensitive to the concerns of the project director, staff, clientele, and other stakeholders. This understanding requires ongoing dialogue with all groups involved and a responsibility to fully represent the project throughout the evaluation process.


Conflicting agendas, limited funds, different perspectives, or the lack of a common knowledge base may lead to strained relationships between evaluators, project directors, and staff. It is important to talk openly about how these factors affect the evaluation process.

Recommendations for a Better Balance

So, how do we create a better balance, and design evaluations that not only help demonstrate the effectiveness of the project, but also help us know how to improve and strengthen it? The following recommendations form the foundation of [the W.K. Kellogg Foundation's] evaluation philosophy:


Recommendation 1. Learn about and reflect on alternative paradigms and methods that are appropriate to our work. As we discussed earlier, conducting research within a single paradigm makes it difficult for us to remember that it is still only one view, and not the only legitimate way to conduct evaluation. There are others: some developed within other disciplines such as anthropology, others developed in reaction to the dominant paradigm. Since we cannot fully describe these complex alternative paradigms here, we provide snapshots of a few to stimulate your thinking.


Interpretivism/Constructivism: The interpretivist or constructivist paradigm has its roots in anthropological traditions. Instead of focusing on explaining, this paradigm focuses on understanding the phenomenon being studied through ongoing and in-depth contact and relationships with those involved (e.g., in-depth observations and interviewing). Relying on qualitative data and rich description which comes from these close, ongoing relationships, the interpretivist/constructivist paradigm’s purpose is “the collection of holistic world views, intact belief systems, and complex inner psychic and interpersonal states.” [6] In other words, who are the people involved in the program and what do the experiences mean to them? These holistic accounts are often lost in conventional evaluations, which rely on evaluator-determined categories of data collection, and do not focus on contextual factors.


The primary objective of evaluations based on the assumptions of interpretivism/constructivism is to understand social programs from many different perspectives. This paradigm focuses on answering questions about process and implementation, and what the experiences have meant to those involved. Therefore, it is well suited to helping us understand contextual factors and the complexities of programs, and to helping us make decisions about improving project management and delivery.


Feminist Methods: Feminist researchers and practitioners (as well as many ethnic and cultural groups, including African Americans and Hispanics) have long been advocating for changes in research and evaluation based on two principles:


1. Historically, the experiences of girls, women, and minorities have been left out or ignored because these experiences have not fit with developing theories (theories constructed primarily from data on white, middle-class males); and

2. The assumptions behind conventional methodologies, such as the superiority of objective over subjective knowing, the distancing of the researcher/evaluator from participants, and the possibility of value-free, unbiased research/evaluation, have been seriously flawed.


Although encompassing a widely diverse set of assumptions and techniques, feminist research methods have been described as “contextual, inclusive, experiential, involved, socially relevant, multi-methodological, complete but not necessarily replicable, open to the environment, and inclusive of emotions and events as experiences.” [7]


Participatory Evaluation: One research method that is seeing increased use in developing countries, and among many of our community-based initiatives, is participatory evaluation, which is primarily concerned with the following: (1) creating a more egalitarian process, where the evaluator’s perspective is given no more priority than that of other stakeholders, including program participants; and (2) making the evaluation process and its results relevant and useful to stakeholders for future actions. Participatory approaches attempt to be practical, useful, and empowering to multiple stakeholders, and help to improve program implementation and outcomes by actively engaging all stakeholders in the evaluation process.


Theory-Based Evaluation: Another approach to evaluation is theory-based evaluation, which has been applied both in the substance abuse area [8] and in the evaluation of comprehensive community initiatives. [9] Theory-based evaluation attempts to address the problems associated with evaluating comprehensive, community-based initiatives and others not well suited to statistical analysis of outcomes. Its underlying premise is that just because we cannot effectively measure an initiative’s ultimate outcomes statistically, it does not mean we cannot learn anything about the initiative’s effectiveness. In fact, proponents of theory-based evaluation reason that, by combining outcome data with an understanding of the process that led to those outcomes, we can learn a great deal about the program’s impact and its most influential factors. [5]

Theory-based evaluation starts with the premise that every social program is based on a theory, that is, some thought process about how and why it will work. This theory can be either explicit or implicit. The key to understanding what really matters about the program is identifying this theory. [9] This process is also known as developing a program logic model, or picture, describing how the program works. Evaluators and staff can then use this theory of how the initiative effects change to develop key interim outcomes (both for the target population and for the collaborating agencies and organizations) that will lead to ultimate long-term outcomes.


Documenting these interim outcomes (measured in both quantitative and qualitative ways) provides multiple opportunities. It demonstrates whether or not an initiative is on track. Tracking short-term achievements takes some of the pressure off demonstrating long-term impacts in the first year or two, or having very little to say about the initiative for several years. It allows staff to modify the theory and the initiative based on what they are learning, thereby increasing the potential for achieving long-term impacts. Ultimately, it allows staff to understand and demonstrate effectiveness (to multiple stakeholders) in ways that make sense for these types of complex initiatives.
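
As a rough illustration of how a program logic model and its interim outcomes might be written down and tracked, here is a minimal Python sketch. It is only one possible representation, not a format prescribed by theory-based evaluation. The class names Outcome and LogicModel, their fields, and the academic enrichment example data are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Outcome:
    """One expected result, with any evidence gathered so far."""
    description: str
    achieved: bool = False
    # Evidence may be quantitative (counts, scores) or qualitative (notes).
    evidence: List[str] = field(default_factory=list)


@dataclass
class LogicModel:
    """A program's theory: activities -> interim outcomes -> long-term outcomes."""
    activities: List[str]
    interim_outcomes: List[Outcome]
    long_term_outcomes: List[Outcome]

    def interim_progress(self) -> float:
        """Share of interim outcomes documented as achieved so far."""
        if not self.interim_outcomes:
            return 0.0
        done = sum(1 for o in self.interim_outcomes if o.achieved)
        return done / len(self.interim_outcomes)

    def open_interim_outcomes(self) -> List[str]:
        """Interim outcomes not yet achieved: candidates for revisiting the theory."""
        return [o.description for o in self.interim_outcomes if not o.achieved]


# Hypothetical logic model for an academic enrichment program.
model = LogicModel(
    activities=["After-school tutoring", "Monthly family nights"],
    interim_outcomes=[
        Outcome("Students attend tutoring regularly", True, ["attendance logs"]),
        Outcome("Tutors and teachers share progress notes", True, ["staff interviews"]),
        Outcome("Homework completion rises"),
        Outcome("Families report more engagement with school"),
    ],
    long_term_outcomes=[Outcome("Improved grades")],
)

print(f"Interim outcomes achieved: {model.interim_progress():.0%}")
print("Still open:", model.open_interim_outcomes())
```

Keeping evidence alongside each outcome lets staff see both whether the initiative is on track and which parts of the theory may need revising.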


This evaluation approach also provides a great deal of important information about how to implement similar complex initiatives. What are the pitfalls? What are the core elements? What were the lessons learned along the way?


Recommendation 2. Question the questions. Creating open environments where different perspectives are valued will encourage reflection on which questions are not being addressed and why. Perhaps these questions are hidden by the particular paradigm at work. Perhaps they are not questions that are politically important to those in more powerful positions. Perhaps they hint at potentially painful experiences, not often spoken of or dealt with openly in our society. Encourage staff and the evaluation team to continuously question the questions, and to ask what is still missing. Additionally, review whether you are addressing the following questions:


  • How does this program work?
  • Why has it worked or not worked? For whom and in what circumstances?
  • What was the process of development and implementation?
  • What were the stumbling blocks faced along the way?
  • What do the experiences mean to the people involved?
  • How do these meanings relate to intended outcomes?
  • What lessons have we learned about developing and implementing this program?
  • How have contextual factors impacted the development, implementation, success, and stumbling blocks of this program?
  • What are the hard-to-measure impacts of this program (ones that cannot be easily quantified)? How can we begin to effectively document these impacts?


Recommendation 3. Take action to deal with the effects of paradigms, politics, and values. Perhaps more important than understanding all of the factors that can impact the evaluation process is taking specific actions to deal with these issues, so that you and your evaluation staff can achieve a fuller understanding of your project and how and why it is working. The following tips can be used by project directors and their evaluation staff to deal with the influence of paradigms, politics, and values:


  • Get inside the project: understand its roles, responsibilities, organizational structure, history, and goals; and how politics, values, and paradigms affect the project’s implementation and impact.
  • Create an environment where all stakeholders are encouraged to discuss their values and philosophies.
  • Challenge your assumptions. Constantly look for evidence that you are wrong.
  • Ask other stakeholders for their perspectives on particular issues. Listen.
  • Remember there may be multiple “right” answers.
  • Maintain regular contact and provide feedback to stakeholders, both internal and external to the project.
  • Involve others in the process of evaluation and try to work through any resistance.
  • Design specific strategies to air differences and grievances.
  • Make the evaluation and its findings useful and accessible to project staff and clients. Early feedback and a consultative relationship with stakeholders and project staff lead to a greater willingness by staff to disclose important and sensitive information to evaluators.
  • Be sensitive to the feelings and rights of individuals.
  • Create an atmosphere of openness to findings, with a commitment to considering change and a willingness to learn.


Each of these areas may be addressed by providing relevant reading materials; making formal or informal presentations; using frequent memos; using committees composed of staff members, customers, or other stakeholders; setting interim goals and celebrating achievements; encouraging flexibility; and sharing alternative viewpoints. These tips will help you deal with political issues, bring multiple sets of values, paradigms, and philosophies to the table for examination and more informed decision making, and foster an open environment where it is safe to talk honestly about both the strengths and weaknesses of the project.

References

  1. W.K. Kellogg Foundation Evaluation Handbook @ http://www.wkkf.org/Pubs/Tools/Evaluation/Pub770.pdf
  2. Cronbach, Lee J. and Associates, Toward Reform of Program Evaluation, San Francisco: Jossey-Bass, 1980 (p. 12).
  3. Patton, Michael Quinn, Practical Evaluation, Newbury Park, CA: Sage Publications, 1982 (p. 11).
  4. Connell, James P., Anne C. Kubisch, Lisbeth B. Schorr, and Carol H. Weiss, New Approaches to Evaluating Community Initiatives: Concepts, Methods, and Contexts, Washington, DC: The Aspen Institute, 1995.
  5. Schorr, Lisbeth B., and Anne C. Kubisch, “New Approaches to Evaluation: Helping Sister Mary Paul, Geoff Canada, and Otis Johnson While Convincing Pat Moynihan, Newt Gingrich, and the American Public,” Presentation. Annie E. Casey Foundation Annual Research/Evaluation Conference: Using Research and Evaluation Information to Improve Programs and Policies, September 1995.
  6. Maxwell, J.A. and Y.S. Lincoln, Methodology and Epistemology: A Dialogue, Harvard Educational Review, 60(4), pp. 497-512, 1990.
  7. Nielson, Joyce M., “Introduction” in J. Nielson (Ed.) Feminist Research Methods, Boulder: Westview Press, 1990.
  8. Chen, Huey-tsyh, Theory-Driven Evaluations, California: Sage Publications, 1990.
  9. Weiss, Carol H., “Nothing as Practical as Good Theory: Exploring Theory-Based Evaluation for Comprehensive Community Initiatives for Children and Families.” In James Connell, et al. (ed.), New Approaches to Evaluating Community Initiatives: Concepts, Methods, and Contexts, Washington, DC: The Aspen Institute, 1995.