Origins of the Discipline of Evaluation
From EvaluationWiki
The Evaluation Landscape[1]
The original mission of program evaluation in the human services and education fields was to assist in improving the quality of social programs. However, for several reasons, program evaluation has come to focus (both implicitly and explicitly) much more on proving whether a program or initiative works, rather than on improving programs. In our opinion, this has created an imbalance in human service evaluation work—with a heavy emphasis on proving that programs work through the use of quantitative, impact designs, and not enough attention to more naturalistic, qualitative designs aimed at improving programs.
We discuss two reasons for this imbalance:
- the historical context of program evaluation in the U.S.; and
- the influence of the dominant research paradigm on human services evaluation.
Historical Context of Evaluation in Human Services
Although human beings have been attempting to solve social problems using some kind of rationale or evidence (e.g., evaluation) for centuries, program evaluation in the United States began with the ambitious, federally funded social programs of the Great Society initiative during the mid- to late-1960s. Resources poured into these programs, but the complex problems they were attempting to address did not disappear. The public grew more cautious, and there was increasing pressure to provide evidence of the effectiveness of specific initiatives in order to allocate limited resources.
During this period, “systematic evaluation [was] increasingly sought to guide
operations, to assure legislators and planners that they [were] proceeding on
sound lines and to make services responsive to their public.”[2] One lesson we learned from the significant investments made in the
1960s and ’70s was that we didn’t have the resources to solve all of our social
problems. We needed to target our investments. But to do this effectively, we
needed a basis for deciding where and how to invest. “Program evaluation as a
distinct field of professional practice was born of two lessons…: First, the realization
that there is not enough money to do all the things that need doing; and second,
even if there were enough money, it takes more than money to solve complex
human and social problems. As not everything can be done, there must be a basis for
deciding which things are worth doing. Enter evaluation.”[3]
Today, we are still influenced by this pressure to demonstrate the effectiveness
of our social programs in order to assure funders, government officials, and the
public at large that their investments are worthwhile. In fact, since the years of
the Great Society, pressure to demonstrate the worth of social programs has
increased. Limited resources, increasingly complex and layered social problems,
the changing political climate, and a seeming shift in public opinion about the
extent to which government and other institutions should support
disadvantaged or vulnerable populations have shifted the balance even further
to an almost exclusive focus on accountability (prove it works), versus quality
(work to improve).
The Scientific Method as the Dominant Evaluation Paradigm
A second factor leading to an emphasis on proving whether a social program works is the influence of the scientific method on human-services evaluation. When most people think about program evaluation, they think of complex experimental designs with treatment and control groups, where evaluators measure the impact of programs based on statistically significant changes in certain outcomes. For example, did the program lead to increased income, improved school performance, or better health-status indicators?
The scientific method is based on hypothetico-deductive methodology. Simply
put, this means that researchers/evaluators test hypotheses about the impact of a
social initiative using statistical analysis techniques.
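To make the idea of testing a hypothesis about impact concrete, here is a minimal sketch of one common technique, a two-sided permutation test comparing a treatment group with a control group. It is an illustration only, not a method prescribed by the source: the function name, the scores, and the number of permutations are all hypothetical assumptions chosen for the example.

```python
import random
import statistics


def permutation_test(treatment, control, n_permutations=5000, seed=0):
    """Two-sided permutation test for a difference in group means.

    Null hypothesis: the program has no effect, so group labels are
    interchangeable. Returns (observed difference, estimated p-value).
    """
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    observed = statistics.mean(treatment) - statistics.mean(control)
    pooled = list(treatment) + list(control)
    n_treat = len(treatment)

    extreme = 0
    for _ in range(n_permutations):
        # Reassign participants to groups at random.
        rng.shuffle(pooled)
        diff = statistics.mean(pooled[:n_treat]) - statistics.mean(pooled[n_treat:])
        if abs(diff) >= abs(observed):
            extreme += 1

    # Add-one correction avoids reporting an impossible p-value of 0.
    p_value = (extreme + 1) / (n_permutations + 1)
    return observed, p_value


# Hypothetical changes in grade-point average, invented for illustration.
treatment_scores = [0.4, 0.6, 0.1, 0.5, 0.7, 0.3, 0.6, 0.2]
control_scores = [0.1, 0.0, 0.2, -0.1, 0.3, 0.1, 0.0, 0.2]

effect, p = permutation_test(treatment_scores, control_scores)
print(f"observed difference in means: {effect:.3f}, p-value: {p:.4f}")
```

A small p-value suggests that the observed difference would be unlikely if the program had no effect. Note that the result says nothing about how or why the program worked, which is the limitation discussed in the rest of this section.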
Perhaps because this way of conducting research is dominant in many highly
esteemed fields and because it is backed by rigorous and well-developed statistical
theories, it has also come to dominate in the social, educational, and human-services fields, whose members often find themselves fighting for legitimacy. In addition, this
way of doing research and evaluation is well suited to answering the very
questions programs/initiatives have historically been most pressured to address:
Are they effective? Do they work?
The hypothetico-deductive, natural science model is designed to explain what
happened and show causal relationships between certain outcomes and the
“treatments” or services aimed at producing these outcomes. If designed and
conducted effectively, the experimental or quasi-experimental design can provide
important information about the particular impacts of the social program being
studied. Did the academic enrichment program lead to improved grades for students? Or
increased attendance? Ultimately, was it effective? However, many of the criteria
necessary to conduct these evaluations limit their usefulness primarily to
single-intervention programs in fairly controlled environments. The natural science
research model is therefore ill equipped to help us understand complex,
comprehensive, and collaborative community initiatives.
Balancing the Call to Prove With the Need to Improve
Both of these factors, the historical growth in the pressure to demonstrate effectiveness and the dominance of a research philosophy or model that is best suited to measuring change, may have led many evaluators, practitioners, government officials, and the public at large to think of program evaluation as synonymous with demonstrating effectiveness or “proving” the worth of programs. As a result, conventional evaluations have not addressed issues of process, implementation, and improvement nearly as well. And they may very well be negatively affecting the more complex, comprehensive community initiatives (like many of those you operate in your communities), because these initiatives are often ignored as unevaluatable or evaluated in traditional ways that do not come close to capturing the complex and often messy ways in which they effect change.[4][5]
Clearly, demonstrating effectiveness and measuring impact are important and valuable; yet we believe that it is equally important to focus on gathering and analyzing data which will help us improve our social initiatives. In fact, when the balance is shifted too far to a focus on measuring statistically significant changes in quantifiable outcomes, we miss important parts of the picture. This ultimately hinders our ability to understand the richness and complexity of contemporary human-services programs—especially the system change reform and comprehensive community initiatives which many of you are attempting to implement.
Consequences of Operating Within a Limited Evaluation Framework
Following are some of the many consequences of operating within a limited evaluation framework:
Consequence 1. We begin to believe that there is only one way to do evaluation.
Most people (even those trained in research and evaluation methods) don’t realize
that methods employed, such as an experimental design, are part of larger world
views or paradigms about research. These paradigms are based on different
assumptions about:
- What is the nature of reality?
- How do we come to know something?
- What should be the relationship between the researcher/evaluator and the participants in the evaluation process?
The dominant research paradigm described above (hypothetico-deductive), derived from medical and other natural science disciplines, is one such paradigm, but there are others. When one research paradigm begins to dominate a field, it becomes easier to forget that other paradigms, which address different goals and questions, also exist.
Patton explains the effect of forgetting paradigms in this way:
The very dominance of the hypothetico-deductive paradigm, with its
quantitative, experimental emphasis, appears to have cut off the great
majority of its practitioners from serious consideration of any alternative
evaluation research paradigm or methods. The label “research” [or
evaluation] has come to mean the equivalent of employing the
“scientific method” of working within the dominant paradigm.[3]
In other words, people begin to believe there is only one right way of doing evaluation.
Consequence 2. We do not ask and examine equally important questions. We have
already discussed how the dominant research paradigm is suited for addressing
certain impact questions—the very questions that, historically, social programs
have been pressured to address. However, while it brings certain aspects into
focus, it misses other important dimensions of the program.
Here again, research paradigms and philosophies come into play. Even more
powerful than the notion that there are different paradigms with different
assumptions about the world and how it works (i.e., there is no one right way to
do evaluation) is how much our particular paradigms/assumptions influence the questions
we ask; what we think is important to know; the evaluation methods we use; the data we
collect; even the interpretations and conclusions we make.
If we are unaware that evaluation designs and results are based on a paradigm
or set of assumptions about how to do evaluation, it is more difficult to see
the questions and issues we are missing. These are questions and issues that
would come into focus only if we look at the program through the lens of
another paradigm.
For example, conventional research methods don’t tell us how and why programs
work, for whom, and in what circumstances, and don’t adequately answer other
process and implementation questions. And yet, given the increasingly complex
social problems and situations we face today, and the increasingly complex social
initiatives and programs developed to solve these problems, these are important
questions to address.
Consequence 3. We come up short when attempting to evaluate complex system
change and comprehensive community initiatives. This may be the most dangerous
consequence of all. In a political and social climate of increasing reluctance to
support disadvantaged populations and skepticism about whether any social
program works, some of the most promising initiatives are being overlooked and
are in danger of being cut off. These are the system change and comprehensive
community change initiatives that many know from practice, experience, and
even common sense create real change in the lives of children, youth, and
families.
However, these initiatives are complex and messy. They do not fit criteria for a
“good” quantitative impacts evaluation. There are no simple, uniform goals. There
is no standard intervention, or even standard participant/consumer. There is no
way to isolate the effects of the intervention because these initiatives focus on
integrating multiple interventions.
And since these initiatives are based on multi-source and multi-perspective
community collaborations, their goals and core
activities/services are constantly changing and evolving to meet the
needs and priorities of a variety of community stakeholders. In short,
these initiatives are “unevaluatable” using the dominant natural science
paradigm.[4]
What does this mean? It means that many of these initiatives are not evaluated at all,
making it difficult for communities to provide evidence that they are effective. It
means that others are evaluated using traditional methods. This leads either to a
narrowing of the project to fit the evaluation design (a problem, if what really works
is the breadth and multi-pronged nature of these initiatives), or to a traditional
impacts report which shows that the initiative had limited impact (because impacts
in these complex initiatives may occur over a much longer time period and because
many of the critical interim outcomes which are difficult to quantify are
overlooked). And it means that a great deal of resources are being wasted and very
little is being learned about how these initiatives really work and what their true
potential may be.[4]
Consequence 4. We lose sight of the fact that all evaluation work is political and
value laden. When we look at the impacts of a program by using the scientific
method only, we miss important contextual factors. This, coupled with the fact
that statistical theories can lull us into thinking that we are looking at the neutral
and objective truth about the initiative, can mask the fact that evaluation is a
political and value-laden process.
Virtually every phase of the evaluation process has political implications which
will affect the issues of focus, decisions made, how the outside world perceives
the project, and whose interests are advanced and whose are ignored. Evaluators
must therefore understand the implications of their actions during all phases of
the evaluation and must be sensitive to the concerns of the project director, staff,
clientele, and other stakeholders. This understanding requires ongoing dialogue
with all groups involved and a responsibility to fully represent the project
throughout the evaluation process.
Conflicting agendas, limited funds, different perspectives, or the lack of a
common knowledge base may lead to strained relationships between evaluators,
project directors, and staff. It is important to talk openly about how these factors
affect the evaluation process.
Recommendations for a Better Balance
So, how do we create a better balance, and design evaluations that not only help demonstrate the effectiveness of the project, but also help us know how to improve and strengthen it? The following recommendations form the foundation of [the W.K. Kellogg Foundation's] evaluation philosophy:
Recommendation 1. Learn about and reflect on alternative paradigms and methods
that are appropriate to our work. As we discussed earlier, conducting research
within a single paradigm makes it difficult for us to remember that it is still only
one view, and not the only legitimate way to conduct evaluation. There are
others—some developed within other disciplines such as anthropology, others
developed in reaction to the dominant paradigm. Since we cannot fully describe
these complex alternative paradigms here, we provide snapshots of a few to
stimulate your thinking.
Interpretivism/Constructivism: The interpretivist or constructivist paradigm has its
roots in anthropological traditions. Instead of focusing on explaining, this
paradigm focuses on understanding the phenomenon being studied through
ongoing and in-depth contact and relationships with those involved (e.g., in-depth
observations and interviewing). Relying on qualitative data and rich
description which comes from these close, ongoing relationships, the
interpretivist/constructivist paradigm’s purpose is “the collection of holistic world
views, intact belief systems, and complex inner psychic and interpersonal states.”[6] In other words, who are the people involved in the program and what do the experiences mean to them? These
holistic accounts are often lost in conventional evaluations, which rely on
evaluator-determined categories of data collection, and do not focus on
contextual factors.
The primary objective of evaluations based on the assumptions of
interpretivism/constructivism is to understand social programs from many
different perspectives. This paradigm focuses on answering questions about
process and implementation, and what the experiences have meant to those
involved. Therefore, it is well suited to helping us understand contextual factors
and the complexities of programs—and helping us make decisions about
improving project management and delivery.
Feminist Methods: Feminist researchers and practitioners (as well as many ethnic
and cultural groups, including African Americans and Hispanics) have long
advocated for changes in research and evaluation based on two principles:
1. Historically, the experiences of girls, women, and minorities have been
left out or ignored because these experiences have not fit with
developing theories (theories constructed primarily from data on white,
middle-class males); and
2. Conventional methodologies, with their assumptions about the superiority of objective over subjective knowing, the distancing of the researcher/evaluator from participants, and the possibility of value-free, unbiased research/evaluation, have been seriously flawed.
Although encompassing a widely diverse set of assumptions and techniques,
feminist research methods have been described as “contextual, inclusive,
experiential, involved, socially relevant, multi-methodological, complete but not
necessarily replicable, open to the environment, and inclusive of emotions and
events as experiences.” [7]
Participatory Evaluation: One research method that is increasingly used
in developing countries, and among many of our community-based
initiatives, is participatory evaluation, which is primarily concerned with the
following: (1) creating a more egalitarian process, where the evaluator’s
perspective is given no more priority than those of other stakeholders, including program
participants; and (2) making the evaluation process and its results relevant and
useful to stakeholders for future actions. Participatory approaches attempt to be
practical, useful, and empowering to multiple stakeholders, and help to improve
program implementation and outcomes by actively engaging all stakeholders in
the evaluation process.
Theory-Based Evaluation: Another approach to evaluation is theory-based evaluation,
which has been applied both in the substance abuse area[8] and in the evaluation of comprehensive community initiatives.[9] Theory-based evaluation attempts to address the problems associated with evaluating
comprehensive, community-based initiatives and others not well suited to statistical
analysis of outcomes. Its underlying premise is that just because we cannot effectively
measure an initiative’s ultimate outcomes statistically, it does not mean we cannot
learn anything about the initiative’s effectiveness. In fact, proponents of theory-based
evaluation reason that, by combining outcome data with an understanding of the
process that led to those outcomes, we can learn a great deal about the program’s
impact and its most influential factors.[5]
Theory-based evaluation starts with the premise that every social program is based on a theory: some thought process about how and why it will work. This theory can be either explicit or implicit. The key to understanding what really matters about the program is identifying this theory.[9] This process is also known as developing a program logic model (or picture) describing how the program works. Evaluators and staff can then use this theory of how the initiative effects change to develop key interim outcomes (both for the target population and for the collaborating agencies and organizations) that will lead to ultimate long-term outcomes.
Documenting these interim outcomes (measured in both quantitative and
qualitative ways) provides multiple opportunities. It demonstrates whether or not
an initiative is on track. Tracking short-term achievements takes some of the pressure off demonstrating long-term impacts in the first year or two, or having
very little to say about the initiative for several years. It allows staff to modify the
theory and the initiative based on what they are learning, thereby increasing the
potential for achieving long-term impacts. Ultimately, it allows staff to understand
and demonstrate effectiveness (to multiple stakeholders) in ways that make sense
for these types of complex initiatives.
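As one way to picture a program logic model and its interim outcomes, the following minimal sketch records a program's theory as data and reports how many interim outcomes have been documented so far. It is an illustrative assumption, not a format prescribed by theory-based evaluation or by the source: the class names, fields, and the example program are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class InterimOutcome:
    """One step on the hypothesized path to the long-term outcome."""
    description: str
    kind: str  # "quantitative" or "qualitative" evidence
    achieved: bool = False


@dataclass
class LogicModel:
    """A program's theory: interim outcomes leading to a long-term outcome."""
    long_term_outcome: str
    interim_outcomes: list[InterimOutcome] = field(default_factory=list)

    def progress(self) -> float:
        """Share of interim outcomes documented as achieved (0.0 to 1.0)."""
        if not self.interim_outcomes:
            return 0.0
        done = sum(outcome.achieved for outcome in self.interim_outcomes)
        return done / len(self.interim_outcomes)

    def outstanding(self) -> list[str]:
        """Descriptions of interim outcomes not yet achieved."""
        return [o.description for o in self.interim_outcomes if not o.achieved]


# Hypothetical academic enrichment initiative, invented for illustration.
model = LogicModel(
    long_term_outcome="Improved high school graduation rates",
    interim_outcomes=[
        InterimOutcome("Partner agencies share referral data", "qualitative", True),
        InterimOutcome("Tutoring attendance above 80 percent", "quantitative", True),
        InterimOutcome("Students report feeling supported", "qualitative"),
        InterimOutcome("Course failure rate declines", "quantitative"),
    ],
)

print(f"Interim outcomes achieved: {model.progress():.0%}")
for description in model.outstanding():
    print("Still to document:", description)
```

Keeping the theory explicit in this way lets staff revise the list of interim outcomes as they learn, rather than waiting several years for long-term impact data.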
This evaluation approach also provides a great deal of important information
about how to implement similar complex initiatives. What are the pitfalls? What
are the core elements? What were the lessons learned along the way?
Recommendation 2. Question the questions. Creating open environments where
different perspectives are valued will encourage reflection on which questions are
not being addressed and why. Perhaps these questions are hidden by the particular
paradigm at work. Perhaps they are not questions that are politically important to
those in more powerful positions. Perhaps they hint at potentially painful
experiences, not often spoken of or dealt with openly in our society. Encourage
staff and the evaluation team to continuously question the questions, and to ask
what is still missing. Additionally, review whether you are addressing the
following questions:
- How does this program work?
- Why has it worked or not worked? For whom and in what circumstances?
- What was the process of development and implementation?
- What were the stumbling blocks faced along the way?
- What do the experiences mean to the people involved?
- How do these meanings relate to intended outcomes?
- What lessons have we learned about developing and implementing this program?
- How have contextual factors impacted the development, implementation, success, and stumbling blocks of this program?
- What are the hard-to-measure impacts of this program (ones that cannot be easily quantified)? How can we begin to effectively document these impacts?
Recommendation 3. Take action to deal with the effects of paradigms, politics, and
values. Perhaps more important than understanding all of the factors that can
impact the evaluation process is taking specific actions to deal with these issues, so that you and your evaluation staff can achieve a fuller understanding of your
project and how and why it is working. The following tips can be used by
project directors and their evaluation staff to deal with the influence of
paradigms, politics, and values:
- Get inside the project: understand its roles, responsibilities, organizational structure, history, and goals; and how politics, values, and paradigms affect the project’s implementation and impact.
- Create an environment where all stakeholders are encouraged to discuss their values and philosophies.
- Challenge your assumptions. Constantly look for evidence that you are wrong.
- Ask other stakeholders for their perspectives on particular issues. Listen.
- Remember there may be multiple “right” answers.
- Maintain regular contact and provide feedback to stakeholders, both internal and external to the project.
- Involve others in the process of evaluation and try to work through any resistance.
- Design specific strategies to air differences and grievances.
- Make the evaluation and its findings useful and accessible to project staff and clients. Early feedback and a consultative relationship with stakeholders and project staff lead to a greater willingness by staff to disclose important and sensitive information to evaluators.
- Be sensitive to the feelings and rights of individuals.
- Create an atmosphere of openness to findings, with a commitment to considering change and a willingness to learn.
Each of these areas may be addressed by providing relevant reading materials;
making formal or informal presentations; using frequent memos; using committees
composed of staff members, customers, or other stakeholders; setting interim goals
and celebrating achievements; encouraging flexibility; and sharing alternative
viewpoints. These tips will help you deal with political issues, bring multiple sets
of values, paradigms, and philosophies to the table for examination and more
informed decision making, and will help foster an open environment where it is
safe to talk honestly about both the strengths and weaknesses of the project.
References
- [1] W.K. Kellogg Foundation, Evaluation Handbook, http://www.wkkf.org/Pubs/Tools/Evaluation/Pub770.pdf
- [2] Cronbach, Lee J. and Associates, Toward Reform of Program Evaluation, San Francisco: Jossey-Bass, 1980 (p. 12).
- [3] Patton, Michael Quinn, Practical Evaluation, Newbury Park, CA: Sage Publications, 1982 (p. 11).
- [4] Connell, James P., Anne C. Kubisch, Lisbeth B. Schorr, and Carol H. Weiss, New Approaches to Evaluating Community Initiatives: Concepts, Methods, and Contexts, Washington, DC: The Aspen Institute, 1995.
- [5] Schorr, Lisbeth B., and Anne C. Kubisch, “New Approaches to Evaluation: Helping Sister Mary Paul, Geoff Canada, and Otis Johnson While Convincing Pat Moynihan, Newt Gingrich, and the American Public,” Presentation. Annie E. Casey Foundation Annual Research/Evaluation Conference: Using Research and Evaluation Information to Improve Programs and Policies, September 1995.
- [6] Maxwell, J.A. and Y.S. Lincoln, “Methodology and Epistemology: A Dialogue,” Harvard Educational Review, 60(4), pp. 497-512, 1990.
- [7] Nielson, Joyce M., “Introduction,” in J. Nielson (Ed.), Feminist Research Methods, Boulder: Westview Press, 1990.
- [8] Chen, Huey-tsyh, Theory-Driven Evaluations, California: Sage Publications, 1990.
- [9] Weiss, Carol H., “Nothing as Practical as Good Theory: Exploring Theory-Based Evaluation for Comprehensive Community Initiatives for Children and Families,” in James Connell, et al. (eds.), New Approaches to Evaluating Community Initiatives: Concepts, Methods, and Contexts, Washington, DC: The Aspen Institute, 1995.
