Survey Sampling
From EvaluationWiki
Survey Sampling Methods[1]
Your overarching goal in doing a survey is to determine what some group thinks or feels about some issue. If money, time, or other resources were not a concern, the most accurate data you could get would come from surveying the entire population of interest. Since limited resources are a reality we all have to deal with, however, we are often forced to survey the views of only a few members of the population. But never lose sight of the fact that the real purpose is to discover the views of the entire population. Obviously, then, we want to be able to say with as much confidence as possible that the views of the group we surveyed represent the views of the entire population. Using a combination of powerful statistical tools known as inferential statistics and unbiased sampling techniques, any surveyor can collect data that actually represent the views of the entire population from which the sample was taken. Two things are absolutely necessary, however, to ensure a high level of confidence that the sample represents the population:
- an unbiased sample
- a sufficiently large sample
Bias as a statistical term means error. To say that you want an
unbiased sample may sound like you're trying to get a sample that is error-free.
As appealing as this notion may be, it is impossible to achieve! Error
always occurs -- even when using the most unbiased sampling techniques.
One source of error is caused by the act of sampling itself. To understand
it, consider the following example.
Let's say you have a bowl containing ten slips of paper. On each
slip is printed a number, one through ten. This is your “population.” Now
you are going to select a sample. We will use a random method for
drawing the sample, which can be done easily by closing your eyes and
reaching into the bowl and choosing one slip of paper. After choosing it,
check the number on it and place it in the sample pile.
Now to determine if the sample is representative of the population,
we must know what attribute(s) we wish to make representative. Since
there are an infinite number of human attributes, we must precisely
determine the one(s) we are interested in before choosing the sample.
In our example, the attribute of interest will be the average
numerical value on the slips of paper. Since the “population” contained ten
slips numbered consecutively from one to ten, the average numerical value
in the population is:
$$\text{population mean} = \frac{1 + 2 + \cdots + 10}{10} = \frac{55}{10} = 5.5$$
As you can see, no matter what slip of paper we draw as our first sample selection, its value will be either lower or higher than the population average. Let's say the slip we choose first has a 9 on it. The difference between our sample (9) and the population (5.5) averages is +3.5 (plus signifies the sample average is larger than the population average). The difference between the sample average and the population average is known as sampling error. That is, the sample mean (average) plus (or minus) the total amount of sampling error equals the population mean.
On our second pick, we choose a slip that has a 1 on it. Now the
average of sample values is:
$$\text{sample mean} = \frac{9 + 1}{2} = 5$$
The sampling error has shrunk from its previous value of +3.5 to
its new value of -0.5 (minus signifies the sample mean is now smaller than
the population mean). Each time we choose a slip from the population to
include in the sample, one of three mutually exclusive things can occur --
the sample mean will become:
- larger than the population mean
- smaller than the population mean
- equal to the population mean
On average, each sampling brings the sample mean a bit closer to
the population mean. Ultimately, if we sampled everyone from the
population, the sample mean and the population mean would be equal.
This is why a complete census is completely accurate - there is no sampling
error. Yet, if we are forced to use only a sample from the population, the
larger the sample the less sampling error we will have, generally speaking.
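The bowl example can be replayed in a few lines of Python. This is only a sketch: the function name `sampling_error` and the fixed random seed are assumptions made for illustration, and the order of the later picks depends on that seed.

```python
import random

def sampling_error(sample, population):
    """Sample mean minus population mean (positive means the sample mean is larger)."""
    sample_mean = sum(sample) / len(sample)
    population_mean = sum(population) / len(population)
    return sample_mean - population_mean

bowl = list(range(1, 11))          # slips numbered 1 through 10

# The two picks described in the text: first a 9, then a 1.
print(sampling_error([9], bowl))       # 3.5
print(sampling_error([9, 1], bowl))    # -0.5

# Keep drawing blindly until the bowl is empty (a complete census).
rng = random.Random(0)             # fixed seed is an assumption, only for repeatability
order = bowl[:]
rng.shuffle(order)
errors = [sampling_error(order[:k], bowl) for k in range(1, len(order) + 1)]
for k, err in enumerate(errors, start=1):
    print(f"after {k:2d} picks: sampling error = {err:+.2f}")
```

The error does not shrink on every single pick, but once all ten slips are in the sample it is exactly zero, which is the census case.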
Equally important to the size of the sample is the determination of
the type of sampling to be done. In our example, we randomly (blindly)
chose from the population. Random sampling always produces the
smallest possible sampling error. In a very real sense, the size of the
sampling error in a random sample is affected only by random chance. The
two most useful random sampling techniques are simple random and
stratified random sampling methods. These will be discussed shortly.
Because a random sample contains the least amount of sampling
error, we may say that it is an unbiased sample. Note that we are not
saying the sample contains no error, but rather the minimum possible
amount of error.
Nonrandom sampling techniques also exist, and are used more
frequently than you might imagine. As you can probably guess from our
previous discussion, nonrandom sampling techniques will always produce
larger sampling errors (for the same sample size) than random techniques.
The reason for this is that nonrandom techniques generate the expected
random sampling error on each selection plus additional error related to the
nonrandom nature of the selection process. To explain this, let's extend our
sampling example from above.
Let's say we want to sample from a “population” of 1000
consecutively numbered slips of paper. Because numbering these slips is
time consuming, we have 10 people each number 100 slips and place all
100 of them into our bowl when they finish. Let's also say that the last
person to finish has slips numbered from 901 to 1000, and these are laid on
top of all the other slips in the bowl. Now we are ready to select them.
If we wanted to make this a truly random sampling process, we
would have to mix the slips in the bowl thoroughly before selecting.
Furthermore, we would want to reach into the bowl to different depths on
subsequent picks to make sure every slip had a fair chance of being picked.
But, let us say in this example that we forget to mix the slips in the
bowl. Let's also say we only pick from the top layer of slips. It should be
obvious what will occur. Because the top layer of slips is numbered 901
through 1000, the mean of any sample (of 100 or less) we select will hover
around 950.5 (the true mean of the numbers 901 through 1000). Clearly,
this is not even close to the true population mean (500.5 -- the mean of the
numbers from 1 to 1000). Sampling error amounts to the difference
between the true population mean and the sample mean. In this example,
the sampling error can be as large as 450 (950.5 - 500.5).
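To see the size of the extra error, the unmixed bowl can be simulated. The sketch below assumes a sample of 50 slips and a fixed random seed, neither of which comes from the example itself. It compares a sample drawn from the whole bowl with one drawn only from the top layer.

```python
import random

population = list(range(1, 1001))                    # slips numbered 1 through 1000
population_mean = sum(population) / len(population)  # 500.5

rng = random.Random(42)    # fixed seed is an assumption, only for repeatability
sample_size = 50           # assumed sample size, the same for both methods

# Well-mixed bowl: every slip can be picked.
mixed_sample = rng.sample(population, sample_size)

# Unmixed bowl: only the top layer (slips 901 through 1000) can be picked.
top_layer = population[-100:]
top_sample = rng.sample(top_layer, sample_size)

mixed_error = sum(mixed_sample) / sample_size - population_mean
top_error = sum(top_sample) / sample_size - population_mean

print(f"random sample error:    {mixed_error:+.1f}")
print(f"top-layer sample error: {top_error:+.1f}")
```

Whatever slips happen to be drawn, a 50-slip sample from the top layer has a mean between 925.5 and 975.5, so its error cannot fall below 425.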
This was a simple, and somewhat absurd, example of nonrandom
sampling. But, it makes the point. Nonrandom sampling methods usually
do not produce samples that are representative of the general population
from which they are drawn. The greatest error occurs when the surveyor
attempts to generalize the results of the survey obtained from the sample to
the entire population. Such an error is insidious because it is not at all
obvious from merely looking at the data, or even from looking at the
sample. The easiest way to recognize whether a sample is representative or
not is to determine if the sample was selected randomly. To be a random
sampling method, two conditions must be met. If both are met, the
resulting sample is random. If not, it is a nonrandom sampling technique:
- every member in the population must have an equal opportunity of being selected,
- the selection of any member of the population must have no influence on the selection of any other member
All nonrandom sampling methods violate one or both of these
criteria. The most commonly used nonrandom methods are:
- systematic sampling (selecting every nth person from a group)
- cluster sampling (selecting groups of members rather than single members)
- convenience or incidental sampling (selecting only readily available members)
- judgment or purposive sampling (selecting members who are judged to be appropriate for the study)
Simple Random Sampling
A simple random sample is one in which each member (person) in the total population has an equal chance of being picked for the sample. In addition, the selection of one member should in no way influence the selection of another. Simple random sampling should be used with a homogeneous population, that is, one composed of members who all possess the same attribute you are interested in measuring. In identifying the population to be surveyed, homogeneity can be determined by asking the question, “What is (are) the common characteristic(s) that are of interest?” These may include such characteristics as age, sex, rank/grade, position, income, religious or political affiliation, etc. -- whatever you are interested in measuring.
The best way to choose a simple random sample is to use a random
number table (or let a computer generate a series of random numbers
automatically). In either case, you would assign each member of the
population a unique number (or perhaps use a number already assigned to
them, such as an SSAN or telephone number). The members of
the population chosen for the sample will be those whose numbers are
identical to the ones extracted from the random number table (or
computer) in succession until the desired sample size is reached. An
example of a random number table and instructions for its use appear in
Appendix D. Many statistical texts or mathematical tables treat random
number generation. A less rigorous procedure for determining randomness
is to write the name of each member of the population on a separate card,
and with continuous mixing, draw out cards until the sample size is
reached.
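If a computer is available, the random number table can be replaced by a short script. The following is a minimal sketch using Python's standard `random` module. The roster of 1000 made-up member labels, the sample size of 100, and the seed are all assumptions for illustration.

```python
import random

def simple_random_sample(members, sample_size, seed=None):
    """Draw sample_size distinct members, each with the same chance of selection."""
    rng = random.Random(seed)
    return rng.sample(members, sample_size)   # sampling without replacement

# Hypothetical roster: every member of the population gets a unique label.
roster = [f"member-{i:04d}" for i in range(1, 1001)]

sample = simple_random_sample(roster, 100, seed=7)
print(len(sample), "members selected; first five:", sample[:5])
```

Passing a seed makes the draw repeatable, which helps when the selection has to be documented. Leave it as `None` for an unrepeatable draw.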
The simple random sample requires less knowledge about the
population than other techniques, but it does have two major drawbacks.
One is that, if the population is large, a great deal of time must be spent listing
and numbering the members. The other is the fact that a simple random
sample will not adequately represent many population attributes
(characteristics) unless the sample is relatively large. That is, if you are
interested in choosing a sample to be representative of a population on the
basis of the distribution in the population of gender, age, and economic
status, a simple random sample will need to be very large to ensure all
these distributions are equivalent to (or representative of) the population.
To obtain a representative sample across multiple population attributes,
you should use the technique of stratified random sampling.
To determine if the sampling method you
use is random or not, remember that true random sampling methods must
meet two criteria:
- every member in the population must have an equal opportunity of being chosen for the sample (equality)
- the selection of one member is not affected by the selection of previous members (independence)
Both simple random and stratified random sampling methods meet
these two criteria. Nonrandom sampling methods lack one or both of these
criteria.
Stratified Random Sampling
This method is used when the population is heterogeneous rather than homogeneous (or, as discussed above, when you want to obtain a representative sample across many population attributes). A heterogeneous population is composed of unlike elements, such as officers of different ranks, civilians and military personnel, or the patrons of a discount store (differing by gender or age).
A stratified random sample is defined as a combination of
independent samples selected in proper proportions from homogeneous
groups within a heterogeneous population. The procedure calls for
categorizing the heterogeneous population into groups that are
homogeneous in themselves. If one group is proportionally larger than
another, its sample size should also be proportionally larger. The number
of groups to be considered is determined by the characteristics of the
population. Many times the survey plan will determine some or all of the
groups. For example, if you are comparing enlisted and officer segments
on your base, each of these will be a separate group.
After dividing the population into groups, you then sample each
homogeneous group. Different sampling techniques can be used in each of
the different groups, but keep in mind that random techniques produce the
minimum amount of sampling error. Finally, you should calculate the
sample size for each group to determine how many members you need
from each subgroup.
The sample size calculations described later in this article are designed to
determine the size of a simple random sample. Since the stratified sampling
technique requires you to create simple, homogeneous subgroups from a
large heterogeneous group, think of the calculations for a stratified sample
as a series of simple random sample size calculations for each
homogeneous subgroup. The only other information you must know is the
proportion of the population possessing the attribute contained in each
homogeneous subgroup.
For example, let's say we want to draw a random sample from a
population of military personnel to assess their opinions on some issue. In
addition, we would like to determine if the opinions differ by officer-enlisted
affiliation and gender of the individuals surveyed. We recognize
that the population we want to draw our sample from is heterogeneous
with respect to the two attributes of interest to us. So, we have to create
homogeneous subgroups (four to be exact):
- Enlisted, male
- Enlisted, female
- Officer, male
- Officer, female
Now, each group is homogeneous on both attributes. To ensure
each subgroup in the sample will represent its counterpart subgroup in the
population, we must ensure each subgroup is represented in the sample in
the same proportion to the other subgroups as they are in the population.
Let's assume that we know (or can estimate) the population of Air Force
military personnel to be distributed as follows: 70 percent male, 30 percent
female and 65 percent enlisted, 35 percent officer. With that, we can
determine the approximate proportions of our four homogeneous
subgroups in the population:
- Enlisted, male .65 x .70 = .455
- Enlisted, female .65 x .30 = .195
- Officer, male .35 x .70 = .245
- Officer, female .35 x .30 = .105
Thus, a representative sample of the Air Force population (by gender
and enlisted-officer affiliation) would be composed of 45.5 percent enlisted
males, 19.5 percent enlisted females, 24.5 percent officer males, and 10.5
percent officer females. Each percentage should be multiplied by the total
sample size needed to arrive at the actual number of personnel required
from each subgroup or stratum.
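The subgroup proportions can be worked out mechanically, as in the sketch below. Like the example, it multiplies the two overall percentages together, which assumes that the male/female split is the same among enlisted personnel and officers. If it is not, the actual subgroup counts should be used instead. The variable names are illustrative only.

```python
from itertools import product

# Overall proportions assumed in the example.
affiliation = {"enlisted": 0.65, "officer": 0.35}
gender = {"male": 0.70, "female": 0.30}

# Proportion of each homogeneous subgroup (stratum) in the population.
strata = {
    (aff, gen): round(p_aff * p_gen, 3)
    for (aff, p_aff), (gen, p_gen) in product(affiliation.items(), gender.items())
}

for (aff, gen), share in strata.items():
    print(f"{aff.capitalize()}, {gen}: {share:.3f}")
```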
As this example illustrates, stratified random sampling requires a
detailed knowledge of the distribution of attributes or characteristics of
interest in the population to determine the homogeneous groups that lie
within it. A stratified random sample is superior to a simple random
sample since the population is divided into smaller homogeneous groups
before sampling, and this yields less variation within the sample. This
makes possible the desired degree of accuracy with a smaller sample size.
But, if you cannot accurately identify the homogeneous groups, you are
better off using the simple random sample since improper stratification can
lead to serious error.
Systematic Sampling
Sometimes it is more expeditious to collect a sample of survey participants systematically. This is frequently done, for instance, in exit polling of voters or store customers. It is a nonrandom sampling technique, but is used primarily for its ease and speed of identifying participants.
To use the systematic approach, simply choose every Kth member in
the population where K is equal to the population size divided by the
required sample size. If this quotient has a remainder, ignore it (round
down). For example, if you need 100 members in your sample and the
population consists of 1000 people, you need to sample every 1000/100 (or
10th) member of the population. When using this method, some suggest
you should choose your starting point at random by choosing a random
number from 1 to K.
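The selection rule can be sketched in Python as follows. The function name `systematic_sample`, the numbered population of 1000, and the seed are assumptions for illustration. K is computed with integer division, which drops any remainder as described above.

```python
import random

def systematic_sample(members, sample_size, seed=None):
    """Pick every Kth member, starting from a random position between 1 and K."""
    k = len(members) // sample_size        # remainder is ignored (rounded down)
    rng = random.Random(seed)
    start = rng.randint(1, k)              # random starting point from 1 to K, inclusive
    picks = members[start - 1::k]          # every Kth member from the starting point on
    return picks[:sample_size]             # trim any extra pick left by the remainder

population = list(range(1, 1001))          # 1000 members, numbered 1 through 1000
sample = systematic_sample(population, 100, seed=1)
print(sample[:5])
```

Note that once `start` is fixed, every other member of the sample is determined, which is the point made in the paragraphs that follow.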
If you recall the characteristic requirements for a random sample
discussed above (equality and independence), you can see that systematic
sampling methods lack both characteristics. Not every member of the
population has an equal chance of being selected, and the
selection of members for the sample depends on the initial selection.
Regardless of how you select your starting point, once selected, every
subsequent member of the sample is automatically determined. This
method is clearly nonrandom.
Some suggest that by mixing the population well you can turn this
into a random sampling technique. They are wrong. Regardless of how
much you mix the population before selecting a starting point, the fact
remains that once that point is chosen, further selection of members for the
sample is nonrandom (no independence).
Recognize the limitation of this type of sampling. Since it is
nonrandom, the resulting sample will not necessarily be representative of
the population from which it was drawn. This will affect your ability to
confidently generalize results of the survey since you may not be sure to
which segment of the population the results will apply. As a word of
advice, unless you have experience in systematic sampling techniques, and
have full knowledge of the population to be sampled, you should avoid
using this method.
Judgment or Purposive Sampling
The final method covered in this guide is the judgment sample. The procedure is simply to ask an expert on the issue being investigated to define the members that should comprise the sample. The representativeness of the sample is determined solely by the judgment of the researcher. Since each member in the population does not have an equal chance of being chosen, a judgment sample is also a nonrandom sampling method. Since the sample does not meet the criterion of randomness, which is the basis for many statistical sampling applications, a judgment sample should never be used in a statistical evaluation effort. There are situations when a variation of the judgment sampling method can be argued to be appropriate. In such situations, it goes by the name of purposive sampling. As the name implies, members from the population are selected into the sample to meet some purpose. This type of sampling is used primarily in causal-comparative (ex post facto) research where the researcher is interested in finding a possible cause-and-effect link between two variables, one of which has already occurred. The researcher intentionally selects the samples in such a way that one possesses the causal (independent) variable and one does not. The purpose of the research governs the selection of the sample and, thus, excludes members of the population who do not contribute to that purpose. For our purposes in this guide, suffice it to say that you should never consider using a judgment sampling method.
The types of sampling methods discussed above are only a few of the many available. You will find others in the references listed in the bibliography. Each type is designed to obtain the most representative sample possible from different kinds of populations. Before using any sampling method yourself, first think about the population to which you want to generalize the results of your survey (which population do you want to represent). Then, choose your sample appropriately. If generalizing results is not your aim, any sampling method will do. If generalizing results is important, use only a random sampling method to ensure a high degree of confidence that the results do, in fact, represent those of the whole population.
Factors Influencing Sample Size
When you sample you are dealing with only partial information. And you must accept a risk of being wrong when inferring something about the population on the basis of sample information. In the analysis portion of your survey plan, you identify the amount of risk you are willing (or allowed) to take. This amount of risk relates directly to the size of your sample. Simply stated, the less risk you are willing to take, the larger your sample must be. If you cannot accept any risk, you should survey the entire population (take a census).
When determining your risk level, keep in mind the time and cost
involved in obtaining the sample size sufficient to achieve the risk level you
can accept. You may find it impossible to produce a sample large enough
to meet that risk level.
Another factor bearing on sample size is also obtained from your
analysis plan. It is the number of groups you are planning to examine
within the population. For example, if you are planning to compare two
groups (enlisted and officer) on a base (your population), each of the
groups must be sampled and each of the samples must be large enough to
ensure satisfying your risk level.
Confidence Level and Precision
Risk, as it relates to sample size determination, is specified by two interrelated factors:
- the confidence level
- the precision (or reliability) range.
To minimize risk, you should have a high confidence (say 95
percent) that the true value you seek (the actual value in the population)
lies somewhere within a small interval (say + or - 5 percent) around your
sample value (your precision). Sawyer[2] uses a baseball game
analogy to explain confidence level, precision range, and their relationship.
A baseball pitcher may feel that he can get very few of his pitches (perhaps
10 percent) over the exact center (small precision range) of home plate.
But since home plate is 17 inches wide, he may feel that he can get 95
percent of his pitches over the center of the plate with a precision of plus or
minus 8 1/2 inches (a 95 percent confidence level). If the plate is widened
to 30 inches, he may feel 99 percent confident. So when we widen the
range of precision (or reliability), we increase our confidence level.
Likewise, if we reduce the range, we reduce our confidence level. Most
surveying organizations use a 95 percent confidence level and a ± 5 percent
precision level as the absolute minimum.
Determining the Size of the Sample
Once you determine your desired degree of precision and your confidence level, there are several formulas you can use to determine sample size depending on how you plan to report the results of your study. We'll discuss three of them here. If you will be reporting results as percentages (proportions) of the sample responding, use the following formula:
If you will report results as means (averages) of the sample
responding, use the following formula:
If you plan to report results in a variety of ways, or if you have
difficulty estimating percentage or standard deviation of the attribute of
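interest, the following formula may be more suitable for use:
$$n = \frac{N Z^2 (.25)}{d^2 (N - 1) + Z^2 (.25)}$$
Here n is the required sample size, N is the total population size, Z is the factor for the chosen confidence level, and d is the precision level. The constant .25 is the largest value that p(1 - p) can take for a proportion p (reached at p = .5), so it covers the case where the percentage is hard to estimate.
We illustrate this formula with the following example. If the total
population (N) is 10,000, and you wish a 95% confidence level and ± 5
percent precision level (d = .05, Z = 1.96 from Appendix E), then:
$$n = \frac{10{,}000 \times 1.96^2 \times .25}{.05^2 \times 9{,}999 + 1.96^2 \times .25} = \frac{9604}{25.9579} \approx 369.98$$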
So, a representative sample of 370 (369.98 rounded up) would be
sufficient to satisfy your risk level. Inspection of the formula shows that
the required sample size will increase most rapidly if:
- the confidence level (Z factor) is increased, or
- the precision level (d) is made smaller.
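The calculation and the two observations above can be checked with a short script. This is a sketch that assumes the formula has the finite-population form n = N Z² (.25) / (d² (N - 1) + Z² (.25)), which reproduces the 369.98 figure in the example. The function name and the parameter `p` (set to .5 so that p(1 - p) = .25) are illustrative, and 2.576 is the usual Z factor for a 99 percent confidence level.

```python
import math

def required_sample_size(population_size, z=1.96, d=0.05, p=0.5):
    """Sample size for a finite population (assumed form of the formula)."""
    variability = p * (1 - p)              # .25 when p = .5, the most cautious choice
    numerator = population_size * z**2 * variability
    denominator = d**2 * (population_size - 1) + z**2 * variability
    return numerator / denominator

n_raw = required_sample_size(10_000)       # 95% confidence, +/- 5 percent precision
n = math.ceil(n_raw)                       # always round up to a whole person
print(f"{n_raw:.2f} -> {n}")               # 369.98 -> 370

# A higher confidence level or a tighter precision both raise the required size.
print(math.ceil(required_sample_size(10_000, z=2.576)))   # 99% confidence
print(math.ceil(required_sample_size(10_000, d=0.03)))    # +/- 3 percent precision
```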
If you have stratified your population into more than one group, the
size of each group will be its proportion (percentage) in the population
times the total sample size as computed above. To illustrate, recall our
earlier example of four stratified groups. Using the n of 370 calculated
above, each of these strata should have the following sample sizes:
- Enlisted, male 370 x .455 = 168.35 ≈ 168
- Enlisted, female 370 x .195 = 72.15 ≈ 72
- Officer, male 370 x .245 = 90.65 ≈ 91
- Officer, female 370 x .105 = 38.85 ≈ 39
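The same allocation can be computed in a few lines. This is a sketch in which the dictionary of stratum proportions is copied from the earlier example and the names are illustrative. Each product is rounded to the nearest whole person, so for other proportions the rounded counts may not add up exactly to the total and may need a manual adjustment.

```python
total_sample_size = 370

# Stratum proportions from the earlier stratified sampling example.
proportions = {
    "Enlisted, male": 0.455,
    "Enlisted, female": 0.195,
    "Officer, male": 0.245,
    "Officer, female": 0.105,
}

# Multiply each proportion by the total sample size and round to whole people.
allocation = {
    stratum: round(total_sample_size * share)
    for stratum, share in proportions.items()
}

for stratum, count in allocation.items():
    print(f"{stratum}: {count}")
print("total:", sum(allocation.values()))
```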
Finally, you should adjust the computed sample size (n) by dividing
n by the expected response rate. For instance, if you expect a 75 percent
response rate, you should make your sample size equal to n/.75 (for the n of 370 computed above, 370/.75 = 493.33, or 494 rounded up). If you can't
anticipate a response rate, assume a 50 percent response rate (i.e., double
the n value). This sort of adjustment should ensure you get a sufficient
number of responses regardless of return rate.
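The adjustment is a single division, sketched below. The function name is illustrative, the default of .5 mirrors the advice to assume a 50 percent response rate when none can be anticipated, and the result is rounded up so the number surveyed is never too small.

```python
import math

def adjust_for_response_rate(n, response_rate=0.5):
    """Number of people to survey so that about n of them respond."""
    if not 0 < response_rate <= 1:
        raise ValueError("response_rate must be greater than 0 and at most 1")
    return math.ceil(n / response_rate)

print(adjust_for_response_rate(370, 0.75))   # 75 percent expected response rate
print(adjust_for_response_rate(370))         # unknown rate: assume 50 percent
```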
References
- Ross, Keneth C. (1996). Air University Sampling and Surveying Handbook: Guidelines for planning, conducting, and organizing surveys. Retrieved September 9, 2006, from http://www.au.af.mil/au/awc/awcgate/edref/smpl-srv.pdf.
- Sawyer, Lawrence. (November-December 1971). Statistics Confuse Me, Grandfather. Internal Auditor, Vol. 28, No. 6, pp. 49-52.
