Survey Sampling Methods<ref name="Ross">Ross, Keneth C. (1996). Air University Sampling and Surveying Handbook: Guidelines for planning, conducting, and organizing surveys. Retrieved September 9, 2006, from http://www.au.af.mil/au/awc/awcgate/edref/smpl-srv.pdf.</ref>
Your overarching goal in doing a survey is to determine what some group thinks or feels about some issue. If money, time, or other resources were not a concern, the most accurate data you could get would come from surveying the entire population of interest. Since limited resources are a reality we all have to deal with, however, we are often forced to survey the views of only a few members of the population. But never lose sight of the fact that the real purpose is to discover the views of the entire population. Obviously, then, we want to be able to say with as much confidence as possible that the views of the group we surveyed represent the views of the entire population. Using a combination of powerful statistical tools known as inferential statistics and unbiased sampling techniques, any surveyor can collect data that actually represent the views of the entire population from which the sample was taken. Two things are absolutely necessary, however, to ensure a high level of confidence that the sample represents the population:
- an unbiased sample
- a sufficiently large sample
Bias, as a statistical term, means error. To say that you want an unbiased sample may sound like you're trying to get a sample that is error-free. As appealing as this notion may be, it is impossible to achieve! Error always occurs -- even when using the most unbiased sampling techniques. One source of error is caused by the act of sampling itself. To understand it, consider the following example.
Let's say you have a bowl containing ten slips of paper. On each slip is printed a number, one through ten. This is your “population.” Now you are going to select a sample. We will use a random method for drawing the sample, which can be done easily by closing your eyes and reaching into the bowl and choosing one slip of paper. After choosing it, check the number on it and place it in the sample pile. Now to determine if the sample is representative of the population, we must know what attribute(s) we wish to make representative. Since there are an infinite number of human attributes, we must precisely determine the one(s) we are interested in before choosing the sample. In our example, the attribute of interest will be the average numerical value on the slips of paper. Since the “population” contained ten slips numbered consecutively from one to ten, the average numerical value in the population is:
(1 + 2 + 3 + 4 + 5 + 6 + 7 + 8 + 9 + 10) / 10 = 55 / 10 = 5.5
As you can see, no matter what slip of paper we draw as our first sample selection, its value will be either lower or higher than the population average. Let's say the slip we choose first has a 9 on it. The difference between our sample (9) and the population (5.5) averages is +3.5 (plus signifies the sample average is larger than the population average). The difference between the sample average and the population average is known as sampling error. That is, the sample mean (average) plus (or minus) the total amount of sampling error equals the population mean.
On our second pick, we choose a slip that has a 1 on it. Now the average of sample values is:
(9 + 1) / 2 = 5.0
The sampling error has shrunk from its previous value of + 3.5 to its new value of - 0.5 (minus signifies the sample mean is now smaller than the population mean). Each time we choose a slip from the population to include in the sample, one of three mutually exclusive things can occur -- the sample mean will become:
- larger than the population mean
- smaller than the population mean
- equal to the population mean
On average, each sampling brings the sample mean a bit closer to the population mean. Ultimately, if we sampled everyone from the population, the sample mean and the population mean would be equal. This is why a complete census is completely accurate: there is no sampling error. Yet, if we are forced to use only a sample from the population, the larger the sample the less sampling error we will have, generally speaking. Equally important to the size of the sample is the determination of the type of sampling to be done. In our example, we randomly (blindly) chose from the population. Random sampling adds no systematic error on top of chance, so for a given sample size it keeps sampling error to a minimum. In a very real sense, the size of the sampling error in a random sample is affected only by random chance. The two most useful random sampling techniques are simple random and stratified random sampling methods. These will be discussed shortly. Because a random sample contains the least amount of sampling error, we may say that it is an unbiased sample. Note that we are not saying the sample contains no error, but rather the minimum possible amount of error.
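The shrinking of sampling error with sample size can be checked with a small simulation of the ten-slip bowl. This is only a sketch: the function name `average_abs_error`, the number of trials, and the seed are illustrative choices, not part of the original example.

```python
import random
from statistics import mean

population = list(range(1, 11))      # the ten numbered slips
population_mean = mean(population)   # 5.5

def average_abs_error(sample_size, trials=2000, seed=0):
    """Average absolute sampling error over many random draws of one size."""
    rng = random.Random(seed)
    errors = []
    for _ in range(trials):
        # Draw slips blindly, without putting any back in the bowl.
        sample = rng.sample(population, sample_size)
        errors.append(abs(mean(sample) - population_mean))
    return mean(errors)

for size in (1, 2, 5, 10):
    print(size, round(average_abs_error(size), 3))
```

Drawing all ten slips is a census, so the error for a sample of ten is exactly zero; smaller samples show progressively larger average error.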
Nonrandom sampling techniques also exist, and are used more frequently than you might imagine. As you can probably guess from our previous discussion, nonrandom sampling techniques will always produce larger sampling errors (for the same sample size) than random techniques. The reason for this is that nonrandom techniques generate the expected random sampling error on each selection plus additional error related to the nonrandom nature of the selection process. To explain this, let's extend our sampling example from above.
Let's say we want to sample from a “population” of 1000 consecutively numbered slips of paper. Because numbering these slips is time consuming, we have 10 people each number 100 slips and place all 100 of them into our bowl when they finish. Let's also say that the last person to finish has slips numbered from 901 to 1000, and these are laid on top of all the other slips in the bowl. Now we are ready to select them. If we wanted to make this a truly random sampling process, we would have to mix the slips in the bowl thoroughly before selecting. Furthermore, we would want to reach into the bowl to different depths on subsequent picks to make sure every slip had a fair chance of being picked. But, let us say in this example that we forget to mix the slips in the bowl. Let's also say we only pick from the top layer of slips. It should be obvious what will occur. Because the top layer of slips is numbered 901 through 1000, the mean of any sample (of 100 or fewer) we select will hover around 950.5 (the true mean of the numbers 901 through 1000). Clearly, this is not even close to the true population mean (500.5 -- the mean of the numbers from 1 to 1000). Sampling error amounts to the difference between the true population mean and the sample mean. In this example, the sampling error can be as large as 450 (950.5 - 500.5).
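The arithmetic of the unmixed bowl can be reproduced in a few lines. The sketch below assumes the top layer is exactly the slips 901 through 1000 and compares it with one randomly drawn sample of the same size; the seed and variable names are illustrative.

```python
import random
from statistics import mean

population = list(range(1, 1001))    # slips numbered 1..1000
top_layer = population[-100:]        # slips 901..1000, lying on top

population_mean = mean(population)               # 500.5
top_layer_mean = mean(top_layer)                 # 950.5
biased_error = top_layer_mean - population_mean  # 450.0

# For comparison: 100 slips drawn at random from the well-mixed bowl.
rng = random.Random(1)
random_sample = rng.sample(population, 100)
random_error = mean(random_sample) - population_mean

print(population_mean, top_layer_mean, biased_error)
print(round(random_error, 1))
```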
This was a simple, and somewhat absurd, example of nonrandom sampling. But, it makes the point. Nonrandom sampling methods usually do not produce samples that are representative of the general population from which they are drawn. The greatest error occurs when the surveyor attempts to generalize the results of the survey obtained from the sample to the entire population. Such an error is insidious because it is not at all obvious from merely looking at the data, or even from looking at the sample. The easiest way to recognize whether a sample is representative or not is to determine if the sample was selected randomly. To be a random sampling method, two conditions must be met. If both are met, the resulting sample is random. If not, it is a nonrandom sampling technique:
- every member in the population must have an equal opportunity of being selected,
- the selection of any member of the population must have no influence on the selection of any other member
All nonrandom sampling methods violate one or both of these criteria. The most commonly used nonrandom methods are:
- systematic sampling (selecting every nth person from a group)
- cluster sampling (selecting groups of members rather than single members)
- convenience or incidental sampling (selecting only readily available members)
- judgment or purposive sampling (selecting members who are judged to be appropriate for the study)
Simple Random Sampling
A simple random sample is one in which each member (person) in the total population has an equal chance of being picked for the sample. In addition, the selection of one member should in no way influence the selection of another. Simple random sampling should be used with a homogeneous population, that is, one composed of members who all possess the same attribute you are interested in measuring. In identifying the population to be surveyed, homogeneity can be determined by asking the question, “What is (are) the common characteristic(s) that are of interest?” These may include such characteristics as age, sex, rank/grade, position, income, religious or political affiliation, etc. -- whatever you are interested in measuring.
The best way to choose a simple random sample is to use a random number table (or let a computer generate a series of random numbers automatically). In either case, you would assign each member of the population a unique number (or perhaps use a number already assigned to them such as SSAN, telephone number, zip code, etc.). The members of the population chosen for the sample will be those whose numbers are identical to the ones extracted from the random number table (or computer) in succession until the desired sample size is reached. An example of a random number table and instructions for its use appear in Appendix D. Many statistical texts or mathematical tables treat random number generation. A less rigorous procedure for determining randomness is to write the name of each member of the population on a separate card, and with continuous mixing, draw out cards until the sample size is reached.
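Letting a computer draw the sample might look like the sketch below. The roster of identifiers and the function name `simple_random_sample` are hypothetical; Python's standard `random.sample` selects without replacement, so no member can be chosen twice.

```python
import random

def simple_random_sample(member_ids, sample_size, seed=None):
    """Draw a simple random sample of unique member identifiers."""
    rng = random.Random(seed)
    return rng.sample(list(member_ids), sample_size)

# Hypothetical population: each member already carries a unique number.
roster = [f"member-{i:04d}" for i in range(1, 1001)]
sample = simple_random_sample(roster, 100, seed=42)
print(sample[:5])
```

Fixing the seed makes the draw repeatable for auditing; leaving it as `None` gives a fresh draw each run.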
The simple random sample requires less knowledge about the population than other techniques, but it does have two major drawbacks. One is that if the population is large, a great deal of time must be spent listing and numbering the members. The other is the fact that a simple random sample will not adequately represent many population attributes (characteristics) unless the sample is relatively large. That is, if you are interested in choosing a sample to be representative of a population on the basis of the distribution in the population of gender, age, and economic status, a simple random sample will need to be very large to ensure all these distributions are equivalent to (or representative of) the population. To obtain a representative sample across multiple population attributes, you should use the technique of stratified random sampling.
To determine if the sampling method you use is random or not, remember that true random sampling methods must meet two criteria:
- every member in the population must have an equal opportunity of being chosen for the sample (equality)
- the selection of one member is not affected by the selection of previous members (independence)
Both simple random and stratified random sampling methods meet these two criteria. Nonrandom sampling methods lack one or both of these criteria.
Stratified Random Sampling
This method is used when the population is heterogeneous rather than homogeneous (or as discussed above, when you want to obtain a representative sample across many population attributes). A heterogeneous population is composed of unlike elements, such as officers of different ranks, civilians and military personnel, or the patrons of a discount store (differing by gender or age).
A stratified random sample is defined as a combination of independent samples selected in proper proportions from homogeneous groups within a heterogeneous population. The procedure calls for categorizing the heterogeneous population into groups that are homogeneous in themselves. If one group is proportionally larger than another, its sample size should also be proportionally larger. The number of groups to be considered is determined by the characteristics of the population. Many times the survey plan will determine some or all of the groups. For example, if you are comparing enlisted and officer segments on your base, each of these will be a separate group.
After dividing the population into groups, you then sample each homogeneous group. Different sampling techniques can be used in each of the different groups, but keep in mind that random techniques produce the minimum amount of sampling error. Finally, you should calculate the sample statistics for each group to determine how many members you need from each subgroup.
These calculations are designed to determine the size of a simple random sample. Since the stratified sampling technique requires you to create simple, homogeneous subgroups from a large heterogeneous group, think of the calculations for a stratified sample as a series of simple random sample size calculations for each homogeneous subgroup. The only other information you must know is the proportion of the population possessing the attribute contained in each homogeneous subgroup.
For example, let's say we want to draw a random sample from a population of military personnel to assess their opinions on some issue. In addition, we would like to determine if the opinions differ by officer-enlisted affiliation and gender of the individuals surveyed. We recognize that the population we want to draw our sample from is heterogeneous with respect to the two attributes of interest to us. So, we have to create homogeneous subgroups (four to be exact):
- Enlisted, male
- Enlisted, female
- Officer, male
- Officer, female
Now, each group is homogeneous on both attributes. To ensure each subgroup in the sample will represent its counterpart subgroup in the population, we must ensure each subgroup is represented in the sample in the same proportion to the other subgroups as they are in the population. Let's assume that we know (or can estimate) the population of Air Force military personnel to be distributed as follows: 70 percent male, 30 percent female and 65 percent enlisted, 35 percent officer. With that, we can determine the approximate proportions of our four homogeneous subgroups in the population:
- Enlisted, male .65 x .70 = .455
- Enlisted, female .65 x .30 = .195
- Officer, male .35 x .70 = .245
- Officer, female .35 x .30 = .105
Thus, a representative sample of the Air Force population (by gender and enlisted-officer affiliation) would be composed of 45.5 percent enlisted males, 19.5 percent enlisted females, 24.5 percent officer males, and 10.5 percent officer females. Each percentage should be multiplied by the total sample size needed to arrive at the actual number of personnel required from each subgroup or stratum.
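The subgroup proportions can be computed rather than worked by hand. The sketch below multiplies the two marginal distributions, which assumes (as the approximation in the text does) that gender and enlisted-officer affiliation are independent of each other; the dictionary names are illustrative.

```python
from itertools import product

affiliation_share = {"Enlisted": 0.65, "Officer": 0.35}
gender_share = {"male": 0.70, "female": 0.30}

# Approximate share of each homogeneous subgroup (stratum) in the population.
strata = {
    (affiliation, gender): round(
        affiliation_share[affiliation] * gender_share[gender], 3
    )
    for affiliation, gender in product(affiliation_share, gender_share)
}

for stratum, share in strata.items():
    print(stratum, share)
```

If the actual cross-tabulation of the population is known, those figures should replace the products, since the independence assumption is only an approximation.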
As this example illustrates, stratified random sampling requires a detailed knowledge of the distribution of attributes or characteristics of interest in the population to determine the homogeneous groups that lie within it. A stratified random sample is superior to a simple random sample since the population is divided into smaller homogeneous groups before sampling, and this yields less variation within the sample. This makes possible the desired degree of accuracy with a smaller sample size. But, if you cannot accurately identify the homogeneous groups, you are better off using the simple random sample since improper stratification can lead to serious error.
Systematic Sampling
Sometimes it is more expeditious to collect a sample of survey participants systematically. This is frequently done, for instance, in exit polling of voters or store customers. It is a nonrandom sampling technique, but is used primarily for its ease and speed of identifying participants.
To use the systematic approach, simply choose every Kth member in the population where K is equal to the population size divided by the required sample size. If this quotient has a remainder, ignore it (round down). For example, if you need 100 members in your sample and the population consists of 1000 people, you need to sample every 1000/100 (or 10th) member of the population. When using this method, some suggest you should choose your starting point at random by choosing a random number from 1 to K.
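The every-Kth-member procedure can be sketched as follows. The function name `systematic_sample` and the numbered population are hypothetical, and the random starting point from 1 to K is the optional refinement mentioned above.

```python
import random

def systematic_sample(members, sample_size, seed=None):
    """Pick every Kth member, starting from a random position in 1..K."""
    k = len(members) // sample_size            # interval K, remainder ignored
    start = random.Random(seed).randint(1, k)  # random starting point, 1..K
    picks = members[start - 1::k]              # every Kth member from there
    return picks[:sample_size]                 # trim extras left by a remainder

population = list(range(1, 1001))              # 1000 numbered members
sample = systematic_sample(population, 100, seed=7)
print(sample[:5])
```

Once `start` is fixed, every later pick follows mechanically from it, which is the loss of independence described in the next paragraph.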
If you recall the characteristic requirements for a random sample discussed above (equality and independence), you can see that systematic sampling methods lack both characteristics. Not every member of the population has an equal chance of being selected, and the selection of members for the sample depends on the initial selection. Regardless of how you select your starting point, once selected, every subsequent member of the sample is automatically determined. This method is clearly nonrandom.
Some suggest that by mixing the population well you can turn this into a random sampling technique. They are wrong. Regardless of how much you mix the population before selecting a starting point, the fact remains that once that point is chosen, further selection of members for the sample is nonrandom (no independence).
Recognize the limitation of this type of sampling. Since it is nonrandom, the resulting sample will not necessarily be representative of the population from which it was drawn. This will affect your ability to confidently generalize results of the survey since you may not be sure to which segment of the population the results will apply. As a word of advice, unless you have experience in systematic sampling techniques, and have full knowledge of the population to be sampled, you should avoid using this method.
Judgment or Purposive Sampling
The final method covered in this guide is the judgment sample. The procedure is simply to ask an expert on the issue being investigated to define the members that should comprise the sample. The representativeness of the sample is determined solely by the judgment of the researcher. Since each member in the population does not have an equal chance of being chosen, a judgment sample is also a nonrandom sampling method. Since the sample does not meet the criterion of randomness, the basis for many statistical sampling applications, a judgment sample should never be used in a statistical evaluation effort. There are situations when a variation of the judgment sampling method can be argued to be appropriate. In such situations, it goes by the name of purposive sampling. As the name implies, members from the population are selected into the sample to meet some purpose. This type of sampling is used primarily in causal-comparative (ex post facto) research where the researcher is interested in finding a possible cause-and-effect link between two variables, one of which has already occurred. The researcher intentionally selects the samples in such a way that one possesses the causal (independent) variable and one does not. The purpose of the research governs the selection of the sample and, thus, excludes members of the population who do not contribute to that purpose. For our purposes in this guide, suffice it to say that you should never consider using a judgment sampling method.
The types of sampling methods discussed above are only a few of the many available. You will find others in the references listed in the bibliography. Each type is designed to obtain the most representative sample possible from different kinds of populations. Before using any sampling method yourself, first think about the population to which you want to generalize the results of your survey (which population do you want to represent). Then, choose your sample appropriately. If generalizing results is not your aim, any sampling method will do. If generalizing results is important, use only a random sampling method to ensure a high degree of confidence that the results do, in fact, represent those of the whole population.
Factors Influencing Sample Size
When you sample you are dealing with only partial information. And you must accept a risk of being wrong when inferring something about the population on the basis of sample information. In the analysis portion of your survey plan, you identify the amount of risk you are willing (or allowed) to take. This amount of risk relates directly to the size of your sample. Simply stated, the less risk you are willing to take, the larger your sample must be. If you cannot accept any risk, you should survey the entire population (take a census).
When determining your risk level, keep in mind the time and cost involved in obtaining the sample size sufficient to achieve the risk level you can accept. You may find it impossible to produce a sample large enough to meet that risk level.
Another factor bearing on sample size is also obtained from your analysis plan. It is the number of groups you are planning to examine within the population. For example, if you are planning to compare two groups (enlisted and officer) on a base (your population), each of the groups must be sampled and each of the samples must be large enough to ensure satisfying your risk level.
Confidence Level and Precision
Risk, as it relates to sample size determination, is specified by two interrelated factors:
- the confidence level
- the precision (or reliability) range.
To minimize risk, you should have a high confidence (say 95 percent) that the true value you seek (the actual value in the population) lies somewhere within a small interval (say + or - 5 percent) around your sample value (your precision). Sawyer<ref name="Sawyer">Sawyer, Lawrence. (November-December 1971) Statistics Confuse Me, Grandfather, Internal Auditor, Vol. 28, No. 6, pp 49-52.</ref> uses a baseball game analogy to explain confidence level, precision range, and their relationship. A baseball pitcher may feel that he can get very few of his pitches (perhaps 10 percent) over the exact center (small precision range) of home plate. But since home plate is 17 inches wide, he may feel that he can get 95 percent of his pitches over the center of the plate with a precision of plus or minus 8 1/2 inches (a 95 percent confidence level). If the plate is widened to 30 inches, he may feel 99 percent confident. So when we widen the range of precision (or reliability), we increase our confidence level. Likewise, if we reduce the range, we reduce our confidence level. Most surveying organizations use a 95 percent confidence level and a ± 5 percent precision level as the absolute minimum.
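In sample size calculations the confidence level enters through a Z factor (1.96 for 95 percent confidence). As a rough sketch, and assuming the sample estimate is approximately normally distributed, the factor for any two-sided confidence level can be looked up with Python's standard library instead of a printed table; the function name `z_factor` is illustrative.

```python
from statistics import NormalDist

def z_factor(confidence_level):
    """Two-sided Z factor for a confidence level given as a fraction."""
    tail = (1 - confidence_level) / 2          # probability left in each tail
    return NormalDist().inv_cdf(1 - tail)      # standard normal quantile

for level in (0.90, 0.95, 0.99):
    print(level, round(z_factor(level), 2))
```

A higher confidence level gives a larger Z factor, which is why demanding more confidence at the same precision calls for a larger sample.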
Determining the Size of the Sample
Once you determine your desired degree of precision and your confidence level, there are several formulas you can use to determine sample size depending on how you plan to report the results of your study. We'll discuss three of them here. If you will be reporting results as percentages (proportions) of the sample responding, use the following formula:
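A standard finite-population form of the sample size formula for proportions is sketched below. It is offered as a reconstruction under assumptions rather than a quotation: n is the required sample size, N the population size, Z the factor for the chosen confidence level, d the precision expressed as a proportion, and p (a symbol assumed here) the estimated proportion of the population possessing the attribute of interest.

```latex
n = \frac{N Z^{2}\, p(1-p)}{d^{2}(N-1) + Z^{2}\, p(1-p)}
```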
If you will report results as means (averages) of the sample responding, use the following formula:
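For means, a standard finite-population form replaces the proportion term with a variance term. Again this is a sketch under assumptions: \sigma (a symbol assumed here) is the estimated standard deviation of the attribute in the population, and d is the precision expressed in the same units as the attribute being measured.

```latex
n = \frac{N Z^{2} \sigma^{2}}{d^{2}(N-1) + Z^{2} \sigma^{2}}
```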
If you plan to report results in a variety of ways, or if you have difficulty estimating percentage or standard deviation of the attribute of interest, the following formula may be more suitable for use:
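A form of this general-purpose formula that reproduces the 369.98 obtained in the worked example is given below. The constant .25 is the largest value the product p(1 - p) can take (reached at p = .5), so it stands in as a conservative choice when the proportion or standard deviation cannot be estimated; reading the constant this way is an interpretation, not a statement from the source.

```latex
n = \frac{N Z^{2} (0.25)}{d^{2}(N-1) + Z^{2} (0.25)}
```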
We illustrate this formula with the following example. If the total population (N) is 10,000, and you wish a 95% confidence level and ± 5 percent precision level (d = .05, Z = 1.96 from Appendix E), then:
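The calculation can be sketched in code, assuming the conservative finite-population form of the formula in which the variance term is fixed at .25. The function name `sample_size` is illustrative.

```python
import math

def sample_size(N, Z=1.96, d=0.05):
    """Required sample size with the variance term fixed at .25 (assumed form)."""
    numerator = N * Z**2 * 0.25
    denominator = d**2 * (N - 1) + Z**2 * 0.25
    return numerator / denominator

n = sample_size(10_000)
print(round(n, 2))    # 369.98
print(math.ceil(n))   # 370
```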
So, a representative sample of 370 (369.98 rounded up) would be sufficient to satisfy your risk level. Inspection of the formula shows that the required sample size will increase most rapidly if:
- the confidence level (Z factor) is increased, or
- the precision level (d) is made smaller.
If you have stratified your population into more than one group, the size of each group will be its proportion (percentage) in the population times the total sample size as computed above. To illustrate, recall our earlier example of four stratified groups. Using the n of 370 calculated above, each of these strata should have the following sample sizes:
- Enlisted, male 370 x .455 = 168.35 ≈ 168
- Enlisted, female 370 x .195 = 72.15 ≈ 72
- Officer, male 370 x .245 = 90.65 ≈ 91
- Officer, female 370 x .105 = 38.85 ≈ 39
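The proportional allocation above can be sketched in a few lines. The dictionary of stratum proportions repeats the figures from the earlier example; the variable names are illustrative.

```python
proportions = {
    ("Enlisted", "male"): 0.455,
    ("Enlisted", "female"): 0.195,
    ("Officer", "male"): 0.245,
    ("Officer", "female"): 0.105,
}
total_sample_size = 370

# Each stratum receives its population share of the total sample.
allocation = {
    stratum: round(total_sample_size * share)
    for stratum, share in proportions.items()
}

for stratum, size in allocation.items():
    print(stratum, size)
print(sum(allocation.values()))
```

Rounding each stratum separately happens to sum to 370 here; with other proportions the rounded sizes may differ from the total by one or two and need a small manual adjustment.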
Finally, you should adjust the computed sample size (n) by dividing n by the expected response rate. For instance, if you expect a 75 percent response rate, you should make your sample size equal to n/.75. If you can't anticipate a response rate, assume a 50 percent response rate (i.e., double the n value). This sort of adjustment should ensure you get a sufficient number of responses regardless of return rate.
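The response rate adjustment is a single division, sketched below with the n of 370 from the earlier example. The function name `adjusted_sample_size` is illustrative, and rounding up is an assumption made so that the number of surveys sent is never short.

```python
import math

def adjusted_sample_size(n, response_rate=0.5):
    """Number of surveys to send so that about n are returned."""
    return math.ceil(n / response_rate)

print(adjusted_sample_size(370, 0.75))  # 494
print(adjusted_sample_size(370))        # 740, the 50 percent default
```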