Stratified sampling is effective when the population comprises a number of subgroups (or ‘sub‐populations’) that are thought to have an effect on the data being collected, such as male and female, different age groupings, or ethnic origin. These subgroups are called strata. A stratum (‘layer’) is defined as a collection of individuals (sampling units) that are as alike as possible. For example, the credibility of results from a study of breast cancer would be in doubt if the proportion of pre‐menopausal patients differed between two samples selected for comparison. By defining two strata, namely ‘pre‐menopausal patients’ and ‘not pre‐menopausal patients’, this problem is avoided.
A simple random sample is taken from each stratum. The resulting stratified samples are then more likely to reproduce the characteristics of the population. The two main approaches to deciding how many individuals should be sampled from each stratum are equal allocation and proportional allocation. The first approach results in an equal number of individuals per stratum, while the second provides samples in which the sample size from each stratum reflects the size of that stratum in the population.
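The two allocation rules can be sketched in a few lines of Python. This is a minimal illustration, not a prescribed procedure: the function names, the stratum sizes (300 and 700) and the total sample size of 100 are all hypothetical, and proportional allocation is rounded to the nearest whole individual.

```python
import random

def allocate(stratum_sizes, total_sample, method="proportional"):
    """Return how many individuals to sample from each stratum."""
    if method == "equal":
        # Equal allocation: the same number from every stratum
        share = total_sample // len(stratum_sizes)
        return {name: min(share, size) for name, size in stratum_sizes.items()}
    # Proportional allocation: sample sizes mirror the stratum sizes.
    # Rounding means the allocations may not always sum exactly to total_sample.
    population = sum(stratum_sizes.values())
    return {name: round(total_sample * size / population)
            for name, size in stratum_sizes.items()}

def stratified_sample(strata, total_sample, method="proportional", seed=None):
    """Take a simple random sample from each stratum."""
    rng = random.Random(seed)
    sizes = {name: len(units) for name, units in strata.items()}
    counts = allocate(sizes, total_sample, method)
    return {name: rng.sample(units, counts[name])
            for name, units in strata.items()}

# Hypothetical population (assumption): 300 pre-menopausal and
# 700 not pre-menopausal patients, identified by made-up labels
strata = {
    "pre-menopausal": [f"pre-{i}" for i in range(300)],
    "not pre-menopausal": [f"notpre-{i}" for i in range(700)],
}
proportional = stratified_sample(strata, 100, "proportional", seed=1)
equal = stratified_sample(strata, 100, "equal", seed=1)
```

With these assumed figures, proportional allocation draws 30 and 70 patients from the two strata, whereas equal allocation draws 50 from each.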
Quota sampling differs from stratified sampling in that a simple random sample is not chosen from each stratum. Instead, the sample is obtained by using the most accessible patients, as long as they represent the identified subgroups. For example, if we require details relating to 20 women patients with asthma between 30 and 50 years of age, we do not identify all individuals satisfying these criteria in the population in order to take a simple random sample of these. Rather, we simply select the first 20 individuals who present themselves and fulfil these criteria.
Quota sampling is so called because the number of sampling units (e.g. patients) required in a particular sample is referred to as the quota to be obtained. If making comparisons between different subgroups (e.g. adults and children), the sizes of the samples from each subgroup are usually chosen to reflect the proportions in the population. For example, if there are twice as many adults as children in the available population, the quota of adults is twice as large as the quota of children.
The main problem with quota sampling is that accessible individuals may not be representative of the study population. Patients who attend at their GP practice regularly may be different from those who don't, or who are unable to attend through work or other commitments.
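The way a quota is filled can be illustrated with a short Python sketch. The attendance list, the subgroup labels and the quotas below are invented for illustration; the point is only that individuals are taken in the order they present, with no random selection.

```python
def quota_sample(arrivals, quotas):
    """Fill each subgroup's quota with the first eligible individuals to present."""
    sample = {group: [] for group in quotas}
    for person_id, group in arrivals:
        # Accept the individual only if their subgroup still has places left
        if group in sample and len(sample[group]) < quotas[group]:
            sample[group].append(person_id)
        # Stop as soon as every quota has been met
        if all(len(sample[g]) == quotas[g] for g in quotas):
            break
    return sample

# Hypothetical order of attendance: (patient id, subgroup)
arrivals = [(1, "adult"), (2, "adult"), (3, "child"), (4, "adult"),
            (5, "adult"), (6, "child"), (7, "adult"), (8, "child")]
# Assumed population with twice as many adults as children,
# so the adult quota is twice the child quota
quotas = {"adult": 4, "child": 2}
sample = quota_sample(arrivals, quotas)
# sample == {"adult": [1, 2, 4, 5], "child": [3, 6]}
```

Patients 7 and 8 are never considered, which shows why the result depends entirely on who happens to attend first.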
Cluster sampling involves dividing the population into subgroups called clusters. However, unlike stratified sampling and quota sampling (in which the subjects in a particular stratum or subgroup are meant to be as alike as possible), the objective is to include in each cluster the various characteristics that the population might contain. The rationale for both stratified and quota sampling is the control of factors (e.g. age or sex differences) that are known (or suspected) to confound the response being investigated. In cluster sampling, the idea is not to have a homogeneous group, but one which is representative of the cluster through either a census (100% sample) or, more usually, by taking a representative sample of the cluster.
Cluster sampling is commonly used when the population covers an area that can be divided by region (e.g. GP practices). A small number of these clusters is selected at random (using simple random sampling). Every subject in the chosen clusters is then included in the sample. One key problem with cluster sampling is choosing appropriate clusters.
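A minimal Python sketch of this procedure follows. The ten GP practices, their patient lists and the choice of three clusters are hypothetical; the sketch shows the census version, in which every subject in a chosen cluster enters the sample.

```python
import random

def cluster_sample(clusters, n_clusters, seed=None):
    """Select clusters by simple random sampling, then include every subject in them."""
    rng = random.Random(seed)
    # Simple random sample of cluster names (sorted first for reproducibility)
    chosen = rng.sample(sorted(clusters), n_clusters)
    # Census of each chosen cluster
    sample = [subject for name in chosen for subject in clusters[name]]
    return chosen, sample

# Hypothetical population (assumption): 10 GP practices with 20 to 29 patients each
clusters = {f"practice_{k}": [f"p{k}_{i}" for i in range(20 + k)]
            for k in range(10)}
chosen, sample = cluster_sample(clusters, 3, seed=42)
```

Note that the final sample size is not fixed in advance: it depends on the sizes of whichever clusters happen to be selected.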
2.12 Sampling Designs – Summary
Choice of correct survey method is extremely important. The best approaches to sampling from a finite population, as in our asthma example, are to use either a simple random sample or a stratified random sample. Stratification is used when it is known that the response of interest is related to some factor (e.g. age or sex).
Choice of appropriate sampling method is not always obvious, and may involve a mixture of the methods we have described. Always seek advice if you are in doubt, as the cost of advice in relation to the cost of obtaining the sample is very small. Scheaffer et al. (2011) provide a very good introduction to all aspects of survey sampling.
2.13 Statistics and Parameters
Measures that describe a variable of a sample are called statistics. It is from the sample statistics that the parameters of a population are estimated. Thus, the average weight of a sample of new‐born male babies is the statistic that is used to estimate the average weight (parameter) of a population of new‐born male babies. An easy way to remember this is: statistic is to sample as parameter is to population.
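The distinction can be made concrete with a small simulation in Python. The population here is entirely made up (10,000 birth weights drawn from a normal distribution with mean 3.4 kg and standard deviation 0.5 kg); in a real study the population mean would be unknown and only the sample mean would be available.

```python
import random
import statistics

rng = random.Random(0)

# Hypothetical population (assumption): birth weights in kg
population = [rng.gauss(3.4, 0.5) for _ in range(10_000)]
parameter = statistics.mean(population)   # population mean, normally unknown

sample = rng.sample(population, 100)      # simple random sample of 100 babies
statistic = statistics.mean(sample)       # sample mean, used as the estimate

print(f"parameter = {parameter:.2f} kg, statistic = {statistic:.2f} kg")
```

The statistic will rarely equal the parameter exactly, but with a properly drawn sample it should lie close to it.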
Sometimes populations appear to be rather abstract or hypothetical concepts, in which case their parameters are also hypothetical. We can calculate an ‘average temperature’ from a sample of 10 observations collected from a patient over a day. What exactly is the parameter that this statistic is estimating? It is the average of the hypothetical ‘population’ of all temperature observations that could be made during the observation period.
In estimating a population parameter from a sample statistic, the number of observations in a sample can be critical. Some statistical methods depend upon a minimum number of sampling units, and where this is the case, it should be borne in mind before commencing your study. While it is true that larger samples will invariably result in greater statistical confidence, there is nevertheless a ‘diminishing returns’ effect. In many cases the time, effort and expense involved in collecting very large samples might be better spent in extending the study in other directions. We offer guidance as to what constitutes a suitable sample size for each statistical technique as it arises.
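One common way to see the ‘diminishing returns’ effect, assuming simple random sampling and taking the standard error of the mean (standard deviation divided by the square root of the sample size) as the measure of precision, is sketched below. The standard deviation of 0.5 and the sample sizes are hypothetical.

```python
import math

def standard_error(sd, n):
    """Standard error of the mean for a sample of size n."""
    return sd / math.sqrt(n)

sd = 0.5  # hypothetical standard deviation (assumption)
errors = {n: standard_error(sd, n) for n in (25, 100, 400, 1600)}
for n, se in errors.items():
    print(f"n = {n:5d}  standard error = {se:.4f}")
```

Under these assumptions each fourfold increase in sample size only halves the standard error, so going from 400 to 1600 observations buys far less precision per extra observation than going from 25 to 100.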
2.14 Descriptive and Inferential Statistics
Descriptive statistics are used to organize, summarize and describe measures of a sample. No predictions or inferences are made regarding population parameters. Inferential (or deductive) statistics, on the other hand, are used to infer or predict population parameters from sample measures. This is done by a process of inductive reasoning based on the mathematical theory of probability. Fortunately, only a very minimal knowledge of the mathematical theory of probability is needed in order to apply the rules of the statistical methods, and the little that is needed will be explained. However, no‐one can predict exactly a population parameter from a sample statistic, but only indicate with a stated degree of confidence within what range it lies. The degree of confidence depends upon the sample selection procedures and the statistical techniques used.
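As an illustration of ‘a stated degree of confidence within what range it lies’, the sketch below computes an approximate confidence interval for a population mean. It rests on assumptions not made in the text: a simple random sample large enough for the normal approximation to be reasonable, and hypothetical summary values (sample mean 3.4, standard deviation 0.5, sample size 100). For small samples a different technique would be needed.

```python
import math
from statistics import NormalDist

def confidence_interval(mean, sd, n, confidence=0.95):
    """Approximate confidence interval for a population mean (normal approximation)."""
    # Two-sided critical value, about 1.96 for 95% confidence
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    half_width = z * sd / math.sqrt(n)
    return mean - half_width, mean + half_width

# Hypothetical sample summary (assumption)
lower, upper = confidence_interval(mean=3.4, sd=0.5, n=100)
print(f"95% confidence interval: {lower:.2f} to {upper:.2f}")
```

Asking for a higher degree of confidence, for example 99%, widens the interval, which is the trade-off between confidence and precision.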
2.15 Parametric and Non‐Parametric Statistics
Statistical methods commonly fall into one of two classes – parametric and non‐parametric. Parametric methods are the oldest, and although most often used by statisticians, may not always be the most appropriate for analysing medical data. Parametric methods make strict assumptions that may not always hold true.