Small and/or Pilot Studies
by Paul W. Stewart, PhD
UNC GCRC and UNC Biostatistics Department
Chapel Hill, NC
paul_stewart@unc.edu
What is a Pilot Study?
What is a Small Exploratory Study?
Dealing with Small Studies
Example Review of a Proposal: Pilot Study or Small Exploratory Study?
Small-scale studies, small exploratory studies, pilot studies and feasibility studies are not exempt from the necessity of providing a clear and well-reasoned rationale for the number of subjects to be studied. The distinctions among pilot studies, feasibility studies, and small exploratory studies are often blurred. This is unfortunate because these studies are not the same in regard to design issues, analysis strategy and sample size considerations.
What is a Pilot Study?
The term 'pilot study' is broadly used and misused. In the pharmaceutical industry some use it to mean 'not a pivotal study'. Medical investigators frequently use the term broadly to mean 'not the full-scale study'.
However, 'pilot study' may also refer to a preparatory investigation that is in no way intended to test the research hypotheses of interest. For example, collecting a small amount of data, per protocol, in order to assess the cost of operations addresses financial feasibility of the future full-scale study. Similarly, it is usually the case that a small sample can provide adequate, approximate information about variances and correlations needed for analysis of power and sample size for the future full-scale study.
My preference is to define "pilot study" as a small preparatory investigation that is in no way intended to directly investigate or test the research hypotheses of interest. As such, most pilot studies are not publishable - although there certainly are exceptions.
This definition facilitates making a distinction between a "pilot study" and a "small-scale investigation of the research hypotheses". Small-scale studies are discussed below.
Abstracted from website of Medstar Research Institute
Invocation of the term "pilot study" does not exempt the investigators from justifying their choice of sample size. The term "pilot study" has been overused. Characteristics of a pilot study include the following:
- Should justify the number of subjects required.
- Designed to answer the question "Is a trial/experiment worth pursuing?"
- Must provide details on how the decision of pursuing an experiment will be made.
- Must give evidence for designation as a "pilot study"; one of three reasons may apply:
  - To learn how to do a new procedure. For example, to try out (and correct problems in) a new questionnaire, or to debug a new data-entry system.
  - To establish estimates of variances, correlations, and/or differences for use in power calculations that will guide selection of a sample size for the full-scale study.
  - To evaluate the total cost or timeliness of doing the experiment. In this case a sample size even as low as N=1 may be sufficient.
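As a rough illustration of how a pilot estimate of variability feeds into a sample size calculation, the sketch below uses the standard normal-approximation formula for a two-sided, two-sample comparison of means. The function name `n_per_group`, the standard deviations, and the target difference of 5 units are hypothetical values chosen for illustration; they do not come from any of the sources quoted here.

```python
import math
from statistics import NormalDist


def n_per_group(sd, delta, alpha=0.05, power=0.80):
    """Approximate subjects per arm to detect a mean difference `delta`
    when the outcome has standard deviation `sd` (normal approximation)."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)  # critical value for the two-sided test
    z_beta = z.inv_cdf(power)           # quantile matching the desired power
    return math.ceil(2 * ((z_alpha + z_beta) * sd / delta) ** 2)


# Hypothetical pilot estimates of the standard deviation: the required
# sample size is quite sensitive to this input.
for sd in (8.0, 10.0, 12.0):
    print(f"SD = {sd}: about {n_per_group(sd, delta=5.0)} subjects per arm")
```

Because a small pilot estimates the standard deviation imprecisely, it may be worth repeating the calculation over a plausible range of values, as the loop does, rather than relying on a single point estimate.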
Abstracted from "Guidelines for clinical trials in Helicobacter pylori infection" (1997). Working Party of the European Helicobacter Pylori Study Group. Gut 1997;41(Suppl 2):S1-S9
"PILOT STUDY. A pilot study should be viewed as an early study to test the logistic feasibility of the procedures that will take place in the definitive study. Usually, such studios are too small to help decide whether to abandon the treatment concept or institute a definitive study. Normally, such studies should not be published. Sometimes the term "pilot study" is used as a label to excuse a poorly conducted trial; such studies should not be reported on the same basis as studies that purport to be definitive. The report should only state that the treatment is practicable, with the potential for effective use, and that it will undergo a definite evaluation in a properly designed clinical trial."
Abstracted from Nursing Research: Principles and Methods, 7th Edition (2004) by Denise F. Polit and Cheryl Tatano Beck
"A pilot study is not the same as a small-scale study. The term pilot study has been misused by some researchers who appear to use it as an excuse for not using a bigger sample (King, 2001). The purpose of a pilot study is not so much to test research hypotheses, but rather to test protocols, data collection instruments, sample recruitment strategies, and other aspects of a study in preparation for a larger study."
Abstracted from the website of The Organization for Autism Research
"A pilot study is an initial or preliminary investigation designed to test research hypotheses, gather data, and validate the scientific approach and methodology for a particular area of research interest. It is important as a test bed for ideas and as an evaluation and assessment measure before investing further in a major study. Especially for new and up and coming investigators, pilot studies are vital stepping-stones to more significant grants."
What is a Small Exploratory Study?
Investigators frequently propose small studies whose aims include investigation or testing of the research hypotheses of interest. These aims go beyond preparatory work. Such a study can be labeled a "small exploratory study." This is distinct from a "pilot study", as defined above, which is not intended to directly investigate or test the research hypotheses of interest.
Such studies may be useful for generating new hypotheses. When this is the intent, analyses of power, precision and sample size are irrelevant. A few case studies, even N=1, may be all that is needed to provide a sufficient number and quality of new ideas that can be pursued as testable hypotheses in future studies. The results may or may not be publishable - depending on topic, target journal, and other considerations. There is a place for case studies in the literature. Some journals also publish small inconclusive studies whose only value appears to be in that they suggest new hypotheses.
Small exploratory studies can also be used to test or refine existing hypotheses. In this role they are high-risk endeavors; they have adequate power and precision only if there is a very high signal-to-noise ratio; e.g., the magnitude of therapeutic effect on the measure of interest is large relative to the standard deviation of the measure of interest. In the earliest stages of a particular line of research it may be plausible that the effect of interest might be very large. When such an investigation is the intent, analyses of power, precision and sample size are relevant and are useful in quantifying just how "high-risk" the endeavor may be for a given sample size.
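To make the dependence on signal-to-noise ratio concrete, the following sketch approximates the power of a two-sided, two-sample comparison of means as a function of the standardized effect size (difference in means divided by the standard deviation). It uses a normal approximation, which is somewhat optimistic for very small samples compared with an exact t-based calculation. The function name `approx_power` and the choice of 10 subjects per group are assumptions made for illustration.

```python
from statistics import NormalDist


def approx_power(effect_size, n_per_group, alpha=0.05):
    """Normal-approximation power for a two-sided, two-sample test of means.

    `effect_size` is the difference in means divided by the common SD.
    """
    z = NormalDist()
    z_crit = z.inv_cdf(1 - alpha / 2)
    shift = effect_size * (n_per_group / 2) ** 0.5
    # Probability of rejecting in either tail when the true shift is `shift`
    return z.cdf(shift - z_crit) + z.cdf(-shift - z_crit)


# Hypothetical small study with 10 subjects per group
for d in (0.25, 0.5, 1.0, 1.5):
    print(f"effect size {d}: power about {approx_power(d, 10):.2f}")
```

Under these assumptions, power is low for moderate effect sizes and becomes respectable only when the effect is well over one standard deviation, which is one way of quantifying how "high-risk" such a study is.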
Investigators proposing small-scale studies such as these may refer to a desire to conduct a very small, putatively under-powered study to "just look for trends in the data" or to "obtain a preliminary idea about the magnitude of effect" as an early stage in a new line of research.
Abstracted from "Guidelines for clinical trials in Helicobacter pylori infection" (1997). Working Party of the European Helicobacter Pylori Study Group. Gut 1997;41(Suppl 2):S1-S9
"SMALL STUDIES. In order to obtain a statistically significant difference between treatments in a small study, a large difference is needed. A study with such a result is much easier to publish than a study of the same size with a non-significant difference between treatments. However, the result of the latter study may never have been presented. Publication bias has been a major problem with H. pylori treatment studies as a result of the abundance of small eradication trials. Studies with fewer than 50 patients per treatment arm may not be worth publishing because the estimates of the true eradication rate for each treatment are too inaccurate. For example, a study with 50 patients per treatment arm and an eradication rate of 90% for treatment A and 80% for treatment B would result in a very wide 95%> confidence interval for the treatment difference. It would extend from 24 percentage units in favour of A to 4 percentage units in favour of B."
Dealing with Small Studies
It is not unusual for clinical investigators to have mixed motivations for cobbling together a small-scale study. These motivations can include some or all of the following simultaneously:
- need to report preliminary data in a grant proposal.
- extremely limited seed-grant funds are available.
- need to test procedures, assay methods, estimate standard deviations, etc.
- need to study feasibilities, costs and time requirements.
- their boss told them to collect some data.
- they believe they can combine these data with those from a second study, if necessary.
And, given that there will be some data in hand, why not also:
- test the main research hypothesis.
- publish the results if at all possible.
It is frequently the case that the study must be very small because of extremely limited funds - or a total lack of funds. Lack of funding to support appropriate data management efforts and collaboration or consultation with a professional statistician is often not a consideration.
In such cases, varying in details, the study may be labeled "a pilot study" by the investigator. Claims may also be made that:
- plans for managing and analyzing the data are not necessary.
- for obvious reasons, justification of the proposed sample size is not needed.
- the study is believed to be under-powered, but that is not a problem.
For GCRC biostatisticians, it is important to explicitly identify the aims of such studies and to give careful consideration to whether the proposed plans for management and analysis of the data are precisely aligned with those aims.
As an example of mis-aligned aims and plans, consider the following case: Preliminary to a larger full-scale clinical trial, a "feasibility study" involving 10 patients is proposed. The investigators state that the one and only purpose of the project is to investigate how well patients will tolerate wearing a new ambulatory heart monitor while receiving an experimental medication. A justification for studying 10 patients was not given. The stated plan for data collection is to download the data from the heart monitors. The proposed plan for analyzing the data is to apply a t-test procedure for comparison of post-treatment heart rate to pre-treatment heart rate. What is wrong with this picture?
If the investigators in this example were serious about studying feasibility then there is a complete mis-alignment between the one and only specific aim and their plans for data collection, data management, and data analysis. This should raise a few concerns, such as the following:
- In this example, the investigators mentioned no measures of 'tolerability' and proposed no plans for bringing statistical methods to bear on the problem of drawing a conclusion as to whether tolerability might be a problem in the future full-scale study. Why not? Are the investigators serious about evaluating how well the device is tolerated?
- The investigators plan to apply a simple and familiar statistical method to a comparison that is potentially publishable. And why not? There is a small chance (at least 5%) that this easy t-test will pay off; that is, if statistically significant, a manuscript can and will be submitted for publication. This intent represents a second specific aim. Is this implicit aim influencing the choice of sample size?
- It is not clear that the proposed sample size, N=10 patients, is appropriate and ethical. What were the primary considerations and what should be the primary considerations in the choice of sample size for this study? Potential considerations to be weighed in the balance might include budget, patient availability, research burden per patient, precision for the stated primary aim, power for the unstated second aim. In any case, this project is not exempt from the necessity of providing a clear and well-reasoned rationale for the number of subjects to be studied.
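One way to weigh "precision for the stated primary aim" is to ask how precisely a tolerability rate could be estimated from 10 patients. The sketch below computes a Wilson score interval for a proportion. The outcome of 8 of 10 patients tolerating the monitor is a purely hypothetical result, and the function name `wilson_interval` is an assumption; the example proposal defined no measure of tolerability.

```python
from statistics import NormalDist


def wilson_interval(successes, n, level=0.95):
    """Wilson score confidence interval for a binomial proportion."""
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)
    p_hat = successes / n
    denom = 1 + z ** 2 / n
    center = (p_hat + z ** 2 / (2 * n)) / denom
    half = z * (p_hat * (1 - p_hat) / n + z ** 2 / (4 * n ** 2)) ** 0.5 / denom
    return center - half, center + half


# Hypothetical result: 8 of 10 patients tolerate the device
low, high = wilson_interval(8, 10)
print(f"Observed 80%; 95% interval roughly {100 * low:.0f}% to {100 * high:.0f}%")
```

Under this hypothetical outcome the interval is very wide, which suggests that the justification for N=10 should state what level of precision is considered adequate for deciding whether tolerability will be a problem in the full-scale study.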
In this example, many other issues might also warrant attention. For example, should the plans for analysis also include estimation of quantities such as standard deviations that might be needed for planning the full-scale study?
Example Review of a Proposal: Pilot Study or Small Exploratory Study?
GCRC # ________
P.I. ______________________
Summary
"Pilot Study?" This proposal is for an exploratory, single-center, uncontrolled, non-randomized, observational, longitudinal study investigating the physiologic outcomes associated with [
drug] induced hyperprolactinemia. N=30 female patients in 3 age strata (pediatric, geriatric, other) and in 2 treatment strata ( [
drug], other drugs). All patients will be enrolled in the study as they are started on [
drug] by their primary care-giver. The investigators desire a minimum of 10 subjects to be on drugs of types other than that of [
drug]. The investigators anticipate that at least 7 pediatric and 7 geriatric subjects will be enrolled. This would yield 16 adults (not geriatric). However, there will be no requirements or constraints proposed to ensure that these desired targets will be met.
Measurements will be made at 0, 8 and 24 weeks. Descriptive statistics (mean, standard deviation, percentiles, histograms, and graphical displays) will be computed for all outcome variables. Primary analyses will focus on pre-treatment to post-treatment changes in (1) bone density, (2) gonadal hormones, and (3) prolactin level. Secondarily, the relationship between subjective symptoms and biological measurements will be examined using graphic descriptive methods. No formal inferences are proposed in the analysis plans.
The proposal states that the project is a pilot study. However, details of the analysis plans go beyond the usual scope of a pilot study. This study is more accurately described as a small-scale exploratory study that has the intent of detecting large-magnitude effects ('trends'), and has the intent of generating and refining hypotheses.
Support
Statistical expertise, study coordination, and statistical computations for RDM and statistical analysis will be provided by the research team.
Concerns
The sample size may be too large. Adequate justification for the proposed sample size was not provided. Small exploratory studies, and pilot studies, are not exempt from the necessity of providing a clear and well-reasoned rationale for the number of subjects to be studied. It is not clear why 16 mid-aged adult patients should be needed when 7 geriatric subjects and 7 pediatric subjects are, apparently, considered adequate.
Suggestions
None
Copyright 2004, Paul W. Stewart, Ph.D.
Contact from other GCRC Biostatisticians who are interested in commenting on this document or collaborating on its further development is welcomed. paul_stewart@unc.edu