Review
The bootstrap: A technique for data-driven statistics. Using computer-intensive analyses to explore experimental data

https://doi.org/10.1016/j.cccn.2005.04.002

Abstract

Background

The concept of resampling data – more commonly referred to as bootstrapping – has been in use for more than three decades. Bootstrapping has considerable theoretical advantages when it is applied to non-Gaussian data. Most of the published literature is concerned with the mathematical aspects of the bootstrap but increasingly this technique is being utilized in medical and other fields.

Methods

I reviewed the published literature following a 1994 publication assessing the transfer of technology, including the bootstrap, to the biomedical literature.

Results

In the ten-year period following that 1994 paper there were 1679 published references to the technique in Medline. In that same time period the following citations were found in the four major medical journals—British Medical Journal (48), JAMA (51), Lancet (52) and the New England Journal of Medicine (45).

Content

I introduce the basic theory of the bootstrap, the jackknife, and permutation tests. The bootstrap is used to estimate the accuracy of an estimator such as the standard error, a confidence interval, or the bias of an estimator. The technique may be useful for analysing smallish expensive-to-collect data sets where prior information is sparse, distributional assumptions are unclear, and where further data may be difficult to acquire. Some of the elementary uses of bootstrapping are illustrated by considering the calculation of confidence intervals such as for reference ranges or for experimental data findings, hypothesis testing such as comparing experimental findings, linear regression, and correlation when studying association and prediction of variables, non-linear regression such as used in immunoassay techniques, and ROC curve processing.

Conclusions

These techniques can supplement current nonparametric statistical methods and should be included, where appropriate, in the armamentarium of data processing methodologies.

Introduction

In a 1994 review Altman and Goodman [1] identified influential statistical articles and the time pattern of their citations in the medical literature. One such article described the bootstrap [2], the topic of this review. I used an Ovid Technologies Medline keyword search [“bootstrap” or “resampling”] for the period 1995 to 2004 to assess the subsequent pattern of citations in the medical literature and recovered 1679 references. These citations have increased year by year since 1995 (Fig. 1). I also performed a full-text search (numbers of citations in parentheses) of research articles in the journals BMJ (48), JAMA (51), Lancet (52), and the New England Journal of Medicine (45) over the same period (because of archive limitations some of these searches covered shorter periods). These findings suggest that bootstrap methods are increasingly being utilised in the medical literature. These techniques have also found wide application in such diverse fields as astronomy, biology, economics, engineering, genetics, molecular biology, and finance [3].

In exploring aspects of the bootstrap in this review I largely used S-Plus, version 6.2 (Insightful Corp.), R, version 1.9 [4], a series of available S-Plus and R libraries [5], [6], [7], Confidence Interval Calculator, version 2 [8], [9], Analyse-it for Microsoft Excel, version 1.72 [10], and CBstat, version 5 [11]. In passing, it is worth noting that it is perfectly possible to implement the bootstrap and jackknife within a spreadsheet program [12], [13], [14] although the random number generator may not be entirely satisfactory (see Random number generators).

All the data used to illustrate this review are typical of the type of data analysed in the practice of clinical chemistry. These data were obtained from Hand et al. [15], Harris and Boyd [16], Beck and Shultz [17], Krzanowski [18], results of proficiency testing of enzyme determinations in Ontario [19], [20], and a study of the rate of removal of lactate dehydrogenase-1 (LD-1) from serum following a myocardial infarction [21].

Section snippets

Parametric and nonparametric statistics

The normal (Gaussian) distribution is characterised by two parameters: the mean and S.D. Statistical methods that assume the Gaussian distribution of data are called parametric. Of course, other probability distributions whose characteristics are defined by one or more parameters can also be analysed by appropriate parametric methods. Nonparametric or distribution-free [22] statistical techniques analyse data without assuming a particular family of probability distributions. It is

The bootstrap process

What is the bootstrap? Essentially, a set of data is randomly resampled (with replacement, i.e., when an item is sampled it is immediately replaced) multiple times (as many as 10,000 or more times) and statistical conclusions are drawn from this data collection. Excellent elementary accounts of the theory have been provided by Simon and Bruce [24], [25]. More advanced accounts are found in a 1983 Scientific American article by Diaconis and Efron [26] and a 1991 Science article by Efron and
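The resampling step described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `bootstrap_replicates`, the choice of the median as the statistic, and the ten measurements are all invented for this example and do not come from the review.

```python
import random
import statistics

def bootstrap_replicates(data, statistic, n_resamples=10_000, seed=0):
    """Return `n_resamples` values of `statistic`, each computed on a
    pseudo-sample of len(data) items drawn from `data` with replacement."""
    rng = random.Random(seed)  # fixed seed so that the run can be repeated
    n = len(data)
    return [statistic(rng.choices(data, k=n)) for _ in range(n_resamples)]

# Hypothetical measurements, invented for illustration only.
sample = [4.1, 5.3, 3.8, 6.0, 4.7, 5.1, 9.2, 4.4, 5.6, 4.9]

reps = bootstrap_replicates(sample, statistics.median)

# The spread of the replicates estimates the standard error of the median.
se_median = statistics.stdev(reps)
print(f"sample median = {statistics.median(sample):.2f}")
print(f"bootstrap S.E. of the median = {se_median:.3f}")
```

Any function that maps a list of numbers to a single number can be passed as `statistic`, which is what makes the approach attractive when no convenient formula for the standard error exists.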

Three bootstrap methods

An early application of the bootstrap was the calculation of confidence intervals of non-Gaussian distributions. By contrast, confidence intervals of Gaussian distributions (or of some other defined distributional framework) were calculated by statistical methods appropriate to the particular distribution being examined. In dealing with confidence intervals of a non-Gaussian univariate population two measurements are of interest—the confidence interval of the median and the confidence interval

The jackknife

The jackknife [44], [45] preceded the concept of the bootstrap. The name derives from J.W. Tukey's suggestion, in an unpublished 1958 manuscript [46], that “The approach … shares two characteristics with a Boy Scout jackknife: (i) wide applicability to many different problems, and (ii) inferiority to special tools for those problems for which special tools have been designed and built” [47].

Consider a data set $x = (x_1, x_2, \ldots, x_n)$ and an estimator $\hat{\theta} = s(x)$. Let $x_{(i)}$ indicate the data set remaining
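As a hedged illustration, the sketch below assumes the usual leave-one-out definition, in which x(i) is the data set with the i-th observation removed, and uses the textbook jackknife formulas for bias and standard error. The function name `jackknife` and the data are hypothetical and are not taken from the review.

```python
import math
import statistics

def jackknife(data, statistic):
    """Leave-one-out jackknife: return (estimate, bias, standard error)."""
    n = len(data)
    theta_hat = statistic(data)
    # theta_(i): the statistic recomputed with the i-th observation removed
    leave_one_out = [statistic(data[:i] + data[i + 1:]) for i in range(n)]
    theta_dot = sum(leave_one_out) / n  # mean of the leave-one-out values
    bias = (n - 1) * (theta_dot - theta_hat)
    se = math.sqrt((n - 1) / n * sum((t - theta_dot) ** 2 for t in leave_one_out))
    return theta_hat, bias, se

# Hypothetical measurements, invented for illustration only.
sample = [4.1, 5.3, 3.8, 6.0, 4.7, 5.1, 9.2, 4.4, 5.6, 4.9]

estimate, bias, se = jackknife(sample, statistics.fmean)
print(f"mean = {estimate:.3f}, jackknife bias = {bias:.3f}, jackknife S.E. = {se:.3f}")
```

For the sample mean the jackknife standard error reduces algebraically to S.D./√n, which gives a convenient check on the code. The jackknife is generally less reliable for non-smooth statistics such as the median.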

The combinatorial algebra of the bootstrap

The combinatorial algebra of the bootstrap is quite different from the usual process of sampling without replacement. The illustrative sample consists of 10 atoms numbered from 1 to 10 (Table 1A). It is evident that resampling with replacement produces a very different sample: some atoms are not retrieved at all while others are retrieved several times, such as atoms 3 and 4, which are present 2- and 3-fold with B = 1. This set of resampled observations constitutes a bootstrap pseudo-sample. When
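A few standard counting results give a feel for resampling 10 atoms with replacement. The sketch below is not taken from the review's own tables; the variable names are hypothetical, and the formulas are the usual ones for multisets and for the chance that an item is never drawn.

```python
import math

n = 10  # number of atoms in the illustrative sample

ordered = n ** n                      # ordered sequences of n draws with replacement
distinct = math.comb(2 * n - 1, n)    # distinct pseudo-samples when order is ignored
p_absent = (1 - 1 / n) ** n           # chance that a given atom is never drawn
expected_unique = n * (1 - p_absent)  # expected number of different atoms drawn

print(ordered)                    # 10000000000
print(distinct)                   # 92378
print(round(p_absent, 4))         # 0.3487
print(round(expected_unique, 2))  # 6.51
```

By comparison, drawing all 10 atoms without replacement always returns the same set, so the variability exploited by the bootstrap comes entirely from the repeats and omissions.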

Random number generators

The bootstrap process depends on the random selection of items from the data set using a random number generator as the basis for the selection of the bootstrap pseudo-sample. Thus in the example shown in Table 1A for B = 1 with a pre-defined seed, the atoms 5, 7, and 8 are not selected at all while atoms 1, 2, 6, 9, and 10 are each selected once. The process of random number generation is fraught with theoretical and practical problems [52], [53] and it is probably safe to suggest that there is
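The role of the seed can be illustrated with Python's standard `random` module, whose generator is a Mersenne Twister. This is a minimal sketch: the helper `draw_pseudo_sample` and the seed value are hypothetical, and the atoms drawn will not match Table 1A, which was produced with different software and a different seed.

```python
import random
from collections import Counter

def draw_pseudo_sample(items, seed):
    """Draw one bootstrap pseudo-sample using a generator seeded with `seed`."""
    rng = random.Random(seed)
    return rng.choices(items, k=len(items))

atoms = list(range(1, 11))  # atoms numbered 1 to 10

first = draw_pseudo_sample(atoms, seed=2005)
again = draw_pseudo_sample(atoms, seed=2005)
print(first == again)  # True: the same seed reproduces the same draw

# Tally how often each atom was selected and list the atoms never selected.
tally = Counter(first)
not_selected = [a for a in atoms if a not in tally]
print(sorted(tally.items()))
print("not selected:", not_selected)
```

Recording the seed alongside the results makes a bootstrap analysis repeatable, although it does not address the quality of the generator itself.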

Confidence intervals of an L-statistic (univariate data)

The confidence interval of the mean or the median is often required. For the sample mean the standard parametric statistical procedure is:

  • calculate the sample mean and S.D.,

  • calculate the sample S.E.M.,

  • obtain the appropriate value of Student's t for n − 1 degrees of freedom and the confidence interval required [9],

  • calculate the confidence interval as mean ± (t × S.E.M.).
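The four steps above can be written out as follows. This is a sketch under stated assumptions: the data are hypothetical, the function name `t_confidence_interval` is invented, and the value of Student's t (2.262 for a two-sided 95% interval with 9 degrees of freedom) is entered by hand from a table rather than computed.

```python
import math
import statistics

def t_confidence_interval(data, t_value):
    """Confidence interval for the mean: mean ± (t × S.E.M.)."""
    n = len(data)
    mean = statistics.fmean(data)  # step 1: sample mean ...
    sd = statistics.stdev(data)    # ... and sample S.D. (n - 1 denominator)
    sem = sd / math.sqrt(n)        # step 2: standard error of the mean
    half_width = t_value * sem     # steps 3 and 4: t × S.E.M.
    return mean - half_width, mean + half_width

# Hypothetical measurements, invented for illustration only.
sample = [4.1, 5.3, 3.8, 6.0, 4.7, 5.1, 9.2, 4.4, 5.6, 4.9]

T_95_DF9 = 2.262  # tabulated Student's t, two-sided 95%, n - 1 = 9 d.f.
low, high = t_confidence_interval(sample, T_95_DF9)
print(f"95% CI for the mean: {low:.2f} to {high:.2f}")
```

The interval is symmetric about the mean by construction, which is one reason it can mislead when the data are markedly skewed.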

By contrast, the bootstrap determination is easier and only requires:

  • bootstrap type (data, statistic = (median, appropriate CI), value of
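A minimal Python sketch of one bootstrap interval, the simple percentile method, is given here for comparison. It is not the S-Plus call referred to above: the function name `percentile_ci`, its parameters, and the data are hypothetical, and other bootstrap interval types would need additional steps.

```python
import random
import statistics

def percentile_ci(data, statistic, level=0.95, n_resamples=10_000, seed=0):
    """Percentile bootstrap confidence interval for `statistic`."""
    rng = random.Random(seed)  # fixed seed so that the run can be repeated
    n = len(data)
    # Sorted bootstrap replicates of the statistic
    reps = sorted(statistic(rng.choices(data, k=n)) for _ in range(n_resamples))
    alpha = 1 - level
    lo_idx = round(alpha / 2 * n_resamples)            # e.g. 250 of 10,000
    hi_idx = round((1 - alpha / 2) * n_resamples) - 1  # e.g. 9749 of 10,000
    return reps[lo_idx], reps[hi_idx]

# Hypothetical measurements, invented for illustration only.
sample = [4.1, 5.3, 3.8, 6.0, 4.7, 5.1, 9.2, 4.4, 5.6, 4.9]

low, high = percentile_ci(sample, statistics.median)
print(f"95% percentile CI for the median: {low:.2f} to {high:.2f}")
```

Replacing `statistics.median` with another function gives the corresponding interval for that statistic without any change to the procedure.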

Journal articles (reviews or tutorials, ordered by year of publication):

Mathematical viewpoint [113], [114]. General biological applications [115], [116], [117], [118], [119], [120], [121], [122], [123]. Applications in specific disciplines—psychophysiology [47], calibration in analytical chemistry [124], cost-effective analysis [125], pharmacoeconomic cost analysis [126], reference interval estimation [127], imprecision profiles in biochemical analysis [128], environmental research [129], probabilistic sensitivity analysis [130], screening for early renal failure

Concluding remarks

The examples illustrated in this article merely touch the surface of the potential of the bootstrap and the jackknife but it is evident that these techniques can supplement and extend conventional statistical thinking. Some of the elementary uses of bootstrapping were illustrated by considering the calculation of confidence intervals such as for reference ranges or for experimental data sets, hypothesis testing such as comparing experimental findings, linear regression and correlation when

Acknowledgements

I am grateful to Dr Frank Harrell (Vanderbilt University), Dr Robert Platt (McGill University), Dr Tim Hesterberg (Insightful Corporation), and Elizabeth Atkinson (Mayo Clinic) for their goodwill and patience in constructively responding to my questions, and to the technical support staff at Insightful Corporation for their advice and assistance.

References (144)

  • Insightful Corporation
  • D.G. Altman et al. Confidence interval analysis program (2000)
  • Analyze-it for Microsoft Excel, Leeds. UK;...
  • K. Linnet. CBstat. A program for statistical analysis in clinical biochemistry (2004)
  • T.R. Willemain. Bootstrap on a shoestring: resampling using spreadsheets. Am Stat (1994)
  • B.F.J. Manly
  • E.K. Harris et al.
  • J.R. Beck et al. The use of relative operating characteristic (ROC) curves in test performance evaluation. Arch Pathol Lab Med (1986)
  • W.J. Krzanowski
  • A.R. Henderson et al. Is determination of creatine kinase-2 after electrophoretic separation accurate? Clin Chem (1994)
  • A.R. Henderson et al. Proficiency testing of creatine kinase and creatine kinase-2: the experience of the Ontario Laboratory Proficiency Testing Program. Clin Chem (1998)
  • D.A. Smith et al. Determination, by radioimmunoassay, of the mass of lactate dehydrogenase isoenzyme one in human serum and of its rate of removal from serum after a myocardial infarction. Clin Chem (1987)
  • P. Sprent et al.
  • K. Linnet. Two-stage transformation systems for normalization of reference distributions evaluated. Clin Chem (1987)
  • J.L. Simon et al. Resampling: a tool for everyday statistical work. Chance (1991)
  • J.L. Simon et al. The new biostatistics of resampling. MD Comput (1995)
  • P. Diaconis et al. Computer-intensive methods in statistics. Sci Am (1983)
  • B. Efron et al. Statistical data analysis in the computer age. Science (1991)
  • P. Sprent. Data driven statistical methods
  • B. Efron et al.
  • B. Efron et al. Bootstrap methods for standard errors, confidence intervals, and other measures of statistical accuracy. Stat Sci (1986)
  • B. Efron. Better bootstrap confidence intervals. J Am Stat Assoc (1987)
  • J.G. Booth et al. Monte Carlo approximation of bootstrap variances. Am Stat (1998)
  • P. Hall
  • C.E. Lunneborg
  • A.C. Davison et al.
  • International Federation of Clinical Chemistry. Approved recommendation (1987) on the theory of reference values: Part 5. Statistical treatment of collected reference values. Determination of reference limits. J Clin Chem Clin Biochem (1987)
  • K. Linnet. Nonparametric estimation of reference intervals by simple and bootstrap-based procedures [Technical brief]. Clin Chem (2000)
  • T. Hesterberg. Tail-specific linear approximations for efficient bootstrap simulations. J Comput Graph Stat (1995)
  • H.A. David et al.
  • T.C. Hesterberg. Weighted average importance sampling and defensive mixture distributions. Technometrics (1995)
  • B. Efron. Nonparametric standard errors and confidence intervals. Can J Stat (1981)
  • Hesterberg T. Bootstrap tilting confidence intervals. http://www.insightful.com/Hesterberg/bootstrap/default.asp...
  • R.G. Miller. A trustworthy jackknife. Ann Math Stat (1964)
  • R.G. Miller. The jackknife - a review. Biometrika (1974)
  • J.W. Tukey. Bias and confidence in not-quite large samples (Abstract). Ann Math Stat (1958)
  • S. Wasserman et al. Bootstrapping: applications to psychophysiology. Psychophysiology (1989)