The bootstrap: A technique for data-driven statistics. Using computer-intensive analyses to explore experimental data

doi:10.1016/j.cccn.2005.04.002

Clinica Chimica Acta

Volume 359, Issues 1–2, September 2005, Pages 1-26


Abstract

Background

The concept of resampling data – more commonly referred to as bootstrapping – has been in use for more than three decades. Bootstrapping has considerable theoretical advantages when it is applied to non-Gaussian data. Most of the published literature is concerned with the mathematical aspects of the bootstrap but increasingly this technique is being utilized in medical and other fields.

Methods

I reviewed the published literature following a 1994 publication assessing the transfer of technology, including the bootstrap, to the biomedical literature.

Results

In the ten-year period following that 1994 paper there were 1679 published references to the technique in Medline. In that same time period the following citations were found in the four major medical journals—British Medical Journal (48), JAMA (51), Lancet (52) and the New England Journal of Medicine (45).

Content

I introduce the basic theory of the bootstrap, the jackknife, and permutation tests. The bootstrap is used to estimate the accuracy of an estimator such as the standard error, a confidence interval, or the bias of an estimator. The technique may be useful for analysing smallish expensive-to-collect data sets where prior information is sparse, distributional assumptions are unclear, and where further data may be difficult to acquire. Some of the elementary uses of bootstrapping are illustrated by considering the calculation of confidence intervals such as for reference ranges or for experimental data findings, hypothesis testing such as comparing experimental findings, linear regression, and correlation when studying association and prediction of variables, non-linear regression such as used in immunoassay techniques, and ROC curve processing.

Conclusions

These techniques can supplement current nonparametric statistical methods and should be included, where appropriate, in the armamentarium of data processing methodologies.

Introduction

In a 1994 review Altman and Goodman [1] identified influential statistical articles and the time pattern of their citations in the medical literature. One such article described the bootstrap [2], the topic of this review. I used an Ovid Technologies Medline keyword search [“bootstrap” or “resampling”] for the period 1995 to 2004 to assess the subsequent pattern of citations in the medical literature and recovered 1679 references.¹ The number of citations has increased year by year since 1995 (Fig. 1). I also performed a full-text search (numbers of citations in parentheses) of research articles in the journals BMJ (48), JAMA (51), Lancet (52), and the New England Journal of Medicine (45) over the same period (due to archive limitations some of these searches were for shorter periods). These findings suggest that bootstrap methods are increasingly being utilised in the medical literature. These techniques have also found wide application in such diverse fields as astronomy, biology, economics, engineering, genetics, molecular biology, and finance [3].

In exploring aspects of the bootstrap in this review I largely used S-Plus, version 6.2 (Insightful Corp.), R, version 1.9 [4], a series of available S-Plus and R libraries [5], [6], [7], Confidence Interval Calculator, version 2 [8], [9], Analyse-it for Microsoft Excel, version 1.72 [10], and CBstat, version 5 [11]. In passing, it is worth noting that it is perfectly possible to implement the bootstrap and jackknife within a spreadsheet program [12], [13], [14] although the random number generator may not be entirely satisfactory (see Random number generators).

All the data used to illustrate this review are typical of the type of data analysed in the practice of clinical chemistry. These data were obtained from Hand et al. [15], Harris and Boyd [16], Beck and Shultz [17], Krzanowski [18], results of proficiency testing of enzyme determinations in Ontario [19], [20], and a study of the rate of removal of lactate dehydrogenase-1 (LD-1) from serum following a myocardial infarction [21].

Section snippets

Parametric and nonparametric statistics

The normal (Gaussian) distribution is characterised by two parameters—the mean and S.D. Statistical methods that assume the Gaussian distribution of data are called parametric. Of course, other probability distributions whose characteristics are defined by one or more parameters can also be analysed by appropriate parametric methods. Nonparametric or distribution-free [22] statistical techniques are used to analyse data that do not assume a particular family of probability distributions. It is

The bootstrap process

What is the bootstrap? Essentially, a set of data is randomly resampled (with replacement, i.e., when an item is sampled it is immediately replaced) multiple times (as many as 10,000 or more times) and statistical conclusions are drawn from this data collection. Excellent elementary accounts of the theory have been provided by Simon and Bruce [24], [25]. More advanced accounts are found in a 1983 Scientific American article by Diaconis and Efron [26] and a 1991 Science article by Efron and
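The resampling step described above can be sketched in a few lines of Python. This is only a minimal illustration and not one of the packages used in this review: the function name `bootstrap_replicates`, the seed, the number of resamples, and the data values are all assumptions chosen for the example.

```python
import random
import statistics

def bootstrap_replicates(data, statistic, n_boot=10_000, seed=0):
    """Return n_boot values of `statistic`, each computed on a resample
    drawn with replacement from `data` and of the same size as `data`."""
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    n = len(data)
    return [statistic(rng.choices(data, k=n)) for _ in range(n_boot)]

# Hypothetical illustrative values, not data from the article.
sample = [12.1, 9.8, 15.3, 11.0, 30.2, 10.4, 13.7, 9.1, 18.6, 11.9]

# Resample 2000 times and compute the median of each pseudo-sample.
reps = bootstrap_replicates(sample, statistics.median, n_boot=2000)

# The spread of the replicates is the bootstrap estimate of the
# standard error of the sample median.
boot_se = statistics.stdev(reps)
print(f"bootstrap S.E. of the median: {boot_se:.3f}")
```

Any function of a list of numbers can be passed as `statistic`, which is what makes the approach attractive when no simple formula for the standard error is available.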

Three bootstrap methods

An early application of the bootstrap was the calculation of confidence intervals of non-Gaussian distributions. By contrast, confidence intervals of Gaussian distributions (or of some other defined distributional framework) were calculated by statistical methods appropriate to the particular distribution being examined. In dealing with confidence intervals of a non-Gaussian univariate population two measurements are of interest—the confidence interval of the median and the confidence interval
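As an illustration of a bootstrap confidence interval for the median, the sketch below uses the simple percentile approach: the interval limits are read directly from the sorted bootstrap replicates. The snippet above is truncated and does not name the three methods, so the choice of the percentile approach here is an assumption, as are the function name, seed, and data values.

```python
import random
import statistics

def percentile_ci(data, statistic, level=0.95, n_boot=5000, seed=1):
    """Percentile bootstrap confidence interval for `statistic`."""
    rng = random.Random(seed)
    n = len(data)
    # One replicate of the statistic per resample, sorted ascending.
    reps = sorted(statistic(rng.choices(data, k=n)) for _ in range(n_boot))
    alpha = 1.0 - level
    lo_idx = int((alpha / 2) * n_boot)          # lower tail cut-off
    hi_idx = int((1 - alpha / 2) * n_boot) - 1  # upper tail cut-off
    return reps[lo_idx], reps[hi_idx]

# Hypothetical illustrative values, not data from the article.
sample = [12.1, 9.8, 15.3, 11.0, 30.2, 10.4, 13.7, 9.1, 18.6, 11.9]

lower, upper = percentile_ci(sample, statistics.median)
print(f"95% percentile interval for the median: {lower} to {upper}")
```

Because the limits are order statistics of the replicates, the interval need not be symmetric about the sample median, which suits skewed data.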

The jackknife

The jackknife [44], [45] preceded the concept of the bootstrap. The name derives from JW Tukey's suggestion, in an unpublished 1958 manuscript [46], that “The approach … shares two characteristics with a Boy Scout jackknife: (i) wide applicability to many different problems, and (ii) inferiority to special tools for those problems for which special tools have been designed and built” [47].

Consider a data set x = (x₁, x₂, …, xₙ) and an estimator θ̂ = s(x). Let x₍ᵢ₎ indicate the data set remaining
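The leave-one-out idea can be sketched as follows. The snippet above is truncated, so this example assumes the standard delete-one jackknife, in which the estimator is recomputed n times, each time with one observation removed, and the standard error is taken from the spread of those n values. The function name and data values are hypothetical.

```python
import math
import statistics

def jackknife_se(data, statistic):
    """Delete-one jackknife estimate of the standard error of `statistic`."""
    n = len(data)
    # theta_(i): the statistic recomputed with the i-th observation deleted.
    leave_one_out = [statistic(data[:i] + data[i + 1:]) for i in range(n)]
    centre = sum(leave_one_out) / n
    # The factor (n - 1) / n inflates the small spread of the
    # leave-one-out values to the scale of a standard error.
    return math.sqrt((n - 1) / n * sum((t - centre) ** 2 for t in leave_one_out))

# Hypothetical illustrative values, not data from the article.
sample = [12.1, 9.8, 15.3, 11.0, 30.2, 10.4, 13.7, 9.1, 18.6, 11.9]

se_jack = jackknife_se(sample, statistics.mean)
se_classic = statistics.stdev(sample) / math.sqrt(len(sample))
print(se_jack, se_classic)
```

For the sample mean the jackknife standard error coincides algebraically with the usual S.E.M., which makes the mean a convenient check before applying the function to other estimators.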

The combinatorial algebra of the bootstrap

The combinatorial algebra of the bootstrap is quite different from the usual process of sampling without replacement. The illustrative sample consists of 10 atoms numbered from 1 to 10 (Table 1A). It is evident that resampling with replacement produces a very different sample: some atoms are not retrieved at all while others are retrieved several times, such as atoms 3 and 4, which are present 2- and 3-fold, respectively, when B = 1. This set of resampled observations constitutes a bootstrap pseudo-sample. When
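Some standard counting results for a sample of 10 items can be checked directly. The snippet above is truncated and does not show which quantities the section derives, so the ones below are assumptions chosen as typical: the number of distinct pseudo-samples when order is ignored, the number of ordered draws, and the chance that a given atom is absent from a pseudo-sample. The seed is arbitrary and the resulting pseudo-sample is not the one in Table 1A.

```python
import math
import random
from collections import Counter

n = 10  # number of atoms in the illustrative sample

# Distinct pseudo-samples when order is ignored: multisets of size n
# drawn from n items, i.e. C(2n - 1, n).
distinct_resamples = math.comb(2 * n - 1, n)

# Ordered sequences of n draws with replacement.
ordered_draws = n ** n

# Probability that one particular atom never appears in a pseudo-sample.
p_item_absent = (1 - 1 / n) ** n

# Draw one pseudo-sample and tabulate how often each atom was retrieved.
rng = random.Random(2005)  # arbitrary seed
atoms = list(range(1, n + 1))
pseudo_sample = rng.choices(atoms, k=n)
counts = Counter(pseudo_sample)
missing = [a for a in atoms if a not in counts]

print(distinct_resamples, ordered_draws, round(p_item_absent, 4))
print("retrieved:", dict(sorted(counts.items())), "missing:", missing)
```

On average roughly a third of the atoms are absent from any one pseudo-sample of this size, which is why repeated and missing atoms are the norm.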

Random number generators

The bootstrap process depends on the random selection of items from the data set using a random number generator as the basis for the selection of the bootstrap pseudo-sample. Thus in the example shown in Table 1A for B = 1 with a pre-defined seed, the atoms 5, 7, and 8 are not selected at all while atoms 1, 2, 6, 9, and 10 are each selected once. The process of random number generation is fraught with theoretical and practical problems [52], [53] and it is probably safe to suggest that there is
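The role of a pre-defined seed can be illustrated with Python's standard `random` module, whose generator is the Mersenne Twister. This is a sketch only: the function name and seed values are hypothetical, and the pseudo-samples it produces will not match those in Table 1A, which came from a different generator.

```python
import random

def draw_pseudo_sample(items, seed):
    """Draw one bootstrap pseudo-sample using a generator with a fixed seed."""
    rng = random.Random(seed)  # independent generator, seeded explicitly
    return rng.choices(items, k=len(items))

atoms = list(range(1, 11))

first = draw_pseudo_sample(atoms, seed=42)
again = draw_pseudo_sample(atoms, seed=42)  # same seed, same pseudo-sample
other = draw_pseudo_sample(atoms, seed=43)  # different seed

print(first)
print(again)
print(other)
```

Recording the seed alongside the results allows a bootstrap analysis to be repeated exactly, which is useful when the quality of the generator itself is in question.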

Confidence intervals of an L-statistic (univariate data)

The confidence interval of the mean or the median is often required. For the sample mean the standard parametric statistical procedure is:

• calculate the sample mean and S.D.,
• calculate the sample S.E.M.,
• obtain the appropriate value of Student's t for n − 1 degrees of freedom and the confidence interval required [9],
• calculate the confidence interval x̄ ± (t × S.E.M.).
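The Student's t steps listed above can be written out as a short Python sketch. The data values are hypothetical, and the critical value of t is hard-coded from standard tables for a two-sided 95% interval with 9 degrees of freedom, so it applies only to a sample of 10 observations.

```python
import math
import statistics

# Hypothetical illustrative values, not data from the article.
sample = [12.1, 9.8, 15.3, 11.0, 30.2, 10.4, 13.7, 9.1, 18.6, 11.9]

# Two-sided 95% Student's t for n - 1 = 9 degrees of freedom (from tables).
T_CRIT_95_DF9 = 2.262

n = len(sample)
mean = statistics.mean(sample)   # step 1: sample mean ...
sd = statistics.stdev(sample)    # ... and S.D. (n - 1 denominator)
sem = sd / math.sqrt(n)          # step 2: S.E.M.
half_width = T_CRIT_95_DF9 * sem # step 3: t multiplied by S.E.M.
ci = (mean - half_width, mean + half_width)  # step 4: mean ± (t × S.E.M.)

print(f"mean = {mean:.2f}, 95% CI = {ci[0]:.2f} to {ci[1]:.2f}")
```

The interval is symmetric about the mean by construction, an assumption that is questionable for skewed samples such as this one.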

By contrast, the bootstrap determination is easier and only requires:

• bootstrap type (data, statistic = (median, appropriate CI), value of

Journal articles (reviews or tutorials, ordered by year of publication):

Mathematical viewpoint [113], [114]. General biological applications [115], [116], [117], [118], [119], [120], [121], [122], [123]. Applications in specific disciplines—psychophysiology [47], calibration in analytical chemistry [124], cost-effective analysis [125], pharmacoeconomic cost analysis [126], reference interval estimation [127], imprecision profiles in biochemical analysis [128], environmental research [129], probabilistic sensitivity analysis [130], screening for early renal failure

Concluding remarks

The examples illustrated in this article merely touch the surface of the potential of the bootstrap and the jackknife but it is evident that these techniques can supplement and extend conventional statistical thinking. Some of the elementary uses of bootstrapping were illustrated by considering the calculation of confidence intervals such as for reference ranges or for experimental data sets, hypothesis testing such as comparing experimental findings, linear regression and correlation when

Acknowledgements

I am grateful to Dr Frank Harrell (Vanderbilt University), Dr Robert Platt (McGill University), Dr Tim Hesterberg (Insightful Corporation), and Elizabeth Atkinson (Mayo Clinic) for their goodwill and patience in constructively responding to my questions, and to the technical support staff at Insightful Corporation for their advice and assistance.

References (144)

I. Meineke
An add-in implementation of the RESAMPLING syntax under Microsoft EXCEL
Comput Methods Programs Biomed
(2000)
B.D. Ripley
Thoughts on pseudorandom number generators
J Comput Appl Math
(1990)
P. Horn et al.
Reference intervals: an update
Clin Chim Acta
(2003)
D. Bamber
The area above the ordinal dominance graph and the area below the receiver operating graph
J Math Psychol
(1975)
D. Altman et al.
Transfer of technology from statistical journals to the biomedical literature. Past trends and future predictions
JAMA
(1994)
B. Efron
Bootstrap methods: another look at the jackknife
Ann Stat
(1979)
M.R. Chernick
Bootstrap methods
R Development Core Team. R: A language and environment for statistical computing. http://www.R-project.org Accessed...
R.R. Wilcox
Harrell F, Alzola C. An Introduction to S and the Hmisc and Design libraries....

Insightful Corporation

D.G. Altman et al.

Confidence interval analysis program

(2000)

Analyze-it for Microsoft Excel, Leeds. UK;...

K. Linnet

CBstat. A program for statistical analysis in clinical biochemistry

(2004)

T.R. Willemain

Bootstrap on a shoestring: resampling using spreadsheets

Am Stat

(1994)

B.F.J. Manly

E.K. Harris et al.

J.R. Beck et al.

The use of relative operating characteristic (ROC) curves in test performance evaluation

Arch Pathol Lab Med

(1986)

W.J. Krzanowski

A.R. Henderson et al.

Is determination of creatine kinase-2 after electrophoretic separation accurate?

Clin Chem

(1994)

A.R. Henderson et al.

Proficiency testing of creatine kinase and creatine kinase-2: the experience of the Ontario Laboratory Proficiency Testing Program

Clin Chem

(1998)

D.A. Smith et al.

Determination, by radioimmunoassay, of the mass of lactate dehydrogenase isoenzyme one in human serum and of its rate of removal from serum after a myocardial infarction

Clin Chem

(1987)

P. Sprent et al.

K. Linnet

Two-stage transformation systems for normalization of reference distributions evaluated

Clin Chem

(1987)

J.L. Simon et al.

Resampling: a tool for everyday statistical work

Chance

(1991)

J.L. Simon et al.

The new biostatistics of resampling

MD Comput

(1995)

P. Diaconis et al.

Computer-intensive methods in statistics

Sci Am

(1983)

B. Efron et al.

Statistical data analysis in the computer age

Science

(1991)

P. Sprent

Data driven statistical methods

B. Efron et al.

Bootstrap methods for standard errors, confidence intervals, and other measures of statistical accuracy

Stat Sci

(1986)

B. Efron

Better bootstrap confidence intervals

J Am Stat Assoc

(1987)

J.G. Booth et al.

Monte Carlo approximation of bootstrap variances

Am Stat

(1998)

P. Hall

C.E. Lunneborg

A.C. Davison et al.

International Federation of Clinical Chemistry

Approved recommendation (1987) on the theory of reference values: Part 5. Statistical treatment of collected reference values. Determination of reference limits

J Clin Chem Clin Biochem

(1987)

K. Linnet

Nonparametric estimation of reference intervals by simple and bootstrap-based procedures [Technical brief]

Clin Chem

(2000)

T. Hesterberg

Tail-specific linear approximations for efficient bootstrap simulations

J Comput Graph Stat

(1995)

H.A. David et al.

T.C. Hesterberg

Weighted average importance sampling and defensive mixture distributions

Technometrics

(1995)

B. Efron

Nonparametric standard errors and confidence intervals

Can J Stat

(1981)

Hesterberg T. Bootstrap tilting confidence intervals. http://www.insightful.com/Hesterberg/bootstrap/default.asp...

R.G. Miller

A trustworthy jackknife

Ann Math Stat

(1964)

R.G. Miller

The jackknife—a review

Biometrika

(1974)

J.W. Tukey

Bias and confidence in not-quite large samples (Abstract)

Ann Math Stat

(1958)

S. Wasserman et al.

Bootstrapping: applications to psychophysiology

Psychophysiology

(1989)
