In ecology, the method is generally called the Petersen method because of Petersen's work in
1894 associated with tagged fish, though its first use in fisheries was by Dahl in 1917. It was also
used by Lincoln in 1930 to estimate the size of a duck population (Le Cren (1)). In 1949, Sekar and Deming (2) used the method
to estimate birth and death rates and the extent of registration. Their paper may be
regarded as the first serious application of the capture-recapture method to human health and has
a good discussion on some of the practical problems associated with the method. Using a similar
approach, Shapiro (3) applied the technique to birth registration in the
USA using census data. There is also a substantial literature going back to the 1940's (Tracey (4)), under the title of dual record systems or dual-system estimator,
dealing with the application of the two-sample method to census data. By taking another sample
in addition to the census, the capture-recapture method can be used for estimating undercount by
the census. The method and controversy currently surrounding its application to the US census
are described by Hogan (5). A helpful bibliography of the literature
relating to this problem is given by Fienberg (6).
The above method can, in principle, be applied to any situation where there are two incomplete
lists. One simply replaces "being caught in sample i" by "being on list i". This is the case in
epidemiology where lists can be constructed from a variety of sources such as hospital records,
doctors' medical files, medical prescriptions and so on. By their very nature these lists are
incomplete, and the problem is to estimate the number of individuals missing from both lists. Despite the above
early work, such applications to epidemiology came later, with Wittes and her colleagues (7,8) pointing out the connections.
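As a minimal sketch of the two-list calculation, assuming the standard Petersen estimator N = n1 n2 / m (n1 and n2 are the list sizes, m is the number of individuals matched on both lists), the estimate and the implied number missing from both lists could be computed as follows. The function name, the list names and the patient identifiers are hypothetical and serve only as illustration.

```python
def petersen_estimate(list1, list2):
    """Two-list Petersen estimate of the total population size.

    list1, list2: iterables of unique case identifiers (hypothetical data).
    Returns (n_hat, missing_hat): the estimated total and the estimated
    number of cases missing from both lists.
    """
    s1, s2 = set(list1), set(list2)
    n1, n2 = len(s1), len(s2)
    m = len(s1 & s2)              # cases matched on both lists
    if m == 0:
        raise ValueError("no matched cases; the estimate is undefined")
    n_hat = n1 * n2 / m
    observed = n1 + n2 - m        # cases appearing on at least one list
    return n_hat, n_hat - observed


# Hypothetical example: hospital records vs. prescription records.
hospital = ["p01", "p02", "p03", "p04", "p05", "p06"]
prescriptions = ["p04", "p05", "p06", "p07", "p08"]
n_hat, missing_hat = petersen_estimate(hospital, prescriptions)
print(n_hat, missing_hat)
```

The sketch takes matching for granted: identifiers are compared exactly, which corresponds to assumption (ii) holding without error.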
With regard to applying the assumptions to epidemiology, the experiment can generally be set up
so that (i) is at least approximately true. For assumption (ii), matching will depend on the quality
of the patients' records and the uniqueness of the patients' code names. In some parts of the
world matching is a real problem. Unfortunately assumption (iii), that each individual has the
same probability of being on a given list, is generally false; that is, patients tend to be
heterogeneous with regard to being "caught" on a list. Some methods for minimizing
heterogeneity are described later. However, even if something could be done about this,
assumption (iv) is invariably false. For example, if certain doctors refer their patients to certain
hospitals, then hospital admissions and doctors' records will not give two independent lists. This
question of dependence is discussed in detail by Sekar and Deming (2)
and Wolter (9). One can think of decomposing assumption (iii) into
two parts: dependence and heterogeneity of capture probabilities. For human populations, the
latter component has been considered only recently (10,11) although
those working in ecology and other areas had done so earlier.
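The effect of list dependence can be illustrated with a small calculation. The sketch below, which assumes the standard Petersen estimator N = n1 n2 / m and uses purely hypothetical probabilities, applies the estimator to expected list counts when the chance of being on list 2 depends on whether a patient is on list 1. It is an illustration of the direction of the bias, not a method taken from the references.

```python
def petersen_from_expected_counts(N, p1, p2_if_on_1, p2_if_off_1):
    """Petersen estimate computed from expected list counts.

    N            : true population size (hypothetical)
    p1           : probability that a patient is on list 1
    p2_if_on_1   : probability of being on list 2, given on list 1
    p2_if_off_1  : probability of being on list 2, given not on list 1
    """
    n1 = N * p1                            # expected size of list 1
    m = N * p1 * p2_if_on_1                # expected number on both lists
    n2 = m + N * (1 - p1) * p2_if_off_1    # expected size of list 2
    return n1 * n2 / m


# All three scenarios give the same expected list sizes (500 and 400);
# only the overlap between the lists differs.
independent = petersen_from_expected_counts(1000, 0.5, 0.4, 0.4)
positive = petersen_from_expected_counts(1000, 0.5, 0.6, 0.2)
negative = petersen_from_expected_counts(1000, 0.5, 0.2, 0.6)
print(round(independent), round(positive), round(negative))
```

With these hypothetical values the estimate recovers the true size of 1000 under independence, falls to about 667 under positive dependence (as in the referral example) and rises to about 2000 under negative dependence.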
In animal population studies, the 2-sample method was extended to the K-sample method. By
taking more than two samples one can utilize the information from the multiple recaptures. The
unmarked animals in each sample are now given individual marks before being returned to the
population. If one uses individual (e.g. numbered) marks then the capture history of each marked
individual is known.
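A capture history can be represented as a tuple of 0/1 indicators, one per sample. The following sketch, with hypothetical marks and a hypothetical helper name, builds the history of every individual seen in at least one of K samples and tabulates how often each history occurs.

```python
from collections import Counter


def capture_histories(samples):
    """Map each observed individual to its capture history.

    samples: list of K iterables of individual marks (hypothetical data).
    Returns a dict {mark: (x_1, ..., x_K)} with x_i = 1 if the individual
    was caught in sample i and 0 otherwise.
    """
    sets = [set(s) for s in samples]
    seen = set().union(*sets)     # individuals caught at least once
    return {ind: tuple(int(ind in s) for s in sets) for ind in sorted(seen)}


# Hypothetical example with K = 3 samples.
samples = [["a", "b", "c"], ["b", "c", "d"], ["c", "e"]]
histories = capture_histories(samples)
history_counts = Counter(histories.values())
print(histories["c"], history_counts[(1, 1, 0)])
```

Individuals never caught have the all-zero history and do not appear in the table; estimating their number is the purpose of the K-sample models.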
The first person to introduce the K-sample capture-recapture method was Schnabel in 1938 (12), in the context of fishing in a lake. She made the usual assumptions about the sampling and marking processes, such as that each sample is a simple random sample and that animals do not lose their tags. The theory of this model was developed more fully by Chapman, Darroch and others in the 1950's (13, chapter 4). However, it was recognized that some of the underlying assumptions may not hold. For example, there was the problem of heterogeneity: unmarked animals had different probabilities of being captured in a given sample, and marked animals behaved differently from unmarked ones. To cater for populations with these problems, a range of different models was introduced in the 1970's; these are associated with the names of Anderson, Burnham, Otis, White and others (see the review by Seber (14), p. 275). These models have since been added to by Chao, so that a hierarchy of eight models is now available (see the reviews by Pollock (15) and Seber (16, pp. 141-3)).
The K-sample method has also been applied to populations in which migration, birth, and death take place during the period of the study (the open population). There is a very extensive and expanding literature on the subject (17,18). However, such models depend on the assumption that samples are independent. As this is not the case with lists, it is unlikely that these general models will be directly useful in epidemiology.
Another method for handling the breakdown of the assumptions is the log-linear model, which was
applied by Fienberg (19) to capture-recapture data. In fact, a general
log-linear framework allows for the representation and incorporation of most of these models for
K lists, as well as some extensions for the generalization from closed to open populations (20).
Clearly the above methodology has the potential for being applied to K lists. Unfortunately we
run into the same problem again, namely that of list dependence. Current thinking would suggest
that of all the above approaches only the log-linear model has the flexibility for handling this
particular problem. However, such a model has to be used with caution as one still needs some
assumptions to hold for the model to be useful (see the Appendix for details).
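To indicate how a log-linear model can accommodate list dependence, the sketch below treats three lists under one particular assumed model: lists 1 and 2 may be dependent, while list 3 is independent of both. Under that assumption the missing cell has a closed-form estimate, namely the count seen on list 3 only, multiplied by the ratio of cases seen on list 1 or 2 but not list 3 to cases seen on list 1 or 2 and also on list 3. The function name and all counts are hypothetical, and this is one illustrative model rather than the procedure of the Appendix.

```python
def missing_cell_lists12_dependent(counts):
    """Estimate the number of cases missing from all three lists.

    counts maps capture histories (on list 1, on list 2, on list 3),
    coded 1/0, to observed frequencies; (0, 0, 0) is unobserved.
    Assumed model: lists 1 and 2 may be dependent, list 3 is
    independent of both.
    """
    seen_by_1_or_2 = [(1, 0), (0, 1), (1, 1)]
    off_list3 = sum(counts[(a, b, 0)] for a, b in seen_by_1_or_2)
    on_list3 = sum(counts[(a, b, 1)] for a, b in seen_by_1_or_2)
    if on_list3 == 0:
        raise ValueError("no overlap with list 3; the estimate is undefined")
    return counts[(0, 0, 1)] * off_list3 / on_list3


# Hypothetical frequencies for the seven observable capture histories.
counts = {
    (1, 0, 0): 30, (0, 1, 0): 20, (1, 1, 0): 10,
    (1, 0, 1): 12, (0, 1, 1): 10, (1, 1, 1): 8,
    (0, 0, 1): 20,
}
missing_hat = missing_cell_lists12_dependent(counts)
total_hat = sum(counts.values()) + missing_hat
print(missing_hat, total_hat)
```

The estimate is only as good as the assumed model: if list 3 is in fact dependent on list 1 or list 2, the same bias as in the two-list case reappears.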
2. Sekar CC and Deming WE. On a method of estimating birth and death rates and extent of
registration. J Am Stat Assoc 1949;44:101-115.
3. Shapiro S. Estimating birth registration completeness. J Am Stat Assoc
1950;45:261-264.
4. Tracey WR. Fertility of the population of Canada. Reprinted from Seventh Census of
Canada, 1931, (Vol 2), Census Monograph No. 3. Ottawa:Cloutier.
5. Hogan H. The 1990 post-enumeration survey: operations and results. J Am Stat
Assoc 1993;88:1047-1060.
6. Fienberg SE. Bibliography on capture-recapture modeling with application to census
undercount adjustment. Survey Methodology 1992;18:143-154.
7. Wittes J and Sidel VW. A generalization of the simple capture-recapture model with
applications to epidemiological research. J Chronic Dis 1968;21:287-301.
9. Wolter KM. Some coverage error models for census data. J Am Stat Assoc
1986;81:338-346.
10. Hook EB and Regal RR. Effect of variation in probability of ascertainment by sources ("variable
catchability") upon "capture-recapture" estimates of prevalence. Am J Epidemiol
1993;137:1148-1166.
12. Schnabel ZE. The estimation of the total fish population of a lake. Am Math Mon
1938;45:348-352.
13. Seber GAF. The estimation of animal abundance and related parameters, 2nd ed.
London:Griffin 1982.
14. Seber GAF. A review of estimating animal abundance. Biometrics 1986;42:267-292.
15. Pollock KH. Modeling capture, recapture and removal statistics for estimation of
demographic parameters for fish and wildlife populations: past, present and future. J Am
Stat Assoc 1991;86:225-238.
16. Seber GAF. A review of estimating animal abundance II. Int Stat Rev
1992;60:129-166.
17. Pollock KH. Modeling capture, recapture and removal statistics for estimation of
demographic parameters for fish and wildlife populations: past, present and future. J Am Stat
Assoc 1991;86:225-238.
18. Seber GAF. A review of estimating animal abundance II. Int Stat Rev
1992;60:129-166.
19. Fienberg SE. The multiple recapture census for closed populations and incomplete 2^k
contingency tables. Biometrika 1972;59:591-603.
20. Cormack RM. Log-linear models for capture-recapture experiments on open populations. In:
Hiorns RW, Cooke D, eds., The mathematical theory of the dynamics of biological
populations II. London: Academic Press, 1981.