Capture-Mark-Recapture using DNA genotyping: Challenges and State of the Art
This is one of those topics that I would consider “old but gold”. I recently had to write about it for one of my classes, so I thought I would share the information I compiled with you too :)
One of the most popular techniques for quantitative population studies (although, as we will see, it ends up being semi-quantitative or quasi-quantitative due to technical constraints) is Capture-Mark-Recapture, which has two main variants: traditional CMR and model-based CMR, the latter commonly used alongside genotyping. I decided to explore the latter further because, even though it has become very common practice in monitoring programs, it still has many shortcomings, and I wanted to understand where these come from and how researchers have been coping with them in recent years.
Genetic Capture-Mark-Recapture (CMR), as said before, is now a routine procedure in many wildlife conservation programs. Some of the most common applications of this technique are estimates of population size (abundance), sex ratio, survival, migration patterns, fecundity and population growth, and of course the genotypes can also be used for paternity tests (Lampa et al., 2015; Lukacs and Burnham, 2005).
The general steps of the method involve processing non-invasively collected samples (e.g. faeces, hair, feathers, saliva, etc.) to extract DNA and genotype it at multiple loci (e.g. microsatellites). This multilocus genotype is then treated as a molecular individual mark, just like fingerprints in forensic medicine. After obtaining a number of samples, matching genotypes are considered to belong to the same individual and are classified as recaptures, while non-matching genotypes should indicate newly captured animals. Unfortunately, it is not always that straightforward.
So, even nowadays, there are still challenges to overcome when using genotyping-CMR. Genotyping errors can be misleading, causing a sample to be assigned to the wrong individual. They can also create “false individuals” when just a single locus is mistyped.
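To make the matching step concrete, here is a minimal Python sketch of how capture histories could be built from multilocus genotypes by exact matching. The sample data, the three-locus panel and the function name `build_capture_histories` are all made up for illustration; real pipelines use more loci and usually error-tolerant matching.

```python
from collections import defaultdict

# Hypothetical samples: (sample_id, sampling_session, genotype).
# A genotype is a tuple of loci; each locus is a sorted pair of allele sizes.
samples = [
    ("s1", 0, ((150, 154), (200, 200), (98, 102))),
    ("s2", 1, ((150, 154), (200, 200), (98, 102))),  # exact match to s1: a recapture
    ("s3", 1, ((148, 152), (196, 204), (100, 100))),  # a different animal
    ("s4", 2, ((150, 150), (200, 200), (98, 102))),  # same animal as s1, but allele 154 was lost at locus 1
]

def build_capture_histories(samples, n_sessions):
    """Group samples by exact genotype match; return one 0/1 history per genotype."""
    histories = defaultdict(lambda: [0] * n_sessions)
    for _sample_id, session, genotype in samples:
        histories[genotype][session] = 1
    return dict(histories)

histories = build_capture_histories(samples, n_sessions=3)
for genotype, history in histories.items():
    print(history, genotype)
print("apparent individuals:", len(histories))
```

In this toy data set only two animals were sampled, but exact matching reports three genotypes: the mistyped locus in `s4` turns a recapture into a “false individual”.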
Samples can also be confounded if they contain a mixture of DNA from more than one individual. There can also be an intrinsic bias: in human forensics, some individuals have been identified as “shedders”, who tend to leave more DNA traces than others (“nonshedders”) (Lukacs and Burnham, 2005).
Problems in the genetic analysis of noninvasive samples can arise from very different sources and at different steps. Two of the most commonly reported are amplification failure and allelic dropout (Lukacs and Burnham, 2005). Amplification failure can be caused by a lack of the target sequence (due to degraded DNA) or by PCR inhibitors present in the sample. Allelic dropout is, by definition, the loss of one allele during polymerase chain reaction (PCR) amplification of DNA, so that a heterozygous individual is typed as a homozygote (Steyer et al., 2016).
Errors made when constructing capture histories from multilocus genotypes can also inflate mark-recapture abundance estimates, because each “false individual” is counted as a new capture rather than as a recapture. Additionally, while increasing the number of analyzed loci improves the discriminating power of the test, larger panels also increase the probability of additional genotyping errors (Sethi et al., 2016).
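A quick back-of-the-envelope example shows why missed recaptures inflate abundance. The sketch below uses Chapman's bias-corrected version of the two-session Lincoln-Petersen estimator, which I picked only because it is the simplest closed-population estimator, not because the papers cited here use it. All the counts are invented.

```python
def chapman_estimate(n1, n2, m2):
    """Chapman's bias-corrected Lincoln-Petersen estimate of population size.

    n1: individuals identified in session 1
    n2: individuals identified in session 2
    m2: individuals from session 2 that were already seen in session 1
    """
    return (n1 + 1) * (n2 + 1) / (m2 + 1) - 1

# Hypothetical survey: 40 genotypes per session, 20 of them true recaptures.
n_clean = chapman_estimate(n1=40, n2=40, m2=20)

# Same survey, but 5 recaptured animals were mistyped in session 2,
# so they look like new individuals and only 15 recaptures are recognised.
n_with_errors = chapman_estimate(n1=40, n2=40, m2=15)

print(f"estimate without errors: {n_clean:.1f}")
print(f"estimate with 5 missed recaptures: {n_with_errors:.1f}")
```

With these made-up numbers, losing just five recaptures pushes the estimate from roughly 79 to roughly 104 animals.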
Possible solutions to overcome genotyping errors
Several approaches have been proposed to deal with the most common sources of genotyping error. They can be grouped into the following categories: (i) removing genotyping errors from the data and (ii) developing sampling protocols that are robust to genotyping errors. Neither is as straightforward as it seems (Lampa et al., 2015; Sethi et al., 2016).
There are some tests that can provide useful information about the quality of the genotypes. One of these is the examining-bimodality (EB) test, which searches for an over-abundance of genotypes observed only once. The second is the Difference in Capture History (DCH) test, which examines the rate at which new individuals are recognized as more loci are examined (McKelvey and Schwartz, 2004). Both tests, however, assume equal capture probability among individuals (which seems to be one of the assumptions that researchers can’t shake off when using CMR).
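The sketch below is not the EB or DCH test as published. It is only a simple screening step in the same spirit, assuming made-up genotypes: it counts at how many loci each pair of distinct genotypes differs and flags pairs that differ at a single locus, since those are the obvious candidates for a genotyping error and worth re-checking in the lab.

```python
from collections import Counter
from itertools import combinations

def n_mismatched_loci(g1, g2):
    """Number of loci at which two multilocus genotypes differ."""
    return sum(1 for locus1, locus2 in zip(g1, g2) if locus1 != locus2)

def mismatch_distribution(genotypes):
    """Count genotype pairs by the number of loci at which they differ."""
    return Counter(n_mismatched_loci(a, b) for a, b in combinations(genotypes, 2))

# Hypothetical distinct genotypes found in a survey (3 loci each).
genotypes = [
    ((150, 154), (200, 200), (98, 102)),
    ((150, 150), (200, 200), (98, 102)),  # differs from the first at one locus only
    ((148, 152), (196, 204), (100, 100)),
]

distribution = mismatch_distribution(genotypes)
suspects = [
    (a, b) for a, b in combinations(genotypes, 2) if n_mismatched_loci(a, b) == 1
]

print("pairs by number of mismatched loci:", dict(distribution))
print("pairs differing at a single locus:", len(suspects))
```

Genuinely different animals usually differ at many loci, so a pile of pairs separated by just one locus is a warning sign rather than proof of an error.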
Since the type of error mainly depends on the stage of the process we look at (i.e. sampling, genotyping and, in most cases, estimating population size), it makes sense to approach the solutions on the same basis. For instance, when collecting the DNA samples it is very important to choose a sampling design appropriate for the ecology and other details of the target species and for the assumptions of the selected CMR model (e.g. closed populations, capture probabilities, etc.). As simple as this step may seem, neglecting it can lead to unsuccessful downstream genotyping and population analysis. Other relevant considerations at early stages are careful sample handling, preservation and extraction methods, given that samples normally contain contaminants, PCR inhibitors and poor-quality genetic material (Lampa et al., 2013).
As we have seen, population size estimates are very sensitive to even slight genotyping errors, which can lead to overestimation or underestimation. One recommendation is to look for the best balance in the number of typed loci: enough to minimize the probability of identity, but without increasing the genotyping effort, and thus the window for errors, too much (Lampa et al., 2013; McKelvey and Schwartz, 2004).
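This trade-off is easy to see with a small calculation. The sketch below uses the standard probability-of-identity formula for unrelated individuals at one locus (the sum of p_i^4 over alleles plus the sum of (2 p_i p_j)^2 over allele pairs) and multiplies it across loci, which assumes the loci are independent. The allele frequencies (four equally common alleles per locus) and the 2 % per-locus error rate are invented for illustration, as are the function names.

```python
from itertools import combinations
from math import prod

def pid_locus(allele_freqs):
    """Probability that two unrelated individuals share a genotype at one locus."""
    homozygotes = sum(p ** 4 for p in allele_freqs)
    heterozygotes = sum((2 * p * q) ** 2 for p, q in combinations(allele_freqs, 2))
    return homozygotes + heterozygotes

def pid_multilocus(loci_freqs):
    """Multilocus probability of identity, assuming independent loci."""
    return prod(pid_locus(freqs) for freqs in loci_freqs)

def prob_any_error(n_loci, per_locus_error):
    """Probability that a genotype contains at least one mistyped locus."""
    return 1 - (1 - per_locus_error) ** n_loci

locus = [0.25, 0.25, 0.25, 0.25]  # hypothetical: 4 equally common alleles
per_locus_error = 0.02            # hypothetical per-locus error rate

for n_loci in (2, 4, 6, 8, 10):
    pid = pid_multilocus([locus] * n_loci)
    err = prob_any_error(n_loci, per_locus_error)
    print(f"{n_loci:2d} loci: PID = {pid:.2e}, P(at least one error) = {err:.3f}")
```

Adding loci drives the probability of identity down quickly, while the chance that a genotype carries at least one error keeps creeping up, which is exactly why more loci is not automatically better.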
There have been attempts to harness bioinformatic tools for simulations that explore which numbers of markers, levels of allelic richness and genotyping error rates are acceptable for building reliable CMR capture histories. Sethi et al. (2016) tested error-tolerant, likelihood-based match calling, which combines the probabilities of latent genotypes with the probabilities of the observed genotypes, which may contain genotyping errors. Because they compared microsatellite-based markers with Single-Nucleotide Polymorphisms (SNPs, another genotyping option not so commonly explored in CMR), they found, for instance, that SNPs are computationally more efficient during match calling and sample clustering. They suggest that SNPs may overtake microsatellite approaches in the future (Sethi et al., 2016).
Choosing the best model to fit the data obtained by microsatellite genotyping is also very challenging, given that one must make many assumptions (closed population, equal capture probabilities). The general recommendation is to prefer a model that accounts for error. For instance, a recently published paper combined non-invasive CMR with a Spatially Explicit Capture-Recapture approach using wolf faeces. The authors applied a Poisson observation model with a single survey, and the null model M0. To overcome the problem with the assumption that all individuals are uniformly and independently distributed, they modeled the “centroid” of each collection cell as a count detector, on the basis that the same wolf can be detected at multiple cells during the sampling and more than one individual can be detected at the same detector (López-Bao et al., 2018).
Almost all the steps can be analyzed using decision diagrams of assumptions, filled in with the specific features of the study, just as exemplified by Lampa et al., 2013 and Lampa et al., 2015. A summary of the general workflow as recommended by Lampa et al., 2013 can be found above, although this and other similar setups must only be considered as guidance.
The most recent advances in high-throughput Next Generation Sequencing may help solve some of genotyping's most stubborn shortcomings by making it possible to sequence single molecules without previous amplification steps (Lampa et al., 2013). However, I was surprised to find no papers looking specifically into this, so it seems this option still needs to be explored.
In general, it seems that it is always highly advisable to run pilot studies to define the most suitable approach and model before conducting the actual non-invasive CMR. These pilot studies, and good planning overall, can be lifesavers, since they can prevent other sources of bias (inappropriate sample size, wrong type of sample and/or sampling sites, etc.).
As of today, it is very clear that we haven’t reached the point at which genotyped-CMR data can be considered 100 % reliable, but as new techniques continue to develop it is very likely that more robust tools will become available in the near future.
López-Bao JV, Godinho R, Pacheco C, Lema FJ, García E, Llaneza L, Palacios V, Jiménez J (2018) Towards reliable population estimates of wolves by combining spatial capture-recapture models and non-invasive DNA-monitoring. Scientific Reports 8: 2177.
Lukacs PM, Burnham KP (2005) Review of capture–recapture methods applicable to noninvasive genetic sampling. Molecular Ecology 14: 3909–3919.
Lampa S, Henle K, Klenke R, Hoehn M, Gruber B (2013) How to Overcome Genotyping Errors in Non-Invasive Genetic Mark-Recapture Population Size Estimation—A Review of Available Methods Illustrated by a Case Study. The Journal of Wildlife Management. doi:10.1002/jwmg.604
Lampa S, Mihoub J-B, Gruber B, Klenke R, Henle K (2015) Non-Invasive Genetic Mark-Recapture as a Means to Study Population Sizes and Marking Behaviour of the Elusive Eurasian Otter (Lutra lutra). PLoS ONE 10(5): e0125684. doi:10.1371/journal.pone.0125684
McKelvey KS, Schwartz MK (2004) Providing Reliable and Accurate Genetic Capture-Mark-Recapture estimates in a cost-effective way. The Journal of Wildlife Management 68: 453–456. doi:10.2193/0022-541X(2004)068[0453:PRAAGC]2.0.CO;2
Sethi SA, Linden D, Wenburg J, Lewis C, Lemons P, Fuller A, Hare MP (2016) Accurate recapture identification for genetic mark–recapture studies with error-tolerant likelihood-based match calling and sample clustering. R. Soc. open sci. 3: 160457. http://dx.doi.org/10.1098/rsos.160457
Steyer K, Kraus RHS, Mölich T et al. (2016) Large-scale genetic census of an elusive carnivore, the European wildcat (Felis s. silvestris). Conserv Genet 17: 1183. https://doi.org/10.1007/s10592-016-0853-2