What’s the Problem?
So what are the issues with mixture interpretation? The main issue revolves around the use of the Combined Probability of Inclusion (CPI) statistic. The CPI calculation is not problematic in and of itself; it is the way many labs were applying this statistic to low-level data that became problematic. CPI is designed to answer the question: “given this set of DNA types at these DNA locations, what is the probability that another, unrelated individual, other than the person of interest, could also be a contributor to the mixture?” In other words, if I went out and randomly selected individuals, how many would I need to test before I found another person whose DNA types are all present in the mixture found on the given item of evidence?
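To make the statistic concrete, here is a minimal sketch of the standard CPI arithmetic: at each locus, the probability of inclusion is the square of the summed frequencies of the alleles observed in the mixture, and CPI is the product across loci. The locus names are real STR loci, but the allele frequencies below are illustrative placeholders, not values from any real population database.

```python
# Hypothetical frequencies of the alleles observed in a mixture at each
# locus (illustrative numbers only, not from a real frequency database).
observed = {
    "D8S1179": [0.10, 0.21, 0.08, 0.30],
    "D21S11":  [0.18, 0.25, 0.09],
    "TH01":    [0.23, 0.19, 0.31, 0.12],
}

def combined_probability_of_inclusion(loci):
    """CPI = product over loci of (sum of observed allele frequencies)^2."""
    cpi = 1.0
    for allele_freqs in loci.values():
        locus_sum = sum(allele_freqs)
        # Probability that a random person's two alleles are both among
        # the alleles observed at this locus.
        cpi *= locus_sum ** 2
    return cpi

cpi = combined_probability_of_inclusion(observed)
cpe = 1.0 - cpi  # combined probability of exclusion
```

Note that this calculation silently assumes every allele from every contributor is present in the observed list; that assumption is exactly what fails for low-level samples, as discussed below.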
If all of the possible DNA types are present at a significant level, and there are no indications of additional DNA types below the lab’s reporting threshold, then there is no issue with the CPI statistic. But many samples yield complex mixture profiles originating from very small amounts of DNA.
If low-level DNA types are present, this is a sign the sample may be suffering from stochastic effects – basically, random fluctuations that can occur during the copying (amplification) step of the process – and this means that DNA types could be missing. If not all of the data is present, then not all of the genotypes are represented. If not all of the genotypes are represented, the CPI statistic is not valid. Dr. John Butler of NIST, author of numerous DNA textbooks, including what lab analysts generally refer to as the “DNA Bible”, has stated repeatedly in his talks, books, and other presentations that the CPI statistic cannot handle dropout and therefore should not be applied, unrestricted, to data where dropout may have occurred.
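A tiny sketch of why dropout breaks an inclusion test: suppose a true contributor carries an allele that fell below the reporting threshold. The genotypes below are hypothetical and chosen only to illustrate the failure mode.

```python
# Hypothetical single-locus example of allele dropout.
true_contributor = {"11", "14"}            # actual genotype at this locus
observed_in_mixture = {"11", "12", "13"}   # allele 14 dropped out (stochastic effect)

# Inclusion requires both of the contributor's alleles to appear in the mixture.
included = true_contributor <= observed_in_mixture
# included is False: the true contributor would be wrongly excluded, and a
# CPI computed only from the observed alleles does not account for this.
```

This is the core of Butler’s objection: the CPI arithmetic has no term for alleles that might be missing, so it is only valid when the lab can be confident none are.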
In an effort to help ensure that only the loci where all alleles are present are actually used in the statistical calculations, SWGDAM, in 2010, recommended the use of two thresholds: the analytical threshold and the stochastic threshold. The analytical threshold is the height that a possible DNA peak must reach before the lab considers it a “true” DNA peak, and not just noise or some sort of amplification artifact. The stochastic threshold has been defined by SWGDAM as “the peak height value above which it is reasonable to assume that, at a given locus, allelic dropout of a sister allele has not occurred”. This basically means that the lab must set a stochastic threshold, based upon the validation of the particular instrument and amplification kit in use, that provides a level of confidence that dropout of data (alleles) has not occurred due to low levels of input material.
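The two-threshold scheme can be sketched as a simple peak classifier. The threshold values below (in RFU) are illustrative placeholders; real values come from each lab’s own validation studies of its instrument and amplification kit.

```python
def classify_peak(height_rfu, analytical_threshold=50, stochastic_threshold=200):
    """Classify a peak under the two SWGDAM-recommended thresholds.

    Threshold values here are hypothetical; labs derive theirs from validation.
    """
    if height_rfu < analytical_threshold:
        # Below the analytical threshold: treated as noise or artifact,
        # not a true allele.
        return "noise"
    if height_rfu < stochastic_threshold:
        # Above AT but below ST: a true peak, but a sister allele may have
        # dropped out, so the locus should not feed an unrestricted CPI.
        return "stochastic"
    # Above the stochastic threshold: reasonable to assume no sister-allele
    # dropout at this locus.
    return "reliable"
```

For example, `classify_peak(120)` falls in the stochastic zone: the peak is real, but the locus cannot safely be used in an unrestricted CPI calculation.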