
Title: A Bayesian cross-validation approach to evaluate genetic baselines and forecast the necessary number of informative single nucleotide polymorphisms

Author
GARVIN, MICHAEL - University Of Alaska
MASUDA, MICHELE - National Oceanic & Atmospheric Administration (NOAA)
PELLA, JERRY - National Oceanic & Atmospheric Administration (NOAA)
Fuller, Adam
RILEY, RACHEL - University Of Alaska
WILMONT, RICHARD - National Oceanic & Atmospheric Administration (NOAA)
BRYKOV, VLAD - Russian Academy Of Sciences
GHARRETT, ANTHONY - University Of Alaska

Submitted to: National Oceanic and Atmospheric Administration
Publication Type: Peer Reviewed Journal
Publication Acceptance Date: 11/1/2014
Publication Date: 11/1/2014
Citation: Garvin, M., Masuda, M., Pella, J., Fuller, S.A., Riley, R., Wilmont, R., Brykov, V., Gharrett, A.J. 2014. A Bayesian cross-validation approach to evaluate genetic baselines and forecast the necessary number of informative single nucleotide polymorphisms. National Oceanic and Atmospheric Administration Technical Memorandum NMFS-AFSC-283. p. 59.

Interpretive Summary: Mixed stock analysis (MSA) is a powerful tool used in the management and conservation of numerous species. Its purpose is to estimate how much each source population contributes to a mixture of populations of a species, and to estimate the likelihood that individuals originated from a particular area. Considerable effort is now underway to create the genetic baselines that MSA requires, particularly for Pacific salmon. Previous methods for evaluating baselines were optimistic and tended to overestimate how well a baseline would perform; newer, more realistic methods have been developed to account for this. Here we developed a method called “leave ten percent out cross validation” (LTO), which avoids this overestimation, accepts multiple types of genetic data, and uses advanced statistical methods for calculations. We applied this new method to our combined genetic marker baseline for chum salmon (Oncorhynchus keta) to describe its current ability and its applications to fishery management.

Technical Abstract: Mixed stock analysis (MSA) is a powerful tool used in the management and conservation of numerous species. Its function is to estimate the contributions of source populations to a mixture of populations of a species, as well as to estimate the probabilities that individuals originated from a given source. Considerable effort is now underway to create the genetic baselines that MSA requires, particularly for Pacific salmon, and most notably with single nucleotide polymorphism (SNP) loci. Robust analyses of available genetic baselines are necessary to predict their performance in future MSA applications. In addition, estimates of the number of informative SNPs necessary to correctly assign individuals to their sources at a specified high probability should guide baseline development. Previous methods to evaluate baselines were optimistic, and newer, more realistic methods have been developed. However, the newer methods do not accommodate haploid data, which can be informative, and they are based on maximum likelihood estimation when Bayesian estimation may be preferable. Here we developed a method called “leave ten percent out cross validation” (LTO), which avoids this optimism, accepts haploid and diploid data, and uses Bayesian or maximum likelihood estimation methods. We applied LTO to our combined SNP/microsatellite (mSat) baseline for chum salmon (Oncorhynchus keta) to describe its current ability in applications to fishery management, and we used a logistic regression analysis based on performance data from a simulation to estimate the number of SNPs necessary to achieve at least 90% correct assignment of individuals to source populations.
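
The general idea of holding out ten percent of a baseline and assigning those individuals back to source populations can be illustrated with a minimal Python sketch. This is not the authors' LTO implementation: the simulated data, the population names (pop_A, pop_B), the function names, the Beta(0.5, 0.5) prior, the Hardy-Weinberg assumption, and the restriction to diploid biallelic SNPs are all assumptions made for illustration only.

```python
import math
import random

def simulate_population(freqs, n_ind, rng):
    """Draw diploid genotypes (copies of one allele: 0, 1, or 2) at each SNP."""
    return [[(rng.random() < p) + (rng.random() < p) for p in freqs]
            for _ in range(n_ind)]

def allele_freqs(genotypes, n_loci, pseudo=0.5):
    """Posterior-mean allele frequencies under an assumed Beta(pseudo, pseudo) prior."""
    n_chrom = 2 * len(genotypes)
    return [(sum(g[j] for g in genotypes) + pseudo) / (n_chrom + 2 * pseudo)
            for j in range(n_loci)]

def log_likelihood(genotype, freqs):
    """Log-probability of a genotype, assuming Hardy-Weinberg and independent loci."""
    total = 0.0
    for x, p in zip(genotype, freqs):
        coef = 2.0 if x == 1 else 1.0  # two orderings for a heterozygote
        total += math.log(coef) + x * math.log(p) + (2 - x) * math.log(1 - p)
    return total

def leave_ten_percent_out(baseline, n_loci, rng, holdout_frac=0.10):
    """Hold out a fraction of each population once; return the fraction correctly assigned."""
    train, test = {}, []
    for pop, genotypes in baseline.items():
        shuffled = genotypes[:]
        rng.shuffle(shuffled)
        k = max(1, round(holdout_frac * len(shuffled)))
        test += [(pop, g) for g in shuffled[:k]]
        train[pop] = shuffled[k:]
    # Frequencies are estimated only from individuals that were NOT held out,
    # so a held-out individual never contributes to its own source estimate.
    freqs = {pop: allele_freqs(g, n_loci) for pop, g in train.items()}
    correct = sum(
        max(freqs, key=lambda q: log_likelihood(g, freqs[q])) == pop
        for pop, g in test
    )
    return correct / len(test)

rng = random.Random(0)
n_loci = 200
# Hypothetical allele frequencies for two simulated source populations.
true_freqs = {"pop_A": [rng.uniform(0.1, 0.9) for _ in range(n_loci)]}
true_freqs["pop_B"] = [min(0.95, max(0.05, p + rng.uniform(-0.3, 0.3)))
                       for p in true_freqs["pop_A"]]
baseline = {pop: simulate_population(f, 100, rng) for pop, f in true_freqs.items()}
accuracy = leave_ten_percent_out(baseline, n_loci, rng)
print(f"held-out individuals correctly assigned: {accuracy:.2f}")
```

The optimism the abstract describes arises when the individuals being assigned also helped estimate the baseline frequencies; separating the held-out ten percent from the remaining ninety percent removes that overlap. Repeating the split many times and for different numbers of loci would yield the kind of performance data to which a logistic regression could be fitted.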