findhap.f90 | Find haplotypes and impute genotypes using multiple chip sets and sequence data
Downloads | Version 4 program, example files, and executable (beta version: not yet ready for routine use on U.S. chip data, but performs better than version 3 for sequence data)
Version 3 program, example files, and executable
Version 2 program, example files, and executable (no longer maintained)
Inputs | genotypes.txt | Format: animal# chip# #SNPs genotypes. Sort by animal#. Genotype codes are 0, 1, 2, and 5 = missing. For fixed-length input, set chip# to 1 and missing genotypes to 5. For variable-length input, #SNPs and SNP order must match chromosome.data.
chromosome.data | List of all SNPs used and which SNPs are on each chip. Sort by chromosome number and by position within chromosome. Place the X-specific chromosome last, after the pseudo-autosomal "chromosome". Y-specific SNPs are not supported yet.
pedigree.file | Format: sex animal# sire# dam# birthdate, followed by animal ID and animal name. Sort in ascending order of birth date.
findhap.options | Program control file with user-defined options
sequences.readdepth (version 4 only) | Format: animal# chip# #SNPs, followed by read counts for the A and B alleles stored in 1-byte hexadecimal format
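The genotypes.txt layout listed under Inputs can be checked with a short script before running findhap. The sketch below is a hypothetical Python helper, not part of the findhap package. It assumes that fields are separated by whitespace and that the genotype codes after #SNPs are single digits, written either as one string (e.g. 0125) or as separate tokens.

```python
"""Hypothetical checker for the genotypes.txt layout (not part of findhap).

Assumes whitespace-separated fields: animal# chip# #SNPs, then one
single-digit code per SNP, written as one string or as separate tokens.
"""
from dataclasses import dataclass
from typing import Iterable, List

VALID_CODES = {0, 1, 2, 5}  # 0, 1, 2 are genotype calls; 5 = missing


@dataclass
class GenotypeRecord:
    animal: int
    chip: int
    genotypes: List[int]


def parse_line(line: str) -> GenotypeRecord:
    """Parse one line of the form: animal# chip# #SNPs genotypes."""
    fields = line.split()
    animal, chip, nsnp = int(fields[0]), int(fields[1]), int(fields[2])
    # Joining the remaining tokens handles both "0125" and "0 1 2 5".
    genotypes = [int(ch) for ch in "".join(fields[3:])]
    if len(genotypes) != nsnp:
        raise ValueError(f"animal {animal}: expected {nsnp} codes, got {len(genotypes)}")
    bad = set(genotypes) - VALID_CODES
    if bad:
        raise ValueError(f"animal {animal}: invalid codes {sorted(bad)}")
    return GenotypeRecord(animal, chip, genotypes)


def read_genotypes(lines: Iterable[str]) -> List[GenotypeRecord]:
    """Parse all non-blank lines and confirm they are sorted by animal#."""
    records = [parse_line(ln) for ln in lines if ln.strip()]
    for prev, cur in zip(records, records[1:]):
        if cur.animal < prev.animal:
            raise ValueError(f"not sorted by animal#: {cur.animal} follows {prev.animal}")
    return records
```

For example, `read_genotypes(["101 1 4 0125", "102 1 4 2 2 1 0"])` returns two records whose genotypes are `[0, 1, 2, 5]` and `[2, 2, 1, 0]`. The sketch does not check chip# against chromosome.data, because that file's layout is not described here.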
Outputs | hap.list | List of all haplotypes found in each segment
hap.found | Each animal's paternal and maternal haplotypes (2 lines/animal)
hap.inherit | Tracks inheritance and crossovers for each parental chromosome
hap.filled | Summarizes imputation quality for each animal
cross.overs | Lists the exact location of each detected crossover
allele.frequency | Estimated allele frequencies and missing rates for each SNP
genotypes.filled | Imputed genotypes with codes 0 = BB, 1 = AB, 2 = AA, 3 = B_, 4 = A_, 5 = __ (_ = missing allele). The number of animals output may exceed the number input because of imputed dams. Remaining missing alleles in codes 3, 4, and 5 can be set using allele frequencies.
haplotypes.txt | Imputed haplotypes for each animal: SNP1 paternal maternal, SNP2 paternal maternal, etc. No alleles are missing, so genotypes can be formed simply as (pat + mat - 2).
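The genotypes.filled and haplotypes.txt codes listed under Outputs can be turned into numeric genotypes in a few lines. The sketch below is hypothetical Python, not findhap code. It assumes that the codes count A alleles (0 = BB, 2 = AA), that `p_a` is the frequency of allele A for the SNP (for example read from allele.frequency, whose column layout is not described here), and that haplotype alleles are coded 1 and 2 and have already been read into a list ordered paternal, maternal for each SNP. Replacing each missing allele by its expected value `p_a` is only one way to use allele frequencies; drawing the allele at random with probability `p_a` is another.

```python
"""Hypothetical helpers for findhap output codes (not part of findhap).

Assumes genotype codes count A alleles and p_a is the A-allele frequency.
"""
from typing import List, Sequence

# For each genotypes.filled code: (A alleles already called, alleles missing).
CODE_TABLE = {
    0: (0, 0),  # BB
    1: (1, 0),  # AB
    2: (2, 0),  # AA
    3: (0, 1),  # B_
    4: (1, 1),  # A_
    5: (0, 2),  # __
}


def fill_dosage(code: int, p_a: float) -> float:
    """Expected A-allele count, replacing each missing allele by p_a."""
    if code not in CODE_TABLE:
        raise ValueError(f"unknown genotype code {code}")
    called_a, n_missing = CODE_TABLE[code]
    return called_a + n_missing * p_a


def genotypes_from_haplotypes(alleles: Sequence[int]) -> List[int]:
    """[pat1, mat1, pat2, mat2, ...] -> pat + mat - 2 for each SNP."""
    if len(alleles) % 2:
        raise ValueError("need one paternal and one maternal allele per SNP")
    return [alleles[i] + alleles[i + 1] - 2 for i in range(0, len(alleles), 2)]
```

For example, `fill_dosage(4, 0.3)` gives 1.3 (up to floating-point rounding), and `genotypes_from_haplotypes([1, 1, 1, 2, 2, 2])` gives `[0, 1, 2]` under the 1/2 allele coding assumed above.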
Version differences | 4 vs. 3 | Can input numbers of A and B allele reads from sequence data. Uses more memory and CPU time because of likelihood ratio calculations.
3 vs. 2 | Computing time is reduced by using priors or by imputing only new animals. Files hap.list and hap.found output haplotypes for multiple segment lengths to use as priors. The options file includes damout, listout, and errate, which respectively control outputting imputed parents, outputting all steps or only the final step, and allowing errors within haplotypes. Option genout can limit genotypes.filled to the best call only (0, 1, 2) or to full calls plus missing (0, 1, 2, 5).
2 vs. 1 | The options file uses maxlen, minlen, and steps to divide long segments into shorter segments. Computing time increases with the number of steps used to get from maxlen to minlen. Population and pedigree haplotyping are done in one loop instead of 2 separate loops. Searches for great-grandparent haplotypes, not just those of genotyped parents and grandparents. Higher accuracy and/or fewer high-density genotypes required.
References | 2015 | VanRaden, P.M., C. Sun, and J.R. O'Connell. Fast imputation using medium- or low-coverage sequence data. BMC Genet. 16:82.
2014 | VanRaden, P.M., and C. Sun. Fast imputation using medium- or low-coverage sequence data. Proc. 10th World Congr. Genet. Appl. Livest. Prod., 179.
2013 | VanRaden, P.M., D.J. Null, M. Sargolzaei, G.R. Wiggans, M.E. Tooker, J.B. Cole, T.S. Sonstegard, E.E. Connor, M. Winters, J.B.C.H.M. van Kaam, A. Valenti, B.J. Van Doormaal, M.A. Faust, and G.A. Doak. Genomic imputation and evaluation using high-density Holstein genotypes. J. Dairy Sci. 96:668–678.
2011 | VanRaden, P.M., J.R. O'Connell, G.R. Wiggans, and K.A. Weigel. Genomic evaluations with many more genotypes. Genet. Sel. Evol. 43:10.
2010 | VanRaden, P.M. Genomic evaluations with many more genotypes and phenotypes. Proc. 9th World Congr. Genet. Appl. Livest. Prod., Leipzig, Germany, Aug. 1–6, Comm. 27.
2010 | VanRaden, P.M., J.R. O'Connell, G.R. Wiggans, and K.A. Weigel. Combining different marker densities in genomic evaluation. Interbull Bull. 42:113–118.
License | The Fortran package findhap.f90 is in the public domain and was developed with U.S. taxpayer funding. Accurate results are not guaranteed. Please report any bugs to paul.vanraden@usda.gov. You may modify, improve, use, and redistribute the code to anyone for any purpose. You can also ask Paul to make changes that could benefit U.S. evaluations and other users.
Paul VanRaden
Animal Genomics and Improvement Laboratory
Agricultural Research Service, USDA