genosim | Simulates genotypes, breeding values, and phenotypes; simulates DNA sequence read depth (numbers of A and B alleles); and resolves SNP conflicts between parent and offspring genotypes | ||
Downloads | Version 4 program, example files, and executable | ||
Programs | pedsim.f90 | Extremely simple pedigree program (usually not used because real pedigree is used; see details below) | |
markersim.f90 | |||
genosim.f90 | |||
geno2seq.f90 | |||
phenosim.f90 | |||
conflict.f90 | |||
Program order | Simulate genotypes from SNP chips | pedsim.f90 (optional) → markersim.f90 → genosim.f90 → conflict.f90 | |
Simulate DNA sequence read depths | pedsim.f90 (optional) → markersim.f90 → genosim.f90 → geno2seq.f90 | ||
Simulate phenotypes from either SNP genotypes or simulated DNA sequence | pedsim.f90 (optional) → markersim.f90 → genosim.f90 → conflict.f90 (optional for simulated DNA sequence) → findhap.f90 → phenosim.f90 | ||
Convert chip genotype data to simulated sequence format | geno2seq.f90 | ||
Check parentage conflicts, count conflicts by animal and chip, correct Mendelian errors, and fill missing SNPs using parental genotypes where possible | conflict.f90 | ||
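The parentage checks described above can be illustrated with a minimal sketch. It is not the logic of conflict.f90 itself: the genotype coding (0, 1, 2 = number of B alleles), the `MISSING` code, and all function names are assumptions made for illustration. A conflict is counted when parent and offspring are opposite homozygotes, and a missing offspring genotype is filled only when both parents are homozygous, because only then is the offspring genotype fully determined.

```python
MISSING = 5  # hypothetical code for a missing genotype call


def is_conflict(parent, offspring):
    """True when parent and offspring are opposite homozygotes (0 vs 2)."""
    return {parent, offspring} == {0, 2}


def count_conflicts(parent_geno, offspring_geno):
    """Count SNPs at which a parent-offspring pair is in Mendelian conflict."""
    return sum(is_conflict(p, o) for p, o in zip(parent_geno, offspring_geno))


def fill_from_parents(sire, dam, offspring):
    """Fill missing offspring SNPs where both parents are homozygous."""
    filled = []
    for s, d, o in zip(sire, dam, offspring):
        if o == MISSING and s in (0, 2) and d in (0, 2):
            # Each homozygous parent transmits a known allele:
            # 0+0 -> 0, 2+2 -> 2, 0+2 or 2+0 -> 1.
            o = (s + d) // 2
        filled.append(o)
    return filled
```

Counting conflicts per animal or per chip, as the program description mentions, would amount to summing `count_conflicts` over the relevant pairs or marker subsets.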
Program files | Input/output files | Listed at beginning of source code file for each program | |
chips.txt | Used by markersim.f90 and geno2seq.f90; parameter definitions: | ||
chip | Sequential number for each chip | ||
reduce1 | 1 = Contains all simulated markers; 2 = Contains every other simulated marker; 3 = Contains every 3rd marker; … etc. | ||
offset1 | Number of markers to shift from the beginning | ||
reduce2 | Can have the same value as reduce1, but if the value is different from that used for reduce1, all the markers picked by reduce1 and reduce2 will be used | ||
offset2 | Can have the same value as offset1 or different to pick different markers | ||
depth1 | Sequence read depth; if simulating chip data, set the value to 35 | ||
error1 | Error rate for chip or sequence data (extremely low for chip data) | ||
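To show how a read depth and an error rate can turn a genotype into counts of A and B alleles, here is a minimal sketch. It is not the algorithm in geno2seq.f90: the function name, the 0/1/2 genotype coding (number of B alleles), the fixed number of reads per SNP, and the rule that an erroneous read reports the opposite allele are all assumptions made for illustration.

```python
import random


def simulate_reads(genotype, depth, error, rng):
    """Return (n_a, n_b) read counts for one SNP.

    genotype: number of B alleles carried (0, 1, or 2), assumed coding
    depth:    number of reads drawn (fixed here for simplicity)
    error:    probability that a read reports the wrong allele
    rng:      a random.Random instance, so results are reproducible
    """
    p_b = genotype / 2.0  # chance that a read samples a B allele
    n_a = n_b = 0
    for _ in range(depth):
        is_b = rng.random() < p_b
        if rng.random() < error:
            is_b = not is_b  # read error flips the observed allele
        if is_b:
            n_b += 1
        else:
            n_a += 1
    return n_a, n_b
```

With `error` set to 0, a homozygote yields reads of a single allele, while a heterozygote yields a random mix of A and B reads whose counts sum to `depth`.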
chromosome.data | Check after running markersim to be sure that the marker pattern is as intended | ||
*.options | Provides detailed parameter definitions | ||
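The reduce and offset parameters in chips.txt can be read as two regular sampling patterns whose union defines a chip. The sketch below is one interpretation, not code from markersim.f90: the function name, 0-based marker indices, and the assumption that an offset of k skips the first k markers before sampling begins are all hypothetical.

```python
def chip_markers(n_markers, reduce1, offset1, reduce2, offset2):
    """Return sorted 0-based indices of simulated markers placed on a chip.

    Each (reduce, offset) pair keeps every reduce-th marker, starting after
    skipping `offset` markers; the chip holds the union of both patterns.
    """
    first = set(range(offset1, n_markers, reduce1))
    second = set(range(offset2, n_markers, reduce2))
    return sorted(first | second)


# Same pattern twice: every other marker.
print(chip_markers(10, 2, 0, 2, 0))   # [0, 2, 4, 6, 8]
# Two different patterns: their union is used.
print(chip_markers(12, 3, 0, 4, 1))   # [0, 1, 3, 5, 6, 9]
```

Comparing a list like this against the marker pattern reported after running markersim is one way to confirm that a chip definition selects the intended markers.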
pedsim.f90 details | Input | pedsim.options (please read this file for detailed explanations of each parameter) | |
Output | pedigree.file | Supplies pedigrees and birth dates (or years) of genotyped animals plus ancestors | |
genotype.data0 | Indicates which individuals are genotyped with which chip | ||
phenotype.data0 | Indicates reliability of conventional estimated breeding value (EBV) and parent average (PA) in truncated data | ||
phenotype.later0 | Indicates reliability of conventional EBV and PA in final data | ||
Output files can easily be created from a real pedigree, and their format is the same as that of the files in the Example_Output folder. If phenotypes are not simulated, only the first 2 files need to be created based on the real pedigree. | |||
Version differences | 4 vs. 3 | Added geno2seq.f90 to generate DNA sequence read depth (released August 2014) | |
3 vs. 2 | Allowed definition of multiple chips (used 2012–13, but not released) | ||
2 vs. 1 | Generated linkage disequilibrium in base population (used 2010–11, but not released) | ||
1 | Assumed no linkage disequilibrium in base population (used 2007–09, but not released) | ||
References | 2015 | VanRaden, P.M., C. Sun, and J.R. O'Connell. Fast imputation using medium- or low-coverage sequence data. BMC Genet. 16:82. | |
2014 | VanRaden, P.M., and C. Sun. Fast imputation using medium- or low-coverage sequence data. Proc. 10th World Congr. Genet. Appl. Livest. Prod., 179. | ||
2013 | VanRaden, P.M., D.J. Null, M. Sargolzaei, G.R. Wiggans, M.E. Tooker, J.B. Cole, T.S. Sonstegard, E.E. Connor, M. Winters, J.B.C.H.M. van Kaam, A. Valenti, B.J. Van Doormaal, M.A. Faust, and G.A. Doak. Genomic imputation and evaluation using high-density Holstein genotypes. J. Dairy Sci. 96:668–678. | ||
2011 | VanRaden, P.M., J.R. O'Connell, G.R. Wiggans, and K.A. Weigel. Genomic evaluations with many more genotypes. Genet. Sel. Evol. 43:10. | ||
2008 | VanRaden, P.M. Efficient methods to compute genomic predictions. J. Dairy Sci. 91:4414–4423. | ||
License | Fortran package genosim is public domain and was developed with U.S. taxpayer funding. Accurate results are not guaranteed. Please report any bugs to paul.vanraden@usda.gov. You may modify, improve, use, and redistribute the code to anyone for any purpose. Or, you can ask Paul to make changes that could benefit U.S. evaluations and other users. | ||
Paul VanRaden
Animal Genomics and Improvement Laboratory
Agricultural Research Service, USDA
Chuanyu Sun
Biostatistics and Bioinformatics
Neogen Corporation