
genosim Simulates genotypes, breeding values, and phenotypes; simulates DNA sequence read depth (numbers of A and B alleles); and resolves SNP conflicts between parent and offspring genotypes
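To show what "read depth (numbers of A and B alleles)" means, here is a minimal sketch that turns genotypes into simulated A and B read counts. It is not the algorithm in geno2seq.f90. The 0/1/2 genotype coding (copies of allele B), the Poisson-distributed depth, and the names simulate_reads and poisson are assumptions made for illustration. Each read samples one of the animal's two chromosomes, and a sequencing error flips the observed allele with probability error_rate. This loosely mirrors the depth1 and error1 settings in chips.txt.

```python
import math
import random


def poisson(rng: random.Random, mean: float) -> int:
    """Draw a Poisson count with Knuth's method (adequate for modest means)."""
    limit = math.exp(-mean)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= limit:
            return k
        k += 1


def simulate_reads(genotypes, mean_depth, error_rate, seed=1):
    """Return one (nA, nB) read-count pair per SNP.

    Genotypes are coded 0/1/2 = copies of allele B (hypothetical coding).
    ASSUMPTION: depth per SNP is Poisson with mean mean_depth.
    """
    rng = random.Random(seed)
    counts = []
    for g in genotypes:
        depth = poisson(rng, mean_depth)
        n_b = 0
        for _ in range(depth):
            is_b = rng.random() < g / 2.0   # pick one of the two chromosomes
            if rng.random() < error_rate:   # sequencing error flips the allele
                is_b = not is_b
            n_b += is_b
        counts.append((depth - n_b, n_b))
    return counts
```

For example, simulate_reads([0, 1, 2], mean_depth=4, error_rate=0.01) returns three (nA, nB) pairs. At low depth a heterozygote can easily show reads of only one allele.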

Downloads Version 4 program, example files, and executable

Programs pedsim.f90 Extremely simple pedigree program (usually not needed because a real pedigree is used instead; see details below)
markersim.f90  
genosim.f90  
geno2seq.f90  
phenosim.f90  
conflict.f90  

Program order Simulate genotypes from SNP chips pedsim.f90 (optional) → markersim.f90 → genosim.f90 → conflict.f90
Simulate DNA sequence read depths pedsim.f90 (optional) → markersim.f90 → genosim.f90 → geno2seq.f90
Simulate phenotypes from either SNP genotypes or simulated DNA sequence pedsim.f90 (optional) → markersim.f90 → genosim.f90 → conflict.f90 (optional for simulated DNA sequence) → findhap.f90 → phenosim.f90
Convert chip genotype data into simulated sequence (read depth) format geno2seq.f90
Check parentage conflicts, count conflicts by animal and chip, correct Mendelian errors, and fill missing SNPs using parental genotypes where possible conflict.f90
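Parent-offspring SNP checking rests on Mendelian inheritance: each parent passes one allele, so some offspring genotypes are impossible. For example, a sire and dam that are both homozygous for allele A cannot produce a BB offspring. Below is a minimal sketch of conflict counting and filling, not the code in conflict.f90. The 0/1/2 coding (copies of allele B), None for missing, and the names possible_offspring and check_and_fill are hypothetical. Setting a conflicting SNP to missing before filling is also an assumed policy.

```python
from typing import List, Optional, Tuple

# Hypothetical coding for this sketch: 0, 1, 2 = copies of allele B; None = missing.
Genotype = Optional[int]

# Numbers of B alleles a parent with each genotype can transmit.
TRANSMITTED = {0: {0}, 1: {0, 1}, 2: {1}}


def possible_offspring(sire: Genotype, dam: Genotype) -> set:
    """Offspring genotypes compatible with the parents (missing parent = anything)."""
    sire_alleles = TRANSMITTED[sire] if sire is not None else {0, 1}
    dam_alleles = TRANSMITTED[dam] if dam is not None else {0, 1}
    return {s + d for s in sire_alleles for d in dam_alleles}


def check_and_fill(child: List[Genotype], sire: List[Genotype],
                   dam: List[Genotype]) -> Tuple[int, List[Genotype]]:
    """Count Mendelian conflicts and fill SNPs the parents fully determine."""
    conflicts = 0
    fixed: List[Genotype] = []
    for c, s, d in zip(child, sire, dam):
        allowed = possible_offspring(s, d)
        if c is not None and c not in allowed:
            conflicts += 1
            c = None  # ASSUMPTION: discard the conflicting genotype
        if c is None and len(allowed) == 1:
            c = next(iter(allowed))  # e.g., both parents homozygous
        fixed.append(c)
    return conflicts, fixed
```

A missing parent is treated as able to transmit either allele. A single genotyped parent can therefore still flag opposing homozygotes (parent 0, offspring 2), but it cannot determine a missing offspring SNP by itself.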

Program files Input/output files Listed at the beginning of the source code file for each program
chips.txt Used by markersim.f90 and geno2seq.f90; parameter definitions:
chip Sequential number for each chip
reduce1 1 = Contains all simulated markers
2 = Contains every other simulated marker
3 = Contains every 3rd marker
… etc.
offset1 Number of markers to shift from the beginning
reduce2 Can have the same value as reduce1; if it differs, the chip contains all markers picked by either reduce1 or reduce2
offset2 Can have the same value as offset1, or a different value to pick different markers
depth1 Sequence read depth; if simulating chip data, set the value to 35
error1 Error rate for chip or sequence data (extremely low for chip data)
chromosome.data Check this file after running markersim.f90 to be sure that the marker pattern is as intended
*.options Provides detailed parameter definitions
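The reduce and offset parameters select a chip's markers from the full simulated marker list. A marker is kept if either (reduce, offset) pair picks it. The sketch below shows one reading of that rule. The exact indexing in markersim.f90 is not documented here, so the formula and the name chip_markers are assumptions: marker i (1-based) is picked when i > offset and (i - offset - 1) is a multiple of reduce.

```python
def chip_markers(n_markers, reduce1, offset1, reduce2, offset2):
    """1-based indices of simulated markers placed on one chip (illustrative).

    ASSUMPTION: (reduce, offset) picks markers offset+1, offset+1+reduce, ...;
    markersim.f90 may index differently.
    """
    def picked(reduce, offset):
        return set(range(offset + 1, n_markers + 1, reduce))

    # Markers picked by either pair are used; identical pairs give one set.
    return sorted(picked(reduce1, offset1) | picked(reduce2, offset2))
```

With 12 simulated markers, reduce1 = 3, offset1 = 0, reduce2 = 4, and offset2 = 1, this rule keeps markers 1, 4, 7, 10 and 2, 6, 10, giving [1, 2, 4, 6, 7, 10]. Inspecting chromosome.data confirms what the real program selected.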

pedsim.f90 details Input pedsim.options (please read this file for detailed explanations of each parameter)
Output pedigree.file Supplies pedigrees and birth dates (or years) of genotyped animals plus ancestors
genotype.data0 Indicates which individuals are genotyped with which chip
phenotype.data0 Indicates reliability of conventional estimated breeding value (EBV) and parent average (PA) in truncated data
phenotype.later0 Indicates reliability of conventional EBV and PA in final data
These output files can easily be created from a real pedigree instead, using the same format as the files in the Example_Output folder. If phenotypes are not simulated, only the first 2 files (pedigree.file and genotype.data0) need to be created from the real pedigree.

Version differences 4 vs. 3 Added geno2seq.f90 to generate DNA sequence read depth (released August 2014)
3 vs. 2 Allowed definition of multiple chips (used 2012–13, but not released)
2 vs. 1 Generated linkage disequilibrium in base population (used 2010–11, but not released)
1 Assumed no linkage disequilibrium in base population (used 2007–09, but not released)

References 2015 VanRaden, P.M., C. Sun, and J.R. O'Connell. Fast imputation using medium- or low-coverage sequence data. BMC Genet. 16:82.
2014 VanRaden, P.M., and C. Sun. Fast imputation using medium- or low-coverage sequence data. Proc. 10th World Congr. Genet. Appl. Livest. Prod., 179.
2013 VanRaden, P.M., D.J. Null, M. Sargolzaei, G.R. Wiggans, M.E. Tooker, J.B. Cole, T.S. Sonstegard, E.E. Connor, M. Winters, J.B.C.H.M. van Kaam, A. Valenti, B.J. Van Doormaal, M.A. Faust, and G.A. Doak. Genomic imputation and evaluation using high-density Holstein genotypes. J. Dairy Sci. 96:668–678.
2011 VanRaden, P.M., J.R. O'Connell, G.R. Wiggans, and K.A. Weigel. Genomic evaluations with many more genotypes. Genet. Sel. Evol. 43:10.
2008 VanRaden, P.M. Efficient methods to compute genomic predictions. J. Dairy Sci. 91:4414–4423.

License The Fortran package genosim is in the public domain and was developed with U.S. taxpayer funding. Accurate results are not guaranteed. Please report any bugs to paul.vanraden@usda.gov. You may modify, improve, use, and redistribute the code to anyone for any purpose. You can also ask Paul to make changes that could benefit U.S. evaluations and other users.

 Paul VanRaden
 Animal Genomics and Improvement Laboratory
 Agricultural Research Service, USDA

 Chuanyu Sun
 Biostatistics and Bioinformatics
 Neogen Corporation