May 11, 2018

AgSeq

AgSeq is an exclusive, first-of-its-kind academic genotyping pipeline that combines Illumina’s latest sequencing technology, PerkinElmer’s automated library preparation, and a proprietary sample bar-coding system to provide low-cost genotyping by sequencing using low-coverage whole genome sequencing.

Through economies of scale, process optimization, robotics and new biochemistry, AgSeq will be able to cut the cost per sample roughly sixfold for large-scale DNA genotyping projects.

SkimGBS, Skim-based Genotyping by Sequencing

Genotyping by sequencing (GBS), or RADseq, has been popular for the past several years, particularly for its ability to genotype a large population at relatively low cost. GBS/RADseq was developed as a reduced-representation sequencing (RRS) approach that uses restriction enzymes to decrease genome complexity before sequencing. Only the DNA associated with restriction sites is sequenced, which in most cases is <10% of the genome.
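To make the idea of reduced representation concrete, the sketch below estimates what fraction of a sequence lies within one read length of a restriction site. It is only an illustration: the function name, the choice of the EcoRI recognition site (GAATTC), the 100 bp read length, and the toy sequence are all assumptions, and real GBS protocols add fragment size selection that this ignores.

```python
import re


def covered_fraction(genome: str, site: str = "GAATTC", read_len: int = 100) -> float:
    """Fraction of bases lying within read_len of a recognition site.

    Hypothetical helper: 'site' defaults to the EcoRI recognition sequence
    and read_len is an illustrative value, not a protocol recommendation.
    """
    if not genome:
        return 0.0
    covered = [False] * len(genome)
    # The lookahead pattern also finds overlapping occurrences of the site.
    for match in re.finditer(f"(?={site})", genome):
        start = max(0, match.start() - read_len)
        end = min(len(genome), match.start() + len(site) + read_len)
        for i in range(start, end):
            covered[i] = True
    return sum(covered) / len(genome)


# Toy sequence with a single site in the middle (made-up data).
toy_genome = "A" * 1000 + "GAATTC" + "A" * 1000
print(f"Fraction near a restriction site: {covered_fraction(toy_genome):.3f}")
```

On a real genome the result depends strongly on the enzyme and on the size selection applied, which is why this should be read as a rough picture rather than a prediction.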

Advances in NGS technology have reduced sequencing costs dramatically, which makes high-throughput genotyping by whole genome sequencing economically feasible. For example, Huang et al. [5] used whole-genome resequencing on a total of 150 recombinant inbred lines (RILs) developed from a cross between Oryza sativa ssp. indica and japonica. The recombinant lines were sequenced to an average coverage of 0.02×, identifying a total of 1,493,461 SNPs. Bayer et al. [2] reported the development of skim-based genotyping by sequencing (skimGBS), which uses low-coverage whole genome sequencing to perform high-throughput genotyping, and used it to characterize the distribution of crossover and non-crossover recombination in Brassica napus and Cicer arietinum.

As an example, in [1] the authors performed de novo SNP discovery using parental samples sequenced on a HiSeq 2000 with 100 bp paired reads, on libraries with a 500 bp insert size, to 30X coverage. The progeny samples were sequenced to 1X coverage on average. Parental reads were aligned to a reference genome, and SNPs were discovered based only on the reads (not on the reference). The resulting SNP list was then used to genotype the progeny reads. Where a progeny sample had no reads at a SNP location, the genotype was called using an imputation technique that relies on the haplotype structure of the parents.
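The workflow above (parental SNP list, progeny calls at those positions, imputation where reads are missing) can be sketched in a few lines. This is a deliberately simplified illustration and not the algorithm used in [1] or [2]: the SNP table, the function names, and the rule of filling a gap only when both flanking calls agree are all assumptions made for the example.

```python
from typing import Dict, List, Optional

# Hypothetical parental SNP list for one chromosome: position -> allele per parent.
PARENT_SNPS: Dict[int, Dict[str, str]] = {
    100: {"A": "G", "B": "T"},
    250: {"A": "C", "B": "A"},
    400: {"A": "T", "B": "C"},
    550: {"A": "G", "B": "A"},
}


def call_genotypes(observed: Dict[int, str]) -> List[Optional[str]]:
    """Label each SNP 'A' or 'B' by the parent whose allele the progeny read shows.

    Positions with no read, or with a base matching neither parent, stay None.
    """
    calls: List[Optional[str]] = []
    for pos in sorted(PARENT_SNPS):
        base = observed.get(pos)
        alleles = PARENT_SNPS[pos]
        if base == alleles["A"]:
            calls.append("A")
        elif base == alleles["B"]:
            calls.append("B")
        else:
            calls.append(None)
    return calls


def impute_flanking(calls: List[Optional[str]]) -> List[Optional[str]]:
    """Fill a run of missing calls only when both flanking calls agree.

    Gaps between different parents are left missing, because a crossover
    lies somewhere inside them and its position is unknown.
    """
    imputed = list(calls)
    n = len(imputed)
    i = 0
    while i < n:
        if imputed[i] is not None:
            i += 1
            continue
        j = i
        while j < n and imputed[j] is None:
            j += 1
        left = imputed[i - 1] if i > 0 else None
        right = imputed[j] if j < n else None
        if left is not None and left == right:
            for k in range(i, j):
                imputed[k] = left
        i = j
    return imputed


# A progeny sample with reads at only two of the four SNP positions (toy data).
progeny_calls = call_genotypes({100: "G", 550: "G"})
print(progeny_calls, "->", impute_flanking(progeny_calls))
```

Published pipelines generally use more careful, haplotype-aware imputation; the flanking rule here only shows why parental haplotype blocks make missing calls recoverable in a biparental population.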

SkimGBS is the sequencing of multiple individuals at low coverage combined with imputation of missing data [1,2]. It can involve whole genome resequencing, as in [8], or reduced representation by use of restriction enzymes [1]. Compared with traditional GBS, SkimGBS decreases the number of steps and the cost of library preparation, reduces the complexity of downstream bioinformatics analysis, and potentially eliminates biases stemming from the use of restriction enzymes, but may require more sequence data than reduced representation methods. Because of the low sequencing coverage, often only a portion of the markers will be genotyped for each individual, which makes imputation a crucial step in the data analysis. The choice of imputation algorithm requires careful consideration and needs to be balanced against the data volume and the desired genotyping resolution. In its application spotlight white paper on agrigenomics, Illumina [4] describes SkimSeq as an effective whole genome sequencing method for SNP discovery.
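A back-of-the-envelope calculation shows why only a portion of markers is genotyped at low coverage. The sketch below assumes reads land uniformly at random, so that per-site read depth follows a Poisson distribution with mean equal to the average coverage; this is an idealization, and the function names are made up for the example.

```python
import math


def fraction_without_reads(coverage: float) -> float:
    """Expected fraction of sites with zero reads under a Poisson depth model."""
    return math.exp(-coverage)


def fraction_with_at_least(coverage: float, min_reads: int) -> float:
    """Expected fraction of sites covered by at least min_reads reads."""
    below = sum(
        math.exp(-coverage) * coverage ** i / math.factorial(i)
        for i in range(min_reads)
    )
    return 1.0 - below


for cov in (0.02, 1.0, 30.0):
    print(
        f"{cov:>5}x: no reads at {fraction_without_reads(cov):.1%} of sites, "
        f">=2 reads at {fraction_with_at_least(cov, 2):.1%}"
    )
```

Under this model, at 1X average coverage about 37% of sites (e^-1) receive no read at all, so a comparable share of SNP positions in each individual has to be imputed.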

SkimSeq VS Traditional GBS

In Scheben et al. [3], the authors describe the advantages of SkimGBS. Some of the pros and cons, compared to traditional GBS, are listed below:

Pros

  • Lower cost of library preparation, with fewer steps
  • Reduced complexity of downstream analysis
  • Eliminates biases caused by restriction enzymes
  • Whole genome coverage
  • Many more markers discovered, so the cost per marker (SNP) is lower
  • Higher SNP discovery rate

Cons

  • Imputation of missing genotypes becomes crucial
  • Harder to work with if no reference genome is available
  • Higher cost per sample
  • Higher complexity in analysis

Some considerations

  • As noted in [3], this method is preferable for biparental crosses. In [8], the authors compare SkimSeq based on whole genome resequencing with reduced representation sequencing.
  • SNPs can be taken from an existing SNP list for the population or determined from the parent samples. In the second case, the parent samples are mapped to the reference genome and a list of SNPs is derived from them. High coverage is necessary for good SNP detection in the parents [2]. If a reference sequence is not available, it can be generated from the sequencing reads.
  • For genotyping, progeny reads, sequenced at lower coverage, are mapped to the reference genome, and alleles are called by comparison with the parental SNP information [2,3].
  • For library preparation, Huang et al. [5] describe the procedure in detail. They used a sliding-window approach for genotyping before the term SkimSeq was coined.
  • For imputation, Pausch et al. [6] describe and compare two imputation methods for whole-genome data: FImpute, which uses family- and population-based information to infer missing genotypes, and Minimac, which uses previously phased genotypes and does not consider pedigree information. Minimac displayed the higher imputation accuracy.
  • The scripts and a manual for the SkimGBS genotyping-by-sequencing pipeline published in Bayer et al. 2015 [2] are available at [7].
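The sliding-window idea mentioned in the considerations above can be illustrated with a short sketch. It takes per-SNP parental calls for one progeny sample and re-calls each position from the majority parent in a window of neighbouring SNPs, which damps isolated sequencing errors. The window size, the 0.7 ratio threshold, and the "H" label for mixed windows are illustrative assumptions, not the parameters or the exact procedure of Huang et al. [5].

```python
from collections import Counter
from typing import List, Optional


def sliding_window_calls(
    snp_calls: List[Optional[str]],
    window: int = 15,
    min_ratio: float = 0.7,
) -> List[Optional[str]]:
    """Re-call each SNP from the calls in a window centred on it.

    Returns the majority parent when it accounts for at least min_ratio of
    the non-missing calls in the window, 'H' (mixed/ambiguous) otherwise,
    and None when the window contains no calls at all.
    """
    half = window // 2
    result: List[Optional[str]] = []
    for i in range(len(snp_calls)):
        lo = max(0, i - half)
        hi = min(len(snp_calls), i + half + 1)
        counts = Counter(c for c in snp_calls[lo:hi] if c is not None)
        total = sum(counts.values())
        if total == 0:
            result.append(None)
            continue
        parent, n = counts.most_common(1)[0]
        result.append(parent if n / total >= min_ratio else "H")
    return result


# Toy data: a block of parent A with one erroneous 'B' call in the middle.
raw = ["A"] * 10 + ["B"] + ["A"] * 10
smoothed = sliding_window_calls(raw, window=5)
print("".join(smoothed))
```

A wider window suppresses more noise but blurs the position of real recombination breakpoints, so the window size is a trade-off against the marker density actually obtained.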

 

Our Equipment

  • JANUS® G3 NGS Express™ Workstation – a benchtop platform designed for low-to-moderate throughput NGS library construction.
  • Sciclone® G3 NGS Workstation – a complete benchtop solution for the automated construction of up to 96 libraries per day.

   
 
 

References

[1] Agnieszka A. Golicz, Philipp E. Bayer, and David Edwards, “Skim Based Genotyping by Sequencing”, in Jacqueline Batley (ed.), Plant Genotyping: Methods and Protocols, Methods in Molecular Biology, vol. 1245, DOI 10.1007/978-1-4939-1966-6_19, © Springer Science+Business Media New York 2015

[2] Philipp E. Bayer, Pradeep Ruperao, Annaliese Mason, Jiri Stiller, Chon-Kit Kenneth Chan, Satomi Hayashi, Yan Long, Jinling Meng, Tim Sutton, Paul Visendi, Rajeev K. Varshney, Jacqueline Batley, David Edwards, “High-resolution skim genotyping by sequencing reveals the distribution of crossovers and gene conversions in Cicer arietinum and Brassica napus”, Theor Appl Genet. 2015 Jun;128(6):1039-47. doi: 10.1007/s00122-015-2488-y. Epub 2015 Mar 10.

[3] Armin Scheben, Jacqueline Batley and David Edwards, “Genotyping-by-sequencing approaches to characterize crop genomes: choosing the right tool for the right application”, Plant Biotechnology Journal (2017) 15, pp. 149–161

[4] Illumina, “Sequence-Based Genotyping Brings Agrigenomics to a Crossroads”, Application Spotlight: Agrigenomics, Illumina, 2014.

[5] Huang X, Feng Q, Qian Q, et al. “High-throughput genotyping by whole-genome resequencing”. Genome Res. 2009;19:1068-1076

[6] Hubert Pausch et al., “Evaluation of the accuracy of imputed sequence variant genotypes and their utility for causal variant detection in cattle”, Genet Sel Evol (2017) 49:24

[7] http://www.appliedbioinformatics.com.au/index.php/SkimGBS

[8] Scheben A., Batley J., Edwards D. (2018), “Revolution in Genotyping Platforms for Crop Improvement”. In: Advances in Biochemical Engineering/Biotechnology. Springer, Berlin, Heidelberg

