TriTyper

Lude Franke,1,3 Carolien G.F. de Kovel,1 Yurii S. Aulchenko,2 Gosia Trynka,3 Alexandra Zhernakova,1 Karen A. Hunt,4 Hylke M. Blauw,5 Leonard H. van den Berg,5 Roel Ophoff,1,6 Panagiotis Deloukas,7 David A. van Heel,4 and Cisca Wijmenga1,3*

1 Complex Genetics Section, DBG-Department of Medical Genetics, University Medical Centre Utrecht, 3584 CG Utrecht, The Netherlands
2 Department of Epidemiology & Biostatistics, Erasmus MC Rotterdam, 3000 CA Rotterdam, The Netherlands
3 Genetics Department, University Medical Centre Groningen and University of Groningen, 9700 RB Groningen, The Netherlands
4 Institute of Cell and Molecular Science, Barts and The London School of Medicine and Dentistry, London, E1 2AT, UK
5 Department of Neurology, Rudolf Magnus Institute of Neuroscience, University Medical Center Utrecht, 3584 CX Utrecht, The Netherlands
6 Center for Neurobehavioral Genetics, David Geffen School of Medicine, University of California, Los Angeles, CA 90095, USA
7 Wellcome Trust Sanger Institute, Hinxton, Cambridge CB10 1SA, UK 

* Corresponding author Cisca Wijmenga

Introduction:

TriTyper is software for calling triallelic SNPs. Please note: the results accompanying the paper are available at http://www.genenetwork.nl/trityper/indextt.php

TriTyper consists of three programs. TriTyper Importer imports raw intensity data and stores them in a binary data format. These binary files are required by TriTyper Discoverer, which discovers triallelic SNPs from raw intensity data, and by TriTyper Imputer, which uses LD information for 1,204 triallelic SNPs to infer triallelic genotypes when only biallelic genotype calls are available.

If you want to get familiar with TriTyper, it is recommended you download the example data (47 samples, typed for 317,503 SNPs using Illumina HumanHap300 arrays). Once you have extracted this Zip file you can familiarize yourself with the data formats we have used. Additionally, you can use these files to assess how importing works and how triallelic SNP discovery and imputation perform.

TriTyper Importer: Importing genotype data

TriTyper uses a structured binary data format to increase performance and save space. Flat genotype reports must therefore be imported and converted to the TriTyper format. Please download TriTyper Importer to accomplish this. After you unzip this file, you will find a jar file, ‘TriTyperImporter.jar’. This is a command line program that you can invoke by typing:

java -Xmx512m -jar TriTyperImporter.jar

Please follow the subsequent instructions of the program carefully. The easiest way to generate a compatible input file is to use Illumina’s BeadStation export wizard: generate a “Final Report” and ensure you include the Sample ID, SNP Name, Allele1 – Top, Allele2 – Top, R and Theta fields. By default the file is assumed to be tab-delimited. This program requires at most 256 MB of memory, so you must provide the option ‘-Xmx512m’ to ensure that enough memory is allocated.
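
Before importing, it can help to confirm that the exported report contains all six required fields. The following is a minimal sketch in Python and is not part of TriTyper. It assumes that you pass it the tab-delimited line that holds the column names, and that the export writes the allele columns with either a hyphen or an en dash; the function name ‘missing_fields’ is hypothetical.

```python
# Hypothetical helper, not part of TriTyper: checks a Final Report column
# header line for the fields listed in the instructions above.
REQUIRED_FIELDS = [
    "Sample ID",
    "SNP Name",
    "Allele1 - Top",
    "Allele2 - Top",
    "R",
    "Theta",
]


def _normalise(name):
    # Assumption: hyphens and en dashes in column names are interchangeable.
    return name.replace("\u2013", "-").strip()


def missing_fields(header_line, delimiter="\t"):
    """Return the required fields that are absent from the header line."""
    columns = header_line.rstrip("\r\n").split(delimiter)
    present = {_normalise(column) for column in columns}
    return [field for field in REQUIRED_FIELDS if field not in present]
```

An empty list means all required fields were found; any names returned should be added in the export wizard before the report is imported.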

To improve the speed of TriTyper Discoverer and TriTyper Imputer, we recommend placing the “SNPMappings.txt” file in the directory where the output files will be generated before you start importing.

We have provided example data (47 samples, typed for 317,503 SNPs using Illumina HumanHap300 arrays). This dataset can be used to get familiar with using TriTyper. To import this data, unzip the example data Zip file in a folder called ‘data’. Move one directory up and place ‘TriTyperImporter.jar’ here. Subsequently use the following command to import the example data:

java -Xmx512m -jar TriTyperImporter.jar data/Hap300_47dna_Final_GC_XY_Report_CEU.txt data/

TriTyper Discoverer: Discovering triallelic SNPs

TriTyper can discover triallelic SNPs when raw intensity data are available. Please download TriTyper Discoverer to accomplish this. Extract this Zip file in such a way that the directory structure defined within the Zip file remains intact. Then move the ‘data’ directory, which contains the output of TriTyper Importer, to the extracted TriTyper Discoverer directory.

Subsequently you can start TriTyper Discoverer with the following command:

java -Xmx512m -jar TriTyperDiscoverer.jar data/

where ‘data/’ is the directory where you have saved the binary data files together with ‘SNPMappings.txt’ and ‘PhenotypeInformation.txt’.

It is important to realize that discovering triallelic SNPs is a time- and memory-consuming process. With 1,000 samples, typed using Illumina HumanHap300 assays, processing all chromosomes takes approximately 12 hours on a MacBook Pro (Core 2 Duo, 2.33 GHz). Additionally, this program requires 512 MB of memory, which makes it mandatory to provide the option ‘-Xmx512m’.

If you have multiple processors at hand, you can improve speed by dedicating a single chromosome to each processor using the option:

-analysechromosomeQ

where ‘Q’ is the chromosome you want to analyze.
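
If you want to start one run per chromosome, the commands can be generated rather than typed by hand. The following Python sketch is not part of TriTyper and only builds the command lines. It assumes that the option is placed after the data directory and that ‘Q’ is replaced directly by the chromosome label; neither detail is confirmed above, so please check the program's own instructions. The function names are hypothetical.

```python
# Hypothetical helpers, not part of TriTyper: build one Discoverer command
# per chromosome so that each can be given to a separate processor.
def discoverer_command(data_dir, chromosome,
                       jar="TriTyperDiscoverer.jar", heap="512m"):
    """Return the argument list for analysing a single chromosome."""
    return [
        "java",
        f"-Xmx{heap}",
        "-jar",
        jar,
        data_dir,
        # Assumption: the option follows the data directory and the
        # chromosome label is appended without a separator.
        f"-analysechromosome{chromosome}",
    ]


def all_commands(data_dir, chromosomes):
    """Return one argument list per requested chromosome."""
    return [discoverer_command(data_dir, c) for c in chromosomes]


if __name__ == "__main__":
    for command in all_commands("data/", range(1, 23)):
        print(" ".join(command))
```

Each argument list can be passed to Python's ‘subprocess.Popen’ or pasted into a separate terminal. Keep in mind that every run needs its own 512 MB of memory.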

The first time TriTyper Discoverer starts working with a dataset, it will analyse all chromosome X SNPs, which it uses to establish an empirical distribution of 1 and 2 copy genotypes. Once this has completed, TriTyper Discoverer starts processing all SNPs. For each identified triallelic SNP, pictures for visual assessment and genotype call reports will be generated. These are placed in the “triallelicsnps” directory within the ‘data’ directory.

When using the example data (47 samples, typed for 317,503 SNPs using Illumina HumanHap300 arrays), TriTyper Discoverer will identify only a limited number of triallelic SNPs, because the number of samples is small.

TriTyper Imputer: Imputing triallelic genotypes based on LD and biallelic calls

If no raw intensity data are available, or the sample size is small, you can still infer triallelic genotypes, but this will be limited to the 1,204 triallelic SNPs we have identified. Note that this imputation requires that certain neighbouring biallelic SNPs have also been typed for these triallelic SNPs. If some of these have not been called, fewer triallelic SNPs will eventually be imputed.

We will make TriTyper Imputer available on 2008-01-31.

Required TriTyper specific file formats:

SNPMappings.txt format
A physical mapping file for all SNPs is required, because TriTyper uses the neighbouring SNPs of triallelic SNPs to improve the accuracy of genotyping. SNP mappings are available for the Illumina HumanHap300 and Illumina HumanHap550 assays (NCBI V36 Assembly). If you use a different assay (e.g. Illumina GoldenGate assays or more recent Illumina Infinium arrays), please ensure the format of this file is tab-delimited. For each SNP you should first indicate the chromosome, then the physical position and finally the SNP name. Below is an example of the HumanHap300 mapping file:

1	995669	rs3934834
1	1011278	rs3737728
1	1020428	rs6687776
1	1021403	rs9651273
1	1038818	rs4970405

Please make sure you have generated this file before you start importing data, as this ensures that speed-optimized binary files are generated. You have to place this file in the ‘data/’ directory.
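
If you build a mapping file for a different assay, a quick format check before importing can save a failed run. The following Python sketch is not part of TriTyper. It only tests what is described above: three tab-separated fields per line, with a numeric physical position in the second field. Skipping blank lines is an assumption, and the function names are hypothetical.

```python
# Hypothetical helpers, not part of TriTyper: validate the three-column
# layout (chromosome, physical position, SNP name) of SNPMappings.txt.
def parse_snp_mapping_line(line):
    """Split one mapping line into (chromosome, position, snp_name)."""
    fields = line.rstrip("\r\n").split("\t")
    if len(fields) != 3:
        raise ValueError(
            f"expected 3 tab-separated fields, found {len(fields)}")
    chromosome, position, snp_name = fields
    if not position.isdigit():
        raise ValueError(f"position is not a whole number: {position!r}")
    return chromosome, int(position), snp_name


def check_snp_mappings(lines):
    """Return a list of (line_number, message) for malformed lines."""
    problems = []
    for number, line in enumerate(lines, start=1):
        if not line.strip():
            continue  # Assumption: blank lines are ignored.
        try:
            parse_snp_mapping_line(line)
        except ValueError as error:
            problems.append((number, str(error)))
    return problems
```

Calling ‘check_snp_mappings(open("data/SNPMappings.txt"))’ returns an empty list when every line has the expected layout.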

PhenotypeInformation.txt format
TriTyper needs to know for each individual whether they are male or female, case or control and whether they should be included or excluded for analysis. The format of this file needs to be tab-delimited and has to resemble the format of the following example:

Sample1	Control	Include	Male
Sample2	Control	Include	Female
Sample3	Case	Include	Male
Sample4	Case	Include	Female
Sample5	Case	Exclude	Male

Please make sure you place this file in the ‘data/’ directory.
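
The phenotype file can be checked in the same way. The following Python sketch is not part of TriTyper. It assumes the column order of the example above (sample name, case or control status, inclusion, sex) and that the values must be spelled exactly as in the example. Whether TriTyper accepts other spellings is not stated here. The function name is hypothetical.

```python
# Hypothetical helper, not part of TriTyper: validate one line of
# PhenotypeInformation.txt against the example layout shown above.
ALLOWED_VALUES = (
    ("status", ("Case", "Control")),
    ("inclusion", ("Include", "Exclude")),
    ("sex", ("Male", "Female")),
)


def parse_phenotype_line(line):
    """Return a dict describing one sample, or raise ValueError."""
    fields = line.rstrip("\r\n").split("\t")
    if len(fields) != 4:
        raise ValueError(
            f"expected 4 tab-separated fields, found {len(fields)}")
    record = {"sample": fields[0]}
    for value, (key, allowed) in zip(fields[1:], ALLOWED_VALUES):
        # Assumption: values are case-sensitive, as in the example file.
        if value not in allowed:
            raise ValueError(f"unexpected {key} value: {value!r}")
        record[key] = value
    return record
```

For example, the line for Sample5 above is parsed into a record whose inclusion value is ‘Exclude’, so that sample would be left out of the analysis.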

Downloads

  • Program for importing genotype and raw intensity data, discovering triallelic SNPs and imputing triallelic SNPs: TriTyper
  • Example data set: 47 CEU HapMap Samples, typed on Illumina HumanHap300. This Zip file contains a BeadStudio final report file, SNP mappings and phenotype information for 47 CEU HapMap samples, that are called on the Illumina HumanHap300 platform. You can use these files to get familiar with the required data structures, how importing data works, and what to expect from TriTyper Discoverer and TriTyper Imputer.
  • SNP Mappings file: As a service we provide mappings (NCBI 36 assembly) for Illumina HumanHap300 and Illumina HumanHap550 arrays, which are compatible with TriTyper. Place either of these files in the data directory. Before you can use these files, you first need to unzip them.
  • Chromosome X distribution file. If the number of chromosome X SNPs is less than 50 (e.g. when using custom-made Illumina GoldenGate assays), TriTyper cannot determine an accurate intensity distribution. To overcome this, you can use this file for TriTyper Discoverer, which you should place in the data directory. However, we cannot guarantee that this intensity distribution perfectly resembles the distribution of your assays. As such it is recommended to include at least 50 chromosome X SNPs when designing and ordering custom assays.
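
To find out whether your assay falls below the threshold of 50 chromosome X SNPs, you can count them in your mapping file. The following Python sketch is not part of TriTyper. It assumes that chromosome X is labelled ‘X’ in the first column of SNPMappings.txt; if your file uses another label, pass it through the ‘x_labels’ parameter. The function names are hypothetical.

```python
# Hypothetical helpers, not part of TriTyper: count chromosome X SNPs in a
# SNPMappings.txt file (chromosome, position, SNP name; tab-delimited).
def count_chromosome_x_snps(lines, x_labels=("X",)):
    """Count mapping lines whose chromosome field is a chromosome X label."""
    count = 0
    for line in lines:
        fields = line.rstrip("\r\n").split("\t")
        # Assumption: chromosome X is written as one of x_labels.
        if len(fields) == 3 and fields[0] in x_labels:
            count += 1
    return count


def needs_distribution_file(lines, minimum=50, x_labels=("X",)):
    """Return True when there are fewer than `minimum` chromosome X SNPs."""
    return count_chromosome_x_snps(lines, x_labels) < minimum
```

If ‘needs_distribution_file’ returns True for your mapping file, the provided chromosome X distribution file is the fallback described above.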

System Requirements

Please ensure that you are familiar with using the command line, as TriTyper does not yet provide a graphical user interface. Windows 2000 and Windows XP users need a program (e.g. WinZip) to extract files from Zip files. Windows Vista, Mac OS X and Linux have built-in support for extracting Zip files.

TriTyper requires Java 1.5 or higher. Listed here is how to ensure you have a proper Java version installed:

  • Windows and Linux: Please make sure you have a recent Java (JDK or JRE) installation by opening the ‘Command Prompt’ (or a terminal on Linux) and issuing the command ‘java -version’ (without the quotes). If the Java version is 1.5.0 or higher, your Java installation can run TriTyper. If not, head to java.sun.com to download the most current version. If you do not have sufficient rights to install this software, contact your local system administrator.
  • Mac OS X 10.4 Tiger: TriTyper works on Mac OS X 10.4 Tiger once Java 1.5 is installed. Please use ‘Software Update’ to get all updates; this will also give you the latest Java 1.5 release. If you do not have sufficient rights to install this update, contact your local system administrator.
  • Mac OS X 10.5 Leopard: TriTyper works out of the box on Mac OS X 10.5 Leopard.
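
The version check can also be scripted. The following Python sketch is not part of TriTyper. It assumes that the output of ‘java -version’ contains the version number in double quotes after the word ‘version’, as in ‘java version "1.5.0_22"’. The function names are hypothetical.

```python
# Hypothetical helpers, not part of TriTyper: decide from the text printed
# by 'java -version' whether the installation is Java 1.5 or higher.
import re


def java_version(version_output):
    """Return (major, minor) from 'java -version' output, or None."""
    # Assumption: the version appears as: version "<major>[.<minor>...]"
    match = re.search(r'version "(\d+)(?:\.(\d+))?', version_output)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2) or 0)


def supports_trityper(version_output):
    """Return True when the reported Java version is at least 1.5."""
    version = java_version(version_output)
    return version is not None and version >= (1, 5)
```

Java normally writes this text to standard error, so from Python it can be captured with ‘subprocess.run(["java", "-version"], capture_output=True, text=True).stderr’ and passed to ‘supports_trityper’.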