AssociationGG

associationGG is a command-line program written in Java that performs (e)QTL mapping on genetical genomics datasets. Download a copy of associationGG.

Data format

The input data must be supplied in the TriTyper format. You can find tools to convert your data at the TriTyper website, or you can use ImputationTool. We also provide a small example dataset of 47 individuals, which you can use to get familiar with associationGG. To use this data, unzip it into a directory named ‘data’.
AssociationGG needs at least the following files:
FileName | Description
GenotypeMatrix.dat | Binary file containing genotype data. This file can be created from several sources, such as FinalReport format. See TriTyperImporter for reference. The file size in bytes is (number of SNPs * 2 alleles) * number of individuals.
ExpressionData.txt | The gene expression data, in tab-separated text format: one line per probe, one column per gene expression array, one header row.
SNPs.txt | The list of SNPs that are encoded within the GenotypeMatrix.dat file. One line per SNP.
SNPMappings.txt | The genomic positions of the SNPs that are encoded within the GenotypeMatrix.dat file. One line per SNP: the first column contains the chromosome number, the second column contains the SNP position, and the third column contains the rs-id.
Individuals.txt | The list of individuals that are encoded within the GenotypeMatrix.dat file. One line per individual.
PhenotypeInformation.txt | Describes the phenotypes of the individuals. One line per individual: the first column contains the individual ID, the second column states whether the individual is a case (case) or control (control), the third column states whether you want to include (include) or exclude (exclude) the individual, and the fourth column contains the gender (female/male).
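
Before starting a run, it can help to check that the data directory is complete. The following Python sketch is not part of associationGG; it only tests that the files listed above exist and that the size of GenotypeMatrix.dat matches the formula given in the table. It assumes that the size is measured in bytes (one byte per allele) and that SNPs.txt and Individuals.txt contain no header line. The function names are hypothetical.

```python
from pathlib import Path

# File names taken from the table above.
REQUIRED_FILES = [
    "GenotypeMatrix.dat",
    "ExpressionData.txt",
    "SNPs.txt",
    "SNPMappings.txt",
    "Individuals.txt",
    "PhenotypeInformation.txt",
]


def count_lines(path):
    """Count the non-empty lines of a text file."""
    with open(path, "r", encoding="utf-8") as handle:
        return sum(1 for line in handle if line.strip())


def check_dataset(directory):
    """Return a list of problems found in a TriTyper-style data directory."""
    directory = Path(directory)
    problems = [
        f"missing file: {name}"
        for name in REQUIRED_FILES
        if not (directory / name).is_file()
    ]
    if problems:
        return problems

    n_snps = count_lines(directory / "SNPs.txt")
    n_individuals = count_lines(directory / "Individuals.txt")
    # Assumption: one byte per allele, two alleles per SNP per individual.
    expected = n_snps * 2 * n_individuals
    actual = (directory / "GenotypeMatrix.dat").stat().st_size
    if actual != expected:
        problems.append(
            f"GenotypeMatrix.dat is {actual} bytes, expected {expected}"
        )
    return problems
```

Calling check_dataset("data") returns an empty list when all files are present and the sizes agree.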

System requirements

This program requires a Java Virtual Machine (JVM), version 1.6 or higher, and at least 512 MB of RAM.

Usage

The user interface of the program is straightforward: configuration is mainly performed via an XML file. You can run the program by invoking the following command:
java -Xmx2g -jar associationGG.jar settings.xml
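
If you start the program from a script, the command above can be assembled programmatically. The Python sketch below is only an illustration: the function names and the default values for the jar path and heap size are assumptions chosen to match the command shown above (-Xmx is the standard JVM option for the maximum heap size).

```python
import subprocess


def build_command(settings_file, jar="associationGG.jar", max_heap="2g"):
    """Build the java command line as a list of arguments."""
    return ["java", f"-Xmx{max_heap}", "-jar", jar, settings_file]


def run_association(settings_file, **kwargs):
    """Run associationGG and raise an error if it exits with a non-zero status."""
    return subprocess.run(build_command(settings_file, **kwargs), check=True)
```

For example, build_command("settings.xml", max_heap="4g") gives the same command with a 4 GB heap limit.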

Configuration

Configuration of associationGG is performed using an XML file. The configuration options are described in the table below. We also supply an example configuration file.
Section | Option | Description
settings.defaults.qc | snpqccallratethreshold | float: SNP QC threshold for call-rate [default: 0.95]
settings.defaults.qc | snpqchwethreshold | float: SNP QC threshold for Hardy-Weinberg equilibrium [default: 0.001]
settings.defaults.qc | snpqcmafthreshold | float: SNP QC threshold for minor-allele frequency [default: 0.05]
settings.defaults.analysis | analysistype | String: defines the type of analysis to perform. Possible options: cis, trans, cistrans. [default: cis]
settings.defaults.analysis | cisanalysisprobedistance | Integer: declare an effect a cis-effect when the distance between the SNP and the middle of the probe is less than this distance. [default: 250000]
settings.defaults.analysis | threads | Integer: number of threads to use during calculation. When set to a value <= 0, defaults to the number of processors. We recommend 1-6 threads for a cis analysis, and more than 6 for a cistrans or trans analysis.
settings.defaults.multipletesting | type | String: defines the multiple testing correction to perform. Currently, only fdr is available as an option. FWER might be added at a later stage.
settings.defaults.multipletesting | threshold | float: significance threshold [default: 0.05]
settings.defaults.multipletesting | permutations | Integer: number of permutations to perform in order to estimate FDR.
settings.defaults.output | outputdirectory | String: name of the directory where the results should be stored.
settings.defaults.output | outputplotthreshold | float: defines the threshold for creating plots. For large datasets, we advise a threshold in the range 0 < x < 1E-25.
settings.defaults.output | outputplotdirectory | String: name of the directory where plots should be stored.
settings.defaults.output | maxnreqtlresults | Integer: the maximum number of eQTL results to store. [default: 150000]
settings.datasets.dataset | name | String: name of the dataset.
settings.datasets.dataset | location | String: folder location of the genotypematrix.dat file.
settings.datasets.dataset | genometoexpressioncoupling | String: tab-separated file describing the link between genotype and gene expression file. Format: genotype\tgene expression\n
settings.datasets.dataset | expressiondata | String: location of the file containing gene expression data. Data should be in TriTyper format. If not defined, the program looks for settings.datasets.dataset.location/Expressiondata.txt
settings.datasets.dataset | quantilenormalize | Boolean: defines whether the dataset should be quantile normalized. [default: false]
settings.datasets.dataset | logtranform | Boolean: defines whether the dataset should be log transformed. [default: false]
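
As an illustration of how the options above might fit together, the following Python sketch generates a settings file. It assumes that the dotted section names in the table correspond to nested XML elements (for example, settings.defaults.qc becomes <settings><defaults><qc>) and that each option is a child element holding its value as text. This layout is an assumption; the supplied example configuration file remains the authoritative reference. The function name, the dataset name "example" and the output directory "output" are hypothetical.

```python
import xml.etree.ElementTree as ET


def build_settings(options, datasets):
    """Build a settings tree from dotted option paths and dataset dicts.

    options:  {"defaults.qc.snpqcmafthreshold": 0.05, ...}
    datasets: [{"name": ..., "location": ...}, ...]
    """
    root = ET.Element("settings")
    for dotted_path, value in options.items():
        node = root
        # Walk the dotted path, reusing elements that already exist.
        for part in dotted_path.split("."):
            child = node.find(part)
            if child is None:
                child = ET.SubElement(node, part)
            node = child
        node.text = str(value)

    datasets_node = ET.SubElement(root, "datasets")
    for dataset in datasets:
        dataset_node = ET.SubElement(datasets_node, "dataset")
        for key, value in dataset.items():
            ET.SubElement(dataset_node, key).text = str(value)
    return ET.ElementTree(root)


if __name__ == "__main__":
    tree = build_settings(
        {
            "defaults.qc.snpqccallratethreshold": 0.95,
            "defaults.qc.snpqchwethreshold": 0.001,
            "defaults.qc.snpqcmafthreshold": 0.05,
            "defaults.analysis.analysistype": "cis",
            "defaults.analysis.cisanalysisprobedistance": 250000,
            "defaults.multipletesting.type": "fdr",
            "defaults.multipletesting.threshold": 0.05,
            "defaults.output.outputdirectory": "output",  # hypothetical name
        },
        [{"name": "example", "location": "data"}],  # hypothetical dataset
    )
    tree.write("settings.xml", encoding="utf-8", xml_declaration=True)
```

Compare the generated file with the example configuration file before using it for a real analysis.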

Output

AssociationGG produces several output files, which are described in the following table.
File | Description
AssociationGGsettings.xml | Settings that were used during the analysis.
ChromosomeYExpressionPlot.png | Plot of average chromosome Y expression vs sample gender.
eQTLProbesFDR0.05.txt | Contains eQTL results after multiple testing. This file lists the strongest effect for each gene expression probe.
eQTLs.txt | Contains raw eQTL results on real data (not corrected for multiple testing).
eQTLsFDR0.05.txt | Contains eQTL results after multiple testing correction.
eQTLsFDR0.05DotPlot.png | Dot plot of all significant eQTL effects.
eQTLsFDR0.05QQPlot.pdf | QQ plot of test statistics.
eQTLSNPsFDR0.05.txt | Contains eQTL results after multiple testing. This file lists the strongest effect for each SNP.
eQTLsTestedProbes.txt | Describes for each probe how many times it was tested against a SNP.
PermutedEQTLsPermutationRound1.txt | These files (dependent upon number of permutations performed) contain the eQTL mapping results on the permuted data, which is used for FDR estimation.
log.txt | Log of the program output.
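
After a run, you may want to confirm that all of the files listed above were written. The Python sketch below builds the expected file names and reports the ones that are missing. It rests on two assumptions that the table does not state explicitly: that the "FDR0.05" part of the file names follows the configured significance threshold, and that the permutation files are numbered from 1 up to the number of permutations. The function names are hypothetical.

```python
from pathlib import Path


def expected_output_files(fdr_threshold="0.05", permutations=1):
    """List the output file names expected for a given configuration."""
    suffix = f"FDR{fdr_threshold}"  # assumption: name follows the threshold
    names = [
        "AssociationGGsettings.xml",
        "ChromosomeYExpressionPlot.png",
        "eQTLs.txt",
        f"eQTLs{suffix}.txt",
        f"eQTLProbes{suffix}.txt",
        f"eQTLSNPs{suffix}.txt",
        f"eQTLs{suffix}DotPlot.png",
        f"eQTLs{suffix}QQPlot.pdf",
        "eQTLsTestedProbes.txt",
        "log.txt",
    ]
    # Assumption: one file per permutation round, numbered from 1.
    names += [
        f"PermutedEQTLsPermutationRound{i}.txt"
        for i in range(1, permutations + 1)
    ]
    return names


def missing_output_files(output_directory, **kwargs):
    """Return the expected files that are absent from the output directory."""
    out = Path(output_directory)
    return [
        name
        for name in expected_output_files(**kwargs)
        if not (out / name).is_file()
    ]
```

For example, missing_output_files("output", permutations=10) lists the files that are absent from a hypothetical "output" directory after a run with 10 permutations.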