associationGG is a command-line program written in Java that performs (e)QTL mapping on genetical genomics datasets. Download a copy of associationGG
Data format
The input data must be supplied in the TriTyper format. You can convert your data with the tools at the TriTyper website or with ImputationTool. We also provide a small example dataset of 47 individuals, which you can use to get familiar with associationGG. To use this data, unzip it into a directory named ‘data’.
AssociationGG needs at least the following files:
| FileName | Description |
| GenotypeMatrix.dat | Binary file containing genotype data. This file can be created from several sources, such as FinalReport format. See TriTyperImporter for reference. Its size in bytes is (number of SNPs * 2 alleles) * number of individuals. |
| ExpressionData.txt | The gene expression data, in tab-separated text format, one line per probe, one column per gene expression array, one header row. |
| SNPs.txt | The list of SNPs that are encoded within the GenotypeMatrix.dat file. One line per SNP. |
| SNPMappings.txt | The genomic positions of the SNPs that are encoded within the GenotypeMatrix.dat file. One line per SNP: first column contains the chromosome number, second column contains the SNP position, and third column contains the rs-id. |
| Individuals.txt | The list of individuals that are encoded within the GenotypeMatrix.dat file. One line per individual. |
| PhenotypeInformation.txt | This file describes the phenotypes of the individuals. One line per individual: first column contains the individual ID, second column states whether the individual is a case (case) or control (control), third column states whether you want to include (include) or exclude (exclude) the individual, and fourth column contains the gender (female/male). |
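Before starting an analysis, it can help to check that the files in the data directory agree with each other. The following is a minimal sketch of such a check, not part of associationGG itself. It assumes that the size formula above is measured in bytes (one byte per allele), that SNPs.txt and Individuals.txt contain no header row, and that a Java version with `java.nio.file` is available. The class and method names are hypothetical.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

final class TriTyperSizeCheck {

    // Expected size following the formula in the table:
    // (number of SNPs * 2 alleles) * number of individuals.
    // Assumption: the result is a byte count.
    static long expectedGenotypeBytes(long nrSnps, long nrIndividuals) {
        return nrSnps * 2L * nrIndividuals;
    }

    // Counts the non-blank lines of a text file (one entry per line).
    static long countEntries(Path file) throws IOException {
        long n = 0;
        try (BufferedReader in = Files.newBufferedReader(file, StandardCharsets.UTF_8)) {
            String line;
            while ((line = in.readLine()) != null) {
                if (!line.trim().isEmpty()) {
                    n++;
                }
            }
        }
        return n;
    }

    // Returns true when GenotypeMatrix.dat has the size implied by
    // SNPs.txt and Individuals.txt in the same directory.
    static boolean isConsistent(Path dataDir) throws IOException {
        long snps = countEntries(dataDir.resolve("SNPs.txt"));
        long individuals = countEntries(dataDir.resolve("Individuals.txt"));
        long actual = Files.size(dataDir.resolve("GenotypeMatrix.dat"));
        return actual == expectedGenotypeBytes(snps, individuals);
    }

    public static void main(String[] args) throws IOException {
        // Defaults to the 'data' directory used for the example dataset.
        Path dir = Paths.get(args.length > 0 ? args[0] : "data");
        System.out.println(isConsistent(dir) ? "consistent" : "size mismatch");
    }
}
```

A mismatch usually means that one of the text files does not belong to the genotype matrix, for example after re-importing only part of a dataset.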
System requirements
This program requires a Java Virtual Machine (JVM), version 1.6 or higher, and at least 512 MB of RAM to run.
Usage
The user interface of the program is quite straightforward. Configuration is mainly performed via an XML file. You can run the program by invoking the following command, where -Xmx2g sets the maximum Java heap size to 2 GB:
java -Xmx2g -jar associationGG.jar settings.xml
Configuration
Configuration of associationGG is performed using an XML file. The configuration options are described in the table below. We also supply an example configuration file.
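The option names in this section are written as dotted paths such as settings.defaults.qc.snpqccallratethreshold. The sketch below shows one way such a path could be resolved against an XML document, falling back to a default when the element is missing. It assumes that each part of the dotted path corresponds to a nested XML element. That layout is our assumption for illustration, so please consult the example configuration file for the actual structure. The class `SettingsLookup` is hypothetical and is not the parser used by associationGG.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

final class SettingsLookup {

    // Parses an XML string; wraps any parser error in an unchecked exception.
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse settings XML", e);
        }
    }

    // Returns the first direct child element with the given name, or null.
    static Element firstChild(Element parent, String name) {
        NodeList children = parent.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            Node n = children.item(i);
            if (n.getNodeType() == Node.ELEMENT_NODE && n.getNodeName().equals(name)) {
                return (Element) n;
            }
        }
        return null;
    }

    // Walks a dotted path from the root element downwards and returns the
    // text of the final element, or the fallback if any step is missing.
    static String get(Document doc, String dottedPath, String fallback) {
        String[] parts = dottedPath.split("\\.");
        Element current = doc.getDocumentElement();
        if (current == null || !current.getNodeName().equals(parts[0])) {
            return fallback;
        }
        for (int i = 1; i < parts.length; i++) {
            current = firstChild(current, parts[i]);
            if (current == null) {
                return fallback;
            }
        }
        String text = current.getTextContent().trim();
        return text.isEmpty() ? fallback : text;
    }

    public static void main(String[] args) {
        // Assumed layout: one nested element per part of the dotted path.
        String xml = "<settings><defaults><qc>"
                + "<snpqccallratethreshold>0.98</snpqccallratethreshold>"
                + "</qc></defaults></settings>";
        Document doc = parse(xml);
        // Present in the XML, so the configured value is used.
        System.out.println(get(doc, "settings.defaults.qc.snpqccallratethreshold", "0.95"));
        // Absent from the XML, so the documented default is used.
        System.out.println(get(doc, "settings.defaults.qc.snpqcmafthreshold", "0.05"));
    }
}
```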
| Section | Option | Description |
| settings.defaults.qc | snpqccallratethreshold | float: SNP QC threshold for call rate [default: 0.95] |
| | snpqchwethreshold | float: SNP QC threshold for Hardy-Weinberg equilibrium [default: 0.001] |
| | snpqcmafthreshold | float: SNP QC threshold for minor-allele frequency [default: 0.05] |
| settings.defaults.analysis | analysistype | String: defines the type of analysis to perform. Possible options: cis, trans, cistrans. [default: cis] |
| | cisanalysisprobedistance | Integer: declare an effect a cis-effect when the distance between the SNP and the middle of the probe is less than this distance. [default: 250000] |
| | threads | Integer: number of threads to use during calculation. Defaults to the number of processors when <= 0. We recommend using 1-6 threads when performing a cis analysis, and >6 for a cistrans or trans analysis. |
| settings.defaults.multipletesting | type | String: defines the multiple testing correction to perform. Currently, only fdr is available as an option. FWER might be added at a later stage. |
| | threshold | float: significance threshold [default: 0.05] |
| | permutations | Integer: number of permutations to perform in order to estimate FDR. |
| settings.defaults.output | outputdirectory | String: name of the directory where the results should be stored |
| | outputplotthreshold | float: defines the threshold for creating plots. For large datasets, we advise a threshold in the range 0 < x < 1E-25 |
| | outputplotdirectory | String: name of the directory where plots should be stored |
| | maxnreqtlresults | Integer: the maximum number of eQTL results to store [default: 150000] |
| settings.datasets.dataset | name | String: name of the dataset |
| | location | String: location of the folder containing the GenotypeMatrix.dat file. |
| | genometoexpressioncoupling | String: tab-separated file describing the link between genotype and gene expression samples. Format: genotype\tgene expression\n |
| | expressiondata | String: location of the file containing gene expression data. Data should be in TriTyper format. If not defined, the program looks for settings.datasets.dataset.location/ExpressionData.txt |
| | quantilenormalize | Boolean: defines if the dataset should be quantile normalized. Defaults to false. |
| | logtranform | Boolean: defines if the dataset should be log transformed. Defaults to false. |
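To make the cisanalysisprobedistance option more concrete, the sketch below applies the rule from the table: an effect counts as a cis-effect when the distance between the SNP and the middle of the probe is less than the configured distance. This is an illustration rather than the code used by associationGG. We assume here that the SNP and the probe must also be on the same chromosome, and that the probe is described by a start and an end position. The class and parameter names are hypothetical.

```java
final class CisCheck {

    // Returns true when the SNP lies within maxDistance of the probe midpoint.
    // Assumption: SNPs and probes on different chromosomes are never cis.
    static boolean isCis(String snpChr, long snpPos,
                         String probeChr, long probeStart, long probeEnd,
                         long maxDistance) {
        if (!snpChr.equals(probeChr)) {
            return false;
        }
        // Midpoint of the probe, computed without overflowing on large positions.
        long probeMid = probeStart + (probeEnd - probeStart) / 2;
        // Strictly "less than", as stated in the option description.
        return Math.abs(snpPos - probeMid) < maxDistance;
    }

    public static void main(String[] args) {
        long maxDistance = 250000L; // documented default of cisanalysisprobedistance
        // Probe midpoint is 1200025, so the distance to the SNP is 200025.
        System.out.println(isCis("1", 1000000L, "1", 1200000L, 1200050L, maxDistance));
        // Same positions, but SNP and probe are on different chromosomes.
        System.out.println(isCis("1", 1000000L, "2", 1200000L, 1200050L, maxDistance));
    }
}
```

With analysistype set to cis, only SNP-probe pairs passing such a test would be analysed, which is why a cis analysis needs far fewer tests (and threads) than a trans analysis.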
Output
AssociationGG produces several output files, which are described in the following table.
| File | Description |
| AssociationGGsettings.xml | Settings that were used during the analysis |
| ChromosomeYExpressionPlot.png | Plot of average chromosome Y expression vs sample gender. |
| eQTLProbesFDR0.05.txt | Contains eQTL results after multiple testing. This file lists the strongest effect for each gene expression probe. |
| eQTLs.txt | Contains raw eQTL results on real data (not corrected for multiple testing). |
| eQTLsFDR0.05.txt | Contains eQTL results after multiple testing correction. |
| eQTLsFDR0.05DotPlot.png | Dot plot of all significant eQTL effects. |
| eQTLsFDR0.05QQPlot.pdf | QQ plot of test statistics. |
| eQTLSNPsFDR0.05.txt | Contains eQTL results after multiple testing. This file lists the strongest effect for each SNP. |
| eQTLsTestedProbes.txt | Describes for each probe how many times it was tested against a SNP. |
| PermutedEQTLsPermutationRound1.txt | One file per permutation round (the number of files depends on the number of permutations performed). Each file contains the eQTL mapping results on permuted data, which are used for FDR estimation. |
| log.txt | Log of the program output |
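The permuted results are used to estimate the FDR. As a rough illustration of the idea, the sketch below computes a generic permutation-based FDR estimate for a given p-value threshold: the average number of permuted p-values at or below the threshold, divided by the number of real p-values at or below it. This is a common textbook formulation and not necessarily the exact procedure implemented in associationGG. The class name and the use of plain arrays of p-values are assumptions made for this example.

```java
final class PermutationFdr {

    // Counts the p-values that are at or below the threshold.
    static int countAtOrBelow(double[] pValues, double threshold) {
        int count = 0;
        for (double p : pValues) {
            if (p <= threshold) {
                count++;
            }
        }
        return count;
    }

    // Estimated FDR = mean number of permuted hits / number of real hits,
    // capped at 1.0. Returns 0.0 when there are no real hits at the threshold.
    static double estimateFdr(double[] realP, double[][] permutedP, double threshold) {
        if (permutedP.length == 0) {
            throw new IllegalArgumentException("At least one permutation round is required");
        }
        int realHits = countAtOrBelow(realP, threshold);
        if (realHits == 0) {
            return 0.0;
        }
        double permutedHits = 0.0;
        for (double[] round : permutedP) {
            permutedHits += countAtOrBelow(round, threshold);
        }
        double meanPermutedHits = permutedHits / permutedP.length;
        return Math.min(1.0, meanPermutedHits / realHits);
    }

    public static void main(String[] args) {
        // Made-up p-values, for illustration only.
        double[] real = {0.001, 0.002, 0.01, 0.2};
        double[][] permuted = {
            {0.005, 0.3, 0.5, 0.9}, // permutation round 1
            {0.2, 0.4, 0.6, 0.8}    // permutation round 2
        };
        // 3 real hits and on average 0.5 permuted hits at threshold 0.01.
        System.out.println(estimateFdr(real, permuted, 0.01));
    }
}
```

More permutation rounds give a more stable estimate of the number of false positives, at the cost of a proportionally longer run time.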
