Monday 16 June 2014 at 15:00 - SupAgro, Room 11/204 - Timothée Flutre
Using gene expression as an intermediate phenotype is now a common approach to interpreting associations between genetic variants and organismal phenotypes. Until recently, most studies were performed on a single tissue or cell type sampled in a hundred individuals. Statistical methods have gradually improved and can now robustly detect genetic variants associated with changes in gene expression (expression quantitative trait loci, or eQTLs). Because typical effect sizes are small, several recent studies analyzed up to one thousand individuals and showed that most genes have at least one eQTL. However, interpreting such associations is hampered by the fact that easy-to-sample tissues are often irrelevant to the etiology of the organismal phenotypes of interest. This prompted the NIH to fund the Genotype-Tissue Expression (GTEx) pilot project, which aims to build the largest multi-tissue eQTL data resource to date.

In anticipation of such a data set, we recently developed a statistical framework that detects eQTLs with high power by jointly analyzing multiple tissues, and that reliably infers the proportions of tissue-consistent and tissue-specific eQTLs (Flutre et al., 2013). As part of the GTEx consortium, we applied our model to the current data set of 9 tissues from 100 to 200 individuals.

However, most current methods cannot efficiently cope with the larger number of tissues that will become available in the future. The bottleneck is the use of "configurations", binary vectors representing the activity pattern of an eQTL across tissues. With 20 tissues, there are 2^20 (more than 10^6) such configurations. Instead of considering each of them, our improved model learns only the subset of configurations present in the data. Moreover, to avoid the need for permutations in such large-scale studies, we also developed an efficient yet conservative procedure to control the Bayesian false discovery rate (FDR).
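To make the combinatorics concrete, the sketch below represents a configuration as a tuple of 0/1 flags, one per tissue (1 = eQTL active in that tissue), so the number of configurations doubles with every added tissue. It also includes a generic, textbook-style Bayesian FDR rule: reject the largest set of genes whose average posterior probability of having no eQTL stays at or below a target level. This rule is shown only for illustration and is not the conservative procedure described above; all function names are hypothetical, and the posterior null probabilities are assumed to come from some upstream model.

```python
from itertools import product


def all_configurations(n_tissues):
    """Hypothetical helper: yield every binary activity pattern across n_tissues.

    Each configuration is a tuple of 0/1 flags, one per tissue (1 = eQTL active).
    """
    return product((0, 1), repeat=n_tissues)


def n_configurations(n_tissues):
    # Binary vectors of length S number 2**S, so the count doubles per tissue.
    return 2 ** n_tissues


def bayesian_fdr_rejections(post_null, alpha):
    """Hypothetical helper for a generic Bayesian FDR rule (not the author's method).

    post_null[i] is an assumed posterior probability that gene i has no eQTL.
    Genes are taken in increasing order of post_null; the running mean of the
    selected post_null values estimates the Bayesian FDR and never decreases
    along this order, so stopping at the first violation gives the largest set.
    """
    order = sorted(range(len(post_null)), key=lambda i: post_null[i])
    rejected = []
    total = 0.0
    for i in order:
        if (total + post_null[i]) / (len(rejected) + 1) > alpha:
            break
        total += post_null[i]
        rejected.append(i)
    return rejected


if __name__ == "__main__":
    print(list(all_configurations(2)))  # the 4 patterns for 2 tissues
    print(n_configurations(20))         # 1048576, i.e. more than 10**6
    print(bayesian_fdr_rejections([0.01, 0.5, 0.02, 0.9, 0.04], alpha=0.05))
```

With 20 tissues, enumerating all 1,048,576 patterns for every gene quickly becomes impractical, which is why restricting attention to the configurations actually supported by the data matters at that scale.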