Examples of these original and randomized distributions are shown in Figure 2A, 2C, and 2E. The FDR estimate for any given accuracy cutoff was computed as FDR = FP/TP, where FP represents the false positive estimate at the selected accuracy cutoff, and TP represents the total positives. The FP estimate was calculated by evaluating the accuracy of all gene pairs from 10 random permutations of the class labels for each phenotype comparison dataset considered herein. The FP estimate was computed as the average number of pairs above the cutoff accuracy observed in the 10 permutations. With random phenotype label permutations, we assume that all pairs observed above a given accuracy in these datasets should be considered as false positives.
Because all pairs are considered for each permutation, the total number of pair accuracies considered for the null distributions is high. The FDR method accounts for the multiple hypothesis testing inherent in the TSP algorithm. In several cases in this study, no classifier in the accuracy distribution of the randomized data reached the top accuracy observed in the original data, and thus these TSPs technically exhibited a calculated FDR of zero. In these cases, the lowest non-zero FDR value was listed as an upper-bound estimate of the likely true FDR.
Background
Trypanosoma cruzi is a protozoan parasite of the order Kinetoplastida and the causative agent of Chagas disease, one of the so-called neglected diseases that disproportionately affect the poor. The disease is endemic in most Latin American countries, affecting in excess of 8 million people.
Chagas disease has a variable clinical outcome. In its acute form it can lead to death, while in its chronic form it is a debilitating disease producing different associated pathologies: megacolon, megaesophagus and cardiomyopathy, among others. These different clinical outcomes are the result of a complex interplay between environmental factors, the host genetic background and the genetic diversity present in the parasite population. As a result, these different clinical manifestations have been suggested to be, at least in part, due to the genetic diversity of T. cruzi. The T. cruzi species has a structured population, with a predominantly clonal mode of reproduction and considerable phenotypic diversity.
Through the use of a number of molecular markers, the population has been divided into a number of evolutionary lineages, also called discrete typing units (DTUs). Some markers allow the distinction of two or three major lineages, while other experimental strategies, such as RAPD and multilocus isoenzyme electrophoresis, support the distinction of six subdivisions originally designated as DTUs I, IIa, IIb, IIc, IId, and IIe. Recently, this nomenclature was revised as follows: TcI, TcII, TcIII, TcIV, TcV and TcVI.