(In this calculation, due to the small number of observations, we assume that g equals 1.) For the de novo events in siblings,
c1 = 14, c = 15, d = 16, and C = 232. This calculation is performed in the siblings because the observed rare de novo CNVs in this group are assumed to be predominantly nonrisk variants and consequently represent the null distribution. Next, we calculate the chance that two de novo events match at any one of C eCNVRs in probands by using methods from the classic “birthday problem,” which assesses the likelihood of seeing at least one pair of matching birthdays among a given number of people. Our interest was in seeing >2 matches (m) in probands under the null hypothesis of no association with ASD. This calculation is performed empirically by distributing d events at random among C eCNVRs and then counting the maximum number of CNVs falling in the same location. Repeating this experiment one million times, we obtained an estimate of the probability
of finding ≥m counts for ≥1 eCNVR under the null hypothesis. Given the importance of the estimate of eCNVRs in unaffected populations for the determination of significance, we recalculated C based on a combined set of confirmed de novo CNVs in controls described in the literature and obtained a highly similar result (C = 242) (Supplemental Experimental Procedures). Moreover, we determined that the results reported here remain significant under the plausible range of estimates for C (Supplemental Experimental Procedures). The unseen species problem was used to predict the total number of ASD risk loci based on the distribution of de novo CNVs in probands. This required
identification of the de novo CNVs that confer risk; to identify such CNVs we estimated that 76% of de novo CNVs in probands confer risk ((67 de novo CNVs in probands − 16 de novo CNVs expected in siblings)/67 de novo CNVs in probands) and assumed that recurrent de novo CNVs were most likely to be associated with risk and should be included within this 76%. The remainder of the 76% is made up of 27 single-occurrence de novo CNVs (though we do not identify which ones), leading to an estimate of the total number of risk-conferring loci as 130 (c1 = 27, c = 33, d = 51). A similar approach was applied to all de novo CNVs in 3816 probands (count derived from the literature), leading to an estimate of 234 risk-conferring loci (c1 = 59, c = 88, d = 158). Predictors were examined in a logical order, e.g.
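
The two calculations described above, the unseen-species estimate of C and the birthday-problem simulation, can be sketched as follows. This is an illustrative sketch only, not the authors' code. The estimator form C = c/u + g^2 * d * (1 - u)/u with coverage u = 1 - c1/d is an assumption, since the formula itself is not shown in this section; the function names, the default number of trials, and the inputs d = 67 and m = 2 in the simulation example are likewise hypothetical choices made for illustration.

```python
import random
from collections import Counter


def estimate_total_loci(c1, c, d, g=1.0):
    """Unseen-species estimate of the total number of loci C.

    ASSUMED estimator form (not given in the text):
        u = 1 - c1/d                      (estimated sample coverage)
        C = c/u + g**2 * d * (1 - u)/u
    c1: loci hit exactly once; c: distinct loci hit; d: total events;
    g: coefficient of variation of locus hit rates (the text sets g = 1).
    """
    if d <= 0:
        raise ValueError("d must be positive")
    u = 1.0 - c1 / d
    if u <= 0:
        raise ValueError("coverage is zero: every event is a singleton")
    return c / u + (g ** 2) * d * (1.0 - u) / u


def prob_max_match(d, C, m, trials=10_000, seed=0):
    """Monte Carlo estimate of P(some region receives >= m events)
    when d events are placed uniformly at random among C regions."""
    if d <= 0 or C <= 0:
        raise ValueError("d and C must be positive")
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        # Drop d events into C regions and tally how many land in each.
        counts = Counter(rng.randrange(C) for _ in range(d))
        if max(counts.values()) >= m:
            hits += 1
    return hits / trials


if __name__ == "__main__":
    # Sibling inputs from the text: c1 = 14, c = 15, d = 16, g = 1.
    print(estimate_total_loci(14, 15, 16))  # 232.0
    # Illustrative null simulation (d and m chosen for illustration only).
    print(prob_max_match(d=67, C=232, m=2))
```

Under this assumed form the sibling inputs reproduce the reported C = 232. With g = 1 the two proband inputs give roughly 128 and 235, close to but not identical with the reported 130 and 234, so the exact estimator or value of g used for those figures may differ. The default of 10,000 trials keeps the sketch fast; the text describes one million repetitions.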