As posted by Ted K. @ fb [Highlights by me]
A familial, telomere-to-telomere reference for human de novo mutation and recombination from a four-generation pedigree
Using five complementary short- and long-read sequencing technologies, we phased and assembled >95% of each diploid human genome in a four-generation, 28-member family (CEPH 1463) allowing us to systematically assess de novo mutations (DNMs) and recombination.
From this family, we estimate an average of ... 12.4 de novo Y chromosome events per generation.
We accurately assemble ... six Y chromosomes across the generations,
https://www.biorxiv.org/content/10.1101/...5.606142v1
Sample IDs:
NA12889 paternal grandfather
NA12877 father
NA12882 son
NA12883 son
NA12884 son
NA12886 son
NA12888 son
also
NA12891 maternal grandfather
https://www.yfull.com/tree/R-C111293/
Y chromosome mutations. Here, we focus on the ~59.7 Mbp male-specific Y-chromosomal region (MSY, i.e., excluding pseudoautosomal regions), using both read-based and assembly-based approaches to discover DNMs (Methods, Supplementary Notes). Nine male members carry the R1b1a-Z302 Y haplogroup across the four generations (Fig. 5a, Supplementary Table 13). We use the great-grandfather's (G1-NA12889, Fig. 1) chromosome Y assembly as a reference for DNM detection across the 48.8 Mbp MSY. Compared with HiFi read-based calling, the de novo assembly-based approach increases the number of accessible base pairs by >2-fold, yet increases the discovery of de novo SNVs by >7-fold (Methods). In total, we identify 48 de novo SNVs in the MSY across the five G2-G3 males, ranging from 7 to 11 SNVs per Y transmission (mean 9.6, median 10) (Supplementary Table 14). Only two SNVs map to the Y euchromatic regions and one to the pericentromeric region; the remaining 45/48 map to the Yq12 heterochromatic satellite regions (Fig. 5b). Combining both read- and assembly-based approaches, we thus estimate a de novo SNV rate of 1.99 × 10⁻⁷ (95% CI = 1.59-2.39 × 10⁻⁷) for the MSY. Notably, 13/45 (29%) of the Yq12 DNMs had 100% identical matches elsewhere in the Yq12 region (but not at orthologous positions) and could therefore result from interlocus gene conversion events (Methods), consistent with the DYZ1/HSat3A6 and DYZ2/HSat1B organization of the region [36]. We also identify a total of nine de novo indels (<50 bp, homopolymers excluded), ranging from 1 to 3 indels per sample (mean 1.8 events per Y transmission), and five de novo SVs (≥50 bp) (Fig. 5b, Supplementary Table 14). The SVs range from 2,416 to 4,839 bp in size, each affecting one or more entire DYZ2 repeat units, for an average of one SV per Y transmission. Variants detected in the G3 parents of G4 are confirmed by both transmission and read data, supporting the high quality of the variant calls.
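The per-transmission figures above follow from simple arithmetic on the quoted totals. Below is a minimal Python sketch that assumes the 48.8 Mbp MSY as the callable denominator. The published 1.99 × 10⁻⁷ combines read- and assembly-based callable sequence, so this naive value should be close to it but not identical. The constant and function names are illustrative and do not come from the paper.

```python
# Minimal sketch: per-transmission means and a naive per-generation SNV rate
# for the MSY, using only the totals quoted in the paragraph above.
# ASSUMPTION: the denominator is the 48.8 Mbp MSY figure; the published rate
# combines read- and assembly-based callable sequence, so it will differ slightly.

N_TRANSMISSIONS = 5                 # five G2-G3 males, one Y transmission each
DENOVO_COUNTS = {"SNV": 48, "indel": 9, "SV": 5}
MSY_ACCESSIBLE_BP = 48.8e6          # hypothetical callable denominator


def per_transmission_means(counts: dict[str, int], n: int) -> dict[str, float]:
    """Mean number of de novo events per Y transmission, by variant class."""
    return {cls: k / n for cls, k in counts.items()}


def naive_snv_rate(n_snv: int, n_transmissions: int, accessible_bp: float) -> float:
    """De novo SNVs per base pair per generation (one transmission = one generation)."""
    return n_snv / (n_transmissions * accessible_bp)


means = per_transmission_means(DENOVO_COUNTS, N_TRANSMISSIONS)
total_per_generation = sum(DENOVO_COUNTS.values()) / N_TRANSMISSIONS
rate = naive_snv_rate(DENOVO_COUNTS["SNV"], N_TRANSMISSIONS, MSY_ACCESSIBLE_BP)

print(means)
print(total_per_generation)
print(f"{rate:.3e}")
```

The class means reproduce the quoted 9.6 SNVs and 1.8 indels per transmission, plus one SV per transmission. Their sum, 62/5 = 12.4 events, is consistent with the 12.4 de novo Y events per generation quoted in the abstract. The naive SNV rate comes out at about 1.97 × 10⁻⁷.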
Overall, 83% (52/63) of the DNMs identified on chrY (42/48 SNVs, 4/9 indels and 5/5 SVs) are located in regions where short reads cannot be reliably mapped (mapping quality = 0).
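A per-class breakdown like the one above (how many DNMs of each type fall where short reads only map with MAPQ 0) reduces to an interval-membership count. Below is a minimal sketch with entirely hypothetical toy coordinates. How the MAPQ-0 mask is built from alignments is assumed and not shown, and the helper names are illustrative rather than the authors' pipeline.

```python
from bisect import bisect_right
from collections import defaultdict

# Toy sketch: count de novo variants that fall inside a "MAPQ-0 mask",
# i.e. chrY intervals where short reads cannot be uniquely placed.
# All coordinates are HYPOTHETICAL; a real mask would be derived from
# short-read alignments (not shown here).


def build_mask(intervals: list[tuple[int, int]]) -> tuple[list[int], list[int]]:
    """Sort and merge half-open [start, end) intervals into parallel start/end lists."""
    merged: list[list[int]] = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1][1] = max(merged[-1][1], end)  # overlapping or adjacent: extend
        else:
            merged.append([start, end])
    return [s for s, _ in merged], [e for _, e in merged]


def in_mask(pos: int, starts: list[int], ends: list[int]) -> bool:
    """Binary-search for the last interval starting at or before pos."""
    i = bisect_right(starts, pos) - 1
    return i >= 0 and pos < ends[i]


def tally(dnms: list[tuple[str, int]], starts: list[int],
          ends: list[int]) -> dict[str, tuple[int, int]]:
    """Per variant class: (number inside the mask, total number)."""
    counts: dict[str, list[int]] = defaultdict(lambda: [0, 0])
    for cls, pos in dnms:
        counts[cls][1] += 1
        if in_mask(pos, starts, ends):
            counts[cls][0] += 1
    return {cls: (k, n) for cls, (k, n) in counts.items()}


starts, ends = build_mask([(100, 200), (150, 300), (1000, 1500)])
toy_dnms = [("SNV", 120), ("SNV", 250), ("SNV", 500),
            ("indel", 50), ("indel", 1200), ("SV", 1499)]
by_class = tally(toy_dnms, starts, ends)
masked = sum(k for k, _ in by_class.values())
total = sum(n for _, n in by_class.values())
print(by_class)
print(f"{masked}/{total} = {masked / total:.0%}")
```

Merging the mask first keeps each lookup a single binary search, which matters when the mask spans many small satellite intervals.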
[Glossary: SNV (Single Nucleotide Variant), DNM (De Novo Mutation), SV (Structural Variant)]