(10-05-2024, 11:50 AM)nomad01 Wrote: Here's the list of biased SNPs identified by the study "Three assays for in-solution enrichment of ancient human DNA at more than a million SNPs" https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9808625/
Could you do this with your raw data in PLINK format as well?
Sure, you can try it and see if it influences the qpAdm result.
(10-05-2024, 12:43 PM)Genetics189291 Wrote:
(10-05-2024, 11:50 AM)nomad01 Wrote: Here's the list of biased SNPs identified by the study "Three assays for in-solution enrichment of ancient human DNA at more than a million SNPs" https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9808625/ https://pastelink.net/zqh56z40
When analyzing different types of data together (1240k, .SG, .DG, .HO), these should be excluded from the dataset.
The easiest way to do this is to use the PLINK version of the AADR dataset, paste the SNP IDs into a text file and exclude them like this:
plink2 --bfile v62_AADR_1240K --exclude snp.txt --make-bed --out v62_AADR_1240K_filtered
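Before running the exclusion, it can help to check that the IDs pasted into the text file actually match the variant IDs in the dataset's .bim file; if the naming differs, nothing gets removed. Below is a minimal Python sketch of such a check. The function names and the toy rsIDs are made up for illustration. The only assumptions about the real files are the standard .bim layout (variant ID in the second column) and one ID per line in the exclusion list.

```python
import io


def read_ids(handle):
    """Return the set of non-empty SNP IDs, one per line."""
    return {line.strip() for line in handle if line.strip()}


def bim_ids(handle):
    """Return the variant IDs (second column) of a PLINK .bim file."""
    return {line.split()[1] for line in handle if line.strip()}


def exclusion_report(bim_handle, exclude_handle):
    """Count how many listed SNPs are actually present in the dataset."""
    present = bim_ids(bim_handle)
    listed = read_ids(exclude_handle)
    return {
        "listed": len(listed),              # IDs in the exclusion file
        "matched": len(listed & present),   # IDs that would really be removed
        "remaining": len(present - listed), # variants left after filtering
    }


# Toy data with made-up IDs, standing in for the .bim and snp.txt files.
demo_bim = io.StringIO(
    "1\trs1\t0\t100\tA\tG\n"
    "1\trs2\t0\t200\tC\tT\n"
    "2\trs3\t0\t300\tG\tA\n"
)
demo_exclude = io.StringIO("rs2\nrs9\n")

report = exclusion_report(demo_bim, demo_exclude)
print(report)
```

With real files, the two StringIO objects would be replaced by open("v62_AADR_1240K.bim") and open("snp.txt"). A "matched" count far below "listed" would suggest an ID mismatch rather than a clean filter.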
(10-05-2024, 12:58 PM)AimSmall Wrote: Guess my immediate question is why they retain those snps in the latest AADR?
When analyzing only one type of data (for example, the left, right and target pops are all 1240k) they shouldn't cause problems. In that case, using all 1.2 million SNPs is probably more precise than using just a part of them.
Also, this is experimental, and some studies don't filter them at all.
10-05-2024, 01:20 PM (This post was last modified: 10-05-2024, 01:20 PM by Genetics189291.)
(10-05-2024, 12:58 PM)AimSmall Wrote: Guess my immediate question is why they retain those snps in the latest AADR?
You should try excluding them and see if your results change; my son's did.
You ran a K36 with fewer SNPs? The algo was created using the SNPs that existed 10 years ago. It would be like changing a K36 run from using Ancestry v2 to using a 23andMe v5, which has fewer SNPs. You're just getting different results because you have less coverage against that algo.
Not sure that's the difference we're looking for. I'd think a qpAdm delta or Admixture delta would be more meaningful.
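As a rough illustration of what a "delta" between two runs could look like, here is a small Python sketch that compares ancestry proportions from a run on the full SNP set with a run on the filtered set. The source names and numbers are invented placeholders, not results from this thread, and the helper names are hypothetical. For qpAdm in particular, a delta would only be worth reading if it is larger than the standard errors the program reports.

```python
def proportion_delta(before, after):
    """Per-component change (after minus before) over all components seen."""
    components = sorted(set(before) | set(after))
    return {c: round(after.get(c, 0.0) - before.get(c, 0.0), 4) for c in components}


def total_shift(delta):
    """Half the summed absolute change: the share of ancestry reassigned."""
    return round(sum(abs(v) for v in delta.values()) / 2, 4)


# Made-up proportions, purely illustrative.
full_panel = {"SourceA": 0.52, "SourceB": 0.38, "SourceC": 0.10}
filtered_panel = {"SourceA": 0.55, "SourceB": 0.37, "SourceC": 0.08}

delta = proportion_delta(full_panel, filtered_panel)
print(delta)
print(total_shift(delta))
```

The same comparison works for ADMIXTURE Q values, as long as both runs use the same K and the components are matched up by hand first.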
I’ll try merging it with the dataset again. Would something like this be useful for G25? From what I know, it removes SNPs that cause ancestral noise, for more accurate results, unless I’m wrong.
I'm skeptical, because any previous calculators such as K13, K36, or G25 would possibly be looking for those SNPs. With them missing, you're just getting lower-coverage results. I would think those example calculators would have to exclude the SNP list as well to be meaningful if your dataset doesn't have them. I could be wrong.
Yes, filtering and removing biased single nucleotide polymorphisms (SNPs) can significantly improve the accuracy of ancestry analysis. Biased SNPs, including those under selection or those exhibiting population-specific allele frequencies due to genetic drift or recent admixture, can distort ancestry estimates. Here’s why removing such SNPs is beneficial:
1. Reduce Confounding Signals: Certain SNPs may show frequency differences between populations not due to ancestry but due to selection or population-specific bottlenecks. By removing these, the analysis becomes more focused on neutral genetic markers, which more accurately reflect population history and ancestry.
2. Minimize Population-Specific Bias: Some SNPs may disproportionately represent particular populations, leading to over-representation of certain ancestral components. Removing these SNPs can prevent skewing the ancestry proportions and help ensure a more balanced analysis across different population groups.
3. Improve Phylogenetic Inference: SNPs affected by natural selection may reflect adaptation to specific environments rather than shared ancestry. By filtering them out, the remaining neutral SNPs provide a clearer signal of true population relationships and ancestral ties.
4. Reduce False Positives in Admixture Estimates: Admixture analyses can be particularly sensitive to SNPs that deviate from neutral patterns. Removing SNPs with strong population-specific selection or bias can prevent incorrect inferences about gene flow between populations.
5. More Reliable PCA (Principal Component Analysis): In ancestry analysis, PCA is commonly used to identify population structure. Biased SNPs can lead to artificial clustering or spreading of populations, but by removing them, the PCA results will more accurately reflect genetic similarities and differences due to shared ancestry rather than recent evolutionary pressures.
In conclusion, filtering out biased SNPs can help make ancestry analysis more robust, improving the quality of the data used to infer population structure and historical relationships. This leads to more accurate representations of genetic ancestry.
I thought I would ask ChatGPT to see.
From what I can see, the results I got from this were different from 23andMe v5 and sorted out some noise I was getting in my son's results. I also got less noise on my combined kit: less Oceanian and Italian, more Iberian, Arab and Near Eastern, and the East African was gone, absorbed into West African. It looked historically more correct for me.
(10-05-2024, 03:41 PM)AimSmall Wrote: That question wasn't in the context of comparing to calculators using those SNPs. You're missing the point. The calculators would also have to have those biased SNPs removed for your comparison to be meaningful. You're only removing biased SNPs on one side of the comparison.
Produce a new PCA calculator with those SNPs removed from both the source comparison (AADR or whatever) AND your raw samples... then make the comparison.
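One way to keep both sides of that comparison on the same footing is to build a single SNP list (present in both datasets, minus the flagged SNPs) and restrict the reference and the raw samples to it before running the PCA. The Python sketch below shows only that set logic. The function names, the toy rsIDs and the output file name are hypothetical, and the PCA itself is not covered.

```python
import os
import tempfile


def shared_unbiased_snps(reference_ids, sample_ids, biased_ids):
    """SNPs present in BOTH datasets, minus the flagged (biased) list."""
    return (set(reference_ids) & set(sample_ids)) - set(biased_ids)


def write_id_list(ids, path):
    """Write one SNP ID per line, sorted for reproducibility."""
    with open(path, "w") as out:
        for snp in sorted(ids):
            out.write(snp + "\n")


# Toy ID sets with made-up rsIDs.
reference = {"rs1", "rs2", "rs3", "rs4"}   # e.g. IDs from the reference .bim
samples = {"rs1", "rs2", "rs4", "rs5"}     # e.g. IDs from the raw-data .bim
biased = {"rs2"}                           # e.g. IDs from snp.txt

keep = shared_unbiased_snps(reference, samples, biased)

# Hypothetical output file; written to a temp directory in this demo.
keep_path = os.path.join(tempfile.mkdtemp(), "keep_snps.txt")
write_id_list(keep, keep_path)
print(sorted(keep))
```

The resulting list could then be passed to both datasets with plink2's --extract option, so that the reference and the samples go into the PCA with exactly the same variants.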
I’ll do that when I get back from work and post it here.