Posts: 503
Threads: 72
Joined: Nov 2023
Gender: Male
Ethnicity: Arab
(10-05-2024, 06:06 PM)nomad01 Wrote: I tested the AG, SG and DG versions of the same sample, with and without removing the biased SNPs:
Without filtering
Code:
target    p-value  Turkey_Marmara_Barcin_N.AG  Croatia_Mesolithic.AG
I1496.AG  0.425    0.973                       0.0268
I1496.SG  0.0356   0.963                       0.0374
I1496.DG  0.0190   0.963                       0.0372
With filtering
Code:
target    p-value  Turkey_Marmara_Barcin_N.AG  Croatia_Mesolithic.AG
I1496.AG  0.823    0.974                       0.0257
I1496.SG  0.685    0.983                       0.0168
I1496.DG  0.840    0.979                       0.0213
Without filtering, only the AG version has a passing p-value. With filtering, all three do. The left and right pops are also exclusively AG. It's crazy that so many academic papers carelessly mix different data types.
When I get home I’ll merge my new data and see if there are any differences.
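The pass/fail reading of the quoted results can be checked with a few lines of Python. This is only a sketch: the 0.05 cutoff is an assumption (a common convention for qpAdm p-values, not something stated in the thread), and the dictionary simply restates the p-values posted above.

```python
# p-values copied from the quoted results; the 0.05 cutoff is an assumed convention.
P_CUTOFF = 0.05

p_values = {
    "without filtering": {"I1496.AG": 0.425, "I1496.SG": 0.0356, "I1496.DG": 0.0190},
    "with filtering": {"I1496.AG": 0.823, "I1496.SG": 0.685, "I1496.DG": 0.840},
}


def passing_targets(results, cutoff=P_CUTOFF):
    """Return the targets whose model p-value is at or above the cutoff."""
    return [target for target, p in results.items() if p >= cutoff]


for label, results in p_values.items():
    print(label, "->", passing_targets(results))
```

With a different cutoff (say 0.01) the unfiltered SG and DG runs would also count as passing, so the conclusion depends on the threshold you pick.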
Posts: 863
Threads: 48
Joined: Aug 2023
Gender: Male
Ethnicity: Colonial American
Nationality: American
Y-DNA (P): R1b-U152 >R-FTA96415
Y-DNA (M): I2-P37 > I-BY77146
mtDNA (M): J1b1a1a
mtDNA (P): H66a
(10-05-2024, 06:08 PM)nomad01 Wrote: (10-05-2024, 03:50 PM)Light Wrote: So it becomes 883K SNPs now from 1233K, interesting
There should be 475,425 SNPs remaining in the 1240k dataset, not 800k.
In reviewing the paper, where did you get your 800K SNP list? A specific supplement?
Two-thirds of the AADR being biased is a tough pill to swallow.
Posts: 454
Threads: 5
Joined: May 2024
I also found that sex chromosomes need to be excluded if you are using PLINK data in qpAdm; otherwise the p-values are abysmal.
The relevant command:
plink2 --bfile v62_AADR_1240K --exclude snp.txt --chr 1-22 --make-bed --out v62_AADR_1240K_filtered
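To confirm that the filtered fileset really contains only autosomes, the .bim file can be scanned directly. A rough sketch in Python, assuming the standard six-column PLINK .bim layout (chromosome code in the first column); the function names are made up for illustration.

```python
from collections import Counter

# Chromosome codes 1-22 as they normally appear in a .bim file.
AUTOSOMES = {str(n) for n in range(1, 23)}


def chromosome_counts(bim_path):
    """Count variants per chromosome code in a PLINK .bim file."""
    counts = Counter()
    with open(bim_path) as handle:
        for line in handle:
            fields = line.split()
            if fields:  # skip blank lines
                counts[fields[0]] += 1
    return counts


def non_autosomal(counts):
    """Return the chromosome codes outside 1-22, with their variant counts."""
    return {chrom: n for chrom, n in counts.items() if chrom not in AUTOSOMES}
```

Calling `non_autosomal(chromosome_counts("v62_AADR_1240K_filtered.bim"))` should give an empty dict if the `--chr 1-22` filter did its job. If your .bim uses prefixed codes such as `chr1`, the `AUTOSOMES` set would need adjusting.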
Posts: 440
Threads: 1
Joined: Sep 2024
10-05-2024, 06:14 PM
(This post was last modified: 10-05-2024, 06:14 PM by Light.)
(10-05-2024, 04:03 PM)Light Wrote: PACKEDPED:- https://drive.google.com/file/d/1---gS5I...drive_link
Only 475K have overlap with 1233K AADR, so it's worse than even HO, basically trash
Yes, I mention it here
Posts: 454
Threads: 5
Joined: May 2024
(10-05-2024, 06:12 PM)AimSmall Wrote: (10-05-2024, 06:08 PM)nomad01 Wrote: (10-05-2024, 03:50 PM)Light Wrote: So it becomes 883K SNPs now from 1233K, interesting
There should be 475,425 SNPs remaining in the 1240k dataset, not 800k.
In reviewing the paper, where did you get your 800K SNP list? A specific supplement?
Two-thirds of the AADR being biased is a tough pill to swallow.
It's the file "supp_gr.276728.122_Supplemental_Data_1_twistSNP_1352529bp.txt.zip" in the supplementary data. In the last column, passing SNPs have the value 1 and biased ones have 0.
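A possible way to turn that supplement into the `snp.txt` list passed to `--exclude` in the plink2 command earlier in the thread. This is a sketch under assumptions the thread does not confirm: the unzipped file is whitespace-delimited, the SNP ID sits in the first column, and those IDs match the variant IDs in the .bim. Only the meaning of the last column (1 = passing, 0 = biased) comes from the post above.

```python
def write_exclude_list(supp_path, out_path, id_col=0):
    """Write the IDs of SNPs flagged 0 (biased) in the last column, one per line.

    id_col is an assumption: adjust it to whichever column holds the SNP ID.
    Returns the number of IDs written.
    """
    written = 0
    with open(supp_path) as src, open(out_path, "w") as dst:
        for line in src:
            fields = line.split()
            # Blank lines, and any header whose last field is not "0", are skipped.
            if fields and fields[-1] == "0":
                dst.write(fields[id_col] + "\n")
                written += 1
    return written
```

After unzipping, something like `write_exclude_list("supp_gr.276728.122_Supplemental_Data_1_twistSNP_1352529bp.txt", "snp.txt")` would produce the list. It is worth comparing the returned count with the number of SNPs you expect to drop before running plink2.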
Posts: 454
Threads: 5
Joined: May 2024
(10-05-2024, 04:03 PM)Light Wrote: PACKEDPED:- https://drive.google.com/file/d/1---gS5I...drive_link
Only 475K have overlap with 1233K AADR, so it's worse than even HO, basically trash
From the Genetic History of the Balkans study:
Quote: The following analyses were performed using the 'HO' dataset (Materials and Methods), after filtering out 366,668 SNPs (224,207 SNPs remained) known to produce biases when co-analyzing 1240k data (most of our ancient samples) with other types of data [163],
In the HO dataset, indeed, very few SNPs remain after filtering, but still enough to run analyses. I've run successful qpAdm models on samples with fewer than 10k SNPs.
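For scale, the counts quoted in this thread can be turned into retained fractions. A small Python sketch: the HO numbers are taken from the quote above, while the 1240k total is the rounded "1233K" figure used in the thread, so that percentage is approximate.

```python
def retained_fraction(remaining, total):
    """Fraction of SNPs left after filtering."""
    return remaining / total


# HO dataset: 366,668 SNPs filtered out, 224,207 remained (from the quoted paper).
ho_total = 366_668 + 224_207
ho_kept = retained_fraction(224_207, ho_total)

# 1240k dataset: 475,425 remaining out of roughly 1,233,000 (rounded, assumed total).
aadr_total = 1_233_000
aadr_kept = retained_fraction(475_425, aadr_total)
aadr_removed = aadr_total - 475_425

print(f"HO:    {ho_kept:.1%} kept of {ho_total:,}")
print(f"1240k: {aadr_kept:.1%} kept, about {aadr_removed:,} removed")
```

With these inputs both panels keep a little under 40% of their SNPs, so the 1240k filtering is about as aggressive as what the Balkans study applied to HO.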
Posts: 440
Threads: 1
Joined: Sep 2024
10-05-2024, 06:24 PM
(This post was last modified: 10-05-2024, 06:25 PM by Light.)
Yes, so for your half-baked future models you can use this useless, garbage AADR. That study is probably garbage itself if it suggests 833k of the AADR is biased, or else you misinterpreted something from it. There is no reason authors like Swapan Mallick, Nick Patterson and David Reich would host the v62 AADR with 1233K SNPs if 833K were biased.
Posts: 454
Threads: 5
Joined: May 2024
(10-05-2024, 06:24 PM)Light Wrote: Yes, so for your half-baked future models you can use this useless, garbage AADR. That study is probably garbage itself if it suggests 833k of the AADR is biased
Calm down, I'm not saying you must run models like this. I'm just experimenting and sharing my results.
Posts: 454
Threads: 5
Joined: May 2024
Just to clarify, "biased SNPs" doesn't mean "bad SNPs".
They are SNPs where different chips tend to read different values for the same individual.
It's just like testing with both 23andMe and Ancestry: the two raw data files wouldn't be 100% identical.
Those SNPs are not worthless. Some future version of qpAdm will probably be able to detect and fix them automatically.
For example, the "amateur" G25 is almost immune to data type differences, regardless of some other flaws it may have.
Posts: 503
Threads: 72
Joined: Nov 2023
Gender: Male
Ethnicity: Arab
(10-05-2024, 06:59 PM)nomad01 Wrote: Just to clarify, "biased SNPs" doesn't mean "bad SNPs".
They are SNPs where different chips tend to read different values for the same individual.
It's just like testing with both 23andMe and Ancestry: the two raw data files wouldn't be 100% identical.
Those SNPs are not worthless. Some future version of qpAdm will probably be able to detect and fix them automatically.
For example, the "amateur" G25 is almost immune to data type differences, regardless of some other flaws it may have.
Global25 is immune? Can you go into more depth on what you mean by it being immune to data type differences?
Posts: 503
Threads: 72
Joined: Nov 2023
Gender: Male
Ethnicity: Arab
(10-05-2024, 06:24 PM)Light Wrote: Yes, so for your half-baked future models you can use this useless, garbage AADR. That study is probably garbage itself if it suggests 833k of the AADR is biased, or else you misinterpreted something from it. There is no reason authors like Swapan Mallick, Nick Patterson and David Reich would host the v62 AADR with 1233K SNPs if 833K were biased.
If they had said 16,000 or even 100,000, that would have been okay, but 833k is way too much. Clearly something has gone wrong in how they identified these biased SNPs.
Posts: 503
Threads: 72
Joined: Nov 2023
Gender: Male
Ethnicity: Arab
(10-05-2024, 06:59 PM)nomad01 Wrote: Just to clarify, "biased SNPs" doesn't mean "bad SNPs".
They are SNPs where different chips tend to read different values for the same individual.
It's just like testing with both 23andMe and Ancestry: the two raw data files wouldn't be 100% identical.
Those SNPs are not worthless. Some future version of qpAdm will probably be able to detect and fix them automatically.
For example, the "amateur" G25 is almost immune to data type differences, regardless of some other flaws it may have.
Regardless of whether they're bad SNPs or not, biased SNPs can cause issues for ancestry analysis, for example by adding noise. That said, 833k biased SNPs is way too much, and I don't agree with that paper.
Posts: 454
Threads: 5
Joined: May 2024
(10-05-2024, 07:57 PM)Genetics189291 Wrote: (10-05-2024, 06:59 PM)nomad01 Wrote: Just to clarify, "biased SNPs" doesn't mean "bad SNPs".
They are SNPs where different chips tend to read different values for the same individual.
It's just like testing with both 23andMe and Ancestry: the two raw data files wouldn't be 100% identical.
Those SNPs are not worthless. Some future version of qpAdm will probably be able to detect and fix them automatically.
For example, the "amateur" G25 is almost immune to data type differences, regardless of some other flaws it may have.
Global25 is immune? Can you go into more depth on what you mean by it being immune to data type differences?
For example, in the G25 sheet you have Turkey_N.DG, Turkey_N.SG, Turkey_N_noUDG and Turkey_N, and you can use all four interchangeably; all of them work well as an EEF source.
In qpAdm, by contrast, the model can fail if you replace Turkey_N.DG with Turkey_N.
Posts: 503
Threads: 72
Joined: Nov 2023
Gender: Male
Ethnicity: Arab
(10-05-2024, 08:13 PM)nomad01 Wrote: (10-05-2024, 07:57 PM)Genetics189291 Wrote: (10-05-2024, 06:59 PM)nomad01 Wrote: Just to clarify, "biased SNPs" doesn't mean "bad SNPs".
They are SNPs where different chips tend to read different values for the same individual.
It's just like testing with both 23andMe and Ancestry: the two raw data files wouldn't be 100% identical.
Those SNPs are not worthless. Some future version of qpAdm will probably be able to detect and fix them automatically.
For example, the "amateur" G25 is almost immune to data type differences, regardless of some other flaws it may have.
Global25 is immune? Can you go into more depth on what you mean by it being immune to data type differences?
For example, in the G25 sheet you have Turkey_N.DG, Turkey_N.SG, Turkey_N_noUDG and Turkey_N, and you can use all four interchangeably; all of them work well as an EEF source.
In qpAdm, by contrast, the model can fail if you replace Turkey_N.DG with Turkey_N.
Doesn't that make G25 inaccurate in that instance, then?
Posts: 454
Threads: 5
Joined: May 2024
(10-05-2024, 08:19 PM)Genetics189291 Wrote: (10-05-2024, 08:13 PM)nomad01 Wrote: (10-05-2024, 07:57 PM)Genetics189291 Wrote: Global25 is immune? Can you go into more depth on what you mean by it being immune to data type differences?
For example, in the G25 sheet you have Turkey_N.DG, Turkey_N.SG, Turkey_N_noUDG and Turkey_N, and you can use all four interchangeably; all of them work well as an EEF source.
In qpAdm, by contrast, the model can fail if you replace Turkey_N.DG with Turkey_N.
Doesn't that make G25 inaccurate in that instance, then?
No, it makes G25 more reliable, because it recognizes that these are all the same population and isn't distracted by chip differences.