List of biased SNPs for qpadm
#31
(10-05-2024, 06:06 PM)nomad01 Wrote: I tested the AG, SG and DG version of the same sample with and without removing the biased SNPs:

Without filtering

Code:
I1496.AG
p-value  Turkey_Marmara_Barcin_N.AG  Croatia_Mesolithic.AG
0.425    0.973                       0.0268

I1496.SG
p-value  Turkey_Marmara_Barcin_N.AG  Croatia_Mesolithic.AG
0.0356   0.963                       0.0374

I1496.DG
p-value  Turkey_Marmara_Barcin_N.AG  Croatia_Mesolithic.AG
0.0190   0.963                       0.0372


With filtering

Code:
I1496.AG
p-value  Turkey_Marmara_Barcin_N.AG  Croatia_Mesolithic.AG
0.823    0.974                       0.0257

I1496.SG
p-value  Turkey_Marmara_Barcin_N.AG  Croatia_Mesolithic.AG
0.685    0.983                       0.0168

I1496.DG
p-value  Turkey_Marmara_Barcin_N.AG  Croatia_Mesolithic.AG
0.840    0.979                       0.0213

Without filtering, only the AG version has a passing p-value. With filtering, all three do. The left and right pops are also exclusively AG. It's crazy that so many academic papers carelessly mix different data types.
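
For anyone who wants to check the pass/fail calls, here is a minimal sketch that applies a cutoff to the p-values from the two tables. The 0.05 cutoff is an assumption (no threshold is stated in the post), and the output file name pvalue_check.txt is made up for the example.

```shell
# p-values copied from the tables above: sample, unfiltered, filtered.
# The 0.05 cutoff is an assumption; the post does not state one.
printf '%s\n' \
  'I1496.AG 0.425 0.823' \
  'I1496.SG 0.0356 0.685' \
  'I1496.DG 0.0190 0.840' |
  awk -v cut=0.05 '{
    # Force numeric comparison by adding 0 to both sides.
    printf "%s unfiltered=%s filtered=%s\n", $1,
      ($2 + 0 >= cut + 0 ? "pass" : "fail"),
      ($3 + 0 >= cut + 0 ? "pass" : "fail")
  }' | tee pvalue_check.txt
```

With a stricter or looser cutoff, only the `cut` value needs changing.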

When I get home I’ll merge my new data and see if there are any differences.
#32
(10-05-2024, 06:08 PM)nomad01 Wrote:
(10-05-2024, 03:50 PM)Light Wrote: So it becomes 883K SNPs now from 1233K, interesting

There should be 475,425 SNPs remaining in the 1240k dataset, not 800k.

In reviewing the paper, where did you get your 800K SNP list? A specific supplement?

Two-thirds of the AADR being biased is a tough pill to swallow.
#33
I also found that sex chromosomes need to be excluded if you are using plink data in qpadm. Otherwise the p-values are abysmal.

The relevant command:
plink2 --bfile v62_AADR_1240K --exclude snp.txt --chr 1-22 --make-bed --out v62_AADR_1240K_filtered
#34
(10-05-2024, 04:03 PM)Light Wrote: PACKEDPED:- https://drive.google.com/file/d/1---gS5I...drive_link

Only 475K have overlap with 1233K AADR, so it's worse than even HO, basically trash

Yes, I mention it here
#35
(10-05-2024, 06:12 PM)AimSmall Wrote:
(10-05-2024, 06:08 PM)nomad01 Wrote:
(10-05-2024, 03:50 PM)Light Wrote: So it becomes 883K SNPs now from 1233K, interesting

There should be 475,425 SNPs remaining in the 1240k dataset, not 800k.

In reviewing the paper, where did you get your 800K SNP list? A specific supplement?

Two-thirds of the AADR being biased is a tough pill to swallow.

It's the file "supp_gr.276728.122_Supplemental_Data_1_twistSNP_1352529bp.txt.zip" in the supplementary data. In the last column, passing SNPs have the value 1 and biased ones have 0.
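
A rough sketch of how an exclusion list could be pulled out of that file. Only the 1/0 flag in the last column is described above; the rest of the layout (whitespace-delimited, SNP ID in the first column) is an assumption. The file twist_example.txt and its three rows are made up so the snippet runs on its own, and snp.txt is the list name used in the plink2 command earlier in the thread.

```shell
# Made-up stand-in for the unzipped supplement: ID, chromosome, position, pass flag.
printf 'rs1\t1\t1000\t1\nrs2\t1\t2000\t0\nrs3\t2\t3000\t0\n' > twist_example.txt

# Keep the first column (assumed to be the SNP ID) of rows whose last column is exactly 0 (biased).
awk '$NF == "0" { print $1 }' twist_example.txt > snp.txt

# Count the SNPs that would be excluded.
wc -l < snp.txt
```

Note that plink2 --exclude matches on variant IDs, so the IDs in snp.txt have to be the same ones used in the AADR .bim file for the filter to catch anything.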
#36
(10-05-2024, 04:03 PM)Light Wrote: PACKEDPED:- https://drive.google.com/file/d/1---gS5I...drive_link

Only 475K have overlap with 1233K AADR, so it's worse than even HO, basically trash

From the Genetic History of the Balkans study:

Quote: The following analyses were performed using the 'HO' dataset (Materials and Methods), after filtering out 366,668 SNPs (224,207 SNPs remained) known to produce biases when co-analyzing 1240k data (most of our ancient samples) with other types of data [163],

In the HO dataset, very few SNPs indeed remain after filtering, but still enough to run analyses. I've run successful qpadm models on samples with fewer than 10k SNPs.
#37
Yes so for your half-baked future models you can use this garbage AADR of no use. That study is probably garbage itself if it suggests 833k of AADR is biased, or else you misinterpreted something from it. There is no reason authors like Swapan Mallick, Nick Patterson, and David Reich would host v62 AADR with 1233K SNPs if 833K were biased.
#38
(10-05-2024, 06:24 PM)Light Wrote: Yes so for your half-baked future models you can use this garbage AADR of no use. That study is probably garbage itself if it suggests 833k of AADR is biased

Calm down, I'm not saying you must run models like this. I'm just experimenting and sharing my results.
#39
Just to clarify, "biased SNPs" doesn't mean bad SNPs.
They're SNPs where different chips tend to read different values for the same individual.

It's like testing with both 23andMe and Ancestry: the two raw data files wouldn't be 100% identical.

Those SNPs are not worthless. Some future version of qpadm will probably be able to detect and correct for them automatically.
For example, the "amateur" G25 is almost immune to data type differences, regardless of any other flaws it may have.
#40
(10-05-2024, 06:59 PM)nomad01 Wrote: Just to clarify, "biased SNPs" doesn't mean bad SNPs.
They're SNPs where different chips tend to read different values for the same individual.

It's like testing with both 23andMe and Ancestry: the two raw data files wouldn't be 100% identical.

Those SNPs are not worthless. Some future version of qpadm will probably be able to detect and correct for them automatically.
For example, the "amateur" G25 is almost immune to data type differences, regardless of any other flaws it may have.

Global25 is immune? Can you go into more depth on what you mean by it being immune to data type differences?
#41
(10-05-2024, 06:24 PM)Light Wrote: Yes so for your half-baked future models you can use this garbage AADR of no use. That study is probably garbage itself if it suggests 833k of AADR is biased, or else you misinterpreted something from it. There is no reason authors like Swapan Mallick, Nick Patterson, and David Reich would host v62 AADR with 1233K SNPs if 833K were biased.

If they had said 16,000 or even 100,000, that would have been okay, but 833k is way too much. Clearly something has gone wrong in how they’ve identified these biased SNPs.
#42
(10-05-2024, 06:59 PM)nomad01 Wrote: Just to clarify, "biased SNPs" doesn't mean bad SNPs.
They're SNPs where different chips tend to read different values for the same individual.

It's like testing with both 23andMe and Ancestry: the two raw data files wouldn't be 100% identical.

Those SNPs are not worthless. Some future version of qpadm will probably be able to detect and correct for them automatically.
For example, the "amateur" G25 is almost immune to data type differences, regardless of any other flaws it may have.

Regardless of whether they’re bad SNPs or not, biased SNPs can cause issues for ancestry analysis, e.g. noise. That said, 833k biased SNPs is way too much, and I don’t agree with that paper.
#43
(10-05-2024, 07:57 PM)Genetics189291 Wrote:
(10-05-2024, 06:59 PM)nomad01 Wrote: Just to clarify, "biased SNPs" doesn't mean bad SNPs.
They're SNPs where different chips tend to read different values for the same individual.

It's like testing with both 23andMe and Ancestry: the two raw data files wouldn't be 100% identical.

Those SNPs are not worthless. Some future version of qpadm will probably be able to detect and correct for them automatically.
For example, the "amateur" G25 is almost immune to data type differences, regardless of any other flaws it may have.

Global25 is immune? Can you go into more depth on what you mean by it being immune to data type differences?

For example, in the G25 sheet you have Turkey_N.DG, Turkey_N.SG, Turkey_N_noUDG, and Turkey_N, and you can use all four interchangeably; all can act well as an EEF source.
In qpadm, the model can fail if you replace Turkey_N.DG with Turkey_N.
#44
(10-05-2024, 08:13 PM)nomad01 Wrote:
(10-05-2024, 07:57 PM)Genetics189291 Wrote:
(10-05-2024, 06:59 PM)nomad01 Wrote: Just to clarify, "biased SNPs" doesn't mean bad SNPs.
They're SNPs where different chips tend to read different values for the same individual.

It's like testing with both 23andMe and Ancestry: the two raw data files wouldn't be 100% identical.

Those SNPs are not worthless. Some future version of qpadm will probably be able to detect and correct for them automatically.
For example, the "amateur" G25 is almost immune to data type differences, regardless of any other flaws it may have.

Global25 is immune? Can you go into more depth on what you mean by it being immune to data type differences?

For example, in the G25 sheet you have Turkey_N.DG, Turkey_N.SG, Turkey_N_noUDG, and Turkey_N, and you can use all four interchangeably; all can act well as an EEF source.
In qpadm, the model can fail if you replace Turkey_N.DG with Turkey_N.

Doesn’t that make G25 inaccurate in that instance, then?
#45
(10-05-2024, 08:19 PM)Genetics189291 Wrote:
(10-05-2024, 08:13 PM)nomad01 Wrote:
(10-05-2024, 07:57 PM)Genetics189291 Wrote: Global25 is immune? Can you go into more depth on what you mean by it being immune to data type differences?

For example, in the G25 sheet you have Turkey_N.DG, Turkey_N.SG, Turkey_N_noUDG, and Turkey_N, and you can use all four interchangeably; all can act well as an EEF source.
In qpadm, the model can fail if you replace Turkey_N.DG with Turkey_N.

Doesn’t that make G25 inaccurate in that instance, then?
No, it makes G25 more reliable because it can recognize these are all the same population and isn't distracted by chip differences.