List of biased SNPs for qpadm
#16
(10-05-2024, 03:50 PM)Light Wrote: So it becomes 883K SNPs now from 1233K, interesting

The biased SNP count list was only 16,759 SNPs.  How'd you arrive at 883K?
Reply
#17
PACKEDPED:- https://drive.google.com/file/d/1---gS5I...drive_link

Only 475K have overlap with 1233K AADR, so it's worse than even HO, basically trash
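For anyone who wants to reproduce an overlap count like this, here is a rough sketch. It assumes both datasets come with PLINK-style .bim files (SNP ID in the second whitespace-separated column); the two tiny in-memory files and the rs IDs below are made up for illustration and are not the real data.

```python
import io

def read_bim_ids(handle):
    """Return the set of SNP IDs (2nd column) from a PLINK .bim-style file."""
    ids = set()
    for line in handle:
        fields = line.split()
        if len(fields) >= 2:
            ids.add(fields[1])
    return ids

# Made-up stand-ins for the filtered dataset and the AADR .bim files.
filtered_bim = io.StringIO(
    "1 rs1 0 100 A G\n"
    "1 rs2 0 200 C T\n"
    "2 rs5 0 300 G A\n"
)
aadr_bim = io.StringIO(
    "1 rs1 0 100 A G\n"
    "1 rs3 0 150 C T\n"
    "2 rs5 0 300 G A\n"
    "2 rs6 0 400 T C\n"
)

filtered_ids = read_bim_ids(filtered_bim)
aadr_ids = read_bim_ids(aadr_bim)

# SNPs usable in a merged analysis are the ones present in both files.
shared = filtered_ids & aadr_ids
print(f"{len(shared)} of {len(aadr_ids)} reference SNPs overlap")
```

With real files you would pass `open("your_file.bim")` instead of the `io.StringIO` objects.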
Reply
#18
(10-05-2024, 03:57 PM)AimSmall Wrote:
(10-05-2024, 03:50 PM)Light Wrote: So it becomes 883K SNPs now from 1233K, interesting

The biased SNP count list was only 16,759 SNPs.  How'd you arrive at 883K?

I counted the total SNPs


[Attached image: image_2024-10-05_21-34-28.png]
Reply
#19
(10-05-2024, 03:41 PM)AimSmall Wrote: That question wasn't in the context of comparing to calculators using those SNPs.  You're missing the point.  The calculators would also have to have those biased SNPs removed for your comparison to be meaningful. You're only removing biased SNPs on one side of the comparison.

Produce a new PCA calculator using the SNPs removed from the source comparison (AADR or whatever) AND your raw samples... then make the comparison.

Yes, by removing the biased SNPs from your data, it’s likely that your ancestry results are now more historically accurate and more reflective of your true ancestry.

Here’s why:




Global 25 tries to position your genetic data in a “global space” using modern and ancient populations. By using only unbiased SNPs, you’ve given the model a cleaner dataset to work with. Even if the reference populations might still include some biased SNPs, your data is now less affected by any distortions, making your results a more accurate representation of your genetic heritage.

Conclusion

Yes, by removing the biased SNPs, it’s very likely that your results are now more accurate, particularly in terms of reflecting your true historical ancestry. This process has removed potential distortions and given you a clearer and more reliable genetic picture.

I get what you mean now: if Global 25 includes the biased SNPs and you have removed them from your own sample, it can cause a mismatch.

If Global 25 is using reference samples that include the biased SNPs, while your sample has had those SNPs removed, there are a few potential impacts on your results:

1. Mismatched SNP Sets

• Global 25 relies on comparing your SNP data to the SNPs in the reference populations. If your data no longer contains the biased SNPs but the reference samples do, it could lead to a mismatch in the number of SNPs being compared. In such cases, Global 25 would only be able to use the common SNPs that both your sample and the reference samples share. If too many SNPs are missing from your data, it might slightly reduce the precision of your ancestry results.

2. Impact on PCA Positioning

• Principal Component Analysis (PCA), which Global 25 is based on, works by finding patterns in the data that distinguish different populations. If the biased SNPs are still influencing the reference samples but are absent from your sample, it might cause a slight shift in where your ancestry is placed on the PCA plot. However, since Global 25 uses a very large number of SNPs, the effect of missing some biased SNPs should be relatively small unless the removed SNPs were heavily biased in a specific direction.

3. Possible Slight Distortion in Comparison

• If the reference populations contain SNPs that you have removed due to bias, those populations might appear slightly closer to or farther from your genetic position than they should. However, the fact that you’ve removed biased SNPs means that your sample will be less influenced by distortions, so in theory, your results might still be more accurate in representing your true ancestry, even if the reference populations are slightly skewed.

4. Small Overall Impact

• Global 25 uses a very large number of SNPs across many populations. As long as you haven’t removed a huge proportion of SNPs (for example, tens of thousands), the overall effect on your results should be minimal. The algorithm will adjust to work with the common SNPs that remain between your sample and the reference populations.

5. Possible Adjustment to Reference Populations

• If you are particularly concerned about the mismatch, one possible solution is to ensure that the reference populations used by Global 25 are also adjusted to exclude the same biased SNPs, if possible. This would make the comparison more consistent, though it is more challenging unless you have control over or access to the reference datasets used.

Summary

While removing biased SNPs could lead to a mismatch with the reference samples in Global 25, the impact on your results should be minimal unless a large proportion of SNPs were removed. The overall effect might result in a slightly less precise comparison, but since you’ve removed biased SNPs, your sample could be more representative of your true ancestry despite the differences in the datasets.
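A toy sketch of the one-sided versus two-sided filtering point, in case it helps. The SNP IDs and the `drop_biased` helper are invented for illustration; this is not how Global 25 itself is built, just the set logic behind the mismatch.

```python
def drop_biased(snp_ids, biased):
    """Return snp_ids in their original order, minus anything on the biased list."""
    return [s for s in snp_ids if s not in biased]

# Hypothetical data: a biased-SNP list plus the SNPs typed in each dataset.
biased = {"rs2", "rs4"}
sample_snps = ["rs1", "rs2", "rs3", "rs4", "rs5"]
reference_snps = ["rs1", "rs2", "rs3", "rs4", "rs6"]

# One-sided: only the sample is filtered, the reference keeps the biased SNPs.
sample_clean = drop_biased(sample_snps, biased)
only_in_reference = [s for s in reference_snps if s not in set(sample_clean)]

# Two-sided: the same list is removed from both before comparing.
reference_clean = drop_biased(reference_snps, biased)
common = [s for s in sample_clean if s in set(reference_clean)]

print("reference SNPs the filtered sample lacks:", only_in_reference)
print("SNPs compared after filtering both sides:", common)
```

In the one-sided case the reference still carries `rs2` and `rs4`, which is the mismatch described above; filtering both sides leaves a consistent shared set.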
Reply
#20
(10-05-2024, 04:03 PM)Light Wrote: PACKEDPED:- https://drive.google.com/drive/folders/1...sp=sharing

Only 475K have overlap with 1233K AADR, so it's worse than even HO, basically trash

Basically useless, like you said. That list filters out too many SNPs, it seems.
Reply
#21
(10-05-2024, 04:05 PM)Light Wrote:
(10-05-2024, 03:57 PM)AimSmall Wrote:
(10-05-2024, 03:50 PM)Light Wrote: So it becomes 883K SNPs now from 1233K, interesting

The biased SNP count list was only 16,759 SNPs.  How'd you arrive at 883K?

I counted the total SNPs
I'm missing something. The list he posted had 16K biased SNPs. How did the SNP count after removal get down to 883K? Shouldn't it be in the neighborhood of 1217K? Are there additional biased SNPs listed somewhere, other than the 16K posted, that I'm not accounting for? That's a difference of about 350K SNPs.
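The arithmetic behind that question, spelled out. The counts are the rounded figures quoted in this thread (1233K, 16,759 and 883K), so treat the results as approximate.

```python
# Rounded figures quoted in the thread; not exact file counts.
total_snps = 1_233_000          # "1233K" starting SNP count
biased_listed = 16_759          # size of the posted biased-SNP list
reported_remaining = 883_000    # "883K" SNPs reported after filtering

# What should remain if only the posted list were removed.
expected_remaining = total_snps - biased_listed

# How many SNPs the 883K figure implies were actually removed.
implied_removed = total_snps - reported_remaining

# Removal that the 16,759-SNP list cannot account for.
unexplained = implied_removed - biased_listed

print(f"expected remaining: {expected_remaining:,}")
print(f"implied removed:    {implied_removed:,}")
print(f"unexplained:        {unexplained:,}")
```

So the posted list alone would leave roughly 1.217M SNPs, while the 883K figure implies about 350K were dropped.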
Reply
#22
(10-05-2024, 04:18 PM)AimSmall Wrote:
(10-05-2024, 04:05 PM)Light Wrote:
(10-05-2024, 03:57 PM)AimSmall Wrote: The biased SNP count list was only 16,759 SNPs.  How'd you arrive at 883K?

I counted the total SNPs
I'm missing something. The list he posted had 16K biased SNPs. How did the SNP count after removal get down to 883K? Shouldn't it be in the neighborhood of 1217K? Are there additional biased SNPs listed somewhere, other than the 16K posted, that I'm not accounting for? That's a difference of about 350K SNPs.

Quote: here's a temporary link. I can't find a better upload service right now.
https://easyupload.io/oi9p09
Reply
#23
(10-05-2024, 04:18 PM)AimSmall Wrote:
(10-05-2024, 04:05 PM)Light Wrote:
(10-05-2024, 03:57 PM)AimSmall Wrote: The biased SNP count list was only 16,759 SNPs.  How'd you arrive at 883K?

I counted the total SNPs
I'm missing something. The list he posted had 16K biased SNPs. How did the SNP count after removal get down to 883K? Shouldn't it be in the neighborhood of 1217K? Are there additional biased SNPs listed somewhere, other than the 16K posted, that I'm not accounting for? That's a difference of about 350K SNPs.

The file he provided had 883K. It was empire; it removed 345,041 SNPs from my raw data.
Reply
#24
I'm feeling dense.  When I go to https://pastelink.net/zqh56z40 and copy and paste those SNPs, I get 16,760.

I tried another route and saved that entire HTML page locally.  Copied the div tag with the SNPs, removed the HTML encoding and still only get 16K.   I don't see a file to download, just the pastebin.
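For what it's worth, this is roughly how a pasted list can be counted without relying on a spreadsheet or editor line count. It assumes the IDs are separated by whitespace or commas; the sample text and the `count_ids` helper are made up for illustration.

```python
import re

def count_ids(text):
    """Return (total tokens, unique tokens) for a whitespace/comma separated list."""
    tokens = [t for t in re.split(r"[\s,]+", text.strip()) if t]
    return len(tokens), len(set(tokens))

# Made-up example paste containing one duplicate ID.
pasted = "rs123 rs456\nrs789, rs123\n"

total, unique = count_ids(pasted)
print(f"{total} IDs pasted, {unique} unique")
```

Comparing the total against the unique count also shows whether duplicates are inflating the number.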


[Attached image: biased snp count.JPG]
Reply
#25
(10-05-2024, 04:34 PM)AimSmall Wrote: I'm feeling dense.  When I go to https://pastelink.net/zqh56z40 and copy and paste those SNPs, I get 16,760.

I tried another route and saved that entire HTML page locally.  Copied the div tag with the SNPs, removed the HTML encoding and still only get 16K.   I don't see a file to download, just the pastebin.

Did you download this file?

Quote:here's a temporary link. I can't find a better upload service right now.
https://easyupload.io/oi9p09
Reply
#26
Nomad....

What is the difference between the file at https://easyupload.io/oi9p09 and the paste at https://pastelink.net/zqh56z40 ?

The latter is your first post, with 16K SNPs listed. The former had 883K. Something is seriously amiss.

How many biased SNPs exist? That's a huge difference. Why the difference in counts?
Reply
#27
(10-05-2024, 04:34 PM)AimSmall Wrote: I'm feeling dense.  When I go to https://pastelink.net/zqh56z40 and copy and paste those SNPs, I get 16,760.

I tried another route and saved that entire HTML page locally.  Copied the div tag with the SNPs, removed the HTML encoding and still only get 16K.   I don't see a file to download, just the pastebin.

Can you put yours in a .txt file and upload it?
Reply
#28
I tested the AG, SG and DG versions of the same sample, with and without removing the biased SNPs:

Without filtering

Code:
I1496.AG
p-value  Turkey_Marmara_Barcin_N.AG  Croatia_Mesolithic.AG
0.425    0.973                       0.0268

I1496.SG
p-value  Turkey_Marmara_Barcin_N.AG  Croatia_Mesolithic.AG
0.0356   0.963                       0.0374

I1496.DG
p-value  Turkey_Marmara_Barcin_N.AG  Croatia_Mesolithic.AG
0.0190   0.963                       0.0372


With filtering

Code:
I1496.AG
p-value  Turkey_Marmara_Barcin_N.AG  Croatia_Mesolithic.AG
8.23e-1  0.974                       0.0257

I1496.SG
p-value  Turkey_Marmara_Barcin_N.AG  Croatia_Mesolithic.AG
6.85e-1  0.983                       0.0168

I1496.DG
p-value  Turkey_Marmara_Barcin_N.AG  Croatia_Mesolithic.AG
0.840    0.979                       0.0213

Without filtering, only the AG version has a passing p-value (above the usual 0.05 cutoff). With filtering, all three do. The left and right pops are also exclusively AG. It's crazy that so many academic papers carelessly mix different data types.
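The pass/fail call can be checked mechanically. The p-values below are copied from the two blocks above; the 0.05 cutoff is an assumption (the conventional threshold), not something qpAdm enforces.

```python
P_CUTOFF = 0.05  # assumed conventional cutoff; adjust if you use another

# p-values copied from the unfiltered and filtered runs above.
p_values = {
    "unfiltered": {"I1496.AG": 0.425, "I1496.SG": 0.0356, "I1496.DG": 0.0190},
    "filtered": {"I1496.AG": 0.823, "I1496.SG": 0.685, "I1496.DG": 0.840},
}

# For each run, list the target versions whose model is not rejected.
passing = {
    run: [target for target, p in targets.items() if p > P_CUTOFF]
    for run, targets in p_values.items()
}

for run, targets in passing.items():
    print(f"{run}: {', '.join(targets)} pass at p > {P_CUTOFF}")
```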
Reply
#29
(10-05-2024, 04:42 PM)AimSmall Wrote: Nomad....

What is the difference between the file at https://easyupload.io/oi9p09 and the paste at https://pastelink.net/zqh56z40 ?

The latter is your first post, with 16K SNPs listed. The former had 883K. Something is seriously amiss.

How many biased SNPs exist? That's a huge difference. Why the difference in counts?

Maybe pastelink deleted a part of my data because it was too big. Use the easyupload version.
Reply
#30
(10-05-2024, 03:50 PM)Light Wrote: So it becomes 883K SNPs now from 1233K, interesting

There should be 475,425 SNPs remaining in the 1240k dataset, not 800k.
Reply

