02-14-2024, 10:31 PM
Huang et al., 2022 (Genomic Insights Into the Demographic History of the Southern Chinese) developed a K=10 ADMIXTURE model for ~1000 subjects drawn from different regions and language groups in China as well as East and Southeast Asia. Unfortunately, the component allele frequencies were not made available with the paper, so it is not possible to use their model directly to estimate component weights for new data.
I have reverse engineered those component allele frequencies and written some quick and dirty R code to calculate component weights for 23andMe-style data from WGSExtract. I discarded the Sub-Saharan component since it had negligible weights in their model. It seems to recover very well the values they reported for various test subjects in the paper. Given the training set, I think it is only suitable for data from subjects in East Asia and SE Asia who are unlikely to have ancestry components external to it.
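For anyone curious how weights can be calculated once the component allele frequencies are fixed, here is a minimal Python sketch of one common approach: an EM update on the weights alone, under the usual ADMIXTURE-style binomial likelihood. It is not the R code from the download link. The function name `estimate_weights`, the `floor` parameter and the simulated frequencies are all assumptions made purely for illustration.

```python
import random

def estimate_weights(genotypes, freqs, n_iter=200, floor=1e-9):
    """Estimate ancestry weights for one person with allele frequencies held fixed.

    genotypes: per-SNP count (0, 1 or 2) of the counted allele.
    freqs: one list per component with that allele's frequency at each SNP;
           values are assumed to lie strictly between 0 and 1.
    """
    k = len(freqs)
    n_snps = len(genotypes)
    q = [1.0 / k] * k  # start from equal weights
    for _ in range(n_iter):
        new_q = [0.0] * k
        for j, g in enumerate(genotypes):
            # Frequency of the counted allele in the current mixture.
            p_mix = sum(q[c] * freqs[c][j] for c in range(k))
            o_mix = 1.0 - p_mix
            for c in range(k):
                # Expected share of each allele copy attributed to component c.
                if g > 0:
                    new_q[c] += g * q[c] * freqs[c][j] / p_mix
                if g < 2:
                    new_q[c] += (2 - g) * q[c] * (1.0 - freqs[c][j]) / o_mix
        # Average over the 2 * n_snps allele copies, keep weights off zero, renormalise.
        q = [max(v / (2.0 * n_snps), floor) for v in new_q]
        total = sum(q)
        q = [v / total for v in q]
    return q

# Simulated check (made-up frequencies, not the reverse-engineered ones).
random.seed(0)
n_snps, k = 2000, 3
freqs = [[random.uniform(0.05, 0.95) for _ in range(n_snps)] for _ in range(k)]
true_q = [0.6, 0.3, 0.1]
genotypes = []
for j in range(n_snps):
    p = sum(true_q[c] * freqs[c][j] for c in range(k))
    genotypes.append(sum(random.random() < p for _ in range(2)))

est = estimate_weights(genotypes, freqs)
print([round(v, 3) for v in est])
```

With real data the SNPs in the personal file would first have to be matched to the SNPs in the frequency table, with alleles counted on the same strand; the sketch skips that step.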
Components are:-
- Andaman-related
- Northeast Asia
- Hmong-Mien
- West Eurasian
- Kra-Dai
- Austroasiatic
- Sino-Tibetan
- Papuan
- Austronesian
I have run my own data resulting in:-
- K1 Andaman-related: 0.003089515
- K2 Northeast Asia: 0.01125185
- K3 Hmong-Mien: 0.03041038
- K4 West Eurasian: 6.115006e-07
- K5 Kra-Dai: 0.610802
- K6 Austroasiatic: 0.1074186
- K7 Sino-Tibetan: 0.174036
- K8 Papuan: 6.115006e-07
- K9 Austronesian: 0.06299041
That would make me similar to Han_Guangxi, which is what might be expected of a Cantonese person descended from ancestors in West Guangdong.
Another dataset from a SE Asian person of Hokkien descent from Quanzhou and Xiamen:-
- K1 Andaman-related: 5.232493e-07
- K2 Northeast Asia: 0.0749497
- K3 Hmong-Mien: 0.02121188
- K4 West Eurasian: 5.232493e-07
- K5 Kra-Dai: 0.522726
- K6 Austroasiatic: 0.05012311
- K7 Sino-Tibetan: 0.2684863
- K8 Papuan: 0.001340675
- K9 Austronesian: 0.06116127
This seems more similar to Han_Taiwan than to Han_Fujian in Huang's dataset. But aren't many Taiwanese of Fujian descent anyway?
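The "more similar to" comparisons can be made a little more concrete by measuring the distance between weight vectors. The Python sketch below uses plain Euclidean distance, which is an assumption on my part and not necessarily how the similarity was judged here. The two vectors are the results quoted in this post; the population averages from the paper (Han_Guangxi, Han_Taiwan and so on) are not listed in this post, so they are left for the reader to fill in. The helper names `distance` and `nearest` are hypothetical.

```python
import math

COMPONENTS = [
    "Andaman-related", "Northeast Asia", "Hmong-Mien", "West Eurasian",
    "Kra-Dai", "Austroasiatic", "Sino-Tibetan", "Papuan", "Austronesian",
]

# The two results quoted in this post, in K1..K9 order.
cantonese = [0.003089515, 0.01125185, 0.03041038, 6.115006e-07, 0.610802,
             0.1074186, 0.174036, 6.115006e-07, 0.06299041]
hokkien = [5.232493e-07, 0.0749497, 0.02121188, 5.232493e-07, 0.522726,
           0.05012311, 0.2684863, 0.001340675, 0.06116127]

def distance(a, b):
    """Euclidean distance between two weight vectors of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def nearest(sample, references):
    """Return reference names sorted from closest to farthest from sample."""
    return sorted(references, key=lambda name: distance(sample, references[name]))

d = distance(cantonese, hokkien)

# Component on which the two results differ the most.
largest_gap = max(range(len(COMPONENTS)),
                  key=lambda i: abs(cantonese[i] - hokkien[i]))

# Ranking works the same way once population averages are added to the dict.
ranking = nearest(hokkien, {"Cantonese sample": cantonese, "Hokkien sample": hokkien})

print(round(d, 4), COMPONENTS[largest_gap], ranking)
```

Adding the paper's population averages to the dictionary passed to `nearest` would turn this into a simple closest-population lookup.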
Perhaps we could build a collection of results of this type, as in the thread with GEDMatch averages?
Download link is here. If your data does not work with it and you are willing to send it to me, I should be able to modify the parser to handle it and provide you with a result.
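When a file fails, the usual cause is a deviation from the 23andMe raw-data layout: `#` comment lines followed by tab-separated rsid, chromosome, position and genotype. As a rough illustration of what a reader has to cope with, here is a small Python sketch. It is not the parser in the download, the function names are hypothetical, and the rsIDs in the sample are invented.

```python
import io

def parse_23andme(handle):
    """Return {rsid: genotype} for diploid A/C/G/T calls in a 23andMe-style file."""
    calls = {}
    for line in handle:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment/header lines
        fields = line.split("\t")
        if len(fields) < 4 or fields[0].lower() == "rsid":
            continue  # skip malformed rows and an uncommented header row
        rsid, genotype = fields[0], fields[3].upper()
        # Keep only two-letter calls; drops no-calls ("--") and haploid calls.
        if len(genotype) == 2 and set(genotype) <= set("ACGT"):
            calls[rsid] = genotype
    return calls

def allele_count(genotype, allele):
    """Number of copies (0, 1 or 2) of allele in a two-letter genotype."""
    return genotype.count(allele)

# Invented sample rows, for illustration only.
sample = io.StringIO(
    "# rsid\tchromosome\tposition\tgenotype\n"
    "rs0000001\t1\t1000\tAG\n"
    "rs0000002\t1\t2000\t--\n"
    "rs0000003\tX\t3000\tC\n"
    "rs0000004\t2\t4000\ttt\n"
)
calls = parse_23andme(sample)
print(calls)
```

Files from other sources may split the genotype into two allele columns or use different delimiters, which this sketch does not handle.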