Many women have false-positive calls for Y-chromosome SNPs.
Since women have no Y chromosome, the only possible source of such false information is the X chromosome.
The X chromosome is roughly three times larger than the Y, so there is plenty of room for this information to be written there.
Of course, it could simply be a mistake during genotyping: a software bug, a processing error, etc.
However, that is not the case.
I recently found that a large share of these false-positive Y calls fall on the same positions.
I am going to share the positions/SNPs detected in these women.
I did my best to find where on the X chromosome this information could be written, but so far without success.
I hope someone else will join in to find more.
First, the big picture.
Because of these false-positive calls (SNPs detected as Y-chromosome SNPs in women), the PCA shows a large cluster made up of women only. This is further confirmation that the same false-positive SNPs are shared among these women.
Now here is the list of these false-positive SNPs. I don't know where they are on the X chromosome, but they are falsely called as Y-chromosome SNPs:
Quote:> map4
Columns: V1 = chromosome (24 = Y), V2 = SNP name, V3 = genetic distance, V4 = position (Build 37)
V1 V2 V3 V4
57 24 rs9650864 0 3712546
58 24 rs112629381 0 3712788
59 24 rs73612942 0 3712907
60 24 rs11096439 0 3713348
61 24 rs9645207 0 3713460
62 24 rs10482602 0 3713630
63 24 rs9306843 0 3713800
64 24 rs10776473 0 3713937
65 24 rs58562080 0 3714028
66 24 rs55883227 0 3714032
67 24 rs55819051 0 3714070
68 24 rs60410417 0 3714363
69 24 rs58843144 0 3714435
70 24 rs60375527 0 3714501
71 24 rs9645210 0 3714714
72 24 rs9645211 0 3714776
73 24 rs9645212 0 3714803
74 24 rs9645213 0 3714821
75 24 rs4307511 0 3714894
76 24 rs9645214 0 3714924
77 24 rs9645215 0 3714926
78 24 rs9645216 0 3715022
79 24 rs6655904 0 3715130
80 24 rs6530606 0 3715248
81 24 rs4351583 0 3715498
82 24 rs4353031 0 3715593
83 24 rs4336789 0 3715806
84 24 rs28542306 0 3716287
85 24 rs28826948 0 3716591
86 24 rs28782498 0 3716687
87 24 rs6655906 0 3716911
88 24 rs6530619 0 3717546
89 24 rs6530620 0 3717723
90 24 rs6530621 0 3717754
91 24 rs7067530 0 3717868
92 24 rs7067531 0 3717869
93 24 rs9650883 0 3718267
94 24 rs9650885 0 3718469
95 24 rs34304153 0 3718512
96 24 rs35380966 0 3718514
97 24 rs9650886 0 3718538
98 24 rs78969634 0 3718595
99 24 rs34087300 0 3718601
100 24 rs35325212 0 3718671
101 24 rs74505548 0 3718825
102 24 snp_24_3718862 0 3718862
103 24 rs34148711 0 3718925
104 24 rs74837576 0 3719001
105 24 rs35585714 0 3719030
106 24 rs34896211 0 3719094
107 24 rs34955355 0 3719115
108 24 rs34736674 0 3719194
109 24 rs36077882 0 3719228
110 24 rs35664651 0 3719265
111 24 rs34894088 0 3719304
112 24 rs75922697 0 3719473
113 24 rs73612943 0 3719485
114 24 rs34998737 0 3719491
115 24 rs74770295 0 3719626
116 24 rs9650887 0 3719856
117 24 rs7067495 0 3720007
118 24 rs7067523 0 3720058
119 24 rs7067508 0 3720217
120 24 rs34681186 0 3720388
121 24 rs201111072 0 3720391
122 24 rs201918715 0 3720392
123 24 rs7892879 0 3720464
124 24 rs7892938 0 3720617
125 24 rs7892913 0 3720887
126 24 rs7893082 0 3721154
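The map4 table above looks like a PLINK .map file read into R (chromosome, SNP ID, genetic distance, base-pair position; PLINK codes Y as 24). As a minimal sketch, assuming that format, the same Y-chromosome window can be pulled out of a .map file like this; the function names `read_map` and `y_window` are hypothetical:

```python
import io

def read_map(handle):
    """Parse a PLINK .map file: chromosome, SNP ID, genetic distance, bp position."""
    rows = []
    for line in handle:
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        chrom, snp, cm, bp = fields[:4]
        rows.append((chrom, snp, float(cm), int(bp)))
    return rows

def y_window(rows, start, end):
    """Return (1-based map row, SNP ID, position) for Y SNPs (code 24 or 'Y') in [start, end]."""
    return [
        (i, snp, bp)
        for i, (chrom, snp, _cm, bp) in enumerate(rows, start=1)
        if chrom in ("24", "Y") and start <= bp <= end
    ]

# Three rows copied from the table above, as a tiny in-memory .map file.
demo = io.StringIO(
    "24 rs9650864 0 3712546\n"
    "24 rs112629381 0 3712788\n"
    "24 rs7067530 0 3717868\n"
)
rows = read_map(demo)
print(y_window(rows, 3_712_000, 3_713_000))
# [(1, 'rs9650864', 3712546), (2, 'rs112629381', 3712788)]
```

The row numbers here are relative to the demo file; on the real file they would match the R row names (57, 58, ...).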
Non-zero calls (false positives) for HG01858.DG, a female from Vietnam:
> ped4[9945, ped4[9945, ] != 0 ]
V1 V2 V5 V6 V15 V16 V81 V82 V119 V120 V121 V122 V123 V124 V125 V126 V127 V128 V129
9945 KHV.DG HG01858.DG 2 1 G G T T C C A A G G C C A A C
V130 V131 V132 V133 V134 V135 V136 V137 V138 V139 V140 V141 V142 V143 V144 V145 V146 V147 V148
9945 C T T A A C C G G C C A A G G T T A A
V149 V150 V151 V152 V153 V154 V155 V156 V157 V158 V159 V160 V161 V162 V163 V164 V165 V166 V167
9945 A A T T T T G G C C A A A A C C G G A
V168 V169 V170 V171 V172 V173 V174 V175 V176 V177 V178 V179 V180 V183 V184 V185 V186 V187 V188
9945 A A A G G C C A A A A T T T T G G T T
V189 V190 V191 V192 V193 V194 V195 V196 V197 V198 V199 V200 V201 V202 V203 V204 V205 V206 V207
9945 C C T T G G T T C C T T G G T T C C C
V208 V209 V210 V211 V212 V213 V214 V215 V216 V217 V218 V219 V220 V221 V222 V223 V224 V225 V226
9945 C G G C C T T G G A A T T A A G G A A
V227 V228 V229 V230 V231 V232 V233 V234 V235 V236 V237 V238 V239 V240 V241 V242 V243 V244 V245
9945 G G A A A A A A T T T T A A C C G G G
V246 V247 V248 V249 V250 V251 V252 V253 V254 V255 V256 V257 V258 V505 V506
9945 G G G A A A A A A G G G G C C
The SNPs listed above (map rows 57-126) correspond to columns V119-V258.
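In a PLINK .ped file the first six columns are family ID, individual ID, father, mother, sex and phenotype, and every SNP then takes two allele columns in .map order. Assuming ped4 and map4 were read from such a file pair with R's `read.table` (which explains the V1, V2, ... names), map row i sits in columns V(2i+5) and V(2i+6). A small sketch of that arithmetic; the helper names are hypothetical:

```python
# Map a 1-based map4 row to its two allele columns in a .ped table (R's V-numbering),
# and back. Assumes the standard 6 leading .ped columns.
N_LEADING = 6

def allele_columns(snp_row: int) -> tuple[int, int]:
    """Return the V-numbers holding the two alleles of map row `snp_row`."""
    first = N_LEADING + 2 * snp_row - 1
    return first, first + 1

def snp_row_for_column(col: int) -> int:
    """Return the 1-based map row for genotype column V`col`."""
    if col <= N_LEADING:
        raise ValueError("columns 1-6 are sample fields, not genotypes")
    return (col - N_LEADING + 1) // 2

print(allele_columns(57), allele_columns(126))          # (119, 120) (257, 258)
print(snp_row_for_column(85), snp_row_for_column(106))  # 40 50
```

For example, the V181/V182 pair that is absent or 0 0 in the samples shown corresponds to map row 88 (rs6530619).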
Here are the false-positive calls for another woman, a modern Gujarati female:
which(ped4$V2 == "NA20876.DG")
ped4[6135, ped4[6135, ] != 0 ] # to check for non-zero values
Quote: V1 V2 V5 V6 V57 V58 V81 V82 V119 V120 V121 V122 V123 V124 V125 V126 V127 V128 V129
6135 GIH.DG NA20876.DG 2 1 G G T T C C A A G G C C A A C
V130 V131 V132 V133 V134 V135 V136 V137 V138 V139 V140 V141 V142 V143 V144 V145 V146 V147 V148
6135 C T T A A C C G G C C A A G G T T A A
V149 V150 V151 V152 V153 V154 V155 V156 V157 V158 V159 V160 V161 V162 V163 V164 V165 V166 V167
6135 A A T T T T G G C C A A A A C C G G A
V168 V169 V170 V171 V172 V173 V174 V175 V176 V177 V178 V179 V180 V183 V184 V185 V186 V187 V188
6135 A A A G G C C A A A A T T T T G G T T
V189 V190 V191 V192 V193 V194 V195 V196 V197 V198 V199 V200 V201 V202 V203 V204 V205 V206 V207
6135 C C T T G G T T C C T T G G T T C C C
V208 V209 V210 V211 V212 V213 V214 V215 V216 V217 V218 V219 V220 V221 V222 V223 V224 V225 V226
6135 C G G C C T T G G A A T T A A G G A A
V227 V228 V229 V230 V231 V232 V233 V234 V235 V236 V237 V238 V239 V240 V241 V242 V243 V244 V245
6135 G G A A A A A A T T T T A A C C G G G
V246 V247 V248 V249 V250 V251 V252 V253 V254 V255 V256 V257 V258 V619 V620 V1061 V1062
6135 G G G A A A A A A G G G G C C T T
As you can see, columns V119-V258 are again false positives, exactly as in the previous example.
So these are the same SNPs as above.
The list of all women with such false-positive calls is very long, as you can also see from my PCA plot above.
In general, these are the same positions.
So this must be a copy of Y-chromosome information written on the X chromosome.
Moreover, this Y-chromosome information does not match modern men. It comes from a very ancient Y chromosome, as found in the chimpanzee and in haplogroup A00. This is how I found these SNPs (while searching for chimp/A00 SNPs).
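If the same positions really recur across women, a per-SNP tally of non-missing Y calls among females should show it. A minimal sketch, assuming the .ped rows have already been split into fields and that missing alleles are coded "0" (as in the printouts above); the function name and the toy rows are hypothetical:

```python
from collections import Counter

MISSING = "0"

def count_female_calls(ped_rows):
    """Count, per 1-based map row, how many females have a non-missing genotype pair.

    Each row is [FID, IID, father, mother, sex, phenotype, a1, a2, a1, a2, ...];
    PLINK sex code 2 = female.
    """
    counts = Counter()
    for row in ped_rows:
        if row[4] != "2":
            continue  # only females are of interest here
        genotypes = row[6:]
        for snp in range(len(genotypes) // 2):
            a1, a2 = genotypes[2 * snp], genotypes[2 * snp + 1]
            if a1 != MISSING or a2 != MISSING:
                counts[snp + 1] += 1
    return counts

# Toy data: two women and one man, three SNPs each.
rows = [
    ["F1", "W1", "0", "0", "2", "1", "0", "0", "C", "C", "A", "A"],
    ["F2", "W2", "0", "0", "2", "1", "0", "0", "C", "C", "0", "0"],
    ["F3", "M1", "0", "0", "1", "1", "T", "T", "C", "T", "A", "G"],
]
print(count_female_calls(rows).most_common())  # [(2, 2), (3, 1)]
```

SNPs called in a large share of women would rise to the top of `most_common()`.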
Next part of this story:
As you may notice in my pictures above, there are two related clusters of women on my PCA.
Why does that happen?
Presumably the two clusters carry different sets of SNPs.
My previous two examples both came from one cluster.
Now I checked the second cluster.
Here is another example:
NA20899.DG / Gujarati migrants sampled in Houston, Texas
Here is the list of SNPs for the second cluster.
NA20899.DG
Quote:> N1 = which (ped4$V2 == "NA20899.DG")
> ped4[N1, ped4[N1, ] != 0 ]
V1 V2 V5 V6 V85 V86 V87 V88 V89 V90 V91 V92 V93 V94 V95 V96 V97 V98 V99 V100 V101
6152 GIH.DG NA20899.DG 2 1 A A T T C C T T C C A A A A A A C
V102 V103 V104 V105 V106 V119 V120 V121 V122 V123 V124 V125 V126 V127 V128 V129 V130 V131 V132
6152 C A A T T C C A A G G C C A A C C T T
V133 V134 V135 V136 V137 V138 V139 V140 V141 V142 V143 V144 V145 V146 V147 V148 V149 V150 V151
6152 A A C C G G C C A A G G T T A A A A T
V152 V153 V154 V155 V156 V157 V158 V159 V160 V161 V162 V163 V164 V165 V166 V167 V168 V169 V170
6152 T T T G G C C A A A A C C G G A A A A
V171 V172 V173 V174 V175 V176 V177 V178 V179 V180 V183 V184 V185 V186 V187 V188 V189 V190 V191
6152 G G C C A A A A T T T T G G T T C C T
V192 V193 V194 V195 V196 V197 V198 V199 V200 V201 V202 V203 V204 V205 V206 V207 V208 V209 V210
6152 T G G T T C C T T G G T T C C C C G G
V211 V212 V213 V214 V215 V216 V217 V218 V219 V220 V221 V222 V223 V224 V225 V226 V227 V228 V229
6152 C C T T G G A A T T A A G G A A G G A
V230 V231 V232 V233 V234 V235 V236 V237 V238 V239 V240 V241 V242 V243 V244 V245 V246 V247 V248
6152 A A A A A T T T T A A C C G G G G G G
V249 V250 V251 V252 V253 V254 V255 V256 V257 V258
6152 A A A A A A G G G G
You may notice that here the list also includes columns V85 to V106 (map rows 40-50).
Columns V107-V118 (map rows 51-56) are missing.
The next range, V119 to V258, is the same as for cluster 1.
Therefore cluster 2 carries more ancestral SNPs than cluster 1.
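The difference between the two clusters can be expressed as a set difference over map rows. A sketch, using the non-missing V-column numbers printed above for NA20876.DG (cluster 1) and NA20899.DG (cluster 2), restricted to V85-V258; the helper name is hypothetical:

```python
def called_snp_rows(columns):
    """Map non-missing genotype columns (1-based V-numbers, >= 7) to 1-based map rows.

    Assumes 6 leading .ped columns, so map row i occupies V(2i+5) and V(2i+6).
    """
    return {(c - 5) // 2 for c in columns if c > 6}

# Non-missing columns within V85-V258, read off the printouts above.
cluster1_cols = [*range(119, 181), *range(183, 259)]  # NA20876.DG
cluster2_cols = [*range(85, 107), *cluster1_cols]     # NA20899.DG

extra_rows = sorted(called_snp_rows(cluster2_cols) - called_snp_rows(cluster1_cols))
print(extra_rows)  # [40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50]
```

These are exactly map rows 40-50, the ones listed as "new snips for cluster 2 only".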
Here are the SNPs found only in cluster 2, followed by the full list for cluster 2.
Quote:> map4[40:50,] ## (new snips for cluster 2 only )
V1 V2 V3 V4
40 24 snp_24_3405761 0 3405761
41 24 snp_24_3405762 0 3405762
42 24 rs7474419 0 3406138
43 24 rs4526559 0 3406277
44 24 rs4535946 0 3406292
45 24 rs4581224 0 3406365
46 24 rs4529646 0 3406710
47 24 rs7067411 0 3406834
48 24 rs7067427 0 3406853
49 24 rs2113789 0 3407219
50 24 rs2113790 0 3407265
> ## all the snips for cluster 2
V1 V2 V3 V4
40 24 snp_24_3405761 0 3405761
41 24 snp_24_3405762 0 3405762
42 24 rs7474419 0 3406138
43 24 rs4526559 0 3406277
44 24 rs4535946 0 3406292
45 24 rs4581224 0 3406365
46 24 rs4529646 0 3406710
47 24 rs7067411 0 3406834
48 24 rs7067427 0 3406853
49 24 rs2113789 0 3407219
50 24 rs2113790 0 3407265
57 24 rs9650864 0 3712546
58 24 rs112629381 0 3712788
59 24 rs73612942 0 3712907
60 24 rs11096439 0 3713348
61 24 rs9645207 0 3713460
62 24 rs10482602 0 3713630
63 24 rs9306843 0 3713800
64 24 rs10776473 0 3713937
65 24 rs58562080 0 3714028
66 24 rs55883227 0 3714032
67 24 rs55819051 0 3714070
68 24 rs60410417 0 3714363
69 24 rs58843144 0 3714435
70 24 rs60375527 0 3714501
71 24 rs9645210 0 3714714
72 24 rs9645211 0 3714776
73 24 rs9645212 0 3714803
74 24 rs9645213 0 3714821
75 24 rs4307511 0 3714894
76 24 rs9645214 0 3714924
77 24 rs9645215 0 3714926
78 24 rs9645216 0 3715022
79 24 rs6655904 0 3715130
80 24 rs6530606 0 3715248
81 24 rs4351583 0 3715498
82 24 rs4353031 0 3715593
83 24 rs4336789 0 3715806
84 24 rs28542306 0 3716287
85 24 rs28826948 0 3716591
86 24 rs28782498 0 3716687
87 24 rs6655906 0 3716911
88 24 rs6530619 0 3717546
89 24 rs6530620 0 3717723
90 24 rs6530621 0 3717754
91 24 rs7067530 0 3717868
92 24 rs7067531 0 3717869
93 24 rs9650883 0 3718267
94 24 rs9650885 0 3718469
95 24 rs34304153 0 3718512
96 24 rs35380966 0 3718514
97 24 rs9650886 0 3718538
98 24 rs78969634 0 3718595
99 24 rs34087300 0 3718601
100 24 rs35325212 0 3718671
101 24 rs74505548 0 3718825
102 24 snp_24_3718862 0 3718862
103 24 rs34148711 0 3718925
104 24 rs74837576 0 3719001
105 24 rs35585714 0 3719030
106 24 rs34896211 0 3719094
107 24 rs34955355 0 3719115
108 24 rs34736674 0 3719194
109 24 rs36077882 0 3719228
110 24 rs35664651 0 3719265
111 24 rs34894088 0 3719304
112 24 rs75922697 0 3719473
113 24 rs73612943 0 3719485
114 24 rs34998737 0 3719491
115 24 rs74770295 0 3719626
116 24 rs9650887 0 3719856
117 24 rs7067495 0 3720007
118 24 rs7067523 0 3720058
119 24 rs7067508 0 3720217
120 24 rs34681186 0 3720388
121 24 rs201111072 0 3720391
122 24 rs201918715 0 3720392
123 24 rs7892879 0 3720464
124 24 rs7892938 0 3720617
125 24 rs7892913 0 3720887
126 24 rs7893082 0 3721154
It's the pseudoautosomal regions of X/Y. They are present at both ends of the chromosomes and are essential for getting X/Y chromosomes aligned in meiosis.
Any SNP in those regions is ignored when calling Y-SNPs.
(12-05-2024, 07:51 AM) ronin92 wrote: It's the pseudoautosomal regions of X/Y. They are present at both ends of the chromosomes and are essential for getting X/Y chromosomes aligned in meiosis.
Any SNP in those regions is ignored when calling Y-SNPs.
Yes, you are right!
It could be the pseudoautosomal regions of X/Y.
However, the SNPs I am reporting as false positives are not the result of recent recombination with today's men.
They are all ancient ancestral SNPs, matching the chimpanzee.
They must be somewhere on the X chromosome, and we would expect them on all X chromosomes. But if the same SNPs are on men's X chromosomes too, we should also see many such false positives in men: most men should show a false heterozygous call at these Y positions.
So what is the impact of this false-positive zone for men?
As I explained earlier, this must be information originating from the X chromosome that is falsely detected as Y-chromosome data.
As shown above, almost every woman has this information somewhere on her X chromosome. It must also be duplicated on the Y chromosome; otherwise it would not be detected as Y-chromosome data.
Every man has a Y chromosome plus one X chromosome. If this information is already on his X chromosome, it should also be falsely detected in men.
So we may expect a large number of errors, i.e. heterozygous calls, at these positions.
Is this the case? Yes, it is.
I will show a few examples for some men below.
Fortunately, these positions are not used for Y-chromosome haplogroup determination.
A Yoruba man: NA19096.DG
Quote: V1 V2 V3 V4 V5 V6 V85 V86 V87 V88 V89 V90 V91 V92 V93 V94 V95 V96 V97 V98 V99 V100 V101 V102 V103 V104
17726 YRI.DG NA19096.DG 0 0 1 1 T T A A T T C C T T G G C C T T T T C C
V105 V106 V117 V118 V119 V120 V121 V122 V123 V124 V125 V126 V127 V128 V129 V130 V131 V132 V133 V134 V135 V136 V137
17726 C C 0 0 C T A G G T C G A G C T T A A G C T G
V138 V139 V140 V141 V142 V143 V144 V145 V146 V147 V148 V149 V150 V151 V152 V153 V154 V155 V156 V157 V158 V159 V160
17726 A C G A G G A T A A T A G T C T G G A C T A G
V161 V162 V163 V164 V165 V166 V167 V168 V169 V170 V171 V172 V173 V174 V175 V176 V177 V178 V179 V180 V181 V182 V183
17726 A T C T G A A G A G G T C A A C A G T C 0 0 T
V184 V185 V186 V187 V188 V189 V190 V191 V192 V193 V194 V195 V196 V197 V198 V199 V200 V201 V202 V203 V204 V205 V206
17726 C G A T A C A T C G C T C C T T C G T T C C A
V207 V208 V209 V210 V211 V212 V213 V214 V215 V216 V217 V218 V219 V220 V221 V222 V223 V224 V225 V226 V227 V228 V229
17726 C T G A C G T A G A A G T G A C G C A G G A A
V230 V231 V232 V233 V234 V235 V236 V237 V238 V239 V240 V241 V242 V243 V244 V245 V246 V247 V248 V249 V250 V251 V252
17726 T A G A G T C T C A C C T G A G A G A A T A C
V253 V254 V255 V256
17726 A T G A
Every two adjacent columns (odd, even) hold the two alleles of one SNP, and on the haploid Y chromosome they should be identical.
For example, V85 = V86 = T, so this position is correct; V103/V104 is correct as well.
But V255/V256, V253/V254, V251/V252, V247/V248 and many others are all mixed.
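The odd/even rule can be checked mechanically. Because the .ped file has six leading columns, each SNP starts on an odd V-number, so a mixed pair is any (odd, odd+1) pair with two different non-missing alleles. A sketch, assuming "0" marks a missing allele; the function name is hypothetical and the values are copied from the NA19096.DG printout above:

```python
def mixed_pairs(columns, alleles):
    """Return (odd, even) column pairs whose two alleles differ, skipping missing ("0") calls."""
    calls = dict(zip(columns, alleles))
    out = []
    for col in columns:
        if col % 2 == 1 and col + 1 in calls:
            a1, a2 = calls[col], calls[col + 1]
            if "0" not in (a1, a2) and a1 != a2:
                out.append((col, col + 1))
    return out

# NA19096.DG: V85-V88, V117-V118 and V247-V256.
cols = [85, 86, 87, 88, 117, 118, *range(247, 257)]
alleles = ["T", "T", "A", "A", "0", "0",
           "G", "A", "A", "T", "A", "C", "A", "T", "G", "A"]
print(mixed_pairs(cols, alleles))
# [(247, 248), (249, 250), (251, 252), (253, 254), (255, 256)]
```

Run on the UstIshim_snpAD.DG values for the same columns (G G T T C C A A A A), the function returns an empty list.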
All these false positives are detected in the region 3,405,761-3,721,154 on the Y chromosome, in Build 37 coordinates (the Build 38 positions differ). Current versions of the 1240k datasets still use Build 37 positions.
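ronin92 suggested the pseudoautosomal regions. One way to test that is to compare the reported Build 37 window with the PAR boundaries. The sketch below hard-codes the commonly cited GRCh37 chrY PAR coordinates (PAR1 10,001-2,649,520; PAR2 59,034,050-59,363,566); treat them as values to verify against the Genome Reference Consortium before relying on them:

```python
# Commonly cited GRCh37 (hg19) pseudoautosomal boundaries on chrY, 1-based inclusive.
# Verify against the Genome Reference Consortium before relying on them.
PAR_Y_GRCH37 = {
    "PAR1": (10_001, 2_649_520),
    "PAR2": (59_034_050, 59_363_566),
}

def par_overlap(start, end, pars=PAR_Y_GRCH37):
    """Return the names of PARs that overlap the closed interval [start, end]."""
    return [name for name, (lo, hi) in pars.items() if start <= hi and end >= lo]

print(par_overlap(3_405_761, 3_721_154))  # []
print(par_overlap(2_000_000, 2_000_100))  # ['PAR1']
```

With these boundaries, the window 3,405,761-3,721,154 does not overlap either PAR.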
For UstIshim_snpAD, I don't see any issue. Most likely UstIshim_snpAD was genotyped with a different technique, which avoided these false calls.
> N1 = which (ped4$V2 ==
+ "UstIshim_snpAD.DG")
> N1
[1] 13766
> ped4[N1, c(1:6, 85: 106 , 117 : 256 ) ]
V1 V2 V3 V4 V5 V6 V85 V86 V87 V88 V89 V90 V91 V92 V93 V94 V95 V96 V97 V98
13766 Russia_UstIshim_IUP_snpAD.DG UstIshim_snpAD.DG 0 0 1 1 T T A A T T C C T T G G C C
V99 V100 V101 V102 V103 V104 V105 V106 V117 V118 V119 V120 V121 V122 V123 V124 V125 V126 V127 V128 V129 V130 V131
13766 T T T T C C C C T T T T A A T T G G G G C C T
V132 V133 V134 V135 V136 V137 V138 V139 V140 V141 V142 V143 V144 V145 V146 V147 V148 V149 V150 V151 V152 V153 V154
13766 T A A T T G G C C G G G G A A A A G G C C G G
V155 V156 V157 V158 V159 V160 V161 V162 V163 V164 V165 V166 V167 V168 V169 V170 V171 V172 V173 V174 V175 V176 V177
13766 G G C C A A A A C C G G G G G G G G C C A A G
V178 V179 V180 V181 V182 V183 V184 V185 V186 V187 V188 V189 V190 V191 V192 V193 V194 V195 V196 V197 V198 V199 V200
13766 G C C A A T T A A A A A A T T G G C C C C C C
V201 V202 V203 V204 V205 V206 V207 V208 V209 V210 V211 V212 V213 V214 V215 V216 V217 V218 V219 V220 V221 V222 V223
13766 T T C C C C C C G G C C T T G G G G G G C C G
V224 V225 V226 V227 V228 V229 V230 V231 V232 V233 V234 V235 V236 V237 V238 V239 V240 V241 V242 V243 V244 V245 V246
13766 G A A A A T T G G A A C C C C C C C C G G A A
V247 V248 V249 V250 V251 V252 V253 V254 V255 V256
13766 G G T T C C A A A A
The same holds for Bacho Kiro and Yana: I don't see any false calls in these ancient samples.
Another example, a modern man: HG01437 (haplogroup J1a2a1a2d2b2b2)
> ped4[N1, c(1:6, 85: 106 , 117 : 256 ) ]
V1 V2 V3 V4 V5 V6 V85 V86 V87 V88 V89 V90 V91 V92 V93 V94 V95 V96 V97 V98 V99 V100 V101 V102
2263 CLM.DG HG01437.DG 0 0 1 1 T T A A T T C C T T G G C C T T T T
V103 V104 V105 V106 V117 V118 V119 V120 V121 V122 V123 V124 V125 V126 V127 V128 V129 V130 V131 V132 V133
2263 C C C C 0 0 C T A G G T C G A G C T T A A
V134 V135 V136 V137 V138 V139 V140 V141 V142 V143 V144 V145 V146 V147 V148 V149 V150 V151 V152 V153 V154
2263 G C T G A C G A G G A T A A T A G T C T G
V155 V156 V157 V158 V159 V160 V161 V162 V163 V164 V165 V166 V167 V168 V169 V170 V171 V172 V173 V174 V175
2263 G A C T A G A T C T G A A G A G G T C A A
V176 V177 V178 V179 V180 V181 V182 V183 V184 V185 V186 V187 V188 V189 V190 V191 V192 V193 V194 V195 V196
2263 C A G T C 0 0 T C G A T A C A T C G C T C
V197 V198 V199 V200 V201 V202 V203 V204 V205 V206 V207 V208 V209 V210 V211 V212 V213 V214 V215 V216 V217
2263 C T T C G T T C C A C T G A C G T A G A A
V218 V219 V220 V221 V222 V223 V224 V225 V226 V227 V228 V229 V230 V231 V232 V233 V234 V235 V236 V237 V238
2263 G T G A C G C A G G A A T A G A G T C T C
V239 V240 V241 V242 V243 V244 V245 V246 V247 V248 V249 V250 V251 V252 V253 V254 V255 V256
2263 A C C T G A G A G A A T A C A T G A
There are many of these false calls. The Y chromosome looks very heterogeneous in this region because of the falsely detected data.
What parts of the Y actually recombine with the X? Is it really only the pseudoautosomal region, or do the X-transposed/X-degenerate regions also recombine with the X?