Many women have false-positive calls for Y-chromosome SNPs.
Since women have no Y chromosome, the only source for such false information would be the X chromosome.
The X chromosome is roughly three times larger than the Y, so there is plenty of room for this information to be written there.
Of course, it could be a simple mistake during the scanning: a bug in the software, a processing error, and so on.
However that is not the case.
I recently found that a large share of these false-positive Y calls fall on the same positions.
I am going to share the positions / SNPs detected for the women.
I did my best to find where on the X chromosome this information could be written, but so far without success.
First, the big picture.
Because of these false-positive calls (detected as Y-chromosome SNPs in women), the PCA shows a large cluster made up of women only. This is a small confirmation that the same false-positive SNPs are shared among the women.
12-04-2024, 08:36 PM (This post was last modified: 12-04-2024, 08:55 PM by TanTin.)
Here is the list of these false-positive SNPs.
I don't know where they sit on the X chromosome, but they are falsely detected as Y-chromosome SNPs.
Non-zero calls (false-positive for HG01858)
HG01858.DG - female from Vietnam
> ped4[9945, ped4[9945, ] != 0 ]
V1 V2 V5 V6 V15 V16 V81 V82 V119 V120 V121 V122 V123 V124 V125 V126 V127 V128 V129
9945 KHV.DG HG01858.DG 2 1 G G T T C C A A G G C C A A C
V130 V131 V132 V133 V134 V135 V136 V137 V138 V139 V140 V141 V142 V143 V144 V145 V146 V147 V148
9945 C T T A A C C G G C C A A G G T T A A
V149 V150 V151 V152 V153 V154 V155 V156 V157 V158 V159 V160 V161 V162 V163 V164 V165 V166 V167
9945 A A T T T T G G C C A A A A C C G G A
V168 V169 V170 V171 V172 V173 V174 V175 V176 V177 V178 V179 V180 V183 V184 V185 V186 V187 V188
9945 A A A G G C C A A A A T T T T G G T T
V189 V190 V191 V192 V193 V194 V195 V196 V197 V198 V199 V200 V201 V202 V203 V204 V205 V206 V207
9945 C C T T G G T T C C T T G G T T C C C
V208 V209 V210 V211 V212 V213 V214 V215 V216 V217 V218 V219 V220 V221 V222 V223 V224 V225 V226
9945 C G G C C T T G G A A T T A A G G A A
V227 V228 V229 V230 V231 V232 V233 V234 V235 V236 V237 V238 V239 V240 V241 V242 V243 V244 V245
9945 G G A A A A A A T T T T A A C C G G G
V246 V247 V248 V249 V250 V251 V252 V253 V254 V255 V256 V257 V258 V505 V506
9945 G G G A A A A A A G G G G C C
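For anyone trying to follow the V-numbers: assuming ped4 was read from a standard PLINK .ped file with read.table (six sample columns V1 to V6, then two allele columns per SNP), each pair of columns corresponds to one row of the .map file. A minimal sketch of that mapping; the helper names are hypothetical and not part of my script:

```r
# Hypothetical helper: map a PED column index to the SNP's row number in the .map file.
# Assumes the standard PLINK layout: 6 leading columns, then 2 allele columns per SNP.
ped_col_to_snp <- function(col) {
  stopifnot(all(col > 6))
  (col - 7) %/% 2 + 1
}

# Inverse: the two allele columns that hold SNP number k.
snp_to_ped_cols <- function(k) {
  c(2 * k + 5, 2 * k + 6)
}

ped_col_to_snp(c(119, 120, 257, 258))
snp_to_ped_cols(57)
```

Under that assumption, V119/V120 is one SNP and V257/V258 another, so the .map rows for those SNP numbers give the chromosome positions.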
12-04-2024, 08:43 PM (This post was last modified: 12-04-2024, 08:53 PM by TanTin.)
Here is the list of the false-positive calls for another woman:
NA20876.DG - modern female from Gujarat
> which (ped4$V2 == "NA20876.DG") # row index: 6135
> ped4[6135, ped4[6135, ] != 0 ] # to check for non-zero values
V1 V2 V5 V6 V57 V58 V81 V82 V119 V120 V121 V122 V123 V124 V125 V126 V127 V128 V129
6135 GIH.DG NA20876.DG 2 1 G G T T C C A A G G C C A A C
V130 V131 V132 V133 V134 V135 V136 V137 V138 V139 V140 V141 V142 V143 V144 V145 V146 V147 V148
6135 C T T A A C C G G C C A A G G T T A A
V149 V150 V151 V152 V153 V154 V155 V156 V157 V158 V159 V160 V161 V162 V163 V164 V165 V166 V167
6135 A A T T T T G G C C A A A A C C G G A
V168 V169 V170 V171 V172 V173 V174 V175 V176 V177 V178 V179 V180 V183 V184 V185 V186 V187 V188
6135 A A A G G C C A A A A T T T T G G T T
V189 V190 V191 V192 V193 V194 V195 V196 V197 V198 V199 V200 V201 V202 V203 V204 V205 V206 V207
6135 C C T T G G T T C C T T G G T T C C C
V208 V209 V210 V211 V212 V213 V214 V215 V216 V217 V218 V219 V220 V221 V222 V223 V224 V225 V226
6135 C G G C C T T G G A A T T A A G G A A
V227 V228 V229 V230 V231 V232 V233 V234 V235 V236 V237 V238 V239 V240 V241 V242 V243 V244 V245
6135 G G A A A A A A T T T T A A C C G G G
V246 V247 V248 V249 V250 V251 V252 V253 V254 V255 V256 V257 V258 V619 V620 V1061 V1062
6135 G G G A A A A A A G G G G C C T T
You may notice that the positions V119 - V258 are false positives here too, the same as in the previous example.
So these are the same SNPs as above.
The list of women with such false-positive calls is really long, as you can also see from my PCA graphic above.
And in general these are the same positions.
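To compare two samples without reading the columns by eye, the called columns can be intersected. This is only a sketch on a toy table: `ped_toy` and `called_cols` are hypothetical names, and it assumes missing genotypes are coded as "0", as in ped4.

```r
# Toy stand-in for ped4: 6 sample columns plus 3 SNPs (2 allele columns each).
ped_toy <- data.frame(
  V1 = c("KHV.DG", "GIH.DG"), V2 = c("HG01858.DG", "NA20876.DG"),
  V3 = 0, V4 = 0, V5 = 2, V6 = 1,
  V7 = c("G", "0"), V8 = c("G", "0"),
  V9 = c("T", "T"), V10 = c("T", "T"),
  V11 = c("0", "C"), V12 = c("0", "C"),
  stringsAsFactors = FALSE
)

# Names of the genotype columns (V7 onward) with a non-missing call for one sample.
called_cols <- function(ped, sample_id) {
  row <- ped[ped$V2 == sample_id, -(1:6)]
  names(row)[as.vector(row != "0")]
}

a <- called_cols(ped_toy, "HG01858.DG")
b <- called_cols(ped_toy, "NA20876.DG")

shared <- intersect(a, b)  # columns called in both samples
only_a <- setdiff(a, b)    # columns called only in the first sample
only_b <- setdiff(b, a)    # columns called only in the second sample
```

On the real data the same three calls would show the shared block and the few sample-specific columns such as V15/V16 or V57/V58.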
So this must be a copy of Y-chromosome information written on the X chromosome.
More than that: this Y-chromosome information does not match modern men. It comes from a very ancient Y chromosome, as present in the chimp and in haplogroup A00. That is how I found these calls (when I was searching for chimp / haplogroup A00 SNPs).
Next part of this story.
As you may notice in my pictures above, there are two related clusters of women on my PCA.
Why is that happening?
Presumably the two clusters differ in which SNPs are called.
My previous two examples both came from one cluster.
I have now checked the second cluster.
Here is an example with the list of SNPs for the second cluster:
NA20899.DG / Gujarat - migrants sampled in Houston, Texas
> N1 = which (ped4$V2 == "NA20899.DG")
> ped4[N1, ped4[N1, ] != 0 ]
V1 V2 V5 V6 V85 V86 V87 V88 V89 V90 V91 V92 V93 V94 V95 V96 V97 V98 V99 V100 V101
6152 GIH.DG NA20899.DG 2 1 A A T T C C T T C C A A A A A A C
V102 V103 V104 V105 V106 V119 V120 V121 V122 V123 V124 V125 V126 V127 V128 V129 V130 V131 V132
6152 C A A T T C C A A G G C C A A C C T T
V133 V134 V135 V136 V137 V138 V139 V140 V141 V142 V143 V144 V145 V146 V147 V148 V149 V150 V151
6152 A A C C G G C C A A G G T T A A A A T
V152 V153 V154 V155 V156 V157 V158 V159 V160 V161 V162 V163 V164 V165 V166 V167 V168 V169 V170
6152 T T T G G C C A A A A C C G G A A A A
V171 V172 V173 V174 V175 V176 V177 V178 V179 V180 V183 V184 V185 V186 V187 V188 V189 V190 V191
6152 G G C C A A A A T T T T G G T T C C T
V192 V193 V194 V195 V196 V197 V198 V199 V200 V201 V202 V203 V204 V205 V206 V207 V208 V209 V210
6152 T G G T T C C T T G G T T C C C C G G
V211 V212 V213 V214 V215 V216 V217 V218 V219 V220 V221 V222 V223 V224 V225 V226 V227 V228 V229
6152 C C T T G G A A T T A A G G A A G G A
V230 V231 V232 V233 V234 V235 V236 V237 V238 V239 V240 V241 V242 V243 V244 V245 V246 V247 V248
6152 A A A A A T T T T A A C C G G G G G G
V249 V250 V251 V252 V253 V254 V255 V256 V257 V258
6152 A A A A A A G G G G
You may notice that here the list also includes V85 to V106.
V107 - V118 are missing.
The next range, V119 to V258, is the same as for cluster 1.
Therefore cluster 2 carries more ancestral SNPs than cluster 1.
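The ranges can be read off automatically by collapsing the called column numbers into contiguous runs. A small sketch; `collapse_ranges` is a hypothetical helper, and the column numbers are typed in from the NA20899.DG output above:

```r
# Collapse a non-empty vector of column indices into contiguous ranges.
collapse_ranges <- function(idx) {
  idx <- sort(unique(idx))
  stopifnot(length(idx) > 0)
  # Positions where a run ends, bracketed by 0 and the last index.
  breaks <- c(0, which(diff(idx) != 1), length(idx))
  data.frame(start = idx[breaks[-length(breaks)] + 1],
             end   = idx[breaks[-1]])
}

# Genotype columns with calls for NA20899.DG (V85-V106, V119-V180, V183-V258).
cols_na20899 <- c(85:106, 119:180, 183:258)
runs <- collapse_ranges(cols_na20899)
runs
```

The gaps between consecutive runs are the columns with no call, here V107 - V118 and V181 - V182.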
It's the pseudoautosomal regions of X/Y. They are present at both ends of the chromosomes and are essential for getting X/Y chromosomes aligned in meiosis.
Any SNP in those regions is ignored when calling Y-SNPs.
12-05-2024, 02:37 PM (This post was last modified: 12-05-2024, 04:30 PM by TanTin.)
(12-05-2024, 07:51 AM)ronin92 Wrote: It's the pseudoautosomal regions of X/Y. They are present at both ends of the chromosomes and are essential for getting X/Y chromosomes aligned in meiosis.
Any SNP in those regions is ignored when calling Y-SNPs.
Yes, you are right!
It could be the pseudoautosomal regions of X/Y.
However, the SNPs that I am reporting as false positives are not the result of recent recombination with present-day men.
These are all ancient ancestral SNPs, matching the chimp.
They must be somewhere on the X chromosome, and we may expect them on all X chromosomes. However, if they are also on the X chromosome of men, we should get many such false positives for men as well. Most men should show a false heterozygous call for their Y chromosome.
So what is the impact of this false-positive zone of SNPs for men?
As I explained earlier, this must be information originating from the X chromosome that is falsely detected as Y-chromosome data.
As shown above, almost every woman has this information somewhere on the X chromosome. It must also be duplicated on the Y chromosome, otherwise it would not be detected as Y-chromosome information.
Every man has a Y chromosome plus one X chromosome. If this information is present on the X chromosome, it should be falsely detected for men too.
So we may expect a large number of errors (or heterozygous variants) at these positions.
Is this the case? Yes, it is.
I will show a few examples for some men below.
Fortunately, these positions are not used for Y-chromosome haplogroup determination.
V1 V2 V3 V4 V5 V6 V85 V86 V87 V88 V89 V90 V91 V92 V93 V94 V95 V96 V97 V98 V99 V100 V101 V102 V103 V104
17726 YRI.DG NA19096.DG 0 0 1 1 T T A A T T C C T T G G C C T T T T C C
V105 V106 V117 V118 V119 V120 V121 V122 V123 V124 V125 V126 V127 V128 V129 V130 V131 V132 V133 V134 V135 V136 V137
17726 C C 0 0 C T A G G T C G A G C T T A A G C T G
V138 V139 V140 V141 V142 V143 V144 V145 V146 V147 V148 V149 V150 V151 V152 V153 V154 V155 V156 V157 V158 V159 V160
17726 A C G A G G A T A A T A G T C T G G A C T A G
V161 V162 V163 V164 V165 V166 V167 V168 V169 V170 V171 V172 V173 V174 V175 V176 V177 V178 V179 V180 V181 V182 V183
17726 A T C T G A A G A G G T C A A C A G T C 0 0 T
V184 V185 V186 V187 V188 V189 V190 V191 V192 V193 V194 V195 V196 V197 V198 V199 V200 V201 V202 V203 V204 V205 V206
17726 C G A T A C A T C G C T C C T T C G T T C C A
V207 V208 V209 V210 V211 V212 V213 V214 V215 V216 V217 V218 V219 V220 V221 V222 V223 V224 V225 V226 V227 V228 V229
17726 C T G A C G T A G A A G T G A C G C A G G A A
V230 V231 V232 V233 V234 V235 V236 V237 V238 V239 V240 V241 V242 V243 V244 V245 V246 V247 V248 V249 V250 V251 V252
17726 T A G A G T C T C A C C T G A G A G A A T A C
V253 V254 V255 V256
17726 A T G A
Every two adjacent columns (odd - even) should carry the same variant.
For example, V85 = V86 = "T", so this position is correct. V103/V104 is correct as well.
But V255/V256, V253/V254, V251/V252, V247/V248 and many others are all mixed.
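The odd/even comparison can be done for a whole row at once. A minimal sketch, assuming the genotype columns are passed in PED order (a1, a2, a1, a2, ...) and that "0" means no call; `het_calls` is a hypothetical helper:

```r
# Flag SNPs whose two allele columns disagree (apparent heterozygous calls).
# `alleles` is a named character vector of genotype columns in PED order.
het_calls <- function(alleles) {
  stopifnot(length(alleles) %% 2 == 0)
  a1 <- alleles[c(TRUE, FALSE)]   # first allele of each pair
  a2 <- alleles[c(FALSE, TRUE)]   # second allele of each pair
  called <- a1 != "0" & a2 != "0"
  data.frame(first_col = names(a1),
             a1 = unname(a1), a2 = unname(a2),
             het = unname(called & a1 != a2),
             stringsAsFactors = FALSE)
}

# Values copied from the NA19096.DG row above (V85/V86, V103/V104, V253-V256).
row_na19096 <- c(V85 = "T", V86 = "T", V103 = "C", V104 = "C",
                 V253 = "A", V254 = "T", V255 = "G", V256 = "A")
res <- het_calls(row_na19096)
res[res$het, ]
```

For a male sample any row returned by the last line is a suspicious call, since a single Y chromosome cannot be heterozygous.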
All these false positives fall in the region 3405761 - 3721154 of the Y chromosome in Build 37 coordinates (the positions in Build 38 are different). Current versions of the 1240k datasets still use Build 37 positions.
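One way to test the pseudoautosomal explanation against these coordinates is a plain interval comparison. The sketch below assumes the commonly cited GRCh37 boundaries for PAR1 and PAR2 on chrY; they are not taken from this thread, so please verify them against your own reference. `par_y_b37` and `overlaps` are hypothetical names.

```r
# Assumed GRCh37 (hg19) pseudoautosomal boundaries on chrY; verify before relying on them.
par_y_b37 <- data.frame(
  region = c("PAR1", "PAR2"),
  start  = c(10001, 59034050),
  end    = c(2649520, 59363566),
  stringsAsFactors = FALSE
)

# Names of the regions that overlap the closed interval [start, end].
overlaps <- function(start, end, regions) {
  regions$region[start <= regions$end & end >= regions$start]
}

hit <- overlaps(3405761, 3721154, par_y_b37)
hit
```

With these assumed boundaries the result is empty, meaning the reported interval lies between PAR1 and PAR2 rather than inside either of them.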
12-05-2024, 06:54 PM (This post was last modified: 12-05-2024, 07:09 PM by TanTin.)
For UstIshim_snpAD I don't see any issue. Most likely UstIshim_snpAD was genotyped with a different technique, so these false calls were avoided.
> N1 = which (ped4$V2 ==
+ "UstIshim_snpAD.DG")
> N1
[1] 13766
> ped4[N1, c(1:6, 85: 106 , 117 : 256 ) ]
V1 V2 V3 V4 V5 V6 V85 V86 V87 V88 V89 V90 V91 V92 V93 V94 V95 V96 V97 V98
13766 Russia_UstIshim_IUP_snpAD.DG UstIshim_snpAD.DG 0 0 1 1 T T A A T T C C T T G G C C
V99 V100 V101 V102 V103 V104 V105 V106 V117 V118 V119 V120 V121 V122 V123 V124 V125 V126 V127 V128 V129 V130 V131
13766 T T T T C C C C T T T T A A T T G G G G C C T
V132 V133 V134 V135 V136 V137 V138 V139 V140 V141 V142 V143 V144 V145 V146 V147 V148 V149 V150 V151 V152 V153 V154
13766 T A A T T G G C C G G G G A A A A G G C C G G
V155 V156 V157 V158 V159 V160 V161 V162 V163 V164 V165 V166 V167 V168 V169 V170 V171 V172 V173 V174 V175 V176 V177
13766 G G C C A A A A C C G G G G G G G G C C A A G
V178 V179 V180 V181 V182 V183 V184 V185 V186 V187 V188 V189 V190 V191 V192 V193 V194 V195 V196 V197 V198 V199 V200
13766 G C C A A T T A A A A A A T T G G C C C C C C
V201 V202 V203 V204 V205 V206 V207 V208 V209 V210 V211 V212 V213 V214 V215 V216 V217 V218 V219 V220 V221 V222 V223
13766 T T C C C C C C G G C C T T G G G G G G C C G
V224 V225 V226 V227 V228 V229 V230 V231 V232 V233 V234 V235 V236 V237 V238 V239 V240 V241 V242 V243 V244 V245 V246
13766 G A A A A T T G G A A C C C C C C C C G G A A
V247 V248 V249 V250 V251 V252 V253 V254 V255 V256
13766 G G T T C C A A A A
Same for Bacho Kiro and Yana: I don't see any false calls on these ancient samples.
12-05-2024, 07:06 PM (This post was last modified: 12-05-2024, 07:07 PM by TanTin.)
Another example for modern: HG01437 (J1a2a1a2d2b2b2)
> ped4[N1, c(1:6, 85: 106 , 117 : 256 ) ]
V1 V2 V3 V4 V5 V6 V85 V86 V87 V88 V89 V90 V91 V92 V93 V94 V95 V96 V97 V98 V99 V100 V101 V102
2263 CLM.DG HG01437.DG 0 0 1 1 T T A A T T C C T T G G C C T T T T
V103 V104 V105 V106 V117 V118 V119 V120 V121 V122 V123 V124 V125 V126 V127 V128 V129 V130 V131 V132 V133
2263 C C C C 0 0 C T A G G T C G A G C T T A A
V134 V135 V136 V137 V138 V139 V140 V141 V142 V143 V144 V145 V146 V147 V148 V149 V150 V151 V152 V153 V154
2263 G C T G A C G A G G A T A A T A G T C T G
V155 V156 V157 V158 V159 V160 V161 V162 V163 V164 V165 V166 V167 V168 V169 V170 V171 V172 V173 V174 V175
2263 G A C T A G A T C T G A A G A G G T C A A
V176 V177 V178 V179 V180 V181 V182 V183 V184 V185 V186 V187 V188 V189 V190 V191 V192 V193 V194 V195 V196
2263 C A G T C 0 0 T C G A T A C A T C G C T C
V197 V198 V199 V200 V201 V202 V203 V204 V205 V206 V207 V208 V209 V210 V211 V212 V213 V214 V215 V216 V217
2263 C T T C G T T C C A C T G A C G T A G A A
V218 V219 V220 V221 V222 V223 V224 V225 V226 V227 V228 V229 V230 V231 V232 V233 V234 V235 V236 V237 V238
2263 G T G A C G C A G G A A T A G A G T C T C
V239 V240 V241 V242 V243 V244 V245 V246 V247 V248 V249 V250 V251 V252 V253 V254 V255 V256
2263 A C C T G A G A G A A T A C A T G A
There are many of these false calls. The Y chromosome looks very heterogeneous in this region because of the falsely detected data.