Posts: 503
Threads: 72
Joined: Nov 2023
Gender: Male
Ethnicity: Arab
(10-21-2024, 12:30 PM)AimSmall Wrote: (10-21-2024, 12:20 PM)Genetics189291 Wrote: @AimSmall
Do you think using my combined kit is causing this issue? It has about 2000K SNPs versus the 1240K in the dataset, which would make the .bed file too big.
When merging your kit, you should have kept only the SNPs that exist in the 1240K. It's pointless to add any that don't exist as it forces all the other samples to add NO CALLs and swells the resulting dataset.
plink --allow-no-sex --bfile v54p1_1240K_public --write-snplist --out v54p1_1240K_clean
plink --23file AimSmall.txt --extract v54p1_1240K_clean.snplist --make-bed --out AimSmall_v54p1_genome
Then merge AimSmall_v54p1_genome with v54p1_1240K_public or whatever dataset.
Thanks, I’ll try this when I get home from work.
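To see in advance how much of a raw kit lies outside the 1240K panel, you can count rsID overlap before running the conversion. A minimal sketch, assuming a 23andMe-style raw file (tab-separated, `#` comment lines, rsID in the first column) and the `.snplist` written by the first command above. `count_panel_overlap` is a made-up helper name, and matching is by ID only, so a SNP that carries a different name in the two files is counted as missing.

```shell
# count_panel_overlap KIT SNPLIST   (hypothetical helper, not part of PLINK)
# Prints two numbers: kit SNPs found in the panel, kit SNPs not in the panel.
# Assumes SNPLIST is non-empty and has one variant ID per line.
count_panel_overlap() {
  awk -F '\t' '
    NR == FNR { panel[$1] = 1; next }   # first file: the panel snplist
    /^#/ || $1 == "" { next }           # skip raw-file comments and blank lines
    ($1 in panel) { kept++; next }
    { dropped++ }
    END { printf "%d %d\n", kept, dropped }
  ' "$2" "$1"
}

# Example call (file names taken from the commands above):
#   count_panel_overlap AimSmall.txt v54p1_1240K_clean.snplist
```

The second number is roughly what `--extract` will drop, and what would otherwise be padded with no-calls in every reference sample after a merge.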
This is some sort of joke
728550 variants loaded from .bim file.
1 person (1 male, 0 females) loaded from .fam.
1 phenotype value loaded from .fam.
--flip: 272 SNPs flipped.
Error: Invalid .bed file size (expected 728553 bytes).
I did what you said, which helped with my file, but now PLINK expects the .bed file to be 728,553 bytes and I apparently need to remove 23 bytes. These tools are very poor. I'm going to use my 23andMe v3 file, because it looks like the combined kit is causing way too many issues.
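For context on that error: the 728553 is the size PLINK derives from the .bim and .fam, not the size of the file on disk. In the default variant-major PLINK 1 layout, a .bed holds 3 header bytes plus ceil(samples / 4) bytes per variant, so 728550 variants with 1 sample gives 3 + 728550 = 728553. A mismatch generally means the .bim or .fam no longer describes the .bed, for example when lines were edited by hand instead of through `--make-bed`. A small sketch for checking a fileset; both function names are hypothetical.

```shell
# expected_bed_size PREFIX
# Size in bytes a variant-major PLINK 1 .bed should have, derived from the
# number of lines in PREFIX.bim (variants) and PREFIX.fam (samples).
expected_bed_size() {
  variants=$(awk 'END { print NR }' "$1.bim")
  samples=$(awk 'END { print NR }' "$1.fam")
  # 3 header bytes, then 2 bits per genotype, padded to whole bytes per variant
  echo $(( 3 + variants * ((samples + 3) / 4) ))
}

# check_bed_size PREFIX
# Prints expected and actual sizes; returns non-zero when they differ.
check_bed_size() {
  expected=$(expected_bed_size "$1")
  actual=$(( $(wc -c < "$1.bed") ))
  echo "expected=$expected actual=$actual"
  [ "$expected" -eq "$actual" ]
}

# Example call (prefix taken from the commands earlier in the thread):
#   check_bed_size AimSmall_v54p1_genome
```

With a single sample each variant takes exactly one byte, so the difference between the two numbers is also the number of variants by which the .bim and .bed disagree.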
Posts: 863
Threads: 48
Joined: Aug 2023
Gender: Male
Ethnicity: Colonial American
Nationality: American
Y-DNA (P): R1b-U152 >R-FTA96415
Y-DNA (M): I2-P37 > I-BY77146
mtDNA (M): J1b1a1a
mtDNA (P): H66a
When did you get the error, on merge?
(10-21-2024, 11:52 PM)AimSmall Wrote: When did you get the error, on merge?
It happens when I flip the SNPs. The merge gives me the .missnp file, and when I then try to flip those SNPs I get this file size error for my .bed. PLINK is strict about this, and I've found no way around it.
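For reference, the usual PLINK 1.9 sequence when a merge stops on strand mismatches is: flip only the listed variants in one fileset, write a new fileset with `--make-bed`, then retry the merge. A sketch under assumptions: the prefixes are placeholders, the `.missnp` path is passed in because its name depends on the `--out` prefix of the failed merge, and `flip_and_remerge` is a hypothetical wrapper, not a PLINK feature.

```shell
PLINK=${PLINK:-plink}   # path to the PLINK 1.9 binary (assumed to be on PATH)

# flip_and_remerge REF KIT MISSNP OUT
#   REF, KIT  prefixes of the two binary filesets
#   MISSNP    variant list written by the failed merge
#   OUT       prefix for the merged result
flip_and_remerge() {
  ref=$1 kit=$2 missnp=$3 out=$4
  # Flip strand only in the kit, only at the variants the failed merge listed.
  "$PLINK" --allow-no-sex --bfile "$kit" --flip "$missnp" \
           --make-bed --out "${kit}_flipped" || return 1
  # Retry the merge using the flipped copy of the kit.
  "$PLINK" --allow-no-sex --bfile "$ref" --bmerge "${kit}_flipped" \
           --make-bed --out "$out"
}
```

If the retry writes another `.missnp`, the variants left in it are usually not strand problems, and the common fix is to drop them from both filesets with `--exclude` before merging again.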
(10-21-2024, 11:52 PM)AimSmall Wrote: When did you get the error, on merge?
Yeah, it's a problem with my combined kit; the 23andMe v3 file worked without any errors. I don't understand why the combined kit is an option in WGS Extract.
Did you use the merging steps?
Also, would you consider putting your signature in a spoiler? Kind of verbose being on every post.
10-22-2024, 12:22 AM
(This post was last modified: 10-22-2024, 12:22 AM by Genetics189291.)
(10-22-2024, 12:14 AM)AimSmall Wrote: Did you use the merging steps?
Also, would you consider putting your signature in a spoiler? Kind of verbose being on every post.
I took it off. But yeah, your steps work perfectly for my son's v3 file generated by WGS Extract; the issue is the combined kit file. I also noticed duplicate SNPs in the combined kit, which I didn’t encounter with the v3 file. The v3 file let me carry on as normal with no issues, and the merged file didn’t grow significantly in size like before.
The combined kit causes two problems. Duplicate SNPs are easy to sort out, but flipping SNPs triggers the file size validation error, and getting past that would mean trimming the file, which obviously you can’t do as it takes away crucial data. The combined kit is pretty much useless; in my opinion I wouldn’t even trust it for ancestry analysis if it has all those errors.
Hopefully he can add a 1240K SNP template in the future.
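On the duplicates: one way to list repeated variant IDs is to scan the second column of the .bim. A minimal sketch with a hypothetical helper name. It only catches repeated IDs, not the same position stored under two different names.

```shell
# list_duplicate_ids BIM   (hypothetical helper, not part of PLINK)
# Prints every variant ID (column 2 of the .bim) that occurs more than once,
# one line per ID, in order of its second appearance.
list_duplicate_ids() {
  awk '{ if (++seen[$2] == 2) print $2 }' "$1"
}

# Example use, with placeholder prefixes:
#   list_duplicate_ids kit.bim > kit_dups.txt
#   plink --bfile kit --exclude kit_dups.txt --make-bed --out kit_nodups
# Note: --exclude removes every copy of a listed ID, not just the extras.
```

Removing them through `--exclude` plus `--make-bed`, rather than deleting lines from the .bim in a text editor, keeps the .bed and .bim in step with each other.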
@AimSmall
Everything is working great now, thanks. Here is my first run:
Code:
> qp$weights
# A tibble: 4 × 5
  target left                           weight    se      z
  <chr>  <chr>                           <dbl> <dbl>  <dbl>
1 Son    CanaryIslands_Guanche.SG        0.753 0.468  1.61
2 Son    Congo_Kindoki_Protohistoric.AG  0.199 0.186  1.07
3 Son    Greece_PalaceOfNestor_EIA_d.AG  1.09  3.18   0.343
4 Son    Syria_TellQarassa_Umayyad.SG   -1.04  3.75  -0.278
> qp$popdrop
# A tibble: 15 × 15
   pat      wt   dof  chisq         p f4rank CanaryIslands_Guanche.SG Congo_Kindoki_Protohistor…¹ Greece_PalaceOfNesto…² Syria_TellQarassa_Um…³ feasible best
   <chr> <dbl> <dbl>  <dbl>     <dbl>  <dbl>                    <dbl>                       <dbl>                  <dbl>                  <dbl> <lgl>    <lgl>
 1 0000      0     7   12.5 8.59e-  2      3                    0.753                       0.199                  1.09                  -1.04  FALSE    NA
 2 0001      1     8   14.7 6.49e-  2      2                    0.606                       0.158                  0.235                 NA     TRUE     TRUE
 3 0010      1     8   17.9 2.23e-  2      2                    0.623                       0.140                 NA                      0.237 TRUE     TRUE
 4 0100      1     8   10.7 2.18e-  1      2                    0.124                      NA                     -2.99                   3.86  FALSE    TRUE
 5 1000      1     8   12.5 1.31e-  1      2                   NA                          -0.212                 -5.97                   7.18  FALSE    TRUE
 6 0011      2     9   50.8 7.59e-  8      1                    0.889                       0.111                 NA                     NA     TRUE     NA
 7 0101      2     9   72.1 5.93e- 12      1                    1.57                       NA                     -0.570                 NA     FALSE    NA
 8 0110      2     9  155.  8.10e- 29      1                    1.25                       NA                     NA                     -0.252 FALSE    NA
 9 1001      2     9   32.6 1.57e-  4      1                   NA                           0.280                  0.720                 NA     TRUE     NA
10 1010      2     9  175.  7.10e- 33      1                   NA                           0.223                 NA                      0.777 TRUE     NA
11 1100      2     9   10.7 3.00e-  1      1                   NA                          NA                     -3.19                   4.19  FALSE    NA
12 0111      3    10  259.  6.52e- 50      0                    1                          NA                     NA                     NA     TRUE     NA
13 1011      3    10 7855.  0              0                   NA                           1                     NA                     NA     TRUE     NA
14 1101      3    10  144.  5.72e- 26      0                   NA                          NA                      1                     NA     TRUE     NA
15 1110      3    10 1151.  5.51e-241      0                   NA                          NA                     NA                      1     TRUE     NA
# ℹ abbreviated names: ¹Congo_Kindoki_Protohistoric.AG, ²Greece_PalaceOfNestor_EIA_d.AG, ³Syria_TellQarassa_Umayyad.SG
# ℹ 3 more variables: dofdiff <dbl>, chisqdiff <dbl>, p_nested <dbl>
>
> `|`=`%>%`
> p=""
> o=""
> rm(t)
> #t=qp$popdrop|dplyr::filter(f4rank!=0&feasible)|arrange(desc(p),chisq)
> t=qp$popdrop|dplyr::filter(f4rank!=0&feasible)|arrange(desc(p),chisq)
>
> p=t|select(7:last_col(5))|apply(1,\(x)na.omit(100*x)|sort(T)|sprintf("%.0f %s",.,names(.))|paste(collapse=" "))
> o=sub("^0","",sprintf(ifelse(t$p<.001,"%.0g","%.3f"),t$p))|paste0((ifelse(t$p > 0.05,"SUCCESS ","FAILED ")),"p=",.," ",p,collapse="\n")
> paste0("Target: ",target,"\nLeft: ",paste(sort(left),collapse=", "),"\nRight: ",paste(right,collapse=", "),"\nFeasible Results:","\n",o)|writeLines
Target: Son
Left: CanaryIslands_Guanche.SG, Congo_Kindoki_Protohistoric.AG, Greece_PalaceOfNestor_EIA_d.AG, Syria_TellQarassa_Umayyad.SG
Right: Mbuti.DG, Levant_Natufian_d.AG, Turkey_Marmara_Barcin_N.SG, Iran_Wezmeh_N.SG, England_Mesolithic.AG, Russia_YuzhniyOleniyOstrov_Mesolithic.AG, Papuan.DG, Russia_MA1_UP.SG, Yoruba.DG, Russia_Samara_EBA_Yamnaya.AG, Morocco_Iberomaurusian.AG
Feasible Results:
SUCCESS p=.065 61 CanaryIslands_Guanche.SG 24 Greece_PalaceOfNestor_EIA_d.AG 16 Congo_Kindoki_Protohistoric.AG
FAILED p=.022 62 CanaryIslands_Guanche.SG 24 Syria_TellQarassa_Umayyad.SG 14 Congo_Kindoki_Protohistoric.AG
FAILED p=.0002 72 Greece_PalaceOfNestor_EIA_d.AG 28 Congo_Kindoki_Protohistoric.AG
FAILED p=8e-08 89 CanaryIslands_Guanche.SG 11 Congo_Kindoki_Protohistoric.AG
FAILED p=7e-33 78 Syria_TellQarassa_Umayyad.SG 22 Congo_Kindoki_Protohistoric.AG
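A note on reading `qp$weights`: z is weight divided by se, and a model is marked feasible only when every weight lies between 0 and 1. The sketch below recomputes both from a plain `left weight se` table typed in from the printout. That input layout and the function name are assumptions for illustration, and values recomputed from rounded output can differ in the last digit.

```shell
# weights_summary FILE   (hypothetical helper)
# FILE has one source per line: <left> <weight> <se>
# Prints z = weight / se for each source, then whether all weights are in [0, 1].
weights_summary() {
  awk '
    {
      printf "%s z=%.2f\n", $1, $2 / $3
      if ($2 < 0 || $2 > 1) infeasible = 1
    }
    END { print (infeasible ? "feasible=FALSE" : "feasible=TRUE") }
  ' "$1"
}

# Example input, copied from the four-source run above:
#   CanaryIslands_Guanche.SG        0.753 0.468
#   Congo_Kindoki_Protohistoric.AG  0.199 0.186
#   Greece_PalaceOfNestor_EIA_d.AG  1.09  3.18
#   Syria_TellQarassa_Umayyad.SG   -1.04  3.75
```

For the four-source run this reports feasible=FALSE, since the Greece weight is above 1 and the Syria weight is negative, and both of those standard errors (3.18 and 3.75) are larger than the weights themselves.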
@AimSmall
The same model without the Syria source:
Code:
> qp$weights
# A tibble: 3 × 5
  target left                           weight     se     z
  <chr>  <chr>                           <dbl>  <dbl> <dbl>
1 Son    CanaryIslands_Guanche.SG        0.624 0.0814  7.66
2 Son    Congo_Kindoki_Protohistoric.AG  0.153 0.0202  7.59
3 Son    Greece_PalaceOfNestor_EIA_d.AG  0.223 0.0668  3.34
> qp$popdrop
# A tibble: 7 × 14
  pat      wt   dof  chisq        p f4rank CanaryIslands_Guanche.SG Congo_Kindoki_Protohist…¹ Greece_PalaceOfNesto…² feasible best  dofdiff chisqdiff p_nested
  <chr> <dbl> <dbl>  <dbl>    <dbl>  <dbl>                    <dbl>                     <dbl>                  <dbl> <lgl>    <lgl>   <dbl>     <dbl>    <dbl>
1 000       0     8   13.0 1.12e- 1      2                    0.624                     0.153                  0.223 TRUE     NA         NA      NA         NA
2 001       1     9   38.2 1.60e- 5      1                    0.884                     0.116                 NA     TRUE     TRUE        0     -33.4        1
3 010       1     9   71.6 7.43e-12      1                    1.56                     NA                     -0.560 FALSE    TRUE        0      39.7        0
4 100       1     9   31.9 2.10e- 4      1                   NA                         0.280                  0.720 TRUE     TRUE       NA      NA         NA
5 011       2    10  180.  2.44e-33      0                    1                        NA                     NA     TRUE     NA         NA      NA         NA
6 101       2    10 7277.  0             0                   NA                         1                     NA     TRUE     NA         NA      NA         NA
7 110       2    10  141.  2.58e-25      0                   NA                        NA                      1     TRUE     NA         NA      NA         NA
# ℹ abbreviated names: ¹Congo_Kindoki_Protohistoric.AG, ²Greece_PalaceOfNestor_EIA_d.AG
>
> `|`=`%>%`
> p=""
> o=""
> rm(t)
> #t=qp$popdrop|dplyr::filter(f4rank!=0&feasible)|arrange(desc(p),chisq)
> t=qp$popdrop|dplyr::filter(f4rank!=0&feasible)|arrange(desc(p),chisq)
>
> p=t|select(7:last_col(5))|apply(1,\(x)na.omit(100*x)|sort(T)|sprintf("%.0f %s",.,names(.))|paste(collapse=" "))
> o=sub("^0","",sprintf(ifelse(t$p<.001,"%.0g","%.3f"),t$p))|paste0((ifelse(t$p > 0.05,"SUCCESS ","FAILED ")),"p=",.," ",p,collapse="\n")
> paste0("Target: ",target,"\nLeft: ",paste(sort(left),collapse=", "),"\nRight: ",paste(right,collapse=", "),"\nFeasible Results:","\n",o)|writeLines
Target: Son
Left: CanaryIslands_Guanche.SG, Congo_Kindoki_Protohistoric.AG, Greece_PalaceOfNestor_EIA_d.AG
Right: Mbuti.DG, Levant_Natufian_d.AG, Turkey_Marmara_Barcin_N.SG, Iran_Wezmeh_N.SG, England_Mesolithic.AG, Russia_YuzhniyOleniyOstrov_Mesolithic.AG, Papuan.DG, Russia_MA1_UP.SG, Yoruba.DG, Russia_Samara_EBA_Yamnaya.AG, Morocco_Iberomaurusian.AG
Feasible Results:
SUCCESS p=.112 62 CanaryIslands_Guanche.SG 22 Greece_PalaceOfNestor_EIA_d.AG 15 Congo_Kindoki_Protohistoric.AG
FAILED p=.0002 72 Greece_PalaceOfNestor_EIA_d.AG 28 Congo_Kindoki_Protohistoric.AG
FAILED p=2e-05 88 CanaryIslands_Guanche.SG 12 Congo_Kindoki_Protohistoric.AG