Dataset merging help
#31
(10-21-2024, 12:30 PM)AimSmall Wrote:
(10-21-2024, 12:20 PM)Genetics189291 Wrote: @AimSmall

Do you think using my combined kit is causing this issue, since it has 2000K SNPs as opposed to the 1240K SNPs, which makes the .bed file too big?

When merging your kit, you should have kept only the SNPs that exist in the 1240K.  It's pointless to add any that don't exist as it forces all the other samples to add NO CALLs and swells the resulting dataset.

plink --allow-no-sex --bfile v54p1_1240K_public --write-snplist --out v54p1_1240K_clean
plink --23file AimSmall.txt --extract v54p1_1240K_clean.snplist --make-bed --out AimSmall_v54p1_genome

Then merge AimSmall_v54p1_genome with v54p1_1240K_public or whatever dataset.
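For anyone wondering what the --extract step amounts to, here is a minimal Python sketch of the same idea: keep only the rows of a 23andMe-style raw file whose ID appears in the reference SNP list. The function name and the toy rsIDs are hypothetical, and it assumes the IDs in the raw file match the IDs in the .snplist exactly, which is also what PLINK matches on.

```python
def keep_shared_snps(raw_lines, allowed_ids):
    """Keep comment lines plus genotype rows whose ID is in allowed_ids."""
    kept = []
    for line in raw_lines:
        if line.startswith("#") or not line.strip():
            kept.append(line)  # header/comment/blank lines pass through unchanged
            continue
        snp_id = line.split("\t", 1)[0]  # first tab-separated column is the SNP ID
        if snp_id in allowed_ids:
            kept.append(line)
    return kept


# Hypothetical toy data in the 23andMe layout: rsid, chromosome, position, genotype
raw = [
    "# rsid\tchromosome\tposition\tgenotype",
    "rs0000001\t1\t1000\tAA",
    "rs0000002\t1\t2000\tCT",
    "rs0000003\t2\t3000\tGG",
]
snplist = {"rs0000001", "rs0000003"}  # stand-in for the 1240K .snplist contents

filtered = keep_shared_snps(raw, snplist)
print(len(filtered) - 1, "SNPs kept")
```

Anything not in the list is dropped before the merge, so the other samples never need NO CALL entries for it.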

Thanks, I’ll try this when I get home from work.
#32
(10-21-2024, 12:30 PM)AimSmall Wrote:
(10-21-2024, 12:20 PM)Genetics189291 Wrote: @AimSmall

Do you think using my combined kit is causing this issue, since it has 2000K SNPs as opposed to the 1240K SNPs, which makes the .bed file too big?

When merging your kit, you should have kept only the SNPs that exist in the 1240K.  It's pointless to add any that don't exist as it forces all the other samples to add NO CALLs and swells the resulting dataset.

plink --allow-no-sex --bfile v54p1_1240K_public --write-snplist --out v54p1_1240K_clean
plink --23file AimSmall.txt --extract v54p1_1240K_clean.snplist --make-bed --out AimSmall_v54p1_genome

Then merge AimSmall_v54p1_genome with v54p1_1240K_public or whatever dataset.

This is some sort of joke

728550 variants loaded from .bim file.
1 person (1 male, 0 females) loaded from .fam.
1 phenotype value loaded from .fam.
--flip: 272 SNPs flipped.
Error: Invalid .bed file size (expected 728553 bytes).


I did what you said, which helped with my file, but now PLINK expects the .bed file to be 728,553 bytes and I need to remove the extra 23 bytes. These tools are very poor. I'm going to use my 23andMe v3 file, because it looks like the combined kit is causing way too many issues.
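The 728553 in PLINK's error message follows from the .bed layout, so it can be checked by hand. A minimal sketch, assuming the default variant-major PLINK 1 .bed format (3 magic bytes, then one block per variant with 2 bits per sample, padded to whole bytes); the function name is made up for illustration:

```python
import math


def expected_bed_size(n_variants, n_samples):
    """Expected size in bytes of a variant-major PLINK 1 .bed file."""
    bytes_per_variant = math.ceil(n_samples / 4)  # 4 genotypes fit in one byte
    return 3 + n_variants * bytes_per_variant     # 3 = magic-number header


# Counts taken from the log above: 728550 variants, 1 person
print(expected_bed_size(728550, 1))  # 728553
```

If the .bed on disk has any other size, the .bed, .bim, and .fam no longer describe the same data, so trimming bytes off the .bed by hand is unlikely to be a safe fix.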
#33
When did you get the error, on merge?
#34
(10-21-2024, 11:52 PM)AimSmall Wrote: When did you get the error, on merge?

When I need to flip the SNPs. The merge gives me the .missnp file, and when I then try to flip those SNPs I get this file-size error with my .bed. PLINK is strict about this, and I've found no way to bypass it.
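For context, flipping a SNP just means reporting its alleles on the opposite DNA strand. A minimal sketch of that complement step in Python, assuming single-base alleles and PLINK's "0" code for a missing allele; the names here are hypothetical and this is not how PLINK implements --flip internally:

```python
# Strand complement for single-base alleles; "0" (missing) stays missing
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C", "0": "0"}


def flip_alleles(a1, a2):
    """Return both alleles of a SNP as read from the opposite strand."""
    return COMPLEMENT[a1], COMPLEMENT[a2]


print(flip_alleles("A", "G"))  # ('T', 'C')
```

Note that A/T and C/G SNPs look the same after flipping, so a strand mismatch on those cannot be detected from the alleles alone.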
#35
(10-21-2024, 11:52 PM)AimSmall Wrote: When did you get the error, on merge?

Yeah, it's a problem with my combined kit; the 23andMe v3 file worked without any errors. I don't understand why the combined kit is an option in WGS Extract.
#36
Did you use the merging steps?

Also, would you consider putting your signature in a spoiler? Kind of verbose being on every post.
#37
(10-22-2024, 12:14 AM)AimSmall Wrote: Did you use the merging steps? 

Also, would you consider putting your signature in a spoiler?  Kind of verbose being on every post.

I took it off. But yeah, your steps work perfectly for my son's v3 file generated by WGS Extract; the issue is the combined kit file. I also noticed duplicates in the combined kit, which I didn't encounter with the v3 file. The v3 file let me carry on as normal with no issues, and my file didn't grow significantly in size like before.

The combined kit seems to cause errors in two places: duplicate SNPs, which aren't hard to sort out, and flipping SNPs, which triggers the file-size validation error. To get past that you would need to trim data, which obviously you can't do since it takes away crucial data. The combined kit is pretty much useless; in my opinion I wouldn't even trust it for ancestral analysis if it has all those errors.

Hopefully he can add a 1240K SNP template in the future.
#38
@AimSmall

Everything is working great now, thanks. My first run:

Code:
> qp$weights
# A tibble: 4 × 5
  target left                           weight    se      z
  <chr>  <chr>                           <dbl> <dbl>  <dbl>
1 Son    CanaryIslands_Guanche.SG        0.753 0.468  1.61
2 Son    Congo_Kindoki_Protohistoric.AG  0.199 0.186  1.07
3 Son    Greece_PalaceOfNestor_EIA_d.AG  1.09  3.18   0.343
4 Son    Syria_TellQarassa_Umayyad.SG   -1.04  3.75  -0.278
> qp$popdrop
# A tibble: 15 × 15
   pat      wt   dof  chisq         p f4rank CanaryIslands_Guanche.SG Congo_Kindoki_Protohistor…¹ Greece_PalaceOfNesto…² Syria_TellQarassa_Um…³ feasible best
   <chr> <dbl> <dbl>  <dbl>     <dbl>  <dbl>                    <dbl>                       <dbl>                  <dbl>                  <dbl> <lgl>    <lgl>
 1 0000      0     7   12.5 8.59e-  2      3                    0.753                       0.199                  1.09                  -1.04  FALSE    NA
 2 0001      1     8   14.7 6.49e-  2      2                    0.606                       0.158                  0.235                 NA     TRUE     TRUE
 3 0010      1     8   17.9 2.23e-  2      2                    0.623                       0.140                 NA                      0.237 TRUE     TRUE
 4 0100      1     8   10.7 2.18e-  1      2                    0.124                      NA                     -2.99                   3.86  FALSE    TRUE
 5 1000      1     8   12.5 1.31e-  1      2                   NA                          -0.212                 -5.97                   7.18  FALSE    TRUE
 6 0011      2     9   50.8 7.59e-  8      1                    0.889                       0.111                 NA                     NA     TRUE     NA
 7 0101      2     9   72.1 5.93e- 12      1                    1.57                       NA                     -0.570                 NA     FALSE    NA
 8 0110      2     9  155.  8.10e- 29      1                    1.25                       NA                     NA                     -0.252 FALSE    NA
 9 1001      2     9   32.6 1.57e-  4      1                   NA                           0.280                  0.720                 NA     TRUE     NA
10 1010      2     9  175.  7.10e- 33      1                   NA                           0.223                 NA                      0.777 TRUE     NA  
11 1100      2     9   10.7 3.00e-  1      1                   NA                          NA                     -3.19                   4.19  FALSE    NA  
12 0111      3    10  259.  6.52e- 50      0                    1                          NA                     NA                     NA     TRUE     NA  
13 1011      3    10 7855.  0              0                   NA                           1                     NA                     NA     TRUE     NA  
14 1101      3    10  144.  5.72e- 26      0                   NA                          NA                      1                     NA     TRUE     NA  
15 1110      3    10 1151.  5.51e-241      0                   NA                          NA                     NA                      1     TRUE     NA  
# ℹ abbreviated names: ¹​Congo_Kindoki_Protohistoric.AG, ²​Greece_PalaceOfNestor_EIA_d.AG, ³​Syria_TellQarassa_Umayyad.SG
# ℹ 3 more variables: dofdiff <dbl>, chisqdiff <dbl>, p_nested <dbl>
>
> `|`=`%>%`
> p=""
> o=""
> rm(t)
> #t=qp$popdrop|dplyr::filter(f4rank!=0&feasible)|arrange(desc(p),chisq)
> t=qp$popdrop|dplyr::filter(f4rank!=0&feasible)|arrange(desc(p),chisq)
>
> p=t|select(7:last_col(5))|apply(1,\(x)na.omit(100*x)|sort(T)|sprintf("%.0f %s",.,names(.))|paste(collapse=" "))
> o=sub("^0","",sprintf(ifelse(t$p<.001,"%.0g","%.3f"),t$p))|paste0((ifelse(t$p > 0.05,"SUCCESS ","FAILED  ")),"p=",.," ",p,collapse="\n")
> paste0("Target: ",target,"\nLeft: ",paste(sort(left),collapse=", "),"\nRight: ",paste(right,collapse=", "),"\nFeasible Results:","\n",o)|writeLines
Target: Son
Left: CanaryIslands_Guanche.SG, Congo_Kindoki_Protohistoric.AG, Greece_PalaceOfNestor_EIA_d.AG, Syria_TellQarassa_Umayyad.SG
Right: Mbuti.DG, Levant_Natufian_d.AG, Turkey_Marmara_Barcin_N.SG, Iran_Wezmeh_N.SG, England_Mesolithic.AG, Russia_YuzhniyOleniyOstrov_Mesolithic.AG, Papuan.DG, Russia_MA1_UP.SG, Yoruba.DG, Russia_Samara_EBA_Yamnaya.AG, Morocco_Iberomaurusian.AG
Feasible Results:
SUCCESS p=.065 61 CanaryIslands_Guanche.SG 24 Greece_PalaceOfNestor_EIA_d.AG 16 Congo_Kindoki_Protohistoric.AG
FAILED  p=.022 62 CanaryIslands_Guanche.SG 24 Syria_TellQarassa_Umayyad.SG 14 Congo_Kindoki_Protohistoric.AG
FAILED  p=.0002 72 Greece_PalaceOfNestor_EIA_d.AG 28 Congo_Kindoki_Protohistoric.AG
FAILED  p=8e-08 89 CanaryIslands_Guanche.SG 11 Congo_Kindoki_Protohistoric.AG
FAILED  p=7e-33 78 Syria_TellQarassa_Umayyad.SG 22 Congo_Kindoki_Protohistoric.AG
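To read the qp$weights table above: as far as I understand it, z is simply weight divided by se, and a model counts as feasible only when every weight lies between 0 and 1. A minimal Python sketch using the four-source numbers from this run (the variable names are mine, and the printed z values only match the table approximately because the inputs are already rounded):

```python
# (weight, se) pairs copied from the qp$weights output above
weights = {
    "CanaryIslands_Guanche.SG": (0.753, 0.468),
    "Congo_Kindoki_Protohistoric.AG": (0.199, 0.186),
    "Greece_PalaceOfNestor_EIA_d.AG": (1.09, 3.18),
    "Syria_TellQarassa_Umayyad.SG": (-1.04, 3.75),
}

# z-score: how many standard errors the weight is away from zero
z = {pop: w / se for pop, (w, se) in weights.items()}

# Feasible only if no weight is negative or above 1
feasible = all(0 <= w <= 1 for w, _ in weights.values())

for pop, score in z.items():
    print(f"{pop}: z = {score:.2f}")
print("feasible:", feasible)
```

Here the Greece and Syria weights fall outside 0 to 1 with standard errors above 3, which is why the full four-source model is flagged as not feasible.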
#39
@AimSmall

Without Syria added:

Code:
> qp$weights
# A tibble: 3 × 5
  target left                           weight     se     z
  <chr>  <chr>                           <dbl>  <dbl> <dbl>
1 Son    CanaryIslands_Guanche.SG        0.624 0.0814  7.66
2 Son    Congo_Kindoki_Protohistoric.AG  0.153 0.0202  7.59
3 Son    Greece_PalaceOfNestor_EIA_d.AG  0.223 0.0668  3.34
> qp$popdrop
# A tibble: 7 × 14
  pat      wt   dof  chisq        p f4rank CanaryIslands_Guanche.SG Congo_Kindoki_Protohist…¹ Greece_PalaceOfNesto…² feasible best  dofdiff chisqdiff p_nested
  <chr> <dbl> <dbl>  <dbl>    <dbl>  <dbl>                    <dbl>                     <dbl>                  <dbl> <lgl>    <lgl>   <dbl>     <dbl>    <dbl>
1 000       0     8   13.0 1.12e- 1      2                    0.624                     0.153                  0.223 TRUE     NA         NA      NA         NA
2 001       1     9   38.2 1.60e- 5      1                    0.884                     0.116                 NA     TRUE     TRUE        0     -33.4        1
3 010       1     9   71.6 7.43e-12      1                    1.56                     NA                     -0.560 FALSE    TRUE        0      39.7        0
4 100       1     9   31.9 2.10e- 4      1                   NA                         0.280                  0.720 TRUE     TRUE       NA      NA         NA
5 011       2    10  180.  2.44e-33      0                    1                        NA                     NA     TRUE     NA         NA      NA         NA
6 101       2    10 7277.  0             0                   NA                         1                     NA     TRUE     NA         NA      NA         NA
7 110       2    10  141.  2.58e-25      0                   NA                        NA                      1     TRUE     NA         NA      NA         NA
# ℹ abbreviated names: ¹​Congo_Kindoki_Protohistoric.AG, ²​Greece_PalaceOfNestor_EIA_d.AG
>
> `|`=`%>%`
> p=""
> o=""
> rm(t)
> #t=qp$popdrop|dplyr::filter(f4rank!=0&feasible)|arrange(desc(p),chisq)
> t=qp$popdrop|dplyr::filter(f4rank!=0&feasible)|arrange(desc(p),chisq)
>
> p=t|select(7:last_col(5))|apply(1,\(x)na.omit(100*x)|sort(T)|sprintf("%.0f %s",.,names(.))|paste(collapse=" "))
> o=sub("^0","",sprintf(ifelse(t$p<.001,"%.0g","%.3f"),t$p))|paste0((ifelse(t$p > 0.05,"SUCCESS ","FAILED  ")),"p=",.," ",p,collapse="\n")
> paste0("Target: ",target,"\nLeft: ",paste(sort(left),collapse=", "),"\nRight: ",paste(right,collapse=", "),"\nFeasible Results:","\n",o)|writeLines
Target: Son
Left: CanaryIslands_Guanche.SG, Congo_Kindoki_Protohistoric.AG, Greece_PalaceOfNestor_EIA_d.AG
Right: Mbuti.DG, Levant_Natufian_d.AG, Turkey_Marmara_Barcin_N.SG, Iran_Wezmeh_N.SG, England_Mesolithic.AG, Russia_YuzhniyOleniyOstrov_Mesolithic.AG, Papuan.DG, Russia_MA1_UP.SG, Yoruba.DG, Russia_Samara_EBA_Yamnaya.AG, Morocco_Iberomaurusian.AG
Feasible Results:
SUCCESS p=.112 62 CanaryIslands_Guanche.SG 22 Greece_PalaceOfNestor_EIA_d.AG 15 Congo_Kindoki_Protohistoric.AG
FAILED  p=.0002 72 Greece_PalaceOfNestor_EIA_d.AG 28 Congo_Kindoki_Protohistoric.AG
FAILED  p=2e-05 88 CanaryIslands_Guanche.SG 12 Congo_Kindoki_Protohistoric.AG
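The SUCCESS/FAILED lines come from the R one-liners above: drop models with f4rank 0 or feasible FALSE, sort by p descending, and label anything with p > 0.05 as SUCCESS. A minimal Python sketch of that same logic, using the three-source qp$popdrop rows from this post (the dictionary layout and variable names are mine):

```python
# pat, p, f4rank, feasible copied from the three-source qp$popdrop table
rows = [
    {"pat": "000", "p": 1.12e-1,  "f4rank": 2, "feasible": True},
    {"pat": "001", "p": 1.60e-5,  "f4rank": 1, "feasible": True},
    {"pat": "010", "p": 7.43e-12, "f4rank": 1, "feasible": False},
    {"pat": "100", "p": 2.10e-4,  "f4rank": 1, "feasible": True},
    {"pat": "011", "p": 2.44e-33, "f4rank": 0, "feasible": True},
    {"pat": "101", "p": 0.0,      "f4rank": 0, "feasible": True},
    {"pat": "110", "p": 2.58e-25, "f4rank": 0, "feasible": True},
]

# Same filter as dplyr::filter(f4rank != 0 & feasible), then sort by p descending
kept = sorted(
    (r for r in rows if r["f4rank"] != 0 and r["feasible"]),
    key=lambda r: r["p"],
    reverse=True,
)

# Label each surviving model with the 0.05 cutoff used in the R code
labels = [("SUCCESS" if r["p"] > 0.05 else "FAILED", r["pat"]) for r in kept]
for status, pat in labels:
    print(status, pat)
```

The 0.05 cutoff is just the threshold hard-coded in that script, so treat SUCCESS as "not rejected at that level" rather than proof the model is right.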