Merging BED with v62 dataset with Google Colab
#16
https://drive.google.com/drive/folders/1...sp=sharing
#17
Replying to TanTin (04-09-2025, 04:36 AM):

Is the following correct?


> setwd("C:/DNA Analysis/Reich Data Files/v62.0")
> runPLINK <- function(PLINKoptions = "") system(paste("C:/DNA Analysis/plink-1.07-dos/plink"))
> system("runPLINK --bfile For_merge/Takarkori_trimmed/S2949_filtered --bmerge v62.0_1240K_public.bed v62.0_1240K_public.bim v62.0_1240K_public.fam --out GenoCenter/DataCentral_1.0")
[1] 127
#18
Replying to Inquirer (04-09-2025, 04:52 AM):

No, it's not correct. You will need to adjust the paths to match your files.
C:/DNA Analysis/Reich Data Files/v62.0 is the path to your initial (source) dataset.

For_merge/Takarkori_trimmed/S2949_filtered is the file that you will merge. In the case of Takarkori, it should be Takarkori_trimmed, or whatever file you want to merge.
Next comes the main data file; here I have v62.0_1240K_public.bed v62.0_1240K_public

And the output file goes at the end.

The easiest way is to put everything in the same folder/directory. If you want to keep the files in different folders, you will need to specify the exact paths.
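A likely explanation for the [1] 127 above: system() passes its string to the shell, which then looks for a program called runPLINK. That name only exists as an R function, so the shell cannot find it, and R returns 127 when a command could not be run. The runPLINK function also never uses its PLINKoptions argument. A minimal sketch of a wrapper that calls the executable directly through system2(); run_plink and its dry_run switch are hypothetical names, and the default plink value assumes the executable is on the PATH or in the working directory:

```r
# Sketch: call the PLINK executable directly instead of asking the shell to
# run "runPLINK" (an R function the shell knows nothing about).
run_plink <- function(args, plink = "plink", dry_run = FALSE) {
  # dry_run = TRUE only returns the command line, without running anything
  if (dry_run) return(c(plink, args))
  # system2() quotes the executable path itself (e.g. the space in
  # "DNA Analysis"); the arguments are passed through unquoted
  status <- system2(plink, args)
  if (status != 0) warning("PLINK exited with status ", status)
  invisible(status)
}

# Example with the arguments from the post above (paths relative to setwd()):
# setwd("C:/DNA Analysis/Reich Data Files/v62.0")
# run_plink(c("--bfile", "For_merge/Takarkori_trimmed/S2949_filtered",
#             "--bmerge", "v62.0_1240K_public.bed",
#             "v62.0_1240K_public.bim", "v62.0_1240K_public.fam",
#             "--out", "GenoCenter/DataCentral_1.0"),
#           plink = "C:/DNA Analysis/plink-1.07-dos/plink")
```

Keeping each option as a separate element of args avoids building one long command string; file prefixes that contain spaces would still need shQuote().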
#19
If you use C:/DNA Analysis/Reich Data Files/ as your working directory (setwd), make sure to copy the files into that same directory.
The files that you need for Takarkori are:
Takarkori_trimmed.bed
Takarkori_trimmed.bim
Takarkori_trimmed.fam
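Before running the merge, it can help to check from R that all three files really are in the working directory. A minimal sketch; missing_plink_files is a hypothetical helper, not part of any package:

```r
# Sketch: list which of the three PLINK binary files are missing for a
# prefix, looking in dir (by default the current working directory).
missing_plink_files <- function(prefix, dir = getwd()) {
  files <- file.path(dir, paste0(prefix, c(".bed", ".bim", ".fam")))
  files[!file.exists(files)]
}

# character(0) means all three files are present:
# missing_plink_files("Takarkori_trimmed")
# missing_plink_files("v62.0_1240K_public")
```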
#20
The command will be:

system("plink --bfile Takarkori_trimmed --bmerge v62.0_1240K_public --out your_output_filename")

1. Make sure the plink.exe file is in the same directory:
C:/DNA Analysis/Reich Data Files/v62.0
2. Make sure your v62.0 files are there too.

3. Make sure the Takarkori files are in the same directory too:

Takarkori_trimmed.bed
Takarkori_trimmed.bim
Takarkori_trimmed.fam


BTW, you can execute the same command directly in the Command Prompt; just make sure you are in the same folder/directory.
To execute it, use:
plink --bfile Takarkori_trimmed --bmerge v62.0_1240K_public --out your_output_filename
#21
Replying to TanTin (04-09-2025, 05:14 AM):

> setwd("C:/DNA Analysis/Reich Data Files/v62.0")
> system("plink --bfile Takarkori_trimmed --bmerge v62.0_1240K_public --out merged_file")

@----------------------------------------------------------@
|        PLINK!      |    v1.07      |  10/Aug/2009    |
|----------------------------------------------------------|
|  © 2009 Shaun Purcell, GNU General Public License, v2  |
|----------------------------------------------------------|
|  For documentation, citation & bug-report instructions:  |
|        http://pngu.mgh.harvard.edu/purcell/plink/        |
@----------------------------------------------------------@


ERROR: No file [ v62.0_1240K_public ] exists.
[1] 1
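One more detail behind this error: in PLINK 1.07, --bmerge expects three file names (.bed, .bim, .fam), while PLINK 1.9 also accepts a bare fileset prefix. With v1.07, v62.0_1240K_public is therefore looked up as a single file name, which is consistent with the message above. A small sketch that expands a prefix into the three-file form; bmerge_args is a hypothetical helper. Neither form helps, though, while the v62.0 data only exists as EIGENSTRAT .geno/.snp/.ind files.

```r
# Sketch: build the --bmerge part of a PLINK command line.
# PLINK 1.07 needs the three file names; PLINK 1.9 also accepts the prefix.
bmerge_args <- function(prefix, plink107 = TRUE) {
  if (plink107) {
    c("--bmerge", paste0(prefix, c(".bed", ".bim", ".fam")))
  } else {
    c("--bmerge", prefix)
  }
}

# Full argument vector for the merge discussed in this thread:
args <- c("--bfile", "Takarkori_trimmed",
          bmerge_args("v62.0_1240K_public"),
          "--out", "merged_file")
# system2("plink", args)  # needs plink on the PATH or in getwd()
```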

   
#22
This version of PLINK (v1.07) is very old.

Use PLINK 2:
PLINK v2.00a5 64-bit (7 Jul 2023)

Or use the previous version, 1.9.

I use both 1.9 and 2.
#23
Second problem: your dataset is still in EIGENSTRAT format (.geno, .snp, .ind files).
These must be converted to PLINK format. PLINK files have the extensions .bed, .bim and .fam.

PLINK cannot work directly with EIGENSTRAT format files.
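A quick way to see which format a given prefix is in is to check which file extensions exist. A minimal sketch; detect_format is a hypothetical helper, and it only looks at file names, not at file contents:

```r
# Sketch: guess the format of a fileset from the extensions present on disk.
detect_format <- function(prefix, dir = getwd()) {
  has_all <- function(exts) {
    all(file.exists(file.path(dir, paste0(prefix, exts))))
  }
  if (has_all(c(".bed", ".bim", ".fam"))) return("plink")
  if (has_all(c(".geno", ".snp", ".ind"))) return("eigenstrat")
  NA_character_  # neither complete set was found
}

# detect_format("v62.0_1240K_public")  # "eigenstrat" means convert first
```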
#24
Replying to TanTin (04-09-2025, 12:18 PM):

This is becoming tedious and frustrating.

> eigenstrat_to_plink(inpref = "v62.0_1240k_public", outpref = "plink_data")
ℹ v62.0_1240k_public.geno has 17629 samples and 1233013 SNPs.
ℹ Reading data for 17629 samples and 1233013 SNPs
ℹ Expected size of genotype data: 174032 MB
1233k SNPs read...
✔ 1233013 SNPs read in total
Error: cannot allocate vector of size 162.0 Gb


I have terabytes of storage and 64 GB of RAM, by the way.
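The numbers in the log are consistent with the genotypes being loaded as an unpacked matrix of 8-byte doubles: 17629 × 1233013 ≈ 21.7 billion genotypes × 8 bytes ≈ 174 GB, which is about 162 GiB (the "162.0 Gb" in the error), well beyond 64 GB of RAM. A PLINK .bed file, by contrast, packs 2 bits per genotype. A small sketch of both estimates; the function names are hypothetical, the 8 bytes per value is an assumption about how the data is held in memory, and the .bed size follows the SNP-major layout (3 header bytes, then ceiling(samples / 4) bytes per SNP):

```r
# Sketch: RAM for an unpacked genotype matrix (assumed 8-byte doubles), in GiB
dense_matrix_gib <- function(n_samples, n_snps, bytes_per_value = 8) {
  n_samples * n_snps * bytes_per_value / 2^30
}

# Size of the equivalent SNP-major PLINK .bed file, in bytes:
# 3 magic bytes, then 4 genotypes (2 bits each) per byte for every SNP
bed_file_bytes <- function(n_samples, n_snps) {
  3 + n_snps * ceiling(n_samples / 4)
}

n_samples <- 17629
n_snps <- 1233013
round(dense_matrix_gib(n_samples, n_snps), 1)  # 162, matching the error
bed_file_bytes(n_samples, n_snps) / 1e9        # about 5.4 GB on disk
```

So the problem here is the in-memory representation rather than disk space, which is why tools that stream the packed files (such as convertf) can get by with much less RAM.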
#25
Replying to Inquirer (04-09-2025, 03:29 PM):


I convert using "convertf" in a Linux environment. If you have access to Linux, do it that way.
I think someone has already converted and shared the dataset.

For converting in a Windows environment, I can't help much.
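For reference, convertf (part of EIGENSOFT) reads its settings from a parameter file and is run as convertf -p parfile. A minimal sketch that writes such a file from R, assuming the v62.0 EIGENSTRAT file names used earlier in this thread; write_convertf_par, the parameter file name and the output prefix are hypothetical, and the conversion itself still has to run wherever convertf is installed (e.g. under Linux):

```r
# Sketch: write a convertf parameter file that converts an EIGENSTRAT
# fileset (.geno/.snp/.ind) into PLINK binary format (.bed/.bim/.fam).
write_convertf_par <- function(inpref, outpref, parfile = "par.EIG2PLINK") {
  lines <- c(
    paste0("genotypename: ", inpref, ".geno"),
    paste0("snpname: ", inpref, ".snp"),
    paste0("indivname: ", inpref, ".ind"),
    "outputformat: PACKEDPED",  # PACKEDPED = PLINK .bed/.bim/.fam
    paste0("genotypeoutname: ", outpref, ".bed"),
    paste0("snpoutname: ", outpref, ".bim"),
    paste0("indivoutname: ", outpref, ".fam")
  )
  writeLines(lines, parfile)
  invisible(parfile)
}

# write_convertf_par("v62.0_1240K_public", "v62.0_1240K_plink")
# then, in a Linux shell:  convertf -p par.EIG2PLINK
```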
#26
You need to be very patient with this. Some operations may take many hours.
Processing some data may take up to a day or two.
#27
I think this link has the dataset in PLINK format. It's from the original thread we had when the dataset was released.

https://www.mediafire.com/file/wywscie9d...K.zip/file
#28
Replying to AimSmall (04-09-2025, 04:49 PM):

Thank you, AimSmall.
I guess you will save Inquirer some time, if he is still motivated to continue with this kind of data processing.
Converting and merging is actually the easiest part; the real difficulties start after that, as we all know. So be patient and don't give up.
#29
Replying to TanTin (04-09-2025, 03:49 PM):

It's not a matter of time; it's a matter of my computer seemingly not being able to complete the conversion process due to limited RAM.
#30
Replying to Inquirer (04-09-2025, 05:18 PM):

Yes. I forgot to tell you that RAM is very important. I have only 16 GB of RAM on my PC; if you have 8 GB, it's not enough.
On the same computer I run a Linux VM (virtual machine), where I do the conversion.