Merging BED with v62 dataset with Google Colab
04-09-2025, 06:49 PM
.fam files are really all you need to open for selecting purposes.
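For anyone wondering what "selecting" from the .fam looks like in practice, here's a minimal sketch. It assumes a standard six-column PLINK .fam (FID, IID, father, mother, sex, phenotype) and that the FID column holds the group/population label, which you should check against your own file. It writes the two-column FID/IID list that `plink --keep` accepts. The group names, sample IDs and file names are all hypothetical.

```python
from pathlib import Path


def select_individuals(fam_lines, wanted_groups):
    """Return (FID, IID) pairs whose FID column is one of wanted_groups.

    Assumes the first (FID) column holds the group label; change the
    index if your .fam keeps it somewhere else.
    """
    keep = []
    for line in fam_lines:
        fields = line.split()
        if len(fields) < 6:
            continue  # skip blank or malformed lines
        fid, iid = fields[0], fields[1]
        if fid in wanted_groups:
            keep.append((fid, iid))
    return keep


def write_keep_file(pairs, path):
    # plink --keep reads one "FID IID" pair per line
    Path(path).write_text("".join(f"{fid} {iid}\n" for fid, iid in pairs))


if __name__ == "__main__":
    # Hypothetical rows; for a real file use Path("v62.fam").read_text().splitlines()
    demo = [
        "Pop_A ind1 0 0 1 1",
        "Pop_B ind2 0 0 2 1",
        "Pop_A ind3 0 0 1 1",
    ]
    print(select_individuals(demo, {"Pop_A"}))
```

After writing the list with `write_keep_file(pairs, "keep.txt")`, something like `plink --bfile v62 --keep keep.txt --make-bed --out subset` (PLINK 1.9, prefixes hypothetical) should pull out just those individuals.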
04-09-2025, 06:51 PM
04-09-2025, 06:54 PM
This topic became very interesting all of a sudden. I have so many other things to share... Hopefully this is just the beginning.
04-09-2025, 09:04 PM
(04-09-2025, 06:49 PM)AimSmall Wrote: .fam files are really all you need to open for selecting purposes.
(04-09-2025, 06:54 PM)TanTin Wrote: This topic became very interesting all of a sudden. I have so many other things to share... Hopefully this is just the beginning.
I was away from my computer for a few hours, but when I got back I managed to get things working thanks to you two (though perhaps not perfectly, since I received error messages). Anyhow, here's a model that I got to work, somewhat. What do you guys think?
04-10-2025, 12:28 AM
04-10-2025, 07:07 AM
I'm running into an issue here. I downloaded the samples @TanTin provided and converted them to Eigenstrat format (I kept running into an issue where the populations were set to "ignore", but I resolved that). Yet every time I try to merge them with the larger dataset, I'm greeted with this message: "fatalx: OOPS snp file has changed since genotype file was created", followed by "Aborted (core dumped)". I'm not sure what to do to get it to actually work. I'm using Eigensoft to merge, by the way. Does anyone have the samples in Eigenstrat format?
04-10-2025, 08:36 PM
(04-10-2025, 07:07 AM)ModusOperandi Wrote: I'm running into an issue here. I downloaded the samples @TanTin provided and converted them to Eigenstrat format (I kept running into an issue where the populations were set to "ignore", but I resolved that). Yet every time I try to merge them with the larger dataset, I'm greeted with this message: "fatalx: OOPS snp file has changed since genotype file was created", followed by "Aborted (core dumped)". I'm not sure what to do to get it to actually work. I'm using Eigensoft to merge, by the way. Does anyone have the samples in Eigenstrat format?
For the "ignore" error, you need to go into the .fam file and find-and-replace all instances of 9 (or -9? it's the value in the last column either way) with 1. I don't remember running into the OOPS snp error, but maybe it's the same fix as for OOPS indiv? https://genarchivist.net/showthread.php?...8#pid30538
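One caution on the find-and-replace: a plain text replace of "9" would also change any sample or population ID that contains the digit 9. A safer sketch rewrites only the sixth (phenotype) column. In PLINK's .fam format, -9 in that column is the usual "missing phenotype" code; swapping it for 1 is simply the workaround described above, and the file names here are hypothetical.

```python
from pathlib import Path

# Values mentioned in the thread; -9 is PLINK's usual "missing phenotype" code
MISSING_VALUES = {"-9", "9"}


def fix_phenotype_column(fam_text, new_value="1"):
    """Set column 6 (phenotype) to new_value where it holds a missing code.

    Only the sixth field is touched, so IDs containing the digit 9 survive.
    """
    out = []
    for line in fam_text.splitlines():
        fields = line.split()
        if not fields:
            continue  # drop blank lines
        if len(fields) >= 6 and fields[5] in MISSING_VALUES:
            fields[5] = new_value
        out.append(" ".join(fields))
    return "\n".join(out) + "\n"


def fix_fam_file(src, dst):
    # Write to a new file so the original .fam is kept as a backup
    Path(dst).write_text(fix_phenotype_column(Path(src).read_text()))
```

For example, `fix_fam_file("my_samples.fam", "my_samples_fixed.fam")`, then point convertf at the fixed copy (renaming it to match the .bed/.bim prefix if needed). The output is space-separated, which PLINK and convertf read the same as tabs.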
04-15-2025, 02:39 AM
(04-10-2025, 08:36 PM)Kale Wrote: For the "ignore" error, you need to go into the .fam file and find-and-replace all instances of 9 (or -9? it's the value in the last column either way) with 1.
I set hashcheck to "NO" and it didn't give me the error; however, it abruptly ended the process and returned the message "Killed". I figured it might be a low-RAM issue, so I increased my computer's memory and ran it again. It got a lot further along in the merge process, and the program was actually returning "OK" after reading the geno file. Unfortunately, the process was once again abruptly ended with the same "Killed" message. I'm not sure what to do at this point.
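For reference, `hashcheck` lives in the mergeit parameter file. Here's a minimal sketch that builds one, assuming both inputs are already in Eigenstrat/packed format with matching .geno/.snp/.ind prefixes (all prefixes below are hypothetical). Keep in mind that `hashcheck: NO` only skips the consistency check; it doesn't make a mismatched .snp and .geno agree, so it's worth making sure both came out of the same conversion run.

```python
def mergeit_par(prefix1, prefix2, out_prefix, hashcheck=False,
                outputformat="PACKEDANCESTRYMAP"):
    """Build the text of a mergeit parameter file (prefixes are hypothetical)."""
    params = {
        "geno1": f"{prefix1}.geno",
        "snp1": f"{prefix1}.snp",
        "ind1": f"{prefix1}.ind",
        "geno2": f"{prefix2}.geno",
        "snp2": f"{prefix2}.snp",
        "ind2": f"{prefix2}.ind",
        "genooutfilename": f"{out_prefix}.geno",
        "snpoutfilename": f"{out_prefix}.snp",
        "indoutfilename": f"{out_prefix}.ind",
        "outputformat": outputformat,
        # NO skips the check that raises "OOPS snp file has changed ..."
        "hashcheck": "YES" if hashcheck else "NO",
    }
    return "".join(f"{key}: {value}\n" for key, value in params.items())


if __name__ == "__main__":
    # Save this text as merge.par, then run: mergeit -p merge.par
    print(mergeit_par("v62", "my_samples", "merged"), end="")
```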
04-15-2025, 02:41 AM
How much RAM do you have?
04-15-2025, 04:47 AM
I was able to merge v62 with a few hundred other samples and convert to Eigenstrat with 16GB RAM.
That was on Linux, using PLINK to merge and convertf to convert.
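A rough sketch of that Linux workflow, assuming PLINK 1.9 (`--bfile`, `--bmerge`, `--make-bed`, `--memory` in MiB, `--out`) and EIGENSOFT's convertf reading a parameter file via `-p`. The prefixes, the memory cap and the par-file name are hypothetical. It's a dry run: it only prints the commands and the par-file text; pass a command list to `subprocess.run(cmd, check=True)` if you want to actually execute it.

```python
import shlex


def plink_merge_cmd(ref_prefix, new_prefix, out_prefix, memory_mb=8000):
    # PLINK 1.9: merge a second binary fileset (.bed/.bim/.fam) into the first
    return ["plink", "--bfile", ref_prefix, "--bmerge", new_prefix,
            "--make-bed", "--memory", str(memory_mb), "--out", out_prefix]


def convertf_par(prefix, out_prefix, outputformat="EIGENSTRAT"):
    # convertf input: PLINK .bed/.bim/.fam; output: .geno/.snp/.ind
    return (f"genotypename: {prefix}.bed\n"
            f"snpname: {prefix}.bim\n"
            f"indivname: {prefix}.fam\n"
            f"outputformat: {outputformat}\n"
            f"genotypeoutname: {out_prefix}.geno\n"
            f"snpoutname: {out_prefix}.snp\n"
            f"indivoutname: {out_prefix}.ind\n")


if __name__ == "__main__":
    print(shlex.join(plink_merge_cmd("v62", "my_samples", "merged")))
    print(convertf_par("merged", "merged_eig"), end="")  # save as convert.par
    print(shlex.join(["convertf", "-p", "convert.par"]))
```

Doing the merge in PLINK first means convertf only has to run once, on the already-merged fileset.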
04-16-2025, 05:30 AM
(04-15-2025, 05:11 AM)Kale Wrote: I was able to merge v62 with a few hundred other samples and convert to Eigenstrat with 16GB RAM.
I suspect it's this particular sample. It's strange, because in the past I've been able to merge samples into datasets with little to no issue on PCs with even less RAM with this
04-16-2025, 06:04 AM
I think I solved the issue. Turns out I wasn't allocating enough memory to the VM for it to get the job done. It needed about 10GB of dedicated RAM just to keep the process from being killed early; anything less just didn't work.
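That fits the symptoms: on Linux, a bare "Killed" with no other error is typically the kernel's out-of-memory killer ending the process. A small sketch for checking headroom before starting a long merge, assuming a Linux VM (it reads `/proc/meminfo`, whose values are in kB). The 10 GiB threshold is just the figure reported above for this particular merge, not a general rule.

```python
from pathlib import Path

REQUIRED_GIB = 10  # figure reported in this thread for this merge; adjust to taste


def mem_available_gib(meminfo_text):
    """Parse MemAvailable (reported in kB) from /proc/meminfo text."""
    for line in meminfo_text.splitlines():
        if line.startswith("MemAvailable:"):
            kib = int(line.split()[1])
            return kib / (1024 ** 2)
    raise ValueError("MemAvailable not found in /proc/meminfo")


if __name__ == "__main__":
    meminfo = Path("/proc/meminfo")
    if meminfo.exists():  # Linux only
        avail = mem_available_gib(meminfo.read_text())
        status = "OK" if avail >= REQUIRED_GIB else "probably too little"
        print(f"MemAvailable: {avail:.1f} GiB ({status})")
```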
04-20-2025, 08:12 PM
Out of curiosity, what is the file size of everyone's merged genotype data file? I've brought this up before: for some reason, whenever I merge a sample into a larger dataset, the file size explodes to 20+ GB. This makes every step slower than before, since admixtools seems to read the entire dataset before calculating statistics. It wouldn't be much of an issue with a 5GB file, but this merged data is over four times as large. Where is the extra data in the file even coming from?
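One possible explanation, offered as a guess rather than a diagnosis: if the merged output was written as unpacked EIGENSTRAT, every genotype takes a full ASCII byte, whereas PACKEDANCESTRYMAP stores four genotypes per byte (2 bits each). That alone gives roughly a 4x difference. A back-of-envelope estimate follows; the SNP and sample counts are hypothetical, and the 48-byte minimum record length is my understanding of the packed format, not something I've checked against the source.

```python
import math


def eigenstrat_geno_bytes(n_snps, n_ind):
    # Unpacked EIGENSTRAT .geno: one ASCII digit per individual per SNP, plus a newline per SNP
    return n_snps * (n_ind + 1)


def packed_geno_bytes(n_snps, n_ind):
    # PACKEDANCESTRYMAP: 2 bits per genotype; each record (one header record
    # plus one per SNP) is assumed to be padded to at least 48 bytes
    record = max(48, math.ceil(n_ind / 4))
    return (n_snps + 1) * record


if __name__ == "__main__":
    n_snps, n_ind = 1_233_000, 17_000  # hypothetical sizes, not the real v62 counts
    unpacked = eigenstrat_geno_bytes(n_snps, n_ind)
    packed = packed_geno_bytes(n_snps, n_ind)
    print(f"unpacked ~{unpacked / 1e9:.1f} GB, packed ~{packed / 1e9:.1f} GB, "
          f"ratio {unpacked / packed:.2f}")
```

If that's what's happening, setting `outputformat: PACKEDANCESTRYMAP` in the convertf or mergeit par file should bring the merged .geno back down to roughly the original size, and admixtools can read packed files directly.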