How to take individual names from an eigenstrat file to plink
#1
Commands to add individual names to a plink .fam file from an eigenstrat .ind file after you've merged your datasets, since some people might want to stick with plink:

awk '{print $1}' v62.0_1240k_public.ind > Sample_Names.txt

This extracted the individual names I needed. I then used that list to replace the first column in the plink .fam file:


awk 'NR==FNR {new_id[NR]=$1; next} { $1=new_id[FNR]; print }' Sample_Names.txt v64_0_Berkane2.fam > v64_0_Berkane2_updated.fam


Make sure you add your own sample's name to Sample_Names.txt before you run the second step, so the list has one name for every row of the .fam file.
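
The second step matches rows purely by position, so it only gives the right result if the name list and the .fam file have the same number of rows in the same order. Here is a rough sketch on toy data with a row-count check in front of it. The file names toy_names.txt and toy.fam and their contents are made up for illustration, not taken from the real dataset:

```shell
# Toy inputs (hypothetical contents, for illustration only)
printf 'SampleA\nSampleB\nSampleC\n' > toy_names.txt
printf '1 ind1 0 0 1 1\n2 ind2 0 0 2 1\n3 ind3 0 0 0 1\n' > toy.fam

# Count rows in both files; the replacement below matches rows by position,
# so a length mismatch would silently assign wrong or empty names
n_names=$(awk 'END {print NR}' toy_names.txt)
n_fam=$(awk 'END {print NR}' toy.fam)
if [ "$n_names" -ne "$n_fam" ]; then
    echo "row count mismatch: $n_names names vs $n_fam fam rows" >&2
    exit 1
fi

# First file: remember each name by its row number
# Second file: overwrite column 1 with the name from the same row
awk 'NR==FNR {new_id[NR]=$1; next} { $1=new_id[FNR]; print }' toy_names.txt toy.fam > toy_updated.fam

cat toy_updated.fam
```

On the toy data the first row becomes `SampleA ind1 0 0 1 1`. Note that awk rewrites the line with single spaces between columns once column 1 is reassigned.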
#2
What's the purpose of this again? Why did your fam file need the ids updated?

Usually the issue is that the conversion strips the population names, which need to be added back after using convertf.
#3
(10-20-2024, 10:00 PM)AimSmall Wrote: What's the purpose of this again?  Why did your fam file need the ids updated?

Usually the issue is that the conversion strips the population names, which need to be added back after using convertf.

I was having issues converting back to eigenstrat because it said one of the population names was too long, which makes no sense because it's no different from the original eigenstrat data. So I decided to stick with plink for now, just to see if things are working. I need to figure out a way to convert it back to eigenstrat without getting that error and without the process being killed.
#4
(10-20-2024, 10:35 PM)Genetics189291 Wrote:
I was having issues converting back to eigenstrat because it said one of the population names was too long, which makes no sense because it's no different from the original eigenstrat data. So I decided to stick with plink for now, just to see if things are working. I need to figure out a way to convert it back to eigenstrat without getting that error and without the process being killed.

From Nganasankhan:

Quote: `convertf` removes the population names, but you can add the population names to the `fam` file like this: `f=v54.1_1240K_public;awk 'NR==FNR{a[$1]=$3;next}{$1=a[$2]}1' $f.{ind,fam}>$f.temp;mv $f.{temp,fam}`.
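
Unlike a row-by-row replacement, that one-liner matches on the sample ID, so the `ind` and `fam` files don't have to be in the same order. Here is a spelled-out sketch of the same idea on toy data. The file names and contents are made up for illustration, and it assumes the usual three-column `ind` layout (sample ID, sex, population):

```shell
# Toy inputs (hypothetical contents, for illustration only)
printf 'ind1 M PopX\nind2 F PopY\nind3 U PopX\n' > toy2.ind
# The fam rows are deliberately in a different order from the ind rows
printf '1 ind3 0 0 0 1\n2 ind1 0 0 1 1\n3 ind2 0 0 2 1\n' > toy2.fam

awk '
    # First file (ind): map sample ID (column 1) to population (column 3)
    NR==FNR { pop_of[$1] = $3; next }
    # Second file (fam): set column 1 to the population looked up
    # by the individual ID in column 2
    { $1 = pop_of[$2]; print }
' toy2.ind toy2.fam > toy2.temp
mv toy2.temp toy2.fam

cat toy2.fam
```

If a sample ID in the `fam` file is missing from the `ind` file, column 1 ends up empty for that row, so it's worth checking the output for that before going further.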

 
However, when the combined length of the population name and sample name of some sample is over 39 characters, EIGENSOFT tools like SmartPCA exit with an error like this: `idnames too long Russian_Archangelsk_Krasnoborsky Rakr-203 ll: 41 limit: 39`. So in that case you can convert the population names back to a sequence of integers: `f=v54.1_1240K_public;awk '{$1=NR}1' $f.fam|sponge $f.fam` (`sponge` comes from the moreutils package).
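
To see which samples would trip that limit before running anything, you can measure the lengths straight from the `fam` file. This is a rough sketch on toy data. It assumes the limit is 39 as in the error message, and that `ll` counts both names plus one separator character, which is consistent with the example above (32 + 8 + 1 = 41) but is a guess, not something checked against the EIGENSOFT source. The file names are made up:

```shell
# Toy fam file with population names already in column 1 (hypothetical rows)
printf 'Russian_Archangelsk_Krasnoborsky Rakr-203 0 0 1 1\nPopX ind1 0 0 2 1\n' > toy3.fam

# List every row where population name + sample name + 1 separator
# exceeds the limit (39 is taken from the error message; the +1 is an assumption)
awk -v limit=39 '
    { ll = length($1) + length($2) + 1 }
    ll > limit { print $1, $2, ll }
' toy3.fam > too_long.txt

cat too_long.txt
```

Only the offending rows are printed, each with its computed length, so you can shorten just those population names instead of replacing all of them with integers.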