Posts: 503
Threads: 72
Joined: Nov 2023
Gender: Male
Ethnicity: Arab
Commands to add individual names from an eigenstrat .ind file to a plink .fam file after you've merged your datasets, since some people might want to stick with plink:
awk '{print $1}' v62.0_1240k_public.ind > Sample_Names.txt
This extracted the individual names I needed. I then used them to replace the first column in the plink .fam file:
awk 'NR==FNR {new_id[NR]=$1; next} { $1=new_id[FNR]; print }' Sample_Names.txt v64_0_Berkane2.fam > v64_0_Berkane2_updated.fam
Make sure you add your own sample's name to Sample_Names.txt before you proceed with the second step. The names are matched to the .fam rows by line number, so it has to sit on the same line as its row in the .fam file.
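A minimal sketch of the two steps on toy data, with a line-count check before the replacement, since the second awk command pairs the files purely by line number. The file names toy.ind and toy.fam and their contents are hypothetical stand-ins, not the real datasets.

```shell
# Toy stand-ins for the real files; names and contents are hypothetical.
workdir=$(mktemp -d)
printf 'I0001 M Pop_A\nI0002 F Pop_B\n' > "$workdir/toy.ind"
printf '1 I0001 0 0 1 1\n2 I0002 0 0 2 1\n' > "$workdir/toy.fam"

# Step 1: pull the sample names (column 1) out of the .ind file.
awk '{print $1}' "$workdir/toy.ind" > "$workdir/Sample_Names.txt"

# Step 2 pairs the files line by line, so stop if the counts differ.
names_n=$(awk 'END {print NR}' "$workdir/Sample_Names.txt")
fam_n=$(awk 'END {print NR}' "$workdir/toy.fam")
if [ "$names_n" -ne "$fam_n" ]; then
    echo "line count mismatch: $names_n names vs $fam_n fam rows" >&2
    exit 1
fi

# Step 2: overwrite column 1 of the .fam file with the name on the same line number.
awk 'NR==FNR {new_id[NR]=$1; next} {$1=new_id[FNR]; print}' \
    "$workdir/Sample_Names.txt" "$workdir/toy.fam" > "$workdir/toy_updated.fam"

cat "$workdir/toy_updated.fam"
```

If the check fails, the usual cause would be a sample that exists in the merged .fam file but was never added to Sample_Names.txt.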
Posts: 864
Threads: 48
Joined: Aug 2023
Gender: Male
Ethnicity: Colonial American
Nationality: American
Y-DNA (P): R1b-U152 >R-FTA96415
Y-DNA (M): I2-P37 > I-BY77146
mtDNA (M): J1b1a1a
mtDNA (P): H66a
What's the purpose of this again? Why did your fam file need the ids updated?
Usually the issue is that conversion strips the population names, which need to be added back after using convertf.
(10-20-2024, 10:00 PM)AimSmall Wrote: What's the purpose of this again? Why did your fam file need the ids updated?
Usually the issue is that conversion strips the population names, which need to be added back after using convertf.
I was having issues converting back to eigenstrat because it said one of the population names was too long. That makes no sense to me, because the names are no different from the original eigenstrat data, so I decided to stick with plink for now just to see if things are working. I still need to figure out a way to convert it back to eigenstrat without getting that error and without the process being killed.
(10-20-2024, 10:35 PM)Genetics189291 Wrote: I was having issues converting back to eigenstrat because it said one of the population names was too long. That makes no sense to me, because the names are no different from the original eigenstrat data, so I decided to stick with plink for now just to see if things are working. I still need to figure out a way to convert it back to eigenstrat without getting that error and without the process being killed.
From Nganasankhan:
Quote: `convertf` removes the population names, but you can add the population names to the `fam` file like this: `f=v54.1_1240K_public;awk 'NR==FNR{a[$1]=$3;next}{$1=a[$2]}1' $f.{ind,fam}>$f.temp;mv $f.{temp,fam}`.
However when the combined length of the population name and sample name of some sample is over 39 characters, EIGENSOFT tools like SmartPCA exit with an error like this: `idnames too long Russian_Archangelsk_Krasnoborsky Rakr-203 ll: 41 limit: 39`. So in that case you can convert the population names back to a sequence of integers: `f=v54.1_1240K_public;awk '{$1=NR}1' $f.fam|sponge $f.fam`.
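Before replacing every population name with an integer, it may help to see which rows actually trip the limit. The sketch below lists them from a toy .fam file (hypothetical contents, reusing the names from the quoted error). It assumes the reported length counts one extra character on top of the two names, which is only inferred from the quoted message (32 + 8 characters reported as 41) and not checked against the EIGENSOFT source.

```shell
workdir=$(mktemp -d)
# Hypothetical .fam rows: column 1 = population name, column 2 = sample name.
printf 'Russian_Archangelsk_Krasnoborsky Rakr-203 0 0 0 1\nPop_A I0001 0 0 1 1\n' > "$workdir/toy.fam"

# Print row number, both names, and the computed length for every row whose
# population + sample name, plus one assumed extra character, exceeds the
# limit of 39 taken from the error message.
awk -v limit=39 '{
    ll = length($1) + length($2) + 1
    if (ll > limit) print NR, $1, $2, ll
}' "$workdir/toy.fam" > "$workdir/too_long.txt"

cat "$workdir/too_long.txt"
```

With a list like this, only the offending population names would need shortening, rather than renumbering the whole first column.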