Posts: 369
Threads: 24
Joined: Oct 2023
Gender: Male
Ethnicity: Hispanic mestizo
Nationality: Mexican
Y-DNA (P): E1b-L19 -> M183 -> PF2546
Y-DNA (M): E1b-L19 -> M183 -> PF2546
mtDNA (M): B2g1
mtDNA (P): C4
Is Plink used for qpAdm, or am I wrong?
Maternal grandpa's mtDNA: L1b1a
23andMe: 55.5% European, 33.7% Indigenous American, 4.2% WANA, 3.4% SSA & 3.2% Unassigned
AncestryDNA: 55% Europe/Sephardic Jew, 38% Indigenous Americas-Mexico, 4% MENA & 3% SSA
FamilyTreeDNA: 56.9% Europe, 33% Americas, 8.2% MENA, <2% Horn of Africa & <1% Eastern India
Living DNA: 63.3% West Iberia, 34.4% Native Americas & 2.3% Yorubaland
MyHeritage DNA: 77.5% Mexico, 21.4% Iberian & 1.1% Moroccan
qpAdm
taildiff: 0.959427
60.1% Iberian ± 1.9%, 34% Native American ± 1.9% & 5.8% African ± 0.9%
Posts: 963
Threads: 24
Joined: Oct 2023
(03-03-2024, 08:28 AM)Jalisciense Wrote: Plink is used for qpAdm or am I wrong?
Wrong. Plink can be used for many, many things. It is the best software for processing the data: converting, extracting, merging, computing some statistics, etc. You can generate many kinds of reports directly from Plink, or you can just convert your data and then use it with other software like qpAdm. However, you can't do everything with Plink. One of its biggest advantages is that you can use it on both Linux and Windows, with the same commands and scripts.
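As a rough illustration of the "convert your data, then use it with other software" step, here is a sketch that turns a single 23andMe-format raw file into PLINK binary files. It assumes PLINK 1.9 is installed as `plink` and uses its `--23file` input mode; the function name and file names are made up for the example.

```shell
# Hypothetical wrapper (name is illustrative): 23andMe-format raw file -> PLINK binary fileset.
# Assumes PLINK 1.9 is on PATH as "plink"; writes <out>.bed, <out>.bim and <out>.fam.
raw_to_bed() {
  local raw_file="$1" out_prefix="$2"
  # Fail early with a clear message instead of a "command not found" halfway through
  if ! command -v plink >/dev/null 2>&1; then
    echo "plink not found on PATH" >&2
    return 1
  fi
  plink --23file "$raw_file" --make-bed --out "$out_prefix"
}
# Usage (hypothetical file names): raw_to_bed my_raw_data.txt my_kit
```

qpAdm itself (ADMIXTOOLS) works on EIGENSTRAT-style files, so as far as I know the resulting .bed/.bim/.fam set would usually go through `convertf` (which accepts PLINK binary input as PACKEDPED) before qpAdm sees it.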
Posts: 503
Threads: 72
Joined: Nov 2023
Gender: Male
Ethnicity: Arab
(11-11-2023, 02:27 PM)AimSmall Wrote: If starting with FASTQ files... I'd recommend getting WGSExtract installed.
It can convert FASTQ and call them into BAM files. From there it will even produce RAW data extracts like 23&Me, AncestryDNA, etc.
With the RAW files, it's a simple process from there to make your PLINK .bed files.
https://wgsextract.github.io/
The problem with WGSExtract is that it produces your raw data with the chromosomes out of order, which causes issues when converting to PLINK, as it won't allow you to proceed.
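Before sorting anything, it can help to see which chromosomes are actually out of order. As far as I understand PLINK's complaint, the usual culprit is a chromosome whose rows are split across more than one block of the file. The sketch below assumes a tab- or space-separated raw file in the common 23andMe layout (rsid, chromosome, position, genotype), with `#` comment lines and an optional `rsid` header line; the function name is hypothetical, and it does not check positions within a chromosome.

```shell
# Hypothetical helper: list chromosomes whose rows are split into more than one block.
# Input: raw file with columns rsid, chromosome, position, genotype ('#' lines are comments).
find_split_chroms() {
  awk '!/^#/ && $1 != "rsid" {
         if ($2 != prev) {            # a new run of rows starts here
           if ($2 in seen) print $2   # this chromosome already had an earlier run
           seen[$2] = 1
           prev = $2
         }
       }' "$1" | sort -u
}
# Usage: find_split_chroms my_raw_data.txt   (no output means every chromosome is contiguous)
```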
Posts: 864
Threads: 48
Joined: Aug 2023
Gender: Male
Ethnicity: Colonial American
Nationality: American
Y-DNA (P): R1b-U152 >R-FTA96415
Y-DNA (M): I2-P37 > I-BY77146
mtDNA (M): J1b1a1a
mtDNA (P): H66a
(10-20-2024, 12:20 PM)Genetics189291 Wrote: (11-11-2023, 02:27 PM)AimSmall Wrote: If starting with FASTQ files... I'd recommend getting WGSExtract installed.
The problem with wgsextract is that it produces your raw data with chromosomes out of order which causes issues when converting to plink as it won’t allow you to proceed
Can you simply sort the text file?
(10-20-2024, 12:23 PM)AimSmall Wrote: (10-20-2024, 12:20 PM)Genetics189291 Wrote: (11-11-2023, 02:27 PM)AimSmall Wrote: If starting with FASTQ files... I'd recommend getting WGSExtract installed.
Can you simply sort the text file?
I tried that; it still fails to recognise the chromosomes as in order, and I have no idea why. I've also tried my CRAM file and got the same issue. Not sure how to proceed.
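One possible reason a "sorted" file still comes out of order (a guess, not a diagnosis of this particular file): plain `sort` compares text character by character, so chromosome 10 lands before chromosome 2. A quick demonstration:

```shell
# Text order: 1, 10, 2 ("10" starts with "1", which comes before "2")
printf '10\n2\n1\n' | sort
# Numeric order with -n: 1, 2, 10
printf '10\n2\n1\n' | sort -n
```

With GNU `sort -n`, a non-numeric key such as X, Y or MT is treated as 0, so those rows move to the top rather than the end; recoding them to numbers before a numeric sort avoids that.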
Posts: 864
Threads: 48
Joined: Aug 2023
Gender: Male
Ethnicity: Colonial American
Nationality: American
Y-DNA (P): R1b-U152 >R-FTA96415
Y-DNA (M): I2-P37 > I-BY77146
mtDNA (M): J1b1a1a
mtDNA (P): H66a
plink --file input --make-bed --out output --allow-extra-chr --sort-vars
--file input: Specifies the base name of the input .ped and .map files.
--make-bed: Tells PLINK to convert the data to binary format.
--out output: Specifies the base name for the output files.
--sort-vars: Ensures that variants are sorted by chromosome and position.
--allow-extra-chr: Allows chromosome codes PLINK doesn't recognise, such as unplaced contigs (optional; X, Y, XY and MT are already standard codes for human data, so they don't need it).
(10-20-2024, 12:50 PM)AimSmall Wrote: plink --file input --make-bed --out output --allow-extra-chr --sort-vars
I can't do this, because even if I first try to recode to .ped/.map, it says my chromosomes are still out of order.
(10-20-2024, 12:50 PM)AimSmall Wrote: plink --file input --make-bed --out output --allow-extra-chr --sort-vars
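awk 'BEGIN { OFS = "\t" }  # lines that get recoded are rebuilt tab-separated
NR > 1 {                   # leave the single header line untouched
  if ($2 == "X") $2 = 23;
  else if ($2 == "Y") $2 = 24;
  else if ($2 == "MT") $2 = 26;
}
{ print }' Combinedkit.txt > recoded.txt
# Header first, then data lines sorted by chromosome, then position. Reading the file
# twice avoids (head && tail) on a pipe, where head can swallow lines meant for tail.
head -n 1 recoded.txt > merge_data.txt
tail -n +2 recoded.txt | sort -k2,2n -k3,3n >> merge_data.txt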
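To confirm that `merge_data.txt` really is in order before handing it to PLINK, `sort -c` can check it without rewriting anything. This sketch assumes exactly one header line, as the command above does; `-s` stops `sort` from flagging rows that tie on chromosome and position. The function name is hypothetical.

```shell
# Hypothetical helper: succeeds (exit 0) if the data lines of a raw file are ordered
# by numeric chromosome (column 2), then numeric position (column 3).
# Assumes a single header line; -s disables the whole-line tie-break comparison.
check_sorted() {
  tail -n +2 "$1" | sort -c -s -k2,2n -k3,3n
}
# Usage: check_sorted merge_data.txt && echo "in order"
```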
(10-20-2024, 12:50 PM)AimSmall Wrote: plink --file input --make-bed --out output --allow-extra-chr --sort-vars
This is what I'm getting now. I looked for this SNP in the .snp file and it doesn't exist; I also removed all duplicates.
Fatalx:
Duplicate key Rs2308040
This SNP doesn't exist in either dataset, so I'm confused. I tried merging with both and got the same error.
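When a search turns up nothing, letter case is an easy difference to miss (rs vs Rs); a case-insensitive search such as `grep -i rs2308040 *.snp` covers that. The hypothetical helper below lists IDs that appear more than once in column 1 of an EIGENSTRAT `.snp` file, comparing them case-insensitively, so a variant written two different ways still shows up. Whether case is behind this particular error is an assumption on my part.

```shell
# Hypothetical helper: print each SNP ID that occurs more than once in column 1 of an
# EIGENSTRAT .snp file, comparing IDs case-insensitively (printed in lower case).
dup_snp_ids() {
  awk '{ id = tolower($1); if (++count[id] == 2) print id }' "$1"
}
# Usage: dup_snp_ids dataset.snp
```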
(10-20-2024, 12:50 PM)AimSmall Wrote: plink --file input --make-bed --out output --allow-extra-chr --sort-vars
It wasn't showing on Linux, only on Windows. I'm confused why the duplicate-SNP command doesn't remove all the duplicates.
(10-20-2024, 12:50 PM)AimSmall Wrote: plink --file input --make-bed --out output --allow-extra-chr --sort-vars
This is how I removed the duplicates:
awk '!seen[$2]++' oujda_berkane_hh.bim > oujda_berkane.bim
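One caution, based on my understanding of the PLINK 1 binary format: the .bed file stores genotypes in the same order as the lines of the .bim, so deleting lines from the .bim alone leaves the two files out of step. It may be safer to list the duplicated IDs and let PLINK rewrite .bed/.bim/.fam together. The helper name below is hypothetical; the PLINK 2 `--rm-dup` option in the comment is, as I understand its documentation, the tool's own way of keeping one copy of each duplicated ID.

```shell
# Hypothetical helper: print each variant ID that appears more than once in a .bim file
# (column 2 of a .bim is the variant ID). Each repeated ID is printed once.
list_dup_bim_ids() {
  awk '{ if (++count[$2] == 2) print $2 }' "$1"
}
# Possible follow-up (check the PLINK 2 docs for the exact modes before relying on it):
#   list_dup_bim_ids oujda_berkane_hh.bim      # see which IDs are affected
#   plink2 --bfile oujda_berkane_hh --rm-dup force-first --make-bed --out oujda_berkane
```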
Posts: 95
Threads: 1
Joined: Apr 2024
Gender: Male
Ethnicity: Ru
Nationality: Ru
Y-DNA (P): I1-Y353312
Y-DNA (M): I1a3
mtDNA (M): U5b2a1b
Please give me some advice.
I got 12 files like 000005450130-E250049362_L01_UDB-412_1.fq.gz, about 6-7 GB each, from my WGS provider.
I am trying to combine them into one big FASTQ to align with WGSE.
Concatenating the 12 files with the command cat *.fq.gz > combined.fq.gz runs endlessly until it fills up my SSD.
I've also tried ungzipping them and concatenating the raw files, but that failed: WGSE doesn't read the resulting file.
I also noticed that cat creates a resulting file 1.5 times larger than the original files put together.
What did I do wrong?
K36 52%Kivutkalns153, K13 25% North Atlantic, 45% Baltic, MTA 20% Svear
(02-26-2025, 06:54 AM)Geo Wrote: Give me please advice.
Yes, I've done such a merge a few times. Concatenating 12 files should be OK, but you need a lot of space.
Try usegalaxy.org, which allows you to use up to 250 GB of data space.
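Two things that may be worth checking, both assumptions since the full file list isn't shown. File names ending in `_1.fq.gz` and `_2.fq.gz` usually mark the two reads of a paired-end run, which are normally merged into one `_1` file and one `_2` file rather than into a single FASTQ. And if `combined.fq.gz` sits in the same folder, any later run of `cat *.fq.gz` also matches it, so the output file can end up listed among its own inputs. A sketch that keeps the read directions apart and writes into a separate folder (function and output names are hypothetical); it needs roughly as much free space as the input files take up.

```shell
# Hypothetical helper: merge per-lane gzipped FASTQ files by read direction.
# $1 = folder holding the *_1.fq.gz / *_2.fq.gz lane files
# $2 = output folder; keep it different from $1 so the globs never match the outputs
merge_lanes() {
  local in_dir="$1" out_dir="$2"
  mkdir -p "$out_dir"
  # Globs expand in sorted order; with matching lane names, the read 1 and
  # read 2 lanes are concatenated in the same order
  cat "$in_dir"/*_1.fq.gz > "$out_dir/merged_1.fq.gz"
  cat "$in_dir"/*_2.fq.gz > "$out_dir/merged_2.fq.gz"
}
# Usage: merge_lanes ~/wgs_raw ~/wgs_merged
```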
(02-26-2025, 07:02 AM)TanTin Wrote: (02-26-2025, 06:54 AM)Geo Wrote: Give me please advice.
Yes, I did such merge few times. Concatenating 12 files should be OK, but you need lot of space.
Try using usegalaxy.org. usegalaxy.org allows to use up to 250 GB data space.
Thanks. What would you recommend on usegalaxy.org, FASTQ joiner?
Should I extract the .gz files before concatenating? I have heard two different opinions: one that cat can concatenate .gz archives directly, and the opposite, that you have to ungzip them and cat the raw files.
(02-26-2025, 07:40 AM)Geo Wrote: (02-26-2025, 07:02 AM)TanTin Wrote: (02-26-2025, 06:54 AM)Geo Wrote: Give me please advice.
thanks. What would you recommend on galaxy.org - FASTQ joiner ?
Should I extract gz before concatenating? I have heard 2 different opinions - the one that CAT can concatenate gz archives AND the opposite - that not - we should ungzip and cat raw files
For small files, I prefer to use them unzipped. For large files, using the compressed format is mandatory, otherwise you will run short of space.
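On the .gz question: the gzip format allows one file to hold several compressed members back to back, and `gzip -d` decompresses them as one continuous stream, so `cat` on the compressed files is legitimate and needs no extra space for unzipped copies. Whether a given downstream tool (WGSE included) reads multi-member gzip files is something I can't confirm here. A tiny demonstration:

```shell
# Build two tiny gzip files in a scratch folder
tmp=$(mktemp -d)
printf 'first\n' | gzip > "$tmp/a.gz"
printf 'second\n' | gzip > "$tmp/b.gz"
# Byte-for-byte concatenation of the compressed files (no unzipping)
cat "$tmp/a.gz" "$tmp/b.gz" > "$tmp/ab.gz"
# gzip treats the result as two members and decompresses them back to back
gzip -dc "$tmp/ab.gz"    # prints "first" then "second"
gzip -t "$tmp/ab.gz" && echo "ab.gz passes gzip's integrity check"
```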