Converting from FASTQ to PLINK
#16
PLINK is used for qpAdm, or am I wrong?
Maternal grandpa's mtDNA: L1b1a

23andMe: 55.5% European, 33.7% Indigenous American, 4.2% WANA, 3.4% SSA & 3.2% Unassigned
AncestryDNA: 55% Europe/Sephardic Jew, 38% Indigenous Americas-Mexico, 4% MENA & 3% SSA
FamilyTreeDNA: 56.9% Europe, 33% Americas, 8.2% MENA, <2% Horn of Africa & <1% Eastern India
Living DNA: 63.3% West Iberia, 34.4% Native Americas & 2.3% Yorubaland
MyHeritage DNA: 77.5% Mexico, 21.4% Iberian & 1.1% Moroccan

qpAdm
taildiff: 0.959427
60.1% Iberian ± 1.9%, 34% Native American ± 1.9% & 5.8% African ± 0.9%
#17
(03-03-2024, 08:28 AM)Jalisciense Wrote: Plink is used for qpAdm or am I wrong?

Wrong. PLINK can be used for many, many things. It is the best software for processing the data: converting, extracting, merging, computing statistics, and so on. You can generate many kinds of reports directly from PLINK, or you can just convert your data and then use it with other software such as qpAdm. However, you can't do everything with PLINK. One of its best advantages is that you can use it on both Linux and Windows, with the same commands and scripts.
#18
(11-11-2023, 02:27 PM)AimSmall Wrote: If starting with FASTQ files... I'd recommend getting WGSExtract installed.

It can align your FASTQ files and call them into BAM files. From there it will even produce raw-data extracts like 23andMe, AncestryDNA, etc.

With the raw files, it's a simple process from there to make your PLINK .bed files.

https://wgsextract.github.io/

The problem with WGS Extract is that it produces your raw data with the chromosomes out of order, which causes issues when converting to PLINK: it won't let you proceed.
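Before fighting PLINK over it, the ordering problem is easy to confirm directly. A minimal sketch, assuming the usual 23andMe-style layout (tab-separated, one header line, chromosome in column 2, position in column 3; the file name and data here are made up):

```shell
# A tiny out-of-order example in the usual raw-file layout (made-up data).
printf 'rsid\tchromosome\tposition\tgenotype\n' > raw_demo.txt
printf 'rs2\t2\t500\tAA\nrs1\t1\t1000\tGG\n' >> raw_demo.txt

# sort -c only checks the order; a non-zero exit means the body is not
# sorted by chromosome (column 2) then position (column 3).
if tail -n +2 raw_demo.txt | sort -c -k2,2n -k3,3n 2>/dev/null; then
    echo "chromosomes in order"
else
    echo "chromosomes out of order"
fi
```

On a real export, just point the `tail`/`sort -c` pipe at your raw file instead of the demo data.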
#19
(10-20-2024, 12:20 PM)Genetics189291 Wrote: The problem with wgsextract is that it produces your raw data with chromosomes out of order which causes issues when converting to plink as it won’t allow you to proceed

Can you simply sort the text file?
#20
(10-20-2024, 12:23 PM)AimSmall Wrote: Can you simply sort the text file?

I tried; it still fails to recognise the chromosomes in order, and I have no idea why. I've also tried my CRAM file with the same issue. Not sure how to proceed.
#21
plink --file input --make-bed --out output --allow-extra-chr --sort-vars

--file input: Specifies the base name of the input .ped and .map files.
--make-bed: Tells PLINK to convert the data to binary format.
--out output: Specifies the base name for the output files.
--sort-vars: Ensures that variants are sorted by chromosome and position.
--allow-extra-chr: Allows for non-standard chromosomes (optional, but useful if you have non-autosomal chromosomes like X, Y, or mitochondrial DNA).
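For anyone unsure what --file expects, here is a tiny made-up .ped/.map pair showing the layout (the sample IDs and genotypes are invented for illustration, and the PLINK call itself is only shown as a comment since PLINK may not be installed):

```shell
# A .map file: chromosome, variant ID, genetic distance, base-pair position.
printf '1 rs1 0 1000\n1 rs2 0 2000\n' > input.map

# A .ped file: family ID, individual ID, father, mother, sex, phenotype,
# then two alleles per variant, in the same order as the .map.
printf 'FAM1 IND1 0 0 1 -9 A A G G\n' > input.ped
printf 'FAM1 IND2 0 0 2 -9 A G G G\n' >> input.ped

# With PLINK installed, this would then produce output.bed/.bim/.fam:
# plink --file input --make-bed --out output --allow-extra-chr --sort-vars
echo "demo input files written"
```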
#22
(10-20-2024, 12:50 PM)AimSmall Wrote: plink --file input --make-bed --out output --allow-extra-chr --sort-vars


I can't do this, because even if I try to recode to .ped/.map first, it says my chromosomes are still out of order.
#23
(10-20-2024, 12:50 PM)AimSmall Wrote: plink --file input --make-bed --out output --allow-extra-chr --sort-vars


awk 'BEGIN { FS = OFS = "\t" }   # keep the file tab-separated when $2 is rewritten
{
if ($2 == "X") $2 = 23;
else if ($2 == "Y") $2 = 24;
else if ($2 == "MT") $2 = 26;
print
}' Combinedkit.txt > recoded.txt
head -n 1 recoded.txt > merge_data.txt                          # header first
tail -n +2 recoded.txt | sort -k2,2n -k3,3n >> merge_data.txt   # body sorted by chr, then pos
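A sanity check of the same recode-then-sort idea on a tiny invented file (assuming the usual tab-separated layout with one header line; the columns of a real Combinedkit.txt may differ):

```shell
# Made-up raw-file fragment with X and MT rows out of order.
printf 'rsid\tchromosome\tposition\tgenotype\n' > kit_demo.txt
printf 'rsX\tX\t100\tAA\nrs9\t9\t50\tCC\nrsM\tMT\t10\tGG\n' >> kit_demo.txt

# Recode the non-numeric chromosome names, keeping tabs intact.
awk 'BEGIN { FS = OFS = "\t" }
NR == 1 { print; next }
{
  if ($2 == "X") $2 = 23;
  else if ($2 == "Y") $2 = 24;
  else if ($2 == "MT") $2 = 26;
  print
}' kit_demo.txt > recoded_demo.txt

# Keep the header, sort the body by chromosome then position.
head -n 1 recoded_demo.txt > sorted_demo.txt
tail -n +2 recoded_demo.txt | sort -k2,2n -k3,3n >> sorted_demo.txt
cat sorted_demo.txt
```

The output should list rs9 (chr 9) before rsX (now 23) before rsM (now 26).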
#24
(10-20-2024, 12:50 PM)AimSmall Wrote: plink --file input --make-bed --out output --allow-extra-chr --sort-vars


This is what I'm getting now. I tried to look for this SNP in the .snp file and it doesn't exist. I also removed all duplicates.

Fatalx:
Duplicate key Rs2308040

This doesn't exist in both datasets, so I'm confused. I tried merging with both; same error.
#25
(10-20-2024, 12:50 PM)AimSmall Wrote: plink --file input --make-bed --out output --allow-extra-chr --sort-vars


It wasn't showing on Linux, but it was on Windows. I'm confused why the duplicate-SNP command doesn't remove all the duplicates.
#26
(10-20-2024, 12:50 PM)AimSmall Wrote: plink --file input --make-bed --out output --allow-extra-chr --sort-vars


This is how I sorted out the duplicates:

awk '!seen[$2]++' oujda_berkane_hh.bim > oujda_berkane.bim
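What that one-liner does, on a made-up .bim fragment (columns: chromosome, variant ID, cM, position, allele 1, allele 2):

```shell
# Three variants, of which the third duplicates the ID in column 2.
printf '1\trs1\t0\t100\tA\tG\n1\trs2\t0\t200\tC\tT\n1\trs1\t0\t100\tA\tG\n' > dup_demo.bim

# Keep only the first line seen for each variant ID (column 2).
awk '!seen[$2]++' dup_demo.bim > dedup_demo.bim
wc -l < dedup_demo.bim
```

One caution: a .bim must stay line-for-line aligned with its .bed genotype matrix, so dropping lines from the .bim alone can desynchronize the pair. Letting PLINK do the removal, e.g. with --list-duplicate-vars and then --exclude, is usually the safer route.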
#27
Please give me some advice.
I got 12 files like 000005450130-E250049362_L01_UDB-412_1.fq.gz, about 6-7 GB each, from my WGS provider.
I am trying to combine them into one big FASTQ to align with WGSE.
Concatenating the 12 files with the command cat *.fq.gz > combined.fq.gz runs forever until it exhausts my SSD.
I've also tried ungzipping them and concatenating the raw files, but that failed: WGSE doesn't read the resulting file.
And I have noticed that cat creates a resulting file 1.5 times larger than the original files together.
What did I do wrong?
K36 52%Kivutkalns153, K13 25% North Atlantic, 45% Baltic, MTA 20% Svear
#28
(02-26-2025, 06:54 AM)Geo Wrote: Give me please advice.
I have got 12 files like 000005450130-E250049362_L01_UDB-412_1.fq.gz about 6-7 Gb each from my wgs provider.
I am trying to combine them into 1 big fastq to align with wgse.
Concatenating 12 files with command cat *.fq.gz >combined.fq.gz continues eternally until exhaust my ssd.
I've also tried ungzipping them and concatenating raw but failed - wgse doesn't read resulting file.
And I have metioned that cat creates resulting file which 1,5 times larger then original files together.
what I did wrong?

Yes, I have done such a merge a few times. Concatenating 12 files should be OK, but you need a lot of space.
Try usegalaxy.org: it allows you to use up to 250 GB of data space.
#29
(02-26-2025, 07:02 AM)TanTin Wrote: Yes, I did such merge few times.  Concatenating 12 files should be OK, but you need lot of space.
Try using usegalaxy.org.       usegalaxy.org allows to use up to 250 GB data space.

Thanks. What would you recommend on usegalaxy.org: the FASTQ joiner?
Should I extract the gz files before concatenating? I have heard two different opinions: one says cat can concatenate gz archives, the other says it can't and that we should ungzip and cat the raw files.
K36 52%Kivutkalns153, K13 25% North Atlantic, 45% Baltic, MTA 20% Svear
#30
(02-26-2025, 07:40 AM)Geo Wrote: thanks. What would you recommend on galaxy.org - FASTQ joiner ?
Should I extract gz before concatenating? I have heard 2 different opinions - the one that CAT can concatenate gz archives AND the opposite - that not - we should ungzip and cat raw files

For small files, I prefer to work unzipped. For large files it is mandatory to use the compressed format, otherwise you will run out of space. And yes, cat can concatenate gz archives directly: the gzip format allows multiple members, so the concatenation of valid .gz files is itself a valid .gz file.
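A small sketch of the compressed concatenation (file names are made up). One hedge on the endlessly growing output: if combined.fq.gz lands in the same directory, a re-run of cat *.fq.gz will match its own earlier output and read it back in, which is one plausible way to exhaust an SSD; writing the result outside the globbed directory avoids that.

```shell
# Two tiny made-up FASTQ fragments, compressed separately.
mkdir -p fq_demo
printf '@r1\nACGT\n+\nIIII\n' | gzip > fq_demo/part_1.fq.gz
printf '@r2\nTTTT\n+\nIIII\n' | gzip > fq_demo/part_2.fq.gz

# Concatenate the compressed files as-is; the output goes OUTSIDE fq_demo
# so the glob can never pick it up on a re-run.
cat fq_demo/*.fq.gz > combined_demo.fq.gz

# Both reads survive decompression (4 FASTQ lines per read = 8 lines).
gunzip -c combined_demo.fq.gz | wc -l
```

Also worth noting for paired-end data (the _1/_2 suffixes in the file names): concatenate the _1 files and the _2 files separately, keeping the same lane order in both, so the mates stay paired.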

