Converting from fastq to plink
#31
How to cut adapters? Any tips for dealing with ancient DNA damage?
#32
(03-01-2025, 09:28 PM)kolompar Wrote: How to cut adapters? Any tips for dealing with ancient DNA damage?

#Adapter removal with fastp
fastp -i PAIR1.fastq.gz -I PAIR2.fastq.gz --merge --merged_out OUTNAME --include_unmerged --detect_adapter_for_pe -l 25 -g

#Same with AdapterRemoval, but in two steps: the adapters identified in the first step must be passed to the second step
AdapterRemoval --file1 PAIR1.fastq.gz --file2 PAIR2.fastq.gz --identify-adapters
AdapterRemoval --file1 PAIR1.fastq.gz --file2 PAIR2.fastq.gz --basename OUTNAME --gzip --adapter1 ADAPTER1 --adapter2 ADAPTER2 --minlength 25 --collapse

#For dealing with ancient DNA damage I trim with bamUtil (replace the 6 with the number of bases you want to trim on each side)
bam trimBam NAME.bam NAME.trim.bam 6
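trimBam works on the mapped BAM. If you would rather trim before mapping, the same idea applied to a FASTQ looks roughly like the sketch below: it cuts N bases off both ends of every read, keeping the sequence and quality lines in step, and drops reads of 2N bases or fewer. trim_fastq_ends is a hypothetical helper written for illustration and assumes uncompressed, 4-line FASTQ records on stdin; on real data fastp's own -f/--trim_front1 and -t/--trim_tail1 options do the same job.

```shell
# Hypothetical helper (not part of any tool): trim N bases from both ends of
# every read in an uncompressed, 4-line-per-record FASTQ read from stdin.
# Reads of length <= 2*N are dropped because nothing would be left.
trim_fastq_ends() {
  awk -v n="$1" '
    { rec[NR % 4] = $0 }            # 1=header, 2=sequence, 3="+", 0=quality
    NR % 4 == 0 {
      len = length(rec[2])
      if (len > 2 * n) {
        keep = len - 2 * n
        print rec[1]
        print substr(rec[2], n + 1, keep)
        print rec[3]
        print substr(rec[0], n + 1, keep)
      }
    }
  '
}

# Usage sketch (placeholder file names), 6 bases per side as in the trimBam line:
# gzip -dc NAME.fastq.gz | trim_fastq_ends 6 | gzip > NAME.trim.fastq.gz
```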
#33
Thanks, that's it, just trim? Is that the industry standard procedure?
Do I need to split the fastq to cut adapters, or what should I do with the usual fastq files?
Take a real sample, for example this Jomon one: there are two kinds of fastq for the autosomes, and the bam also seems to contain just unmapped reads. How would you work with that?
https://www.ebi.ac.uk/ena/browser/view/S...show=reads
Or how about this one: an aligned bam is available but it says "unclipped". Is that the same thing, and is it usable? The AADR annotation says "library with technical problems".
https://www.ebi.ac.uk/ena/browser/view/PRJEB58199
#34
(03-07-2025, 08:50 PM)kolompar Wrote: Thanks, that's it, just trim? Is that the industry standard procedure?

Trimming is quite standard and actually gives the best results, but for low-quality samples the trade-off in lost data might be a problem. However, if you have UDG-treated libraries you don't have to trim, and for half-UDG libraries trim only 2 bp.
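The rule of thumb above can be written down as a small lookup, which is handy when a batch mixes library types. trim_bases_for is a hypothetical helper; the non-UDG value is your own choice, and the default of 6 is only the number used in the earlier trimBam example, not a recommendation.

```shell
# Hypothetical helper: bases to trim per read end by library treatment.
# UDG: none, half-UDG: 2 bp (as above). Non-UDG: user choice, default 6
# only because that is the example value used with trimBam earlier.
trim_bases_for() {
  case "$1" in
    udg)      echo 0 ;;
    half-udg) echo 2 ;;
    non-udg)  echo "${2:-6}" ;;
    *)        echo "unknown library treatment: $1" >&2; return 1 ;;
  esac
}

# Usage sketch (NAME.bam is a placeholder):
# n=$(trim_bases_for half-udg)
# if [ "$n" -gt 0 ]; then bam trimBam NAME.bam NAME.trim.bam "$n"; fi
```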

Another commonly used procedure is to use mapDamage to lower the base quality scores at likely damaged positions, so that many of them get filtered out in the quality-filtering step, but I have not tried it.
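mapDamage's own rescaling (its --rescale option) works on reads aligned to a reference and uses a statistical damage model, so it can't be reproduced in a few lines. As a much cruder, reference-free illustration of the same idea, the sketch below sets the quality of every T in the first K bases and every A in the last K bases of each read to Q2, so a later base-quality filter discards them. downweight_terminal_damage is hypothetical and is not mapDamage's method; it assumes Phred+33 qualities and a double-stranded library, and it also penalizes genuine T/A bases at read ends.

```shell
# Hypothetical, reference-free illustration (NOT mapDamage's method): set the
# quality of every T in the first K bases and every A in the last K bases of
# each read to Q2 ('#' in Phred+33), so a base-quality filter drops them.
# Assumes uncompressed, 4-line-per-record, Phred+33 FASTQ on stdin and a
# double-stranded library (C->T at 5' ends, G->A at 3' ends).
downweight_terminal_damage() {
  awk -v k="${1:-2}" '
    NR % 4 == 2 { seq = $0 }                # remember the sequence line
    NR % 4 == 0 {
      len = length(seq); out = ""
      for (i = 1; i <= len; i++) {
        b = substr(seq, i, 1); q = substr($0, i, 1)
        if ((i <= k && b == "T") || (i > len - k && b == "A")) q = "#"
        out = out q
      }
      print out
      next
    }
    { print }                               # header, sequence and "+" lines
  '
}

# Usage sketch (placeholder names):
# gzip -dc NAME.fastq.gz | downweight_terminal_damage 2 | gzip > NAME.dw.fastq.gz
```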

Finally, a drastic procedure that works for single-end data is to discard any G->A and C->T transitions, and an even more drastic one is to keep only transversion sites. PileupCaller has options for both.
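If you are already at the PLINK stage, the transversions-only idea can also be applied afterwards. The sketch below reads a PLINK .bim file (columns: chromosome, variant ID, cM, position, allele 1, allele 2) and prints the IDs of SNPs whose two alleles form a transversion, i.e. anything other than A/G or C/T; sites with a missing (0) or multi-base allele are skipped. The ID list can then be passed to plink with --extract. transversion_ids and the file names are hypothetical.

```shell
# Hypothetical helper: print IDs of transversion SNPs from a PLINK .bim file.
# .bim columns: 1 chr, 2 ID, 3 cM, 4 bp, 5 allele1, 6 allele2.
transversion_ids() {
  awk '
    {
      a1 = toupper($5); a2 = toupper($6)
      # skip missing ("0"), indels and anything that is not a single base
      if (a1 !~ /^[ACGT]$/ || a2 !~ /^[ACGT]$/ || a1 == a2) next
      pair = a1 a2
      # transitions: purine<->purine (A/G) or pyrimidine<->pyrimidine (C/T)
      if (pair == "AG" || pair == "GA" || pair == "CT" || pair == "TC") next
      print $2
    }
  ' "$@"
}

# Usage sketch (placeholder names):
# transversion_ids NAME.bim > tv.txt
# plink --bfile NAME --extract tv.txt --make-bed --out NAME.tv
```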

(03-07-2025, 08:50 PM)kolompar Wrote: Do I need to split the fastq to cut adapters, or what should I do with the usual fastq files?
Take a real sample, for example this Jomon one: there are two kinds of fastq for the autosomes, and the bam also seems to contain just unmapped reads. How would you work with that?
https://www.ebi.ac.uk/ena/browser/view/S...show=reads

This is single-end data, so you can run:

fastp -i INNAME.fastq.gz -o OUTNAME.fastq.gz -l 25 -g

or

AdapterRemoval --file1 INNAME.fastq.gz --basename OUTNAME --gzip --minlength 25

You can also specify the adapter manually if you know it (fastp with the -a option, AdapterRemoval with --adapter1), but most of the time the defaults should work.
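After trimming it is worth a quick look at what is left. The sketch below prints a read-length histogram (length, count) from an uncompressed FASTQ; with -l 25 (fastp) or --minlength 25 (AdapterRemoval) nothing shorter than 25 bases should remain. read_length_histogram is a hypothetical helper and assumes plain 4-line FASTQ records.

```shell
# Hypothetical helper: read-length histogram ("length count" per line, sorted
# by length) for an uncompressed, 4-line-per-record FASTQ on stdin.
read_length_histogram() {
  awk 'NR % 4 == 2 { count[length($0)]++ }
       END { for (len in count) print len, count[len] }' | sort -n
}

# Usage sketch (placeholder name); the first lines show the shortest reads:
# gzip -dc OUTNAME.fastq.gz | read_length_histogram | head
```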

(03-07-2025, 08:50 PM)kolompar Wrote: Or how about this one: an aligned bam is available but it says "unclipped". Is that the same thing, and is it usable? The AADR annotation says "library with technical problems".
https://www.ebi.ac.uk/ena/browser/view/PRJEB58199

I am no expert, but that one just looks like garbage to me.
#35
(03-01-2025, 09:28 PM)kolompar Wrote: How to cut adapters? Any tips for dealing with ancient DNA damage?

I always run Cutadapt on standard ENA FASTQs.
You can import the FASTQs into your usegalaxy.org account, then run FastQC to see whether the adapters have been removed.
Normally I can run Cutadapt on usegalaxy with default settings to remove adapters, although about 20% of ENA FASTQs need special parameters or more advanced trimming.
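Besides FastQC, a quick command-line check is to count how many reads still contain the start of the standard Illumina adapter, AGATCGGAAGAGC, before and after trimming. This is only a sketch: count_adapter_reads is hypothetical, it looks for one exact sequence (assuming standard Illumina adapters unless you pass another), and it will miss other adapter types and partial adapters at the very ends of reads.

```shell
# Hypothetical helper: count reads whose sequence contains an adapter prefix.
# Default is the common Illumina adapter start AGATCGGAAGAGC (assumption:
# standard Illumina libraries); pass another sequence as $1 if needed.
count_adapter_reads() {
  awk -v adapter="${1:-AGATCGGAAGAGC}" '
    NR % 4 == 2 && index($0, adapter) > 0 { hits++ }
    END { print hits + 0 }
  '
}

# Usage sketch (placeholder names), compare before and after trimming:
# gzip -dc INNAME.fastq.gz  | count_adapter_reads
# gzip -dc OUTNAME.fastq.gz | count_adapter_reads
```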
#36
For your information:
Some fastq files are not recognized as FASTQ format, and if you use usegalaxy it won't let you process such files.
If your FASTQ files are not being recognized, there is a tool in usegalaxy for that:

FASTQ Groomer
convert between various FASTQ quality formats
(Galaxy Version 1.1.5+galaxy2)


Just use this tool to read and convert your fastq file. After this you will get a new fastq file that can be read and processed further.
Obviously there are different types of fastq, so this tool does not work for all of them.
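One reason a FASTQ may need grooming is its quality encoding: Illumina pipelines 1.3 to 1.7 wrote qualities as Phred+64, while most tools (and Galaxy's "fastqsanger" type) expect Phred+33. The sketch below guesses the offset from the lowest quality character and, for the Phred+64 case only, shifts every quality character down by 31. Both functions are hypothetical helpers for uncompressed 4-line FASTQ; the guess is a heuristic, and the older Solexa scale needs a different formula that this does not handle.

```shell
# Hypothetical helpers for uncompressed, 4-line-per-record FASTQ on stdin.

# Guess the quality offset from the lowest quality character seen: anything
# below ';' (ASCII 59) can only be Phred+33; if nothing is below '@'
# (ASCII 64) the file is probably Phred+64 (a heuristic, not proof).
guess_phred_offset() {
  awk '
    BEGIN { for (i = 33; i < 127; i++) ord[sprintf("%c", i)] = i; min = 127 }
    NR % 4 == 0 {
      for (j = 1; j <= length($0); j++) {
        c = ord[substr($0, j, 1)]
        if (c < min) min = c
      }
    }
    END {
      if (min < 59) print 33
      else if (min >= 64) print 64
      else print "unknown"
    }
  '
}

# Convert Phred+64 qualities to Phred+33; other lines pass through unchanged.
phred64_to_phred33() {
  awk '
    BEGIN { for (i = 33; i < 127; i++) ord[sprintf("%c", i)] = i }
    NR % 4 == 0 {
      out = ""
      for (j = 1; j <= length($0); j++) {
        q = ord[substr($0, j, 1)] - 64   # Phred score from Phred+64 char
        if (q < 0) q = 0                 # invalid for Phred+64: clamp to Q0
        out = out sprintf("%c", q + 33)  # re-encode as Phred+33
      }
      print out
      next
    }
    { print }
  '
}

# Usage sketch (placeholder names):
# gzip -dc NAME.fastq.gz | guess_phred_offset
# gzip -dc NAME.fastq.gz | phred64_to_phred33 | gzip > NAME.sanger.fastq.gz
```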
#37
usegalaxy.org was upgraded a day ago.
Now I see some issues converting fastq:
Some files are not recognized as the correct dataset type.
Some stay in queued status and can't be used for the next step.
Another issue is with Workflows: it seems Workflows created before the upgrade may not work under the new interface.
All kinds of errors.
#38
Small update on step 3.B
old version:

Quote: 3) MAKING THE BCF FILE FROM THE BAM
-GALAXY: a) use "bcftools mpileup" tool on the output file of preceding step and select the reference genome (same as before = hg19)
b) use "bcftools call" tool on the output file of preceding step and select "1 - Treat all samples as haploid" in "Select predefined ploidy" under "file format options"

At this step, with these settings you will get the same data for the major and minor alleles (each site gets a single haploid call, so there are no heterozygous genotypes).
Haploid refers to the presence of a single set of chromosomes in an organism's cells.
However, humans are diploid.
If you need full calls for both the major and minor alleles, the option to use is:

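new version:
b) use the same "bcftools call" tool, but in "Select predefined ploidy" select "GRCh37 - Human Genome reference assembly GRCh37 / hg19" (command-line equivalent: --ploidy GRCh37)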
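For anyone running this outside Galaxy, the two steps are roughly the pipeline below, with --ploidy 1 matching the old haploid setting and --ploidy GRCh37 the new one. It is a sketch: ploidy_for and call_to_plink are hypothetical helpers, hg19.fa and NAME.bam are placeholders (the reference must be the one the BAM was mapped to), and the final plink line is an assumed conversion step toward the fastq-to-plink goal of this thread. In practice you may want to restrict mpileup to a target SNP list with -T, and plink may need extra flags such as --double-id depending on sample names.

```shell
# Hypothetical helper: map the two Galaxy choices above to bcftools call --ploidy.
ploidy_for() {
  case "$1" in
    haploid) echo 1 ;;        # old setting: "1 - Treat all samples as haploid"
    diploid) echo GRCh37 ;;   # new setting: GRCh37 / hg19 predefined ploidy
    *)       echo "expected haploid or diploid, got: $1" >&2; return 1 ;;
  esac
}

# Sketch of steps 3a/3b plus a PLINK conversion (placeholder file names).
call_to_plink() {
  bam="$1"; ref="$2"; out="$3"; mode="${4:-diploid}"
  ploidy=$(ploidy_for "$mode") || return 1
  bcftools mpileup -f "$ref" -Ou "$bam" \
    | bcftools call -m --ploidy "$ploidy" -Oz -o "$out.vcf.gz" || return 1
  plink --vcf "$out.vcf.gz" --make-bed --out "$out"
}

# Usage: call_to_plink NAME.bam hg19.fa NAME diploid
```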