03-01-2025, 09:28 PM
How to cut adapters? Any tips for dealing with ancient DNA damage?
Converting from fastq to plink
(03-01-2025, 09:28 PM)kolompar Wrote: How to cut adapters? Any tips for dealing with ancient DNA damage?

# Adapter removal with fastp (pairs are merged; with --include_unmerged the unmerged reads go to the same output; -l 25 drops reads shorter than 25 bp, -g trims poly-G tails)
fastp -i PAIR1.fastq.gz -I PAIR2.fastq.gz --merge --merged_out OUTNAME --include_unmerged --detect_adapter_for_pe -l 25 -g

# Same with AdapterRemoval, but in two steps: the adapters identified in the first step must be passed to the second step
AdapterRemoval --file1 PAIR1.fastq.gz --file2 PAIR2.fastq.gz --identify-adapters
AdapterRemoval --file1 PAIR1.fastq.gz --file2 PAIR2.fastq.gz --basename OUTNAME --gzip --adapter1 ADAPTER1 --adapter2 ADAPTER2 --minlength 25 --collapse

# For dealing with ancient DNA damage I trim with bamUtil (replace the 6 with the number of bases you want to trim on each side)
bam trimBam NAME.bam NAME.trim.bam 6
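To make the "trim N bases on each side" idea concrete, here is a rough bash/awk sketch that hard-trims a fixed number of bases from both ends of every read in an uncompressed FASTQ stream. It is only an illustration under assumptions: standard 4-line FASTQ records, and the function name trim_fastq_ends is made up. bamUtil's trimBam works on aligned BAM records instead; if you prefer to trim before mapping, fastp's -f/-t (--trim_front1/--trim_tail1) options do a similar fixed-length trim.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of any tool): remove n bases from BOTH ends
# of every read in a 4-line FASTQ read from stdin, trimming the sequence and
# quality lines identically so they stay the same length.
trim_fastq_ends() {
  local n="${1:-6}"   # bases to remove from each end (6 as in the trimBam example)
  awk -v n="$n" '
    NR % 4 == 2 || NR % 4 == 0 {      # sequence line and quality line
      len = length($0) - 2 * n
      if (len < 0) len = 0            # reads shorter than 2n come out empty
      print substr($0, n + 1, len)
      next
    }
    { print }                         # header and "+" lines are unchanged
  '
}
```

Usage example: zcat SAMPLE.fastq.gz | trim_fastq_ends 2 | gzip > SAMPLE.trim2.fastq.gz (2 bp being the usual choice for half-UDG libraries). Reads shorter than twice the trim length come out empty, so run a length filter afterwards.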
03-07-2025, 08:50 PM
Thanks, that's it, just trim? Is that the industry standard procedure?
Do I need to split FASTQ files to cut adapters, or what should I do with the usual FASTQ files? Let's take a real sample as an example, this Jomon. There are two kinds of FASTQ for the autosomes, and the BAM seems to be just unmapped reads too. How would you work with that? https://www.ebi.ac.uk/ena/browser/view/S...show=reads
Or how about this one: an aligned BAM is available but it says "unclipped". Is that the same thing, and is it usable? The AADR anno file says "library with technical problems". https://www.ebi.ac.uk/ena/browser/view/PRJEB58199

(03-07-2025, 08:50 PM)kolompar Wrote: Thanks, that's it, just trim? Is that the industry standard procedure?
Trimming is quite standard and is actually the method that gives the best results, but for low-quality samples the resulting loss of data might be a problem. However, if you have UDG-treated libraries you don't have to trim, and with half-UDG libraries trim only 2 bp.
Another commonly used procedure is to run mapDamage to lower the base quality scores of bases that are likely to be damage-derived, so many of them get filtered out in the base-quality filtering step, but I have not tried it.
Finally, a drastic procedure that works for single-stranded libraries is discarding G->A and C->T transitions (depending on the strand of the read), and an even more drastic one is keeping only transversion sites. There are options in PileupCaller to do both.

(03-07-2025, 08:50 PM)kolompar Wrote: Do I need to split fastq to cut adapters or what to do with the usual fastq files?
This is single-end data, so you can run:
fastp -i INNAME.fastq.gz -o OUTNAME.fastq.gz -l 25 -g
or
AdapterRemoval --file1 INNAME.fastq.gz --basename OUTNAME --gzip --minlength 25
You can also specify the adapter manually if you know it (fastp with the -a option, AdapterRemoval with --adapter1), but most of the time the defaults should work.

(03-07-2025, 08:50 PM)kolompar Wrote: Or how about this, aligned bam available but it says unclipped, is that the same thing, is it usable? AADR anno says "library with technical problems".
I am no expert, but that one just looks like garbage to me.
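For the "keep only transversion sites" option, PileupCaller can do it itself, but if you want to see what it means, or pre-filter the SNP list you give to the caller, here is a small awk sketch. It assumes an EIGENSTRAT-style .snp file (id, chromosome, genetic position, physical position, allele 1, allele 2); the function name keep_transversions is hypothetical. Transitions are A<->G and C<->T, which is exactly where deamination damage (C->T, G->A) shows up, so dropping them removes damage-driven false calls at the cost of losing many sites, since transitions are the majority of human SNPs.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print only transversion SNPs from an EIGENSTRAT-style
# .snp file (alleles in columns 5 and 6). Reads files given as arguments,
# or stdin if none. Lines are printed unchanged.
keep_transversions() {
  awk '
    {
      a = toupper($5); b = toupper($6)
      if (a !~ /^[ACGT]$/ || b !~ /^[ACGT]$/) next   # not a simple SNP: drop
      if (a == b) next                               # not biallelic: drop
      pair = (a < b) ? a b : b a                     # order-independent pair
      if (pair == "AG" || pair == "CT") next         # transition: drop
      print                                          # transversion: keep
    }
  ' "$@"
}
```

Usage example: keep_transversions panel.snp > panel.tv.snp, then call genotypes against panel.tv.snp. Only filter a SNP list that you feed to the caller; do not filter the .snp file of an existing EIGENSTRAT dataset on its own, because its rows must stay aligned with the .geno file.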
03-08-2025, 09:50 PM
(03-01-2025, 09:28 PM)kolompar Wrote: How to cut adapters? Any tips for dealing with ancient DNA damage?
I always run Cutadapt on standard ENA FASTQs. You can import the FASTQs into your usegalaxy.org account and then run FastQC to check whether the adapters have been removed. Normally I can run Cutadapt on usegalaxy with default settings to remove adapters, although in my experience about 20% of ENA FASTQs need special parameters or more advanced trimming.
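If you want a quick local answer to "are the adapters still there?" before or after trimming, a crude check is to count reads containing the start of the standard Illumina adapter, AGATCGGAAGAGC. This is a rough sketch, assuming 4-line FASTQ records; the function name adapter_fraction is made up, and it is much cruder than FastQC's adapter content module (it only finds exact matches of the full 13 bp prefix, not partial adapters at the very end of a read).

```shell
#!/usr/bin/env bash
# Hypothetical helper: count reads whose sequence contains an adapter string.
# Reads a 4-line FASTQ from stdin and prints: hits total percent
adapter_fraction() {
  local adapter="${1:-AGATCGGAAGAGC}"   # default: common Illumina adapter prefix
  awk -v ad="$adapter" '
    NR % 4 == 2 { total++; if (index($0, ad) > 0) hits++ }   # sequence lines only
    END {
      if (total == 0) { print "0 0 0.00"; exit }
      printf "%d %d %.2f\n", hits, total, 100 * hits / total
    }
  '
}
```

Usage example: zcat INNAME.fastq.gz | adapter_fraction. On the command line, a Cutadapt run with the same adapter could look like cutadapt -a AGATCGGAAGAGC -m 25 -o OUTNAME.fastq.gz INNAME.fastq.gz (assuming your library really uses the standard Illumina adapter).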
05-13-2025, 02:03 AM
For your information:
Some FASTQ files are not recognized as FASTQ. If you use usegalaxy, it won't let you process such files. If your FASTQ files are not being recognized, there is a tool on usegalaxy: FASTQ Groomer convert between various FASTQ quality formats (Galaxy Version 1.1.5+galaxy2). Just use this tool to read and convert your FASTQ file. After this step you will get a new FASTQ file that can be read and processed further. There are different types of FASTQ, though, so this is not a universal fix.
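One common reason a file is not accepted as Sanger FASTQ is that it uses the old Phred+64 quality encoding instead of Phred+33, which is the kind of conversion FASTQ Groomer does. A rough heuristic to check locally: quality characters below ASCII 59 (';') can only occur in Phred+33 files, while Phred+64 qualities start at ASCII 64 ('@'). This is a sketch under those assumptions, with a made-up function name; a high-quality Phred+33 sample could in principle never go below '@', so scan a decent number of reads.

```shell
#!/usr/bin/env bash
# Hypothetical helper: guess the FASTQ quality offset from the lowest
# quality character seen in a 4-line FASTQ read from stdin.
# Prints phred33, phred64, ambiguous (e.g. old Solexa range) or unknown.
guess_phred_offset() {
  awk '
    BEGIN {
      for (i = 33; i <= 126; i++) ord[sprintf("%c", i)] = i   # char -> ASCII code
      min = 999; max = 0
    }
    NR % 4 == 0 {                                  # quality lines only
      for (i = 1; i <= length($0); i++) {
        q = ord[substr($0, i, 1)]
        if (q < min) min = q
        if (q > max) max = q
      }
    }
    END {
      if (max == 0) print "unknown"                # no quality lines seen
      else if (min < 59) print "phred33"
      else if (min >= 64) print "phred64"
      else print "ambiguous"
    }
  '
}
```

Usage example: zcat INNAME.fastq.gz | head -n 400000 | guess_phred_offset checks the first 100,000 reads.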
05-20-2025, 11:00 PM
usegalaxy.org was upgraded a day ago.
Now I see some issues converting FASTQ files. Some files are not recognized as the correct dataset type. Some stay in the queued state and cannot be used in the next step. There is another issue with workflows: it seems workflows created before the upgrade may not work under the new interface, with all kinds of errors.
06-15-2025, 02:57 AM
Small update on step 3.B
Old version:
Quote: 3) MAKING THE BCF FILE FROM THE BAM
On this step, with such settings you will get the same data for the major and minor alleles. Haploid refers to the presence of a single set of chromosomes in an organism's cells. However, humans are diploid. If you need the full details for both the major and minor alleles, the option to use is:
New version: GRCh37 - Human Genome reference assembly GRCh37 / hg19 (--ploidy)
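To check which ploidy a BCF was actually called with, you can look at the GT field: haploid calls are single alleles such as 1, while diploid calls have a separator, such as 0/1 or 0|1. A small sketch, assuming bcftools is available to turn the BCF into VCF text (bcftools view writes uncompressed VCF by default); the function name gt_ploidy_summary is hypothetical. For reference, --ploidy GRCh37 is an option of bcftools call.

```shell
#!/usr/bin/env bash
# Hypothetical helper: count haploid vs diploid genotype calls in VCF text
# read from stdin, using the GT entry of the FORMAT column (column 9).
gt_ploidy_summary() {
  awk -F '\t' '
    /^#/ { next }                                  # skip header lines
    {
      n = split($9, fmt, ":")
      gi = 0
      for (i = 1; i <= n; i++) if (fmt[i] == "GT") gi = i
      if (gi == 0) next                            # no GT field on this record
      for (s = 10; s <= NF; s++) {                 # one column per sample
        split($s, val, ":")
        if (val[gi] ~ /[\/|]/) diploid++           # e.g. 0/1, 1|1, ./.
        else haploid++                             # e.g. 0, 1, .
      }
    }
    END { printf "haploid=%d diploid=%d\n", haploid, diploid }
  '
}
```

Usage example: bcftools view NAME.bcf | gt_ploidy_summary. If every call is counted as haploid, the BCF was made with haploid settings.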