Published Pages | chanaka | Basic steps RNASeq Analysis

This page covers the basic steps needed to understand Galaxy and Galaxy workflows before we start the RNASeq Analysis lab exercise. Next-generation sequencing machines usually produce FASTA or FASTQ files containing many short-read sequences (possibly with quality information). It is important to preprocess the FASTA/FASTQ files before mapping the sequences to the genome, because cleaning up the sequences produces better mapping results. Here we are going to check the quality of a given FASTQ file.

Here is the sample FASTQ file that we are going to use:

Galaxy Dataset | asp5_leaf_read1.fq

These are paired-end reads from aspen leaves (2 × 50 bp), with a target insert size of 200 bp.
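To make the FASTQ format concrete, here is a minimal Python sketch that reads four-line FASTQ records and decodes the quality string into numeric scores. It assumes the Sanger (Phred+33) encoding and one sequence line per record; the function name `read_fastq` and the example record are made up for illustration and are not part of Galaxy.

```python
import io


def read_fastq(handle):
    """Yield (read_id, sequence, qualities) from a four-line-per-record FASTQ stream.

    Assumption: qualities are Phred+33 (Sanger) encoded.
    """
    while True:
        header = handle.readline().rstrip()
        if not header:
            break  # end of file (or a blank line) stops the reader
        sequence = handle.readline().rstrip()
        handle.readline()  # the '+' separator line is ignored
        quality = handle.readline().rstrip()
        if not header.startswith("@"):
            raise ValueError("FASTQ header must start with '@': " + header)
        # Each quality character encodes a score as ASCII code minus 33.
        yield header[1:], sequence, [ord(ch) - 33 for ch in quality]


# Hypothetical single-record input, used only to demonstrate the reader.
example = io.StringIO("@read1\nACGTN\n+\nIIII#\n")
records = list(read_fastq(example))
print(records)
```

With a real file, the same function can be called on `open("asp5_leaf_read1.fq")` instead of the in-memory example.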

Understand the workflow

  1. FASTQ Groomer: offers several conversion options relating to the FASTQ format.
  2. Compute quality statistics: creates a quality statistics report for the given FASTQ file.
  3. Draw quality score boxplot: creates a boxplot graph of the quality scores in the library.
  4. Draw nucleotides distribution chart: creates a stacked-histogram graph of the nucleotide distribution in the FASTQ library.
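To give a feel for what the quality statistics and boxplot steps summarise, the following Python sketch computes the minimum, median, and maximum quality for each cycle (read position). It is an illustration under stated assumptions, not the code the Galaxy tools run: the helper names are hypothetical, the quality strings are invented, and Phred+33 encoding is assumed.

```python
from statistics import median


def per_cycle_qualities(quality_strings, offset=33):
    """Group decoded quality scores by cycle (column) across all reads."""
    columns = []
    for quality in quality_strings:
        for i, ch in enumerate(quality):
            if i == len(columns):
                columns.append([])  # first read long enough to reach this cycle
            columns[i].append(ord(ch) - offset)
    return columns


def summarize(columns):
    """Return one summary row per cycle, similar in spirit to a boxplot."""
    return [
        {"cycle": i + 1, "min": min(col), "median": median(col), "max": max(col)}
        for i, col in enumerate(columns)
    ]


# Invented quality strings for three reads of four cycles each.
quals = ["IIII", "II5#", "I5##"]
summary = summarize(per_cycle_qualities(quals))
for row in summary:
    print(row)
```

A median that falls from cycle to cycle, as in this toy input, is the pattern the boxplot makes visible at a glance.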

We could run the above tools one by one to check the quality of the given FASTQ file, but here we are going to merge all the steps into a single workflow.

Create workflow

Once you have logged into Galaxy, import the input FASTQ file by clicking the import icon. Soon after, the asp5_leaf_read1.fq file will be available in your history. Then click Workflow > Create new workflow. Now you can create a workflow similar to the following image by dragging and dropping tools into the workflow area and joining them to each other with arrows. If you find it difficult to create the workflow, you can simply import the following workflow by clicking its import icon.

[Workflow image: RNASeq 1]

Now we can select asp5_leaf_read1.fq as the input and run the workflow (Initial steps to RNASeq Analysis > Run). Here are the example results.

Analyse the output

According to the box-plot graph manual, libraries can be judged as follows:

  • Excellent quality: the median quality is 40 for almost all 36 cycles.
  • Relatively good quality: the median quality degrades towards later cycles.
  • Low quality: the median quality drops quickly.

According to the Nucleotide Distribution Plotter, the following chart shows a growing number of unknown (N) nucleotides towards later cycles, which might indicate a sequencing problem.
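The same trend can be checked numerically. The sketch below counts the bases seen at each cycle and reports the fraction of N calls per cycle. The function names and the four example reads are hypothetical, chosen only to show a rising N fraction; this is not the plotter's own implementation.

```python
from collections import Counter


def nucleotide_distribution(sequences):
    """Count how often each base occurs at each cycle (read position)."""
    counts = []
    for seq in sequences:
        for i, base in enumerate(seq.upper()):
            if i == len(counts):
                counts.append(Counter())
            counts[i][base] += 1
    return counts


def n_fraction(counts):
    """Fraction of unknown (N) bases at each cycle."""
    return [c["N"] / sum(c.values()) for c in counts]


# Invented reads in which N calls become more frequent at later cycles.
reads = ["ACGT", "ACGN", "ACNN", "ANNN"]
fractions = n_fraction(nucleotide_distribution(reads))
print(fractions)
```

How large an N fraction counts as a problem depends on the experiment, so no threshold is suggested here.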