Manual

RNA-Seq Report

RNA-Seq Analysis Summary

The RNA-Seq analysis consists of the following steps:

Quality control analysis for the reads in each of the samples using FASTQC.
Read trimming of low-quality bases using Trimmomatic.
Read alignment using STAR.
BAM file manipulation using Picard.
Quality control metrics using Qualimap2.
Transcript quantification using HTSeq.
Sample clustering and differential expression analysis using DESeq2.
Enrichment of KEGG pathways and GO terms using fgsea (only when gene set enrichment data is provided).

Quality Controls

Quality control metrics for each sample in the study.

Info-Boxes:

Library Type: RNA-seq library preparation protocol used for the samples.
Genome Reference: Name of the genome version used as reference.
Transcript Reference: GTF or GFF transcript file used as reference.
Genomic Feature: Genomic feature in the GTF or GFF transcript file used for read count quantification.

Read Distribution:

Bar plot showing the number of reads per sample for the following categories:

Total_Reads: Total number of raw reads in the FASTQ file(s). For paired-end reads, R1 and R2 are added together.
Post_Trimming_Reads: If trimming is performed, the total number of reads in the FASTQ file(s) after trimming. For paired designs, paired and unpaired reads are added together.
Total_Alignments: Total number of mapped reads, adding paired and unpaired reads. This number also counts secondary alignments.
Unique_Alignments: Total number of mapped reads, adding paired and unpaired reads. This number ignores secondary alignments.
Not_Duplicated_Reads: Mapped reads that remain once duplicated reads (marked as PCR or optical duplicates) are subtracted from “Unique_Alignments”.
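
The relationship between the three alignment categories can be illustrated with SAM flags. The following is a minimal sketch, not the report's actual implementation: it assumes the counts are derived from the flag field of each BAM record, and the function name read_distribution is hypothetical. Supplementary alignments are not treated specially here, which is also an assumption.

```python
# SAM flag bits, as defined in the SAM specification.
UNMAPPED = 0x4
SECONDARY = 0x100
DUPLICATE = 0x400


def read_distribution(flags):
    """Count alignment categories from an iterable of SAM flag integers.

    Hypothetical helper: the report does not state how it derives these counts.
    """
    total = unique = not_dup = 0
    for flag in flags:
        if flag & UNMAPPED:
            continue  # unmapped records are not alignments
        total += 1  # Total_Alignments also counts secondary alignments
        if flag & SECONDARY:
            continue
        unique += 1  # Unique_Alignments ignores secondary alignments
        if not flag & DUPLICATE:
            not_dup += 1  # duplicates are subtracted from Unique_Alignments
    return {
        "Total_Alignments": total,
        "Unique_Alignments": unique,
        "Not_Duplicated_Reads": not_dup,
    }


# Two primary reads, two secondary alignments, one duplicate, one unmapped read.
example_flags = [0, 16, 256, 1024, 4, 272]
print(read_distribution(example_flags))
```

By construction, Not_Duplicated_Reads can never exceed Unique_Alignments, which in turn can never exceed Total_Alignments.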

Genomic Origin:

From the Unique_Alignments category, this bar plot shows the number of reads per sample for the following categories:

Exon: Reads that map to an exonic region, according to the GTF file.
Intron: Reads that map to an intronic region, according to the GTF file.
Exon_Overlap: Reads that map simultaneously to an intronic/intergenic region and an exonic region, according to the GTF file.
Intergenic: Reads that map to an intergenic region, according to the GTF file.
Ambiguous: Reads that belong to more than one gene, according to the GTF file.
Unclassified: Mapped reads that cannot be classified according to the GTF file. This category contains reads mapped to contigs or scaffolds not defined in the GTF file.
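
To make the categories concrete, here is a rough sketch of how a single read interval could be assigned to one of them. It is only an illustration under several assumptions: coordinates are half-open and on a single contig, exons of one gene do not overlap each other, a gene spans from its first to its last exon, and "Ambiguous" means exonic overlap with more than one gene. The function names are hypothetical, and the Unclassified category (contigs absent from the GTF file) is not modeled.

```python
def overlap(a_start, a_end, b_start, b_end):
    """Number of bases shared by two half-open intervals."""
    return max(0, min(a_end, b_end) - max(a_start, b_start))


def classify_read(start, end, genes):
    """Assign a read to a genomic-origin category.

    genes: dict mapping a gene id to a list of (exon_start, exon_end) tuples.
    Hypothetical helper; the tool's real rules may differ.
    """
    length = end - start
    exonic_by_gene = {}
    for gene_id, exons in genes.items():
        gene_start = min(s for s, _ in exons)
        gene_end = max(e for _, e in exons)
        if overlap(start, end, gene_start, gene_end) == 0:
            continue  # read does not touch this gene at all
        exonic_by_gene[gene_id] = sum(overlap(start, end, s, e) for s, e in exons)

    if not exonic_by_gene:
        return "Intergenic"
    exonic_genes = [g for g, bases in exonic_by_gene.items() if bases > 0]
    if len(exonic_genes) > 1:
        return "Ambiguous"
    if not exonic_genes:
        return "Intron"  # inside a gene span but touching no exon
    # Exactly one gene with exonic overlap: full or partial coverage by exons.
    return "Exon" if exonic_by_gene[exonic_genes[0]] == length else "Exon_Overlap"


genes = {"g1": [(100, 200), (300, 400)], "g2": [(380, 500)]}
for read in [(120, 170), (210, 260), (180, 230), (600, 650), (385, 395)]:
    print(read, classify_read(*read, genes))
```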

Gene Coverage Profile:

Plots displaying the log coverage along the transcript length (normalized from 0 to 100):

All Genes: Considering all genes defined in the GTF file.
Top 500 Genes: Considering the top 500 expressed genes in each sample.
Bottom 500 Genes: Considering the bottom 500 expressed genes in each sample.
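
The normalization of transcript length to a 0 to 100 scale can be pictured as binning per-base coverage into 100 equal-width position bins before taking the logarithm. The sketch below assumes that procedure; the function names, the base-10 logarithm and the pseudocount of 1 are all illustrative choices, not details taken from the report.

```python
import math


def coverage_profile(coverage, bins=100):
    """Average per-base coverage into `bins` equal-width position bins.

    coverage: non-empty list of per-base depths ordered from 5' to 3'.
    """
    n = len(coverage)
    profile = []
    for b in range(bins):
        lo = b * n // bins
        hi = max(lo + 1, (b + 1) * n // bins)  # keep every bin non-empty
        window = coverage[lo:hi]
        profile.append(sum(window) / len(window))
    return profile


def log_profile(profile, pseudocount=1.0):
    """Log10-transform a profile; the pseudocount (an assumption) avoids log(0)."""
    return [math.log10(v + pseudocount) for v in profile]


# A 400 bp transcript whose second half is covered three times deeper.
coverage = [10] * 200 + [30] * 200
profile = coverage_profile(coverage)
print(profile[0], profile[-1])
print(log_profile(profile)[:2])
```

Binning by relative position is what allows transcripts of different lengths to be averaged into a single curve per sample.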

Prime Bias:

Bar plot showing ratios between mean coverage at the 5’ region, the 3’ region and the whole transcript.

Five_Prime_Bias: Ratio between mean coverage at the 5’ region (first 100 bp) and the whole transcript.
Three_Prime_Bias: Ratio between mean coverage at the 3’ region (last 100 bp) and the whole transcript.
Five_Three_Prime_Bias: Ratio between “Three_Prime_Bias” and “Five_Prime_Bias”.
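
The three ratios can be worked through for a single transcript as follows. This is a sketch under stated assumptions: coverage is a per-base depth list oriented 5' to 3', the transcript has nonzero coverage at its 3' end, and the last metric is taken as 5' bias divided by 3' bias (a direction inferred from the metric name, not confirmed by the text). How per-transcript values are aggregated per sample is not covered. The function name prime_bias is hypothetical.

```python
def prime_bias(coverage, window=100):
    """Compute 5'/3' bias ratios for one transcript.

    coverage: per-base depths ordered 5' to 3'; window: size of each end region in bp.
    """
    mean = lambda values: sum(values) / len(values)
    whole = mean(coverage)
    five_bias = mean(coverage[:window]) / whole    # first 100 bp vs whole transcript
    three_bias = mean(coverage[-window:]) / whole  # last 100 bp vs whole transcript
    return {
        "Five_Prime_Bias": five_bias,
        "Three_Prime_Bias": three_bias,
        # Assumed direction: 5' bias over 3' bias.
        "Five_Three_Prime_Bias": five_bias / three_bias,
    }


# Coverage rising towards the 3' end, as is typical of degraded or poly-A selected RNA.
coverage = [2] * 100 + [4] * 200 + [8] * 100
print(prime_bias(coverage))
```

Under these assumptions, values near 1 indicate even coverage, while a Five_Three_Prime_Bias well below 1 points to 3'-enriched coverage.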

Trimming & Alignment:

Summary table with the quality control metrics for each sample in the study.

Differential Analysis

Section containing the differential expression results from DESeq2.

Info-Boxes:

Differential Analysis: Algorithm used to test for differentially expressed genes (DESeq2).
Design: Formula used by DESeq2 in the differential expression comparison.
Significance: Multiple test correction method and criteria to consider genes as differentially expressed.
Nonzero Count Genes: Total number of genes expressed in the samples, that is, genes with a nonzero read count.
Differentially Expressed Genes: Number of differentially expressed genes according to the criteria defined in the Significance info-box.
Contrast: Factors considered in the comparison.

Plots:

Six plots summarize the differential expression analysis. All plots can be further inspected using the explore link.

PCA plot: Principal Component Analysis considering all the samples in the analysis. This plot is created using the expression data.
Adjusted p-value histogram: Histogram of the adjusted p-values from the differential expression test.
MA plot: Plot displaying the log2FoldChange between conditions for each gene versus the mean of normalized counts for all the samples. Differentially expressed genes are highlighted in red.
Volcano plot: Plot displaying the -log10 p-value of every gene versus its log2 fold change. Differentially expressed genes are highlighted in red.
Heatmap: Clustering of samples and genes in a two-dimensional heatmap. Only differentially expressed genes are considered to create this heatmap. If no differentially expressed genes are found, the top 100 genes are displayed. Gene expression magnitude is obtained by ranking the expression of all genes in the plot and ranges from 0 to 1.
Correlation heatmap: Sample-to-sample distance shown as a heatmap considering all genes in the study. Distance between samples is measured using the Euclidean method. Scale magnitude depends on the distance between samples. A greater value implies a greater distance.
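
Two of the computations described above, the rank-based expression magnitude of the heatmap and the Euclidean sample-to-sample distance, can be sketched as follows. The function names are hypothetical, the handling of ties (average rank) is an assumption, and the text does not say which transformation of the expression values feeds the distance calculation.

```python
import math


def rank_scale(values):
    """Map values to [0, 1] by rank; tied values share their average rank."""
    n = len(values)
    if n == 1:
        return [0.0]
    first, last = {}, {}
    for pos, v in enumerate(sorted(values)):
        first.setdefault(v, pos)  # lowest rank position of this value
        last[v] = pos             # highest rank position of this value
    return [((first[v] + last[v]) / 2) / (n - 1) for v in values]


def euclidean_distances(samples):
    """Pairwise Euclidean distances.

    samples: dict mapping sample name to a list of expression values,
    with genes in the same order for every sample.
    """
    names = sorted(samples)
    return {(a, b): math.dist(samples[a], samples[b]) for a in names for b in names}


print(rank_scale([5.0, 1.0, 3.0]))
print(euclidean_distances({"A": [0.0, 0.0], "B": [3.0, 4.0]})[("A", "B")])
```

Because ranking discards absolute magnitudes, the heatmap scale stays within 0 to 1 regardless of how widely expression values differ.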

DESeq2 Diff Table:

Interactive and exportable table with the results from the differential expression analysis. The following information is included:

feature_id: Gene id. Cells in this column contain a link to a per sample gene expression plot. Depending on the approach, plots can be of type box-plot or time-series.
baseMean: Mean expression considering all samples in the study.
log2FoldChange: Expression log2 fold change between conditions.
lfcSE: Expression log2 fold change standard error.
stat: Value of the statistical test used to calculate the differential expression.
pvalue: p-value of the differential expression test.
padj: Adjusted p-value from the Benjamini–Hochberg (BH) procedure.
[annotation_name]: In case an annotation step was performed, additional columns may exist.
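
For readers unfamiliar with the padj column, the Benjamini–Hochberg adjustment can be sketched in a few lines. This is a plain textbook version written for illustration: it does not reproduce other steps DESeq2 may apply (such as filtering that leaves some adjusted p-values undefined), and the 0.05 cutoff in the usage lines is a hypothetical threshold, since the real criterion is the one shown in the Significance info-box.

```python
def benjamini_hochberg(pvalues):
    """Return BH-adjusted p-values in the same order as the input."""
    m = len(pvalues)
    # Walk from the largest p-value to the smallest, enforcing monotonicity.
    order = sorted(range(m), key=lambda i: pvalues[i], reverse=True)
    adjusted = [0.0] * m
    running_min = 1.0
    for steps_from_top, i in enumerate(order):
        rank = m - steps_from_top  # 1-based rank in ascending order
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


pvalues = [0.01, 0.04, 0.03, 0.005]
padj = benjamini_hochberg(pvalues)
alpha = 0.05  # hypothetical significance threshold
print(padj)
print([p < alpha for p in padj])
```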

DESeq2 Norm Table:

Interactive and exportable table with the per sample normalized counts. The following information is included:

feature_id: Gene id. Cells in this column contain a link to a per sample gene expression plot. Depending on the approach, plots can be of type box-plot or time-series.
[sample_name]: Normalized counts for the sample. The table contains one column for each sample in the study.
[annotation_name]: In case an annotation step was performed, additional columns may exist.
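
As background for the normalized counts, DESeq2's default normalization divides each sample's raw counts by a size factor estimated with the median-of-ratios method. The sketch below illustrates that idea in plain Python; it assumes the report uses the default method, and the function names are hypothetical rather than part of DESeq2.

```python
import math
from statistics import median


def size_factors(counts):
    """Median-of-ratios size factors.

    counts: dict mapping sample name to a list of raw counts (same gene order).
    Genes with a zero count in any sample are skipped, since their
    geometric mean across samples is zero.
    """
    samples = list(counts)
    n_genes = len(counts[samples[0]])
    log_geo_means = {}
    for g in range(n_genes):
        values = [counts[s][g] for s in samples]
        if all(v > 0 for v in values):
            log_geo_means[g] = sum(math.log(v) for v in values) / len(values)
    factors = {}
    for s in samples:
        log_ratios = [math.log(counts[s][g]) - lgm for g, lgm in log_geo_means.items()]
        factors[s] = math.exp(median(log_ratios))
    return factors


def normalize(counts):
    """Divide each sample's counts by its size factor."""
    factors = size_factors(counts)
    return {s: [c / factors[s] for c in counts[s]] for s in counts}


# Sample s2 was sequenced twice as deeply as s1.
raw = {"s1": [10, 20, 30, 0], "s2": [20, 40, 60, 5]}
print(size_factors(raw))
print(normalize(raw))
```

After normalization, genes that differ only because of sequencing depth end up with matching values across samples.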

GSEA

(For those cases in which gene set enrichment data is provided)

Section containing the Gene Set Enrichment Analysis (GSEA) results for KEGG pathways and GO terms (Molecular Function, Biological Process, Cellular Component). The analysis is performed using fgsea for fast preranked gene set enrichment analysis based on the differential expression results.

Info-Boxes:

KEGG Significantly Enriched Terms: Number of significantly enriched KEGG pathways, according to the adjusted p-value threshold described in the Significance info-box.
GO-MF Significantly Enriched Terms: Number of significantly enriched GO-MF terms, according to the adjusted p-value threshold described in the Significance info-box.
GO-BP Significantly Enriched Terms: Number of significantly enriched GO-BP terms, according to the adjusted p-value threshold described in the Significance info-box.
GO-CC Significantly Enriched Terms: Number of significantly enriched GO-CC terms, according to the adjusted p-value threshold described in the Significance info-box.

Plots:

Bubble chart plots displaying the enrichment score versus the adjusted p-value for each enriched pathway or term. Circle size relates to the number of genes in the study that significantly contribute to the pathway or term enrichment. Circle color classifies the pathway or term as differentially expressed or not. The plots can be further inspected using the explore link.

KEGG Plot: KEGG pathway enrichment bubble chart plot.
GO-MF Plot: GO enrichment bubble chart plot for Molecular Function terms.
GO-BP Plot: GO enrichment bubble chart plot for Biological Process terms.
GO-CC Plot: GO enrichment bubble chart plot for Cellular Component terms.

GSEA Tables:

Interactive and exportable tables with the results from the enrichment analysis. All tables include the following information:

pathway: KEGG pathway or GO term name. Cells in this column include a link to access additional information.
pval: Enrichment p-value.
padj: Enrichment p-value adjusted using the BH multiple testing correction method.
log2err: Expected error for the standard deviation of the p-value logarithm.
ES: Enrichment score.
NES: Normalized enrichment score.
totalGeneCount: Total number of genes in the KEGG pathway or GO term.
diffGeneCount: Number of genes in the study that significantly contribute to the KEGG pathway or GO term enrichment.
diffGeneList: List of genes in the study that significantly contribute to the KEGG pathway or GO term enrichment.
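
The ES column can be understood through the running-sum statistic of preranked GSEA. The sketch below is a simplified illustration and not fgsea's implementation: it computes only the enrichment score for one gene set, assumes the set has at least one gene inside and one outside the ranked list with a nonzero statistic, and leaves out the p-value and NES estimation. The ranking statistic and the function name are assumptions, since the text only says the ranking is based on the differential expression results.

```python
def enrichment_score(ranked_genes, stats, gene_set, weight=1.0):
    """Weighted running-sum enrichment score for a preranked gene list.

    ranked_genes: gene ids sorted by decreasing ranking statistic.
    stats: dict mapping gene id to its ranking statistic (hypothetical choice).
    gene_set: set of gene ids in the pathway or term.
    """
    in_set = [g in gene_set for g in ranked_genes]
    n_miss = in_set.count(False)
    hit_total = sum(abs(stats[g]) ** weight for g, hit in zip(ranked_genes, in_set) if hit)
    running = 0.0
    best = 0.0
    for gene, hit in zip(ranked_genes, in_set):
        if hit:
            running += abs(stats[gene]) ** weight / hit_total  # step up on a set member
        else:
            running -= 1.0 / n_miss  # step down otherwise
        if abs(running) > abs(best):
            best = running  # keep the maximum deviation from zero
    return best


stats = {"a": 2.0, "b": 1.0, "c": -1.0, "d": -2.0}
ranked = sorted(stats, key=stats.get, reverse=True)
print(enrichment_score(ranked, stats, {"a", "b"}))
print(enrichment_score(ranked, stats, {"c", "d"}))
```

A positive score means the set's genes cluster at the top of the ranking, and a negative score means they cluster at the bottom.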

