set up

Parent folder

## /Users/lpantano/repos/pilm/templates-rmd-de

Check params

## Loading required package: GenomicRanges
## Loading required package: stats4
## Loading required package: BiocGenerics
## Loading required package: parallel
## 
## Attaching package: 'BiocGenerics'
## The following objects are masked from 'package:parallel':
## 
##     clusterApply, clusterApplyLB, clusterCall, clusterEvalQ,
##     clusterExport, clusterMap, parApply, parCapply, parLapply,
##     parLapplyLB, parRapply, parSapply, parSapplyLB
## The following objects are masked from 'package:stats':
## 
##     IQR, mad, sd, var, xtabs
## The following objects are masked from 'package:base':
## 
##     anyDuplicated, append, as.data.frame, basename, cbind,
##     colnames, dirname, do.call, duplicated, eval, evalq, Filter,
##     Find, get, grep, grepl, intersect, is.unsorted, lapply, Map,
##     mapply, match, mget, order, paste, pmax, pmax.int, pmin,
##     pmin.int, Position, rank, rbind, Reduce, rownames, sapply,
##     setdiff, sort, table, tapply, union, unique, unsplit, which,
##     which.max, which.min
## Loading required package: S4Vectors
## 
## Attaching package: 'S4Vectors'
## The following object is masked from 'package:base':
## 
##     expand.grid
## Loading required package: IRanges
## 
## Attaching package: 'IRanges'
## The following object is masked _by_ '.GlobalEnv':
## 
##     universe
## Loading required package: GenomeInfoDb
## Loading required package: Biobase
## Welcome to Bioconductor
## 
##     Vignettes contain introductory material; view with
##     'browseVignettes()'. To cite Bioconductor, see
##     'citation("Biobase")', and for packages 'citation("pkgname")'.
## Loading required package: DelayedArray
## Loading required package: matrixStats
## 
## Attaching package: 'matrixStats'
## The following objects are masked from 'package:Biobase':
## 
##     anyMissing, rowMedians
## Loading required package: BiocParallel
## 
## Attaching package: 'DelayedArray'
## The following objects are masked from 'package:matrixStats':
## 
##     colMaxs, colMins, colRanges, rowMaxs, rowMins, rowRanges
## The following objects are masked from 'package:base':
## 
##     aperm, apply, rowsum

Load data

## Registered S3 methods overwritten by 'ggplot2':
##   method         from 
##   [.quosures     rlang
##   c.quosures     rlang
##   print.quosures rlang

Quality metrics

Full qualialyt metrics can be found at [multiqc_report.html].

PCA

Principal component analysis is a technique to reduce the dimensionality of the data to allow visualization in two dimensions PCA. It takes all the gene abundances for the samples and creates a series of principal components (PCs) to explain the variability in the data. We normally plot the first two PCs for simplicity.

Results

We performed the analysis using a BH adjusted P value cutoff of 0.05 and a log fold-change (LFC) ratio cutoff of 0.

Alpha level (FDR) cutoffs

Let’s take a look at the number of genes we get with different false discovery rate (FDR) cutoffs. These tests subset P values that have been multiple test corrected using the Benjamini Hochberg (BH) method [@Benjamini:1995ws].

condition_treated_vs_untreated

0.1 0.05 0.01
LFC > 0 (up) LFC > 0 (up) : 613, 5% LFC > 0 (up) : 481, 3.9% LFC > 0 (up) : 328, 2.7%
LFC < 0 (down) LFC < 0 (down) : 717, 5.8% LFC < 0 (down) : 587, 4.7% LFC < 0 (down) : 403, 3.3%
outliers outliers [1] : 0, 0% outliers [1] : 0, 0% outliers [1] : 0, 0%
low counts low counts [2] : 3560, 29% low counts [2] : 3560, 29% low counts [2] : 3560, 29%
cutoff (mean count < 4) (mean count < 4) (mean count < 4)
NULL

type_paired-end_vs_single-read

0.1 0.05 0.01
LFC > 0 (up) LFC > 0 (up) : 621, 5% LFC > 0 (up) : 411, 3.3% LFC > 0 (up) : 192, 1.6%
LFC < 0 (down) LFC < 0 (down) : 657, 5.3% LFC < 0 (down) : 491, 4% LFC < 0 (down) : 284, 2.3%
outliers outliers [1] : 0, 0% outliers [1] : 0, 0% outliers [1] : 0, 0%
low counts low counts [2] : 3797, 31% low counts [2] : 3797, 31% low counts [2] : 3797, 31%
cutoff (mean count < 5) (mean count < 5) (mean count < 5)
NULL

Plots

Mean average (MA)

An MA plot compares transformed counts on M (log ratio) and A (mean average) scales [@Yang:2002ty]. Blue arrows represent a correction of the log2 Fold Change values due to variability. Normally this happens when the gene shows high variation and the log2FC is not accurate, here the model tries to estimate them. See this paper for more information:

Love, M.I., Huber, W., Anders, S. (2014) Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2. Genome Biology, 15:550. https://doi.org/10.1186/s13059-014-0550-8

condition_treated_vs_untreated

type_paired-end_vs_single-read

Volcano

A volcano plot compares significance (BH-adjusted P value) against fold change (log2) [@Cui:2003kh; @Li:2014fv]. Genes in the green box with text labels have an adjusted P value are likely to be the top candidate genes of interest.

condition_treated_vs_untreated

type_paired-end_vs_single-read

PCA

Principal component analysis is a technique to reduce the dimensionality of the data to allow visualization in two dimensions PCA. It takes all the gene abundances for the samples and creates a series of principal components (PCs) to explain the variability in the data. We normally plot the first two PCs for simplicity.

condition_treated_vs_untreated

type_paired-end_vs_single-read

Gene Expression Patterns

In general, it is useful to cluster the significant genes together in similar patterns across samples. degPatterns uses standard expression correlation technique to generate a similarity matrix that can be clustered hierarchically and then split into groups of genes that follow similar expression patterns [degPatterns][].

We defined significance as genes with abs(log2FC) > and FDR < .

Top tables

Top 10 genes are shown order by False Discovery Rate. Genes below 0.05 are considered significant.

condition_treated_vs_untreated

gene baseMean log2FoldChange lfcSE pvalue padj gene_name cluster
FBgn0003360 4343.0 -3.110 0.1087 0 0 sesB 1
FBgn0026562 43909.3 -2.468 0.0874 0 0 SPARC 1
FBgn0039155 730.6 -4.574 0.1661 0 0 Kal1 1
FBgn0025111 1501.4 2.837 0.1040 0 0 Ant2 2
FBgn0029167 3706.1 -2.186 0.0961 0 0 Hml 1
FBgn0035085 638.2 -2.562 0.1290 0 0 CG3770 1
FBgn0039827 261.9 -4.111 0.2161 0 0 CG1544 1
FBgn0034736 225.9 -3.472 0.2049 0 0 gas 1
FBgn0029896 489.9 -2.415 0.1475 0 0 CG3168 1
FBgn0000071 342.2 2.588 0.1583 0 0 Ama 2

type_paired-end_vs_single-read

gene baseMean log2FoldChange lfcSE pvalue padj gene_name cluster
FBgn0003943 6603.89 2.545 0.0989 0 0 Ubi-p63E 2
FBgn0053105 294.59 2.575 0.1783 0 0 p24-2 2
FBgn0029856 1610.61 3.517 0.3350 0 0 CG11700 2
FBgn0083990 62.72 4.585 0.4321 0 0 NA 2
FBgn0086558 2752.52 1.020 0.0990 0 0 Ubi-p5E 2
FBgn0028430 1730.86 -1.138 0.1148 0 0 He 1
FBgn0030755 127.86 2.154 0.2290 0 0 NA 2
FBgn0037913 490.73 -1.591 0.1956 0 0 fabp 2
FBgn0039914 593.37 -1.045 0.1176 0 0 mav 1
FBgn0053909 139.45 10.353 1.1838 0 0 His4r 2

File downloads

The results are saved as gzip-compressed comma separated values (CSV). Gzip compression is natively supported on macOS and Linux-based operating systems. If you’re running Windows, we recommend installing 7-Zip. CSV files can be opened in Excel or RStudio.

Count matrices

Tables are under /Users/lpantano/repos/pilm/templates-rmd-de/docs/de/results/counts folder:

  • normalizedCounts.csv.gz: Use to evaluate individual genes and/or generate plots. These counts are normalized for the variation in sequencing depth across samples.
  • tpm.csv.gz: Transcripts per million, scaled by length and also suitable for plotting.
  • rawCounts.csv.gz: Only use to perform a new differential expression analysis. These counts will vary across samples due to differences in sequencing depth, and have not been normalized. Do not use this file for plotting genes.

Differential expression tables

Tables are under /Users/lpantano/repos/pilm/templates-rmd-de/docs/de/results/differential_expression folder:

DEG tables are sorted by BH-adjusted P value, and contain the following columns:

  • ensgene: Ensembl gene identifier.
  • baseMean: Mean of the normalized counts per gene for all samples.
  • log2FoldChange: log2 fold change.
  • lfcSE: log2 standard error.
  • stat: Wald statistic.
  • pvalue: Walt test P value.
  • padj: BH adjusted Wald test P value (corrected for multiple comparisons; aka FDR).
  • externalGeneName: Ensembl name (a.k.a. symbol).
  • description: Ensembl description.
  • geneBiotype: Ensembl biotype (e.g. protein_coding).

Methods

RNA-seq counts were generated by bcbio using salmon [@salmon]. Counts were imported into R using tximport [@tximport] and DESeq2 [@DESeq2]. Gene annotations were obtained from Ensembl. Plots were generated by ggplot2 [@ggplot2]. Heatmaps were generated by pheatmap [@pheatmap].

R session information

## ─ Session info ──────────────────────────────────────────────────────────
##  setting  value                       
##  version  R version 3.6.0 (2019-04-26)
##  os       macOS Mojave 10.14.3        
##  system   x86_64, darwin15.6.0        
##  ui       RStudio                     
##  language (EN)                        
##  collate  en_US.UTF-8                 
##  ctype    en_US.UTF-8                 
##  tz       America/New_York            
##  date     2019-06-11                  
## 
## ─ Packages ──────────────────────────────────────────────────────────────
##  package              * version   date       lib source        
##  acepack                1.4.1     2016-10-29 [1] CRAN (R 3.6.0)
##  annotate               1.62.0    2019-05-02 [1] Bioconductor  
##  AnnotationDbi          1.46.0    2019-05-02 [1] Bioconductor  
##  assertthat             0.2.1     2019-03-21 [1] CRAN (R 3.6.0)
##  backports              1.1.4     2019-04-10 [1] CRAN (R 3.6.0)
##  base64enc              0.1-3     2015-07-28 [1] CRAN (R 3.6.0)
##  Biobase              * 2.44.0    2019-05-02 [1] Bioconductor  
##  BiocGenerics         * 0.30.0    2019-05-02 [1] Bioconductor  
##  BiocParallel         * 1.18.0    2019-05-02 [1] Bioconductor  
##  bit                    1.1-14    2018-05-29 [1] CRAN (R 3.6.0)
##  bit64                  0.9-7     2017-05-08 [1] CRAN (R 3.6.0)
##  bitops                 1.0-6     2013-08-17 [1] CRAN (R 3.6.0)
##  blob                   1.1.1     2018-03-25 [1] CRAN (R 3.6.0)
##  broom                  0.5.2     2019-04-07 [1] CRAN (R 3.6.0)
##  callr                  3.2.0     2019-03-15 [1] CRAN (R 3.6.0)
##  cellranger             1.1.0     2016-07-27 [1] CRAN (R 3.6.0)
##  checkmate              1.9.3     2019-05-03 [1] CRAN (R 3.6.0)
##  circlize               0.4.6     2019-04-03 [1] CRAN (R 3.6.0)
##  cli                    1.1.0     2019-03-19 [1] CRAN (R 3.6.0)
##  clue                   0.3-57    2019-02-25 [1] CRAN (R 3.6.0)
##  cluster                2.0.9     2019-05-01 [1] CRAN (R 3.6.0)
##  colorspace             1.4-1     2019-03-18 [1] CRAN (R 3.6.0)
##  ComplexHeatmap         2.0.0     2019-05-02 [1] Bioconductor  
##  ConsensusClusterPlus   1.48.0    2019-05-02 [1] Bioconductor  
##  cowplot                0.9.4     2019-01-08 [1] CRAN (R 3.6.0)
##  crayon                 1.3.4     2017-09-16 [1] CRAN (R 3.6.0)
##  data.table             1.12.2    2019-04-07 [1] CRAN (R 3.6.0)
##  DBI                    1.0.0     2018-05-02 [1] CRAN (R 3.6.0)
##  DEGreport            * 1.21.1    2019-06-11 [1] local         
##  DelayedArray         * 0.10.0    2019-05-02 [1] Bioconductor  
##  desc                   1.2.0     2018-05-01 [1] CRAN (R 3.6.0)
##  DESeq2               * 1.24.0    2019-05-02 [1] Bioconductor  
##  devtools               2.0.2     2019-04-08 [1] CRAN (R 3.6.0)
##  digest                 0.6.19    2019-05-20 [1] CRAN (R 3.6.0)
##  dplyr                * 0.8.1     2019-05-14 [1] CRAN (R 3.6.0)
##  edgeR                  3.26.4    2019-05-27 [1] Bioconductor  
##  evaluate               0.14      2019-05-28 [1] CRAN (R 3.6.0)
##  forcats              * 0.4.0     2019-02-17 [1] CRAN (R 3.6.0)
##  foreign                0.8-71    2018-07-20 [1] CRAN (R 3.6.0)
##  formatR                1.6       2019-03-05 [1] CRAN (R 3.6.0)
##  Formula                1.2-3     2018-05-03 [1] CRAN (R 3.6.0)
##  fs                     1.3.1     2019-05-06 [1] CRAN (R 3.6.0)
##  genefilter             1.66.0    2019-05-02 [1] Bioconductor  
##  geneplotter            1.62.0    2019-05-02 [1] Bioconductor  
##  generics               0.0.2     2018-11-29 [1] CRAN (R 3.6.0)
##  GenomeInfoDb         * 1.20.0    2019-05-02 [1] Bioconductor  
##  GenomeInfoDbData       1.2.1     2019-05-06 [1] Bioconductor  
##  GenomicRanges        * 1.36.0    2019-05-02 [1] Bioconductor  
##  GetoptLong             0.1.7     2018-06-10 [1] CRAN (R 3.6.0)
##  ggdendro               0.1-20    2016-04-27 [1] CRAN (R 3.6.0)
##  ggplot2              * 3.1.1     2019-04-07 [1] CRAN (R 3.6.0)
##  ggrepel                0.8.1     2019-05-07 [1] CRAN (R 3.6.0)
##  git2r                  0.25.2    2019-03-19 [1] CRAN (R 3.6.0)
##  GlobalOptions          0.1.0     2018-06-09 [1] CRAN (R 3.6.0)
##  glue                   1.3.1     2019-03-12 [1] CRAN (R 3.6.0)
##  gridExtra              2.3       2017-09-09 [1] CRAN (R 3.6.0)
##  gtable                 0.3.0     2019-03-25 [1] CRAN (R 3.6.0)
##  haven                  2.1.0     2019-02-19 [1] CRAN (R 3.6.0)
##  highr                  0.8       2019-03-20 [1] CRAN (R 3.6.0)
##  Hmisc                  4.2-0     2019-01-26 [1] CRAN (R 3.6.0)
##  hms                    0.4.2     2018-03-10 [1] CRAN (R 3.6.0)
##  htmlTable              1.13.1    2019-01-07 [1] CRAN (R 3.6.0)
##  htmltools              0.3.6     2017-04-28 [1] CRAN (R 3.6.0)
##  htmlwidgets            1.3       2018-09-30 [1] CRAN (R 3.6.0)
##  httr                   1.4.0     2018-12-11 [1] CRAN (R 3.6.0)
##  IRanges              * 2.18.1    2019-05-31 [1] Bioconductor  
##  jsonlite               1.6       2018-12-07 [1] CRAN (R 3.6.0)
##  knitr                * 1.23      2019-05-18 [1] CRAN (R 3.6.0)
##  labeling               0.3       2014-08-23 [1] CRAN (R 3.6.0)
##  lasso2                 1.2-20    2018-11-27 [1] CRAN (R 3.6.0)
##  lattice                0.20-38   2018-11-04 [1] CRAN (R 3.6.0)
##  latticeExtra           0.6-28    2016-02-09 [1] CRAN (R 3.6.0)
##  lazyeval               0.2.2     2019-03-15 [1] CRAN (R 3.6.0)
##  limma                  3.40.2    2019-05-17 [1] Bioconductor  
##  locfit                 1.5-9.1   2013-04-20 [1] CRAN (R 3.6.0)
##  logging                0.9-107   2019-02-10 [1] CRAN (R 3.6.0)
##  lubridate              1.7.4     2018-04-11 [1] CRAN (R 3.6.0)
##  magrittr               1.5       2014-11-22 [1] CRAN (R 3.6.0)
##  MASS                   7.3-51.4  2019-03-31 [1] CRAN (R 3.6.0)
##  Matrix                 1.2-17    2019-03-22 [1] CRAN (R 3.6.0)
##  matrixStats          * 0.54.0    2018-07-23 [1] CRAN (R 3.6.0)
##  memoise                1.1.0     2017-04-21 [1] CRAN (R 3.6.0)
##  mnormt                 1.5-5     2016-10-15 [1] CRAN (R 3.6.0)
##  modelr                 0.1.4     2019-02-18 [1] CRAN (R 3.6.0)
##  munsell                0.5.0     2018-06-12 [1] CRAN (R 3.6.0)
##  nlme                   3.1-140   2019-05-12 [1] CRAN (R 3.6.0)
##  nnet                   7.3-12    2016-02-02 [1] CRAN (R 3.6.0)
##  Nozzle.R1              1.1-1     2013-05-15 [1] CRAN (R 3.6.0)
##  pheatmap             * 1.0.12    2019-01-04 [1] CRAN (R 3.6.0)
##  pillar                 1.4.1     2019-05-28 [1] CRAN (R 3.6.0)
##  pkgbuild               1.0.3     2019-03-20 [1] CRAN (R 3.6.0)
##  pkgconfig              2.0.2     2018-08-16 [1] CRAN (R 3.6.0)
##  pkgload                1.0.2     2018-10-29 [1] CRAN (R 3.6.0)
##  plyr                   1.8.4     2016-06-08 [1] CRAN (R 3.6.0)
##  png                    0.1-7     2013-12-03 [1] CRAN (R 3.6.0)
##  prettyunits            1.0.2     2015-07-13 [1] CRAN (R 3.6.0)
##  processx               3.3.1     2019-05-08 [1] CRAN (R 3.6.0)
##  ps                     1.3.0     2018-12-21 [1] CRAN (R 3.6.0)
##  psych                  1.8.12    2019-01-12 [1] CRAN (R 3.6.0)
##  purrr                * 0.3.2     2019-03-15 [1] CRAN (R 3.6.0)
##  R6                     2.4.0     2019-02-14 [1] CRAN (R 3.6.0)
##  RColorBrewer           1.1-2     2014-12-07 [1] CRAN (R 3.6.0)
##  Rcpp                   1.0.1     2019-03-17 [1] CRAN (R 3.6.0)
##  RCurl                  1.95-4.12 2019-03-04 [1] CRAN (R 3.6.0)
##  readr                * 1.3.1     2018-12-21 [1] CRAN (R 3.6.0)
##  readxl                 1.3.1     2019-03-13 [1] CRAN (R 3.6.0)
##  remotes                2.0.4     2019-04-10 [1] CRAN (R 3.6.0)
##  reshape                0.8.8     2018-10-23 [1] CRAN (R 3.6.0)
##  rjson                  0.2.20    2018-06-08 [1] CRAN (R 3.6.0)
##  rlang                  0.3.4     2019-04-07 [1] CRAN (R 3.6.0)
##  rmarkdown              1.13      2019-05-22 [1] CRAN (R 3.6.0)
##  rmdCore                0.1.0     2019-05-31 [1] local         
##  rpart                  4.1-15    2019-04-12 [1] CRAN (R 3.6.0)
##  rprojroot              1.3-2     2018-01-03 [1] CRAN (R 3.6.0)
##  RSQLite                2.1.1     2018-05-06 [1] CRAN (R 3.6.0)
##  rstudioapi             0.10      2019-03-19 [1] CRAN (R 3.6.0)
##  rvest                  0.3.4     2019-05-15 [1] CRAN (R 3.6.0)
##  S4Vectors            * 0.22.0    2019-05-02 [1] Bioconductor  
##  scales                 1.0.0     2018-08-09 [1] CRAN (R 3.6.0)
##  sessioninfo            1.1.1     2018-11-05 [1] CRAN (R 3.6.0)
##  shape                  1.4.4     2018-02-07 [1] CRAN (R 3.6.0)
##  stringi                1.4.3     2019-03-12 [1] CRAN (R 3.6.0)
##  stringr              * 1.4.0     2019-02-10 [1] CRAN (R 3.6.0)
##  SummarizedExperiment * 1.14.0    2019-05-02 [1] Bioconductor  
##  survival               2.44-1.1  2019-04-01 [1] CRAN (R 3.6.0)
##  testthat               2.1.1     2019-04-23 [1] CRAN (R 3.6.0)
##  tibble               * 2.1.2     2019-05-29 [1] CRAN (R 3.6.0)
##  tidyr                * 0.8.3     2019-03-01 [1] CRAN (R 3.6.0)
##  tidyselect             0.2.5     2018-10-11 [1] CRAN (R 3.6.0)
##  tidyverse            * 1.2.1     2017-11-14 [1] CRAN (R 3.6.0)
##  usethis                1.5.0     2019-04-07 [1] CRAN (R 3.6.0)
##  withr                  2.1.2     2018-03-15 [1] CRAN (R 3.6.0)
##  xfun                   0.7       2019-05-14 [1] CRAN (R 3.6.0)
##  XML                    3.98-1.19 2019-03-06 [1] CRAN (R 3.6.0)
##  xml2                   1.2.0     2018-01-24 [1] CRAN (R 3.6.0)
##  xtable                 1.8-4     2019-04-21 [1] CRAN (R 3.6.0)
##  XVector                0.24.0    2019-05-02 [1] Bioconductor  
##  yaml                   2.2.0     2018-07-25 [1] CRAN (R 3.6.0)
##  zlibbioc               1.30.0    2019-05-02 [1] Bioconductor  
## 
## [1] /Library/Frameworks/R.framework/Versions/3.6/Resources/library
## $se_file
## [1] "data/se.rds"
## 
## $design
## [1] "~ type + condition"
## 
## $contrast
## [1] "condition.treated_vs_untreated,type.paired-end_vs_single-read"
## 
## $metadata
## [1] "condition" "type"     
## 
## $alpha
## [1] 0.05
## 
## $lfc
## [1] 0
## 
## $slot
## [1] "vst"
## 
## $cache_dir
## [1] "../cache"
## 
## $cache_on
## [1] FALSE
## 
## $output_dir
## [1] "/Users/lpantano/repos/pilm/templates-rmd-de/docs/de"