Lingli He1, Kai Song1, Sitan Qiao1, Yabin Chen3, Jiang Li1, Lin Qi1, and Xin Wang1, 2, 3
1 Department of Surgery, The Chinese University of Hong Kong, Hong Kong SAR, China. 2 Li Ka Shing Institute of Health Sciences, The Chinese University of Hong Kong, Hong Kong SAR, China. 3 Research Institute, The Chinese University of Hong Kong, Shenzhen, China.
This vignette shows how to perform multi-omics colorectal cancer subtyping using sparse multiple canonical correlation analysis (sparse mCCA; Witten and Tibshirani (2009)) and weighted averaging. The multi-omics colorectal cancer classifier was trained on paired mRNA expression, microRNA expression, and DNA methylation data from the TCGA-COAD and TCGA-READ datasets. The package accepts any combination of mRNA expression, microRNA expression, and DNA methylation data as input.
Please run all analyses in this vignette under R version 2.10 or later. Before installing the MSCRCclassifier package, install the R packages caret and naivebayes. Both can be installed directly from CRAN (Comprehensive R Archive Network):
options(repos = c(CRAN = "https://cloud.r-project.org/"))
install.packages(c("caret", "naivebayes", "devtools")) # devtools provides install_github()
library(caret)
library(naivebayes)
library(devtools)
# install the "MSCRCclassifier" package
install_github("CityUHK-CompBio/MSCRCclassifier")
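Before installing, it can help to see which of the required packages are already present. The following is a minimal sketch using base R's `requireNamespace()`; the package list mirrors the packages loaded above, and the variable names `needed`, `installed`, and `missing_pkgs` are illustrative only.

```r
# Packages this vignette loads before MSCRCclassifier itself is installed
needed <- c("caret", "naivebayes", "devtools")

# requireNamespace() returns TRUE/FALSE without attaching the package
installed <- vapply(needed, requireNamespace, logical(1), quietly = TRUE)
missing_pkgs <- needed[!installed]

if (length(missing_pkgs) > 0) {
  message("Not yet installed: ", paste(missing_pkgs, collapse = ", "))
  # install.packages(missing_pkgs)  # uncomment to install from CRAN
} else {
  message("All required packages are available.")
}
```

The install call is left commented out so that the check itself has no side effects.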
The example dataset used in this analysis comes from a microarray experiment on 566 colon cancer patients, available under GEO accession number GSE39582 (Marisa et al. (2013)).
options(knitr.duplicate.label = "allow")
library(MSCRCclassifier)
library(dplyr)
#> Warning: package 'dplyr' was built under R version 4.3.3
#>
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#>
#> filter, lag
#> The following objects are masked from 'package:base':
#>
#> intersect, setdiff, setequal, union
# Load example mRNA expression profile
data("GSE39582_expr")
# Load projection matrices derived from 315 TCGA-COAD and TCGA-READ samples
data("projection_mxs")
dim(projection_mxs$ws[[1]])
#> [1] 951 196
# Project each omics data type into a unified space
mRNAexprCCA <- t(GSE39582_expr) %*% projection_mxs$ws[[1]]
mRNAexprCCA <- scale(mRNAexprCCA)
mRNAexprCCA[1:5, 1:5] # samples in rows and projected components in columns
#>                 [,1]       [,2]        [,3]      [,4]       [,5]
#> GSM971957  0.1101710  0.1097191  0.06029913 -1.238951 -0.1891825
#> GSM971958  0.4386480  0.5585471  1.04738767 -2.080878 -0.7772895
#> GSM971959 -1.1243716 -1.0543246  0.54838954  1.235461  0.3554117
#> GSM971960  1.6357041  1.5142172 -1.42486629  1.064347 -1.5720998
#> GSM971961  0.2828736  0.2701505 -0.28705388 -1.389689  0.3078082
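# Weight applied to the mRNA projection
a1 <- 0.4
# With a single omics layer, rescaling cancels the weight, so data_input equals mRNAexprCCA
data_input <- scale(a1 * mRNAexprCCA)
# classifyMSCRC expects the columns to be named X1, X2, ..., X196
colnames(data_input) <- paste0("X", seq_len(ncol(projection_mxs$ws[[1]])))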
data_input[1:5,1:5]
#>                   X1         X2          X3        X4         X5
#> GSM971957  0.1101710  0.1097191  0.06029913 -1.238951 -0.1891825
#> GSM971958  0.4386480  0.5585471  1.04738767 -2.080878 -0.7772895
#> GSM971959 -1.1243716 -1.0543246  0.54838954  1.235461  0.3554117
#> GSM971960  1.6357041  1.5142172 -1.42486629  1.064347 -1.5720998
#> GSM971961  0.2828736  0.2701505 -0.28705388 -1.389689  0.3078082
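The example above uses mRNA data only. When several omics layers are available, the same project-scale-weight pattern could be applied to each layer before the weighted results are summed. The sketch below illustrates that idea on simulated data; the matrix sizes, the number of components, the layer names, and the weights other than 0.4 are all assumptions made for illustration, and the package's own multi-omics handling may differ.

```r
set.seed(1)

n_samples <- 20  # hypothetical cohort size
n_comp    <- 6   # the package uses 196 components; 6 keeps the example small

# Simulated feature-by-sample matrices for three omics layers (hypothetical sizes)
omics <- list(
  mRNA  = matrix(rnorm(50 * n_samples), nrow = 50),
  miRNA = matrix(rnorm(30 * n_samples), nrow = 30),
  methy = matrix(rnorm(40 * n_samples), nrow = 40)
)

# Simulated projection matrices (features x components), standing in for projection_mxs$ws
ws <- lapply(omics, function(x) matrix(rnorm(nrow(x) * n_comp), nrow = nrow(x)))

# Hypothetical weights that sum to 1; only the value 0.4 appears in the vignette
weights <- c(mRNA = 0.4, miRNA = 0.3, methy = 0.3)

# Project each layer into the shared space and standardise each component
projected <- Map(function(x, w) scale(t(x) %*% w), omics, ws)

# Weighted average of the projected layers, followed by a final rescaling
combined   <- Reduce(`+`, Map(function(z, a) a * z, projected, weights))
data_multi <- scale(combined)

# Name rows and columns in the X1, X2, ... style used above
colnames(data_multi) <- paste0("X", seq_len(n_comp))
rownames(data_multi) <- paste0("sample", seq_len(n_samples))

dim(data_multi)
```

With more than one layer the weights no longer cancel in the final `scale()` call, so their relative sizes determine how much each omics layer contributes.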
The classifyMSCRC function requires an input matrix with samples in rows and multi-omics features in columns. The column names must be X1, X2, …, X196. The code chunk below demonstrates how to classify the primary colorectal cancer example data.
# MSCRC prediction of primary colorectal cancer
result <- classifyMSCRC(data_input)
label <- result$label
prob <- result$prob %>%
  `colnames<-`(paste("MSCRC", 1:5, "_prob", sep = ""))
res <- data.frame(prob, subtype = paste("MSCRC", label, sep = "")) %>%
  `rownames<-`(names(label))
head(res)
#>             MSCRC1_prob   MSCRC2_prob   MSCRC3_prob   MSCRC4_prob   MSCRC5_prob
#> GSM971957  4.142554e-63 6.260621e-168  1.257628e-41  1.000000e+00  2.772609e-40
#> GSM971958 6.262548e-114  0.000000e+00  1.030506e-08  1.000000e+00 5.121095e-143
#> GSM971959 1.766813e-126  1.000000e+00 9.327311e-248  0.000000e+00 5.320991e-140
#> GSM971960 5.292759e-194  1.000000e+00 1.351652e-263 1.658364e-285 2.211272e-128
#> GSM971961  4.432350e-73 1.189128e-284  1.000000e+00  7.527886e-15 1.601268e-263
#> GSM971962  8.771647e-88  0.000000e+00  1.000000e+00 4.865823e-118  0.000000e+00
#>           subtype
#> GSM971957  MSCRC4
#> GSM971958  MSCRC4
#> GSM971959  MSCRC2
#> GSM971960  MSCRC2
#> GSM971961  MSCRC3
#> GSM971962  MSCRC3
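A result table of this shape can be summarised by counting samples per subtype and recording the highest posterior probability for each sample. The sketch below does this on simulated probabilities so that it runs without the package; the object names (`res_sim`, `max_prob`, `confident`) and the 0.5 cutoff are hypothetical and not part of MSCRCclassifier.

```r
set.seed(2)

# Simulated posterior probabilities for 8 samples and 5 subtypes (rows sum to 1)
raw  <- matrix(runif(8 * 5), nrow = 8)
prob <- raw / rowSums(raw)
colnames(prob) <- paste0("MSCRC", 1:5, "_prob")
rownames(prob) <- paste0("sample", 1:8)

# Index of the most probable subtype per sample, and that probability
best     <- max.col(prob, ties.method = "first")
max_prob <- prob[cbind(seq_len(nrow(prob)), best)]

res_sim <- data.frame(prob,
                      subtype  = paste0("MSCRC", best),
                      max_prob = max_prob)

# Number of samples assigned to each subtype
table(res_sim$subtype)

# Flag calls that fall below a hypothetical confidence cutoff
cutoff <- 0.5
res_sim$confident <- res_sim$max_prob >= cutoff
```

In the six samples printed above the winning probability is essentially 1, so a cutoff of this kind would matter only for less clear-cut samples.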
#> R version 4.3.2 (2023-10-31 ucrt)
#> Platform: x86_64-w64-mingw32/x64 (64-bit)
#> Running under: Windows 11 x64 (build 22631)
#>
#> Matrix products: default
#>
#>
#> locale:
#> [1] LC_COLLATE=Chinese (Simplified)_China.utf8
#> [2] LC_CTYPE=Chinese (Simplified)_China.utf8
#> [3] LC_MONETARY=Chinese (Simplified)_China.utf8
#> [4] LC_NUMERIC=C
#> [5] LC_TIME=Chinese (Simplified)_China.utf8
#>
#> time zone: Asia/Hong_Kong
#> tzcode source: internal
#>
#> attached base packages:
#> [1] stats graphics grDevices utils datasets methods base
#>
#> other attached packages:
#> [1] dplyr_1.1.4 MSCRCclassifier_0.1.0 devtools_2.4.5
#> [4] usethis_3.1.0 naivebayes_1.0.0 caret_7.0-1
#> [7] lattice_0.21-9 ggplot2_3.5.1 BiocStyle_2.30.0
#>
#> loaded via a namespace (and not attached):
#> [1] tidyselect_1.2.1 timeDate_4041.110 fastmap_1.2.0
#> [4] promises_1.3.2 pROC_1.18.5 digest_0.6.37
#> [7] rpart_4.1.21 mime_0.12 timechange_0.3.0
#> [10] lifecycle_1.0.4 ellipsis_0.3.2 survival_3.5-7
#> [13] magrittr_2.0.3 compiler_4.3.2 rlang_1.1.4
#> [16] sass_0.4.9 tools_4.3.2 yaml_2.3.10
#> [19] data.table_1.16.2 knitr_1.48 htmlwidgets_1.6.4
#> [22] curl_6.0.1 pkgbuild_1.4.4 plyr_1.8.9
#> [25] pkgload_1.4.0 miniUI_0.1.1.1 withr_3.0.2
#> [28] purrr_1.0.2 nnet_7.3-19 grid_4.3.2
#> [31] stats4_4.3.2 urlchecker_1.0.1 profvis_0.4.0
#> [34] xtable_1.8-4 colorspace_2.1-1 future_1.34.0
#> [37] globals_0.16.3 scales_1.3.0 iterators_1.0.14
#> [40] MASS_7.3-60 cli_3.6.3 rmarkdown_2.28
#> [43] remotes_2.5.0 generics_0.1.3 rstudioapi_0.17.0
#> [46] future.apply_1.11.3 reshape2_1.4.4 sessioninfo_1.2.2
#> [49] cachem_1.1.0 stringr_1.5.1 splines_4.3.2
#> [52] parallel_4.3.2 BiocManager_1.30.25 vctrs_0.6.5
#> [55] hardhat_1.4.0 Matrix_1.6-1.1 jsonlite_1.8.9
#> [58] bookdown_0.41 listenv_0.9.1 foreach_1.5.2
#> [61] gower_1.0.1 jquerylib_0.1.4 recipes_1.1.0
#> [64] glue_1.8.0 parallelly_1.38.0 codetools_0.2-19
#> [67] lubridate_1.9.3 stringi_1.8.4 gtable_0.3.6
#> [70] later_1.4.1 munsell_0.5.1 tibble_3.2.1
#> [73] pillar_1.10.1 htmltools_0.5.8.1 ipred_0.9-15
#> [76] lava_1.8.0 R6_2.5.1 evaluate_1.0.1
#> [79] shiny_1.10.0 memoise_2.0.1 httpuv_1.6.15
#> [82] bslib_0.8.0 class_7.3-22 Rcpp_1.0.13
#> [85] nlme_3.1-163 prodlim_2024.06.25 xfun_0.48
#> [88] fs_1.6.4 ModelMetrics_1.2.2.2 pkgconfig_2.0.3