Lingli He1 and Xin Wang1
1 Department of Surgery, The Chinese University of Hong Kong, Hong Kong SAR, China.
This vignette shows how to subtype high-grade serous ovarian cancer from multi-omics data using sparse mCCA (Witten and Tibshirani, 2009) and weighted averaging. The multi-omics classifier was trained on paired mRNA expression, microRNA expression, DNA methylation, copy number variation, and mutation data from the TCGA-OV dataset. The package accepts any combination of these five data types as input.
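The projection and fusion steps that this vignette carries out in code further down can be summarized in two formulas. The notation is introduced here for illustration only: $X_k$ is the feature-by-sample matrix of omics layer $k$ after each feature has been standardized across samples, $W_k$ is the matching projection matrix stored in `TCGA_projection_mx$ws`, $\operatorname{scale}(\cdot)$ is column-wise centering and scaling, and $a_k$ are the fusion weights, all set to 0.2 in the example.

```latex
\begin{aligned}
Z_k &= \operatorname{scale}\!\left(X_k^{\top} W_k\right), \qquad k = 1, \dots, 5, \\
D   &= \sum_{k=1}^{5} a_k Z_k, \qquad a_1 = \dots = a_5 = 0.2 .
\end{aligned}
```

Here $D$ corresponds to the matrix `data_input` that is passed to the classifier, with samples in rows.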
Please run all analyses in this vignette under R version 2.10 or later. Before installing the MSOCclassifier package, install the R package caret, which is available from CRAN (Comprehensive R Archive Network). MSOCclassifier itself is then installed from GitHub using devtools:
options(repos = c(CRAN = "https://cloud.r-project.org/"))
install.packages("caret")
library(caret)
library(devtools)
# install the "MSOCclassifier" package
install_github("Carpentierbio/MSOCclassifier")
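The installation snippet above loads devtools but only installs caret, so it can be useful to check which prerequisites are missing first. The following is a minimal sketch. `missing_packages` is a hypothetical helper written for this illustration and is not part of MSOCclassifier. It only reports what is missing and installs nothing.

```r
# Hypothetical helper: return the packages in `pkgs` that are not installed.
missing_packages <- function(pkgs) {
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!installed]
}

# Check the two prerequisites used in this vignette
to_install <- missing_packages(c("caret", "devtools"))
if (length(to_install) > 0) {
  message("Not yet installed: ", paste(to_install, collapse = ", "))
} else {
  message("caret and devtools are both available.")
}
```

Any packages it reports can then be installed with `install.packages()` before calling `install_github()`.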
The example dataset used in this analysis comes from the ICGC-OV cohort of 79 ovarian cancer patients and was downloaded from https://dcc.icgc.org/projects/OV-AU (this link may no longer be accessible).
options(knitr.duplicate.label = "allow")
library(MSOCclassifier)
library(dplyr)
#> Warning: package 'dplyr' was built under R version 4.3.3
#>
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#>
#> filter, lag
#> The following objects are masked from 'package:base':
#>
#> intersect, setdiff, setequal, union
# Load example multi-omics expression profile
data("rna_log_tpm_ICGC")
data("mir_ICGC")
data("methy_M_ICGC")
data("cnv_ICGC")
data("mut_ICGC")
# Load projection matrices derived from 226 TCGA-OV samples
data("TCGA_projection_mx")
# Load pre-processed TCGA-OV multi-omics data for feature selection in validation cohort
data("TCGAmRNAscaled")
data("TCGAmiRNAscaled")
data("TCGAmetscaled")
data("TCGAcnvscaled")
data("TCGAmutscaled")
# ensure that the number and order of features in the test data are identical to those in the training data (TCGA)
geneexp_bf_mapping = rna_log_tpm_ICGC[colnames(TCGAmRNAscaled), ]
geneexp_bf_mapping = t(scale(t(geneexp_bf_mapping)))
mirexp_bf_mapping = mir_ICGC[colnames(TCGAmiRNAscaled), ]
mirexp_bf_mapping = t(scale(t(mirexp_bf_mapping)))
methy_bf_mapping = methy_M_ICGC[colnames(TCGAmetscaled), ]
methy_bf_mapping = t(scale(t(methy_bf_mapping)))
cnv_bf_mapping = cnv_ICGC[colnames(TCGAcnvscaled), ]
cnv_bf_mapping = t(scale(t(cnv_bf_mapping)))
mut_bf_mapping = mut_ICGC[colnames(TCGAmutscaled), ]
mut_bf_mapping = t(scale(t(mut_bf_mapping)))
# features that are constant across samples become NaN after scaling; set them to 0
mut_bf_mapping[is.na(mut_bf_mapping)] = 0
# Project each omics layer into a unified space
mRNAexprCCA = t(geneexp_bf_mapping) %*% TCGA_projection_mx$ws[[1]]
mRNAexprCCA_2 = scale(mRNAexprCCA)
miRNAexprCCA = t(mirexp_bf_mapping) %*% TCGA_projection_mx$ws[[2]]
miRNAexprCCA_2 = scale(miRNAexprCCA)
methyexprCCA = t(methy_bf_mapping) %*% TCGA_projection_mx$ws[[3]]
methyexprCCA_2 = scale(methyexprCCA)
cnvexprCCA = t(cnv_bf_mapping) %*% TCGA_projection_mx$ws[[4]]
cnvexprCCA_2 = scale(cnvexprCCA)
mutexprCCA = t(mut_bf_mapping) %*% TCGA_projection_mx$ws[[5]]
mutexprCCA_2 = scale(mutexprCCA) # samples in rows and projected components in columns
# Multi-omics data fusion
a1 = a2 = a3 = a4 = a5 = 0.2
data_input = a1*mRNAexprCCA_2 + a2*miRNAexprCCA_2 + a3*methyexprCCA_2 + a4*cnvexprCCA_2 + a5*mutexprCCA_2
colnames(data_input) = paste("X",1:ncol(data_input),sep = "")
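Before the fused matrix is passed to the classifier, its layout can be checked. The sketch below assumes the layout this vignette describes for `classifyMSOC` input: samples in rows and 100 columns named X1 to X100. `check_msoc_input` is a hypothetical helper, not a function of MSOCclassifier, and the missing-value check is a precaution added here rather than a documented requirement.

```r
# Hypothetical helper: stop with an informative message if `x` does not match
# the input layout described in this vignette (samples x X1..X100).
check_msoc_input <- function(x, n_features = 100) {
  expected <- paste0("X", seq_len(n_features))
  if (!is.matrix(x) || !is.numeric(x)) {
    stop("input must be a numeric matrix with samples in rows")
  }
  if (!identical(colnames(x), expected)) {
    stop("columns must be named X1, ..., X", n_features, " in that order")
  }
  if (anyNA(x)) {
    # Assumption: missing values are treated as an error here
    stop("input contains missing values")
  }
  invisible(TRUE)
}

# Toy matrix with random values standing in for fused multi-omics features
set.seed(42)
toy <- matrix(rnorm(3 * 100), nrow = 3,
              dimnames = list(paste0("sample", 1:3), paste0("X", 1:100)))
check_msoc_input(toy)
```

With the real data, the call would be `check_msoc_input(data_input)`. It returns silently when the layout matches and raises an error otherwise.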
The classifyMSOC function requires an expression matrix with samples in rows and multi-omics features in columns. The column names of the expression profile should be X1, X2, …, X100. The code chunk below demonstrates how to perform classification using primary high-grade serous ovarian cancer example data.
# MSOC prediction of primary high-grade serous ovarian cancer
result <- classifyMSOC(data_input)
label <- result$label
prob <- result$prob %>%
  `colnames<-`(paste("MSOC", 1:5, "_prob", sep = ""))
res <- data.frame(prob, subtype = paste("MSOC", label, sep = "")) %>%
  `rownames<-`(names(label))
head(res)
#>         MSOC1_prob MSOC2_prob MSOC3_prob MSOC4_prob MSOC5_prob      subtype
#> DO46325 0.11748928 0.21354437 0.05026826 0.21696765 0.40173043 MSOCCluster5
#> DO46326 0.06550225 0.07277660 0.07769433 0.48309416 0.30093266 MSOCCluster4
#> DO46327 0.21244671 0.24825725 0.08599393 0.05895582 0.39434629 MSOCCluster5
#> DO46328 0.59245296 0.03296241 0.26581691 0.08998380 0.01878392 MSOCCluster1
#> DO46329 0.12995571 0.36259823 0.06636200 0.17749717 0.26358689 MSOCCluster2
#> DO46330 0.28355179 0.06877282 0.58781733 0.01797763 0.04188043 MSOCCluster3
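One way to read this output is to compare each reported subtype with the column of highest probability. The sketch below re-enters the six rows printed above and performs that comparison. Whether `classifyMSOC` always assigns the label this way is an assumption; the code only checks the rows shown here.

```r
# Probabilities copied from the head(res) output printed in this vignette
prob <- matrix(
  c(0.11748928, 0.21354437, 0.05026826, 0.21696765, 0.40173043,
    0.06550225, 0.07277660, 0.07769433, 0.48309416, 0.30093266,
    0.21244671, 0.24825725, 0.08599393, 0.05895582, 0.39434629,
    0.59245296, 0.03296241, 0.26581691, 0.08998380, 0.01878392,
    0.12995571, 0.36259823, 0.06636200, 0.17749717, 0.26358689,
    0.28355179, 0.06877282, 0.58781733, 0.01797763, 0.04188043),
  nrow = 6, byrow = TRUE,
  dimnames = list(paste0("DO", 46325:46330), paste0("MSOC", 1:5, "_prob"))
)
# Subtype labels copied from the same output
subtype <- c("MSOCCluster5", "MSOCCluster4", "MSOCCluster5",
             "MSOCCluster1", "MSOCCluster2", "MSOCCluster3")

# Index and value of the highest probability in each row
top <- max.col(prob, ties.method = "first")
top_prob <- prob[cbind(seq_len(nrow(prob)), top)]

# Compare the arg-max column with the printed label
summary_tbl <- data.frame(
  subtype = subtype,
  top_column = paste0("MSOCCluster", top),
  top_prob = top_prob,
  agrees = paste0("MSOCCluster", top) == subtype,
  row.names = rownames(prob)
)
print(summary_tbl)
```

The `top_prob` column gives a rough sense of how decisive each call is. For example, a sample whose highest probability is near 0.4 is a less clear-cut assignment than one near 0.6.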
#> R version 4.3.2 (2023-10-31 ucrt)
#> Platform: x86_64-w64-mingw32/x64 (64-bit)
#> Running under: Windows 11 x64 (build 22631)
#>
#> Matrix products: default
#>
#>
#> locale:
#> [1] LC_COLLATE=Chinese (Simplified)_China.utf8
#> [2] LC_CTYPE=Chinese (Simplified)_China.utf8
#> [3] LC_MONETARY=Chinese (Simplified)_China.utf8
#> [4] LC_NUMERIC=C
#> [5] LC_TIME=Chinese (Simplified)_China.utf8
#>
#> time zone: Asia/Hong_Kong
#> tzcode source: internal
#>
#> attached base packages:
#> [1] stats graphics grDevices utils datasets methods base
#>
#> other attached packages:
#> [1] dplyr_1.1.4 MSOCclassifier_0.1.0 devtools_2.4.5
#> [4] usethis_3.1.0 caret_6.0-94 lattice_0.21-9
#> [7] ggplot2_3.5.1 BiocStyle_2.30.0
#>
#> loaded via a namespace (and not attached):
#> [1] pROC_1.18.5 remotes_2.5.0 rlang_1.1.4
#> [4] magrittr_2.0.3 e1071_1.7-16 compiler_4.3.2
#> [7] callr_3.7.6 vctrs_0.6.5 reshape2_1.4.4
#> [10] stringr_1.5.1 profvis_0.4.0 pkgconfig_2.0.3
#> [13] fastmap_1.2.0 ellipsis_0.3.2 utf8_1.2.4
#> [16] promises_1.3.2 rmarkdown_2.28 prodlim_2024.06.25
#> [19] sessioninfo_1.2.2 ps_1.8.0 purrr_1.0.2
#> [22] xfun_0.48 cachem_1.1.0 jsonlite_1.8.9
#> [25] recipes_1.1.0 later_1.4.1 parallel_4.3.2
#> [28] R6_2.5.1 bslib_0.8.0 stringi_1.8.4
#> [31] parallelly_1.38.0 pkgload_1.4.0 rpart_4.1.21
#> [34] lubridate_1.9.3 jquerylib_0.1.4 Rcpp_1.0.13
#> [37] bookdown_0.41 iterators_1.0.14 knitr_1.48
#> [40] future.apply_1.11.3 httpuv_1.6.15 Matrix_1.6-1.1
#> [43] splines_4.3.2 nnet_7.3-19 timechange_0.3.0
#> [46] tidyselect_1.2.1 rstudioapi_0.17.0 yaml_2.3.10
#> [49] timeDate_4041.110 codetools_0.2-19 miniUI_0.1.1.1
#> [52] curl_6.0.1 processx_3.8.4 listenv_0.9.1
#> [55] pkgbuild_1.4.4 tibble_3.2.1 plyr_1.8.9
#> [58] shiny_1.10.0 withr_3.0.2 evaluate_1.0.1
#> [61] future_1.34.0 desc_1.4.3 survival_3.5-7
#> [64] proxy_0.4-27 urlchecker_1.0.1 pillar_1.9.0
#> [67] BiocManager_1.30.25 foreach_1.5.2 stats4_4.3.2
#> [70] generics_0.1.3 munsell_0.5.1 scales_1.3.0
#> [73] globals_0.16.3 xtable_1.8-4 class_7.3-22
#> [76] glue_1.7.0 tools_4.3.2 data.table_1.16.2
#> [79] ModelMetrics_1.2.2.2 gower_1.0.1 fs_1.6.4
#> [82] grid_4.3.2 ipred_0.9-15 colorspace_2.1-1
#> [85] nlme_3.1-163 cli_3.6.3 fansi_1.0.6
#> [88] lava_1.8.0 gtable_0.3.6 sass_0.4.9
#> [91] digest_0.6.37 htmlwidgets_1.6.4 memoise_2.0.1
#> [94] htmltools_0.5.8.1 lifecycle_1.0.4 hardhat_1.4.0
#> [97] mime_0.12 MASS_7.3-60