General introduction

Motivation

If metabolites are viewed as a continuation of the central dogma of biology, metabolomics provides the closest link to many phenotypes of interest. This makes metabolomics a promising approach for teasing apart the complexities of living systems, and it is attracting many new practitioners.

The notame R package was developed in parallel with an associated protocol article as a general guideline for data analysis in untargeted metabolomics studies (Klåvus et al. 2020). The main outcome is a set of interesting features for the laborious downstream steps relating to biological context, such as metabolite identification and pathway analysis, which fall outside the purview of notame. Bioconductor packages with complementary functionality include pmp, phenomis and qmtools; notame brings both partially overlapping and new functionality to the table. There are also Bioconductor packages for preprocessing, metabolite identification and pathway analysis. Together, notame, Bioconductor’s dependency management and other Bioconductor functionality support high-quality, reproducible metabolomics research.

Installation

To install notame, first install BiocManager if it is not already installed, then use its install() function:

if (!requireNamespace("BiocManager", quietly = TRUE))
    install.packages("BiocManager")
BiocManager::install("notame")
library(notame)

How it works

SummarizedExperiment is the primary data structure of this package; MetaboSet is still supported for existing users who prefer it. As with MetaboSet, a single peak table can be used throughout the analysis. With SummarizedExperiment, multiple peak tables can also be kept in the same object and selected using the assay.type and name arguments.
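As a minimal sketch of keeping several peak tables in one object, the following uses only the SummarizedExperiment package itself. The toy abundances, the assay names "abundances" and "log2" and the metadata columns are illustrative assumptions, not notame defaults. No notame function is called here.

```r
library(SummarizedExperiment)

# Toy peak table: 3 features (rows) x 4 samples (columns)
set.seed(1)
abundances <- matrix(
    rlnorm(12, meanlog = 10),
    nrow = 3,
    dimnames = list(paste0("Feature_", 1:3), paste0("Sample_", 1:4))
)

se <- SummarizedExperiment(
    assays = list(abundances = abundances),
    rowData = S4Vectors::DataFrame(
        Feature_ID = rownames(abundances),
        row.names = rownames(abundances)
    ),
    colData = S4Vectors::DataFrame(
        Injection_order = 1:4,
        row.names = colnames(abundances)
    )
)

# Store a derived peak table next to the original one (assumed assay name)
assay(se, "log2") <- log2(assay(se, "abundances"))

# Both peak tables now live in the same object
assayNames(se)
```

Keeping the original and derived tables side by side makes it possible to go back to an earlier stage of processing without carrying around several objects.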

The functionality of notame can be broadly divided into tabular data preprocessing and feature selection, excluding sample preprocessing and functionality related to biological context (Figure 1). Tabular data processing involves reducing unwanted variation and data preparation dependent on downstream methods. The many visualizations used for inspecting the process also serve as exploratory data analysis. Feature selection aims to select a subset of interesting features across study groups before laborious steps relating to biological context. Please see the documentation for an overview of functionality (?notame), the Project Example vignette for usage and the associated protocol article for more information (Klåvus et al. 2020).

Figure 1: Overview of untargeted LC-MS metabolomics data analysis.

Input

Data can be read with read_from_excel(), which includes checks and preparation of metadata. Typical output from peak-picking software, such as Agilent’s MassHunter or MS-DIAL, is first arranged into a spreadsheet with the structure that read_from_excel() expects. Alternatively, data already in R can be wrangled and passed to the construct_metabosets() or SummarizedExperiment() constructor.

Structure of spreadsheet for read_from_excel().

There are a few obligatory fields for read_from_excel(): “Injection_order” in sample data, “Mass” or “Average mz” in feature data, and “Retention time”, “RetentionTime”, “Average rt(min)” or “rt” in feature data (not case sensitive). There are further optional fields, including “Sample_ID” and “QC” in sample data as well as “Feature_ID” in feature data, which are automatically generated if unavailable. One or more fields in feature data can be used to split the data into parts, usually by LC column × ionization mode; the names of these fields are supplied to the split_by parameter. If the file only contains one mode, specify the name of the mode, e.g. “HILIC_pos”, with the name parameter.
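Before building the spreadsheet, it can help to check that the obligatory fields are present. The helper below is a hypothetical sketch in base R and is not part of notame. It only mirrors the case-insensitive field names listed above and does not reproduce the checks that read_from_excel() itself performs.

```r
# Hypothetical helper, not part of notame: checks that the obligatory
# column names are present, ignoring case
check_fields <- function(sample_cols, feature_cols) {
    has_any <- function(cols, candidates) {
        any(tolower(candidates) %in% tolower(cols))
    }
    c(
        injection_order = has_any(sample_cols, "Injection_order"),
        mass = has_any(feature_cols, c("Mass", "Average mz")),
        retention_time = has_any(
            feature_cols,
            c("Retention time", "RetentionTime", "Average rt(min)", "rt")
        )
    )
}

# Example column names as they might come from a peak-picking export
checks <- check_fields(
    sample_cols = c("Sample_ID", "injection_order", "QC"),
    feature_cols = c("Feature_ID", "Average Mz", "RT")
)
checks
```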

Tabular data preprocessing

The main functions return modified objects and are largely based on pooled QC samples (Broadhurst et al. 2018). Tabular data preprocessing is generally performed separately for each mode. The visualizations used to monitor tabular data preprocessing are saved to file by default, but can also be returned as ggplot objects. The visualizations() wrapper can be used for saving visualizations at different stages of processing.

Feature selection

Univariate statistics functions return a data.frame, which is manually filtered before being added to the feature data of the object. Supervised learning functions return various data structures.
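The filter-then-join step can be sketched in base R. The results table here is made up for illustration, and its column names (estimate, p_value, p_fdr) are assumptions rather than the output format of any notame function.

```r
# Toy feature data and toy univariate results; all column names and
# values are illustrative assumptions
feature_data <- data.frame(
    Feature_ID = paste0("Feature_", 1:5),
    Mass = c(180.06, 132.05, 255.23, 146.07, 89.05)
)
results <- data.frame(
    Feature_ID = paste0("Feature_", 1:5),
    estimate = c(1.2, -0.1, 0.8, 0.05, -1.5),
    p_value = c(0.001, 0.70, 0.004, 0.45, 0.0002)
)

# Adjust for multiple testing and keep only the columns of interest
results$p_fdr <- p.adjust(results$p_value, method = "BH")
keep_cols <- c("Feature_ID", "estimate", "p_fdr")

# Left join, so that every feature is retained in the feature data
annotated <- merge(
    feature_data, results[, keep_cols],
    by = "Feature_ID", all.x = TRUE
)

# Features passing an assumed FDR threshold of 0.05
interesting <- annotated$Feature_ID[annotated$p_fdr < 0.05]
interesting
```

With a SummarizedExperiment object, the joined table would take the place of the existing feature data, provided the row order matches the assay rows.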

Comprehensive results visualizations are returned as ggplot objects and can be saved to file using save_plot(). Interesting features can be inspected with feature-wise visualizations which are saved to file by default but can be returned as a list.

Utilities

General utilities include combined_data() for representing the instance in a data.frame suitable for plotting and various functions for data wrangling. For keeping track of the analysis, notame offers a logging system operated using init_log(), log_text() and finish_log(). notame also keeps track of all the external packages used, offering you references for each. To see and log a list of references, use citations().
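A logging session might look like the sketch below. It assumes that init_log() takes the path of the log file as its log_file argument and that log_text() takes the message as a single string. The file location and the message are placeholders.

```r
library(notame)

# Write the log to a temporary file (placeholder location)
log_file <- file.path(tempdir(), "notame_log.txt")

init_log(log_file = log_file)       # start logging to the file
log_text("Drift correction done")   # add a custom entry
finish_log()                        # close the log
```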

Parallelization is used in many feature-wise calculations and is provided by the BiocParallel package. BiocParallel defaults to a parallel backend. For small-scale testing on Windows, it can be quicker to use serial execution:

BiocParallel::register(BiocParallel::SerialParam())
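To confirm which backend is active after registration, the registered default can be inspected with bpparam(). The short sketch below uses only BiocParallel functions; the squaring function is a stand-in for a feature-wise calculation.

```r
library(BiocParallel)

# Register serial execution as the default backend
register(SerialParam())

# The most recently registered backend is returned by bpparam()
class(bpparam())

# bplapply() now runs sequentially using the registered default
res <- bplapply(1:3, function(i) i^2)
unlist(res)
```

A parallel backend can be registered again later in the same way, which keeps the rest of the analysis code unchanged.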

Authors & Acknowledgements

The first version of notame was written by Anton Klåvus for his master’s thesis in Bioinformatics at Aalto University (published under his former name, Anton Mattsson), while working for the University of Eastern Finland and Afekta Technologies. The package is inspired by analysis scripts written by Jussi Paananen and Oskari Timonen. The algorithm for clustering molecular features originating from the same compound is based on MATLAB code written by David Broadhurst, Professor of Data Science & Biostatistics in the School of Science and director of the Centre for Integrative Metabolomics & Computational Biology at Edith Cowan University.

If you find any bugs or other things to fix, please submit an issue on GitHub! All contributions to the package are always welcome!

Session information

## R version 4.5.1 (2025-06-13)
## Platform: x86_64-pc-linux-gnu
## Running under: Ubuntu 24.04.2 LTS
## 
## Matrix products: default
## BLAS:   /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3 
## LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.26.so;  LAPACK version 3.12.0
## 
## locale:
##  [1] LC_CTYPE=C.UTF-8       LC_NUMERIC=C           LC_TIME=C.UTF-8       
##  [4] LC_COLLATE=C.UTF-8     LC_MONETARY=C.UTF-8    LC_MESSAGES=C.UTF-8   
##  [7] LC_PAPER=C.UTF-8       LC_NAME=C              LC_ADDRESS=C          
## [10] LC_TELEPHONE=C         LC_MEASUREMENT=C.UTF-8 LC_IDENTIFICATION=C   
## 
## time zone: UTC
## tzcode source: system (glibc)
## 
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base     
## 
## other attached packages:
## [1] BiocStyle_2.36.0
## 
## loaded via a namespace (and not attached):
##  [1] digest_0.6.37       desc_1.4.3          R6_2.6.1           
##  [4] bookdown_0.43       fastmap_1.2.0       xfun_0.52          
##  [7] cachem_1.1.0        knitr_1.50          htmltools_0.5.8.1  
## [10] png_0.1-8           rmarkdown_2.29      lifecycle_1.0.4    
## [13] cli_3.6.5           sass_0.4.10         pkgdown_2.1.3      
## [16] textshaping_1.0.1   jquerylib_0.1.4     systemfonts_1.2.3  
## [19] compiler_4.5.1      tools_4.5.1         ragg_1.4.0         
## [22] evaluate_1.0.4      bslib_0.9.0         yaml_2.3.10        
## [25] BiocManager_1.30.26 jsonlite_2.0.0      rlang_1.1.6        
## [28] fs_1.6.6            htmlwidgets_1.6.4

References

Broadhurst, David, Royston Goodacre, Stacey N Reinke, Julia Kuligowski, Ian D Wilson, Matthew R Lewis, and Warwick B Dunn. 2018. “Guidelines and Considerations for the Use of System Suitability and Quality Control Samples in Mass Spectrometry Assays Applied in Untargeted Clinical Metabolomic Studies.” Metabolomics 14: 1–17.

Klåvus, Anton, Marietta Kokla, Stefania Noerman, Ville M Koistinen, Marjo Tuomainen, Iman Zarei, Topi Meuronen, et al. 2020. “‘Notame’: Workflow for Non-Targeted LC–MS Metabolic Profiling.” Metabolites 10 (4): 135.

Anton Klåvus, Vilhelm Suksi

2025-06-23