Cluster correlated features originating from the same metabolite

Clusters features that potentially originate from the same compound. Features with a high Pearson correlation coefficient and a small retention time difference are linked together. Clusters are then formed by requiring every node in a cluster to meet a threshold for relative degree. Each cluster is named after the feature with the highest median peak area (median abundance). This function is a wrapper around several functions that are based on the MATLAB code by David Broadhurst.
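
The linking and thresholding steps described above can be illustrated with a small base R sketch. This is a toy illustration, not the package's implementation: the abundance matrix, the retention times, and all variable names are made up, and "relative degree" is assumed here to mean a node's number of links divided by the number of other features in the candidate cluster.

```r
# Toy illustration only; NOT the implementation used by cluster_features().
# Hypothetical abundances: samples in rows, features in columns.
base <- 1:8
wiggle <- rep(c(0.1, -0.1), 4)
abund <- cbind(
  f1 = base,
  f2 = 2 * base + wiggle,
  f3 = 0.5 * base - wiggle,
  f4 = c(3, 1, 4, 1, 5, 9, 2, 6)
)
# Hypothetical retention times in minutes
rt <- c(f1 = 1.000, f2 = 1.005, f3 = 1.010, f4 = 5.000)

rt_window <- 1 / 60
corr_thresh <- 0.9
d_thresh <- 0.8

# Link two features if they correlate strongly AND elute close together
adj <- cor(abund) >= corr_thresh & abs(outer(rt, rt, "-")) <= rt_window
diag(adj) <- FALSE

# Candidate cluster: f1 together with everything linked to it
members <- c("f1", names(which(adj["f1", ])))

# Assumed definition of relative degree: links within the candidate
# cluster divided by the number of other members
rel_degree <- rowSums(adj[members, members]) / (length(members) - 1)
keep_cluster <- all(rel_degree >= d_thresh)

# Name the cluster after the member with the highest median abundance
medians <- apply(abund[, members], 2, median)
cluster_name <- names(which.max(medians))
```

In this toy data f1, f2 and f3 are all linked to each other, so every member has a relative degree of 1 and the candidate cluster passes the threshold, while f4 elutes far too late to be linked to anything.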

Usage

cluster_features(
  object,
  mz_col = NULL,
  rt_col = NULL,
  all_features = FALSE,
  rt_window = 1/60,
  corr_thresh = 0.9,
  d_thresh = 0.8,
  plotting = TRUE,
  min_size_plotting = 3,
  prefix = NULL,
  assay.type = NULL
)

Arguments

object: a SummarizedExperiment or MetaboSet object
mz_col: the column name in feature data that holds mass-to-charge ratios
rt_col: the column name in feature data that holds retention times
all_features: logical, should all features be included in the clustering? If FALSE (the default), flagged features are excluded from the clustering
rt_window: the retention time window for potential links. NOTE: use the same unit as the retention time
corr_thresh: the correlation threshold required for potential links between features
d_thresh: the threshold for the relative degree required by each node
plotting: should plots be drawn for each cluster?
min_size_plotting: the minimum number of features a cluster needs to have to be plotted
prefix: the prefix for the file names of the saved plots
assay.type: character, assay to be used in case of multiple assays
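
As a note on the unit of rt_window: the default of 1/60 equals one second only if retention times are recorded in minutes. A minimal sketch of the conversion, with hypothetical variable names:

```r
# Desired window width in seconds (hypothetical choice)
window_seconds <- 1

# If the retention time column is in minutes, convert the window to minutes
rt_window_minutes <- window_seconds / 60

# If the retention time column is in seconds, use the value as is
rt_window_seconds <- window_seconds
```

Either value would then be passed as rt_window, depending on the unit of the retention time column.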

Value

a SummarizedExperiment or MetaboSet object, with median peak area (MPA), the cluster ID, the features in the cluster, and cluster size added to feature data.

Examples

data(example_set)
# The parameter values are unusual because the example data is imaginary
clustered <- cluster_features(example_set, rt_window = 1, corr_thresh = 0.5, 
  d_thresh = 0.6)
#> INFO [2025-06-23 22:36:26] Identified m/z column Average_Mz and retention time column Average_Rt_min
#> INFO [2025-06-23 22:36:26] Identified m/z column Average_Mz and retention time column Average_Rt_min
#> INFO [2025-06-23 22:36:26] Identified m/z column Average_Mz and retention time column Average_Rt_min
#> INFO [2025-06-23 22:36:26] 
#> Starting feature clustering at 2025-06-23 22:36:26.484164
#> INFO [2025-06-23 22:36:26] Finding connections between features in HILIC_neg
#> INFO [2025-06-23 22:36:26] Found 1 connections in HILIC_neg
#> INFO [2025-06-23 22:36:26] Finding connections between features in HILIC_pos
#> INFO [2025-06-23 22:36:26] Found 4 connections in HILIC_pos
#> INFO [2025-06-23 22:36:26] Finding connections between features in RP_neg
#> INFO [2025-06-23 22:36:26] Found 1 connections in RP_neg
#> INFO [2025-06-23 22:36:26] Finding connections between features in RP_pos
#> INFO [2025-06-23 22:36:26] Found 2 connections in RP_pos
#> INFO [2025-06-23 22:36:26] Found 8 connections
#> 5 components found
#> 1 components found
#> INFO [2025-06-23 22:36:26] Found 5 clusters of 2 or more features, clustering finished at 2025-06-23 22:36:26.968162
#> INFO [2025-06-23 22:36:27] Saved cluster plots to: