Clusters features that potentially originate from the same compound. Features with a high Pearson correlation coefficient and a small retention time difference are linked together. Clusters are then formed by requiring every node in a cluster to reach a threshold relative degree. Each cluster is named after the feature with the highest median peak area (median abundance). This function is a wrapper around several functions based on MATLAB code by David Broadhurst.
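
The linking step described above can be illustrated with a minimal base R sketch. It is not the package's implementation: the helper name `find_links`, the toy data, and the use of `>=` and `<=` at the thresholds are assumptions made for illustration only.

```r
# Toy abundance matrix: 6 samples (rows) x 4 features (columns), hypothetical values
base <- c(10, 12, 9, 15, 11, 14)
abund <- cbind(
  f1 = base,
  f2 = 2 * base + c(0.1, -0.1, 0.05, -0.05, 0.1, -0.1), # tracks f1 closely
  f3 = c(5, 3, 8, 2, 7, 4),                             # unrelated to f1
  f4 = base + c(0.05, -0.05, 0.1, -0.1, 0.05, -0.05)    # tracks f1, elutes much later
)
# Retention times in minutes (hypothetical)
rt <- c(f1 = 1.00, f2 = 1.01, f3 = 1.005, f4 = 3.00)

# Hypothetical helper: link feature pairs that are both highly correlated
# and close in retention time
find_links <- function(abund, rt, rt_window = 1 / 60, corr_thresh = 0.9) {
  corr <- cor(abund, method = "pearson")
  rt_diff <- abs(outer(rt, rt, "-"))
  # upper.tri() keeps each pair once and drops self-links
  linked <- corr >= corr_thresh & rt_diff <= rt_window & upper.tri(corr)
  idx <- which(linked, arr.ind = TRUE)
  data.frame(
    x = colnames(abund)[idx[, "row"]],
    y = colnames(abund)[idx[, "col"]],
    cor = corr[idx],
    rt_diff = rt_diff[idx],
    stringsAsFactors = FALSE
  )
}

links <- find_links(abund, rt)
print(links)
```

In this toy data only f1 and f2 are linked: f4 correlates strongly with both but falls outside the retention time window, and f3 elutes close by but does not correlate.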

Usage

cluster_features(
  object,
  mz_col = NULL,
  rt_col = NULL,
  all_features = FALSE,
  rt_window = 1/60,
  corr_thresh = 0.9,
  d_thresh = 0.8,
  plotting = TRUE,
  min_size_plotting = 3,
  prefix = NULL,
  assay.type = NULL
)

Arguments

object

a SummarizedExperiment or MetaboSet object

mz_col

the column name in feature data that holds mass-to-charge ratios

rt_col

the column name in feature data that holds retention times

all_features

logical, should all features be included in the clustering? If FALSE (the default), flagged features are excluded from the clustering

rt_window

the retention time window for potential links. NOTE: use the same unit as the retention time (with retention times in minutes, the default of 1/60 corresponds to one second)

corr_thresh

the correlation threshold required for potential links between features

d_thresh

the threshold for the relative degree required by each node

plotting

logical, should plots be drawn for each cluster?

min_size_plotting

the minimum number of features a cluster needs to have to be plotted

prefix

the prefix for the file names of the saved plots

assay.type

character, the name of the assay to use when the object contains multiple assays
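
To make the role of d_thresh more concrete, the sketch below computes a relative degree for each node of a small candidate cluster. It assumes that relative degree means a node's number of links divided by the maximum possible number (cluster size minus one); this definition, the helper name `relative_degree`, and the toy graph are assumptions, and the way the package handles nodes that fail the threshold is not shown.

```r
# Hypothetical candidate cluster of 4 features, stored as an adjacency matrix
features <- paste0("f", 1:4)
adj <- matrix(0, 4, 4, dimnames = list(features, features))
edges <- rbind(
  c("f1", "f2"),
  c("f1", "f3"),
  c("f2", "f3"),
  c("f3", "f4")
)
adj[edges] <- 1          # character matrix indexing via dimnames
adj[edges[, 2:1]] <- 1   # mirror the links to keep the matrix symmetric

# Assumed definition: degree divided by the maximum possible degree (n - 1)
relative_degree <- function(adj) {
  rowSums(adj) / (nrow(adj) - 1)
}

d_rel <- relative_degree(adj)
print(round(d_rel, 2))

# Nodes meeting a threshold of 0.8 in this candidate cluster
passing <- names(which(d_rel >= 0.8))
print(passing)
```

Here f3 is linked to every other node and has a relative degree of 1, while f4 has only one of three possible links. A higher d_thresh therefore demands more densely connected clusters.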

Value

a SummarizedExperiment or MetaboSet object, with median peak area (MPA), the cluster ID, the features in the cluster, and cluster size added to feature data.
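
As a rough illustration of how the median peak area determines which feature a cluster is named after, consider the base R sketch below. The data and variable names are hypothetical, and the exact format of the cluster ID produced by the package is not reproduced here.

```r
# Hypothetical abundances: 3 samples (rows) x 3 features (columns)
abund <- cbind(
  f1 = c(100, 120, 110),
  f2 = c(300, 280, 310),
  f3 = c(50, 55, 45)
)

# Median peak area (MPA) of each feature across samples
mpa <- apply(abund, 2, median, na.rm = TRUE)

# Suppose these three features ended up in one cluster
cluster_members <- c("f1", "f2", "f3")

# The feature with the highest MPA lends its name to the cluster
representative <- names(which.max(mpa[cluster_members]))
print(mpa)
print(representative)
```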

Examples

data(example_set)
# These parameter values are unusual because the example data is not real
clustered <- cluster_features(example_set, rt_window = 1, corr_thresh = 0.5, 
  d_thresh = 0.6)
#> INFO [2025-06-23 22:36:26] Identified m/z column Average_Mz and retention time column Average_Rt_min
#> INFO [2025-06-23 22:36:26] Identified m/z column Average_Mz and retention time column Average_Rt_min
#> INFO [2025-06-23 22:36:26] Identified m/z column Average_Mz and retention time column Average_Rt_min
#> INFO [2025-06-23 22:36:26] 
#> Starting feature clustering at 2025-06-23 22:36:26.484164
#> INFO [2025-06-23 22:36:26] Finding connections between features in HILIC_neg
#> INFO [2025-06-23 22:36:26] Found 1 connections in HILIC_neg
#> INFO [2025-06-23 22:36:26] Finding connections between features in HILIC_pos
#> INFO [2025-06-23 22:36:26] Found 4 connections in HILIC_pos
#> INFO [2025-06-23 22:36:26] Finding connections between features in RP_neg
#> INFO [2025-06-23 22:36:26] Found 1 connections in RP_neg
#> INFO [2025-06-23 22:36:26] Finding connections between features in RP_pos
#> INFO [2025-06-23 22:36:26] Found 2 connections in RP_pos
#> INFO [2025-06-23 22:36:26] Found 8 connections
#> 5 components found
#> 1 components found
#> INFO [2025-06-23 22:36:26] Found 5 clusters of 2 or more features, clustering finished at 2025-06-23 22:36:26.968162
#> INFO [2025-06-23 22:36:27] Saved cluster plots to: