Cluster correlated features originating from the same metabolite
Source: R/feature_clustering.R
cluster_features.Rd
Clusters features potentially originating from the same compound. Features with a high Pearson correlation coefficient and a small retention time difference are linked together. Clusters are then formed by applying a threshold to the relative degree that each node in a cluster must meet. Each cluster is named after the feature with the highest median peak area (median abundance). This function is a wrapper around several functions based on MATLAB code by David Broadhurst.
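The linking step described above can be sketched in base R. This is only an illustration under stated assumptions, not the package's internal code: the toy abundance matrix, the feature names f1 to f4 and the retention times are all made up, and the two thresholds simply mirror the defaults of corr_thresh and rt_window.

```r
# Toy data (made up): 6 samples (rows) x 4 features (columns)
f1 <- c(10, 20, 30, 40, 50, 60)
abund <- cbind(
  f1 = f1,
  f2 = 2 * f1 + c(1, -1, 1, -1, 1, -1),          # tracks f1, co-elutes
  f3 = c(30, 10, 50, 20, 60, 40),                # co-elutes but is unrelated
  f4 = 0.5 * f1 + c(0.5, -0.5, 0.5, -0.5, 0.5, -0.5)  # tracks f1, elutes later
)
rt <- c(f1 = 1.000, f2 = 1.005, f3 = 1.010, f4 = 3.000)  # minutes (assumed unit)

corr_thresh <- 0.9
rt_window <- 1 / 60  # one second when retention time is in minutes

cors <- cor(abund, method = "pearson")
rt_diff <- abs(outer(rt, rt, "-"))

# A link needs BOTH a high correlation and a small retention time difference;
# upper.tri() keeps each unordered pair only once
linked <- cors >= corr_thresh & rt_diff <= rt_window & upper.tri(cors)
idx <- which(linked, arr.ind = TRUE)
link_table <- data.frame(
  x = rownames(cors)[idx[, "row"]],
  y = colnames(cors)[idx[, "col"]]
)
link_table
```

In this toy case only f1 and f2 are linked: f3 elutes close by but is poorly correlated, and f4 is highly correlated but falls outside the retention time window.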
Usage
cluster_features(
object,
mz_col = NULL,
rt_col = NULL,
all_features = FALSE,
rt_window = 1/60,
corr_thresh = 0.9,
d_thresh = 0.8,
plotting = TRUE,
min_size_plotting = 3,
prefix = NULL,
assay.type = NULL
)
Arguments
- object
a SummarizedExperiment or MetaboSet object
- mz_col
the column name in feature data that holds mass-to-charge ratios
- rt_col
the column name in feature data that holds retention times
- all_features
logical, should all features be included in the clustering? If FALSE (the default), flagged features are excluded from clustering
- rt_window
the retention time window for potential links. NOTE: use the same unit as the retention time
- corr_thresh
the correlation threshold required for potential links between features
- d_thresh
the threshold for the relative degree required by each node
- plotting
should plots be drawn for each cluster?
- min_size_plotting
the minimum number of features a cluster needs to have to be plotted
- prefix
the prefix for the names of the files the plots are saved to
- assay.type
character, assay to be used in case of multiple assays
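To make d_thresh more concrete, the sketch below computes relative degrees for a small, hypothetical group of linked features. It assumes that the relative degree of a node is its number of links divided by the maximum possible number (n - 1); the page does not spell out the definition, so treat this as an illustration rather than the package's exact procedure. The node names and links are invented.

```r
# Hypothetical component of 4 linked features
nodes <- c("f1", "f2", "f3", "f4")
adj <- matrix(0, 4, 4, dimnames = list(nodes, nodes))
edges <- rbind(c("f1", "f2"), c("f1", "f3"), c("f2", "f3"), c("f3", "f4"))
adj[edges] <- 1          # index by (row name, column name) pairs
adj[edges[, 2:1]] <- 1   # links are undirected, so mirror them

d_thresh <- 0.8

# Assumed definition: degree divided by the maximum possible degree (n - 1)
rel_degree <- rowSums(adj) / (nrow(adj) - 1)
rel_degree
rel_degree >= d_thresh   # only f3 passes, so the group is too loosely connected

# For illustration only: without the weakly connected f4, the remaining
# three features are fully linked and every node passes the threshold
keep <- nodes != "f4"
sub <- adj[keep, keep]
rel_sub <- rowSums(sub) / (nrow(sub) - 1)
rel_sub
```

A higher d_thresh therefore demands more tightly connected clusters, while a lower value tolerates features that are linked to only part of the group.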
Value
a SummarizedExperiment or MetaboSet object, with median peak area (MPA), the cluster ID, the features in the cluster, and cluster size added to feature data.
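As a rough illustration of how the median peak area (MPA) determines which feature a cluster is named after, the base R sketch below uses made-up peak areas for three hypothetical features in one cluster. The exact format of the cluster ID written to feature data is not shown on this page, so only the choice of the representative feature is sketched.

```r
# Made-up peak areas: 5 samples (rows) x 3 features of one cluster (columns)
areas <- cbind(
  f1 = c(120, 150, 130, 140, 135),
  f2 = c(900, 870, 950, 910, 5),    # a single low outlier barely moves the median
  f3 = c(300, 310, 290, 305, 295)
)

# Median peak area per feature, ignoring missing values
mpa <- apply(areas, 2, median, na.rm = TRUE)
mpa

# The feature with the highest MPA represents the cluster
cluster_name <- names(which.max(mpa))
cluster_name
```

Using the median rather than the mean keeps the choice stable when a feature has a few missing or extreme values, as f2 does here.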
Examples
data(example_set)
# The parameters are really weird because example data is imaginary
clustered <- cluster_features(example_set, rt_window = 1, corr_thresh = 0.5,
d_thresh = 0.6)
#> INFO [2025-06-23 22:36:26] Identified m/z column Average_Mz and retention time column Average_Rt_min
#> INFO [2025-06-23 22:36:26] Identified m/z column Average_Mz and retention time column Average_Rt_min
#> INFO [2025-06-23 22:36:26] Identified m/z column Average_Mz and retention time column Average_Rt_min
#> INFO [2025-06-23 22:36:26]
#> Starting feature clustering at 2025-06-23 22:36:26.484164
#> INFO [2025-06-23 22:36:26] Finding connections between features in HILIC_neg
#> INFO [2025-06-23 22:36:26] Found 1 connections in HILIC_neg
#> INFO [2025-06-23 22:36:26] Finding connections between features in HILIC_pos
#> INFO [2025-06-23 22:36:26] Found 4 connections in HILIC_pos
#> INFO [2025-06-23 22:36:26] Finding connections between features in RP_neg
#> INFO [2025-06-23 22:36:26] Found 1 connections in RP_neg
#> INFO [2025-06-23 22:36:26] Finding connections between features in RP_pos
#> INFO [2025-06-23 22:36:26] Found 2 connections in RP_pos
#> INFO [2025-06-23 22:36:26] Found 8 connections
#> 5 components found
#> 1 components found
#> INFO [2025-06-23 22:36:26] Found 5 clusters of 2 or more features, clustering finished at 2025-06-23 22:36:26.968162
#> INFO [2025-06-23 22:36:27] Saved cluster plots to: