notame package. — notame-package • notame

Provides functionality for untargeted LC-MS metabolomics research as specified in the associated publication in the 'Metabolomics Data Processing and Data Analysis—Current Best Practices' special issue of the Metabolites journal (2020). This includes tabular data preprocessing, feature selection and supporting visualizations. Raw data preprocessing and functionality related to biological context, such as pathway analysis, are not included.

Details

In roughly chronological order, the functionality of notame is as follows. Please see the vignettes and paper for more information.

Tabular data preprocessing (reducing unwanted variation and completing the dataset, returning modified objects):

mark_nas mark specified values as missing
flag_detection flag features with low detection rate
flag_contaminants flag contaminants based on blanks
correct_drift correct drift using cubic spline
ruvs_qc Remove Unwanted Variation (RUV) between batches
pca_bhattacharyya_dist Bhattacharyya distance between batches in PCA space
perform_repeatability compute repeatability measures
assess_quality assess quality information of features
quality extract quality information of features
flag_quality flag low-quality features
flag_report report of flagged features
impute_rf impute missing values using random forest
impute_simple impute missing values using simple imputation
cluster_features cluster correlated features originating from the same metabolite
compress_clusters compress clusters of features to a single feature
log logarithms
exponential exponential
pqn_normalization probabilistic quotient normalization
inverse_normalize inverse-rank normalization
scale scale
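
As an illustration of one of the normalization steps listed above, the following is a minimal base R sketch of probabilistic quotient normalization. It is not notame's implementation of pqn_normalization: the helper name pqn_sketch, its arguments, the features-in-rows layout and the choice of the feature-wise median across all samples as the reference are assumptions made for this example.

```r
# Hypothetical helper, not part of notame: a minimal PQN sketch.
# x: numeric matrix with features in rows and samples in columns (assumed layout).
# reference: optional reference spectrum, one value per feature.
pqn_sketch <- function(x, reference = NULL) {
  stopifnot(is.matrix(x), is.numeric(x))
  if (is.null(reference)) {
    # Assumption: use the feature-wise median across all samples as reference
    reference <- apply(x, 1, median, na.rm = TRUE)
  }
  # Quotient of every abundance to the reference value of its feature
  quotients <- x / reference
  # The median quotient of a sample estimates its dilution factor
  dilution <- apply(quotients, 2, median, na.rm = TRUE)
  # Divide each sample (column) by its dilution factor
  sweep(x, 2, dilution, "/")
}

# Toy data: three samples that differ only by a dilution factor
base <- c(10, 20, 30, 40)
x <- cbind(s1 = base, s2 = 2 * base, s3 = 0.5 * base)
normalized <- pqn_sketch(x)
```

On this toy matrix the three columns differ only by a constant factor, so they become identical after normalization. In practice the reference is often computed from QC samples rather than from all samples.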

Tabular data preprocessing visualizations (saved to file by default):

visualizations write all relevant data preprocessing visualizations to pdf
plot_injection_lm estimate the magnitude of drift
plot_sample_boxplots plot a boxplot for each sample
plot_dist_density plot distance density
plot_tsne, plot_pca t-SNE and PCA plot
plot_tsne_arrows, plot_pca_arrows t-SNE and PCA plots with arrows
plot_tsne_hexbin, plot_pca_hexbin t-SNE and PCA hexbin plots
plot_dendrogram sample dendrogram
plot_sample_heatmap sample heatmap
plot_pca_loadings PCA loadings plot
plot_quality plot quality metrics

Feature selection – Univariate analysis (return data.frames):

perform_lm linear models
perform_logistic logistic regression
perform_lmer linear mixed models
perform_oneway_anova Welch’s ANOVA and classic ANOVA
perform_lm_anova linear models ANOVA table
perform_t_test pairwise and paired t-tests
perform_kruskal_wallis Kruskal-Wallis rank-sum test
perform_non_parametric pairwise and paired non-parametric tests
perform_correlation_tests correlation test
perform_auc area under curve
perform_homoscedasticity_tests test homoscedasticity
cohens_d Cohen's d
fold_change fold change
summary_statistics summary statistics
summarize_results statistics cleaning

Feature selection – Supervised learning (return various objects):

muvr_analysis multivariate modelling with minimally biased variable selection (MUVR2)
mixomics_pls, mixomics_plsda a simple PLS(-DA) model with set number of components and all features
mixomics_pls_optimize, mixomics_plsda_optimize test different numbers of components for PLS(-DA)
mixomics_spls_optimize, mixomics_splsda_optimize test different numbers of components and features for sparse PLS(-DA)
fit_rf, importance_rf fit random forest and feature importance
perform_permanova PERMANOVA

Feature-wise visualizations (these are often drawn for a subset of interesting features after analysis, saved by default):

save_beeswarm_plots save beeswarm plots of each feature by group
save_group_boxplots save box plots of each feature by group
save_scatter_plots save scatter plots of each feature against a set variable
save_subject_line_plots save line plots with mean
save_group_lineplots save line plots with error bars by group
save_batch_plots save batch correction plots

Results visualizations (returned by default, save them using save_plot):

plot_p_histogram histogram of p-values
volcano_plot volcano plot
manhattan_plot Manhattan plot
mz_rt_plot m/z vs retention time plot (cloud plot)
plot_effect_heatmap heatmap of effects between variables, such as correlations

Object utilities:

read_from_excel read formatted Excel files
construct_metabosets construct MetaboSet objects
write_to_excel write results to Excel file
group_col get and set name of the special column for group labels
time_col get and set the name of the special column for time points
subject_col get and set the name of the special column for subject identifiers
flag get and set values in the flag column
drop_flagged drop flagged features
drop_qcs drop QC samples
join_fData join new columns to feature data
join_pData join new columns to pheno data
combined_data retrieve both sample information and features
merge_metabosets merge MetaboSet objects together
merge_objects merge SummarizedExperiment objects together
fix_object fix object for functioning of notame
MetaboSet-class MetaboSet

Other utilities:

citations show citations
init_log initialize log to a file
log_text log text to the current log file
finish_log finish a log
save_plot save plot to chosen format
fix_MSMS transform MS/MS output into a publication-ready format

References

Klåvus et al. (2020). "notame": Workflow for Non-Targeted LC-MS Metabolic Profiling. Metabolites, 10: 135.

Author

Maintainer: Vilhelm Suksi vksuks@utu.fi (ORCID)

Authors:

Anton Klåvus (ORCID) [copyright holder]
Jussi Paananen (ORCID) [copyright holder]
Oskari Timonen (ORCID) [copyright holder]
Atte Lihtamo
Retu Haikonen (ORCID)
Leo Lahti (ORCID)
Kati Hanhineva (ORCID)

Other contributors:

Ville Koistinen (ORCID) [contributor]
Olli Kärkkäinen (ORCID) [contributor]
Artur Sannikov [contributor]
