
Reads data from an Excel file of the following format:

  • The left side of the sheet contains information about the features, size features x feature info columns

  • The top part contains sample information, size sample info variables x samples

  • The middle contains the actual abundances, size features x samples

This function separates the three parts of the sheet and returns them in a list.
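A minimal base-R sketch of how the three parts can be sliced once the corner cell is known. It assumes the sheet has already been read into a character matrix with no header row. It also assumes the corner cell sits at (corner_row, corner_column), so that sample information occupies rows 1 to corner_row to the right of the corner, and feature information occupies the columns up to corner_column below it. This matches the row and column ranges reported in the log of the example below. The helper `split_sheet` and the toy data are hypothetical. This is not read_from_excel's implementation: it does not detect the corner, name the sample variables, or run any checks.

```r
# Hypothetical helper (not part of notame): slice a header-less character
# matrix into the three parts around the corner cell.
split_sheet <- function(sheet, corner_row, corner_column) {
  sample_cols  <- corner_column + seq_len(ncol(sheet) - corner_column)  # right of corner
  feature_rows <- corner_row + seq_len(nrow(sheet) - corner_row)        # below corner
  left_cols    <- seq_len(corner_column)                                # feature info

  # Left side: features x feature info columns; the corner row holds the headers
  feature_data <- as.data.frame(sheet[feature_rows, left_cols, drop = FALSE],
                                stringsAsFactors = FALSE)
  colnames(feature_data) <- sheet[corner_row, left_cols]

  # Top part: sample info variables x samples, kept as a raw block
  pheno_block <- sheet[seq_len(corner_row), sample_cols, drop = FALSE]

  # Middle: abundances, features x samples, converted to numbers
  exprs <- matrix(as.numeric(sheet[feature_rows, sample_cols]),
                  nrow = length(feature_rows))

  list(exprs = exprs, pheno_block = pheno_block, feature_data = feature_data)
}

# Toy sheet with the corner at row 2, column 2 (the cell holding "Mz"):
# row 1 is a sample variable, row 2 holds data file names and feature headers.
sheet <- rbind(
  c("",           "Group", "A",  "B",  "A"),
  c("Feature_ID", "Mz",    "s1", "s2", "s3"),
  c("F1",         "100.5", "10", "20", "30"),
  c("F2",         "200.25", "5", "6",  "7")
)
parts <- split_sheet(sheet, corner_row = 2, corner_column = 2)
dim(parts$exprs)  # 2 3: two features x three samples
```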

Usage

read_from_excel(
  file,
  sheet = 1,
  id_column = NULL,
  corner_row = NULL,
  corner_column = NULL,
  id_prefix = "ID_",
  split_by = NULL,
  name = NULL,
  mz_limits = c(10, 2000),
  rt_limits = c(0, 20),
  skip_checks = FALSE
)

Arguments

file

path to the Excel file

sheet

the sheet number or name

id_column

character, column name for unique identification of samples

corner_row

integer, the bottom row of the sample information, which usually contains the data file names and the feature info column names. If set to NULL, it is detected automatically.

corner_column

integer or character, the column of the corner cell (the last column of feature information), given either as a column number or as the column name (letter) in Excel. If set to NULL, it is detected automatically.

id_prefix

character, prefix for autogenerated sample IDs, see Details

split_by

character vector, if all the modes are in the same Excel file, the names of the feature data columns used to separate the modes (usually Mode and Column)

name

character, if the Excel file contains only one mode, the name of that mode, such as "Hilic_neg"

mz_limits

numeric vector of length two; all m/z values should lie between these limits

rt_limits

numeric vector of length two; all retention time values should lie between these limits

skip_checks

logical, whether to skip checking and fixing data integrity. Not recommended, but sometimes useful when you want to read the data in as is and fix errors later. The data integrity checks are important for notame to function correctly.
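Two of the argument conventions above can be illustrated with a short base-R sketch: corner_column may be given as an Excel column letter, and mz_limits and rt_limits define the range that m/z and retention time values are expected to fall in. Both helpers, `excel_col_to_num` and `outside_limits`, are hypothetical and are not notame functions. Treating the limits as inclusive is also an assumption.

```r
# Hypothetical helper: convert an Excel column letter such as "H" or "BF"
# to a column number. Excel letters are base 26 with digits A = 1 ... Z = 26.
excel_col_to_num <- function(letters_str) {
  chars <- strsplit(toupper(letters_str), "")[[1]]
  num <- 0
  for (ch in chars) {
    num <- num * 26 + match(ch, LETTERS)  # NA if a character is not A-Z
  }
  num
}

# Hypothetical helper: TRUE where a value falls outside the two limits,
# in the spirit of mz_limits = c(10, 2000); limits assumed inclusive.
outside_limits <- function(values, limits) {
  stopifnot(is.numeric(limits), length(limits) == 2)
  values < limits[1] | values > limits[2]
}

excel_col_to_num("H")   # 8, the corner column used in the Examples section
excel_col_to_num("BF")  # 58, the last sample column reported in the example log
which(outside_limits(c(150.2, 5.1, 2500), c(10, 2000)))  # 2 3
```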

Value

A list of three data frames:

  • exprs: the actual abundances, size features x samples

  • pheno_data: sample information, size sample info variables x samples

  • feature_data: information about the features, size features x feature info columns

Details

If skip_checks = FALSE, read_from_excel attempts to fix the data as per fix_object and then checks it. If skip_checks = TRUE, the parameters for fix_object are ignored.

Examples

data <- read_from_excel(
  file = system.file("extdata", "example_set.xlsx", package = "notame"),
  sheet = 1, corner_row = 11, corner_column = "H",
  split_by = c("Column", "Ion_mode"))
#> INFO [2025-06-23 22:38:12] Corner detected correctly at row 11, column H
#> INFO [2025-06-23 22:38:12] 
#> Extracting sample information from rows 1 to 11 and columns I to BF
#> INFO [2025-06-23 22:38:12] Replacing spaces in sample information column names with underscores (_)
#> INFO [2025-06-23 22:38:12] Naming the last column of sample information "Datafile"
#> INFO [2025-06-23 22:38:12] 
#> Extracting feature information from rows 12 to 91 and columns A to H
#> INFO [2025-06-23 22:38:12] 
#> Extracting feature abundances from rows 12 to 91 and columns I to BF
#> INFO [2025-06-23 22:38:12] Pheno data was cleaned
#> INFO [2025-06-23 22:38:12] Feature data was cleaned
#> INFO [2025-06-23 22:38:12] 
#> Checking sample information
#> INFO [2025-06-23 22:38:12] Checking 'Injection_order' column in feature data
#> INFO [2025-06-23 22:38:12] Checking 'Sample_ID' column in pheno data
#> INFO [2025-06-23 22:38:12] Checking 'QC' column in feature data
#> INFO [2025-06-23 22:38:12] Checking that feature abundances only contain numeric values
#> INFO [2025-06-23 22:38:12] 
#> Checking feature information
#> INFO [2025-06-23 22:38:12] Checking that feature IDs are unique and not storedas numbers
#> INFO [2025-06-23 22:38:12] Checking that m/z and retention time values are reasonable.
#> INFO [2025-06-23 22:38:12] Identified m/z column Average_Mz and retention time column Average_Rt_min
#> INFO [2025-06-23 22:38:12] Identified m/z column Average_Mz and retention time column Average_Rt_min
#> INFO [2025-06-23 22:38:12] Checking that feature data includes a 'Split' column
#> INFO [2025-06-23 22:38:12] Checking that feature data includes a 'Flag' column

modes <- construct_metabosets(exprs = data$exprs, 
  pheno_data = data$pheno_data, feature_data = data$feature_data,
  group_col = "Group")
#> INFO [2025-06-23 22:38:12] 
#> Checking feature information
#> INFO [2025-06-23 22:38:12] Checking that feature IDs are unique and not storedas numbers
#> INFO [2025-06-23 22:38:12] Checking that feature data includes a 'Split' column
#> INFO [2025-06-23 22:38:12] Checking that feature data includes a 'Flag' column
#> INFO [2025-06-23 22:38:12] Checking that feature abundances only contain numeric values
#> INFO [2025-06-23 22:38:12] Setting row and column names of exprs based on feature and pheno data