Read formatted Excel files — read_from

Reads data from an Excel file of the following format:

The left side of the sheet contains information about the features (features x feature info columns)
The top part contains sample information (sample info variables x samples)
The middle contains the actual abundances (features x samples)

This function separates these three parts and returns them in a list.
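The layout can be illustrated with a toy sheet in base R. This is a minimal sketch of the idea, not the package's implementation: the matrix `sheet`, the corner position and the helper `split_sheet` are hypothetical and exist only for illustration.

```r
# Toy sheet: 2 rows of sample information on top, 3 features below,
# 2 feature info columns on the left, 3 samples on the right.
sheet <- matrix(
  c(NA,           "Group", "A",      "B",      "QC",
    "Feature_ID", "Mz",    "file_1", "file_2", "file_3",
    "F1",         "100.1", "10",     "12",     "11",
    "F2",         "200.2", "20",     "22",     "21",
    "F3",         "300.3", "30",     "32",     "31"),
  nrow = 5, byrow = TRUE
)

# Hypothetical helper: split the sheet at the corner cell
split_sheet <- function(sheet, corner_row, corner_column) {
  sample_cols <- (corner_column + 1):ncol(sheet)
  feature_rows <- (corner_row + 1):nrow(sheet)

  # Top part: transpose so that each sample becomes one row
  pheno <- as.data.frame(
    t(sheet[1:corner_row, sample_cols, drop = FALSE]),
    stringsAsFactors = FALSE
  )
  # Variable names sit in the corner column; the corner row holds file names
  var_names <- sheet[1:corner_row, corner_column]
  var_names[corner_row] <- "Datafile"
  colnames(pheno) <- var_names
  rownames(pheno) <- NULL

  # Left part: feature information, named by the corner row
  features <- as.data.frame(
    sheet[feature_rows, 1:corner_column, drop = FALSE],
    stringsAsFactors = FALSE
  )
  colnames(features) <- sheet[corner_row, 1:corner_column]

  # Middle part: abundances, converted from text to numbers
  abundances <- sheet[feature_rows, sample_cols, drop = FALSE]
  storage.mode(abundances) <- "double"

  list(exprs = abundances, pheno_data = pheno, feature_data = features)
}

parts <- split_sheet(sheet, corner_row = 2, corner_column = 2)
dim(parts$exprs) # 3 features x 3 samples
```

In the toy sheet the corner cell is at row 2, column 2, which plays the same role as `corner_row` and `corner_column` in the real function.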

Usage

read_from_excel(
  file,
  sheet = 1,
  id_column = NULL,
  corner_row = NULL,
  corner_column = NULL,
  id_prefix = "ID_",
  split_by = NULL,
  name = NULL,
  mz_limits = c(10, 2000),
  rt_limits = c(0, 20),
  skip_checks = FALSE
)

Arguments

file: path to the Excel file
sheet: the sheet number or name
id_column: character, column name for unique identification of samples
corner_row: integer, the bottom row of sample information, usually contains data file names and feature info column names. If set to NULL, will be detected automatically.
corner_column: integer or character, the corresponding column number or the column name (letter) in Excel. If set to NULL, will be detected automatically.
id_prefix: character, prefix for autogenerated sample IDs, see Details
split_by: character vector, in the case where all the modes are in the same Excel file, the column names of feature data used to separate the modes (usually Mode and Column)
name: in the case where the Excel file only contains one mode, the name of the mode, such as "Hilic_neg"
mz_limits: numeric vector of length two, all m/z values should lie between these limits
rt_limits: numeric vector of length two, all retention time values should lie between these limits
skip_checks: logical, whether to skip checking and fixing of data integrity. Not recommended, but sometimes useful when you just want to read the data in as is and fix errors later. The data integrity checks are important for the functioning of notame.
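Since corner_column accepts either a number or an Excel column letter, it can help to see how the two relate. The sketch below uses a hypothetical helper, `excel_col_to_number`, which is not part of notame; it only shows the standard base-26 reading of Excel column letters.

```r
# Hypothetical helper: convert an Excel column letter such as "H" or "BF"
# to its 1-based column number by reading the letters as base-26 digits.
excel_col_to_number <- function(col) {
  letters_vec <- strsplit(toupper(col), "")[[1]]
  digits <- match(letters_vec, LETTERS)
  stopifnot(length(digits) > 0, !anyNA(digits))
  Reduce(function(acc, d) acc * 26 + d, digits, 0)
}

excel_col_to_number("H")  # 8
excel_col_to_number("BF") # 58
```

Under this reading, corner_column = "H" and corner_column = 8 refer to the same column.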

Value

A list of three data frames:

exprs: the actual abundances, size features x samples
pheno_data: sample information, size sample info variables x samples
feature_data: information about the features, size features x feature info columns

Details

If skip_checks = FALSE, read_from_excel first attempts to fix the data in the same way as fix_object and then checks its integrity. If skip_checks = TRUE, the parameters meant for fix_object are ignored.
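As an illustration of the kind of check involved, the log output in Examples mentions checking that m/z and retention time values are reasonable. The following is a minimal sketch of such a range check, not notame's actual code: the helper `check_limits`, the toy `feature_data` and the choice of inclusive limits are all assumptions.

```r
# Hypothetical helper: stop if any value lies outside the given limits
# (limits are treated as inclusive here, which is an assumption)
check_limits <- function(values, limits, label) {
  stopifnot(is.numeric(values), length(limits) == 2)
  outside <- values < limits[1] | values > limits[2]
  if (any(outside, na.rm = TRUE)) {
    stop(label, " values should be between ", limits[1], " and ", limits[2])
  }
  invisible(TRUE)
}

# Toy feature data; the column names follow the example log output
feature_data <- data.frame(
  Feature_ID = c("F1", "F2", "F3"),
  Average_Mz = c(150.05, 480.31, 1200.9),
  Average_Rt_min = c(0.8, 5.2, 11.4)
)

check_limits(feature_data$Average_Mz, c(10, 2000), "m/z")
check_limits(feature_data$Average_Rt_min, c(0, 20), "Retention time")
```

Both calls pass silently for the toy data; a retention time of 25 minutes, for instance, would raise an error under the default rt_limits.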

Examples

data <- read_from_excel(
  file = system.file("extdata", "example_set.xlsx", package = "notame"),
  sheet = 1, corner_row = 11, corner_column = "H",
  split_by = c("Column", "Ion_mode")
)
#> INFO [2025-06-23 22:38:12] Corner detected correctly at row 11, column H
#> INFO [2025-06-23 22:38:12] 
#> Extracting sample information from rows 1 to 11 and columns I to BF
#> INFO [2025-06-23 22:38:12] Replacing spaces in sample information column names with underscores (_)
#> INFO [2025-06-23 22:38:12] Naming the last column of sample information "Datafile"
#> INFO [2025-06-23 22:38:12] 
#> Extracting feature information from rows 12 to 91 and columns A to H
#> INFO [2025-06-23 22:38:12] 
#> Extracting feature abundances from rows 12 to 91 and columns I to BF
#> INFO [2025-06-23 22:38:12] Pheno data was cleaned
#> INFO [2025-06-23 22:38:12] Feature data was cleaned
#> INFO [2025-06-23 22:38:12] 
#> Checking sample information
#> INFO [2025-06-23 22:38:12] Checking 'Injection_order' column in feature data
#> INFO [2025-06-23 22:38:12] Checking 'Sample_ID' column in pheno data
#> INFO [2025-06-23 22:38:12] Checking 'QC' column in feature data
#> INFO [2025-06-23 22:38:12] Checking that feature abundances only contain numeric values
#> INFO [2025-06-23 22:38:12] 
#> Checking feature information
#> INFO [2025-06-23 22:38:12] Checking that feature IDs are unique and not storedas numbers
#> INFO [2025-06-23 22:38:12] Checking that m/z and retention time values are reasonable.
#> INFO [2025-06-23 22:38:12] Identified m/z column Average_Mz and retention time column Average_Rt_min
#> INFO [2025-06-23 22:38:12] Identified m/z column Average_Mz and retention time column Average_Rt_min
#> INFO [2025-06-23 22:38:12] Checking that feature data includes a 'Split' column
#> INFO [2025-06-23 22:38:12] Checking that feature data includes a 'Flag' column

modes <- construct_metabosets(
  exprs = data$exprs,
  pheno_data = data$pheno_data,
  feature_data = data$feature_data,
  group_col = "Group"
)
#> INFO [2025-06-23 22:38:12] 
#> Checking feature information
#> INFO [2025-06-23 22:38:12] Checking that feature IDs are unique and not storedas numbers
#> INFO [2025-06-23 22:38:12] Checking that feature data includes a 'Split' column
#> INFO [2025-06-23 22:38:12] Checking that feature data includes a 'Flag' column
#> INFO [2025-06-23 22:38:12] Checking that feature abundances only contain numeric values
#> INFO [2025-06-23 22:38:12] Setting row and column names of exprs based on feature and pheno data