Reads data from an Excel file with the following layout:
- The left side of the sheet contains information about the features (size: features x feature info columns)
- The top part contains sample information (size: sample info variables x samples)
- The middle contains the actual abundances (size: features x samples)
This function separates the three parts and returns them in a list.
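To make the layout concrete, here is a minimal sketch of how the three blocks can be cut out of a sheet once the corner cell is known. It assumes the sheet has already been read into a character matrix, and `split_sheet` is a hypothetical helper for illustration, not notame's implementation. Its row and column ranges mirror the ones reported in the example log on this page: sample information includes the corner row, while feature information starts one row below it.

```r
# Hypothetical helper, NOT part of notame: split a sheet that has already been
# read into a character matrix into its three blocks, given the corner cell.
split_sheet <- function(sheet, corner_row, corner_column) {
  feature_rows <- seq(corner_row + 1, nrow(sheet))    # rows below the corner
  sample_cols <- seq(corner_column + 1, ncol(sheet))  # columns right of the corner
  list(
    # Middle block: abundances, features x samples
    exprs = sheet[feature_rows, sample_cols, drop = FALSE],
    # Top block: sample information, sample info variables x samples
    # (includes the corner row, which usually holds the data file names)
    pheno_data = sheet[seq_len(corner_row), sample_cols, drop = FALSE],
    # Left block: feature information, features x feature info columns
    feature_data = sheet[feature_rows, seq_len(corner_column), drop = FALSE]
  )
}

# Toy sheet: 2 sample info rows, 5 features, 2 feature info columns, 4 samples
sheet <- matrix(as.character(1:42), nrow = 7, ncol = 6)
parts <- split_sheet(sheet, corner_row = 2, corner_column = 2)
sapply(parts, dim)  # exprs 5 x 4, pheno_data 2 x 4, feature_data 5 x 2
```

The real function additionally handles column naming, type conversion and corner detection, which this sketch leaves out.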
Arguments
- file
path to the Excel file
- sheet
the sheet number or name
- id_column
character, column name for unique identification of samples
- corner_row
integer, the bottom row of sample information; this row usually contains the data file names and the feature info column names. If set to NULL, it is detected automatically.
- corner_column
integer or character, the corresponding corner column (the rightmost column of feature information), given either as a column number or as an Excel column letter such as "H". If set to NULL, it is detected automatically.
- id_prefix
character, prefix for autogenerated sample IDs, see Details
- split_by
character vector; when all modes are in the same Excel file, the feature data columns used to separate the modes (usually Mode and Column)
- name
character; when the Excel file contains only one mode, the name of that mode, such as "Hilic_neg"
- mz_limits
numeric vector of length two; all m/z values should lie between these limits
- rt_limits
numeric vector of length two; all retention time values should lie between these limits
- skip_checks
logical, skip checking and fixing of data integrity. Not recommended, but sometimes useful when you just want to read the data in as is and fix errors later. The data integrity checks are important for notame to function correctly.
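Two of the arguments above accept values whose meaning is easy to misread: corner_column may be an Excel column letter, and mz_limits / rt_limits are length-two ranges. The sketch below shows one way to interpret both. `excel_col_to_int` and `within_limits` are hypothetical helpers written for illustration, not notame functions, and treating the limits as inclusive is an assumption.

```r
# Hypothetical helper, NOT part of notame: convert an Excel column letter
# (e.g. "H" or "BF") to its 1-based column number. Excel letters are
# base-26 digits with A = 1, ..., Z = 26.
excel_col_to_int <- function(col) {
  stopifnot(is.character(col), length(col) == 1, grepl("^[A-Za-z]+$", col))
  digits <- utf8ToInt(toupper(col)) - utf8ToInt("A") + 1L
  as.integer(sum(digits * 26^(rev(seq_along(digits)) - 1)))
}

# Hypothetical check, NOT notame's implementation: are all values inside a
# length-two range such as mz_limits or rt_limits? (Assumes inclusive limits.)
within_limits <- function(x, limits) {
  stopifnot(is.numeric(limits), length(limits) == 2)
  all(x >= min(limits) & x <= max(limits), na.rm = TRUE)
}

excel_col_to_int("H")                          # 8
excel_col_to_int("BF")                         # 58
within_limits(c(150.2, 480.9), c(10, 2000))    # TRUE
within_limits(c(0.5, 12.3), c(0, 10))          # FALSE
```

In the Examples section, corner_column = "H" is therefore column 8, and the sample columns I to BF reported in its log are columns 9 to 58.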
Value
A list of three data frames:
- exprs: the actual abundances, size features x samples
- pheno_data: sample information, size sample info variables x samples
- feature_data: information about the features, size features x feature info columns
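Because exprs and feature_data both have one row per feature, a quick sanity check on the returned list is that their row counts agree. The sketch below runs such a check on a hand-made toy list with the same element names; `check_parts` and the toy values are hypothetical and only illustrate the structure described above.

```r
# Hypothetical sanity check, NOT part of notame: the returned list should have
# the three documented elements, and exprs and feature_data should describe
# the same number of features (rows).
check_parts <- function(parts) {
  stopifnot(setequal(names(parts), c("exprs", "pheno_data", "feature_data")))
  nrow(parts$exprs) == nrow(parts$feature_data)
}

# Toy list shaped like the description above (values are made up)
toy <- list(
  exprs = data.frame(S1 = c(1.5, 2.0, 0.3), S2 = c(1.1, 2.4, 0.2)),
  pheno_data = data.frame(S1 = c("1", "Sample"), S2 = c("2", "QC")),
  feature_data = data.frame(Feature_ID = c("F1", "F2", "F3"),
                            Average_Mz = c(150.1, 210.3, 480.9))
)
check_parts(toy)  # TRUE
```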
Details
If skip_checks = FALSE, read_from_excel attempts to fix the data as per fix_object and then checks it. If skip_checks = TRUE, parameters for fix_object are ignored.
Examples
data <- read_from_excel(
  file = system.file("extdata", "example_set.xlsx", package = "notame"),
  sheet = 1, corner_row = 11, corner_column = "H",
  split_by = c("Column", "Ion_mode"))
#> INFO [2025-06-23 22:38:12] Corner detected correctly at row 11, column H
#> INFO [2025-06-23 22:38:12]
#> Extracting sample information from rows 1 to 11 and columns I to BF
#> INFO [2025-06-23 22:38:12] Replacing spaces in sample information column names with underscores (_)
#> INFO [2025-06-23 22:38:12] Naming the last column of sample information "Datafile"
#> INFO [2025-06-23 22:38:12]
#> Extracting feature information from rows 12 to 91 and columns A to H
#> INFO [2025-06-23 22:38:12]
#> Extracting feature abundances from rows 12 to 91 and columns I to BF
#> INFO [2025-06-23 22:38:12] Pheno data was cleaned
#> INFO [2025-06-23 22:38:12] Feature data was cleaned
#> INFO [2025-06-23 22:38:12]
#> Checking sample information
#> INFO [2025-06-23 22:38:12] Checking 'Injection_order' column in feature data
#> INFO [2025-06-23 22:38:12] Checking 'Sample_ID' column in pheno data
#> INFO [2025-06-23 22:38:12] Checking 'QC' column in feature data
#> INFO [2025-06-23 22:38:12] Checking that feature abundances only contain numeric values
#> INFO [2025-06-23 22:38:12]
#> Checking feature information
#> INFO [2025-06-23 22:38:12] Checking that feature IDs are unique and not storedas numbers
#> INFO [2025-06-23 22:38:12] Checking that m/z and retention time values are reasonable.
#> INFO [2025-06-23 22:38:12] Identified m/z column Average_Mz and retention time column Average_Rt_min
#> INFO [2025-06-23 22:38:12] Identified m/z column Average_Mz and retention time column Average_Rt_min
#> INFO [2025-06-23 22:38:12] Checking that feature data includes a 'Split' column
#> INFO [2025-06-23 22:38:12] Checking that feature data includes a 'Flag' column
modes <- construct_metabosets(exprs = data$exprs,
  pheno_data = data$pheno_data, feature_data = data$feature_data,
  group_col = "Group")
#> INFO [2025-06-23 22:38:12]
#> Checking feature information
#> INFO [2025-06-23 22:38:12] Checking that feature IDs are unique and not storedas numbers
#> INFO [2025-06-23 22:38:12] Checking that feature data includes a 'Split' column
#> INFO [2025-06-23 22:38:12] Checking that feature data includes a 'Flag' column
#> INFO [2025-06-23 22:38:12] Checking that feature abundances only contain numeric values
#> INFO [2025-06-23 22:38:12] Setting row and column names of exprs based on feature and pheno data