Metabolite Quality Control — quality

This function is a wrapper function that performs the key quality controls steps on a metabolomics data set. Key principles: 1. keep the source underlying data as it is 2. copy the source data to a new data layer called qcing for processing 3. build an exclusion list, accumulating codes for exclusion reasons 4. make any adjustments needed in the destination copy of the data, flag these in the exclusion list 5. copy the final result to a data layer called post_qc 6. return the Metabolites object with the newly populated data layers

Usage

quality_control(
  metaboprep,
  source_layer = "input",
  sample_missingness = 0.5,
  feature_missingness = 0.5,
  total_peak_area_sd = 5,
  outlier_udist = 5,
  outlier_treatment = "leave_be",
  winsorize_quantile = 1,
  tree_cut_height = 0.5,
  pc_outlier_sd = 5,
  sample_ids = NULL,
  feature_ids = NULL,
  features_exclude_but_keep = NULL
)

Arguments

metaboprep: an object of class Metabolites
source_layer: character, the data layer to summarise
sample_missingness: numeric 0-1, percentage of data missingness which should prompt exclusion of a sample
feature_missingness: numeric 0-1, percentage of data missingness which should prompt exclusion of a feature
total_peak_area_sd: numeric, number of TPA SD after which a sample would be excluded
outlier_udist: the unit distance in SD or IQR from the mean or median estimate, respectively outliers are identified at. Default value is 5.
outlier_treatment: character, how to handle outlier data values - options 'leave_be', 'turn_NA', or 'winsorize'
winsorize_quantile: numeric, quantile to winsorize to, only relevant if 'outlier_treatment'='winsorize'
tree_cut_height: numeric, the threshold for feature independence in hierarchical clustering. Default is 0.5.
pc_outlier_sd: numeric, number of PCA SD after which a sample would be excluded
sample_ids: character, vector of sample ids to work with
feature_ids: character, vector of feature ids to work with
features_exclude_but_keep: character, vector of feature id indicating features to exclude from the sample and PCA summary analysis but keep in the data, OR a name of a logical column in the features data indicating the same