Title: | Plan and Apply Chained Preprocessing Operations on Spectra |
---|---|
Description: | Schedule and perform common spectroscopic signal processing (preprocessing) methods using a recipe-style syntax. Combine different operations in sequence. |
Authors: | Philipp Baumann [aut, cre] |
Maintainer: | Philipp Baumann <[email protected]> |
License: | MIT + file LICENSE |
Version: | 0.3.5 |
Built: | 2025-02-22 03:55:04 UTC |
Source: | https://github.com/spectral-cockpit/specprepper |
The function can be applied to spectral collections, dt_prep_sets. The list-column id_labels, with lists of data.tables each containing a column named group, must be present. See also ids_apply().
colmean_group_apply(dt_prep_sets, append_rows = FALSE)
dt_prep_sets |
A standardized |
append_rows |
logical whether to append the newly processed rows, when
|
A spectral collection typically represents the outcome of one or more specific preprocessing steps, together with the methods and any associated parameters used. colmean_group_apply() only accepts collections that follow the structural conventions of dt_prep_sets. It requires an id_labels list-column with a group column specifying the labels used for aggregation in each data.table element (one for each collection). Label columns such as row or id that were present before will be removed because they are assumed to be aggregated.
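The aggregation by group labels can be illustrated with a minimal base R sketch. It is not the package's implementation; the toy matrix spc and the vector group (standing in for id_labels$group) are made-up assumptions.

```r
# Toy spectra: 4 rows (spectra) x 3 columns (spectral variables); made-up values.
spc <- matrix(
  c(1, 2, 3,
    3, 4, 5,
    10, 20, 30,
    30, 40, 50),
  nrow = 4, byrow = TRUE,
  dimnames = list(NULL, c("wn_1000", "wn_1002", "wn_1004"))
)

# Hypothetical group labels, one per spectrum (plays the role of id_labels$group).
group <- c("a", "a", "b", "b")

# rowsum() adds up the rows within each group; dividing each group's sums
# by its number of spectra yields the group-wise column means.
group_sums <- rowsum(spc, group = group)
group_n <- as.vector(table(group)[rownames(group_sums)])
spc_mean <- group_sums / group_n

spc_mean
```

The result has one row per group instead of one row per spectrum, which is why per-spectrum labels such as row or id no longer apply after aggregation.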
A "data.table" with as many rows as spectral collections. It contains at least the following columns:
prep_set: appends "-mean_group" to the existing character vector elements of the input data.
prep_label: appends "mean_group" to the existing character vector elements of the input data.
prep_params: a list-column with 1-row data.tables. Each data.table has a new column mean_group, which contains the string "id_labels$group".
id_labels: this list-column now only contains a sliced version of the group column, which corresponds to the new rows of the aggregated column means in spc_prep.
spc_prep: a list-column with data.tables that contain aggregated means of spectra by group for each spectral collection (row of dt_prep_sets).
Adds labels to rows of all individual spectra in collections. Such labels are required for subsequent processing functions that aggregate spectral collections by group, for example colmean_group_apply(). It can also be used to initialize a single spectral collection with labels when inputting a single matrix, data frame or data.table.
ids_apply(X, dt_prep_sets = NULL, vec_row, vec_id, vec_group)
X |
|
dt_prep_sets |
A standardized |
vec_row |
atomic vector with row labels; needs to have same length
as |
vec_id |
atomic vector with id labels, needs to have same length
as |
vec_group |
atomic vector with group labels; needs to have same length
as |
If X is specified:
A one-row "data.table" with the following columns:
prep_set: "init_ids",
prep_label: "prep_label"
prep_params: list-column of length 1 with "data.table" containing init_ids = NA
id_labels: list-column (repeated across rows) with "data.table" containing columns with labels: row (from vec_row), id (from vec_id), and group (from vec_group).
If dt_prep_sets is specified:
A "data.table" with as many rows as spectral collections. A spectral collection typically represents the outcome of one or more specific preprocessing steps, together with the methods and any associated parameters used. Specifically, it augments the input dt_prep_sets and outputs the following (list-)columns:
prep_set: appends "-init_ids" to the input string that states which main preprocessing steps were done previously.
prep_label: appends "-init_ids" to the input string that states what was done, with abbreviations of methods, in previous steps.
prep_params: augments each data.table element in the list-column with a new non-specific column init_ids = NA (indicating a new label column but no direct effect on the processed spectra).
id_labels: new list-column that contains a set of labels that applies to all spectral collections nested within the respective rows of the dt_prep_sets input. Each data.table in the list contains the label columns row (from vec_row), id (from vec_id), and group (from vec_group).
spec_prep: unmodified list-column with sets of already prepared, processed spectra. Each element is a data.table whose rows correspond to the row labels in id_labels.
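The structure of a label table can be sketched in base R as follows. This is only an illustration of the row, id and group columns described above, not a call to ids_apply(); the label values are made up, and it assumes that the length requirement on the label vectors refers to the number of spectra (rows).

```r
# Toy spectra matrix: 4 spectra (rows) x 3 spectral variables; made-up values.
X <- matrix(seq_len(12), nrow = 4)

# Hypothetical label vectors, one entry per spectrum.
vec_row <- 1:4
vec_id <- c("s1", "s1", "s2", "s2")
vec_group <- c("topsoil", "topsoil", "subsoil", "subsoil")

# Guard against mismatched lengths before building the label table
# (assumption: labels must align one-to-one with the rows of the spectra).
stopifnot(
  length(vec_row) == nrow(X),
  length(vec_id) == nrow(X),
  length(vec_group) == nrow(X)
)

# One label row per spectrum, with the column names used in id_labels.
id_labels <- data.frame(row = vec_row, id = vec_id, group = vec_group)
id_labels
```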
Apply Savitzky-Golay filtering at variable combinations of parameter sets for set(s) of spectra.
sg_apply( X, dt_sg_plan, dt_prep_sets = NULL, nest_params = TRUE, append_rows = FALSE )
X |
|
dt_sg_plan |
A standardized
|
dt_prep_sets |
A standardized |
nest_params |
logical whether to nest the Savitzky-Golay parameters in
a |
append_rows |
logical whether to append the newly processed rows, when
|
Savitzky-Golay transformation (moving-window polynomial least squares) prior to modeling can help to reduce noise and enhance signals in spectra. This can allow models to extract parsimonious predictable information from spectra for more accurate estimation. However, this process requires empirical optimization and fine-tuning of the parameters that control the nature and degree of smoothing, and hence noise removal, for the calibration task at hand, which is often not done. For example, systematically varying the size of the smoothing window controls the amount of information filtered and the potential artefacts created. Nonetheless, non-stationary noise (as opposed to white Gaussian noise) and informative fluctuations in chemically driven spectral dynamics (e.g. slope changes, different absorption peak widths and compositional complexity) can make a simple non-recursive application of the original Savitzky-Golay algorithm less appropriate for filtering noise.
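The moving-window least-squares idea can be sketched in base R. This is a minimal illustration and not how sg_apply() is implemented; the helper names sg_weights() and sg_smooth() and the arguments w (odd window width) and p (polynomial order) are hypothetical.

```r
# Filter weights: fit a polynomial of order p by least squares to w equally
# spaced points and keep the weights that give the fitted value at the center.
sg_weights <- function(w = 5, p = 2) {
  stopifnot(w %% 2 == 1, p < w)
  half <- (w - 1) / 2
  z <- seq(-half, half)
  # Design matrix with columns z^0, z^1, ..., z^p
  A <- outer(z, 0:p, `^`)
  # First row of (A'A)^-1 A' maps window values to the fitted intercept,
  # i.e. the polynomial evaluated at the window center (z = 0).
  solve(crossprod(A), t(A))[1, ]
}

# Smooth a single spectrum y; the (w - 1) / 2 points at each end are left as NA
# because a full window is not available there.
sg_smooth <- function(y, w = 5, p = 2) {
  stopifnot(length(y) >= w)
  wts <- sg_weights(w, p)
  half <- (w - 1) / 2
  out <- rep(NA_real_, length(y))
  for (i in (half + 1):(length(y) - half)) {
    out[i] <- sum(wts * y[(i - half):(i + half)])
  }
  out
}

# For w = 5 and p = 2 the weights are c(-3, 12, 17, 12, -3) / 35.
sg_weights(5, 2)

# Toy noisy signal smoothed with two window sizes.
set.seed(1)
x <- seq(0, 2 * pi, length.out = 50)
y_noisy <- sin(x) + rnorm(50, sd = 0.1)
y_w5 <- sg_smooth(y_noisy, w = 5, p = 2)
y_w11 <- sg_smooth(y_noisy, w = 11, p = 2)
```

At a fixed polynomial order, a wider window averages over more points and therefore smooths more strongly, which is the trade-off between noise removal and loss of narrow features that the paragraph above describes.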
Templating code for sequential and/or recursive branching of preprocessing methods with variation of their parameters, if applicable, can be repetitive and cumbersome. This is where the specprep package, with its combinatory planning and application tools, comes in.
The combinatory power of the sg_apply() function stems from the ability to map Savitzky-Golay both over row-wise sets of parametrizations (see subsection Savitzky-Golay Plan) and over previous preprocessing rounds that yielded set(s) of (differently) processed spectra to be processed again (see section Set(s) of Previously Processed Spectra). Since data.tables are structured consistently across the specprep::*_apply type of functions, their inputs and outputs are interoperable. This allows flexibility in applying combinations of preprocessing methods.
dt_sg_plan is most conveniently built with sg_plan(). It parametrizes Savitzky-Golay preprocessing scheduled on either X or on all sets of already processed spectra contained in dt_prep_sets. Each row lays out one preprocessing step, linking the following data across columns:
prep_set: this string identifies the name of the general preprocessing method that is chained to sets of spectra.
tbd
data.table with the following (list)columns:by #to be filled
Philipp Baumann
Make a full-factorial combination of Savitzky-Golay parameters.
sg_make_plan(param_list)
param_list |
A list of |
data.frame
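A full-factorial combination of parameters can be sketched with base R's expand.grid(). This is an illustration of the concept only, not the body of sg_make_plan(); the parameter names m, p and w (derivative order, polynomial order, window size) and their values are assumptions chosen for the example.

```r
# Hypothetical Savitzky-Golay parameter candidates.
param_list <- list(
  m = c(0, 1),    # derivative order
  p = c(2, 3),    # polynomial order
  w = c(11, 21)   # window size (odd)
)

# One row per combination of all parameter values.
plan <- expand.grid(param_list, KEEP.OUT.ATTRS = FALSE)

nrow(plan)  # 2 * 2 * 2 = 8 combinations
plan
```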
Philipp Baumann
Compute the standard normal variate for collections of spectra
snv_apply(X, dt_prep_sets = NULL, append_rows = FALSE)
X |
|
dt_prep_sets |
A standardized |
append_rows |
logical whether to append the newly processed rows, when
|
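The standard normal variate transformation itself can be written in a few lines of base R: each spectrum (row) is centered by its own mean and scaled by its own standard deviation. The sketch below is not the code behind snv_apply(); the helper name snv() and the toy matrix are hypothetical.

```r
# Row-wise standard normal variate: center and scale each spectrum separately.
snv <- function(X) {
  X <- as.matrix(X)
  # Subtracting/dividing by a vector of length nrow(X) recycles down the
  # columns, so every element in row i uses the mean and sd of row i.
  (X - rowMeans(X)) / apply(X, 1, sd)
}

# Toy spectra: the second spectrum is the first one multiplied by 2.
X <- matrix(
  c(1, 2, 3, 4,
    2, 4, 6, 8),
  nrow = 2, byrow = TRUE
)

X_snv <- snv(X)
X_snv
```

Because the second toy spectrum differs from the first only by a multiplicative factor, both rows are identical after the transformation, which shows how SNV removes multiplicative scaling differences between spectra.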