| Title: | Reshape Disorganised Messy Data |
|---|---|
| Description: | Helps the user to build and register schema descriptions of disorganised (messy) tables. Disorganised tables are tables that are not in a topologically coherent form, where packages such as 'tidyr' could be used for reshaping. The schema description documents the arrangement of input tables and is used to reshape them into a standardised (tidy) output format. |
| Authors: | Steffen Ehrmann [aut, cre] (ORCID: <https://orcid.org/0000-0002-2958-0796>), Tsvetelina Tomova [ctb], Carsten Meyer [aut] (ORCID: <https://orcid.org/0000-0003-3927-5856>), Abdualmaged Alhemiary [ctb], Amelie Haas [ctb], Annika Ertel [ctb], Arne Rümmler [ctb] (ORCID: <https://orcid.org/0000-0001-8637-9071>), Caroline Busse [ctb] |
| Maintainer: | Steffen Ehrmann <[email protected]> |
| License: | GPL-3 |
| Version: | 0.6.0 |
| Built: | 2026-05-16 11:57:53 UTC |
| Source: | https://github.com/luckinet/tabshiftr |
Find the location of a variable not based on its columns/rows, but based on a regular expression or function
.find(fun = NULL, pattern = NULL, col = NULL, row = NULL, invert = FALSE, relative = FALSE)
Arguments: fun, pattern, col, row, invert, relative
This function is essentially a wildcard for cases where columns or rows are not known in advance and have to be determined on the fly. This can be very helpful when several tables contain the same variables in slightly different arrangements.
the index values where the target was found.
The first step in using any schema is validating
it via the function validateSchema. This happens by default
in reorganise, but can also be done manually, for example
when debugging complicated schema descriptions.
If that function encounters a schema that looks up columns or
rows on the fly via .find, it combines all cells of columns and all
cells of rows into one character string and matches the regular expression
or function against those. Columns/rows that have a match are returned as the
respective column/row value.
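The matching idea described above can be sketched in base R. This is only an illustration of the principle and not the implementation used by tabshiftr; the helpers `find_cols` and `find_rows` and the small table `tab` are hypothetical and exist only for this example.

```r
# Hypothetical helpers (not part of tabshiftr): collapse the cells of each
# column (or row) into one string and test that string against a pattern.
find_cols <- function(tab, pattern) {
  collapsed <- vapply(tab, function(x) paste(x, collapse = " "), character(1))
  unname(which(grepl(pattern, collapsed)))
}

find_rows <- function(tab, pattern) {
  collapsed <- apply(as.matrix(tab), 1, function(x) paste(x, collapse = " "))
  unname(which(grepl(pattern, collapsed)))
}

# A made-up table for illustration
tab <- data.frame(
  X1 = c("unit 1", "unit 1", "unit 2"),
  X2 = c("soybean", "maize", "soybean"),
  X3 = c("1111", "1112", "2111"),
  stringsAsFactors = FALSE
)

find_cols(tab, "unit")    # column 1 contains "unit"
find_rows(tab, "maize")   # row 2 contains "maize"
```

Collapsing before matching means that a single hit anywhere in a column or row is enough to select it, which is why a loose pattern can select more columns or rows than intended.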
# use regular expressions to find cell positions
(input <- tabs2shift$clusters_messy)

schema <- setCluster(id = "territories",
                     left = .find(pattern = "comm*"),
                     top = .find(pattern = "unit")) %>%
  setIDVar(name = "territories", columns = c(1, 1, 4), rows = c(2, 9, 9)) %>%
  setIDVar(name = "year", columns = 4, rows = c(3:6), distinct = TRUE) %>%
  setIDVar(name = "commodities", columns = c(1, 1, 4)) %>%
  setObsVar(name = "harvested", columns = c(2, 2, 5)) %>%
  setObsVar(name = "production", columns = c(3, 3, 6))

schema
validateSchema(schema = schema, input = input)

# use a function to find rows
(input <- tabs2shift$messy_rows)

schema <- setFilter(rows = .find(fun = is.numeric, col = 1, invert = TRUE)) %>%
  setIDVar(name = "territories", columns = 1) %>%
  setIDVar(name = "year", columns = 2) %>%
  setIDVar(name = "commodities", columns = 3) %>%
  setObsVar(name = "harvested", columns = 5) %>%
  setObsVar(name = "production", columns = 6)

reorganise(schema = schema, input = input)
Summarise groups of rows or columns
.sum(..., character = NULL, numeric = NULL, fill = NULL)
Arguments: ..., character, numeric, fill
By default character values are summarised with the function
paste0(na.omit(x), collapse = "-/-") and numeric values with
the function sum(x, na.rm = TRUE). To avoid un-intuitive behavior,
it is wisest to explicitly specify how all exceptions, such as NA-values,
shall be handled and thus to provide a new function.
the index values where the target was found.
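The two default summary functions quoted above can be tried out in base R to see how they treat NA values. The wrapper names `sum_chr`, `sum_num` and `sum_strict` are hypothetical and only used for this illustration.

```r
# The default summary functions as quoted in the description above
sum_chr <- function(x) paste0(na.omit(x), collapse = "-/-")
sum_num <- function(x) sum(x, na.rm = TRUE)

sum_chr(c("unit 1", NA, "unit 2"))   # "unit 1-/-unit 2"
sum_num(c(10, NA, 5))                # 15

# Edge case: a group that consists only of NA values
sum_chr(c(NA_character_, NA_character_))   # "" (empty string, not NA)
sum_num(c(NA_real_, NA_real_))             # 0 (not NA)

# A stricter alternative (an assumption for illustration) that keeps NA
# when the whole group is missing
sum_strict <- function(x) if (all(is.na(x))) NA_real_ else sum(x, na.rm = TRUE)
sum_strict(c(NA_real_, NA_real_))          # NA
```

A function such as `sum_strict` could presumably be supplied through the `numeric` argument of `.sum`; this follows from the advice above to provide a new function, but is not verified here.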
This function extracts the cluster variable from a table by applying a schema description to it.
getClusterVar(schema = NULL, input = NULL)
Arguments: schema, input
a list per cluster with values of the cluster variable
input <- tabs2shift$clusters_nested

schema <- setCluster(id = "sublevel",
                     group = "territories", member = c(1, 1, 2),
                     left = 1, top = c(3, 8, 15)) %>%
  setIDVar(name = "territories", columns = 1, rows = c(2, 14)) %>%
  setIDVar(name = "sublevel", columns = 1, rows = c(3, 8, 15)) %>%
  setIDVar(name = "year", columns = 7) %>%
  setIDVar(name = "commodities", columns = 2) %>%
  setObsVar(name = "harvested", columns = 5) %>%
  setObsVar(name = "production", columns = 6)

validateSchema(schema = schema, input = input) %>%
  getClusterVar(input = input)
This function extracts the cluster grouping variable from a table by applying a schema description to it.
getGroupVar(schema = NULL, input = NULL)
Arguments: schema, input
a list per cluster with values of the grouping variable
input <- tabs2shift$clusters_nested

schema <- setCluster(id = "sublevel",
                     group = "territories", member = c(1, 1, 2),
                     left = 1, top = c(3, 8, 15)) %>%
  setIDVar(name = "territories", columns = 1, rows = c(2, 14)) %>%
  setIDVar(name = "sublevel", columns = 1, rows = c(3, 8, 15)) %>%
  setIDVar(name = "year", columns = 7) %>%
  setIDVar(name = "commodities", columns = 2) %>%
  setObsVar(name = "harvested", columns = 5) %>%
  setObsVar(name = "production", columns = 6)

validateSchema(schema = schema, input = input) %>%
  getGroupVar(input = input)
This function extracts the identifying variables from a table by applying a schema description to it.
getIDVars(schema = NULL, input = NULL)
Arguments: schema, input
a list per cluster with values of the identifying variables
input <- tabs2shift$clusters_nested

schema <- setCluster(id = "sublevel",
                     group = "territories", member = c(1, 1, 2),
                     left = 1, top = c(3, 8, 15)) %>%
  setIDVar(name = "territories", columns = 1, rows = c(2, 14)) %>%
  setIDVar(name = "sublevel", columns = 1, rows = c(3, 8, 15)) %>%
  setIDVar(name = "year", columns = 7) %>%
  setIDVar(name = "commodities", columns = 2) %>%
  setObsVar(name = "harvested", columns = 5) %>%
  setObsVar(name = "production", columns = 6)

validateSchema(schema = schema, input = input) %>%
  getIDVars(input = input)
This function extracts the observed variables from a table by applying a schema description to it.
getObsVars(schema = NULL, input = NULL)
Arguments: schema, input
a list per cluster with values of the observed variables
input <- tabs2shift$clusters_nested

schema <- setCluster(id = "sublevel",
                     group = "territories", member = c(1, 1, 2),
                     left = 1, top = c(3, 8, 15)) %>%
  setIDVar(name = "territories", columns = 1, rows = c(2, 14)) %>%
  setIDVar(name = "sublevel", columns = 1, rows = c(3, 8, 15)) %>%
  setIDVar(name = "year", columns = 7) %>%
  setIDVar(name = "commodities", columns = 2) %>%
  setObsVar(name = "harvested", columns = 5) %>%
  setObsVar(name = "production", columns = 6)

validateSchema(schema = schema, input = input) %>%
  getObsVars(input = input)
This function takes a disorganised messy table and rearranges columns and rows into a tidy table based on a schema description.
reorganise(input = NULL, schema = NULL)
Arguments: input, schema
A (tidy) table which is the result of reorganising input based
on schema.
# a rather disorganised table with messy clusters and a distinct variable
(input <- tabs2shift$clusters_messy)

# put together schema description by ...
# ... identifying cluster positions
schema <- setCluster(id = "territories",
                     left = c(1, 1, 4), top = c(1, 8, 8))

# ... specifying the cluster ID as id variable (obligatory)
schema <- schema %>%
  setIDVar(name = "territories", columns = c(1, 1, 4), rows = c(2, 9, 9))

# ... specifying the distinct variable (explicit position)
schema <- schema %>%
  setIDVar(name = "year", columns = 4, rows = c(3:6), distinct = TRUE)

# ... specifying a tidy variable (by giving the column values)
schema <- schema %>%
  setIDVar(name = "commodities", columns = c(1, 1, 4))

# ... identifying the (tidy) observed variables
schema <- schema %>%
  setObsVar(name = "harvested", columns = c(2, 2, 5)) %>%
  setObsVar(name = "production", columns = c(3, 3, 6))

# get the tidy output
reorganise(input, schema)
Opens a Shiny app in your browser that guides you through a visual decision
tree to describe the arrangement of your table, then generates the
corresponding schema R code.
schema_builder(input = NULL)
Arguments: input
Requires the shiny and DT packages. Install them with
install.packages(c("shiny", "DT")).
Invisibly returns the evaluated schema object when the user
clicks Finish, or NULL if the app is closed without finishing.
## Not run:
schema_builder()
schema_builder(input = tabs2shift$clusters_horizontal)
## End(Not run)
Default template of a schema description
schema_default
The object of class schema describes at which position in a
table which information can be found. It contains the four slots
clusters, format, filter and variables.
The default schema description contains all slots and fields that are
required by default and identifying and observed variables are added to it
into the variables slot.
schema class (S4) and its methods
A schema stores the information of where which information is stored
in a table of data.
cluster [list(1)]: description of clusters in the table.
format [list(1)]: description of the table format.
variables [named list(.)]: description of identifying and observed variables.
This section outlines the currently recommended strategy for setting up schema descriptions. For example tables and the respective schemas, see the vignette.
Variables: Clarify which are the identifying variables and which are the observed variables. Make sure not to mistake a listed observed variable for an identifying variable.
Clusters: Determine whether there are clusters and if so, find
the origin (top left cell) of each cluster and provide the required
information in setCluster(top = ..., left =
...). It is advised to treat a table that contains meta-data in the top
rows as a cluster, as this is often the case with implicit variables. All
variables need to be specified in each cluster (in case clusters are all
organised in the same arrangement), or relative = TRUE can be used.
Data may be organised into clusters a) whenever a set of variables occurs
more than once in the same table, nested into another variable, or b) when
the data are organised into separate spreadsheets or files according to one
of the variables (depending on the context, these issues can also be solved
differently). In both cases the variable responsible for clustering (the
cluster ID) can be either an identifying variable, or a categorical
observed variable:
in case the cluster ID is an identifying variable, provide its name
in setCluster(id = ...) and specify it as an
identifying variable (setIDVar)
in case it is an observed variable, simply provide
setCluster(..., id = "observed").
Meta-data: If needed, provide information about the table format
(setFormat).
Identifying variables: Determine the following:
is the variable available at all? This is particularly important when
the data are split up into tables that are in spreadsheets or files. Often
the variable that splits up the data (and thus identifies the clusters) is
not explicitly available in the table anymore. In such a case, provide the
value in setIDVar(..., value = ...).
all columns in which the variable values sit.
in case the variable is in several columns, determine additionally the row in which its values sit. In this case, the values will look like they are part of a header.
in case the variable must be split off of another column, provide a
regular expression that results in the target subset via
setIDVar(..., split = ...).
in case the variable is distinct from the main table, provide the
explicit (non-relative) position and set
setIDVar(..., distinct = TRUE).
Observed variables: Determine the following:
all columns in which the values of the variable sit.
the conversion factor.
in case the variable is not tidy, go through the following cases one after the other:
in case the variable is nested in a wide identifying variable, determine in addition to the columns in which the values sit also the rows in which the variable name sits.
in case the names of the variable are given as a value of an
identifying variable, give the column name as
setObsVar(..., key = ...), together with the name
of the respective observed variable (as it appears in the table) in
value.
in case the name of the variable is the ID of clusters, specify
setObsVar(..., key = "cluster", value = ...),
where value has the cluster number the variable refers to.
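For the identifying-variable case above in which a variable must be split off from another column, it can help to test the regular expression in base R before putting it into the schema. The sketch below assumes that the regular expression should match exactly the part of the cell that is to be kept; the cell contents and the pattern are made up for illustration.

```r
# Made-up cell contents in which two variables share one column
cells <- c("unit 1, soybean", "unit 1, maize", "unit 2, soybean")

# Candidate regular expression that should result in the territory only
pattern <- "unit [0-9]+"

# Extract the first match of the pattern from each cell
territories <- regmatches(cells, regexpr(pattern, cells))
territories   # "unit 1" "unit 1" "unit 2"
```

Once the pattern isolates the intended values, it can be passed on as setIDVar(..., split = ...).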
There is hardly any limit to how data can be arranged in a spreadsheet, apart
from the apparent organisation into a lattice of cells. However, it is often
the case that data are gathered into topologically coherent chunks. Those
chunks are what is called 'cluster' in tabshiftr.
setCluster(schema = NULL, id = NULL, group = NULL, member = NULL, left = NULL, top = NULL, width = NULL, height = NULL)
Arguments: schema, id, group, member, left, top, width, height
Please also take a look at the currently suggested strategy to set up a schema description.
An object of class schema.
Other functions to describe table arrangement:
setFilter(),
setFormat(),
setGroups(),
setIDVar(),
setObsVar()
# please check the vignette for examples
This function allows specifying additional rules to filter certain rows.
setFilter(schema = NULL, rows = NULL, columns = NULL, invert = FALSE, clusters = TRUE, operator = NULL)
Arguments: schema, rows, columns, invert, clusters, operator
An object of class schema.
Other functions to describe table arrangement:
setCluster(),
setFormat(),
setGroups(),
setIDVar(),
setObsVar()
(input <- tabs2shift$messy_rows)

# select rows where there is 'unit 2' in column 1 or 'year 2' in column 2
schema <- setFilter(rows = .find(pattern = "unit 2", col = 1)) %>%
  setFilter(rows = .find(pattern = "year 2", col = 2), operator = `|`) %>%
  setIDVar(name = "territories", columns = 1) %>%
  setIDVar(name = "year", columns = 2) %>%
  setIDVar(name = "commodities", columns = 3) %>%
  setObsVar(name = "harvested", columns = 5) %>%
  setObsVar(name = "production", columns = 6)

reorganise(schema = schema, input = input)
Any table makes some assumptions about the data, but they are mostly not explicitly recorded in the commonly available table format. This concerns, for example, the symbol(s) that signal "not available" values or the symbol that is used as decimal sign.
setFormat(schema = NULL, header = FALSE, decimal = NULL, thousand = NULL, na_values = NULL, zero_values = NULL, flags = NULL)
Arguments: schema, header, decimal, thousand, na_values, zero_values, flags
Please also take a look at the currently suggested strategy to set up a schema description.
An object of class schema.
Other functions to describe table arrangement:
setCluster(),
setFilter(),
setGroups(),
setIDVar(),
setObsVar()
# please check the vignette for examples
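To see why the implicit format assumptions of a table matter, the following base R sketch parses numbers that use a decimal comma, a dot as thousands separator and a custom "not available" symbol. It only illustrates the kind of information setFormat records and is not how tabshiftr processes it; the helper `parse_number` is hypothetical, and its argument names merely mirror those of setFormat.

```r
# Hypothetical helper (not part of tabshiftr): convert character cells to
# numbers, given the symbols used in the table.
parse_number <- function(x, decimal = ".", thousand = NULL, na_values = NULL) {
  x[x %in% na_values] <- NA_character_                        # mark "not available" symbols
  if (!is.null(thousand)) x <- gsub(thousand, "", x, fixed = TRUE)  # drop thousands separator
  if (decimal != ".") x <- sub(decimal, ".", x, fixed = TRUE)       # normalise decimal sign
  as.numeric(x)
}

cells <- c("1.234,5", "12,25", "n/a")
values <- parse_number(cells, decimal = ",", thousand = ".", na_values = "n/a")
values   # 1234.50 12.25 NA
```

Without the explicit symbols, "1.234,5" could not be converted at all, and "n/a" would trigger a coercion warning instead of being treated as a deliberate missing value.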
This function allows setting groups of rows, columns or clusters that shall be summarised.
setGroups(schema = NULL, rows = NULL, columns = NULL)
Arguments: schema, rows, columns
An object of class schema.
Other functions to describe table arrangement:
setCluster(),
setFilter(),
setFormat(),
setIDVar(),
setObsVar()
# please check the vignette for examples
Identifying variables are those variables that describe the (qualitative)
properties that make each observation (as described by the
observed variables) unique.
setIDVar(schema = NULL, name = NULL, type = "character", value = NULL, columns = NULL, rows = NULL, split = NULL, merge = NULL, distinct = FALSE)
Arguments: schema, name, type, value, columns, rows, split, merge, distinct
Please also take a look at the currently suggested strategy to set up a schema description.
An object of class schema.
Other functions to describe table arrangement:
setCluster(),
setFilter(),
setFormat(),
setGroups(),
setObsVar()
# please check the vignette for examples
Observed variables are those variables that contain the (quantitative)
observed/measured values of each unique unit (as described by the
identifying variables). There may be several of them
and in a tidy table they'd be recorded as separate columns.
setObsVar(schema = NULL, name = NULL, type = "numeric", columns = NULL, top = NULL, distinct = FALSE, factor = 1, key = NULL, value = NULL)
Arguments: schema, name, type, columns, top, distinct, factor, key, value
Please also take a look at the currently suggested strategy to set up a schema description.
An object of class schema.
Other functions to describe table arrangement:
setCluster(),
setFilter(),
setFormat(),
setGroups(),
setIDVar()
# please check the vignette for examples
schema
Print the schema
## S4 method for signature 'schema'
show(object)
Arguments: object
List of table types
tabs2shift
The object of class list contains 20 different types of tables
that are used throughout the unit-tests and examples/vignette.
The tabshiftr package helps the user to build and register schema descriptions of disorganised (messy) tables. Disorganised tables are tables that are not in a topologically coherent form, where packages such as tidyr could be used for reshaping. The schema description documents the arrangement of input tables and is used to reshape them into a standardised (tidy) output format.
Maintainer, Author: Steffen Ehrmann [email protected]
Package website: https://luckinet.github.io/tabshiftr/
Github project: https://github.com/luckinet/tabshiftr
Report bugs: https://github.com/luckinet/tabshiftr/issues
This function groups rows, splices the header into the table and fills missing values where they should not exist.
validateInput(schema = NULL, input = NULL)
Arguments: schema, input
validateInput is called automatically by reorganise and
does not usually need to be called directly. It performs two pre-processing
steps on the input table before variable extraction begins:
If setFormat(header = TRUE) was used, the column names that
were consumed by R when reading the file are spliced back into the table as
row 1. This makes row numbers stable and consistent with the schema
description.
If setGroups was used, the specified groups of rows are
summarised into single rows according to the aggregation functions provided
to .sum. Character columns are collapsed with
paste0(na.omit(x), collapse = " ") by default; numeric columns are
summed. Missing values within a group can be filled before aggregation by
passing a fill direction to .sum.
a table where grouped rows are summarised and, if applicable, the header row is spliced back in as row 1.
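The header-splicing step described above can be pictured with a few lines of base R. This is a sketch of the idea only, not the code validateInput runs; the helper `splice_header` and the example table are hypothetical.

```r
# A made-up table as it looks after being read with header = TRUE:
# the first row of the file has been consumed as column names.
tab <- data.frame(
  territories = c("unit 1", "unit 2"),
  year = c("year 1", "year 1"),
  stringsAsFactors = FALSE
)

# Hypothetical helper (not part of tabshiftr): put the column names back
# into the table as row 1 and use generic column names instead.
splice_header <- function(tab) {
  out <- rbind(names(tab), as.matrix(tab))
  out <- as.data.frame(out, stringsAsFactors = FALSE)
  names(out) <- paste0("X", seq_along(out))
  out
}

spliced <- splice_header(tab)
spliced   # row 1 now holds "territories" and "year"
```

After splicing, the first data row of the original file is row 2 of the table, so row numbers in a schema description refer to the file as written rather than to the table as R parsed it.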
# validateInput is called implicitly by reorganise(); the example below shows
# its effect when setGroups is used to collapse pairs of rows before extraction.
(input <- tabs2shift$group_sum)

schema <- setGroups(rows = .sum(c(3, 4))) |>
  setGroups(rows = .sum(c(6, 7))) |>
  setIDVar(name = "territories", columns = 1) |>
  setIDVar(name = "year", columns = 2) |>
  setIDVar(name = "commodities", columns = c(3:6), rows = 2) |>
  setObsVar(name = "harvested", columns = c(3, 4)) |>
  setObsVar(name = "production", columns = c(5, 6))

# inspect the pre-processed table directly
schema_validated <- validateSchema(schema = schema, input = input)
validateInput(schema = schema_validated, input = input)
This function takes a raw schema description and updates values that were
only given as wildcard or implied values. It is automatically called by
reorganise, but can also be used in concert with the getters to debug
a schema.
validateSchema(schema = NULL, input = NULL)
Arguments: schema, input
The core idea of a schema description is that it can be written in a
very generic way, as long as it describes sufficiently where in a table
which variable can be found. One very generic way is to use the
function .find to identify the initially unknown
cell locations of a variable on the fly, for example when it is merely
known that a variable must be in the table, but not where it is.
validateSchema matches a schema with an input table and inserts the
accordingly evaluated positions (of clusters, filters and variables),
adapts some of the meta-data and ensures formal consistency of the schema.
An updated schema description
# build a schema for an already tidy table
(tidyTab <- tabs2shift$tidy)

schema <- setIDVar(name = "territories", col = 1) %>%
  setIDVar(name = "year", col = .find(pattern = "period")) %>%
  setIDVar(name = "commodities", col = 3) %>%
  setObsVar(name = "harvested", col = 5) %>%
  setObsVar(name = "production", col = 6)

# before ...
schema

# ... after
validateSchema(schema = schema, input = tidyTab)