Title: | Reshape Disorganised Messy Data |
---|---|
Description: | Helps the user to build and register schema descriptions of disorganised (messy) tables. Disorganised tables are tables that are not in a topologically coherent form, where packages such as 'tidyr' could be used for reshaping. The schema description documents the arrangement of input tables and is used to reshape them into a standardised (tidy) output format. |
Authors: | Steffen Ehrmann [aut, cre] |
Maintainer: | Steffen Ehrmann <[email protected]> |
License: | GPL-3 |
Version: | 0.5.1 |
Built: | 2025-02-15 05:24:05 UTC |
Source: | https://github.com/luckinet/tabshiftr |
Evaluate .find constructs
.eval_find(input = NULL, col = NULL, row = NULL, clusters = NULL)
.eval_find(input = NULL, col = NULL, row = NULL, clusters = NULL)
input |
[ |
col |
[ |
row |
[ |
clusters |
[ |
the columns or rows of the evaluated position
Evaluate .sum constructs
.eval_sum(input = NULL, groups = NULL, data = NULL)
.eval_sum(input = NULL, groups = NULL, data = NULL)
input |
[ |
groups |
[ |
data |
[ |
the position of the evaluated position
This function is a collection of expectations which ensure that the output of
reorganise
is formally and contentwise correct. It is used in
the tests of this package.
.expect_valid_table( x = NULL, units = 1, variables = NULL, groups = FALSE, flags = FALSE )
.expect_valid_table( x = NULL, units = 1, variables = NULL, groups = FALSE, flags = FALSE )
x |
a table to test. |
units |
the number of units in the output table (from 1 to 3) |
variables |
the variables that should be in the output table (either "harvested" or "production") |
groups |
whether or not groups are in the test table. |
flags |
whether or not flags are in the test table. |
Either an error message of the invalid expectations, or the output of the last successful expectation.
Convenience wrapper around tidyr::fill()
.fill(x = NULL, direction = TRUE)
.fill(x = NULL, direction = TRUE)
x |
[ |
direction |
[ |
Find the location of a variable not based on it's columns/rows, but based on a regular expression or function
.find( fun = NULL, pattern = NULL, col = NULL, row = NULL, invert = FALSE, relative = FALSE )
.find( fun = NULL, pattern = NULL, col = NULL, row = NULL, invert = FALSE, relative = FALSE )
fun |
[ |
pattern |
[ |
col |
[ |
row |
[ |
invert |
[ |
relative |
[ |
This functions is basically a wild-card for when columns or rows are not known ad-hoc, but have to be assigned on the fly. This can be very helpful when several tables contain the same variables, but the arrangement may be slightly different.
the index values where the target was found.
The first step in using any schema is validating
it via the function validateSchema
. This happens by default
in reorganise
, but can also be done manually, for example
when debugging complicated schema descriptions.
In case that function encounters a schema that wants to find columns or
rows on the fly via .find
, it combines all cells of columns and all
cells of rows into one character string and matches the regular expression
or function on those. Columns/rows that have a match are returned as the
respective column/row value.
# use regular expressions to find cell positions (input <- tabs2shift$clusters_messy) schema <- setCluster(id = "territories", left = .find(pattern = "comm*"), top = .find(pattern = "comm*")) %>% setIDVar(name = "territories", columns = c(1, 1, 4), rows = c(2, 9, 9)) %>% setIDVar(name = "year", columns = 4, rows = c(3:6), distinct = TRUE) %>% setIDVar(name = "commodities", columns = c(1, 1, 4)) %>% setObsVar(name = "harvested", columns = c(2, 2, 5)) %>% setObsVar(name = "production", columns = c(3, 3, 6)) schema validateSchema(schema = schema, input = input) # use a function to find rows (input <- tabs2shift$messy_rows) schema <- setFilter(rows = .find(fun = is.numeric, col = 1, invert = TRUE)) %>% setIDVar(name = "territories", columns = 1) %>% setIDVar(name = "year", columns = 2) %>% setIDVar(name = "commodities", columns = 3) %>% setObsVar(name = "harvested", columns = 5) %>% setObsVar(name = "production", columns = 6) reorganise(schema = schema, input = input)
# use regular expressions to find cell positions (input <- tabs2shift$clusters_messy) schema <- setCluster(id = "territories", left = .find(pattern = "comm*"), top = .find(pattern = "comm*")) %>% setIDVar(name = "territories", columns = c(1, 1, 4), rows = c(2, 9, 9)) %>% setIDVar(name = "year", columns = 4, rows = c(3:6), distinct = TRUE) %>% setIDVar(name = "commodities", columns = c(1, 1, 4)) %>% setObsVar(name = "harvested", columns = c(2, 2, 5)) %>% setObsVar(name = "production", columns = c(3, 3, 6)) schema validateSchema(schema = schema, input = input) # use a function to find rows (input <- tabs2shift$messy_rows) schema <- setFilter(rows = .find(fun = is.numeric, col = 1, invert = TRUE)) %>% setIDVar(name = "territories", columns = 1) %>% setIDVar(name = "year", columns = 2) %>% setIDVar(name = "commodities", columns = 3) %>% setObsVar(name = "harvested", columns = 5) %>% setObsVar(name = "production", columns = 6) reorganise(schema = schema, input = input)
Get the column types of a tibble
.getColTypes(input = NULL, collapse = TRUE)
.getColTypes(input = NULL, collapse = TRUE)
input |
[ |
collapse |
[ |
Summarise groups of rows or columns
.sum(..., character = NULL, numeric = NULL, fill = NULL)
.sum(..., character = NULL, numeric = NULL, fill = NULL)
... |
[ |
character |
[ |
numeric |
[ |
fill |
[ |
By default character
values are summarised with the function
paste0(na.omit(x), collapse = "-/-")
and numeric
values with
the function sum(x, na.rm = TRUE)
. To avoid un-intuitive behavior,
it is wisest to explicitly specify how all exceptions, such as NA-values,
shall be handled and thus to provide a new function.
the index values where the target was found.
This function matches id and observed variables and reshapes them accordingly
.tidyVars(ids = NULL, obs = NULL, clust = NULL, grp = NULL)
.tidyVars(ids = NULL, obs = NULL, clust = NULL, grp = NULL)
ids |
list of id variables |
obs |
list of observed variables |
clust |
list of cluster variables |
grp |
list of group variables |
a symmetric list of variables (all with the same dimensions)
This function updates the format of a table by applying a schema description to it.
.updateFormat(input = NULL, schema = NULL)
.updateFormat(input = NULL, schema = NULL)
input |
[ |
schema |
[ |
This function extracts the cluster variable from a table by applying a schema description to it.
getClusterVar(schema = NULL, input = NULL)
getClusterVar(schema = NULL, input = NULL)
schema |
[ |
input |
[ |
a list per cluster with values of the cluster variable
input <- tabs2shift$clusters_nested schema <- setCluster(id = "sublevel", group = "territories", member = c(1, 1, 2), left = 1, top = c(3, 8, 15)) %>% setIDVar(name = "territories", columns = 1, rows = c(2, 14)) %>% setIDVar(name = "sublevel", columns = 1, rows = c(3, 8, 15)) %>% setIDVar(name = "year", columns = 7) %>% setIDVar(name = "commodities", columns = 2) %>% setObsVar(name = "harvested", columns = 5) %>% setObsVar(name = "production", columns = 6) validateSchema(schema = schema, input = input) %>% getClusterVar(input = input)
input <- tabs2shift$clusters_nested schema <- setCluster(id = "sublevel", group = "territories", member = c(1, 1, 2), left = 1, top = c(3, 8, 15)) %>% setIDVar(name = "territories", columns = 1, rows = c(2, 14)) %>% setIDVar(name = "sublevel", columns = 1, rows = c(3, 8, 15)) %>% setIDVar(name = "year", columns = 7) %>% setIDVar(name = "commodities", columns = 2) %>% setObsVar(name = "harvested", columns = 5) %>% setObsVar(name = "production", columns = 6) validateSchema(schema = schema, input = input) %>% getClusterVar(input = input)
This function extracts the cluster grouping variable from a table by applying a schema description to it.
getGroupVar(schema = NULL, input = NULL)
getGroupVar(schema = NULL, input = NULL)
schema |
[ |
input |
[ |
a list per cluster with values of the grouping variable
input <- tabs2shift$clusters_nested schema <- setCluster(id = "sublevel", group = "territories", member = c(1, 1, 2), left = 1, top = c(3, 8, 15)) %>% setIDVar(name = "territories", columns = 1, rows = c(2, 14)) %>% setIDVar(name = "sublevel", columns = 1, rows = c(3, 8, 15)) %>% setIDVar(name = "year", columns = 7) %>% setIDVar(name = "commodities", columns = 2) %>% setObsVar(name = "harvested", columns = 5) %>% setObsVar(name = "production", columns = 6) validateSchema(schema = schema, input = input) %>% getGroupVar(input = input)
input <- tabs2shift$clusters_nested schema <- setCluster(id = "sublevel", group = "territories", member = c(1, 1, 2), left = 1, top = c(3, 8, 15)) %>% setIDVar(name = "territories", columns = 1, rows = c(2, 14)) %>% setIDVar(name = "sublevel", columns = 1, rows = c(3, 8, 15)) %>% setIDVar(name = "year", columns = 7) %>% setIDVar(name = "commodities", columns = 2) %>% setObsVar(name = "harvested", columns = 5) %>% setObsVar(name = "production", columns = 6) validateSchema(schema = schema, input = input) %>% getGroupVar(input = input)
This function extracts the identifying variables from a table by applying a schema description to it.
getIDVars(schema = NULL, input = NULL)
getIDVars(schema = NULL, input = NULL)
schema |
[ |
input |
[ |
a list per cluster with values of the identifying variables
input <- tabs2shift$clusters_nested schema <- setCluster(id = "sublevel", group = "territories", member = c(1, 1, 2), left = 1, top = c(3, 8, 15)) %>% setIDVar(name = "territories", columns = 1, rows = c(2, 14)) %>% setIDVar(name = "sublevel", columns = 1, rows = c(3, 8, 15)) %>% setIDVar(name = "year", columns = 7) %>% setIDVar(name = "commodities", columns = 2) %>% setObsVar(name = "harvested", columns = 5) %>% setObsVar(name = "production", columns = 6) validateSchema(schema = schema, input = input) %>% getIDVars(input = input)
input <- tabs2shift$clusters_nested schema <- setCluster(id = "sublevel", group = "territories", member = c(1, 1, 2), left = 1, top = c(3, 8, 15)) %>% setIDVar(name = "territories", columns = 1, rows = c(2, 14)) %>% setIDVar(name = "sublevel", columns = 1, rows = c(3, 8, 15)) %>% setIDVar(name = "year", columns = 7) %>% setIDVar(name = "commodities", columns = 2) %>% setObsVar(name = "harvested", columns = 5) %>% setObsVar(name = "production", columns = 6) validateSchema(schema = schema, input = input) %>% getIDVars(input = input)
This function extracts the observed variables from a table by applying a schema description to it.
getObsVars(schema = NULL, input = NULL)
getObsVars(schema = NULL, input = NULL)
schema |
[ |
input |
[ |
a list per cluster with values of the observed variables
input <- tabs2shift$clusters_nested schema <- setCluster(id = "sublevel", group = "territories", member = c(1, 1, 2), left = 1, top = c(3, 8, 15)) %>% setIDVar(name = "territories", columns = 1, rows = c(2, 14)) %>% setIDVar(name = "sublevel", columns = 1, rows = c(3, 8, 15)) %>% setIDVar(name = "year", columns = 7) %>% setIDVar(name = "commodities", columns = 2) %>% setObsVar(name = "harvested", columns = 5) %>% setObsVar(name = "production", columns = 6) validateSchema(schema = schema, input = input) %>% getObsVars(input = input)
input <- tabs2shift$clusters_nested schema <- setCluster(id = "sublevel", group = "territories", member = c(1, 1, 2), left = 1, top = c(3, 8, 15)) %>% setIDVar(name = "territories", columns = 1, rows = c(2, 14)) %>% setIDVar(name = "sublevel", columns = 1, rows = c(3, 8, 15)) %>% setIDVar(name = "year", columns = 7) %>% setIDVar(name = "commodities", columns = 2) %>% setObsVar(name = "harvested", columns = 5) %>% setObsVar(name = "production", columns = 6) validateSchema(schema = schema, input = input) %>% getObsVars(input = input)
This function takes a disorganised messy table and rearranges columns and rows into a tidy table based on a schema description.
reorganise(input = NULL, schema = NULL)
reorganise(input = NULL, schema = NULL)
input |
[ |
schema |
[ |
A (tidy) table which is the result of reorganising input
based
on schema
.
# a rather disorganised table with messy clusters and a distinct variable (input <- tabs2shift$clusters_messy) # put together schema description by ... # ... identifying cluster positions schema <- setCluster(id = "territories", left = c(1, 1, 4), top = c(1, 8, 8)) # ... specifying the cluster ID as id variable (obligatory) schema <- schema %>% setIDVar(name = "territories", columns = c(1, 1, 4), rows = c(2, 9, 9)) # ... specifying the distinct variable (explicit position) schema <- schema %>% setIDVar(name = "year", columns = 4, rows = c(3:6), distinct = TRUE) # ... specifying a tidy variable (by giving the column values) schema <- schema %>% setIDVar(name = "commodities", columns = c(1, 1, 4)) # ... identifying the (tidy) observed variables schema <- schema %>% setObsVar(name = "harvested", columns = c(2, 2, 5)) %>% setObsVar(name = "production", columns = c(3, 3, 6)) # get the tidy output reorganise(input, schema)
# a rather disorganised table with messy clusters and a distinct variable (input <- tabs2shift$clusters_messy) # put together schema description by ... # ... identifying cluster positions schema <- setCluster(id = "territories", left = c(1, 1, 4), top = c(1, 8, 8)) # ... specifying the cluster ID as id variable (obligatory) schema <- schema %>% setIDVar(name = "territories", columns = c(1, 1, 4), rows = c(2, 9, 9)) # ... specifying the distinct variable (explicit position) schema <- schema %>% setIDVar(name = "year", columns = 4, rows = c(3:6), distinct = TRUE) # ... specifying a tidy variable (by giving the column values) schema <- schema %>% setIDVar(name = "commodities", columns = c(1, 1, 4)) # ... identifying the (tidy) observed variables schema <- schema %>% setObsVar(name = "harvested", columns = c(2, 2, 5)) %>% setObsVar(name = "production", columns = c(3, 3, 6)) # get the tidy output reorganise(input, schema)
This function checks the current setup of a schema and reports problems that
will lead to an error of reorganise
if not fixed.
reportProblems(schema = NULL)
reportProblems(schema = NULL)
schema |
[ |
Default template of a schema description
schema_default
schema_default
The object of class schema
describes at which position in a
table which information can be found. It contains the four slots
clusters
, format
, filter
and variables
.
The default schema description contains all slots and fields that are
required by default and identifying and observed variables are added to it
into the variables
slot.
schema
class (S4) and its methodsA schema
stores the information of where which information is stored
in a table of data.
cluster
[list(1)
]
description of
clusters
in the table.
format
[list(1)
]
description of the table
format
variables
[named list(.)
]
description of
identifying
and observed
variables.
This section outlines the currently recommended strategy for setting up schema descriptions. For example tables and the respective schemas, see the vignette.
Variables: Clarify which are the identifying variables and which are the observed variables. Make sure not to mistake a listed observed variable as identifying variable.
Clusters: Determine whether there are clusters and if so, find
the origin (top left cell) of each cluster and provide the required
information in setCluster(top = ..., left =
...)
. It is advised to treat a table that contains meta-data in the top
rows as cluster, as this is often the case with implicit variables. All
variables need to be specified in each cluster (in case clusters are all
organised in the same arrangement), or relative = TRUE
can be used.
Data may be organised into clusters a) whenever a set of variables occurs
more than once in the same table, nested into another variable, or b) when
the data are organised into separate spreadsheets or files according to one
of the variables (depending on the context, these issues can also be solved
differently). In both cases the variable responsible for clustering (the
cluster ID) can be either an identifying variable, or a categorical
observed variable:
in case the cluster ID is an identifying variable, provide its name
in setCluster(id = ...)
and specify it as an
identifying variable (setIDVar
)
in case it is a observed variable, provide simply
setCluster(..., id = "observed")
.
Meta-data: Provide potentially information about the format
(setFormat
).
Identifying variables: Determine the following:
is the variable available at all? This is particularly important when
the data are split up into tables that are in spreadsheets or files. Often
the variable that splits up the data (and thus identifies the clusters) is
not explicitly available in the table anymore. In such a case, provide the
value in setIDVar(..., value = ...)
.
all columns in which the variable values sit.
in case the variable is in several columns, determine additionally the row in which its values sit. In this case, the values will look like they are part of a header.
in case the variable must be split off of another column, provide a
regular expression that results in the target subset via
setIDVar(..., split = ...)
.
in case the variable is distinct from the main table, provide the
explicit (non-relative) position and set
setIDVar(..., distinct = TRUE)
.
Observed variable: Determine the following:
all columns in which the values of the variable sit.
the conversion factor.
in case the variable is not tidy, go through the following cases one after the other:
in case the variable is nested in a wide identifying variable, determine in addition to the columns in which the values sit also the rows in which the variable name sits.
in case the names of the variable are given as a value of an
identifying variable, give the column name as
setObsVar(..., key = ...)
, together with the name
of the respective observed variable (as it appears in the table) in
values
.
in case the name of the variable is the ID of clusters, specify
setObsVar(..., key = "cluster", value = ...)
,
where values
has the cluster number the variable refers to.
There is hardly any limit to how data can be arranged in a spreadsheet, apart
from the apparent organisation into a lattice of cells. However, it is often
the case that data are gathered into topologically coherent chunks. Those
chunks are what is called 'cluster' in tabshiftr
.
setCluster( schema = NULL, id = NULL, group = NULL, member = NULL, left = NULL, top = NULL, width = NULL, height = NULL )
setCluster( schema = NULL, id = NULL, group = NULL, member = NULL, left = NULL, top = NULL, width = NULL, height = NULL )
schema |
[ |
id |
[ |
group |
[ |
member |
[ |
left |
[ |
top |
[ |
width |
[ |
height |
[ |
Please also take a look at the currently suggested strategy to set up a schema description.
An object of class schema
.
Other functions to describe table arrangement:
setFilter()
,
setFormat()
,
setGroups()
,
setIDVar()
,
setObsVar()
# please check the vignette for examples
# please check the vignette for examples
This function allows to specify additional rules to filter certain rows
setFilter( schema = NULL, rows = NULL, columns = NULL, invert = FALSE, clusters = TRUE, operator = NULL )
setFilter( schema = NULL, rows = NULL, columns = NULL, invert = FALSE, clusters = TRUE, operator = NULL )
schema |
[ |
rows |
[ |
columns |
[ |
invert |
[ |
clusters |
[ |
operator |
[ |
An object of class schema
.
Other functions to describe table arrangement:
setCluster()
,
setFormat()
,
setGroups()
,
setIDVar()
,
setObsVar()
(input <- tabs2shift$messy_rows) # select rows where there is 'unit 2' in column 1 or 'year 2' in column 2 schema <- setFilter(rows = .find(pattern = "unit 2", col = 1)) %>% setFilter(rows = .find(pattern = "year 2", col = 2), operator = `|`) %>% setIDVar(name = "territories", columns = 1) %>% setIDVar(name = "year", columns = 2) %>% setIDVar(name = "commodities", columns = 3) %>% setObsVar(name = "harvested", columns = 5) %>% setObsVar(name = "production", columns = 6) reorganise(schema = schema, input = input)
(input <- tabs2shift$messy_rows) # select rows where there is 'unit 2' in column 1 or 'year 2' in column 2 schema <- setFilter(rows = .find(pattern = "unit 2", col = 1)) %>% setFilter(rows = .find(pattern = "year 2", col = 2), operator = `|`) %>% setIDVar(name = "territories", columns = 1) %>% setIDVar(name = "year", columns = 2) %>% setIDVar(name = "commodities", columns = 3) %>% setObsVar(name = "harvested", columns = 5) %>% setObsVar(name = "production", columns = 6) reorganise(schema = schema, input = input)
Any table makes some assumptions about the data, but they are mostly not explicitly recorded in the commonly available table format. This concerns, for example, the symbol(s) that signal "not available" values or the symbol that is used as decimal sign.
setFormat( schema = NULL, header = 0, decimal = NULL, thousand = NULL, na_values = NULL, zero_values = NULL, flags = NULL )
setFormat( schema = NULL, header = 0, decimal = NULL, thousand = NULL, na_values = NULL, zero_values = NULL, flags = NULL )
schema |
[ |
header |
[ |
decimal |
[ |
thousand |
[ |
na_values |
[ |
zero_values |
[ |
flags |
[ |
Please also take a look at the currently suggested strategy to set up a schema description.
An object of class schema
.
Other functions to describe table arrangement:
setCluster()
,
setFilter()
,
setGroups()
,
setIDVar()
,
setObsVar()
# please check the vignette for examples
# please check the vignette for examples
This function allows to set groups for rows, columns or clusters that shall be summarised.
setGroups(schema = NULL, rows = NULL, columns = NULL)
setGroups(schema = NULL, rows = NULL, columns = NULL)
schema |
[ |
rows |
[ |
columns |
[ |
An object of class schema
.
Other functions to describe table arrangement:
setCluster()
,
setFilter()
,
setFormat()
,
setIDVar()
,
setObsVar()
# please check the vignette for examples
# please check the vignette for examples
Identifying variables are those variables that describe the (qualitative)
properties that make each observation (as described by the
observed variables
) unique.
setIDVar( schema = NULL, name = NULL, type = "character", value = NULL, columns = NULL, rows = NULL, split = NULL, merge = NULL, distinct = FALSE )
setIDVar( schema = NULL, name = NULL, type = "character", value = NULL, columns = NULL, rows = NULL, split = NULL, merge = NULL, distinct = FALSE )
schema |
[ |
name |
[ |
type |
[ |
value |
[ |
columns |
[ |
rows |
[ |
split |
[ |
merge |
[ |
distinct |
[ |
Please also take a look at the currently suggested strategy to set up a schema description.
An object of class schema
.
Other functions to describe table arrangement:
setCluster()
,
setFilter()
,
setFormat()
,
setGroups()
,
setObsVar()
# please check the vignette for examples
# please check the vignette for examples
Observed variables are those variables that contain the (quantitative)
observed/measured values of each unique unit (as described by the
identifying variables
). There may be several of them
and in a tidy table they'd be recorded as separate columns.
setObsVar( schema = NULL, name = NULL, type = "numeric", columns = NULL, top = NULL, distinct = FALSE, factor = 1, key = NULL, value = NULL )
setObsVar( schema = NULL, name = NULL, type = "numeric", columns = NULL, top = NULL, distinct = FALSE, factor = 1, key = NULL, value = NULL )
schema |
[ |
name |
[ |
type |
[ |
columns |
[ |
top |
[ |
distinct |
[ |
factor |
[ |
key |
[ |
value |
[ |
Please also take a look at the currently suggested strategy to set up a schema description.
An object of class schema
.
Other functions to describe table arrangement:
setCluster()
,
setFilter()
,
setFormat()
,
setGroups()
,
setIDVar()
# please check the vignette for examples
# please check the vignette for examples
schema
Print the schema
## S4 method for signature 'schema' show(object)
## S4 method for signature 'schema' show(object)
object |
[ |
List of table types
tabs2shift
tabs2shift
The object of class list
contains 20 different types of tables
that are used throughout the unit-tests and examples/vignette.
This function groups rows, splices the header into the table and fills missing values where they should not exist.
validateInput(schema = NULL, input = NULL)
validateInput(schema = NULL, input = NULL)
schema |
[ |
input |
[ |
a table where columns and rows are grouped and headers are spliced into the table.
This function takes a raw schema description and updates values that were
only given as wildcard or implied values. It is automatically called by
reorganise
, but can also be used in concert with the getters to debug
a schema.
validateSchema(schema = NULL, input = NULL)
validateSchema(schema = NULL, input = NULL)
schema |
[ |
input |
[ |
The core idea of a schema description is that it can be written in a
very generic way, as long as it describes sufficiently where in a table
what variable can be found. A very generic way can be via using the
function .find
to identify the initially unknown
cell-locations of a variable on-the-fly, for example when it is merely
known that a variable must be in the table, but not where it is.
validateSchema
matches a schema with an input table and inserts the
accordingly evaluated positions (of clusters, filters and variables),
adapts some of the meta-data and ensures formal consistency of the schema.
An updated schema description
# build a schema for an already tidy table (tidyTab <- tabs2shift$tidy) schema <- setIDVar(name = "territories", col = 1) %>% setIDVar(name = "year", col = .find(pattern = "period")) %>% setIDVar(name = "commodities", col = 3) %>% setObsVar(name = "harvested", col = 5) %>% setObsVar(name = "production", col = 6) # before ... schema # ... after validateSchema(schema = schema, input = tidyTab)
# build a schema for an already tidy table (tidyTab <- tabs2shift$tidy) schema <- setIDVar(name = "territories", col = 1) %>% setIDVar(name = "year", col = .find(pattern = "period")) %>% setIDVar(name = "commodities", col = 3) %>% setObsVar(name = "harvested", col = 5) %>% setObsVar(name = "production", col = 6) # before ... schema # ... after validateSchema(schema = schema, input = tidyTab)