--- title: "Map new concepts" output: rmarkdown::html_vignette vignette: > %\VignetteIndexEntry{map_new_concepts} %\VignetteEngine{knitr::rmarkdown} %\VignetteEncoding{UTF-8} --- ```{r, include = FALSE} knitr::opts_chunk$set( collapse = TRUE, comment = "#>" ) library(knitr) ``` ```{r setup} library(ontologics) library(dplyr, warn.conflicts = FALSE) ``` When an ontology exists already, it can be used either by looking up which concepts exist in it, which relations the concepts have among each other, by adding and linking new concepts to the already existing concepts or by extending the harmonised concepts based on external ontologies. ## Relate an external dataset to harmonized concepts When adding (or mapping) new concepts, we first have to define the source of the new concepts. ```{r concept matches} # already existing ontology for some project about crops crops <- load_ontology(path = system.file("extdata", "crops.rds", package = "ontologics")) # where we have to set the external dataset as new source crops <- new_source(name = "externalDataset", version = "0.0.1", description = "a vocabulary", homepage = "https://www.something.net", license = "CC-BY-4.0", ontology = crops) # new concepts that occur in the external dataset, which should be harmonised with the ontology externalConcepts <- c("Wheat", "NUTS", "Avocado") ``` The new concepts are from different conceptual levels, both 'Wheat' and 'Avocado' are the crop itself, while 'NUTS' is an aggregate of various crops (such as walnut, hazelnut, etc). Let's first find out whether these concepts are in fact new concepts because they are missing from the ontology. ```{r get_concepts missing} missingConcepts <- get_concept(label = externalConcepts, ontology = crops) missingConcepts %>% select(1:5) %>% kable() ``` This tells us that that both 'NUTS' and 'Wheat' don't seem to be missing from the ontology. Moreover, we see that 'Wheat' is a *class* and not a *crop* and 'NUTS' also isn't *crop* and doesn't have any *broader* concept. While Avocado is missing from the ontology entirely, the crop Wheat is also missing so that both have to be defined as new concept. By studying the ontology (not shown here), we can identify the semantic relation between the new concepts and some of the already harmonized concepts. First of all, to assign new concepts from an external ontology, a harmonised concept must already exist in the harmonised ontology. For the new harmonized concepts, we chose the lower capital letter words to show the difference between those and the external concepts and we assign them as narrower concepts to the respective already existing, harmonized (broader) concepts. ```{r set_concept} broaderConcepts <- get_concept(label = c("Wheat", "Tropical and subtropical Fruit"), ontology = crops) crops <- new_concept(new = c("wheat", "avocado"), broader = broaderConcepts, class = "crop", ontology = crops) ``` Eventually, all new (external) concepts can be mapped to already harmonized concepts. Even though 'NUTS' already exists, this also applies to this new concept, because the already existing concept 'NUTS' doesn't necessarily have to be the same as the new concept 'NUTS'. This all depends on the respective description of the harmonized and external concepts. When setting a new mapping, the type and the certainty of the match have to be defined. As we have defined the closely related concepts as new concepts, we can chose a *close* match for all external concepts. Certainty has the value 3, because it is *clear, assigned according to a given definition*. ```{r set_mapping} toMap <- get_concept(label = c("wheat", "NUTS", "avocado"), ontology = crops) crops <- new_mapping(new = externalConcepts, target = toMap, match = c("close", "close", "close"), source = "externalDataset", certainty = 3, ontology = crops) ``` ## Extend the classes It may be, moreover, that we want to (or have to) add new concepts, that do not have a class defined yet. This can be the case when we have to nest new concepts, out of a range of concepts that do have a valid class, into concepts that are already at the lowest possible hierarchical level. These concepts are, when specified with `class = NA`, `class = c(bla, blubb, NA)`, or `class = NULL`, assigned the actual class `undefined` and you are informed about how to proceed. ```{r} broaderConcepts <- get_concept(label = c("wheat", "wheat"), ontology = crops) # for (some of) these concepts we do not know the class ... crops <- new_concept(new = c("wheat1", "wheat2"), broader = broaderConcepts, class = NA_character_, ontology = crops) make_tree(label = "Wheat", ontology = crops) %>% select(1:5) %>% kable() # ... ok, then let's specify that class and re-run new_concept crops <- new_class(new = "cultivar", target = "crop", description = "type of plant that people have bred for desired traits", ontology = crops) crops <- new_concept(new = c("wheat1", "wheat2"), broader = broaderConcepts, class = "cultivar", ontology = crops) ``` Now we can check whether the updated ontology is as we'd expect, for example by looking at the tree of the respective items again. We should expect that the new harmonized concepts now appear in the ontology and that they have some link to an external concept defined. ```{r new ontology} make_tree(label = "Wheat", ontology = crops) %>% select(1:5) %>% kable() make_tree(label = "NUTS", ontology = crops) %>% select(1:5) %>% kable() make_tree(label = "FRUIT", ontology = crops) %>% select(1:5) %>% kable() # and finally a list of all external concepts get_concept(external = TRUE, ontology = crops) %>% kable() ```