Having defined the schema category \({\mathbf {S}}\), instance category \({\mathbf {I}}\), and mapping \({\mathfrak {M}}\) between the categorical representation and particular models, in this section, we can introduce the algorithms for mutual transformation between categorical and logical data representations. We aim to provide a generic approach applicable to all data models (and their combinations). After we define the transformation process for both directions, we discuss how it can be used, e.g., for data migration.
Model-to-category transformation
First of all, we describe the process of data transformation from a particular logical model to the categorical representation. It consists of two steps: 1) we fetch data from an input logical model and 2) we insert selected records, one-by-one, to instance category \({\mathbf {I}}\).
Forest of records
To be able to uniformly manipulate records from different data models (recall Table 1), both aggregate-oriented and aggregate-ignorant, we first propose their tree-based representation. Each record r is represented as a directed (possibly ordered) tree \(r = (V, E)\). V contains a node \(v_i\) for each (possibly nested) property \(\phi _i\), \(i = 1, \dots , n\), in record r (only if property \(\phi _i\) appears in the access path as a mapping of a categorical object) and an auxiliary root node \(v_0 \in V\) representing the whole record, denoted as \(\phi _0\). Each node \(v \in V\) contains an array of name/value pairs \((name_v,value_v)\), where \(name_v\) represents the name of the property and \(value_v\) represents its value. Nodes \(v_j, v_k \in V\) are connected using a directed edge \(e = (v_j, v_k)\), \(e \in E \) if the corresponding properties \(\phi _j, \phi _k\) in record r are in a parent/child relationship, i.e., property \(\phi _k\) is nested in property \(\phi _j\). Hence, a property with a simple type or a property representing an array of a simple type is represented as a leaf node, while other types of properties are represented as inner nodes.
Records of the same kind \(\kappa \) are grouped to form a forest of records \(F_\kappa = (T_\kappa , M_\kappa )\), where \(T_\kappa \) is a set of trees representing the records of \(\kappa \) and \(M_\kappa \) is a mapping that maps a categorical identifier of each property \(\phi \) occurring in kind \(\kappa \) to the list of the respective nodes in trees in \(T_\kappa \). The categorical identifiers correspond to a pair \(name_\phi : context_\phi \) for inner nodes and \(name_\phi : value_\phi \) for leaf nodes. The mapping allows quick access to all properties corresponding to the same instance category object at the same level of trees in \(T_\kappa \). Hence, there is no need to traverse the whole tree to access a particular property. (Note that we do not materialize the whole forest for all input trees. Only the currently processed data fragments are constructed for further processing.)
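The tree and forest structures described above can be illustrated with a minimal Python sketch. It is not the MM-cat implementation: the class `Node`, the functions `build_tree()` and `build_forest()`, and the use of a slash-separated path as a stand-in for the categorical identifier are all assumptions made for illustration, and each node stores a single name/value pair for brevity.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class Node:
    """One node of a record tree: a property name, its value, and nested children."""
    name: str
    value: object = None
    children: list = field(default_factory=list)


def build_tree(name, data):
    """Turn a nested dict/list/scalar into a record tree rooted at `name`."""
    if isinstance(data, dict):
        return Node(name, children=[build_tree(k, v) for k, v in data.items()])
    if isinstance(data, list) and any(isinstance(x, dict) for x in data):
        # Array of complex values: one anonymous child "_" per element.
        return Node(name, children=[build_tree("_", x) for x in data])
    # Simple value or array of simple values: a leaf node.
    return Node(name, value=data)


def build_forest(kind, records):
    """Return (trees, index); index maps a path-like identifier to all matching nodes."""
    trees, index = [], defaultdict(list)

    def register(node, path):
        index[path].append(node)
        for child in node.children:
            register(child, f"{path}/{child.name}")

    for record in records:
        tree = build_tree(kind, record)
        trees.append(tree)
        register(tree, kind)
    return trees, dict(index)


# Hypothetical sample data (two simplified orders).
orders = [
    {"number": 2, "items": [{"id": "A7", "quantity": 1}, {"id": "B1", "quantity": 3}]},
    {"number": 5, "items": [{"id": "A7", "quantity": 2}]},
]
trees, index = build_forest("Order", orders)
item_ids = [node.value for node in index["Order/items/_/id"]]
```

The index plays the role of \(M_\kappa \): all nodes for one identifier are reachable with a single lookup, without walking any tree.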
Example 4.1
Fig. 7 illustrates the representation of document Order corresponding to the access path depicted in Fig. 6 as a tree (in a forest of size 1). On the left we can see the categorical identifiers, on the right the particular tree, whereas the levels represent the mapping. (Note that to simplify the figure, we do not depict \(value_v\) of node v if \(name_v\) is user-defined and thus it is a part of the categorical identifier.) The root of the tree corresponds to the document itself. All leaves correspond to properties with a simple type or an array of a simple type. Other nodes represent more complex structures. For example, node items corresponds to a complex-type array. Anonymous node _ corresponds to a nested document. Node contact corresponds to the map of contacts having dynamically derived names of properties. (Note that node _id does not appear in the forest of records, since the corresponding object is not in the schema category.) \(\square \)
Example 4.2
As illustrated in Fig. 9, the representation of records in relational table Customer is significantly simpler, since there are no hierarchical structures. In the figure, we can see the mapping of each categorical identifier (on the left) to all respective properties in all trees depicted at the same level (on the right). \(\square \)
Transformation algorithm
The input of the algorithm is formed of schema category \({\mathbf {S}}\), (possibly non-empty) instance category \({\mathbf {I}}\) corresponding to \({\mathbf {S}}\), the forest of input records \(F_\kappa = (T_\kappa , M_\kappa )\) of kind \(\kappa \), access path \(P_\kappa \) of kind \(\kappa \), root object \(root_{\kappa }\) and root morphism \(morph_{\kappa }\) associated with \(\kappa \). A model-specific command creates the forest of records (expressed in pseudocode, e.g., like SELECT * FROM KIND \(\kappa \)), followed by model-specific transformation of its result to the forest structure \(F_\kappa \). In "Framework MM-cat" section we show the respective implementation for particular models using wrappers.
The algorithm processes one-by-one every input record (tree) \(r \in T_\kappa \). Based on the DFS traversal, it traverses the access path \(P_\kappa \) which describes the required mapping and fills instance category \({\mathbf {I}}\) with appropriate data fragments. The pseudocode of the transformation algorithm is provided in Algorithm 1.
As we can see, processing one record r consists of two phases—preparation and processing of the rest of the tree.
Preparation Phase In the preparation phase, we distinguish two situations: kind \(\kappa \) is associated with either a root object or a root morphism. In the former case (line 8), we first obtain object \(q_{\mathbf {I}}\) corresponding to \(root_{\kappa }\) using functor \(Inst_{{\mathbf {I}}} : {\mathbf {S}} \rightarrow {\mathbf {I}}\). Next, using function fetchSids() we acquire a set S which consists of sets of pairs (name, value), where name corresponds to a particular superid attribute of \(root_\kappa \) and value corresponds to the respective value in r, if it exists. (In the case of \(root_\kappa \), every record r is identified using a single (super)identifier, i.e., \(|S| = 1\).) Note that we work with the keys of schema category objects used both in the access path \(P_\kappa \) and in the mapping \(M_\kappa \) of the input forest of records \(F_\kappa \).
Example 4.3
Consider again Figs. 7 and 6. Object \(o_\kappa \) corresponding to Order is identified by a \(superid = \{1.21.24, 25\}\) corresponding to objects Id (with \(key = 101\)) and Number (with \(key = 112\)). Function fetchSids() exploits mapping \(M_\kappa \) to quickly navigate to specific values of properties customer and number, matches them to corresponding keys of objects representing these properties in \({\mathbf {S}}\) and returns set S that contains a single set \(\{(1.21.24, 1), (25, 2)\}\). \(\square \)
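The root-level behavior of fetchSids() in this example can be sketched in Python as follows. This is an illustration only: the function name `fetch_sids()`, the flat dictionary used as a record, and the dictionary `property_of` (which stands in for the navigation through \(M_\kappa \)) are assumptions, not part of the original algorithm.

```python
def fetch_sids(record, superid, property_of):
    """Return a set holding one superidentifier: a frozenset of (signature, value) pairs.

    Signatures whose property is missing in the record are skipped; if nothing
    is found, the empty set is returned.
    """
    pairs = frozenset(
        (signature, record[property_of[signature]])
        for signature in superid
        if property_of[signature] in record
    )
    return {pairs} if pairs else set()


# Values taken from Example 4.3; the record layout itself is hypothetical.
record = {"customer": 1, "number": 2}
superid = ["1.21.24", "25"]
property_of = {"1.21.24": "customer", "25": "number"}

S = fetch_sids(record, superid, property_of)
```

Here S contains the single set \(\{(1.21.24, 1), (25, 2)\}\), matching the example.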
Then, the algorithm iterates through the set S. Each \(sid \in S\) internally modifies object \(q_{{\mathbf {I}}}\) and participates in further traversing of access path \(P_\kappa \). Internal modification of \(q_{\mathbf {I}}\) is done in function modifyActiveDomain() (line 12), where four cases may occur:
- If \(sid \in q_{{\mathbf {I}}}\), nothing has to be done.
- If sid is a part of an already existing \(sid_{{\mathbf {I}}} \in q_{{\mathbf {I}}}\), sid is replaced by \(sid_{{\mathbf {I}}}\).
- If sid corresponds to an already existing \(S_{{\mathbf {I}}} \subseteq q_{{\mathbf {I}}}\), sid replaces \(S_{{\mathbf {I}}}\).
- If \(sid \notin q_{{\mathbf {I}}}\), it is added.
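One possible reading of these four cases is sketched below in Python. The sketch assumes that a superidentifier is a frozenset of (signature, value) pairs, that "is a part of" means being a proper subset, and that the third case means sid covers one or more smaller stored superidentifiers; the function name `modify_active_domain()` and this interpretation are assumptions.

```python
def modify_active_domain(active_domain, sid):
    """Merge `sid` into `active_domain` (a set of frozensets) in place.

    Returns the superidentifier under which the data is stored afterwards.
    """
    if sid in active_domain:
        return sid                      # case 1: already present, nothing to do
    for existing in active_domain:
        if sid < existing:
            return existing             # case 2: sid is a part of a larger stored one
    covered = {existing for existing in active_domain if existing < sid}
    active_domain -= covered            # case 3: sid replaces the parts it covers
    active_domain.add(sid)              # cases 3 and 4: sid itself is stored
    return sid


domain = set()
partial = frozenset({("25", 2)})
full = frozenset({("1.21.24", 1), ("25", 2)})

first = modify_active_domain(domain, partial)    # case 4: added
second = modify_active_domain(domain, full)      # case 3: replaces `partial`
third = modify_active_domain(domain, partial)    # case 2: resolved to `full`
```

Returning the stored superidentifier lets the caller continue the traversal with the most complete identifier known so far.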
Further traversing is ensured by function children() (line 13) which determines the new context and value to be processed in the same way. (We describe its body in detail in paragraph Function children() below.) The result of the function associated with a particular sid is then pushed to the top of auxiliary stack M as a triple (sid, context, value). The reason for also involving sid is that we need to know the associated parent in the next steps to appropriately fill the morphisms between the corresponding parent and child objects in \({\mathbf {I}}\).
In the second option, i.e., if \(\kappa \) is associated with a root morphism (line 15), we obtain both the domain and codomain of the root morphism \(morph_\kappa \). Next, for both of them, we also fetch the sets of corresponding superidentifiers using function fetchSids() and apply function modifyActiveDomain() to each. In lines 22 and 23 we fill relations corresponding to the root morphism and its dual morphism. Using function getSubpathBySignature() we get an access subpath \(t'\) of access path t provided in the first parameter corresponding to the signature of morphism m provided in the second parameter. In particular, it is a subpath \(t'\) such that every leaf l of \(t'\) has \(l.context = m\) or \(l.value = m\) or any ancestor a of l has \(a.context = m\). If there are more such subpaths, the one closest to t is returned. If m is null, then l such that \(l.value = \epsilon \) is returned.
Finally, we acquire all new pairs (context, value) to be processed regarding the root morphism’s domain and codomain to ensure further traversing. These pairs, except for the one representing the already processed root morphism, are then pushed to the auxiliary stack M together with respective sids.
Processing of the Tree After completing the preparation phase, the algorithm repeatedly releases and processes the top of stack M until it is empty. The released triple \((pid, m_{{\mathbf {S}}}, t)\) forms the new context of the algorithm, i.e., context morphism \(m_{\mathbf {S}}\) and access (sub)path t associated with parent superidentifier pid. Morphism \(m_{{\mathbf {I}}} : p_{{\mathbf {I}}} \rightarrow o_{{\mathbf {I}}}\) and object \(q _{{\mathbf {I}}}\) are then computed using functor \(Inst_{\mathbf {I}}\) (lines 32 and 34).
Once again, we fetch S as a set of superidentifiers corresponding to \(o_{{\mathbf {S}}}\) (being the codomain of \(m_{{\mathbf {S}}}\)) from record r associated with the currently processed pid (i.e., there is an edge \((pid, sid) \in r\)). This time, the size of S is not limited to 1, since the cardinalities of the properties allow multiplicity. Once S is fetched, the algorithm iterates through \(sid \in S\) and uses each of them, in order:
1. to internally modify the active domain of object \(q_{{\mathbf {I}}}\) (line 37),
2. to add relations for \(m_{{\mathbf {I}}}\) (lines 38, 39), and
3. to participate in the further traversing of access path t (lines 40, 41).
Note that function fetchSids() returns only superid sets that are constructed from properties having as an ancestor value pid in the currently processed record r. In the preparation phase, the same function returns superid values related to null, i.e., having no ancestor.
Also note that the function fetchSids() returns an empty set if the data corresponding to the fragment of the access path does not occur in the record. As a consequence of an empty set of sids, the (possible) traversing of corresponding access subpath stops, since there is no data in the record to be traversed (applies for both simple and complex properties).
As for adding relations, we distinguish two situations. If \(m_{{\mathbf {I}}}\) is a base morphism, we only add pair (pid, sid) to morphism \(m_{{\mathbf {I}}}\) and pair (sid, pid) to dual morphism \(m_{{\mathbf {I}}}^{-1}\). If \(m_{{\mathbf {I}}}\) is a composite morphism, we add relations to all base morphisms forming the composite morphism \(m_{{\mathbf {I}}}\). Thus we also need to extend the active domains of the affected objects. To do so, the algorithm either determines the superidentifier of such objects from r, or computes a technical identifier (i.e., autoincrement).
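The two situations can be sketched in Python as follows. The sketch is an assumption-laden illustration rather than the algorithm itself: morphisms are modeled as sets of pairs keyed by signature, the signatures `m1`, `m1_dual`, `m2`, and `m2_dual` are hypothetical, and technical identifiers are drawn from a simple counter.

```python
from itertools import count

# Autoincrement source for objects whose superidentifier is not in the record.
_technical_ids = count(1)


def add_base_relation(morphisms, signature, dual_signature, pid, sid):
    """Record (pid, sid) in the base morphism and (sid, pid) in its dual."""
    morphisms.setdefault(signature, set()).add((pid, sid))
    morphisms.setdefault(dual_signature, set()).add((sid, pid))


def add_composite_relation(morphisms, domains, steps, pid, sid, known=None):
    """Add relations along a chain of base morphisms leading from pid to sid.

    `steps` lists (signature, dual_signature, target_object) per base morphism.
    `known` maps intermediate object names to superidentifiers found in the
    record; any other intermediate object receives a technical identifier.
    """
    known = known or {}
    current = pid
    for position, (signature, dual, target) in enumerate(steps):
        if position == len(steps) - 1:
            reached = sid
        else:
            reached = known.get(target) or ("#", next(_technical_ids))
        domains.setdefault(target, set()).add(reached)   # extend active domain
        add_base_relation(morphisms, signature, dual, current, reached)
        current = reached


morphisms, domains = {}, {}
order = frozenset({("25", 2)})
add_composite_relation(
    morphisms, domains,
    [("m1", "m1_dual", "Items"), ("m2", "m2_dual", "Product")],
    pid=order, sid="A7",
)
```

In this run the intermediate object Items has no identifier in the record, so it receives the technical identifier `("#", 1)`.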
The algorithm ends when the stack M is empty meaning that all the data are transformed into instance category \({\mathbf {I}}\), i.e., internal structures of objects and morphisms in \({\mathbf {I}}\) are appropriately extended.
Example 4.4
Suppose that we have an access path depicted in Fig. 6 and a corresponding forest of records depicted in Fig. 7. The intended transformation should convert the data represented in the document model to the categorical representation corresponding to the schema category \({\mathbf {S}}\) depicted in Fig. 3 and non-empty instance category \({\mathbf {I}}\).
The algorithm processes each record r as follows: First, properties customer and number corresponding to the superidentifier of object Order are fetched from record r, and as a set of tuples, i.e., \(\{(1.21.24, 1), (25,2)\}\), added to set S as the document identifier, i.e., a part of the superidentifier of object Order from schema category \({\mathbf {S}}\). Next, instance category \({\mathbf {I}}\) is extended using sid, i.e., the active domain of corresponding object \(q_{\mathbf {I}}\) is extended with the value of sid. And access path \(P_\kappa \) (depicted in Fig. 6) is traversed, creating triples for stack M for property Customer, Number, Items, and Contact related to sid, as depicted in Fig. 10. In the figure on the right we can also see the current content of instance category \({\mathbf {I}}\), i.e., a particular order was added.
Next, the top of the stack is released, i.e., the triple describing the access subpath leading to property items, i.e.:
{ id : 47.39,
  name : 49.39,
  price : 51.39,
  quantity : 37 }
associated with pid (customer : 2, number : 1) and morphism \(35_{{\mathbf {I}}} : Order_{{\mathbf {I}}} \rightarrow Items_{{\mathbf {I}}}\) of instance category \({\mathbf {I}}\). Within this context, the active domain of object \(Items_{{\mathbf {I}}}\) is filled with the following tuple:
\(\{(1.21.24.36, 1), (25.36, 2), (47.39, ``A7'')\}\)
Relation:
\((\{(1.21.24.36, 1), (25.36, 2), (47.39, ``A7'')\}\),
\(\{(1.21.24, 1), (25, 2)\})\)
is added to morphism \(m_{{\mathbf {I}}}\) and dual morphism \(m_{{\mathbf {I}}}^{-1}\) is extended with relation:
\((\{(1.21.24, 1), (25, 2)\}\),
\(\{(1.21.24.36, 1), (25.36, 2), (47.39, ``A7'')\})\)
Finally, the access path leading to property items is further traversed to the access paths corresponding to leaves, i.e., 47.39, 49.39, 51.39, and 37.
The same applies to the other sid, i.e.,
\((1.21.24.36, 1), (25.36, 2), (47.39, ``B1'')\)
as can be seen in Fig. 11. The algorithm continues in the same way until stack M is empty. The resulting part of the instance category \({\mathbf {I}}\) corresponding to kind Order is depicted in Fig. 8. \(\square \)
Function children() Having the whole algorithm built on the DFS principle, the main purpose of function children() is to determine the access subpaths to be traversed from the input access path t. The function (see Algorithm 2) returns a set C of pairs (context, value), each consisting of possibly non-empty access sub-path value and morphism context, both corresponding to currently traversed access path t.
For each top-level property of access path t modeled as a triple (name, context, value) we traverse its name separately. Its context and value are traversed together to determine the body of the property. Both cases are ensured by calling function traverseAccessPath()—see Algorithm 3. While the context may contain a base/composite morphism, the value may contain a base/composite morphism or a complex structure. As we can see in the algorithm, multiple cases may occur:
- If a name is static or anonymous, nothing has to be done. There is nothing to traverse, so an empty set is returned.
- If a name is a signature of a base/composite morphism, its dynamic name must be computed and further traversed. Thus, the name and an empty access sub path are added (corresponding to the fact that it represents a leaf).
- If the value is a signature or empty, i.e., a simple value, the concatenation of context and value is returned together with an empty set to be further traversed.
- If the context is a signature and the value is complex, the pair (context, value) is returned.
- Else, i.e., if there is no specified context, we must further traverse value to determine context. Hence, the function children() is recursively called.
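The case analysis above can be sketched in Python as follows. Algorithms 2 and 3 are not reproduced here, so this is only an approximation under stated assumptions: the classes `Sig` and `Prop`, the helper names, the use of `None` for an empty context or value, and the textual order in which two signatures are concatenated are all hypothetical. A list is returned instead of a set to keep the order deterministic.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Sig:
    """A morphism signature such as '47.39' (dot-separated base morphisms)."""
    text: str

    def then(self, other):
        """Follow `self` first, then `other` (assumed to be written 'other.self')."""
        return Sig(f"{other.text}.{self.text}")


@dataclass(frozen=True)
class Prop:
    """One access-path property: name, optional context, and value."""
    name: object      # str (static), None (anonymous), or Sig (dynamically derived)
    context: object   # Sig or None
    value: object     # Sig or None (simple), or a tuple of Prop (complex)


def traverse_name(name):
    if isinstance(name, Sig):
        return [(name, ())]       # dynamic name: a leaf reached via its signature
    return []                     # static or anonymous name: nothing to traverse


def traverse_body(context, value):
    if value is None or isinstance(value, Sig):
        # Simple value: concatenate context and value; nothing nested below.
        if context is None:
            signature = value
        elif value is None:
            signature = context
        else:
            signature = context.then(value)
        return [(signature, ())]
    if context is not None:
        return [(context, value)]  # complex value with its own context
    return children(value)         # no context: look inside the value


def children(path):
    """Return the (context, subpath) pairs to traverse next for access path `path`."""
    result = []
    for prop in path:
        result += traverse_name(prop.name)
        result += traverse_body(prop.context, prop.value)
    return result


items = Prop("items", Sig("35"), (Prop("id", None, Sig("47.39")),
                                  Prop("quantity", None, Sig("37"))))
order_path = (Prop("number", None, Sig("25")), items)
top_level = children(order_path)
```

For this fragment, the top level yields the leaf 25 and the pair (35, nested items path); calling `children(items.value)` then yields the leaves 47.39 and 37.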
Category-to-model transformation
Having an instance category \({\mathbf {I}}\) and mapping \({\mathfrak {M}}\), the opposite direction of transformation allows extraction of data from \({\mathbf {I}}\) and storing it into a particular logical model. The whole algorithm consists of three parts:
1. DDL Algorithm: Definition of the schema of the data including names of properties that are dynamically derived (see "DDL algorithm" section).
2. DML Algorithm: Transformation of data instances from instance category \({\mathbf {I}}\) to a particular logical model (see "DML algorithm" section).
3. IC Algorithm: Finalization of schema definition with integrity constraints, i.e., adding of identifiers and references to other kinds (see "IC algorithm" section).
DDL algorithm
Having a schema category \({\mathbf {S}}\), instance category \({\mathbf {I}}\), access path \(P_\kappa \), kind name \(name_{\kappa }\), and particular database wrapper \(W_D\) working over database D, the first algorithm creates a DDL statement to define a schema of kind \(\kappa \) in database D, i.e., a statement of type CREATE KIND. The algorithm proceeds “lazily”. First, it provides all the information about the structure of the currently processed kind \(\kappa \) to wrapper \(W_D\). Second, it calls the method for constructing the output database-specific command. The command can be sent to D for execution or just visualized to the user, e.g., for checking.
The processing is again based on the DFS approach. The traversal of \(P_\kappa \) is implemented using stack M that contains the context of the traversing \((N_p, t)\), i.e., set of names \(N_p\) that correspond to the property represented by access sub-path t. There can be more than one name in \(N_p\) if the property’s name is dynamically derived. In addition, since the structure of \(\kappa \) can be hierarchical, for easier construction of the resulting command, the names in the context are constructed using their concatenation expressing the path from the root of the hierarchy (e.g., /Order/Items/_/Name)—we denote them as hierarchical names.
As we can see in Algorithm 4, we begin the processing with the setting of kind name \(name_{\kappa }\) to wrapper \(W_D\) and we check whether the schema is applicable, i.e., whether database D is not schema-less. If D is schema-less, only a trivial DDL statement is returned, i.e., kind \(\kappa \) is created without specification of its structure (for example, in MongoDB this would be command db.createCollection("orders")). Otherwise, traversing of the access path \(P_\kappa \) is carried out using stack M. It is initialized by pushing the initial context, i.e., set \(N_0\) containing only trivial name \(\epsilon \) (since the whole kind \(\kappa \) does not have a parent name) and the whole access path \(P_\kappa \) associated with kind \(\kappa \).
We iterate through the body of while cycle until the stack M is empty. First, we release from the top of the stack M the currently processed context \((N_p, t)\), i.e. a set of hierarchical property names \(N_p\) corresponding to parent property p of the property represented by access sub-path t. Next, using function determinePropertyName() we construct the set of names \(N_t\) of the current property. And we construct the set of new hierarchical names N as a concatenation of pairs resulting from Cartesian product \(N_p \times N_t\).
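The construction of hierarchical names from \(N_p \times N_t\) can be illustrated with a short Python sketch. The function name `hierarchical_names()`, the use of `/` as separator, and the sample property names are assumptions; the empty string stands for the trivial name \(\epsilon \).

```python
from itertools import product


def hierarchical_names(parent_names, own_names):
    """Concatenate every parent name with every own name (Cartesian product)."""
    return {f"{parent}/{own}" for parent, own in product(parent_names, own_names)}


# Root level: the only parent name is the trivial (empty) name.
root = hierarchical_names({""}, {"Order"})

# A dynamically named property contributes several names at once.
contacts = hierarchical_names({"/Order/Contact"}, {"email", "phone"})
```

With a statically named property, \(N_t\) has one element and the product degenerates to a single concatenation per parent name.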
Depending on whether t describes a simple property (i.e., t.value corresponds to a SIGNATURE or it is empty) or a complex property we add new properties to wrapper \(W_D\). If t describes a simple property (line 14), we create a new property for each name \(n \in N\) within kind \(\kappa \). Exploiting the cardinalities in schema category \({\mathbf {S}}\), we further specify whether the new property is an array or optional. If t describes a complex property (line 21), the processing is similar, but the wrapper is informed about a complex property or an array of complex properties. In addition, we push all child properties to stack M (line 28) to be processed as well.
Finally, using the wrapper \(W_D\) the algorithm constructs and returns the particular DDL statement. If D already contains a kind of the same name, the statement can be of type ALTER KIND, otherwise statement of type CREATE KIND is created.
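The "lazy" interplay between the algorithm and the wrapper can be sketched in Python. The class `SketchWrapper`, its method names, and the pseudo-DDL syntax it emits are hypothetical and do not correspond to any wrapper of the framework or to a real DBMS dialect.

```python
class SketchWrapper:
    """Hypothetical wrapper that collects property definitions and emits pseudo-DDL."""

    def __init__(self, schemaless, existing_kinds=()):
        self.schemaless = schemaless
        self.existing_kinds = set(existing_kinds)
        self.kind_name = None
        self.properties = []

    def set_kind_name(self, name):
        self.kind_name = name

    def add_property(self, path, complex_=False, array=False, optional=False):
        """Remember one property together with the flags derived from cardinalities."""
        switches = (("COMPLEX", complex_), ("ARRAY", array), ("OPTIONAL", optional))
        flags = [label for label, enabled in switches if enabled]
        self.properties.append(" ".join([path, *flags]))

    def create_ddl_statement(self):
        """Build the statement only now, after all structure has been provided."""
        if self.schemaless:
            return f"CREATE KIND {self.kind_name}"   # trivial statement, no structure
        verb = "ALTER" if self.kind_name in self.existing_kinds else "CREATE"
        return f"{verb} KIND {self.kind_name} ({', '.join(self.properties)})"


wrapper = SketchWrapper(schemaless=False)
wrapper.set_kind_name("Order")
wrapper.add_property("/number")
wrapper.add_property("/items", complex_=True, array=True)
statement = wrapper.create_ddl_statement()
```

Separating the collection of properties from statement construction keeps the traversal generic and leaves all syntax decisions to the wrapper.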
Function determinePropertyName() This function returns the resulting name (or a set of names) depending on the way it was specified by the user. If the name is statically determined (user-defined, anonymous, or inherited from schema category \({\mathbf {S}}\)), it directly forms the output of the function. If the name is dynamically derived, the function acquires all values stored in the active domain of the object specified using a signature of its input morphism. The set of values forms the output of the function.
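A minimal Python sketch of this behavior follows. It assumes that a name is given as a (kind, payload) pair and that the active domains are available as a dictionary keyed by the object the name's signature leads to; the function name `determine_property_names()`, this encoding, and the object name `Type` are assumptions.

```python
def determine_property_names(name, active_domains):
    """Return the set of names of a property.

    `name` is ("static", text), ("anonymous", None), or ("dynamic", object_key).
    """
    kind, payload = name
    if kind == "dynamic":
        # Dynamically derived: every value in the object's active domain is a name.
        return set(active_domains.get(payload, ()))
    if kind == "anonymous":
        return {"_"}
    return {payload}


# Hypothetical active domain of the object providing contact types.
active_domains = {"Type": {"email", "phone"}}

static_names = determine_property_names(("static", "items"), active_domains)
dynamic_names = determine_property_names(("dynamic", "Type"), active_domains)
```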
DML algorithm
Having the schema category \({\mathbf {S}}\), instance category \({\mathbf {I}}\), kind name \(name_{\kappa }\), access path \(P_\kappa \), root object \(root_{\kappa }\) and root morphism \(morph_{\kappa }\), both associated with kind \(\kappa \), and particular database wrapper \(W_D\) working over database D, the second algorithm creates a list of DML statements which store data into the schema of kind \(\kappa \) in database D, i.e., statements of type INSERT INTO KIND. If the resulting commands are sent for execution to database D, they can fill in the kind created using Algorithm 4 with data from instance category \({\mathbf {I}}\).
As we can see in Algorithm 5, we first initialize an empty name \(n_0 = \epsilon \), empty list dml, and empty stack M. The rest of the processing depends on whether \(\kappa \) has a root object or a root morphism. In the former case, we first acquire object \(q_{{\mathbf {I}}} \in {\mathbf {I}}\) corresponding to \(root_\kappa \) using functor \(Inst_{{\mathbf {I}}}\). In the next step, we get the active domain S of \(q_{{\mathbf {I}}}\). We push each \(sid \in S\) together with empty name \(n_0\) and \(P_\kappa \) to auxiliary stack M and we call function buildStatement() (see below) which creates the respective INSERT command that is then added to list dml.
In the latter case, i.e., \(\kappa \) with the root morphism, we first acquire the respective morphism \(m_{\mathbf {I}}\) using the functor \(Inst_{\mathbf {I}}\). Next, using function fetchRelations() we get set S of all pairs \((o_1, o_2)\), where \(o_1, o_2 \in {\mathcal {O}}_{\mathbf {I}}\) such that \(m_{\mathbf {I}}(o_1) = o_2\). Then we get access subpath \(t_{cod}\) of codomain \(morph_\kappa .cod\) using function getSubpathBySignature(). For each \(s \in S\) we initialize stack M with two values: one for the domain (line 22) and one for the codomain (line 23). In the former case, we use the original access path \(P_{\kappa }\) without subpath \(t_{cod}\) corresponding to the codomain. In the latter case we use the so-far unprocessed subpath \(t_{cod}\). Then we call function buildStatement() and add its result to the list dml.
Function buildStatement() As stated in Algorithm 6, function buildStatement() iteratively processes the initialized stack M until it is empty. First, the top of M is released as a triple consisting of an identifier of parent property pid, hierarchical property name \(n_p\), and respective access (sub)path t. Using function collectNameValuePairs() we acquire a set of pairs (name, value) of data from \({\mathbf {I}}\) relative to pid as specified by t. Each pair (name, value) is then processed as follows: If t describes a simple property (line 11), the algorithm calls the wrapper to extend the current INSERT statement by adding value to kind \(\kappa \) as an attribute named \(n_p{++}name\). It is up to the wrapper to determine how the empty data (null) will be inserted. It is a model-dependent feature if the missing data leads to a missing property or a null metavalue. If t describes a complex property (line 15), the algorithm iterates through the set of nested properties within the complex property and for every such property it pushes to stack M the respective new triple, i.e., it moves the processing to the next level. After processing of whole stack M, the wrapper is invoked to create and return the final INSERT statement.
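The stack-driven processing can be sketched in Python under simplifying assumptions. The instance category is replaced by a hypothetical dictionary `lookup` mapping (parent identifier, signature) to a list of values, an access-path property is a (name, signature, nested) triple with `nested` set to `None` for simple properties, only static names are handled, and the function returns the collected (hierarchical name, value) pairs instead of a rendered INSERT statement, since rendering is left to the wrapper.

```python
def collect_insert_pairs(root_id, access_path, lookup):
    """Collect (hierarchical name, value) pairs for one record using an explicit stack."""
    pairs = []
    # Push top-level properties in reverse so that they are popped in path order.
    stack = [(root_id, "", prop) for prop in reversed(access_path)]
    while stack:
        pid, prefix, (name, signature, nested) = stack.pop()
        values = lookup.get((pid, signature), [])
        if nested is None:
            # Simple property: each value becomes one attribute of the statement.
            for value in values:
                pairs.append((f"{prefix}/{name}", value))
        else:
            # Complex property: move to the next level; each value identifies
            # one nested object whose child properties are processed later.
            for value in reversed(values):
                for child in reversed(nested):
                    stack.append((value, f"{prefix}/{name}", child))
    return pairs


# Hypothetical data; signatures follow the Order example, identifiers do not.
access_path = [
    ("number", "25", None),
    ("items", "35", [("id", "47.39", None), ("quantity", "37", None)]),
]
lookup = {
    ("o1", "25"): [2],
    ("o1", "35"): ["i1", "i2"],
    ("i1", "47.39"): ["A7"], ("i1", "37"): [1],
    ("i2", "47.39"): ["B1"], ("i2", "37"): [3],
}
pairs = collect_insert_pairs("o1", access_path, lookup)
```

Repeated names such as `/items/id` mark array elements here; how they are grouped into arrays or rows is a model-specific decision of the wrapper.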
IC algorithm
This algorithm aims to modify the created kinds to add integrity constraints ensuring the respective identifiers and references. These parts of schema definition are the most system-specific ones; however, the proposed approach is general enough to cover all known cases. The intra-model references are propagated to the respective DBMS by the system-specific wrapper. In the case of inter-model references, the propagation differs depending on the underlying combination of systems. A traditional polystore, as well as a multi-model DBMS, is considered a separate system having its single wrapper, so the system itself handles the inter-model reference. In the case of a polystore-like combination of systems, where each has its wrapper, the DBMSs (naturally) cannot handle the references, because they are not aware of each other. However, the proposed categorical framework keeps this information and, thus, the integrity constraints can be checked externally. And, in general, this external checking of integrity constraints can also be used for a single-model DBMS lacking support for references.
The whole process is described in Algorithm 7 which extracts all primary identifiers and references related to a particular mapping m and ensures their application at the logical level of D using a command of type ALTER KIND. Its input is formed of mapping \(m \in {\mathfrak {M}}\), and respective wrapper \(W_D\). First, we process the identifier of \(\kappa \). Using function collectNames() we get ordered collection N which contains attributes of the identifier of \(\kappa \). The result is added to the wrapper \(W_D\) for system-specific processing. Note that we use names from \(m.P_\kappa \) which are, contrary to user-defined names, unique.
Next, we process the set of references by iterating through set \(m.ref_\kappa \). First, using function collectSigNamePairs() we get set O of pairs (signature, name), i.e., signature and name of referencing attributes. We get the mapping of the referenced kind \(r.name_{\kappa '}\) and similarly set R of pairs of signatures and names of referenced attributes. Function makeReferencingPairs() processes sets O and R and creates set S of pairs (referencing-name, referenced-name) which is added to wrapper \(W_D\). Finally, using function createICStatement() the respective command of type ALTER KIND is created.
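The pairing step can be sketched in Python. The text does not state how makeReferencingPairs() matches the elements of O and R, so the sketch assumes that a referencing and a referenced attribute belong together when they carry the same signature; this matching rule, the function name `make_referencing_pairs()`, and the sample signatures and names are all assumptions.

```python
def make_referencing_pairs(referencing, referenced):
    """Pair attribute names from O and R that share a signature (assumed rule).

    Both arguments are iterables of (signature, name) pairs; the result is a
    list of (referencing-name, referenced-name) pairs in the order of O.
    """
    referenced_name = dict(referenced)
    return [
        (name, referenced_name[signature])
        for signature, name in referencing
        if signature in referenced_name
    ]


# Hypothetical signatures and attribute names.
O = [("s1", "customer_id"), ("s2", "order_number")]
R = [("s1", "id"), ("s3", "name")]

S = make_referencing_pairs(O, R)
```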
Multi-model-to-multi-model migration
Having both directions of transformation, i.e., to and from the categorical representation, we can now easily perform the migration between any combination of models. Instead of mutually mapping n models, i.e., to create \(O(n^2)\) mappings, we only need to map each model to the categorical representation, i.e., to create O(n) mappings. This idea is not new; however, the categorical representation is sufficiently general that it covers all currently popular models (and probably many, if not all, coming in the future) and in particular their mutual combinations, i.e., inter-model references. Hence, we do not consider only model-to-model migration but more general multi-model-to-multi-model migrations. The level of abstraction enables us to “hide” many system-specific features, such as, e.g., different types of complex structures (e.g., arrays, maps, or lists), different types of links (e.g., foreign keys, references, or pointers), etc. At the same time, the abstract representation bears information that is not supported by particular underlying systems (e.g., the schema of schema-less systems or integrity constraints for inter-model links).
In the middle of Fig. 12, we can see a part of the schema category of the sample data. The colors represent mappings between the categorical representation and particular kinds (green for kind Order in the document model, blue for kind Customer in the graph model, yellow for kind Orders in the graph model, violet for kind Order in the graph model, and red for kind Items in the column model). For each model, we can see both the access path and the respectively highlighted part of the schema category.
For example, we may want to perform migration from the document model to a combination of the other four models. In the figure on the left, we can see the sample source (green) JSON document stored in the document model. At the bottom right, we can see the target (red) column family; at the top right, we can see the (blue, yellow, and violet) graph data.
The migration process works as follows: Having defined all access paths, we first run the model-to-categorical transformation (see "Model-to-category transformation" section) whose result is provided in Fig. 8, i.e., we get an instance category filled with data from the underlying document DBMS (MongoDB). Next we run the categorical-to-model transformation (see "Category-to-model transformation" section). First, it creates the respective schemas (see "DDL algorithm" section). In the case of the schema-less graph model of Neo4j, it does not define the structure; in the case of the column model of Cassandra, it defines the schema of the table. Next, it stores the data instances in the DBMSs (see "DML algorithm" section) and in the last step it adds the respective integrity constraints (see "IC algorithm" section), namely command ALTER TABLE for Cassandra and no commands for Neo4j.
All these steps were performed automatically, only with the mapping between the categorical representation and the particular DBMSs based on the idea of access paths. This is the only manual work required from the user. In addition, in the following section we introduce a user-friendly tool that enables us to specify them comfortably.