Data Models

Several data models have emerged in the world of databases over the years. Each has its own strengths and weaknesses, and each was designed to meet different requirements. In this post, let's take a look at the most popular ones.

But before we study the different data models, what is a data model?

A data model is a conceptual representation of data that describes its structure, relationships, constraints, and other characteristics. Common types include the hierarchical, network, relational, document, and graph data models, each with its own features and limitations depending on the use case.

[Figure: classification of data models]

Why bother?

When designing applications, we need to store data in a database. Which database to use depends heavily on the data model it is based on (among other factors). Hence, a high-level understanding of the main data models helps us make better decisions for our applications.

Hierarchical Data Model

The hierarchical data model was one of the earliest data models and dates back to the 1960s. The most popular database for business data processing in the 1970s was IBM's IMS (Information Management System), which was based on the hierarchical data model.

[Figure: hierarchical data model]

In this model, a parent node has child nodes, each child node can have children of its own, and so on, so the data forms a tree (analogous to the nesting of elements in an HTML document). This creates a hierarchical relationship between the data fields.

However, the limitation of this model is that it does not support many-to-many relationships, because a child node can have only one parent node. This made it difficult to model complex relationships. Consequently, the network and relational data models were introduced, which provided more flexibility in modeling relationships.
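To make the one-parent restriction concrete, here is a minimal Python sketch. The company, department, and employee names are invented for illustration, and this is not how IMS actually stored data; it only shows that in a tree, a record that belongs under two parents has to be duplicated.

```python
# A hierarchical record: every node has exactly one parent,
# so the whole dataset is a tree of nested children.
# All names are hypothetical, chosen only for illustration.
company = {
    "name": "Acme",
    "children": [
        {"name": "Engineering", "children": [
            {"name": "Alice", "children": []},
        ]},
        {"name": "Research", "children": [
            # Alice also works in Research. A tree cannot give her a
            # second parent, so her record has to be copied here.
            {"name": "Alice", "children": []},
        ]},
    ],
}


def paths_to(node, target, prefix=()):
    """Return every root-to-node path that ends at a node named `target`."""
    path = prefix + (node["name"],)
    found = [path] if node["name"] == target else []
    for child in node["children"]:
        found.extend(paths_to(child, target, path))
    return found


alice_paths = paths_to(company, "Alice")
print(alice_paths)
```

The two paths end at two separate copies of the same person, so an update to one copy does not reach the other. That duplication is the practical cost of having no many-to-many relationships.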

Network Data Model

In the network data model, also known as the CODASYL model, a child node can have multiple parent nodes, which makes many-to-many relationships possible.

[Figure: network data model]

However, the problem with this model was navigation. Records were connected by pointers, and to access a record, one had to follow the pointers starting from a root record. This route was called an “access path”. Because the same record could be reached along several paths, traversing the data was like navigating an n-dimensional data space.

Although the hierarchical and network data models are not widely used today, they are worth discussing because of their significance in the history of databases and their considerable influence on the development of modern data models.
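As a rough illustration of pointer-based navigation, the sketch below links records with plain Python references. The `Record` class and all record names are hypothetical; real CODASYL systems differed considerably, but the idea of walking access paths from a root is the same.

```python
# Records linked by pointers (plain Python references here).
# Unlike in a tree, one record may be reachable from several parents.
# The Record class and all names are hypothetical.
class Record:
    def __init__(self, name):
        self.name = name
        self.links = []  # pointers to child records

    def link(self, child):
        self.links.append(child)
        return child


root = Record("Acme")
engineering = root.link(Record("Engineering"))
research = root.link(Record("Research"))

alice = Record("Alice")   # a single shared record...
engineering.link(alice)   # ...with two parents
research.link(alice)


def access_paths(record, target, prefix=()):
    """List every pointer chain that leads from `record` to `target`."""
    path = prefix + (record.name,)
    if record is target:
        return [path]
    paths = []
    for child in record.links:
        paths.extend(access_paths(child, target, path))
    return paths


alice_access_paths = access_paths(root, alice)
print(alice_access_paths)
```

There is only one `alice` record, yet two access paths lead to it. Application code had to know about such paths and walk them by hand, which is what made querying and changing these databases laborious.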

Relational Data Model

The relational data model was proposed by Edgar Codd in 1970 and was initially just a theory, doubted by many. However, it went on to gain widespread adoption. Its main aim was to hide implementation details behind a cleaner interface.

In this model, a database stores relations (tables). Each relation is a collection of records, and each record is a collection of data fields. The columns are called attributes and the rows are called tuples.

[Figure: relational data model, the students relation]

For example, in the above relation, ID, Name, and Marks are the attributes. There are 4 tuples, each holding one data field per attribute.

MySQL and PostgreSQL are two popular relational database management systems. SQL is the primary query language used to interact with relations.

The main advantage of this model is that there are no pointers to follow (unlike in the network data model). Records are fetched directly by stating a condition or by asking for a record's primary key. The following SQL query fetches records from the above relation using a condition.

SELECT * FROM students WHERE marks > 90;
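To try the query above, here is a small runnable sketch using Python's built-in `sqlite3` module. The column types are my own assumption, and the rows are taken from the JSON example later in this post, which represents the same students data.

```python
import sqlite3

# In-memory database, so the example leaves no files behind.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE students ("
    " id TEXT PRIMARY KEY,"
    " name TEXT NOT NULL,"
    " marks INTEGER NOT NULL)"
)
conn.executemany(
    "INSERT INTO students (id, name, marks) VALUES (?, ?, ?)",
    [("01", "Elena", 87), ("02", "Ian", 96), ("03", "Matt", 79), ("04", "Stephan", 93)],
)

# Fetch by condition: no pointers to follow, we only state what we want.
top = conn.execute(
    "SELECT name, marks FROM students WHERE marks > 90 ORDER BY id"
).fetchall()

# Fetch by primary key.
one = conn.execute("SELECT name FROM students WHERE id = ?", ("03",)).fetchone()

print(top)
print(one)
conn.close()
```

Notice that the code never says how to reach the rows. Choosing an access path is left to the database, which is the main contrast with the network model.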

Another key feature of this data model is that every relation has a predefined schema, and all records must conform to it.

With the expansion of the internet and its user base, many new needs emerged, and the relational data model turned out to generalize well to a wide range of them. Today, it is still the most widely used data model.
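A quick way to see a predefined schema at work is to try inserting records that do not fit it. This sketch again uses `sqlite3`; the table definition and the sample students are assumptions made for illustration. SQLite is fairly relaxed about column types by default, so the sketch only demonstrates unknown-column and NOT NULL checks.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE students ("
    " id TEXT PRIMARY KEY,"
    " name TEXT NOT NULL,"
    " marks INTEGER NOT NULL)"
)

errors = []

# A record with a field the schema does not declare is rejected.
try:
    conn.execute(
        "INSERT INTO students (id, name, marks, hobby) "
        "VALUES ('05', 'Bonnie', 88, 'chess')"
    )
except sqlite3.OperationalError as exc:
    errors.append(type(exc).__name__)

# A record that omits a required field is rejected too.
try:
    conn.execute("INSERT INTO students (id, name) VALUES ('06', 'Damon')")
except sqlite3.IntegrityError as exc:
    errors.append(type(exc).__name__)

row_count = conn.execute("SELECT COUNT(*) FROM students").fetchone()[0]
print(errors, row_count)
conn.close()
```

Neither record makes it into the table. The database enforces the schema at write time, so the application does not have to.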

NoSQL

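NoSQL is not a single data model but an umbrella term for a family of non-relational databases. The name began as a Twitter hashtag for a 2009 meetup and was later reinterpreted as “Not Only SQL”. Its adoption was driven by the need for greater scalability and high write throughput, more flexible schemas, a preference for open-source software, and more specialized queries.

This post covers two NoSQL data models: the document data model and the graph data model. Others, such as key-value and wide-column stores, are not covered here.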

Document Data Model

In the document data model, a database contains collections, and each collection contains documents. The key feature is that every document is a self-contained unit, so the documents in a collection can all have different structures. We don’t need a predefined schema here, although the user can set optional schema validation constraints if required. Hence, the document data model provides great schema flexibility. Here is an example of a collection of documents that represents the same data as the 'students' relation.

[
  { "ID": "01", "Name": "Elena", "Marks": 87 },
  { "ID": "02", "Name": "Ian", "Marks": 96 },
  { "ID": "03", "Name": "Matt", "Marks": 79 },
  { "ID": "04", "Name": "Stephan", "Marks": 93 }
]

[Figure: document data model]
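Schema flexibility can be sketched without any database at all. In the Python snippet below, the extra "Email" and "Clubs" fields and the `validate` helper are hypothetical additions of mine. The point is only that documents with different shapes can sit side by side, and that validation is optional and defined by whoever uses the data.

```python
import json

# Documents in one collection. Each is self-contained and may carry
# different fields. "Email" and "Clubs" are hypothetical additions.
students = json.loads("""
[
  {"ID": "01", "Name": "Elena", "Marks": 87},
  {"ID": "02", "Name": "Ian", "Marks": 96, "Email": "ian@example.com"},
  {"ID": "03", "Name": "Matt", "Marks": 79, "Clubs": ["chess", "track"]},
  {"ID": "04", "Name": "Stephan", "Marks": 93}
]
""")


def validate(doc):
    """Optional validation: only the fields we choose to require are checked."""
    return isinstance(doc.get("ID"), str) and isinstance(doc.get("Marks"), int)


all_valid = all(validate(doc) for doc in students)
high_scorers = [doc["Name"] for doc in students if doc["Marks"] > 90]
distinct_shapes = {frozenset(doc) for doc in students}

print(all_valid, high_scorers, len(distinct_shapes))
```

Three different document shapes coexist in one collection and still pass the optional check, because the check only looks at the fields it cares about.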

The limitation of this data model, though, is that its support for joins is weak. Hence, it is a poor fit for many-to-many or many-to-one relationships: references between documents are typically not checked by the database, so there is no referential integrity.

Documents are typically encoded in formats such as JSON or XML. MongoDB uses BSON, a binary encoding of JSON-like documents.
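Here is a rough sketch of what weak joins and missing referential integrity mean in practice. The "CourseID" field and the courses collection are hypothetical, and the collections are plain Python lists rather than a real document database.

```python
# Two collections. The "CourseID" field and the course data are hypothetical.
courses = [
    {"CourseID": "c1", "Title": "Databases"},
]
students = [
    {"ID": "01", "Name": "Elena", "CourseID": "c1"},
    {"ID": "02", "Name": "Ian", "CourseID": "c9"},  # c9 does not exist
]

# The "join" happens in application code, using a lookup table.
courses_by_id = {c["CourseID"]: c for c in courses}
joined = [
    (s["Name"], courses_by_id.get(s["CourseID"], {}).get("Title"))
    for s in students
]

# Nothing stopped the dangling reference from being stored,
# so the application has to detect it on its own.
dangling = [s["ID"] for s in students if s["CourseID"] not in courses_by_id]

print(joined)
print(dangling)
```

In a relational database, a foreign key constraint could reject the second student at write time. Here the broken reference only surfaces when the application tries to resolve it.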

Graph Data Model

As the term suggests, data is represented as a graph in this model. Graph data models are further classified into the property graph model and the triple-store model.

The property graph model uses vertices to store objects and edges to show the relationships between them. In the triple-store model, every piece of information is stored as a three-part statement: a subject, a predicate, and an object.

[Figure: property graph model]

Here, the line drawn between the vertices and labeled "works at" is an edge. It connects the two vertices labeled "Mary Smith" and "XYZ corp".
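The same picture can be written down as data. This is a minimal sketch: the "kind" and "since" properties and the `neighbours` helper are hypothetical, and real graph databases offer dedicated query languages instead of hand-written loops.

```python
# Vertices store objects and edges store relationships.
# Both can carry properties. "kind" and "since" are hypothetical.
vertices = {
    "v1": {"label": "Mary Smith", "kind": "person"},
    "v2": {"label": "XYZ corp", "kind": "company"},
}
edges = [
    {"tail": "v1", "head": "v2", "label": "works at", "since": 2020},
]


def neighbours(vertex_id, edge_label):
    """Follow outgoing edges with the given label and return the target labels."""
    return [
        vertices[e["head"]]["label"]
        for e in edges
        if e["tail"] == vertex_id and e["label"] == edge_label
    ]


employers = neighbours("v1", "works at")
print(employers)
```

Adding a second employer for Mary, or a second employee for XYZ corp, is just one more entry in `edges`, with no change to the vertices.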

[Figure: triple-store data model]

Here, "Person" is the subject, "XYZ" is the object, and "Account manager" is the predicate that describes the relationship between the two.

When many records are connected to many others (that is, when many-to-many relationships are common), the graph data model is the most natural one to use.
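A triple store can be sketched as a list of three-part tuples plus a pattern matcher. The first triple below mirrors the figure; the other two and the `match` helper are hypothetical additions for illustration.

```python
# Every fact is one (subject, predicate, object) triple.
# The first triple mirrors the figure; the others are hypothetical.
triples = [
    ("Person", "Account manager", "XYZ"),
    ("Mary Smith", "works at", "XYZ corp"),
    ("Mary Smith", "lives in", "Boston"),
]


def match(subject=None, predicate=None, obj=None):
    """Return the triples that fit the pattern; None acts as a wildcard."""
    return [
        (s, p, o)
        for (s, p, o) in triples
        if subject in (None, s) and predicate in (None, p) and obj in (None, o)
    ]


about_mary = match(subject="Mary Smith")
at_xyz = match(obj="XYZ")
print(about_mary)
print(at_xyz)
```

Querying is a matter of filling in the parts of a triple you know and leaving the rest open, which is roughly how triple-store query languages express patterns.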

Conclusion

To summarize, the hierarchical data model was the first to be introduced, and it was later followed by the network and relational data models. The hierarchical and network models eventually fell out of mainstream use, while the relational model continues to be the most commonly used data model today. As technology and data management needs evolved, NoSQL data models emerged to address specific requirements.

Over time, the differences between the various data models have been decreasing, and the models seem to be converging. For instance, many relational databases can now store and query data in formats like JSON and XML. Schema changes in relational databases, which could once require downtime, have become more manageable, and joins, previously a weakness of document databases, are now supported by some of them in a relational style. This indicates that the limitations of the different models are decreasing with time, which is a positive development.