Data Modeling

In this section we’ll:

Summary of what we’ll learn:

RDBMS


A relational database organizes data into tables with predefined schemas and relationships among tables. In a relational model, you avoid duplicating any data so that the data is stored once and updated in one location. Wherever and whenever the data is needed, the data is referenced. That ’s the relational aspect of the data.

Books Catalog

Books

BookID Title AuthorID ISBN Published Year
1 The Great Gatsby 1 978-0743273565 1925
2 To Kill a Mockingbird 2 978-0061120084 1960
3 1984 3 978-0451524935 1949
4 Pride and Prejudice 4 978-0141439518 1813
5 The Hobbit 5 978-0618260300 1937

Authors

AuthorID Name
1 F. Scott Fitzgerald
2 Harper Lee
3 George Orwell
4 Jane Austen
5 J.R.R. Tolkien

Genres

GenreID Genre Name
1 Fiction
2 Classic
3 Dystopian
4 Romance
5 Fantasy

BookGenres

BookID GenreID
1 1
1 2
2 1
3 1
3 3
4 1
4 4
5 1
5 5

Complexity of Querying

Let’s say we want all the book details at once

You will structure your SQL query by performing a JOIN among those tables:

The number of joins, use of the GROUP_CONCAT function, and the GROUP BY function make this query complex. Here’s how the query breaks down.

  • The SELECT command obtains the book title, author name, and a concatenated list of genres for each book.
  • The JOIN command joins the Books table with the Authors table on the AuthorID field to get the author’s name.
  • The JOIN command joins the Books table with the BookGenres table on the BookID field to associate books with genres.
  • The JOIN command joins the BookGenres table with the Genres table on the GenreID field to get genre names.
  • The query uses the GROUP_CONCAT function (the exact function may vary depending on your SQL database system; GROUP_CONCAT is used in MySQL and STRING_AGG in PostgreSQL) to concatenate multiple genre names into a single string.
  • The query uses GROUP BY to display the results by book title and author name
SELECT   Books.Title AS BookTitle,
         Authors.AuthorName AS Author,
         GROUP_CONCAT(Genres.GenreName) AS Genres
FROM     Books
         JOIN    Authors
         ON      Books.AuthorID = Authors.AuthorID
        
         JOIN    BookGenres
         ON      Books.BookID = BookGenres.BookID
        
         JOIN    Genres
         ON      BookGenres.GenreID = Genres.GenreID
GROUP BY Books.Title,
         Authors.AuthorName;

OUTPUT:

BookTitle Author Genres
1984 George Orwell Fiction, Dystopian
Pride and Prejudice Jane Austen Fiction, Romance
The Great Gatsby F. Scott Fitzgerald Fiction, Classic
The Hobbit J.R.R. Tolkien Fiction, Fantasy
To Kill a Mockingbird Harper Lee Fiction

Data Duplication

Data duplication is a common practice and is known as “denormalization.” Data duplication can improve read performance by avoiding complex joins and queries. However, you will face challenges with data consistency and increased storage requirements.
For example, imagine that the author, J.R.R. Tolkien, wants to be known by his full name, John Ronald Reuel Tolkien, and you need to make this change in a relational database and a document database. How would you implement this change in these two different types of databases?

Relational database Document database
This request requires only one change in the Authors table in a relational database. Since the author has written 12 books in a document database, you must update this information in 12 documents, which is a small price to pay, as this is a rare event.

Schema Modifications

Changing the schema in a database can be necessary to accommodate evolving application requirements. However, a significant difference exists in how a relational and document database enables schema changes.

  • In relational databases, changing the schema typically involves modifying existing tables, adding new tables, or altering relationships between tables.
  • After making schema changes, you might need to migrate existing data to match the new schema, which can involve data transformation and migration scripts.
ALTER TABLE table_nameADD column_name data_type;

NoSQL


On the other hand, an excellent document-based design starts with using that data.

Books Catalog

In contrast, a document database is simplified and displayed as a single document with all the required information. Let’s use the example of library books and data organization from above:

{“_id”: 1,”title”: “The Great Gatsby”,”author”: “F. Scott Fitzgerald,””isbn”: “978-0743273565”,”published_year”: 1925,”genres”: [“Fiction”, “Classic”]}

Querying

  • When creating a document database query, using MongoDB
  • you specify empty brackets as the first argument denoting you want to see all documents
  • the second argument, called projection, allows you to choose which fields to present to the client information
  • you won ’t always need to show all of the fields all of the time and
  • you’ll notice the simplicity of this query, as it didn ’t need any joins with other collections
  • the data you need to fulfill the request is already in the document and does not require any joins with other collections.
Db.books.find({}, { title:1, author: 1, genres: 1 })

Schema Modifications

In document databases like MongoDB, the schema is typically more flexible, and you can often add or remove fields to documents (individual documents, as there is generally no collection-wide enforced schema) without a predefined structure.

Db.books.update ({ _id: 1 },{$set: {Newfield: “Some value”}});