Introduction
I am writing this quick introduction to databases, based on my studies (MBA level database class, focused on database theory to implementation using Microsoft SQL Server) and professional experience (using MySQL, SQLite, MariaDB, MongoDB, Neo4J, DLT).
Relational “SQL” Databases
Conjured up in the 1970, relational database was created to address the problem of “how can businesses store large amounts of data, then easily retrieve it”. The computer system that served to store the data was called Relational Database Management System (RDBMS). The language that allowed businesses to create and retrieve data from the RDBMS was called Structured Query Language (SQL). But raw data cannot just be simply thrown into the RDBMS database. The data first had to be analyzed, and only then can the database be designed. The database design phase involves creation of an Entity Relation Diagram (ERD) to model the business needs of the database, followed by normalization of the data to conform to the “normal form” to reduce data redundancy. The RDBMS had to support core basic functions such as Create, Read, Update, Delete (CRUD) methods. And for the RDBMS to ensure that data was stored accurately, it behaved in accordance with Atomic Consistency Isolation Durability (ACID). RDBMS implemented the concept of On Line Transactional Processing (OLTP) to support businesses that broadly adopted RDBMS to handle daily critical business transactions. ACID properties of a RDBMS gave banks the confidence to store highly critical data on RDBMS. And once the transactional data is stored in RDBMS, an Extraction Transform Load (ETL) process loaded the OLTP data into another database that is more geared for analysis. This other database is usually called a data warehouse or data lake. This second database enabled On Line Analytical Processing (OLAP), which was ideal for analysis and reporting.
Big Data
Relational databases serve the business world with its ability to store transactions with its OLTP features, then provide analytics and insights with its OLAP abilities. But with the advancement of new technologies (namely broadband internet, 5G, and powerful mobile devices to use up that bandwidth), real world data no longer fit into tidy RDBMS schema. What were the major characteristics of this new data? In some studies, the requirements of this new data, the 5V framework was created needed to handle Velocity, Variety, Veracity, Volume, and Value. Velocity = real time data. Variety = text, pictures, videos, geo-data. Veracity = Volume = social media easily creating and consuming multimedia, multiplied by millions of users. Value = real time data, geo-tagged, enriched by media can provide infinitely more insights than structured data.
NoSQL Databases
With Big Data, a new class of database was needed - to handle the new, unpredictable format. So with NoSQL databases, a database schema no longer needs to be PREDEFINED. In addition to this, NoSQL databases are usually distributed globally to be close to where the data is used in real time. While sounding reasonable, this poses the problem of how to keep data that is spread around the globe consistent. Whereas traditional RDBMS was focused on accuracy of data, enforced via the ACID principles, modern data requirements are different. Modern data requirements favored availability of any-even-old data, possibly at the expense of accuracy. This requirement of big data is known as Basically Available Soft state Eventually consistent (BASE), the antithesis of ACID. The Consistency-Availability-Partition Resistance (CAP) Theorem, which is a framework for trade offs required between accuracy (which ACID affords) and availability (BASE offered by NoSQL databases) helped to put a framework around which database type to use. In additional to the flexibility of NoSQL databases, NoSQL databases can be scaled horizontally (as opposed to scaling vertically in a RDBMS system). The benefit of horizontal scaling is that to add extra new compute capacity, just add new servers - instead of stopping a current server and add extra CPU/RAM/storage used in vertical scaling.
How Databases Fit In Cloud Era:
RDBMS were created during the client-server era, which means that the database ran on a server. To access the database, the client needs to connect to a server that usually was running in a room somewhere in the office. For example, to run Oracle Database 12.2, the minimum server configuration is listed here at Oracle. This is hardware that you will need to procure and install somewhere in your office. You will also need an IT person who can install, configure, manage, patch, and upgrade the database software. Fast forward 30 years to the world of cloud computing, where you can have Oracle now manage both the hardware and software - called “Oracle Database Standard Service”, where the hardware is Oracle Cloud Infrastructure (OCI). Let’s look at another more modern example like MongoDB. MongoDB is a “new SQL” database, gives you the option to 1) MongoDB Enterprise Advanced - run locally using your own hardware, like the olden client-server days or 2) MongDB Atlas aka “MongoDB Enterprise Advanced in the Cloud”. If you are an Amazon Web Services (AWS) user, you can visit the AWS Marketplace to subscribe to MongoDB Atlas (Pay As You Go) or use an AWS clone of MongoDB called Amazon DocumentDB.
Current “Top” Players in the Database Marketplace (1):
SQL:
Oracle, Oracle MySQL, Microsoft SQL Server
New SQL:
PostGRES, MongoDB
NoSQL:
Document : MongDB ( #5), CouchDB (#40),
Key-Value : Redis (#6), Memcached (#33), etcd (#46), Aerospike (#70), RocksDB (#89), LevelDB (#107)
Wide Column: Cassandra (#11), HBase (#26)
Graph : Neo4j (#19)
Search engine : Elasticsearch (#7), Splunk (#13)
Conclusion
Databases have evolved from plan-ahead SQL RDBMS systems that run on a server located in your office - to ad-hoc NoSQL databases that run in the cloud. Although RDBMS was invented 50 years ago, most of the world’s data still resides on them and new applications will still be designed with RDBMS. I hope that I have given a broad view here to provide a map to the world of databases.
Footnote:
https://db-engines.com/en/ranking