Saturday, December 10, 2022

A Database Overview - from SQL to NoSQL

 Introduction


I am writing this quick introduction to databases, based on my studies (MBA level database class, focused on database theory to implementation using Microsoft SQL Server) and professional experience (using MySQL, SQLite, MariaDB, MongoDB, Neo4J, DLT).




Relational “SQL” Databases


Conjured up in the 1970, relational database was created to address the problem of “how can businesses store large amounts of data, then easily retrieve it”.  The computer system that served to store the data was called Relational Database Management System (RDBMS). The language that allowed businesses to create and retrieve data from the RDBMS was called Structured Query Language (SQL). But raw data cannot just be simply thrown into the RDBMS database. The data first had to be analyzed, and only then can the database be designed. The database design phase involves creation of an Entity Relation Diagram (ERD) to model the business needs of the database, followed by normalization of the data to conform to the “normal form”  to reduce data redundancy.  The RDBMS had to support core basic functions such as Create, Read, Update, Delete (CRUD) methods. And for the RDBMS to ensure that data was stored accurately, it behaved in accordance with Atomic Consistency Isolation Durability (ACID). RDBMS implemented the concept of On Line Transactional Processing (OLTP) to support businesses that broadly adopted RDBMS to handle daily critical business transactions. ACID properties of a RDBMS gave banks the confidence to store highly critical data on RDBMS. And once the transactional data is stored in RDBMS, an Extraction Transform Load (ETL) process loaded the OLTP data into another database that is more geared for analysis. This other database is usually called a data warehouse or data lake. This second database enabled On Line Analytical Processing (OLAP), which was ideal for analysis and reporting. 




Big Data


Relational databases serve the business world with its ability to store transactions with its OLTP features, then provide analytics and insights with its OLAP abilities. But with the advancement of new technologies (namely broadband internet, 5G, and powerful mobile devices to use up that bandwidth), real world data no longer fit into tidy RDBMS schema. What were the major characteristics of this new data? In some studies, the requirements of this new data, the 5V framework was created needed to handle Velocity, Variety, Veracity, Volume, and Value.  Velocity = real time data. Variety = text, pictures, videos, geo-data. Veracity = Volume = social media easily creating and consuming multimedia, multiplied by millions of users. Value = real time data, geo-tagged, enriched by media can provide infinitely more insights than structured data.  




NoSQL Databases


With Big Data, a new class of database was needed - to handle the new, unpredictable format. So with NoSQL databases, a database schema no longer needs to be PREDEFINED. In addition to this, NoSQL databases are usually distributed globally to be close to where the data is used in real time. While sounding reasonable, this poses the problem of how to keep data that is spread around the globe consistent.  Whereas traditional RDBMS was focused on accuracy of data, enforced via the ACID principles, modern data requirements are different. Modern data requirements favored availability of any-even-old data, possibly at the expense of accuracy. This requirement of big data is known as Basically Available Soft state Eventually consistent (BASE), the antithesis of ACID. The Consistency-Availability-Partition Resistance (CAP) Theorem, which is a framework for trade offs required between accuracy (which ACID affords) and availability (BASE offered by NoSQL databases) helped to put a framework around which database type to use. In additional to the flexibility of NoSQL databases, NoSQL databases can be scaled horizontally (as opposed to scaling vertically in a RDBMS system). The benefit of horizontal scaling is that to add extra new compute capacity, just add new servers - instead of stopping a current server and add extra CPU/RAM/storage used in vertical scaling.




How Databases Fit In Cloud Era:


RDBMS were created during the client-server era, which means that the database ran on a server. To access the database, the client needs to connect to a server that usually was running in a room somewhere in the office. For example, to run Oracle Database 12.2, the minimum server configuration is listed here at Oracle. This is hardware that you will need to procure and install somewhere in your office. You will also need an IT person who can install, configure, manage, patch, and upgrade the database software.  Fast forward 30 years to the world of cloud computing, where you can have Oracle now manage both the hardware and software - called “Oracle Database Standard Service”, where the hardware is Oracle Cloud Infrastructure (OCI). Let’s look at another more modern example like MongoDB. MongoDB is a “new SQL” database, gives you the option to 1) MongoDB Enterprise Advanced - run locally using your own hardware, like the olden client-server days or  2) MongDB Atlas aka “MongoDB Enterprise Advanced in the Cloud”.  If you are an Amazon Web Services (AWS) user, you can visit the AWS Marketplace to subscribe to MongoDB Atlas (Pay As You Go) or use an AWS clone of MongoDB called Amazon DocumentDB



Current “Top” Players in the Database Marketplace (1):


SQL:  

  • Oracle, Oracle MySQL, Microsoft SQL Server


New SQL:

  • PostGRES, MongoDB


NoSQL:


  • Document : MongDB ( #5), CouchDB (#40), 


  • Key-Value : Redis (#6), Memcached (#33), etcd (#46), Aerospike (#70), RocksDB (#89), LevelDB (#107)


  • Wide Column: Cassandra (#11), HBase (#26)


  • Graph : Neo4j (#19)


  • Search engine : Elasticsearch (#7), Splunk (#13)





Conclusion


Databases have evolved from plan-ahead SQL RDBMS systems that run on a server located in your office - to ad-hoc NoSQL databases that run in the cloud.  Although RDBMS was invented 50 years ago, most of the world’s data still resides on them and new applications will still be designed with RDBMS. I hope that I have given a broad view here to provide a map to the world of databases.




Footnote:

  1. https://db-engines.com/en/ranking

Monday, December 5, 2022

Security According to Gartner MQs

Trying to figure out how many MQ Gartner has just for security? Me too. Here is my attempt to capture all of them in one chart.  So far, I have captured 1) Secure Service Edge 2) Application Security Testing 3) Security Information & Event Management 4) Enterprise Network Firewalls 5) Web Application and API Protection - previously Web Application Firewalls 6) Endpoint Security Platform 7) Security Awareness Computer-Based Training. Not sure if they will have a MQ for Security Orchestration, Automation and Response (SOAR), Next Generation Firewall (NGFW), etc.











Security Incident and Event Management (SIEM) Overview

Security Incident and Event Management (SIEM) Overview

Starting from the top of a cybersecurity organization, a Security Operations Center (SOC) is in charge of proactively keeping company assets safe, such data stored on a company's network. One such tool for the SOC to do their job is to use a Security Incident and Event Management (SIEM) tool. SIEM collects data from a 1) wide variety of sources in 2) wide variety of formats - to enable a real time view and broad analysis of past data. SIEM is more about collecting, indexing, tagging, contextual mining than it is about security per se. That is why Splunk (a logging collection and analytics platform) is a leader in the Gartner SIEM MQ

Sources of data to SIEM: 

  • Endpoint (EP) Security : As the agent that protects laptops, tablets, and phones an the edge of enterprise control, the End Point Agents needs to Prevent Threat, Protect Data, Secure the Network, Ensure Compliance, Respond to Incidences.
  • Firewall (FW) Security : A firewall acts as a barrier between a internal network and external networks, by controlling incoming and outgoing traffic based on predetermined security rules. Packet filtering, stateful packet inspection are some of the techniques used.  
  • Intrusion Detection System (IDS) & Intrusion Prevention System (IPS) : An IDS monitors network traffic for suspicious activity and potential threats, alerting administrators when such activities are detected. An IPS takes IDS a step further and ACTS to prevent or mitigate further damage.

Types of data sent or accessible to SIEM:

  • events : a simple record of what happened in the network - can be just a time stamp, source identifiers, and a code
  • messages : a bigger, verbose record of what happened in the network - usually containing contextual information to help with forensic investigation
  • logs : a record of systems happenings, such as events and messages

Types of actions from SIEM:

  • alert
  • analysis
  • reporting : start with alerts, and after an analsis


At the bottom, I have listed Security Orchestration, Automation and Response (SOAR) as a receiver of SIEM alerts, and SOAR can provide automated responses to the alerts. SIEM alerts can also be fed to User Entity and Behavioral Analysis (UEBA) to leverage the power of AI/ML to find threats.

Security Landscape

The term "Security", especially in the Cybersecurity context, is broad and confusing. Terms like "NGFW" and "UEBA" are thrown around - sometimes without context. So here I lay out the security landscape, but with focus on Network Security - the old guard and hence has the highest market share of roughly 40%. In this drawing, I will try to show Network Security, and how it is peers to other important security measures such as Application Security and Endpoint Security, and where "NGFW" and "UEBA" fit.



Note that some cybersecurity practioners might categorize IDS and IPS as part of a firewall - and that is completely ok. Firewalls are primarily about controlling access to networks. IDS focuses on detecting and alerting on suspicious activities. IPS combines detection with proactive blocking capabilities to prevent attacks. But for this blog, I purposely broke them apart because terms like "Gen 2 Firewall" and Gen 3 aka NG Firewall" and thrown around, and I wanted this diagram to make that easier to find.