Thursday, March 29, 2018

Big Data (Hadoop) On Windows 8

Are you interested in playing with BigData on your own Windows 8 laptop?

A quick recap of what BigData is & why:

Why BigData:

All business systems need data. Customer names. Transaction history. Time of punch in. Data is stored in databases, more specifically, relational databases. Data in a relational database is stored in pre-defined tables. The pre-defined tables dictate what the database can hold. In the example of a customer address database, the typical fields are name, address, phone number. This pre-defined table structure is also called a data schema.
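To make the idea of a pre-defined table concrete, here is a minimal sketch using Python's built-in sqlite3 module. The table name, column names and sample values are hypothetical, chosen only to mirror the customer address example above. It shows that the schema decides what the table can hold: a field that was never defined is rejected.

```python
import sqlite3

# Hypothetical table and column names, mirroring the customer address example.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customer (name TEXT NOT NULL, address TEXT, phone TEXT)"
)

# A row that fits the pre-defined schema is accepted.
conn.execute(
    "INSERT INTO customer (name, address, phone) VALUES (?, ?, ?)",
    ("Ada Example", "1 Main Street", "555-0100"),
)

# A column that was never defined (twitter_handle) is rejected by the database.
try:
    conn.execute(
        "INSERT INTO customer (name, twitter_handle) VALUES (?, ?)",
        ("Bob Example", "@bob"),
    )
    rejected = False
except sqlite3.OperationalError:
    rejected = True

rows = conn.execute("SELECT name, address, phone FROM customer").fetchall()
print(rows, "undefined column rejected:", rejected)
```

This rigidity is exactly what makes new kinds of data awkward to store in a relational table.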

There is an entire art to how a data schema should be defined. This is called Entity-Relationship (ER) modeling. Several notations were published, including Crow's foot notation, to help create these models. In addition, databases were normalized for optimal efficiency. Databases were graded on how normalized they were, using the 1st/2nd/3rd/4th/5th/6th normal forms as a metric. Examples of database normalization include no duplicate data, only one value per column (so no arrays), unique indexing, etc.
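As a rough illustration of the "one entry per column" rule, here is a sketch (again with Python's sqlite3, and with hypothetical table and column names) in which a customer with several phone numbers gets one row per number in a separate table, instead of an array packed into a single column. A unique constraint keeps duplicate data out.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical schema: phone numbers live in their own table, one per row.
conn.executescript("""
CREATE TABLE customer (
    id INTEGER PRIMARY KEY,
    name TEXT NOT NULL UNIQUE
);
CREATE TABLE customer_phone (
    customer_id INTEGER NOT NULL REFERENCES customer(id),
    phone TEXT NOT NULL,
    UNIQUE (customer_id, phone)
);
""")

conn.execute("INSERT INTO customer (id, name) VALUES (1, 'Ada Example')")
conn.executemany(
    "INSERT INTO customer_phone (customer_id, phone) VALUES (?, ?)",
    [(1, "555-0100"), (1, "555-0101")],
)

# Each phone number is a separate row, so it can be queried and indexed.
phones = [
    row[0]
    for row in conn.execute(
        "SELECT phone FROM customer_phone WHERE customer_id = 1 ORDER BY phone"
    )
]

# The UNIQUE constraint rejects a duplicate entry.
try:
    conn.execute(
        "INSERT INTO customer_phone (customer_id, phone) VALUES (1, '555-0100')"
    )
    duplicate_allowed = True
except sqlite3.IntegrityError:
    duplicate_allowed = False

print(phones, "duplicate allowed:", duplicate_allowed)
```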

This type of pre-defined data schema worked for multiple decades - until the advent of the internet and the myriad new types of data: text, e-mail, audio, video, images, social media. Scale was also an issue. Databases were being asked to do a lot more than in the pre-BigData era, driven by the 4 Vs of BigData: variety, velocity, veracity, volume.

Most of us (in this field) probably have experimented with relational databases such as MySQL, Oracle Database, Microsoft SQL Server, etc. What to do if you want to try out BigData at home? Follow me through these steps!


Here is a high level diagram:

1. CDH - Cloudera Distribution including Hadoop is an open source Hadoop distribution

2. Redis

