Tuesday, February 28, 2017

Intro To Google Cloud Compute Via Sudoku Solver

Solving Sudoku With Google Cloud Compute


Introduction 

Sudoku is a popular puzzle game, available in puzzle books, newspapers, and even as a mobile app. I recently downloaded a Sudoku app and spent hours playing it. As an average player, I sometimes get frustrated when I cannot solve a puzzle quickly enough and just want to see the solution immediately. Using Google Cloud Compute, I was able to click a few buttons, type a few commands, and get an instant solution.




Cloud Infrastructure (IaaS) Market

The cloud infrastructure market has become a money maker for Amazon, whose AWS division racked up $3.53B in Q4 2016 alone (1). Gartner estimated the IaaS market at $22B for 2016 (2). Gartner ranks Amazon #1 in cloud infrastructure, followed by Microsoft and Google (3). Google recognizes that it is #3 and is actively trying to catch up; it hired a VMware veteran to lead the charge (4).


Google Cloud Solves The Sudoku Puzzle

Wishing to solve the above Sudoku puzzle, I found an open-source Sudoku solver from Bob Carpenter (5), written in Java. To use it, I needed a computer to compile and run the Java code, so I decided to give Google Cloud a shot.



12 Easy Steps To Using Google Cloud Compute

1. Apply For A Google Cloud Compute Account. New Users May Be Offered A $300 Credit To Try It Out. Google Promises That It Won't Charge You Without Your Permission.



2. Log Into Your Google Cloud Platform Console. 




3. Create A Google Cloud Project Using The Browser. The Project Is Called "projsudoku9x9". It Will Solve A 9x9 Sudoku Matrix.
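For reference, newer gcloud releases can create the project from the Cloud Shell as well; a minimal sketch (project IDs must be globally unique, so "projsudoku9x9" may already be taken for you):

    gcloud projects create projsudoku9x9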








4. Spool Up A Google Cloud Compute Engine Using The Web GUI.




5. Activate The Google Cloud Shell. Once Activated, You Will Be In The Project Shell. This Is Where You Will Create VM Instances.




6. Then Use The Google Cloud Shell (Still In The Web GUI) To Provision A Virtual Machine Instance In The Project. I Have Named My Instance "instsudoku9x9".
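A sketch of the provisioning command (the zone, machine type, and image here are my assumptions; pick whatever fits your budget):

    # create a small Ubuntu VM named instsudoku9x9
    gcloud compute instances create instsudoku9x9 \
        --zone us-central1-a \
        --machine-type f1-micro \
        --image-family ubuntu-1604-lts \
        --image-project ubuntu-os-cloud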




7. Use The Google Cloud Shell (CLI) To List VM Instances And SSH Into "instsudoku9x9".
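Roughly, the two commands look like this (gcloud may offer to generate SSH keys on first use):

    gcloud compute instances list        # lists instsudoku9x9 with its zone and IPs
    gcloud compute ssh instsudoku9x9     # opens an SSH session into the instance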



8. Copy And Paste Carpenter's Sudoku Java Source File (5) Into The VM Instance. I Did This By Using The "vi" Text Editor To Create A New File "Sodoku.java", Then Pasting In The Source Code.






9. Install The Java JDK Using Ubuntu's apt-get.
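A minimal sketch, assuming the Ubuntu image above (default-jdk pulls in OpenJDK, which includes the javac compiler):

    sudo apt-get update
    sudo apt-get install -y default-jdk
    javac -version    # sanity check that the compiler is on the PATH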



10. Compile (javac) And Run (java) The Sudoku Solver. Looking At The Unsolved Sudoku Above, Row 0 Already Has 8, 3, 7, 4, And 6 (In Columns 0, 2, 3, 7, And 8). The Program Accepts (Row, Column, Value) Triples, Hence 008, 023, 037, 074, And 086 Are The Inputs For Row 0. Repeat This For Rows 1-8. Note: I Had A Typo In The File Name - It Was "Sodoku.java", Now Corrected To "Sudoku.java".
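The compile-and-run sequence, sketched below (I show only the row 0 givens; the triples for rows 1-8 follow the same pattern):

    javac Sudoku.java
    java Sudoku 008 023 037 074 086   # append the (row,column,value) triples for rows 1-8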





11. Delete The VM Instance Using The Google Cloud Shell. The Shell Is Fairly User-Friendly - I Mistyped A Command Argument And The Shell Suggested The Right Command.
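A sketch of the cleanup command (gcloud asks for confirmation before destroying the instance):

    gcloud compute instances delete instsudoku9x9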





12. Delete The Google Cloud Project "projsudoku9x9".
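And the final cleanup, sketched below (deleting the project releases every resource billed under it):

    gcloud projects delete projsudoku9x9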





Conclusion

We had a problem to solve: a Sudoku puzzle. To solve it, we needed compute resources, and Google Cloud gave us instant resources via its Infrastructure as a Service (IaaS). We were able to compile a Sudoku solver, input the puzzle state, and have the solver spit out a solution.


References

(1) http://venturebeat.com/2017/02/02/aws-posts-3-53-billion-in-revenue-in-q4-2016-up-47-from-last-year
(2) http://www.gartner.com/newsroom/id/3188817
(3) https://thenextweb.com/offers/2016/03/11/amazon-web-services-dominates-cloud-services-market/#.tnw_J0QPissD
(4) https://www.forbes.com/sites/alexkonrad/2015/11/30/what-diane-greene-lessons-at-vmware-tells-us-about-google-cloud/#6aa1ed0b120d
(5) https://bob-carpenter.github.io/games/sudoku/java_sudoku.html

Tuesday, February 14, 2017

RDBMS to Big Data Hadoop Via Cloudera




Consumer Electronics Database

A manufacturer of consumer electronics (a local one, so safe from Donald) was concerned about his supply chain. Why? As with most supply chain systems, there are multiple risk factors. Geopolitical risk is usually one of them - tracking threats to the supply chain from other regions (riots, coups d'état, etc.). But with the new White House administration, who would have thought that a "far away" geopolitical risk factor would originate from us - Donald potentially blocking or heavily taxing parts imports. The company needed to track all parts, their suppliers, and each supplier's region.


Graduating From Excel To An RDBMS

They had a list of their suppliers in an Excel spreadsheet - first on a local disk drive, then later "upgraded" to the cloud (Google & Microsoft 365). But soon the need to write extensive queries outgrew what pivot tables, sorts, filters, and macros could do. The decision was made to migrate the XLS data into an RDBMS. The first pass used Oracle MySQL with MySQL Workbench CE, running on Windows 10. The data cleaning and ingestion were done via Python (a topic for another post). Once ingested into the RDBMS, standard MySQL queries could be used. Here is an example:


A supplier database in an RDBMS running on W10
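The screenshot showed a query along these lines. The schema below (the suppliers and parts tables, their columns, and the supplier_db name) is hypothetical, for illustration only:

    # count parts per supplier, grouped by region
    mysql -u root -p supplier_db -e "
        SELECT s.supplier_name, s.region, COUNT(p.part_id) AS part_count
        FROM suppliers s
        JOIN parts p ON p.supplier_id = s.supplier_id
        GROUP BY s.supplier_name, s.region
        ORDER BY part_count DESC;"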


Big Data = Volume. Velocity. Variety.

The CEO was happy - he had finally graduated from a spreadsheet to an RDBMS. But he also wanted to deploy it as "Big Data". The list of suppliers (and parts) will grow as we migrate other product lines into the RDBMS. He also wanted all the goodies of Big Data - deployment in the cloud, advanced analytics, and different data types (like pictures). Volume. Velocity. Variety.


Cloudera - First Dip Into "Big Data" Apache Hadoop

So using the same trusty W10 machine, I decided to prototype a Big Data deployment for the supplier RDBMS. Cloudera offers a VM (in multiple flavors) and a container as prototyping vehicles. After installing Oracle VirtualBox (I have better luck with it than with VMware), my own Cloudera instance was running.

Cloudera VM (CentOS Guest OS), Running On W10 VirtualBox



Cloudera Allows A Step-By-Step Migration

The neat thing about the Cloudera setup is that an RDBMS is already set up, so you can check out your RDBMS in the Cloudera VM before you migrate to Hadoop.

MySQL In The Cloudera VM
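A quick sanity check from a terminal inside the VM (the QuickStart VM's root password and my supplier_db schema are assumptions - substitute your own):

    mysql -u root -pcloudera -e "SHOW DATABASES;"                              # is MySQL up?
    mysql -u root -pcloudera supplier_db -e "SELECT COUNT(*) FROM suppliers;"  # is the data there?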

Apache Sqoop: RDBMS -> Hadoop

Once I had confidence that the RDBMS setup in the Cloudera VM was good, I started down the unknown path of converting it to Hadoop. Apache Sqoop supports this endeavor - and Cloudera makes it super easy.

A little C shell script to convert the RDBMS into Hadoop
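The script boiled down to a single Sqoop invocation, roughly like the sketch below (the connect string, credentials, and warehouse directory are assumptions - adapt them to your setup):

    # import every MySQL table into Hive-managed storage on HDFS
    sqoop import-all-tables \
        --connect jdbc:mysql://localhost/supplier_db \
        --username root --password cloudera \
        --warehouse-dir /user/hive/warehouse \
        --hive-import \
        -m 1    # a single mapper; raise this on a real cluster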


Once the script is launched, the MapReduce job takes over for hours. You can check its progress using a web browser.

Using A Web Browser To Check On Apache Sqoop


In Big Data Land!

After the process completes, we can use Cloudera Hue (a web UI for Apache Hive) to reuse many of the MySQL queries.

The Supplier Database, Originally In RDBMS, Is Now In "Big Data"!
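As a payoff, the earlier supplier query (same hypothetical schema as before) runs essentially unchanged from the Hive command line:

    hive -e "
        SELECT s.supplier_name, s.region, COUNT(p.part_id) AS part_count
        FROM suppliers s
        JOIN parts p ON p.supplier_id = s.supplier_id
        GROUP BY s.supplier_name, s.region
        ORDER BY part_count DESC;"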


Conclusion

The steps from RDBMS to Hadoop are manageable if taken as baby steps. Cloudera's environment makes that easy to do. For my next task, I will create clusters.