Wednesday, October 16, 2024

Home Storage : Direct Attached Storage, Network Attached Storage, Over Firewire, USB-SATA, Ethernet

Storage has long been an interest of mine. Not only have I used it for primary storage, but also for backups and, in some cases, to boot an OS. As I dug through my boxes of drives, I was amazed at how storage has changed through the years. I have owned several different types of hosts (Windows, Mac, Linux), connectors (FireWire, USB, SATA, IDE), and media (3.5" HDD, 2.5" HDD, 2.5" SSD). Here is a quick walk-through of some of my setups:



1) FireWire to iMac HDD : on the "Intel Aluminum" iMacs, the FireWire port was supposedly a cool thing to have, so I bought a FireWire enclosure to fit a 3.5" HDD. It worked great, but I later found that USB drives were fine too.

2) 3.5" SATA HDD : this might have been one of my first SATA drives, marking the pivot away from messy IDE ribbon cables.

3) USB DAS : I used this Direct Attached Storage as a desktop unit to back up my Windows laptop. Its slick shape made it look appealing on my desk.

4) NAS : this was my first Network Attached Storage, configured as RAID 0 because I really needed space (so there was no data redundancy - bad!). This particular model can also host an Apache web server with a MySQL database.

5) 2.5" SATA SSD : this small form factor was ideal for laptops and NUCs. I found it cheaper than buying a "proper" USB flash drive, though I had to buy a SATA-to-USB connector to use the SSD as plug-in storage.

6) Boot macOS from 2.5" SATA SSD : I wanted a sandbox macOS environment, so I installed macOS onto the external 2.5" SATA SSD and booted my MacBook from it.

Friday, January 12, 2024

Introducing Dr Data 2024

A data professional is an individual with expertise in managing and analyzing data to derive meaningful insights. They work with various data sources, databases, and tools to collect, organize, and interpret information. Data professionals may have skills in data cleaning, transformation, and visualization, as well as statistical analysis and machine learning. They play a crucial role in helping organizations make informed decisions based on data-driven evidence. (1)

Because the data profession is such a deep field, it spans many roles :

  • enterprise storage engineer
  • data engineer
  • database administrator
  • data analyst
  • data scientist
  • machine learning engineer

Each title in its own right is a full profession. In future blogs, I will cover the responsibilities of each profession in detail - as told by Dr Data 2024!


(1) ChatGPT prompt: "give a summary of a data professional"

(2) DALL-E "mad scientist, name tag is "Dr Data", surrounded by data, cartoon style"

Wednesday, December 13, 2023

VR, AR and MR Path to Adoption (2023)

Introduction

Virtual Reality (VR) has found moderate success in consumer applications such as gaming. Major gaming manufacturers, including Sony, Nintendo, and Microsoft, have all dabbled in VR with limited success. For example, Nintendo hastily released its Virtual Boy back in 1995. Due to a combination of poor design and headaches from use, it was unceremoniously pulled off the market in 1996. Note : I was one of the few who benefited from this and was able to snag one from Target's clearance shelf. Augmented Reality (AR) found early usage in the Heads-Up Displays (HUDs) of aircraft, overlaying important information in front of the pilot, then in in-car information systems, and now in consumer and industrial use. Mixed Reality (MR) is now helping to drive the future of VR and AR, with product releases such as Meta's Oculus Quest family of MR goggles, the adoption of Microsoft's HoloLens Mixed Reality goggles by the U.S. Army's Integrated Visual Augmentation System (IVAS), and the anticipation of Apple finally launching its Vision Pro to validate the MR market.

   

Quick Definitions

  • Virtual Reality (VR) : You buy special goggles (usually called VR goggles, which do not have cameras to pass the outside world in), put them on, and everything you see in the goggles is not real (virtual).
  • Augmented Reality (AR) : You use an everyday device (say, a mobile phone), aim its camera at a thing (say, a building), and additional information about the building is overlaid onto the screen of the mobile phone.
  • Mixed Reality (MR) : You buy special goggles (usually called MR goggles, which do have cameras to pass the outside world in), put them on, and you see both real-world and virtual visuals. This is where all the action is, with : Meta Quest (version 3 launched October 2023), Microsoft HoloLens (version 2), and Apple Vision Pro (not yet released, but the website is up).

 

Enterprise Adoption

Fast forward to 2018. I am at Mobile World Congress (MWC) in Barcelona to explore how emerging technologies, including VR and AR, are making inroads into industrial applications. Here at the show, AR seemed to have gained stronger traction than VR. VR is a bit "intrusive" in its use model - it usually requires donning special glasses or goggles - and if the Nintendo Virtual Boy is any predictor, adoption might be resisted. AR, on the other hand, is less intrusive - it usually only requires holding up a tablet (such as an iPad) or mobile phone (such as an iPhone) toward a "re-conditioned" environment.

Gaming Adoption

In spite of early failures, gaming manufacturers continue to power through early design challenges to bring VR back to dizzy gamers. Sony's PlayStation VR, Microsoft's HoloLens 2, and new challengers such as Meta's Quest VR headsets are not giving up the fight to make VR right. In fact, Meta's vision for VR is not squarely targeted at gamers. Rather, it is targeted at those who want to be transported to an alternative reality called the Metaverse. Meta already sank over $20 billion into it in 2021-2022 and will continue to invest - in hardware, software, and content. As an Oculus Quest owner, I have already enjoyed transporting myself to space, looking out at a serene and calm environment - away from the polluted and contentious planet that I want to escape from.






Daily Life Adoption

AR has now been quietly gaining traction as a utility that improves daily life. On a recent trip to The Hague, I exited the metro and had no idea where to go. On my moderately new iPhone (Android has this feature too), I had already installed the Google Maps app. Google Maps gave me the option of using my iPhone to augment information on top of what my camera sees. So from the metro station, I pointed my phone at a nearby building; Google Maps identified the building and proceeded to guide me on where to walk. Digging into the technology a bit - Apple has made combining map and camera data easier via its ARKit software libraries.





E-Commerce Adoption

With the mobile phone as the primary way for consumers to discover, learn about, and buy products, AR is now enabling the consumer to explore, visualize, and imagine the product more immersively. This picture was taken at MWC. But if you are an Amazon Prime subscriber, you have probably seen AR features that overlay products (such as a lamp) onto your living room.


Keys To Adoption : Content

As a semi-avid gamer for decades, I have owned almost every major gaming console from Atari, Sega, Nintendo, Sony, Microsoft, and Meta. And from casual observation, I can say that games sell consoles. That is why every gaming console has a must-have launch title - such as Microsoft's Halo and Nintendo's Legend of Zelda. Looking at VR, AR, and MR, it will be games, content, and a killer app that drive adoption. In October 2023, wedged between the launch of Meta's Quest 3 and the 2023 holiday season, NPR wrote "Meta Quest 3 review: powerful augmented reality lacks the games to back it up." And Microsoft's long and contentious $70B acquisition of Activision highlights this phenomenon - Bill Gates was right 20+ years ago when he wrote his "Content is King" essay.

 



How to Scale Content for VR, AR, MR

  • Make creation of virtual worlds easier by further leveraging tools that are already used to do this, such as Unity3D or Unreal Engine.
  • Enable easier 3-D modeling of real-world assets, perhaps via open-source versions of builder tools such as Blender, Maya (Autodesk), or 3ds Max.
  • Lower the barrier to writing the code that implements the functionality of a VR experience - user input handling, interaction mechanics, and any custom features - using the chosen game engine's scripting language (e.g., C# in Unity, C++ in Unreal).

 

Conclusion

MR, AR, and VR are continuing to refine the future of human-to-machine engagement. The field has had a rough start - the early failure of Nintendo's Virtual Boy, soldiers complaining of headaches with Microsoft's HoloLens, the snail's-pace transition to the Metaverse. But if the gaming industry (and its tools) has anything to teach us, it is that Content is King.

Friday, November 3, 2023

Path from VM to Containers

Businesses require software to run their operations. This software was often written using a software pattern called Model-View-Controller (MVC) and bundled into a single package called a monolith (because everything the software needed to do was bundled into that one package). That software ran on on-premise hardware as virtual machines. Because those virtual machines ran on on-premise hardware, the business needed to pay for an IT team, plus give them a budget to procure/install/configure/maintain/patch/upgrade/backup. If the software required more compute or storage (say, during the holidays when orders are streaming in), IT was summoned to scale up and follow the aforementioned long process - which was a bottleneck.

Cloud solved the IT bottleneck: the need to own your own on-premise hardware and IT team was eliminated. But businesses also needed that software to scale up during the holidays, be accessible from around the planet, and remain resilient. And if any one of the MVC components of a monolith failed, the whole application died. Virtual machines could not keep up with the requirements to scale and be resilient.

Problems with monolithic software running on virtual machines gave rise to micro-services running in containers. Micro-services broke the single monolithic package into small pieces, so that when one piece fails, the others keep the software running just well enough. And because micro-services did not need the overhead of virtual machines designed to run huge monoliths, containers were used instead. Kubernetes is simply the management software that tracks and orchestrates those containers.

Monday, July 24, 2023

Seeing New Insights From Data - Thanks to Data Lakes



Geo-Data + Social = New Business Intelligence & Insights


Whereas much of enterprise business analysis has focused on RDBMS and Big Data sales transactions to test hypotheses and create reports, applying geo-data to business transactions is an area of huge opportunity. Several companies and industries have already adopted geo-data and are reaping financial benefits. For example, UPS is using geo-data to optimize truck delivery routes, aiming for as many right turns at traffic intersections as possible. This is expected to result in an anticipated $50M in savings per year.




Enterprise Insights Exploration of Geo-Data + Social

If you look at your favorite social media apps, you will find that they want to track your location. These apps combine your location with what you are doing, how you feel, who you are with, and why you are there to provide invaluable and difficult-to-obtain insights about you. For example, if on January 21, 2017, between 2PM-8PM, you were at location 37.79° N, 122.39° W, and you tweeted that you were feeling happy and civic, you were probably part of the Women's March in San Francisco. Hence, a marketing profile can be built up on you for targeted marketing.
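The kind of profile-building described above can be sketched in a few lines of Python. This is a toy illustration under stated assumptions: the posts, field names, and the `likely_marchers` helper are all invented for this example (longitude west is written as a negative number).

```python
from datetime import datetime

# Hypothetical geo-tagged posts; users and values are invented.
posts = [
    {"user": "a", "lat": 37.79, "lon": -122.39,
     "time": datetime(2017, 1, 21, 15, 0), "text": "feeling happy and civic"},
    {"user": "b", "lat": 40.71, "lon": -74.00,
     "time": datetime(2017, 1, 21, 15, 0), "text": "stuck in traffic"},
]

def likely_marchers(posts, lat, lon, start, end, keyword, tol=0.01):
    """Flag users whose post is near (lat, lon), inside [start, end],
    and whose text mentions the keyword."""
    return [
        p["user"] for p in posts
        if abs(p["lat"] - lat) <= tol and abs(p["lon"] - lon) <= tol
        and start <= p["time"] <= end
        and keyword in p["text"]
    ]

# Who was near the San Francisco march location, during the march,
# posting about feeling civic?
segment = likely_marchers(
    posts, 37.79, -122.39,
    datetime(2017, 1, 21, 14, 0), datetime(2017, 1, 21, 20, 0),
    "civic",
)
print(segment)  # ['a']
```

A real pipeline would of course run this as a query over billions of rows rather than a Python list, which is exactly the performance problem discussed next.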




Enterprise Insights Exploration Hampered by a Lack of Data Diversity




A business analyst, seeing the value of geo-data, wants to perform an ad-hoc query. She has data from the Women's March, with an estimated 4 million marchers nationwide. She can query who was at the start location of the Washington D.C. march (38.88° N, 77.01° W), at the starting time (1:15 PM EST), and Tweeted or Liked positively. This is the profile of an enthusiastic, conscientious person. The analyst can also query who was at the end location of the march (38.89° N, 77.03° W), but at the starting time of the march - perhaps a supporter or reporter. Acting at the speed of thought, the analyst wants access to billions of rows of data, to draw a perimeter on the map to localize around the start of the march, focus on the start time, and filter by contextual data. And after that, try again with another set of criteria, so that she can constantly refine her hypothesis to reach a conclusion. But currently, each click will cause minutes or even hours of calculation before results are seen. This is due to the nature of CPUs - a limited number of cores, memory speed, and the types of instructions they excel at.
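The analyst's ad-hoc query might look like the following sketch, here using SQLite as a stand-in for a real warehouse. The table layout, column names, and rows are all hypothetical; the point is the shape of the location + time + sentiment filter.

```python
import sqlite3

# In-memory SQLite as a stand-in for a data warehouse; schema is invented.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE posts (
        user_id TEXT, lat REAL, lon REAL,
        posted_at TEXT, sentiment TEXT
    )""")
conn.executemany(
    "INSERT INTO posts VALUES (?, ?, ?, ?, ?)",
    [
        ("u1", 38.88, -77.01, "2017-01-21T13:20", "positive"),  # start point, start time
        ("u2", 38.89, -77.03, "2017-01-21T13:20", "positive"),  # end point, start time
        ("u3", 38.88, -77.01, "2017-01-21T18:00", "negative"),
    ],
)

# Who was near the D.C. start location around the 1:15 PM start,
# and posted positively?
rows = conn.execute("""
    SELECT user_id FROM posts
    WHERE lat BETWEEN 38.87 AND 38.89
      AND lon BETWEEN -77.02 AND -77.00
      AND posted_at BETWEEN '2017-01-21T13:00' AND '2017-01-21T14:00'
      AND sentiment = 'positive'
""").fetchall()
print([r[0] for r in rows])  # ['u1']
```

On billions of rows, this kind of bounding-box scan is precisely where a CPU-based engine stalls for minutes while the analyst waits.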





Augmenting IT Data with OT Data

Querying a database requires processing cores and fast memory. CPU-based servers are limited to around 22 processing cores and fairly fast memory, so CPUs need to be clustered together to serve queries over billions of rows of data. Another type of processor, the GPU, has thousands of cores and very fast memory. The cores in the GPU process data in parallel and pass data to memory extremely fast. GPUs are so powerful that a single GPU server can sometimes replace multiple clusters of CPU servers. GPUs can save money, reduce labor, lower energy consumption, and reduce space compared to CPUs.

While the GPU is a great match for looking through billions of records in milliseconds, a database optimized for the GPU is needed. That's where G-DB comes in. G-DB offers two synergistic products - G-DB Query and G-DB Visual. G-DB Query is a GPU-optimized database: an in-memory, columnar database highly optimized to harness the power of the thousands of cores in the GPU. Every SQL query you submit is broken down and re-targeted to run in parallel on thousands of GPU cores. That's how it is able to return queries on billions of rows in milliseconds. But the magic doesn't stop there. Synergistically, the GPU is also ultrafast at drawing the output of the query results. This is where G-DB Visual comes in. It renders the results of your queries immediately - so that you can use your eyes to help your brain discover insights immediately.
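The "break the query down and run it in parallel" idea can be illustrated with a toy sketch: split a column into chunks and have each worker scan its own slice, then combine the partial results. This is only an analogy (Python threads standing in for GPU cores; a real GPU engine compiles the predicate to run on the device), and all names here are invented.

```python
from concurrent.futures import ThreadPoolExecutor

rows = list(range(1_000_000))      # stand-in for one column of a table

def scan(chunk):
    """Each 'core' counts matching values in its own slice of the column."""
    return sum(1 for v in chunk if v % 97 == 0)

def parallel_count(rows, workers=8):
    """Split the column into chunks, scan them concurrently, merge results."""
    size = len(rows) // workers + 1
    chunks = [rows[i:i + size] for i in range(0, len(rows), size)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return sum(pool.map(scan, chunks))

print(parallel_count(rows))   # same answer as a serial scan, computed in slices
```

With thousands of GPU cores instead of eight threads, and columnar data already resident in GPU memory, the same divide-scan-merge pattern is what lets a query over billions of rows come back in milliseconds.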



Conclusion

Transaction, geo-data, and social media data combined will enable insights into people not possible before. Processing billions of rows of this type of data would be slow and/or expensive on a CPU-based system, making this valuable data inaccessible. But GPU-based systems, like G-DB, can handle this type and size of data with ease. With G-DB, not only can you gain insights at the speed of thought, you have ultrafast high-fidelity visuals to match.

Monday, March 20, 2023

AWS Cloud Storage (S3, EBS, EFS) Explained in One Picture - Hopefully

AWS offers multiple cloud services for storage : 1. Simple Storage Service (S3), 2. Elastic Block Store (EBS), 3. Elastic File System (EFS), and more. If you have been reading various documents and comparisons and are still confused, I wrote this to hopefully clear things up a tad.


  • S3 (Simple Storage Service - object): if you have big files, want to share them EXTERNAL TO AWS with many others in the outside world, can use HTTP to access them, and they do NOT need to be attached to an EC2 instance
  • EBS (Elastic Block Store - block): fast storage for your EC2 compute to use, usually SSD and NVMe based
  • EFS (Elastic File System - file): share data (pictures, documents, ...) as files INTERNALLY WITHIN AWS, such as between EC2 instances
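As a mnemonic, the decision between the three can be restated as a tiny helper function. This is a toy sketch, not an AWS API - the function name and flags are invented, and real selection involves more criteria (latency, cost, access patterns).

```python
def pick_aws_storage(share_outside_aws: bool,
                     attach_to_one_ec2: bool,
                     share_between_ec2: bool) -> str:
    """Toy decision helper restating the bullets above; not an AWS API."""
    if share_outside_aws:
        return "S3"      # object storage, HTTP access, no EC2 attachment needed
    if share_between_ec2:
        return "EFS"     # file storage shared between EC2 instances
    if attach_to_one_ec2:
        return "EBS"     # fast block storage attached to a single instance
    return "S3"          # default: durable object storage

print(pick_aws_storage(True, False, False))   # S3
print(pick_aws_storage(False, False, True))   # EFS
print(pick_aws_storage(False, True, False))   # EBS
```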

 

Saturday, December 10, 2022

A Database Overview - from SQL to NoSQL

 Introduction


I am writing this quick introduction to databases, based on my studies (an MBA-level database class, covering database theory through implementation using Microsoft SQL Server) and professional experience (using MySQL, SQLite, MariaDB, MongoDB, Neo4j, DLT).




Relational “SQL” Databases


Conjured up in the 1970s, the relational database was created to address the problem of "how can businesses store large amounts of data, then easily retrieve it?" The computer system that stored the data was called a Relational Database Management System (RDBMS). The language that allowed businesses to create and retrieve data in the RDBMS was called Structured Query Language (SQL). But raw data cannot simply be thrown into the RDBMS. The data first had to be analyzed, and only then could the database be designed. The database design phase involves creating an Entity Relationship Diagram (ERD) to model the business needs of the database, followed by normalization of the data to conform to a "normal form" that reduces data redundancy. The RDBMS had to support the core basic functions Create, Read, Update, Delete (CRUD). And for the RDBMS to ensure that data was stored accurately, it behaved in accordance with Atomicity, Consistency, Isolation, Durability (ACID). RDBMS implemented Online Transaction Processing (OLTP) to support the businesses that broadly adopted RDBMS to handle daily critical transactions; the ACID properties of an RDBMS gave banks the confidence to store highly critical data on one. And once transactional data is stored in the RDBMS, an Extract, Transform, Load (ETL) process loads the OLTP data into another database more geared toward analysis, usually called a data warehouse or data lake. This second database enables Online Analytical Processing (OLAP), which is ideal for analysis and reporting.
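The CRUD operations and the "A" in ACID can be seen in a few lines of Python using the built-in sqlite3 module. The table and values below are invented for illustration; the key point is that both UPDATE statements inside the `with conn:` block commit together or roll back together.

```python
import sqlite3

# A minimal CRUD walkthrough with SQLite; schema and values are invented.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (id INTEGER PRIMARY KEY, owner TEXT, balance REAL)")

# Create
conn.execute("INSERT INTO accounts (owner, balance) VALUES (?, ?)", ("alice", 100.0))
conn.execute("INSERT INTO accounts (owner, balance) VALUES (?, ?)", ("bob", 50.0))

# Read
print(conn.execute("SELECT owner, balance FROM accounts").fetchall())

# Update - atomicity in action: the two updates form one transaction,
# so a transfer either fully happens or not at all
try:
    with conn:  # the 'with' block commits on success, rolls back on error
        conn.execute("UPDATE accounts SET balance = balance - 30 WHERE owner = 'alice'")
        conn.execute("UPDATE accounts SET balance = balance + 30 WHERE owner = 'bob'")
except sqlite3.Error:
    pass  # on any error, both updates are rolled back

# Delete
conn.execute("DELETE FROM accounts WHERE owner = 'bob'")
remaining = conn.execute("SELECT owner, balance FROM accounts").fetchall()
print(remaining)  # [('alice', 70.0)]
```

A production RDBMS adds durability (data survives a crash) and isolation (concurrent transactions don't see each other's partial work) on top of this same transactional model.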




Big Data


Relational databases serve the business world with their ability to store transactions via OLTP features, then provide analytics and insights via OLAP abilities. But with the advancement of new technologies (namely broadband internet, 5G, and powerful mobile devices to use up that bandwidth), real-world data no longer fit into tidy RDBMS schemas. What were the major characteristics of this new data? To capture its requirements, the 5V framework was created: Velocity, Variety, Veracity, Volume, and Value. Velocity = real-time data. Variety = text, pictures, videos, geo-data. Veracity = data of uncertain quality and trustworthiness that must still be handled. Volume = social media easily creating and consuming multimedia, multiplied by millions of users. Value = real-time data, geo-tagged and enriched by media, can provide infinitely more insight than structured data alone.




NoSQL Databases


With Big Data, a new class of database was needed to handle the new, unpredictable formats. With NoSQL databases, a database schema no longer needs to be PREDEFINED. In addition, NoSQL databases are usually distributed globally, to be close to where the data is used in real time. While this sounds reasonable, it poses the problem of keeping data that is spread around the globe consistent. Whereas traditional RDBMS focused on accuracy of data, enforced via the ACID principles, modern data requirements are different: they favor availability of data - even stale data - possibly at the expense of accuracy. This big-data requirement is known as Basically Available, Soft state, Eventually consistent (BASE), the antithesis of ACID. The Consistency-Availability-Partition tolerance (CAP) theorem, a framework for the trade-offs between accuracy (which ACID affords) and availability (the BASE behavior offered by NoSQL databases), helps frame which database type to use. In addition to their schema flexibility, NoSQL databases can be scaled horizontally (as opposed to the vertical scaling of an RDBMS system). The benefit of horizontal scaling is that to add new compute capacity, you just add new servers - instead of stopping a current server to add extra CPU/RAM/storage, as in vertical scaling.
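The "no predefined schema" idea can be shown with a toy document store in plain Python. This is an illustration only - a real NoSQL engine such as MongoDB adds indexing, sharding, and replication on top - and the documents and helper names below are invented.

```python
import json

# Toy "document store": each record can have different fields,
# with no table definition declared up front.
store = []

def insert(doc):
    """Store a decoupled JSON copy of the document - any shape is accepted."""
    store.append(json.loads(json.dumps(doc)))

insert({"user": "alice", "likes": ["vr", "databases"]})
insert({"user": "bob", "city": "The Hague"})            # different fields - fine
insert({"user": "carol", "city": "Austin", "age": 42})  # yet another shape - fine

def find(predicate):
    """Query by arbitrary predicate, since there is no fixed schema to rely on."""
    return [d for d in store if predicate(d)]

print([d["user"] for d in find(lambda d: "city" in d)])  # ['bob', 'carol']
```

Contrast this with an RDBMS, where inserting bob's record would first require an ALTER TABLE to add a `city` column that alice's row would then carry as NULL.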




How Databases Fit In Cloud Era:


RDBMS were created during the client-server era, which means the database ran on a server. To access the database, the client needed to connect to a server that usually ran in a room somewhere in the office. For example, to run Oracle Database 12.2, the minimum server configuration is listed at Oracle. This is hardware that you would need to procure and install somewhere in your office. You would also need an IT person who could install, configure, manage, patch, and upgrade the database software. Fast forward 30 years to the world of cloud computing, where you can have Oracle manage both the hardware and the software - called "Oracle Database Standard Service", where the hardware is Oracle Cloud Infrastructure (OCI). Let's look at a more modern example: MongoDB. MongoDB is a "new SQL" database that gives you the option of 1) MongoDB Enterprise Advanced - run locally on your own hardware, like the olden client-server days - or 2) MongoDB Atlas, aka "MongoDB Enterprise Advanced in the Cloud". If you are an Amazon Web Services (AWS) user, you can visit the AWS Marketplace to subscribe to MongoDB Atlas (Pay As You Go) or use an AWS clone of MongoDB called Amazon DocumentDB.



Current “Top” Players in the Database Marketplace (1):


SQL:  

  • Oracle, Oracle MySQL, Microsoft SQL Server


New SQL:

  • PostgreSQL, MongoDB


NoSQL:


  • Document : MongoDB (#5), CouchDB (#40)


  • Key-Value : Redis (#6), Memcached (#33), etcd (#46), Aerospike (#70), RocksDB (#89), LevelDB (#107)


  • Wide Column: Cassandra (#11), HBase (#26)


  • Graph : Neo4j (#19)


  • Search engine : Elasticsearch (#7), Splunk (#13)





Conclusion


Databases have evolved from plan-ahead SQL RDBMS systems running on a server located in your office to ad-hoc NoSQL databases running in the cloud. Although the RDBMS was invented 50 years ago, most of the world's data still resides in them, and new applications will still be designed around RDBMS. I hope I have given a broad enough view here to provide a map to the world of databases.




Footnote:

  1. https://db-engines.com/en/ranking