Monday, November 29, 2021

BFT in a Distributed Compute and/or Storage System - A Simple Introduction

Distributed Systems - physical

A distributed system breaks up a large compute and/or storage system (the slash below means and/or) into smaller ones. Each smaller compute/storage unit is called a node. Distributing a system can make it safer (no single point of failure), more powerful (a big compute task is split into smaller ones), and more scalable.  Apache Hadoop, where storage and processing are spread across a cluster of nodes, is an example of a distributed (storage and processing) system.


But these nodes need to stay in sync (aka consistent). If nodeA and nodeB differ on what the final state of the system is, who is right?  Take a space capsule example: two computers are computing the exact time to fire thrusters. ComputerA says 10:01, ComputerB says 10:04. Who is right?


This is why distributed systems have a consensus, or voting, mechanism.  BFT (Byzantine Fault Tolerance) is the science behind building such a voting mechanism: it allows multiple nodes to reach a consensus even when some of them fail or misbehave.  BFT can help make a distributed system 1) failure tolerant - so that if a node fails, the other working nodes can take over quickly, and 2) malice tolerant - so that if a node injects wrong information, it is outvoted by the other good nodes.
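To make the voting idea concrete, here is a minimal sketch of just the vote-counting step, using the thruster example. It is not a full BFT protocol (there are no message rounds, signatures, or leaders). The function names and node labels are made up for illustration. The sketch relies on two standard results: tolerating f Byzantine nodes needs at least 3f + 1 nodes in total, and a value is accepted only when a quorum large enough to always include a majority of honest nodes backs it.

```python
from collections import Counter
from typing import Optional

def max_faulty(n: int) -> int:
    """Largest number f of Byzantine nodes that n nodes can tolerate (n >= 3f + 1)."""
    return (n - 1) // 3

def quorum_size(n: int) -> int:
    """Smallest vote count such that any two quorums share at least f + 1 nodes."""
    f = max_faulty(n)
    return (n + f) // 2 + 1

def decide(votes: dict) -> Optional[str]:
    """Return the value backed by a quorum of votes, or None if no value has one."""
    value, count = Counter(votes.values()).most_common(1)[0]
    return value if count >= quorum_size(len(votes)) else None

# Four computers, one of them (D) faulty or malicious: the good nodes outvote it.
print(decide({"A": "10:01", "B": "10:01", "C": "10:01", "D": "10:04"}))

# Two computers that disagree: there is no way to tell who is right.
print(decide({"A": "10:01", "B": "10:04"}))
```

With four computers the three matching votes form a quorum, so 10:01 wins. With only two computers that disagree, the function returns None, which is exactly the "who is right?" problem from the space capsule example.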


Decentralization - control 

Decentralization is a CONTROL concept, whereas distributed is a PHYSICAL concept. In a network that is responsible for control, decisions need to be made and agreed upon. What is the final state of Joe's bank account? Should the reverse thrusters be fired at 10:01 or 10:04? BFT is critical in a decentralized network - it ensures that far away nodes can agree on decisions.  BFT needs to handle both failure and malice. Malice can come in many forms: from sending the wrong data (Joe owes me $1,000 instead of $1) to jamming the network. An example of the latter is an email DDoS, which can be remedied by asking the sender to do some work before the sender's email is read by the receiver.


BFT in Blockchain

Blockchains use several different ways to keep their distributed nodes in sync (consistent). BFT is one. Another is Proof of Work (PoW, also called Nakamoto Consensus), used by Bitcoin. Rather than using classical BFT voting, Bitcoin wants to see proof that a node did work before its block is accepted. The work that a node needs to perform is very compute intensive and uses a lot of energy, so PoW is frowned upon as a way to reach consensus. Proof of Stake (PoS) is a less energy-hungry alternative that is slated to replace PoW on some blockchains (Ethereum, for example), though not on Bitcoin itself.

Blockchain Refresh

- append only, making it immutable

- time stamped

- consensus 

- hashing : reduces transactions to a short fixed-size digest (e.g. SHA-256)

- digital signature : private key (held secretly in a wallet) and public key (forms public address)

- chain : a block of data contains multiple transaction hashes; the following block holds a hash of the current block, so a change to the current block impacts every block after it

- store efficiently : in Merkle Tree (hash)
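The hashing, chain, and Merkle Tree points above can be sketched in a few lines of Python. This is a simplified illustration, not how any particular blockchain is implemented: the block layout, the function names, and the sample transactions are assumptions, timestamps and signatures are left out, and the Merkle root uses a single SHA-256 per pair (duplicating the last hash when a level has an odd count).

```python
import hashlib
import json

GENESIS_PREV = "0" * 64  # placeholder "previous hash" for the first block

def sha256_hex(data: str) -> str:
    return hashlib.sha256(data.encode("utf-8")).hexdigest()

def merkle_root(tx_hashes: list) -> str:
    """Pair up hashes and hash each pair, level by level, until one hash remains."""
    if not tx_hashes:
        return sha256_hex("")
    level = list(tx_hashes)
    while len(level) > 1:
        if len(level) % 2 == 1:
            level.append(level[-1])  # odd count: duplicate the last hash
        level = [sha256_hex(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
    return level[0]

def block_hash(transactions: list, prev_hash: str) -> str:
    """Hash of a block header made of the previous block's hash and the Merkle root."""
    root = merkle_root([sha256_hex(tx) for tx in transactions])
    return sha256_hex(json.dumps({"prev_hash": prev_hash, "merkle_root": root}, sort_keys=True))

def build_chain(batches: list) -> list:
    chain, prev = [], GENESIS_PREV
    for batch in batches:
        block = {"transactions": list(batch), "prev_hash": prev, "hash": block_hash(batch, prev)}
        chain.append(block)
        prev = block["hash"]
    return chain

def is_valid(chain: list) -> bool:
    """Recompute every hash and check that each block points at the one before it."""
    prev = GENESIS_PREV
    for block in chain:
        if block["prev_hash"] != prev:
            return False
        if block_hash(block["transactions"], block["prev_hash"]) != block["hash"]:
            return False
        prev = block["hash"]
    return True

chain = build_chain([["Joe pays Ann $1"], ["Ann pays Bob $2", "Bob pays Joe $3"]])
print(is_valid(chain))  # True

chain[0]["transactions"][0] = "Joe pays Ann $1,000"  # tamper with an old block
print(is_valid(chain))  # False
```

Editing an old transaction changes that block's Merkle root and therefore its hash, so the stored hash no longer matches. To hide the change, an attacker would have to recompute that block and every block after it, which is what makes the append-only chain hard to rewrite.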


Conclusion

BFT is the science and art of keeping nodes in a decentralized environment in sync (consistent) by providing a consensus (voting) mechanism that works robustly (failures are handled) and safely (malice is tolerated). BFT is used in blockchain.



Friday, November 12, 2021

Hashcash : Before you get to vote/say or email - you need to work or pay for it - to prevent noise/spam

Hashcash : a proof-of-work system that limits email spam, and hence reduces denial-of-service attacks, by requiring the sender of an email to perform a small amount of computational work before their message can be sent.


--- Email DDoS ---

Email is an essential tool in both business and personal life. Email remains a major channel for business communication, accounting for a significant portion of interactions. While exact percentages can vary by industry and company size, studies suggest that email often makes up around 60-70% of all business communication. Other channels, like instant messaging (e.g., Slack, Microsoft Teams), video calls (e.g., Zoom), and project management tools (e.g., Asana, Trello), are also growing in use, especially for quick chats and collaboration. However, email is still favored for formal communications, documentation, and messages that need to be referenced later.
 
An email inbox can be flooded to the point where it is full and cannot receive any new email. A full inbox can disrupt business (orders are not received) and personal lives (an invitation to a birthday party is not received).

Evil people can easily disrupt a business or person by flooding their email inbox. From a single computer, the evil person can automate sending thousands of emails an hour, with the goal of filling up the inbox of the business or person. This is essentially free – there is almost no cost to create and send email. In the cybersecurity world, this is considered a Denial of Service (DoS) attack. If the evil person uses multiple computers to simultaneously send out thousands of emails per hour, this type of attack is called a Distributed Denial of Service (DDoS) attack.

There are several ways to reduce email inbox DDoS attacks. 1) A firewall in front of the recipient’s email can be used to block a flood of email that is coming from the same email address. 2) Another method is to use a novel scheme created in the 1990s, before firewalls became popular. It is called “Hashcash”. Hashcash, proposed by Adam Back in 1997, is a method that requires the sender to do some work before the sender can send an email.  Here is how it works (I think!)

  • Sender sends email to recipient
  • Before the recipient accepts the sender’s email, the recipient sends a number (say 10) to the sender
  • The sender takes the number (10), creates a random number (called a nonce), computes the hash digest of the email data combined with the nonce, and checks whether the first 10 digits of the hash digest are 0s... if not, the sender creates another nonce and tries again... until the first 10 digits of the hash digest are 0s
    • guess NONCE1 -> HASH -> HASH_DIGEST_1; compare the first 10 digits of HASH_DIGEST_1 to the recipient’s request of 10 zeros; result is no
    • guess NONCE2 -> HASH -> HASH_DIGEST_2; compare the first 10 digits of HASH_DIGEST_2 to the recipient’s request of 10 zeros; result is no
  • This will take a while… and CPU resources … for the sender to compute
  • Sender sends the winning nonce and its hash digest (the hash of data ABCD plus the nonce) to the recipient
  • The recipient can easily verify, with a single hash, that the digest is correct and that the sender did do the work
  • The recipient accepts the sender’s email
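The steps above can be sketched roughly as follows. This is a simplified illustration in the spirit of Hashcash, not the real thing: actual Hashcash stamps use SHA-1, count leading zero bits (20 by default) and travel in an email header, whereas this sketch uses SHA-256 and counts leading zero hex digits. The function names and the sample email are made up, hashing the message together with the nonce is an assumption, and the search simply counts upward from a random starting nonce instead of drawing a fresh random nonce each time.

```python
import hashlib
import secrets

def digest(message: str, nonce: int) -> str:
    """Hash digest (hex) of the message combined with a nonce."""
    return hashlib.sha256(f"{message}:{nonce}".encode("utf-8")).hexdigest()

def mint(message: str, difficulty: int) -> int:
    """Sender side: search for a nonce whose digest starts with `difficulty` zeros."""
    target = "0" * difficulty
    nonce = secrets.randbits(32)  # random starting point for the search
    while not digest(message, nonce).startswith(target):
        nonce += 1  # wrong guess, try the next nonce
    return nonce

def verify(message: str, nonce: int, difficulty: int) -> bool:
    """Recipient side: a single hash is enough to check the sender's work."""
    return digest(message, nonce).startswith("0" * difficulty)

email = "From: sender@example.com To: recipient@example.com Subject: hello"
difficulty = 4  # kept small so the sketch finishes quickly; 10 would take far longer
nonce = mint(email, difficulty)
print(nonce, digest(email, nonce))
print(verify(email, nonce, difficulty))
```

Each extra leading zero digit makes the search about 16 times longer on average for the sender, while the recipient's check stays a single hash. That asymmetry is the whole point: the work is expensive to do and cheap to verify.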


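So asking the sender to do work before the recipient will accept the email should reduce spam: the work for one email is small, but the work for thousands of emails an hour adds up to a real cost for the sender.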


--- Bitcoin Proof of Work ---

--- Amazon Retail E-commerce ---