Announcing the preview of Apache HBase clusters inside Microsoft Azure HDInsight

On June 3, Microsoft announced an update to HDInsight to support Hadoop 2.4 for 100x faster queries.  Today, we are announcing the preview of Apache HBase clusters inside Microsoft Azure HDInsight.

HBase is a NoSQL (“not only Structured Query Language”) database component of the Apache Hadoop ecosystem. While relational database management systems (RDBMS) typically use rigid tabular schemas, NoSQL databases use more fluid data models such as key-value, column, graph, or document. They are usually designed to scale elastically over large datasets and are less strict about schema.

HBase is a columnar NoSQL database built to run on top of the Hadoop Distributed File System (HDFS). As a low-latency database, it supports OLTP-style operations such as updates, inserts, and deletes on data in Hadoop. An HBase deployment consists of a set of tables, each containing rows and column families that you must predefine. New columns, however, can be added to a column family at any time, which lets the schema adapt quickly to changing requirements.
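To make the data model described above more concrete, here is a minimal sketch in plain Python. It is a toy in-memory model, not the HBase client API: the `ToyTable` class, its methods, and the sample table and column names are all hypothetical and exist only for illustration. It shows column families fixed at table creation while columns within a family are added freely.

```python
# Toy in-memory model of the layout described above.
# This is NOT the HBase API; ToyTable and its methods are hypothetical.

class ToyTable:
    def __init__(self, name, column_families):
        self.name = name
        # Column families are declared up front, when the table is created.
        self.column_families = frozenset(column_families)
        # Layout: row key -> {family -> {column -> value}}
        self.rows = {}

    def put(self, row_key, family, column, value):
        """Insert or update one cell; the column need not exist beforehand."""
        if family not in self.column_families:
            raise KeyError(f"unknown column family: {family!r}")
        row = self.rows.setdefault(row_key, {})
        row.setdefault(family, {})[column] = value

    def get(self, row_key):
        """Return every cell of a row as {"family:column": value}."""
        row = self.rows.get(row_key, {})
        return {
            f"{family}:{column}": value
            for family, columns in row.items()
            for column, value in columns.items()
        }

    def delete(self, row_key):
        """Remove a whole row; deleting a missing row is a no-op."""
        self.rows.pop(row_key, None)


table = ToyTable("sensors", column_families=["reading", "meta"])
table.put("device-001", "reading", "temperature", 21.5)
# A new column appears in an existing family without any schema change.
table.put("device-001", "reading", "humidity", 0.43)
table.put("device-001", "meta", "location", "lab-2")
```

The sketch leaves out much of what a real HBase table does, such as storing values as bytes and keeping timestamped versions of each cell. It only illustrates why a write to a new column succeeds while a write to an undeclared column family is rejected.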

With this preview, customers can run HBase as a managed cluster in the cloud, as an integrated feature of Azure HDInsight. The HBase clusters are configured to store data directly in Azure Blob storage. This enables use cases such as:

  • Building interactive websites that work with large datasets stored in Azure Blobs
  • Building services that store sensor and telemetry data from millions of endpoints in Azure Blobs, which can then be analyzed using HDInsight (Hadoop)

To learn more about HBase, we invite you to read the following resources:

To learn more about Azure HDInsight and Hadoop, we invite you to read the following resources: