The ins and outs of Apache Storm – real-time processing for Hadoop

Yesterday at Strata + Hadoop World, Microsoft announced the preview of Apache Storm clusters on Azure HDInsight. This post will give you the ins and outs of Storm.

What is Storm?

Apache Storm is a distributed, fault-tolerant, open source solution for processing events in real time. Storm was originally used by Twitter to process massive streams of data from the Twitter firehose. Today, Storm is an incubator project at the Apache Software Foundation. Storm is typically integrated with a scalable event queuing system such as Apache Kafka or Azure Event Hubs.

What can it do?

Paired with an event queuing system, Storm can process large volumes of data in real time. This enables scenarios such as real-time fraud detection, click-stream analysis, financial alerts, telemetry from connected sensors and devices, and more. For information on real-world scenarios, read how companies are using Storm.
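To make the queue-plus-processor pairing concrete, here is a minimal sketch in plain Python. It does not use the Storm API: an in-memory `queue.Queue` stands in for an event queuing system such as Kafka or Event Hubs, and the names `process_events`, `demo`, and `ALERT_THRESHOLD`, along with the sample events, are hypothetical and chosen only to illustrate a simple fraud-alert style check.

```python
import queue

# Hypothetical threshold for this illustration; not a Storm or HDInsight setting.
ALERT_THRESHOLD = 1000.0


def process_events(events, threshold=ALERT_THRESHOLD):
    """Drain the queue and return the events whose amount exceeds the threshold."""
    alerts = []
    while True:
        try:
            event = events.get_nowait()
        except queue.Empty:
            # Nothing left to read; a real stream processor would keep waiting.
            break
        if event["amount"] > threshold:
            alerts.append(event)
    return alerts


def demo():
    """Put a few made-up transactions on the queue and process them."""
    events = queue.Queue()
    for event in [
        {"account": "a1", "amount": 25.0},
        {"account": "a2", "amount": 4200.0},
        {"account": "a1", "amount": 980.0},
    ]:
        events.put(event)
    return process_events(events)


if __name__ == "__main__":
    for alert in demo():
        print("ALERT", alert["account"], alert["amount"])
```

In a real deployment the queue would be a durable, distributed service and the processing step would run continuously across many machines; the sketch only shows the division of labor between the two.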

How do I get started?

For Microsoft customers, we offer Storm as a preview cluster in Azure HDInsight. This gives you a managed cluster that is easy to set up (a few clicks and a few minutes), highly available (clusters are monitored 24/7 and covered by the Azure SLA for uptime), elastically scalable (more resources can be added as needed), and integrated with the broader Azure ecosystem (e.g., Event Hubs, HBase, and VNet).

To get started, you will need an Azure subscription or a free trial of Azure. With this in hand, you should be able to get a Storm cluster up and running in minutes by going through this getting started guide.

For more information on Storm:

For more information on Azure HDInsight: