Introducing Event Hubs Data Generator


Event Hubs Data Generator (EHDG) is a tool that generates realistic fake data based on a provided schema and sends it to Azure Event Hubs. Its purpose is to remove the pain of initial set-up faced by many developers. For instance, a developer demonstrating streaming datasets in Power BI will likely need to set up a Stream Analytics job, and to set up a Stream Analytics job they need data; in particular, hot data. Setting up this stream of hot data, particularly data that is useful for your use case, can be a tedious and lengthy process. Even with access to real data to stream, real data is often messy and requires endless cleaning. EHDG is here to simplify this process and let the developer start work on the later parts of the pipeline sooner.

Event Hubs

Azure Event Hubs, a Platform-as-a-Service offering, is a big data streaming platform and event ingestion service. Highly scalable and reliable, it can receive and process millions of events per second with low latency.

Event Hubs can be used for a number of different scenarios, such as:

  • Anomaly detection
  • Device telemetry streaming
  • Live dashboarding

Fake Data

Not every developer has the time to build a tool to populate their databases or ingestion services. This is where fake data, otherwise known as test or dummy data, comes in. Fake data stands in where real data would normally be present, acting as a placeholder for testing purposes in a non-production environment.

JSON-Schema-Faker

Event Hubs Data Generator is heavily based on the JSON-Schema-Faker JavaScript library. JSON-Schema-Faker can generate realistic fake data in JSON format across many areas, including people, addresses, finance, and companies. The data generated is based on the schema provided by the user, and JSON-Schema-Faker adheres to JSON Schema, a standard for describing and validating the structure of JSON documents.

A very simple example of a valid schema:

{
  "type": "object",
  "properties": {
    "firstName": {
      "type": "string",
      "faker": "name.firstName"
    }
  },
  "required": ["firstName"]
}
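
To see what this schema produces outside the tool, here is a minimal sketch using json-schema-faker together with the faker library (both assumed to be installed from npm; the extend call registers faker as the data source for the "faker" keywords):

// Minimal sketch, assuming json-schema-faker and faker installed via npm.
const jsf = require('json-schema-faker');
const faker = require('faker');

// Register faker so that "faker" keywords in the schema resolve.
jsf.extend('faker', () => faker);

const schema = {
  type: 'object',
  properties: {
    firstName: { type: 'string', faker: 'name.firstName' }
  },
  required: ['firstName']
};

// Prints something like { firstName: 'Oswald' }
console.log(jsf.generate(schema));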

Given that Event Hubs Data Generator makes use of the JSON-Schema-Faker library, it would be wise to refer to the documentation on the library's GitHub page when you're getting started. Their website also includes a playground environment that is especially useful for testing whether your schema is valid.

Providers

The concept of providers is an important one in JSON-Schema-Faker: providers are generator attributes that are packaged together. For instance, generator attributes such as firstName and lastName are packaged into a provider called Person. Providers can be thought of as logical groupings of attributes; a complete list can be found on the Faker GitHub wiki. The schema below shows how attributes from several providers can be mixed.
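
For illustration, here is a hypothetical schema (the field names are our own) that draws attributes from the Person, Address, and Finance providers:

{
  "type": "object",
  "properties": {
    "firstName": {
      "type": "string",
      "faker": "name.firstName"
    },
    "city": {
      "type": "string",
      "faker": "address.city"
    },
    "accountBalance": {
      "type": "string",
      "faker": "finance.amount"
    }
  },
  "required": ["firstName", "city", "accountBalance"]
}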

How EHDG Works

This tool is web based, and as such the implementation differs slightly from the documentation in the GitHub repository for the library we are using. We use an HTML form (index.html) to pass data into a JavaScript file (index.js), which contains the Faker logic.

On the HTML page, index.html, there is a textbox field named ‘method’ into which you insert a valid schema:

{
  "type": "object",
  "properties": {
    "firstName": {
      "type": "string",
      "faker": "name.firstName"
    }
  },
  "required": ["firstName"]
}

In our index.js file we can read these values from the form using the Express library. Here’s an example of how it works:

var method = req.body.method;
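
For context, here is a hedged sketch of the surrounding Express route; the route path, the response text, and the port are assumptions rather than the tool's exact code:

const express = require('express');
const app = express();

// Parse values posted from the HTML form into req.body.
app.use(express.urlencoded({ extended: true }));

app.post('/', function (req, res) {
  var method = req.body.method; // the schema pasted into the form
  // ... Faker and Event Hubs logic goes here ...
  res.send('Messages sent');
});

app.listen(3000);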

Once we have the form values we can then perform our Faker logic:

var dataJSON = JSON.parse(method);          // parse the schema text from the form
var data = fakerSchema.generate(dataJSON);  // generate one fake record from it
const eventData = { body: data };           // wrap the record as an event
client.send(eventData);                     // send the event to the Event Hub
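
Putting these pieces together, here is a hedged sketch of how the generation loop might be wired up. It assumes the older @azure/event-hubs SDK, whose EventHubClient exposes the client.send call used above, plus json-schema-faker and faker from npm; numberOfMessages is a hypothetical name for the count taken from the form:

const { EventHubClient } = require('@azure/event-hubs');
const fakerSchema = require('json-schema-faker');
fakerSchema.extend('faker', () => require('faker'));

async function sendMessages(connectionString, hubName, method, numberOfMessages) {
  const client = EventHubClient.createFromConnectionString(connectionString, hubName);
  const dataJSON = JSON.parse(method); // the schema text from the form

  for (let i = 0; i < numberOfMessages; i++) {
    const data = fakerSchema.generate(dataJSON); // one fake record per message
    await client.send({ body: data });           // one event per record
  }

  await client.close();
}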

Outputting the generated ‘data’ value to the console would look like this (the eventData wrapper nests it under a body property):

{
  "firstName": "Oswald"
}

Using EHDG

Instructions to set up this tool to run locally can be found here.

Below is a short, simple guide on how to use the tool. A more detailed guide on the form is available here.

  1. Set up an Event Hub through the Azure portal.
  2. Go into the Event Hub’s shared access policy (SAS) settings and copy the Event Hub’s connection string and namespace.
  3. Paste those values into the respective fields in the web UI.
  4. Specify the number of messages you want produced.
  5. Input a valid schema.
  6. Press submit.
  7. Check your Event Hub’s monitoring charts to see incoming messages.

Going Forward

The possibilities are endless. As mentioned earlier, you can use Event Hubs for live dashboarding and anomaly detection, amongst other things. For instance, you can set up a pipeline consisting of Event Hubs, Stream Analytics, and Power BI, and set data alerts in the Power BI dashboard, which is effectively a way to carry out anomaly detection.

If there are any further questions, please take a look at the GitHub repository.