A Look at Azure Event Hubs Archive

Published on : Nov 22, 2016

Category : Microsoft Azure

kent

Author

In this guest blog post we are going to look at a recently released feature called Azure Event Hubs Archive. For those who may not be familiar with it, Event Hubs is an Azure service that allows for large-scale ingestion of events. Customers typically send telemetry events to Event Hubs and then consume these events using other Azure services, like Azure Stream Analytics (ASA) or Azure Functions. At my employer, TransAlta, we have been using Azure Event Hubs in our Industrial Internet of Things (IoT) projects. We documented one of our projects as part of a case study with the Azure Messaging team this past summer; you can read about it here.

A common requirement for customers is to archive event data, and they may do this for a variety of reasons. Some want to look back at the events that were processed to support operational triage or investigations. If you support a messaging solution, at some point you are bound to be asked whether a specific event was processed. If you don’t have the “evidence”, you are bound to lose that conversation. However, if you have the event in your archive, then you may be on the right side of that conversation. Check out our blog on “Understanding the consumer side of Azure EventHubs”.

Another use case for an archive of your events is running an analytic “cold path”. In some scenarios, you may use an ASA job to provide a “hot path” for analytics focused on a real-time event stream. Conversely, you may want a “cold path” where you batch up a series of events over a longer duration and perform the analysis later.

You may be asking yourself: why would I want to enable this archive feature when I can build my own logger using log4net or NLog within my consumer? Yes, you can do this, but you are then responsible for writing or integrating that code in your consumer. You are also responsible for providing the storage and compute for that process to run in. For data that needs to be archived, you generally want to store it in the most cost-effective location. In most situations, that place is the cloud.

Configuration

Let’s now set up Event Hubs Archive and see it in action. For the purpose of this blog post, I am going to use an existing Event Hub that I provisioned for my Ignite talk.
  1. Within the Azure portal, find your Event Hub and click on Properties.
  2. Enable the Archive feature by turning the slider to On. We also need to specify the Time Window and the Size Window. Note: The values shown are the default values. The time window can range from 60 seconds to 900 seconds (15 minutes). The size window can range from 10 MB to 500 MB. Since we have two different thresholds, whichever threshold is reached first will cause an archive of the Event Hub to occur.
  3. Next, we need to provide a Storage Account and a Blob Container before we can save our settings.
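
To make the "whichever threshold is reached first" rule from step 2 concrete, here is a minimal Python sketch. It is only a model, not how the service is implemented: it assumes a constant ingest rate, and the function name `first_threshold`, its parameters, and the example values are all hypothetical. The only figures taken from this post are the allowed ranges (60 to 900 seconds, 10 to 500 MB).

```python
from typing import Tuple

# Allowed ranges quoted in step 2; the constant names are illustrative only.
MIN_TIME_S, MAX_TIME_S = 60, 900
MIN_SIZE_MB, MAX_SIZE_MB = 10, 500


def first_threshold(time_window_s: int, size_window_mb: int,
                    ingest_mb_per_s: float) -> Tuple[str, float]:
    """Return which window closes first and after how many seconds.

    Simplified model (an assumption): events arrive at a constant rate.
    """
    if not MIN_TIME_S <= time_window_s <= MAX_TIME_S:
        raise ValueError("time window must be between 60 and 900 seconds")
    if not MIN_SIZE_MB <= size_window_mb <= MAX_SIZE_MB:
        raise ValueError("size window must be between 10 and 500 MB")

    # With no traffic, the size window can never fill, so time wins.
    if ingest_mb_per_s <= 0:
        return ("time", float(time_window_s))

    seconds_to_fill = size_window_mb / ingest_mb_per_s
    if seconds_to_fill < time_window_s:
        return ("size", seconds_to_fill)
    return ("time", float(time_window_s))


if __name__ == "__main__":
    # Example settings chosen arbitrarily from within the allowed ranges.
    print(first_threshold(300, 100, 0.1))  # slow stream
    print(first_threshold(300, 100, 1.0))  # fast stream
```

With a slow stream the time window elapses before the size window fills, and with a fast stream the opposite happens, which is why the blobs produced by the archive can vary in both size and spacing.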

Testing

  1. We can now start our publisher. In my case, I am going to send a batch of events every second from my simulator.
  2. In order to see our events, I have downloaded Azure Storage Explorer and configured it with my storage account and key. When we explore our Blob Container, we will see a series of files.
  3. Note the taxonomy of the files. We have: <Namespace>/<EventHub>/<Partition>/<YYYY>/<MM>/<DD>/<HH>/<mm>/<ss>
  4. If we open up a file, we will discover our contents in Avro format. Also, note that we may see 0-byte files within our Blob Container if no events were processed for that period. Remember, an archive is written when the first threshold is reached. In this case, the time window elapsed before the size threshold was reached.
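
Because the blob names follow the fixed taxonomy from step 3, they are easy to pick apart when you need to find the archive for a particular partition and time. The following Python sketch shows one way to do that. The names `ArchiveBlob` and `parse_archive_path`, and the sample path, are hypothetical. Stripping a file extension from the last segment is an assumption, since the taxonomy above does not show one, and the parsed timestamp is left without a time zone because the post does not state which one is used.

```python
from datetime import datetime
from typing import NamedTuple


class ArchiveBlob(NamedTuple):
    """Components of one archive blob path (hypothetical helper type)."""
    namespace: str
    event_hub: str
    partition: str
    timestamp: datetime  # naive; the time zone is not stated in the post


def parse_archive_path(path: str) -> ArchiveBlob:
    """Split <Namespace>/<EventHub>/<Partition>/<YYYY>/<MM>/<DD>/<HH>/<mm>/<ss>."""
    parts = path.strip("/").split("/")
    if len(parts) != 9:
        raise ValueError("expected 9 path segments, got %d" % len(parts))

    namespace, event_hub, partition = parts[:3]
    # Assumption: drop a file extension, if present, from the seconds segment.
    parts[8] = parts[8].split(".", 1)[0]
    year, month, day, hour, minute, second = (int(p) for p in parts[3:9])
    return ArchiveBlob(namespace, event_hub, partition,
                       datetime(year, month, day, hour, minute, second))


if __name__ == "__main__":
    # Sample path invented for illustration.
    blob = parse_archive_path("mynamespace/myhub/0/2016/11/22/18/05/30")
    print(blob.partition, blob.timestamp.isoformat())
```

Parsing the path this way lets you filter blobs by partition or time range before opening any Avro file, which keeps a "was this event processed?" investigation focused on a handful of blobs.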

Conclusion

In this post, we discussed how we can simply and quickly add archive capabilities to our Event Hub projects without any performance impact. It is a great option for customers who want additional traceability or an additional analytics path for their event streams. Do be aware that there is a cost implication to enabling Event Hubs Archive. In addition to the costs related to our Event Hub Throughput Unit(s) and storage, there is also an hourly charge for using this feature. Please consult the Azure Event Hubs pricing page for more details. Are you new to Azure Event Hubs? Here’s our blog on “Understand Azure Event Hubs” to help you get started with Event Hubs.