Azure Stream Analytics now supports MATCH_RECOGNIZE

MATCH_RECOGNIZE in Azure Stream Analytics significantly reduces the complexity and cost of building, modifying, and maintaining queries that match sequences of events for alerts or further data computation.

What is Azure Stream Analytics?

Azure Stream Analytics is a fully managed, serverless PaaS offering on Azure that enables customers to analyze and process fast-moving streams of data and deliver real-time insights for mission-critical scenarios. Developers can use a simple SQL language, extensible with custom code, to author and deploy powerful analytics processing logic that can scale up and scale out to deliver insights with millisecond latencies.

Traditional way to incorporate pattern matching in stream processing

Many customers use Azure Stream Analytics to continuously monitor massive amounts of data, detecting sequences of events and deriving alerts or aggregating data from those events. This, in essence, is pattern matching.

For pattern matching, customers traditionally relied on multiple joins, each one detecting a single event. These joins are then combined to find a sequence of events, compute results, or create alerts. Developing pattern matching queries this way is a complex, error-prone process that is difficult to maintain and debug. There are also limitations when trying to express more complex patterns such as Kleene star, Kleene plus, or wildcards.
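To illustrate the traditional approach, here is a hedged sketch of a join-based query for the cooker scenario described later in this post. The input name (Temperature) and fields (cookerId, temp) are assumptions for illustration; Azure Stream Analytics requires a DATEDIFF condition in the JOIN predicate to bound the time window.

```sql
-- Sketch: detect a reading followed within 5 minutes by a reading
-- more than double it, using a self-join instead of MATCH_RECOGNIZE.
SELECT t2.cookerId, 1 AS shouldShutDown
INTO ShutDown
FROM Temperature t1
JOIN Temperature t2
    ON t1.cookerId = t2.cookerId
    AND DATEDIFF(minute, t1, t2) BETWEEN 0 AND 5
WHERE t2.temp > 2 * t1.temp
```

Even for this two-event sequence, the logic is spread across the join predicate and the WHERE clause; longer sequences require additional joins, which is what MATCH_RECOGNIZE is designed to avoid.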

To address these issues and improve the customer experience, Azure Stream Analytics provides a MATCH_RECOGNIZE clause to define patterns and compute values from the matched events. The MATCH_RECOGNIZE clause increases user productivity because it is easy to read, write, and maintain.

Typical scenario for MATCH_RECOGNIZE

Event matching is an important aspect of data stream processing. The ability to express and search for patterns in a data stream enables users to create simple yet powerful algorithms that trigger alerts or compute values when a specific sequence of events is found.

An example scenario is a food preparation facility with multiple cookers, each with its own temperature monitor. A shutdown operation for a specific cooker needs to be generated if its temperature doubles within five minutes. In that case, the cooker must be shut down because the temperature is rising too rapidly and could either burn the food or cause a fire hazard.

Query
SELECT * INTO ShutDown FROM Temperature
MATCH_RECOGNIZE (
     LIMIT DURATION (minute, 5)
     PARTITION BY cookerId
     AFTER MATCH SKIP TO NEXT ROW
     MEASURES
         1 AS shouldShutDown
     PATTERN (temperature1 temperature2)
     DEFINE
         temperature1 AS temperature1.temp > 0,
         temperature2 AS temperature2.temp > 2 * MAX(temperature1.temp)
) AS T

In the example above, MATCH_RECOGNIZE defines a limit duration of five minutes, the measures to output when a match is found, the pattern to match, and lastly how each pattern variable is defined. Once a match is found, an event containing the MEASURES values is output into ShutDown. Matches are partitioned by cookerId, so each cooker is evaluated independently of the others.

MATCH_RECOGNIZE brings an easier way to express pattern matching, decreases the time spent writing and maintaining pattern matching queries, and enables richer scenarios that were practically impossible to write or debug before.

Get started with Azure Stream Analytics

Azure Stream Analytics enables the processing of fast-moving streams of data from IoT devices, applications, clickstreams, and other data streams in real time. To get started, refer to the Azure Stream Analytics documentation.

New capabilities in Stream Analytics reduce development time for big data apps

Azure Stream Analytics is a fully managed PaaS offering that enables real-time analytics and complex event processing on fast-moving data streams. Thanks to zero-code integration with over 15 Azure services, developers and data engineers can easily build complex pipelines for hot-path analytics within a few minutes. Today, at Inspire, we are announcing various new innovations in Stream Analytics that help further reduce time to value for solutions that are powered by real-time insights. These are as follows:

Bringing the power of real-time insights to Azure Event Hubs customers

Today, we are announcing one-click integration with Event Hubs. Available as a public preview feature, this allows an Event Hubs customer to visualize incoming data and start writing a Stream Analytics query with one click from the Event Hubs portal. Once the query is ready, they can operationalize it in a few clicks and start deriving real-time insights. This will significantly reduce the time and cost to develop real-time analytics solutions.

One-click integration between Event Hubs and Azure Stream Analytics

Augmenting streaming data with SQL reference data support

Reference data is a static or slow-changing dataset used to augment real-time data streams to deliver more contextual insights. An example scenario is a table of currency exchange rates, regularly updated to reflect market trends, that is used to convert a stream of billing events in different currencies into a common currency of choice.

Now generally available (GA), this feature provides out-of-the-box support for Azure SQL Database as reference data input. This includes the ability to automatically refresh your reference dataset periodically. Also, to preserve the performance of your Stream Analytics job, we provide the option to fetch incremental changes from your Azure SQL Database by writing a delta query. Finally, Stream Analytics leverages versioning of reference data to augment streaming data with the reference data that was valid at the time the event was generated. This ensures repeatability of results.
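The currency conversion scenario above can be sketched as a join between the stream and the reference input. The input names (Billing as the event stream, Rates as the Azure SQL Database reference input) and field names are assumptions for illustration; unlike stream-to-stream joins, joins against reference data in Stream Analytics do not require a time-bounding condition.

```sql
-- Sketch: convert each billing event to USD using the exchange rate
-- from the Azure SQL Database reference input.
SELECT
    b.invoiceId,
    b.amount * r.rateToUSD AS amountUSD
INTO BillingUSD
FROM Billing b
JOIN Rates r
    ON b.currency = r.currency
```

Because Stream Analytics versions the reference data, each event is joined against the rates that were valid at the time the event was generated, which keeps results repeatable on replay.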

New analytics functions for stream processing

  • Pattern matching:

    With the new MATCH_RECOGNIZE function, you can easily define event patterns using regular expressions and aggregate methods to verify and extract values from the match. This enables you to easily express and run complex event processing (CEP) on your streams of data. For example, this function enables users to easily author a query to detect "head and shoulders" patterns on a stock market feed.

  • Use of analytics functions as aggregates:

    You can now use aggregates such as SUM, COUNT, AVG, MIN, and MAX directly with the OVER clause, without having to define a window. Analytics functions as aggregates enable users to easily express queries such as "Is the latest temperature greater than the maximum temperature reported in the last 24 hours?"
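The temperature question above can be sketched with an aggregate in the OVER clause. The input name (Telemetry) and fields (sensorId, temp) are assumptions for illustration; LIMIT DURATION bounds the aggregate to the trailing 24 hours.

```sql
-- Sketch: flag events whose temperature exceeds the maximum
-- reported by the same sensor in the previous 24 hours.
SELECT sensorId, temp
INTO Alerts
FROM Telemetry
WHERE temp > MAX(temp) OVER (PARTITION BY sensorId LIMIT DURATION(hour, 24))
```

Note that no GROUP BY or windowing function is needed: the aggregate is evaluated per event over the trailing duration, which is what makes this style of query concise.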

Egress to Azure Data Lake Storage Gen2

Azure Stream Analytics is a central component within the big data analytics pipelines of Azure customers. While Stream Analytics focuses on real-time or hot-path analytics, services like Azure Data Lake help enable batch processing and advanced machine learning. Azure Data Lake Storage Gen2 takes core capabilities from Azure Data Lake Storage Gen1, such as a Hadoop-compatible file system, Azure Active Directory integration, and POSIX-based ACLs, and integrates them into Azure Blob Storage. This combination enables best-in-class analytics performance, along with storage tiering and data lifecycle management capabilities and the fundamental availability, security, and durability of Azure Storage.

Azure Stream Analytics now offers native zero-code integration with Azure Data Lake Storage Gen2 output (preview).

Enhancements to blob output

  • Native support for Apache Parquet format:

    Native support for egress in Apache Parquet format to Azure Blob Storage is now generally available. Parquet is a columnar format enabling efficient big data processing. By outputting data in Parquet format into a blob store or a data lake, you can take advantage of Azure Stream Analytics to power large-scale streaming extract, transform, and load (ETL), to run batch processing, to train machine learning algorithms, or to run interactive queries on your historical data.

  • Managed identities (formerly MSI) authentication:

    Azure Stream Analytics now offers full support for managed identity based authentication with Azure Blob Storage on the output side. Customers can continue to use the connection string based authentication model. This feature is available as a public preview.

Many of these features just started rolling out worldwide and will be available in all regions within several weeks.

Feedback

The Azure Stream Analytics team is highly committed to listening to your feedback and letting the user voice influence our future investments. We welcome you to join the conversation and make your voice heard via our UserVoice page.