Processing big data in real time is an operational necessity for many businesses. Azure Stream Analytics is Microsoft’s serverless real-time analytics offering for complex event processing.
We are excited and humbled to announce that Microsoft has been named a leader in The Forrester Wave™: Streaming Analytics, Q3 2019. Microsoft believes this report truly reflects the market momentum of Azure Stream Analytics, satisfied customers, a growing partner ecosystem and the overall strength of our Azure cloud platform. You can access the full report here.
The Forrester Wave™: Streaming Analytics, Q3 2019
The Forrester Wave™: Streaming Analytics, Q3 2019 report evaluated streaming analytics offerings from 11 solution providers, and we are honored to share that Forrester has recognized Microsoft as a Leader in this category. Azure Stream Analytics received the highest possible score in 12 categories, including Ability to execute, Administration, Deployment, Solution Roadmap, and Customer adoption, among others.
The report states, “Microsoft Azure Stream Analytics has strengths in scalability, high availability, deployment, and applications. Azure Stream Analytics is an easy on-ramp for developers who already know SQL. Zero-code integration with over 15 other Azure services makes it easy to try and therefore adopt, making the product the real-time backbone for enterprises needing real-time streaming applications on the Azure cloud. Additionally, through integration with IoT Hub and Azure Functions, it offers seamless interoperability with thousands of devices and business applications.”
Key Differentiators for Azure Stream Analytics
Fully integrated with Azure ecosystem: Build powerful pipelines with few clicks
Whether you have millions of IoT devices streaming data to Azure IoT Hub or have apps sending critical telemetry events to Azure Event Hubs, it only takes a few clicks to connect multiple sources and sinks to create an end-to-end pipeline.
Stream Analytics contains a wide array of analytic capabilities such as native support for geospatial functions, built-in callouts to custom machine learning (ML) models for real-time scoring, built-in ML models for Anomaly Detection, Pattern matching, and more to help developers easily tackle complex scenarios while staying in a familiar context.
Azure Stream Analytics helps bring real-time insights and analytics capabilities closer to where your data originates. Customers can easily enable new scenarios with true hybrid architectures for stream processing and run the same query in the cloud or on the IoT edge.
Best-in-class financially backed SLA by the minute
We understand it is critical for businesses to prevent data loss and have business continuity. Stream Analytics guarantees event processing with a 99.9 percent availability service-level agreement (SLA) at the minute level, which is unparalleled in the industry.
Stream Analytics is a fully managed serverless (PaaS) offering on Azure. There is no infrastructure to worry about, and no servers, virtual machines, or clusters to manage. We do all the heavy lifting for you in the background. You can instantly scale up or scale-out the processing power from one to hundreds of streaming units for any job.
Stream Analytics guarantees “exactly once” event processing and at least once delivery of events. It has built-in recovery capabilities in case the delivery of an event fails. So, you never have to worry about your events getting dropped.
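The delivery guarantee above has a practical consequence for whatever consumes a job's output: with at-least-once delivery, the same event can arrive more than once, so a consumer that needs each event applied once typically deduplicates on a key. The following is a minimal Python sketch of that idea, not Stream Analytics code. The `event_id` field and the `IdempotentSink` class are hypothetical names chosen for illustration.

```python
class IdempotentSink:
    """Hypothetical downstream sink that tolerates redelivered events."""

    def __init__(self):
        self.seen_ids = set()   # IDs of events already applied
        self.total = 0          # running sum of applied event amounts

    def handle(self, event):
        """Apply an event once; return False if it was a duplicate."""
        if event["event_id"] in self.seen_ids:
            return False
        self.seen_ids.add(event["event_id"])
        self.total += event["amount"]
        return True


sink = IdempotentSink()
# The second delivery of event "a" simulates a retry after a failed delivery.
deliveries = [
    {"event_id": "a", "amount": 10},
    {"event_id": "b", "amount": 5},
    {"event_id": "a", "amount": 10},
]
applied = [sink.handle(e) for e in deliveries]
print(applied, sink.total)
```

Keying on an identifier carried by the event keeps the sink correct no matter how many times a delivery is retried.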
Try it today
There is a strong and growing developer community that supports Stream Analytics. Learn how to get started and build a real-time fraud detection system.
On November fourth, we announced Azure Synapse Analytics, the next evolution of Azure SQL Data Warehouse. Azure Synapse is a limitless analytics service that brings together enterprise data warehousing and Big Data analytics. It gives you the freedom to query data on your terms, using either serverless on-demand or provisioned resources—at scale. Azure Synapse brings these two worlds together with a unified experience to ingest, prepare, manage, and serve data for immediate business intelligence and machine learning needs.
With Azure Synapse, data professionals can query both relational and non-relational data using the familiar SQL language. This can be done using either serverless on-demand queries for data exploration and ad hoc analysis or provisioned resources for your most demanding data warehousing needs. A single service for any workload.
In fact, it’s the first and only analytics system to have run all the TPC-H queries at petabyte-scale. Current SQL Data Warehouse customers can continue running their existing data warehouse workloads in production today with Azure Synapse and will automatically benefit from the new preview capabilities when they become generally available. You can sign up to preview new features like serverless on-demand query, Azure Synapse studio, and Apache Spark™ integration.
A cloud native, distributed SQL processing engine is at the foundation of Azure Synapse and is what enables the service to support the most demanding enterprise data warehousing workloads. This week at Ignite we introduced a number of exciting features to make data warehousing with Azure Synapse easier and allow organizations to use SQL for a broader set of analytics use cases.
Unlock powerful insights faster from all data
Azure Synapse deeply integrates with Power BI and Azure Machine Learning to drive insights for all users, from data scientists coding with statistics to the business user with Power BI. And to make all types of analytics possible, we’re announcing native and built-in prediction support, as well as runtime level improvements to how Azure Synapse handles streaming data, parquet files, and Polybase. Let’s dive into more detail:
With the native PREDICT statement, you can score machine learning models within your data warehouse, avoiding the need for large and complex data movement. The PREDICT function (available in preview) relies on an open model framework and takes user data as input to generate predictions. Users can convert existing models trained in Azure Machine Learning, Apache Spark™, or other frameworks into an internal format representation without having to start from scratch, accelerating time to insight.
We’ve enabled direct streaming ingestion and the ability to execute analytical queries over streaming data. Capabilities such as joins across multiple streaming inputs, aggregations within one or more streaming inputs, transformation of semi-structured data, and multiple temporal windows are all supported directly in your data warehousing environment (available in preview). For streaming ingestion, customers can integrate with Event Hubs (including Event Hubs for Kafka) and IoT Hubs.
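One common kind of temporal window is a fixed-size, non-overlapping (tumbling) window, where every event belongs to exactly one window. The Python sketch below illustrates a windowed aggregation in that style. It is a conceptual illustration only, not the service's query syntax, and the function name and sample events are assumptions made for the example.

```python
from collections import defaultdict


def tumbling_window_counts(events, window_seconds):
    """Count events per (window start, key) using fixed, non-overlapping windows."""
    counts = defaultdict(int)
    for timestamp, key in events:
        # Integer division maps each timestamp to the start of its window.
        window_start = (timestamp // window_seconds) * window_seconds
        counts[(window_start, key)] += 1
    return dict(counts)


# (timestamp in seconds, device id) pairs; the values are made up for illustration.
events = [(1, "dev1"), (4, "dev1"), (9, "dev2"), (11, "dev1"), (19, "dev2")]
result = tumbling_window_counts(events, window_seconds=10)
print(result)
```

Events at seconds 1, 4, and 9 fall into the window starting at 0, while those at 11 and 19 fall into the window starting at 10.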
We’re also removing the barrier that inhibits securely and easily sharing data inside or outside your organization with Azure Data Share integration for sharing both data lake and data warehouse data.
By using new ParquetDirect technology, we are making interactive queries over the data lake a reality (in preview). It’s designed to access Parquet files with native support directly built into the engine. Through improved data scan rates, intelligent data caching and columnstore batch processing, we’ve improved Polybase execution by over 13x.
To support customers as they democratize their data warehouses, we are announcing new features for intelligent workload management. The new Workload Isolation functionality allows you to manage the execution of heterogeneous workloads while providing flexibility and control over data warehouse resources. This leads to improved execution predictability and enhances the ability to satisfy predefined SLAs.
Analyzing petabyte-scale data requires ingesting petabyte-scale data. To streamline the data ingestion process, we are introducing a simple and flexible COPY statement. With only one command, Azure Synapse now enables data to be seamlessly ingested into a data warehouse in a fast and secure manner.
This new COPY statement enables using a single T-SQL statement to load data, parse standard CSV files, and more.
Azure has the most advanced security and privacy features in the market. These features are built into the fabric of Azure Synapse, such as automated threat detection and always-on data encryption. And for fine-grained access control businesses can ensure data stays safe and private using column-level security, native row-level security, and dynamic data masking (now generally available) to automatically protect sensitive data in real time.
To further enhance security and privacy, we are introducing Azure Private Link. It provides a secure and scalable way to consume deployed resources from your own Azure Virtual Network (VNet). A secure connection is established using a consent-based call flow. Once established, all data that flows between Azure Synapse and service consumers is isolated from the internet and stays on the Microsoft network. There is no longer a need for gateways, network address translation (NAT) devices, or public IP addresses to communicate with the service.
Get started today
Businesses can continue running their existing data warehouse workloads in production today with generally available features on Azure Synapse.
Today, businesses are forced to maintain two types of analytical systems, data warehouses and data lakes. Data warehouses provide critical insights on business health. Data lakes can uncover important signals on customers, products, employees, and processes. Both are critical, yet operate independently of one another, which can lead to uninformed decisions. At the same time, businesses need to unlock insights from all their data to stay competitive and fuel innovation with purpose. Can a single cloud analytics service bridge this gap and enable the agility that businesses demand?
Azure Synapse Analytics
Today, we are announcing Azure Synapse Analytics, a limitless analytics service that brings together enterprise data warehousing and Big Data analytics. It gives you the freedom to query data on your terms, using either serverless on-demand or provisioned resources, at scale. Azure Synapse brings these two worlds together with a unified experience to ingest, prepare, manage, and serve data for immediate business intelligence and machine learning needs.
Simply put, Azure Synapse is the next evolution of Azure SQL Data Warehouse. We have taken the same industry-leading data warehouse to a whole new level of performance and capabilities. In fact, it’s the first and only analytics system to have run all TPC-H queries at petabyte-scale. Businesses can continue running their existing data warehouse workloads in production today with Azure Synapse and will automatically benefit from the new capabilities which are in preview. Businesses can put their data to work much more quickly, productively, and securely, pulling together insights from all data sources, data warehouses, and big data analytics systems. Partners can continue to build with us as Azure Synapse will offer a rich and vibrant ecosystem of partners like Databricks, Informatica, Accenture, Talend, Panoply, Attunity, Pragmatic Works, and Adatis.
With Azure Synapse, data professionals of all types can collaborate, build, manage, and analyze their most important data with ease, all within the same service. From Apache Spark integration with the powerful and trusted SQL engine to code-free data integration and management, Azure Synapse is built for every data professional.
That is why companies like Unilever are choosing Azure Synapse.
“Our adoption of the Azure Analytics platform has revolutionized our ability to deliver insights to the business. We are very excited that Azure Synapse Analytics will streamline our analytics processes even further with the seamless integration the way all the pieces have come together so well.”
Nallan Sriraman, Global Head of Technology, Unilever
Azure Synapse delivers insights from all your data, across data warehouses and big data analytics systems, with blazing speed. With Azure Synapse, data professionals can query both relational and non-relational data at petabyte-scale using the familiar SQL language. For mission-critical workloads, they can easily optimize the performance of all queries with intelligent workload management, workload isolation, and limitless concurrency.
With Azure Synapse, enabling business intelligence and machine learning is a breeze. It is deeply integrated with Power BI and Azure Machine Learning to greatly expand the discovery of insights from all your data and apply machine learning models to all your intelligent apps. Significantly reduce project development time for business intelligence and machine learning projects with a limitless analytics service that enables you to seamlessly apply intelligence over all your most important data — from Dynamics 365 to Office 365, to SaaS services that support Open Data Initiative — and easily share data with just a few clicks.
Build end-to-end analytics solutions with a unified experience. The Azure Synapse studio provides a unified workspace for data prep, data management, data warehousing, big data, and AI tasks. Data engineers can use a code-free visual environment for managing data pipelines. Database administrators can automate query optimization. Data scientists can build proofs of concept in minutes. Business analysts can securely access datasets and use Power BI to build dashboards in minutes, all while using the same analytics service.
Azure has the most advanced security and privacy features in the market. These features are built into the fabric of Azure Synapse, such as automated threat detection and always-on data encryption. And for fine-grained access control, businesses can help ensure data stays safe and private using column-level security and native row-level security, as well as dynamic data masking to automatically protect sensitive data in real-time.
Get started today
Businesses can continue running their existing data warehouse workloads in production today with generally available features on Azure Synapse.
A data-driven culture is critical for businesses to thrive in today’s environment. In fact, a brand-new Harvard Business Review Analytic Services survey found that companies who embrace a data-driven culture experience a 4x improvement in revenue performance and better customer satisfaction.
Foundational to this culture is the ability to deliver timely insights to everyone in your organization across all your data. At our core, that is exactly what we aim to deliver with Azure Analytics and Power BI, and our work is paying off in value for our customers. According to a recent commissioned Forrester Consulting Total Economic Impact™ study, Azure Analytics and Power BI deliver incredible value to customers with a 271 percent ROI, while increasing satisfaction by 60 percent.
Our position in the leaders quadrant in Gartner’s 2019 Magic Quadrant for Analytics and Business Intelligence Platforms, coupled with our undisputed performance in analytics, provides you with the foundation you need to implement a data-driven culture.
But what are three key attributes needed to establish a data-driven culture?
First, it is vital to get the best performance from your analytics solution across all your data, at the best possible price.
Second, it is critical that your data is accurate and trusted, with all the security and privacy rigor needed for today’s business environment.
Finally, a data-driven culture necessitates self-service tools that empower everyone in your organization to gain insights from your data.
Let’s take a deeper look into each one of these critical attributes.
When it comes to performance, Azure has you covered. An independent study by GigaOm found that Azure SQL Data Warehouse is up to 14x faster and costs 94% less than other cloud providers. This unmatched performance is why leading companies like Anheuser-Busch Inbev adopt Azure.
“We leveraged the elasticity of SQL Data Warehouse to scale the instance up or down, so that we only pay for the resources when they’re in use, significantly lowering our costs. This architecture performs significantly better than the legacy on-premises solutions it replaced, and it also provides a single source of truth for all of the company’s data.” – Chetan Kundavaram, Global Director, Anheuser-Busch Inbev
Azure is the most secure cloud for analytics. This is according to Donald Farmer, a well-respected thought leader in the data industry, who recently stated, “Azure SQL Data Warehouse platform offers by far the most comprehensive set of compliance and security capabilities of any cloud data warehouse provider”. Since then, we announced Dynamic Data Masking and Data Discovery and Classification to automatically help protect and obfuscate sensitive data on-the-fly to further enhance your data security and privacy.
Insights for all
Only when everyone in your organization has access to timely insights can you achieve a truly data-driven culture. Companies drive results when they break down data silos and establish a shared context of their business based on trusted data. Customers that use Azure Analytics and Power BI do exactly that. According to the same Forrester study, customers stated:
“Azure Analytics has helped with a culture change at our company. We are expanding into other areas so that everyone can make informed business decisions.” — Study interviewee
“Power BI was a huge success. We’ve added 25,000 users organically in three years.” — Study interviewee
Only Azure Analytics and Power BI together can unlock the performance, security and insights for your entire organization. We are uniquely positioned to empower you to develop a data-driven culture needed to thrive. We are excited to see customers like Reckitt Benckiser choose Azure for their analytics needs.
“Data is most powerful when it’s accessible and understandable. With this Azure solution, our employees can query the data however they want versus being confined to the few rigid queries our previous system required. It’s very easy for them to use Power BI Pro to integrate new data sets to deliver enormous value. When you put BI solutions in the hands of your boots on the ground (your sales force, marketing managers, product managers), it delivers a huge impact to the business.” – Wilmer Peres, Information Services Director, Reckitt Benckiser
When you add it all up, Azure Analytics and Power BI are simply unmatched.
Gartner, Magic Quadrant for Analytics and Business Intelligence Platforms, 11 February 2019, Cindi Howson, James Richardson, Rita Sallam, Austin Kronz
Gartner does not endorse any vendor, product or service depicted in its research publications, and does not advise technology users to select only those vendors with the highest ratings or other designation. Gartner research publications consist of the opinions of Gartner’s research organization and should not be construed as statements of fact. Gartner disclaims all warranties, express or implied, with respect to this research, including any warranties of merchantability or fitness for a particular purpose.
Azure Databricks is a fast, easy, and collaborative Apache Spark based analytics platform that simplifies the process of building big data and artificial intelligence (AI) solutions. Azure Databricks provides data engineers and data scientists an interactive workplace where they can use the languages and frameworks of their choice. Natively integrated with services like Azure Machine Learning and Azure SQL Data Warehouse, Azure Databricks enables customers to build an end-to-end modern data warehouse, real-time analytics, and machine learning solutions.
Save up to 37 percent on your Azure Databricks workloads
Azure Databricks Unit pre-purchase plan is now generally available—expanding our commitment to make Azure the most cost-effective cloud for running your analytics workloads.
Today, with the Azure Databricks Unit pre-purchase plan, you can start unlocking the benefits of Azure Databricks at significantly reduced costs when you pre-pay for Databricks compute for a one or three-year term. With this new pricing option, you can achieve savings of up to 37 percent compared to pay-as-you-go pricing. You can learn more about the discount tiers on our pricing page. All Azure Databricks SKUs—Premium and Standard SKUs for Data Engineering Light, Data Engineering, and Data Analytics—are eligible for DBU pre-purchase.
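To make the savings figure concrete, the arithmetic can be sketched in a few lines of Python. Only the 37 percent maximum comes from the text above. The spend amount is hypothetical, the function name is an assumption, and actual discounts depend on the tiers listed on the pricing page.

```python
def prepurchase_cost(pay_as_you_go_cost, discount_percent):
    """Return the cost after applying a percentage discount."""
    if not 0 <= discount_percent <= 100:
        raise ValueError("discount_percent must be between 0 and 100")
    return pay_as_you_go_cost * (100 - discount_percent) / 100


# Hypothetical annual pay-as-you-go spend; 37 is the maximum discount quoted above.
payg = 100_000
best_case = prepurchase_cost(payg, 37)
print(best_case, payg - best_case)
```

At the maximum discount, a workload that would cost 100,000 on pay-as-you-go pricing would cost 63,000 when pre-purchased.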
Compared with other Azure services with reserved capacity pricing, which have a per hour capacity purchase, this plan allows you to pre-purchase DBUs that can be used at any time. You also have the flexibility to consume units across all workload types and tiers.
Azure Databricks is offered as a first party Azure service. You can pre-purchase Databricks compute either from your Azure prepayment or existing payment instruments.
Azure Databricks is now available in South Africa and South Korea
Azure Databricks is now generally available in additional regions: South Africa and South Korea. These additional locations bring the product’s worldwide availability count to 26 regions, backed by a 99.95 percent SLA.
Driven by the motto of innovation and accessibility, we aim to ensure that we build a cloud infrastructure to serve the needs of customers globally. Stay updated with the region availability for Azure Databricks.
Organizations also benefit from Azure Databricks’ native integration with other services like Azure Blob storage, Azure Data Factory, Azure SQL Data Warehouse, Azure Machine Learning, and Azure Cosmos DB. This enables new analytics solutions that support modern data warehousing, advanced analytics, and real-time analytics scenarios.
Get started today
Getting started with DBU pre-purchase is easy, and is done via the Azure portal. For details on how to get started, see our documentation. For more information on discount tiers, please visit the pricing page.
As the amount of data stored and queried continues to rise, it becomes increasingly important to have the most price-performant data warehouse. While we’re excited about being the industry leader in both of Gigaom’s TPC-H and TPC-DS benchmark reports, we don’t plan to stop innovating on behalf of our customers.
As Rohan Kumar mentioned in his blog on Monday, we’re excited to introduce several new features that will continue to make Azure SQL Data Warehouse the unmatched industry leader in price-performance, flexibility, and security.
To enable customers to continue improving the performance of their applications without adding any additional cost, we’re announcing preview availability of result-set caching, materialized views, and ordered clustered columnstore indexes.
In addition to price-performance enhancements, we’ve added new capabilities that enable customers to be more agile and flexible. The first is workload importance, which is a new feature that enables users to decide how workloads with conflicting needs get prioritized. Second, our new support for automatic statistics maintenance (auto-update statistics) means that manageability and maintenance of Azure SQL Data Warehouse just got easier and more effective. And finally, we’re also adding support for managing and querying JSON data. Users can now load JSON data directly into their data warehouses and mix it with other relational data, leading to faster and easier insights.
Our last announcement focuses on security and privacy. As you know, deploying data warehousing solutions in the cloud demands sophisticated and robust security. While Azure SQL Data Warehouse already enables an advanced security model to be deployed, today we’re announcing support for Dynamic Data Masking (DDM). DDM allows you to protect private data, through user-defined policies, ensuring it’s visible only to those that have permission to see it.
In the sections below, we’ll dive into these new features and the benefits that each provide.
Price-performance is a recurring theme in our releases because it ensures we provide one of the fastest analytics services at incredible value. With the new functionality announced today, we continue to demonstrate our commitment to offering the leading price-performance platform.
Interactive dashboarding with result-set caching (preview)
Interactive dashboards come with predictable and repetitive query patterns. Result-set caching, now available in preview, helps with this scenario as it enables instant query response times while reducing time-to-insight for business analysts and reporting users.
With result-set caching enabled, Azure SQL Data Warehouse automatically caches results from repetitive queries, so subsequent executions return results from the persisted cache and skip full query execution. In addition to saving compute cycles, queries satisfied by the result-set cache do not use any concurrency slots and thus do not count against existing concurrency limits. For security reasons, only users with the appropriate security credentials can access the result sets in cache.
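The general idea behind this kind of caching can be sketched in Python as follows. This is a toy model under stated assumptions, not how the service implements the feature: the `ResultSetCache` class, its exact-text cache key, and the `run_query` stand-in are all hypothetical.

```python
class ResultSetCache:
    """Toy cache keyed by exact query text; not the service's implementation."""

    def __init__(self, execute):
        self.execute = execute      # function that really runs a query
        self.cache = {}
        self.executions = 0         # number of full executions performed

    def query(self, sql):
        if sql not in self.cache:
            self.executions += 1
            self.cache[sql] = self.execute(sql)
        return self.cache[sql]

    def invalidate(self):
        """Drop cached results, for example after the underlying data changes."""
        self.cache.clear()


def run_query(sql):
    # Stand-in for an expensive distributed query.
    return [("total_sales", 42)]


cache = ResultSetCache(run_query)
first = cache.query("SELECT SUM(amount) FROM sales")
second = cache.query("SELECT SUM(amount) FROM sales")
print(first == second, cache.executions)
```

The repeated dashboard query is answered without a second execution, which is the effect described above for repetitive query patterns.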
Materialized views to improve performance (preview)
Another new feature that greatly enhances query performance for a wide set of queries is materialized view support, now available in preview. A materialized view improves the performance of complex queries (typically queries with joins and aggregations) while offering simple maintenance operations.
When materialized views are created, the Azure SQL Data Warehouse query optimizer transparently and automatically rewrites user queries to leverage deployed materialized views, leading to improved query performance. Best of all, as data gets loaded into base tables, Azure SQL Data Warehouse automatically maintains and refreshes materialized views, which simplifies maintenance and management. Because user queries leverage materialized views, they run significantly faster and use fewer system resources. The more complex and expensive the query within the view, the greater the potential for execution time savings.
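As a conceptual illustration of why a maintained view speeds up aggregation queries, the Python sketch below keeps a precomputed total in sync as rows are loaded. It is a simplified model, not the engine's mechanism, and the `SalesTable` class and its methods are hypothetical.

```python
from collections import defaultdict


class SalesTable:
    """Toy base table with one aggregate view kept in sync on every load."""

    def __init__(self):
        self.rows = []                            # base table: (region, amount)
        self.total_by_region = defaultdict(int)   # the precomputed "view"

    def load(self, rows):
        for region, amount in rows:
            self.rows.append((region, amount))
            self.total_by_region[region] += amount   # incremental refresh

    def total_from_base(self, region):
        """Answer the query by scanning every base row."""
        return sum(a for r, a in self.rows if r == region)

    def total_from_view(self, region):
        """Answer the same query with a single lookup in the view."""
        return self.total_by_region[region]


table = SalesTable()
table.load([("east", 10), ("west", 7), ("east", 5)])
table.load([("west", 3)])
print(table.total_from_view("east"), table.total_from_view("west"))
```

Both methods return the same answer, but the view lookup avoids rescanning the base rows on every query.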
Fast scans with ordered clustered columnstore indexes (preview)
Columnstore is a key enabler for storing and efficiently querying large amounts of data. For each table, it divides incoming data into row groups and each column of a row group forms a segment on a disk. When querying columnstore indexes, only the column segments that are relevant to user queries are read from the disk. Ordered clustered columnstore indexes further optimize query execution by enabling efficient segment elimination.
Because the data is pre-ordered, you can drastically reduce the number of segments that are read from the disk, leading to faster query processing. Ordered clustered columnstore indexes are now available in preview, and queries containing filters and predicates can greatly benefit from this feature.
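Segment elimination can be illustrated with a small Python model in which each segment records its minimum and maximum value and a range filter skips segments that cannot contain matches. The segment size, data, and function names here are assumptions for illustration and do not reflect the engine's internal layout.

```python
def build_segments(values, segment_size):
    """Split values into segments and record each segment's (min, max)."""
    segments = []
    for start in range(0, len(values), segment_size):
        chunk = values[start:start + segment_size]
        segments.append((min(chunk), max(chunk)))
    return segments


def segments_to_scan(segments, low, high):
    """Count segments whose (min, max) range overlaps the filter [low, high]."""
    return sum(
        1 for seg_min, seg_max in segments
        if seg_min <= high and seg_max >= low
    )


# 100 distinct values in a scrambled but deterministic order.
unordered = [(i * 37) % 100 for i in range(100)]
ordered = sorted(unordered)

unordered_scans = segments_to_scan(build_segments(unordered, 10), 20, 29)
ordered_scans = segments_to_scan(build_segments(ordered, 10), 20, 29)
print(unordered_scans, ordered_scans)
```

With ordered data, the filter on values 20 through 29 touches a single segment, whereas the scrambled layout forces more segments to be read because their value ranges overlap the filter.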
As business requirements evolve, the ability to change and adapt solution behavior is one of the key benefits of a modern data warehousing product. The ability to handle and manage heterogeneous data that enterprises have while offering ease of use and management is critical. To support these needs, Azure SQL Data Warehouse is introducing the following new functionalities to help you deal with ever-evolving requirements.
Prioritize workloads with workload importance (general availability)
Running mixed workloads on your analytics solution is often a necessity to effectively and quickly execute business processes. In situations where resources are constrained, the capability to decide which workloads need to be executed first is critical, as it helps with overall solution cost management. For instance, executive dashboard reports may be more important than ad-hoc queries. Workload importance now enables this scenario. Requests with higher importance are guaranteed quicker access to resources, which helps meet predefined SLAs and ensures important requests are prioritized.
Workload classification concept
To define workload priority, various requests must be classified. Azure SQL Data Warehouse supports flexible classification policies that can be set for a SQL query, a database user, database role, Azure Active Directory login, or Azure Active Directory group. Workload classification is achieved using the new CREATE WORKLOAD CLASSIFIER syntax.
The diagram below illustrates the workload classification and importance function:
Workload importance concept
Workload importance is established through classification. Importance influences a requester’s access to system resources, including memory, CPU, IO, and locks. A request can be assigned one of these five levels of importance: low, below_normal, normal, above_normal, and high. If a request with above_normal importance is scheduled, it gets access to resources before a request with the default normal importance.
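The ordering behavior described above can be modeled with a priority queue. The Python sketch below uses the five importance levels from the text. The `RequestQueue` class, the request names, and the rule that arrival order breaks ties are assumptions made for the example, not a description of the service's scheduler.

```python
import heapq
import itertools

# The five importance levels named above, mapped to a sort rank (lower runs first).
IMPORTANCE_RANK = {
    "high": 0,
    "above_normal": 1,
    "normal": 2,
    "below_normal": 3,
    "low": 4,
}


class RequestQueue:
    """Toy scheduler: higher importance first, arrival order breaks ties."""

    def __init__(self):
        self._heap = []
        self._arrival = itertools.count()

    def submit(self, name, importance="normal"):
        rank = IMPORTANCE_RANK[importance]
        heapq.heappush(self._heap, (rank, next(self._arrival), name))

    def next_request(self):
        return heapq.heappop(self._heap)[2]


queue = RequestQueue()
queue.submit("ad_hoc_query")                      # default importance: normal
queue.submit("exec_dashboard", "above_normal")
queue.submit("nightly_cleanup", "low")
order = [queue.next_request() for _ in range(3)]
print(order)
```

Although the dashboard request was submitted second, its above_normal importance moves it ahead of the ad hoc query that arrived first.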
Manage and query JSON data (preview)
Organizations increasingly have to deal with multiple data sources and heterogeneous file formats, with JSON among the most common alongside CSV. To speed up time to insight and minimize unnecessary data transformation processes, Azure SQL Data Warehouse now enables support for querying JSON data. This feature is now available in preview.
Business analysts can now use the familiar T-SQL language to query and manipulate documents that are formatted as JSON data. JSON functions, such as JSON_VALUE, JSON_QUERY, JSON_MODIFY, and OPENJSON are now supported in Azure SQL Data Warehouse. Azure SQL Data Warehouse can now effectively support both relational and non-relational data, including joins between the two, while enabling users to use their traditional BI tools, such as Power BI.
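For readers unfamiliar with path-based JSON extraction, the Python sketch below shows the general idea of pulling a scalar out of a document by path. It is only an analogy: the T-SQL functions named above have their own path syntax and type rules, which this hypothetical `json_value` helper does not reproduce.

```python
import json


def json_value(document, path):
    """Return the value at a simple '$.a.b' path, or None if it is missing."""
    if not path.startswith("$"):
        raise ValueError("path must start with '$'")
    current = json.loads(document)
    for key in path[1:].split("."):
        if not key:
            continue    # skip the empty piece before the first dot
        if not isinstance(current, dict) or key not in current:
            return None
        current = current[key]
    return current


order_doc = '{"customer": {"name": "Contoso", "tier": "gold"}, "total": 125}'
name = json_value(order_doc, "$.customer.name")
total = json_value(order_doc, "$.total")
print(name, total)
```

Once a value has been extracted this way, it can be treated like any other column value, which is what makes joins between JSON and relational data possible.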
Automatic statistics maintenance and update (preview)
Azure SQL Data Warehouse implements a cost-based optimizer to ensure optimal execution plans are being generated and used. For any cost-based optimizer to be effective, column level statistics are needed. When these statistics are stale, there is potential for selecting a non-optimal plan, leading to slower query performance.
Today, we’re extending that support for auto statistics creation by adding the ability to automatically refresh and maintain statistics. As data warehouse tables get loaded and updated, the system can now automatically detect and update out-of-date statistics. With the auto-update statistics capability now available in preview, Azure SQL Data Warehouse delivers full statistics management capabilities while simplifying statistics maintenance processes. You no longer need to manually maintain statistics, which leads to a simplified and more cost-effective data warehouse deployment.
Azure SQL Data Warehouse provides some of the most advanced security and privacy features in the market. This is achieved by using proven SQL Server technology. SQL Server, as the core technology and component of Azure SQL Data Warehouse, has been the least vulnerable database over the last eight years according to the NIST national vulnerabilities database. To expand Azure SQL Data Warehouse’s existing security and privacy features, we’re announcing that Dynamic Data Masking (DDM) support is now available in preview.
Protect sensitive data with dynamic data masking (preview)
Dynamic data masking (DDM) enables administrators and data developers to control access to their company’s data, allowing sensitive data to be safe and restricted. It prevents unauthorized access to private data by obscuring the data on-the-fly. Based on user-defined data masking policies, Azure SQL Data Warehouse can dynamically obfuscate data as the queries execute, and before results are shown to users.
Azure SQL Data Warehouse implements the DDM capability directly inside the engine. When creating tables with DDM, policies are stored in the system’s metadata and then enforced by the engine as queries get executed. This centralized policy enforcement process simplifies data masking rules management because access control does not have to be implemented and repeated at the application layer. As various users query tables, policies are automatically honored and applied while protecting sensitive data. DDM comes with flexible policies and you can choose to define a partial mask, which exposes some of the data in the selected columns, or a full mask that obfuscates the data completely. Azure SQL Data Warehouse also provides built-in masking functions that users can choose from.
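The difference between a partial mask and a full mask can be shown with a short Python sketch. The helper names, the mask character, and the choice to expose the last four characters are assumptions for illustration. They are not the built-in masking functions the service provides.

```python
def full_mask(value, mask_char="X"):
    """Hide every character of the value."""
    return mask_char * len(value)


def partial_mask(value, keep_prefix, keep_suffix, mask_char="X"):
    """Expose the first and last few characters and hide the rest."""
    if keep_prefix + keep_suffix >= len(value):
        return full_mask(value, mask_char)    # too short to expose safely
    hidden = len(value) - keep_prefix - keep_suffix
    suffix = value[len(value) - keep_suffix:]
    return value[:keep_prefix] + mask_char * hidden + suffix


def read_column(value, user_can_unmask):
    """Return the real value only to users permitted to see it."""
    return value if user_can_unmask else partial_mask(value, 0, 4)


card = "4111111111111111"
print(read_column(card, user_can_unmask=False))
print(full_mask("secret"))
```

Because the masking decision lives in one place (here, `read_column`), every caller gets the same policy without reimplementing it, which mirrors the centralized enforcement described above.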
This is the result of relentless innovation and laser-focused execution on providing new features our customers need, all while reducing prices so customers get industry-leading performance at the best possible value. In just the past year, SQL Data Warehouse has released 130+ features focused on providing customers with enhanced speed, flexibility, and security. And today we are excited to announce three additional enhancements that continue to make SQL Data Warehouse the industry leader:
Unparalleled query performance
Intelligent workload management
Unmatched security and privacy
In this blog, we’ll take a closer look at the technical capabilities of these new features and, most importantly, how you can start using them today.
Unparalleled query performance
In our March 2019 release, a collection of newly available features improved workload performance by up to 22x compared to previous versions of Azure SQL Data Warehouse, which contributed to our leadership position in both the TPC-H and TPC-DS benchmark reports.
This didn’t just happen overnight. Azure SQL Data Warehouse draws on decades of experience building industry-leading database systems, like SQL Server, and is built on top of the world’s largest cloud architectures.
Key innovations that have improved query performance include:
Query Optimizer enhancements
Instant Data Movement
Additional advanced analytic functions
Query Optimizer enhancements
The Query Optimizer is one of the most critical components in any database. Making optimal choices about how to execute a query can and does yield significant improvements. When executing complex analytical queries, the number of operations to be executed in a distributed environment matters. Every opportunity to eliminate redundant computation, such as repeated subqueries, has a direct impact on query performance. For instance, the following query is reduced from 13 down to 5 operations using the latest Query Optimizer enhancements.
Instant Data Movement
For a distributed database system, having the most efficient data movement mechanism is also a critical ingredient in achieving great performance. Instant Data Movement was introduced with the launch of the second generation of Azure SQL Data Warehouse. To improve instant data movement performance, broadcast and partition data movement operations were added. In addition, performance optimizations around how strings are processed during the data movement operations yielded improvements of up to 2x.
Advanced analytic functions
A rich set of analytic functions simplifies writing SQL that aggregates across multiple dimensions, which not only streamlines the query but also improves its performance. One such set of functions is GROUP BY ROLLUP, GROUPING(), and GROUPING_ID(). See the example of a GROUP BY query from the online documentation below:
SELECT Country
,Region
,SUM(Sales) AS TotalSales
FROM Sales -- table name assumed; the start of this listing was lost
GROUP BY ROLLUP(Country, Region)
ORDER BY Country
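A ROLLUP over Country and Region returns detail rows, one subtotal row per country, and a grand total. The sketch below shows how GROUPING() and GROUPING_ID() can tell those rows apart; the table name Sales is an assumption, and the column aliases are illustrative.

```sql
-- GROUPING(col) returns 1 when col is aggregated away in that row, else 0.
-- GROUPING_ID(Country, Region) combines those flags into one integer:
--   0 = detail row, 1 = per-country subtotal, 3 = grand total.
SELECT Country,
       Region,
       SUM(Sales) AS TotalSales,
       GROUPING(Region) AS IsRegionRolledUp,
       GROUPING_ID(Country, Region) AS GroupingLevel
FROM Sales
GROUP BY ROLLUP(Country, Region);
```

Filtering or sorting on GroupingLevel is a simple way to separate subtotals from detail rows in a report.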
Intelligent workload management
The new workload importance feature in Azure SQL Data Warehouse lets you prioritize the workloads that run on the data warehouse system. Workload importance gives administrators the ability to prioritize workloads based on business requirements (e.g., executive dashboard queries, ELT executions).
It all starts with workload classification. SQL Data Warehouse classifies a request based on a set of criteria that administrators can define. In the absence of a matching classifier, the default classifier is chosen. SQL Data Warehouse supports classification at several levels, including the SQL query, database user, database role, Azure Active Directory login, or Azure Active Directory group, and maps the request to a system-defined workload group classification.
Each workload classification can be assigned one of five levels of importance: low, below_normal, normal, above_normal, and high. Access to resources during compilation, lock acquisition, and execution is prioritized based on the associated importance of a request.
The diagram below illustrates the workload classification and importance function:
Classifying requests with importance
Classifying requests is done with the new CREATE WORKLOAD CLASSIFIER syntax. Below is an example that maps the login for the ExecutiveReports role to ABOVE_NORMAL importance and the AdhocUsers role to BELOW_NORMAL importance. With this configuration, members of the ExecutiveReports role have their queries complete sooner because they get access to resources before members of the AdhocUsers role.
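A configuration along these lines could be written as follows. This is a sketch rather than a definitive setup: the classifier names and the choice of the mediumrc and smallrc workload groups are assumptions, while the role names come from the description above.

```sql
-- Requests from members of ExecutiveReports run with above_normal importance.
CREATE WORKLOAD CLASSIFIER ExecReportsClassifier
WITH (WORKLOAD_GROUP = 'mediumrc',
      MEMBERNAME     = 'ExecutiveReports',
      IMPORTANCE     = above_normal);

-- Requests from members of AdhocUsers run with below_normal importance.
CREATE WORKLOAD CLASSIFIER AdhocClassifier
WITH (WORKLOAD_GROUP = 'smallrc',
      MEMBERNAME     = 'AdhocUsers',
      IMPORTANCE     = below_normal);
```

When both groups compete for the same resources, the higher-importance requests are scheduled first.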
When using a data warehouse, customers often have questions regarding security and privacy. As illustrated by Donald Farmer, a well-respected thought leader in the analytics space, Azure SQL Data Warehouse has the most advanced security and privacy features in the market. This wasn’t achieved by chance. In fact, SQL Server, the core technology of SQL Data Warehouse, has been the least vulnerable database over the last eight years in the NIST vulnerabilities database.
One of our newest security and privacy features in SQL Data Warehouse is Data Discovery and Classification. This feature enables automated discovery of columns potentially containing sensitive data, recommends metadata tags to associate with the columns, and can persistently attach those tags to your tables.
These tags appear in the audit log for queries against sensitive data and are also included alongside the query results for clients that support this feature.
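Tags can also be attached by hand with T-SQL. The sketch below assumes the ADD SENSITIVITY CLASSIFICATION statement and the sys.sensitivity_classifications catalog view are available in your data warehouse; the table, column, label, and information type are hypothetical.

```sql
-- Attach a sensitivity label and information type to one column.
ADD SENSITIVITY CLASSIFICATION TO dbo.Customer.Email
WITH (LABEL = 'Confidential', INFORMATION_TYPE = 'Contact Info');

-- Review the classifications currently stored in the database.
SELECT * FROM sys.sensitivity_classifications;
```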
We all want the truth. To properly assess your cloud analytics provider, ask them about the only three things that matter:
Independent benchmark results
Company-wide access to insights
Security and privacy
What are their results on independent, industry-standard benchmarks?
Perhaps you’ve heard from other providers that benchmarks are irrelevant. If that’s what you’re hearing, maybe you should be asking yourself why? Independent, industry-standard benchmarks are important because they help you measure price and performance on both common and complex analytics workloads. They are essential indicators of value because as data volumes grow, it is vital to get the best performance you can at the lowest price possible.
In February, an independent study by GigaOm compared Azure SQL Data Warehouse, Amazon Redshift, and Google BigQuery using the highly recognized TPC-H benchmark. They found that Azure SQL Data Warehouse is up to 14x faster and costs 94 percent less than other cloud providers. And today, we are pleased to announce that in GigaOm’s second benchmark report, this time with the equally important TPC-DS benchmark, Azure SQL Data Warehouse is again the industry leader. Not Amazon Redshift. Not Google BigQuery. These results prove that Azure is the best place for all your analytics.
This is why customers like Columbia Sportswear choose Azure.
“Azure SQL Data Warehouse instantly gave us equal or better performance as our current system, which has been incrementally tuned over the last 6.5 years for our demanding performance requirements.”
Lara Minor, Sr. Enterprise Data Manager, Columbia Sportswear
Can they easily deliver powerful insights across your organization?
Insights from your analytics must be accessible to everyone in your organization. While other providers may say they can deliver this, the end result is often tailored to specific workgroups rather than being an enterprise-wide solution. Data can quickly become siloed in these situations, making it difficult to deliver insights across all users.
The TPC-DS industry benchmark I mentioned above is particularly useful for organizations that run intense analytics workloads because it uses demanding queries to test actual performance. For instance, one of the queries used in the TPC-DS benchmark report calculates the number of orders and the time window for those orders, filtering by state on non-returned orders shipped from a single warehouse. This type of complex query, which spans billions of rows and multiple tables, is a real-world example of how companies use a data warehouse for business insights. And with Power BI, users can perform intense queries like this by easily integrating with SQL Data Warehouse for fast, industry-leading performance.
How robust is their security?
Everyone is a target. When it comes to data, privacy and security are non-negotiable. No matter how cautious you are, there is always a threat lurking around the corner. Your analytics system contains the most valuable business data and must have both stringent security and privacy capabilities.
Azure has you covered. As illustrated by Donald Farmer, a well-respected thought leader in the analytics space, analytics in Azure has the most advanced security and privacy features in the market. From proactive threat detection to providing custom recommendations that enhance security, Azure SQL Data Warehouse uses machine learning and AI to secure your data. It also enables you to encrypt your data, both in flight and at rest. You can provide users with appropriate levels of access, from a single source, using row and column level security. This not only secures your data, but also helps you meet stringent privacy requirements.
“It was immediately clear to us that with Azure, particularly Azure Key Vault, we would be able to meet our own rigorous requirements for data protection and security.”
Guido Vetter, Head of Corporate Center of Excellence Advanced Analytics & Big Data, Daimler
Azure’s leading security and data privacy features not only make it the most trusted cloud in the market, but also complement its leadership in other areas, such as price-performance, making it simply unmatched.
Get started today
To learn more about Azure’s industry-leading price-performance and security, get started today!
Special thanks to Lee Schlesinger and the Talend team for their contribution to this blog post.
Following the significant announcement around the continued price-performance leadership of Azure SQL Data Warehouse in February 2019, Talend announced support of Stitch Data Loader for Azure SQL Data Warehouse. Stitch Data Loader is a recent addition to Talend’s portfolio for small and mid-market customers. With Stitch Data Loader, customers can load 5 million rows/month into Azure SQL Data Warehouse for free or scale up to an unlimited number of rows with a subscription.
All across the industry, there is a rapid shift to the cloud. Using a fast, flexible, and secure cloud data warehouse is an important first step in that journey. With Microsoft Azure SQL Data Warehouse and Stitch Data Loader, companies can get started faster than ever. The fact that Azure SQL Data Warehouse (ADW) can be up to 14x faster and 94 percent less expensive than similar options in the marketplace should only help further accelerate adoption of cloud scale analytics by customers of all sizes.
Building pipelines to the cloud with Stitch Data Loader
The Stitch team built the Azure SQL Data Warehouse integration with the help of Microsoft engineers. The solution leverages Azure Blob Storage and PolyBase to get data into the Azure cloud and ultimately loaded to SQL Data Warehouse. We take care of all issues with data type transformation between source and destination, schema changes, and bulk loading.
To start moving data, just specify your host address and database name and provide authentication credentials. Stitch will then start loading data from all of your sources in minutes.
Stitch Data Loader enables Azure SQL Data Warehouse users to analyze data from more than 90 data sources, including databases, SaaS tools, and ad networks. We also sponsor and integrate with the Singer open source ETL project, which makes it easy to get additional or custom data sources into Azure SQL Data Warehouse.
Stitch’s destination switching feature also makes it easy for current Stitch users to take their existing integrations and start loading them into Azure SQL Data Warehouse right away.
Going further with Talend Cloud and Azure SQL Data Warehouse
What if you’re ready to scale out your data warehousing efforts and layer on data transformation, profiling, and quality? Talend Cloud offers many more sources as well as more advanced data processing and data quality features that are available within ADW and the Azure platform. With over 900 connectors available, you’ll be able to move all your data, no matter the format or source. With data preparation and additional security features built in, you can get Azure-ready in no time.
Take Uniper, for instance. Using Azure and Talend Cloud, they built a cloud-based data analytics platform that integrates over 100 external and internal data sources, including temperature and IoT sensors. They constructed the full flow of business transactions (spanning market analytics, trading, asset management, and post-trading) while enabling data governance and self-service, reducing integration costs by 80 percent and achieving ROI in six months.
Special thanks to Rik Tamm-Daniels and the Informatica team for their contribution to this blog post.
With the latest release of Azure SQL Data Warehouse, Microsoft doubles down on Azure SQL DW as one of the core data services for digital transformation on Azure. In addition to the fundamental benefits of agility, on-demand scaling, and unlimited compute availability, the most recent price-to-performance metrics from the GigaOm report are one of several compelling arguments Microsoft has made for customers to adopt Azure SQL DW. Interestingly, Microsoft is also announcing the general availability of Azure Data Lake Gen 2 and Azure Data Explorer. Along with Power BI for rich visualization, this enhanced set of capabilities cements Microsoft’s leadership position around Cloud Scale Analytics.
Every day, I speak with joint Informatica and Microsoft customers who are looking to transform their approach to their data estate with a cohesive data lake and cloud data warehousing solution architecture. These customers range from global logistics companies to auto manufacturers to the world’s largest insurers, and all of them see the tremendous potential of the Microsoft modern data estate approach. In fact, through Informatica’s iPaaS (integration platform-as-a-service) offering alone, Informatica Intelligent Cloud Services, we’ve seen significant quarter-over-quarter growth in customer data volumes being moved to Azure SQL DW.
Of course, as compelling as the Azure SQL DW technology is, for many customers, modernizing a legacy enterprise data warehouse is a daunting proposition to even consider. The thought of touching the intricate web of dependencies around the warehouse can keep even the most battle-tested CIO up at night. A key consideration when attempting your own cloud data warehousing or cloud data modernization initiative is to ensure you have intelligence about the existing schemas, lineage, and dependencies, so that you can incrementally unravel the data web surrounding the warehouse and, with laser-like precision, begin to move workloads and use cases to Azure SQL DW.
Enter Informatica’s Enterprise Data Catalog. It provides full end-to-end, source-to-destination lineage and searchable, machine-learning- and AI-driven intelligent metadata about what data lives where in the warehouse, clearing the fog of complexity and illuminating a clear path to cloud data warehousing. In fact, the concept of discovery- and catalog-driven modernization is such a compelling leap forward that Microsoft and Informatica developed a single-sign-on Data Accelerator on Informatica’s Intelligent Cloud Services on Azure that can be accessed directly from the Azure SQL DW management console with your Azure credentials.
Data Accelerator for Azure
Want to see how Informatica and Microsoft can jumpstart your cloud data warehousing modernization initiative? Join us on Informatica’s world tour of hands-on workshops at a Microsoft Technology Center near you. Workshops are taking place in North America right now and will be coming to EMEA and APJ very soon!