Introducing Discovery Ad Performance Analysis

Posted by Manisha Arora, Nithya Mahadevan, and Aritra Biswas, gPS Data Science team

Overview of Discovery Ads and need for Ad Performance Analysis

Discovery ads, launched in May 2019, allow advertisers to easily extend their reach to social ads users across YouTube, the Google feed, and Gmail worldwide. They provide brands a new opportunity to reach 3 billion people as they explore their interests and search for inspiration across their favorite Google feeds (YouTube, Gmail, and Discover) — all with a single campaign. Learn more about Discovery ads here.

Because of these unique characteristics, customers need a data-driven method to identify the textual and imagery elements in Discovery ad copies that drive the Interaction Rate of their Discovery ad campaigns, where an interaction is defined as the main user action associated with an ad format: clicks and swipes for text and Shopping ads, views for video ads, calls for call extensions, and so on.

Interaction Rate = interactions / impressions

“Customers need a data driven method to identify textual & imagery elements in Discovery Ad copies that drive Interaction Rate of their campaigns.”

– Manisha Arora, Data Scientist

Our analysis approach:

The Data Science team at Google is investing in a machine learning approach to uncover insights from complex unstructured data and provide machine learning based recommendations to our customers. Machine Learning helps us study what works in ads at scale and these insights can greatly benefit the advertisers.

We follow a six-step approach for Discovery Ad Performance Analysis:

  • Understand Business Goals
  • Build Creative Hypothesis
  • Data Extraction
  • Feature Engineering
  • Machine Learning Modeling
  • Analysis & Insight Generation

To begin with, we work closely with the advertisers to understand their business goals, current ad strategy, and future goals. We closely map this to industry insights to draw a larger picture and provide a customized analysis for each advertiser. As a next step, we build hypotheses that best describe the problem we are trying to solve. An example of a hypothesis could be: “Do superlatives (words like ‘top’ or ‘best’) in the ad copy drive performance?”

“Machine Learning helps us study what works in ads at scale and these insights can greatly benefit the advertisers.”

– Manisha Arora, Data Scientist

Once we have a hypothesis we are working towards, the next step is to deep-dive into the technical analysis.

Data Extraction & Pre-processing

Our initial dataset includes raw ad text, imagery, performance KPIs, and target audience details from historical ad campaigns in the industry. Each Discovery ad contains two text assets (Headline and Description) and one image asset. We then apply ML to extract text and image features from these assets.

Text Feature Extraction

We apply NLP to extract text features from the ad text. We pass the raw text of the ad headline and description through Google Cloud’s Language API, which parses the raw text into our feature set: commonly used keywords, sentiment, and so on.

Example: 
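As a rough illustration of this step, the sketch below calls the Cloud Natural Language API Python client on a made-up headline. The exact feature set DisCat extracts is not public, so the calls shown are only the building blocks described above.

```
# Hedged sketch of text-feature extraction with google-cloud-language;
# the headline is hypothetical and the printed "features" are illustrative only.
from google.cloud import language_v1

client = language_v1.LanguageServiceClient()

headline = "Best savings rates of the year - open an account today"
document = language_v1.Document(
    content=headline,
    type_=language_v1.Document.Type.PLAIN_TEXT,
)

# Overall sentiment of the headline.
sentiment = client.analyze_sentiment(request={"document": document}).document_sentiment
print("sentiment:", sentiment.score, "magnitude:", sentiment.magnitude)

# Entities, a proxy for commonly used keywords.
entities = client.analyze_entities(request={"document": document}).entities
print("keywords:", [entity.name for entity in entities])

# Token-level syntax, useful for checks such as imperative vs. indicative tone.
tokens = client.analyze_syntax(request={"document": document}).tokens
print("parts of speech:",
      [language_v1.PartOfSpeech.Tag(token.part_of_speech.tag).name for token in tokens])
```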

Image Feature Extraction

We apply image processing to extract image features from the ad copy imagery. We pass the raw images through Google Cloud’s Vision API and extract image components including objects, people, background, lighting, and so on.
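A comparable sketch for the image side, using the google-cloud-vision client, is shown below. The file name is a placeholder, and the calls simply illustrate the kinds of components mentioned above (objects, faces, logos, colors), not the production pipeline.

```
# Hedged sketch of image-feature extraction with google-cloud-vision.
from google.cloud import vision

client = vision.ImageAnnotatorClient()

with open("ad_image.jpg", "rb") as f:            # hypothetical ad creative
    image = vision.Image(content=f.read())

# Objects and labels (e.g. Person, Beach, Chair) present in the creative.
objects = client.object_localization(image=image).localized_object_annotations
labels = client.label_detection(image=image).label_annotations
print([o.name for o in objects], [l.description for l in labels])

# Faces, including joy likelihood, used for features like "prominent smiling faces".
faces = client.face_detection(image=image).face_annotations
print("faces:", len(faces), [f.joy_likelihood for f in faces])

# Logos and dominant colors for the generic image features.
logos = client.logo_detection(image=image).logo_annotations
props = client.image_properties(image=image).image_properties_annotation
print([l.description for l in logos], props.dominant_colors.colors[:3])
```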

The following is the holistic set of features extracted from the ad content:

Feature Design


Text Feature Design

Two types of text features are included in DisCat:

1. Generic text feature

a. These are features returned by Google Cloud’s Language API including sentiment, word / character count, tone (imperative vs indicative), symbols, most frequent words and so on.

2. Industry-specific value propositions

a. These are features that only apply to a specific industry (e.g. finance) that are manually curated by the data science developer in collaboration with specialists and other industry experts.

  • For example, for the finance industry, one value proposition can be “Price Offer”. A list of keywords / phrases that are related to price offers (e.g. “discount”, “low rate”, “X% off”) will be curated based on domain knowledge to identify this value proposition in the ad copies. NLP techniques (e.g. wordnet synset) and manual examination will be used to make sure this list is inclusive and accurate.

Image Feature Design

Like the text features, image features can largely be grouped into two categories:

1. Generic image features

a. These features apply to all images and include the color profile, whether any logos were detected, how many human faces are included, etc.

b. The face-related features also include some advanced aspects: we look for prominent smiling faces looking directly at the camera, we differentiate between individuals vs. small groups vs. crowds, etc.

2. Object-based features

a. These features are based on the list of objects and labels detected in all the images in the dataset, which can often be a massive list including generic objects like “Person” and specific ones like particular dog breeds.

b. The biggest challenge here is dimensionality: we have to cluster together related objects into logical themes like natural vs. urban imagery.

c. We currently take a hybrid approach to this problem: we use unsupervised clustering to create an initial clustering, but we manually revise it as we inspect sample images. The process is as follows (a minimal sketch of the clustering step follows this list):

  • Extract object and label names (e.g. Person, Chair, Beach, Table) from the Vision API output and filter out the most uncommon objects
  • Convert these names to 50-dimensional semantic vectors using a Word2Vec model trained on the Google News corpus
  • Using PCA, extract the top 5 principal components from the semantic vectors. This step takes advantage of the fact that each Word2Vec neuron encodes a set of commonly adjacent words, and different sets represent different axes of similarity and should be weighted differently
  • Use an unsupervised clustering algorithm, namely either k-means or DBSCAN, to find semantically similar clusters of words
  • We are also exploring augmenting this approach with a combined distance metric:

d(w1, w2) = a * (semantic distance) + b * (co-appearance distance)

where the latter is a Jaccard distance metric
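To make the flow concrete, here is a minimal sketch of the clustering step under some stated assumptions: it uses the publicly downloadable 300-dimensional Google News vectors via gensim (the model described above is 50-dimensional and internal), a handful of made-up object labels, and plain k-means without the combined distance metric.

```
# Illustrative sketch only, not the DisCat pipeline.
import numpy as np
import gensim.downloader as api
from sklearn.decomposition import PCA
from sklearn.cluster import KMeans

labels = ["person", "chair", "beach", "table", "dog", "skyscraper", "tree"]

# Pre-trained Google News embeddings (300-d in the public release).
w2v = api.load("word2vec-google-news-300")
kept = [w for w in labels if w in w2v]
vectors = np.array([w2v[w] for w in kept])

# Reduce the embeddings to their top principal components before clustering.
components = PCA(n_components=5).fit_transform(vectors)

# Group semantically similar object names into themes.
clusters = KMeans(n_clusters=3, random_state=0, n_init=10).fit_predict(components)
for word, cluster_id in zip(kept, clusters):
    print(word, "-> cluster", cluster_id)
```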

Each of these components represents a choice the advertiser made when creating the messaging for an ad. Now that we have a variety of ads broken down into components, we can ask: which components are associated with ads that perform well or not so well?

We use a fixed effects [1] model to control for unobserved differences in the context in which different ads were served. This is because the features we are measuring are observed multiple times in different contexts, i.e., ad copy, audience group, time of year, and device on which the ad is served.

The trained model seeks to estimate the impact of individual keywords, phrases, and image components in the Discovery ad copies. The model estimates Interaction Rate (denoted as ‘IR’ in the following formulas) as a function of individual ad copy features plus controls:
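The exact published specification is not shown here; a generic fixed-effects form consistent with this description (an assumption for illustration, not the exact DisCat model) is:

$$
IR_{a,c} = \beta_0 + \sum_{k} \beta_k \, x_{k,a} + \gamma_c + \varepsilon_{a,c}
$$

where $x_{k,a}$ indicates whether feature $k$ (a keyword, phrase, or image component) is present in ad $a$, and $\gamma_c$ is a fixed effect for the serving context $c$ (audience group, device, time of year).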

We use ElasticNet to spread the effect of features in the presence of multicollinearity and to improve the explanatory power of the model:
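As a hedged illustration only (not the DisCat implementation), the sketch below fits a regularized regression of interaction rate on binary creative features plus one-hot context dummies standing in for the fixed effects; all column names and values are made up.

```
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import ElasticNetCV
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import OneHotEncoder

# Hypothetical training frame: one row per (ad, context) observation.
data = pd.DataFrame({
    "has_superlative":  [1, 0, 1, 0, 1, 0],
    "has_price_offer":  [0, 1, 1, 0, 0, 1],
    "device":           ["mobile", "desktop", "mobile", "desktop", "mobile", "desktop"],
    "audience":         ["a", "a", "b", "b", "a", "b"],
    "interaction_rate": [0.031, 0.022, 0.040, 0.018, 0.035, 0.020],
})

features = ["has_superlative", "has_price_offer", "device", "audience"]
preprocess = ColumnTransformer(
    [("context", OneHotEncoder(handle_unknown="ignore"), ["device", "audience"])],
    remainder="passthrough",        # keep the binary creative features as-is
)

# ElasticNet blends L1 and L2 penalties to handle correlated features.
model = make_pipeline(preprocess, ElasticNetCV(l1_ratio=[0.2, 0.5, 0.8], cv=3))
model.fit(data[features], data["interaction_rate"])
print(model.named_steps["elasticnetcv"].coef_)
```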

“Machine Learning model estimates the impact of individual keywords, phrases, and image components in discovery ad copies.”

– Manisha Arora, Data Scientist

 

Outputs & Insights

Outputs from the machine learning model help us determine the significant features. The coefficient of each feature represents its percentage-point effect on CTR.

In other words, if the mean CTR without a feature is X% and feature ‘xx’ has a coefficient of Y, then the mean CTR with feature ‘xx’ included will be (X + Y)%. This can help us determine the expected CTR if the most important features are included as part of the ad copies.

Key-takeaways (sample insights):

We analyze keywords and imagery tied to the unique value propositions of the product being advertised. There are six key value propositions we study in the model. The following are sample insights from the analyses:

Shortcomings:

Although insights from DisCat are quite accurate and highly actionable, the model does have a few limitations:

1. The current model does not consider groups of keywords that might be driving ad performance instead of individual keywords (for example, the phrase “Buy Now” rather than the individual keywords “Buy” and “Now”).

2. Inference and predictions are based on historical data and aren’t necessarily an indication of future success.

3. Insights are drawn at the industry level and may need to be tailored for a given advertiser.

DisCat breaks down exactly which features are working well for the ad and which ones have scope for improvement. These insights can help identify high-impact keywords in the ads, which can then be used to improve ad quality, thus improving business outcomes. As a next step, we recommend testing the new ad copies with experiments to provide a more robust analysis. The Google Ads A/B testing feature also allows you to create and run experiments to test these insights in your own campaigns.

Summary

Discovery Ads are a great way for advertisers to extend their social outreach to millions of people across the globe. DisCat helps break down Discovery ads by analyzing text and images separately and using advanced ML/AI techniques to identify the key aspects of the ad that drive greater performance. These insights help advertisers identify room for growth, surface high-impact keywords, and design better creatives that drive business outcomes.

Acknowledgement

Thank you to Shoresh Shafei and Jade Zhang for their contributions. Special mention to Nikhil Madan for facilitating the publishing of this blog.

Notes

  1. Greene, W.H., 2011. Econometric Analysis, 7th ed., Prentice Hall;

    Cameron, A. Colin; Trivedi, Pravin K. (2005). Microeconometrics: Methods and Applications

Prediction Framework, a time saver for Data Science prediction projects

Posted by Álvaro Lamas, Héctor Parra, Jaime Martínez, Julia Hernández, Miguel Fernandes, Pablo Gil

Acquiring high-value customers using predicted lifetime value, taking specific actions on users with a high propensity to churn, generating and activating audiences based on machine-learning-processed signals… all of these marketing scenarios require analyzing first-party data, performing predictions on the data, and activating the results in the different marketing platforms, like Google Ads, as frequently as possible to keep the data fresh.

Feeding marketing platforms like Google Ads on a regular and frequent basis requires a robust, report-oriented, and cost-efficient ETL & prediction pipeline. These pipelines are very similar regardless of the use case, and it’s very easy to fall into reinventing the wheel every time or manually copying and pasting structural code, increasing the risk of introducing errors.

Wouldn’t it be great to have a common reusable structure and just add the specific code for each of the stages?

Here is where Prediction Framework plays a key role in helping you implement and accelerate your first-party data prediction projects by providing the backbone elements of the predictive process.

Prediction Framework is a fully customizable pipeline that allows you to simplify the implementation of prediction projects. You only need to have the input data source, the logic to extract and process the data and a Vertex AutoML model ready to use along with the right feature list, and the framework will be in charge of creating and deploying the required artifacts. With a simple configuration, all the common artifacts of the different stages of this type of projects will be created and deployed for you: data extraction, data preparation (aka feature engineering), filtering, prediction and post-processing, in addition to some other operational functionality including backfilling, throttling (for API limits), synchronization, storage and reporting.

The Prediction Framework was built to be hosted in the Google Cloud Platform and it makes use of Cloud Functions to do all the data processing (extraction, preparation, filtering and post-prediction processing), Firestore, Pub/Sub and Schedulers for the throttling system and to coordinate the different phases of the predictive process, Vertex AutoML to host your machine learning model and BigQuery as the final storage of your predictions.

Prediction Framework Architecture

To get started with the Prediction Framework, a configuration file needs to be prepared with some environment variables about the Google Cloud project to be used, the data sources, the ML model to make the predictions, and the scheduler for the throttling system. In addition, custom queries for the data extraction, preparation, filtering, and post-processing need to be added to the deployment customization files. Then the deployment is done automatically using a deployment script provided by the tool.

Once deployed, all the stages will be executed one after the other, storing the intermediate and final data in the BigQuery tables:

  • Extract: on a scheduled basis, this step queries the transactions from the data source corresponding to the run date (scheduler or backfill run date) and stores them in a new table in the local project’s BigQuery.
  • Prepare: immediately after the extract of the transactions for one specific date is available, the data is picked up from the local BigQuery and processed according to the specs of the model. Once the data is processed, it is stored in a new table in the local project’s BigQuery.
  • Filter: this step queries the data stored by the prepare process, filters the required data, and stores it in the local project’s BigQuery (e.g. only taking new customers’ transactions into consideration; what counts as a new customer is up to the instantiation of the framework for the specific use case and is covered later).
  • Predict: once the new customers are stored, this step reads them from BigQuery and calls the prediction using the Vertex API (see the sketch after this list). A formula based on the result of the prediction can be applied to tune the value or to apply thresholds. Once the data is ready, it is stored in BigQuery within the target project.
  • Post_process: a formula can be applied to the AutoML batch results to tune the value or to apply thresholds. Once the data is ready, it is stored in BigQuery within the target project.
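A minimal sketch of what the Predict stage does, under assumed project, dataset, and model identifiers (the framework itself generates and wires these artifacts for you), might look like this:

```
# Hedged sketch of the Predict stage: run a Vertex AI batch prediction over the
# filtered BigQuery table and write the results back to BigQuery.
from google.cloud import aiplatform

aiplatform.init(project="my-project", location="europe-west4")

# Placeholder model resource name; the real one comes from the configuration.
model = aiplatform.Model("projects/my-project/locations/europe-west4/models/1234567890")

batch_job = model.batch_predict(
    job_display_name="prediction-framework-daily",
    bigquery_source="bq://my-project.pf_dataset.filtered_customers",
    bigquery_destination_prefix="bq://my-project.pf_dataset",
    instances_format="bigquery",
    predictions_format="bigquery",
)
batch_job.wait()                  # the post-process stage reads the output table
print(batch_job.output_info)
```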

One of the powerful features of the prediction framework is that it allows backfilling directly from the BigQuery user interface, so in case you’d need to reprocess a whole period of time, it could be done in literally 4 clicks.

In summary: Prediction Framework simplifies the implementation of first-party data prediction projects, saving time and minimizing errors of manual deployments of recurrent architectures.

For additional information and to start experimenting, you can visit the Prediction Framework repository on Github.

13 Most Common Google Cloud Reference Architectures

Posted by Priyanka Vergadia, Developer Advocate

Google Cloud is a cloud computing platform that can be used to build and deploy applications. It allows you to take advantage of the flexibility of development while scaling the infrastructure as needed.

I’m often asked by developers to provide a list of Google Cloud architectures that help to get started on the cloud journey. Last month, I decided to start a mini-series on Twitter called “#13DaysOfGCP” where I shared the most common use cases on Google Cloud. I have compiled the list of all 13 architectures in this post. Some of the topics covered are hybrid cloud, mobile app backends, microservices, serverless, CICD and more. If you were not able to catch it, or if you missed a few days, here we bring to you the summary!

Series kickoff #13DaysOfGCP

#1: How to set up hybrid architecture in Google Cloud and on-premises

Day 1

#2: How to mask sensitive data in chatbots using Data loss prevention (DLP) API?

Day 2

#3: How to build mobile app backends on Google Cloud?

Day 3

#4: How to migrate Oracle Database to Spanner?

Day 4

#5: How to set up hybrid architecture for cloud bursting?

Day 5

#6: How to build a data lake in Google Cloud?

Day 6

#7: How to host websites on Google Cloud?

Day 7

#8: How to set up Continuous Integration and Continuous Delivery (CICD) pipeline on Google Cloud?

Day 8

#9: How to build serverless microservices in Google Cloud?

Day 9

#10: Machine Learning on Google Cloud

Day 10

#11: Serverless image, video or text processing in Google Cloud

Day 11

#12: Internet of Things (IoT) on Google Cloud

Day 12

#13: How to set up BeyondCorp zero trust security model?

Day 13

Wrap up with a puzzle

Wrap up!

We hope you enjoy this list of the most common reference architectures. Please let us know your thoughts in the comments below!

Azure Data Explorer and Stream Analytics for anomaly detection

Anomaly detection plays a vital role in many industries across the globe, such as fraud detection for the financial industry, health monitoring in hospitals, fault detection and operating environment monitoring in the manufacturing, oil and gas, utility, transportation, aviation, and automotive industries.

Anomaly detection is about finding patterns in data that do not conform to expected behavior. It is important for decision-makers to be able to detect them and take proactive actions if needed. Using the oil and gas industry as one example, deep-water rigs with various equipment are intensively monitored by hundreds of sensors that send measurements in various frequencies and formats. Analysis or visualization is hard using traditional software platforms, and any non-productive time on deep-water oil rig platforms caused by the failure to detect an anomaly could mean large financial losses each day.

Companies need new technologies like Azure IoT, Azure Stream Analytics, Azure Data Explorer, and machine learning to ingest, process, and transform data into strategic business intelligence to enhance exploration and production, improve manufacturing efficiency, and ensure safety and environmental protection. These managed services also help customers dramatically reduce software development time, accelerate time to market, provide cost-effectiveness, and achieve high availability and scalability.

While the Azure platform provides lots of options for anomaly detection and customers can choose the technology that best suits their needs, customers also brought questions to field-facing architects about which use cases are most suitable for each solution. We’ll examine the answers to these questions below, but first, you’ll need to know a couple of definitions:

What is a time series? A time series is a series of data points indexed in time order. In the oil and gas industry, most equipment or sensor readings are sequences taken at successive points in time or depth.

What is decomposition of an additive time series? Decomposition is the task of separating a time series into components, as shown on the graph below.

Decomposition is the task to separate a time series into components

Time-series forecasting and anomaly detection

A graph showing a time series with forecasting.

Anomaly detection is the process of identifying observations that differ significantly from the majority of the data.

A graph showing an anomaly detection example.

This is an anomaly detection example with Azure Data Explorer.

  • The red line is the original time series.
  • The blue line is the baseline (seasonal + trend) component.
  • The purple points are anomalous points on top of the original time series.
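To make the idea concrete outside of either service, here is a small, generic Python sketch (not the Azure implementation) that decomposes a synthetic series into baseline and residual components and flags points with unusually large residuals:

```
import numpy as np
import pandas as pd
from statsmodels.tsa.seasonal import seasonal_decompose

# Synthetic hourly series: daily seasonality + slow trend + noise + one spike.
rng = pd.date_range("2023-01-01", periods=24 * 14, freq="H")
values = (10 + 5 * np.sin(2 * np.pi * rng.hour / 24)
          + 0.01 * np.arange(len(rng))
          + np.random.normal(0, 0.5, len(rng)))
values[100] += 8                                   # inject an anomaly
series = pd.Series(values, index=rng)

# Separate the series into trend + seasonal (the "baseline") and residual.
result = seasonal_decompose(series, model="additive", period=24)
residual = result.resid.dropna()

# Flag points whose residual deviates strongly from the rest.
threshold = 3 * residual.std()
anomalies = residual[residual.abs() > threshold]
print(anomalies)
```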

To detect anomalies, either Azure Stream Analytics or Azure Data Explorer can be used for real-time analytics and detection as illustrated in the diagram below.

A diagram showing an Azure powered pattern for real-time analytics.

Azure Stream Analytics is an easy-to-use, real-time analytics service that is designed for mission-critical workloads. You can build an end-to-end serverless streaming pipeline with just a few clicks, go from zero to production in minutes using SQL, or extend it with custom code and built-in machine learning capabilities for more advanced scenarios.

Azure Data Explorer is a fast, fully managed data analytics service for near real-time analysis on large volumes of data streaming from applications, websites, IoT devices, and more. You can ask questions and iteratively explore data on the fly to improve products, enhance customer experiences, monitor devices, boost operations, and quickly identify patterns, anomalies, and trends in your data.

Azure Stream Analytics or Azure Data Explorer?

Use Case

Stream Analytics is for continuous or streaming real-time analytics, with aggregate functions supporting hopping, sliding, tumbling, or session windows. It will not suit your use case if you want to write UDFs or UDAs in languages other than JavaScript or C#, or if your solution is in a multi-cloud or on-premises environment.

Data Explorer is for on-demand or interactive near real-time analytics, data exploration on large volumes of data streams, seasonality decomposition, ad hoc work, dashboards, and root cause analyses on data from near real-time to historical. It will not suit your use case if you need to deploy analytics onto the edge.

Forecasting

You can set up a Stream Analytics job that integrates with Azure Machine Learning Studio.

Data Explorer provides a native function for forecasting time series based on the same decomposition model. Forecasting is useful for many scenarios like preventive maintenance, resource planning, and more.

Seasonality

Stream Analytics does not provide seasonality support, due to the limitation on sliding window size.

Data Explorer provides functionalities to automatically detect the periods in the time series or allows you to verify that a metric should have specific distinct period(s) if you know them.

Decomposition

Stream Analytics does not support decomposition.

Data Explorer provides a function that takes a set of time series and automatically decomposes each time series into its seasonal, trend, residual, and baseline components.

Filtering and Analysis

Stream Analytics provides functions to detect spikes and dips or change points.

Data Explorer provides analysis to find anomalous points on a set of time series, and a root cause analysis (RCA) function after an anomaly is detected.

Filtering

Stream Analytics provides filtering with reference data, either slow-moving or static.

Data Explorer provides two generic functions:
•    Finite impulse response (FIR) which can be used for moving average, differentiation, shape matching
•    Infinite impulse response (IIR) for exponential smoothing and cumulative sum

Anomaly Detection

Stream Analytics provides detections for:
•    Spikes and dips (temporary anomalies)
•    Change points (persistent anomalies such as level or trend change)

Data Explorer provides detections for:
•    Spikes & dips, based on enhanced seasonal decomposition model (supporting automatic seasonality detection, robustness to anomalies in the training data)
•    Changepoint (level shift, trend change) by segmented linear regression
•    KQL Inline Python/R plugins enable extensibility with other models implemented in Python or R

What’s next?

Azure Data Analytics, in general, brings you best-of-breed technologies for each workload. The new Real-Time Analytics architecture (shown above) allows you to leverage the best technology for each type of workload for stream and time-series analytics, including anomaly detection. The following is a list of resources that may help you get started quickly:

Multi-language identification and transcription in Video Indexer

Multi-language speech transcription was recently introduced into Microsoft Video Indexer at the International Broadcasters Conference (IBC). It is available as a preview capability and customers can already start experiencing it in our portal. More details on all our IBC2019 enhancements can be found here.

Multi-language videos are common media assets in the globalization context; global political summits, economic forums, and sports press conferences are examples of venues where speakers use their native language to convey their statements. These videos pose a unique challenge for companies that need to provide automatic transcription for large video archives. Automatic transcription technologies expect users to explicitly specify the video language in advance to convert speech to text. This manual step becomes a scalability obstacle when transcribing multi-language content, as one would have to manually tag audio segments with the appropriate language.

Microsoft Video Indexer provides a unique capability of automatic spoken language identification for multi-language content. This solution allows users to easily transcribe multi-language content without going through tedious manual preparation steps before triggering it. It can therefore save anyone with a large archive of videos both time and money, and enable discoverability and accessibility scenarios.

Multi-language audio transcription in Video Indexer

The multi-language transcription capability is available as part of the Video Indexer portal. Currently, it supports four languages (English, French, German, and Spanish), while expecting up to three different languages in an input media asset. While uploading a new media asset you can select the “Auto-detect multi-language” option as shown below.

1.	A new multi-language option available in the upload page of Video Indexer portal

Our application programming interface (API) supports this capability as well by enabling users to specify ‘multi’ as the language in the upload API. Once the indexing process is completed, the index JavaScript object notation (JSON) will include the underlying languages. Refer to our documentation for more details.
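For illustration, a hedged Python sketch of such an upload call is shown below; the endpoint shape, parameter names, and placeholder values are assumptions based on the public upload API and should be verified against the documentation.

```
import requests

location = "trial"                  # or your Azure region
account_id = "<your-account-id>"    # hypothetical placeholder
access_token = "<access-token>"     # obtained from the authorization API

response = requests.post(
    f"https://api.videoindexer.ai/{location}/Accounts/{account_id}/Videos",
    params={
        "accessToken": access_token,
        "name": "press-conference",
        "videoUrl": "https://example.com/video.mp4",
        "language": "multi",        # ask the service to auto-detect languages
    },
)
print(response.json())              # includes the video id used to poll the index
```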

Additionally, each instance in the transcription section will include the language in which it was transcribed.

2.	A transcription snippet from Video Indexer timeline presenting different language segments

Customers can view the transcript and identified languages by time, jump to the specific places in the video for each language, and even see the multi-language transcription as video captions. The resulting transcription is also available as closed caption files (VTT, TTML, SRT, TXT, and CSV).


Methodology

Language identification from an audio signal is a complex task. Acoustic environment, speaker gender, and speaker age are among a variety of factors that affect this process. We represent the audio signal using a visual representation, such as spectrograms, assuming that different languages induce unique visual patterns which can be learned using deep neural networks.

Our solution has two main stages to determine the languages used in multi-language media content. First, it employs a deep neural network to classify audio segments at very high granularity, in other words, over a very few seconds. While a good model will successfully identify the underlying language, it can still misidentify some segments due to similarities between languages. Therefore, we apply a second stage for examining these misses and smoothing the results accordingly.

3.	A new insight pane showing the detected spoken languages and their exact occurrences on the timeline

Next steps

We introduced a differentiated capability for multi-language speech transcription. With this unique capability in Video Indexer, you can gain deeper insight into the content of your videos, as it allows you to immediately start searching across videos for different language segments. During the coming few months, we will be improving this capability by adding support for more languages and improving the model’s accuracy.

For more information, visit Video Indexer’s portal or the Video Indexer developer portal, and try this new capability. Read more about the new multi-language option and how to use it in our documentation.

Please use our UserVoice to share feedback and help us prioritize features or email [email protected] with any questions.

Built-in Jupyter notebooks in Azure Cosmos DB are now available

Earlier this year, we announced a preview of built-in Jupyter notebooks for Azure Cosmos DB. These notebooks, running inside Azure Cosmos DB, are now available.

Overview of built-in Jupyter notebooks in Azure Cosmos DB.

Cosmic notebooks are available for all data models and APIs including Cassandra, MongoDB, SQL (Core), Gremlin, and Spark to enhance the developer experience in Azure Cosmos DB. These notebooks are directly integrated into the Azure Portal and your Cosmos accounts, making them convenient and easy to use. Developers, data scientists, engineers and analysts can use the familiar Jupyter notebooks experience to:

  • Interactively run queries
  • Explore and analyze data
  • Visualize data
  • Build, train, and run machine learning and AI models

In this blog post, we’ll explore how notebooks make it easy for you to work with and visualize your Azure Cosmos DB data.

Easily query your data

With notebooks, we’ve included built-in commands to make it easy to query your data for ad-hoc or exploratory analysis. From the Portal, you can use the %%sql magic command to run a SQL query against any container in your account, no configuration needed. The results are returned immediately in the notebook.

SQL query using built-in Azure Cosmos DB notebook magic command.
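For example, a notebook cell along these lines runs a query against a container. The database and container names below come from the sample dataset, and the flag names are assumptions based on the portal experience; run the magic's help in a notebook to confirm the exact options.

```
%%sql --database RetailDemo --container WebsiteData
SELECT TOP 10 c.Action, c.ItemRevenue, c.Country
FROM c
WHERE c.Action = "Purchased"
```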

Improved developer productivity

We’ve also bundled in version 4 of our Azure Cosmos DB Python SDK for SQL API, which has our latest performance and usability improvements. The SDK can be used directly from notebooks without having to install any packages. You can perform any SDK operation including creating new databases, containers, importing data, and more.

Create new database and container with built-in Python SDK in notebook.
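A hedged sketch of that workflow with version 4 of the azure-cosmos SDK follows; the endpoint, key, database, and container names are placeholders.

```
from azure.cosmos import CosmosClient, PartitionKey

client = CosmosClient(url="<account-endpoint>", credential="<account-key>")

# Create (or reuse) a database and a container partitioned on /id.
database = client.create_database_if_not_exists(id="RetailDemo")
container = database.create_container_if_not_exists(
    id="WebsiteData",
    partition_key=PartitionKey(path="/id"),
)

# Insert a sample item and read it back with a query.
container.upsert_item({"id": "1", "Action": "Viewed", "ItemRevenue": 0})
items = list(container.query_items(
    query="SELECT * FROM c WHERE c.Action = 'Viewed'",
    enable_cross_partition_query=True,
))
print(items)
```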

Visualize your data

Azure Cosmos DB notebooks come with a built-in set of packages, including Pandas, a popular Python data analysis library, Matplotlib, a Python plotting library, and more. You can customize your environment by installing any package you need.

Install custom package using pip install.

For example, to build interactive visualizations, we can install bokeh and use it to build an interactive chart of our data.
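A hedged sketch along these lines installs bokeh from within a notebook cell and renders a simple interactive bar chart of aggregated results; the column names and counts are made up.

```
import sys
!{sys.executable} -m pip install bokeh

import pandas as pd
from bokeh.io import output_notebook, show
from bokeh.plotting import figure

output_notebook()                    # render charts inline in the notebook

df = pd.DataFrame({
    "Action": ["Viewed", "Added", "Purchased"],
    "Count": [1200, 400, 150],
})

p = figure(x_range=df["Action"].tolist(), height=300, title="User actions")
p.vbar(x=df["Action"], top=df["Count"], width=0.6)
show(p)
```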

Histogram of data stored in Azure Cosmos DB, showing users who viewed, added, and purchased an item.

Users with geospatial data in Azure Cosmos DB can also use the built-in GeoPandas library, along with their visualization library of choice to more easily visualize their data.

Choropleth world map of data stored in Azure Cosmos DB, showing revenue by country.

Getting started

  1. Follow our documentation to create a new Cosmos account with notebooks enabled or enable notebooks on an existing account. (Create an account with notebooks, or enable notebooks on an existing account, in the Azure portal.)
  2. Start with one of the notebooks included in the sample gallery in Azure Cosmos Explorer or Data Explorer. (Azure Cosmos DB notebooks sample gallery.)
  3. Share your favorite notebooks with the community by sending them to the Azure Cosmos DB notebooks GitHub repo.
  4. Tag your notebooks with #CosmosDB, #CosmicNotebooks, #PoweredByCosmos on social media. We will feature the best and most popular Cosmic notebooks globally!

Stay up-to-date on the latest Azure #CosmosDB news and features by following us on Twitter or LinkedIn. We’d love to hear your feedback and see your best notebooks built with Azure Cosmos DB!

Announcing the general availability of Python support in Azure Functions

Python support for Azure Functions is now generally available and ready to host your production workloads across data science and machine learning, automated resource management, and more. You can now develop Python 3.6 apps to run on the cross-platform, open-source Functions 2.0 runtime. These can be published as code or Docker containers to a Linux-based serverless hosting platform in Azure. This stack powers the solution innovations of our early adopters, with customers such as General Electric Aviation and TCF Bank already using Azure Functions written in Python for their serverless production workloads. Our thanks to them for their continued partnership!

In the words of David Havera, blockchain Chief Technology Officer of the GE Aviation Digital Group, “GE Aviation Digital Group’s hope is to have a common language that can be used for backend Data Engineering to front end Analytics and Machine Learning. Microsoft have been instrumental in supporting this vision by bringing Python support in Azure Functions from preview to life, enabling a real world data science and Blockchain implementation in our TRUEngine project.”

Throughout the Python preview for Azure Functions we gathered feedback from the community to build easier authoring experiences, introduce an idiomatic programming model, and create a more performant and robust hosting platform on Linux. This post is a one-stop summary for everything you need to know about Python support in Azure Functions and includes resources to help you get started using the tools of your choice.

Bring your Python workloads to Azure Functions

Many Python workloads align very nicely with the serverless model, allowing you to focus on your unique business logic while letting Azure take care of how your code is run. We’ve been delighted by the interest from the Python community and by the productive solutions built using Python on Functions.

Workloads and design patterns

While this is by no means an exhaustive list, here are some examples of workloads and design patterns that translate well to Azure Functions written in Python.

Simplified data science pipelines

Python is a great language for data science and machine learning (ML). You can leverage the Python support in Azure Functions to provide serverless hosting for your intelligent applications. Consider a few ideas:

  • Use Azure Functions to deploy a trained ML model along with a scoring script to create an inferencing application.

Azure Functions inferencing app

  • Leverage triggers and data bindings to ingest, move, prepare, transform, and process data using Functions.
  • Use Functions to introduce event-driven triggers to re-training and model update pipelines when new datasets become available.

Automated resource management

As an increasing number of assets and workloads move to the cloud, there’s a clear need to provide more powerful ways to manage, govern, and automate the corresponding cloud resources. Such automation scenarios require custom logic that can be easily expressed using Python. Here are some common scenarios:

  • Process Azure Monitor alerts generated by Azure services.
  • React to Azure events captured by Azure Event Grid and apply operational requirements on resources.

Event-driven automated resource management

  • Leverage Azure Logic Apps to connect to external systems like IT service management, DevOps, or monitoring systems while processing the payload with a Python function.
  • Perform scheduled operational tasks on virtual machines, SQL Server, web apps, and other Azure resources.

Powerful programming model

To power accelerated Python development, Azure Functions provides a productive programming model based on event triggers and data bindings. The programming model is supported by a world class end-to-end developer experience that spans from building and debugging locally to deploying and monitoring in the cloud.

The programming model is designed to provide a seamless experience for Python developers so you can quickly start writing functions using code constructs that you’re already familiar with, or import existing .py scripts and modules to build the function. For example, you can implement your functions as asynchronous coroutines using the async def qualifier or send monitoring traces to the host using the standard logging module. Additional dependencies to pip install can be configured using the requirements.txt file.
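As a minimal sketch of that model (in the programming style where the handler lives in __init__.py with a companion function.json, not shown here), an HTTP-triggered function using async def and the standard logging module looks roughly like this:

```
import logging
import azure.functions as func


async def main(req: func.HttpRequest) -> func.HttpResponse:
    # Traces written with the standard logging module flow to the Functions host.
    logging.info("Python HTTP trigger received a request.")

    name = req.params.get("name", "world")
    return func.HttpResponse(f"Hello, {name}!", status_code=200)
```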

Azure Functions programming model

With the event-driven programming model in Functions, based on triggers and bindings, you can easily configure the events that will trigger the function execution and any data sources the function needs to orchestrate with. This model helps increase productivity when developing apps that interact with multiple data sources by reducing the amount of boilerplate code, SDKs, and dependencies that you need to manage and support. Once configured, you can quickly retrieve data from the bindings or write back using the method attributes of your entry-point function. The Python SDK for Azure Functions provides a rich API layer for binding to HTTP requests, timer events, and other Azure services, such as Azure Storage, Azure Cosmos DB, Service Bus, Event Hubs, or Event Grid, so you can use productivity enhancements like autocomplete and Intellisense when writing your code. By leveraging the Azure Functions extensibility model, you can also bring your own bindings to use with your function, so you can also connect to other streams of data like Kafka or SignalR.

Azure Functions queue trigger example
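The screenshot above shows a queue trigger; a hedged reconstruction of a similar function, with a queue trigger and a Cosmos DB output binding whose names ("msg", "outdoc") would be declared in function.json, might read:

```
import logging
import azure.functions as func


def main(msg: func.QueueMessage, outdoc: func.Out[func.Document]) -> None:
    payload = msg.get_body().decode("utf-8")
    logging.info("Processing queue message: %s", payload)

    # Write the processed result to Cosmos DB via the output binding.
    outdoc.set(func.Document.from_dict({"id": msg.id, "body": payload}))
```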

Easier development

As a Python developer, you can use your preferred tools to develop your functions. The Azure Functions Core Tools will enable you to get started using trigger-based templates, run locally to test against real-time events coming from the actual cloud sources, and publish directly to Azure, while automatically invoking a server-side dependency build on deployment. The Core Tools can be used in conjunction with the IDE or text editor of your choice for an enhanced authoring experience.

You can also choose to take advantage of the Azure Functions extension for Visual Studio Code for a tightly integrated editing experience to help you create a new app, add functions, and deploy, all within a matter of minutes. The one-click debugging experience enables you to test your functions locally, set breakpoints in your code, and evaluate the call stack, simply with the press of F5. Combine this with the Python extension for Visual Studio Code, and you have an enhanced Python development experience with auto-complete, Intellisense, linting, and debugging.

Azure Functions Visual Studio Code development

For a complete continuous delivery experience, you can now leverage the integration with Azure Pipelines, one of the services in Azure DevOps, via an Azure Functions-optimized task to build the dependencies for your app and publish them to the cloud. The pipeline can be configured using an Azure DevOps template or through the Azure CLI.

Advanced observability and monitoring through Azure Application Insights is also available for functions written in Python, so you can monitor your apps using the live metrics stream, collect data, query execution logs, and view the distributed traces across a variety of services in Azure.

Host your Python apps with Azure Functions

Host your Python apps with the Azure Functions Consumption plan or the Azure Functions Premium plan on Linux.

The Consumption plan is now generally available for Linux-based hosting and ready for production workloads. This serverless plan provides event-driven dynamic scale and you are charged for compute resources only when your functions are running. Our Linux plan also now has support for managed identities, allowing your app to seamlessly work with Azure resources such as Azure Key Vault, without requiring additional secrets.

Azure Functions Linux Consumption managed identities

The Consumption plan for Linux hosting also includes a preview of integrated remote builds to simplify dependency management. This new capability is available as an option when publishing via the Azure Functions Core Tools and enables you to build in the cloud on the same environment used to host your apps as opposed to configuring your local build environment in alignment with Azure Functions hosting.

Python remote build with Azure Functions

Workloads that require advanced features such as more powerful hardware, the ability to keep instances warm indefinitely, and virtual network connectivity can benefit from the Premium plan with Linux-based hosting now available in preview.

Azure Functions Premium plan virtual network integration

With the Premium plan for Linux hosting you can choose between bringing only your app code or bringing a custom Docker image to encapsulate all your dependencies, including the Azure Functions runtime as described in the documentation “Create a function on Linux using a custom image.” Both options benefit from avoiding cold start and from scaling dynamically based on events.

Azure Functions Premium plan hosting for code or containers

Next steps

Here are a few resources you can leverage to start building your Python apps in Azure Functions today:

On the Azure Functions team, we are committed to providing a seamless and productive serverless experience for developing and hosting Python applications. With so much being released now and coming soon, we’d love to hear your feedback and learn more about your scenarios. You can reach the team on Twitter and on GitHub. We actively monitor StackOverflow and UserVoice as well, so feel free to ask questions or leave your suggestions. We look forward to hearing from you!

Announcing preview of Azure Data Share

In a world where data volume, variety, and type are exponentially growing, organizations need to collaborate with data of any size and shape. In many cases data is at its most powerful when it can be shared and combined with data that resides outside organizational boundaries, with business partners and third parties. For customers, sharing this data in a simple and governed way is challenging. Common data sharing approaches using file transfer protocol (FTP) or web APIs tend to require bespoke development and infrastructure to manage. These tools do not provide the security or governance required to meet enterprise standards, and they are often not suitable for sharing large datasets. To enable enterprise collaboration, we are excited to unveil Azure Data Share Preview, a new data service for sharing data across organizations.

Simple and safe data sharing

Data professionals in the enterprise can now use Azure Data Share to easily and safely share big data with external organizations in Azure Blob Storage and Azure Data Lake Storage. New services will continue to come online. As a fully managed Azure service, Azure Data Share does not require infrastructure to set up, and it scales to meet big data sharing demands. The intuitive interface makes sharing easy and productive, directly from the Azure portal. With just a few clicks, data professionals choose which data to share and whom to share it with. They can schedule the service to automatically share new or changed data pertaining to specific datasets, as well as stop future updates from flowing through at any time. With Azure Data Share, data professionals have greater control over each data sharing relationship and can govern use by associating terms of use with each data share created. To receive the data, recipients must agree to the terms of use specified.

Alongside governance, security is fundamental in Azure Data Share and leverages core Azure security measures to help protect the data.

Azure Data Share, view of sent shares in the Azure portal

Enabling data collaboration

Azure Data Share maximizes access to simple and safe data sharing for organizations in many industries. For example, retailers can leverage Azure Data Share to easily share sales inventory and demographic data for demand forecasting and price optimization with their suppliers.

In the finance industry, Microsoft collaborated with Finastra, a multi-billion dollar company and provider of the broadest portfolio of financial services software in the world today that spans retail banking, transaction banking, lending, and treasury and capital markets. Finastra is fully integrating Azure Data Share with their open platform, FusionFabric.cloud, to enable seamless distribution of premium datasets to a wider ecosystem of application developers across the FinTech value chain. These datasets have been curated by Finastra over several years, and by leveraging the data distribution capabilities of Azure Data Share, ingestion by app developers and other partners requires simple wrangling, significantly reducing the go to market timeframe and unlocking net new revenue potential for Finastra.

“Our decision to integrate Azure Data Share with Finastra’s FusionFabric.cloud platform is now a great way to further accelerate innovation via an expanded open ecosystem. Our partnership with Microsoft truly provides us with limitless opportunities to drive transformation in Financial Services.”

– Eli Rosner, Chief Product and Technology Officer, Finastra

Next steps

Industries of all types need a simple and safe way to share data. Azure Data Share opens up new opportunities for innovation and insights to drive greater business impact.

Automate MLOps workflows with Azure Machine Learning service CLI

This blog was co-authored by Jordan Edwards, Senior Program Manager, Azure Machine Learning

This year at Microsoft Build 2019, we announced a slew of new releases as part of Azure Machine Learning service which focused on MLOps. These capabilities help you automate and manage the end-to-end machine learning lifecycle.

Image with reference to the title "Automate MLOps workflows with Azure Machine Learning service CLI"

Historically, Azure Machine Learning service’s management plane has been via its Python SDK. To make our service more accessible to IT and app development customers unfamiliar with Python, we have delivered an extension to the Azure CLI focused on interacting with Azure Machine Learning.

While it’s not a replacement for the Azure Machine Learning service Python SDK, it is a complementary tool that is optimized to handle highly parameterized tasks which lend themselves well to automation. With this new CLI, you can easily perform a variety of automated tasks against the machine learning workspace, including:

  • Datastore management
  • Compute target management
  • Experiment submission and job management
  • Model registration and deployment

Combining these commands enables you to train a model, register it, package it, and deploy it as an API. To help you quickly get started with MLOps, we have also released a predefined template in Azure Pipelines. This template allows you to easily train, register, and deploy your machine learning models. Data scientists and developers can work together to build a custom application for their scenario from their own data set.

The Azure Machine Learning service command-line interface is an extension to the Azure CLI. This extension provides commands for working with Azure Machine Learning service from the command line and allows you to automate your machine learning workflows. Some key scenarios would include:

  • Running experiments to create machine learning models
  • Registering machine learning models for customer usage
  • Packaging, deploying, and tracking the lifecycle of machine learning models

To use the Azure Machine Learning CLI, you must have an Azure subscription. If you don’t have an Azure subscription, you can create a free account before you begin. Try the free or paid version of Azure Machine Learning service to get started today.

Next steps

Learn more about the Azure Machine Learning service.

Get started with a free trial of the Azure Machine Learning service.

Azure Cosmos DB: A competitive advantage for healthcare ISVs

This blog was co-authored by Shweta Mishra, Senior Solutions Architect, CitiusTech and Vinil Menon, Chief Technology Officer, CitiusTech

CitiusTech is a specialist provider of healthcare technology services which helps its customers accelerate innovation in healthcare. CitiusTech used Azure Cosmos DB to simplify the real-time collection and movement of healthcare data from a variety of sources in a secure manner. With the proliferation of patient information from established and current sources, accompanied by stringent regulations, healthcare systems today are gradually shifting towards near real-time data integration. To achieve such performance, healthcare systems not only need low latency and high availability, but should also be highly responsive. Furthermore, they need to scale effectively to manage the inflow of high-speed, large volumes of healthcare data.

The situation

The rise of the Internet of Things (IoT) has enabled ordinary medical devices, wearables, and traditional hospital-deployed medical equipment to collect and share data. Within a wide area network (WAN), there are well-defined standards and protocols, but with the ever-increasing number of devices getting connected to the internet, there is a general lack of standards compliance and consistency of implementation. Moreover, data collation and generation from IoT-enabled medical and mobile devices need specialized applications to cope with increasing volumes of data.

This free-form data lends itself to document-oriented stores, which provide a great deal of flexibility as business requirements change. Relational databases aren’t efficient at performing CRUD operations on such data, but are essential for handling transactional data where consistent data integrity is necessary. Different databases are designed to solve different problems; using a single database engine for multiple purposes usually leads to non-performant solutions, whereas managing multiple types of databases is an operational overhead.

Developing distributed, global-scale solutions is challenged by the capability and complexity of scaling databases across multiple regions without compromising performance, while complying with data sovereignty needs. This often leads to inefficient management of multiple regional databases and/or underperformance.

Solution

Azure Cosmos DB supports polyglot persistence, which allows it to use a mix of data store technologies without compromising performance. It is a multi-model, highly available, globally scalable database which supports proven low-latency reads and writes. Azure Cosmos DB has enterprise-grade security features and keeps all data encrypted at rest.

Azure Cosmos DB is suited for distributed global scale solutions as it not only provides a turnkey global distribution feature but can geo-fence a database to specific regions to manage data sovereignty compliance. Its multi-master feature allows writes to be made and synchronized across regions with guaranteed consistency. In addition, it supports multi-document transactions with ACID guarantees.

Use cases in healthcare

Azure Cosmos DB works very well for the following workloads.

1. Global scale secure solutions

Organizations like CitiusTech that offer a mission-critical, global-scale solution should consider Azure Cosmos DB a critical component of their solution stack. For example, an ISV developing a non-drug treatment for patients through a medical device at a facility can develop web or mobile applications which store the treatment information and medical device metadata in Azure Cosmos DB. Treatment information can be pushed to medical devices at global facilities for the treatment. ISVs can meet compliance requirements by using the geo-fencing feature.

Azure Cosmos DB can also be used as a multi-tenant database with a carefully designed strategy. For instance, if a tenant has different scaling requirements, different Azure Cosmos containers can be created for such tenants. In Azure Cosmos DB, containers serve as logical units of distribution and scalability. Multi-tenancy may be possible at a partition level within an Azure Cosmos container, but it needs to be designed carefully to avoid creating hot spots and compromising overall performance.

2. Real-time location system, Internet of Things

Azure Cosmos DB is effective for building a solution for real-time tracking and management of medical devices and patients, as this often requires rapid velocity of data, scale, and resilience. Azure Cosmos DB supports low-latency writes and reads, and all data is replicated across multiple fault and update domains in each region for high availability and resilience. It supports session consistency as one of its five consistency levels, which is suitable for such scenarios. Session consistency guarantees strong consistency within a session.

Azure Cosmos DB also allows scaling of processing power, which is useful for burst scenarios, and provides elastic scale to petabytes of storage. Request units (RUs) can be programmatically adjusted to match the workload.

CitiusTech worked with a leading provider of medical grade vital signs and physiological monitoring solution to build a medical IoT based platform with the following requirements:

  • Monitor vitals with medical quality
  • Provide solutions for partners to integrate custom solutions
  • Deliver personalized, actionable insights
  • Messages and/or device generated data don’t have a fixed structure and may change in the future
  • Data producer(s) to simultaneously upload data for at least 100 subjects in less than two seconds per subject, receiving no more than 40*21=840 data points per subject, per request
  • Data consumer(s) to read simultaneously, data of at least 100 subjects in less than two seconds, producing no more than 15,000 data points per data consumer
  • Data for most recent 14 days shall be ready to be queried, and data older than 14 days to be moved to a cold storage

CitiusTech used Azure Cosmos DB as a hot storage to store health data, since it enabled low latency writes and reads of health data that was generated by the wearable sensor continuously. Azure Cosmos DB provided schema agnostic flexible storage to store documents with different shapes and size at scale and allowed enterprise grade security with Azure compliance certification.

The time to live (TTL) feature in Azure Cosmos DB automatically deleted expired items based on the TTL value. It was geo-distributed with its geo-fencing feature to address data sovereignty compliance requirements.
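As a hedged sketch of how that hot-storage setup can be expressed with the Python SDK (account details, container names, and the 14-day window below are placeholders mirroring the requirement above):

```
from azure.cosmos import CosmosClient, PartitionKey

client = CosmosClient(url="<account-endpoint>", credential="<account-key>")
database = client.create_database_if_not_exists(id="vitals")

# Items in this container expire after 14 days, so recent data stays "hot"
# while older data ages out automatically (and is archived to cold storage).
container = database.create_container_if_not_exists(
    id="sensor-readings",
    partition_key=PartitionKey(path="/subjectId"),
    default_ttl=14 * 24 * 60 * 60,   # time to live in seconds (14 days)
)

# Items inherit the container TTL unless they set their own "ttl" property.
container.upsert_item({
    "id": "reading-001",
    "subjectId": "subject-42",
    "heartRate": 72,
})
```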

Solution architecture

Diagram showing architecture of data flow in CitiusTech's solution using Azure Cosmos DB

Architecture of data flow in CitiusTech’s solution using Azure Cosmos DB

Key insights

Azure Cosmos DB unlocks the potential of polyglot persistence for healthcare systems to integrate healthcare data from multiple systems of record. It also ensures the need for flexibility, adaptability, speed, security and scale in healthcare is addressed while maintaining low operational overheads and high performance.

About CitiusTech

CitiusTech is a specialist provider of healthcare technology services and solutions to healthcare technology companies, providers, payers and life sciences organizations. CitiusTech helps customers accelerate innovation in healthcare through specialized solutions, healthcare technology platforms, proficiencies and accelerators. Find out more about CitiusTech.

Three things to know about Azure Machine Learning Notebook VM

Data scientists have a dynamic role. They need environments that are fast and flexible while upholding their organization’s security and compliance policies.

Three things to know about Azure Machine Learning Notebook VM

Data scientists working on machine learning projects need a flexible environment to run experiments, train models, iterate on models, and innovate in. They want to focus on building, training, and deploying models without getting bogged down in prepping virtual machines (VMs), painstakingly entering parameters, and constantly going back to IT to make changes to their environments. Moreover, they need to remain within the compliance and security policies outlined by their organizations.

Organizations seek to empower their data scientists to do their job effectively, while keeping their work environment secure. Enterprise IT pros want to lock down security and have a centralized authentication system. Meanwhile, data scientists are more focused on having direct access to virtual machines (VMs) to tinker at the lower level of CUDA drivers and special versions of the latest machine learning frameworks. However, direct access to the VM makes it hard for IT pros to enforce security policies. Azure Machine Learning service is developing innovative features that allow data scientists to get the most out of their data and spend time focusing on their business objectives while maintaining their organizations’ security and compliance posture.

Azure Machine Learning service’s Notebook Virtual Machine (VM), announced in May 2019, resolves these conflicting requirements while simplifying the overall experience for data scientists. Notebook VM is a cloud-based workstation created specifically for data scientists. Notebook VM based authoring is directly integrated into Azure Machine Learning service, providing a code-first experience for Python developers to conveniently build and deploy models in the workspace. Developers and data scientists can perform every operation supported by the Azure Machine Learning Python SDK using a familiar Jupyter notebook in a secure, enterprise-ready environment. Notebook VM is secure and easy-to-use, preconfigured for machine learning, and fully customizable.

Let’s take a closer look at how Azure Machine Learning service Notebook VMs are:

  1. Secure and easy to use
  2. Preconfigured for machine learning
  3. Fully customizable

1. Secure and easy to use

When a data scientist creates a notebook on a standard infrastructure-as-a-service (IaaS) VM, the process requires many intricate, IT-specific parameters. They need to name the VM and specify image names, security parameters (virtual network, subnet, and more), storage accounts, and a variety of other IT-specific settings. If incorrect parameters are given, or details are overlooked, this can expose an organization to serious security risks.

Compared to an IaaS VM, the Notebook VM creation experience has been streamlined, as it takes just two parameters – a VM name and a VM type. Once the Notebook VM is created it provides access to Jupyter and JupyterLab – two popular notebook environments for data science. The access to the notebooks is secured out-of-the-box with HTTPS and Azure Active Directory, which makes it possible for IT pros to enforce a single sign-on environment with strong security features like Multi-Factor Authentication, ensuring a secure environment in compliance with organizational policies.
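
Notebook VM creation happens in the Azure Machine Learning workspace, but for readers who prefer a programmatic view, here is a rough sketch using the compute-instance provisioning API in the Azure Machine Learning Python SDK (assumed here as the closest SDK equivalent); the VM name and size below are placeholders.

from azureml.core import Workspace
from azureml.core.compute import ComputeInstance, ComputeTarget

ws = Workspace.from_config()  # reads the workspace details from config.json

# The only two decisions to make: a name and a VM size (type). Both values are placeholders.
config = ComputeInstance.provisioning_configuration(
    vm_size="STANDARD_DS3_V2",
    ssh_public_access=False,  # access stays limited to HTTPS backed by Azure Active Directory
)
instance = ComputeTarget.create(ws, "my-notebook-vm", config)
instance.wait_for_completion(show_output=True)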

Demo: Azure Machine Learning Notebook VM at //build 2019

2. Preconfigured for machine learning

Setting up GPU drivers and deploying libraries on a traditional IaaS VM can be cumbersome and require substantial amounts of time. It can also get complicated finding the right drivers for given hardware, libraries, and frameworks. For instance, the latest versions of PyTorch may not work with the drivers a data scientist is currently using. Installation of client libraries for services such as Azure Machine Learning Python SDK can also be time-consuming, and some Python packages can be incompatible with others, depending on the environment where they are installed.

Notebook VM has the most up-to-date, compatible packages preconfigured and ready to use. This way, data scientists can use any of the latest frameworks on Notebook VM without versioning issues and with access to all the latest functionality of Azure Machine Learning service. Inside the VM, along with Jupyter and JupyterLab, data scientists will find a fully prepared environment for machine learning. Notebook VM draws its pedigree from the Data Science Virtual Machine (DSVM), a popular IaaS VM offering on Azure, and similar to the DSVM it comes equipped with preconfigured GPU drivers and a selection of machine learning and deep learning frameworks.

Notebook VM is also integrated with its parent, Azure Machine Learning workspace. The notebooks that data scientists run on the VM have access to the data stores and compute resources of the workspace. The notebooks themselves are stored in a Blob Storage account of the workspace. This makes it easy to share notebooks between VMs, as well as keeps them safely preserved when the VM is deleted.
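
Because the VM is attached to a workspace, a notebook running on it can reach the workspace's assets with a few lines of the Azure Machine Learning Python SDK. A minimal sketch (the printed names depend entirely on what is registered in your workspace):

from azureml.core import Workspace

# On a Notebook VM the workspace configuration is already in place, so no parameters are needed.
ws = Workspace.from_config()

# Datastores, compute targets, and models registered in the workspace are directly available.
datastore = ws.get_default_datastore()
print("Default datastore:", datastore.name)
print("Compute targets:", list(ws.compute_targets.keys()))
print("Registered models:", list(ws.models.keys()))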

3. Fully customizable

In environments where IT pros prepare virtual machines for data scientists, there is typically a rigorous preparation process and strict limits on what can be done on those machines. Data scientists, however, work dynamically and need the ability to customize VMs to fit their ever-changing needs. That often means going back to IT pros to have them make the necessary changes, and even then, data scientists hit blockers when iterations don't meet their needs or take too long. Some resort to running jobs on their personal laptops when their corporate VMs don't support them, breaking compliance policies and putting the organization at risk.

While Notebook VM is a managed VM offering, it retains full access to the underlying hardware. Data scientists can create a Notebook VM of any VM type supported by Azure and customize it to their heart's desire by adding custom packages and drivers. For example, they can quickly create the latest NVIDIA V100 powered VM to perform step-by-step debugging of novel neural network architectures.
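
As an illustration of that flexibility, a data scientist can pull in an extra package or check the GPU from inside a running notebook; the package below is only an example, and PyTorch is assumed to be preinstalled on the VM image.

import subprocess
import sys

# Install an additional package into the notebook's own Python environment.
subprocess.check_call([sys.executable, "-m", "pip", "install", "transformers"])

# Confirm that the GPU (for example, a V100) is visible to the framework.
import torch
print("CUDA available:", torch.cuda.is_available())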

Get started

If you are working with code, Notebook VM will offer you a smooth experience. It includes a set of tutorials and samples which make every capability of the Azure Machine Learning service just one-click away. Give it a try and let us know your feedback.

Learn more about the Azure Machine Learning service. Get started with a free trial of the Azure Machine Learning service.

Take your machine learning models to production with new MLOps capabilities

This blog post was authored by Jordan Edwards, Senior Program Manager, Microsoft Azure.

At Microsoft Build 2019 we announced MLOps capabilities in Azure Machine Learning service. MLOps, also known as DevOps for machine learning, is the practice of collaboration and communication between data scientists and DevOps professionals to help manage the production machine learning (ML) lifecycle.

Azure Machine Learning service’s MLOps capabilities provide customers with asset management and orchestration services, enabling effective ML lifecycle management. With this announcement, Azure is reaffirming its commitment to help customers safely bring their machine learning models to production and solve their business’s key problems faster and more accurately than ever before.

 
“We have heard from customers everywhere that they want to adopt ML but struggle to actually get models into production. With the new MLOps capabilities in Azure Machine Learning, bringing ML to add value to your business has become better, faster, and more reliable than ever before.”

– Eric Boyd, VP of C+AI

Here is a quick look at some of the new features:

Azure Machine Learning Command Line Interface (CLI) 

Azure Machine Learning’s management plane has historically been via the Python SDK. With the new Azure Machine Learning CLI, you can easily perform a variety of automated tasks against the ML workspace including:

  • Compute target management
  • Experiment submission
  • Model registration and deployment

Management capabilities

Azure Machine Learning service introduced new capabilities to help manage the code, data, and environments used in your ML lifecycle.

The ML lifecycle: train model → package model → validate model → deploy model → monitor model → retrain model.

Code management

Git repositories are commonly used in industry for source control management and as key assets in the software development lifecycle. We are including our first version of Git repository tracking – any time you submit code artifacts to Azure Machine Learning service, you can specify a Git repository reference. This is done automatically when you are running from a CI/CD solution such as Azure Pipelines.
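
The tracking kicks in whenever code is submitted from a Git working directory; a minimal sketch of such a submission with the Python SDK (experiment and script names are placeholders), after which the repository reference shows up among the run's properties:

from azureml.core import Workspace, Experiment, ScriptRunConfig

ws = Workspace.from_config()
exp = Experiment(ws, "train-experiment")  # placeholder experiment name

# Submitting from inside a Git repository lets the service record the repo reference automatically.
src = ScriptRunConfig(source_directory=".", script="train.py")
run = exp.submit(src)

# Git metadata, when available, appears among the run's properties.
print(run.get_properties())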

Data set management

With Azure Machine Learning data sets you can version, profile, and snapshot your data to enable you to reproduce your training process by having access to the same data. You can also compare data set profiles and determine how much your data has changed or if you need to retrain your model.
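
A minimal sketch of registering a versioned tabular data set with the Python SDK (the file path and data set name are placeholders):

from azureml.core import Workspace, Dataset

ws = Workspace.from_config()
datastore = ws.get_default_datastore()

# Build a tabular data set from a delimited file already uploaded to the datastore.
dataset = Dataset.Tabular.from_delimited_files(path=(datastore, "training/data.csv"))

# create_new_version=True keeps earlier versions around for reproducibility.
dataset = dataset.register(workspace=ws, name="training-data", create_new_version=True)

# Later, training code can pin an exact version to reproduce a run.
pinned = Dataset.get_by_name(ws, name="training-data", version=1)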

Environment management

Azure Machine Learning Environments are shared across Azure Machine Learning scenarios, from data preparation to model training to inferencing. Shared environments help to simplify handoff from training to inferencing as well as the ability to reproduce a training environment locally.

Environments provide automatic Docker image management (and caching!), plus tracking to streamline reproducibility.
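
A minimal sketch of defining and registering a shared environment with the Python SDK (the environment name and package list are placeholders):

from azureml.core import Workspace, Environment
from azureml.core.conda_dependencies import CondaDependencies

ws = Workspace.from_config()

# Define the environment once and reuse it for both training and inferencing.
env = Environment(name="shared-training-env")
env.python.conda_dependencies = CondaDependencies.create(
    pip_packages=["scikit-learn", "pandas", "azureml-defaults"],
)

# Registering makes the environment shareable across the workspace; the service
# builds and caches a Docker image for it the first time it is used.
env.register(workspace=ws)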

Simplified model debugging and deployment

Some data scientists have difficulty getting an ML model prepared to run in a production system. To alleviate this, we have introduced new capabilities to help you package and debug your ML models locally, prior to pushing them to the cloud. This should greatly reduce the inner loop time required to iterate and arrive at a satisfactory inferencing service, prior to the packaged model reaching the datacenter.
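
A minimal sketch of standing up such a local debugging service with the Python SDK; the model name, scoring script, and environment are assumptions (the environment reuses the one registered in the previous sketch), and Docker must be available on the machine:

from azureml.core import Workspace, Environment
from azureml.core.model import Model, InferenceConfig
from azureml.core.webservice import LocalWebservice

ws = Workspace.from_config()

model = Model(ws, name="my-model")  # placeholder registered model
env = Environment.get(ws, name="shared-training-env")
inference_config = InferenceConfig(entry_script="score.py", environment=env)

# Deploy locally to debug the packaged model before pushing it to the cloud.
deployment_config = LocalWebservice.deploy_configuration(port=8890)
service = Model.deploy(ws, "local-debug-service", [model], inference_config, deployment_config)
service.wait_for_deployment(show_output=True)

# Exercise the scoring endpoint; the payload format depends on your score.py.
print(service.run('{"data": [[1, 2, 3, 4]]}'))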

Model validation and profiling 

Another challenge that data scientists commonly face is guaranteeing that models will perform as expected once they are deployed to the cloud or the edge. With the new model validation and profiling capabilities, you can provide sample input queries to your model. We will automatically deploy and test the packaged model on a variety of inference CPU/memory configurations to determine the optimal performance profile. We also check that the inference service is responding correctly to these types of queries.

Model interpretability

Data scientists want to know why models predict in a specific manner. With the new model interpretability capabilities, we can explain why a model is behaving a certain way during both training and inferencing.

ML audit trail

Azure Machine Learning is used for managing all of the artifacts in your model training and deployment process. With the new audit trail capabilities, we are enabling automatic tracking of the experiments and datasets that correspond to your registered ML model. This helps to answer the question, “What code and data were used to create this model?”
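
A minimal sketch of answering that question for a registered model with the Python SDK (the model name is a placeholder):

from azureml.core import Workspace
from azureml.core.model import Model
from azureml.core.run import Run

ws = Workspace.from_config()
model = Model(ws, name="my-model")  # placeholder model name

# The registered model records the run, and therefore the experiment, that produced it.
print("Model version:", model.version)
print("Produced by run:", model.run_id)

run = Run.get(ws, model.run_id)
print("Experiment:", run.experiment.name)
print("Run status:", run.get_status())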

Azure DevOps extension for machine learning

Azure DevOps provides the tools data scientists commonly use to manage code, work items, and CI/CD pipelines. With the Azure DevOps extension for machine learning, we are introducing new capabilities that make it easy to manage your ML CI/CD pipelines with the same tools you use for your software development processes. The extension can trigger an Azure Pipelines release on model registration, connect an Azure Machine Learning workspace to an Azure DevOps project, and perform a series of tasks designed to make interacting with Azure Machine Learning as easy as possible from your existing automation tooling.

Get started today

These new MLOps features in the Azure Machine Learning service aim to enable users to bring their ML scenarios to production by supporting reproducibility, auditability, and automation of the end-to-end ML lifecycle. We’ll be publishing more blogs that go in-depth with these features in the following weeks, so follow along for the latest updates and releases.