Azure Cost Management + Billing updates – February 2020

Whether you’re a new student, thriving startup, or the largest enterprise, you have financial constraints and you need to know what you’re spending, where, and how to plan for the future. Nobody wants a surprise when it comes to the bill, and this is where Azure Cost Management + Billing comes in.

We’re always looking for ways to learn more about your challenges and how Azure Cost Management + Billing can help you better understand where you’re accruing costs in the cloud, identify and prevent bad spending patterns, and optimize costs to empower you to do more with less. Here are a few of the latest improvements and updates based on your feedback:

Let’s dig into the details.

 

New Power BI reports for Azure reservations and Azure Hybrid Benefit

Azure Cost Management + Billing offers several ways to report on your cost and usage data. You can start in the portal, download data or schedule an automated export for offline analysis, or even integrate with the Cost Management APIs directly. But maybe you just need detailed reporting alongside other business reports. This is where Power BI comes in. We last talked about the addition of reservation purchases in the Azure Cost Management Power BI connector in October. Building on top of that, the new Azure Cost Management Power BI app offers an extensive set of reports to get you started, including detailed reservation and Azure Hybrid Benefit reports.

The Account overview offers a summary of all usage and purchases as well as your credit balance to help you track monthly expenses. From here, you can dig in to usage costs broken down by subscription, resource group, or service in additional pages. Or, if you simply want to see your prices, take a look at the Price sheet page.

If you’re already using Azure Hybrid Benefit (AHB) or have existing, unused on-prem Windows licenses, check out the Windows Server AHB Usage page. Start by checking how many VMs currently have AHB enabled to determine if you have additional licenses that could help you further lower your costs. If you do have additional licenses, you can also identify eligible VMs based on their core/vCPU count. Apply AHB to your most expensive VMs to maximize your potential savings.

Azure Hybrid Benefit (AHB) report in the new Azure Cost Management Power BI app

If you’re using Azure reservations or are interested in potential savings you could benefit from if you did, you’ll want to check out the VM RI coverage pages to identify any new opportunities where you can save with new reservations, including the historical usage so you can see why that reservation is recommended. You can drill in to a specific region or instance size flexibility group and more. You can see your past purchases in the RI purchases page and get a breakdown of those costs by region, subscription, or resource group in the RI chargeback page, if you need to do any internal chargeback. And, don’t forget to check out the RI savings page, where you can see how much you’ve saved so far by using Azure reservations.

Azure reservation coverage report in the new Azure Cost Management Power BI app

This is just the first release of a new generation of Power BI reports. Get started with the Azure Cost Management Power BI quickstart today and let us know what you’d like to see next.

 

Quicker access to help and support

Learning something new can be a challenge, especially when it’s not your primary focus. But given how critical it is to meet your financial goals, getting help and support needs to be front and center. To support this, Cost Management now includes a contextual Help menu to direct you to documentation and support experiences.

Get started with a quickstart tutorial and, when you’re ready to automate that experience or integrate it into your own apps, check out the API reference. If you have any suggestions on how the experience could be improved for you, please don’t hesitate to share your feedback. If you run into an issue or see something that doesn’t make sense, start with Diagnose and solve problems, and if you don’t see a solution, then please do submit a new support request. We’re closely monitoring all feedback and support requests to identify ways the experience could be streamlined for you. Let us know what you’d like to see next.

Help menu in Azure Cost Management showing options to navigate to a Quickstart tutorial, API reference, Feedback, Diagnose and solve problems, and New support request

 

We need your feedback

As you know, we’re always looking for ways to learn more about your needs and expectations. This month, we’d like to learn more about how you report on and analyze your cloud usage and costs in a brief survey. We’ll use your inputs from this survey to inform ease of use and navigation improvements within Cost Management + Billing experiences. The 15-question survey should take about 10 minutes.

Take the survey.

 

What’s new in Cost Management Labs

With Cost Management Labs, you get a sneak peek at what’s coming in Azure Cost Management and can engage directly with us to share feedback and help us better understand how you use the service, so we can deliver more tuned and optimized experiences. Here are a few features you can see in Cost Management Labs:

  • Get started quicker with the cost analysis Home view
    Azure Cost Management offers five built-in views to get started with understanding and drilling into your costs. The Home view gives you quick access to those views so you get to what you need faster.
  • New: More details in the cost by resource view
    Drill in to the cost of your resources to break them down by meter. Simply expand the row to see more details or click the link to open and take action on your resources.
  • New: Explain what “not applicable” means
    Break down “not applicable” to explain why specific properties don’t have values within cost analysis.

Of course, that’s not all. Every change in Azure Cost Management is available in Cost Management Labs a week before it’s in the full Azure portal. We’re eager to hear your thoughts and understand what you’d like to see next. What are you waiting for? Try Cost Management Labs today.

 

Drill in to the costs for your resources

Resources are the fundamental building blocks of the cloud. Whether you’re using the cloud as infrastructure or componentized microservices, you use resources to piece together your solution and achieve your vision. And how you use these resources ultimately determines what you’re billed for, which breaks down to individual “meters” for each of your resources. Each service tracks a unique set of meters covering time, size, or other generalized units. The more units you use, the higher the cost.
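As a simple, hypothetical illustration: a virtual machine is typically billed against a compute meter measured in hours, so running one for 100 hours at an assumed rate of $0.10 per hour would accrue $10 of usage charges before any discounts.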

Today, you can see costs broken down by resource or meter with built-in views, but seeing both together requires additional filtering and grouping to get down to the data you need, which can be tedious. To simplify this, you can now expand each row in the Cost by resource view to see the individual meters that contribute to the cost of that resource.

Cost by resource view showing a breakdown of meters under a resource

This additional clarity and transparency should help you better understand the costs you’re accruing for each resource at the lowest level. And if you see a resource that shouldn’t be running, simply click the name to open the resource, where you can stop or delete it to avoid incurring additional cost.
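If you prefer to pull the same resource-and-meter breakdown programmatically rather than in the portal, here is a minimal sketch using the Cost Management Query API from Python. The subscription ID, api-version, and grouping dimension names ("ResourceId", "Meter") are assumptions based on the public API documentation, so verify them for your account type and supply a valid Azure AD bearer token.

```python
import requests

# Assumptions: subscription scope, api-version, and dimension names may
# differ for your account type; adjust as needed.
subscription_id = "00000000-0000-0000-0000-000000000000"  # hypothetical
scope = f"/subscriptions/{subscription_id}"
url = (
    "https://management.azure.com"
    f"{scope}/providers/Microsoft.CostManagement/query"
    "?api-version=2019-11-01"
)

body = {
    "type": "ActualCost",
    "timeframe": "MonthToDate",
    "dataset": {
        "granularity": "None",
        "aggregation": {"totalCost": {"name": "PreTaxCost", "function": "Sum"}},
        # Group by resource, then by meter, to mirror the expanded rows
        # in the Cost by resource view.
        "grouping": [
            {"type": "Dimension", "name": "ResourceId"},
            {"type": "Dimension", "name": "Meter"},
        ],
    },
}

token = "<bearer token from Azure AD>"  # e.g., obtained via azure-identity
resp = requests.post(url, json=body, headers={"Authorization": f"Bearer {token}"})
resp.raise_for_status()

# Each row contains the aggregated cost plus the grouping columns.
for row in resp.json()["properties"]["rows"]:
    print(row)
```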

You can see the updated Cost by resource view in Cost Management Labs today, while in preview. Let us know if you have any feedback. We’d love to know what you’d like to see next. This should be available everywhere within the next few weeks.

 

Understanding why you see “not applicable”

Azure Cost Management + Billing includes all usage, purchases, and refunds for your billing account. Seeing every line item in the full usage and charges file allows you to reconcile your bill at the lowest level, but since each of these records has different properties, aggregating them within cost analysis can result in groups of empty properties. This is when you see “not applicable” today.

Now, in Cost Management Labs, you can see these costs broken down and categorized into separate groups to bring additional clarity and explain what each represents. Here are a few examples:

  • You may see Other classic resources for any classic resources that don’t include resource group in usage data when grouping by resource or resource group.
  • If you’re using any services that aren’t deployed to resource groups, like Security Center or Azure DevOps (Visual Studio Online), you will see Other subscription resources when grouping by resource group.
  • You may recall seeing Untagged costs when grouping by a specific tag. This group is now broken down further into Tags not available and Tags not supported groups. These signify, respectively, services that don’t include tags in usage data (see How tags are used) and costs that can’t be tagged, like purchases and resources not deployed to resource groups, covered above.
  • Since purchases aren’t associated with an Azure resource, you might see Other Azure purchases or Other Marketplace purchases when grouping by resource, resource group, or subscription.
  • You may also see Other Marketplace purchases when grouping by reservation. This represents other purchases, which aren’t associated with a reservation.
  • If you have a reservation, you may see Unused reservation when viewing amortized costs and grouping by resource, resource group, or subscription. This represents the unused portion of your reservation that isn’t associated with any resources. These costs will only be visible from your billing account or billing profile.

Of course, these are just a few examples. You may see more. When there simply isn’t a value, you’ll see something like No department, as an example, which represents Enterprise Agreement (EA) subscriptions that aren’t grouped into a department.

We hope these changes help you better understand your cost and usage data. You can see this today in Cost Management Labs while in preview. Please check it out and let us know if you have any feedback. This should be available everywhere within the next few weeks.

 

Upcoming changes to Azure usage data

Many organizations use the full Azure usage and charges data to understand what’s being used, identify what charges should be internally billed to which teams, and/or to look for opportunities to optimize costs with Azure reservations and Azure Hybrid Benefit, just to name a few. If you’re doing any analysis or have set up integration based on product details in the usage data, please update your logic for the following services.

The following change will be effective starting March 1:

Also, remember the key-based Enterprise Agreement (EA) billing APIs have been replaced by new Azure Resource Manager APIs. The key-based APIs will still work through the end of your enrollment, but will no longer be available when you renew and transition into Microsoft Customer Agreement. Please plan your migration to the latest version of the UsageDetails API to ease your transition to Microsoft Customer Agreement at your next renewal.
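If you are still calling the key-based EA billing APIs, the sketch below shows the general shape of the replacement call to the Azure Resource Manager UsageDetails endpoint using Python. The enrollment number and api-version shown are assumptions; check the UsageDetails API reference for the values that apply to your account.

```python
import requests

# Assumption: an Enterprise Agreement enrollment (billing account) scope.
billing_account_id = "1234567"  # hypothetical enrollment number
scope = f"/providers/Microsoft.Billing/billingAccounts/{billing_account_id}"
url = (
    "https://management.azure.com"
    f"{scope}/providers/Microsoft.Consumption/usageDetails"
    "?api-version=2019-10-01"
)

# An Azure AD bearer token replaces the old enrollment API key.
headers = {"Authorization": "Bearer <token from Azure AD>"}

usage = []
while url:
    resp = requests.get(url, headers=headers)
    resp.raise_for_status()
    payload = resp.json()
    usage.extend(payload.get("value", []))
    # Results are paged; follow nextLink until it is absent.
    url = payload.get("nextLink")

print(f"Retrieved {len(usage)} usage detail records")
```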

 

New videos and learning opportunities

For those visual learners out there, here are two new resources you should check out:

Follow the Azure Cost Management + Billing YouTube channel to stay in the loop with new videos as they’re released and let us know what you’d like to see next!

 

Documentation updates

There were lots of documentation updates. Here are a few you might be interested in:

Want to keep an eye on all of the documentation updates? Check out the Cost Management + Billing doc change history in the azure-docs repository on GitHub. If you see something missing, select Edit at the top of the document and submit a quick pull request.

What’s next?

These are just a few of the big updates from last month. We’re always listening and making constant improvements based on your feedback, so please keep the feedback coming.

Follow @AzureCostMgmt on Twitter and subscribe to the YouTube channel for updates, tips, and tricks. And, as always, share your ideas and vote up others in the Cost Management feedback forum.

Advancing safe deployment practices

“What is the primary cause of service reliability issues that we see in Azure, other than small but common hardware failures? Change. One of the value propositions of the cloud is that it’s continually improving, delivering new capabilities and features, as well as security and reliability enhancements. But since the platform is continuously evolving, change is inevitable. This requires a very different approach to ensuring quality and stability than the box product or traditional IT approaches, which is to test for long periods of time and, once something is deployed, to avoid changes. This post is the fifth in the series I kicked off in my July blog post that shares insights into what we’re doing to ensure that Azure’s reliability supports your most mission-critical workloads. Today we’ll describe our safe deployment practices, which are how we manage change automation so that all code and configuration updates go through well-defined stages to catch regressions and bugs before they reach customers, or, if they do make it past the early stages, impact the smallest number of customers possible. Cristina del Amo Casado from our Compute engineering team authored this post, as she has been driving our safe deployment initiatives.” – Mark Russinovich, CTO, Azure


 

When running IT systems on-premises, you might try to ensure perfect availability by having gold-plated hardware, locking up the server room, and throwing away the key. On the software side, IT would traditionally prevent as much change as possible: avoiding updates to the operating system or applications because they’re too critical, and pushing back on change requests from users. With everyone treading carefully around the system, this ‘nobody breathe!’ approach stifles continued system improvement, and sometimes even compromises security for systems that are deemed too crucial to patch regularly. As Mark mentioned above, this approach doesn’t work for change and release management in a hyperscale public cloud like Azure. Change is both inevitable and beneficial, given the need to deploy service updates and improvements, and given our commitment to you to act quickly in the face of security vulnerabilities. Since we can’t simply avoid change, Microsoft, our customers, and our partners need to acknowledge that change is expected, and we plan for it. Microsoft continues to work on making updates as transparent as possible and will deploy changes safely as described below. Having said that, our customers and partners should also design for high availability and consume the maintenance events sent by the platform to adapt as needed. Finally, in some cases, customers can take control of initiating platform updates at a time that suits their organization.

Changing safely

When considering how to deploy releases throughout our Azure datacenters, one of the key premises that shapes our processes is to assume that the change being deployed could introduce an unknown problem, to plan in a way that enables that problem to be discovered with minimal impact, and to automate mitigation actions for when it surfaces. Even the smallest change to a system poses a risk to its stability, even when a developer judges it as completely innocuous and guarantees that it won’t affect the service. So ‘changes’ here refers to all kinds of new releases and covers both code changes and configuration changes. In most cases a configuration change has a less dramatic impact on the behavior of a system but, just as for a code change, no configuration change is free of the risk of activating a latent code defect or a new code path.

Teams across Azure follow similar processes to prevent, or at least minimize, impact related to changes: first, by ensuring that changes meet the quality bar before deployment starts, through test and integration validations; then, after sign-off, by rolling out the change in a gradual manner and measuring health signals continuously, so that we can detect in relative isolation any unexpected impact associated with the change that did not surface during testing. We do not want a change that causes problems to ever make it to broad production, so steps are taken to avoid that whenever possible. The gradual deployment gives us a good opportunity to detect issues at a smaller scale (or a smaller ‘blast radius’) before they cause widespread impact.

Azure approaches change automation, aligned with the high-level process above, through a safe deployment practice (SDP) framework, which aims to ensure that all code and configuration changes go through a lifecycle of specific stages, with health metrics monitored along the way to trigger automatic actions and alerts if any degradation is detected. These stages (shown in the diagram that follows) reduce the risk that software changes will negatively affect your existing Azure workloads.

A diagram showing how the cost and impact of failures increases throughout the production rollout pipeline, and is minimized by going through rounds of development and testing, quality gates, and integration.

This shows a simplification of our deployment pipeline, starting on the left with developers modifying their code, testing it on their own systems, and pushing it to staging environments. Generally, this integration environment is dedicated to teams for a subset of Azure services that need to test the interactions of their particular components together. For example, core infrastructure teams such as compute, networking, and storage share an integration environment. Each team runs synthetic tests and stress tests on the software in that environment, iterates until it is stable, and then, once the quality results indicate that a given release, feature, or change is ready for production, deploys the changes into the canary regions.

Canary regions

Publicly we refer to canary regions as “Early Updates Access Program” regions, and they’re effectively full-blown Azure regions with the vast majority of Azure services. One of the canary regions is built with Availability Zones and the other without them, and the two form a region pair so that we can validate data geo-replication capabilities. These canary regions are used for full, production-level, end-to-end validations and scenario coverage at scale. They host some first-party services (for internal customers), several third-party services, and a small set of external customers that we invite into the program to help increase the richness and complexity of scenarios covered, all to ensure that canary regions have patterns of usage representative of our public Azure regions. Azure teams also run stress and synthetic tests in these environments, and periodically we execute fault injections or disaster recovery drills at the region or Availability Zone level, to practice the detection and recovery workflows that would be run if this occurred in real life. Separately and together, these exercises attempt to ensure that software is of the highest quality before the changes touch broad customer workloads in Azure.

Pilot phase

Once the results from canary indicate that there are no known issues, the progressive deployment to production can begin, starting with what we call our pilot phase. This phase enables us to try the changes, still at a relatively small scale, but with more diversity of hardware and configurations. It is especially important for software like core storage services and core compute infrastructure services that have hardware dependencies. For example, Azure offers servers with GPUs, large-memory servers, commodity servers, multiple generations and types of processors, InfiniBand, and more, so this phase enables flighting the changes and may surface issues that would not appear during smaller-scale testing. In each step along the way, thorough health monitoring and extended ‘bake times’ enable potential failure patterns to surface, and increase our confidence in the changes while greatly reducing the overall risk to our customers.

Once we determine that the results from the pilot phase are good, the deployment systems proceed by allowing the change to progress to more and more regions incrementally. Throughout the deployment to the broader Azure regions, the deployment systems endeavor to respect Availability Zones (a change only goes to one Availability Zone within a region) and region pairing (every region is ‘paired up’ with a second region for georedundant storage) so a change deploys first to a region and then to its pair. In general, the changes deploy only as long as no negative signals surface.

Safe deployment practices in action

Given the scale of Azure globally, the entire rollout process is completely automated and driven by policy. These declarative policies and processes (not the developers) determine how quickly software can be rolled out. Policies are defined centrally and include mandatory health signals for monitoring the quality of software as well as mandatory ‘bake times’ between the different stages outlined above. The reason to have software sitting and baking for different periods of time in each phase is to ensure that the change is exposed to a full spectrum of load on that service. For example, diverse organizational users might come online in the morning, gaming customers might come online in the evening, and new virtual machine (VM) or resource creations from customers may occur over an extended period of time.
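As a purely conceptual illustration (not Azure’s actual tooling or policy engine), the sketch below shows how a policy of ordered stages, mandatory bake times, and health gates can be expressed in code; every stage name, duration, and callback here is hypothetical.

```python
import time
from dataclasses import dataclass
from typing import Callable

@dataclass
class Stage:
    name: str
    bake_time_hours: float  # mandatory soak period before the next stage

# Hypothetical rollout policy, loosely mirroring the phases described above.
POLICY = [
    Stage("integration", 24),
    Stage("canary", 48),
    Stage("pilot", 48),
    Stage("broad-regions", 72),
]

def roll_out(deploy: Callable[[str], None], healthy: Callable[[str], bool]) -> bool:
    """Deploy stage by stage, halting (and signaling mitigation) on bad health."""
    for stage in POLICY:
        deploy(stage.name)
        # Illustrative only: let health signals accumulate for the bake time.
        time.sleep(stage.bake_time_hours * 3600)
        if not healthy(stage.name):
            print(f"Health regression detected in {stage.name}; stopping rollout")
            return False  # a real system would trigger automated rollback here
    return True
```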

Global services, which cannot take the approach of progressively deploying to different clusters, regions, or service rings, also practice a version of progressive rollouts in alignment with SDP. These services follow the model of updating their service instances in multiple phases, progressively diverting traffic to the updated instances through Azure Traffic Manager. If the signals are positive, more traffic is diverted to the updated instances over time, increasing confidence and unblocking the deployment from being applied to more service instances.

Of course, the Azure platform also has the ability to deploy a change simultaneously to all of Azure, in case this is necessary to mitigate an extremely critical vulnerability. Although our safe deployment policy is mandatory, we can choose to accelerate it when certain emergency conditions are met: for example, to release a security update that requires us to move much more quickly than we normally would, or for a fix where the risk of regression is outweighed by mitigating a problem that’s already very impactful to customers. These exceptions are very rare; in general, our deployment tools and processes intentionally sacrifice velocity to maximize the chance for signals to build up and for scenarios and workflows to be exercised at scale, thus creating the opportunity to discover issues at the smallest possible scale of impact.

Continuing improvements

Our safe deployment practices and deployment tooling continue to evolve with learnings from previous outages and maintenance events, and in line with our goal of detecting issues at a significantly smaller scale. For example, we have learned about the importance of continuing to enrich our health signals and about using machine learning to better correlate faults and detect anomalies. We also continue to improve the way in which we do pilots and flighting, so that we can cover more diversity of hardware with smaller risk. We continue to improve our ability to rollback changes automatically if they show potential signs of problems. We also continue to invest in platform features that reduce or eliminate the impact of changes generally.

With over a thousand new capabilities released in the last year, we know that the pace of change in Azure can feel overwhelming. As Mark mentioned, the agility and continual improvement of cloud services is one of the key value propositions of the cloud – change is a feature, not a bug. To learn about the latest releases, we encourage customers and partners to stay in the know at Azure.com/Updates. We endeavor to keep this as the single place to learn about recent and upcoming Azure product updates, including the roadmap of innovations we have in development. To understand the regions in which these different services are available, or when they will be available, you can also use our tool at Azure.com/ProductsbyRegion.

Backup Explorer now available in preview

As organizations continue to expand their use of IT and the cloud, protecting critical enterprise data becomes extremely important. And if you are a backup admin on Microsoft Azure, being able to efficiently monitor backups on a daily basis is a key requirement to ensuring that your organization has no weaknesses in its last line of defense.

Up until now, you could use a Recovery Services vault to get a bird’s eye view of items being backed up under that vault, along with the associated jobs, policies, and alerts. But as your backup estate expands to span multiple vaults across subscriptions, regions, and tenants, monitoring this estate in real-time becomes a non-trivial task, requiring you to write your own customizations.

What if there was a simpler way to aggregate information across your entire backup estate into a single pane of glass, enabling you to quickly identify exactly where to focus your energy?

Today, we are pleased to share the preview of Backup Explorer. Backup Explorer is a built-in Azure Monitor Workbook enabling you to have a single pane of glass for performing real-time monitoring across your entire backup estate on Azure. It comes completely out-of-the-box, with no additional costs, via native integration with Azure Resource Graph and Azure Workbooks.
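Because Backup Explorer is built on Azure Resource Graph, you can also query the same underlying data yourself. The sketch below uses the azure-identity and azure-mgmt-resourcegraph Python packages; the table and type names are assumptions drawn from the public Resource Graph schema, and the subscription ID is a placeholder, so verify both against your own environment.

```python
from azure.identity import DefaultAzureCredential
from azure.mgmt.resourcegraph import ResourceGraphClient
from azure.mgmt.resourcegraph.models import QueryRequest

credential = DefaultAzureCredential()
client = ResourceGraphClient(credential)

# Assumed table/type names: the recoveryservicesresources table exposes
# backup items as protected items under Recovery Services vaults.
query = """
recoveryservicesresources
| where type =~ 'microsoft.recoveryservices/vaults/backupfabrics/protectioncontainers/protecteditems'
| project name, resourceGroup, subscriptionId, properties.protectionState
"""

request = QueryRequest(
    subscriptions=["00000000-0000-0000-0000-000000000000"],  # hypothetical
    query=query,
)
response = client.resources(request)

# The shape of response.data depends on the result format; by default it is
# a list of row objects that can be printed or loaded into a DataFrame.
for row in response.data:
    print(row)
```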

Key Benefits

1) At-scale views – With Backup Explorer, monitoring is no longer limited to a Recovery Services vault. You can get an aggregated view of your entire estate from a backup perspective. This includes not only information on your backup items, but also resources that are not configured for backup, ensuring that you never miss protecting critical data in your growing estate. And if you are an Azure Lighthouse user, you can view all of this information across multiple tenants, enabling truly boundary-less monitoring.

2) Deep drill-downs – You can quickly switch between aggregated views and highly granular data for any of your backup-related artifacts, be it backup items, jobs, alerts or policies.

3) Quick troubleshooting and actionability – The at-scale views and deep drill-downs are designed to aid you in getting to the root cause of a backup-related issue. Once you identify an issue, you can act on it by seamlessly navigating to the backup item or the Azure resource, right from Backup Explorer.

Backup Explorer is currently supported for Azure Virtual Machines. Support for other Azure workloads will be added soon.

For the Azure Backup team, Backup Explorer is just one part of our overall goal to enable a delightful, enterprise-ready management-at-scale experience for all our customers.

Getting Started

To get started with using Backup Explorer, you can simply navigate to any Recovery Services vault and click on Backup Explorer in the quick links section.

Backup Explorer link in Recovery Services Vault

You will be redirected to Backup Explorer, which gives a view across all the vaults, subscriptions, and tenants that you have access to.

Summary tab of Backup Explorer

More information

Read the Backup Explorer documentation for detailed information on leveraging the various tabs to solve different use-cases.

Azure Cost Management updates – January 2020

Whether you’re a new student, thriving startup, or the largest enterprise, you have financial constraints and you need to know what you’re spending, where, and how to plan for the future. Nobody wants a surprise when it comes to the bill, and this is where Azure Cost Management comes in.

We’re always looking for ways to learn more about your challenges and how Azure Cost Management can help you better understand where you’re accruing costs in the cloud, identify and prevent bad spending patterns, and optimize costs to empower you to do more with less. Here are a few of the latest improvements and updates based on your feedback:

Let’s dig into the details. 

Automate reporting for Microsoft Customer Agreement with scheduled exports

You already know you can dig into your cost and usage data from the Azure portal. You may even know you can get rich reporting from the Cost Management Query API or get the full details, in all their glory, from the UsageDetails API. These are both great for ad-hoc queries, but maybe you’re looking for a simpler solution. This is where Azure Cost Management exports come in.

Azure Cost Management exports automatically publish your cost and usage data to a storage account on a daily, weekly, or monthly basis. Until this month, you’ve been able to schedule exports for Enterprise Agreement (EA) and pay-as-you-go (PAYG) accounts. Now, you can also schedule exports across subscriptions for Microsoft Customer Agreement billing accounts, subscriptions, and resource groups.
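For reference, creating a scheduled export is itself a single API call. Below is a minimal Python sketch against the Cost Management Exports REST API; the scope, api-version, storage account path, and property values are assumptions based on the public documentation, so confirm them against the Exports API reference before relying on this.

```python
import requests

# Assumption: exporting for a subscription scope; billing account or resource
# group scopes follow the same pattern with a different scope string.
scope = "/subscriptions/00000000-0000-0000-0000-000000000000"  # hypothetical
export_name = "monthly-usage-export"
url = (
    "https://management.azure.com"
    f"{scope}/providers/Microsoft.CostManagement/exports/{export_name}"
    "?api-version=2019-11-01"
)

body = {
    "properties": {
        "schedule": {
            "status": "Active",
            "recurrence": "Monthly",
            "recurrencePeriod": {
                "from": "2020-03-01T00:00:00Z",
                "to": "2021-03-01T00:00:00Z",
            },
        },
        "deliveryInfo": {
            "destination": {
                # Hypothetical storage account resource ID and container.
                "resourceId": "/subscriptions/00000000-0000-0000-0000-000000000000/resourceGroups/rg-billing/providers/Microsoft.Storage/storageAccounts/mystorageacct",
                "container": "exports",
                "rootFolderPath": "costmanagement",
            }
        },
        "definition": {"type": "Usage", "timeframe": "MonthToDate"},
    }
}

resp = requests.put(url, json=body, headers={"Authorization": "Bearer <token>"})
resp.raise_for_status()
print("Export created:", resp.json()["name"])
```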

Learn more about scheduled exports in Create and manage exported data.

Raising awareness of disabled costs

Enterprise Agreement (EA) and Microsoft Customer Agreement (MCA) accounts both offer an option to hide prices and charges from subscription users. While this can be useful to obscure negotiated discounts (including from vendors), it also puts you at risk of over-spending, since teams that deploy and manage resources don’t have visibility and cannot effectively keep costs down. To avoid this, we recommend using custom Azure RBAC roles for anyone who shouldn’t see costs, while allowing everyone else to fully manage and optimize costs.

Unfortunately, some organizations may not realize costs have been disabled. This can happen when you renew your EA enrollment or when you switch between EA partners, as an example. In an effort to help raise awareness of these settings, you will see new messaging when costs have been disabled for the organization. Someone who does not have access to see costs will see a message like the following in cost analysis:

Message stating "Cost Management not enabled for subscription users. Contact your subscription account admin about enabling 'Account owner can view charges' on the billing account."

EA billing account admins and MCA billing profile owners will also see a message in cost analysis to ensure they’re aware that subscription users cannot see or optimize costs.

Cost analysis showing a warning to Enterprise Agreement (EA) and Microsoft Customer Agreement (MCA) admins that "Subscription users cannot see or optimize costs. Enable Cost Management." with a link to enable view charges for everyone

To enable access to Azure Cost Management, simply click the banner and turn on “Account owners can view charges” for EA accounts and “Azure charges” for MCA accounts. If you’re not sure whether subscription users can see costs on your billing account, check today and unlock new cost reporting, control, and optimization capabilities for your teams. 

What’s new in Cost Management Labs

With Cost Management Labs, you get a sneak peek at what’s coming in Azure Cost Management and can engage directly with us to share feedback and help us better understand how you use the service, so we can deliver more tuned and optimized experiences. Here are a few features you can see in Cost Management Labs:

  • Get started quicker with the cost analysis Home view
    Azure Cost Management offers five built-in views to get started with understanding and drilling into your costs. The Home view gives you quick access to those views so you get to what you need faster.
  • NEW: Try Preview gives you quick access to preview features (now available in the public portal)
    You already know Cost Management Labs gives you early access to the latest changes. Now you can also opt in to individual preview features from the public portal using the Try preview command in cost analysis.

Of course, that’s not all. Every change in Azure Cost Management is available in Cost Management Labs a week before it’s in the full Azure portal. We’re eager to hear your thoughts and understand what you’d like to see next. What are you waiting for? Try Cost Management Labs today. 

Custom RBAC role preview for management groups

Management groups now support defining custom RBAC roles, allowing you to assign more specific permissions to users, groups, and apps within your organization. One example could be a role that allows someone to create and manage the management group hierarchy as well as manage costs using Azure Cost Management + Billing APIs. Today, this requires both the Management Group Contributor and Cost Management Contributor roles, but these permissions could be combined into a single custom role to streamline role assignment.

If you’re unfamiliar with RBAC, Azure role-based access control (RBAC) is the authorization system used to manage access to Azure resources. To grant access, you assign roles to users, groups, service principals, or managed identities at a particular scope, like a resource group, subscription, or in this case, a management group. Cost Management + Billing supports the following built-in Azure RBAC roles, from least to most privileged:

  • Cost Management Reader: Can view cost data, configuration (including budgets and exports), and recommendations.
  • Billing Reader: Lets you read billing data.
  • Reader: Lets you view everything, but not make any changes.
  • Cost Management Contributor: Can view costs, manage cost configuration (including budgets and exports), and view recommendations.
  • Contributor: Lets you manage everything except access to resources.
  • Owner: Lets you manage everything, including access to resources.

While most organizations will find the built-in roles sufficient, there are times when you need something more specific. This is where custom RBAC roles come in. Custom RBAC roles let you define your own set of permissions by specifying wildcard “actions” that map to Azure Resource Manager API calls. You can mix and match actions to meet your specific needs, whether that’s to allow an action or deny one (using “not actions”). Below are a few examples of the most common actions, followed by a sample role definition that combines several of them:

  • Microsoft.Consumption/*/read – Read access to all cost and usage data, including prices, usage, purchases, reservations, and resource tags.
  • Microsoft.Consumption/budgets/* – Full access to manage budgets.
  • Microsoft.CostManagement/*/read – Read access to cost and usage data and alerts.
  • Microsoft.CostManagement/views/* – Full access to manage shared views used in cost analysis.
  • Microsoft.CostManagement/exports/* – Full access to manage scheduled exports that automatically push data to storage on a regular basis.
  • Microsoft.CostManagement/cloudConnectors/* – Full access to manage AWS cloud connectors that allow you to manage Azure and AWS costs together in the same management group.
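To make that concrete, here is a hedged sketch of a custom role definition that combines a few of the actions above, written as a Python dictionary and saved to a JSON file that could then be registered with your usual role-definition tooling. The role name, description, and assignable management group ID are all hypothetical.

```python
import json

# Hypothetical custom role combining management group and cost management
# permissions; the management group ID below is a placeholder.
custom_role = {
    "Name": "Management Group Cost Administrator",
    "Description": "Manage the management group hierarchy and its costs.",
    "Actions": [
        "Microsoft.Management/managementGroups/*",
        "Microsoft.CostManagement/*/read",
        "Microsoft.CostManagement/exports/*",
        "Microsoft.Consumption/budgets/*",
    ],
    "NotActions": [],
    "AssignableScopes": [
        "/providers/Microsoft.Management/managementGroups/contoso-root"
    ],
}

# Write the definition to disk so it can be passed to role-creation tooling.
with open("cost-admin-role.json", "w") as f:
    json.dump(custom_role, f, indent=2)

print("Wrote role definition to cost-admin-role.json")
```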

New ways to save money with Azure

Lots of cost optimization improvements over the past month! Here are a few you might be interested in:

Recent changes to Azure usage data

Many organizations use the full Azure usage and charges dataset to understand what’s being used, identify what charges should be internally billed to which teams, and/or to look for opportunities to optimize costs with Azure reservations and Azure Hybrid Benefit, just to name a few. If you’re doing any analysis or have set up integration based on product details in the usage data, please update your logic for the following services.

All of the following changes were effective January 1:

Also, remember the key-based Enterprise Agreement (EA) billing APIs have been replaced by new Azure Resource Manager APIs. The key-based APIs will still work through the end of your enrollment, but will no longer be available when you renew and transition into Microsoft Customer Agreement. Please plan your migration to the latest version of the UsageDetails API to ease your transition to Microsoft Customer Agreement at your next renewal. 

Documentation updates

There were lots of documentation updates. Here are a few you might be interested in:

Want to keep an eye on all of the documentation updates? Check out the Cost Management doc change history in the azure-docs repository on GitHub. If you see something missing, select Edit at the top of the document and submit a quick pull request.

What’s next?

These are just a few of the big updates from last month. We’re always listening and making constant improvements based on your feedback, so please keep the feedback coming.

Follow @AzureCostMgmt on Twitter and subscribe to the YouTube channel for updates, tips, and tricks. And, as always, share your ideas and vote up others in the Cost Management feedback forum.

MSC Mediterranean Shipping Company on Azure Site Recovery: “ASR worked like magic”

Today’s Q&A post covers an interview between Siddharth Deekshit, Program Manager, Microsoft Azure Site Recovery engineering, and Quentin Drion, IT Director of Infrastructure and Operations, MSC. MSC is a global shipping and logistics business, and our conversation focused on the organization’s journey with Azure Site Recovery (ASR). To learn more about achieving resilience in Azure, refer to this whitepaper.

I wanted to start by understanding the transformation journey that MSC is going through, including consolidating on Azure. Can you talk about how Azure is helping you run your business today?

We are a shipping line, so we move containers worldwide. Over the years, we have developed our own software to manage our core business. We have a different set of software for small, medium, and large entities, which were running on-premises. That meant we had to maintain a lot of on-premises resources to support all these business applications. A decision was taken a few years ago to consolidate all these business workloads inside Azure regardless of the size of the entity. When we are migrating, we turn off what we have on-premises and then start using software hosted in Azure and provide it as a service for our subsidiaries. This new design is managed in a centralized manner by an internal IT team.

That’s fantastic. Consolidation is a big benefit of using Azure. Apart from that, what other benefits do you see of moving to Azure?

For us, automation is a big one and a huge improvement. The API, integration, and automation capabilities we have with Azure allow us to deploy environments in a matter of hours, where before it took much, much longer because we had to order the hardware, set it up, and then configure it. Now we no longer need to worry about setup, hardware support, or warranties. The environment is all virtualized and we can, of course, provide the same level of recovery point objective (RPO), recovery time objective (RTO), and security to all the entities that we have worldwide.

Speaking of RTO and RPO, let’s talk a little bit about Site Recovery. Can you tell me what life was like before using Site Recovery?

Actually, when we started migrating workloads, we had a much more traditional approach, in the sense that we were doing primary production workloads in one Azure region, and we were setting up and managing a complete disaster recovery infrastructure in another region. So the traditional on-premises data center approach was really how we started with disaster recovery (DR) on Azure, but then we spent the time to study what Site Recovery could provide us. Based on the findings and some testing that we performed, we decided to change the implementation that we had in place for two to three years and switch to Site Recovery, ultimately to reduce our cost significantly, since we no longer have to keep our DR Azure Virtual Machines running in another region. In terms of management, it’s also easier for us. For traditional workloads, we have better RPO and RTO than we saw with our previous approach. So we’ve seen great benefits across the board.

That’s great to know. What were you most skeptical about when it came to using Site Recovery? You mentioned that your team ran tests, so what convinced you that Site Recovery was the right choice?

It was really based on the tests that we did. Earlier, we were doing a lot of manual work to switch to the DR region, to ensure that domain name system (DNS) settings and other networking settings were appropriate, so there were a lot of constraints. When we tested it compared to this manual way of doing things, Site Recovery worked like magic. The fact that our primary region could fail and that didn’t require us to do a lot was amazing. Our applications could start again in the DR region and we just had to manage the upper layer of the app to ensure that it started correctly. We were cautious about this app restart, not because of the Virtual Machine(s), because we were confident that Site Recovery would work, but because of our database engine. We were positively surprised to see how well Site Recovery works. All our teams were very happy about the solution and they are seeing the added value of moving to this kind of technology for them as operational teams, but also for us in management to be able to save money, because we reduced the number of Virtual Machines that we had that were actually not being used.

Can you talk to me a little bit about your onboarding experience with Site Recovery?

I think we had six or seven major in-house developed applications in Azure at that time. We picked one of these applications as a candidate for testing. The test was successful. We then extended to a different set of applications that were in production. There were again no major issues. The only drawback we had was with some large disks. Initially, some of our larger disks were not supported. This was solved quickly, and since then it has been, I would say, really straightforward. Based on the success of our testing, we worked to switch all the applications we have on the platform to use Site Recovery for disaster recovery.

Can you give me a sense of what workloads you are running on your Azure Virtual Machines today? How many people leverage the applications running on those Virtual Machines for their day job?

So it’s really core business apps. There is, of course, the main infrastructure underneath, but what we serve is business applications that we have written internally, presented through a Citrix frontend in Azure. These applications do container bookings, customer registrations, etc. I mean, we have different workloads associated with the complete process of shipping. In terms of users, we have some applications that are being used by more than 5,000 people, and more and more it’s becoming their primary day-to-day application.

Wow, that’s a ton of usage and I’m glad you trust Site Recovery for your DR needs. Can you tell me a little bit about the architecture of those workloads?

Most of them are Windows-based workloads. The software that gets used the most worldwide is a three-tier application: we have a SQL database, a middle-tier server, an application server, and also some web frontend servers. But the new one that we have developed now is based on microservices. There are also some Linux servers being used for specific purposes.

Tell me more about your experience with Linux.

Site Recovery works like a charm with Linux workloads. We only had a few mistakes in the beginning, made on our side. We wanted to use a product from Red Hat called Satellite for updates, but we did not realize that we cannot change the way that the Virtual Machines are being managed if you want to use Satellite. It needs to be defined at the beginning otherwise it’s too late. But besides this, the ‘bring your own license’ story works very well and especially with Site Recovery.

Glad to hear that you found it to be a seamless experience. Was there any other aspect of Site Recovery that impressed you, or that you think other organizations should know about?

For me, it’s the capability to be able to perform drills in an easy way. With the more traditional approach, each time that you want to do a complete disaster recovery test, it’s always time and resource-consuming in terms of preparation. With Site Recovery, we did a test a few weeks back on the complete environment and it was really easy to prepare. It was fast to do the switch to the recovery region, and just as easy to bring back the workload to the primary region. So, I mean for me today, it’s really the ease of using Site Recovery.

If you had to do it all over again, what would you do differently on your Site Recovery Journey?

I would start to use it earlier. If we hadn’t gone with the traditional active-passive approach, I think we could have saved time and money for the company. On the other hand, going that way gave us confidence in the journey. Other than that, I think we wouldn’t have changed much. But what we want to do now is start looking at Azure Site Recovery services to be able to replicate workloads running on on-premises Virtual Machines in Hyper-V. For those applications that are still not migrated to Azure, we want to at least ensure proper disaster recovery. We also want to replicate some VMware Virtual Machines that we still have as part of our migration journey to Hyper-V. This is what we are looking at.

Do you have any advice for folks for other prospective or current customers of Site Recovery?

One piece of advice that I could share is to suggest starting sooner and if required, smaller. Start using Site Recovery even if it’s on one small app. It will help you see the added value, and that will help you convince the operational teams that there is a lot of value and that they can trust the services that Site Recovery is providing instead of trying to do everything on their own.

That’s excellent advice. Those were all my questions, Quentin. Thanks for sharing your experiences.

Learn more about resilience with Azure. 

MSC Mediterranean Shipping Company on Azure Site Recovery, “ASR worked like magic”

Today’s Q&A post covers an interview between Siddharth Deekshit, Program Manager, Microsoft Azure Site Recovery engineering and Quentin Drion, IT Director of Infrastructure and Operations, MSC. MSC is a global shipping and logistics business, our conversation focused on their organization’s journey with Azure Site Recovery (ASR). To learn more about achieving resilience in Azure, refer to this whitepaper.

I wanted to start by understanding the transformation journey that MSC is going through, including consolidating on Azure. Can you talk about how Azure is helping you run your business today?

We are a shipping line, so we move containers worldwide. Over the years, we have developed our own software to manage our core business. We have a different set of software for small, medium, and large entities, which were running on-premises. That meant we had to maintain a lot of on-premises resources to support all these business applications. A decision was taken a few years ago to consolidate all these business workloads inside Azure regardless of the size of the entity. When we are migrating, we turn off what we have on-premises and then start using software hosted in Azure and provide it as a service for our subsidiaries. This new design is managed in a centralized manner by an internal IT team.

That’s fantastic. Consolidation is a big benefit of using Azure. Apart from that, what other benefits do you see of moving to Azure?

For us, automation is a huge improvement. The API, integration, and automation capabilities we have with Azure allow us to deploy environments in a matter of hours, where before it took much, much longer because we had to order the hardware, set it up, and then configure it. Now we no longer need to worry about setup, hardware support, or warranties. The environment is all virtualized, and we can, of course, provide the same level of recovery point objective (RPO), recovery time objective (RTO), and security to all the entities that we have worldwide.

Speaking of RTO and RPO, let’s talk a little bit about Site Recovery. Can you tell me what life was like before using Site Recovery?

Actually, when we started migrating workloads, we had a much more traditional approach, in the sense that we were doing primary production workloads in one Azure region, and we were setting up and managing a complete disaster recovery infrastructure in another region. So the traditional on-premises data center approach was really how we started with disaster recovery (DR) on Azure, but then we spent the time to study what Site Recovery could provide us. Based on the findings and some testing that we performed, we decided to change the implementation that we had in place for two to three years and switch to Site Recovery, ultimately to reduce our cost significantly, since we no longer have to keep our DR Azure Virtual Machines running in another region. In terms of management, it’s also easier for us. For traditional workloads, we have better RPO and RTO than we saw with our previous approach. So we’ve seen great benefits across the board.

That’s great to know. What were you most skeptical about when it came to using Site Recovery? You mentioned that your team ran tests, so what convinced you that Site Recovery was the right choice?

It was really based on the tests that we did. Earlier, we were doing a lot of manual work to switch to the DR region, to ensure that domain name system (DNS) settings and other networking settings were appropriate, so there were a lot of constraints. When we tested it compared to this manual way of doing things, Site Recovery worked like magic. The fact that our primary region could fail and that didn’t require us to do a lot was amazing. Our applications could start again in the DR region and we just had to manage the upper layer of the app to ensure that it started correctly. We were cautious about this app restart, not because of the Virtual Machines (we were confident that Site Recovery would work), but because of our database engine. We were positively surprised to see how well Site Recovery works. All our teams were very happy about the solution and they are seeing the added value of moving to this kind of technology for them as operational teams, but also for us in management to be able to save money, because we reduced the number of Virtual Machines that we had that were actually not being used.

Can you talk to me a little bit about your onboarding experience with Site Recovery?

I think we had six or seven major in-house developed applications in Azure at that time. We picked one of these applications as a candidate for testing. The test was successful. We then extended to a different set of applications that were in production. There were again no major issues. The only drawback we had was with some large disks. Initially, some of our larger disks were not supported. This was solved quickly and since then it has been, I would say, really straightforward. Based on the success of our testing, we worked to switch all the applications we have on the platform to use Site Recovery for disaster recovery.

Can you give me a sense of what workloads you are running on your Azure Virtual Machines today? How many people leverage the applications running on those Virtual Machines for their day job?

So it’s really core business apps. There is, of course, the main infrastructure underneath, but what we serve is business applications that we have written internally, presented through a Citrix frontend in Azure. These applications do container bookings, customer registrations, etc. I mean, we have different workloads associated with the complete process of shipping. In terms of users, we have some applications that are being used by more than 5,000 people, and more and more it’s becoming their primary day-to-day application.

Wow, that’s a ton of usage and I’m glad you trust Site Recovery for your DR needs. Can you tell me a little bit about the architecture of those workloads?

Most of them are Windows-based workloads. The most heavily used software worldwide is a three-tier application: a SQL database, a middle-tier application server, and some web frontend servers. The newer application that we have developed is based on microservices. There are also some Linux servers being used for specific purposes.

Tell me more about your experience with Linux.

Site Recovery works like a charm with Linux workloads. We only had a few mistakes in the beginning, made on our side. We wanted to use a product from Red Hat called Satellite for updates, but we did not realize that you cannot change the way the Virtual Machines are being managed if you want to use Satellite. It needs to be defined at the beginning, otherwise it’s too late. But besides this, the ‘bring your own license’ story works very well, especially with Site Recovery.

Glad to hear that you found it to be a seamless experience. Was there any other aspect of Site Recovery that impressed you, or that you think other organizations should know about?

For me, it’s the capability to be able to perform drills in an easy way. With the more traditional approach, each time that you want to do a complete disaster recovery test, it’s always time and resource-consuming in terms of preparation. With Site Recovery, we did a test a few weeks back on the complete environment and it was really easy to prepare. It was fast to do the switch to the recovery region, and just as easy to bring back the workload to the primary region. So, I mean for me today, it’s really the ease of using Site Recovery.

If you had to do it all over again, what would you do differently on your Site Recovery Journey?

I would start using it earlier. If we hadn’t gone with the traditional active-passive approach, I think we could have saved time and money for the company. On the other hand, going that way made us confident in the journey. Other than that, I think we wouldn’t have changed much. But what we want to do now is start looking at Azure Site Recovery services to replicate workloads running on on-premises Hyper-V Virtual Machines. For those applications that are still not migrated to Azure, we want to at least ensure proper disaster recovery. We also want to replicate some VMware Virtual Machines that we still have as part of our migration journey to Hyper-V. This is what we are looking at.

Do you have any advice for other prospective or current customers of Site Recovery?

One piece of advice that I could share is to suggest starting sooner and if required, smaller. Start using Site Recovery even if it’s on one small app. It will help you see the added value, and that will help you convince the operational teams that there is a lot of value and that they can trust the services that Site Recovery is providing instead of trying to do everything on their own.

That’s excellent advice. Those were all my questions, Quentin. Thanks for sharing your experiences.

Learn more about resilience with Azure. 

New Azure blueprint for CIS Benchmark

We’ve released our newest Azure blueprint, which maps to another key industry standard: the Center for Internet Security (CIS) Microsoft Azure Foundations Benchmark. This follows the recent announcement of our Azure blueprint for FedRAMP Moderate and adds to the growing list of Azure blueprints for regulatory compliance, which now includes ISO 27001, NIST SP 800-53, PCI-DSS, UK OFFICIAL, UK NHS, and IRS 1075.

Azure Blueprints is a free service that enables cloud architects and central information technology groups to define a set of Azure resources that implements and adheres to an organization’s standards, patterns, and requirements. Azure Blueprints makes it possible for development teams to rapidly build and stand up new trusted environments within organizational compliance requirements. Customers can apply the new CIS Microsoft Azure Foundations Benchmark blueprint to new subscriptions as well as existing environments.

CIS benchmarks are configuration baselines and best practices for securely configuring a system, developed by CIS, a nonprofit entity whose mission is to “identify, develop, validate, promote, and sustain best practice solutions for cyber defense.” A global community collaborates in a consensus-based process to develop these internationally recognized security standards for defending IT systems and data against cyberattacks. Used by thousands of businesses, they offer prescriptive guidance for establishing a secure baseline system configuration. System and application administrators, security specialists, and others who develop solutions using Microsoft products and services can use these best practices to assess and improve the security of their applications.

Each of the CIS Microsoft Azure Foundations Benchmark recommendations is mapped to one or more of the 20 CIS Controls, which were developed to help organizations improve their cyber defense. The blueprint assigns Azure Policy definitions to help customers assess their compliance with the recommendations. Major elements of all nine sections of the recommendations from the CIS Microsoft Azure Foundations Benchmark v1.1.0 include:

Identity and Access Management (1.0)

  • Assigns Azure Policy definitions that help you monitor when multi-factor authentication isn’t enabled on privileged Azure Active Directory accounts.
  • Assigns an Azure Policy definition that helps you monitor when multi-factor authentication isn’t enabled on non-privileged Azure Active Directory accounts.
  • Assigns Azure Policy definitions that help you monitor for guest accounts and custom subscription roles that may need to be removed.

Security Center (2.0)

  • Assigns Azure Policy definitions that help you monitor networks and virtual machines where the Security Center standard tier isn’t enabled.
  • Assigns Azure Policy definitions that help you ensure that virtual machines are monitored for vulnerabilities and remediated, endpoint protection is enabled, and system updates are installed on virtual machines.
  • Assigns an Azure Policy definition that helps you ensure virtual machine disks are encrypted.

Storage Accounts (3.0)

  • Assigns an Azure Policy definition that helps you monitor storage accounts that allow insecure connections.
  • Assigns an Azure Policy definition that helps you monitor storage accounts that allow unrestricted access.
  • Assigns an Azure Policy definition that helps you monitor storage accounts that don’t allow access from trusted Microsoft services.

Database Services (4.0)

  • Assigns an Azure Policy definition that helps you ensure SQL Server auditing is enabled and properly configured, and that logs are retained for at least 90 days.
  • Assigns an Azure Policy definition that helps you ensure advanced data security notifications are properly enabled.
  • Assigns an Azure Policy definition that helps you ensure that SQL Servers are configured for encryption and other security settings.

Logging and Monitoring (5.0)

  • Assigns Azure Policy definitions that help you ensure a log profile exists and is properly configured for all Azure subscriptions, and activity logs are retained for at least one year.

Networking (6.0)

  • Assigns an Azure Policy definition that helps you ensure Network Watcher is enabled for all regions where resources are deployed.

Virtual Machines (7.0)

  • Assigns an Azure Policy definition that helps you ensure disk encryption is enabled on virtual machines.
  • Assigns an Azure Policy definition that helps you ensure that only approved virtual machine extensions are installed.
  • Assigns Azure Policy definitions that help you ensure that system updates are installed, and endpoint protection is enabled on virtual machines.

Other Security Considerations (8.0)

  • Assigns an Azure Policy definition that helps you ensure that key vault objects are recoverable in the case of accidental deletion.
  • Assigns an Azure Policy definition that helps you ensure role-based access control is used to manage permissions in Kubernetes service clusters.

AppService (9.0)

  • Assigns an Azure Policy definition that helps you ensure web applications are accessible only over secure connections.
  • Assigns Azure Policy definitions that help you ensure web applications are only accessible using HTTPS, use the latest version of TLS encryption, and are only reachable by clients with valid certificates.
  • Assigns Azure Policy definitions to ensure that .NET Framework, PHP, Python, Java, and HTTP versions are the latest.

Azure customers seeking to implement compliance with CIS Benchmarks should note that although this Azure Blueprint may help customers assess compliance with particular configuration recommendations, it does not ensure full compliance with all requirements of the CIS Benchmark and CIS Controls. In addition, recommendations are associated with one or more Azure Policy definitions, and the compliance standard includes recommendations that aren’t addressed by any Azure Policy definitions in blueprints at this time. Therefore, compliance in Azure Policy represents only a partial view of your overall compliance status. Customers are ultimately responsible for meeting the compliance requirements applicable to their environments and must determine for themselves whether particular information helps meet their compliance needs.
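
If you want to track that partial compliance view programmatically rather than in the portal, the same data is exposed through the Azure Policy Insights API. The following is a minimal, hedged sketch assuming the azure-identity and requests Python packages; the subscription ID is a placeholder and the api-version is taken from the public REST reference, so adjust as needed.

```python
# Minimal sketch: summarize Azure Policy compliance for a subscription.
# Assumes the azure-identity and requests packages; the subscription ID is a placeholder.
import requests
from azure.identity import DefaultAzureCredential

SUBSCRIPTION_ID = "00000000-0000-0000-0000-000000000000"  # placeholder

credential = DefaultAzureCredential()
token = credential.get_token("https://management.azure.com/.default").token

# Policy Insights "summarize" returns non-compliance counts per policy assignment.
# The api-version below is an assumption; check the Policy Insights REST reference.
url = (
    f"https://management.azure.com/subscriptions/{SUBSCRIPTION_ID}"
    "/providers/Microsoft.PolicyInsights/policyStates/latest/summarize"
    "?api-version=2019-10-01"
)
response = requests.post(url, headers={"Authorization": f"Bearer {token}"})
response.raise_for_status()

summary = response.json()["value"][0]
print("Non-compliant resources:", summary["results"]["nonCompliantResources"])
for assignment in summary["policyAssignments"]:
    print(assignment["policyAssignmentId"],
          "->", assignment["results"]["nonCompliantResources"])
```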

Learn more about the CIS Microsoft Azure Foundations Benchmark blueprint in our documentation.

Azure is now certified for the ISO/IEC 27701 privacy standard

We are pleased to share that Azure is the first major US cloud provider to achieve certification as a data processor for the new international standard ISO/IEC 27701 Privacy Information Management System (PIMS). The PIMS certification demonstrates that Azure provides a comprehensive set of management and operational controls that can help your organization demonstrate compliance with privacy laws and regulations. Microsoft’s successful audit can also help enable Azure customers to build upon our certification and seek their own certification to more easily comply with an ever-increasing number of global privacy requirements.

Being the first major US cloud provider to achieve a PIMS certification is the latest in a series of privacy firsts for Azure, including being the first to achieve compliance with the EU Model Clauses. Microsoft was also the first major cloud provider to voluntarily extend the core data privacy rights included in the GDPR (General Data Protection Regulation) to customers around the world.

PIMS is built as an extension of the widely used ISO/IEC 27001 standard for information security management, making the implementation of PIMS’s privacy information management system a helpful compliance extension for the many organizations that rely on ISO/IEC 27001, as well as creating a strong integration point for aligning security and privacy controls. PIMS accomplishes this through a framework for managing personal data that can be used by both data controllers and data processors, a key distinction for GDPR compliance. In addition, any PIMS audit requires the organization to declare applicable laws and regulations in its audit criteria, meaning that the standard can be mapped to many of the requirements under GDPR, CCPA (California Consumer Privacy Act), or other laws. This universal framework allows organizations to efficiently operationalize compliance with new regulatory requirements.

PIMS also helps customers by providing a template for implementing compliance with new privacy regulations, helping reduce the need for multiple certifications and audits against new requirements and thereby saving both time and money. This will be critical for supply chain business relationships as well as cross-border data movement. 

This short video demonstrates how Microsoft complies with ISO/IEC 27701 and how our compliance benefits customers.

Schellman & Company LLC issued a certificate of registration for ISO/IEC 27701:2019 that covers the requirements, controls, and guidelines for implementing a privacy information security management system as an extension to ISO/IEC 27001:2013 for privacy management as a personally identifiable information (PII) processor relevant to the information security management system supporting Microsoft Azure, Dynamics, and other online services that are deployed in Azure Public, Government cloud, and Germany Cloud, including their development, operations, and infrastructures and their associated security, privacy, and compliance per the statement of applicability version 2019-02. A copy of the certification is available on the Service Trust Portal.

Modern business is driven by digital transformation, including the ability to deeply understand data and unlock the power of big data analytics and AI. But before customers – and regulators – will allow you to leverage this data, you must first win their trust. Microsoft simplifies this privacy burden with tools that can help you automate privacy, including built-in controls like PIMS. 

Microsoft has longstanding commitments to privacy, and we continue to take steps to give customers more control over their data. Our Trusted Cloud is built on our commitments to privacy, security, transparency, and compliance, and our Trust Center provides access to validated audit reports, data management capabilities, and information about the number of legal demands we received for customer data from law enforcement.

Azure Cost Management 2019 year in review

When we talk about cost management, we focus on three core tenets:

  1. Ensuring cost visibility so everyone is aware of the financial impact their solutions have.
  2. Driving accountability throughout the organization to stop bad spending patterns.
  3. Continuously optimizing costs as your usage changes over time so you can do more with less.

These were the driving forces in 2019 as we set out to build a strong foundation that pulls together all costs across all account types and ensures everyone in the organization has a means to report on, control, and optimize costs. Our ultimate goal is to empower you to lead a healthier, more financially responsible organization.

All costs behind a single pane of glass

On the heels of the Azure Cost Management preview, 2019 started off strong with the general availability of Enterprise Agreement (EA) accounts in February and pay-as-you-go (PAYG) in April. At the same time, Microsoft as a whole embarked on a journey to modernize the entire commerce platform with the new Microsoft Customer Agreement (MCA), which started rolling out for enterprises in March, pay-as-you-go subscriptions in July, and Cloud Solution Providers (CSP) using Azure plan in November. Whether you get Azure through the Microsoft field, directly from Azure.com, or through a Microsoft partner, you have the power of Azure Cost Management at your fingertips. But getting basic coverage of your Azure usage is only part of the story.

To effectively manage costs, you need all costs together, in a single repository. This is exactly what Azure Cost Management brings you. From the unprecedented ability to monitor Amazon Web Services (AWS) costs within the Azure portal in May (a first for any cloud provider), to the inclusion of reservation and Marketplace purchases in June, Azure Cost Management enables you to manage all your costs from a single pane of glass, whether you’re using Azure or AWS.

What’s next?

Support for Sponsorship subscriptions and CSP subscriptions not on an Azure plan is at the top of the list, to ensure every Azure subscription can use Azure Cost Management. AWS support will become generally available, and then Google Cloud Platform (GCP) support will be added.

Making it easier to report on and analyze costs

Getting all costs in one place is only the beginning. 2019 also saw many improvements that help you report on and analyze costs. You were able to dig in and explore costs with the 2018 preview, but the only way to truly control and optimize costs is to raise awareness of current spending patterns. To that end, reporting in 2019 was focused on making it easier to customize and share.

The year kicked off with the ability to pin customized views to the Azure portal dashboard in January. You could share links in May, save views directly from cost analysis in August, and download charts as an image in September. You also saw a major Power BI refresh in October that no longer required classic API keys and added reservation details and recommendations. Each option helps you not only save time, but also starts that journey of driving accountability by ensuring everyone is aware of the costs they’re responsible for.

Looking beyond sharing, you also saw new capabilities like forecasting costs in June and switching between currencies in July, simpler out-of-the-box options like the new date picker in May and invoice details view in September, and changes that simply help you get your job done the way you want to, like support for the Azure portal dark theme and continuous accessibility improvements throughout the year.

From an API automation and integration perspective, 2019 was also a critical milestone as EA cost and usage APIs moved to Azure Resource Manager. The Resource Manager APIs are forward-looking and designed to minimize your effort when it comes time to transition to Microsoft Customer Agreement by standardizing terminology across account types. If you haven’t started the migration to the Resource Manager APIs, make that your number one resolution for the new year!
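
As a hedged illustration of what the Resource Manager path looks like, the sketch below queries month-to-date cost grouped by service through the Microsoft.CostManagement Query API. It assumes the azure-identity and requests Python packages, a placeholder subscription scope, and an api-version and payload shape taken from the public REST reference, so treat it as a starting point rather than a definitive sample.

```python
# Minimal sketch: month-to-date cost grouped by service via the
# Microsoft.CostManagement Query API (Azure Resource Manager).
# Assumes azure-identity and requests; the subscription ID is a placeholder.
import requests
from azure.identity import DefaultAzureCredential

SCOPE = "subscriptions/00000000-0000-0000-0000-000000000000"  # placeholder scope

credential = DefaultAzureCredential()
token = credential.get_token("https://management.azure.com/.default").token

url = (
    f"https://management.azure.com/{SCOPE}"
    "/providers/Microsoft.CostManagement/query?api-version=2019-11-01"
)
payload = {
    "type": "ActualCost",   # older api-versions use "Usage" instead
    "timeframe": "MonthToDate",
    "dataset": {
        "granularity": "None",
        "aggregation": {"totalCost": {"name": "PreTaxCost", "function": "Sum"}},
        "grouping": [{"type": "Dimension", "name": "ServiceName"}],
    },
}
response = requests.post(
    url, json=payload, headers={"Authorization": f"Bearer {token}"}
)
response.raise_for_status()

# The query result comes back as columns plus rows; zip them for readability.
result = response.json()["properties"]
columns = [c["name"] for c in result["columns"]]
for row in result["rows"]:
    print(dict(zip(columns, row)))
```

The same call works at other scopes (billing account, resource group, management group) by changing the scope segment, which is part of what the standardized terminology across account types buys you.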

What’s next?

2020 will continue down this path, from more flexible reporting and scheduling email notifications to general improvements around ease of use and increased visibility throughout the Azure portal. Power BI will get Azure reservation and Hybrid Benefit reports as well as support for subscription and resource group users who don’t have access to the whole billing account. You can also expect to see continued API improvements to help make it easier than ever to integrate cost data into your business systems and processes.

Flexible cost control that puts the power in your hands

Once you understand what you’re spending and where, your next step is to figure out how to stop the bad spending patterns and keep costs under control. You already know you can define budgets to get notified about and take action on overages. You decide what actions you want to take, whether that be as simple as an email notification or as drastic as deleting all your resources to ensure you won’t be charged. Cost control in 2019 was centered on helping you stay on top of your costs and giving you the tools to control spending as you see fit.

This started with a new, consolidated alerts experience in February, where you can see all your invoice, credit, and budget overage alerts in a single place. Budgets were expanded to support the new account types we talked about above and, in June, to support management groups, giving you a view of all your costs across subscriptions. Then in August, you were able to create targeted budgets with filters for fine-grained tracking, whether that be for an entire service, a single resource, or an application that spans multiple subscriptions (via tags). This also came with an improved experience when creating budgets to help you better estimate what your budget should be based on historical and forecasted trends.
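
Budgets created in the portal are backed by the Microsoft.Consumption/budgets resource, so the same targeted budgets can also be provisioned as code. Here is a rough sketch assuming the azure-identity and requests Python packages; the names, amounts, and especially the filter schema are illustrative and vary between api-versions, so check the Consumption REST reference before relying on it.

```python
# Minimal sketch: create (or update) a monthly cost budget with an 80% alert
# using the Microsoft.Consumption budgets API. Names, IDs, and emails are placeholders.
import requests
from azure.identity import DefaultAzureCredential

SCOPE = "subscriptions/00000000-0000-0000-0000-000000000000"  # placeholder scope
BUDGET_NAME = "team-a-monthly-budget"                          # placeholder name

credential = DefaultAzureCredential()
token = credential.get_token("https://management.azure.com/.default").token

url = (
    f"https://management.azure.com/{SCOPE}"
    f"/providers/Microsoft.Consumption/budgets/{BUDGET_NAME}"
    "?api-version=2019-10-01"
)
body = {
    "properties": {
        "category": "Cost",
        "amount": 1000,                # budget amount in the billing currency
        "timeGrain": "Monthly",
        "timePeriod": {                # budgets require an explicit start date
            "startDate": "2020-03-01T00:00:00Z",
            "endDate": "2021-03-01T00:00:00Z",
        },
        # Illustrative filter: track only one resource group. The filter schema
        # differs between api-versions; omit "filter" to track the whole scope.
        "filter": {
            "dimensions": {
                "name": "ResourceGroupName",
                "operator": "In",
                "values": ["my-app-rg"],
            }
        },
        "notifications": {
            "Actual_GreaterThan_80_Percent": {
                "enabled": True,
                "operator": "GreaterThan",
                "threshold": 80,
                "contactEmails": ["finops@contoso.com"],
            }
        },
    }
}
response = requests.put(
    url, json=body, headers={"Authorization": f"Bearer {token}"}
)
response.raise_for_status()
print("Budget created:", response.json()["name"])
```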

What’s next?

2020 will take cost control to the next level by allowing you to split shared costs with cost allocation rules and define an additional markup for central teams who typically run on overhead or don’t want to expose discounts to the organization. We’re also looking at improvements around management groups and tags to give you more flexibility to manage costs the way you need to for your organization.

New ways to save and do more with less

Cloud computing comes with a lot of promises, from flexibility and speed to scalability and security. The promise of cost savings is often the driving force behind cloud migrations, yet it is also one of the more elusive to achieve. Luckily, Azure delivers new cost optimization opportunities nearly every month. In 2019 alone, you saw more than two dozen new cost-saving opportunities, on top of the recommendations offered by Azure Advisor, which are specifically tuned to save money on the resources you already have deployed.

What’s next?

Expect to see continued updates in these areas through 2020. We’re also partnering with individual service teams to deliver even more built-in recommendations for database, storage, and PaaS services, just to name a few.

Streamlined account and subscription management

Throughout 2019, you may have noticed a lot of changes to Cost Management + Billing in the Azure portal. What was purely focused on PAYG subscriptions in early 2018 became a central hub for billing administrators in 2019 with full administration for MCA accounts in March, new EA account management capabilities in July, and subscription provisioning and transfer updates in August. All of these are helping you get one step closer to having a single portal to manage every aspect of your account.

What’s next?

2020 will be the year of converged and consolidated experiences for Cost Management + Billing. This will start with the Billing and Cost Management experiences within the Azure portal and will expand to include capabilities you’re currently using the EA, Account, or Cloudyn portals for today. Whichever portal you use, expect to see all these come together into a single, consolidated experience that has more consistency across account types. This will be especially evident as your account moves from the classic EA, PAYG, and CSP programs to Microsoft Customer Agreement (and Azure plan), which is fully managed within the Azure portal and offers critical new billing capabilities, like finer-grained access control and grouping subscriptions into separate invoices.

Looking forward to another year

The past 12 months have been packed with one improvement after another, and we’re just getting started! We couldn’t list them all here, but if you only take one thing away, please do check out and subscribe to the Azure Cost Management monthly updates for the latest news on what’s changed and what’s coming. We’ve already talked about what you can expect to see in 2020 for each area, but the key takeaway is:

2020 will bring one experience to manage all your Azure, AWS, and GCP costs from the Azure portal, with simpler, yet more powerful cost reporting, control, and optimization tools that help you stay more focused on your mission.

We look forward to hearing your feedback as these new and updated capabilities become available. And if you’re interested in the latest features, before they’re available to everyone, check out Azure Cost Management Labs (introduced in July) and don’t hesitate to reach out with any feedback. Cost Management Labs gives you a direct line to the Azure Cost Management engineering team and is the best way to influence and make an immediate impact on features being actively developed and tuned for you.

Follow @AzureCostMgmt on Twitter and subscribe to the YouTube channel for updates, tips, and tricks! And, as always, share your ideas and vote up others in the Cost Management feedback forum. See you in 2020!

Advancing no-impact and low-impact maintenance technologies

“This post continues our reliability series kicked off by my July blog post highlighting several initiatives underway to keep improving platform availability, as part of our commitment to provide a trusted set of cloud services. Today I wanted to double-click on the investments we’ve made in no-impact and low-impact update technologies including hot patching, memory-preserving maintenance, and live migration. We’ve deployed dozens of security and reliability patches to host infrastructure in the past year, many of which were implemented with no customer impact or downtime. The post that follows was written by John Slack from our core operating systems team, who is the Program Manager for several of the update technologies discussed below.” – Mark Russinovich, CTO, Azure


This post was co-authored by Apurva Thanky, Cristina del Amo Casado, and Shantanu Srivastava from the engineering teams responsible for these technologies.

 

We regularly update Azure host infrastructure to improve the reliability, performance, and security of the platform. While the purposes of these ‘maintenance’ updates vary, they typically involve updating software components in the hosting environment or decommissioning hardware. If we go back five years, the only way to apply some of these updates was by fully rebooting the entire host. This approach took customer virtual machines (VMs) down for minutes at a time. Since then, we have invested in a variety of technologies to minimize customer impact when updating the fleet. Today, the vast majority of updates to the host operating system are deployed in place with absolute transparency and zero customer impact using hot patching. In infrequent cases in which the update cannot be hot patched, we typically utilize low-impact memory preserving update technologies to roll out the update.

Even with these technologies, there are still rare cases in which we need to perform more impactful maintenance (including evacuating faulty hardware or decommissioning old hardware). In such cases, we use a combination of live migration, in-VM notifications, and planned maintenance that provides customer controls.
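
For workloads that want to react to those in-VM notifications, Azure surfaces upcoming maintenance to the guest through the Scheduled Events metadata endpoint. Below is a minimal polling sketch in Python; the endpoint and event fields come from the public Scheduled Events documentation, and the handling logic is a placeholder you would replace with your own drain or checkpoint steps.

```python
# Minimal sketch: poll the in-VM Scheduled Events endpoint (only reachable
# from inside an Azure VM) to see upcoming maintenance such as Reboot,
# Redeploy, or Freeze events.
import time
import requests

ENDPOINT = "http://169.254.169.254/metadata/scheduledevents?api-version=2019-08-01"
HEADERS = {"Metadata": "true"}

def poll_scheduled_events():
    doc = requests.get(ENDPOINT, headers=HEADERS).json()
    for event in doc.get("Events", []):
        print(
            f"Event {event['EventId']}: {event['EventType']} "
            f"({event['EventStatus']}) affecting {event['Resources']} "
            f"not before {event.get('NotBefore', 'n/a')}"
        )
        # Placeholder: drain or checkpoint the workload here, then optionally
        # acknowledge the event early by POSTing StartRequests back to the endpoint.

if __name__ == "__main__":
    while True:
        poll_scheduled_events()
        time.sleep(60)  # events are typically surfaced minutes ahead of impact
```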

Thanks to continued investments in this space, we are at a point where the vast majority of host maintenance activities do not impact the VMs hosted on the affected infrastructure. We’re writing this post to be transparent about the different techniques that we use to ensure that Azure updates are minimally impactful.

Plan A: Hot patching

Function-level “hot” patching provides the ability to make targeted changes to running code without incurring any downtime for customer VMs. It does this by redirecting all new invocations of a function on the host to an updated version of that function, so it is considered a ‘no impact’ update technology. Wherever possible, we use hot patching to apply host updates, completely avoiding any impact to the VMs running on that host. We have been using hot patching in Azure since 2017. Since then, we have worked to broaden the scope of what we can hot patch. As an example, we updated the host operating system to allow the hypervisor to be hot patched in 2018. Looking forward, we are exploring firmware hot patches. This is a place where the industry typically hasn’t focused. Firmware has always been viewed as ‘if you need to update it, reboot the server,’ but we know that makes for a terrible customer experience. We’ve been working with hardware manufacturers, and on our own firmware, to make firmware hot patchable and incrementally updatable.
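
To make “redirecting new invocations of a function” concrete, here is a deliberately simplified, language-level analogy in Python. Azure’s host hot patching operates on compiled code inside the host OS and hypervisor, so this toy dispatch-table sketch only illustrates the concept of swapping a call target while the process keeps running; it is not how the platform implements it.

```python
# Toy analogy of function-level hot patching: callers always go through a
# dispatch table, so swapping the table entry redirects *new* invocations to
# the patched implementation without stopping the program.
import threading

_dispatch = {}
_lock = threading.Lock()

def register(name, func):
    with _lock:
        _dispatch[name] = func

def call(name, *args, **kwargs):
    with _lock:
        func = _dispatch[name]
    return func(*args, **kwargs)

def handle_request_v1(payload):
    return f"v1 handled {payload}"

def handle_request_v2(payload):
    # "Patched" version with a fix; in-flight v1 calls still finish normally.
    return f"v2 handled {payload} (patched)"

register("handle_request", handle_request_v1)
print(call("handle_request", "ping"))           # served by v1

register("handle_request", handle_request_v2)   # hot patch: swap the target
print(call("handle_request", "ping"))           # new calls now served by v2
```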

Some large host updates contain changes that cannot be applied using function-level hot patching. For those updates, we endeavor to use memory-preserving maintenance.

Plan B: Memory-preserving maintenance

Memory-preserving maintenance involves ‘pausing’ the guest VMs (while preserving their memory in RAM), updating the host server, then resuming the VMs and automatically synchronizing their clocks. We first used memory-preserving maintenance for Azure in 2018. Since then we have improved the technology in three important ways. First, we have developed less impactful variants of memory-preserving maintenance targeted at host components that can be serviced without a host reboot. Second, we have reduced the duration of the customer-experienced pause. Third, we have expanded the number of VM types that can be updated with memory-preserving maintenance. While we continue to work in this space, some variants of memory-preserving maintenance are still incompatible with some specialized VM offerings like M, N, or H series VMs for a variety of technical reasons.

In the rare cases where we need to perform more impactful maintenance (including host reboots or VM redeployment), customers are notified in advance and given the opportunity to perform the maintenance at a time suitable for their workloads.

Plan C: Self-service maintenance

Self-service maintenance involves providing customers and partners a window of time within which they can choose when to initiate impactful maintenance on their VMs. This initial self-service phase typically lasts around a month and empowers organizations to perform the maintenance on their own schedules so that it causes no or minimal disruption to users. At the end of this self-service window, a scheduled maintenance phase begins—this is where Azure will perform the maintenance automatically. Throughout both phases, customers get full visibility of which VMs have or have not been updated—in Azure Service Health or by querying in PowerShell/CLI. Azure first offered self-service maintenance in 2018. We generally see that administrators take advantage of the self-service phase rather than wait for Azure to perform maintenance on their VMs automatically.
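
If you prefer scripting that visibility, one option is to read a VM’s instance view from the Resource Manager API: when maintenance is scheduled, the maintenanceRedeployStatus block shows whether the self-service window is currently open. The sketch below is illustrative only; it assumes the azure-identity and requests Python packages, uses placeholder names, and the api-version may need updating.

```python
# Minimal sketch: check a VM's self-service maintenance status via its
# instance view. Subscription, resource group, and VM names are placeholders.
import requests
from azure.identity import DefaultAzureCredential

SUBSCRIPTION_ID = "00000000-0000-0000-0000-000000000000"
RESOURCE_GROUP = "my-rg"
VM_NAME = "my-vm"

credential = DefaultAzureCredential()
token = credential.get_token("https://management.azure.com/.default").token

url = (
    f"https://management.azure.com/subscriptions/{SUBSCRIPTION_ID}"
    f"/resourceGroups/{RESOURCE_GROUP}/providers/Microsoft.Compute"
    f"/virtualMachines/{VM_NAME}/instanceView?api-version=2019-07-01"
)
view = requests.get(url, headers={"Authorization": f"Bearer {token}"}).json()

status = view.get("maintenanceRedeployStatus")
if status is None:
    print("No maintenance is currently scheduled for this VM.")
else:
    print("Self-service maintenance allowed:",
          status.get("isCustomerInitiatedMaintenanceAllowed"))
    print("Self-service window:",
          status.get("preMaintenanceWindowStartTime"), "to",
          status.get("preMaintenanceWindowEndTime"))
    # To start maintenance yourself during the self-service window, POST to the
    # VM's performMaintenance action (see the Compute REST reference).
```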

In addition to this, when the customer owns the full host machine, either using Azure Dedicated Hosts or Isolated virtual machines, we recently started to offer maintenance control over all non-zero-impact platform updates. This includes rebootless updates, which only cause a pause of a few seconds. It is useful for VMs running ultra-sensitive workloads that can’t sustain any interruption, even one lasting just a few seconds. Customers can choose when to apply these non-zero-impact updates within a 35-day rolling window. This feature is in public preview, and more information can be found in this dedicated blog post.

Sometimes in-place update technologies aren’t viable, like when a host shows signs of hardware degradation. In such cases, the best option is to initiate a move of the VM to another host, either through customer control via planned maintenance or through live migration.

Plan D: Live migration

Live migration involves moving a running customer VM from one “source” host to another “destination” host. Live migration starts by moving the VM’s local state (including RAM and local storage) from the source to the destination while the virtual machine is still running. Once most of the local state is moved, the guest VM experiences a short pause usually lasting five seconds or less. After that pause, the VM resumes running on the destination host. Azure first started using live migration for maintenance in 2018. Today, when Azure Machine Learning algorithms predict an impending hardware failure, live migration can be used to move guest VMs onto different hosts preemptively.

Amongst other topics, planned maintenance and AI Operations were covered in Igal Figlin’s recent Ignite 2019 session “Building resilient applications in Azure.” Watch the recording here for additional context on these, and to learn more about how to take advantage of the various resilient services Azure provides to help you build applications that are inherently resilient.

The future of Azure maintenance 

In summary, the way in which Azure performs maintenance varies significantly depending on the type of updates being applied. Regardless of the specifics, Azure always approaches maintenance with a view towards ensuring the smallest possible impact to customer workloads. This post has outlined several of the technologies that we use to achieve this, and we are working diligently to continue improving the customer experience. As we look toward the future, we are investing heavily in machine learning-based insights and automation to maintain availability and reliability. Eventually, this “AI Operations” model will carry out preventative maintenance, initiate automated mitigations, and identify contributing factors and dependencies during incidents more effectively than our human engineers can. We look forward to sharing more on these topics as we continue to learn and evolve.

Azure Lighthouse: The managed service provider perspective

This blog post was co-authored by Nikhil Jethava, Senior Program Manager, Azure Lighthouse.

Azure Lighthouse became generally available in July this year, and we have seen a tremendous response from Azure managed service provider communities, who are excited about the scale and precision of management that the Azure platform now enables with cross-tenant management. Similarly, customers are empowered to architect precise, just-enough access levels for service providers to their Azure environments. Both customers and partners can decide on the precise scope of the projection.

Azure Lighthouse enables partners to manage multiple customer tenants from within a single control plane: their own environment. This enables consistent application of management and automation across hundreds of customers, as well as monitoring and analytics to a degree that was unavailable before. The capability works across Azure services (those that are Azure Resource Manager enabled) and across licensing motions. Context switching is now a thing of the past.

In this article, we will answer some of the most commonly asked questions:

  • How can MSPs perform daily administration tasks across different customers from within their own Azure tenant, using a single control plane?
  • How can MSPs secure their intellectual property in the form of code?

Let us deep dive into a few scenarios from the perspective of a managed service provider.

Azure Automation

Your intellectual property is only yours. With Azure delegated resource management, service providers are no longer required to create Microsoft Azure Automation runbooks under customers’ subscriptions or keep their IP, in the form of runbooks, in someone else’s subscription. Using this functionality, Automation runbooks can now be stored in a service provider’s subscription while their effect is applied to the customer’s subscription. All you need to do is ensure the Automation account’s service principal has the required delegated built-in role-based access control (RBAC) role to perform the Automation tasks. Service providers can also create Azure Monitor action groups in customers’ subscriptions that trigger Azure Automation runbooks residing in the service provider’s subscription.
    Runbook in MSP subscription
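
As an illustration of what such a runbook could do, the hedged Python sketch below deallocates a VM that lives in a delegated customer subscription while the code itself runs from the service provider’s side. It assumes the Automation account’s service principal has been granted an appropriate built-in role (for example, Virtual Machine Contributor) through Azure delegated resource management; every ID, name, and secret below is a placeholder.

```python
# Illustrative Azure Automation runbook (Python): deallocate a VM that lives in
# a *customer's* delegated subscription while the runbook is stored in the
# service provider's subscription. All IDs, names, and secrets are placeholders.
import requests
from azure.identity import ClientSecretCredential

# Service principal in the MSP tenant; it must hold a delegated RBAC role
# (for example, Virtual Machine Contributor) on the customer scope.
TENANT_ID = "00000000-0000-0000-0000-0000000000aa"      # MSP tenant (placeholder)
CLIENT_ID = "00000000-0000-0000-0000-0000000000bb"      # app registration (placeholder)
CLIENT_SECRET = "retrieved-from-automation-variable-or-key-vault"  # placeholder

CUSTOMER_SUBSCRIPTION = "11111111-1111-1111-1111-111111111111"  # delegated customer sub
RESOURCE_GROUP = "customer-app-rg"
VM_NAME = "customer-vm-01"

credential = ClientSecretCredential(TENANT_ID, CLIENT_ID, CLIENT_SECRET)
token = credential.get_token("https://management.azure.com/.default").token

# Because the subscription is projected via Azure delegated resource management,
# a token from the MSP tenant is honored on the customer's resources.
url = (
    f"https://management.azure.com/subscriptions/{CUSTOMER_SUBSCRIPTION}"
    f"/resourceGroups/{RESOURCE_GROUP}/providers/Microsoft.Compute"
    f"/virtualMachines/{VM_NAME}/deallocate?api-version=2019-07-01"
)
response = requests.post(url, headers={"Authorization": f"Bearer {token}"})
response.raise_for_status()
print(f"Deallocation of {VM_NAME} accepted (status {response.status_code}).")
```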

Azure Monitor alerts

Azure Lighthouse allows you to monitor alerts across different tenants under the same roof. Why go through the hassle of storing the logs ingested by different customers’ resources in a centralized Log Analytics workspace? This approach helps your customers stay compliant by allowing them to keep their application logs under their own subscriptions, while empowering you to have a helicopter view across all customers.

Azure Monitor Alerts across tenants

Azure Resource Graph Explorer

With Azure delegated resource management, you can query Azure resources from Azure Resource Graph Explorer across tenants. Imagine a scenario where your boss has asked you for a CSV file that would list the existing Azure Virtual Machines across all the customers’ tenants. The results of the Azure Resource Graph Explorer query now include the tenant ID, which makes it easier for you to identify which Virtual Machine belongs to which customer.

 

Querying Azure resources across tenants 
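
A hedged sketch of that ask-for-a-CSV scenario: run one Resource Graph query across the delegated subscriptions you can see and write the VM list, including tenantId, to a CSV file. It assumes the azure-identity and requests Python packages, placeholder subscription IDs, and the table-style result format returned by this api-version, so adjust for your environment.

```python
# Minimal sketch: list VMs across delegated customer subscriptions with
# Azure Resource Graph and export them to CSV. Subscription IDs are placeholders.
import csv
import requests
from azure.identity import DefaultAzureCredential

SUBSCRIPTIONS = [
    "11111111-1111-1111-1111-111111111111",  # customer A (delegated)
    "22222222-2222-2222-2222-222222222222",  # customer B (delegated)
]

credential = DefaultAzureCredential()
token = credential.get_token("https://management.azure.com/.default").token

url = ("https://management.azure.com/providers/Microsoft.ResourceGraph"
       "/resources?api-version=2019-04-01")
query = {
    "subscriptions": SUBSCRIPTIONS,
    "query": (
        "Resources "
        "| where type =~ 'microsoft.compute/virtualmachines' "
        "| project name, location, subscriptionId, tenantId"
    ),
}
result = requests.post(
    url, json=query, headers={"Authorization": f"Bearer {token}"}
).json()

# This api-version returns data as columns + rows; flatten it into a CSV.
columns = [c["name"] for c in result["data"]["columns"]]
with open("vms-across-tenants.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(columns)
    writer.writerows(result["data"]["rows"])

print(f"Wrote {result.get('totalRecords', 0)} VM rows to vms-across-tenants.csv")
```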
 

Azure Security Center

Azure Lighthouse provides you with cross-tenant visibility into your current security state. You can now monitor compliance with security policies, take action on security recommendations, monitor the secure score, detect threats, run file integrity monitoring (FIM), and more, across tenants.
Detecting threats across tenants
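
For example, a provider-side script could iterate the delegated subscriptions and pull Security Center alerts from each through the Microsoft.Security REST API. The sketch below is illustrative only; subscription IDs are placeholders, and the api-version and response fields should be checked against the current Security Center REST reference.

```python
# Minimal sketch: list Azure Security Center alerts across delegated customer
# subscriptions from the service provider's tenant. Subscription IDs are placeholders.
import requests
from azure.identity import DefaultAzureCredential

DELEGATED_SUBSCRIPTIONS = [
    "11111111-1111-1111-1111-111111111111",
    "22222222-2222-2222-2222-222222222222",
]

credential = DefaultAzureCredential()
token = credential.get_token("https://management.azure.com/.default").token
headers = {"Authorization": f"Bearer {token}"}

for sub in DELEGATED_SUBSCRIPTIONS:
    url = (f"https://management.azure.com/subscriptions/{sub}"
           "/providers/Microsoft.Security/alerts?api-version=2019-01-01")
    alerts = requests.get(url, headers=headers).json().get("value", [])
    print(f"Subscription {sub}: {len(alerts)} alert(s)")
    for alert in alerts:
        props = alert.get("properties", {})
        print("  -", props.get("alertDisplayName", alert.get("name")))
```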

Azure Virtual Machines

Service providers can perform post-deployment tasks on Azure Virtual Machines across different customers’ tenants using Azure Virtual Machine extensions, the Azure Virtual Machine Serial Console, PowerShell commands via the Run command option, and more in the Azure portal. Most administrative tasks on Azure Virtual Machines across tenants can now be performed quickly, since the dependency on taking remote desktop protocol (RDP) access to the Virtual Machines is reduced. This also solves a big challenge, since admins no longer need to log on to different Azure subscriptions in multiple browser tabs just to get to a Virtual Machine’s resource menu.
Exploring Resource Menu of Cross Tenant VMs

Managing user access

Using Azure delegated resource management, MSPs no longer need to create administrator accounts (including contributor, security administrator, backup administrator, and more) in their customers’ tenants. This allows them to manage the lifecycle of delegated administrators right within their own Microsoft Azure Active Directory (Azure AD) tenant. Moreover, MSPs can add user accounts to a user group in their Azure AD tenant, while customers make sure those groups have the required access to manage their resources. To revoke access when an employee leaves the MSP’s organization, the account can simply be removed from the specific group to which access has been delegated.

Added advantages for Cloud Solution Providers

Cloud Solution Providers (CSPs) can now save on administration time. Once you’ve set up Azure delegated resource management for your users, there is no need for them to log in to the Partner Center portal (navigating through Customers, a specific customer such as Contoso, and finally All Resources) to administer customers’ Azure resources.

Also, Azure delegated resource management happens outside the boundaries of the Partner Center portal. Instead, the delegated user access is managed directly in Azure Active Directory. This means subscription and resource administrators at Cloud Solution Providers are no longer required to have the ‘admin agent’ role in the Partner Center. Therefore, Cloud Solution Providers can now decide which users in their Azure Active Directory tenant will have access to which customer and to what extent.

More information

This is not all. There is a full feature list available for supported services and scenarios in Azure Lighthouse documentation. Check out Azure Chief Technology Officer Mark Russinovich’s blog for a deep under-the-hood view.

So, what are you waiting for? Get started with Azure Lighthouse today.