KenkoGeek

Category Archives: Virtual Machines

Azure HBv2 Virtual Machines eclipse 80,000 cores for MPI HPC

HPC-optimized virtual machines now available

Azure HBv2-series Virtual Machines (VMs) are now generally available in the South Central US region. HBv2 VMs will also be available soon in West Europe, East US, West US 2, North Central US, and Japan East.

HBv2 VMs deliver supercomputer-class performance, message passing interface (MPI) scalability, and cost efficiency for a variety of real-world high performance computing (HPC) workloads, such as CFD, explicit finite element analysis, seismic processing, reservoir modeling, rendering, and weather simulation.

Azure HBv2 VMs are the first in the public cloud to feature 200 gigabit per second HDR InfiniBand from Mellanox. HDR InfiniBand on Azure delivers latencies as low as 1.5 microseconds, more than 200 million messages per second per VM, and advanced in-network computing engines like hardware offload of MPI collectives and adaptive routing for higher performance on the largest scaling HPC workloads. HBv2 VMs use standard Mellanox OFED drivers that support all RDMA verbs and MPI variants.

Each HBv2 VM features 120 AMD EPYC™ 7002-series CPU cores with clock frequencies up to 3.3 GHz, 480 GB of RAM, 480 MB of L3 cache, and no simultaneous multithreading (SMT). HBv2 VMs provide up to 340 GB/sec of memory bandwidth, which is 45-50 percent more than comparable x86 alternatives and three times faster than what most HPC customers have in their datacenters today. An HBv2 virtual machine is capable of up to 4 double-precision teraFLOPS and up to 8 single-precision teraFLOPS.
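As a quick back-of-the-envelope check (our own estimate, not an official specification): Zen 2 cores retire 16 double-precision FLOP per cycle (two 256-bit FMA pipes), so the quoted figure corresponds to an effective all-core clock of roughly 2.1 GHz, with single precision doubling the rate:

$$120\ \text{cores}\times 16\ \tfrac{\text{FLOP}}{\text{core}\cdot\text{cycle}}\times 2.1\ \text{GHz}\approx 4.0\ \text{DP teraFLOPS},\qquad 120\times 32\times 2.1\ \text{GHz}\approx 8.1\ \text{SP teraFLOPS}.$$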

One- and three-year Reserved Instance, Pay-As-You-Go, and Spot pricing for HBv2 VMs is available now for both Linux and Windows deployments. For information about five-year Reserved Instances, contact your Azure representative.

Disruptive speed for critical weather forecasting

Numerical Weather Prediction (NWP) and simulation have long been among the most beneficial use cases for HPC. Using NWP techniques, scientists can better understand and predict the behavior of our atmosphere, which in turn drives advances in everything from coordinating airline traffic and shipping goods around the globe to ensuring business continuity and preparing for the most adverse weather. Microsoft recognizes how critical this field is to science and society, which is why Azure shares US hourly weather forecast data produced by the Global Forecast System (GFS) from the National Oceanic and Atmospheric Administration (NOAA) as part of the Azure Open Datasets initiative.

Cormac Garvey, a member of the HPC Azure Global team, has extensive experience supporting weather simulation teams on the world’s most powerful supercomputers. Today, he’s published a guide to running the widely-used Weather Research and Forecasting (WRF) Version 4 simulation suite on HBv2 VMs.

Cormac used a 371M grid point simulation of Hurricane Maria, a Category 5 storm that struck the Caribbean in 2017, with a resolution of 1 kilometer. This model was chosen not only as a rigorous benchmark of HBv2 VMs but also because the fast and accurate simulation of dangerous storms is one of the most vital functions of the meteorology community.

WRF v4.1.3 on Azure HBv2 benchmark results

Figure 1: WRF Speedup from 1 to 672 Azure HBv2 VMs.

| Nodes (VMs) | Parallel Processes | Average Time (s) per Time Step | Scaling Efficiency | Speedup (VM-based) |
|---|---|---|---|---|
| 1 | 120 | 18.51 | 100% | 1.00 |
| 2 | 240 | 8.9 | 104% | 2.08 |
| 4 | 480 | 4.37 | 106% | 4.24 |
| 8 | 960 | 2.21 | 105% | 8.38 |
| 16 | 1,920 | 1.16 | 100% | 15.96 |
| 32 | 3,840 | 0.58 | 100% | 31.91 |
| 64 | 7,680 | 0.31 | 93% | 59.71 |
| 128 | 15,360 | 0.131 | 110% | 141.30 |
| 256 | 23,040 | 0.082 | 88% | 225.73 |
| 512 | 46,080 | 0.0456 | 79% | 405.92 |
| 640 | 57,600 | 0.0393 | 74% | 470.99 |
| 672 | 80,640 | 0.0384 | 72% | 482.03 |

Figure 2: Scaling and configuration data for WRF on Azure HBv2 VMs.

Note: for some scaling points, optimal performance is achieved with 30 MPI ranks and 4 threads per rank, while for others 90 MPI ranks per VM were optimal. All tests were run with OpenMPI 4.0.2.

Azure HBv2 VMs executed the “Maria” simulation with mostly super-linear scalability up to 128 VMs (15,360 parallel processes). Improvements from scaling continue up to the largest scale of 672 VMs (80,640 parallel processes) tested in this exercise, where we observe a 482x speedup over a single VM. At 512 nodes (VMs) we observe a ~2.2x performance increase as compared to a leading supercomputer that debuted among the top 20 fastest machines in 2016.
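To make the arithmetic behind Figure 2 explicit, here is a minimal sketch (our own addition, not from the original benchmark report) that recomputes the speedup and scaling-efficiency columns from the measured average time per time step; the two sample values are copied from the table above.

```python
# Recompute the speedup and scaling-efficiency columns of Figure 2 from the
# measured average time (seconds) per WRF time step.
baseline_vms, baseline_time = 1, 18.51        # single-VM reference run

measurements = {                              # VMs -> average time per time step (s)
    128: 0.131,
    672: 0.0384,
}

for vms, t in measurements.items():
    speedup = baseline_time / t                   # VM-based speedup vs. 1 VM
    efficiency = speedup / (vms / baseline_vms)   # fraction of ideal linear scaling
    print(f"{vms:>3} VMs: speedup {speedup:6.1f}x, efficiency {efficiency:5.1%}")
# 128 VMs: speedup  141.3x, efficiency 110.4%   (super-linear)
# 672 VMs: speedup  482.0x, efficiency  71.7%
```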

The gating factor to higher levels of scaling efficiency? The 371M grid point model, even as one of the largest known WRF models, is too small at such extreme levels of parallel processing. This opens the door for leading weather forecasting organizations to leverage Azure to build and operationalize even higher-resolution models that deliver higher numerical accuracy and a more realistic understanding of these complex weather phenomena.

Visit Cormac’s blog post on the Azure Tech Community to learn how to run WRF on our family of H-series Virtual Machines, including HBv2.

Better, safer product design from hyper-realistic CFD

Computational fluid dynamics (CFD) is core to the simulation-driven businesses of many Azure customers. A common request from customers is to “10x” their capabilities while keeping costs as close to constant as possible. Specifically, customers often seek ways to significantly increase the accuracy of their models by simulating them in higher resolution. Given that many customers already solve CFD problems with ~500-1000 parallel processes per job, this is a tall task that implies linear scaling to at least 5,000-10,000 parallel processes. Last year, Azure accomplished one of these objectives when it became the first public cloud to scale a CFD application to more than 10,000 parallel processes. With the launch of HBv2 VMs, Azure’s CFD capabilities are increasing again.

Jon Shelley, also a member of the Azure Global HPC team, worked with Siemens to validate one of its largest CFD simulations ever, a 1 billion cell model of a sports car named after the famed 24 Hours of Le Mans race, with a 10x higher-resolution mesh than what Azure tested just last year. Jon has published a guide to running Simcenter STAR-CCM+ at large scale on HBv2 VMs.

Siemens Simcenter Star-CCM+ 14.06 benchmark results

Figure 3: Simcenter STAR-CCM+ Scaling Efficiency from 1 to 640 Azure HBv2 VMs

| Nodes (VMs) | Parallel Processes | Solver Elapsed Time | Scaling Efficiency | Speedup (VM-based) |
|---|---|---|---|---|
| 8 | 928 | 337.71 | 100% | 1.00 |
| 16 | 1,856 | 164.79 | 102.5% | 2.05 |
| 32 | 3,712 | 82.07 | 102.9% | 4.11 |
| 64 | 7,424 | 41.02 | 102.9% | 8.23 |
| 128 | 14,848 | 20.94 | 100.8% | 16.13 |
| 256 | 29,696 | 12.02 | 87.8% | 28.10 |
| 320 | 37,120 | 9.57 | 88.2% | 35.29 |
| 384 | 44,544 | 7.117 | 98.9% | 47.45 |
| 512 | 59,392 | 6.417 | 82.2% | 52.63 |
| 640 | 57,600 | 5.03 | 83.9% | 67.14 |

Figure 4: Scaling and configuration data for STAR-CCM+ on Azure HBv2 VMs

Note: a given scaling point may achieve optimal performance with 90, 112, 116, or 120 parallel processes per VM; the plotted data shows the optimal figure for each point. For example, the 640-VM run performed best with 90 processes per VM (57,600 total), which is why it lists fewer total processes than the 512-VM run (116 per VM, 59,392 total). All tests were run with HPC-X MPI ver. 2.50.

Once again, Azure HBv2 executed the challenging problem with linear efficiency to more than 15,000 parallel processes across 128 VMs. From there, high scaling efficiency continued, peaking at nearly 99 percent at more than 44,000 parallel processes. At the largest scale of 640 VMs and 57,600 parallel processes, HBv2 delivered 84 percent scaling efficiency. This is among the largest scaling CFD simulations with Simcenter STAR-CCM+ ever performed, and now can be replicated by Azure customers.

Visit Jon’s blog post on the Azure Tech Community site to learn how to run Simcenter STAR-CCM+ on our family of H-series Virtual Machines, including HBv2.

Extreme HPC I/O meets cost-efficiency

An increasingly common scenario in the cloud is the on-demand HPC-grade parallel filesystem. The rationale is straightforward: if a customer needs to perform a large quantity of compute, that customer often also needs to move a lot of data into and out of those compute resources. The catch? Simple cost comparisons against traditional on-premises HPC filesystem appliances can be unfavorable, depending on circumstances. With Azure HBv2 VMs, however, NVMeDirect technology can be combined with ultra low-latency RDMA capabilities to deliver on-demand “burst buffer” parallel filesystems at no additional cost beyond the HBv2 VMs already provisioned for compute purposes.

BeeGFS is one such filesystem and has a rapidly growing user base among both entry-level and extreme-scale users. The BeeOND filesystem is even used in production on the novel HPC + AI hybrid supercomputer “Tsubame 3.0.”

Here is a high-level summary of how a sample BeeOND filesystem looks when created across 352 HBv2 VMs, providing 308 terabytes of usable, high-performance namespace.

Figure 5: Overview of example BeeOND filesystem on HBv2 VMs.

Running the widely-used IOR test of parallel filesystems across 352 HBv2 VMs, BeeOND achieved peak read performance of 763 gigabytes per second, and peak write performance of 352 gigabytes per second.
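Expressed per node (a simple division we added for context, not a figure from the original benchmark report), that aggregate corresponds to roughly 2.2 GB/s of read and 1 GB/s of write bandwidth per VM:

```python
# Per-VM view of the aggregate BeeOND IOR results across 352 HBv2 VMs.
vms = 352
peak_read_gbs, peak_write_gbs = 763, 352      # aggregate GB/s from the IOR run

print(f"per-VM read : {peak_read_gbs / vms:.2f} GB/s")    # ~2.17 GB/s
print(f"per-VM write: {peak_write_gbs / vms:.2f} GB/s")   # ~1.00 GB/s
```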

Visit Cormac’s blog post on the Azure Tech Community to learn how to run BeeGFS on RDMA-powered Azure Virtual Machines.

10x-ing the cloud HPC experience

Microsoft Azure is committed to delivering to our customers a world-class HPC experience, and maximum levels of performance, price/performance, and scalability.

“The 2nd Gen AMD EPYC processors provide fantastic core scaling, access to massive memory bandwidth and are the first x86 server processors that support PCIe 4.0; all of these features enable some of the best high-performance computing experiences for the industry,” said Ram Peddibhotla, corporate vice president, Data Center Product Management, AMD. “What Azure has done for HPC in the cloud is amazing; demonstrating that HBv2 VMs and 2nd Gen EPYC processors can deliver supercomputer-class performance, MPI scalability, and cost efficiency for a variety of real-world HPC workloads, while democratizing access to HPC that will help drive the advancement of science and research.”

“200 gigabit HDR InfiniBand delivers high data throughput, extremely low latency, and smart In-Network Computing engines, enabling high performance and scalability for compute and data applications. We are excited to collaborate with Microsoft to bring the InfiniBand advantages into Azure, providing users with leading HPC cloud services,” said Gilad Shainer, Senior Vice President of Marketing at Mellanox Technologies. “By taking advantage of InfiniBand RDMA and its MPI acceleration engines, Azure delivers higher performance compared to other cloud options based on Ethernet. We look forward to continuing to work with Microsoft to introduce future generations and capabilities.”

  • Find out more about High Performance Computing in Azure.
  • Running WRF v4 on Azure.
  • Running Siemens Simcenter Star-CCM+ on Azure.
  • Tuning BeeGFS and BeeOND on Azure for Specific I/O Patterns.
  • Azure HPC on Github.
  • Azure HPC CentOS 7.6 and 7.7 images.
  • Learn about Azure Virtual Machines.
  • AMD EPYC™ 7002-series.
  • 28 Feb, 2020
  • By editor
  • Azure, Virtual Machines

Burst 4K encoding on Azure Kubernetes Service

Burst encoding in the cloud with Azure and Media Excel HERO platform.

Content creation has never been as in demand as it is today. Both professional and user-generated content have increased exponentially over the past few years. This puts a lot of stress on media encoding and transcoding platforms. Add the upcoming 4K and even 8K to the mix and you need a platform that can scale with these variables. Azure cloud compute offers a flexible way to grow with your needs. Microsoft offers various tools and products to fully support on-premises, hybrid, or native cloud workloads. Azure Stack supports hybrid scenarios for your computing needs, and Azure Arc helps you manage hybrid setups.

Finding a solution

Generally, 4K/UHD live encoding is done on dedicated hardware encoder units, which cannot be hosted in a public cloud like Azure. With such dedicated hardware units hosted on-premises and pushing 4K into the Azure data center, the immediate problem is the need for a high-bandwidth network connection between the on-premises encoder unit and the Azure data center. It is also generally a best practice to ingest into multiple regions, which further increases the load on the network connection between the encoder and the Azure data center.

How do we ingest 4K content reliably into the public cloud?

Alternatively, we can encode the content in the cloud. If we can run 4K/UHD live encoding in Azure, its output can be ingested into Azure Media Services over the intra-Azure network backbone, which provides sufficient bandwidth and reliability.

How can we reliably run and scale 4K/UHD live encoding on the Azure cloud as a containerized solution? Let’s explore below. 

Azure Kubernetes Service

With Azure Kubernetes Service (AKS), Microsoft offers customers a managed Kubernetes platform. It is a hosted Kubernetes platform that spares you the configuration burden of building a cluster yourself, such as networking, cluster masters, and OS patching of the cluster nodes. It also comes with pre-configured monitoring that integrates seamlessly with Azure Monitor and Log Analytics, while still offering the flexibility to integrate your own tools. Furthermore, it is plain vanilla Kubernetes, and as such is fully compatible with any existing tooling you might have running on any other standard Kubernetes platform.

Media Excel encoding

Media Excel is an encoding and transcoding vendor offering physical appliance and software-based encoding solutions. Media Excel has been partnering with Microsoft for many years and engaging in Azure media customer projects. They are also listed as a recommended and tested contribution encoder for Azure Media Services for fMP4. Media Excel and Microsoft have also worked together to integrate SCTE-35 timed metadata from the Media Excel encoder into an Azure Media Services origin, supporting Server-Side Ad Insertion (SSAI) workflows.

Networking challenge

With increasing picture quality like 4K and 8K, the burden on both compute and networking becomes a significant architecting challenge. In a recent engagement, we needed to architect a 4K live streaming platform with the challenge of limited bandwidth capacity from the customer premises to one of our Azure data centers. We worked with Media Excel to build a scalable containerized encoding platform on AKS, utilizing cloud compute and minimizing network latency between the encoder and the Azure Media Services packager. Multiple bitrates of the same source, with a top bitrate of up to [email protected], are generated in the cloud and ingested into the Azure Media Services platform for further processing, including dynamic encryption and packaging. This setup enables the following benefits:

  • Instant scale to multiple AKS nodes
  • Eliminate network constraints between customer and Azure Datacenter
  • Automated workflow for containers and easy separation of concern with container technology
  • Increased level of security of high-quality generated content to distribution
  • Highly redundant capability
  • Flexibility to provide various types of Node pools for optimized media workloads

In this particular test, we proved that the intra-Azure network is extremely capable of shipping high-bandwidth, latency-sensitive 4K packets from a containerized encoder instance running in West Europe to both the East US and Hong Kong data center regions. This allows the customer to place the origin closer to them for further content conditioning.

High-level architecture of the Azure components used for 4K encoding in the Azure cloud.

Workflow:

  1. An Azure Pipeline is triggered to deploy onto the AKS cluster. The deployment YAML file (which you can find on GitHub) references the Media Excel container in Azure Container Registry (a minimal deployment sketch follows this list).
  2. AKS starts the deployment and pulls the container from Azure Container Registry.
  3. During container start, a custom PHP script is loaded and the container is added to the HMS (Hero Management Service) and placed into the correct device pool and job.
  4. The encoder loads the source and (in this case) pushes a 4K livestream into Azure Media Services.
  5. Media Services packages the livestream into multiple formats and applies DRM (digital rights management).
  6. Azure Content Delivery Network scales the livestream.
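As an illustration of steps 2 and 3, here is a minimal, hypothetical sketch using the official Kubernetes Python client: it creates a Deployment whose container image is pulled from Azure Container Registry and whose postStart/preStop lifecycle hooks run HMS registration and deregistration scripts. The image name, namespace, script paths, and resource requests are placeholders, not Media Excel’s actual values.

```python
# Hypothetical sketch: deploy an encoder container from Azure Container Registry to AKS
# and register/deregister it with the Hero Management Service via lifecycle hooks.
from kubernetes import client, config

config.load_kube_config()  # assumes `az aks get-credentials` has populated your kubeconfig

deployment = {
    "apiVersion": "apps/v1",
    "kind": "Deployment",
    "metadata": {"name": "hero-encoder"},
    "spec": {
        "replicas": 1,
        "selector": {"matchLabels": {"app": "hero-encoder"}},
        "template": {
            "metadata": {"labels": {"app": "hero-encoder"}},
            "spec": {
                "containers": [{
                    "name": "hero-encoder",
                    # Placeholder image reference in Azure Container Registry
                    "image": "myregistry.azurecr.io/mediaexcel/hero:latest",
                    "lifecycle": {
                        # Add the container to HMS on start, remove it on shutdown
                        "postStart": {"exec": {"command": ["php", "/scripts/register_hms.php"]}},
                        "preStop": {"exec": {"command": ["php", "/scripts/deregister_hms.php"]}},
                    },
                    "resources": {"requests": {"cpu": "8", "memory": "16Gi"}},
                }],
            },
        },
    },
}

client.AppsV1Api().create_namespaced_deployment(namespace="default", body=deployment)
```

In the actual proof of concept, this manifest lives in the repository as YAML and is applied by the Azure DevOps pipeline rather than from a script.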

Scale through Azure Container Instances

With Azure Kubernetes Service you get the power of Azure Container Instances out of the box. Azure Container Instances are a way to instantly scale to pre-provisioned compute power at your disposal. When deploying Media Excel encoding instances to AKS, you can specify where these instances will be created. This offers the flexibility to work with variables like increased density on cheaper nodes for low-cost, low-priority encoding jobs, or more expensive nodes for high-throughput, high-priority jobs. With Azure Container Instances you can instantly move workloads to standby compute power without provisioning time. You only pay for the compute time, offering full flexibility for customer demand and future changes in platform needs. With Media Excel’s flexible live/file-based encoding roles you can easily move workloads across the different compute power offered by AKS and Azure Container Instances.

Container creation on Azure Kubernetes Service (AKS).

Media Excel Hero Management System showing all Container Instances.

Azure DevOps pipeline to bring it all together

All the general benefits that come with containerized workloads apply in this case. For this particular proof of concept, we created an automated deployment pipeline in Azure DevOps for easy testing and deployment. With a deployment YAML and a pipeline YAML, we can easily automate the deployment, provisioning, and scaling of a Media Excel encoding container. Once DevOps pushes the deployment job onto AKS, a container image is pulled from Azure Container Registry. Although container images can be bulky, node-side caching of layers brings any additional container pull down to seconds. With the help of Media Excel, we created a YAML file containing pre- and post-container lifecycle logic that adds and removes a container from Media Excel’s management portal. This offers easy single-pane-of-glass management of multiple instances across multiple node types, clusters, and regions.

This deployment pipeline offers full flexibility to provision certain multi-tenant customers or job priority on specific node types. This unlocks the possibility of provision encoding jobs on GPU enabled nodes for maximum throughput or using cheaper generic nodes for low priority jobs.

Deployment Release Pipeline in Azure DevOps.

Azure Media Services and Azure Content Delivery Network

Finally, we push the 4K stream into Azure Media Services. Azure Media Services is a cloud-based platform that enables you to build solutions that achieve broadcast-quality video streaming, enhance accessibility and distribution, analyze content, and much more. Whether you’re an app developer, a call center, a government agency, or an entertainment company, Media Services helps you create apps that deliver media experiences of outstanding quality to large audiences on today’s most popular mobile devices and browsers.

Azure Media Services is seamlessly integrated with Azure Content Delivery Network. With Azure Content Delivery Network we offer a true multi CDN with choices of Azure Content Delivery Network from Microsoft, Azure Content Delivery Network from Verizon, and Azure Content Delivery Network from Akamai. All of this through a single Azure Content Delivery Network API for easy provisioning and management. As an added benefit, all CDN traffic between Azure Media Services Origin and CDN edge is free of charge.

With this setup, we’ve demonstrated that Cloud encoding is ready to handle real-time 4K encoding across multiple clusters. Thanks to Azure services like AKS, Container Registry, Azure DevOps, Media Services, and Azure Content Delivery Network, we demonstrated how easy it is to create an architecture that is capable of meeting high throughput time-sensitive constraints.

  • 25 Feb, 2020
  • By editor
  • Azure, DevOps, Media Services & CDN, Networking, Partner, Virtual Machines

Announcing the preview of Azure Shared Disks for clustered applications

Today, we are announcing the limited preview of Azure Shared Disks, the industry’s first shared cloud block storage. Azure Shared Disks enables the next wave of block storage workloads migrating to the cloud, including the most demanding enterprise applications currently running on-premises on Storage Area Networks (SANs). These include clustered databases, parallel file systems, persistent containers, and machine learning applications. This unique capability enables customers to run latency-sensitive workloads without compromising on well-known deployment patterns for fast failover and high availability. This includes applications built for Windows or Linux-based clustered filesystems like Global File System 2 (GFS2).

With Azure Shared Disks, customers now have the flexibility to migrate clustered environments running on Windows Server, including Windows Server 2008 (which has reached End-of-Support), to Azure. This capability is designed to support SQL Server Failover Cluster Instances (FCI), Scale-out File Servers (SoFS), Remote Desktop Servers (RDS), and SAP ASCS/SCS running on Windows Server.

We encourage you to get started and request access by filling out this form.

Leveraging Azure Shared Disks

Azure Shared Disks provides a consistent experience for applications running on clustered environments today. This means that any application that currently leverages SCSI Persistent Reservations (PR) can use this well-known set of commands to register nodes in the cluster to the disk. The application can then choose from a range of supported access modes for one or more nodes to read or write to the disk. These applications can deploy in highly available configurations while also leveraging Azure Disk durability guarantees.

The below diagram illustrates a sample two-node clustered database application orchestrating failover from one node to the other.
   2-node failover cluster
The flow is as follows:

  1. The clustered application running on both Azure VM 1 and  Azure VM 2 registers the intent to read or write to the disk.
  2. The application instance on Azure VM 1 then takes an exclusive reservation to write to the disk.
  3. This reservation is enforced on Azure Disk and the database can now exclusively write to the disk. Any writes from the application instance on Azure VM 2 will not succeed.
  4. If the application instance on Azure VM 1 goes down, the instance on Azure VM 2 can now initiate a database failover and take-over of the disk.
  5. This reservation is now enforced on the Azure Disk, and it will no longer accept writes from the application on Azure VM 1. It will now only accept writes from the application on Azure VM 2.
  6. The clustered application can complete the database failover and serve requests from Azure VM 2.

The below diagram illustrates another common workload that consists of multiple nodes reading data from the disk to run parallel jobs, for example, training of machine learning models.
   n-node cluster with multiple readers
The flow is as follows:

  1. The application registers all Virtual Machines to the disk.
  2. The application instance on Azure VM 1 then takes an exclusive reservation to write to the disk while opening up reads from other Virtual Machines.
  3. This reservation is enforced on Azure Disk.
  4. All nodes in the cluster can now read from the disk. Only one node writes results back to the disk on behalf of all the nodes in the cluster.

Disk types, sizes, and pricing

Azure Shared Disks are available on Premium SSDs and support disk sizes of P15 (i.e., 256 GB) and greater. Support for Azure Ultra Disk will be available soon. Azure Shared Disks can be enabled as data disks only (not OS disks). Each additional mount to an Azure Shared Disk (Premium SSD) will be charged based on disk size. Please refer to the Azure Disks pricing page for details on limited preview pricing.

Azure Shared Disks vs Azure Files

Azure Shared Disks provides shared access to block storage which can be leveraged by multiple virtual machines. You will need to use a common Windows- or Linux-based cluster manager, such as Windows Server Failover Cluster (WSFC), Pacemaker, or Corosync, for node-to-node communication and to enable write locking. If you are looking for a fully managed file service on Azure that can be accessed using the Server Message Block (SMB) or Network File System (NFS) protocol, check out Azure Premium Files or Azure NetApp Files.

Getting started

You can create Azure Shared Disks using Azure Resource Manager templates. For details on how to get started and use Azure Shared Disks in preview, please refer to the documentation page. For updates on regional availability and Ultra Disk availability, please refer to the Azure Disks FAQ. Here is a video of Mark Russinovich from Microsoft Ignite 2019 covering Azure Shared Disks.
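For readers who prefer code over templates, here is a minimal, hypothetical sketch using the azure-identity and azure-mgmt-compute Python packages (a recent, track-2 SDK). The resource group, disk name, region, and subscription ID are placeholders; max_shares is the property that makes the disk shareable.

```python
# Hypothetical sketch: create a shared Premium SSD data disk that two VMs can attach.
from azure.identity import DefaultAzureCredential
from azure.mgmt.compute import ComputeManagementClient
from azure.mgmt.compute.models import CreationData, Disk, DiskSku

compute = ComputeManagementClient(DefaultAzureCredential(), "<subscription-id>")

disk = Disk(
    location="westcentralus",                        # placeholder region
    sku=DiskSku(name="Premium_LRS"),                 # shared disks require Premium SSD today
    creation_data=CreationData(create_option="Empty"),
    disk_size_gb=1024,                               # must be P15 (256 GB) or larger
    max_shares=2,                                    # number of VMs that may attach concurrently
)

poller = compute.disks.begin_create_or_update("my-resource-group", "shared-data-disk", disk)
print(poller.result().id)
```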

In the coming weeks, we will be enabling Portal and SDK support. Support for Azure Backup and  Azure Site Recovery is currently not available. Refer to the Managed Disks documentation for detailed instructions on all disk operations.

If you are interested in participating in the preview, you can now get started by requesting access.

  • 14 Feb, 2020
  • By editor
  • Announcements, Azure, Virtual Machines

MSC Mediterranean Shipping Company on Azure Site Recovery

Today’s Q&A post covers an interview between Siddharth Deekshit, Program Manager, Microsoft Azure Site Recovery engineering, and Quentin Drion, IT Director of Infrastructure and Operations, MSC. MSC is a global shipping and logistics business, and our conversation focused on their organization’s journey with Azure Site Recovery (ASR). To learn more about achieving resilience in Azure, refer to this whitepaper.

I wanted to start by understanding the transformation journey that MSC is going through, including consolidating on Azure. Can you talk about how Azure is helping you run your business today?

We are a shipping line, so we move containers worldwide. Over the years, we have developed our own software to manage our core business. We have a different set of software for small, medium, and large entities, which were running on-premises. That meant we had to maintain a lot of on-premises resources to support all these business applications. A decision was taken a few years ago to consolidate all these business workloads inside Azure regardless of the size of the entity. When we are migrating, we turn off what we have on-premises and then start using software hosted in Azure and provide it as a service for our subsidiaries. This new design is managed in a centralized manner by an internal IT team.

That’s fantastic. Consolidation is a big benefit of using Azure. Apart from that, what other benefits do you see of moving to Azure?

For us, automation is a big one that is a huge improvement, the capabilities in terms of API in the integration and automation that we can have with Azure allows us to deploy environments in a matter of hours where before that it took much, much longer as we had to order the hardware, set it up, and then configure. Now we no longer need to worry about the set up as well as hardware support, and warranties. The environment is all virtualized and we can, of course, provide the same level of recovery point objective (RPO), recovery time objective (RTO), and security to all the entities that we have worldwide.

Speaking of RTO and RPO, let’s talk a little bit about Site Recovery. Can you tell me what life was like before using Site Recovery?

Actually, when we started migrating workloads, we had a much more traditional approach, in the sense that we were doing primary production workloads in one Azure region, and we were setting up and managing a complete disaster recovery infrastructure in another region. So the traditional on-premises data center approach was really how we started with disaster recovery (DR) on Azure, but then we spent the time to study what Site Recovery could provide us. Based on the findings and some testing that we performed, we decided to change the implementation that we had in place for two to three years and switch to Site Recovery, ultimately to reduce our cost significantly, since we no longer have to keep our DR Azure Virtual Machines running in another region. In terms of management, it’s also easier for us. For traditional workloads, we have better RPO and RTO than we saw with our previous approach. So we’ve seen great benefits across the board.

That’s great to know. What were you most skeptical about when it came to using Site Recovery? You mentioned that your team ran tests, so what convinced you that Site Recovery was the right choice?

It was really based on the tests that we did. Earlier, we were doing a lot of manual work to switch to the DR region, to ensure that domain name system (DNS) settings and other networking settings were appropriate, so there were a lot of constraints. When we tested it compared to this manual way of doing things, Site Recovery worked like magic. The fact that our primary region could fail and that didn’t require us to do a lot was amazing. Our applications could start again in the DR region and we just had to manage the upper layer of the app to ensure that it started correctly. We were cautious about this app restart, not because of the Virtual Machine(s), because we were confident that Site Recovery would work, but because of our database engine. We were positively surprised to see how well Site Recovery works. All our teams were very happy about the solution and they are seeing the added value of moving to this kind of technology for them as operational teams, but also for us in management to be able to save money, because we reduced the number of Virtual Machines that we had that were actually not being used.

Can you talk to me a little bit about your onboarding experience with Site Recovery?

I think we had six or seven major in house developed applications in Azure at that time. We picked one of these applications as a candidate for testing. The test was successful. We then extended to a different set of applications that were in production. There were again no major issues. The only drawback we had was with some large disks. Initially, some of our larger disks were not supported. This was solved quickly and since then it has been, I would say, really straightforward. Based on the success of our testing, we worked to switch all the applications we have on the platform to use Site Recovery for disaster recovery.

Can you give me a sense of what workloads you are running on your Azure Virtual Machines today? How many people leverage the applications running on those Virtual Machines for their day job?

So it’s really core business apps. There is, of course, the main infrastructure underneath, but what we serve is business applications that we have written internally, presented to Citrix frontend in Azure. These applications do container bookings, customer registrations, etc. I mean, we have different workloads associated with the complete process of shipping. In terms of users, we have some applications that are being used by more than 5,000 people, and more and more it’s becoming their primary day-to-day application.

Wow, that’s a ton of usage and I’m glad you trust Site Recovery for your DR needs. Can you tell me a little bit about the architecture of those workloads?

Most of them are Windows-based workloads. The software that gets the most used worldwide is a 3-tier application. We have a database on SQL, a middle-tier server, application server, and also some web frontend servers. But for the new one that we have developed now, it’s based on microservices. There are also some Linux servers being used for specific usage.

Tell me more about your experience with Linux.

Site Recovery works like a charm with Linux workloads. We only had a few mistakes in the beginning, made on our side. We wanted to use a product from Red Hat called Satellite for updates, but we did not realize that we cannot change the way that the Virtual Machines are being managed if you want to use Satellite. It needs to be defined at the beginning otherwise it’s too late. But besides this, the ‘bring your own license’ story works very well and especially with Site Recovery.

Glad to hear that you found it to be a seamless experience. Was there any other aspect of Site Recovery that impressed you, or that you think other organizations should know about?

For me, it’s the capability to be able to perform drills in an easy way. With the more traditional approach, each time that you want to do a complete disaster recovery test, it’s always time and resource-consuming in terms of preparation. With Site Recovery, we did a test a few weeks back on the complete environment and it was really easy to prepare. It was fast to do the switch to the recovery region, and just as easy to bring back the workload to the primary region. So, I mean for me today, it’s really the ease of using Site Recovery.

If you had to do it all over again, what would you do differently on your Site Recovery Journey?

I would start to use it earlier. If we hadn’t gone with the traditional active-passive approach, I think we could have saved time and money for the company. On the other hand, we were in this way confident in the journey. Other than that, I think we wouldn’t have changed much. But what we want to do now, is start looking at Azure Site Recovery services to be able to replicate workloads running on on-premises Virtual Machines in Hyper-V. For those applications that are still not migrated to Azure, we want to at least ensure proper disaster recovery. We also want to replicate some VMware Virtual Machines that we still have as part of our migration journey to Hyper-V. This is what we are looking at.

Do you have any advice for folks for other prospective or current customers of Site Recovery?

One piece of advice that I could share is to suggest starting sooner and if required, smaller. Start using Site Recovery even if it’s on one small app. It will help you see the added value, and that will help you convince the operational teams that there is a lot of value and that they can trust the services that Site Recovery is providing instead of trying to do everything on their own.

That’s excellent advice. Those were all my questions, Quentin. Thanks for sharing your experiences.

Learn more about resilience with Azure. 

  • 24 Jan, 2020
  • By editor
  • Azure, Management, Virtual Machines

Turning to a new chapter of Windows Server innovation

Today, January 14, 2020, marks the end of support for Windows Server 2008 and Windows Server 2008 R2. Customers loved these releases, which introduced advancements such as the shift from 32-bit to 64-bit computing and server virtualization. While support for these popular releases ends today, we are excited about new innovations in cloud computing, hybrid cloud, and data that can help server workloads get ready for the new era.

We want to thank customers for trusting Microsoft as their technology partner. We also want to make sure that we work with all our customers to support them through this transition while applying the latest technology innovations to modernize their server workloads.

We are pleased to offer multiple options to help you make this transition, and to show how you can take advantage of cloud computing in combination with Windows Server along the way. Here are some of our customers that are using Azure for their Windows Server workloads.

Customers using Azure for their Windows Server workloads

Customers such as Allscripts, Tencent, Alaska Airlines, and Altair Engineering are using Azure to modernize their apps and services. One great example of this is from JB Hunt Transport Services, Inc., which has over 3.5 million trucks on the road every single day.

See how JB Hunt has driven their digital transformation with Azure:

JB Hunt truck, linking to video

How you can take advantage of Azure for your Windows Server workloads

You can deploy Windows Server workloads in Azure in various ways, such as Virtual Machines on Azure, Azure VMware Solutions, and Azure Dedicated Host. You can apply Azure Hybrid Benefit to use existing Windows Server licenses in Azure. The benefits are immediate and tangible: Azure Hybrid Benefit alone saves 40 percent in cost. Use the Azure Total Cost of Ownership Calculator to estimate your savings from migrating your workloads to Azure.

As you transition your Windows Server workloads to the cloud, Azure offers additional app modernization options. For example, you can migrate Remote Desktop Services to Windows Virtual Desktop on Azure, which offers the best virtual desktop experience, multi-session Windows 10, and elastic scale. You can migrate on-premises SQL Server to Azure SQL Database, which offers Hyperscale, artificial intelligence, and advanced threat detection to modernize and secure your databases. Plus, you can future-proof your apps with no more patching and upgrades, which is a huge benefit to many IT organizations.

Free extended security updates on Azure

We understand comprehensive upgrades are traditionally a time-consuming process for many organizations. To ensure that you can continue to protect your workloads, you can take advantage of three years of extended security updates, which you can learn more about here, for your Windows Server 2008 and Windows Server 2008 R2 servers only on Azure. This will allow you more time to plan the transition paths for your business-critical apps and services.

How you can take advantage of latest innovations in Windows Server on-premises

If your business requires that your servers must stay on-premises, we recommend upgrading to the latest Windows Server.

Windows Server 2019 is the latest and most quickly adopted Windows Server version ever. Millions of instances have been deployed by customers worldwide. The hybrid capabilities of Windows Server 2019 are designed to help customers integrate Windows Server on-premises with Azure on their own terms. Windows Server 2019 adds layers of security such as Windows Defender Advanced Threat Protection (ATP) and Windows Defender Exploit Guard, which improve even further when you connect to Azure. With Kubernetes support for Windows containers, you can deploy modern containerized Windows apps on-premises or on Azure.

With Windows Server running on-premises, you can still leverage Azure services for backup, update management, monitoring, and security. To learn how to start using these capabilities, we recommend trying Windows Admin Center, a free, browser-based app included as part of Windows Server licenses that makes server management easier than ever.

Start innovating with your Windows Server workloads

Getting started with the latest release of Windows Server 2019 has never been easier.

  • Try the latest Windows Server 2019 on Azure and read the Windows Server Migration Guide
  • Learn about Extended Security Updates (FAQ)
  • Learn about the Azure Migration Program to transform server workloads.
  • Download Windows Admin Center for hybrid management

In addition, tune into our Azure Migration Virtual Event on February 26, 2020 and learn about best practices and common issues when moving Windows Server and SQL Server workloads to Azure.

Today also marks the end of support for Windows 7. To learn more, visit the Microsoft 365 blog.

  • 15 Jan, 2020
  • (0) Comments
  • By editor
  • Azure, Cloud Strategy, IT Pro, Virtual Machines

Advancing no-impact and low-impact maintenance technologies

“This post continues our reliability series kicked off by my July blog post highlighting several initiatives underway to keep improving platform availability, as part of our commitment to provide a trusted set of cloud services. Today I wanted to double-click on the investments we’ve made in no-impact and low-impact update technologies including hot patching, memory-preserving maintenance, and live migration. We’ve deployed dozens of security and reliability patches to host infrastructure in the past year, many of which were implemented with no customer impact or downtime. The post that follows was written by John Slack from our core operating systems team, who is the Program Manager for several of the update technologies discussed below.” – Mark Russinovich, CTO, Azure


This post was co-authored by Apurva Thanky, Cristina del Amo Casado, and Shantanu Srivastava from the engineering teams responsible for these technologies.

 

We regularly update Azure host infrastructure to improve the reliability, performance, and security of the platform. While the purposes of these ‘maintenance’ updates vary, they typically involve updating software components in the hosting environment or decommissioning hardware. If we go back five years, the only way to apply some of these updates was by fully rebooting the entire host. This approach took customer virtual machines (VMs) down for minutes at a time. Since then, we have invested in a variety of technologies to minimize customer impact when updating the fleet. Today, the vast majority of updates to the host operating system are deployed in place with absolute transparency and zero customer impact using hot patching. In infrequent cases in which the update cannot be hot patched, we typically utilize low-impact memory preserving update technologies to roll out the update.

Even with these technologies, there are still other rare cases in which we need to do more impactful maintenance (including evacuating faulty hardware or decommissioning old hardware). In such cases, we use a combination of live migration, in-VM notifications, and planned maintenance with customer controls.

Thanks to continued investments in this space, we are at a point where the vast majority of host maintenance activities do not impact the VMs hosted on the affected infrastructure. We’re writing this post to be transparent about the different techniques that we use to ensure that Azure updates are minimally impactful.

Plan A: Hot patching

Function-level “hot” patching provides the ability to make targeted changes to running code without incurring any downtime for customer VMs. It does this by redirecting all new invocations of a function on the host to an updated version of that function, so it is considered a ‘no impact’ update technology. Wherever possible, we use hot patching to apply host updates, completely avoiding any impact to the VMs running on that host. We have been using hot patching in Azure since 2017. Since then, we have worked to broaden the scope of what we can hot patch. As an example, we updated the host operating system to allow the hypervisor to be hot patched in 2018. Looking forward, we are exploring firmware hot patches. This is a place where the industry typically hasn’t focused. Firmware has always been viewed as ‘if you need to update it, reboot the server,’ but we know that makes for a terrible customer experience. We’ve been working with hardware manufacturers, and revisiting our own firmware, to make it hot patchable and incrementally updatable.
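Azure's host-level hot patching machinery isn't public, but as a toy analogy only, the Python sketch below shows the core idea of function-level redirection: new invocations flow through an indirection point, so swapping in a patched function takes effect immediately without stopping the running service.

```python
# Toy analogy only: redirect new invocations of a function to a patched
# version without stopping the "service". This is not how Azure hot patches
# its host OS; it only illustrates function-level redirection.

def handle_request(payload):           # original, "buggy" version
    return f"v1 handled {payload}"

def handle_request_patched(payload):   # fixed version
    return f"v2 handled {payload}"

class Service:
    def __init__(self):
        self.handler = handle_request  # indirection point

    def serve(self, payload):
        # Every call goes through the indirection point, so swapping the
        # target takes effect for all *new* invocations immediately.
        return self.handler(payload)

svc = Service()
print(svc.serve("req-1"))                  # served by v1
svc.handler = handle_request_patched       # "hot patch": no restart, no downtime
print(svc.serve("req-2"))                  # served by v2
```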

Some large host updates contain changes that cannot be applied using function-level hot patching. For those updates, we endeavor to use memory-preserving maintenance.

Plan B: Memory-preserving maintenance

Memory-preserving maintenance involves ‘pausing’ the guest VMs (while preserving their memory in RAM), updating the host server, then resuming the VMs and automatically synchronizing their clocks. We first used memory-preserving maintenance in Azure in 2018. Since then we have improved the technology in three important ways. First, we have developed less impactful variants of memory-preserving maintenance targeted for host components that can be serviced without a host reboot. Second, we have reduced the duration of the customer-experienced pause. Third, we have expanded the number of VM types that can be updated with memory-preserving maintenance. While we continue to work in this space, some variants of memory-preserving maintenance are still incompatible with some specialized VM offerings like M, N, or H series VMs for a variety of technical reasons.

In the rare cases where we need to perform more impactful maintenance (including host reboots or VM redeployment), customers are notified in advance and given the opportunity to perform the maintenance at a time suitable for their workloads.

Plan C: Self-service maintenance

Self-service maintenance involves providing customers and partners a window of time within which they can choose when to initiate impactful maintenance on their VM(s). This initial self-service phase typically lasts around a month and empowers organizations to perform the maintenance on their own schedules, so there is little or no disruption to users. At the end of this self-service window, a scheduled maintenance phase begins, during which Azure performs the maintenance automatically. Throughout both phases, customers get full visibility into which VMs have or have not been updated, either in Azure Service Health or by querying in PowerShell/CLI. Azure first offered self-service maintenance in 2018. We generally see that administrators take advantage of the self-service phase rather than wait for Azure to perform maintenance on their VMs automatically.
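As a rough Python equivalent of the PowerShell/CLI checks mentioned above, the sketch below reads a VM's instance view with the azure-mgmt-compute SDK and prints its self-service maintenance window, if one exists. The resource names are placeholders, and the exact field names may differ across SDK versions.

```python
# Rough sketch: check whether a VM currently has a self-service maintenance
# window, using the VM instance view from azure-mgmt-compute. Names are
# placeholders and field names may differ between SDK versions.
from azure.identity import DefaultAzureCredential
from azure.mgmt.compute import ComputeManagementClient

SUBSCRIPTION_ID = "<subscription-id>"  # placeholder
RESOURCE_GROUP = "my-rg"               # placeholder
VM_NAME = "my-vm"                      # placeholder

compute = ComputeManagementClient(DefaultAzureCredential(), SUBSCRIPTION_ID)
view = compute.virtual_machines.instance_view(RESOURCE_GROUP, VM_NAME)

status = view.maintenance_redeploy_status  # None when no maintenance is scheduled
if status is None:
    print(f"{VM_NAME}: no maintenance currently scheduled")
else:
    print(f"{VM_NAME}: self-service window "
          f"{status.pre_maintenance_window_start_time} -> "
          f"{status.pre_maintenance_window_end_time}, "
          f"customer-initiated maintenance allowed: "
          f"{status.is_customer_initiated_maintenance_allowed}")
```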

In addition, when the customer owns the full host machine, either through Azure Dedicated Hosts or Isolated virtual machines, we recently started to offer maintenance control over all non-zero-impact platform updates. This includes rebootless updates that cause only a few seconds of pause. It is useful for VMs running ultra-sensitive workloads that can’t sustain any interruption, even one lasting just a few seconds. Customers can choose when to apply these non-zero-impact updates within a 35-day rolling window. This feature is in public preview, and more information can be found in this dedicated blog post.

Sometimes in-place update technologies aren’t viable, like when a host shows signs of hardware degradation. In such cases, the best option is to initiate a move of the VM to another host, either through customer control via planned maintenance or through live migration.

Plan D: Live migration

Live migration involves moving a running customer VM from one “source” host to another “destination” host. Live migration starts by moving the VM’s local state (including RAM and local storage) from the source to the destination while the virtual machine is still running. Once most of the local state is moved, the guest VM experiences a short pause usually lasting five seconds or less. After that pause, the VM resumes running on the destination host. Azure first started using live migration for maintenance in 2018. Today, when Azure Machine Learning algorithms predict an impending hardware failure, live migration can be used to move guest VMs onto different hosts preemptively.
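As a conceptual illustration only (not Azure's implementation), the sketch below mimics the general pre-copy approach used by live migration systems: memory is copied in rounds while the VM keeps running and dirtying pages, and only the small remaining dirty set is transferred during the brief final pause.

```python
# Conceptual sketch of pre-copy live migration (not Azure's implementation):
# copy memory pages in rounds while the VM keeps running and dirtying pages,
# then pause briefly to transfer the small remaining dirty set.
import random

TOTAL_PAGES = 10_000
dirty = set(range(TOTAL_PAGES))   # initially, every page must be copied
round_no = 0

def pages_dirtied_during_copy(round_no: int) -> set:
    """Simulate the running VM dirtying fewer pages as copy rounds get shorter."""
    count = max(20, 2_000 // round_no)
    return {random.randrange(TOTAL_PAGES) for _ in range(count)}

while len(dirty) > 100 and round_no < 30:        # stop when the dirty set is small
    round_no += 1
    copied = len(dirty)                          # copy everything currently dirty
    dirty = pages_dirtied_during_copy(round_no)  # pages re-dirtied meanwhile
    print(f"round {round_no}: copied {copied} pages, {len(dirty)} still dirty")

# Final step: a short pause (the "five seconds or less" mentioned above) while
# the last dirty pages and CPU state move, then the VM resumes on the destination.
print(f"pausing VM to copy the final {len(dirty)} pages, then resuming on destination")
```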

Amongst other topics, planned maintenance and AI Operations were covered in Igal Figlin’s recent Ignite 2019 session “Building resilient applications in Azure.” Watch the recording here for additional context on these, and to learn more about how to take advantage of the various resilient services Azure provides to help you build applications that are inherently resilient.

The future of Azure maintenance 

In summary, the way in which Azure performs maintenance varies significantly depending on the type of updates being applied. Regardless of the specifics, Azure always approaches maintenance with a view towards ensuring the smallest possible impact to customer workloads. This post has outlined several of the technologies that we use to achieve this, and we are working diligently to continue improving the customer experience. As we look toward the future, we are investing heavily in machine learning-based insights and automation to maintain availability and reliability. Eventually, this “AI Operations” model will carry out preventative maintenance, initiate automated mitigations, and identify contributing factors and dependencies during incidents more effectively than our human engineers can. We look forward to sharing more on these topics as we continue to learn and evolve.

  • 4 Jan, 2020
  • (0) Comments
  • By editor
  • Azure, Management, Supportability, Virtual Machines

SAP HANA backup using Azure Backup is now generally available

Today, we are sharing the general availability of Microsoft Azure Backup’s solution for SAP HANA databases in the UK South region.

Azure Backup is Azure’s native backup solution, which is Backint certified by SAP. This offering aligns with Azure Backup’s mantra of zero-infrastructure backups, eliminating the need to deploy and manage backup infrastructure. You can now seamlessly back up and restore SAP HANA databases running on Microsoft Azure Virtual Machines (VMs), including M-series virtual machines, and leverage the enterprise management capabilities that Azure Backup provides.

Benefits

  • 15-minute Recovery Point Objective (RPO): Recover critical data to within the last 15 minutes.
  • One-click, point-in-time restores: Easily restore production data from SAP HANA databases to alternate servers. Chaining of backups and catalogs to perform restores is managed by Azure behind the scenes.
  • Long-term retention: For rigorous compliance and audit needs, retain your backups for years; beyond the configured retention duration, recovery points are pruned automatically by the built-in lifecycle management capability.
  • Backup management from Azure: Use Azure Backup’s management and monitoring capabilities for an improved management experience.

Watch this space for more updates on GA rollout to other regions. We are currently in preview in these Azure geos.

Getting started

  • We are working on making the SAP HANA backup experience even better. Find out which scenarios we support today.
  • See how to back up and restore SAP HANA databases.
  • Manage and monitor backed-up SAP HANA databases.
  • Need help? Read the troubleshooting documentation, or reach out to the Azure Backup forum for support.
  • Learn more about Azure Backup.
  • Follow us on Twitter @AzureBackup for more updates.
  • 3 Dec, 2019
  • (0) Comments
  • By editor
  • Azure, Cloud Strategy, Storage, Backup & Recovery, Virtual Machines

Faster and cheaper: SQL on Azure continues to outshine AWS

Over a million on-premises SQL Server databases have moved to Azure, representing a massive shift in where customers are collecting, storing, and analyzing their data.

Modernizing your databases provides the opportunity to transform your data architecture. SQL Server on Azure Virtual Machines allows you to maintain control over your database and operating system while still benefiting from cloud flexibility and scale. For some, this represents a step in the journey to a fully-managed database, while others choose this deployment option for compatibility with on-premises workloads such as SQL Server Reporting Services.

Whatever the reason, migrating SQL workloads to Azure Virtual Machines is a popular option. Azure customers benefit from our unique built-in security and manageability capabilities, which automate tasks like patching and backups. In addition to providing these unparalleled innovations, it is important to provide customers with the best price-performance possible. Once again, SQL Server on Azure Virtual Machines comes out on top.

Ready for the next stage of your modernization journey? Azure SQL Database is a fully-managed service that leads in price-performance for your mission-critical workloads and it provides limitless scale, built-in artificial intelligence, and industry-leading availability guarantees.

SQL Server on Azure leads in price-performance

GigaOm, an independent research firm, recently published a study comparing throughput performance between SQL Server on Azure Virtual Machines and SQL Server on AWS EC2. Azure emerged as the clear leader across both Windows and Linux for mission-critical workloads, up to 3.4 times faster and up to 87 percent less expensive than AWS EC2.1

GigaOm Report

The images above show performance and price-performance comparisons from the GigaOm report. The performance metric is throughput (transactions per second, tps); higher is better. The price-performance metric is three-year pricing divided by throughput (tps); lower is better.
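To make the metric concrete, here is a small worked example with purely illustrative numbers (not taken from the GigaOm report) showing how three-year cost divided by throughput is computed and compared.

```python
# Purely illustrative numbers (not from the GigaOm report) showing how the
# price-performance metric described above is computed: three-year cost
# divided by throughput in transactions per second (tps).
def price_performance(three_year_cost_usd: float, throughput_tps: float) -> float:
    """Lower is better: dollars of three-year cost per tps of throughput."""
    return three_year_cost_usd / throughput_tps

platform_a = price_performance(three_year_cost_usd=400_000, throughput_tps=1_200)
platform_b = price_performance(three_year_cost_usd=450_000, throughput_tps=600)

print(f"platform A: ${platform_a:,.0f} per tps")   # ~$333 per tps
print(f"platform B: ${platform_b:,.0f} per tps")   # ~$750 per tps
# The platform with the lower value delivers more throughput per dollar.
```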

With Azure Ultra Disk, GigaOm was able to achieve 80,000 input/output operations per second (IOPS) on a single disk, maxing out the virtual machine’s throughput limit and well exceeding the capabilities of AWS provisioned IOPS.2

A key reason why Azure price-performance is superior to AWS is Azure BlobCache, which provides free reads. Given that most online transaction processing (OLTP) workloads today come with a ten-to-one read-to-write ratio, this provides customers with significant savings.

Unmatched innovation from the team that brought SQL Server to the world

With a proven track record of over 25 years, the engineering team behind SQL Server continues to drive security and innovation to meet our customers’ changing needs. Whether running on-premises, in the cloud, or on the edge, the result is the most comprehensive, consistent, and secure solution for your data.

Azure SQL Virtual Machines offer unique built-in security and manageability, including automatic security patching, automated high availability, and database recovery to a specific point in time. Azure’s unique security capabilities include advanced data security for SQL Server on Azure Virtual Machines, which enables both vulnerability assessments and advanced threat protection. Customers self-installing SQL Server on virtual machines in the cloud can now register with our resource provider to enable this same functionality.

Get started with SQL in Azure today

Migrate from SQL Server on-premises to SQL Server 2019 in Azure Virtual Machines today. Get started with preconfigured Azure SQL Virtual Machine images on Red Hat Enterprise Linux, SUSE Linux Enterprise Server, Ubuntu, and Windows in minutes. Take advantage of the Azure Hybrid Benefit to reuse your existing on-premises Windows server and SQL Server licenses in Azure for significant savings.

When you add it up, SQL databases are simply best on Azure. Learn more about why SQL Server is best on Azure, and use $200 in Azure credits with a free account3 or Azure Dev or Test credits4 for additional cost savings.

 


1Price-performance claims based on data from a study commissioned by Microsoft and conducted by GigaOm in October 2019. The study compared price performance between SQL Server 2017 Enterprise Edition on Windows Server 2016 Datacenter edition in Azure E64s_v3 instance type with 4x P30 1TB Storage Pool data (Read-Only Cache) + 1x P20 0.5TB log (No Cache) and the SQL Server 2017 Enterprise Edition on Windows Server 2016 Datacenter edition in AWS EC2 r4.16xlarge instance type with 1x 4TB gp2 data + 1x 1TB gp2 log. Benchmark data is taken from a GigaOm Analytic Field Test derived from a recognized industry standard, TPC Benchmark™ E (TPC-E). The Field Test does not implement the full TPC-E benchmark and as such is not comparable to any published TPC-E benchmarks. The Field Test is based on a mixture of read-only and update intensive transactions that simulate activities found in complex OLTP application environments. Price-performance is calculated by GigaOm as the cost of running the cloud platform continuously for three years divided by transactions per second throughput. Prices are based on publicly available US pricing in West US for SQL Server on Azure Virtual Machines and Northern California for AWS EC2 as of October 2019. The pricing incorporates three-year reservations for Azure and AWS compute pricing, and Azure Hybrid Benefit for SQL Server and Azure Hybrid Benefit for Windows Server and License Mobility for SQL Server in AWS, excluding Software Assurance costs.  Price-performance results are based upon the configurations detailed in the GigaOm Analytic Field Test.  Actual results and prices may vary based on configuration and region.

2Claims based on data from a study commissioned by Microsoft and conducted by GigaOm in October 2019. The study compared price-performance between SQL Server 2017 Enterprise Edition on Windows Server 2016 Datacenter edition in Azure E64s_v3 instance type with 1x Ultra 1.5TB with 650MB per sec throughput and the SQL Server 2017 Enterprise Edition on Windows Server 2016 Datacenter edition in AWS EC2 r4.16xlarge instance type with 1x 1.5TB io1 provisioned log + data. Benchmark data is taken from a GigaOm Analytic Field Test derived from a recognized industry standard, TPC Benchmark™ E (TPC-E). The Field Test does not implement the full TPC-E benchmark and as such is not comparable to any published TPC-E benchmarks. The Field Test is based on a mixture of read-only and update intensive transactions that simulate activities found in complex OLTP application environments. Price-performance is calculated by GigaOm as the cost of running the cloud platform continuously for three years divided by transactions per second throughput. Prices are based on publicly available US pricing in north Europe for SQL Server on Azure Virtual Machines and Ireland for AWS EC2 as of October 2019. Price-performance results are based upon the configurations detailed in the GigaOm Analytic Field Test.  Actual results and prices may vary based on configuration and region.

3Additional information about $200 Azure free account available at https://azure.microsoft.com/en-us/free/.

4Dev or Test Azure credits and pricing available for paid Visual Studio subscribers only.

  • 3 Dec, 2019
  • (0) Comments
  • By editor
  • Azure, Database, Developer, Virtual Machines

Azure Backup support for SQL Server 2019 and Restore as files

As SQL Server 2019 continues to push the boundaries of availability, performance, and data intelligence, a centrally managed, enterprise-scale backup solution is imperative to ensure the protection of all that data. This is especially true if you are running the SQL Server in the cloud to leverage the benefits of dynamic scale and don’t want to continue using the legacy backup methods that are tedious, infrastructure-heavy, and difficult to scale.

We are excited to share native backup support for SQL Server 2019 running in Azure Virtual Machines. This is a key addition to the general availability of Azure Backup for SQL Server in Azure Virtual Machines, announced earlier this year. Azure Backup is a zero-infrastructure solution that protects standalone SQL Server and SQL Server Always On configurations in Azure Virtual Machines without the need to deploy and manage any backup infrastructure. While it offers long-term retention and central monitoring capabilities to help IT admins govern and meet their compliance requirements, it lets SQL admins continue to exercise the power of self-service backup and restore for operational recoveries.

In addition to this, we are also sharing Azure Backup general availability for:

  • SQL Server 2008 and 2008 R2 migrated to Azure and running in Azure Virtual Machines.
  • SQL Server running on Windows Server 2019.

Backup SQL server in Azure VM

Restore as files:

Adding to the list of enhancements is the key capability of Restore as Files: you can now restore anywhere by recovering the backed-up data as .bak files. Move these backup files across subscriptions, regions, or even to on-premises SQL Servers and trigger a database restore wherever you want. Besides aiding cross-subscription and cross-region restore scenarios, this feature helps users stay compliant by giving them greater control over storing and recovering backup data to any destination of their choice.

Restore options for SQL server

 

Getting started:

Under the Restore operation, you will see a newly introduced option, Restore as files. Specify the destination server (this server should be a SQL Server virtual machine registered to the vault) and a path on that server. The service will write all the .bak files for the recovery point you have chosen to this path. Specifying a network share path, or the path of a mounted Azure Files share, as the destination makes it easier for other machines in the same network, or with the same Azure Files share mounted, to access these files.

Once the restore operation is completed, you can move these files to any machine across subscriptions or locations and restore them as a database using SQL Server Management Studio. Learn more.
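As an illustrative alternative to running the restore interactively in SQL Server Management Studio, the sketch below issues the standard T-SQL RESTORE from Python via pyodbc. The server name, database name, logical file names, and file paths are all placeholders.

```python
# Illustrative sketch only: the post describes restoring the dumped .bak files
# with SQL Server Management Studio; the same standard T-SQL RESTORE can also
# be run from Python via pyodbc. Server, database, logical file names, and
# paths below are placeholders.
import pyodbc

conn = pyodbc.connect(
    "DRIVER={ODBC Driver 17 for SQL Server};"
    "SERVER=target-sql-vm;DATABASE=master;Trusted_Connection=yes;",
    autocommit=True,  # RESTORE cannot run inside a user transaction
)

restore_sql = r"""
RESTORE DATABASE [SalesDB]
FROM DISK = N'\\fileshare\restores\SalesDB_full.bak'
WITH MOVE 'SalesDB'     TO N'F:\data\SalesDB.mdf',
     MOVE 'SalesDB_log' TO N'G:\log\SalesDB_log.ldf',
     RECOVERY;
"""

cursor = conn.cursor()
cursor.execute(restore_sql)
while cursor.nextset():   # drain informational messages emitted by the restore
    pass
print("Restore completed")
```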

Restore as files

Additional resources

  • Watch the getting started video.
  • Want more details about this feature? Review the Azure Backup for SQL Server documentation.
  • Get the Azure Backup pricing details for this feature.
  • 22 Nov, 2019
  • (0) Comments
  • By editor
  • Azure, Storage, Backup & Recovery, Virtual Machines

Announcing the general availability of the new Azure HPC Cache service

If data-access challenges have been keeping you from running high-performance computing (HPC) jobs in Azure, we’ve got great news to report! The now-available Microsoft Azure HPC Cache service lets you run your most demanding workloads in Azure without the time and cost of rewriting applications and while storing data where you want to—in Azure or on your on-premises storage. By minimizing latency between compute and storage, the HPC Cache service seamlessly delivers the high-speed data access required to run your HPC applications in Azure.

Use Azure to expand analytic capacity—without worrying about data access

Most HPC teams recognize the potential for cloud bursting to expand analytic capacity. While many organizations would benefit from the capacity and scale advantages of running compute jobs in the cloud, users have been held back by the size of their datasets and the complexity of providing access to those datasets, typically stored on long-deployed network-attached storage (NAS) assets. These NAS environments often hold petabytes of data collected over a long period of time and represent significant infrastructure investment.

Here’s where the HPC Cache service can help. Think of the service as an edge cache that provides low-latency access to POSIX file data sourced from one or more locations, including on-premises NAS and data archived to Azure Blob storage. The HPC Cache makes it easy to use Azure to increase analytic throughput, even as the size and scope of your actionable data expands.

Keep up with the expanding size and scope of actionable data

The rate of new data acquisition in certain industries such as life sciences continues to drive up the size and scope of actionable data. Actionable data, in this case, could be datasets that require post-collection analysis and interpretation that in turn drive upstream activity. A sequenced genome can approach hundreds of gigabytes, for example. As the rate of sequencing activity increases and becomes more parallel, the amount of data to store and interpret also increases—and your infrastructure has to keep up. Your power to collect, process, and interpret actionable data—your analytic capacity—directly impacts your organization’s ability to meet the needs of customers and to take advantage of new business opportunities.

Some organizations address expanding analytic throughput requirements by continuing to deploy a more robust on-premises HPC environment with high-speed networking and performant storage. But for many companies, expanding on-premises environments presents increasingly daunting and costly challenges. For example, how can you accurately forecast and more economically address new capacity requirements? How do you best juggle equipment lifecycles with bursts in demand? How can you ensure that storage keeps up (in terms of latency and throughput) with compute demands? And how can you manage all of it with limited budget and staffing resources?

Azure services can help you more easily and cost-effectively expand your analytic throughput beyond the capacity of existing HPC infrastructure. You can use tools like Azure CycleCloud and Azure Batch to orchestrate and schedule compute jobs on Azure virtual machines (VMs). More effectively manage cost and scale by using low-priority VMs, as well as Azure Virtual Machine Scale Sets. Use Azure’s latest H- and N-series Virtual Machines to meet performance requirements for your most complex workloads.

So how do you start? It’s straightforward. Connect your network to Azure via ExpressRoute, determine which VMs you will use, and coordinate processes using CycleCloud or Batch; voila, your burstable HPC environment is ready to go. All you need to do is feed it data. OK, that’s the sticking point. This is where you need the HPC Cache service.

Use HPC Cache to ensure fast, consistent data access

Most organizations recognize the benefits of using cloud: a burstable HPC environment can give you more analytic capacity without forcing new capital investments. And Azure offers additional pluses, letting you take advantage of your current schedulers and other toolsets to ensure deployment consistency with your on-premises environment.

But here’s the catch when it comes to data. Your libraries, applications, and location of data may require the same consistency. In some circumstances, a local analytic pipeline may rely on POSIX paths that must be the same whether running in Azure or locally. Data may be linked between directories, and those links may need to be deployed in the same way in the cloud. The data itself may reside in multiple locations and must be aggregated. Above all else, the latency of access must be consistent with what can be realized in the local HPC environment.

To understand how the HPC Cache works to address these requirements, consider it an edge cache that provides low-latency access to POSIX file data sourced from one or more locations. For example, a local environment may contain a large HPC cluster connected to a commercial NAS solution. HPC Cache enables access from that NAS solution to Azure Virtual Machines, containers, or machine learning routines operating across a WAN link. The service accomplishes this by caching client requests (including from the virtual machines), and ensuring that subsequent accesses of that data are serviced by the cache rather than by re-accessing the on-premises NAS environment. This lets you run your HPC jobs at a similar performance level as you could in your own data center. HPC Cache also lets you build a namespace consisting of data located in multiple exports across multiple sources while displaying a single directory structure to client machines.
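To illustrate the "serve subsequent accesses from the cache" behavior described above, here is a minimal read-through cache sketch in Python. It is a conceptual toy, not HPC Cache's implementation: the first read pays the WAN round trip to the origin NAS, and later reads of the same path are served locally.

```python
# Minimal read-through cache sketch (not HPC Cache's implementation): the first
# read of a path goes to the slow "origin" (the on-premises NAS across the WAN);
# subsequent reads of the same path are served from the local cache.
import time

class ReadThroughCache:
    def __init__(self, fetch_from_origin):
        self._fetch = fetch_from_origin
        self._store = {}              # path -> bytes

    def read(self, path: str) -> bytes:
        if path not in self._store:   # cache miss: go to the origin once
            self._store[path] = self._fetch(path)
        return self._store[path]      # cache hit on every later access

def slow_nas_read(path: str) -> bytes:
    time.sleep(0.5)                   # stand-in for WAN latency to the NAS
    return f"contents of {path}".encode()

cache = ReadThroughCache(slow_nas_read)

for attempt in range(2):
    start = time.perf_counter()
    cache.read("/genomes/reference/GRCh38.fa")   # hypothetical dataset path
    print(f"read {attempt + 1}: {time.perf_counter() - start:.3f}s")
# read 1 pays the WAN round trip; read 2 is served from the cache.
```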

HPC Cache provides a Blob-backed cache (we call it Blob-as-POSIX) in Azure as well, facilitating migration of file-based pipelines without requiring that you rewrite applications. For example, a genetic research team can load reference genome data into the Blob environment to further optimize the performance of secondary-analysis workflows. This helps mitigate any latency concerns when you launch new jobs that rely on a static set of reference libraries or tools.

Diagram: Azure HPC Cache in a system architecture that includes on-premises storage access, Azure Blob storage, and computing in an Azure compute cluster.
Azure HPC Cache Architecture

HPC Cache Benefits

Caching throughput to match workload requirements

HPC Cache offers three SKUs: up to 2 gigabytes per second (GB/s), up to 4 GB/s, and up to 8 GB/s throughput. Each of these SKUs can service requests from tens to thousands of VMs, containers, and more. Furthermore, you choose the size of your cache disks to control your costs while ensuring the right capacity is available for caching.

Data bursting from your datacenter

HPC Cache fetches data from your NAS, wherever it is. Run your HPC workload today and figure out your data storage policies over the longer term.

High-availability connectivity

HPC Cache provides high-availability (HA) connectivity to clients, a key requirement for running compute jobs at larger scales.

Aggregated namespace

The HPC Cache aggregated namespace functionality lets you build a namespace out of various sources of data. This abstraction of sources makes it possible to run multiple HPC Cache environments with a consistent view of data.

Lower-cost storage, full POSIX compliance with Blob-as-POSIX

HPC Cache supports Blob-based, fully POSIX-compliant storage. HPC Cache, using the Blob-as-POSIX format, maintains full POSIX support including hard links. If you need this level of compliance, you’ll be able to get full POSIX at Blob price points.

Start here

The Azure HPC Cache service is available today. For the very best results, contact your Microsoft team or related partners; they’ll help you build a comprehensive architecture that optimally meets your specific business objectives and desired outcomes.

Our experts will be attending SC19 in Denver, Colorado, the conference on high-performance computing, ready and eager to help you accelerate your file-based workloads in Azure!

  • 12 Nov, 2019
  • (0) Comments
  • By editor
  • Announcements, Azure, Virtual Machines
