Identifying and tracking toil using SRE principles

One of the key measures that Google site reliability engineers (SREs) use to verify our effectiveness is how we spend our time day-to-day. We want ample time available for long-term engineering project work, but we’re also responsible for the continued operation of Google’s services, which sometimes requires doing some manual work. We aim for less than half of our time to be spent on what we call “toil.” So what is toil, and how do we stop it from interfering with our engineering velocity? We’ll look at these questions in this post.

First, let’s define toil, from chapter 5 of the Site Reliability Engineering book:

“Toil is the kind of work that tends to be manual, repetitive, automatable, tactical, devoid of enduring value, and that scales linearly as a service grows.”

Some examples of toil may include:

  • Handling quota requests

  • Applying database schema changes

  • Reviewing non-critical monitoring alerts

  • Copying and pasting commands from a playbook

A common thread in all of these examples is that they do not require an engineer’s human judgment. The work is easy but it’s not very rewarding, and it interrupts us from making progress on engineering work to scale services and launch features.

Here’s how to take your team through the process of identifying, measuring, and eliminating toil.

Identifying toil

The hardest part of tackling toil is identifying it. If you aren’t explicitly tracking it, there’s probably a lot of work happening on your team that you aren’t aware of. Toil often arrives as a request texted to you or an email sent to an individual, who dutifully completes the work without anyone else noticing. We heard a great example of this from CRE Jamie Wilkinson in Sydney, Australia, who shared this story of his experience as an SRE on a team managing one of Google’s datastore services.

Jamie’s SRE team was split between Sydney and Mountain View, CA, and there was a big disconnect between the achievements of the two sites. Sydney was frustrated that the project work they relied upon—and the Mountain View team committed to—never got done. One of the engineers from Sydney visited the team in Mountain View, and discovered they were being interrupted frequently throughout the day, handling walk-ups and IMs from the Mountain View-based developers. 

Despite regular meetings to discuss on-call incidents and project work, and complaints that the Mountain View side felt overworked, the Sydney team couldn’t help because they didn’t know the extent of these requests. So the team decided to require all the requests to be submitted as bugs. The Mountain View team had been trained to leap in and help with every customer’s emergency, so it took three months just to make the cultural change. Once that happened, they could establish a rotation of people across both sites to distribute load, see stats on how much work there was and how long it took, and identify repetitive issues that needed fixing.

“The one takeaway from this was that when you start measuring the right thing, you can show people what is happening, and then they agree with you,” Jamie said. “Showing everyone on the team the incoming vs. outgoing ticket rates was a watershed moment.”

When tracking your work this way, it helps to gather some lightweight metadata in a tracking system of your choice, such as:

  • What type of work was it (quota changes, push release to production, ACL update, etc.)?

  • What was the degree of difficulty: Easy (<1 hour); Medium (hours); Hard (days) (based on human hands-on time, not elapsed time)?

  • Who did the work?

This initial data lets you measure the impact of your toil. Remember, however, that the emphasis is on lightweight in this step. Extreme precision has little value here; it actually places more burden on your team if they need to capture many details, and makes them feel micromanaged.
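As a rough illustration, the metadata above fits in a very small record. The sketch below is only one way you might model it, not a description of any tracking system used at Google; the field names, the sample entries, and the `tally_by_type` helper are all hypothetical.

```python
from collections import Counter
from dataclasses import dataclass

# Coarse difficulty buckets from the list above (hands-on time, not elapsed time).
DIFFICULTIES = ("easy", "medium", "hard")


@dataclass(frozen=True)
class ToilEntry:
    work_type: str   # e.g. "quota change", "ACL update"
    difficulty: str  # one of DIFFICULTIES
    owner: str       # who did the work

    def __post_init__(self):
        if self.difficulty not in DIFFICULTIES:
            raise ValueError(f"unknown difficulty: {self.difficulty!r}")


def tally_by_type(entries):
    """Count how many pieces of work were logged for each work type."""
    return Counter(entry.work_type for entry in entries)


# Hypothetical sample data, for illustration only.
log = [
    ToilEntry("quota change", "easy", "alice"),
    ToilEntry("ACL update", "easy", "bob"),
    ToilEntry("quota change", "medium", "alice"),
]
counts = tally_by_type(log)
```

Three fields and three buckets are deliberately all there is: anything more detailed starts to work against the "lightweight" goal.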

Another way to successfully identify toil is to survey your team. Another Google CRE, Vivek Rau, would regularly survey Google’s entire SRE organization. Because the size and shape of toil varied between different SRE teams, at a company-wide level ticket metrics were harder to analyze. He surveyed SREs every three months to identify common issues across Google that were eating away at our time for project work. Try this sample toil survey to start:

  • Averaging over the past four weeks, approximately what fraction of your time did you spend on toil?  

    • Scale 0-100%

  • How happy are you with the quantity of time you spend on toil? 

    • Not happy / OK / No problem at all

  • What are your top three sources of toil?

    • On-call Response / Interrupts / Pushes / Capacity / Other / etc.

  • Do you have a long-term engineering project in your quarterly objectives?

    • Yes / No

  • If so, averaging over the past four weeks, approximately what fraction of your time did you spend on your engineering project? (estimate)

    • Scale 0-100%

  • In your team, is there toil you can automate away but you don’t do so, because that very toil takes time away from long-term engineering work? If so, please describe below.

    • Open response

Measuring toil

Once you’ve identified the work being done, how do you determine if it’s too much? It’s pretty simple: Regularly (we find monthly or quarterly to be a good interval), compute an estimate of how much time is being spent on various types of work. Look for patterns or trends in your tickets, surveys, and on-call incident response, and prioritize based on the aggregate human time spent. Within Google SRE, we aim to keep toil below 50% of each SRE’s time, to preserve the other 50% for engineering project work. If the estimates show that we have exceeded the 50% toil threshold, we plan work explicitly with the goal of reducing that number and getting the work balance back into a healthy state. 
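A minimal sketch of that periodic estimate might look like the following. The hours assigned to each difficulty bucket, the sample tickets, and the 160-hour month are illustrative assumptions, not figures from Google; only the 50% threshold comes from the text above.

```python
# Rough hands-on hours per difficulty bucket. These values are
# illustrative assumptions, not figures from the post.
HOURS_PER_DIFFICULTY = {"easy": 0.5, "medium": 4.0, "hard": 16.0}
TOIL_THRESHOLD = 0.5  # the 50% target described above


def toil_hours_by_type(tickets):
    """Sum estimated hands-on hours per work type.

    `tickets` is an iterable of (work_type, difficulty) pairs.
    """
    totals = {}
    for work_type, difficulty in tickets:
        totals[work_type] = totals.get(work_type, 0.0) + HOURS_PER_DIFFICULTY[difficulty]
    return totals


def toil_fraction(tickets, working_hours):
    """Estimate the share of available working hours that went to toil."""
    if working_hours <= 0:
        raise ValueError("working_hours must be positive")
    return sum(toil_hours_by_type(tickets).values()) / working_hours


# Hypothetical month for one engineer with 160 working hours.
tickets = (
    [("quota change", "easy")] * 40
    + [("schema change", "medium")] * 12
    + [("push release", "hard")] * 2
)
hours = toil_hours_by_type(tickets)
fraction = toil_fraction(tickets, working_hours=160)
over_budget = fraction > TOIL_THRESHOLD
# Work types ordered by aggregate human time, largest first.
priorities = sorted(hours, key=hours.get, reverse=True)
```

With this sample data the estimate comes to 100 of 160 hours, which is over the threshold, and schema changes top the priority list even though quota changes are far more numerous. That is the point of prioritizing by aggregate human time rather than by ticket count.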

Eliminating toil

Now that you’ve identified and measured your toil, it’s time to minimize it. As we’ve hinted at already, the solution here is typically to automate the work. This is not always straightforward, however, and the aim shouldn’t be to eliminate all toil.

Automating tasks that you rarely do (for example, deploying your service at a new location) can be tricky, because the procedure you used or assumptions you made while automating may change by the time you do that same task again. If a large amount of your time is spent on this kind of toil, consider how you might change the underlying architecture to smooth this variability. Do you use an infrastructure as code (IaC) solution for managing your systems? Can the procedure be executed multiple times without negative side effects? Is there a test to verify the procedure?
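One way to read the last two questions is as a call for idempotent automation: describe the desired end state, act only on the difference, and keep a test that proves a second run changes nothing. The sketch below assumes a hypothetical in-memory `state` dict standing in for a real system, and `ensure_quota` is an illustrative name, not an existing tool.

```python
def ensure_quota(state, project, desired_limit):
    """Bring a project's quota to `desired_limit`; return True if a change was made.

    Running this twice with the same arguments leaves `state` unchanged the
    second time, which is what makes the procedure safe to retry.
    """
    if state.get(project) == desired_limit:
        return False  # already in the desired state, nothing to do
    state[project] = desired_limit
    return True


def test_ensure_quota_is_idempotent():
    """A small check that verifies the procedure can be repeated safely."""
    state = {}
    assert ensure_quota(state, "demo-project", 100) is True
    snapshot = dict(state)
    assert ensure_quota(state, "demo-project", 100) is False
    assert state == snapshot


test_ensure_quota_is_idempotent()
```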

Treat your automation like any other production system. If you have an SLO practice, use some of your error budget to automate away toil. Complete postmortems when your automation fails, and fix it as you would any user-facing system. You want your automation available to you in any situation, including production incidents, to free humans to do the work they’re good at.

If you’ve gotten your users familiar with opening tickets to request help, use your ticketing system as the API for automation, making the work fully self-service.

Also, because toil isn’t just technical, but also cultural, make sure the only people doing toil work are the people explicitly assigned to it. This might be your oncaller, or a rotation of engineers scheduled to deal with “tickets” or “interrupts.” This preserves the rest of the team’s time to work on projects and reinforces a culture of surfacing and accounting for toil.

A note on complexity vs. toil

Sometimes we see engineers and leadership mistaking technical or organizational complexity for toil. The effects on humans are similar, but the work fails to meet the definition at the start of this post. Where toil is work that is basically of no enduring value, complexity often makes valuable work feel onerous.

Google SRE Laura Beegle has been investigating this within Google, and suggests a different approach to addressing complexity: While there’s intense satisfaction in designing a simple, robust system, it inevitably becomes somewhat more complex, simply by existing in a distributed environment, used by a diverse range of users, or growing to serve more functionality over time. We want our systems to evolve over time, while also reducing what we call “experienced complexity”—the negative feelings based on mismatched expectations about how long or difficult a task is to complete. Quantifying the subjective experience of your systems is known by another name: user experience. The users in this case are SREs. The observable outcome of well-managed system complexity is a better user experience.

Addressing the user experience of supporting your systems is engineering work of enduring value, and therefore not the same as toil. If you find that complexity is threatening your system’s reliability, take action. By following a blameless postmortem process, or surveying your team, you can identify situations where complexity resulted in unexpected results or a longer-than-expected recovery time.

Some manual care and feeding of the systems we build is inevitably required, but the number of humans needed shouldn’t grow linearly with the number of VMs, users, or requests. As engineers, we know the power of using computers to complete routine tasks, but we often find ourselves doing that work by hand anyway. By identifying, measuring, and reducing toil, we can reduce operating costs and ensure time to focus on the difficult and interesting projects instead.

For more about SRE, learn about the fundamentals or explore the full SRE book.

Bots in Hangouts Chat: How they can help developers change the conversation

With chatbots growing in popularity, more and more teams are relying on them in the workplace to help them communicate and collaborate faster. As teams that use G Suite increasingly adopt Hangouts Chat as a primary communication method, bots represent an opportunity for developers to deliver unique and impactful experiences directly where users are engaged. In this blog post, I’ll provide a brief overview of what bots in Chat are, and what you can do with them.

What are chatbots

Chatbots are computer programs that create human-like interaction experiences, primarily through text or voice commands. You might have encountered one while using chat support on a website. And you likely ‘talk’ to bots via your favorite smart devices at home or on the go, e.g., Google Assistant, Siri or Alexa.

But why should this trend matter? For both users and developers, bots present a compelling opportunity to significantly improve the way we collectively work.

For users, it’s the opportunity to blend conversations with actions, to quickly connect with colleagues to accelerate teamwork, to get more done without switching between work streams and tools, and to stay informed right within a core communication tool.

For developers, it’s an opportunity to build solutions that surface where a growing number of users work more and more, to reduce friction between collaborating and performing tasks, to connect a myriad of back ends and processes directly within a familiar and increasingly popular front end user experience, and to capture user mindshare by connecting to their own apps and services.

Changing the way we work

Messaging apps aren’t new; they have been in the consumer space for many years. But messaging apps, aka chat apps, haven’t always been embraced by the enterprise. This is partly due to the fact that enterprise needs—for example, management controls, security, regulation, data protection, and governance—often supersede what many consumer-centric chat apps offer. It’s also partly because email has been so dominant as the de facto form of collaboration in the enterprise space, resulting in less of a need for chat adoption.

However, that’s changing along with users’ expectations. While email may remain the main source of “record” for work communication, chat has emerged to serve naturally conversational, typically informal, expectantly instantaneous and dynamic team-based communications. As a result, chatbots in Hangouts Chat are becoming the new frontier for next-generation user interfaces for the better-connected enterprise.

Bots in the enterprise

So what does a chatbot in the enterprise look like? While the use cases are virtually unlimited (it’s essentially just a new UI paradigm after all), there are a few common high-level categories where bots can deliver broad impact for enterprise users.

Enhancing teamwork: If Hangouts Chat is aimed at making communications more fluid, bots in Chat can ratchet that up by making teams even more productive. A chatbot can speed up actions teams need to perform without switching context. For example, imagine your team chatting about a project you are working on, and collectively you want to review a list of outstanding action items or update completed tasks in real time. Why not just do that in-line conversationally as the discussion flows in chat? Well, you can! Users can leverage bots in Hangouts Chat that handle common actions like managing task lists, pulling status reports or updating to-do item ownership, all without switching context to another service or tool. The ability to take action as a team, or even individually (such as requesting a vacation day or updating your HR info from Chat), is a great way to enhance the productivity of everyone across an organization.

Delivering information: Chat is great for real-time communications between individuals and teams, but it’s also a great way to deliver pertinent information to users without requiring that they actively request it or seek it out. Bots can deliver asynchronous notifications that keep users up to date on topics that matter to them personally, in a format that is easily accessible and more likely to be consumed (rather than, say, getting lost in email or overlooked on a portal). With users spending more time in Chat, the chance that your information reaches them increases, making asynchronous bots really interesting. Here are just a few ideas that may interest users:

  • Notification of changes to customer opportunities for sales professionals (e.g. Deal Closed! Or Deal Lost!)

  • Warnings about inventory levels or shipping status of products

  • Push updates on milestones for team projects

  • Automated reminders on HR deadlines or policy changes

  • Announcement of new team members (great way to inspire interactive introductions to replace the obligatory, outdated email pushed from managers)

  • Immediate notice of fluctuations in your company’s stock price, or a regularly scheduled update at the close of the trading day

The list is essentially endless and of course varies based on organizational needs. But the common thing about asynchronous bots is that they are easy for users to discover, to opt-in or opt-out, and the entire process can be automated to create a timely, noninvasive, highly efficient channel for connecting users to relevant information.
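As a rough sketch of the simplest kind of asynchronous notification, the snippet below posts a plain text message to a Chat room through an incoming webhook. It assumes a webhook has already been configured for the room and that a JSON body with a single `text` field is all you need; the function names are illustrative, and the webhook URL is a placeholder you would supply yourself.

```python
import json
import urllib.request


def build_notification(text):
    """Encode a simple text message as a JSON request body."""
    return json.dumps({"text": text}).encode("utf-8")


def send_notification(webhook_url, text):
    """POST the message to an incoming webhook URL and return the HTTP status."""
    request = urllib.request.Request(
        webhook_url,
        data=build_notification(text),
        headers={"Content-Type": "application/json; charset=UTF-8"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return response.status


body = build_notification("Deal Closed!")
# To actually send, pass the webhook URL configured for your Chat room:
# send_notification(WEBHOOK_URL, "Deal Closed!")
```

A one-way webhook like this covers notifications only. A bot that responds to users conversationally needs a full bot implementation, which is beyond this sketch.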

Connecting systems: If you are a developer or an enterprise technology practitioner, chatbots in Hangouts Chat are ideal for connecting users with your applications. Whether you simply want to enable users to directly query information from your app, update data, or kick off actions and workflows that drive your app(s), bots are a great way to connect users with your app where they work. Users can reach your apps in context individually or as a team, with a simplified experience from a unified logon to enhanced interaction with natural language commands driven by Dialogflow. Bots provide a new entry point into your app(s) that will likely increase user reach, engagement and satisfaction with frankly little effort (in most cases; bots aren’t overly complex).

A number of third-party vendors have already built Hangouts Chat bots doing just this, and they are worth taking a look at for your organization to use or get ideas from. Check them out here.

Looking ahead

This post has been about profiling one of the best kept ‘secrets’ of the G Suite platform. Chat bots aren’t brand new (they were actually available early in the launch of Hangouts Chat itself, around mid 2018), but they are totally worth exploring if you haven’t yet. I personally discovered them first as an ‘excited user’, and now I want to encourage developers to build their own bots. My goal here was to quickly introduce you to the concept of bots; the details are way beyond this post, so check back here as I cover more about what Hangouts Chat bots can do and how to build them in future posts.

To learn more and get started building your own bots for Google Chat, visit

ICYMI: A monthly roundup of stuff developers want to know

Posted by Natalie Dao, Google Developers Social Team

Happy New Year … is something we won’t say again until next January, promise. Still. There’s a lot to be thrilled about in 2020. Check out our Top Ten list of videos, blogs, and events to find out why we’re already excited for next month, the month after that, and beyond. It’s been a bit of a slow start, but one thing is for sure: 2020 is going to rule. Let’s get into it.

1. Game On 🎮

Gamers rejoice! The annual Indie Games Festival from Google Play will hit Europe, Japan, and South Korea on April 25th. Whether you’re an indie game developer or a devoted gamer, this is your chance to showcase your unique skills. Submissions close on March 2nd, so get to it!

Learn more about it on the official website.

2. It’s A Dirty Job 🧹

Finally, a vacuum cleaner that doesn’t suck! Wait. Ecovacs Robotics manufactures robotic vacuum cleaners powered by a TensorFlow Lite model to help detect and avoid obstacles.

Read the blog to learn more.

3. Take The DSC Challenge 🏆

Developer Student Clubs from 800+ universities across the globe will use technology to solve local problems within their communities. 10 winning teams (up to 4 members) will be chosen and receive prizes including a curated experience with Googlers to celebrate! Submissions will be accepted between March 15-30, 2020.

Up for the challenge? Learn how you can enter here.

4. You Gotta Check Out This New Podcast 🎙

Sound up! The Assistant on Air podcast from Actions on Google is now streaming. Tune in to listen to your favorite couch-friendly series, where guests chat about building for the Google Assistant.

Get to listening on Google Podcasts, Google Play Music, Apple, and Spotify!

5. Flutter/Dart Do Design And They Do It Well 🎨

Photo courtesy of Fast Company

Look Ma, we made it! Our favorite UI toolkit and the programming language that powers it have been listed in Fast Company’s most important design ideas of the decade. Flutter and Dart allow developers to build beautiful experiences that can be seamlessly deployed across all platforms.

Check out the star-studded lineup on Fast Company.

6. Summit Season Starts Now 🙌

The time is now to register for the TensorFlow Dev Summit! Join the machine learning community in Sunnyvale, CA this March for two full days of highly technical talks, demos, sessions, and networking with the TensorFlow team.

See how you can witness that ML magic on the official event website.

7. Registration Open For Google Cloud Next ’20 ⏩

SO. MANY. EVENTS. Registration for Google Cloud Next ’20 has been announced! Taking place in the charming city of San Francisco, this epic conference brings together some of the brightest minds in tech for three days of networking, learning, and collaboration. Get the scoop on all the latest products, learn how leading brands use Cloud to solve challenges, immerse yourself in exhibits, and more.

Get your registration locked down on the official event website.

8. New Coral Products For 2020 👍

Coral is a platform of hardware components and software tools that makes prototyping and scaling local AI products easier. Launched last year, this portfolio of products has been used for many applications across different industries ranging from healthcare to agriculture. To kick off the new year, Coral has released new products to expand the possibilities of local AI!

Get all the details on the blog here.

9. SERIES SPOTLIGHT: Get To Know Cloud Firestore 🔥

In this episode of Get to Know Cloud Firestore from Firebase, Todd Kerpelman tackles Cloud Functions and five interesting scenarios you might come across when implementing them in your app.

Watch the full video here and don’t forget to subscribe to the Firebase YT channel.

10. Countdown to IO 🕛

#GoogleIO is returning to Mountain View in May! To announce the event, Google launched a collaborative game where users worked together to repair an intergalactic satellite network. Although the date has been decoded by savvy internet detectives, you can still embark on the mission for fun!

More event details are coming soon on the official event website. See you at Shoreline.

Stay connected!

Follow and subscribe to get all the latest news and updates from the Google Developer ecosystem.


Hyperledger Fabric on Azure Kubernetes Service Marketplace template

Customers exploring blockchain for their applications and solutions typically start with a prototype or proof of concept effort with a blockchain technology before they get to build, pilot, and production rollout. During the latter stages, apart from the ease of deployment, there is an expectation of flexibility in the configuration in terms of the number of blockchain members in the consortium, size and number of nodes and ease in management post-deployment.

We are sharing the release of a new Hyperledger Fabric on Azure Kubernetes Service marketplace template in preview. Any user with minimal knowledge of Azure or Hyperledger Fabric can now set up a blockchain consortium on Azure using this solution template by providing a few basic input parameters.

This template helps customers deploy a Hyperledger Fabric (HLF) network on Azure Kubernetes Service (AKS) clusters in a modular manner that allows the much-needed customization of the Microsoft Azure Virtual Machine series, number of nodes, fault tolerance, etc. Azure Kubernetes Service provides enterprise-grade security and governance, making the deployment and management of containerized applications easy. Customers anticipate leveraging the native Kubernetes tools for the management plane operations of the infrastructure and calling Hyperledger Fabric APIs or the Hyperledger Fabric client software development kit for the data plane workflows.

The template has various configurable parameters that make it suitable for production-grade deployment of Hyperledger Fabric network components.

Top features of Hyperledger Fabric on Azure Kubernetes Service template are:

  • Supports deployment of Hyperledger Fabric version 1.4.4 (LTS).
  • Supports deployment of orderer organization and peer nodes with the option to configure the number of nodes.
  • Supports Fabric Certificate Authority (CA) with self-signed certificates by default, and an option to upload organization-specific root certificates to initialize the Fabric CA.
  • Supports running LevelDB and CouchDB as the world state database on peer nodes.
  • Ordering service runs a highly available Raft-based consensus algorithm, with an option to choose 3, 5, or 7 nodes.
  • Supports configuring the number and size of the nodes in the Azure Kubernetes Service clusters.
  • Public IP exposed for each AKS cluster deployed, for networking with other organizations.
  • Includes sample scripts that help you jump-start building your network, covering post-deployment steps such as creating consortiums and channels, adding peer nodes to a channel, etc.
  • Node.js application sample to support running a few native Hyperledger Fabric APIs such as new user identity generation, running custom chain code, etc.
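
The 3, 5, or 7 node choice for the ordering service reflects how Raft reaches agreement: the cluster keeps working only while a majority of its nodes is available. The short sketch below is general Raft arithmetic, not code from the template, and the function names are illustrative.

```python
def raft_quorum(nodes):
    """Smallest number of nodes that forms a majority."""
    return nodes // 2 + 1


def raft_fault_tolerance(nodes):
    """Number of nodes that can fail while a majority remains."""
    return (nodes - 1) // 2


# Failures tolerated for each node count offered by the template.
tolerance = {n: raft_fault_tolerance(n) for n in (3, 5, 7)}
```

Three nodes tolerate one failure, five tolerate two, and seven tolerate three. An even node count adds no extra tolerance over the odd count below it, which is why only odd sizes are offered.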

To know more about how to get started with deploying Hyperledger Fabric network components, refer to the documentation.

What’s coming next

  • Microsoft Visual Studio Code extension support for Azure Hyperledger Fabric instances

What more do we have for you? The template and consortium sample scripts are open-sourced in the GitHub repo, so the community can leverage them to build their own customized versions.

Windows Server applications, welcome to Google Kubernetes Engine

The promise of Kubernetes is to make container management easy and ubiquitous. Up until recently though, the benefits of Kubernetes were limited to Linux-based applications, preventing enterprise applications running on Windows Server from taking advantage of its agility, speed of deployment and simplified management. 

Last year, the community brought Kubernetes support to Windows Server containers. Building on this, we’re thrilled to announce that you can now run Windows Server containers on Google Kubernetes Engine (GKE). 

GKE, the industry’s first Kubernetes-based container management solution for the public cloud, is top rated by analysts and widely used by customers across a variety of industries. Supporting Windows on GKE is a part of our commitment to provide a first-class experience for hosting and modernizing Windows Server-based applications on Google Cloud. To this end, in the past six months, we added capabilities such as the ability to bring your own Windows Server licenses (BYOL), virtual displays, and managed services for SQL Server and Active Directory. Volusion and Travix are among the many thousands of customers who have chosen Google Cloud to run and modernize their Windows-based application portfolios.

Bringing Kubernetes’ goodness to Windows Server apps

By running Windows Server apps as containers on Kubernetes, you get many of the benefits that Linux applications have enjoyed for years. Running your Windows Server containers on GKE can also save you on licensing costs, as you can pack many Windows Server containers on each Windows node.

Illustration of Windows Server and Linux containers running side-by-side in the same GKE cluster

In the beta release of Windows Server container support in GKE (version 1.16.4), Windows and Linux containers can run side-by-side in the same cluster. This release also includes several other features aimed at helping you meet the security, scalability, integration and management needs of your Windows Server containers. Some highlights include:

  • Private clusters: a security and privacy feature that allows you to restrict access to a cluster’s nodes and the master from the public internet—your cluster’s nodes can only be accessed from within a trusted Google Virtual Private Cloud (VPC).

  • Node Auto Upgrades: a feature that reduces the management overhead, provides ease of use and better security by automatically upgrading GKE nodes on your behalf. Make sure you build your container images using the Docker ‘multi-arch’ feature to avoid any version mismatch issues between the node OS version and the base container image. 

  • Regional clusters: an availability and reliability feature that allows you to create a multi-master, highly-available Kubernetes cluster that spreads both the control plane and the nodes across multiple zones in the same region. This provides increased control plane uptime of 99.95% (up from 99.5%), and zero-downtime upgrades.

  • Support for Group Managed Service Accounts (gMSA): gMSA is a type of Active Directory account that provides automatic password management, simplified service principal name (SPN) management, etc. for multiple servers. gMSAs are supported by Google Cloud’s Managed Microsoft Active Directory Service for easier administration.

  • Choice of Microsoft Long-Term Servicing Channel (LTSC) or Semi-Annual Channel (SAC) servicing channels, allowing you to choose the version that best fits your support and feature requirements. 
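
To put the control plane uptime figures from the regional clusters item in concrete terms, the arithmetic below converts an availability percentage into a downtime allowance. The 30-day month is an assumption made for illustration, and the helper name is hypothetical.

```python
MINUTES_PER_30_DAY_MONTH = 30 * 24 * 60


def allowed_downtime_minutes(availability_percent, period_minutes=MINUTES_PER_30_DAY_MONTH):
    """Minutes of downtime permitted by an availability target over a period."""
    return period_minutes * (100 - availability_percent) / 100


before = allowed_downtime_minutes(99.5)   # the earlier figure quoted above
after = allowed_downtime_minutes(99.95)   # the regional cluster figure
```

Over a 30-day month that is roughly 216 minutes of allowed downtime at 99.5% versus about 21.6 minutes at 99.95%, a tenfold reduction.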

For full details on each of these features and more, please consult the documentation.

With Windows Server 2008 and 2008 R2 reaching End of Support recently, you may be exploring ways to upgrade your legacy applications. This may be an opportune time to consider containerizing your applications and deploying them in GKE. In general, good candidates for containerization include custom-built .NET applications as well as batch and web applications. For applications provided by third-party ISVs, please consult the ISV for containerized versions of the applications.  

What customers are saying

We’ve been piloting Windows Server container support in GKE for several months now with preview customers, who have been impressed by GKE’s performance, reliability and security, as well as differentiated features such as automated setup and configuration for easier cluster management. 

Helix RE creates software that makes digital models of buildings, and recently switched from setting up and running Windows Kubernetes clusters manually to using GKE. Here’s what they had to say: 

“What used to take us weeks to set up and configure, now takes a few minutes. Besides saving time, features like autoscaling, high-availability, Stackdriver Logging and Monitoring are already baked in. Windows in GKE gives us the same scale, reliability, and ease of management that we have come to expect from running Linux in GKE.” -Premkumar Masilamani, Cloud Architect, Helix RE

Making it easier with partner solutions

Modernizing your applications means more than just deploying and managing containers. That is why we are working with several partners who can help you build, integrate and deploy Windows Server containers into GKE, for a seamless CI/CD and container management experience. We’re excited to announce that the following partners have already worked to integrate their solutions with Windows on GKE.

CircleCI allows teams to rapidly release code they trust by automating the build, test, and delivery process. CircleCI ‘orbs’ bundle CircleCI configuration into reusable packages. They make it easy to integrate with modern tools, eliminating the need for teams to spend time and cycles building the integrations themselves. 

“We are excited to further our partnership with Google with our latest Google Kubernetes Engine (GKE) Orb. This orb supports deployment to Windows containers running on GKE, and allows users to automate deploys in minutes directly from their CI/CD pipeline. By simplifying the process of automating deploys, teams can build confidence in their process, ship new features faster, and take advantage of cutting-edge technology without having to overhaul their existing infrastructure.”  -Tom Trahan, VP of Business Development, CircleCI

CloudBees enables enterprise developer teams to accelerate software delivery with continuous integration and continuous delivery (CI/CD). The CloudBees solutions optimize delivery of high quality applications while ensuring they are secure and compliant.

“We are pleased to offer support for Windows containers on Google Cloud Platform. This announcement broadens the options for CloudBees users to now run Microsoft workloads on GCP. It’s all about speeding up software delivery time and, with CloudBees running Windows containers on GCP, our users can enjoy a fast, modernized experience, leveraging the Microsoft technologies already pervasive within their organization.”  -Francois Dechery, Chief Strategy Officer, CloudBees 

GitLab is a complete DevOps platform, delivered as a single application, with the goal of fundamentally changing the way Development, Security, and Ops teams collaborate.

“GitLab and Google Cloud are lowering the barrier of adoption for DevOps and Kubernetes within the Windows developer community. Within minutes, developers can create a project, provision a GKE cluster, and execute a CI/CD pipeline with Windows Runners now on or with GitLab Self-managed to automatically deploy Windows apps onto Kubernetes.” –Darren Eastman, Senior Product Manager, GitLab”

Check out GitLab’s blog and video to learn more.

Get started today

We hope that you will take your Windows Server containers for a spin on GKE. To get started, you can find detailed documentation on our website. If you are new to GKE, start by checking out the Google Kubernetes Engine page and the Coursera course on Architecting with GKE.

Please don’t hesitate to reach out to us at [email protected]. Please also take a few minutes to give us your feedback and ideas to help us shape upcoming releases.

Announcing new certifications for technical leaders and DevOps engineers

As the cloud continues to evolve, it’s paving the way for exciting innovations that were difficult to imagine even a few years ago. To take advantage of everything the latest cloud technology has to offer, organizations must be able to find people with the right combination of skills to make it happen. In 2020, Google Cloud will continue helping you along this path to modernization with some exciting new resources: 

  • The Google Cloud Certified Fellow program will identify and recognize technical leaders who can help organizations transform their business through hybrid- and multi-cloud technology.

  • The Cloud DevOps Engineer certification addresses the cloud skills shortage and helps organizations quickly identify qualified talent.

Google Cloud Certified Fellow program

The Google Cloud Certified Fellow program is a unique, invitation-only certification program for technical leaders who are experts in designing innovative enterprise solutions with Anthos—our open hybrid- and multi-cloud application platform that enables you to modernize your existing applications, build new ones, and securely run them anywhere. The program identifies experts who can effectively lead organizations through frictionless hybrid multi-cloud adoption. To become part of the program, Certified Fellows pass a series of rigorous assessments, including hands-on labs and a panel interview.

Twenty IT leaders are already part of the Google Cloud Certified Fellow program, and are using what they’ve learned to overcome their business challenges. 

“The Certified Fellow exam pushed my understanding of Anthos’ capabilities, and the certification provides me the platform to help educate the community about the latest technology and inspire people to become better engineers,” said Chris Love, Google Cloud Certified Fellow and Principal Architect at LionKube. “I have been contributing to open-source projects for over ten years, so it’s important to me that I continue to give back to the community.”

Not only does the Google Cloud Certified Fellow program equip these leaders with the technical skills and industry best practices they need to solve cloud challenges, it also identifies them as cloud experts to everyone in the industry.

“I am proud of the recognition for my skill level from Google Cloud and my clients value the mastery it demonstrates,” said Zach Snee, Google Cloud Certified Fellow and Cloud Solutions Architect at Accenture. “Having representation in the Certified Fellow Program reinforces Accenture’s commitment to Cloud innovation and to our continued Google Cloud partnership.” 

Another important part of the program is providing an opportunity for certified leaders to connect with other Google Cloud Certified Fellows as well as with Google Cloud product and engineering leadership. 

“It is rare to have that many accomplished IT leaders in one place,” said Love. “Meeting with my peers during the Google Cloud Certified Fellow program was a highlight for me and something I look forward to doing again.” 

Cloud DevOps certification

DevOps is emerging as an integral part of digital transformation projects. The role of cloud DevOps engineers is to employ continuous change and rapid experimentation to help organizations transform quickly and meet changing customer demands. Unfortunately, DevOps engineering positions are currently among the most difficult technical positions to fill.

To address this skills shortage, we are offering a Professional Cloud DevOps Engineer certification. Now, cloud professionals can become industry recognized and clearly demonstrate to employers their expertise in efficient development operations with a focus on service reliability and delivery speed. 

We hope these programs help you achieve your cloud modernization goals. To learn more about our DevOps certification, and get a special discount on DevOps training, register for our Professional Cloud DevOps Engineer webinar. To start learning about how to architect hybrid-cloud infrastructure with Anthos, check out our training options.

New digital course: AWS Transit Gateway networking and scaling

We are pleased to announce a free technical training course that demonstrates how to create and configure an AWS Transit Gateway. In this digital course, you will learn how to configure a basic transit gateway, how to create a transit gateway with shared domains and route tables, and how routing and propagation work. The demonstrations will show you how a VPN connection and a direct connection to AWS Transit Gateway work.

10 recommendations for cloud privacy and security with Ponemon research

Today we’re pleased to publish Data Protection and Privacy Compliance in the Cloud: Privacy Concerns Are Not Slowing the Adoption of Cloud Services, but Challenges Remain, original research sponsored by Microsoft and independently conducted by the Ponemon Institute. The report concludes with a list of 10 recommended steps that organizations can take to address cloud privacy and security concerns, and in this blog, we have provided information about Azure services such as Azure Active Directory and Azure Key Vault that help address all 10 recommendations.

The research was undertaken to better understand how organizations undergo digital transformation while wrestling with the organizational impact of complying with such significant privacy regulations as the European Union’s General Data Protection Regulation (GDPR). The research explored the reasons organizations are migrating to the cloud, the security and privacy challenges they encounter in the cloud, and the steps they have taken to protect sensitive data and achieve compliance.

The survey of over 1,000 IT professionals in the US and EU found that privacy concerns are not slowing cloud adoption and that most privacy-related activities are easier in the cloud. At the same time, most organizations don’t feel they have the control and visibility they need to manage online privacy. The report lists ten steps organizations can take to improve security and privacy.

Download Data Protection and Privacy Compliance in the Cloud

Key takeaways from the research include:

  • Privacy concerns are not slowing the adoption of cloud services, as only one-third of US respondents and 38 percent of EU respondents say privacy issues have stopped or slowed their adoption of cloud services. The importance of the cloud in reducing costs and speeding time to market seems to override privacy concerns.
  • Most privacy-related activities are easier to deploy in the cloud. These include governance practices such as conducting privacy impact assessments, classifying or tagging personal data for sensitivity or confidentiality, and meeting legal obligations, such as those of the GDPR. However, other items such as managing incident response are considered easier to deploy on premises than in the cloud.
  • 53 percent of US and 60 percent of EU respondents are not confident that their organization currently meets their privacy and data protection requirements. This lack of confidence may be because most organizations are not vetting cloud-based software for privacy and data security requirements prior to deployment.
  • Organizations are reactive and not proactive in protecting sensitive data in the cloud. Specifically, just 44 percent of respondents are vetting cloud-based software or platforms for privacy and data security risks, and only 39 percent are identifying information that is too sensitive to be stored in the cloud.
  • Just 29 percent of respondents say their organizations have the necessary 360-degree visibility into the sensitive or confidential data collected, processed, or stored in the cloud. Organizations also lack confidence that they know all the cloud applications and platforms that they have deployed.

The Ponemon report closes with a list of recommended steps that organizations can take to address cloud privacy and security concerns, annotated below with relevant Azure services that can help you implement each of the recommendations:

  1. Improve visibility into the organization’s sensitive or confidential data collected, processed, or stored in the cloud environment. 
    Azure service: Azure Information Protection helps discover, classify, and control sensitive data. Learn more.
  2. Educate themselves about all the cloud applications and platforms already in use in the organization.
    Azure service: Microsoft Cloud App Security helps discover and control the use of shadow IT by identifying cloud apps, infrastructure as a service (IaaS), and platform as a service (PaaS) services. Learn more.
  3. Simplify the authentication of users in both on-premises and cloud environments.
    Azure service: Azure Active Directory provides tools to manage and deploy single sign-on authentication for both cloud and on-prem services. Learn more.
  4. Ensure the cloud provider offers event monitoring of suspicious and anomalous traffic in the cloud environment.
    Azure service: Azure Monitor enables customers to collect, analyze, and act on telemetry data from both Azure and on-premises environments. Learn more.
  5. Implement the capability to encrypt sensitive and confidential data in motion and at rest.
    Azure service: Azure offers a variety of options for encrypting both data at rest and in transit. Learn more.
  6. Make sure that the organization uses and manages its own encryption keys (BYOK).
    Azure service: Azure Key Vault allows you to import or generate keys in hardware security modules (HSMs) that never leave the HSM boundary. Learn more.
  7. Implement multifactor authentication before allowing access to the organization’s data and applications in the cloud environment.
    Azure service: Azure Active Directory offers multiple options for deploying multifactor authentication for both cloud and on-prem services. Learn more.
  8. Assign responsibility for ensuring compliance with privacy and data protection regulations and security safeguards in the cloud to those most knowledgeable: the compliance and IT security teams. Privacy and data protection teams should also be involved in evaluating any cloud applications or platforms under consideration.
    Azure service: Role-based access control (RBAC) helps manage who has access to Azure resources, what they can do with those resources, and what areas they have access to. Learn more.
  9. Identify information that is too sensitive to be stored in the cloud and assess the impact that cloud services may have on the ability to protect and secure confidential or sensitive information.
    Azure service: Azure Information Protection helps discover, classify, and control sensitive data. Learn more.
  10. Thoroughly evaluate cloud-based software and platforms for privacy and security risks.
    Azure service: Microsoft Cloud App Security assesses the risk levels and business readiness of over 16,000 apps. Learn more.

Read the full report to learn more.