Exploring container security: How DroneDeploy achieved ISO-27001 certification on GKE

Editor’s note: Aerial data mapping company DroneDeploy wanted to migrate its on-premises Kubernetes environment to Google Kubernetes Engine—but only if it would pass muster with auditors. Read on to learn how the firm leveraged GKE’s native security capabilities to smooth the path to ISO-27001 certification.

At DroneDeploy, we put a lot of effort into securing our customers’ data. We’ve always been proud of our internal security efforts, and receiving compliance certifications validates these efforts, helping us formalize our information security program, and keeping us accountable to a high standard. Recently, we achieved ISO-27001 certification— all from taking advantage of the existing security practices in Google Cloud and Google Kubernetes Engine (GKE). Here’s how we did it.

As a fast-paced, quickly growing B2B SaaS startup in San Francisco, our mission is to make aerial data accessible and productive for everyone. We do so by providing our users with image processing, automated mapping, 3D modeling, data sharing, and flight controls through iOS and Android applications. Our Enterprise Platform provides an admin console for role-based access and monitoring of flights, mapped routes, image capture, and sharing. We serve more than 4,000 customers across 180 countries in the construction, energy, insurance, and mining industries, and ingest more than 50 terabytes of image data from over 30,000 individual flights every month.

Many of our customers and prospects are large enterprises that have strict security expectations of their third-party service providers. In an era of increased regulation (such as Europe’s GDPR law) and data security concerns, the scrutiny on information security management has never been higher.. Compliance initiatives are one piece of the overall security strategy that help us communicate our commitment to securing customer data. At DroneDeploy, we chose to start our compliance story with ISO-27001, an international information security standard that is for recognized across a variety of industries.

DroneDeploy’s Architecture: Google Kubernetes Engine (GKE)

DroneDeploy was an early adopter of Kubernetes, and we have long since migrated all our workloads from virtual machines to containers orchestrated by Kubernetes. We currently run more than 150,000 Kubernetes jobs each month with run times ranging from a few minutes to a few days. Our tooling for managing clusters evolved over time, starting with hand-crafted bash and Ansible scripts, to the now ubiquitous (and fantastic) kops. About 18 months ago, we decided to re-evaluate our hosting strategy given the decreased costs of compute in the cloud. We knew that managing our own Kubernetes clusters was not a competitive advantage for our business and that we would rather spend our energy elsewhere if we could.

We investigated the managed Kubernetes offerings of the top cloud providers and did some technical due diligence before making our selection—comparing not only what was available at the time but also future roadmaps. We found that GKE had several key features that were missing in other providers such as robust Kubernetes-native autoscaling, a mature control plane, multi-availability zone masters, and extensive documentation. GKE’s ability to run on pre-emptible node pools for ephemeral workloads was also a huge plus.

Proving our commitment to security hardening

But if we were going to make the move, we needed to document our information security management policies and process and prove that we were following best practices for security hardening.

Specifically, when it comes to ISO-27001 certification, we needed to follow the general process:

  1. Document the processes you perform to achieve compliance
  2. Prove that the processes convincingly address the compliance objectives
  3. Provide evidence that you are following the process
  4. Document any deviations or exceptions

While Google Cloud offers hardening guidance for GKE and several GCP blogs to guide our approach, we still needed to prove that we had security best practices in place for our critical systems. With newer technologies, though, it can be difficult to provide clear evidence to an auditor that those best practices are in place; they often live in the form of blog posts by core contributors and community leaders versus official, documented best practices. Fortunately, standards have begun to emerge for Kubernetes. The Center for Internet Security (CIS) recently published an updated compliance benchmark for Kubernetes 1.11 that is quite comprehensive. You can even run automated checks against the CIS benchmark using the excellent open source project kube-bench. Ultimately though, it was the fact that Google manages the underlying GKE infrastructure that really helped speed up the certification process.  

Compliance with less pain thanks to GKE

As mentioned, one of the main reasons we switched from running Kubernetes in-house to GKE was to reduce our investment in manually maintaining and upgrading our Kubernetes clusters— including our compliance initiatives. GKE reduces the overall footprint that our team has to manage since Google itself manages and documents much of the underlying infrastructure. We’re now able to focus on improving and documenting the parts of our security procedures that are unique to our company and industry, rather than having to meticulously document the foundational technologies of our infrastructure.

For Kubernetes, here’s a snippet of how we documented our infrastructure using the four steps described above:

  1. We implemented security best practices within our Kubernetes clusters by ensuring all of them are benchmarked using the Kubernetes CIS guide. We use kube-bench for this process, which we run on our clusters once every quarter.
  2. A well respected third-party authority publishes this benchmark, which confirms that our process addresses best practices for using Kubernetes securely.
  3. We provided documentation that we assessed our Kubernetes clusters against the benchmark, including the tickets to track the tasks.
  4. We provided the results of our assessment and documented any policy exceptions and proof that we evaluated those exceptions against our risk management methodology.

Similarly to the physical security sections of the ISO-27001 standard, the CIS benchmark has large sections dedicated to security settings for Kubernetes masters and nodes. Because we run on GKE, Google handled 95 of the 104 line items in the benchmark applicable to our infrastructure. For those items that could not be assessed against the benchmark (because GKE does not expose the masters), we provided links to Google’s security documentation on those features (see Cluster Trust and Control Plane Security). Some examples include:

Beyond GKE, we were also able to take advantage of many other Google Cloud services that made it easier for us to secure our cloud footprint (although the shared responsibility model for security means we can’t rely on Google Cloud alone):

  • For OS level security best practices, we we able to document strong security best practices for our OS security because we use Google’s Container-Optimized OS (COS), which provides many security best practices by default by using things such as a read-only file system. All that was left for us to do was was follow best practices to help secure our workloads.
  • We use node auto-upgrade on our GKE nodes to handle patch management at the OS layer for our nodes. For the level of effort, we found that node auto-upgrade provides a good middle ground patching and stability. To date, we have not had any issues with our software as a result of node auto-upgrade.
  • We use Container Analysis (which is built into Google Container Registry) to scan for known vulnerabilities in our Docker images.
  • ISO-27001 requires that you demonstrate the physical security of your network infrastructure. Because we run our entire infrastructure in the cloud, we were able to directly rely on Google Cloud’s physical and network security for portions of the certification (Google Cloud is ISO-27001 certified amongst other certifications).

DroneDeploy is dedicated to giving our customers access to aerial imaging and mapping technologies quickly and easily. We handles vast amounts of sensitive information on behalf of our customers, and we want them to know that we are following best security practices even when the underlying technology gets complicated, like in the case of Kubernetes. For DroneDeploy, switching to GKE and Google Cloud has helped us reduce our operational overhead and increased the velocity with which we achieve key compliance certifications. To learn more about DroneDeploy, and our experience using Google Cloud and GKE, feel free to reach out to us.

Recursion Pharmaceuticals accelerates drug discovery with Google Cloud

Despite advances in scientific research and medical technology, the process of drug discovery has become increasingly slower and more expensive over the last decade. While the pharmaceutical industry has spent more money on research and development each year, this has not resulted in an increase in the number of FDA-approved new medicines. Recursion, headquartered in Salt Lake City, is looking to address this declining productivity by combining rich biological datasets with the latest in machine learning to reinvent the drug discovery and development process.

Today, Recursion has selected Google Cloud as their primary public cloud provider as they build a drug discovery platform that combines chemistry, automated biology, and cloud computing to reveal new therapeutic candidates, potentially cutting the time to discover and develop a new medicine by a factor of 10.

In order to fulfill their mission, Recursion developed a data pipeline that incorporates image processing, inference engines and deep learning modules, supporting bursts of computational power that weigh in at trillions of calculations per second. In just under two years, Recursion has created hundreds of disease models, generated a shortlist of drug candidates across several diseases, and advanced drug candidates into the human testing phase for two diseases.

Starting with wet biology—plates of glass-bottom wells containing thousands of healthy and diseased human cells—biologists run experiments on the cells, applying stains that help characterize and quantify the features of the cellular samples: their roundness, the thickness of their membrane, the shape of their mitochondria, and other characteristics. Automated microscopes capture this data by snapping high-resolution photos of the cells at several different light wavelengths. The data pipeline, which sits on top ofGoogle Kubernetes Engine (GKE) and Confluent Kafka, all running on GCP, extracts and analyzes cellular features from the images. Then, data are processed by deep neural networks to find patterns, including those humans might not recognize. The neural nets are trained to compare healthy and diseased cell signatures with those of cells before and after a variety of drug treatments. This process yields promising new potential therapeutics.

To train its deep learning models, Recursion uses on-premises GPUs, then they use GCP CPUs to perform inference on new images in the pipeline using these models. Recursion is currently evaluating cloud-based alternatives including using Cloud TPU technology to accelerate and automate image processing. Since Recursion is already using TensorFlow to train its neural networks in its proprietary biological domains, Cloud TPUs are a natural fit. Additionally, Recursion is exploring using GKE On-Prem, the foundation of Cloud Services Platform, to manage all of their Kubernetes clusters from a single, easy-to-use console.

We’re thrilled to collaborate with Recursion in their quest to more rapidly and inexpensively discover new medicines for dozens of diseases, both rare and common. Learn more about how Recursion is using Google Cloud solutions to better execute its mission of “decoding biology to radically improve lives” here. You can also learn more about solutions for life sciences organizations and our Google Cloud for Startups Program.

Everyday AI: beyond spell check, how Google Docs is smart enough to correct grammar

Written communication is at the heart of what drives businesses. Proposals, presentations, emails to colleagues—this all keeps work moving forward. This is why we’ve built features into G Suite to help you communicate effectively, like Smart Compose and Smart Reply, which use machine learning smarts to help you draft and respond to messages quickly. More recently, we’ve introduced machine translation techniques into Google Docs to flag grammatical errors within your documents as you draft them.

If you’ve ever questioned whether to use “a” versus “an” in a sentence, or if you’re using the correct verb tense or preposition, you’re not alone. Grammar is nuanced and tricky, which makes it a great problem to solve with the help of artificial intelligence. Here’s a look at how we built grammar suggestions in Docs.

The gray areas of grammar

Although we generally think of grammar as a set of rules, these rules are often complex and subjective. In spelling, you can reference a resource that tells you whether a word exists or how it’s spelled: dictionaries (Remember those?).

Grammar is different. It’s a harder problem to tackle because its rules aren’t fixed. It varies based on language and context, and may change over time, too. To make things more complicated, there are many different style books—whether it be MLA, AP or some other style—which makes consistency a challenge.

Given these nuances, even the experts don’t always agree on what’s correct. For our grammar suggestions, we worked with professional linguists to proofread sample sentences to get a sense of the true subjectivity of grammar. During that process, we found that linguists disagreed on grammar about 25 percent of the time. This raised the obvious question: how do we automate something that doesn’t run on definitive rules?

Where machine translation makes a mark

Much like having someone red-line your document with suggestions on how to replace “incorrect” grammar with “correct” grammar, we can use machine translation technology to help automate that process. At a basic level, machine translation performs substitution and reorders words from a source language to a target language, for example, substituting a “source” word in English (“hello!”) for a “target” word in Spanish (¡hola!). Machine translation techniques have been developed and refined over the last two decades throughout the industry, in academia and at Google, and have even helped power Google Translate.

Along similar lines, we use machine translation techniques to flag “incorrect” grammar within Docs using blue underlines, but instead of translating from one language to another like with Google Translate, we treat text with incorrect grammar as the “source” language and correct grammar as the “target.”


Working with the experts

Before we could train models, we needed to define “correct” and “incorrect” grammar. What better way to do so than to consult the experts? Our engineers worked with a collection of computational and analytical linguists, with specialties ranging from sociology to machine learning. This group supports a host of linguistic projects at Google and helps bridge the gap between how humans and machines process language (and not just in English—they support over 40 languages and counting).

For several months, these linguists reviewed thousands of grammar samples to help us refine machine translation models, from classic cases like “there” versus “their” versus “they’re,” to more complex rules involving prepositions and verb tenses. Each sample received close attention—three linguists reviewed each case to identify common patterns and make corrections. The third linguist served as the “tie breaker” in case of disagreement (which happened a quarter of the time).


Once we identified the samples, we then fed them into statistical learning algorithms—along with “correct” text gathered from high-quality web sources (billions of words!)—to help us predict outcomes using stats like the frequency at which we’ve seen a specific correction occur. This process helped us build a basic spelling and grammar correction model.

We iterated over these models by rolling them out to a small portion of people who use Docs, and then refined them based on user feedback and interactions. For example, in earlier models of grammar suggestions, we received feedback that suggestions for verb tenses and the correct singular or plural form of a noun or verb were inaccurate. We’ve since adjusted the model to solve for these specific issues, resulting in more precise suggestions. Although it’s impossible to catch 100 percent of issues, we’re constantly evaluating our models at Google to ensure bias does not surface in results such as these.

Better grammar. No ifs, ands or buts.

So if you’ve ever asked yourself “how does it know what to suggest when I write in Google Docs,” these grammar suggestion models are the answer.  They’re working in the background to analyze your sentence structure, and the semantics of your sentence, to help you find mistakes or inconsistencies. With the help of machine translation, here are some mistakes that Docs can help you catch:


Evolving grammar suggestions, just like language

When it comes to grammar, we’re constantly improving the quality of each suggestion to make corrections as useful and relevant as possible. With our AI-first approach, G Suite is in the best position to help you communicate smarter and faster, without sweating the small stuff. Learn more.

Announcing the general availability of Azure Lab Services

Today, we are very excited to announce the general availability of Azure Lab Services – your computer labs in the cloud.

With Azure Lab Services, you can easily set up and provide on-demand access to preconfigured virtual machines (VMs) to teach a class, train professionals, run hackathons or hands-on labs, and more. Simply input what you need in a lab and let the service roll it out to your audience. Your users go to a single place to access all their VMs across multiple labs, and connect from there to learn, explore, and innovate.

Since our preview announcement, we have had many customers use the service to conduct classes, training sessions, boot camps, hands on labs, and more! For classroom or professional training, you can provide students with a lab of virtual machines configured with exactly what you need for class and give each student a specified number of hours to use the VMs for homework or personal projects. You can run a hackathon or a hands-on lab at conferences or events and scale up to hundreds of virtual machines for your attendees. You can also create an invite-only private lab of virtual machines installed with your prerelease software to give preview customers access to early trials or set up interactive sales demos.

Top three reasons customers use Azure Lab Services

Automatic management of Azure infrastructure and scale

Azure Lab Services is a managed service, which means that provisioning and management of a lab’s underlying infrastructure is handled automatically by the service. You can just focus on preparing the right lab experience for your users. Let the service handle the rest and roll out your lab’s virtual machines to your audience. Scale your lab to hundreds of virtual machines with a single click.

Publishing a template in Azure Lab Services

Simple experience for your lab users

Users who are invited to your lab get immediate access to the resources you give them inside your labs. They just need to sign in to see the full list of virtual machines they have access to across multiple labs. They can click on a single button to connect to the virtual machines and start working. Users don’t need Azure subscriptions to use the service.

Screen shot displaying a sample virtual machine in Azure Lab Services

Cost optimization and tracking 

Keep your budget in check by controlling exactly how many hours your lab users can use the virtual machines. Set up schedules in the lab to allow users to use the virtual machines only during designated time slots or set up reoccurring auto-shutdown and start times. Keep track of individual users’ usage and set limits.

An example screen shot of how to set up scheduled in Azure Lab Services

Get started now

Try Azure Lab Services today! Get started by creating a lab account for your organization or team. All labs are managed under a lab account. You can give permissions to people in your organization to create labs in your lab account.

To learn more, visit the Azure Lab Services documentation. Ask any questions you have on Stack Overflow. Last of all, don’t forget to subscribe to our Service Updates and view other Azure Lab Services posts on the Azure blog to get the latest news.

General availability pricing

Azure Lab Services GA pricing goes into effect on May 1, 2019. Until then, you will continue to be billed based on the preview pricing. Please see the Azure Lab Services pricing page for complete details.

What’s next

We continue to listen to our customers to prioritize and ship new features and updates. Several key features will be enabled in the coming months:

  • Ability to reuse and share custom virtual machine images across labs
  • Feature to enable connections between a lab and on-premise resources
  • Ability to create GPU virtual machines inside the labs

We always welcome any feedback and suggestions. You can make suggestions or vote on priorities on our UserVoice feedback forum.

Latest enhancements now available for Cognitive Services’ Computer Vision

This blog was co-authored by Lei Zhang, Principal Research Manager, Computer Vision

You can now extract more insights and unlock new workflows from your images with the latest enhancements to Cognitive Services’ Computer Vision service.

1. Enrich insights with expanded tagging vocabulary

Computer Vision has more than doubled the types of objects, situations, and actions it can recognize per image.


Black and white picture of New York City before Computer Vision


Black and white picture of New York City now, after Computer vision

2. Automate cropping with new object detection feature

Easily automate cropping and conduct basic counting of what you need from an image with the new object detection feature. Detect thousands of real life or man-made objects in images. Each object is now highlighted by a bounding box denoting its location in the image.

Picture of subway train and detecting real life or man-made objects

3. Monitor brand presence with new brand detection feature

You can now track logo placement of thousands of global brands from the consumer electronics, retail, manufacturing, entertainment industries.

Picture of Microsoft logo and the brand detection feature

With these enhancements, you can:

  • Do at-scale image and video-frame indexing, making your media content searchable. If you’re in media, entertainment, advertising, or stock photography, rich image and video metadata can unlock productivity for your business.
  • Derive insights from social media and advertising campaigns by understanding the content of images and videos and detecting logos of interest at scale. Businesses like digital agencies have found this capability useful for tracking the effectiveness of advertising campaigns. For example, if your business launches an influencer campaign, you can apply Custom Vision to automatically generate brand inclusion metrics pulling from influencer-generated images and videos.

In some cases, you may need to further customize the image recognition capabilities beyond what the enhanced Computer Vision service now provides by adding specific tagging vocabulary or object types that are relevant to your use case. Custom Vision service allows you to easily customize and deploy your model without requiring machine-learning expertise.

See it in action through the Computer Vision demo. If you’re ready to start building to unlock these insights, visit our documentation pages for image tagging, object detection, and brand detection.

Running Redis on GCP: four deployment scenarios

Redis is one of the most popular open source in-memory data stores, used as a database, cache and message broker. This post covers the major deployment scenarios for Redis on Google Cloud Platform (GCP). In the following post, we’ll go through the pros and cons of these deployment scenarios and the step-by-step approach, limitations and caveats for each.

Deployment options for running Redis on GCP

There are four typical deployment scenarios we see for running Redis on GCP: Cloud Memorystore for Redis, Redis Labs Cloud and VPC, Redis on Google Kubernetes Engine (GKE), and Redis on Google Compute Engine. We’ll go through the considerations for each of them. It’s also important to have backup for production databases, so we’ll discuss backup and restore considerations for each deployment type.

Cloud Memorystore for Redis

Cloud Memorystore for Redis, part of GCP, is a way to use Redis and get all its benefits without the cost of managing Redis. If you need data sharding, you can deploy open source Redis proxies such as Twemproxy and Codis with multiple Cloud Memorystore for Redis instances for scale until Redis Cluster becomes ready in GCP.


Twemproxy, also known as the nutcracker, is an open source (under the Apache License) fast and lightweight Redis proxy developed by Twitter. The purpose of Twemproxy is to provide a proxy and data sharding solution for Redis and to reduce the number of client connections to the back-end Redis instances. You can set up multiple Redis instances behind Twemproxy. Clients only talk to the proxy and don’t need to know the details of back-end Redis instances, which simplifies management. You can also run multiple Twemproxy instances for the same group of back-end Redis servers to prevent having a single point of failure, as shown here:


Note that Twemproxy does not support all Redis commands, such as pub/sub and transaction commands. In addition, it’s not convenient to add or remove back-end Redis nodes for Twemproxy. It requires you to restart Twemproxy for configurations to be effective, and data isn’t rebalanced automatically after adding or removing Redis nodes.


Codis is an open source (under the MIT License) proxy-based high-performance Redis cluster tool developed by CodisLabs. Codis offers another Redis data sharding proxy option to solve the horizontal scalability limitation and lack of administration dashboard. It’s fully compatible with Twemproxy and has a handy tool called redis-port that handles the migration from Redis Twemproxy to Codis.

Pros of Cloud Memorystore for Redis

  • It’s fully managed. Google fully manages administrative tasks for Redis instances such as hardware provisioning, setup and configuration management, software patching, failover, monitoring and other nuances that require considerable effort for service owners who just want to use Redis as a memory store or a cache.
  • It’s highly available. We provide a standard Cloud Memorystore tier, in which we fully manage replication and failover to provide high availability. In addition, you can keep the replica in a different zone.
  • It’s scalable and performs well. You can easily scale memory that’s provisioned for Redis instances. We also provide high network throughput per instance, which can be scaled on demand.
  • It’s updated and secure. We provide network isolation so that access to Redis is restricted to within a network via a private IP. Also, OSS compatibility is Redis 3.2.11, as of late 2018.

Cons of Cloud Memorystore for Redis

  • Some features are not yet available: Redis Cluster, backup and restore.
  • It lacks replica options. Cloud Memorystore for Redis provides a master/replica configuration in the standard tier, and master and replica are spread across zones. There is only one replica per instance.
  • There are some product constraints you should note.

You can deploy OSS proxies such as Twemproxy and Codis with multiple Cloud Memorystore for Redis instances for scalability until Redis Cluster is ready in GCP. And note the caveat that basic-tier Cloud Memorystore for Redis instances are subject to a cold restart and full data flush during routine maintenance, scaling, or an instance failure. Choose the standard tier to prevent data loss during those events.

How to get started

Check out our Cloud Memorystore for Redis guide for the basics. You can see here how to configure multiple Cloud Memorystore for Redis instances using Twemproxy and an internal load balancer in front of them.

1. Create nine new Cloud Memorystore for Redis instances in asia-northeast1 region

2. Prepare a Twemproxy container for deployment

3. Build a Twemproxy docker image

* Please replace with your GCP project ID.

Note that a VM instance starts a container with –network=”host” flag of the Docker run command by default.

4. Create an instance template based on the Docker image

* Please replace with your GCP project ID.

5. Create a managed instance group using the template

6. Create a health check for the internal load balancer

7.Create a back-end service for the internal load balancer

8. Add instance groups to the back-end service

9. Create a forwarding rule for the internal load balancer

10. Configure firewall rules to allow the internal load balancer access to Twemproxy instances

Redis Labs Cloud and VPC

To get managed Redis Clusters, you can use a partner solution from Redis Labs. Redis Labs has two managed-service options: Redis Enterprise Cloud (hosted) and Redis Enterprise VPC (managed).

  • Redis Enterprise Cloud is a fully managed and hosted Redis Cluster on GCP.
  • Redis Enterprise VPC is a fully managed Redis Cluster in your virtual private cloud (VPC) on GCP.

Redis Labs Cloud and VPC protect your database by maintaining automated daily and on-demand backups to remote storage. You can back up your Redis Enterprise Cloud/VPC databases to Cloud Storage. Find instructions here.

You can also import a data set from an RDB file using Redis Labs Cloud with VPC. Check out the official public document on Redis Labs site for instructions.

Pros of Redis Labs Cloud and VPC

  • It’s fully managed. Redis Labs manages all administrative tasks.
  • It’s highly available. These Redis Labs products include an SLA with 99.99% availability.
  • It scales and performs well. It will automatically add new instances to your cluster according to your actual data set size without any interruption to your applications.
  • It’s fully supported. Redis Labs supports Redis itself.

Cons of Redis Labs Cloud and VPC

  • There’s a cost consideration. You’ll have to pay separately for Redis Labs’ service.

How to get started

Contact Redis Labs to discuss further steps.

Redis on GKE

If you want to use Redis Cluster, or want to read from replicas, Redis on GKE is an option. Here’s what you should know.

Pros of Redis on GKE

  • You have full control of the Redis instances. You can configure, manage and operate as you like.
  • You can use Redis Cluster.
  • You can read from replicas.

Cons of Redis on GKE

  • It’s not managed. You’ll need to manage administrative tasks such as hardware provisioning, setup and configuration management, software patching, failover, backup and restore, configuration management, monitoring, etc.
  • Availability, scalability and performance varies, depending on how you architect. Running a standalone instance of Redis on GKE is not ideal for production because it would be a single point of failure, so consider configuring master/slave replication to have redundant nodes with Sentinel, or set up a cluster.
  • There’s a steeper learning curve. This option requires you to learn Redis itself in more detail. Kubernetes also requires some time to learn, and its deployment may introduce additional complexity to your design and operations.
  • When using Redis on GKE, you’ll want to be aware of GKE cluster node maintenance; cluster nodes will need to be upgraded once every three months or so. To avoid unexpected disruption during the upgrade process, consider using PodDisruptionBudgets and configure parameters appropriately. And you’ll want to run containers in host networking mode to eliminate additional network overhead from Docker networking. Make sure that you run one Redis instance on each VM, otherwise it may cause port conflicts. This can be achieved with podAntiAffinity.

How to get started

Use Kubernetes to deploy a container to run Redis on GKE. The example below shows the steps to deploy Redis Cluster on GKE.


1. Provision a GKE cluster

* If prompted, specify your preferred GCP project ID or zone.

2. Clone an example git repository

3. Create config maps

4. Deploy Redis pods

* Wait until it is completed.

5. Prepare a list of Redis cache nodes

6. Submit a job to configure Redis Cluster

7. Confirm the job “redis-create-cluster-xxxxx” shows completed status

Limitations highly depend on how you design the cluster.

Backing up and restoring manually built Redis

Both GKE and Compute Engine will follow the same method to back up and restore your databases. Basically, copying the RDB file is completely safe while the server is running, because the RDB is never modified once produced.

To back up your data, copy the RDB file to somewhere safe, such as Cloud Storage.

  • Create a cron job in your server to take hourly snapshots of the RDB files in one directory, and daily snapshots in a different directory.
  • Every time the cron script runs, make sure to call the “find” command to make sure old snapshots are deleted: for instance, you can take hourly snapshots for the latest 48 hours, and daily snapshots for one or two months. Make sure to name the snapshots with data and time information.
  • At least once a day, make sure to transfer an RDB snapshot outside your production environment. Cloud Storage is a good place to do so.

To restore a data set from an RDB file, disable AOF and remove AOF and RDB before restoring data to Redis. Then you can copy RDB file from remote and simply restart redis-server to restore your data.

  • Redis will try to restore data from the AOF file if AOF is enabled. If the AOF file cannot be found, Redis will start with an empty data set.
  • Once the RDB snapshot is triggered due to the key changes, the original RDB file will be rewritten.

Redis on Compute Engine

You can also deploy your own open source Redis Cluster on Google Compute Engine if you want to use Redis Cluster, or want to read from replicas. The possible deployment options are:

  • Run Redis on a Compute Engine instance—this is the simplest way to run the Redis service processes directly.
  • Run Redis containers on Docker on a Compute Engine instance.

Pros of Redis on Compute Engine

  • You’ll have full control of Redis. You can configure, manage and operate as you like.

Cons of Redis on Compute Engine

  • It’s not managed. You have to manage administrative tasks such as hardware provisioning, setup and configuration management, software patching, failover, backup and restore, configuration management, monitoring, etc.
  • Availability, scalability and performance depend on how you architect. For example,  a standalone setup is not ideal for production because it would be a single point of failure, so consider configuring master/slave replication to have redundant nodes with Sentinel, or set up a cluster.
  • There’s a steeper learning curve: This option requires you to learn Redis itself in more detail.

For best results, run containers in host networking mode to eliminate additional network overhead from Docker networking. Make sure that you run one Redis container on each VM, otherwise it causes port conflicts. Limitations highly depend on how you design the cluster.

How to get started

Provision Compute Engine instances by deploying containers on VMs and managed instance groups. Alternatively, you can run your container on Compute Engine instances using whatever container technologies and orchestration tools that you need. You can create an instance from a public VM image and then install the container technologies that you want, such as Docker. Package service-specific components into separate containers and upload to Cloud Repositories.

The steps to configure Redis on Compute Engine instances are pretty basic if you’re already using Compute Engine, so we don’t describe them here. Check out the Compute Engine docs and open source Redis docs for more details.

Redis performance testing

It’s always necessary to measure the performance of your system to identify any bottlenecks before you expose it in production. The key factors affecting the performance of Redis are CPU, network bandwidth and latency, the size of the data set, and the operations you perform. If the result of the benchmark test doesn’t meet your requirements, consider scaling your infrastructure up or out or adjust the way you use Redis. There are a few ways to do benchmark testing against multiple Cloud Memorystore for Redis instances deployed using Twemproxy with an internal load balancer in front.


Redis-benchmark is an open source command line benchmark tool for Redis, which is included with the open source Redis package.


Memtier_benchmark is an open source command line benchmark tool for NoSQL key-value stores, developed by Redis Labs. It supports both Redis and Memcache protocols, and can generate various traffic patterns against instances.

Migrating Redis to GCP

The most typical Redis customer journey to GCP we see is migration from other cloud providers. Here are a few options that can be used to perform data migration of Redis:

If you would like to work with Google experts to migrate your Redis deployment onto GCP, get in touch and learn more here.

OpenVPN: Enabling access to the corporate network with Cloud Identity credentials

Editor’s note: Cloud Identity, Google Cloud’s identity as a service (IDaaS) platform, now offers secure LDAP functionality that enables authentication, authorization, and user/group lookups for LDAP-based apps and IT infrastructure. Today, we hear from OpenVPN, which has tested and integrated its OpenVPN Access Server with secure LDAP, enabling your employees and partners to use their Cloud Identity credentials to access applications through VPN. Read on to learn more.

As IT organizations adopt more cloud-based IaaS and SaaS apps, they need a way to let users access them securely, while still being able to use legacy LDAP-based apps and infrastructure. The new secure LDAP capabilities in Cloud Identity provides both legacy LDAP platforms and cloud-native applications with a single authentication source, for a simple, effective solution to this problem.

In fact, we here at OpenVPN have integrated our OpenVPN Access Server with Cloud Identity, allowing your remote users to connect to your corporate network and apps over VPN with their Cloud Identity (or G Suite) credentials. This helps keep your company secure, and ensures your entire team is following the protocol.

This illustration demonstrates how Cloud Identity makes security accessible and efficient for any level of enterprise. The top-half of the illustration shows the deployment of OpenVPN Access Server in various cloud IaaS providers. As you can see, all instances of Access Server use Cloud Identity for authentication and authorization. The Access Servers are configured with a group called ‘IT Admin,’ which allows SSH access to all application servers on all the private networks. This allows any employee identity present in Cloud Identity that is a member of ‘IT Admin’ group to access any of the private networks via VPN and use SSH.


Then, as you can see in the lower half of the illustration, remote employees use VPN to connect to your corporate network and apps with their Cloud Identity credentials.

Using Cloud Identity for authentication

OpenVPN Access Server v2.6.1 and later supports secure LDAP and has been tested to work with Cloud Identity. You can find specific configuration instructions on our website.

Using Cloud Identity groups for network access control

As shown in the illustration below, Access Server’s administrative controls make it easy to configure groups. Administrators can configure access controls for these groups with fine granularity down to an individual IP address and port number.

You can configure groups in Access Server that correspond to those stored in Cloud Identity and enforce access controls for the user based on that user’s group membership. You can do this kind of mapping by using a script on Access Server. Instructions to set up the script are available on our website. In addition, our support staff is also ready to help you.


With OpenVPN Access Server, you can protect your cloud applications, connect your premise to the cloud, and provide simple and secure access for your remote employees in a way that scales with the tools you’re already using. Best of all, OpenVPN Access Server is available on GCP Marketplace. Try it out today!

Let Deep Learning VMs and Jupyter notebooks to burn the midnight oil for you: robust and automated training with Papermill

In the past several years, Jupyter notebooks have become a convenient way of experimenting with machine learning datasets and models, as well as sharing training processes with colleagues and collaborators. Often times your notebook will take a long time to complete its execution. An extended training session may cause you to incur charges even though you are no longer using Compute Engine resources.

This post will explain how to execute a Jupyter Notebook in a simple and cost-efficient way.

We’ll explain how to deploy a Deep Learning VM image using TensorFlow to launch a Jupyter notebook which will be executed using the Nteract Papermill open source project. Once the notebook has finished executing, the Compute Engine instance that hosts your Deep Learning VM image will automatically terminate.

The components of our system:

First, Jupyter Notebooks

The Jupyter Notebook is an open-source web-based, interactive environment for  creating and sharing IPython notebook (.ipynb) documents that contain live code, equations, visualizations and narrative text. This platform supports data cleaning and transformation, numerical simulation, statistical modeling, data visualization, machine learning, and much more.

Next, Deep Learning Virtual Machine (VM) images

The Deep Learning Virtual Machine images are a set of Debian 9-based Compute Engine virtual machine disk images that are optimized for data science and machine learning tasks. All images include common ML frameworks and tools installed from first boot, and can be used out of the box on instances with GPUs to accelerate your data processing tasks. You can launch Compute Engine instances pre-installed with popular ML frameworks like TensorFlow, PyTorch, or scikit-learn, and even add Cloud TPU and GPU support with a single click.

And now, Papermill

Papermill is a library for parametrizing, executing, and analyzing Jupyter Notebooks. It lets you spawn multiple notebooks with different parameter sets and execute them concurrently. Papermill can also help collect and summarize metrics from a collection of notebooks.

Papermill also permits you to read or write data from many different locations. Thus, you can store your output notebook on a different storage system that provides higher durability and easy access in order to establish a reliable pipeline. Papermill recently added support for Google Cloud Storage buckets, and in this post we will show you how to put this new functionality to use.


Submit a Jupyter notebook for execution

The following command starts execution of a Jupyter notebook stored in a Cloud Storage bucket:

The above commands do the following:

  • Create a Compute Engine instance using TensorFlow Deep Learning VM and 2 NVIDIA Tesla T4 GPUs
  • Install the latest NVIDIA GPU drivers
  • Execute the notebook using Papermill
  • Upload notebook result (with all the cells pre-computed) to Cloud Storage bucket in this case: “gs://my-bucket/”
  • Terminate the Compute Engine instance

And there you have it! You’ll no longer pay for resources you don’t use since after execution completes, your notebook, with populated cells, is uploaded to the specified Cloud Storage bucket. You can read more about it in the Cloud Storage documentation.

Note: In case you are not using a Deep Learning VM, and you want to install Papermill library with Cloud Storage support, you only need to run:

Note: Papermill version 0.18.2 supports Cloud Storage.

And here is an even simpler set of bash commands:

Execute a notebook using GPU resources

Execute a notebook using CPU resources

The Deep Learning VM instance requires several permissions: read and write ability to Cloud Storage, and the ability to delete instances on Compute Engine. That is why our original command has the scope “https://www.googleapis.com/auth/cloud-platform” defined.

Your submission process will look like this:

Note: Verify that you have enough CPU or GPU resources available by checking your quota in the zone where your instance will be deployed.

Executing a Jupyter notebook

Let’s look into the following code:

This command is the standard way to create a Deep Learning VM. But keep in mind, you’ll need to pick the VM that includes the core dependencies you need to execute your notebook. Do not try to use a TensorFlow image if your notebook needs PyTorch or vice versa.

Note: if you do not see a dependency that is required for your notebook and you think should be in the image, please let us know on the forum (or with a comment to this article).

The secret sauce here contains two following things:


Papermill is a tool for parameterizing, executing, and analyzing Jupyter Notebooks.

Papermill lets you:

  1. Parameterize notebooks via command line arguments or a parameter file in YAML format
  2. Execute and collect metrics across the notebooks
  3. Summarize collections of notebooks

In our case, we are just using its ability to execute notebooks and pass parameters if needed.

Behind the scenes

Let’s start with the startup shell script parameters:

  • INPUT_NOTEBOOK_PATH: The input notebook located Cloud Storage bucket.
  • Example: gs://my-bucket/input.ipynb
  • OUTPUT_NOTEBOOK_PATH: The output notebook located Cloud Storage bucket.
  • Example: gs://my-bucket/input.ipynb.
  • PARAMETERS_FILE: Users can provide a YAML file where notebook parameter values should be read.
  • Example: gs://my-bucket/params.yaml
  • PARAMETERS: Pass parameters via -p key value for notebook execution.
  • Example: -p batch_size 128 -p epochs 40.

The two ways to execute the notebook with parameters are: (1) through the Python API and (2) through the command line interface. This sample script supports two different ways to pass parameters to Jupyter notebook, although Papermill supports other formats, so please consult Papermill’s documentation.

The above script performs the following steps:

  1. Creates a Compute Engine instance using the TensorFlow Deep Learning VM and 2 NVIDIA Tesla T4 GPUs
  2. Installs NVIDIA GPU drivers
  3. Executes the notebook using Papermill tool
  4. Uploads notebook result (with all the cells pre-computed) to Cloud Storage bucket in this case: gs://my-bucket/
  5. Papermill emits a save after each cell executes, this could generate “429 Too Many Requests” errors, which are handled by the library itself.
  6. Terminates the Compute Engine instance


By using the Deep Learning VM images, you can automate your notebook training, such that you no longer need to pay extra or manually manage your Cloud infrastructure. Take advantage of all the pre-installed ML software and Nteract’s Papermill project to help you solve your ML problems more quickly! Papermill will help you automate the execution of yourJupyter notebooks and in combination of Cloud Storage and Deep Learning VM images you can now set up this process in a very simple and cost efficient way.

Running Cognitive Services on Azure IoT Edge

This blog post is co-authored by Emmanuel Bertrand, Senior Program Manager, Azure IoT.

We recently announced Azure Cognitive Services in containers for Computer Vision, Face, Text Analytics, and Language Understanding. You can read more about Azure Cognitive Services containers in this blog, “Brining AI to the edge.”

Today, we are happy to announce the support for running Azure Cognitive Services containers for Text Analytics and Language Understanding containers on edge devices with Azure IoT Edge. This means that all your workloads can be run locally where your data is being generated while keeping the simplicity of the cloud to manage them remotely, securely and at scale.

Image displaying containers that align with Vision and Langauge

Whether you don’t have a reliable internet connection, or want to save on bandwidth cost, have super low latency requirements, or are dealing with sensitive data that needs to be analyzed on-site, Azure IoT Edge with the Cognitive Services containers gives you consistency with the cloud. This allows you to run your analysis on-site and a single pane of glass to operate all your sites.

These container images are directly available to try as IoT Edge modules on the Azure Marketplace:

  • Key Phrase Extraction extracts key talking points and highlights in text either from English, German, Spanish, or Japanese.
  • Language Detection detects the natural language of text with a total of 120 languages supported.
  • Sentiment Analysis detects the level of positive or negative sentiment for input text using a confidence score across a variety of languages.
  • Language Understanding applies custom machine learning intelligence to a user’s conversational and natural language text to predict overall meaning and pull out relevant and detailed information.

Please note, the Face and Recognize Text containers are still gated behind a preview, thus are not yet available via the marketplace. However you can deploy them manually by first signing up to for the preview to get access.

In this blog, we describe how to provision Language Detection container on your edge device locally and how you manage it through Azure IoT.

Set up an IoT Edge device and its IoT Hub

Follow the first steps in this quick-start for setting up your IoT Edge device and your IoT Hub.

It first walks your through creating an IoT Hub and then registering an IoT Edge device to your IoT hub. Here is a screenshot of a newly created edge device called “LanguageDetection” under the IoT Hub called “CSContainers”. Select the device, copy its primary connection string, and save it for later.

Screenshot of a newly created Edge device

Next, it guides you through setting up the IoT Edge device. If you don’t have a physical edge device, it is recommended to deploy the Ubuntu Server 16.04 LTS and Azure IoT Edge runtime virtual machine (VM) which is available on the Azure Marketplace. It is an Azure Virtual Machine that comes with IoT Edge pre-installed.

The last step is to connect your IoT Edge device to your IoT Hub by giving it its connection string created above. To do that, edit the device configuration file under /etc/iotedge/config.yaml file and update the connection string. After the connection string is update, restart the edge device with sudo systemctl restart iotedge.

Provisioning a Cognitive Service (Language Detection IoT Edge module)

The images are directly available as IoT edge modules from the Iot Hub marketplace.

Screenshot of Language Detection Container offering in the Azure Marketplace

Here we’re using the Language Detection image as an example, however other images work the same way. To download the image, search for the image and select Get it now, this will take you to the Azure portal “Target Devices for IoT Edge Module” page. Select your subscription with your IoT Hub, select Find Device and your IoT Edge device, then click the Select and Create buttons.

Screenshot of target devices for IoT Edge modules

Screenshot for selecting device to deploy on and creating

Configuring your Cognitive Service

Now you’re almost ready to deploy the Cognitive Service to your IoT Edge device. But in order to run a container you need to get a valid API key and billing endpoints, then pass them as environment variables in the module details.

Screenshot of setting deployment modules

Go to the Azure portal and open the Cognitive Services blade. If you don’t have a Cognitive Service that matches the container, in this case a Text Analytics service, then select add and create one. Once you have a Cognitive Service get the endpoint and API key, you’ll need this to fire up the container:

Screenshot showing the retrival of the endpoint and API key

The endpoint is strictly used for billing only, no customer data ever flows that way. Copy your billing endpoint value to the “billing” environment variable and copy your API key value to the “apikey” environment variable.

Deploy the container

All required info is now filled in and you only need to complete the IoT Edge deployment. Select Next and then Submit. Verify that the deployment is happening properly by refreshing the IoT Edge device details section.

Verify that the deployment is happening properly by refreshing the IoT Edge device details section.

Screenshot of IoT Edge device details section

Trying it out

To try things out, we’ll make an HTTP call to the IoT Edge device that has the Cognitive Service container running.

For that, we’ll first need to make sure that the port 5000 of the edge device is open. If you’re using the pre-built Ubuntu with IoT Edge Azure VM as an edge device, first go to VM details, then Settings, Networking, and Outbound port rule to add an outbound security rule to open port 5000. Also copy the Public IP address of your device.

Now you should be able to query the Cognitive Service running on your IoT Edge device from any machine with a browser. Open your favorite browser and go to http://your-iot-edge-device-ip-address:5000.

Now, select Service API Description or jump directly to http://your-iot-edge-device-ip-address:5000/swagger. This will give you a detailed description of the API.

Screenshot showing a detailed description of the Language Detection Cognitive Service API

Select Try it out and then Execute, you can change the input value as you like.

Screenshot of executing the API

The result will show up further down on the page and should look something like the following image:

Screenshot of results after executing the API

Next steps

You are now up and running! You are running the Cognitive Services on your own IoT Edge device, remotely managed via your central IoT Hub. You can use this setup to manage millions of devices in a secure way.

You can play around with the various Cognitive Services already available in the Azure Marketplace and try out various scenarios. Have fun!

Announcing Azure Integration Service Environment for Logic Apps

A new way to integrate with resources in your virtual network

We strive with every service to provide experiences that significantly improve the development experience. We’re always looking for common pain points that everybody building software in the cloud deals with. And once we find those pain points, we build best-of-class software to address the need.

In critical business scenarios, you need to have the confidence that your data is flowing between all the moving parts. The core Logic Apps offering is a great, multi-faceted service for integrating between data sources and services, but sometimes it is necessary to have dedicated service to ensure that your integration processes are as performant as can be. That’s why we developed the Integration Service Environment (ISE), a fully isolated integration environment.

What is an Integration Service Environment?

An Integration Service Environment is a fully isolated and dedicated environment for all enterprise-scale integration needs. When you create a new Integration Service Environment, it is injected into your Azure virtual network, which allows you to deploy Logic Apps as a service on your VNET.

  • Direct, secure access to your virtual network resources. Enables Logic Apps to have secure, direct access to private resources, such as virtual machines, servers, and other services in your virtual network including Azure services with service endpoints and on-premises resources via an Express Route or site to site VPN.
  • Consistent, highly reliable performance. Eliminates the noisy neighbor issue, removing fear of intermittent slowdowns that can impact business critical processes with a dedicated runtime where only your Logic Apps execute in.
  • Isolated, private storage. Sensitive data subject to regulation is kept private and secure, opening new integration opportunities.
  • Predicable pricing. Provides a fixed monthly cost for Logic Apps. Each Integration Service Environment includes the free usage of 1 Standard Integration Account and 1 Enterprise connector. If your Logic Apps action execution count exceeds 50 million action executions per month, the Integration Service Environment could provide better value.

Integration Service Environments are available in every region that Logic Apps is currently available in, with the exception of the following locations:

  • West Central US
  • Brazil South
  • Canada East

Logic Apps is great for customers who require a highly reliable, private integration service for all their data and services. You can try the public preview by signing up for an Azure account. If you’re an existing customer, you can find out how to get started by visiting our documentation, “Connect to Azure virtual networks from Azure Logic Apps by using an integration service environment.”