New Application Manager brings GitOps to Google Kubernetes Engine

Kubernetes is the de facto standard for managing containerized applications, but developers and app operators often struggle with end-to-end Kubernetes lifecycle management—things like authoring, releasing and managing Kubernetes applications. 

To simplify the management of application lifecycle and configuration, today we are launching Application Manager, an application delivery solution delivered as an add-on to Google Kubernetes Engine (GKE). Now available in beta, Application Manager allows developers to easily create a dev-to-production application delivery flow, while incorporating Google’s best practices for managing release configurations. Application Manager lets you get your applications running in GKE efficiently, securely and in line with company policy, so you can succeed with your application modernization goals. 

Addressing the Kubernetes application lifecycle

The Kubernetes application lifecycle consists of three main stages: authoring, releasing and managing. Authoring includes writing the application source code and app-specific Kubernetes configuration. Releasing includes making changes to code and/or config, then safely deploying those changes to different release environments. The managing phase includes operationalizing applications at scale and in production. Currently, there are no well-defined standards for these stages, and users often ask us for best practices and recommendations to help them get started.

In addition, Kubernetes application configurations can be too long and complex to manage at scale. In particular, an application that is deployed across test, staging and production release environments might have duplicate configurations stored in multiple Git repositories. Any change to one config needs to be replicated to the others, creating the potential for human error. 

Application Manager embraces GitOps principles, leveraging Git repositories to enable declarative configuration management. It allows you to audit and review changes before they are deployed to environments. It also automatically scaffolds and enforces recommended Git repository structures, and allows you to perform template-free customization for configurations with Kustomize, a Kubernetes-native configuration management tool.

Application Manager runs inside your GKE cluster as a cluster add-on, and performs the following tasks: 

  • It pulls Kubernetes manifests from a Git repository (at a specific branch, tag or commit) and deploys the manifests as an application in the cluster. 

  • It reports metadata about deployed applications (e.g. version, revision history, health, etc.) and visualizes the applications in Google Cloud Console.

Releasing an application with Application Manager

Now, let’s dive into more details on how to use Application Manager to release or deploy an application, from scaffolding Git repositories and defining application release environments to deploying it in clusters. You can do all of these tasks by executing simple commands in appctl, Application Manager’s command-line interface. 

Here’s an example workflow of how you can release a “bookstore” app to both staging and production environments. 

First, initialize the app by running:

appctl init bookstore --app-config-repo=github.com/$USER_OR_ORG/bookstore

This creates two remote Git repositories: 1) an application repository, for storing application configuration files in kustomize format (for easier configuration management), and 2) a deployment repository, for storing auto-generated, fully-rendered configuration files as the source of truth of what’s deployed in the cluster. 

After the Git repositories are initialized, you can add a staging environment to the bookstore app by running appctl env add staging --cluster=$MY_STAGING_CLUSTER, and do the same for the prod environment. At this point, the application repository looks like this:
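The original screenshot isn’t reproduced here, but roughly speaking, an appctl-scaffolded application repository contains a shared kustomize base plus one overlay directory per environment. An illustrative sketch (directory names are assumptions, not exact appctl output):

bookstore/
  base/
    kustomization.yaml
    deployment.yaml
    service.yaml
  envs/
    staging/
      kustomization.yaml
    prod/
      kustomization.yaml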

Here, we are using kustomize to manage environment-specific differences in the configuration. With kustomize, you can declaratively manage distinctly customized Kubernetes configurations for different environments using only Kubernetes API resource files, by patching overlays on top of the base configuration.
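To make that concrete, here is a generic kustomize sketch of a staging overlay that patches only the replica count on top of the base; the file names and values are illustrative, not appctl-generated output:

# envs/staging/kustomization.yaml
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
bases:
- ../../base
patchesStrategicMerge:
- replica-count.yaml

# envs/staging/replica-count.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: bookstore
spec:
  replicas: 2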

When you’re ready to release the application to the staging environment, simply create an application version with git tag in the application repository, and then run appctl prepare staging. This automatically generates hydrated configurations from the tagged version in the application repository, and pushes them to the staging branch of the deployment repository for an administrator to review. 

With this Google-recommended repository structure, Application Manager provides a clean separation between the easy-to-maintain kustomize configurations in the application repository, and the auto-generated deployment repository—an easy-to-review single source of truth; it also prevents these two repositories from diverging. 

Once the commits to hydrated configurations are reviewed and merged into the deployment repository, run appctl apply staging to deploy this application to the staging cluster. 

Promotion from staging to prod is as easy as appctl apply prod --from-env staging. To roll back in case of failure, simply run appctl apply staging --from-tag=OLD_VERSION_TAG.

What’s more, this appctl workflow can be automated and streamlined by executing it in scripts or pipelines. 
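For example, a minimal release script could simply chain the commands described above; the version tag and environments are placeholders:

# Tag a new version in the application repository.
git tag v1.0.1 && git push origin v1.0.1
# Generate hydrated configuration and push it to the staging branch for review.
appctl prepare staging
# After the change is reviewed and merged in the deployment repository:
appctl apply staging
# Promote the same version to production once staging looks healthy.
appctl apply prod --from-env staging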

Application Manager for all your Kubernetes apps 

Now, with Application Manager, it’s easy to create a dev-to-production application delivery flow with a simple and declarative approach that’s recommended by Google. We are also working with our partners on the Google Cloud Marketplace to enable seamless updates of the Kubernetes applications you procure there, so you get automated updates and rollbacks of your partner applications. You can find more information here. For a detailed overview of Application Manager, please see this demo video. When you’re ready to get started, follow the steps in this tutorial.

Exploring Container Security: Run what you trust; isolate what you don’t

From vulnerabilities to cryptojacking to, well, more cryptojacking, there were plenty of security events to keep container users on their toes throughout 2019. With Kubernetes being used to manage most container-based environments (and increasingly hybrid ones too), it’s no surprise that Forrester Research, in their 2020 predictions, called out the need for “securing apps and data in an increasingly hybrid cloud world.” 

On the Google Cloud container security team, we want your containers to be well protected, whether you’re running in the cloud with Google Kubernetes Engine or hybrid with Anthos, and for you to be in-the-know about container security. As we kick off 2020, here’s some advice on how to protect your Kubernetes environment, plus a breakdown of recent GKE features and resources.

Run only what you trust, from hardware to services

Many of the vulnerabilities we saw in 2019 compromised the container supply chain or escalated privileges through another overly-trusted component. It’s important that you trust what you run, and that you apply defense-in-depth principles to your containers. To help you do this, Shielded GKE Nodes is now generally available, and will be followed shortly by the general availability of Workload Identity–a way to authenticate your GKE applications to other Google Cloud services that follows best practice security principles like defense-in-depth.

Let’s take a deeper look at these features.

Shielded GKE Nodes
Shielded GKE Nodes ensures that a node running in your cluster is a verified node in a Google data center. By extending the concept of Shielded VMs to GKE nodes, Shielded GKE Nodes improves baseline GKE security in two respects:

  • Node OS provenance check: A cryptographically verifiable check to make sure the node OS is running on a virtual machine in a Google data center

  • Enhanced rootkit and bootkit protection: Secure and measured boot, virtual trusted platform module (vTPM), UEFI firmware, and integrity monitoring

You can now turn on these Shielded GKE Nodes protections when creating a new cluster or upgrading an existing cluster. For more information, read the documentation.
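As a sketch, enabling Shielded GKE Nodes on a new cluster is a single flag; the cluster name and zone below are placeholders:

gcloud container clusters create my-cluster \
    --zone us-central1-a \
    --enable-shielded-nodes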

Workload Identity
Your GKE applications probably use another service–like a data warehouse–to do their job. For example, in the vein of “running only what you trust,” when an application interacts with a data warehouse, that warehouse will require your application to be authenticated. Historically the approaches to doing this haven’t been in line with security principles—they were overly permissive, or had the potential for a large blast radius if they were compromised. 

Workload Identity helps you follow the principle of least privilege and reduce that blast radius potential by automating workload authentication through a Google-managed service account, with short-lived credentials. Learn more about Workload Identity in the beta launch blog and the documentation. We will soon be launching general availability of Workload Identity.
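At a high level, Workload Identity maps a Kubernetes service account (KSA) to a Google service account (GSA). A rough sketch of that binding, with placeholder project, namespace and account names (see the documentation for the full cluster-level setup):

# Allow the KSA "my-ksa" in namespace "default" to impersonate the GSA.
gcloud iam service-accounts add-iam-policy-binding \
    my-gsa@my-project.iam.gserviceaccount.com \
    --role roles/iam.workloadIdentityUser \
    --member "serviceAccount:my-project.svc.id.goog[default/my-ksa]"

# Annotate the KSA so GKE knows which GSA it should act as.
kubectl annotate serviceaccount my-ksa \
    iam.gke.io/gcp-service-account=my-gsa@my-project.iam.gserviceaccount.com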

Stronger security for the workloads you don’t trust
But sometimes, you can’t confidently vouch for the workloads you’re running. For example, an application might use code that originated outside your organization, or it might be a software-as-a-service (SaaS) application that ingests input from an unknown user. In the case of these untrusted workloads, a second layer of isolation between the workload and the host resources is part of following the defense-in-depth security principle. To help you do this, we’re releasing the general availability of GKE Sandbox.

GKE Sandbox
GKE Sandbox uses the open source container runtime gVisor to run your containers with an extra layer of isolation, without requiring you to change your application or how you interact with the container. gVisor uses a user space kernel to intercept and handle syscalls, reducing the direct interaction between the container and the host, and thereby reducing the attack surface. However, as a managed service, GKE Sandbox abstracts away these internals, giving you single-step simplicity for multiple layers of protection. Get started with GKE Sandbox. 
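Enabling GKE Sandbox is a node-pool setting plus one line in the pod spec. A minimal sketch, with placeholder names and image:

# Create a node pool with GKE Sandbox (gVisor) enabled.
gcloud container node-pools create sandbox-pool \
    --cluster my-cluster \
    --sandbox type=gvisor

# Schedule an untrusted workload onto the sandboxed runtime.
apiVersion: v1
kind: Pod
metadata:
  name: untrusted-workload
spec:
  runtimeClassName: gvisor
  containers:
  - name: app
    image: gcr.io/my-project/untrusted-app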

Up your container security knowledge

As more companies use containers and Kubernetes to modernize their applications, decision makers and business leaders need to understand how these technologies apply to their business—and how they will help keep it secure.

Core concepts in container security
Written specifically for readers who are new to containers and Kubernetes, Why Container Security Matters to Your Business takes you through the core concepts of container security, for example supply chain and runtime security. Whether you’re running Kubernetes yourself or through a managed service like GKE or Anthos, this white paper will help you connect the dots between how open-source software like Kubernetes responds to vulnerabilities and what that means for your organization.  

New GKE multi-tenancy best practices guide
Multi-tenancy, when one or more clusters are shared between tenants, is often implemented as a cost-saving or productivity mechanism. However, incorrectly configuring the clusters to have multiple tenants, or the corresponding compute or storage resources, can not only negate these cost savings, but also open organizations to a variety of attack vectors. We’ve just released a new guide, GKE Enterprise Multi-tenancy Best Practices, that takes you through setting up multi-tenant clusters with an eye towards reliability, security, and monitoring. Read the new guide, see the corresponding Terraform modules, and improve your multi-tenancy security.

Learn how Google approaches cloud-native security internally
Just as the industry is transitioning from an architecture based on monolithic applications to distributed “cloud-native” microservices, Google has also been on a journey from perimeter-based security to cloud-native security.

In two new whitepapers, we released details about how we did this internally, including the security principles behind cloud-native security. Learn more about BeyondProd, Google’s model for cloud-native security; and about Binary Authorization for Borg, which discusses how we ensure code provenance and use code identity.

Let 2020 be your year for container security

Security is a continuous journey. Whether you’re just getting started with GKE or are already running clusters across clouds with Anthos, stay up to date with the latest in Google’s container security features and see how to implement them in the cluster hardening guide.

Windows Server applications, welcome to Google Kubernetes Engine

The promise of Kubernetes is to make container management easy and ubiquitous. Up until recently though, the benefits of Kubernetes were limited to Linux-based applications, preventing enterprise applications running on Windows Server from taking advantage of its agility, speed of deployment and simplified management. 

Last year, the community brought Kubernetes support to Windows Server containers. Building on this, we’re thrilled to announce that you can now run Windows Server containers on Google Kubernetes Engine (GKE). 

GKE, the industry’s first Kubernetes-based container management solution for the public cloud, is top rated by analysts and widely used by customers across a variety of industries. Supporting Windows on GKE is a part of our commitment to provide a first-class experience for hosting and modernizing Windows Server-based applications on Google Cloud. To this end, in the past six months, we added capabilities such as the ability to bring your own Windows Server licenses (BYOL), virtual displays, and managed services for SQL Server and Active Directory. Volusion and Travix are among the many thousands of customers who have chosen Google Cloud to run and modernize their Windows-based application portfolios.

Bringing Kubernetes’ goodness to Windows Server apps

By running Windows Server apps as containers on Kubernetes, you get many of the benefits that Linux applications have enjoyed for years. Running your Windows Server containers on GKE can also save you on licensing costs, as you can pack many Windows Server containers on each Windows node.

Illustration of Windows Server and Linux containers running side-by-side in the same GKE cluster

In the beta release of Windows Server container support in GKE (version 1.16.4), Windows and Linux containers can run side-by-side in the same cluster. This release also includes several other features aimed at helping you meet the security, scalability, integration and management needs of your Windows Server containers. Some highlights include:

  • Private clusters: a security and privacy feature that allows you to restrict access to a cluster’s nodes and the master from the public internet—your cluster’s nodes can only be accessed from within a trusted Google Virtual Private Cloud (VPC).

  • Node Auto Upgrades: a feature that reduces management overhead and improves security by automatically upgrading GKE nodes on your behalf. Make sure you build your container images using the Docker ‘multi-arch’ feature to avoid version mismatch issues between the node OS version and the base container image. 

  • Regional clusters: an availability and reliability feature that allows you to create a multi-master, highly-available Kubernetes cluster that spreads both the control plane and the nodes across multiple zones in the same region. This provides increased control plane uptime of 99.95% (up from 99.5%), and zero-downtime upgrades.

  • Support for Group Managed Service Accounts (gMSA): gMSA is a type of Active Directory account that provides automatic password management, simplified service principal name (SPN) management, etc. for multiple servers. gMSAs are supported by Google Cloud’s Managed Microsoft Active Directory Service for easier administration.

  • Choice of Microsoft Long-Term Servicing Channel (LTSC) or Semi-Annual Channel (SAC) servicing channels, allowing you to choose the version that best fits your support and feature requirements. 

For full details on each of these features and more, please consult the documentation.
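Because Windows Server and Linux nodes coexist in the same cluster, use a node selector so your Windows pods land on Windows nodes. A minimal sketch (names and image are illustrative):

apiVersion: apps/v1
kind: Deployment
metadata:
  name: iis-sample
spec:
  replicas: 1
  selector:
    matchLabels:
      app: iis-sample
  template:
    metadata:
      labels:
        app: iis-sample
    spec:
      nodeSelector:
        kubernetes.io/os: windows
      containers:
      - name: iis
        image: mcr.microsoft.com/windows/servercore/iis:windowsservercore-ltsc2019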

With Windows Server 2008 and 2008 R2 reaching End of Support recently, you may be exploring ways to upgrade your legacy applications. This may be an opportune time to consider containerizing your applications and deploying them in GKE. In general, good candidates for containerization include custom-built .NET applications as well as batch and web applications. For applications provided by third-party ISVs, please consult the ISV for containerized versions of the applications.  

What customers are saying

We’ve been piloting Windows Server container support in GKE for several months now with preview customers, who have been impressed by GKE’s performance, reliability and security, as well as differentiated features such as automated setup and configuration for easier cluster management. 

Helix RE creates software that makes digital models of buildings, and recently switched from setting up and running Windows Kubernetes clusters manually to using GKE. Here’s what they had to say: 

“What used to take us weeks to set up and configure, now takes a few minutes. Besides saving time, features like autoscaling, high-availability, Stackdriver Logging and Monitoring are already baked in. Windows in GKE gives us the same scale, reliability, and ease of management that we have come to expect from running Linux in GKE.” -Premkumar Masilamani, Cloud Architect, Helix RE

Making it easier with partner solutions

Modernizing your applications means more than just deploying and managing containers. That is why we are working with several partners who can help you build, integrate and deploy Windows Server containers into GKE, for a seamless CI/CD and container management experience. We’re excited to announce that the following partners have already worked to integrate their solutions with Windows on GKE.

CircleCI 
CircleCI allows teams to rapidly release code they trust by automating the build, test, and delivery process. CircleCI ‘orbs’ bundle CircleCI configuration into reusable packages. They make it easy to integrate with modern tools, eliminating the need for teams to spend time and cycles building the integrations themselves. 

“We are excited to further our partnership with Google with our latest Google Kubernetes Engine (GKE) Orb. This orb supports deployment to Windows containers running on GKE, and allows users to automate deploys in minutes directly from their CI/CD pipeline. By simplifying the process of automating deploys, teams can build confidence in their process, ship new features faster, and take advantage of cutting-edge technology without having to overhaul their existing infrastructure.”  -Tom Trahan, VP of Business Development, CircleCI

CloudBees 
CloudBees enables enterprise developer teams to accelerate software delivery with continuous integration and continuous delivery (CI/CD). The CloudBees solutions optimize delivery of high quality applications while ensuring they are secure and compliant.

“We are pleased to offer support for Windows containers on Google Cloud Platform. This announcement broadens the options for CloudBees users to now run Microsoft workloads on GCP. It’s all about speeding up software delivery time and, with CloudBees running Windows containers on GCP, our users can enjoy a fast, modernized experience, leveraging the Microsoft technologies already pervasive within their organization.”  -Francois Dechery, Chief Strategy Officer, CloudBees 

GitLab 
GitLab is a complete DevOps platform, delivered as a single application, with the goal of fundamentally changing the way Development, Security, and Ops teams collaborate.

“GitLab and Google Cloud are lowering the barrier of adoption for DevOps and Kubernetes within the Windows developer community. Within minutes, developers can create a project, provision a GKE cluster, and execute a CI/CD pipeline with Windows Runners now on GitLab.com or with GitLab Self-managed to automatically deploy Windows apps onto Kubernetes.” –Darren Eastman, Senior Product Manager, GitLab

Check out GitLab’s blog and video to learn more.

Get started today

We hope that you will take your Windows Server containers for a spin on GKE—to get started, you can find detailed documentation on our website. If you are new to GKE, get started by checking out the Google Kubernetes Engine page and the Coursera course on Architecting with GKE.

Please don’t hesitate to reach out to us at [email protected]. And please take a few minutes to give us your feedback and ideas to help us shape upcoming releases.

Unify Kubernetes and GCP resources for simpler and faster deployments

Adopting containers and Kubernetes means adopting new ways of doing things, not least of which is how you configure and maintain your resources. As a declarative system, Kubernetes allows you to express your intent for a given resource, and then creates and updates those resources using continuous reconciliation. Compared with imperative configuration approaches, Kubernetes-style declarative config helps ensure that your organization follows GitOps best practices like storing configuration in a version control system, and defining it in a YAML file.   

However, applications that run on Kubernetes often use resources that live outside of Kubernetes, for example, Cloud SQL or Cloud Storage, and those resources typically don’t use the same approach to configuration. This can cause friction between teams, and force developers into frequent “context switching”. Further, configuring and operating those applications is a multi-step process: configuring the external resources, then the Kubernetes resources, and finally making the former available to the latter. 

To help, today, we’re announcing the general availability of Config Connector, which lets you manage Google Cloud Platform (GCP) resources as Kubernetes resources, giving you a single place to configure your entire application.

Config Connector is a Kubernetes operator that makes all GCP resources behave as if they were Kubernetes resources, so you don’t have to learn and use multiple conventions and tools to manage your infrastructure. For cloud-native developers, Config Connector simplifies operations and resource management by providing a uniform and consistent way to manage all of cloud infrastructure through Kubernetes.

Automating infrastructure consistency

With its declarative approach, Kubernetes is continually reconciling the resources it manages. Resources managed by Kubernetes are continuously monitored, and “self-heal” to continuously meet the user’s desired state. However, monitoring and reconciliation of non-Kubernetes resources (a SQL server instance, for example) happens as part of a separate workflow. In the most extreme cases, changes to your desired configuration, for example, changes to the number of your Cloud Spanner nodes, are not propagated to your monitoring and alerting infrastructure, causing false alarms and creating additional work for your teams. 

By bringing these resources under the purview of Kubernetes with Config Connector, you get resource reconciliation across your infrastructure, automating the work of achieving eventual consistency in your infrastructure. Instead of spinning up that SQL server instance separately and monitoring it for changes as a second workflow, you ask Config Connector to create a SQL server instance and a SQL database on that instance. Config Connector creates these resources, and now that they’re part of your declarative approach, the SQL server instance is effectively self-healing, just like the rest of your Kubernetes deployment. 

Using Kubernetes’ resource model relieves you from having to explicitly order resources in your deployment scripts. Just like for pods, deployments, or other native Kubernetes resources, you no longer have to explicitly wait for the SQL instance to be created before starting to provision a SQL database on that instance, as illustrated in the YAML manifests below.
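For example, a Cloud SQL instance and a database on that instance can be declared together as Kubernetes objects. This is a minimal sketch based on the Config Connector resource types; the values are placeholders:

apiVersion: sql.cnrm.cloud.google.com/v1beta1
kind: SQLInstance
metadata:
  name: my-sql-instance
  labels:
    cost-center: retail-42
spec:
  region: us-central1
  databaseVersion: MYSQL_5_7
  settings:
    tier: db-n1-standard-1
---
apiVersion: sql.cnrm.cloud.google.com/v1beta1
kind: SQLDatabase
metadata:
  name: my-sql-database
  labels:
    cost-center: retail-42
spec:
  instanceRef:
    name: my-sql-instance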

Additionally, by defining GCP resources as Kubernetes objects, you now get to leverage familiar Kubernetes features with these resources, such as Kubernetes Labels and Selectors. For example, here we used cost-center as a label on the resources. You can now filter by this label using kubectl get. Furthermore, you can apply your organization’s governance policy using admission controllers, such as Anthos Policy Controller. For example, you can enforce that the cost-center label should exist on all resources in the cluster and only have an allowed range of values:
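As a sketch, filtering by that label and enforcing it with a Gatekeeper-style constraint (the mechanism Anthos Policy Controller uses) could look like this; the constraint template comes from the standard library, and the allowed values are illustrative:

# List Config Connector resources carrying the cost-center label.
kubectl get sqlinstances -l cost-center=retail-42

# Require a cost-center label with an allowed format.
apiVersion: constraints.gatekeeper.sh/v1beta1
kind: K8sRequiredLabels
metadata:
  name: require-cost-center
spec:
  match:
    kinds:
    - apiGroups: ["sql.cnrm.cloud.google.com"]
      kinds: ["SQLInstance", "SQLDatabase"]
  parameters:
    labels:
    - key: cost-center
      allowedRegex: "^retail-[0-9]+$"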

Faster development with simplified operations

For Etsy, Kubernetes was instrumental in helping them to move to the cloud, but the complexity of their applications meant they were managing resources in multiple places, slowing down their deployments.

“At Etsy, we run complex Kubernetes applications that combine custom code and cloud resources across many environments. Config Connector will allow Etsy to move from having two distinct, disconnected CI/CD pipelines to a single pipeline for both application code and the infrastructure it requires. Config Connector will simplify our delivery and enable end-to-end testing of cloud infrastructure changes, which we expect will result in faster deployment and lower friction usage of cloud infrastructure” – Gregg Donovan, Senior Staff Software Engineer, Etsy. 

Getting started with Config Connector

Today, Config Connector can be used to manage more than 60 GCP services, including Bigtable, BigQuery, IAM Policies, Service Accounts and Service Account Keys, Pub/Sub, Redis, Spanner, Cloud SQL, Cloud Storage, Compute Engine, Networking and Cloud Load Balancer. 

Config Connector can be installed standalone on any Kubernetes cluster, and is also integrated into Anthos Config Management, for managing hybrid and multi-cloud environments. Get started with Config Connector today to simplify configuration management across GKE and GCP.

Exploring container security: Announcing the CIS Google Kubernetes Engine Benchmark

If you’re serious about the security of your Kubernetes operating environment, you need to build on a strong foundation. The Center for Internet Security’s (CIS) Kubernetes Benchmark gives you just that: a set of Kubernetes security best practices that will help you build an operating environment that meets the approval of both regulators and customers. 

The CIS Kubernetes Benchmark v1.5.0 was recently released, covering environments up to Kubernetes v1.15. Written as a series of recommendations rather than as a must-do checklist, the Benchmark follows the upstream version of Kubernetes. But for users running managed distributions such as our own Google Kubernetes Engine (GKE), not all of its recommendations are applicable. To help, we’ve released, in conjunction with CIS, a new CIS Google Kubernetes Engine (GKE) Benchmark, available under the CIS Kubernetes Benchmark, which takes the guesswork out of figuring out which CIS Benchmark recommendations you need to implement, and which ones Google Cloud handles as part of the GKE shared responsibility model.

Read on to find out what’s new in the v1.5.0 CIS Kubernetes Benchmark, how to use the CIS GKE Benchmark, and how you can test if you’re following recommended best practices.

Exploring the CIS Kubernetes Benchmark v1.5.0

The CIS Kubernetes Benchmark v1.5.0 was published in mid-October, and has a significantly different structure than the previous version. Whereas the previous version split up master and worker node configurations at a high level, the new version separates controls by the components to which they apply: control plane components, etcd, control plane configuration, worker nodes, and policies. This should make it easier for you to apply the guidance to a particular distribution, as you may not be able to control some components, nor are they your responsibility.

In terms of specific controls, you’ll see additional recommendations for: 

  • Secret management. New recommendations include Minimize access to secrets (5.1.2), Prefer using secrets as files over secrets as environment variables (5.4.1), and Consider external secret storage (5.4.2). (See the pod sketch after this list for the secrets-as-files pattern.)

  • Audit logging. In addition to an existing recommendation on how to ensure audit logging is configured properly with the control plane’s audit log flags, there are new recommendations to Ensure that a minimal audit policy is created (3.2.1), and Ensure that the audit policy covers key security concerns (3.2.2).

  • Preventing unnecessary access, by locking down permissions in Kubernetes following the principle of least privilege. Specifically, you should Minimize wildcard use in Roles and ClusterRoles (5.1.3).
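To illustrate the secrets-as-files recommendation from the list above, here’s a minimal pod sketch that mounts a secret as a read-only volume rather than exposing it through environment variables; the names are placeholders:

apiVersion: v1
kind: Pod
metadata:
  name: app-with-secret
spec:
  containers:
  - name: app
    image: gcr.io/my-project/my-app
    volumeMounts:
    - name: api-credentials
      mountPath: /etc/secrets
      readOnly: true
  volumes:
  - name: api-credentials
    secret:
      secretName: api-credentials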

Introducing the new CIS GKE Benchmark

What does this mean if you’re using a managed distribution like GKE? As we mentioned earlier, the CIS Kubernetes Benchmark is written for the open-source Kubernetes distribution. And while it’s intended to be as universally applicable as possible, it doesn’t fully apply to hosted distributions like GKE.

The new CIS GKE Benchmark is a child of the CIS Kubernetes Benchmark specifically designed for the GKE distribution. It is the first distribution-specific CIS Benchmark to draw from the existing benchmark, removing items that can’t be configured or managed by the user. The CIS GKE Benchmark also includes additional controls that are Google Cloud-specific, and that we recommend you apply to your clusters, for example, as defined in the GKE hardening guide. Altogether, it means that you have a single set of controls for security best practice on GKE.

There are two kinds of recommendations in the GKE CIS Benchmark. Level 1 recommendations are meant to be widely applicable—you should really be following these, for example enabling Stackdriver Kubernetes Logging and Monitoring. Level 2 recommendations, meanwhile, result in a more stringent security environment, but are not necessarily applicable to all cases. These recommendations should be implemented with more care to avoid potential conflicts in more complicated environments. For example, Level 2 recommendations may be more relevant to multi-tenant workloads than single-tenant, like using GKE Sandbox to run untrusted workloads. 

The CIS GKE Benchmark recommendations are listed as “Scored” when they can be easily tested using an automated method (like an API call or the gcloud CLI), and the setting has a value that can be definitively evaluated, for example, ensuring node auto-upgrade is enabled. Recommendations are listed as “Not Scored” when a setting cannot be easily assessed using automation or the exact implementation is specific to your workload—for example, using firewall rules to restrict ingress and egress traffic to your nodes—or they use a beta feature that you might not want to use in production.
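For example, the node auto-upgrade recommendation can be verified with a single gcloud call; this sketch uses placeholder cluster, node pool and zone names:

gcloud container node-pools describe default-pool \
    --cluster my-cluster --zone us-central1-a \
    --format="value(management.autoUpgrade)"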

If you want to suggest a new recommendation or a change to an existing one, please contribute directly to the CIS Benchmark in the CIS Workbench community.

Applying and testing the CIS Benchmarks

There are actually several CIS Benchmarks that are relevant to GKE, and there are tools available to help you test whether you’re following their recommendations. For the CIS Kubernetes Benchmark, you can use a tool like kube-bench to test your existing configuration; for the CIS GKE Benchmark, there’s Security Health Analytics, a security product that integrates into Security Command Center and that has built-in checks for several CIS GCP and GKE Benchmark items. By enabling Security Health Analytics, you’ll be able to discover, review, and remediate any cluster configurations you have that aren’t up to par with best practices from the CIS Benchmarks in the Security Command Center vulnerabilities dashboard.
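As a quick sketch, kube-bench can be run as a one-off Kubernetes Job and its findings read from the job’s logs; the manifest path in the kube-bench repository may vary by version:

kubectl apply -f https://raw.githubusercontent.com/aquasecurity/kube-bench/master/job.yaml
kubectl logs job/kube-bench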

Security Health Analytics scan results for CIS Benchmarks

Documenting GKE control plane configurations

The new CIS GKE Benchmark should help make it easier for you to implement and adhere to Kubernetes security best practices. And for the components that it doesn’t cover, we’ve documented where the GKE control plane implements the new Kubernetes CIS Benchmark, where we are working to improve our posture, and the existing mitigating controls we have in place. We hope this helps you make an informed decision on what controls to put in place yourself, and better understand your existing threat model.

Check out the new CIS GKE Benchmark, the updated CIS Kubernetes Benchmark, and understand how GKE performs according to the CIS Kubernetes Benchmark. If you’re already using the GKE hardening guide, we’ve added references to the corresponding CIS Benchmark recommendations so you can easily demonstrate that your hardened clusters meet your requirements.

The CIS GKE Benchmark was developed in concert with Control Plane and the Center for Internet Security (CIS) Kubernetes community.

How Google Cloud helped Phoenix Labs meet demand spikes with ease for its hit multiplayer game Dauntless

In the role-playing video game Dauntless, players work in groups to battle monsters and protect the city-island of Ramsgate. Commitment reaps big rewards: with every beast slain, you earn new weapons and armor made of the same materials as the Behemoth you took down, strengthening your arsenal for the next battle. 

And when creating Dauntless, game studio Phoenix Labs channeled these same values of resourcefulness, teamwork, and persistence. But instead of using war pikes and swords, it wielded the power of the cloud to achieve its goals.  

Preparing for unknown battles with containers and the cloud

For the gaming industry, launches bring unique technological challenges. It’s impossible to predict if a game will go viral, and developers like Phoenix Labs need to plan for a number of scenarios without knowing exactly how many players will show up and how much server capacity will ultimately be needed. In addition, since Dauntless was the first game in the industry to launch cross-platform—available on PlayStation 4, Xbox One, and PCs—it would be critical for all the underlying cloud-based services to work together flawlessly and provide an uninterrupted, real-time and consistent experience for players around the globe.

As part of staying agile to meet player needs, Phoenix Labs runs all its game servers in containers on Google Cloud Platform (GCP). The studio has a custom Google Kubernetes Engine (GKE) cluster in each region where Dauntless is available, spanning five continents, including North America, Australia, Europe and Asia. When a player loads the game, Dauntless matches him or her with up to three other players, forming a virtual team that is taken to a neighboring island to hunt a Behemoth monster together. Each “group hunt” runs on an ephemeral pod on GKE, lasting for about 15 minutes before the players complete their assignment and return to Ramsgate to polish their weapons and prepare for the next battle. 

“Containerizing servers isn’t very common in the gaming industry, especially for larger games,” said Simon Beaumont, VP Technology at Phoenix Labs. “Google Cloud spearheaded this effort with their leadership and unique technology expertise, and their platform gave us the flexibility to use Kubernetes-as-a-service in production.”


Addressing player and customer needs at launch and beyond

When Dauntless launched out of beta earlier this year, the required amount of server capacity turned out to be a lot. Within the first week, player count quickly climbed to 4 million—rapid growth that was no small feat to accommodate.

Continuously addressing Reddit and Twitter feedback from players, Phoenix Labs’ lean team worked side by side with Google Cloud Professional Services to execute over 1,700 deployments to its production platform during the week of the launch alone. 

“Google Cloud’s laser focus on customers reaches a level I’ve never seen before,” said Jesse Houston, CEO and co-founder at Phoenix Labs. “They care just as much about our experience as a GCP customer as they do about our players. Without their ‘let’s go’ attitude, Dauntless would have been a giant game over.”


“Behemoth” growth, one platform at a time 

Now that Dauntless has surpassed 16 million unique players and launched on Nintendo Switch, Phoenix Labs is preparing to expand to new regions such as Russia and Poland (they recently launched in Japan) and take advantage of other capabilities across Google. For example, by leveraging Google Ads and YouTube as part of its digital strategy for Dauntless, Phoenix Labs onboarded 5 million new gamers in the first week of launch; using YouTube Masthead ads also increased exposure to its audience. Phoenix Labs has migrated to Google Cloud’s data warehouse BigQuery for its ease of use and speed, returning queries in seconds based on trillions of rows of data. They’re even beginning to use the Google Sheets data connector for BigQuery to simplify reporting and ensure every decision is data-informed. 

At Google Cloud, we’re undaunted by behemoth monsters—and the task of making our platform a great place to launch and run your multiplayer game. Learn more about how game developers of all sizes work with Google Cloud to take their games to the next level here.

Exploring container security: Navigate the security seas with ease in GKE v1.15

Your container fleet, like a flotilla, needs ongoing maintenance and attention to stay afloat—and stay secure. In the olden days of seafaring, you grounded your ship at high tide and turned it on its side to clean and repair the hull, essentially taking it “offline.” We know that isn’t practical for your container environment however, as uptime is as important as security for most applications. 

Here on the Google Kubernetes Engine (GKE) team, we’re always hard at work behind the scenes to provide you with the latest security patches and features, so you can keep your fleet safe while retaining control and anticipating disruptions.

As GKE moved from v1.12 to v1.15 over the past year, here’s an overview of the security changes we’ve made to the platform: hardening behind the scenes, stronger defaults, and new advice added to the GKE hardening guide.

Behind-the-scenes hardening in GKE

A lot of our security recommendations come down to a simple principle: implement and expose fewer items in your infrastructure, so there’s less for you to secure, maintain, and patch. In GKE, this means paring down controls to only what your application actually needs and removing older implementations or defaults. Let’s take a deeper look at the changes we made this year.

Distroless images

Behind the scenes, we’re continually hardening and improving GKE. A major undertaking in the past several months has been rebasing GKE master and daemonset containers on top of distroless base images. Distroless images are limited to only the application and its runtime dependencies—they’re not a full Linux distribution, so there are no shells or package managers. And because these images are smaller, they’re faster to load, and have a smaller attack surface. Moving almost all Kubernetes components to distroless images in Kubernetes 1.15 and 1.16 reduces the noise in vulnerability scanning and makes it simpler to maintain Kubernetes components. By the way, you should also consider moving your container application images to distroless images!

Locking down system:unauthenticated access to clusters

Kubernetes authentication allows certain cluster roles to have access to cluster information by default, for example, to gather metrics about cluster performance. This specifically allows unauthenticated users (who could be from anywhere on the public internet!) to read some unintended information if they gain access to the cluster API server. We worked in open-source to change this in Kubernetes 1.14, and introduced a new discovery role system:public-info-viewer explicitly meant for unauthenticated users. We also removed system:unauthenticated access to other API server information. 

Ongoing patching and vulnerability response

Our security experts are part of the Kubernetes Product Security Committee, and help manage, develop patches for, and address newly discovered Kubernetes vulnerabilities. On GKE, in addition to Kubernetes vulnerabilities, we handle other security patches—in the past year, these included critical patches to the Linux kernel, runc, and the Go programming language—and, when appropriate, publish a security bulletin detailing the changes.

Better defaults in GKE

Among the more visible changes, we’ve also changed the defaults for new clusters in GKE to more secure options, to allow newer clusters to more easily adopt these best practices. In the past several releases, this has included enabling node auto-upgrade by default, removing the Kubernetes dashboard add-on, removing basic authentication and client certs, and removing access to legacy node metadata endpoints. These changes apply to any new GKE clusters you create, and you can still opt to use another option if you prefer.

Defaults for new clusters in GKE have been strengthened release over release in the past several years to improve security

Enabling node auto-upgrade

Keeping the version of Kubernetes up-to-date is one of the simplest things you can do to improve your security. According to the shared responsibility model, we patch and upgrade GKE masters for you, but upgrading the nodes remains your responsibility. Node auto-upgrade automatically provides security patches, bug fixes and other upgrades to your node pools, and ensures alignment with your master version to avoid unsupported version skew. As of November, node auto-upgrade is enabled by default for new clusters. Nothing has changed for pre-existing clusters though, so please consider enabling node auto-upgrade manually or upgrading yourself regularly and watching the Security Bulletins for information on recommended security patches. With release channels, you can subscribe your cluster to a channel that meets your business needs and infrastructure requirements. Release channels take care of both the masters and nodes, and ensure your cluster is up to date with the latest patch version available in the chosen channel.
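As a sketch, you can turn on node auto-upgrade for an existing node pool, or create a new cluster already enrolled in a release channel; names, zone and channel below are placeholders (on older gcloud releases, release channels may require the beta command group):

# Enable auto-upgrade on an existing node pool.
gcloud container node-pools update default-pool \
    --cluster my-cluster --zone us-central1-a \
    --enable-autoupgrade

# Or create a cluster subscribed to a release channel.
gcloud container clusters create my-cluster \
    --zone us-central1-a \
    --release-channel regular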

Locking down the Kubernetes Dashboard

The open-source Kubernetes web UI (Dashboard) is an add-on which provides a web-based interface to interact with your Kubernetes deployment, including information on the state of your clusters and errors that may have occurred. Unfortunately, it is sometimes left publicly accessible or granted sensitive credentials, making it susceptible to attack. Since the Google Cloud Console provides much of the same functionality for GKE, we’ve further locked down the Dashboard to better protect your clusters. For new clusters created with:

  • GKE v1.7, the Dashboard does not have admin access by default.
  • GKE v1.10, the Dashboard is disabled by default.
  • GKE v1.15 and higher, the Kubernetes web UI add-on Dashboard is no longer available in new GKE clusters.

You can still run the dashboard if you wish, following the Kubernetes web UI documentation to install it yourself.

Improving authentication

There are several methods of authenticating to the Kubernetes API server. In GKE, the supported methods are OAuth tokens, x509 client certificates, and static passwords (basic authentication). GKE manages authentication via gcloud for you using the OAuth token method, setting up the Kubernetes configuration, getting an access token, and keeping it up to date. Enabling additional authentication methods, unless your application is using them, presents a wider surface of attack. Starting in GKE v1.12, we disabled basic authentication and legacy client certificates by default for new clusters, so that these credentials are not created for your cluster. For older clusters, make sure to remove the static password if you aren’t using it.
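If you’re creating clusters on older GKE versions, or just want to be explicit, you can disable these legacy methods at creation time; a sketch with placeholder names:

gcloud container clusters create my-cluster \
    --zone us-central1-a \
    --no-enable-basic-auth \
    --no-issue-client-certificate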

Disabling metadata server endpoints

Some attacks against Kubernetes use access to the VM’s metadata server to extract the node’s credentials; this can be particularly true for legacy metadata server endpoints. For new clusters starting with GKE v1.12, we disabled these endpoints by default. Note that Compute Engine is in the process of turning down these legacy endpoints. If you haven’t already, you may use the check-legacy-endpoint-access tool to help discover if your apps should be updated and migrated to the GA v1 metadata endpoints, which include an added layer of security that can help customers protect against vulnerabilities.

Our latest and greatest hardening guide

Even though we keep making more and more of our security recommendations the default in GKE, they primarily apply to new clusters. This means that even if you’ve been continuously updating an older cluster, you’re not necessarily benefitting from these best practices. To lock down your workloads as best as possible, make sure to follow the GKE hardening guide. We’ve recently updated this with the latest features, and made it more practical, with recommendations for new clusters, as well as recommendations for GKE On-Prem.

It’s worth highlighting some of the newer recommendations in the hardening guide for Workload Identity and Shielded GKE Nodes.

Workload Identity

Workload Identity is a new way to manage credentials for workloads you run in Kubernetes, automating best practices for workload authentication, and removing the need for service account private keys or node credential workarounds. We recommend you use Workload Identity over other options, as it replaces the need to use metadata concealment, and protects sensitive node metadata.

Shielded GKE Nodes

Shielded GKE Nodes is built upon Shielded VMs and further protects node metadata, providing strong, verifiable node identity and integrity for all the GKE nodes in your cluster. If you’re not using third-party kernel modules, we also recommend you enable secure boot to verify the validity of components running on your nodes and get enhanced rootkit and bootkit protections.

The most secure GKE yet

We’ve been working hard on hardening, updating defaults, and delivering new security features to help protect your GKE environment. For the latest and greatest guidance on how to bolster the security of your clusters, we’re always updating the GKE hardening guide.

Your guide to Kubernetes best practices

Kubernetes made a splash when it brought containerized app management to the world a few years back. Now, many of us are using it in production to deploy and manage apps at scale. Along the way, we’ve gathered tips and best practices on using Kubernetes and Google Kubernetes Engine (GKE) to your best advantage. Here are some of the most popular posts on our site about deploying and using Kubernetes. 

  1. Use Kubernetes Namespaces for easier resource management. Simple tasks get more complicated as you build services on Kubernetes. Using Namespaces, a sort of virtual cluster, can help with organization, security, and performance. This post shares tips on which Namespaces to use (and not to use), how to set them up, view them, and create resources within a Namespace. You’ll also see how to manage Namespaces easily and let them communicate.
  2. Use readiness and liveness probes for health checks. Managing large, distributed systems can be complicated, especially when something goes wrong. Kubernetes health checks are an easy way to make sure app instances are working. Creating custom health checks lets you tailor them to your environment. This blog post walks you through how and when to use readiness and liveness probes (see the example manifest after this list, which also shows resource requests and limits in action).
  3. Keep control of your deployment with requests and limits. There’s a lot to love about the scalability of Kubernetes. However, you do still have to keep an eye on resources to make sure containers have enough to actually run. It’s easy for teams to spin up more replicas than they need or make a configuration change that affects CPU and memory. Learn more in this post about using requests and limits to stay firmly in charge of your Kubernetes resources.  
  4. Discover services running outside the cluster. There are probably services living outside your Kubernetes cluster that you’ll want to access regularly. And there are a few different ways to connect to these services, like external service endpoints or ConfigMaps. Those have some downsides, though, so in this blog post you’ll learn how best to use the built-in service discovery mechanisms for external services, just like you do for internal services.
  5. Decide whether to run databases on Kubernetes. Speaking of external services: there are a lot of considerations when you’re thinking about running databases on Kubernetes. It can make life easier to use the same tools for databases and apps, and get the same benefits of repeatability and rapid spin-up. This post explains which databases are best run on Kubernetes, and how to get started when you decide to deploy.
  6. Understand Kubernetes termination practices. All good things have to come to an end, even Kubernetes containers. The key to Kubernetes terminations, though, is that your application can handle them gracefully. This post walks through the steps of Kubernetes terminations and what you need to know to avoid any excessive downtime.
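To make the probes and requests/limits advice concrete, here’s a minimal pod sketch combining an HTTP readiness probe, a liveness probe, and resource requests and limits; paths, ports and sizes are illustrative:

apiVersion: v1
kind: Pod
metadata:
  name: my-app
spec:
  containers:
  - name: my-app
    image: gcr.io/my-project/my-app
    ports:
    - containerPort: 8080
    readinessProbe:
      httpGet:
        path: /healthz
        port: 8080
      initialDelaySeconds: 5
      periodSeconds: 10
    livenessProbe:
      httpGet:
        path: /healthz
        port: 8080
      initialDelaySeconds: 15
      periodSeconds: 20
    resources:
      requests:
        cpu: 250m
        memory: 256Mi
      limits:
        cpu: 500m
        memory: 512Mi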

For even more on using GKE, check out our latest Containers and Kubernetes blog posts. Want a refresher? Get certified with the one-month promo for the Architecting with GKE Coursera specialization at http://goo.gle/k8s5. Offer valid until 01/31/2020, while supplies last.

Kubernetes Podcast in 2019: year-end recap

At the Kubernetes Podcast, we bring you a weekly round-up of cloud-native news, accompanied by an in-depth interview with a community member. As we publish our 50th and final episode for 2019, it’s time to look back on some of our favorite moments from the year.

This year, we stepped out of the studio. We hosted a live recording at Google Cloud Next in San Francisco, as well as listener meetups at KubeCon EU in Barcelona and KubeCon NA in San Diego. There’s nothing more gratifying to us than having someone come up to us at a conference and tell us that they enjoy the show, or even ask after the family of foxes we mentioned were living in our backyard. Our heartfelt thanks to everyone who came by, or stopped us in the hallways.

Open source reaches all corners of the world, and we’ve been amazed at all the listeners who have joined the podcast community from around the globe. Every now and then we send out stickers by post: they’ve gone to dozens of countries on almost every continent. (We’re still waiting for a listener to reach out from Antarctica!) Thank you to our wonderful audience, who has let us know how much we’re helping them connect with and learn about the Kubernetes community. We are truly grateful to you for listening.

Serious dedication tweeted to us from one podcast listener

We would like to share some of our most popular episodes from 2019:

  • Kubernetes Failure Stories, with Henning Jacobs (episode 38): To have the best chance for success, it helps to learn from failures. After experiencing some of his own, Henning was inspired to start collecting the failure stories of others.

  • Ingress, with Tim Hockin (episode 41): A proud parent of the Kubernetes project, Tim is a 15-year Googler and designer of large parts of the Kubernetes networking and storage stack—an obvious extension of his years of work on the Linux kernel.

  • Live at Google Cloud Next, with Eric Brewer (episode 49): In our first live show, Eric joined us to talk about his history in building infrastructure for search, the CAP theorem, and announcing Kubernetes to the world.

  • KeyBank, with Gabe Jaynes (episode 51): Banks aren’t always terminals and mainframes. The smart ones, like KeyBank, are Kubernetes and mainframes! Gabe’s team worked with Google Cloud as a design partner.

  • Istio 1.2, with Louis Ryan (episode 58): Louis has been working on API infrastructure and service mesh at Google for 10 years. He talked about the history of Istio, its design decisions, and its future goals.

  • Attacking and Defending Kubernetes, with Ian Coldwater (episode 65): Learn how to protect your container infrastructure from Ian: they are paid to attack it, and a popular conference speaker on the topic.

  • CRDs, API Machinery and Extensibility, with Daniel Smith (episode 73): Another long-time Kubernetes contributor, Daniel joined the project before it was open-sourced, and leads both the open-source and Google teams who build CRDs and other extensibility features.

  • Kubernetes 1.17, with Guinevere Saenger (episode 83): Our penultimate episode for the year is an interview with the Release Team lead for the new Kubernetes 1.17. Learn how Guinevere went from being a concert pianist to a software engineer and leading a team of over 30 to produce the final Kubernetes release of 2019.

If you have a break over the holidays, why not subscribe and enjoy one episode or many?  For those who can’t listen, or prefer not to, we also offer a transcript of each episode on its page at kubernetespodcast.com.

We’re going to take a two-week break over the holiday period, but we’ll be back in your ears in January!


Exploring container security: Performing forensics on your GKE environment

Running workloads in containers can be much easier to manage and more flexible for developers than running them in VMs, but what happens if a container gets attacked? It can be bad news. We recently published some guidance for how to collect and analyze forensic data in Google Kubernetes Engine (GKE), and how best to investigate and respond to an incident.

When performing forensics on your workload, you need to perform a structured investigation, and keep a documented chain of evidence to know exactly what happened in your environment, and who was responsible for it. In that respect, performing forensics and mounting an incident response is the same for containers as it is for other environments—have an incident response plan, collect data ahead of time, and know when to call in the experts. What’s different with containers is (1) what data you can collect and how, and (2) how to react.

Get planning

Even before an incident occurs, make the time to put together an incident response plan. This typically includes: who to contact, what actions to take, how to start collecting information and how to communicate what’s going on, both internally and externally. Incident response plans are critical, so if panic does start to set in, you’ll know what steps to follow.

Other information that’s helpful to decide ahead of time, and list in your response plan, is external contacts or resources, and how your response changes based on severity of the incident. Severity levels and planned actions should be business-specific and dependent on your risks—for example, a data leak is likely more severe than an abuse of resources, and you may have different parties that need to be involved. This way, you’re not hunting around for—or debating—this information during an incident. If you don’t get the severity levels right the first time, in terms of categorization, speed of response, speed of communications, or something else, surface this in an incident post-mortem, and adjust as needed.

Collect logs now, you’ll be thankful later

To put yourself in the best possible position for responding to an incident, you want data! Artifacts such as logs, disks, and live recorded info are how you’re going to figure out what’s happening in your environment. Most of these you can get in the heat of the moment, but you need to set up logs ahead of time.

There are several kinds of logs in a containerized environment that you can set up to capture: Cloud Audit Logs for GKE and Compute Engine nodes, including Kubernetes audit logs; OS specific logs; and your own application logs.

There are several kinds of logs you can collect from a containerized environment.

You should begin by collecting logs as soon as you deploy an app or set up a GCP project, to ensure they’re available if you need them for analysis in case of an incident. For more guidance on which logs to collect for further analysis for your containers, see our new solution Security controls and forensic analysis for GKE apps.
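Once those audit logs are flowing, you can query Kubernetes audit entries from the command line during an investigation. A sketch of such a query; the filter is illustrative and should be adjusted to the activity you’re investigating:

gcloud logging read \
    'resource.type="k8s_cluster" AND protoPayload.methodName:"pods"' \
    --limit 20 --format json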

Stay cool

What should you do if you suspect an incident in your environment? Don’t panic! You may be tempted to terminate your pods, or restart the nodes, but try to resist the urge. Sure, that will stop the problem at hand, but it also alerts a potential attacker that you know that they’re there, depriving you of the ability to do forensics!

So, what should you do? Put your incident response plan into action. Of course, what this means depends on the severity of the incident, and your certainty that you have correctly identified the issue. Your first step might be to ask your security team to further investigate the incident. The next step might be to snapshot the disk of the node that was running the container. You might then move other workloads off and quarantine the node to run additional analysis. For more ideas, check out the new documentation on mitigation options for container incidents next time you’re in such a situation (hopefully never!).
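For illustration only, a first-response sequence along those lines might look like the following, where the node, disk, and zone names are placeholders for your own environment:

# Stop new workloads from being scheduled onto the suspect node.
kubectl cordon NODE_NAME

# Snapshot the node's disk so you can analyze it offline without touching the original.
gcloud compute disks snapshot DISK_NAME --zone=ZONE --snapshot-names=incident-evidence

# Once you're ready to quarantine the node, move other workloads off of it.
kubectl drain NODE_NAME --ignore-daemonsets --delete-local-data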

To learn more about container forensics and incident response, check out our talk from KubeCon EU 2019, Container forensics: what to do when your cluster is a cluster (slides). But as always, the most important thing you can do is prevention and preparation—be sure to follow the GKE hardening guide, and set up those logs for later!

8 production-ready features you’ll find in Cloud Run fully managed

Since we launched Cloud Run at Google Cloud Next in April, developers have discovered that “serverless” and “containers” run well together. With Cloud Run, not only do you benefit from fully managed infrastructure, up and down auto-scaling, and pay-as-you-go pricing, but you’re also able to package your workload however you like, inside a stateless container listening for incoming requests, with any language, runtime, or library of your choice. And you get all this without compromising portability, thanks to its Knative open-source underpinnings. 

Many Google Cloud customers already use Cloud Run in production, for example, to deploy public websites or APIs, or as a way to perform fast and lightweight data transformations or background operations. 

“Cloud Run promises to dramatically reduce the operational complexity of deploying containerized software. The ability to put an automatically scaling service in production with one command is very attractive.” – Jamie Talbot, Principal Engineer at Mailchimp.

Cloud Run recently became generally available, both as a fully managed platform and on Anthos, and gained a number of new features along the way. What are those new capabilities? Today, let’s take a look at what’s new in the fully managed Cloud Run platform.

1. Service level agreement
With general availability, Cloud Run now comes with a Service Level Agreement (SLA). In addition, it now offers data location commitments that allow you to store customer data in a specific region/multi-region. 

2. Available in 9 GCP regions
In addition to South Carolina, Iowa, Tokyo, and Belgium, in the coming weeks you’ll also be able to deploy containers to Cloud Run in Northern Virginia, Oregon, the Netherlands, Finland, and Taiwan, for a total of nine cloud regions.
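Depending on your Cloud SDK version, you can check which regions are currently available to you from the command line:

gcloud run regions list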


3. Max instances
Auto-scaling can be magic, but there are times when you want to limit the maximum number of instances of your Cloud Run services, for example, to limit costs. Or imagine a backend service like a database is limited to a certain number of connections—you might want to limit the number of instances that can connect to that service. With the max instance feature, you can now set such a limit.

Use the Cloud Console or Cloud SDK to set this limit:

gcloud run services update SERVICE-NAME --max-instances 42

4. More secure: HTTPS only
All fully managed Cloud Run services receive a stable and secure URL. Cloud Run now accepts only secure HTTPS connections and redirects any HTTP connection to the HTTPS endpoint. 

But having an HTTPS endpoint does not mean that your service is publicly accessible—you are in control and can opt into allowing public access to your service. Alternatively, you can require authentication by leveraging the “Cloud Run Invoker” IAM role.
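For example, to grant a specific user permission to invoke an otherwise private service, you can bind the “Cloud Run Invoker” role to that user (the service name and account below are placeholders):

gcloud run services add-iam-policy-binding SERVICE-NAME --member="user:jane@example.com" --role="roles/run.invoker"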

5. Unary gRPC protocol support
Cloud Run now lets you deploy and run unary gRPC services (i.e., non-streaming gRPC), allowing your microservices to leverage this RPC framework. 

To learn more, read Peter Malinas’ tutorial on Serverless gRPC with Cloud Run using Go, as well as Ahmet Alp Balkan’s article on gRPC authentication on Cloud Run.

6. New metrics to track your instances
Out of the box, Cloud Run integrates with Stackdriver Monitoring. From within the Google Cloud Console, the Cloud Run page now includes a new “Metrics” tab that shows charts of key performance indicators for your Cloud Run service: requests per second, request latency, used instance time, CPU and memory.

A new built-in Stackdriver metric called container/billable_instance_time gives you insights into the number of container instances for a service, with the billable time aggregated from all container instances.
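If you want to chart or alert on it yourself in Stackdriver, the metric appears on the cloud_run_revision resource type; a filter along these lines (shown purely for illustration) selects it:

metric.type="run.googleapis.com/container/billable_instance_time" AND resource.type="cloud_run_revision"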


7. Labels
Like the bibs that identify the runners in a race, GCP labels can help you easily identify a set of services, break down costs, or distinguish different environments.

You can set labels from the Cloud Run service list page in Cloud Console, or update labels with this command and flag:

gcloud run services update SERVICE-NAME --update-labels KEY=VALUE
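Once your services are labeled, you can also filter the service list on those labels. Assuming you applied an “env” label, a query like this (the label key and value are placeholders) narrows things down:

gcloud run services list --filter="metadata.labels.env=prod"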

8. Terraform support
Finally, if you practice Infrastructure as Code, you’ll be glad to know that Terraform now supports Cloud Run, allowing you to provision Cloud Run services from a Terraform configuration. 

Ready, set, go!
The baton is now in your hands. To start deploying your container images to Cloud Run, head over to our quickstart guides on building and deploying your images, or try the single deploy command sketched below. With the always free tier and the $300 credit for new GCP accounts, you’re ready to take Cloud Run for a spin. To learn more, there’s the documentation of course, as well as the numerous samples with different language runtimes (don’t miss the “Run on Google Cloud” button to automatically deploy your code). In addition, be sure to check out the community-contributed resources in the Awesome Cloud Run GitHub project. We’re looking forward to seeing what you build and deploy!
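Here’s that single deploy command as a sketch, with the service name, image, and region as placeholders to replace with your own:

gcloud run deploy my-service --image=gcr.io/PROJECT-ID/my-image --platform=managed --region=us-central1 --allow-unauthenticated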

Modernize your apps with Migrate for Anthos

In a perfect cloud world, you would host all your applications in containers running on Kubernetes and Istio, benefitting from the portability and improved resource utilization of containers, plus a robust orchestration platform with advanced application management, networking, and security functionality. This is easy to do if you’re developing a new application, but it can be hard for existing applications to take advantage of those capabilities.

Many of the applications that you may want to move to the cloud have been around a long time, and you may not have the application-specific knowledge that would be required to rewrite them to be more cloud-native—or it would be incredibly time-consuming to do so. Another option is to lift-and-shift to a virtual machine (VM) hosting platform like Compute Engine, but that means you still need to maintain the VMs. Even if you’re not able to fully modernize an existing app, it would still be great to get some of the benefits of containers and Kubernetes.

What is Migrate for Anthos?

Enter Migrate for Anthos, a fast and easy way to modernize your existing applications with a service that encapsulates them in a container. Moving your physical servers or existing VMs into Kubernetes containers gives you crucial portability and resource utilization benefits without having to rewrite the underlying application. Since Migrate for Anthos is built for Google Kubernetes Engine (GKE), you also automatically capture the scaling and flexibility benefits of a managed Kubernetes environment in the cloud. Migrate for Anthos recently became generally available.

Converting an application with Migrate for Anthos happens in two phases. First, it creates a generic container wrapper around your application that makes it seem like it’s still running in a full VM environment. Then, you deploy the Migrate for Anthos software to your Kubernetes cluster, which runs the containerized application. You can find more details about this in the documentation and in our blog post: Migrating from Compute Engine to Kubernetes Engine with Migrate for Anthos.

As the name suggests, Migrate for Anthos works with Anthos GKE. However, you can also use Migrate for Anthos with only GKE—all it requires is your application and a GKE cluster running the Migrate for Anthos software. 

Getting started with Migrate for Anthos

Migrate for Anthos works with a variety of workloads, but not all. It’s particularly adept at migrating legacy applications, stand-alone applications, and monolithic applications. As you start the modernization process, here are some questions to ask to determine whether to use Migrate for Anthos with your applications:

1. Should this app be in the cloud?
By its nature, the cloud may not be able to support some characteristics of your on-prem environment, such as geography and legal compliance. The best way to find out whether the cloud will work for each of your applications is to plan out a full migration. That will allow you to identify groups of applications that can benefit from cloud offerings such as a global network and ease of resizing resources. After that, try out a proof of concept by testing the apps in the cloud to see if it fits your business needs.

2. Should this app be in Kubernetes?
Containerizing an application simplifies workload administration, improves scalability (both up and down), and increases host utilization. Kubernetes orchestrates the containers and GKE handles node upgrades, while add-ons like Istio let you manage network and security policies independently of your application.

With those advantages, it’s easy to think that containers are always the right way to go, but in some cases it may make sense to stick with VMs. Workloads with strict hardware requirements, specialized kernel modules, or license constraints can be harder to run in containers, negating their advantages.

3. Should this app migration use Migrate for Anthos?
Migrating your apps or workloads to the cloud isn’t just about shifting where the compute resources run; it’s also an opportunity to modernize them with containers. Using Migrate for Anthos (or Migrate for Compute Engine) gives you the ability to get your workloads in the cloud quickly, with minimal upfront downtime that’s easy to plan for. 

However, even with the Migrate for Anthos wrapper, your application is still the same application. The benefits of the modern platform may not make up for the limitations of a legacy application, and a rewrite may be the only way to meet your business needs. There are also some VM-specific dependencies, such as licensing requirements, that may not work with Migrate for Anthos.

Migrate for Anthos can also be the first step on a larger migration effort. Once you’ve migrated the application to GKE, you can gradually break up a monolithic app into microservices by manually rewriting parts. Spreading out the migration effort gets you in the cloud sooner, giving you more time to modernize.

Next steps

A successful modernization starts with creating a full migration plan, testing the workloads, and monitoring them. You can experience the benefits of modernization with Migrate for Anthos by picking a small workload and trying it out for yourself!

As you test different workloads for your migration, be sure to reference the documentation. And keep an eye out for an upcoming blog series on the migration process. Our first blog steps through how to modernize a Compute Engine instance and host it on GKE.