Got microservices? Service mesh management might not be enough
A lot of enterprises are evolving their monolithic applications into microservices architectures. In this pattern, applications are composed of fine-grained services that communicate via APIs. Microservices promise, faster development, innovation, cloud scaling, better infrastructure optimization—and happier developers. No wonder this architecture gets so much attention.
But that doesn’t mean implementing a microservices strategy is easy (If you’re in the middle of this process, you know it’s complicated). You’ve got lots of different departments working on lots of different projects, and all of them are at different places. How do companies get to a point where they can reap the benefits of microservices?
In this post, we’ll explain why a successful microservices approach requires dedicated infrastructure for building and managing those services, how sharing access to services happens through APIs, and why APIs shared outside the domain of ownership need to be managed. We’ll also walk through how Istio, an open-source project that can help with microservices management, helps to control the potential chaos as microservices adoption spreads through an organization.
APIs as communication contract
Cooperating services intercommunicate via APIs. Simply put, APIs are how software talks to software. The API defines the communication contract between a service provider and a service consumer. Whether you think of the provider and the consumer as “services,” or as “applications” is immaterial; the API defines how they make requests and receive responses.
Sending and receiving JSON RESTfully over HTTP/1 seems to be the most common technical basis for an API, but APIs can also employ HTTP/2 or TCP, and may use gRPC, GraphQL, jsonRPC or other data and message representations. They’re all just APIs, and they may be more or less formally specified.
When an application is “decomposed” into a set of services that intercommunicate via APIs, a new set of problems arises: how to manage all of those interdependent services and the communication between them. As the set of services or the number of instances grows, the service management problem grows, too.
For example, one of the very first things to consider when building with microservices is the ability to secure traffic between the microservices. A common method for securing this communication is mutual transport layer security (mTLS), which enables both peers in an exchange to authenticate one another. Once authentication happens, this can be used to perform authorization decisions at the service that receives a request, based on the identity of the caller that’s been asserted with the TLS certificate. This important function is pretty basic and simple to do when you have two services, but it gets more and more difficult when the number of services grows. One might attempt to mitigate this with client libraries.
But then there’s the reality that services are developed in various languages: Java, C#, Python, Golang, or Node.js. It gets very difficult to apply a variety of different policies if it requires independent implementations in a set of five distinct languages. The complications multiply, and it becomes obvious that we need a better model: some kind of management infrastructure to control the potential chaos.
Enter the service mesh
While the term “microservices architecture” refers to a general pattern, a service mesh is a particular realization of that pattern. A service mesh provides a transparent and language-independent way to flexibly and easily automate application network functions. (For more on service mesh, check out this blog series.)
Simply put, services meshes were developed to solve the problems of connecting, securing, controlling, and observing a mesh of services. Service meshes handle service-to-service interactions including load balancing, service-to-service authentication, service discovery, routing, and policy enforcement.
Istio is an open-source project that delivers a service mesh; it’s backed by Google, IBM, Red Hat, Lyft, Cisco, and others, and is being used in production by companies like eBay, Autotrader, Trulia, Continental, and HP.
Istio aims to help connect, secure, control, and observe the services in the mesh.
Connect Istio helps control the flow of traffic and API calls between services intelligently; services connect to their dependent services via names, and load automatically gets balanced across all of the available runtime instances of a target service. Retries, circuit breakers, canary releases—all are handled automatically and configured for the mesh.
Secure Istio automatically secures communications between services through managed authentication, authorization, and encryption. Each service has an identity asserted by an X.509 certificate that is automatically provisioned and used to implement two-way (mutual) Transport Level Security (TLS) for authorization and encryption of all API exchanges.
Control Istio applies policies (for example, routing, rate limits, quotas) and enforces them across services. Inbound and outbound communications are controlled—even requests that go to external systems.
Observe Istio ensures visibility with automatic tracing and operational logging of services.
The goal of using a service mesh like Istio with your microservices system is better security, more reliability, lower cost, scale, and better resiliency within a set of closely intercommunicating systems.
A look at services in a mesh
Suppose an application is a custom inventory management system for a retailer, composed of several cooperating, related services:
Policies defined in the service mesh might dictate that the pricing service can make outbound calls only to its data store, while the product service can call the location and inventory and pricing services, but not anything else.
If the team uses Kubernetes as the underlying platform for these services, Kubernetes ensures that unhealthy instances of these services get stopped and new instances get started. The service mesh ensures that the new instances are governed by the same set of policies.
Sharing APIs with consumers outside the mesh
Service meshes generally focus on the problem of managing and controlling the intercommunication among all the disparate services that comprise the application. Surely, it’s possible to define an Istio gateway to accept inbound requests. But that’s not enough; there’s a need to manage the requests into the mesh from service consumers, and to manage outbound requests—perhaps to a SaaS CRM system or another externally managed service.
Back to our example: the product service needs to accept inbound requests from clients, such as an app running on hundreds of thousands of mobile phones. But the product service might want to modify its behavior based on the identity of the user making the call. The app on the mobile phone uses the “Product API”—the communication contract exposed by the product service—to send in requests. The product service might also need to connect to a Salesforce system.
Regarding inbound requests arriving from systems that are significantly separated from the services in the mesh, how should those requests be secured, controlled, and managed? (Examples of “significantly separated” could include requests from third-party apps or even from different business units or teams within the same company. The requirements for these separated systems are quite different from those for inter-service communication).
For an external or mobile client, we cannot rely solely on a TLS certificate to assert the identity of the inbound request. While app producers can provision client certificates into mobile apps, in many cases the client has no certificate to use for transport-level security. Clients may use a different form of identity assertion, relying on message-level security and signing, such as a self-signed JWT.
Often the system would also like to authenticate the identity of a human user. Within the service mesh, the service identity is the only identity, but requests arriving from a mobile app, kiosk, or web app should carry user identity as well. This generally isn’t done with an X.509 certificate, but rather with a token (think OAuth) that asserts identity information about the user.
Rate limits for external consumers will be different, and may depend on the status of the developer of the consumer app (or “client”). For example, client apps built by partners might get greater rights and higher transaction allowances. These limits are often used for business purposes or may be in place to protect an entire system rather than an individual service.
It may be desirable to modify or filter the original API requests and responses depending on the client’s use case, or the user’s identity. For example, a lightweight client might want a compressed data format in lieu of JSON. Or a client might want a filtered view of the JSON, with some of the fields returned and some omitted.
These different requirements apply to externally-developed apps, and in general to any client or service consumer that is significantly separated from the target services. Verification and throttling of requests coming from a client built by a developer on the service-development team is less useful; if the client app (or consuming service) is misbehaving, the API publishing team can notify their internal partners, and that client app development team just gets to work and fixes it. But calls arriving from a client built by an external partner need more verification. It’s not practical to engage on a 1:1 basis with developers of external apps. More strict enforcement is necessary here; that’s where API management come in.
API management enables the sharing of APIs
Sharing a microservice means exposing it as an API for use by a developer outside the team that built the service. Microservice APIs that are shared outside of a small team need to be managed. An API management infrastructure, such as Google Cloud’s Apigee API management platform, helps address the different requirements for requests that are sent from external systems. Apigee also supports Istio as an API gateway or enforcement point. API management enables you to:
Share APIs by publishing them and making them available to developers outside the core app team. These external developers need to gain authorization to access the APIs and understand how to use the APIs to build their own apps and clients. The core app team wants a way to distinguish clients built by different developers, and even distinguish between clients built by the same developer. This is all enabled by a developer portal, and the ability for developers to self-register and self-provision credentials for the APIs.
Productize by treating APIs as “digital products.” This means:
collecting related APIs into a coherent, consumable unit, to enable publishing and sharing.
grouping complementary APIs that may have disparate origins into a single unit and normalizing security requirements on those APIs (rate limits, security credentials, message signing, and so on).
modernizing APIs by transforming SOAP to JSON, implementing caching, or wrapping validation logic around “bare” or naive internal APIs.
potentially monetizing APIs directly, charging developers for inbound traffic.
Report on usage trends, traffic splits, latency and user experience. This enables the API product team to feed insights back into the API’s design, to iterate on the API-as-digital-product—and maximize business value. This capability is built on an analytics subsystem that scales.
Comparing API management to service management
Services management and API management are different. A typical large, information-driven enterprise will deliver thousands of services, and will share hundreds of those via APIs to outsiders. Service management and API management satisfy the needs in these different spheres:
A mature enterprise might aspire to the following goals:
All services will be managed; they’ll have consolidated logging, tracing, mTLS, and retry policies applied and enforced. Access to services is via loosely governed APIs.
All APIs shared outside their original ownership domain will be managed. Developers external to the team can view them, request access to them, and gain credentials. And access from apps built by external developers will be more strictly governed and throttled. Analytics data will help inform modifications to shared APIs.
But there are some logical parallels between the capabilities in service meshes like Istio and API management platforms like Apigee.
Both Istio and Apigee:
Enable policy enforcement for requests (rate limiting, quotas, and token verification)
Can perform request routing based on data in the request
Collect logging information for observability
Use a communication proxy to implement these controls
However, these two systems are targeted to different needs. Services management is intended for services built by a more or less closely related development team. This includes some aspects of managing the communication among those services, for example, mutual-TLS enforcement and automatic certificate provisioning, or rate limiting, or routing.
On the other hand, API management is intended primarily to manage sharing of APIs outside of a core team. An outsider might be a member of the team across the hall, a developer in a different division of your company, or a developer at a different company. In any case there is significant separation between the API consumer and the API provider, which demands a greater degree of management of the API.
While technically there are parallels, the requirements are different enough that, especially as the number of services under management grows, and as the number of APIs shared beyond their publishing teams grows, companies will want dedicated infrastructure to manage each.
How HP manages microservices
HP Inc., which sells a wide range of printing solutions to consumers and enterprises, builds a variety of core services that are shared across business units at the company, including identity management and content management.
The decision to move to a microservices architecture was driven in large part by the need to move faster, says Galo Gimenez-Palop, distinguished technologist and platform architect at HP. Large teams working on applications from different functional areas created the need for lots of synchronization—and lots of meetings.
“We had these continuous integration pipelines that would take hours. And because there were so many dependencies, there were manual gates to decide if something was moving or not moving forward,” Gimenez-Palop tells us.
So it’s no surprise that HP was attracted to the increased development velocity promised by a microservices architecture. Adopting microservices (alongside Kubernetes container orchestration, which accelerated teams building and deploying applications) would enable smaller development teams working on smaller code bases, with services going into production independently to reduce reliance on other teams—“you build it, you run it,” as Gimenez-Palop puts it.
Yet moving from monolithic applications to a more distributed microservices architecture posed several challenges—especially as the number of microservices grew. The stumbling blocks mounted, Gimenez-Palop says: from orchestration difficulties, to challenges in discovering services, to breakages resulting from modifying services that depend on other services, to “policy sprawl,” to issues with ensuring that new versions of services integrate with other services.
“How do you know which is the other version of the other service and when the other team changes the version of the service?” Gimenez-Palop asks. “Integration testing became really, really difficult because now, when you release one of the services, you need to do the integration testing with a bunch of other testing other services.”
“As soon as you start to do microservices, you will find that it’s very easy to get in a complicated mess,” he adds.
Istio proved to be the solution for HP. It simplified the complexities of microservices communications by providing a standardized way to connect, secure, monitor, and manage microservices. As a vital plane for service-to-service control and reliability, Istio handles application-layer load balancing, routing, and service authentication.
Sharing microservices-based core services with other business units within HP is done by exposing them as APIs with other teams in the organization or with external partners and developers.
But when microservices are exposed as APIs, they require API management, which makes it easier to extend the value of microservices both within the enterprise and to external developers. HP uses the Apigee platform for this purpose, gaining security, visibility, and control along the way.
“We can have written contracts with the consumers of our APIs, we can have visibility into how those consumers are using the APIs of different services,” Gimenez-Palop says. “We can have policies like authorization, authentication, and payload inspection, all centralized in a single location.”
Learn more by watching the Google Cloud NEXT ‘19 session:
Can I use services management alone?
As services become more prevalent and more fine-grained within an enterprise, formal service management via dedicated service mesh infrastructure will become a requirement.
But is a service mesh alone enough? Sometimes. In the case where all of the inter-cooperating services are built and managed under the same ownership domain (a company’s director of engineering, for example), and access to the services is rarely shared with outsiders (other teams) via exposed APIs, a service mesh such as Istio will likely satisfy the requirements. Clients to the services are all in-house and don’t have significant separation from the services themselves.
If services expose an externally consumable API that outsiders can see and use, then API management (as Gimenez-Palop says above) becomes a natural complement to service management.
Microservices continues to be a big idea. A successful microservices approach requires tooling and infrastructure for building and managing those services. Sharing access to services happens through APIs, and APIs shared outside the domain of ownership need to be managed. Drawing the line between what is inside the domain of ownership and what is outside, and therefore which API calls need less management and which need more, is a judgment call.
Service meshes and API management are complementary and are used to solve distinct problems around services and APIs. While they both use communication proxies, and while there are parallels in function, the differences in domain set them apart, and most companies will see significant benefits from using them together.
For more, watch Dino and Greg’s popular presentation at Google Cloud NEXT ‘19, “APIs, Microservices, and the Service Mesh.”