Published at 2020-11-04 | Last Update 2020-11-04
A Chinese version of this post is also available.
This post is a successor to our previous post, Trip.com: First Step towards Cloud Native Networking. Here we share some of our recent progress on Cilium-based networking & security.
For historical reasons, Neutron+OVS has been our networking stack over the past years - even for our Kubernetes clusters. As the cloud native era approaches, this solution has become increasingly cumbersome, especially given its inherent hardware and software bottlenecks in the face of the sheer scale of containers today [1].
So, to address these bottlenecks, as well as to meet ever-increasing new networking requirements, we devoted a lot of effort to investigating and evaluating various new-generation networking solutions, and in the end Cilium won our favor.
Fig 1-1. Networking solutions over the past years [2]
In combination with BGP [3], Cilium landed in our production environment at the end of 2019. Since then, we have been migrating our existing Pods from the legacy network to Cilium.
As one of the early practitioners, we made certain customizations to smoothly roll out Cilium into our existing infrastructure. Some of them are listed below [2]:
- Deployment: docker-compose + salt, instead of daemonset + configmap
- BGP agent: BIRD, instead of kube-router

We have detailed most of these changes and customizations in [2]; refer to that post if you are interested.
Next, we'd like to elaborate on some topics that have not been covered much before.
With Cilium+BIRD, networking is split into two complementary parts, with the host as the boundary, as shown in Fig 2-1:
Fig 2-1. High level topology of the Cilium+BGP solution [2]
Regarding the second part - cross-host networking with BGP - a BGP peering model is needed, answering questions such as which nodes peer with which devices and which routes are exchanged.
Depending on the specific requirements, this can end up as a really complex model. We, however, settled on a simple one that fits our capabilities and business needs, described below:
- Each node is allocated a /25 or /24 PodCIDR when the node turns up, and announces this PodCIDR via BGP.
- The BGP neighbor accepts the /25 or /24 announcements from the nodes, but does not announce any routes to them.

This scheme is simple in that nodes learn no BGP routes from their neighbors, so their routing tables stay small and the default route suffices for outbound traffic.
Fig 2-2. BGP peering model in 3-tier network topology
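To make the first point concrete, below is a minimal sketch of what the per-node allocation could look like, assuming the PodCIDR is recorded on the Kubernetes Node object; the node name and CIDR values are purely illustrative. BIRD on the node then announces this prefix to its BGP neighbor.

```yaml
# Illustrative only: a Node right after it turns up and receives its PodCIDR.
# The name and CIDR are hypothetical values, not taken from our environment.
apiVersion: v1
kind: Node
metadata:
  name: node-1
spec:
  podCIDR: 10.5.1.0/25   # the /25 that BIRD announces to the BGP neighbor
  podCIDRs:
  - 10.5.1.0/25
```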
The choice of BGP protocol and the way BGP connections are established depend on the underlying hardware network topology.
We have summarized our practices as a getting started guide, see Using BIRD to run BGP [3].
As an example, let's look at a typical traffic path in this networking solution: accessing a Service from a Pod, with the backend located on another node, as shown below:
Fig 2-3. Traffic path: accessing Service from a Pod [4]
Major steps as numbered in the picture:
1. Issue a request (curl <ServiceIP>:<port>) from Pod1 at Node1.
2. Cilium's BPF code intercepts the packet, selects a backend Pod for the Service, and performs DNAT, replacing the ServiceIP with the backend PodIP in the dst_ip field.
3. The packet is then routed to the backend's node over the BGP-announced PodCIDR routes and delivered to the backend Pod.

We have a dedicated post illustrating this process and the code implementation at each stage; see [4] if you're interested.
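For reference, the ServiceIP being curled in step 1 comes from an ordinary ClusterIP Service. A minimal, hypothetical definition (names and ports are illustrative, not from the original post) could look like this:

```yaml
# A minimal ClusterIP Service; <ServiceIP>:80 is what Pod1 curls in step 1.
# Cilium's BPF code DNATs it to one backend Pod's IP and targetPort.
apiVersion: v1
kind: Service
metadata:
  name: demo-svc
spec:
  selector:
    app: demo          # backend Pods carry this label
  ports:
  - port: 80           # the Service port
    targetPort: 8080   # the backend Pod port
```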
By design, ServiceIP is meant to be accessed only within each Kubernetes cluster. What if we'd like to access a Service from outside the cluster? For example, from a bare metal cluster, an OpenStack cluster, or another Kubernetes cluster?
The good news is that Kubernetes already ships several models for these access patterns, for example LoadBalancer and externalIPs Services at L4, and Ingress at L7.
The bad news is that Kubernetes only defines these models; the implementations are left to each vendor. For example, if you are using AWS, its ALB and ELB correspond to the L7 and L4 implementations, respectively.
For our on-premises clusters, we proposed an L4 solution with Cilium+BGP+ECMP. It is essentially an L4LB, providing VIPs that can be consumed by externalIPs and LoadBalancer type Services in Kubernetes clusters:
Fig 2-4. L4LB solution with Cilium+BGP+ECMP [5]
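As a rough sketch of how such a VIP is consumed on the Kubernetes side (the VIP 10.255.0.100 and all names below are hypothetical, not our actual addressing), an externalIPs Service might look like:

```yaml
# Hypothetical example: exposing a Service through a VIP provided by the
# Cilium+BGP+ECMP L4LB. External clients reach 10.255.0.100:80; the L4LB
# forwards the traffic into the cluster.
apiVersion: v1
kind: Service
metadata:
  name: web-external
spec:
  selector:
    app: web
  ports:
  - port: 80
    targetPort: 8080
  externalIPs:
  - 10.255.0.100       # VIP announced via BGP/ECMP by the L4LB
```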
Based on this L4 solution, we deployed Istio ingress-gateway, which implements the L7 model. A typical traffic path:
Fig 2-5. Traffic path when accessing a Service from outside the Kubernetes cluster [5]
We have a dedicated post illustrating this; see [5].
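As a hedged illustration of the L7 layer on top (the hostname, Service name, and ports below are made up, and this is the standard Istio API rather than our exact configuration), an ingress-gateway route could be declared like this:

```yaml
# Hypothetical Istio configuration: the ingress-gateway Pods sit behind the
# L4LB VIP; the Gateway + VirtualService pair routes HTTP traffic to an
# in-cluster Service.
apiVersion: networking.istio.io/v1beta1
kind: Gateway
metadata:
  name: demo-gateway
spec:
  selector:
    istio: ingressgateway       # selects the ingress-gateway Pods
  servers:
  - port:
      number: 80
      name: http
      protocol: HTTP
    hosts:
    - "demo.example.com"
---
apiVersion: networking.istio.io/v1beta1
kind: VirtualService
metadata:
  name: demo-route
spec:
  hosts:
  - "demo.example.com"
  gateways:
  - demo-gateway
  http:
  - route:
    - destination:
        host: demo-svc          # ClusterIP Service inside the cluster
        port:
          number: 80
```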
Cilium features two cutting-edge functionalities: networking and security, both built on eBPF.
As a big step forward, we are now trying to bring the security capabilities into our infrastructure.
Let's first take a simple example and have a glance at what a CiliumNetworkPolicy (CNP) looks like [6]:
apiVersion: "cilium.io/v2"
kind: CiliumNetworkPolicy
metadata:
  name: "clustermesh-ingress-l4-policy"
  description: "demo: allow only employee to access protected-db"
spec:
  endpointSelector:
    matchLabels:
      app: protected-db
  ingress:
  - toPorts:
    - ports:
      - port: "6379"
        protocol: TCP
    fromEndpoints:
    - matchLabels:
        app: employee
The above YAML says:

- metadata: the policy's name and description.
- endpointSelector: the policy applies to endpoints labeled app=protected-db (the server side).
- ingress: for the inbound traffic of the matched endpoints, allow it only if the protocol is TCP, the port is 6379, and the client endpoints are labeled app=employee.

As can be seen, CNP is really flexible and easy to use. But rolling it out into real enterprise environments may pose considerable challenges.
For example, we believe the challenges below are not specific to us alone.
If all your applications run on Cilium, and all your to-be-secured applications are converged into a single cluster (most public cloud vendors suggest one big Kubernetes cluster per region), then it'll be fairly easy to roll things out.
But this assumption almost always proves false in reality, especially in companies whose infrastructures have evolved over many, many years. In other words, "neat" and well-organized infrastructures are the ideal rather than the reality.
An even bigger challenge for us is that we still have many non-Cilium and even non-Kubernetes clusters.
The reality we are facing is that applications are scattered among Cilium-powered Kubernetes clusters, Neutron-powered Kubernetes clusters, OpenStack clusters, and bare metal clusters.
Although Neutron-powered Kubernetes clusters will fade out in the long run, OpenStack clusters as well as bare metal clusters will continue to exist (though they may gradually scale down), so we must take them into account when planning.
The security solution we came up with:

1. Enforce CNP only at the server side; clients may come from any cluster, any platform. This limits the scope and simplifies the overall design.
2. Only consider (server-side) Cilium Pods in the first-stage rollout of this solution. This is a good starting point, and we expect the major part of our server-side applications to be running in Cilium clusters.
Then, the remaining question is: how do we enforce network policy over clients coming from outside the cluster, or even outside of Cilium's awareness? Our answer has two parts: ClusterMesh for clients in other Cilium clusters, and an extension that brings non-Cilium endpoints into Cilium's view. Each is explained below.
Fig 3-1. Vanilla Cilium ClusterMesh [6]
ClusterMesh [7] is a multi-cluster solution provided by Cilium. This solves the multi-cluster problem if all applications are deployed as native Cilium endpoints (Pods).
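For instance, with ClusterMesh in place, the earlier policy could be narrowed so that only clients running in a specific member cluster are allowed, via the reserved io.cilium.k8s.policy.cluster label. Below is a sketch based on the upstream ClusterMesh documentation; the cluster name is illustrative.

```yaml
# Hypothetical cross-cluster policy: only "employee" endpoints running in
# cluster-1 may reach the protected database over TCP/6379.
apiVersion: "cilium.io/v2"
kind: CiliumNetworkPolicy
metadata:
  name: "allow-employee-from-cluster-1"
spec:
  endpointSelector:
    matchLabels:
      app: protected-db
  ingress:
  - toPorts:
    - ports:
      - port: "6379"
        protocol: TCP
    fromEndpoints:
    - matchLabels:
        app: employee
        io.cilium.k8s.policy.cluster: cluster-1   # reserved ClusterMesh label
```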
Using ClusterMesh sounds straightforward, but it was actually not our first choice at the time, for several reasons.
The latter part of the solution deals with non-Cilium endpoints, which in our case include Neutron-powered Pods, VMs, and bare metal servers (BMs).
Looking at the code, Cilium already has an abstraction for such endpoints, named external endpoints; however, this feature is currently only partially implemented and little publicized.
Fig 3-2. Proposed security solution over hybrid infrastructures
As shown above, as a (community-compatible) extension to Cilium, we developed a custom API suite that allows resource owners to feed their instances' metadata into the Cilium cluster, so that Cilium perceives these instances as external endpoints. Furthermore, we ensure that create/update/delete events of these external endpoints are propagated to Cilium in a timely fashion.
Combining 3.3.1 & 3.3.2, our extended ClusterMesh now possesses a complete view over our hybrid infrastructure, which is sufficient for enforcing network policy over all types of clients.
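To give a feel for what that metadata might contain, below is a purely hypothetical sketch; the CRD kind, API group, and fields are invented for illustration and are not our actual API.

```yaml
# Hypothetical illustration only: the kind of metadata a resource owner could
# feed in so that a non-Cilium instance (VM / BM / Neutron Pod) becomes visible
# to Cilium as an external endpoint. Not the real API.
apiVersion: "example.trip.com/v1alpha1"
kind: ExternalEndpoint
metadata:
  name: legacy-vm-001
spec:
  ip: 192.168.10.23        # instance IP reachable from the Cilium clusters
  labels:
    app: employee          # labels usable in CNP selectors
    platform: openstack
```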
This post has shared some of our recent progress on Cilium-based networking & security.