By Andy Shi.
Gartner defines edge computing as a topology where information processing and content collection and delivery are placed closer to the sources of the information, with the idea that keeping traffic local and distributed will reduce latency. This includes all the technology on the Internet of Things (IoT).
A typical Kubernetes cluster sits in well-protected data centers. But when the nodes move to the edge, the harsh reality of the unstable public internet kicks in. By current upstream design, the Kubelet on each node caches information in memory. This is fine if the connection is guaranteed. However, when the connection is not available, this design can lead to trouble. To be more specific, there are four typical scenarios that need to be taken into account.
- Problem A: How to recover Pod information when the connection to the master is down and the Kubelet restarts during that period of time?
- Problem B: When the worker nodes cannot be reached by the control plane for a certain period of time, the pods on those nodes will get evicted.
- Problem C: After the connection resumes, the edge nodes and master nodes may need to reconcile their status differences.
- Problem D: The connection may not be available in both directions. The edge nodes may sit on an intranet with only outbound networking configured. For security reasons, the cloud-side control plane may not be able to directly address the edge node.
The Gartner edge computing report also points to imperatives that need to be addressed, and one of them is: “A need for limited autonomy or disconnected operation.”
OpenYurt (Yurt: /jɝːt/, portable, round tent) is an open source project to extend native Kubernetes to the edge. OpenYurt has several outstanding features, and limited autonomy is a major one. So, how does OpenYurt solve these problems?
YurtHub is a node daemon that serves as a proxy for the outbound traffic from the Kubernetes node to the master nodes. While relaying that traffic, it caches all the states and persists them locally. If the connection to the data center is lost or the Kubelet restarts, the Kubelet reads from the YurtHub cache. This effectively solves problem A. Problem C also relies on YurtHub. Since users manage the workloads through the API server, the master nodes hold the truth. When the connection resumes, YurtHub syncs the edge nodes with the masters. This way your pods get the latest updates from the master.
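To make the caching idea concrete, here is a minimal sketch of a proxy that persists upstream responses to disk and falls back to them when the cloud is unreachable. It is not YurtHub's actual code: `CachingProxy`, `FakeAPIServer`, the `/pods/web` key, and the one-JSON-file-per-resource layout are all assumptions made for illustration.

```python
import json
import tempfile
from pathlib import Path
from typing import Callable


class CachingProxy:
    """Toy stand-in for a node-side proxy that persists upstream responses."""

    def __init__(self, fetch_upstream: Callable[[str], dict], cache_dir: Path):
        self.fetch_upstream = fetch_upstream  # hypothetical call to the API server
        self.cache_dir = cache_dir
        self.cache_dir.mkdir(parents=True, exist_ok=True)

    def _cache_file(self, key: str) -> Path:
        # Flatten the resource path into a single file name.
        return self.cache_dir / (key.strip("/").replace("/", "_") + ".json")

    def get(self, key: str) -> dict:
        try:
            state = self.fetch_upstream(key)
        except ConnectionError:
            # Cloud unreachable: fall back to the last state persisted on disk.
            cached = self._cache_file(key)
            if not cached.exists():
                raise
            return json.loads(cached.read_text())
        # Cloud reachable: the control plane holds the truth, so overwrite
        # whatever was cached before.
        self._cache_file(key).write_text(json.dumps(state))
        return state


class FakeAPIServer:
    """Hypothetical control plane whose connectivity can be toggled."""

    def __init__(self):
        self.online = True
        self.pods = {"/pods/web": {"image": "nginx:1.19"}}

    def fetch(self, key: str) -> dict:
        if not self.online:
            raise ConnectionError("control plane unreachable")
        return self.pods[key]


def demo() -> list:
    server = FakeAPIServer()
    results = []
    with tempfile.TemporaryDirectory() as tmp:
        proxy = CachingProxy(server.fetch, Path(tmp))
        results.append(proxy.get("/pods/web"))  # online: fetched and persisted

        server.online = False
        # A fresh proxy object simulates a restart: nothing is held in memory.
        restarted = CachingProxy(server.fetch, Path(tmp))
        results.append(restarted.get("/pods/web"))  # offline: served from disk

        server.pods["/pods/web"] = {"image": "nginx:1.20"}
        server.online = True
        results.append(restarted.get("/pods/web"))  # back online: resynced
    return results
```

Calling `demo()` walks through the three phases: a normal read, a read after a restart while disconnected, and a read after the connection resumes. Because the cache lives on disk rather than in memory, the restart in the middle does not lose the pod state.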
This design has a couple of advantages:
- There is no change to the upstream code. Simple and easy maintenance.
- The YurtHub can be used by other node components if they need to leverage the capability of autonomy.
Of course, there are other solutions. One of them, for example, creates its own agent based on a forked version of the Kubelet and embeds the caching mechanism inside it. While this approach appears simpler to implement, it adds a tremendous maintenance cost, because the forked Kubelet has to keep up with upstream version changes if the agent wants to retain API compatibility.
Yurt Controller Manager
How about problem B? OpenYurt provides a controller manager that manages a few controllers, such as the node controller and the unit controller, for different edge computing use cases. One of its functions is to make sure that Pods on nodes in autonomy mode are not evicted from the API server even if the node heartbeats are missing.
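The eviction rule can be sketched as a small decision function. This is a simplified illustration, not the OpenYurt node controller: the annotation key `example.com/autonomy`, the `Node` class, and the 300-second grace period are hypothetical values chosen for the example.

```python
from dataclasses import dataclass, field

# Hypothetical annotation key used only in this sketch; it is not OpenYurt's.
AUTONOMY_ANNOTATION = "example.com/autonomy"
# Illustrative grace period before a silent node is considered lost.
GRACE_SECONDS = 300.0


@dataclass
class Node:
    name: str
    last_heartbeat: float  # timestamp of the last heartbeat, in seconds
    annotations: dict = field(default_factory=dict)


def should_evict_pods(node: Node, now: float) -> bool:
    """Evict only when heartbeats are missing AND the node is not autonomous."""
    heartbeat_missing = now - node.last_heartbeat > GRACE_SECONDS
    autonomous = node.annotations.get(AUTONOMY_ANNOTATION) == "true"
    return heartbeat_missing and not autonomous
```

With this rule, a data center node that goes silent still has its pods evicted as usual, while an edge node marked as autonomous keeps its pods through a disconnection.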
As you can see, the first three challenges of limited autonomy can be easily and elegantly solved with OpenYurt. How about the last one? We are still working on that. If you are interested, please join us. Here is our git repo.