Discover more from The Best Data Architectures Newsletter
What are Containers, Kubernetes and Openshift?
What do these commonly used terms in the Cloud context mean?
When it comes to cloud computing, there are three terms that sometimes get mixed up or confused: containers, Kubernetes, and OpenShift. This blog is my attempt to clear up the grey areas and clarify the use-cases for each, based on my understanding.
The basic building blocks of cloud native data architectures are containers. Kubernetes is an orchestration tool that manages deployments, spinning containers up or down as needed. OpenShift is an enterprise-grade version of Kubernetes.
What are Containers?
Historically, applications used to run on physical servers, consuming all the resources. This used to be a huge time and resource sink for businesses.
Then, in the early 2000s, the concept of “virtual machines (VMs)” came along, where multiple VMs could run on one physical server through abstraction from the actual computer hardware. Each VM runs its own OS instance, and all the VMs share the resources of the underlying hardware.
A container is an executable “package” of software where application code (including libraries and dependencies) is bundled together. Compared to VMs, containers are much more lightweight: they don’t package an OS instance in their payload and are therefore independent of the host OS (Windows or Linux). The main advantage of containers is being able to “build once, run anywhere”, on-premises or on any cloud, which makes them a good fit for Agile and DevOps code development.
The concept of “images” for containers can be understood as “recipes” or templates that include a runtime environment along with all the libraries and files needed.
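To make the “recipe” analogy concrete, here is a minimal, hypothetical Dockerfile for a small Python web app (the file names and base image are illustrative assumptions, not from any specific project):

```dockerfile
# Start from a base image that provides the runtime environment
FROM python:3.9-slim

# Set the working directory inside the image
WORKDIR /app

# Bundle the dependencies (hypothetical requirements.txt for this example)
COPY requirements.txt .
RUN pip install -r requirements.txt

# Bundle the application code itself
COPY . .

# Define what runs when a container is started from this image
CMD ["python", "app.py"]
```

Building this file produces an image; every container started from that image gets the same code, libraries, and runtime, which is what makes “build once, run anywhere” possible.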
What is Kubernetes (k8s)?
Kubernetes is a container orchestration platform designed to run, manage and schedule container workloads.
What is Orchestration?
As developers build more and more applications, and the number of containers per application increases, manually managing those containers becomes exponentially difficult. Moreover, cloud applications are created with the assumption that they are bound to fail and should be able to handle failure scenarios. Orchestration brings these two aspects together: it plays a key role in building a system that expects failure and can reroute traffic between containers if one goes down.
What is scheduling?
When an application is getting a lot of traffic, scheduling means being able to scale up the number of containers. Conversely, if there is a sudden drop in the traffic an application has to support, it means being able to scale the number of containers back down.
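Kubernetes can automate this kind of scaling with a HorizontalPodAutoscaler. A minimal sketch, assuming a deployment named `web-app` exists (the name and thresholds here are illustrative):

```yaml
# Hypothetical autoscaler: keep between 2 and 10 replicas of the
# "web-app" deployment, scaling on average CPU utilization.
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: web-app-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web-app
  minReplicas: 2
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70   # add pods when average CPU exceeds 70%
```

With this in place, the cluster itself adds containers under load and removes them when traffic drops, rather than an operator doing it by hand.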
Key components of Kubernetes
Load Balancer and services:
The Load Balancer “service” in k8s routes traffic inside the cluster to the specific IP addresses of the different containers or pods within the k8s cluster.
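A service of type `LoadBalancer` might look like the following sketch; the names and ports are assumptions for illustration:

```yaml
# Hypothetical service: routes external traffic on port 80 to any pod
# labeled app=web-app, on the port the container actually listens on.
apiVersion: v1
kind: Service
metadata:
  name: web-app-svc
spec:
  type: LoadBalancer
  selector:
    app: web-app        # pods carrying this label receive the traffic
  ports:
    - port: 80          # port exposed by the service
      targetPort: 8080  # port the container listens on
```

The selector is what ties the service to the pods: traffic is balanced across whichever pods currently match the label, so pods can come and go without clients noticing.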
Operators:
They provide an easy way to automate a lot of the management within Kubernetes. From Red Hat’s website, an application-specific operator “includes domain or application-specific knowledge to automate the entire life cycle of the software it manages.”
Pods:
Pods are the smallest manageable units in k8s. They may consist of one or more containers.
Nodes:
Nodes may be VMs or physical servers, running on a public or private cloud, and contain the services necessary to run pods. A master node provides the basic cluster services, such as an API server, and exposes a set of capabilities that allow users to define how workloads are scheduled on the worker nodes. There are also worker nodes, each of which has a kubelet component.
Controllers:
A controller is a k8s process that makes changes to resources depending on the current state of the cluster versus the “desired state”. A good example of this is defining the number of pods or containers that are replicated within the cluster.
Persistent volumes:
Though pods and containers can be spun up or brought down as needed, a lot of applications that run on containers still need the ability to retain data, using persistent volumes that k8s can access and bind to the pods within the cluster.
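A pod typically asks for storage through a PersistentVolumeClaim; a minimal sketch (the name and size are illustrative assumptions):

```yaml
# Hypothetical claim: request 1Gi of storage that a single node
# can mount read-write; k8s binds it to a matching persistent volume.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: web-app-data
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 1Gi
```

A pod then mounts this claim as a volume, so the data survives even when the pod itself is replaced.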
How does it all tie together?
The kubelet plays a big role in scheduling and in making sure applications are healthy and running in the pods on the worker nodes; it also works hand in hand with the master node.
K8s uses a YAML file to define the resources that are sent to the API server and used to create the application running on the worker nodes. kubectl is a command line tool in k8s to deploy applications and inspect and manage cluster resources; it is used to push the config in the YAML file through to the kubelets so a pod can be started up. (Side note: a pod can run one or more containers in it.) With the internal IP address that gets assigned, you could ssh to any of the worker nodes to access the application.
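The kind of YAML file pushed through the API server might look like this minimal pod definition (names and image are illustrative assumptions):

```yaml
# Hypothetical pod: a single container running a stock nginx image.
apiVersion: v1
kind: Pod
metadata:
  name: web-app-pod
  labels:
    app: web-app        # label that services can select on
spec:
  containers:
    - name: web-app
      image: nginx:1.21 # any container image from a registry
      ports:
        - containerPort: 80
```

Applying this file with `kubectl apply -f pod.yaml` sends the definition to the API server, which schedules it onto a worker node, where the kubelet starts the pod.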
Although pods are the basic unit of computation in Kubernetes, they are not typically launched directly on a cluster. Instead, pods are usually managed by one more layer of abstraction: the deployment. A deployment’s primary purpose is to declare how many replicas of a pod should be running at a time. Anytime a replica goes down, the master spins up another replica to take its place.
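A deployment declaring that desired state could be sketched like this (again with hypothetical names and a stock image):

```yaml
# Hypothetical deployment: keep three identical replicas of the pod
# template below running at all times.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web-app
spec:
  replicas: 3                 # desired state: three pods
  selector:
    matchLabels:
      app: web-app            # which pods this deployment manages
  template:                   # the pod template to replicate
    metadata:
      labels:
        app: web-app
    spec:
      containers:
        - name: web-app
          image: nginx:1.21
          ports:
            - containerPort: 80
```

If one of the three pods dies, the controller notices the gap between actual and desired state and starts a replacement automatically.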
Now that the pods are running on the worker nodes, a service needs to be created to access the pods as a singular entity (vs. accessing each one using a separate IP address) and to allow them to communicate with each other. This creates an internal cluster IP for the service.
To expose this to external users, a load balancer is used to create an external IP.
What is OpenShift?
Built on Red Hat Enterprise Linux (RHEL) and the open source OKD project (the Origin community distribution of Kubernetes), OpenShift is an enterprise-ready version of Kubernetes that comes with Red Hat support. This includes multitenancy support, increased security, and monitoring, to name a few key aspects. It also adds integrated developer workflows (continuous integration and continuous delivery, or CI/CD) and a concept called “Routes”. OpenShift can run on bare metal or on VMs. As of the date this blog was published, the latest official version of OpenShift is 4.5.
What is a Route?
A Route builds on top of a service and exposes the internal service to the external internet, for example in the case of a customer-facing application.
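A minimal Route sketch, assuming an internal service named `web-app-svc` already exists (that name is a hypothetical example):

```yaml
# Hypothetical OpenShift route: expose the internal service
# "web-app-svc" at an externally reachable hostname.
apiVersion: route.openshift.io/v1
kind: Route
metadata:
  name: web-app-route
spec:
  to:
    kind: Service
    name: web-app-svc   # the internal service to expose
```

OpenShift’s router then assigns (or accepts) a hostname for the route, so external users reach the service over plain HTTP(S) rather than needing a cluster IP.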
Here is what the architecture for OpenShift would look like. The core building block is Red Hat Enterprise Linux. The Kubernetes layer runs on top. The value add of OpenShift is taking away the difficulty of deploying applications, with a command line interface (CLI) and web consoles that make working with k8s much easier.
Here’s a real world use-case using OpenShift:
A developer wants to write code for an application to deploy on a k8s cluster. They would use the CLI to log into the OpenShift cluster and create a project or application.
When the developer checks in code changes to a GitHub repository, OpenShift in the backend creates a Jenkins job/pipeline that powers the deployment of the application. An image gets built and pushed into a private or public registry, and from there deployed into the actual cluster.
With a feature called image streams, anytime a change to the code checked into the repository (or to the image) is detected, the change is then pushed down to the cluster.
Using another feature called Ansible playbooks, OpenShift can spin up a new host and bring it into the cluster seamlessly.
To summarize, containers form a crucial foundational block in cloud native architectures. Open source Kubernetes and enterprise-grade orchestration solutions such as OpenShift provide ways to deploy and manage applications, expose them to external users, and support CI/CD.