HySDS in Kubernetes (k8)


Kubernetes is an open-source system for automating deployment, scaling, and management of containerized applications.

K8 pods run an instance of a docker image, similar to a docker container. We can run all of the HySDS services as K8 pods/deployments (plus K8 services if they need to be exposed to users)

kubectl is the CLI tool used to communicate with your k8 cluster
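For example, a few common kubectl commands (this assumes kubectl is installed and configured to point at your cluster; `<pod-name>` is a placeholder):

```shell
kubectl get nodes                 # list cluster nodes
kubectl get pods                  # list pods in the current namespace
kubectl describe pod <pod-name>   # inspect a pod's state and recent events
kubectl logs <pod-name>           # view a pod's stdout/stderr logs
```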

New Dockerfile

I want to make the base docker image for the HySDS services as light as possible, so it is best not to use hysds/pge-base: it is ~3.5GB and installs a lot of extra tools needed only for PGE execution

Instead, I created a new docker image from centos:7 and installed only Python 3.7 and the core HySDS libraries; the new image is ~850MB (but I will try to shrink it further)

  • in the future we can set an ARG for the python version (3.7.9) and give users the option of installing a different version of python with --build-arg
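As a sketch of how that would work (the image tag `hysds-base:py3.8` is a hypothetical name, not an existing image):

```shell
# Build the base image with a non-default Python version via the VERSION build arg
docker build --build-arg VERSION=3.8.6 -t hysds-base:py3.8 .
```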

the docker images for the various services in HySDS (mozart, grq2, pele rest APIs, celery workers) will be branched off this image and run in a k8 environment

FROM centos:7

ARG HOME=/root
ARG VERSION="3.7.9"

WORKDIR $HOME

# RUN yum update -y && \
RUN yum install gcc openssl-devel bzip2-devel libffi-devel openldap-devel readline-devel make wget git -y && \
    cd /tmp && \
    # installing python 3
    wget https://www.python.org/ftp/python/${VERSION}/Python-${VERSION}.tgz && \
    tar xzf Python-${VERSION}.tgz && \
    cd Python-${VERSION} && \
    ./configure --enable-optimizations && \
    make altinstall && \
    ln -s /usr/local/bin/python${VERSION:0:3} /usr/local/bin/python3 && \
    ln -s /usr/local/bin/pip${VERSION:0:3} /usr/local/bin/pip3 && \
    pip3 install --no-cache-dir --upgrade pip && \
    pip3 install --no-cache-dir gnureadline && \
    rm -f /tmp/Python-${VERSION}.tgz && \
    rm -rf /tmp/Python-${VERSION} && \
    # installing HySDS libraries
    cd $HOME && \
    git clone https://github.com/hysds/prov_es.git && \
    git clone https://github.com/hysds/osaka.git && \
    git clone https://github.com/hysds/hysds_commons.git && \
    git clone https://github.com/hysds/hysds.git && \
    pip3 install --no-cache-dir -e prov_es/ && \
    pip3 install --no-cache-dir -e osaka/ && \
    pip3 install --no-cache-dir -e hysds_commons/ && \
    pip3 install --no-cache-dir -e hysds/ && \
    yum clean all && \
    rm -rf /var/cache/yum && \
    rm -r /tmp/*

WORKDIR $HOME

CMD ["/bin/bash"]

Kubernetes YAML files

Example of all the mozart services in a kubernetes environment; this can run on your local k8 cluster (minikube or Docker Desktop)

k8 services and deployments are defined in a .yaml file

a k8 service exposes your pod (similar to a docker container) so that other entities (another pod or a user) can communicate with it

service.yml

apiVersion: v1
kind: Service
metadata:
  name: mozart
  labels:
    app: mozart
spec:
  ports:
    - port: 8888
  selector:
    app: mozart
  type: LoadBalancer

deployment.yml

apiVersion: apps/v1
kind: Deployment
metadata:
  name: mozart
  labels:
    app: mozart
spec:
  # replicas: 2  # will allow you to run multiple instances of the app
  selector:
    matchLabels:
      app: mozart
  strategy:
    type: Recreate
  template:
    metadata:
      labels:
        app: mozart
    spec:
      containers:
        - name: mozart
          image: mozart:test
          # env:  # passing environment variables
          #   - name: WORKERS
          #     value: "4"
          ports:
            - containerPort: 8888
              name: mozart
          volumeMounts:
            - ...
      volumes:
        - ...

Use the kubectl CLI tool to deploy your application in your kubernetes cluster
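For example (assuming the manifests above are saved as service.yml and deployment.yml in the current directory):

```shell
# Apply the manifests to the cluster
kubectl apply -f service.yml
kubectl apply -f deployment.yml

# Verify that everything came up
kubectl get deployments
kubectl get pods
kubectl get services
```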

Your deployment and service are now running

HySDS services:


Previously, HySDS ran its services on their respective machines/instances (Mozart, GRQ, Metrics & Factotum). Moving to a k8 deployment removes that constraint, as the Kubernetes scheduler determines which k8 node runs which service

Stateless Application(s)

  • Mozart rest API

  • Logstash

  • Celery workers

  • GRQ2 rest API

  • Pele rest API

  • Kibana

  • sdswatch

stateless applications are applications that don't store data (besides logs), so their deployment in k8 is very simple & straightforward

can scale out easily without worrying about a leader/worker architecture; just add replicas: <n> in the deployment.yml file and the k8 LoadBalancer service will distribute traffic across the replicas
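A replica count can also be changed at runtime without editing the manifest; for example, for the mozart deployment above:

```shell
# Scale the mozart deployment to 3 replicas
kubectl scale deployment mozart --replicas=3

# Or edit "replicas:" in deployment.yml and re-apply it
kubectl apply -f deployment.yml
</imports>
```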

most of the work revolves around creating a PersistentVolume to store logs & maybe cache data
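As an illustrative sketch (the claim name and storage size below are hypothetical), a PersistentVolumeClaim for log storage could look like:

```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: mozart-logs   # hypothetical name
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 5Gi    # placeholder size
```

The claim would then be referenced from the deployment's volumes section and mounted into the container with volumeMounts.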

Stateful Application(s)

stateful applications save client data and deployments are more complicated

examples are databases, queues and cache stores

scaling out stateful applications requires the use of a StatefulSet
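As a minimal sketch of the idea (a generic single-replica Redis cache, not a HySDS configuration), a StatefulSet pairs stable pod identities with per-pod persistent storage via volumeClaimTemplates:

```yaml
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: redis
spec:
  serviceName: redis
  replicas: 1
  selector:
    matchLabels:
      app: redis
  template:
    metadata:
      labels:
        app: redis
    spec:
      containers:
        - name: redis
          image: redis:6
          ports:
            - containerPort: 6379
          volumeMounts:
            - name: redis-data
              mountPath: /data
  volumeClaimTemplates:
    - metadata:
        name: redis-data
      spec:
        accessModes: ["ReadWriteOnce"]
        resources:
          requests:
            storage: 1Gi
```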

Another option they recommend is to move your stateful applications into cloud managed services

Helm

Helm is a package manager (similar to homebrew or yum) and repository for kubernetes, hosting k8 "packages" (charts). In the project's own words, Helm "helps you manage Kubernetes applications — Helm Charts help you define, install, and upgrade even the most complex Kubernetes application."

Use Helm v3; v2 has known security vulnerabilities (its cluster-side Tiller component)

Deploying stateful applications can often be complicated and can take a lot of k8 yaml files to get it to work, especially if you’re planning on running a multi-node setup: example for RabbitMQ

Using helm to handle the templating and .yml creation makes things much easier: example by Bitnami
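For example, installing RabbitMQ from the Bitnami chart repository with Helm v3 (release name `rabbitmq` is arbitrary):

```shell
# Register the Bitnami chart repository and install the RabbitMQ chart
helm repo add bitnami https://charts.bitnami.com/bitnami
helm repo update
helm install rabbitmq bitnami/rabbitmq
```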

Concerns

Kubernetes is dropping support for Docker as a container runtime (the dockershim deprecation)

  • Kubernetes keeps only the runtime component of Docker: containerd

  • saves resources (RAM, CPU, storage, etc.) & reduces security risk

Managed Kubernetes on the cloud (EKS, GKE & AKS) handles this for you, but it will affect users who manage a K8 cluster themselves

SDSCLI (sdskit/sdscli: Command line interface to SDSKit)

moving to kubernetes will drastically affect sdscli

  • it was written under the design of SSHing into other HySDS components (grq, factotum & metrics) and running commands such as pip install, etc.

  • it relies on fabric to copy files from mozart to other HySDS components

    • for example, sds update grq will clear out ~/sciflo/ops/ and copy over all the necessary files/repos from ~/mozart/ops/ to grq

    • files can be copied between pods via the local machine (kubectl cp my-pod:my-file my-file, then kubectl cp my-file other-pod:my-file), but this can potentially mess things up

  • this will not work with k8 b/c every service is completely de-coupled and runs in its own environment

  • sds [start|stop|reset] [mozart|grq|metrics|factotum] will become somewhat obsolete (in its current state) because there’s no need for supervisord to run its services

    • services will be running in their own standalone pod(s)

    • instead will use kubectl to manage the k8 services

    • supervisord may be used in the k8 pod for celery workers

      • b/c we have many celery workers (user_rules processing, orchestrator, etc), wrapping it in supervisord in a pod may clean things up

  • will need to see how sds ship will be affected by kubernetes
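A supervisord config wrapping several workers in one pod might be sketched as follows (the program names, app module, and queue names below are placeholders, not the actual HySDS configuration):

```ini
; supervisord.conf sketch: multiple celery workers in a single pod
[supervisord]
nodaemon=true

[program:worker-orchestrator]
command=celery -A hysds worker -Q orchestrator -c 1
autorestart=true

[program:worker-user-rules]
command=celery -A hysds worker -Q user_rules -c 1
autorestart=true
```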

Note: JPL employees can also get answers to HySDS questions at Stack Overflow Enterprise: