
Kubernetes is an open-source system for automating deployment, scaling, and management of containerized applications.

K8s pods run an instance of a Docker image, similar to a Docker container. We can run all of the HySDS services as K8s pods/deployments (plus K8s Services for anything that needs to be exposed to users)

kubectl is the CLI tool used to communicate with your K8s cluster

New Dockerfile

The goal is to make the base Docker image for the HySDS services as light as possible, so it is best not to use hysds/pge-base: it is ~3.5GB and installs a lot of extra tools that are only needed for PGE execution

Instead, a new Docker image was built from centos:7 with only Python 3.7 and the core HySDS libraries installed; the new image is ~850MB (but will try to shrink it more)

  • in the future we can set an ARG for the Python version (e.g. 3.7.9) and give users the option of installing a different version of Python with --build-arg

the Docker images for the various services in HySDS (mozart, grq2, pele REST APIs, celery workers) will be built off this base image and run in a K8s environment

FROM centos:7

ARG HOME=/root

WORKDIR $HOME

RUN yum install gcc openssl-devel bzip2-devel libffi-devel openldap-devel make wget git -y && \
    cd /tmp && \
    # installing python 3
    wget https://www.python.org/ftp/python/3.7.9/Python-3.7.9.tgz && \
    tar xzf Python-3.7.9.tgz && \
    cd Python-3.7.9 && \
    ./configure --enable-optimizations && \
    make altinstall && \
    ln -s /usr/local/bin/python3.7 /usr/local/bin/python3 && \
    ln -s /usr/local/bin/pip3.7 /usr/local/bin/pip3 && \
    pip3 install --upgrade pip && \
    pip3 install gnureadline && \
    rm -f /tmp/Python-3.7.9.tgz && \
    rm -rf /tmp/Python-3.7.9 && \
    # installing HySDS libraries
    cd $HOME && \
    git clone https://github.com/hysds/prov_es.git && \
    git clone https://github.com/hysds/osaka.git && \
    git clone https://github.com/hysds/hysds_commons.git && \
    git clone https://github.com/hysds/hysds.git && \
    pip3 install -e prov_es/ && \
    pip3 install -e osaka/ && \
    pip3 install -e hysds_commons/ && \
    pip3 install -e hysds/ && \
    yum clean all && \
    rm -rf /var/cache/yum

WORKDIR $HOME
CMD ["/bin/bash"]
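A sketch of the --build-arg idea mentioned above: the hard-coded version in the Dockerfile could become a build argument, keeping 3.7.9 as the default. The ARG name PYTHON_VERSION is an assumption (it is not in the current Dockerfile), and only the affected lines are shown:

```dockerfile
FROM centos:7

# hypothetical build argument; defaults to the version currently hard-coded
ARG PYTHON_VERSION=3.7.9

RUN cd /tmp && \
    wget https://www.python.org/ftp/python/${PYTHON_VERSION}/Python-${PYTHON_VERSION}.tgz && \
    tar xzf Python-${PYTHON_VERSION}.tgz && \
    cd Python-${PYTHON_VERSION} && \
    ./configure --enable-optimizations && \
    make altinstall
```

A user could then build with a different version, e.g. `docker build --build-arg PYTHON_VERSION=3.8.12 -t hysds-base .` (note the python3/pip3 symlinks later in the Dockerfile would also need the matching minor version).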

Kubernetes YAML files

Example of all the mozart services in a Kubernetes environment; this can run on your local K8s cluster (minikube or Docker Desktop)

K8s Services and Deployments are defined in .yaml files

a K8s Service exposes your pod (similar to a Docker container) so that other entities (another pod or a user) can communicate with it

service.yml

apiVersion: v1
kind: Service
metadata:
  name: mozart
  labels:
    app: mozart
spec:
  ports:
    - port: 8888
  selector:
    app: mozart
  type: LoadBalancer

deployment.yml

apiVersion: apps/v1
kind: Deployment
metadata:
  name: mozart
  labels:
    app: mozart
spec:
  # replicas: 2  # will allow you to run multiple instances of the app
  selector:
    matchLabels:
      app: mozart
  strategy:
    type: Recreate
  template:
    metadata:
      labels:
        app: mozart
    spec:
      containers:
        - name: mozart
          image: mozart:test
          # env:  # passing environment variables
          #   - name: WORKERS
          #     value: "4"
          ports:
            - containerPort: 8888
              name: mozart
          volumeMounts:
            - ...
      volumes:
        - ...

Use the kubectl CLI tool to deploy your application in your kubernetes cluster

$ kubectl apply -f deployment.yml
$ kubectl apply -f service.yml

Your deployment and service are now running

$ kubectl get pod
NAME                      READY     STATUS    RESTARTS   AGE
mozart-7cfff56848-ztg7d   1/1       Running   0          6s

$ kubectl get deployment
NAME      DESIRED   CURRENT   UP-TO-DATE   AVAILABLE   AGE
mozart    1         1         1            1           33s

HySDS services:

Previously, HySDS ran its services on their respective machines/instances (Mozart, GRQ, Metrics & Factotum), but moving to a K8s deployment does away with that: the K8s scheduler determines which node runs which service

Stateless Application(s)

  • Mozart rest API

  • Logstash

  • Celery workers

  • GRQ2 rest API

  • Pele rest API

  • Kibana

  • sdswatch

stateless applications are applications that don’t store data (besides logs), so their deployment in K8s is simple and straightforward

they can scale out easily without worrying about a leader/worker architecture: just add replicas: <n> in the deployment.yml file and the K8s LoadBalancer will handle the rest

most of the work revolves around creating a PersistentVolume to store logs and possibly cache data
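As a sketch of that persistent-volume work, the log storage for the mozart deployment above could be backed by a PersistentVolumeClaim like the following (the name mozart-logs, the size, and the mount path are assumptions):

```yaml
# pvc.yml -- hypothetical claim backing a log volume (name/size assumed)
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: mozart-logs
spec:
  accessModes:
    - ReadWriteOnce        # single-node read/write is enough for logs
  resources:
    requests:
      storage: 5Gi
```

The deployment's volumes: section would then reference it via persistentVolumeClaim with claimName: mozart-logs, and the container's volumeMounts: would mount that volume at the service's log directory.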

Stateful Application(s)

  • Elasticsearch

  • Redis

  • RabbitMQ

stateful applications save client data, and their deployments are more complicated

examples are databases, queues, and cache stores

scaling out stateful applications requires the use of a StatefulSet
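As a hedged sketch of what a StatefulSet looks like, using Redis (the simplest of the three services above); the image tag, storage size, and single replica are assumptions, not a tested HySDS configuration:

```yaml
# hypothetical redis-statefulset.yml
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: redis
spec:
  serviceName: redis          # requires a headless Service named "redis"
  replicas: 1
  selector:
    matchLabels:
      app: redis
  template:
    metadata:
      labels:
        app: redis
    spec:
      containers:
        - name: redis
          image: redis:5
          ports:
            - containerPort: 6379
          volumeMounts:
            - name: data
              mountPath: /data
  volumeClaimTemplates:       # gives each replica its own PersistentVolume
    - metadata:
        name: data
      spec:
        accessModes: ["ReadWriteOnce"]
        resources:
          requests:
            storage: 1Gi
```

Unlike a Deployment, a StatefulSet gives each pod a stable network identity (redis-0, redis-1, ...) and a dedicated volume, which is what databases and queues need when scaling out.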

Helm

Helm is a package manager for Kubernetes (similar to Homebrew or yum), with repositories that host K8s “packages” (charts) and …

… helps you manage Kubernetes applications — Helm Charts help you define, install, and upgrade even the most complex Kubernetes application.

Use Helm v3; v2 has security vulnerabilities (its server-side Tiller component runs with broad cluster privileges)

Deploying stateful applications can often be complicated and can take a lot of K8s YAML files to get working, especially if you’re planning on running a multi-node setup: example for RabbitMQ

Using helm to handle the templating and .yml creation makes things much easier: example by Bitnami
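As an illustration of the Bitnami approach above, installing the RabbitMQ chart could look like the following (Helm v3 syntax; the release name hysds-mq and the replicaCount value are assumptions):

```
# add the Bitnami chart repository and refresh the local index
helm repo add bitnami https://charts.bitnami.com/bitnami
helm repo update

# install RabbitMQ as a release named "hysds-mq" with 3 nodes
helm install hysds-mq bitnami/rabbitmq --set replicaCount=3

# render the generated manifests without installing, to inspect them
helm template hysds-mq bitnami/rabbitmq --set replicaCount=3
```

The last command shows why Helm helps here: it generates all of the StatefulSet, Service, Secret, and ConfigMap YAML that would otherwise be written by hand.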

Concerns

Kubernetes is dropping support for Docker

  • the cluster keeps only the runtime component of Docker: containerd

  • this saves resources (RAM, CPU, storage, etc.) and reduces security risk

Users of managed Kubernetes on cloud (EKS, GKE & AKS) don’t have to worry about it, but it will affect users who are managing a K8s cluster themselves

SDSCLI - https://github.com/sdskit/sdscli

moving to Kubernetes will drastically affect sdscli

  • it was written under the design of SSHing into other HySDS components (grq, factotum & metrics) and running commands such as pip install, etc.

  • it relies on Fabric to copy files from mozart to other HySDS components

    • for example, sds update grq will clear out ~/sciflo/ops/ and copy over all the necessary files/repos from ~/mozart/ops/ to grq

    • can copy files from pod -> pod (kubectl cp my-pod:my-file my-file) but it can potentially mess things up

  • this will not work with K8s because every service is completely decoupled and in its own environment

  • sds [start|stop|reset] [mozart|grq|metrics|factotum] will become somewhat obsolete (in its current state) because there is no longer a need for supervisord to run the services

    • services will be running in their own standalone pod(s)

    • instead will use kubectl to manage the k8 services

    • supervisord may still be used in the K8s pod for celery workers

      • because we have many celery workers (user_rules processing, orchestrator, etc.), wrapping them in supervisord in a pod may clean things up

  • will need to see how sds ship will be affected by kubernetes
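As a sketch of the supervisord-in-a-pod idea above, a celery-worker pod’s supervisord.conf could run several workers in one container; the program names and worker commands here are assumptions, not the actual HySDS configuration:

```ini
; hypothetical supervisord.conf for a celery-worker pod
[supervisord]
nodaemon=true        ; run in the foreground so the container stays alive

[program:orchestrator]
command=celery -A hysds worker -Q orchestrator -c 1
autorestart=true

[program:user_rules]
command=celery -A hysds worker -Q user_rules_dataset -c 1
autorestart=true
```

With nodaemon=true, supervisord itself is the container’s PID 1, so the pod stays Running as long as supervisord does, and autorestart handles individual worker crashes.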
