braindump ... cause thread-dumps are not enough ;)

Extending Kubernetes with the Operator pattern

Sometimes even the best pieces of software are not good enough and require customizing. Or you are looking to introduce some automation for frequently executed actions. The Operator pattern is Kubernetes’s answer to that. It shows how to introduce your own management logic into a nicely functioning cluster and not lose your mind in the process :) Below are my notes from creating a sample controller (code available here: https://github.com/pkoperek/kubernetes-custom-resource).

Initial notes

  • “Control plane” is a set of services which manage the cluster: source
  • Kubernetes can be extended by adding CustomResourceDefinitions or Aggregated APIs
  • A CRD can be created simply by creating another object in k8s (you don’t have to program anything to get it set up) but then it doesn’t do anything apart from storing some structured data in the k8s API
  • “On their own, custom resources let you store and retrieve structured data. When you combine a custom resource with a custom controller, custom resources provide a true declarative API.”
  • The relationship between CRDs and custom controllers is described by the “Operator pattern”
  • What is the operator pattern?
    • “Operator” code defines what a human operator would do when they are managing a particular app
    • operators are tightly coupled to the specific application
    • they should maximally utilize existing k8s resources for improved stability
    • “Operator” code is actually written in form of a custom controller
    • original article about the Operator pattern
    • as per this, an Operator typically is:

What might an Operator look like in more detail? Here’s an example:

  1. A custom resource named SampleDB, that you can configure into the cluster.
  2. A Deployment that makes sure a Pod is running that contains the controller part of the operator.
  3. A container image of the operator code.
  4. Controller code that queries the control plane to find out what SampleDB resources are configured.
  5. The core of the Operator is code to tell the API server how to make reality match the configured resources.
    • If you add a new SampleDB, the operator sets up PersistentVolumeClaims to provide durable database storage, a StatefulSet to run SampleDB and a Job to handle initial configuration.
    • If you delete it, the Operator takes a snapshot, then makes sure that the StatefulSet and Volumes are also removed.
  6. The operator also manages regular database backups. For each SampleDB resource, the operator determines when to create a Pod that can connect to the database and take backups. These Pods would rely on a ConfigMap and / or a Secret that has database connection details and credentials.
  7. Because the Operator aims to provide robust automation for the resource it manages, there would be additional supporting code. For this example, code checks to see if the database is running an old version and, if so, creates Job objects that upgrade it for you.

So how does all of that look in practice?

In general, the operator pattern involves creating two components: a Custom Resource Definition and a controller which handles its management.

How to create a CRD?

  • Creating a CRD is relatively simple: you just need to specify another Kubernetes resource - but this time it is a bit “meta”, as the resource defines other resources
    • source
    • create a file with the definition, e.g.
      apiVersion: apiextensions.k8s.io/v1
      kind: CustomResourceDefinition
      metadata:
        # name must match the spec fields below, and be in the form: <plural>.<group>
        name: databases.stable.imaginedata.com
      spec:
        # group name to use for REST API: /apis/<group>/<version>
        group: stable.imaginedata.com
        # list of versions supported by this CustomResourceDefinition
        versions:
          - name: v1
            # Each version can be enabled/disabled by Served flag.
            served: true
            # One and only one version must be marked as the storage version.
            storage: true
            schema:
              openAPIV3Schema:
                type: object
                properties:
                  spec:
                    type: object
                    properties:
                      name:
                        type: string
                      port:
                        type: integer
                      storage-size:
                        type: string
                      image:
                        type: string
                      replicas:
                        type: integer
        # either Namespaced or Cluster
        scope: Namespaced
        names:
          # plural name to be used in the URL: /apis/<group>/<version>/<plural>
          plural: databases
          # singular name to be used as an alias on the CLI and for display
          singular: database
          # kind is normally the CamelCased singular type. Your resource manifests use this.
          kind: Database
          # shortNames allow shorter string to match your resource on the CLI
          shortNames:
          - db
      
  • You can install it in your cluster by simply running: kubectl apply -f crd.yml
  • Initially there will be no resources of that kind:
    $ kubectl get databases
    No resources found in default namespace.
    
  • You can create one by simply applying a resource definition:
    $ cat sample-object.yml
    apiVersion: "stable.imaginedata.com/v1"
    kind: Database
    metadata:
      name: my-first-crd-database
      namespace: default
    spec:
      name: name-of-database
      port: 5432
      storage-size: 50G
      image: super-great-docker-image
      replicas: 3
    $ kubectl apply -f sample-object.yml
    database.stable.imaginedata.com/my-first-crd-database created
    $ kubectl get db
    NAME                    AGE
    my-first-crd-database   5s
    

One additional topic which I have omitted is creating a structured schema with validation definitions. If you are developing a production system, you probably want to have a look at it.

How to implement the custom controller?

Finally we can get into implementing a custom controller. First, some general info about controllers.

A controller is a client of Kubernetes. When Kubernetes is the client and calls out to a remote service, it is called a Webhook. The remote service is called a Webhook Backend. Like Controllers, Webhooks do add a point of failure. source

There is a specific pattern for writing client programs that work well with Kubernetes called the Controller pattern. Controllers typically read an object's .spec, possibly do things, and then update the object's .status. source

How is a controller associated with a CRD? The controller constantly watches resources of a given kind and reconciles the state of the cluster to fulfill the requirements described by those resources. It is up to the controller to schedule any required changes (e.g. introducing a new set of pods if we are scaling something).
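
To make this a bit more concrete, below is a minimal sketch of such a watch/reconcile loop for the Database CRD defined above, written with the official Python client (the kubernetes package). The handle_database function and the hard-coded default namespace are just assumptions for illustration - a real controller would create, update or delete the underlying resources to match the spec.

  # Sketch only: watches Database custom objects and reacts to changes.
  from kubernetes import client, config, watch

  def handle_database(event):
      # event["type"] is ADDED / MODIFIED / DELETED,
      # event["object"] is the Database custom resource as a plain dict
      spec = event["object"].get("spec", {})
      print(event["type"], event["object"]["metadata"]["name"], spec)
      # a real controller would now reconcile: create/update/delete the
      # Deployments, StatefulSets, PVCs etc. needed to satisfy the spec

  def main():
      # running inside the cluster; use config.load_kube_config() locally
      config.load_incluster_config()
      api = client.CustomObjectsApi()
      for event in watch.Watch().stream(
          api.list_namespaced_custom_object,
          group="stable.imaginedata.com",
          version="v1",
          namespace="default",
          plural="databases",
      ):
          handle_database(event)

  if __name__ == "__main__":
      main()

In a production controller you would rather use an informer / work-queue based setup (or one of the frameworks mentioned below) to get resyncs and retries, but the core idea stays the same: watch the custom resources and reconcile on every change.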

Building a controller involves the following steps:

  • Generating code stubs (if you are working with a statically typed language). For Java I have followed these steps
  • Implementing the logic (a good starting point is this)
  • Deploying the controller to kubernetes as a pod. The controller can run e.g. as a Deployment.

To write a controller you need to use one of the client libraries: list … or use one of the frameworks which are supposed to make it easier :)

Below are links to some additional resources and examples I have been using.

How to sell stuff - advice from Clever CEO

Notes and observations from watching a lecture by Tyler Bosmeny - Clever CEO.

  • Sales is not what you see in movies (men in expensive suits etc).
  • Sales is about talking to users.
  • Being passionate and knowledgeable about the problem you are solving is extremely helpful while selling.
  • Sales is about creating a funnel. Stages:
    • Prospecting - finding who might be interested in the product
    • Conversations - you reach out to people and check whether they are actually interested and whether the product is a good match
    • Closing - get the commitment
    • Revenue :)

Prospecting

  • figure out who will take the call
  • “you will hear a lot of ‘nos’”
  • innovators (people who might be willing to pay for your service) are just 2.5% of the population
  • you need to talk to a lot of people
  • best methods
    • reach out to your network
    • conferences
      • small gatherings of people who are potential users
      • figure out top conferences for your field
      • ask organizers for a list of people who are attending (GDPR????)
      • email them, explain what you are working on, and say that you would like to spend some time (30 mins?) to have a chat and show them what you have - maybe they would be interested in it
      • go to the conference, hopefully with 30min slots booked from morning till late evening
      • talk and sell :)
    • cold emails (GDPR????)
      • very effective if done well
      • short, to the point, personalized, tells why people should care
      • goal - get to stage 2 - a conversation

Conversations

Sales is about listening.

  • Shut up and listen.
  • Best sales people in the world listen a lot and ask questions
    • “Why did you even agree to have a call with me?”
    • “What is the problem you hoped to solve?”
    • “What would be the ideal solution if you could have anything?”

Closing

  • don’t fuss over the contract details
  • YC has a free contract template
  • for the first one or two customers do whatever it takes, but in general do not do 1-off “please add this one feature” implementations
    • say instead: “sorry, but we can’t accommodate individual user requests; if we get feedback from more users that this is missing, we will add it”
  • instead of 60-day free trial, do 1-year contract with 30-day cancellation
    • this is meeting the customer in the middle: you let them cancel if they don’t like it, but by default they are your customers

Four principles of scalable Big Data systems

Below are my notes from listening to a podcast with Ian Gorton who talked about principles in design of systems which process massive amounts of data.

What are the key problems with Big Data:

  • Volume of the data: there is a lot of it, storing it and providing resources for processing is a challenge.
  • Velocity of the data: how fast it changes, the rate of change.
  • Variety of data types which need to be analyzed.
  • Veracity which is about data integrity, quality etc.

They all boil down to a single issue: very large, complex data sets are available now and it is not possible anymore to process them with traditional databases and data processing techniques.

My comment: in 2014 it was indeed true. I think that there is one caveat to this: Moore’s law. 4 years ago CPUs and memory were not as fast or as optimized for parallel processing as they are today. Some data sets which might have been considered Big Data back then could now be handled by older technologies. This boundary is only going up. This means that for some slower organizations, moving to Big Data tech doesn’t make any sense, as they can solve the problem by just buying faster hardware instead of building a Hadoop/Spark cluster.

The processing systems need to be distributed and horizontally scalable (i.e. you can add capacity by adding more hosts of the same type instead of building a faster CPU).

Principles to follow when building Big Data systems:

  • “You can’t scale your efforts and costs for building a big data system at the same rate that you scale the capacity of the system.” - if you estimate that within a year your system will be 4x bigger, you can’t expect to have a 4x bigger team.
  • “The more complex solution, the less it will scale” - this one is more about making the right decisions on choosing technologies. If you choose something very complex - it will be very complex to scale.
  • “When you start to scale things, statefulness is a real problem because it’s really hard to balance the load of the stateful objects across the available servers” - it is difficult to handle failures, because losing state and recreating it is hard. Using stateless objects is the way to go here.
  • Failure is inevitable - redundancy and resilience to failure is key. You have to account for failure and be ready for problems with many parts of the system.

I think one golden thought which may be easily overlooked is this:

“Hence, how do you know that your test cases and the actual code that you have built are going to work anymore? The answer is you do not, and you never will.”

If you are working with a Big Data system, you can never know how it will behave in production, because recreating the real conditions is too costly. This means that the only reliable and predictable way to build such a system is to introduce a feedback loop which tells you as early as possible whether you have broken anything, which boils down to: continuous, in-depth monitoring of the infrastructure and using CI/CD in connection with techniques like blue/green deployments.

My observations:

  • It is important to do proper capacity planning or, failing that, at least an estimation of future data inflow (e.g. over the next year).
  • The key factor which allows you to introduce efficiency and cut the costs of operating a large system is automation (e.g. instead of manually installing servers you need to automate this).
  • Simplicity allows for a better understanding of what is happening in the system => this leads to a better understanding of bottlenecks and figuring out how to avoid them.

Link to the original talk and transcript.

Cassandra - first touch

What is Cassandra?

  • NoSQL data-storage engine
  • Stores information as key-value pairs
  • Has a masterless architecture (no masters, no slaves)
  • An analog of Amazon’s DynamoDB or Google’s BigTable
  • Based on the famous Dynamo paper from Amazon and the famous BigTable paper from Google (actually blends them)

Architecture

  • All nodes participate in a cluster
  • No single point of failure, shared nothing
  • Nodes are organized as a ring
  • Scales linearly (add more nodes == add more capacity or make the cluster faster)
  • Does asynchronous replication
  • Can work in multiple data centers in active-active mode
  • Uses consistent hashing to spread data over the cluster.
  • Data is replicated N times - N is the replication factor.
  • How does replication work?
  • The client can set the consistency level (ONE, QUORUM, LOCAL_QUORUM, LOCAL_ONE, TWO, ALL)
  • Data can be accessed/inserted with the Cassandra Query Language (CQL), which looks a lot like SQL (see the sketch after this list).
  • Cassandra is an OLTP system.
    • INSERT always does an overwrite (it doesn’t check if the row of data exists already - it just writes); however a check can be forced with some other syntax (e.g. INSERT ... IF NOT EXISTS).
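
A small sketch of what this looks like from the client’s perspective, assuming the DataStax cassandra-driver package and a local node; the my_keyspace keyspace and users table are made up for illustration:

  from cassandra import ConsistencyLevel
  from cassandra.cluster import Cluster
  from cassandra.query import SimpleStatement

  # any node can act as the coordinator - there is no master to connect to
  cluster = Cluster(["127.0.0.1"])
  session = cluster.connect("my_keyspace")

  # INSERT is an upsert: it overwrites without checking if the row exists;
  # the consistency level is chosen per query (here: a majority of replicas)
  insert = SimpleStatement(
      "INSERT INTO users (user_id, name) VALUES (%s, %s)",
      consistency_level=ConsistencyLevel.QUORUM,
  )
  session.execute(insert, (42, "alice"))

  # IF NOT EXISTS turns the write into a lightweight transaction
  session.execute(
      "INSERT INTO users (user_id, name) VALUES (%s, %s) IF NOT EXISTS",
      (42, "bob"),
  )

  rows = session.execute("SELECT name FROM users WHERE user_id = %s", (42,))
  for row in rows:
      print(row.name)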

What does a single node do?

  • Write:
    • first the data is appended to the commit log (append-only - super fast)
    • then it updates the table in memory
    • this triggers an ACK sent to the client
    • then the data is dumped to a file on disk (again, these are big append-only files)
  • After data has been dumped to disk for a period of time, it is time for compaction (a merge sort over the files on disk, where we keep the records with the most recent timestamps, and voila)
  • Which partition does a row go to? Cassandra does an MD5 of the PK, which gives a 128-bit number. Every node is responsible for handling a subset of the key space. Because we always do the MD5 on the primary key, we always end up on the same node -> we have consistent hashing (see the toy sketch below).
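
The toy sketch below illustrates the idea of consistent hashing (it is not Cassandra’s real partitioner - the ring positions and node names are made up): hash the primary key, then pick the node that owns the token range the hash falls into.

  import hashlib
  from bisect import bisect_left

  # hypothetical ring: each node owns the tokens up to (and including) its position
  ring = {
      2**126: "node-a",
      2**127: "node-b",
      2**128 - 1: "node-c",
  }

  def token(primary_key: str) -> int:
      # the notes above mention MD5, which yields a 128-bit number
      return int(hashlib.md5(primary_key.encode()).hexdigest(), 16)

  def node_for(primary_key: str) -> str:
      positions = sorted(ring)
      idx = bisect_left(positions, token(primary_key))
      # the same key always hashes to the same token, hence the same node
      return ring[positions[idx % len(positions)]]

  print(node_for("user-42"))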

When to use it?

  • Collecting enormous amounts of data
    • The amount of writes handled by the cluster scales linearly with the number of nodes in the cluster.
  • When a single box is not enough anymore
  • … and sharding is not good enough for scaling

Differences to HBase

  • A lot of similarities - they ultimately originate from the BigTable paper.
  • Cassandra is easier to run (operationally - because all nodes are created equal)
  • HBase has advantages when you are doing operations which are sequential
  • Programmatically - CQL seems more expressive
  • MapReduce - there is some integration for Cassandra; HBase is native because its storage is in HDFS

Apache Airflow tricks

I will update this post from time to time with more learnings.

Problem: I fixed a problem in my pipeline but Airflow doesn’t see it.

Possible things you can do:

  • check if you actually did fix it :)
  • try to refresh the DAG through the UI
  • remove *.pyc files from the dags directory
  • check if you have installed the dependencies in the right python environment (e.g. in the right version of system-wide python or in the right virtualenv)

Problem: This DAG isn’t available in the web server’s DagBag object. It shows up in this list because the scheduler marked it as active in the metadata database.

  • Refresh the DAG code from the UI
  • Restart the webserver - this did the trick in my case. Some people report that there might be a stalled gunicorn process; if the restart doesn’t help, try to find rogue processes and kill them manually (source, source 2)

Problem: I want to delete a DAG

  • Airflow 1.10 has a command for this: airflow delete ...
  • Prior to 1.10, you can use the following script:
import sys

from airflow.hooks.postgres_hook import PostgresHook

# the DAG id to remove is passed as the first command line argument
dag_input = sys.argv[1]
hook = PostgresHook(postgres_conn_id="airflow_db")

# wipe all traces of the DAG from the metadata database tables
for t in ["xcom", "task_instance", "sla_miss", "log", "job", "dag_run", "dag"]:
    sql = "delete from {} where dag_id='{}'".format(t, dag_input)
    hook.run(sql, True)

(tested: works like a charm :))

Source