Kubernetes Operator
===================

.. warning:: This is not authoritative documentation.  These features
   are not currently available in Zuul.  They may change significantly
   before final implementation, or may never be fully completed.

While Zuul can be happily deployed in a Kubernetes environment, it is
a complex enough system that a Kubernetes Operator could provide value
to deployers. A Zuul Operator would allow a deployer to create, manage,
and operate "A Zuul" in their Kubernetes cluster and leave the details
of how that works to the Operator.

To that end, the Zuul Project should create and maintain a Kubernetes
Operator for running Zuul. Given the close ties between Zuul and Ansible,
we should use `Ansible Operator`_ to implement the Operator. Our existing
community is already running Zuul in both Kubernetes and OpenShift, so
we should ensure our Operator works in both. When we're happy with it,
we should publish it to `OperatorHub`_.

That's the easy part. The remainder of the document is for hammering out
some of the finer details.

.. _Ansible Operator: https://github.com/operator-framework/operator-sdk/blob/master/doc/ansible/user-guide.md
.. _OperatorHub: https://www.operatorhub.io/

Custom Resource Definitions
---------------------------

One of the key parts of making an Operator is to define one or more
Custom Resource Definitions (CRDs). These allow a user to say "hey k8s,
please give me a Thing". It is then the Operator's job to take the
appropriate actions to make sure the Thing exists.

For Zuul, there should definitely be a Zuul CRD. It should be namespaced
with ``zuul-ci.org``. There should be a section for each service to manage
that service's config as well as its capacity:

::

  apiVersion: zuul-ci.org/v1alpha1
  kind: Zuul
  spec:
    merger:
      count: 5
    executor:
      count: 5
    web:
      count: 1
    fingergw:
      count: 1
    scheduler:
      count: 1

.. note:: Until the distributed scheduler exists in the underlying Zuul
    implementation, the ``count`` parameter for the scheduler service
    cannot be set to anything greater than 1.

Zuul requires Nodepool to operate. While there are friendly people
using Nodepool without Zuul, from the context of the Operator, the Nodepool
services should just be considered part of Zuul.

::

  apiVersion: zuul-ci.org/v1alpha1
  kind: Zuul
  spec:
    merger:
      count: 5
    executor:
      count: 5
    web:
      count: 1
    fingergw:
      count: 1
    scheduler:
      count: 1
    # Because of nodepool config sharding, count is not valid for launcher.
    launcher:
    builder:
      count: 2


Images
------

The Operator should, by default, use the published ``docker.io/zuul``
images. To support locally built or otherwise overridden images, the
Operator should have an optional image setting for each service.

::

  apiVersion: zuul-ci.org/v1alpha1
  kind: Zuul
  spec:
    merger:
      count: 5
      image: docker.io/example/zuul-merger
    executor:
      count: 5
    web:
      count: 1
    fingergw:
      count: 1
    scheduler:
      count: 1
    launcher:
    builder:
      count: 2

External Dependencies
---------------------

Zuul needs some services, such as an RDBMS and ZooKeeper, that are
themselves resources that should or could be managed by an Operator. It is
out of scope (and inappropriate) for Zuul to provide these itself. Instead,
the Zuul Operator should use CRDs provided by other Operators.

On Kubernetes installs that support the Operator Lifecycle Manager (OLM),
external dependencies can be declared in the Zuul Operator's OLM metadata.
However, not all Kubernetes installs can handle this, so it should also be
possible for a deployer to manually install a documented list of Operators
and CRDs before installing the Zuul Operator.
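
For example, on an OLM-capable install, a dependency could be expressed
as a required CRD in the Zuul Operator's ClusterServiceVersion. The
following is a minimal sketch; the specific CRD, its version, and the
description are illustrative assumptions rather than final metadata:

::

  apiVersion: operators.coreos.com/v1alpha1
  kind: ClusterServiceVersion
  metadata:
    name: zuul-operator.v0.0.1
  spec:
    customresourcedefinitions:
      required:
        # Assumed dependency: the CRD provided by the Percona XtraDB
        # Cluster Operator, used here purely for illustration.
        - name: perconaxtradbclusters.pxc.percona.com
          version: v1
          kind: PerconaXtraDBCluster
          displayName: Percona XtraDB Cluster
          description: Used by the Zuul Operator to manage the RDBMS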

For each external service dependency where the Zuul Operator would be
relying on another Operator to create and manage the given service, there
should be a config override setting that allows a deployer to say "I
already have one of these that's located at Location, please don't create
one." The config setting should carry the location and connection
information for the externally managed version of the service; not
providing that information should be taken to mean the Zuul Operator
should create and manage the resource itself.

::

  ---
  apiVersion: v1
  kind: Secret
  metadata:
    name: externalDatabase
  type: Opaque
  stringData:
    dburi: mysql+pymysql://zuul:password@db.example.com/zuul
  ---
  apiVersion: zuul-ci.org/v1alpha1
  kind: Zuul
  spec:
    # If the database section is omitted, the Zuul Operator will create
    # and manage the database.
    database:
      secretName: externalDatabase
      key: dburi

While Zuul supports multiple RDBMS backends, the Zuul Operator should not
attempt to manage all of them. If the deployer chooses to let the Zuul
Operator create and manage the RDBMS, the
`Percona XtraDB Cluster Operator`_ should be used. Deployers who wish to
use a different backend should use the config override setting pointing
to their database's location.

.. _Percona XtraDB Cluster Operator: https://operatorhub.io/operator/percona-xtradb-cluster-operator
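
An analogous override could exist for ZooKeeper. The following is only a
sketch; the ``zookeeper`` section name and ``hosts`` key are assumptions
modeled on the ``[zookeeper]`` section of ``zuul.conf``, not settled API:

::

  apiVersion: zuul-ci.org/v1alpha1
  kind: Zuul
  spec:
    # If the zookeeper section is omitted, the Zuul Operator would create
    # and manage a ZooKeeper cluster itself, following the same pattern
    # as the database above.
    zookeeper:
      hosts: zk01.example.com:2181,zk02.example.com:2181,zk03.example.com:2181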

Zuul Config
-----------

Zuul config files that contain no information the Operator needs to do
its job, and into which the Operator does not need to add data, should be
handled by ConfigMap resources rather than as parts of the CRD. The CRD
should take references to those ConfigMap objects.

Completely external files like ``clouds.yaml`` and ``kube/config``
should be in Secrets referenced in the config. Zuul files like
``nodepool.yaml`` and ``main.yaml`` that contain no information the Operator
needs should be in ConfigMaps and referenced.

::

  apiVersion: zuul-ci.org/v1alpha1
  kind: Zuul
  spec:
    merger:
      count: 5
    executor:
      count: 5
    web:
      count: 1
    fingergw:
      count: 1
    scheduler:
      count: 1
      config: zuulYamlConfig
    launcher:
      config: nodepoolYamlConfig
    builder:
      config: nodepoolYamlConfig
    externalConfig:
      openstack:
        secretName: cloudsYaml
      kubernetes:
        secretName: kubeConfig
      amazon:
        secretName: botoConfig
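
The ConfigMap referenced above as ``zuulYamlConfig`` would simply carry
the tenant config file. A minimal sketch, assuming the data key is named
``main.yaml`` and using placeholder tenant and project names:

::

  apiVersion: v1
  kind: ConfigMap
  metadata:
    name: zuulYamlConfig
  data:
    main.yaml: |
      - tenant:
          name: example-tenant
          source:
            gerrit:
              untrusted-projects:
                - example/project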

Zuul files like ``/etc/nodepool/secure.conf`` and ``/etc/zuul/zuul.conf``
should be managed by the Operator and their options should be represented in
the CRD.

The Operator will shard the Nodepool config by provider-region using a
utility pod and create a new ConfigMap for each provider-region containing
only the subset of the config needed for that provider-region. It will then
create a launcher pod for each provider-region.
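
As an illustration, a deployer-provided Nodepool config with two
provider-regions might be sharded into per-provider-region ConfigMaps
roughly as follows; the generated ConfigMap naming scheme and the
provider details shown are assumptions:

::

  ---
  # Deployer-provided ConfigMap containing the full Nodepool config.
  apiVersion: v1
  kind: ConfigMap
  metadata:
    name: nodepoolYamlConfig
  data:
    nodepool.yaml: |
      providers:
        - name: cloud-a
          region-name: region-one
          ...
        - name: cloud-b
          region-name: region-two
          ...
  ---
  # Operator-generated ConfigMap holding only the cloud-a/region-one
  # subset, mounted into the launcher pod for that provider-region.
  apiVersion: v1
  kind: ConfigMap
  metadata:
    name: nodepool-config-cloud-a-region-one
  data:
    nodepool.yaml: |
      providers:
        - name: cloud-a
          region-name: region-one
          ...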

Because the Operator needs to make decisions based on what's going on with
the ``zuul.conf``, or needs to directly manage some of it on behalf of the
deployer (such as RDBMS and ZooKeeper connection info), the ``zuul.conf``
file should be managed by and expressed in the CRD.

Connections should each have a stanza that is mostly a passthrough
representation of what would go in the corresponding section of ``zuul.conf``.

Due to the nature of secrets in Kubernetes, fields that would normally
contain either a secret string or a path to a file containing secret
information should instead take the name of a Kubernetes secret and the key
name of the data in that secret that the deployer will have previously
defined. The Operator will use this information to mount the appropriate
secrets into a utility container, construct the appropriate config files
for each service, re-upload those into Kubernetes as additional secrets,
and then mount the config secrets, along with any secrets containing file
content, only in the pods that need them.

::

  ---
  apiVersion: v1
  kind: Secret
  metadata:
    name: gerritSecrets
  type: Opaque
  data:
    sshkey: YWRtaW4=
    http_password: c2VjcmV0Cg==
  ---
  apiVersion: v1
  kind: Secret
  metadata:
    name: githubSecrets
  type: Opaque
  data:
    app_key: aRnwpen=
    webhook_token: an5PnoMrlw==
  ---
  apiVersion: v1
  kind: Secret
  metadata:
    name: pagureSecrets
  type: Opaque
  data:
    api_token: Tmf9fic=
  ---
  apiVersion: v1
  kind: Secret
  metadata:
    name: smtpSecrets
  type: Opaque
  data:
    password: orRn3V0Gwm==
  ---
  apiVersion: v1
  kind: Secret
  metadata:
    name: mqttSecrets
  type: Opaque
  data:
    password: YWQ4QTlPO2FpCg==
    ca_certs: PVdweTgzT3l5Cg==
    certfile: M21hWF95eTRXCg==
    keyfile: JnhlMElpNFVsCg==
  ---
  apiVersion: zuul-ci.org/v1alpha1
  kind: Zuul
  spec:
    merger:
      count: 5
      git_user_email: zuul@example.org
      git_user_name: Example Zuul
    executor:
      count: 5
      manage_ansible: false
    web:
      count: 1
      status_url: https://zuul.example.org
    fingergw:
      count: 1
    scheduler:
      count: 1
    connections:
      gerrit:
        driver: gerrit
        server: gerrit.example.com
        sshkey:
          # If the key name in the secret matches the connection key name,
          # it can be omitted.
          secretName: gerritSecrets
        password:
          secretName: gerritSecrets
          # If they do not match, the key must be specified.
          key: http_password
        user: zuul
        baseurl: http://gerrit.example.com:8080
        auth_type: basic
      github:
        driver: github
        app_key:
          secretName: githubSecrets
          key: app_key
        webhook_token:
          secretName: githubSecrets
          key: webhook_token
        rate_limit_logging: false
        app_id: 1234
      pagure:
        driver: pagure
        api_token:
          secretName: pagureSecrets
          key: api_token
      smtp:
        driver: smtp
        server: smtp.example.com
        port: 25
        default_from: zuul@example.com
        default_to: zuul.reports@example.com
        user: zuul
        password:
          secretName: smtpSecrets
      mqtt:
        driver: mqtt
        server: mqtt.example.com
        user: zuul
        password:
          secretName: mqttSecrets
        ca_certs:
          secretName: mqttSecrets
        certfile:
          secretName: mqttSecrets
        keyfile:
          secretName: mqttSecrets

Executor job volume
-------------------

To manage the executor job volumes, the CR also accepts a list of volumes
to be bind-mounted into the job bubblewrap contexts:

::

  name: Text
  context: <trusted | untrusted>
  access: <ro | rw>
  path: /path
  volume: Kubernetes.Volume


For example, to expose a GCP authdaemon token, the Zuul CR can be defined as

::

  apiVersion: zuul-ci.org/v1alpha1
  kind: Zuul
  spec:
    ...
    jobVolumes:
      - context: trusted
        access: ro
        path: /authdaemon/token
        volume:
          name: gcp-auth
          hostPath:
            path: /var/authdaemon/executor
            type: DirectoryOrCreate

This would result in a new executor mount path along with this
``zuul.conf`` change:

::

   trusted_ro_paths=/authdaemon/token


Logging
-------

By default, the Zuul Operator should apply no logging config, which should
result in Zuul using its default of logging at the ``INFO`` level. There
should be a simple config option to enable ``DEBUG`` logging instead. There
should also be an option to specify a named ``ConfigMap`` containing a
logging config. If a logging config ``ConfigMap`` is given, it should
override the ``DEBUG`` flag.
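
A minimal sketch of what this might look like in the CR follows. The
``debug`` and ``loggingConfig`` field names, and whether they are set
per-service or globally, are assumptions rather than settled API:

::

  apiVersion: zuul-ci.org/v1alpha1
  kind: Zuul
  spec:
    scheduler:
      count: 1
      # Assumed flag name; switches this service from INFO to DEBUG
      # logging.
      debug: true
    web:
      count: 1
      # Assumed key naming a ConfigMap that holds a full logging config;
      # if present, it overrides the debug flag for this service.
      loggingConfig: zuulWebLoggingConfig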