Taming the herd: using Zookeeper and Exhibitor on Google Container Engine

Zookeeper is the cornerstone of many distributed systems, maintaining configuration information, naming and providing distributed synchronization and group services. But using Zookeeper in a dynamic cloud environment can be challenging, because of the way it keeps track of members in the cluster. Luckily, there’s an open source package called Exhibitor that makes it simple to use Zookeeper with Google Container Engine.

With Zookeeper, it’s easy to implement the primitives for coordinating hosts, such as locking and leader election. To provide these features in a highly available manner, Zookeeper uses a clustering mechanism that requires a majority of the cluster members to acknowledge a change before the cluster’s state is committed. In a cloud environment, however, Zookeeper machines come and go from the cluster (or “ensemble” in Zookeeper terms), changing names and IP addresses, and losing state about the ensemble. These sorts of ephemeral cloud instances are not well suited to Zookeeper, which requires all hosts to know the addresses or hostnames of the other hosts in the ensemble.
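The majority rule described above comes down to simple arithmetic. As a rough sketch (the function name quorum_size is our own invention, not a Zookeeper command), the number of acknowledgements an ensemble of a given size needs before a change is committed can be computed like this:

```shell
# quorum_size is a hypothetical helper name, not part of Zookeeper.
# Prints the smallest number of members that forms a strict majority
# of an ensemble with the given size.
quorum_size() {
  local members="$1"
  echo $(( members / 2 + 1 ))
}

for n in 3 5 7; do
  echo "ensemble of $n needs $(quorum_size "$n") acknowledgements"
done
```

Because the threshold is a strict majority, a member that has lost track of its peers cannot commit changes on its own.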

To make Zookeeper easier to configure, operate, and debug in cloud environments, Netflix created Exhibitor, a supervisor process that coordinates the configuration and execution of Zookeeper processes across many hosts. Exhibitor also provides the following features for Zookeeper operators and users:
  • Backup and restore
  • A lightweight GUI for Zookeeper nodes
  • A REST API for getting the state of the Zookeeper ensemble
  • Rolling updates of configuration changes
Let’s look at using Exhibitor in Container Engine to run Zookeeper as a shared service for other applications running in a Kubernetes cluster.

To start, provision a file server to host the configuration file that is shared between your cluster hosts. The easiest way to get a file server up and running in Google Cloud Platform is to create a Single Node File Server using Google Cloud Launcher; the resulting file server exposes both NFS and SMB file shares. For horizontally scalable and highly available shared filesystem options on Cloud Platform, have a look at Red Hat Gluster Storage and Avere.
  1. Provision your file server
  2. Select the Cloud Platform project in which you’d like to launch it
  3. Choose the following parameters for the dimensions of your file server:
    • Zone: must match the zone of your Container Engine cluster (to be provisioned later)
    • Machine type: for this tutorial we chose an n1-standard-1, as throughput will not be very high for our hosted files
    • Storage disk size: for this tutorial we chose a 100 GB disk, as we need neither high throughput nor high IOPS
  4. Click Deploy
Next, create a Container Engine cluster in which to deploy your Zookeeper ensemble.
  1. Create a new Kubernetes cluster using Google Container Engine
  2. Ensure that the zone setting matches the zone you used for your file server (for the purposes of this tutorial, leave the defaults for the other settings)
  3. Click Create
Once your cluster is created, open Cloud Shell in the Cloud Console by clicking the button in the top right of the interface.
Now set up your Zookeeper ensemble in Kubernetes.
  1. Set the default zone for the gcloud CLI to the zone you chose for your file server and Kubernetes cluster:
       gcloud config set compute/zone us-east1-d
  2. Download your Kubernetes credentials via the Cloud SDK:
       gcloud container clusters get-credentials cluster-1
  3. Create a file named exhibitor.yaml with the following content. It defines the Exhibitor deployment as well as the service that other applications can use to communicate with it:
       apiVersion: extensions/v1beta1
       kind: Deployment
       metadata:
         name: exhibitor
       spec:
         replicas: 3
         template:
           metadata:
             labels:
               name: exhibitor
           spec:
             containers:
             - image: mbabineau/zookeeper-exhibitor
               imagePullPolicy: Always
               name: exhibitor
               # Shared Exhibitor configuration lives on the NFS volume
               volumeMounts:
                 - name: nfs
                   mountPath: "/opt/zookeeper/local_configs"
               # TCP check on the Exhibitor port (assumed to be the liveness probe)
               livenessProbe:
                 tcpSocket:
                   port: 8181
                 initialDelaySeconds: 60
                 timeoutSeconds: 1
               # Zookeeper "ruok" check via Exhibitor (assumed to be the readiness probe)
               readinessProbe:
                 httpGet:
                   path: /exhibitor/v1/cluster/4ltr/ruok
                   port: 8181
                 initialDelaySeconds: 120
                 timeoutSeconds: 1
               # Advertise the pod IP as this member's hostname
               env:
               - name: HOSTNAME
                 valueFrom:
                   fieldRef:
                     fieldPath: status.podIP
             volumes:
               - name: nfs
                 nfs:
                   server: singlefs-1-vm
                   path: /data
       ---
       apiVersion: v1
       kind: Service
       metadata:
         name: exhibitor
         labels:
           name: exhibitor
       spec:
         ports:
         # Zookeeper client port
         - port: 2181
           protocol: TCP
           name: zk
         # Exhibitor REST API and UI port
         - port: 8181
           protocol: TCP
           name: api
         selector:
           name: exhibitor
     In this manifest we’re configuring the NFS volume to be attached to each of the pods and mounted in the folder where Exhibitor expects to find its shared configuration.
  4. Create the artifacts in that manifest with the kubectl CLI:
       kubectl apply -f exhibitor.yaml
  5. Monitor the pods until they all enter the Running state:
       kubectl get pods -w
  6. Run the kubectl proxy command so that you can access the Exhibitor REST API for the Zookeeper state:
       kubectl proxy &
  7. Query the Exhibitor API using curl, then use jq to format the JSON response to be more human-readable:
       export PROXY_URL=http://localhost:8001/api/v1/proxy/
       export EXHIBITOR_URL=${PROXY_URL}namespaces/default/services/exhibitor:8181
       export STATUS_URL=${EXHIBITOR_URL}/exhibitor/v1/cluster/status
       export STATUS=$(curl -s "$STATUS_URL")
       echo "$STATUS" | jq '.[] | {hostname: .hostname, leader: .isLeader}'
  8. After a few minutes, the cluster state will have settled and elected a stable leader.
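Rather than eyeballing the output, you could check for a settled leader in a script. The following is a minimal sketch, not part of Exhibitor: the helper names count_leaders and has_stable_leader are hypothetical, and the text match assumes the status response spells the field "isLeader", as the jq filter in the tutorial does.

```shell
# count_leaders is a hypothetical helper, not part of Exhibitor or kubectl.
# Reads the JSON returned by the status endpoint on stdin and prints how
# many entries report "isLeader": true. It uses a plain text match instead
# of a JSON parser, so treat it as a rough check only.
count_leaders() {
  grep -oE '"isLeader"[[:space:]]*:[[:space:]]*true' | wc -l | tr -d ' '
}

# has_stable_leader succeeds only when exactly one member claims leadership.
has_stable_leader() {
  [ "$(count_leaders)" = "1" ]
}
```

With kubectl proxy running and STATUS_URL set as in the tutorial, a call such as curl -s "$STATUS_URL" | has_stable_leader && echo "leader elected" would report success once exactly one member is the leader. A jq-based count would be sturdier against formatting differences in the response.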
You have now completed the tutorial and can use your Exhibitor/Zookeeper cluster just like any other Kubernetes service, by accessing its exposed ports (2181 for Zookeeper and 8181 for Exhibitor) and addressing it via its DNS name or via environment variables.
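As an illustration of both addressing options, here is a small sketch of how a script inside another pod might build the Zookeeper connection string. The helper name zk_connect_string is hypothetical. The variable names assume the Kubernetes convention of deriving them from the service name (exhibitor) and the port name (zk) in the manifest.

```shell
# zk_connect_string is a hypothetical helper. It prints host:port for the
# Zookeeper client port of the "exhibitor" service, preferring the
# environment variables Kubernetes injects into pods and falling back to
# the service's DNS name and the port from the manifest.
zk_connect_string() {
  local host="${EXHIBITOR_SERVICE_HOST:-exhibitor}"
  local port="${EXHIBITOR_SERVICE_PORT_ZK:-2181}"
  echo "${host}:${port}"
}

zk_connect_string
```

Kubernetes only injects the environment variables into pods started after the service was created, so the DNS fallback is usually the safer choice.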

To access the Exhibitor web UI, run the kubectl proxy command on a machine with a browser, then browse to the Exhibitor service through the proxy, under the same EXHIBITOR_URL base used for the status query.

To scale up your cluster, edit the exhibitor.yaml file and change replicas to an odd number greater than 3 (e.g., 5), then run kubectl apply -f exhibitor.yaml again. To see how the ensemble responds to failures, cause havoc in the cluster by killing pods and nodes.
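The advice to pick an odd replica count follows from the majority rule: an even-sized ensemble tolerates no more failures than the odd size just below it. A short sketch of that arithmetic (tolerated_failures is a hypothetical helper name, not a Zookeeper or kubectl command):

```shell
# tolerated_failures is a hypothetical helper name.
# Prints how many members a majority-based ensemble of the given size can
# lose while the survivors still form a strict majority.
tolerated_failures() {
  local members="$1"
  echo $(( (members - 1) / 2 ))
}

for n in 3 4 5; do
  echo "$n replicas tolerate $(tolerated_failures "$n") failure(s)"
done
```

Going from 3 to 4 replicas buys no extra fault tolerance, while going to 5 lets the ensemble survive two simultaneous failures.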