Setting up elasticsearch with Operator Pattern

ElasticSearch H/A setup using ECK operator in EKS

Overview

This guide covers deploying and managing a highly available Elasticsearch setup in EKS using the Elastic Cloud on Kubernetes (ECK) Operator. We will deploy Elasticsearch version 8.8.1 using the ECK Operator.

The major source of information for this guide is the official ECK docs.

Why ECK Operator?

We will deploy and manage Elasticsearch using the Elastic Cloud on Kubernetes Operator, which is the recommended approach and is maintained by Elastic and the community. Please find more details here.

Architecture

The current setup creates an Elasticsearch cluster with 3 master nodes and 2 data nodes and exposes it via a LoadBalancer in AWS EKS. It tries to distribute the Elasticsearch nodes evenly among the existing k8s worker nodes.

Prerequisites

EKS Cluster ☸️

The setup assumes that an EKS cluster is up and running and has enough resources to provision the ES cluster. The current setup creates 3 master nodes and 2 data nodes, which require approximately 2GB memory and 1 vCPU per Elasticsearch node.
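
A quick way to sanity-check the available capacity before provisioning (a generic sketch, not part of the original setup; kubectl top requires the metrics-server add-on):

#List worker nodes with their allocatable CPU and memory
kubectl get nodes -o custom-columns=NAME:.metadata.name,CPU:.status.allocatable.cpu,MEMORY:.status.allocatable.memory

#Show current CPU/memory usage per node (requires metrics-server)
kubectl top nodes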

Install ECK CRDs ☸️

Install the custom resource definitions needed for provisioning the kind: Elasticsearch resource. This also creates the other Custom Resources supported by ECK, which can be ignored for this setup. Command

kubectl create -f https://download.elastic.co/downloads/eck/2.8.0/crds.yaml

Output

customresourcedefinition.apiextensions.k8s.io/agents.agent.k8s.elastic.co created
customresourcedefinition.apiextensions.k8s.io/apmservers.apm.k8s.elastic.co created
customresourcedefinition.apiextensions.k8s.io/beats.beat.k8s.elastic.co created
customresourcedefinition.apiextensions.k8s.io/elasticmapsservers.maps.k8s.elastic.co created
customresourcedefinition.apiextensions.k8s.io/elasticsearchautoscalers.autoscaling.k8s.elastic.co created
customresourcedefinition.apiextensions.k8s.io/elasticsearches.elasticsearch.k8s.elastic.co created
customresourcedefinition.apiextensions.k8s.io/enterprisesearches.enterprisesearch.k8s.elastic.co created
customresourcedefinition.apiextensions.k8s.io/kibanas.kibana.k8s.elastic.co created
customresourcedefinition.apiextensions.k8s.io/logstashes.logstash.k8s.elastic.co created
customresourcedefinition.apiextensions.k8s.io/stackconfigpolicies.stackconfigpolicy.k8s.elastic.co created

Install Operator with its RBAC rules ☸️

The below command installs the operator and the RBAC roles required to manage various operations. By default, it creates an elastic-system namespace and deploys the resources under it. Command

kubectl apply -f https://download.elastic.co/downloads/eck/2.8.0/operator.yaml

Output

namespace/elastic-system created
serviceaccount/elastic-operator created
secret/elastic-webhook-server-cert created
configmap/elastic-operator created
clusterrole.rbac.authorization.k8s.io/elastic-operator created
clusterrole.rbac.authorization.k8s.io/elastic-operator-view created
clusterrole.rbac.authorization.k8s.io/elastic-operator-edit created
clusterrolebinding.rbac.authorization.k8s.io/elastic-operator created
service/elastic-webhook-server created
statefulset.apps/elastic-operator created
validatingwebhookconfiguration.admissionregistration.k8s.io/elastic-webhook.k8s.elastic.co created

Verify ☑️

Verify if the operator is up and running by looking at the logs. Command

kubectl -n elastic-system logs -f statefulset.apps/elastic-operator

Output

{"log.level":"info","@timestamp":"2023-06-19T04:38:33.355Z","log.logger":"manager.eck-operator","message":"Starting EventSource","service.version":"2.8.0+3940cf4d","service.type":"eck","ecs.version":"1.4.0","controller":"beat-controller","source":"kind source: *v1.Secret"}
{"log.level":"info","@timestamp":"2023-06-19T04:38:33.355Z","log.logger":"manager.eck-operator","message":"Starting EventSource","service.version":"2.8.0+3940cf4d","service.type":"eck","ecs.version":"1.4.0","controller":"beat-controller","source":"kind source: *v1.Secret"}
{"log.level":"info","@timestamp":"2023-06-19T04:38:33.355Z","log.logger":"manager.eck-operator","message":"Starting Controller","service.version":"2.8.0+3940cf4d","service.type":"eck","ecs.version":"1.4.0","controller":"beat-controller"}
...

Storage Class (Optional) ☸️

This is an optional step, as the elasticsearch.yaml manifest is configured to use the default gp2 storage class. If you want to override this behavior, you can create a custom storage class and update its name in the elasticsearch.yaml file; a sample is sketched below.
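
For illustration, a gp3-backed storage class using the EBS CSI driver might look like the following sketch (the name es-gp3 is only an example; whatever name you choose must match storageClassName in elasticsearch.yaml):

apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: es-gp3 #example name, reference it via storageClassName
provisioner: ebs.csi.aws.com
parameters:
  type: gp3
volumeBindingMode: WaitForFirstConsumer
allowVolumeExpansion: true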

NOTE: Make sure the EBS CSI Plugin is active and the EKS Nodes have permission to manage volumes on behalf of the provisioner. Please include the following policy if missing; a sketch of attaching it via the AWS CLI follows the policy.

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "ec2:AttachVolume",
        "ec2:CreateSnapshot",
        "ec2:CreateTags",
        "ec2:CreateVolume",
        "ec2:DeleteSnapshot",
        "ec2:DeleteTags",
        "ec2:DeleteVolume",
        "ec2:DescribeAvailabilityZones",
        "ec2:DescribeInstances",
        "ec2:DescribeSnapshots",
        "ec2:DescribeTags",
        "ec2:DescribeVolumes",
        "ec2:DescribeVolumesModifications",
        "ec2:DetachVolume",
        "ec2:ModifyVolume"
      ],
      "Resource": "*"
    }
  ]
}
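
One way to attach this as an inline policy to the node instance role via the AWS CLI (eks-node-role and ebs-policy.json are placeholders for your role name and the JSON above saved to a file):

aws iam put-role-policy \
  --role-name eks-node-role \
  --policy-name EBSVolumeManagement \
  --policy-document file://ebs-policy.json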

Verify ☑️

Verify if the EBS CSI controller is actively running. Command

kubectl get pods -n kube-system -lapp=ebs-csi-controller

Output

NAME                                 READY   STATUS    RESTARTS   AGE
ebs-csi-controller-6876d9b86-d88kq   6/6     Running   0          5m49s
ebs-csi-controller-6876d9b86-t47wn   6/6     Running   0          5m49s

Deploy Elasticsearch ☸️

Now that our prerequisites are met, we can deploy Elasticsearch in the cluster using the elasticsearch.yaml template below.

apiVersion: elasticsearch.k8s.elastic.co/v1
kind: Elasticsearch
metadata:
  name: esearch
spec:
  version: 8.8.1
  http:
    service:
      spec:
        type: LoadBalancer
    tls:
      selfSignedCertificate:
        subjectAltNames:
        - dns: localhost
        - dns: es.esearch.app
  #link: https://www.elastic.co/guide/en/cloud-on-k8s/current/k8s-update-strategy.html
  updateStrategy:
    changeBudget:
      maxSurge: 1
      maxUnavailable: 1
  #link: https://www.elastic.co/guide/en/cloud-on-k8s/current/k8s-pod-disruption-budget.html
  podDisruptionBudget:
    spec:
      minAvailable: 4
      selector:
        matchLabels:
          elasticsearch.k8s.elastic.co/cluster-name: esearch
  #Behind the scenes, ECK translates each NodeSet specified in the Elasticsearch resource into a StatefulSet in Kubernetes
  nodeSets:
  - name: masters
    count: 3
    config:
      node.roles: ["master"]
      # node.store.allow_mmap: false
      xpack.ml.enabled: true
    volumeClaimTemplates:
    - metadata:
        name: elasticsearch-data # Do not change this name unless you set up a volume mount for the data path.
      spec:
        accessModes:
        - ReadWriteOnce
        resources:
          requests:
            storage: 5Gi
        storageClassName: gp2
    podTemplate:
      metadata:
        labels:
          app: elasticsearch
      spec:
        terminationGracePeriodSeconds: 150
      #link: https://www.elastic.co/guide/en/cloud-on-k8s/current/k8s-advanced-node-scheduling.html#k8s-affinity-options
        affinity:
          podAntiAffinity:
            preferredDuringSchedulingIgnoredDuringExecution:
            - weight: 100
              podAffinityTerm:
                labelSelector:
                  matchLabels:
                    elasticsearch.k8s.elastic.co/cluster-name: esearch
                topologyKey: kubernetes.io/hostname
        #https://www.elastic.co/guide/en/cloud-on-k8s/current/k8s-virtual-memory.html#k8s_using_an_init_container_to_set_virtual_memory
        initContainers:
        - name: sysctl
          securityContext:
            privileged: true
            runAsUser: 0
          command: ['sh', '-c', 'sysctl -w vm.max_map_count=262144']
        containers:
        - name: elasticsearch
          resources:
            limits:
              memory: 3Gi
            requests:
              memory: 2Gi
          env:
          - name: ES_JAVA_OPTS
            value: "-Xms1g -Xmx1g"
  - name: data
    count: 2
    config:
      node.roles: ["data"]
      # node.store.allow_mmap: false
      xpack.ml.enabled: true
    volumeClaimTemplates:
    - metadata:
        name: elasticsearch-data # Do not change this name unless you set up a volume mount for the data path.
      spec:
        accessModes:
        - ReadWriteOnce
        resources:
          requests:
            storage: 10Gi
        storageClassName: gp2
    podTemplate:
      metadata:
        labels:
          app: elasticsearch
      spec:
        terminationGracePeriodSeconds: 150
        affinity:
          podAntiAffinity:
            preferredDuringSchedulingIgnoredDuringExecution:
            - weight: 100
              podAffinityTerm:
                labelSelector:
                  matchLabels:
                    elasticsearch.k8s.elastic.co/cluster-name: esearch
                topologyKey: kubernetes.io/hostname
        initContainers:
        - name: sysctl
          securityContext:
            privileged: true
            runAsUser: 0
          command: ['sh', '-c', 'sysctl -w vm.max_map_count=262144']
        containers:
        - name: elasticsearch
          resources:
            limits:
              memory: 3Gi
            requests:
              memory: 2Gi
          env:
          - name: ES_JAVA_OPTS
            value: "-Xms1g -Xmx1g"

Apply ⤵️

kubectl apply -f elasticsearch.yaml

Verify ☑️

Verify by listing the Elasticsearch resources

Initial Status

#When you create the cluster, the HEALTH status is unknown and the PHASE is ApplyingChanges. After a while, the PHASE turns into Ready and HEALTH becomes green.
➜ kubectl get es  
NAME     HEALTH    NODES   VERSION   PHASE             AGE
esearch   unknown           8.8.1     ApplyingChanges   22s

➜ kubectl get pods
NAME                  READY   STATUS     RESTARTS   AGE
esearch-es-data-0      0/1     Init:0/3   0          23s
esearch-es-data-1      0/1     Init:0/3   0          23s
esearch-es-masters-0   0/1     Init:0/3   0          23s
esearch-es-masters-1   0/1     Init:0/3   0          23s
esearch-es-masters-2   0/1     Init:0/3   0          23s

➜ kubectl get sts 
NAME                READY   AGE
esearch-es-data      0/2     28s
esearch-es-masters   0/3     29s

Eventual Status

➜ kubectl get es
NAME     HEALTH   NODES   VERSION   PHASE   AGE
esearch   green    5       8.8.1     Ready   3m5s

➜ kubectl get pods
NAME                  READY   STATUS    RESTARTS   AGE
esearch-es-data-0      1/1     Running   0          3m10s
esearch-es-data-1      1/1     Running   0          3m10s
esearch-es-masters-0   1/1     Running   0          3m10s
esearch-es-masters-1   1/1     Running   0          3m10s
esearch-es-masters-2   1/1     Running   0          3m10s

➜ kubectl get sts
NAME                READY   AGE
esearch-es-data      2/2     3m16s
esearch-es-masters   3/3     3m17s

Accessing 🌍

To access the cluster properly, you will need the ES endpoint, the elastic user password, and the CA certificate. Please use the below commands to get this information:

#Set Elasticsearch name
export ES_NAME=esearch

#Get Service Public endpoint
ENDPOINT=$(kubectl get svc ${ES_NAME}-es-http -o jsonpath='{.status.loadBalancer.ingress[0].hostname}')

#Get ElasticSearch User Password
PASSWORD=$(kubectl get secret ${ES_NAME}-es-elastic-user -o go-template='{{.data.elastic | base64decode}}')

#Get ES CA Certificate
kubectl get secret ${ES_NAME}-es-http-certs-public -o go-template='{{index .data "tls.crt" | base64decode }}' > ca.cert

Verify ☑️

Once we have the endpoint, certificate and password, we can now connect to the Elasticsearch cluster using HTTPS. Please use the below verification command to see if everything is set properly. Command

curl -X GET --cacert ca.cert -u "elastic:${PASSWORD}" "https://${ENDPOINT}:9200"

Output

{
  "name" : "esearch-es-masters-2",
  "cluster_name" : "esearch",
  "cluster_uuid" : "jygZP_KOS_6ELM5WjSe9cQ",
  "version" : {
    "number" : "8.8.1",
    "build_flavor" : "default",
    "build_type" : "docker",
    "build_hash" : "f8edfccba429b6477927a7c1ce1bc6729521305e",
    "build_date" : "2023-06-05T21:32:25.188464208Z",
    "build_snapshot" : false,
    "lucene_version" : "9.6.0",
    "minimum_wire_compatibility_version" : "7.17.0",
    "minimum_index_compatibility_version" : "7.0.0"
  },
  "tagline" : "You Know, for Search"
}
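
As an optional extra check, the cluster health API should report a green status and 5 nodes, reusing the same ENDPOINT and PASSWORD variables:

curl -X GET --cacert ca.cert -u "elastic:${PASSWORD}" "https://${ENDPOINT}:9200/_cluster/health?pretty"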

Cleanup ❌

If we need to revert all the changes, we can do so in the following order.

kubectl delete -f elasticsearch.yaml
kubectl delete -f https://download.elastic.co/downloads/eck/2.8.0/operator.yaml
kubectl delete -f https://download.elastic.co/downloads/eck/2.8.0/crds.yaml

Hurray 🎉 This concludes the setup of Elasticsearch using the ECK operator. For further details and customization, please follow the details section below.

Details on elasticsearch manifest

We made multiple choices and configuration settings when defining the elasticsearch.yaml manifest. Please find details about them in the sections below:

1. Update/Upgrade Resiliency

While running any kind of update/upgrade, we want to make sure we are neither under nor over capacity. maxSurge specifies the maximum number of nodes that can be created in addition to our existing pool of ES nodes, which prevents excessive resource utilization during upgrades. maxUnavailable specifies the maximum number of nodes that can be unavailable at a time.

...
updateStrategy:
    changeBudget:
      maxSurge: 1 #creates only one new node at a time.
      maxUnavailable: 1 #using 1 because we only have 2 data nodes. Can be increased if we have more data/master nodes. Low value increases stability but increases deployment time i.e we need to find the right balance.
...

2. Pod Disruption Budget

During upgrades, maintenance, or node failures, we still want to maintain a pool of nodes such that the ES cluster continues to serve traffic. We are using a podDisruptionBudget where minAvailable is set to 4 and the selector matches the ES nodes.

...
podDisruptionBudget:
  spec:
    minAvailable: 4 #using 4 as we have 3 master nodes and 2 data nodes.
    selector:
      matchLabels:
        elasticsearch.k8s.elastic.co/cluster-name: "{{.metadata.name}}"
...

3. Uniform ES nodes distribution

The config tries to ensure the ES nodes are distributed uniformly across the worker nodes for better availability and resiliency, using topologyKey: kubernetes.io/hostname.

...
affinity:
  podAntiAffinity:
    preferredDuringSchedulingIgnoredDuringExecution: #using preferred instead of required as there always won't be unique k8s node per es node and using required might make es nodes unschedulable
    - weight: 100 #give high priority for this rule
      podAffinityTerm:
        labelSelector:
          matchLabels:
            elasticsearch.k8s.elastic.co/cluster-name: "{{.metadata.name}}"
        topologyKey: kubernetes.io/hostname #this ensures distribution of es nodes based on the hostname
...

4. ES Node Roles/Counts

For this scenario, the configuration specifies 3 master nodes and 2 data nodes (the minimum for H/A). For the sake of simplicity, other roles are ignored. You can verify the assigned roles once the cluster is up, as shown below.
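
To see which roles each node has actually picked up (m = master, d = data), you can query the _cat/nodes API, reusing the ENDPOINT and PASSWORD variables from the Accessing section:

curl -X GET --cacert ca.cert -u "elastic:${PASSWORD}" "https://${ENDPOINT}:9200/_cat/nodes?v&h=name,node.role,master"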

5. Memory mapping configuration

Default values for virtual address space on Linux distributions are too low for Elasticsearch to work properly, which may result in out-of-memory exceptions. For production workloads, it is strongly recommended to increase the kernel setting vm.max_map_count to 262144.
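
In the manifest above this is handled with a privileged init container on each nodeSet; the relevant snippet is:

...
initContainers:
- name: sysctl
  securityContext:
    privileged: true
    runAsUser: 0
  command: ['sh', '-c', 'sysctl -w vm.max_map_count=262144']
...

Alternatively, setting node.store.allow_mmap: false (left commented out in the manifest) avoids the need to change this kernel setting, at a potential performance cost.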

6. Storage Class

The storage class used in the volumeClaimTemplates is storageClassName: gp2, the default storage class created when the EKS cluster is created. It can be changed if you have different requirements for different roles (see the Storage Class prerequisite above).

7. JVM flags

JVM flags can be overridden using environment variables, and this can largely impact the overall performance of the ES cluster. Ideally it should differ per role; for now the basic configuration sets both the initial and maximum JVM heap to 1GB.

...
env:
  - name: ES_JAVA_OPTS
    value: "-Xms1g -Xmx1g"
...

8. Termination Period

The termination grace period for a pod is used to give the ES node enough time to exit gracefully during upgrades and maintenance tasks. The current scenario specifies 150 seconds, which should be enough, but this can differ greatly based on the node type and the size of traffic/data.

...
spec:
  terminationGracePeriodSeconds: 150
...

Further configuration and details 📖

Many more settings can be configured to tune the cluster setup to our needs. Please find more details in this API reference.