Setting up Elasticsearch with the Operator Pattern
Elasticsearch H/A setup using the ECK operator in EKS
Overview
This guide provides the information for deploying and managing a highly available Elasticsearch setup in EKS using the Elastic Cloud on Kubernetes (ECK) Operator. In it, we will see how to deploy Elasticsearch version 8.8.1 using the ECK Operator.
The major source of information for this guide is the official ECK docs.
Why ECK Operator?
We will be deploying and managing Elasticsearch using the Elastic Cloud on Kubernetes Operator, which is the currently recommended approach and is maintained by the Elastic team and community. Please find more details here.
Architecture
The current setup creates an Elasticsearch cluster with 3 master nodes and 2 data nodes and makes them accessible via a LoadBalancer in AWS EKS. It tries to distribute the Elasticsearch nodes evenly among the existing k8s worker nodes.
Prerequisites
EKS Cluster ☸️
The setup assumes that an EKS cluster is up and running and has enough resources to provision the ES cluster. The current setup creates 3 master nodes and 2 data nodes, which require approximately 2GB of memory and 1 vCPU per Elasticsearch node.
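If you want a quick sanity check of the available capacity before provisioning, the commands below list the worker nodes and (assuming the metrics-server addon is installed) their current usage. Command
kubectl get nodes -o wide
kubectl top nodes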
Install ECK CRDs ☸️
Install the custom resource definitions needed to provision the kind: Elasticsearch resource. This also creates the other custom resources supported by ECK, which can be ignored for this setup. Command
kubectl create -f https://download.elastic.co/downloads/eck/2.8.0/crds.yaml
Output
customresourcedefinition.apiextensions.k8s.io/agents.agent.k8s.elastic.co created
customresourcedefinition.apiextensions.k8s.io/apmservers.apm.k8s.elastic.co created
customresourcedefinition.apiextensions.k8s.io/beats.beat.k8s.elastic.co created
customresourcedefinition.apiextensions.k8s.io/elasticmapsservers.maps.k8s.elastic.co created
customresourcedefinition.apiextensions.k8s.io/elasticsearchautoscalers.autoscaling.k8s.elastic.co created
customresourcedefinition.apiextensions.k8s.io/elasticsearches.elasticsearch.k8s.elastic.co created
customresourcedefinition.apiextensions.k8s.io/enterprisesearches.enterprisesearch.k8s.elastic.co created
customresourcedefinition.apiextensions.k8s.io/kibanas.kibana.k8s.elastic.co created
customresourcedefinition.apiextensions.k8s.io/logstashes.logstash.k8s.elastic.co created
customresourcedefinition.apiextensions.k8s.io/stackconfigpolicies.stackconfigpolicy.k8s.elastic.co created
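Optionally, verify that the Elastic CRDs are now registered in the cluster. Command
kubectl get crd | grep k8s.elastic.co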
Install Operator with its RBAC rules ☸️
The below command installs the operator and the RBAC rules required to manage various operations. By default, it creates an elastic-system namespace and deploys its resources under it. Command
kubectl apply -f https://download.elastic.co/downloads/eck/2.8.0/operator.yaml
Output
namespace/elastic-system created
serviceaccount/elastic-operator created
secret/elastic-webhook-server-cert created
configmap/elastic-operator created
clusterrole.rbac.authorization.k8s.io/elastic-operator created
clusterrole.rbac.authorization.k8s.io/elastic-operator-view created
clusterrole.rbac.authorization.k8s.io/elastic-operator-edit created
clusterrolebinding.rbac.authorization.k8s.io/elastic-operator created
service/elastic-webhook-server created
statefulset.apps/elastic-operator created
validatingwebhookconfiguration.admissionregistration.k8s.io/elastic-webhook.k8s.elastic.co created
Verify ☑️
Verify that the operator is up and running by looking at the logs. Command
kubectl -n elastic-system logs -f statefulset.apps/elastic-operator
Output
{"log.level":"info","@timestamp":"2023-06-19T04:38:33.355Z","log.logger":"manager.eck-operator","message":"Starting EventSource","service.version":"2.8.0+3940cf4d","service.type":"eck","ecs.version":"1.4.0","controller":"beat-controller","source":"kind source: *v1.Secret"}
{"log.level":"info","@timestamp":"2023-06-19T04:38:33.355Z","log.logger":"manager.eck-operator","message":"Starting EventSource","service.version":"2.8.0+3940cf4d","service.type":"eck","ecs.version":"1.4.0","controller":"beat-controller","source":"kind source: *v1.Secret"}
{"log.level":"info","@timestamp":"2023-06-19T04:38:33.355Z","log.logger":"manager.eck-operator","message":"Starting Controller","service.version":"2.8.0+3940cf4d","service.type":"eck","ecs.version":"1.4.0","controller":"beat-controller"}
...
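Besides the logs, a quick check that the operator pod itself is running can be done with the command below; it should show the elastic-operator-0 pod in the Running state. Command
kubectl get pods -n elastic-system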
Storage Class (Optional) ☸️
This is an optional step, as the elasticsearch.yaml manifest is configured to use the default gp2 storage class. If you want to override this behavior, you can create your own custom storage class and update its name in the elasticsearch.yaml file.
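For example, a custom storage class backed by gp3 volumes and provisioned by the EBS CSI driver could look like the sketch below; the name es-gp3 is illustrative and would need to be referenced via storageClassName in elasticsearch.yaml.
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: es-gp3 # illustrative name
provisioner: ebs.csi.aws.com
volumeBindingMode: WaitForFirstConsumer
parameters:
  type: gp3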
NOTE: Make sure the EBS CSI Plugin is active and the EKS nodes have permission to manage volumes on behalf of the provisioner. Please include the following policy if it is missing.
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "ec2:AttachVolume",
        "ec2:CreateSnapshot",
        "ec2:CreateTags",
        "ec2:CreateVolume",
        "ec2:DeleteSnapshot",
        "ec2:DeleteTags",
        "ec2:DeleteVolume",
        "ec2:DescribeAvailabilityZones",
        "ec2:DescribeInstances",
        "ec2:DescribeSnapshots",
        "ec2:DescribeTags",
        "ec2:DescribeVolumes",
        "ec2:DescribeVolumesModifications",
        "ec2:DetachVolume",
        "ec2:ModifyVolume"
      ],
      "Resource": "*"
    }
  ]
}
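One way to attach the above as an inline policy is shown below, assuming the JSON is saved as ebs-csi-policy.json and <your-eks-node-role> is replaced with your cluster's node instance role; clusters using IRSA may instead attach the AWS-managed AmazonEBSCSIDriverPolicy to the driver's service account role. Command
aws iam put-role-policy \
  --role-name <your-eks-node-role> \
  --policy-name EKSNodeEBSCSIPolicy \
  --policy-document file://ebs-csi-policy.json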
Verify ☑️
Verify that the EBS CSI controller is actively running. Command
kubectl get pods -n kube-system -lapp=ebs-csi-controller
Output
NAME                                 READY   STATUS    RESTARTS   AGE
ebs-csi-controller-6876d9b86-d88kq   6/6     Running   0          5m49s
ebs-csi-controller-6876d9b86-t47wn   6/6     Running   0          5m49s
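Since the manifest below relies on the gp2 storage class, it is also worth confirming that it exists in the cluster and is marked as the default. Command
kubectl get storageclass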
Deploy Elasticsearch ☸️
Now that our prerequisites are met, we can deploy Elasticsearch in the cluster using the elasticsearch.yaml manifest below.
apiVersion: elasticsearch.k8s.elastic.co/v1
kind: Elasticsearch
metadata:
  name: esearch
spec:
  version: 8.8.1
  http:
    service:
      spec:
        type: LoadBalancer
    tls:
      selfSignedCertificate:
        subjectAltNames:
          - dns: localhost
          - dns: es.esearch.app
  # link: https://www.elastic.co/guide/en/cloud-on-k8s/current/k8s-update-strategy.html
  updateStrategy:
    changeBudget:
      maxSurge: 1
      maxUnavailable: 1
  # link: https://www.elastic.co/guide/en/cloud-on-k8s/current/k8s-pod-disruption-budget.html
  podDisruptionBudget:
    spec:
      minAvailable: 4
      selector:
        matchLabels:
          elasticsearch.k8s.elastic.co/cluster-name: esearch
  # Behind the scenes, ECK translates each NodeSet specified in the Elasticsearch resource into a StatefulSet in Kubernetes
  nodeSets:
    - name: masters
      count: 3
      config:
        node.roles: ["master"]
        # node.store.allow_mmap: false
        xpack.ml.enabled: true
      volumeClaimTemplates:
        - metadata:
            name: elasticsearch-data # Do not change this name unless you set up a volume mount for the data path.
          spec:
            accessModes:
              - ReadWriteOnce
            resources:
              requests:
                storage: 5Gi
            storageClassName: gp2
      podTemplate:
        metadata:
          labels:
            app: elasticsearch
        spec:
          terminationGracePeriodSeconds: 150
          # link: https://www.elastic.co/guide/en/cloud-on-k8s/current/k8s-advanced-node-scheduling.html#k8s-affinity-options
          affinity:
            podAntiAffinity:
              preferredDuringSchedulingIgnoredDuringExecution:
                - weight: 100
                  podAffinityTerm:
                    labelSelector:
                      matchLabels:
                        elasticsearch.k8s.elastic.co/cluster-name: esearch
                    topologyKey: kubernetes.io/hostname
          # https://www.elastic.co/guide/en/cloud-on-k8s/current/k8s-virtual-memory.html#k8s_using_an_init_container_to_set_virtual_memory
          initContainers:
            - name: sysctl
              securityContext:
                privileged: true
                runAsUser: 0
              command: ['sh', '-c', 'sysctl -w vm.max_map_count=262144']
          containers:
            - name: elasticsearch
              resources:
                limits:
                  memory: 3Gi
                requests:
                  memory: 2Gi
              env:
                - name: ES_JAVA_OPTS
                  value: "-Xms1g -Xmx1g"
    - name: data
      count: 2
      config:
        node.roles: ["data"]
        # node.store.allow_mmap: false
        xpack.ml.enabled: true
      volumeClaimTemplates:
        - metadata:
            name: elasticsearch-data # Do not change this name unless you set up a volume mount for the data path.
          spec:
            accessModes:
              - ReadWriteOnce
            resources:
              requests:
                storage: 10Gi
            storageClassName: gp2
      podTemplate:
        metadata:
          labels:
            app: elasticsearch
        spec:
          terminationGracePeriodSeconds: 150
          affinity:
            podAntiAffinity:
              preferredDuringSchedulingIgnoredDuringExecution:
                - weight: 100
                  podAffinityTerm:
                    labelSelector:
                      matchLabels:
                        elasticsearch.k8s.elastic.co/cluster-name: esearch
                    topologyKey: kubernetes.io/hostname
          initContainers:
            - name: sysctl
              securityContext:
                privileged: true
                runAsUser: 0
              command: ['sh', '-c', 'sysctl -w vm.max_map_count=262144']
          containers:
            - name: elasticsearch
              resources:
                limits:
                  memory: 3Gi
                requests:
                  memory: 2Gi
              env:
                - name: ES_JAVA_OPTS
                  value: "-Xms1g -Xmx1g"
Apply ⤵️
kubectl apply -f elasticsearch.yaml
Verify ☑️
Verify by listing the Elasticsearch resources
Initial Status
#When you create the cluster, there is unknown HEALTH status and the PHASE is ApplyingChanges. After a while, the PHASE turns into Ready, and HEALTH becomes green.
➜ kubectl get es
NAME      HEALTH    NODES   VERSION   PHASE             AGE
esearch   unknown           8.8.1     ApplyingChanges   22s
➜ kubectl get pods
NAME                   READY   STATUS     RESTARTS   AGE
esearch-es-data-0      0/1     Init:0/3   0          23s
esearch-es-data-1      0/1     Init:0/3   0          23s
esearch-es-masters-0   0/1     Init:0/3   0          23s
esearch-es-masters-1   0/1     Init:0/3   0          23s
esearch-es-masters-2   0/1     Init:0/3   0          23s
➜ kubectl get sts
NAME                 READY   AGE
esearch-es-data      0/2     28s
esearch-es-masters   0/3     29s
Eventual Status
➜ kubectl get es
NAME      HEALTH   NODES   VERSION   PHASE   AGE
esearch   green    5       8.8.1     Ready   3m5s
➜ kubectl get pods
NAME                   READY   STATUS    RESTARTS   AGE
esearch-es-data-0      1/1     Running   0          3m10s
esearch-es-data-1      1/1     Running   0          3m10s
esearch-es-masters-0   1/1     Running   0          3m10s
esearch-es-masters-1   1/1     Running   0          3m10s
esearch-es-masters-2   1/1     Running   0          3m10s
➜ kubectl get sts
NAME                 READY   AGE
esearch-es-data      2/2     3m16s
esearch-es-masters   3/3     3m17s
Accessing 🌍
To access the cluster properly you will need the ES endpoint, the ES password, and the CA certificate. Please use the below commands to get this information:
#Set Elasticsearch name
export ES_NAME=esearch
#Get Service Public endpoint
ENDPOINT=$(kubectl get svc ${ES_NAME}-es-http -o jsonpath='{.status.loadBalancer.ingress[0].hostname}')
#Get ElasticSearch User Password
PASSWORD=$(kubectl get secret ${ES_NAME}-es-elastic-user -o go-template='{{.data.elastic | base64decode}}')
#Get ES CA Certificate
kubectl get secret ${ES_NAME}-es-http-certs-public -o go-template='{{index .data "tls.crt" | base64decode }}' > ca.cert
Verify ☑️
Once we have the endpoint, certificate, and password we can now connect to the Elasticsearch cluster over https. Please use the below verification command to see if everything is set up properly. Command
curl -X GET --cacert ca.cert -u "elastic:${PASSWORD}" "https://${ENDPOINT}:9200"
Output
{
  "name" : "esearch-es-masters-2",
  "cluster_name" : "esearch",
  "cluster_uuid" : "jygZP_KOS_6ELM5WjSe9cQ",
  "version" : {
    "number" : "8.8.1",
    "build_flavor" : "default",
    "build_type" : "docker",
    "build_hash" : "f8edfccba429b6477927a7c1ce1bc6729521305e",
    "build_date" : "2023-06-05T21:32:25.188464208Z",
    "build_snapshot" : false,
    "lucene_version" : "9.6.0",
    "minimum_wire_compatibility_version" : "7.17.0",
    "minimum_index_compatibility_version" : "7.0.0"
  },
  "tagline" : "You Know, for Search"
}
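If the LoadBalancer endpoint is not reachable from your machine, you can alternatively port-forward the HTTP service and hit it via localhost, which is included in the certificate's subjectAltNames in the manifest. A sketch:
kubectl port-forward service/${ES_NAME}-es-http 9200 &
curl -X GET --cacert ca.cert -u "elastic:${PASSWORD}" "https://localhost:9200"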
Cleanup ❌
If we need to revert all the changes, we can do so in the following order.
kubectl delete -f elasticsearch.yaml
kubectl delete -f https://download.elastic.co/downloads/eck/2.8.0/operator.yaml
kubectl delete -f https://download.elastic.co/downloads/eck/2.8.0/crds.yaml
Hurray 🎉 This concludes the setup of Elasticsearch using the ECK operator. For further details and customization, please see the details section below.
Details on elasticsearch manifest
There are multiple choices and configuration settings that we made when defining the elasticsearch.yaml manifest. Please find details about them in the sections below:
1. Update/Upgrade Resiliency
While running any kind of update or upgrade we want to make sure we are neither under capacity nor over capacity. maxSurge specifies the maximum number of additional pods that can be created above the target number of ES nodes, which prevents excessive resource utilization during upgrades. maxUnavailable specifies the maximum number of pods that can be unavailable at a time.
...
updateStrategy:
  changeBudget:
    maxSurge: 1 # creates only one new node at a time.
    maxUnavailable: 1 # using 1 because we only have 2 data nodes. Can be increased if we have more data/master nodes. A low value increases stability but increases deployment time, i.e. we need to find the right balance.
...
2. Pod Disruption Budget
During upgrades, maintenance, and node failures we still want to maintain a pool of nodes such that the ES cluster continues to serve traffic. We are using a podDisruptionBudget where minAvailable is set to 4 and the selector matches the ES nodes.
...
podDisruptionBudget:
  spec:
    minAvailable: 4 # using 4 as we have 3 master nodes and 2 data nodes.
    selector:
      matchLabels:
        elasticsearch.k8s.elastic.co/cluster-name: "{{.metadata.name}}"
...
3. Uniform ES nodes distribution
The config tries to ensure the ES nodes are distributed uniformly across the worker nodes for better availability and resiliency, using topologyKey: kubernetes.io/hostname.
...
affinity:
  podAntiAffinity:
    preferredDuringSchedulingIgnoredDuringExecution: # using preferred instead of required, as there won't always be a unique k8s node per ES node and using required might make ES nodes unschedulable
      - weight: 100 # give high priority to this rule
        podAffinityTerm:
          labelSelector:
            matchLabels:
              elasticsearch.k8s.elastic.co/cluster-name: "{{.metadata.name}}"
          topologyKey: kubernetes.io/hostname # this ensures distribution of ES nodes based on the hostname
...
4. ES Node Roles/Counts
For this scenario the configuration specifies 3 master nodes (for minimal H/A) and 2 data nodes (for minimal H/A). For the sake of simplicity, other roles are ignored; an illustrative example of adding another role is sketched below.
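As an illustration only (not part of this setup), a dedicated nodeSet for another role could be added alongside the existing ones, for example ingest nodes:
...
nodeSets:
  # ...existing masters and data nodeSets...
  - name: ingest
    count: 2
    config:
      node.roles: ["ingest"]
...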
5. Memory mapping configuration
Default values for the virtual address space on Linux distributions are too low for Elasticsearch to work properly, which may result in out-of-memory exceptions. For production workloads, it is strongly recommended to increase the kernel setting vm.max_map_count to 262144, which is what the sysctl init container in the manifest does; an alternative is sketched below.
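If running a privileged init container is not an option, ECK also documents disabling memory mapping instead, at the cost of performance; this is generally not recommended for production. It corresponds to the commented-out line in the manifest:
...
config:
  node.store.allow_mmap: false
...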
6. Storage Class
The storage class used in the volumeClaimTemplates is storageClassName: gp2, the default storage class created with the EKS cluster. It can be changed if you have different requirements for different roles, as sketched below.
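For example, only the data nodeSet could be pointed at a custom class (using the illustrative es-gp3 class from the optional storage class step above):
...
volumeClaimTemplates:
  - metadata:
      name: elasticsearch-data
    spec:
      accessModes:
        - ReadWriteOnce
      resources:
        requests:
          storage: 10Gi
      storageClassName: es-gp3 # illustrative custom class instead of gp2
...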
7. JVM flags
JVM flags can be overridden via environment variables, and they can largely impact the overall performance of the ES cluster. Ideally they should differ per role; for now the basic configuration sets both the initial and the maximum JVM heap to 1GB.
...
env:
  - name: ES_JAVA_OPTS
    value: "-Xms1g -Xmx1g"
...
8. Termination Period
The termination grace period for a pod is used to give the ES node enough time to exit gracefully during upgrades and maintenance tasks. The current scenario specifies 150 seconds, which should be enough, but the right value can differ greatly based on the node type and the amount of traffic/data.
...
spec:
  terminationGracePeriodSeconds: 150
...
Further configuration and details 📖
Many more settings can be configured to tune the cluster setup to our needs. Please find more details in this API reference.