Implementing Chaos Engineering on Redis with LitmusChaos: Simulating Leader Pod Failures

Introduction

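Stateful services such as Redis are hard to keep resilient in a distributed system, because a failed node takes both data and a cluster role with it. Chaos engineering exposes these weaknesses proactively by injecting failures on purpose. This tutorial uses LitmusChaos to delete a Redis leader pod on Kubernetes and observe how the cluster recovers.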

Prerequisites

Before diving into the chaos experiments, ensure you have the following:

Kubernetes Cluster: A running Kubernetes cluster (v1.20 or later).
kubectl: Command-line tool configured to interact with your cluster.
Helm: Package manager for Kubernetes applications.
LitmusChaos: Installed in your cluster. If not already installed, follow the LitmusChaos installation guide or Step 2 below.
Redis Cluster: Deployed in your Kubernetes environment. Step 1 covers a Helm-based deployment if you don't have one yet.

Understanding the Kubernetes Architecture

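Before we start, it helps to understand how Kubernetes manages the Redis workload. Redis on Kubernetes typically runs as a StatefulSet, which gives each pod a stable name (redis-cluster-0, redis-cluster-1, and so on) and its own persistent volume. When a pod is deleted, the StatefulSet controller recreates it under the same name and reattaches the same volume, which is the recovery path this tutorial exercises.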
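Because the chaos experiment later targets a pod by name, it helps to see how a StatefulSet names its pods: the StatefulSet name followed by a zero-based ordinal. The helper below is a small illustrative sketch; the function name statefulset_pod_names is made up for this example and is not part of kubectl.

```shell
# Print the pod names owned by a StatefulSet: <name>-0 ... <name>-(replicas-1).
# statefulset_pod_names is a hypothetical helper, not a kubectl feature.
statefulset_pod_names() (
  name=$1
  replicas=$2
  i=0
  while [ "$i" -lt "$replicas" ]; do
    echo "${name}-${i}"
    i=$((i + 1))
  done
)

# Usage: statefulset_pod_names redis-cluster 3
```

A deleted pod is recreated under the same name, which is why a fixed target such as redis-cluster-0 stays valid across chaos runs.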

Step 1: Deploying a Redis Cluster

For this tutorial, we'll deploy a Redis cluster using the Bitnami Helm chart.

1. Add the Bitnami Repository

helm repo add bitnami https://charts.bitnami.com/bitnami

2. Install Redis Cluster

helm install redis-cluster bitnami/redis-cluster

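This command deploys a Redis cluster with the chart's default configuration. With the defaults, the chart creates six pods: three leaders (masters, in Redis terminology), each with one replica.

3. Verify the Redis Cluster Deployment

kubectl get pods

Expected output (pod ages will differ):

NAME                  READY   STATUS    RESTARTS   AGE
redis-cluster-0       1/1     Running   0          2m
redis-cluster-1       1/1     Running   0          2m
redis-cluster-2       1/1     Running   0          2m
redis-cluster-3       1/1     Running   0          2m
redis-cluster-4       1/1     Running   0          2m
redis-cluster-5       1/1     Running   0          2m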

Step 2: Installing LitmusChaos

If LitmusChaos isn't already installed in your cluster, proceed with the following steps:

1. Add Litmus Helm Repository

helm repo add litmuschaos https://litmuschaos.github.io/litmus-helm/

2. Install LitmusChaos

helm install litmus litmuschaos/litmus --namespace litmus --create-namespace

This installs the LitmusChaos components in the litmus namespace.
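Before defining experiments, it can be worth confirming that the custom resource definitions used later in this tutorial (ChaosExperiment, ChaosEngine, ChaosResult) are registered in the cluster. The sketch below assumes the usual CRD names under the litmuschaos.io group; missing_litmus_crds is a hypothetical helper name.

```shell
# Print every Litmus CRD that is NOT registered in the cluster.
# Assumes the CRD names <plural>.litmuschaos.io; prints nothing when all exist.
missing_litmus_crds() (
  for crd in chaosengines chaosexperiments chaosresults; do
    if ! kubectl get crd "${crd}.litmuschaos.io" >/dev/null 2>&1; then
      echo "${crd}.litmuschaos.io"
    fi
  done
)

# Usage: missing_litmus_crds
```

An empty result means all three are present. If any are listed, the chart version you installed may not bundle the chaos operator, and the LitmusChaos installation guide is the place to check.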

Step 3: Understanding Chaos Experiments

LitmusChaos ships a set of predefined experiments. To simulate a leader pod failure in Redis, we'll use the pod-delete experiment, which deletes the targeted pod and leaves its controller to recreate it.

The experiment deletes a Redis leader pod so that we can observe how the cluster recovers.
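Which pod currently holds a leader role is worth confirming before deleting anything, since roles can move after a failover. A minimal sketch, assuming the Bitnami chart's defaults: the password is stored in a secret named redis-cluster under the key redis-password, and redis-cli is available inside the pod. pod_role is a hypothetical helper name.

```shell
# Print the Redis role ("master" or "slave") reported by one pod.
# Args: <pod> <namespace> <password>
pod_role() {
  kubectl exec "$1" -n "$2" -- \
    redis-cli -a "$3" --no-auth-warning role | head -n 1 | tr -d '\r'
}

# Usage (assumes the Bitnami secret name and key; adjust if yours differ):
#   REDIS_PASSWORD=$(kubectl get secret redis-cluster -n default \
#     -o jsonpath='{.data.redis-password}' | base64 -d)
#   pod_role redis-cluster-0 default "$REDIS_PASSWORD"
```

Redis reports leaders as master and followers as slave, so a pod is a suitable target for this tutorial when the helper prints master.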

Step 4: Configuring the Chaos Experiment

1. Create a Chaos Namespace

kubectl create namespace redis-chaos

2. Define the Chaos Experiment

Create a file named redis-pod-delete-experiment.yaml with the following content:

apiVersion: litmuschaos.io/v1alpha1
kind: ChaosExperiment
metadata:
  name: pod-delete
  namespace: redis-chaos
spec:
  definition:
    scope: Namespaced
    permissions:
      - apiGroups: [""]
        resources: ["pods"]
        verbs: ["delete"]
    image: "litmuschaos/go-runner:latest"
    args:
      - -c
      # The go-runner image selects the experiment by name
      - ./experiments -name pod-delete
    command:
      - /bin/bash

3. Apply the Chaos Experiment

kubectl apply -f redis-pod-delete-experiment.yaml

Step 5: Configuring the Chaos Engine

The ChaosEngine links the application under test (Redis) with the chaos experiment.

1. Create a ChaosEngine Definition

Create a file named redis-chaos-engine.yaml with the following content:

apiVersion: litmuschaos.io/v1alpha1
kind: ChaosEngine
metadata:
  name: redis-chaos
  namespace: redis-chaos
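spec:
  engineState: active                  # "active" starts the chaos run, "stop" aborts it
  appinfo:
    appns: default
    applabel: "app.kubernetes.io/name=redis-cluster"
    appkind: statefulset               # Litmus expects the lowercase kind
  chaosServiceAccount: litmus-admin    # must exist in the redis-chaos namespace
  experiments:
    - name: pod-delete
      spec:
        components:
          env:
            # Comma-separated pod names; redis-cluster-0 is assumed to hold a leader role
            - name: TARGET_PODS
              value: "redis-cluster-0"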
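The engine references a service account named litmus-admin, which this tutorial does not create. A quick preflight before applying the engine can catch a missing account or missing permissions early. The sketch below uses kubectl auth can-i with impersonation; preflight_chaos_sa is a hypothetical helper, and it checks only the pod deletion permission, so a passing check is necessary but not sufficient for the experiment to run.

```shell
# Check that the chaos service account exists and may delete pods in the app namespace.
# Args: <service-account> <chaos-namespace> <app-namespace>
preflight_chaos_sa() {
  if ! kubectl get serviceaccount "$1" -n "$2" >/dev/null 2>&1; then
    echo "missing serviceaccount $2/$1"
    return 1
  fi
  # "kubectl auth can-i" prints yes or no; --as impersonates the service account.
  can_delete=$(kubectl auth can-i delete pods -n "$3" \
    --as "system:serviceaccount:$2:$1" 2>/dev/null || true)
  if [ "$can_delete" != "yes" ]; then
    echo "serviceaccount $2/$1 cannot delete pods in $3"
    return 1
  fi
  echo "ok"
}

# Usage: preflight_chaos_sa litmus-admin redis-chaos default
```

If the helper reports a missing account or permission, the LitmusChaos documentation describes the RBAC manifests that the experiments expect.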

2. Apply the ChaosEngine

kubectl apply -f redis-chaos-engine.yaml

Step 6: Running the Chaos Experiment

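With the experiment and engine defined, we can start the chaos run. The annotation below opts the StatefulSet in to chaos on Litmus versions that enforce annotation checks. The chaos operator starts a run as soon as an active ChaosEngine exists, so re-applying the engine is a no-op if the apply in Step 5 already triggered it.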

Start the Experiment

kubectl annotate statefulset redis-cluster litmuschaos.io/chaos="true"
kubectl apply -f redis-chaos-engine.yaml

Observe the Experiment Execution

kubectl get pods -n redis-chaos

This command lists the pods that LitmusChaos creates for the run, typically a chaos runner pod and an experiment pod, along with their status.
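Pod status only shows that the run is in progress; the outcome is recorded in a ChaosResult resource. A sketch for polling it, assuming the result is named after the engine and experiment (here redis-chaos-pod-delete) and exposes its verdict under status.experimentStatus.verdict. Both helper names are hypothetical.

```shell
# Print the verdict recorded in a ChaosResult (for example Pass, Fail, Awaited).
# Args: <engine-name> <experiment-name> <namespace>
chaos_verdict() {
  kubectl get chaosresult "$1-$2" -n "$3" \
    -o jsonpath='{.status.experimentStatus.verdict}'
}

# Poll until the verdict is neither empty nor Awaited, or until attempts run out.
# Args: <engine-name> <experiment-name> <namespace> <attempts> <sleep-seconds>
wait_for_verdict() {
  attempts_left=$4
  while [ "$attempts_left" -gt 0 ]; do
    verdict=$(chaos_verdict "$1" "$2" "$3" 2>/dev/null || true)
    case "$verdict" in
      ""|Awaited) sleep "$5" ;;
      *) echo "$verdict"; return 0 ;;
    esac
    attempts_left=$((attempts_left - 1))
  done
  echo "timeout"
  return 1
}

# Usage: wait_for_verdict redis-chaos pod-delete redis-chaos 60 5
```

The helper prints the final verdict, or timeout if the result never leaves the Awaited state within the given number of attempts.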

Monitor Logs

To follow the experiment's logs in real time, replace <chaos-pod-name> with the name of the experiment pod from the previous command:

kubectl logs -f <chaos-pod-name> -n redis-chaos

Step 7: Analyzing the Results

Once the experiment completes, observe the impact on the Redis cluster:

Was the leader pod successfully deleted?
Did the cluster automatically elect a new leader?
Were there any service disruptions?

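Use kubectl describe pods and kubectl logs to analyze the system’s behavior, for example to check when redis-cluster-0 was recreated and whether the remaining Redis nodes logged a failover.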
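The cluster itself can also be asked whether it considers itself healthy after the run. A minimal sketch, assuming redis-cli is available inside the pod and the Redis password has already been read into a variable; cluster_state is a hypothetical helper that extracts the cluster_state field from the output of CLUSTER INFO.

```shell
# Print the cluster_state value ("ok" or "fail") as seen by one pod.
# Args: <pod> <namespace> <password>
cluster_state() {
  kubectl exec "$1" -n "$2" -- \
    redis-cli -a "$3" --no-auth-warning cluster info |
    tr -d '\r' |
    awk -F: '$1 == "cluster_state" { print $2 }'
}

# Usage: cluster_state redis-cluster-1 default "$REDIS_PASSWORD"
```

Querying a pod other than the deleted one gives the view of a surviving node. Note that if the deleted pod comes back before the cluster's node timeout expires, the cluster may not perform a failover at all, so an unchanged leader is not necessarily a sign of a failed experiment.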