
Implementing Chaos Engineering Principles to Find System Failures


Chaos engineering is the discipline of experimenting on a software system in production in order to build confidence in the system’s capability to withstand turbulent and unexpected conditions.

The software industry is moving to distributed computing to gain reliability, scalability, high performance, and more. Microservice architecture is one of the architectures built on top of distributed computing, and microservices need to communicate with each other synchronously or asynchronously. The testing team usually tests components individually, but once the product reaches production it also has to handle real-time load. Microservices give more flexibility in developing and deploying an application to production, but a service can fail because of its own internal failure or because a component it depends on is unavailable. The product design must therefore anticipate software failures and prepare each component to keep working according to the availability of its dependencies.

Chaos engineering principles help build confidence before moving a system to production by testing, in a systematic way, how it handles unpredictable failures. Netflix introduced these principles. Even after developers and QA build and test a system, it can still fail because of application failures, network failures, infrastructure failures, or dependency failures. Netflix created Chaos Monkey, a tool that randomly terminates production instances so the team can observe and improve the system's behavior. Chaos engineering experiments follow these steps:

  • Steady state: define the system's normal behavior as measurable output, for example throughput, error rate, and latency.
  • Hypothesize: assume that the steady state will continue in both the control group and the experimental group.
  • Run the experiment: introduce real-world events, for example server crashes, application crashes, malformed responses, or traffic spikes.
  • Verify: test the hypothesis by comparing the steady state of the control group with that of the experimental group.
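The steady-state and verification steps can be sketched in code. The following minimal Java example assumes that error rate and p99 latency are the steady-state metrics and that an absolute tolerance decides whether the hypothesis holds; the class name SteadyStateCheck, the sample values, and the tolerances are illustrative and not taken from any chaos tool.

```java
import java.util.Arrays;

// Hypothetical sketch: compare steady-state metrics of a control group and an experimental group.
class SteadyStateCheck {

    // Fraction of failed requests.
    static double errorRate(int failed, int total) {
        if (total <= 0) {
            throw new IllegalArgumentException("total must be positive");
        }
        return (double) failed / total;
    }

    // Nearest-rank percentile of latency samples in milliseconds.
    static long percentile(long[] latenciesMs, double p) {
        if (latenciesMs.length == 0 || p <= 0 || p > 100) {
            throw new IllegalArgumentException("need samples and 0 < p <= 100");
        }
        long[] sorted = latenciesMs.clone();
        Arrays.sort(sorted);
        int rank = (int) Math.ceil(p / 100.0 * sorted.length);
        return sorted[rank - 1];
    }

    // The hypothesis holds if the experimental metric stays within `tolerance` of the control metric.
    static boolean hypothesisHolds(double control, double experiment, double tolerance) {
        return Math.abs(experiment - control) <= tolerance;
    }

    public static void main(String[] args) {
        double controlErrors = errorRate(2, 1000);    // no failure injected
        double experimentErrors = errorRate(3, 1000); // failure injected
        long controlP99 = percentile(new long[] {80, 95, 110, 120, 300}, 99);
        long experimentP99 = percentile(new long[] {85, 100, 115, 130, 2500}, 99);

        // prints "error rate holds: true"
        System.out.println("error rate holds: "
                + hypothesisHolds(controlErrors, experimentErrors, 0.005));
        // prints "p99 latency holds: false", i.e. the experiment found a weakness
        System.out.println("p99 latency holds: "
                + hypothesisHolds(controlP99, experimentP99, 200));
    }
}
```

A deviation such as the p99 latency jump above is the signal to fix the weakness and run the experiment again.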

When an experiment reveals a weakness, the team fixes it and repeats the experiment to confirm that the product behaves as expected.

Chaos engineering helps prevent system failures by testing the product in a destructive environment. A product may fail because of network glitches, hard disk failures, an overloaded component, application crashes, and so on. We cannot anticipate and test every failure, but resilience testing helps prevent many of them. Chaos Monkey randomly disables production instances to make sure the system can survive this common type of failure without any customer impact. After the success of Chaos Monkey, Netflix developed more tools, known as the Simian Army, to identify weaknesses in the product:

  • Latency Monkey
  • Conformity Monkey
  • Doctor Monkey
  • Janitor Monkey
  • Security Monkey
  • 10–18 Monkey
  • Chaos Gorilla

More details about these tools are available on the Netflix Technology Blog: https://netflixtechblog.com/the-netflix-simian-army-16e57fbab116

Failures in a distributed system are unavoidable, so the product should be designed to support chaos engineering. Chaos engineering executes failures in a systematic way and expects each component to have a resilience mechanism, such as a fallback, that handles them.

Chaos engineering can be used to achieve resilience against:

  • Infrastructure failures
  • Network failures
  • Application failures

Simple use case

Consider a product with two services. The user accesses a web application, which communicates with Service A; Service A in turn communicates with Service B and a database. How does the system behave when Service A goes down? When Service B goes down? What does the user experience? Chaos engineering helps answer these questions. We might design the application and its code to handle such cases, but we should validate that behavior before moving to the production system.
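To make the use case concrete, here is a hypothetical Java sketch of the Service A to Service B dependency. ServiceA, ServiceB, DownServiceB, and the fallback value are invented for illustration: DownServiceB plays the role of a chaos experiment that takes Service B down, and the question is whether Service A still answers the web application.

```java
// Hypothetical sketch: Service A depends on Service B and degrades gracefully when B fails.
class DependencyFailureDemo {

    interface ServiceB {
        String fetchProfile(String userId);
    }

    // Healthy Service B.
    static class RealServiceB implements ServiceB {
        public String fetchProfile(String userId) {
            return "profile-of-" + userId;
        }
    }

    // Simulates the experiment "Service B goes down" by always throwing.
    static class DownServiceB implements ServiceB {
        public String fetchProfile(String userId) {
            throw new IllegalStateException("Service B unavailable (injected failure)");
        }
    }

    static class ServiceA {
        private final ServiceB serviceB;

        ServiceA(ServiceB serviceB) {
            this.serviceB = serviceB;
        }

        // Returns a fallback instead of propagating Service B's failure to the web application.
        String handleRequest(String userId) {
            try {
                return serviceB.fetchProfile(userId);
            } catch (RuntimeException e) {
                return "default-profile";
            }
        }
    }

    public static void main(String[] args) {
        System.out.println(new ServiceA(new RealServiceB()).handleRequest("42")); // profile-of-42
        System.out.println(new ServiceA(new DownServiceB()).handleRequest("42")); // default-profile
    }
}
```

Without the try/catch in handleRequest, the injected failure would reach the user, which is exactly the kind of weakness a chaos experiment is meant to expose before production does.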

Game day

A Game Day is a dedicated day focused on using chaos engineering to reveal weaknesses in the product. The team attacks the system, either manually or with automated scripts, to build more resilient systems by breaking things on purpose. A game day may be planned or unplanned. A planned game day is prepared in advance and involves everyone required to approve the system, including DevOps, development, QA, and management. The product may involve multiple components, for example applications (services), infrastructure, databases, and other dependencies, and each area can be tested by separate team members, automated scripts, or tools. The main goal is to disturb the current system and fix the issues before they occur in production. Team members stay connected through chat, a conference call, or in person in a conference room so they can communicate easily and fix issues quickly.

Tools

Netflix designed Chaos Monkey to test system stability. Many open-source and commercial tools are also available for testing product resilience.

Chaos Monkey

According to the Chaos Monkey documentation, Chaos Monkey is responsible for randomly terminating instances in production to ensure that engineers implement their services to be resilient to instance failures. Chaos Monkey requires Spinnaker and MySQL. Spinnaker is a free and open-source continuous delivery platform, originally developed by Netflix, that manages applications and their deployments. MySQL stores the daily termination schedule and is used to enforce a minimum time between terminations. Chaos Monkey does not include any recovery tools or a user interface.
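The two scheduling rules mentioned above, terminating a random instance but never sooner than a minimum interval after the previous termination, can be illustrated with a small Java sketch. This is a hypothetical model, not Chaos Monkey's code: the instance IDs are made up, and the last termination time is kept in memory where Chaos Monkey uses MySQL.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.Arrays;
import java.util.List;
import java.util.Optional;
import java.util.Random;

// Hypothetical sketch of random termination with a minimum time between terminations.
class TerminationScheduler {
    private final Duration minTimeBetweenKills;
    private final Random random;
    private Instant lastTermination; // null until the first termination

    TerminationScheduler(Duration minTimeBetweenKills, Random random) {
        this.minTimeBetweenKills = minTimeBetweenKills;
        this.random = random;
    }

    // Returns the instance to terminate, or empty if the minimum interval has not elapsed yet.
    Optional<String> pickVictim(List<String> instances, Instant now) {
        if (instances.isEmpty()) {
            return Optional.empty();
        }
        if (lastTermination != null
                && Duration.between(lastTermination, now).compareTo(minTimeBetweenKills) < 0) {
            return Optional.empty();
        }
        lastTermination = now;
        return Optional.of(instances.get(random.nextInt(instances.size())));
    }

    public static void main(String[] args) {
        TerminationScheduler scheduler = new TerminationScheduler(Duration.ofHours(1), new Random(7));
        List<String> group = Arrays.asList("i-aaa", "i-bbb", "i-ccc"); // hypothetical instance IDs
        Instant t0 = Instant.parse("2020-06-01T10:00:00Z");
        System.out.println(scheduler.pickVictim(group, t0));                   // Optional[<random instance>]
        System.out.println(scheduler.pickVictim(group, t0.plusSeconds(600)));  // Optional.empty
        System.out.println(scheduler.pickVictim(group, t0.plusSeconds(3600))); // Optional[<random instance>]
    }
}
```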

https://netflix.github.io/chaosmonkey/

Chaos Toolkit

The Chaos Toolkit aims to be the simplest and easiest way to explore building your own chaos engineering experiments. It is built on Python, and you install only the modules your use case requires. Chaos Toolkit drivers extend the toolkit so it can cause chaos in, and probe, different types of systems; drivers are available for applications, networks, and infrastructure, and users can develop and publish their own. The Chaos Toolkit is also available as a Docker image and can be used to experiment on containerized applications, including Kubernetes. The O’Reilly book Learning Chaos Engineering by Russ Miles explains examples using the Chaos Toolkit.
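Chaos Toolkit experiments are declared in JSON or YAML files rather than written as code, but the lifecycle they follow can be sketched in Java: check the steady-state hypothesis, run the method that injects the failure, check the hypothesis again, and roll back. The ExperimentRunner class and its outcome strings are hypothetical and are not part of the Chaos Toolkit API; this sketch always rolls back after the method runs, while the toolkit has its own rules for when rollbacks are applied.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.BooleanSupplier;

// Hypothetical sketch of an experiment lifecycle: hypothesis, method, hypothesis again, rollback.
class ExperimentRunner {

    static String run(BooleanSupplier steadyState, Runnable method, Runnable rollback, List<String> log) {
        // 1. Verify the steady state before touching the system.
        if (!steadyState.getAsBoolean()) {
            log.add("hypothesis failed before the experiment; aborting");
            return "aborted";
        }
        try {
            // 2. Run the method: the action that injects the failure.
            method.run();
            // 3. Check the hypothesis again while the failure is in place.
            boolean held = steadyState.getAsBoolean();
            log.add(held ? "hypothesis held under chaos" : "deviation found: fix the weakness");
            return held ? "completed" : "deviated";
        } finally {
            // 4. Roll back so the system returns to its normal state, even if the method throws.
            rollback.run();
            log.add("rollback done");
        }
    }

    public static void main(String[] args) {
        boolean[] dependencyUp = {true};
        List<String> log = new ArrayList<>();
        // Hypothetical system without a fallback: it is healthy only while its dependency is up.
        String status = run(() -> dependencyUp[0],
                () -> dependencyUp[0] = false,
                () -> dependencyUp[0] = true,
                log);
        System.out.println(status); // deviated
        System.out.println(log);
    }
}
```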

https://docs.chaostoolkit.org/

Chaos Monkey for Spring Boot

The Chaos Monkey for Spring Boot project can attack Spring Boot beans annotated with @Service, @RestController, @Controller, @Repository, and @Component. A watcher is a Chaos Monkey for Spring Boot component that scans your application for a specific type of annotation. After adding Chaos Monkey to the project, you enable it by activating the chaos-monkey Spring Boot profile; once the profile is active, Chaos Monkey randomly attacks the watched components. Add the chaos-monkey-spring-boot dependency, together with Spring Boot Actuator for its endpoints, to the pom.xml file.

Once the application starts with the chaos-monkey profile, it fails randomly, so make sure this profile is never activated in production by mistake. Chaos Monkey is reached through Spring Boot Actuator, which provides several endpoints to enable Chaos Monkey and to query its properties; the base URI is given after the example code.

pom.xml

<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0"
	xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
	xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 https://maven.apache.org/xsd/maven-4.0.0.xsd">
	<modelVersion>4.0.0</modelVersion>
	<parent>
		<groupId>org.springframework.boot</groupId>
		<artifactId>spring-boot-starter-parent</artifactId>
		<version>2.2.7.RELEASE</version>
		<relativePath /> <!-- lookup parent from repository -->
	</parent>
	<groupId>com.learning</groupId>
	<artifactId>chaoswebservice</artifactId>
	<version>0.0.1</version>
	<name>chaoswebservice</name>
	<description>Demo project for Spring Boot</description>

	<properties>
		<java.version>1.8</java.version>
	</properties>

	<dependencies>
		<dependency>
			<groupId>org.springframework.boot</groupId>
			<artifactId>spring-boot-starter-web</artifactId>
		</dependency>

		<!-- https://mvnrepository.com/artifact/javax.json/javax.json-api -->
		<dependency>
			<groupId>javax.json</groupId>
			<artifactId>javax.json-api</artifactId>
		</dependency>

		<!-- https://mvnrepository.com/artifact/org.glassfish/javax.json -->
		<dependency>
			<groupId>org.glassfish</groupId>
			<artifactId>javax.json</artifactId>
			<version>1.1.4</version>
		</dependency>

		<!-- https://mvnrepository.com/artifact/de.codecentric/chaos-monkey-spring-boot -->
		<dependency>
			<groupId>de.codecentric</groupId>
			<artifactId>chaos-monkey-spring-boot</artifactId>
			<version>2.2.0</version>
		</dependency>

		<!-- Actuator exposes the /actuator/chaosmonkey endpoint -->
		<dependency>
			<groupId>org.springframework.boot</groupId>
			<artifactId>spring-boot-starter-actuator</artifactId>
		</dependency>

		<dependency>
			<groupId>org.springframework.boot</groupId>
			<artifactId>spring-boot-starter-test</artifactId>
			<scope>test</scope>
			<exclusions>
				<exclusion>
					<groupId>org.junit.vintage</groupId>
					<artifactId>junit-vintage-engine</artifactId>
				</exclusion>
			</exclusions>
		</dependency>

		<!-- https://mvnrepository.com/artifact/io.gatling.highcharts/gatling-charts-highcharts -->
		<dependency>
			<groupId>io.gatling.highcharts</groupId>
			<artifactId>gatling-charts-highcharts</artifactId>
			<version>3.0.5</version>
			<scope>test</scope>
		</dependency>
	</dependencies>

	<build>
		<plugins>
			<plugin>
				<groupId>org.springframework.boot</groupId>
				<artifactId>spring-boot-maven-plugin</artifactId>
			</plugin>

			<plugin>
				<artifactId>maven-jar-plugin</artifactId>
				<version>3.2.0</version>
			</plugin>
			

		</plugins>
	</build>

</project>

application.properties

#chaos monkey for spring boot props
management.endpoint.chaosmonkey.enabled=true
management.endpoint.chaosmonkeyjmx.enabled=true

# include all endpoints
management.endpoints.web.exposure.include=*

spring.profiles.active=chaos-monkey
#Enable or disable Chaos Monkey
chaos.monkey.enabled=true
#How many requests are to be attacked. 1: attack each request; 5: each 5th request is attacked
chaos.monkey.assaults.level=1
#Minimum latency in ms added to the request
chaos.monkey.assaults.latencyRangeStart=3000
#Maximum latency in ms added to the request
chaos.monkey.assaults.latencyRangeEnd=15000
#Latency assault active
chaos.monkey.assaults.latencyActive=true
#Exception assault active
chaos.monkey.assaults.exceptionsActive=true
#AppKiller assault active
chaos.monkey.assaults.killApplicationActive=true
#Controller watcher active
chaos.monkey.watcher.controller=true
#RestController watcher active
chaos.monkey.watcher.restController=true
#Service watcher active
chaos.monkey.watcher.service=true
#Repository watcher active
chaos.monkey.watcher.repository=false
#Component watcher active
chaos.monkey.watcher.component=false
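To see what the assault properties do, the following Java sketch wraps an interface in a JDK dynamic proxy that adds latency within a range and/or throws an exception, attacking about one call in `level`. It is an illustration only: Chaos Monkey for Spring Boot applies its assaults through Spring AOP around watched beans, and this sketch models `level` as a 1-in-level probability and applies latency before the exception when both are enabled, which may differ from the library's exact behavior. NameService and wrap are hypothetical names.

```java
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Proxy;
import java.util.Random;

// Illustration only: mimics the latency and exception assaults configured above.
class AssaultProxy {

    // Hypothetical service interface used for the demo.
    interface NameService {
        String firstName();
    }

    @SuppressWarnings("unchecked")
    static <T> T wrap(T target, Class<T> type, int level, int latencyStartMs, int latencyEndMs,
                      boolean latencyActive, boolean exceptionsActive, Random random) {
        return (T) Proxy.newProxyInstance(type.getClassLoader(), new Class<?>[] {type},
                (proxy, method, args) -> {
                    // Never attack Object methods; otherwise attack about one call in `level`
                    // (nextInt(1) is always 0, so level=1 attacks every call).
                    boolean attack = method.getDeclaringClass() != Object.class
                            && random.nextInt(level) == 0;
                    if (attack && latencyActive) {
                        // Random delay in [latencyStartMs, latencyEndMs].
                        Thread.sleep(latencyStartMs + random.nextInt(latencyEndMs - latencyStartMs + 1));
                    }
                    if (attack && exceptionsActive) {
                        throw new RuntimeException("Chaos assault: injected exception");
                    }
                    try {
                        return method.invoke(target, args);
                    } catch (InvocationTargetException e) {
                        throw e.getCause(); // rethrow the target's own exception unchanged
                    }
                });
    }

    public static void main(String[] args) {
        NameService real = () -> "John";
        // Small latency range so the demo finishes quickly; the properties above use 3000-15000 ms.
        NameService slow = wrap(real, NameService.class, 1, 10, 20, true, false, new Random(1));
        long start = System.nanoTime();
        String name = slow.firstName();
        System.out.println(name + " after " + (System.nanoTime() - start) / 1_000_000 + " ms");

        NameService failing = wrap(real, NameService.class, 1, 0, 0, false, true, new Random(1));
        try {
            failing.firstName();
        } catch (RuntimeException e) {
            System.out.println(e.getMessage()); // Chaos assault: injected exception
        }
    }
}
```

A client of a watched bean therefore needs timeouts and error handling sized for the configured latency range, not just for the happy path.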

SimpleController.java

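package com.careerdrill.learning;

import java.io.StringWriter;
import java.util.Collections;

import javax.json.Json;
import javax.json.JsonObject;
import javax.json.JsonWriter;
import javax.json.JsonWriterFactory;
import javax.json.stream.JsonGenerator;

import org.springframework.http.MediaType;
import org.springframework.web.bind.annotation.GetMapping;
import org.springframework.web.bind.annotation.RestController;

@RestController
public class SimpleController {

	// Pretty-printing is a writer setting, so it is configured on a JsonWriterFactory.
	private static final JsonWriterFactory WRITER_FACTORY = Json.createWriterFactory(
			Collections.singletonMap(JsonGenerator.PRETTY_PRINTING, true));

	// With the RestController watcher enabled, Chaos Monkey may delay or fail this endpoint.
	@GetMapping(value = "/names", produces = MediaType.APPLICATION_JSON_VALUE)
	public String getNames() {
		JsonObject value = Json.createObjectBuilder()
				.add("firstName", "John")
				.add("lastName", "Smith")
				.add("age", 25)
				.build();

		// Serialize with javax.json itself; Spring's default Jackson converter does not render JsonObject as plain JSON.
		StringWriter out = new StringWriter();
		try (JsonWriter writer = WRITER_FACTORY.createWriter(out)) {
			writer.writeObject(value);
		}
		return out.toString();
	}
}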

Chaos Monkey endpoint: http://localhost:8080/actuator/chaosmonkey

https://codecentric.github.io/chaos-monkey-spring-boot/2.2.0/

Reference

https://netflixtechblog.com/the-netflix-simian-army-16e57fbab116

https://github.com/netflix/chaosmonkey

https://netflix.github.io/chaosmonkey/How-to-deploy/

https://www.youtube.com/watch?v=B1nUzbuVEUs&feature=youtu.be&t=401

https://github.com/asobti/kube-monkey

https://www.gremlin.com/chaos-monkey/chaos-monkey-alternatives/kubernetes/

https://codecentric.github.io/chaos-monkey-spring-boot/2.2.0/#configuration

https://codecentric.github.io/chaos-monkey-spring-boot/#goal

https://chaostoolkit.org/extensions

https://github.com/dastergon/awesome-chaos-engineering
