What is Chaos Engineering
Chaos engineering is the practice of intentionally introducing failures or disruptions into a system to test its resilience and identify weaknesses. By simulating real-world failures in controlled experiments and observing how the system responds, you can find and fix weaknesses before they cause outages, improving the reliability and availability of your applications. This is especially important in cloud environments, where a single failure can cascade into other parts of the system.
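The experiment loop behind chaos engineering can be sketched as a self-contained simulation: measure normal behavior, inject a failure, then check whether behavior stays within bounds. Everything in it is hypothetical. The service is three replicas behind a load balancer that retries a failed request once on another replica. The steady-state hypothesis is that at least 99% of requests succeed.

```python
import random
from dataclasses import dataclass, field


@dataclass
class Service:
    """Hypothetical service: replicas behind a load balancer that retries once."""
    replicas: set = field(default_factory=lambda: {"r1", "r2", "r3"})
    failed: set = field(default_factory=set)

    def handle(self, rng: random.Random) -> bool:
        # Route to a random replica; if it has failed, retry once on a different one.
        pool = sorted(self.replicas)
        first = rng.choice(pool)
        if first not in self.failed:
            return True
        others = [r for r in pool if r != first]
        return bool(others) and rng.choice(others) not in self.failed


def success_rate(svc: Service, requests: int, rng: random.Random) -> float:
    return sum(svc.handle(rng) for _ in range(requests)) / requests


def run_experiment(failures: int = 1, threshold: float = 0.99,
                   requests: int = 1000, seed: int = 0) -> dict:
    rng = random.Random(seed)
    svc = Service()
    baseline = success_rate(svc, requests, rng)             # 1. measure steady state
    for victim in rng.sample(sorted(svc.replicas), failures):
        svc.failed.add(victim)                              # 2. inject failures
    during = success_rate(svc, requests, rng)               # 3. observe under failure
    return {
        "baseline": baseline,
        "during_failure": during,
        "hypothesis_held": during >= threshold,             # 4. check the hypothesis
    }


print(run_experiment(failures=1))
print(run_experiment(failures=2))
```

With one failed replica, the retry always lands on a healthy one, so the hypothesis holds. With two failed replicas, about a third of requests fail, and the experiment exposes a weakness.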
To practice chaos engineering on AWS, you can use tools such as Netflix's Chaos Monkey and AWS Fault Injection Simulator (now called AWS Fault Injection Service).
Chaos Monkey
Chaos Monkey is an open-source tool from Netflix that randomly terminates instances in an Amazon Elastic Compute Cloud (Amazon EC2) Auto Scaling group to test your applications’ resiliency. By simulating the failure of individual instances, you can verify that your applications keep functioning when faced with unexpected disruptions.
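Chaos Monkey's core action can be reduced to a toy model: choose one running instance at random and terminate it. The sketch below is not Chaos Monkey's code. It only illustrates the selection step and skips instances that are not in service. The instance IDs are made up, and the state names are EC2 Auto Scaling lifecycle states.

```python
import random
from typing import Optional


def pick_victim(instances: dict, rng: random.Random) -> Optional[str]:
    """Return a random instance ID whose lifecycle state is "InService", or None.

    `instances` maps instance IDs to lifecycle states (hypothetical data, using
    LifecycleState values that EC2 Auto Scaling reports, such as "InService").
    """
    candidates = sorted(i for i, state in instances.items() if state == "InService")
    return rng.choice(candidates) if candidates else None


group = {"i-aaa": "InService", "i-bbb": "InService", "i-ccc": "Terminating"}
print(pick_victim(group, random.Random(42)))  # either i-aaa or i-bbb, never i-ccc
```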
Steps to use Chaos Monkey to terminate instances
Here is a step-by-step guide to using Chaos Monkey to randomly terminate instances in an Amazon EC2 Auto Scaling group:
Step 1: Set up an EC2 Auto Scaling group
First, you will need to create an EC2 Auto Scaling group and specify the desired number of instances. You can do this using the AWS Management Console, the AWS CLI, or an AWS SDK.
To create an EC2 Auto Scaling group using the AWS Management Console, follow these steps:
- Sign in to the AWS Management Console and navigate to the EC2 dashboard.
- In the left navigation menu, select “Auto Scaling Groups” under the “Auto Scaling” heading.
- Click “Create Auto Scaling group”.
- On the first page of the wizard, enter a name for the Auto Scaling group and select the launch template that defines the instances to launch (AMI, instance type, key pair, and security groups).
- Click “Next”. On the instance launch options page, select the VPC and the subnets where you want to launch the instances.
- Click “Next”. On the advanced options page, you can optionally attach a load balancer and turn on Elastic Load Balancing health checks.
- Click “Next”. On the group size and scaling page, set the desired, minimum, and maximum capacity. You can also add a scaling policy, such as target tracking on average CPU utilization.
- Click “Next”. On the notifications page, you can specify the Amazon Simple Notification Service (Amazon SNS) topics that should receive notifications about Auto Scaling events.
- Click “Next”. On the tags page, specify any tags you want to apply to the Auto Scaling group and its instances.
- Click “Next”. On the review page, check the details of your Auto Scaling group and click “Create Auto Scaling group”.
To create an EC2 Auto Scaling group using the AWS CLI, you can use the following command:
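aws autoscaling create-auto-scaling-group --auto-scaling-group-name my-auto-scaling-group --min-size 2 --max-size 4 --desired-capacity 3 --launch-template LaunchTemplateName=my-launch-template,Version='$Latest' --vpc-zone-identifier subnet-12345678
This command creates an Auto Scaling group with a minimum size of 2 instances, a maximum size of 4 instances, and a desired capacity of 3 instances. It launches instances from the latest version of the specified launch template into the specified VPC subnet. Launch templates are the recommended replacement for launch configurations. If you still use a launch configuration, pass --launch-configuration-name instead of --launch-template.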
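The same group can be created from Python with boto3. The sketch below builds the keyword arguments for boto3's `create_auto_scaling_group` call, using the sizing from the CLI command above. It uses a launch template, because AWS recommends launch templates over launch configurations. The template name `my-launch-template` is hypothetical. The actual API call is left as a comment, so the sketch runs without AWS credentials.

```python
def asg_request(name: str, min_size: int, max_size: int, desired: int,
                launch_template: str, subnets: list) -> dict:
    """Build keyword arguments for boto3's autoscaling create_auto_scaling_group().

    Validates the sizing first, because the API rejects a desired capacity
    outside the [min, max] range.
    """
    if not 0 <= min_size <= desired <= max_size:
        raise ValueError("expected 0 <= min_size <= desired <= max_size")
    return {
        "AutoScalingGroupName": name,
        "MinSize": min_size,
        "MaxSize": max_size,
        "DesiredCapacity": desired,
        # Hypothetical launch template name; "$Latest" selects its newest version.
        "LaunchTemplate": {"LaunchTemplateName": launch_template, "Version": "$Latest"},
        # The API expects subnet IDs as one comma-separated string.
        "VPCZoneIdentifier": ",".join(subnets),
    }


params = asg_request("my-auto-scaling-group", 2, 4, 3,
                     "my-launch-template", ["subnet-12345678"])
print(params)

# With AWS credentials configured (not run here):
#   import boto3
#   boto3.client("autoscaling").create_auto_scaling_group(**params)
```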
Creating an EC2 Auto Scaling group is a key step in setting up chaos engineering on AWS, because the group provides the infrastructure the experiments act on. When Chaos Monkey terminates an instance, the group falls below its desired capacity and launches a replacement. The experiment therefore tests whether your application keeps working while that happens. The minimum, maximum, and desired capacity settings control how many instances are running at any given time.
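An Auto Scaling group's self-healing behavior makes it a good target for these experiments, and a few lines can model it. This is a toy model only. It adds freshly "launched" IDs until the group is back at desired capacity, and it drops surplus instances from the end, whereas real groups apply a termination policy. The instance IDs are made up.

```python
import itertools
from typing import Iterator


def reconcile(running: list, desired: int, new_ids: Iterator[str]) -> list:
    """Toy model of Auto Scaling self-healing: return the group resized to `desired`."""
    group = list(running)
    while len(group) < desired:
        group.append(next(new_ids))   # launch a replacement
    del group[desired:]               # scale in if above desired capacity
    return group


new_ids = (f"i-{n:03d}" for n in itertools.count(4))
group = ["i-001", "i-002", "i-003"]
group.remove("i-002")                 # an instance is terminated out from under the group
group = reconcile(group, desired=3, new_ids=new_ids)
print(group)  # ['i-001', 'i-003', 'i-004']
```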
Step 2: Install and configure Chaos Monkey
Next, you will need to install and configure Chaos Monkey, following the instructions in the Chaos Monkey documentation. The main steps are:
- Install the AWS CLI:
Chaos Monkey is not part of the AWS CLI. The AWS CLI is still useful for creating and inspecting the Auto Scaling groups and IAM resources used in this walkthrough. If you don’t already have it installed, follow the instructions in the AWS documentation.
- Configure the AWS CLI:
After installing the AWS CLI, run the aws configure command and enter an access key ID, a secret access key, a default Region, and an output format. You can create access keys for an IAM user on that user’s “Security credentials” tab in the IAM console. The secret access key is shown only once, when you create it.
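- Install Chaos Monkey:
The current version of Chaos Monkey (Chaos Monkey 2) is written in Go, so you need a Go toolchain to build it from source. It is designed to work with Spinnaker, Netflix’s continuous delivery platform, which it uses to discover your applications and their Auto Scaling groups. It also stores its termination history in a MySQL database, so both Spinnaker and MySQL need to be available before Chaos Monkey can run. To build Chaos Monkey, run the following commands:
Clone the Chaos Monkey repository from GitHub:
git clone https://github.com/Netflix/chaosmonkey
Change to the directory where you cloned the repository:
cd chaosmonkey
Build the Chaos Monkey binary:
go build ./cmd/chaosmonkey
This builds a chaosmonkey executable in the current directory. Check the repository README for the build and installation steps that match the version you checked out. The Java-based Simian Army project, which originally hosted Chaos Monkey and was built with Gradle, is deprecated.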
- Configure Chaos Monkey:
Chaos Monkey 2 reads its settings from a TOML file named chaosmonkey.toml. The Chaos Monkey documentation lists the directories where it looks for this file.
In the [chaosmonkey] section of the configuration file, you can specify options such as:
enabled: Turns Chaos Monkey on or off.
leashed: When set to true, Chaos Monkey records the terminations it would have performed without terminating anything, which is useful for a first dry run.
accounts: Lists the accounts, as named in Spinnaker, that Chaos Monkey is allowed to act on.
The file also contains a [database] section for the MySQL connection and a [spinnaker] section for the Spinnaker endpoint.
This file does not specify which Auto Scaling groups are targeted. Instead, each application opts in to Chaos Monkey through its configuration in Spinnaker. There you can also define exceptions that exclude specific accounts, regions, or stacks from terminations.
Keep in mind that the specific options and syntax for configuring Chaos Monkey may vary depending on the version of the tool that you are using. Be sure to consult the documentation for the version of Chaos Monkey that you have installed.
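- Set up the necessary permissions:
Chaos Monkey needs permission to discover and terminate instances in your AWS account. Create an IAM role that allows only what the experiments require, such as read-only describe actions plus ec2:TerminateInstances on the instances you intend to target. Attach it through an instance profile to the EC2 instances that run Chaos Monkey. In a Spinnaker-based setup, the AWS credentials that Spinnaker uses for the target account need these permissions as well.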
Installing and configuring Chaos Monkey therefore involves building the software, connecting it to its dependencies, and granting it the permissions it needs. Once that is done, you can start using it to test the resilience of your applications.
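Whatever mechanism your Chaos Monkey version uses for opt-ins and exclusions, the selection rule amounts to filtering. A group is eligible if it matches what you opted in and matches none of the exclusions. The sketch below uses shell-style wildcard patterns through Python's fnmatch module. The group names and patterns are hypothetical, and Chaos Monkey itself does not necessarily use wildcards.

```python
from fnmatch import fnmatchcase


def eligible_groups(all_groups: list, include: list, exclude: list) -> list:
    """Return groups matching any include pattern and no exclude pattern, in order."""
    def matches(name: str, patterns: list) -> bool:
        return any(fnmatchcase(name, p) for p in patterns)

    return [g for g in all_groups if matches(g, include) and not matches(g, exclude)]


groups = ["web-prod", "web-staging", "db-prod", "batch-prod"]
print(eligible_groups(groups, include=["*-prod"], exclude=["db-*"]))
# prints ['web-prod', 'batch-prod']
```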
Step 3: Set up a chaos schedule
After installing and configuring Chaos Monkey, you will need to set up a schedule that controls when terminations can happen. In Chaos Monkey 2, you set the time window in the configuration file and the frequency per application in Spinnaker.
To set up a chaos schedule, follow these steps:
- Open the configuration file
Open chaosmonkey.toml using a text editor of your choice.
- Set the termination window
- In the [chaosmonkey] section, set schedule_enabled to true to turn on scheduling.
- Use start_hour and end_hour to define the hours of the day during which terminations may happen. Use time_zone to specify the time zone for those hours.
- Set the frequency
In each application’s Chaos Monkey settings in Spinnaker, set the mean time between terminations, in days. A smaller value means more frequent terminations.
- Schedule the daily run
Chaos Monkey is typically run from cron. A daily job runs chaosmonkey schedule, which decides the day’s terminations and schedules them within the configured window.
Here is an example of scheduling settings in chaosmonkey.toml that allow terminations between 9:00 AM and 3:00 PM Pacific time:
[chaosmonkey]
schedule_enabled = true
start_hour = 9
end_hour = 15
time_zone = "America/Los_Angeles"
Choosing the window carefully lets you control when the experiments run and align them with your testing and deployment schedule. For example, you can limit terminations to business hours, when engineers are available to respond.
Keep in mind that the specific options and syntax for setting up a chaos schedule may vary depending on the version of Chaos Monkey that you are using. Be sure to consult the documentation for the version of Chaos Monkey that you have installed.
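A chaos scheduler has to make two decisions each day: whether to terminate anything and, if so, when. The sketch below makes both with a hypothetical policy that is not Chaos Monkey's actual algorithm. On weekdays, it terminates with probability 1/mean_days, at a random minute within a start/end hour window. Over many days, this averages one termination per mean_days weekdays.

```python
import random
from datetime import date, datetime, timedelta
from typing import Optional


def schedule_termination(day: date, mean_days: float, start_hour: int, end_hour: int,
                         rng: random.Random) -> Optional[datetime]:
    """Return a termination time on `day`, or None if nothing is scheduled.

    Hypothetical policy: skip weekends; otherwise terminate with probability
    1/mean_days at a uniformly random minute in [start_hour, end_hour).
    """
    if not (0 <= start_hour < end_hour <= 24) or mean_days < 1:
        raise ValueError("need 0 <= start_hour < end_hour <= 24 and mean_days >= 1")
    if day.weekday() >= 5:                   # Saturday=5, Sunday=6
        return None
    if rng.random() >= 1.0 / mean_days:      # no termination today
        return None
    minute = rng.randrange((end_hour - start_hour) * 60)
    return datetime(day.year, day.month, day.day, start_hour) + timedelta(minutes=minute)


rng = random.Random(2024)
for offset in range(7):
    day = date(2024, 1, 1) + timedelta(days=offset)
    print(day, schedule_termination(day, mean_days=2, start_hour=9, end_hour=15, rng=rng))
```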
Step 4: Run the chaos experiments
Chaos Monkey is not an AWS service, so the AWS CLI has no chaosmonkey commands. With Chaos Monkey 2, experiments run automatically once the daily scheduling job is in place. Each day, it schedules terminations for the opted-in applications within the configured window.
To pause terminations without removing the schedule, set leashed = true in chaosmonkey.toml. Chaos Monkey then records what it would terminate without terminating anything. To turn it off entirely, set enabled = false.
You can watch terminations as they happen from the AWS CLI by listing the scaling activities of the targeted group:
aws autoscaling describe-scaling-activities --auto-scaling-group-name my-auto-scaling-group
Each termination should be followed by an activity that launches a replacement instance.
If you prefer start, stop, and status commands in the AWS CLI, AWS Fault Injection Service provides them for its own experiments, for example an experiment template that uses the aws:ec2:terminate-instances action:
aws fis start-experiment --experiment-template-id <template-id>
aws fis stop-experiment --id <experiment-id>
aws fis get-experiment --id <experiment-id>
Running the experiments lets you test the resilience of your applications and identify weaknesses that could affect their reliability and availability.
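Whichever tool performs the terminations, it is worth guarding each one with a stop condition so that an experiment cannot push a group below a safe floor. AWS Fault Injection Service builds this in as stop conditions tied to CloudWatch alarms. Below is a minimal guard, assuming a hypothetical floor of half the desired capacity.

```python
import math


def safe_to_terminate(healthy_in_service: int, desired: int, min_fraction: float = 0.5) -> bool:
    """Stop-condition check: allow one more termination only if the group would still
    have at least ceil(min_fraction * desired) healthy, in-service instances."""
    if desired <= 0 or not 0.0 < min_fraction <= 1.0:
        raise ValueError("desired must be positive and 0 < min_fraction <= 1")
    floor = math.ceil(min_fraction * desired)
    return healthy_in_service - 1 >= floor


print(safe_to_terminate(healthy_in_service=3, desired=3))  # True: 2 remain, floor is 2
print(safe_to_terminate(healthy_in_service=2, desired=3))  # False: only 1 would remain
```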
Step 5: Monitor the results
As the chaos experiments are running, you can monitor the results using the AWS Management Console, the AWS CLI, or an AWS SDK. You can use this information to identify any weaknesses in the resiliency of your application and make improvements as needed.
To monitor the results of the chaos experiments using the AWS Management Console, follow these steps:
- Sign in to the AWS Management Console and navigate to the EC2 dashboard.
- In the left navigation menu, select “Auto Scaling Groups” under the “Auto Scaling” heading.
- Select the Auto Scaling group that you are targeting with the chaos experiments.
- On the “Instance management” tab, you can view the lifecycle state and health status of the instances in the Auto Scaling group. On the “Activity” tab, you can see each termination and the launch of its replacement. Use this information to monitor how the chaos experiments affect your application.
- On the “Monitoring” tab, you can view aggregated EC2 metrics for the group, such as CPU utilization, network traffic, and disk I/O. Once group metrics collection is enabled, you can also view Auto Scaling group metrics, such as the number of in-service instances. Use these metrics to spot changes in your application’s behavior during the chaos experiments.
To monitor the results of the chaos experiments using the AWS CLI, you can use the following command:
aws autoscaling describe-auto-scaling-groups --auto-scaling-group-names my-auto-scaling-group
This command will return information about the specified Auto Scaling group, including the current status of the instances and the capacity of the group.
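To check from a script whether the group is back at full strength after a termination, you can parse the command's JSON output. The sketch below reads the AutoScalingGroups, DesiredCapacity, and Instances fields, including each instance's LifecycleState and HealthStatus. The sample data is made up and trimmed to those fields.

```python
import json


def group_health(describe_output: str, group_name: str) -> dict:
    """Summarize `aws autoscaling describe-auto-scaling-groups` JSON for one group."""
    data = json.loads(describe_output)
    for group in data["AutoScalingGroups"]:
        if group["AutoScalingGroupName"] == group_name:
            healthy = [
                i["InstanceId"] for i in group.get("Instances", [])
                if i["LifecycleState"] == "InService" and i["HealthStatus"] == "Healthy"
            ]
            return {
                "desired": group["DesiredCapacity"],
                "healthy_in_service": len(healthy),
                "at_capacity": len(healthy) >= group["DesiredCapacity"],
            }
    raise KeyError(group_name)


# Illustrative sample: one instance is still launching after a termination.
sample = json.dumps({"AutoScalingGroups": [{
    "AutoScalingGroupName": "my-auto-scaling-group",
    "DesiredCapacity": 3,
    "Instances": [
        {"InstanceId": "i-aaa", "LifecycleState": "InService", "HealthStatus": "Healthy"},
        {"InstanceId": "i-bbb", "LifecycleState": "InService", "HealthStatus": "Healthy"},
        {"InstanceId": "i-ccc", "LifecycleState": "Pending", "HealthStatus": "Healthy"},
    ],
}]})
print(group_health(sample, "my-auto-scaling-group"))
# {'desired': 3, 'healthy_in_service': 2, 'at_capacity': False}
```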
Monitoring the results means tracking how your application behaves while instances are terminated and replaced. Analyzing those results shows where your application’s resiliency is weak, so you can make targeted improvements to its reliability and availability.
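A useful metric to track across experiments is time to recover: how long the group takes to replace a terminated instance. The sketch below computes it from the Activities list returned by aws autoscaling describe-scaling-activities --auto-scaling-group-name my-auto-scaling-group. It makes two assumptions that you should verify against your own output. First, activity descriptions begin with "Terminating" or "Launching". Second, launches replace terminations in order.

```python
from datetime import datetime


def _ts(value: str) -> datetime:
    # Accept a trailing "Z", which fromisoformat rejects before Python 3.11.
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def recovery_seconds(activities: list) -> list:
    """Pair each termination with the next successful launch; return gaps in seconds."""
    ordered = sorted(activities, key=lambda a: _ts(a["StartTime"]))  # CLI lists newest first
    gaps, pending = [], []
    for act in ordered:
        desc = act["Description"]
        if desc.startswith("Terminating"):
            pending.append(_ts(act["StartTime"]))
        elif desc.startswith("Launching") and act.get("StatusCode") == "Successful" and pending:
            gaps.append((_ts(act["EndTime"]) - pending.pop(0)).total_seconds())
    return gaps
```

For real output, you would pass json.load(f)["Activities"] from a saved copy of the command's JSON.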
Conclusion
Chaos Monkey can help you test the resilience of your applications and confirm that they withstand unexpected disruptions. By simulating the failure of individual instances, you can identify weaknesses and improve the reliability and availability of your applications. Note: This post is an illustrative example of how you can use Chaos Monkey (a Netflix tool) to test the resiliency of applications on AWS, as Netflix does.