Chaos Monkey for Fun and Profit

The core idea of Chaos Engineering is to inject failures proactively in a controlled manner in order to gain confidence in our systems. Automating fault injection with tools like Chaos Monkey represents one of the advanced principles of Chaos Engineering. In this talk, Mathias is going to show you how to run your very own Chaos Monkey with Docker, and how to use it for both automated and manual fault injection.

(Talk given at Chaos Engineering Hamburg meetup: http://www.meetup.com/Chaos-Engineering-Hamburg/events/231567152/)

Mathias Lafeldt

June 15, 2016

Transcript

  1. Chaos Engineering 101
     • Trigger failures before they happen in production
     • Gain confidence that our systems can withstand failures
     • Verify that things behave as we expect
     • Fix them if they don't
     • Netflix: http://principlesofchaos.org/
  2. GameDays at Jimdo
     1. Gather the team in front of a big screen
     2. Think up failure modes and estimate expected impact
     3. Go through chaos experiments together
     4. Write down measured impact
     5. Create follow-up ticket for each flaw
  3. (slide without text)

  4. Chaos Monkey
     • Most famous member of Netflix's Simian Army
     • Randomly terminates EC2 instances during business hours
     • Goal: survive terminations without customer impact
     • Change frequency, probability, and type of terminations
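The two knobs named on this slide, business hours and termination probability, can be pictured as two gates that both have to open before an instance is terminated. The following is a minimal sketch under that assumption. It is not Chaos Monkey's actual scheduler: the function names `is_business_hours` and `should_terminate` and the default hours (9 to 17, Monday to Friday) are invented for illustration.

```shell
#!/bin/sh
# Hypothetical sketch of the two gates described above. This is NOT Chaos
# Monkey's real scheduler; function names and default hours are assumptions.

# is_business_hours HOUR DAY_OF_WEEK [OPEN_HOUR] [CLOSE_HOUR]
# Succeeds on weekdays (1 = Monday .. 5 = Friday) when
# OPEN_HOUR <= HOUR < CLOSE_HOUR.
is_business_hours() {
    hour=${1#0}          # strip a leading zero such as "09" from `date +%H`
    hour=${hour:-0}      # a lone "0" is empty after stripping; restore it
    dow=$2
    open=${3:-9}
    close=${4:-17}
    [ "$dow" -ge 1 ] && [ "$dow" -le 5 ] &&
        [ "$hour" -ge "$open" ] && [ "$hour" -lt "$close" ]
}

# should_terminate PROBABILITY
# Succeeds with the given probability (0 = never, 1.0 = always).
should_terminate() {
    awk -v p="$1" 'BEGIN { srand(); exit !(rand() < p) }'
}

# Demo: decide once for the current time with a 50% probability.
if is_business_hours "$(date +%H)" "$(date +%u)" && should_terminate 0.5; then
    echo "would terminate one instance now"
else
    echo "leaving all instances alone"
fi
```

Restricting terminations to business hours means engineers are around to react, while the probability controls how often a group is hit.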
  5. Quick start
     $ git clone https://github.com/Netflix/SimianArmy .
     $ ./gradlew build # Java 8
     $ vim src/main/resources/{client,simianarmy,chaos}.properties
     $ ./gradlew jettyRun
  6. Configuration
     $ grep -c simianarmy src/main/resources/*.properties
     src/main/resources/chaos.properties:40
     src/main/resources/client.properties:18
     src/main/resources/conformity.properties:27
     src/main/resources/janitor.properties:55
     src/main/resources/log4j.properties:0
     src/main/resources/simianarmy.properties:13
     src/main/resources/volumeTagging.properties:4
  7. Unleash the monkey!
     $ docker run -it --rm \
         -e SIMIANARMY_CLIENT_AWS_ACCOUNTKEY=$AWS_ACCESS_KEY_ID \
         -e SIMIANARMY_CLIENT_AWS_SECRETKEY=$AWS_SECRET_ACCESS_KEY \
         -e SIMIANARMY_CLIENT_AWS_REGION=$AWS_REGION \
         -e SIMIANARMY_CALENDAR_ISMONKEYTIME=true \
         -e SIMIANARMY_CHAOS_ASG_ENABLED=true \
         -e SIMIANARMY_CHAOS_LEASHED=false \
         mlafeldt/simianarmy
  8. Configuration via etcd
     $ export ETCDCTL_ENDPOINT=http://$YOUR_ETCD_IP:2379
     $ etcdctl set /simianarmy/client/aws/accountkey $AWS_ACCESS_KEY_ID
     $ etcdctl set /simianarmy/client/aws/secretkey $AWS_SECRET_KEY
     $ etcdctl set ...
     $ docker run -it --rm \
         -e CONFD_OPTS="-backend=etcd -node=$ETCDCTL_ENDPOINT" \
         mlafeldt/simianarmy
     # More confd backends: Consul, Vault, DynamoDB, etc.
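Comparing the environment variables on the previous slide with the etcd keys here suggests a simple naming rule: a key such as `/simianarmy/client/aws/accountkey` corresponds to the variable `SIMIANARMY_CLIENT_AWS_ACCOUNTKEY`. The sketch below spells out that rule as inferred from the two slides. The helper names `key_to_env` and `env_to_key` are hypothetical and not part of the image or of confd.

```shell
#!/bin/sh
# Sketch of the naming rule inferred from the slides. The helper names are
# hypothetical; they are not provided by confd or the Docker image.

# key_to_env KEY
# /simianarmy/chaos/leashed -> SIMIANARMY_CHAOS_LEASHED
key_to_env() {
    printf '%s\n' "$1" | sed 's|^/||; s|/|_|g' | tr '[:lower:]' '[:upper:]'
}

# env_to_key NAME
# SIMIANARMY_CHAOS_LEASHED -> /simianarmy/chaos/leashed
# Note: this direction is lossy if a key segment itself contains "_".
env_to_key() {
    printf '/%s\n' "$1" | tr '[:upper:]' '[:lower:]' | tr '_' '/'
}

# Demo with names taken from the slides.
key_to_env /simianarmy/client/aws/accountkey
env_to_key SIMIANARMY_CHAOS_LEASHED
```

This makes it easier to move a setting between the `-e` form and the etcd form without looking up each name by hand.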
  9. Trigger a chaos event
     $ chaosmonkey -endpoint http://example.com:8080 \
         -group ExampleAutoScalingGroup \
         -strategy ShutdownInstance
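For readers without the `chaosmonkey` CLI, roughly the same request can be sent with `curl`. The sketch below assumes that the Simian Army REST API accepts a JSON body with the fields `eventType`, `groupType`, `groupName`, and `chaosType` at the path `/simianarmy/api/v1/chaos`. Both the path and the field names are recalled from the Simian Army documentation rather than taken from this talk, so check them against the version you run. The function names are hypothetical.

```shell
#!/bin/sh
# Sketch only: the API path and JSON field names are assumptions to verify
# against the Simian Army REST documentation. Function names are made up.

# chaos_payload GROUP STRATEGY
# Prints the JSON body for an on-demand termination request.
chaos_payload() {
    printf '{"eventType":"CHAOS_TERMINATION","groupType":"ASG","groupName":"%s","chaosType":"%s"}\n' "$1" "$2"
}

# trigger_chaos ENDPOINT GROUP STRATEGY
# Sends the request; defined here but not called in this example.
trigger_chaos() {
    curl -s -X POST -d "$(chaos_payload "$2" "$3")" \
        "$1/simianarmy/api/v1/chaos"
}

# Demo: only print the payload, do not contact any server.
chaos_payload ExampleAutoScalingGroup ShutdownInstance
```

With a reachable endpoint, `trigger_chaos http://example.com:8080 ExampleAutoScalingGroup ShutdownInstance` would send the request.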
  10. Get past chaos events
      $ chaosmonkey -endpoint http://example.com:8080
      InstanceID  AutoScalingGroupName     Region     Strategy          TriggeredAt
      i-741a78f8  ExampleAutoScalingGroup  eu-west-1  DetachVolumes     2016-06-13T14:17:36Z
      i-c538184f  ExampleAutoScalingGroup  eu-west-1  ShutdownInstance  2016-06-13T14:11:18Z
      i-615272eb  ExampleAutoScalingGroup  eu-west-1  ShutdownInstance  2016-06-13T13:48:33Z
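Because the event history is a plain table, it can be summarized with standard tools. The sketch below counts events per strategy using the sample rows from the slide. It assumes the columns are separated by whitespace and that no value contains a space. The function name `count_by_strategy` is hypothetical.

```shell
#!/bin/sh
# Sketch: summarize chaos events per strategy. Assumes whitespace-separated
# columns with the strategy in column 4, as in the sample output above.

# Sample rows copied from the slide; in practice this would be the CLI output.
events='InstanceID AutoScalingGroupName Region Strategy TriggeredAt
i-741a78f8 ExampleAutoScalingGroup eu-west-1 DetachVolumes 2016-06-13T14:17:36Z
i-c538184f ExampleAutoScalingGroup eu-west-1 ShutdownInstance 2016-06-13T14:11:18Z
i-615272eb ExampleAutoScalingGroup eu-west-1 ShutdownInstance 2016-06-13T13:48:33Z'

# count_by_strategy
# Reads the table on stdin, skips the header, prints "STRATEGY COUNT" lines.
count_by_strategy() {
    awk 'NR > 1 && NF >= 5 { n[$4]++ } END { for (s in n) print s, n[s] }' | sort
}

printf '%s\n' "$events" | count_by_strategy
```

Against a live endpoint, the same function could be fed with `chaosmonkey -endpoint http://example.com:8080 | count_by_strategy`.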
  11. List chaos strategies
      $ chaosmonkey -list-strategies
      ShutdownInstance
      BlockAllNetworkTraffic
      DetachVolumes
      BurnCpu
      BurnIo
      KillProcesses
      NullRoute
      FailEc2
      FailDns
      FailDynamoDb
      FailS3
      FillDisk
      NetworkCorruption
      NetworkLatency
      NetworkLoss
  12. AWS integration
      $ export AWS_ACCESS_KEY_ID=...
      $ export AWS_SECRET_ACCESS_KEY=...
      $ export AWS_REGION=...
      # List auto scaling groups
      $ chaosmonkey -list-groups
      # Delete all data from SimpleDB
      $ chaosmonkey -wipe-state SIMIAN_ARMY
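Since the AWS commands depend on three environment variables, a small guard can catch a missing one before any call is made. This is a minimal sketch; the helper name `require_env` is hypothetical and not part of the `chaosmonkey` CLI.

```shell
#!/bin/sh
# Sketch: fail early when required environment variables are missing.
# The helper name is hypothetical; it is not part of the chaosmonkey CLI.

# require_env NAME...
# Succeeds only if every named variable is set and non-empty; reports each
# missing name on stderr. Names must be plain variable names (eval is used).
require_env() {
    missing=0
    for name in "$@"; do
        eval "value=\${$name:-}"
        if [ -z "$value" ]; then
            echo "missing environment variable: $name" >&2
            missing=1
        fi
    done
    return "$missing"
}

# Demo: report whether the AWS settings from the slide are present.
if require_env AWS_ACCESS_KEY_ID AWS_SECRET_ACCESS_KEY AWS_REGION; then
    echo "AWS environment looks complete"
else
    echo "set the variables above before running chaosmonkey"
fi
```

A typical use would be `require_env AWS_ACCESS_KEY_ID AWS_SECRET_ACCESS_KEY AWS_REGION && chaosmonkey -list-groups`.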
  13. Use with Docker
      $ docker run -it --rm -p 8080:8080 \
          -e SIMIANARMY_CHAOS_LEASHED=false \
          -e SIMIANARMY_CHAOS_TERMINATEONDEMAND_ENABLED=true \
          ... mlafeldt/simianarmy
      $ chaosmonkey -endpoint http://$DOCKER_HOST_IP:8080 ...
  14. Chaos Monkey in Wonderland
      • Just another service running on Jimdo's PaaS
      • One monkey per environment (prod/stage)
      • Auth proxy to protect public API endpoint
      • /simianarmy/ as HTTP health check
      • Two replicas behind ELB for high availability
      • Currently on-demand termination only
  15. Production Ready newsletter
      https://tinyletter.com/production-ready
      Published articles include:
      • Chaos Engineering 101
      • Chaos Engineering: A Shift in Mindset
      • Chaos Monkey for Fun and Profit
      • A Little Story about Amazon ECS, systemd, and Chaos Monkey