ChaosBlade x SkyWalking: High Availability Microservices Practices

Preface

Tool Introduction

ChaosBlade

  • Basic resources: Experimental scenarios covering CPU, memory, network, disk, and processes.
  • Java applications: Databases, caches, messaging, the JVM itself, and microservices. Faults can be injected into any specified method.
  • C++ applications: Injecting delays, modifying variables, and tampering with return values at specified methods or lines of code.
  • Docker containers: Killing containers, plus CPU, memory, network, disk, and process scenarios inside containers.
  • Cloud-native platforms: CPU, memory, network, disk, and process scenarios on Kubernetes nodes; Pod network faults and Pod killing; and the container scenarios listed above.

SkyWalking

  • Analysis of services, service instances, and endpoint metrics
  • Root cause analysis
  • Service topology analysis
  • Analysis of services, service instances, and endpoint dependencies
  • Slow service and endpoint detection
  • Performance optimization
  • Distributed tracing and context propagation
  • Detection of database access metrics and slow database access statements (including SQL statements)
  • Alerts

Tool Installation and Usage

ChaosBlade Installation

## Download
wget https://chaosblade.oss-cn-hangzhou.aliyuncs.com/agent/github/0.9.0/chaosblade-0.9.0-linux-amd64.tar.gz
## Decompress
tar -zxf chaosblade-0.9.0-linux-amd64.tar.gz
## Set environment variables
export PATH=$PATH:$(pwd)/chaosblade-0.9.0
## Test
blade -h

ChaosBlade Usage

An easy to use and powerful chaos engineering experiment toolkit
Usage:
blade [command]
Available Commands:
create Create a chaos engineering experiment
destroy Destroy a chaos experiment
...
Create chaos engineering experiments with CPU load
Usage:
blade create cpu fullload
Aliases:
fullload, fl, load
Examples:
# Create a CPU full load experiment
blade create cpu load
# Specifies two random kernel's full load
blade create cpu load --cpu-percent 60 --cpu-count 2
...
Flags:
--blade-release string Blade release package,use this flag when the channel is ssh
--channel string Select the channel for execution, and you can now select SSH
--climb-time string durations(s) to climb
--cpu-count string Cpu count
--cpu-list string CPUs in which to allow burning (0-3 or 1,3)
--cpu-percent string percent of burn CPU (0-100)
...
  • After an experiment is created successfully, ChaosBlade returns a UID. Run blade destroy <UID> to end the experiment and restore the system.
  • If the UID is not available, run blade destroy with the target and action instead (for example, “blade destroy cpu fullload”).
  • Add the “--timeout 10” parameter when creating an experiment to have it recover automatically after ten seconds. The parameter also accepts duration expressions, such as “--timeout 30m” for 30 minutes.
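
The create and destroy cycle in the bullets above can be sketched as a small shell script. This is a minimal sketch, assuming that blade create prints a JSON response whose "result" field carries the UID (for example {"code":200,"success":true,"result":"<UID>"}); the helper names extract_uid and cpu_load_demo are hypothetical and not part of ChaosBlade.

```shell
#!/usr/bin/env bash
# extract_uid is a hypothetical helper, not part of ChaosBlade.
# Assumption: blade prints JSON such as {"code":200,"success":true,"result":"<UID>"}.
extract_uid() {
  printf '%s\n' "$1" |
    sed -n 's/.*"result"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p'
}

# cpu_load_demo is also hypothetical; it only uses flags shown in the help output above.
cpu_load_demo() {
  local response uid
  # Safety net: the experiment recovers by itself after 60 seconds.
  response=$(blade create cpu load --cpu-percent 60 --timeout 60)
  uid=$(extract_uid "$response")
  if [ -z "$uid" ]; then
    echo "no UID in response: $response" >&2
    return 1
  fi
  echo "experiment UID: $uid"
  sleep 10                # observe the system while the fault is active
  blade destroy "$uid"    # recover before the timeout expires
}
```

Calling cpu_load_demo on a test machine with blade on the PATH runs one short experiment. Combining an explicit destroy with --timeout means the system still recovers if the script is interrupted before it reaches the destroy step.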

SkyWalking Installation and Usage

Case on Application Fault Tolerance

Case Environment

Application Topology

Chaos Experiment Steps

  • Develop a chaos experiment plan
  • Define the system's steady-state metrics
  • Make assumptions about the system's fault-tolerance behavior
  • Run the chaos experiment
  • Check the steady-state metrics
  • Record the results and recover from the chaos experiment
  • Fix the problems found
  • Automate continuous verification
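
The core of these steps (check the steady state, inject the fault, check again, recover) can be sketched as a generic shell wrapper. This is only an illustration of the ordering: run_experiment is a hypothetical function, and the three commands it receives are supplied by the caller.

```shell
#!/usr/bin/env bash
# run_experiment: hypothetical wrapper illustrating the step order.
#   $1 = command that checks the steady state (exit 0 means healthy)
#   $2 = command that injects the fault
#   $3 = command that recovers from the fault
run_experiment() {
  local check="$1" inject="$2" recover="$3" status=0

  # Refuse to start if the system is not healthy beforehand.
  if ! eval "$check"; then
    echo "steady state not met before injection; aborting"
    return 2
  fi

  eval "$inject" || { echo "fault injection failed"; return 3; }

  # Check the steady-state metrics while the fault is active.
  if eval "$check"; then
    echo "hypothesis held: steady state kept under fault"
  else
    echo "hypothesis violated: steady state lost under fault"
    status=1
  fi

  # Always recover, whatever the outcome.
  eval "$recover"
  return "$status"
}
```

A return code of 0 means the fault-tolerance assumption held, 1 means it was violated, and 2 or 3 mean the experiment never really started. Running recovery unconditionally keeps a failed hypothesis from leaving the fault in place.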

Case 1

ab -n 10000 -c 2 http://127.0.0.1:8083/cart
  • The average response time (RT) is around 15 ms.
  • The P99 latency is within 20 ms.
  • Set a timeout for calls so that client requests are not blocked for a long time.
  • Configure a circuit-breaking or service degradation policy.
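
The steady-state values above can be read from the ab report with a little awk. This is a minimal sketch that assumes the usual ab report layout (a "Time per request: ... (mean)" line and a "99%" row in the percentile table); ab_summary and p99_within are hypothetical helper names.

```shell
#!/usr/bin/env bash
# ab_summary: reads an ab report on stdin, prints mean RT and P99 in ms.
ab_summary() {
  awk '
    /^Time per request:/ && /\(mean\)/ { mean = $4 }
    $1 == "99%"                         { p99 = $2 }
    END { printf "mean_ms=%s p99_ms=%s\n", mean, p99 }
  '
}

# p99_within LIMIT_MS: reads an ab report on stdin and succeeds only
# if a 99% row exists and its value is at most LIMIT_MS.
p99_within() {
  awk -v limit="$1" '
    $1 == "99%" { p99 = $2; found = 1 }
    END { exit !(found && p99 + 0 <= limit + 0) }
  '
}

# Usage sketch with the command from this case:
#   ab -n 10000 -c 2 http://127.0.0.1:8083/cart > ab.out
#   ab_summary < ab.out
#   p99_within 20 < ab.out && echo "steady state OK"
```

Saving the report to a file first lets the same run feed both helpers, so the summary and the threshold check describe identical measurements.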
Dubbo interface to do delay experiments, support provider and consumer
Usage:
blade create dubbo delay
Examples:
# Invoke com.alibaba.demo.HelloService.hello() service, do delay 3 seconds experiment
blade create dubbo delay --time 3000 --service com.alibaba.demo.HelloService --methodname hello --consumer
Flags:
--appname string The consumer or provider application name
--consumer To tag consumer role experiment.
--effect-count string The count of chaos experiment in effect
--effect-percent string The percent of chaos experiment in effect
--group string The service group
-h, --help help for delay
--methodname string The method name
--offset string delay offset for the time
--override only for java now, uninstall java agent
--pid string The process id
--process string Application process name
--provider To tag provider experiment
--service string The service interface
--time string delay time (required)
--timeout string set timeout for experiment in seconds
--version string the service version
Global Flags:
-d, --debug Set client to DEBUG mode
--uid string Set Uid for the experiment, adapt to docker
  • Look up the target process.
  • Obtain the details of the Dubbo interface to be called.
  • --time 30000: Delay of 30 s (the value is in milliseconds)
  • --service com.alibabacloud.hipstershop.cartserviceapi.service.CartService: Service interface
  • --methodname viewCart: Service method
  • --process frontend: Name of the target Java process
  • --consumer: The target process acts as the Dubbo consumer (client)
blade create dubbo delay --time 30000 --service com.alibabacloud.hipstershop.cartserviceapi.service.CartService --methodname viewCart --process frontend --consumer

Monitoring Metrics

  • The average RT is about 2,000 ms, and the P99 metric is about 2,000 ms.
  • Calls to the /cart endpoint report an error stating that the “com.alibabacloud.hipstershop.cartserviceapi.service.CartService” service is abnormal.
  • A timeout error occurs. The timeout period is 2,000 ms.
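
While the delay is active, the timeout behavior can also be observed from the client side with curl. The sketch below is an assumption-laden illustration: probe_rt and sec_to_ms are hypothetical helpers, and the 35-second client limit is chosen only to exceed the 30 s injected delay so that curl itself does not cut the request short.

```shell
#!/usr/bin/env bash
# sec_to_ms: convert curl's time_total (seconds, e.g. 2.003512) to whole ms.
sec_to_ms() {
  awk -v s="$1" 'BEGIN { printf "%.0f\n", s * 1000 }'
}

# probe_rt URL: hypothetical helper that issues one request and prints
# the HTTP status code and the elapsed time in milliseconds.
probe_rt() {
  local out
  out=$(curl -s -o /dev/null -w '%{http_code} %{time_total}' --max-time 35 "$1")
  # Split "<status> <seconds>" into positional parameters.
  set -- $out
  echo "status=$1 rt_ms=$(sec_to_ms "$2")"
}

# Usage sketch with the endpoint from this case:
#   probe_rt http://127.0.0.1:8083/cart
```

If the call timeout is configured as described, the reported time should stay close to the timeout period instead of the full injected delay.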

Case 2

  • The “com.alibabacloud.hipstershop.cartserviceapi.service.CartService.viewCart” service is normal.
  • --interface eth0: Network interface (NIC)
  • --percent 100: 100% packet loss rate
  • --local-port 8848: Local port 8848
blade create network loss --interface eth0 --percent 100 --local-port 8848
  • The “com.alibabacloud.hipstershop.cartserviceapi.service.CartService.viewCart” service is normal.
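
To confirm that the packet loss rule is actually in effect while the service stays healthy, a TCP probe against the affected port can help. This is a rough sketch: port_reachable is a hypothetical helper built on bash's /dev/tcp pseudo-device and the coreutils timeout command, and it assumes the probe runs from another machine so that its traffic really passes through eth0 on the target host.

```shell
#!/usr/bin/env bash
# port_reachable HOST PORT [TIMEOUT_S]
# Hypothetical helper: succeeds only if a TCP connection opens in time.
# Relies on bash's /dev/tcp pseudo-device and the coreutils timeout command.
port_reachable() {
  timeout "${3:-2}" bash -c "exec 3<>/dev/tcp/$1/$2" 2>/dev/null
}

# Usage sketch, run from another machine; TARGET_HOST is a placeholder:
#   if port_reachable "$TARGET_HOST" 8848 2; then
#     echo "port 8848 still reachable"
#   else
#     echo "port 8848 unreachable (expected while the experiment runs)"
#   fi
```

A probe against 127.0.0.1 on the target host itself would travel over the loopback interface and would therefore not be expected to show the loss injected on eth0.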

Simple Practice

Mysql delay experiment
Usage:
blade create mysql delay
Examples:
# Do a delay 2s experiment for mysql client connection port=3306 INSERT statement
blade create mysql delay --time 2000 --sqltype select --port 3306
Flags:
--database string The database name which used
--effect-count string The count of chaos experiment in effect
--effect-percent string The percent of chaos experiment in effect
-h, --help help for
--host string The database host
--offset string delay offset for the time
--override only for java now, uninstall java agent
--pid string The process id
--port string The database port which used
--process string Application process name
--sqltype string The sql type, for example, select, update and so on.
--table string The first table name in sql.
--time string delay time (required)
--timeout string set timeout for experiment in seconds
Global Flags:
-d, --debug Set client to DEBUG mode
--uid string Set Uid for the experiment, adapt to docker
blade create mysql delay --time 10000 --sqltype select --port 3306
  • --time 10000: Delay of 10 s
  • --sqltype select: Only SELECT statements are affected.
  • --port 3306: Only connections to port 3306 are affected.

Summary

Alibaba Cloud
