How to identify Micro-Bursting in EC2

  • Servers commonly run into poor performance because they exceed their bandwidth allowance. Tools like nload and iftop report bandwidth usage, but they average over intervals that are too coarse to reveal microbursts, which last only microseconds to milliseconds. To diagnose suspected microbursting, an administrator can use the Linux traffic control (tc) tool to shape egress traffic and read the shaper's counters.
  • Specifically, tc can shape outbound traffic and keep per-qdisc statistics. Because the shaper evaluates every packet against the configured rate, its counters register bursts that per-second averages hide. A typical approach is to add a tc qdisc that rate-limits egress traffic to the bandwidth the application is expected to need, then look at the overlimits statistic from tc: if it increases, the application tried to send faster than the configured rate. If microbursts are observed, the application may need tuning to smooth out its transmission pattern.
  • A properly tuned application avoids microbursts and makes full use of its allocated bandwidth. Diagnosing microbursting requires high-resolution traffic analysis, and tc provides the precision needed to characterize application traffic patterns and find ways to prevent excessive bandwidth usage. With careful testing and measurement, you can resolve performance issues caused by microbursts.
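
The overlimits check described above can be scripted. The following is a minimal sketch, assuming an htb root qdisc like the one configured later in this article; parse_overlimits is a hypothetical helper name, not part of tc. It reads `tc -s qdisc show` output on stdin and prints the overlimits value of the first htb qdisc it finds.

```shell
#!/bin/bash
# Hypothetical helper: print the "overlimits" value of the first htb qdisc
# found in `tc -s qdisc show` output supplied on stdin.
parse_overlimits () {
    awk '
        /^qdisc htb/ { in_htb = 1; next }   # statistics for htb follow this line
        /^qdisc /    { in_htb = 0 }         # any other qdisc header ends the section
        in_htb && match($0, /overlimits [0-9]+/) {
            split(substr($0, RSTART, RLENGTH), parts, " ")
            print parts[2]
            exit
        }
    '
}

# Usage (ens5 is the interface name assumed throughout this article):
#   tc -s qdisc show dev ens5 | parse_overlimits
```

Taking one reading before a transfer and one after it gives the number of times the shaper had to hold traffic back during that transfer.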

The tests were performed on an m6i.xlarge. Per [1], it has a burst bandwidth of 10000 Mbps, which it can sustain for up to 30 minutes at least once every 24 hours, after which it reverts to its baseline of 1250 Mbps.

How to configure `tc`?

  • You can use an IP prefix to match and filter traffic. Here I'm matching on destination port 443 for simplicity.
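
For the IP prefix variant, a u32 filter can match on the destination address instead of the port. This is a sketch only: prefix_filter_cmd is a hypothetical helper, 10.0.0.0/16 is a placeholder prefix, and the parent 1:0 and class 1:1 are assumed to be the ones created by the scripts in this article. The helper prints the command rather than running it, so it can be reviewed first.

```shell
#!/bin/bash
# Hypothetical helper: print (do not run) a u32 filter command that sends
# IPv4 traffic destined for a CIDR prefix to class 1:1.
prefix_filter_cmd () {
    local dev=$1 prefix=$2
    echo "tc filter add dev $dev protocol ip parent 1:0 prio 1 u32 match ip dst $prefix flowid 1:1"
}

# Example (placeholder prefix):
#   prefix_filter_cmd ens5 10.0.0.0/16
```

Run the printed command as root after the qdisc and class exist. It takes the place of the port-based filter line.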

First, we will create the script for baseline_1250mbit:

Baseline Bandwidth

#!/bin/bash
# Run as root. Shapes egress traffic to one destination port with HTB.

# Path to the tc command
TC=/sbin/tc
# Interface to shape; list interface names with `ip link`
IF=ens5

# Rate limit for the shaped class, in this case 1250 Mbit/s
LIMIT=1250mbit

# Destination port to shape, in this case 443
PORT=443

# Common prefix for u32 filter commands; see https://man7.org/linux/man-pages/man8/tc.8.html
U32="$TC filter add dev $IF protocol ip parent 1:0 prio 1 u32"

create () {
    echo "== SHAPING INIT =="
    # No class 1:30 is defined, so unclassified traffic bypasses shaping
    # and is counted in direct_packets_stat.
    $TC qdisc add dev $IF root handle 1:0 htb default 30

    $TC class add dev $IF parent 1:0 classid 1:1 htb rate $LIMIT ceil $LIMIT
    # u32 port matches take a value and a mask; 0xffff matches the full port
    $U32 match ip dport $PORT 0xffff flowid 1:1

    echo "== Shaping DONE =="
}

clean () {
    echo "== CLEANING =="
    # Ignore the error tc reports when no root qdisc is installed yet
    $TC qdisc del dev $IF root 2>/dev/null
    echo "== CLEANED =="
}

clean
create

Before Upload:

ubuntu@ip-10-0-22-91:~$ ethtool -S ens5 | grep "exce"
     bw_in_allowance_exceeded: 0
     bw_out_allowance_exceeded: 34658410
     pps_allowance_exceeded: 0
     conntrack_allowance_exceeded: 0
     linklocal_allowance_exceeded: 0

ubuntu@ip-10-0-22-91:~$ tc -s qdisc show
qdisc noqueue 0: dev lo root refcnt 2
 Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
 backlog 0b 0p requeues 0
qdisc htb 1: dev ens5 root refcnt 5 r2q 10 default 0x30 direct_packets_stat 14 direct_qlen 1000
 Sent 1428 bytes 14 pkt (dropped 0, overlimits 0 requeues 0)
 backlog 0b 0p requeues 0

Uploading 3GB to S3 from EC2:

ubuntu@ip-10-0-22-91:~$ aws s3 cp bw_3GB.txt s3://bucket
upload: ./bw_3GB.txt to s3://bucket/bw_3GB.txt

After Upload:

ubuntu@ip-10-0-22-91:~$ ethtool -S ens5 | grep "exce"
     bw_in_allowance_exceeded: 0
     bw_out_allowance_exceeded: 34658410
     pps_allowance_exceeded: 0
     conntrack_allowance_exceeded: 0
     linklocal_allowance_exceeded: 0

ubuntu@ip-10-0-22-91:~$ tc -s qdisc show
qdisc noqueue 0: dev lo root refcnt 2
 Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
 backlog 0b 0p requeues 0
qdisc htb 1: dev ens5 root refcnt 5 r2q 10 default 0x30 direct_packets_stat 1054 direct_qlen 1000
 Sent 3373615083 bytes 2271626 pkt (dropped 0, overlimits 71825 requeues 0)
 backlog 0b 0p requeues 0

  • With the rate limit set to the baseline bandwidth, bw_out_allowance_exceeded does not increment (it stays at 34658410). Only the overlimits counter in tc increments, from 0 to 71825.
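
The before/after comparison can be automated instead of read off by hand. A minimal sketch, assuming bash: measure_delta is a hypothetical helper that takes a shell snippet printing a numeric counter, runs a command, and prints how much the counter grew while the command ran.

```shell
#!/bin/bash
# Hypothetical helper: report how much a numeric counter grew while a
# command ran. $1 is a shell snippet that prints the counter; the
# remaining arguments are the command to run.
measure_delta () {
    local reader=$1; shift
    local before after
    before=$(eval "$reader")
    "$@" >&2                      # keep the command's output off stdout
    after=$(eval "$reader")
    echo $(( after - before ))
}

# Example, using the interface and upload from this article:
#   measure_delta "ethtool -S ens5 | awk '/bw_out_allowance_exceeded/ {print \$2}'" \
#       aws s3 cp bw_3GB.txt s3://bucket
```

A result of 0 for bw_out_allowance_exceeded matches the baseline test above. The same helper works for the tc overlimits counter when given a reader for it.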

Second, we will create the script for burstable_10000mbit:

#!/bin/bash
# Run as root. Shapes egress traffic to one destination port with HTB.

# Path to the tc command
TC=/sbin/tc
# Interface to shape; list interface names with `ip link`
IF=ens5

# Rate limit for the shaped class, in this case 10000 Mbit/s
LIMIT=10000mbit

# Destination port to shape, in this case 443
PORT=443

# Common prefix for u32 filter commands; see https://man7.org/linux/man-pages/man8/tc.8.html
U32="$TC filter add dev $IF protocol ip parent 1:0 prio 1 u32"

create () {
    echo "== SHAPING INIT =="
    # No class 1:30 is defined, so unclassified traffic bypasses shaping
    # and is counted in direct_packets_stat.
    $TC qdisc add dev $IF root handle 1:0 htb default 30

    $TC class add dev $IF parent 1:0 classid 1:1 htb rate $LIMIT ceil $LIMIT
    # u32 port matches take a value and a mask; 0xffff matches the full port
    $U32 match ip dport $PORT 0xffff flowid 1:1

    echo "== Shaping DONE =="
}

clean () {
    echo "== CLEANING =="
    # Ignore the error tc reports when no root qdisc is installed yet
    $TC qdisc del dev $IF root 2>/dev/null
    echo "== CLEANED =="
}

clean
create

Before Upload:

ubuntu@ip-10-0-22-91:~$ ethtool -S ens5 | grep "exce"
     bw_in_allowance_exceeded: 0
     bw_out_allowance_exceeded: 34658410
     pps_allowance_exceeded: 0
     conntrack_allowance_exceeded: 0
     linklocal_allowance_exceeded: 0

ubuntu@ip-10-0-22-91:~$ tc -s qdisc show
qdisc noqueue 0: dev lo root refcnt 2
 Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
 backlog 0b 0p requeues 0
qdisc htb 1: dev ens5 root refcnt 5 r2q 10 default 0x30 direct_packets_stat 8 direct_qlen 1000
 Sent 824 bytes 8 pkt (dropped 0, overlimits 0 requeues 0)
 backlog 0b 0p requeues 0

Uploading 3GB to S3 from EC2:

ubuntu@ip-10-0-22-91:~$ aws s3 cp bw_3GB.txt s3://bucket
upload: ./bw_3GB.txt to s3://bucket/bw_3GB.txt

After Upload:

ubuntu@ip-10-0-22-91:~$ ethtool -S ens5 | grep "exce"
     bw_in_allowance_exceeded: 0
     bw_out_allowance_exceeded: 34697386
     pps_allowance_exceeded: 0
     conntrack_allowance_exceeded: 0
     linklocal_allowance_exceeded: 0

ubuntu@ip-10-0-22-91:~$ tc -s qdisc show
qdisc noqueue 0: dev lo root refcnt 2
 Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
 backlog 0b 0p requeues 0
qdisc htb 1: dev ens5 root refcnt 5 r2q 10 default 0x30 direct_packets_stat 253701 direct_qlen 1000
 Sent 6778238404 bytes 4893281 pkt (dropped 0, overlimits 860006 requeues 388)
 backlog 0b 0p requeues 388

  • With the rate limit set to the burst bandwidth, both counters increment: bw_out_allowance_exceeded rises from 34658410 to 34697386, and the overlimits counter in tc rises from 0 to 860006.

An increase in bw_out_allowance_exceeded doesn't necessarily mean that packets were dropped; queued packets also increase the counter. If you see a sharp increase in bw_out_allowance_exceeded together with a performance impact, and your application needs to sustain burst-level bandwidth for long periods, it is time to move to an instance type that offers more bandwidth.
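
To spot a sharp increase as it happens, the counter can be sampled in a loop. This is a rough sketch under stated assumptions: read_bw_out, format_sample, and watch_bw_out are hypothetical helper names, the one-second default interval is arbitrary, and the output format is illustrative.

```shell
#!/bin/bash
# Read bw_out_allowance_exceeded for a device (needs ethtool and a driver
# that exposes this counter, such as ENA on EC2).
read_bw_out () {
    ethtool -S "$1" | awk '/bw_out_allowance_exceeded/ { print $2 }'
}

# Format one sample line from the previous and current counter values.
format_sample () {
    echo "bw_out_allowance_exceeded +$(( $2 - $1 ))"
}

# Print the per-interval increase until interrupted with Ctrl-C.
watch_bw_out () {
    local dev=$1 interval=${2:-1} prev cur
    prev=$(read_bw_out "$dev")
    while sleep "$interval"; do
        cur=$(read_bw_out "$dev")
        format_sample "$prev" "$cur"
        prev=$cur
    done
}

# Usage:
#   watch_bw_out ens5        # sample once per second
```

Lines that stay at +0 while the application is busy suggest the allowance is not being exceeded. Repeated non-zero lines during a slowdown point at the bandwidth limit.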

Reference:

[1] https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/ebs-optimized.html#current
