How to identify Micro-Bursting in EC2

  • It is common for servers to exceed their bandwidth allowance and suffer poor performance as a result. Tools like nload and iftop provide bandwidth usage statistics but lack the granularity to identify microburst events at the microsecond level. To diagnose potential microbursting, an admin can use the Linux traffic control (tc) tool to shape and monitor traffic with high precision.
  • Specifically, the tc command can shape outbound traffic and record statistics with microsecond resolution. A typical approach is to add a tc qdisc that rate-limits egress traffic to the bandwidth the application is expected to need. Microbursts can then be detected by analyzing the overlimits statistics reported by tc. If microbursts are observed, the application may need to be tuned to smooth out its transmission pattern.
  • A properly tuned application can avoid microbursts and make full use of its allocated bandwidth. Diagnosing microbursting requires high-resolution traffic analysis, and tc provides the precision needed to characterize an application's traffic pattern and find where it exceeds its allowance. With careful testing and measurement, you can resolve performance issues caused by microbursts.
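As a rough sketch of the "analyze the overlimits statistics" step, the helper below pulls the overlimits counter for one interface out of `tc -s qdisc show` output. The function name `get_overlimits` is made up for illustration, and the parsing assumes the output layout shown in the transcripts later in this article.

```shell
#!/bin/bash
# get_overlimits is a hypothetical helper, not part of tc.
# It reads `tc -s qdisc show` output on stdin and prints the overlimits
# counter of the first qdisc attached to the given interface.
get_overlimits() {
    awk -v dev="$1" '
        # Header line, e.g. "qdisc htb 1: dev ens5 root ...": field 5 is the device.
        /^qdisc/ { in_dev = ($5 == dev) }
        # Stats line, e.g. " Sent ... (dropped 0, overlimits 71825 requeues 0)".
        in_dev {
            for (i = 1; i < NF; i++)
                if ($i == "overlimits") { print $(i + 1); exit }
        }
    '
}

# Example (needs tc and a live interface):
#   tc -s qdisc show dev ens5 | get_overlimits ens5
```

Reading the counter before and after a transfer and subtracting the two values gives the number of times the shaper had to hold traffic back during that transfer.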

The test was performed on an m6i.xlarge, which has a burstable bandwidth of 10000 Mbps that it can sustain for a maximum of 30 minutes at least once every 24 hours, after which it reverts to its baseline of 1250 Mbps [1].
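To see why per-second averages can hide a microburst at these limits, here is a small back-of-the-envelope sketch. The helper names and the 1 MB burst are assumptions chosen for illustration; only the 1250 and 10000 Mbit/s figures come from the test setup above.

```shell
#!/bin/bash
# Hypothetical helpers for rough numbers (integer arithmetic only).

# Bytes that fit into one millisecond at a given rate in Mbit/s.
# rate * 10^6 bits/s / 8 bits per byte / 1000 ms per s = rate * 125
bytes_per_ms() {
    echo $(( $1 * 125 ))
}

# Average rate in Mbit/s for BYTES sent within WINDOW_US microseconds.
# Bits per microsecond is numerically equal to Mbit/s.
rate_mbit() {
    echo $(( $1 * 8 / $2 ))
}

bytes_per_ms 1250            # baseline: 156250 bytes per ms
bytes_per_ms 10000           # burst:    1250000 bytes per ms
rate_mbit 1000000 1000       # 1 MB sent within 1 ms    -> 8000 Mbit/s
rate_mbit 1000000 1000000    # same 1 MB averaged over 1 s -> 8 Mbit/s
```

A sender that emits 1 MB in a single millisecond and then stays idle looks like 8 Mbit/s to a tool that averages over one second, yet for that millisecond it runs well above the 1250 Mbit/s baseline.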

How to configure `tc`?

  • You can use an IP prefix to match/filter traffic; here I'm using destination port 443 for simplicity.
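For the IP prefix variant, a filter could look roughly like the sketch below. It assumes the same HTB layout as the scripts in this article (root handle 1:0, class 1:1); the helper `build_prefix_filter` and the prefix 10.0.0.0/16 are placeholders. The helper only prints the command so that it can be reviewed before being run as root.

```shell
#!/bin/bash
# build_prefix_filter is a hypothetical helper. It prints a tc u32 filter
# command that matches on a destination IP prefix instead of a port.
build_prefix_filter() {
    local dev=$1 prefix=$2 flowid=$3
    echo "tc filter add dev $dev protocol ip parent 1:0 prio 1 u32 match ip dst $prefix flowid $flowid"
}

# 10.0.0.0/16 is a placeholder; substitute the prefix you want to shape.
build_prefix_filter ens5 10.0.0.0/16 1:1
```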

First, we will create a script for baseline_1250mbit

Baseline Bandwidth

#!/bin/bash
# Path to the tc command
TC=/sbin/tc
# Interface to shape; use `ip link` or ifconfig to find its name
IF=ens5

# Bandwidth to limit the interface to, in this case 1250 Mbit/s
LIMIT=1250mbit

# Destination port to limit, in this case 443
PORT=443

# u32 filter command prefix; see https://man7.org/linux/man-pages/man8/tc.8.html
U32="$TC filter add dev $IF protocol ip parent 1:0 prio 1 u32"

create () {
    echo "== SHAPING INIT =="
    # Root HTB qdisc. No class 1:30 is defined, so unmatched traffic is not
    # shaped and is counted under direct_packets_stat.
    $TC qdisc add dev $IF root handle 1:0 htb default 30

    # Class 1:1 is capped at $LIMIT; the filter steers traffic to $PORT into it.
    # u32 port matches take a value and a mask (0xffff matches the full port).
    $TC class add dev $IF parent 1:0 classid 1:1 htb rate $LIMIT ceil $LIMIT
    $U32 match ip dport $PORT 0xffff flowid 1:1

    echo "== Shaping DONE =="
}

clean () {
    echo "== CLEANING =="
    # Reports an error on the first run, when no root qdisc has been added yet
    $TC qdisc del dev $IF root
    echo "== CLEANED =="
}

clean
create

Before Upload:

ubuntu@ip-10-0-22-91:~$ ethtool -S ens5 | grep "exce"
     bw_in_allowance_exceeded: 0
     bw_out_allowance_exceeded: 34658410
     pps_allowance_exceeded: 0
     conntrack_allowance_exceeded: 0
     linklocal_allowance_exceeded: 0

ubuntu@ip-10-0-22-91:~$ tc -s qdisc show
qdisc noqueue 0: dev lo root refcnt 2
 Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
 backlog 0b 0p requeues 0
qdisc htb 1: dev ens5 root refcnt 5 r2q 10 default 0x30 direct_packets_stat 14 direct_qlen 1000
 Sent 1428 bytes 14 pkt (dropped 0, overlimits 0 requeues 0)
 backlog 0b 0p requeues 0

Uploading 3GB to S3 from EC2:

ubuntu@ip-10-0-22-91:~$ aws s3 cp bw_3GB.txt s3://bucket
upload: ./bw_3GB.txt to s3://bucket/bw_3GB.txt

After Upload:

ubuntu@ip-10-0-22-91:~$ ethtool -S ens5 | grep "exce"
     bw_in_allowance_exceeded: 0
     bw_out_allowance_exceeded: 34658410
     pps_allowance_exceeded: 0
     conntrack_allowance_exceeded: 0
     linklocal_allowance_exceeded: 0

ubuntu@ip-10-0-22-91:~$ tc -s qdisc show
qdisc noqueue 0: dev lo root refcnt 2
 Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
 backlog 0b 0p requeues 0
qdisc htb 1: dev ens5 root refcnt 5 r2q 10 default 0x30 direct_packets_stat 1054 direct_qlen 1000
 Sent 3373615083 bytes 2271626 pkt (dropped 0, overlimits 71825 requeues 0)
 backlog 0b 0p requeues 0

  • We can see that when the rate limit is set to the baseline bandwidth, bw_out_allowance_exceeded does not increment (it stays at 34658410); only the overlimits counter in tc increments (from 0 to 71825).
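The before/after comparison of the ENA counter can be scripted. The sketch below is one possible way to do it; `ena_counter` and `measure` are hypothetical helper names, and the parsing assumes the `ethtool -S` layout shown in the transcripts above.

```shell
#!/bin/bash
# ena_counter and measure are hypothetical helpers, not part of ethtool.

# Print the value of one counter from `ethtool -S` output read on stdin.
ena_counter() {
    awk -v name="$1:" '$1 == name { print $2; exit }'
}

# Run a command and report how far bw_out_allowance_exceeded moved meanwhile.
measure() {
    local dev=$1; shift
    local before after
    before=$(ethtool -S "$dev" | ena_counter bw_out_allowance_exceeded)
    "$@"
    after=$(ethtool -S "$dev" | ena_counter bw_out_allowance_exceeded)
    echo "bw_out_allowance_exceeded delta: $(( after - before ))"
}

# Example (needs ethtool and the ENA driver):
#   measure ens5 aws s3 cp bw_3GB.txt s3://bucket
```

A delta of zero alongside a growing tc overlimits counter corresponds to the result above: the local shaper absorbed the bursts before they reached the instance's allowance.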

Second, we will create a script for burstable_10000mbit

#!/bin/bash
# Path to the tc command
TC=/sbin/tc
# Interface to shape; use `ip link` or ifconfig to find its name
IF=ens5

# Bandwidth to limit the interface to, in this case 10000 Mbit/s
LIMIT=10000mbit

# Destination port to limit, in this case 443
PORT=443

# u32 filter command prefix; see https://man7.org/linux/man-pages/man8/tc.8.html
U32="$TC filter add dev $IF protocol ip parent 1:0 prio 1 u32"

create () {
    echo "== SHAPING INIT =="
    # Root HTB qdisc. No class 1:30 is defined, so unmatched traffic is not
    # shaped and is counted under direct_packets_stat.
    $TC qdisc add dev $IF root handle 1:0 htb default 30

    # Class 1:1 is capped at $LIMIT; the filter steers traffic to $PORT into it.
    # u32 port matches take a value and a mask (0xffff matches the full port).
    $TC class add dev $IF parent 1:0 classid 1:1 htb rate $LIMIT ceil $LIMIT
    $U32 match ip dport $PORT 0xffff flowid 1:1

    echo "== Shaping DONE =="
}

clean () {
    echo "== CLEANING =="
    # Reports an error on the first run, when no root qdisc has been added yet
    $TC qdisc del dev $IF root
    echo "== CLEANED =="
}

clean
create

Before Upload:

ubuntu@ip-10-0-22-91:~$ ethtool -S ens5 | grep "exce"
     bw_in_allowance_exceeded: 0
     bw_out_allowance_exceeded: 34658410
     pps_allowance_exceeded: 0
     conntrack_allowance_exceeded: 0
     linklocal_allowance_exceeded: 0

ubuntu@ip-10-0-22-91:~$ tc -s qdisc show
qdisc noqueue 0: dev lo root refcnt 2
 Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
 backlog 0b 0p requeues 0
qdisc htb 1: dev ens5 root refcnt 5 r2q 10 default 0x30 direct_packets_stat 8 direct_qlen 1000
 Sent 824 bytes 8 pkt (dropped 0, overlimits 0 requeues 0)
 backlog 0b 0p requeues 0

Uploading 3GB to S3 from EC2:

ubuntu@ip-10-0-22-91:~$ aws s3 cp bw_3GB.txt s3://bucket
upload: ./bw_3GB.txt to s3://bucket/bw_3GB.txt

After Upload:

ubuntu@ip-10-0-22-91:~$ ethtool -S ens5 | grep "exce"
     bw_in_allowance_exceeded: 0
     bw_out_allowance_exceeded: 34697386
     pps_allowance_exceeded: 0
     conntrack_allowance_exceeded: 0
     linklocal_allowance_exceeded: 0

ubuntu@ip-10-0-22-91:~$ tc -s qdisc show
qdisc noqueue 0: dev lo root refcnt 2
 Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
 backlog 0b 0p requeues 0
qdisc htb 1: dev ens5 root refcnt 5 r2q 10 default 0x30 direct_packets_stat 253701 direct_qlen 1000
 Sent 6778238404 bytes 4893281 pkt (dropped 0, overlimits 860006 requeues 388)
 backlog 0b 0p requeues 388

  • We can see that when the rate limit is set to the burstable bandwidth, both bw_out_allowance_exceeded (from 34658410 to 34697386) and the overlimits counter in tc (from 0 to 860006) increment.

An increase in bw_out_allowance_exceeded doesn't necessarily mean that packets were dropped; queuing can also increase the counter. If you see a sharp increase in bw_out_allowance_exceeded together with a performance impact, and your application needs to sustain burst bandwidth for long hours, it is time to move to an instance type that offers more bandwidth.
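One way to watch for such a sharp increase is to sample the counter at a fixed interval and look at the change between samples. The sketch below assumes one sample per line on stdin; `flag_spikes` is a hypothetical helper and the threshold of 1000 is an arbitrary value that would need tuning for a real workload.

```shell
#!/bin/bash
# flag_spikes is a hypothetical helper. It reads one counter sample per line
# on stdin and prints the change between consecutive samples, marking any
# change above the given threshold.
flag_spikes() {
    awk -v limit="$1" '
        NR > 1 { d = $1 - prev; printf "%d%s\n", d, (d > limit ? " SPIKE" : "") }
        { prev = $1 }
    '
}

# Example: sample once per second (needs ethtool and the ENA driver)
#   while sleep 1; do
#       ethtool -S ens5 | awk '$1 == "bw_out_allowance_exceeded:" { print $2 }'
#   done | flag_spikes 1000
```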

Reference:

[1] https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/ebs-optimized.html#current
