esBPF: Stress-Testing Software-Offload Against iptables

The esBPF project has been going on for over a year. It began with a question: is it worth filtering ingress packets at the Software-Offload layer instead of in the network stack? Software-Offload is similar to Hardware-Offload, except that it works inside the Ethernet driver. Now that the prototype has been released, it is time for stress-testing, with iptables as the point of comparison.

Before walking through the article, let me define a few abbreviations so I don't have to keep typing the long names:

Long Term        Short Term
Raspberry Pi 3   Rpi3
Host Machine     Host

Testbed

Host and Rpi3 are connected to the same LAN through the AP shown below. The AP supports HW-offload and runs in bridge mode, so its kernel does not get involved in forwarding packets between them.

                    High-Performance AP
                      - HW-offload Supported
                      - Bridge Mode
                    +-----------------+
                    |   Wireless AP   |
                    +-----------------+
      100Mbps link    |             |     1Gbps link
           +----------+             +-----------+
           |                                    |
+-------------------+                 +-------------------+
| Raspberry Pi 3    |                 | Host Machine      |
| (192.168.219.103) |                 | (192.168.219.108) |
+-------------------+                 +-------------------+

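The hping3 program is used for the stress test; it simply floods Rpi3 with ICMP packets. Here --icmp selects ICMP mode, --faster is an alias for -i u1 (a 1-microsecond interval between packets), and -d 20 sets a 20-byte data payload: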

$ hping3 --icmp --faster 192.168.219.103 -d 20

Tuning the Raspberry Pi 3 for the test

  • Ubuntu 22.10 (Kinetic Kudu), kernel 5.19.0-1007 (arm64)
  • CONFIG_HOTPLUG_CPU enabled, so CPU cores can be switched on and off
  • smsc95xx-esbpf, a customized esBPF-based Ethernet driver
  • wlan0 interface disabled so it does not interfere with routing

The Rpi3 boots with maxcpus=2 on the kernel command line, so only 2 of its 4 cores are brought online and the full traffic load lands on a fixed number of cores. Hence there are 2 online and 2 offline cores:

ubuntu@ubuntu:~$ lscpu
Architecture:            aarch64
  CPU op-mode(s):        32-bit, 64-bit
  Byte Order:            Little Endian
CPU(s):                  4
  On-line CPU(s) list:   0,1
  Off-line CPU(s) list:  2,3
Vendor ID:               ARM
  Model name:            Cortex-A53

A brief look at smsc95xx-esbpf

Once the driver has been loaded into the kernel, two important files appear under the /proc/smsc95xx/esbpf directory, each responsible for:

  1. rx_enable: turns esBPF operation on or off.
  2. rx_hooks: receives the cBPF instructions written to it by a user-space program.

Stress-testing

We will look at the mpstat output and compare the NET_RX counters in /proc/softirqs before and after running hping3. In each case, hping3 runs on Host for 60 seconds.

Here is the CPU usage of Rpi3 while idle. Before the flood starts, the %idle column is almost the same in both test cases, iptables and Software-Offload:

$ mpstat -P ALL 3
CPU    %usr   %nice    %sys %iowait    %irq   %soft  %steal  %guest  %gnice   %idle
all    0.00    0.00    0.17    0.00    0.00    0.17    0.00    0.00    0.00   99.66
  0    0.00    0.00    0.34    0.00    0.00    0.00    0.00    0.00    0.00   99.66
  1    0.00    0.00    0.00    0.00    0.00    0.34    0.00    0.00    0.00   99.66

1. iptables

In the first test, the following rule is appended to the INPUT chain on Rpi3. As a result, one of the CPUs is kept completely busy with softirq processing (CPU1 reaches 100% %soft and 0% %idle).

$ iptables -A INPUT -p icmp -j DROP
$ iptables -nvL
Chain INPUT (policy ACCEPT 0 packets, 0 bytes)
 pkts bytes target     prot opt in     out     source               destination
    0     0 DROP       icmp --  *      *       0.0.0.0/0            0.0.0.0/0

# NET_RX softirq count before massive traffic
                    CPU0       CPU1       CPU2       CPU3
      NET_RX:        123         66          0          0

# NET_RX softirq count after that
                    CPU0       CPU1       CPU2       CPU3
      NET_RX:      15040      35021          0          0

# mpstat
CPU    %usr   %nice    %sys %iowait    %irq   %soft  %steal  %guest  %gnice   %idle
all    0.00    0.00    0.18    0.00    0.00   52.89    0.00    0.00    0.00   46.94
  0    0.00    0.00    0.37    0.00    0.00    0.74    0.00    0.00    0.00   98.89
  1    0.00    0.00    0.00    0.00    0.00  100.00    0.00    0.00    0.00    0.00

2. esBPF

In the second test, the same type of packets is dropped in Software-Offload, in other words, inside the driver. Two tools are involved: tcpdump, which compiles a filter expression into cBPF instructions (tcpdump -dd), and filter_icmp, which loads them. Since filter_icmp already has the cBPF instructions hard-coded, tcpdump isn't needed at this point.

The hard-coded part is as follows:

#include <linux/filter.h>  /* struct sock_filter */

/* Matches IPv4 ICMP frames: returns 0x40000 (262144) on a match, 0 otherwise */
struct sock_filter insns[] = {
  /* tcpdump -dd -nn icmp */
  { 0x28, 0, 0, 0x0000000c },
  { 0x15, 0, 3, 0x00000800 },
  { 0x30, 0, 0, 0x00000017 },
  { 0x15, 0, 1, 0x00000001 },
  { 0x6, 0, 0, 0x00040000 },
  { 0x6, 0, 0, 0x00000000 },
};

filter_icmp is run with the following command, which writes the above instructions to the esBPF module through rx_hooks. esBPF is then switched on through rx_enable.

$ sudo ./filter_icmp /proc/smsc95xx/esbpf/rx_hooks
$ echo 1 | sudo tee /proc/smsc95xx/esbpf/rx_enable

Even though hping3 sends the same flood, NET_RX barely rises this time: by 25 in total (12 on CPU0 and 13 on CPU1), compared with 49,872 in the iptables case.

# NET_RX softirq count before massive traffic
                    CPU0       CPU1       CPU2       CPU3
      NET_RX:        129         81          0          0

# NET_RX softirq count after that
                    CPU0       CPU1       CPU2       CPU3
      NET_RX:        141         94          0          0

Also, the share of CPU time spent in softirq (%soft) ranges from about 8% in the best case to about 27% in the worst case:

# mpstat in the best case
CPU    %usr   %nice    %sys %iowait    %irq   %soft  %steal  %guest  %gnice   %idle
all    0.00    0.00    0.64    0.00    0.00    7.99    0.00    0.00    0.00   91.37
  0    0.00    0.00    0.65    0.00    0.00    6.54    0.00    0.00    0.00   92.81
  1    0.00    0.00    0.62    0.00    0.00    9.38    0.00    0.00    0.00   90.00

# mpstat in the worst case
CPU    %usr   %nice    %sys %iowait    %irq   %soft  %steal  %guest  %gnice   %idle
all   18.31    0.00    4.58    0.96    0.00   27.47    0.00    0.00    0.00   48.67
  0   14.50    0.00    4.00    1.00    0.00   26.00    0.00    0.00    0.00   54.50
  1   21.86    0.00    5.12    0.93    0.00   28.84    0.00    0.00    0.00   43.26

Note that you may occasionally see a few ICMP packets reach the network stack even with esBPF enabled. Don't worry: they come from the lo interface, which never passes through the smsc95xx driver, so esBPF never sees them.

Conclusion

esBPF works at the Software-Offload layer, also known as the device-driver layer, whereas Netfilter, the framework iptables is built on, works in the network stack. esBPF therefore drops incoming packets that match its filters at the tasklet level instead of in NET_RX (part of the network stack), and, as the esBPF results show, the kernel does not have to do any extra work for them.

In some cases, esBPF can do better than packet filtering in the network stack, even though its worst case consumes roughly 3.4 times as much softirq time as its best case (27.47% vs 7.99% %soft). Of course, this also depends on how long the cBPF program loaded into esBPF is.

The project is still in progress, with planned work on making it more flexible, optimizing it, and adding a caching mechanism.

This stress test convinced me that the project is worth more effort and that I should keep working on it; at least I am not wasting my time. It was also a good experience to take responsibility for the entire process, from design to testing.

I hope everyone has enjoyed the article. Cheers ;-)