[Oisf-users] 1Gbps NIDS performance tuning
Darren Spruell
phatbuckett at gmail.com
Tue May 20 18:00:33 UTC 2014
Hi,
Hoping for some performance tuning guidance on the following system:
Suricata 2.0 RELEASE
CentOS 6.5 amd64
Linux 2.6.32-431.17.1.el6.x86_64
libpcre 8.35
luajit 2.0.3
libpcap-1.4.0-1.20130826git2dbcaa1.el6.x86_64
12 core (2x6) Intel(R) Xeon(R) CPU E5-2630 v2 @ 2.60GHz (24 cpu w/HT)
64GB RAM
For the time being we're held to this 2.6.32 kernel, and we're wondering
whether we can achieve suitable performance (minimal packet drop) with
AF_PACKET. PF_RING may be an option, but likely only vanilla PF_RING
rather than DNA/zero-copy, due to licensing. Can we get close to 0% drop
with the hardware/kernel/ruleset described below? Is an upgraded kernel a
major factor in achieving better performance for this configuration and
traffic profile?
I find pretty good information about 10Gbps Suricata tuning (Intel
82599 adapters, typically) but I'm not certain what pieces of network
adapter setup would apply to a 1Gbps adapter:
43:00.0 Ethernet controller: Intel Corporation 82576 Gigabit Network
Connection (rev 01)
43:00.1 Ethernet controller: Intel Corporation 82576 Gigabit Network
Connection (rev 01)
44:00.0 Ethernet controller: Intel Corporation 82576 Gigabit Network
Connection (rev 01)
44:00.1 Ethernet controller: Intel Corporation 82576 Gigabit Network
Connection (rev 01)
Updated igb driver to latest:
driver: igb
version: 5.2.5
firmware-version: 1.2.1
bus-info: 0000:43:00.1
supports-statistics: yes
supports-test: yes
supports-eeprom-access: yes
supports-register-dump: yes
supports-priv-flags: no
Is it proper to disable all NIC offloading features?
$ sudo ethtool -k eth6
Features for eth6:
rx-checksumming: off
tx-checksumming: off
scatter-gather: off
tcp-segmentation-offload: off
udp-fragmentation-offload: off
generic-segmentation-offload: off
generic-receive-offload: off
large-receive-offload: off
ntuple-filters: off
receive-hashing: off
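(For reference, the offloads above were turned off with roughly the
following, run per capture interface; eth6 shown, and the feature
keywords are the legacy ethtool -K set -- drop any keyword the driver
rejects:)

$ sudo ethtool -K eth6 rx off tx off sg off tso off ufo off gso off gro off lro off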
$ sudo ethtool -g eth6
Ring parameters for eth6:
Pre-set maximums:
RX: 4096
RX Mini: 0
RX Jumbo: 0
TX: 4096
Current hardware settings:
RX: 256
RX Mini: 0
RX Jumbo: 0
TX: 256
$ sudo ethtool -n eth6
1 RX rings available
rxclass: Cannot get RX class rule count: Operation not supported
RX classification rule retrieval failed
Traffic throughput is around 500-800 Mbps, composed mostly of TCP
streams (forward proxy requests, clients <-> proxies).
Packet size brackets for interface bond1:

Packet Size (bytes)     Count      Packet Size (bytes)     Count
1 to 75:             31628146      751 to 825:             458232
76 to 150:            2284335      826 to 900:             361877
151 to 225:            651222      901 to 975:             672172
226 to 300:            524288      976 to 1050:            501744
301 to 375:            613914      1051 to 1125:           323661
376 to 450:           1391229      1126 to 1200:           362991
451 to 525:           1032754      1201 to 1275:          1312685
526 to 600:            818482      1276 to 1350:           341888
601 to 675:           7107770      1351 to 1425:           636282
676 to 750:            718379      1426 to 1500+:        34102516
Proto/Port   Pkts      Bytes      PktsTo    BytesTo    PktsFrom  BytesFrom
TCP/3128     3788780   2948M      1397043   219783k    2391737   2728M
TCP/443      40160     11690875   21846     2569412    18314     9121463
TCP/80       13948     10720193   5295      342935     8653      10377258
UDP/53       5749      756317     2925      204953     2824      551364
UDP/161      3260      527866     1710      219259     1550      308607
TCP/43       4969      373030     2493      154131     2476      218899
UDP/514      281       67116      267       66004      14        1112
TCP/22       129       27147      64        7581       65        19566
UDP/123      8         608        4         304        4         304

Protocol data rates: 555192.46 kbps total, 36979.02 kbps in, 518213.43 kbps out
Some flows are sizable (top 10 over a monitoring period of several minutes):
123715221 bytes
43233291 bytes
25762925 bytes
23353052 bytes
18263680 bytes
16888624 bytes
15858329 bytes
14250494 bytes
14081114 bytes
13980641 bytes
...many are quite small (smallest -> 46 bytes)
I intend to run a lightly tuned Emerging Threats ruleset with
something around 12K-13K rules enabled (current untuned rule
breakout):
20/5/2014 -- 09:39:42 - <Info> - 14341 signatures processed. 750 are
IP-only rules, 4046 are inspecting packet payload, 10997 inspect
application layer, 85 are decoder event only
The current configuration attempt has about a 50% drop rate; stats.log
is at http://dpaste.com/246MJP9.txt. Changes from the default config
(roughly how these sit in suricata.yaml is sketched after the list):
- max-pending-packets: 3000
- eve-log and http-log disabled
- af-packet.ring-size: 524288
- af-packet.buffer-size: 65536
- detect-engine.profile: high
- defrag.memcap: 8gb
- flow.memcap: 16gb
- flow.prealloc: 1000000
- stream.memcap: 32gb
- stream.prealloc-sessions: 1024000
- reassembly.memcap: 16gb
- stream.depth: 6mb
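For concreteness, roughly how those settings sit in suricata.yaml; the
af-packet interface/cluster-id/cluster-type lines below are illustrative
placeholders to make the fragment self-contained, not a verbatim copy of
my file:

max-pending-packets: 3000

af-packet:
  - interface: bond1            # the capture bond described above
    cluster-id: 99              # placeholder
    cluster-type: cluster_flow  # placeholder (flow-based load balancing)
    defrag: yes                 # placeholder
    ring-size: 524288
    buffer-size: 65536

detect-engine:
  - profile: high

defrag:
  memcap: 8gb

flow:
  memcap: 16gb
  prealloc: 1000000

stream:
  memcap: 32gb
  prealloc-sessions: 1024000
  reassembly:
    memcap: 16gb
    depth: 6mb                  # the "stream.depth" item above; lives under reassembly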
The rules that do fire are probably an artifact of the packet drops
breaking session reassembly (my guess): http://dpaste.com/1B0RZC2.txt

For reference, the suricata binary links against:
linux-vdso.so.1 => (0x00007fff64dff000)
libhtp-0.5.10.so.1 => /usr/local/lib/libhtp-0.5.10.so.1
(0x00007f55ad857000)
libluajit-5.1.so.2 => /usr/local/lib/libluajit-5.1.so.2
(0x00007f55ad5e7000)
libmagic.so.1 => /usr/lib64/libmagic.so.1 (0x00000034f8600000)
libcap-ng.so.0 => /usr/local/lib/libcap-ng.so.0 (0x00007f55ad3e2000)
libpcap.so.1 => /usr/lib64/libpcap.so.1 (0x00000034f4e00000)
libnet.so.1 => /lib64/libnet.so.1 (0x00007f55ad1c8000)
libpthread.so.0 => /lib64/libpthread.so.0 (0x00000034f3600000)
libyaml-0.so.2 => /usr/lib64/libyaml-0.so.2 (0x00007f55acfa8000)
libpcre.so.1 => /opt/pcre-8.35/lib/libpcre.so.1 (0x00007f55acd40000)
libc.so.6 => /lib64/libc.so.6 (0x00000034f2e00000)
libz.so.1 => /lib64/libz.so.1 (0x00000034f3a00000)
libm.so.6 => /lib64/libm.so.6 (0x00000034f3e00000)
libdl.so.2 => /lib64/libdl.so.2 (0x00000034f3200000)
libgcc_s.so.1 => /lib64/libgcc_s.so.1 (0x00000034f4a00000)
/lib64/ld-linux-x86-64.so.2 (0x00000034f2a00000)
Other questions:
- This sensor combines two half-duplex traffic feeds using a bonding
interface as the capture interface (bond1). If offload features are
disabled on each physical slave interface tied to the bond master, does
one also have to disable them on the bond interface itself? Attempting
to disable some features on the bond pseudo-interface fails, so I'm
guessing it's the physical interfaces that really matter (roughly what
I'm doing is sketched below).
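(Roughly what I'm doing, as a sketch; it reads the slave list from the
bonding sysfs node on this kernel:)

for dev in $(cat /sys/class/net/bond1/bonding/slaves); do
    for f in rx tx sg tso ufo gso gro lro; do
        sudo ethtool -K "$dev" "$f" off
    done
done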
- If a monitored network segment carries jumbo frames (say 9K frames),
do the monitor interfaces on the sensor need their MTU raised
accordingly in order to capture the full 9K of payload? Or is the MTU
only relevant when transmitting, or simply irrelevant for a NIDS sensor
that isn't an endpoint station? (If it does matter, I assume the fix
looks something like the sketch below.)
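(If the answer is yes, I assume it amounts to something like the
following on the slaves and the bond; 9000 is only a guess at the right
value for 9K frames:)

sudo ip link set dev eth6 mtu 9000     # repeat for the other slave
sudo ip link set dev bond1 mtu 9000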
- What is the correct practice regarding the irqbalance daemon under
RHEL-type distributions? Some guidance says to disable it entirely. Is
that only relevant when a system is tuned with CPU affinity settings?
And does it apply to 1Gbps installations, or only to 10Gbps ones? (The
manual alternative I have in mind is sketched below.)
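(The manual alternative, sketched; the IRQ number and CPU mask are
placeholders for this box:)

sudo service irqbalance stop
sudo chkconfig irqbalance off
grep eth6 /proc/interrupts                       # find the IRQs for the NIC queues
echo 2 | sudo tee /proc/irq/130/smp_affinity     # 130 = placeholder IRQ; mask 0x2 = CPU1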
- Some guides suggest raising the interface ring parameters and the
network stack limits (sysctl). How important are these settings? For
example:
ethtool -G eth6 rx 4096
sysctl -w net.core.netdev_max_backlog=250000
sysctl -w net.core.rmem_max=16777216
sysctl -w net.core.rmem_default=16777216
sysctl -w net.core.optmem_max=16777216
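(If these do matter, I assume the persistent form is just the
equivalent lines in /etc/sysctl.conf, e.g.:)

net.core.netdev_max_backlog = 250000
net.core.rmem_max = 16777216
net.core.rmem_default = 16777216
net.core.optmem_max = 16777216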
Thanks for any assistance. There's a lot of tuning guidance published
already but I'm having difficulty determining just what is useful in a
given situation.
--
Darren Spruell
phatbuckett at gmail.com