[Oisf-users] Suricata Packet Drop

Erich Lerch erich.lerch at gmail.com
Fri May 9 07:56:56 UTC 2014


Yasha,

We have a similar setup on CentOS, similar throughput.
Instead of pfring we use myricom with their pcap library, though.

> I am testing out Suricata on a high bandwidth link. Here are my stats.
> I have a fiber 10 gig Span port  (HP Nic) coming to my Suricata box. At
> peak, there is almost a 1gbps throughput with 150k packets per second.
> Suricata box has 16 logical CPUs (I think it is 2 x Quad Core CPUs), 132gb
> of RAM and the rest doesn't matter. It has enough space to hold alerts or
> anything else.
>
> I am using CentOS 6 64bit. I've compiled Suricata 2.0 with PFRING, GEOIP,
> and Profilling.

Profiling data is very interesting, but collecting it is expensive. Try
running without profiling (do not even compile it in).

...

> max-pending-packets: 65534   (I've messed around with this setting almost
> without any difference in the end. Maybe only if this number is way low,
> packet drop starts to occur faster).

We only have "max-pending-packets: 2048". Not sure about the exact
effect, either.

> runmode: autofp  (I've tried auto and worker as well. This one has the best
> results)

Interesting. I made some tests with run modes, and I get the best
results with "runmode: workers".

...

> Detect Engine:
> detect-engine:
>   - profile: custom
>   - custom-values:
>       toclient-src-groups: 200
>       toclient-dst-groups: 200
>       toclient-sp-groups: 200
>       toclient-dp-groups: 300
>       toserver-src-groups: 200
>       toserver-dst-groups: 400
>       toserver-sp-groups: 200
>       toserver-dp-groups: 250
>   - sgh-mpm-context: auto
>   - inspection-recursion-limit: 3000

Same here.

> Threading.  I've tried to mess around with CPU Affinity with no luck.
> Basically I tried to separate Detect threads from the rest to allow Detect
> threads to have complete control over most of the CPUs. My problem was that
> Suricata failed to set affinity to processes. I think my ULIMIT priority
> settings are messed up, so the OS doesn't allow this. Otherwise threads start on
> correct CPUs. But in the end, drops occur and even faster than 1 minute. So
> I have this disabled.
>
> detect-thread-ratio is at 1.0.  I've tried changing it to 0.5, 0.125, 1.5, 2.0,
> 10.0, 20.0.  1.0 and 1.5 seem to have the best results as far as packet drops
> go.

We have 16 Worker Threads with CPU affinity. Works fine. I can't prove
it, but I think our results are far better than without CPU affinity.
Hyperthreading might be in the way, though. Really make sure the
threads are on separate physical cores.
Keep in mind that with 2 x Quad Core, you only have 8 physical cores.
I would guess that with detect-thread-ratio=1, Suricata would create
16 detect threads when Hyperthreading is enabled. That might not be
desirable in this case.
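For reference, worker affinity in suricata.yaml looks roughly like this.
This is only a sketch: the core numbers assume 8 physical cores with the
HT siblings numbered 8-15, so adjust them to your actual topology.

```yaml
threading:
  set-cpu-affinity: yes
  cpu-affinity:
    - management-cpu-set:
        cpu: [ 0 ]              # keep management off the worker cores
    - worker-cpu-set:
        cpu: [ "1-7" ]          # physical cores only; skip HT siblings 8-15
        mode: "exclusive"
        prio:
          default: "high"
```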

> Not using CUDA since I don't have an NVIDIA card.

We tested CUDA with autofp, but didn't get better results than without.
That might be due to inappropriate CUDA settings, though.

> mpm-algo: ac  (I have not messed around with this value at all. It seems
> that this one is the best if you have enough RAM).

Yep.

> defrag:   memcap: 512mb and max-frags: 65535

Same here.

>
> flow:
>   memcap: 10gb
>   hash-size: 65536
>   prealloc: 10000
>   emergency-recovery: 30

We have:

flow:
  memcap: 1024mb
  hash-size: 2621440
  prealloc: 3000000
  emergency-recovery: 30
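As a rough sanity check that a flow prealloc of this size fits under the
memcap: the per-flow size below is an assumption (~300 bytes; the real
figure depends on the Suricata build), but the arithmetic shows why
3,000,000 preallocated flows are compatible with a 1024mb memcap.

```python
# Rough check that flow.prealloc fits within flow.memcap.
FLOW_SIZE_BYTES = 300               # assumed average per-flow allocation
PREALLOC_FLOWS = 3_000_000          # flow.prealloc from the config above
MEMCAP_BYTES = 1024 * 1024 * 1024   # flow.memcap: 1024mb

prealloc_bytes = FLOW_SIZE_BYTES * PREALLOC_FLOWS
print(prealloc_bytes / (1024 ** 2))       # ~858 MiB
print(prealloc_bytes <= MEMCAP_BYTES)     # fits under the memcap
```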

> Now stream (where I think is the problem):
> stream:
>   memcap: 30gb
>   checksum-validation: no      # reject wrong csums
>   inline: no                   # auto uses inline mode in IPS mode; yes/no sets it statically
>   prealloc-sessions: 10000000
>   midstream: false
>   async-oneside: true
>   reassembly:
>     memcap: 40gb
>     depth: 24mb                  # reassemble 24mb into a stream
>     toserver-chunk-size: 2560
>     toclient-chunk-size: 2560
>     randomize-chunk-size: yes
>     #randomize-chunk-range: 10
>     #raw: yes
>     chunk-prealloc: 100000
>     segments:
>       - size: 4
>         prealloc: 15000
>       - size: 16
>         prealloc: 20000
>       - size: 112
>         prealloc: 60000
>       - size: 248
>         prealloc: 60000
>       - size: 512
>         prealloc: 50000
>       - size: 768
>         prealloc: 40000
>       - size: 1448
>         prealloc: 300000
>       - size: 65535
>         prealloc: 25000

Similar values here.
For reassembly depth, we only have 5mb, though.

> host:
>   hash-size: 4096
>   prealloc: 1000
>   memcap: 16777216

Here:

host:
  hash-size: 8192
  prealloc: 8192
  memcap: 32mb

...

> In the app-layer section, I've increased the memcap for http to 20gb. Didn't touch
> anything else.

Here: memcap 768mb only.

> Ruleset is from ET Pro with some of ours totaling 22k.

That's a LOT of rules :-)  This might have a significant impact,
especially as you do not have lots of CPU cores.
We have "only" about 9k rules.
We also have some PASS rules configured for the traffic we're not
interested in, I think this helps, too.
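A pass rule for uninteresting traffic looks something like this (the
networks, message, and sid here are made up for illustration):

```
pass tcp $HOME_NET any -> 10.20.0.0/16 any (msg:"pass internal backup traffic"; sid:1000001; rev:1;)
```

Matching flows skip further inspection, which cuts the load from a large
ruleset considerably if the excluded traffic is high-volume.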

> I've also disabled all of the offloading features on the nic, with
> ethtool -K ethX.
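For completeness, disabling the offloads looks roughly like this (run as
root; eth2 is a placeholder for your capture interface, and not every
NIC/driver supports every flag):

```shell
# turn off offloads so suricata sees real wire-size packets
for f in gro lro tso gso rx tx sg; do
    ethtool -K eth2 "$f" off
done
ethtool -k eth2    # verify the resulting offload settings
```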

Did you also tune the sysctl parameters?
We set the following:

net.ipv4.tcp_timestamps = 0
net.ipv4.tcp_sack = 0
net.ipv4.tcp_rmem = 4096 134217728 134217728
net.ipv4.tcp_wmem = 4096 134217728 134217728
net.core.netdev_max_backlog = 250000
net.core.rmem_max = 134217728
net.core.rmem_default = 134217728
net.core.wmem_max = 134217728


We generally lose <1% of packets on a link with more than 1Gb/s and >200k pkts/s.

Interestingly, our packet loss is concentrated in a few brief moments.
During these moments, Suricata reports far more packets than are
actually on the wire.
The rest of the day, no packets get lost, i.e. the loss is not uniform.
Can anyone comment on this?

Cheers,
erich



More information about the Oisf-users mailing list