[Oisf-users] Suricata Packet Drop

Thu May 8 19:14:51 UTC 2014

Hello,

I am testing out Suricata on a high bandwidth link. Here are my stats.
I have a 10 Gbps fiber SPAN port (HP NIC) feeding my Suricata box. At peak, there is almost 1 Gbps of throughput at 150k packets per second.
The Suricata box has 16 logical CPUs (I think it is 2 x quad-core CPUs), 132 GB of RAM, and enough disk space to hold alerts or anything else; the rest doesn't matter.
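
As a quick sanity check on the traffic figures above, here is a minimal Python sketch that works out the average packet size implied by the peak bit rate and packet rate. It assumes "almost 1 Gbps" is rounded to exactly 1,000,000,000 bits per second; the function name is only illustrative.

```python
# Back-of-the-envelope check of the peak traffic figures quoted in this post.
PEAK_BITS_PER_SECOND = 1_000_000_000   # "almost 1 Gbps", rounded up (assumption)
PEAK_PACKETS_PER_SECOND = 150_000      # "150k packets per second"


def average_packet_bytes(bits_per_second, packets_per_second):
    """Average packet size in bytes implied by a bit rate and a packet rate."""
    return bits_per_second / 8 / packets_per_second


avg = average_packet_bytes(PEAK_BITS_PER_SECOND, PEAK_PACKETS_PER_SECOND)
print(f"average packet size: about {avg:.0f} bytes")
```

So the average packet at peak is somewhere around 800 bytes, which suggests a mix of full-size and small packets rather than mostly tiny ones.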

I am using CentOS 6 64-bit. I've compiled Suricata 2.0 with PF_RING, GeoIP, and profiling support.

My problem is packet drop. I monitor the stats.log file for packet drop info, and capture.kernel_drops starts showing drops after about 1 minute of running Suricata.
I have run pfcount to check PF_RING and had 0 packet loss over a longer period of time, so this tells me PF_RING is configured correctly.
I've increased the PF_RING slots to the maximum:
"rmmod pf_ring
modprobe pf_ring transparent_mode=0 min_num_slots=65534"
It doesn't seem to have any effect.
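
For anyone who wants to track the drop rate the same way, here is a rough Python sketch that sums the capture counters across threads from stats.log text. It assumes the "counter | thread | value" layout shown in the excerpts in this post, and that capture.kernel_packets is logged alongside capture.kernel_drops. The sample values are made up for illustration and are not taken from my box.

```python
from collections import defaultdict


def sum_counters(stats_text):
    """Sum each counter across threads, using the latest value per (counter, thread)."""
    latest = {}
    for line in stats_text.splitlines():
        parts = [p.strip() for p in line.split("|")]
        if len(parts) != 3 or not parts[2].isdigit():
            continue  # skip separator, header, and date lines
        counter, thread, value = parts
        latest[(counter, thread)] = int(value)
    totals = defaultdict(int)
    for (counter, _thread), value in latest.items():
        totals[counter] += value
    return dict(totals)


def drop_percent(totals):
    """Kernel drops as a percentage of the kernel packet counter."""
    packets = totals.get("capture.kernel_packets", 0)
    drops = totals.get("capture.kernel_drops", 0)
    return 100.0 * drops / packets if packets else 0.0


# Illustrative sample only; real input would be the contents of stats.log.
SAMPLE = """\
Counter                   | TM Name                   | Value
capture.kernel_packets    | RxPFR1                    | 1000
capture.kernel_drops      | RxPFR1                    | 10
capture.kernel_packets    | RxPFR2                    | 3000
capture.kernel_drops      | RxPFR2                    | 30
"""

totals = sum_counters(SAMPLE)
print(f"kernel drops: {drop_percent(totals):.2f}%")
```

Because stats.log is appended to on every interval, keeping only the latest value per counter and thread gives the totals as of the last dump.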

Now about my config file. I went through multiple guides on setting up Suricata and acquired various settings. I will try to list as much as possible.

max-pending-packets: 65534   (I've tried different values for this setting with almost no difference in the end. Only when the number is very low does packet drop start sooner.)

runmode: autofp  (I've tried auto and workers as well. This one has the best results.)

default-packet-size: 1522  (I increased this one to handle VLAN tagging, after reading about it online.)

For outputs I have unified2-alert configured with default settings, without XFF.
Syslog and stats outputs are also configured with default settings.

Detect Engine:
detect-engine:
  - profile: custom
  - custom-values:
      toclient-src-groups: 200
      toclient-dst-groups: 200
      toclient-sp-groups: 200
      toclient-dp-groups: 300
      toserver-src-groups: 200
      toserver-dst-groups: 400
      toserver-sp-groups: 200
      toserver-dp-groups: 250
  - sgh-mpm-context: auto
  - inspection-recursion-limit: 3000

If I set sgh-mpm-context to full, Suricata keeps loading forever, eventually consuming 100% of RAM without ever starting. I have the ac mpm-algo selected with 22k rules.

Threading. I've tried CPU affinity with no luck. Basically, I tried to separate the detect threads from the rest so that the detect threads would have complete control over most of the CPUs. My problem was that Suricata failed to set affinity on its processes. I think my ulimit priority settings are wrong, so the OS doesn't allow this. Otherwise the threads start on the correct CPUs. But in the end drops still occur, and even sooner than 1 minute, so I have this disabled.

detect-thread-ratio is at 1.0. I've tried changing it to 0.5, 0.125, 1.5, 2.0, 10.0, and 20.0. 1.0 and 1.5 seem to give the best results as far as packet drops go.
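
To put those ratios in perspective, here is a small Python sketch of how many detect threads each value would give on this box. It assumes the detect thread count is simply the ratio multiplied by the number of logical CPUs, truncated to an integer with a minimum of 1. That is my understanding of the setting, not something I have verified in the Suricata source.

```python
LOGICAL_CPUS = 16  # logical CPUs on this box


def detect_threads(ratio, logical_cpus):
    """Estimated detect thread count for a given detect-thread-ratio.

    Assumption: threads = ratio * CPUs, truncated, with a minimum of 1.
    """
    return max(1, int(ratio * logical_cpus))


for ratio in (0.125, 0.5, 1.0, 1.5, 2.0, 10.0, 20.0):
    count = detect_threads(ratio, LOGICAL_CPUS)
    print(f"detect-thread-ratio {ratio:>6}: {count} detect threads")
```

If that assumption holds, the 10.0 and 20.0 settings would ask for far more detect threads than there are CPUs, which may explain why they did worse than 1.0 and 1.5.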

Not using CUDA since I don't have an NVIDIA card.

mpm-algo: ac  (I have not messed around with this value at all. It seems that this one is the best if you have enough RAM).

defrag:   memcap: 512mb and max-frags: 65535    (Judging from stats.log there are no issues with defrags)
defrag.ipv4.fragments     | RxPFR2                    | 2795
defrag.ipv4.reassembled   | RxPFR2                    | 1391
defrag.ipv4.timeouts      | RxPFR2                    | 0
defrag.ipv6.fragments     | RxPFR2                    | 1
defrag.ipv6.reassembled   | RxPFR2                    | 0
defrag.ipv6.timeouts      | RxPFR2                    | 0
defrag.max_frag_hits      | RxPFR2                    | 0

flow:
  memcap: 10gb
  hash-size: 65536
  prealloc: 10000
  emergency-recovery: 30

This one doesn't seem to be a problem either.

(via stats.log)
flow_mgr.closed_pruned    | FlowManagerThread         | 82348
flow_mgr.new_pruned       | FlowManagerThread         | 10841
flow_mgr.est_pruned       | FlowManagerThread         | 5375
flow.memuse               | FlowManagerThread         | 26879800
flow.spare                | FlowManagerThread         | 10023
flow.emerg_mode_entered   | FlowManagerThread         | 0
flow.emerg_mode_over      | FlowManagerThread         | 0
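
To show why the flow engine looks fine, here is a minimal Python sketch comparing flow.memuse from the stats above with the configured flow memcap. It assumes size suffixes such as "gb" are 1024-based; parse_size is a hypothetical helper written for this example, not part of Suricata.

```python
UNITS = {"kb": 1024, "mb": 1024 ** 2, "gb": 1024 ** 3}  # assumed 1024-based


def parse_size(text):
    """Convert a size string such as '10gb' or '16777216' to bytes."""
    text = text.strip().lower()
    for suffix, factor in UNITS.items():
        if text.endswith(suffix):
            return int(float(text[:-len(suffix)]) * factor)
    return int(text)


FLOW_MEMCAP = parse_size("10gb")   # flow: memcap from the config above
FLOW_MEMUSE = 26879800             # flow.memuse from stats.log above

usage_percent = 100.0 * FLOW_MEMUSE / FLOW_MEMCAP
print(f"flow memory in use: {usage_percent:.2f}% of memcap")
```

With usage well under 1% of the memcap and no emergency mode entries, the flow table does not look like the bottleneck.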

vlan:
  use-for-tracking: true
Not sure what this does, but I didn't touch it.
Flow timeouts have been reduced to really small numbers. I think this should do more good than bad.

Now stream (which is where I think the problem is):
stream:
  memcap: 30gb
  checksum-validation: no      # reject wrong csums
  inline: no                  # auto will use inline mode in IPS mode, yes or no set it statically
  prealloc-sessions: 10000000
  midstream: false
  async-oneside: true
  reassembly:
    memcap: 40gb
    depth: 24mb                  # reassemble 24mb into a stream
    toserver-chunk-size: 2560
    toclient-chunk-size: 2560
    randomize-chunk-size: yes
    #randomize-chunk-range: 10
    #raw: yes
    chunk-prealloc: 100000
    segments:
      - size: 4
        prealloc: 15000
      - size: 16
        prealloc: 20000
      - size: 112
        prealloc: 60000
      - size: 248
        prealloc: 60000
      - size: 512
        prealloc: 50000
      - size: 768
        prealloc: 40000
      - size: 1448
        prealloc: 300000
      - size: 65535
        prealloc: 25000

I've adjusted the segment settings until Suricata stopped reporting that there were too many segments of a specific size.
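
For reference, here is a short Python sketch that adds up how much payload memory the segment pools above preallocate. It counts only size times prealloc for each pool and ignores any per-segment bookkeeping overhead, so the real figure would be somewhat higher. It also assumes the 40gb reassembly memcap is 1024-based.

```python
# (segment size in bytes, prealloc count) pairs from the stream config above.
SEGMENT_POOLS = [
    (4, 15000),
    (16, 20000),
    (112, 60000),
    (248, 60000),
    (512, 50000),
    (768, 40000),
    (1448, 300000),
    (65535, 25000),
]

REASSEMBLY_MEMCAP = 40 * 1024 ** 3  # reassembly memcap: 40gb (assumed 1024-based)


def prealloc_bytes(pools):
    """Total payload bytes preallocated across all segment pools."""
    return sum(size * count for size, count in pools)


total = prealloc_bytes(SEGMENT_POOLS)
print(f"segment prealloc: {total} bytes ({total / 1024 ** 3:.2f} GiB)")
print(f"share of reassembly memcap: {100.0 * total / REASSEMBLY_MEMCAP:.1f}%")
```

Most of that total comes from the 65535-byte pool, so that is the entry to revisit first if startup memory matters.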

host:
  hash-size: 4096
  prealloc: 1000
  memcap: 16777216

I think I might have increased memcap on this one.

For pfring:
  - interface: bond0
    threads: 16   (I've tried changing this number, with no difference, except that if I set it to something like 2, drops occur sooner)
    cluster-id: 99
    cluster-type: cluster_flow
  - interface: default

In the app-layer section, I've increased the memcap for http to 20gb. Didn't touch anything else.

The ruleset is ET Pro plus some of our own rules, totaling 22k.

Everything else is the same.

When Suricata starts it consumes about 4% of RAM.

If I run top -H, I notice something weird. As soon as one of the detect threads hits 100% CPU utilization, I start to see packet drops in capture.kernel_drops, first for some and then for all of the PF_RING threads.

I've also disabled all of the offloading features on the NIC with ethtool -K ethX.

Interesting observation: I have created a bond with just one NIC (my SPAN feed). If I point the pfring config at the physical interface itself, Suricata doesn't see any packets. If I use the bond, it works just fine.
iptraf shows the same behavior. Not sure what that means.

Sorry for the long email. I figured this would reduce the number of follow-up questions.

Thank you all.

--Yasha
