[Oisf-users] Suricata load/latency spikes

Oliver Humpage oliver at watershed.co.uk
Mon Jun 29 10:34:27 UTC 2015


Hello,

I have a fairly simple IPS setup, using FreeBSD 10.1 and IPFW divert (runmode workers, since autofp barely allows a trickle through for some reason). It's virtualised under ESXi, but that shouldn't make much difference: I'm using X520s as the hardware and vmxnet3 drivers in the guest, which I think handle all the offloading fine. I've raised the defaults in the yaml file for how much memory various parts of suricata get.

I've also pared down the rules to the basics needed to protect client machines, such that running netperf I can get around 250Mb/s of throughput no problem. Normally, everything runs swimmingly.

But... a few times a day, the client network slows to an unbearable crawl. This is accompanied by a spike in the load average of the box (from around 0.2 normally to 0.8 or above), caused by the suricata process. This lasts between 1 and 5 minutes, then goes back to normal.

It seems the slowdown is caused by latency: pings to 8.8.8.8 go from 8ms to around 200-300ms. Pings from a non-IPSed interface are fine, so it's definitely suricata, not the rest of the network.

Nothing untoward is in any of the IPS logs. vmstat shows there's plenty of spare RAM about. The ESXi box's performance graphs show there's always spare CPU/RAM so there's no limit there.

I've included two entries from the stats.log below, for "just-before" and "during" a slowdown. Are there any clues in them at all as to what's wrong? Does anyone have any other ideas? Could it just be someone on the network sending a huge number of small packets which overwhelms the IPS?

Many thanks,

Oliver.


Stats.log BEFORE slowdown (ping times ~8ms):

-------------------------------------------------------------------
Date: 6/29/2015 -- 11:08:48 (uptime: 3d, 01h 08m 05s)
-------------------------------------------------------------------
Counter                   | TM Name                   | Value
-------------------------------------------------------------------
dns.memuse                | Worker-Q5000              | 2499
dns.memcap_state          | Worker-Q5000              | 0
dns.memcap_global         | Worker-Q5000              | 0
decoder.pkts              | Worker-Q5000              | 64664867
decoder.bytes             | Worker-Q5000              | 55267658243
decoder.invalid           | Worker-Q5000              | 0
decoder.ipv4              | Worker-Q5000              | 64664867
decoder.ipv6              | Worker-Q5000              | 3
decoder.ethernet          | Worker-Q5000              | 0
decoder.raw               | Worker-Q5000              | 0
decoder.sll               | Worker-Q5000              | 0
decoder.tcp               | Worker-Q5000              | 62303365
decoder.udp               | Worker-Q5000              | 2337529
decoder.sctp              | Worker-Q5000              | 0
decoder.icmpv4            | Worker-Q5000              | 23973
decoder.icmpv6            | Worker-Q5000              | 0
decoder.ppp               | Worker-Q5000              | 0
decoder.pppoe             | Worker-Q5000              | 0
decoder.gre               | Worker-Q5000              | 0
decoder.vlan              | Worker-Q5000              | 0
decoder.vlan_qinq         | Worker-Q5000              | 0
decoder.teredo            | Worker-Q5000              | 3
decoder.ipv4_in_ipv6      | Worker-Q5000              | 0
decoder.ipv6_in_ipv6      | Worker-Q5000              | 0
decoder.avg_pkt_size      | Worker-Q5000              | 854
decoder.max_pkt_size      | Worker-Q5000              | 1500
defrag.ipv4.fragments     | Worker-Q5000              | 0
defrag.ipv4.reassembled   | Worker-Q5000              | 0
defrag.ipv4.timeouts      | Worker-Q5000              | 0
defrag.ipv6.fragments     | Worker-Q5000              | 0
defrag.ipv6.reassembled   | Worker-Q5000              | 0
defrag.ipv6.timeouts      | Worker-Q5000              | 0
defrag.max_frag_hits      | Worker-Q5000              | 0
tcp.sessions              | Worker-Q5000              | 194839
tcp.ssn_memcap_drop       | Worker-Q5000              | 0
tcp.pseudo                | Worker-Q5000              | 42934
tcp.invalid_checksum      | Worker-Q5000              | 0
tcp.no_flow               | Worker-Q5000              | 0
tcp.reused_ssn            | Worker-Q5000              | 2967
tcp.memuse                | Worker-Q5000              | 439664
tcp.syn                   | Worker-Q5000              | 225651
tcp.synack                | Worker-Q5000              | 197653
tcp.rst                   | Worker-Q5000              | 126898
tcp.segment_memcap_drop   | Worker-Q5000              | 0
tcp.stream_depth_reached  | Worker-Q5000              | 5232
tcp.reassembly_memuse     | Worker-Q5000              | 21960574
tcp.reassembly_gap        | Worker-Q5000              | 132
http.memuse               | Worker-Q5000              | 732079
http.memcap               | Worker-Q5000              | 0
detect.alert              | Worker-Q5000              | 330
flow_mgr.closed_pruned    | FlowManagerThread         | 193720
flow_mgr.new_pruned       | FlowManagerThread         | 8184
flow_mgr.est_pruned       | FlowManagerThread         | 32960
flow.memuse               | FlowManagerThread         | 10434752
flow.spare                | FlowManagerThread         | 10007
flow.emerg_mode_entered   | FlowManagerThread         | 0
flow.emerg_mode_over      | FlowManagerThread         | 0


Stats.log DURING slowdown (ping times 200ms+):

-------------------------------------------------------------------
Date: 6/29/2015 -- 11:14:48 (uptime: 3d, 01h 14m 05s)
-------------------------------------------------------------------
Counter                   | TM Name                   | Value
-------------------------------------------------------------------
dns.memuse                | Worker-Q5000              | 10157
dns.memcap_state          | Worker-Q5000              | 0
dns.memcap_global         | Worker-Q5000              | 0
decoder.pkts              | Worker-Q5000              | 65077506
decoder.bytes             | Worker-Q5000              | 55559111913
decoder.invalid           | Worker-Q5000              | 0
decoder.ipv4              | Worker-Q5000              | 65077506
decoder.ipv6              | Worker-Q5000              | 3
decoder.ethernet          | Worker-Q5000              | 0
decoder.raw               | Worker-Q5000              | 0
decoder.sll               | Worker-Q5000              | 0
decoder.tcp               | Worker-Q5000              | 62638930
decoder.udp               | Worker-Q5000              | 2414101
decoder.sctp              | Worker-Q5000              | 0
decoder.icmpv4            | Worker-Q5000              | 24475
decoder.icmpv6            | Worker-Q5000              | 0
decoder.ppp               | Worker-Q5000              | 0
decoder.pppoe             | Worker-Q5000              | 0
decoder.gre               | Worker-Q5000              | 0
decoder.vlan              | Worker-Q5000              | 0
decoder.vlan_qinq         | Worker-Q5000              | 0
decoder.teredo            | Worker-Q5000              | 3
decoder.ipv4_in_ipv6      | Worker-Q5000              | 0
decoder.ipv6_in_ipv6      | Worker-Q5000              | 0
decoder.avg_pkt_size      | Worker-Q5000              | 853
decoder.max_pkt_size      | Worker-Q5000              | 1500
defrag.ipv4.fragments     | Worker-Q5000              | 0
defrag.ipv4.reassembled   | Worker-Q5000              | 0
defrag.ipv4.timeouts      | Worker-Q5000              | 0
defrag.ipv6.fragments     | Worker-Q5000              | 0
defrag.ipv6.reassembled   | Worker-Q5000              | 0
defrag.ipv6.timeouts      | Worker-Q5000              | 0
defrag.max_frag_hits      | Worker-Q5000              | 0
tcp.sessions              | Worker-Q5000              | 196683
tcp.ssn_memcap_drop       | Worker-Q5000              | 0
tcp.pseudo                | Worker-Q5000              | 43313
tcp.invalid_checksum      | Worker-Q5000              | 0
tcp.no_flow               | Worker-Q5000              | 0
tcp.reused_ssn            | Worker-Q5000              | 2967
tcp.memuse                | Worker-Q5000              | 440272
tcp.syn                   | Worker-Q5000              | 227775
tcp.synack                | Worker-Q5000              | 199569
tcp.rst                   | Worker-Q5000              | 127770
tcp.segment_memcap_drop   | Worker-Q5000              | 0
tcp.stream_depth_reached  | Worker-Q5000              | 5249
tcp.reassembly_memuse     | Worker-Q5000              | 21254788
tcp.reassembly_gap        | Worker-Q5000              | 136
http.memuse               | Worker-Q5000              | 688101
http.memcap               | Worker-Q5000              | 0
detect.alert              | Worker-Q5000              | 330
flow_mgr.closed_pruned    | FlowManagerThread         | 195775
flow_mgr.new_pruned       | FlowManagerThread         | 8224
flow_mgr.est_pruned       | FlowManagerThread         | 33396
flow.memuse               | FlowManagerThread         | 10354784
flow.spare                | FlowManagerThread         | 10030
flow.emerg_mode_entered   | FlowManagerThread         | 0
flow.emerg_mode_over      | FlowManagerThread         | 0
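
To put the small-packet question in numbers, the two dumps above can be diffed. What follows is only a rough sketch in Python: parse_stats and interval_summary are names made up for illustration, not part of Suricata, and the 360 second interval is taken from the two timestamps (11:08:48 and 11:14:48). Only the counters needed for the calculation are copied in.

```python
import re

# Matches rows such as "decoder.pkts | Worker-Q5000 | 64664867".
# The header row ("Counter | TM Name | Value") does not match and is skipped.
ROW = re.compile(r"^(\S+)\s*\|\s*(\S+)\s*\|\s*(\d+)\s*$")


def parse_stats(block):
    """Return {(counter, thread): value} for one stats.log dump."""
    counters = {}
    for line in block.splitlines():
        m = ROW.match(line.strip())
        if m:
            counters[(m.group(1), m.group(2))] = int(m.group(3))
    return counters


def interval_summary(before, after, seconds, thread="Worker-Q5000"):
    """Packet and byte rates between two cumulative counter dumps."""
    pkts = after[("decoder.pkts", thread)] - before[("decoder.pkts", thread)]
    nbytes = after[("decoder.bytes", thread)] - before[("decoder.bytes", thread)]
    return {
        "pkts": pkts,
        "bytes": nbytes,
        "pps": pkts / seconds,
        "avg_pkt_size": nbytes / pkts,
        "mbit_per_s": nbytes * 8 / seconds / 1e6,
    }


# Values copied from the BEFORE and DURING dumps above.
BEFORE = """
decoder.pkts              | Worker-Q5000              | 64664867
decoder.bytes             | Worker-Q5000              | 55267658243
"""
DURING = """
decoder.pkts              | Worker-Q5000              | 65077506
decoder.bytes             | Worker-Q5000              | 55559111913
"""

summary = interval_summary(parse_stats(BEFORE), parse_stats(DURING), seconds=360)
for name, value in summary.items():
    print(f"{name:14s} {value:,.1f}")
```

For these two dumps that works out to 412,639 packets and 291,453,670 bytes over the six minutes, i.e. roughly 1,150 packets/s at an average of about 706 bytes per packet (around 6.5 Mbit/s). Averaged over the whole interval that does not look like a flood of small packets, although a short burst could easily be hidden inside a six-minute window.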


