[Oisf-users] Suricata pegs a detect thread and drops packets

Victor Julien lists at inliniac.net
Thu Jun 19 10:31:26 UTC 2014


On 06/18/2014 04:54 PM, David Vasil wrote:
> I have been trying to track down an issue I am having with Suricata
> dropping packets (seems to be a theme on this list), requiring a restart
> of the daemon to clear the condition.  My environment is not large
> (average 40-80Mbps traffic, mostly user/http traffic) and I have Suricata
> 2.0.1 running on a base installation of Security Onion 12.04.4 on a Dell
> R610 (12GB RAM, Dual Intel X5570, Broadcom BCM5709 sniffing interface).
> 
> About once a day, Zabbix shows that I am starting to see a large number
> of capture.kernel_drops and some corresponding tcp.reassembly_gap.
>  Looking at htop, I can see that one of the Detect threads (Detect1 in
> this screenshot) is pegged at 100% utilization.  If I use 'perf top' to
> look at the perf events on the system, I see libhtp consuming a large
> number of the cycles (attached).  Restarting suricata using
> 'nsm_sensor_stop --only-snort-alert' results in child threads exiting,
> but the main suricata process itself never stops (requiring a kill -9).
>  Starting suricata again with 'nsm_sensor_start --only-snort-alert'
> starts up Suricata and shows that we are able to inspect traffic with no
> drops.
> 
> In the attached screenshots, Suricata was only inspecting ~2k packets/sec
> (~16Mbit/s) when it started dropping packets.  As I write this,
> Suricata is processing ~7k packets/sec and ~40Mbit/s with no drops.  I
> could not see anything that I can directly correlate to the drops and
> the various tuning steps I have taken have not helped alleviate the
> issue, so I was hoping to leverage the community's wisdom.
> 
> Some observations I had:
> 
> - Bro (running on the same system, on the same interface) drops 0%
> packets without issue all day
> - When I start seeing capture.kernel_drops, I also begin seeing an
> uptick in flow_mgr.new_pruned and tcp.reassembly_gap; raising the
> associated memcaps of each has not seemed to help
> - tcp.reassembly_memuse jumps to a peak of around 2.66G even though my
> reassembly memcap is set to 2gb
> - http.memcap is set to 256mb in my config and logfile, but
> stats.log shows http.memcap = 0 (bug?)

When this happens, do you see a peak in syn/synack and flow manager
pruned stats each time?
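One way to answer that is to trend the counters straight out of stats.log (e.g. into Zabbix). A sketch, assuming the Suricata 2.x stats.log column layout (`counter | thread name | value`); the file path, thread names, and numbers below are made up for illustration:

```shell
# Hypothetical stats.log excerpt in the 2.x "counter | thread | value" layout;
# all values here are invented sample data.
cat > /tmp/stats_excerpt.log <<'EOF'
tcp.syn                   | Detect1           | 104211
tcp.synack                | Detect1           | 98754
flow_mgr.new_pruned       | FlowManagerThread | 3122
capture.kernel_drops      | RxPcapeth01       | 557
EOF

# Pull one counter's value (strip padding around the third column):
awk -F'|' '$1 ~ /flow_mgr\.new_pruned/ { gsub(/ /, "", $3); print $3 }' \
    /tmp/stats_excerpt.log
```

Graphing tcp.syn, tcp.synack, and flow_mgr.new_pruned side by side with capture.kernel_drops should show whether they peak together each time the drops start.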

The current flow timeout code has a weakness. When it injects fake
packets into the engine to do some final processing, it currently only
injects into Detect1. You might be seeing this here.

-- 
---------------------------------------------
Victor Julien
http://www.inliniac.net/
PGP: http://www.inliniac.net/victorjulien.asc
---------------------------------------------
