[Oisf-users] Suricata pegs a detect thread and drops packets

Fri Jun 20 13:11:37 UTC 2014

Dave,

Can you attach gdb to the cycle-gobbling Detect thread (the one stuck
inside the libhtp list call), run "print *l", and post the results back
here?  That should clear things up.
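
In case it helps, here is a rough sketch of how that could be scripted.
It only builds the gdb command line and does not run anything.  The
function name, the PID, the gdb thread number, and the frame number are
all placeholders of mine: you would read the real values off "top -H",
"info threads", and "bt" yourself.  It also assumes libhtp was built
with debug symbols, so that "l" is visible in the htp_list_array_get
frame.

```python
import shlex


def build_gdb_command(pid, gdb_thread, frame=0, expr="*l"):
    """Build argv for a one-shot gdb run that prints `expr`.

    pid        -- PID of the running suricata process (placeholder)
    gdb_thread -- gdb's own thread number from "info threads",
                  not the kernel thread ID (placeholder)
    frame      -- stack frame that holds the variable, from "bt"
    expr       -- expression to print; "*l" as requested above
    """
    if pid <= 0 or gdb_thread <= 0 or frame < 0:
        raise ValueError("pid and gdb_thread must be > 0, frame >= 0")
    return [
        "gdb", "-p", str(pid), "-batch",   # attach, run commands, exit
        "-ex", f"thread {gdb_thread}",     # switch to the busy thread
        "-ex", f"frame {frame}",           # select the libhtp frame
        "-ex", f"print {expr}",            # dump the list structure
    ]


if __name__ == "__main__":
    # Hypothetical values, for illustration only.
    argv = build_gdb_command(pid=12345, gdb_thread=7)
    print(" ".join(shlex.quote(a) for a in argv))
```

Note that the process is paused while gdb is attached, so expect a
short burst of drops while the commands run.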

On Fri, Jun 20, 2014 at 6:29 PM, David Vasil <davidvasil at gmail.com> wrote:
> It is not always the Detect1 thread, though it was in that case.  As I write
> this, my Detect6 thread has been running at 100% CPU utilization for about
> 30 minutes while the other 7 threads are between 0% and 5% -- no dropped
> packets reported though.  I decreased my stream.reassembly.depth from 4mb to
> 2mb and have seen a drop in the frequency of Detect threads hitting 100%
> utilization (though my tcp.stream_depth_reached has increased, as expected I
> guess).
>
> Now that I decreased the stream.reassembly.depth, I saw instances of my
> FlowManagerThread hit 100% CPU utilization, during which time my
> flow_mgr.*pruned counters stopped increasing for the duration and when
> FlowManagerThread finished doing whatever it was doing there was a large
> spike in those counters.  No other counters on the system seemed affected.
> I had increased the size of the flow.hash-size to 131072, but have since
> reverted it to the default as that did not seem to decrease the dropped
> packets.
>
> Using perf top I still see libhtp (htp_list_array_get) consuming a majority
> of the event cycles on my system -- close to 20% overall and about 80% of
> the cycles in the Suricata-Main thread.  Is this something that is specific
> to my configuration or are others seeing similar libhtp utilization?  Maybe
> Anoop is on to something?
>
> A couple of perf top screenshots attached.  Thanks!
>
> stats.log: http://pastebin.com/13j0DV7E
> suricata.yaml: http://pastebin.com/QAib5dYZ
>
> -dave
>
>
> On Thu, Jun 19, 2014 at 5:31 AM, Victor Julien <lists at inliniac.net> wrote:
>>
>> On 06/18/2014 04:54 PM, David Vasil wrote:
>> > I have been trying to track down an issue I am having with Suricata
>> > dropping packets (seems to be a theme on this list), requiring a restart
>> > of the daemon to clear the condition.  My environment is not large
>> > (average 40-80Mbps traffic, mostly user/http traffic) and I have Suricata
>> > 2.0.1 running on a base installation of Security Onion 12.04.4 on a Dell
>> > R610 (12GB RAM, Dual Intel X5570, Broadcom BCM5709 sniffing interface).
>> >
>> > About once a day, Zabbix shows that I am starting to see a large number
>> > of capture.kernel_drops and some corresponding tcp.reassembly_gap.
>> > Looking at htop, I can see that one of the Detect threads (Detect1 in
>> > this screenshot) is pegged at 100% utilization.  If I use 'perf top' to
>> > look at the perf events on the system, I see libhtp consuming a large
>> > number of the cycles (attached).  Restarting suricata using
>> > 'nsm_sensor_stop --only-snort-alert' results in child threads exiting,
>> > but the main suricata process itself never stops (requiring a kill -9).
>> > Starting suricata again with 'nsm_sensor_start --only-snort-alert'
>> > starts up Suricata and shows that we are able to inspect traffic with no
>> > drops.
>> >
>> > In the attached screenshots, I was only inspecting ~2k packets/sec
>> > (~16Mbit/s) when Suricata started dropping packets.  As I write this,
>> > Suricata is processing ~7k packets/sec and ~40Mbit/s with no drops.  I
>> > could not see anything that I can directly correlate to the drops and
>> > the various tuning steps I have taken have not helped alleviate the
>> > issue, so I was hoping to leverage the community's wisdom.
>> >
>> > Some observations I had:
>> >
>> > - Bro (running on the same system, on the same interface) drops 0% of
>> > packets all day
>> > - When I start seeing capture.kernel_drops, I also begin seeing an
>> > uptick in flow_mgr.new_pruned and tcp.reassembly_gap; changing the
>> > associated memcaps of each has not seemed to help
>> > - tcp.reassembly_memuse jumps to a peak of around 2.66G even though my
>> > reassembly memcap is set to 2gb
>> > - http.memcap is set to 256mb in my config and logfile, but the
>> > stats.log shows http.memcap = 0 (bug?)
>>
>> When this happens, do you see a peak in syn/synack and flow manager
>> pruned stats each time?
>>
>> The current flow timeout code has a weakness. When it injects fake
>> packets into the engine to do some final processing, it currently only
>> injects into Detect1. You might be seeing this here.
>>
>> --
>> ---------------------------------------------
>> Victor Julien
>> http://www.inliniac.net/
>> PGP: http://www.inliniac.net/victorjulien.asc
>> ---------------------------------------------
>>
>> _______________________________________________
>> Suricata IDS Users mailing list: oisf-users at openinfosecfoundation.org
>> Site: http://suricata-ids.org | Support: http://suricata-ids.org/support/
>> List: https://lists.openinfosecfoundation.org/mailman/listinfo/oisf-users
>> OISF: http://www.openinfosecfoundation.org/

-- 
-------------------------------
Anoop Saldanha
http://www.poona.me
-------------------------------