[Oisf-users] Suricata pegs a detect thread and drops packets
Anoop Saldanha
anoopsaldanha at gmail.com
Fri Jun 20 14:41:20 UTC 2014
I don't think --enable-debug turns off optimization. Instead, compile
it without optimization, i.e. either "-g -O0" or just "-g". Copy the
new binaries over, like you previously did. You will also have to
compile libhtp the same way. You can either specify this through the
CFLAGS environment variable when running configure, or manually edit
the configure script and the makefiles, replacing all "-g -O2" with
just "-g".
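Roughly, something like the following should work when building from
the source tree (exporting CFLAGS is just one way to make sure the
bundled libhtp configure picks up the same flags; the placeholder
stands in for whatever options you normally pass):

  export CFLAGS="-g -O0"
  ./configure --enable-debug <your usual options>
  make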
1. You can start suricata and wait for one of the detect threads to
hit the 100% cpu utilization mark (make a note of the detect thread
name).
2. Once you see that, attach gdb to the running process and print the
threads using "info threads". If you see the offending thread stuck
in the libhtp get() function call, switch over to that thread using
"t <thread_number>" and do a "print l". The symbol "l" is local to
the libhtp get() function, so unless the detect thread is inside the
libhtp get() scope that we are trying to trace, you won't have the
symbol available for printing (a rough session is sketched after
step 3).
3. If, when you do an "info threads", you don't see any of the threads
currently inside the htp get() function (it has gone out of scope at
that instant), continue the process in gdb and keep tabs on the
threads with top/htop until you see the detect thread(s) hit the 100%
cpu mark again, at which point you can interrupt the process inside
gdb and hopefully find the detect thread still inside the libhtp
get() function context.
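To illustrate the steps above, a rough session would look something
like this (the pid, thread number and detect thread name are just
placeholders):

  top -H -p $(pidof suricata)   # watch per-thread cpu, note e.g. "Detect6"
  gdb -p <suricata_pid>
  (gdb) info threads            # look for a thread inside htp_list_array_get()
  (gdb) t <thread_number>       # switch to that thread
  (gdb) print l                 # or "print *l" to dereference it
  (gdb) continue                # if "l" isn't in scope, let it run and Ctrl-C later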
With the issue at hand, once the thread gets pegged, you should be
able to zero in on the thread pretty quickly. In case you can't, I'll
provide a debug patch to corner the issue.
On Fri, Jun 20, 2014 at 7:42 PM, David Vasil <davidvasil at gmail.com> wrote:
> I may be overthinking this, so let me know if there is a better way. I
> tried connecting gdb to the thread and running 'print *l', but the binary
> had no debug symbols. Therefore, I rebuilt my suricata .deb with
> --enable-debug (from Victor's blog:
> http://blog.inliniac.net/2010/01/04/suricata-debugging/) and installed the
> package. I then copied suricata-src/src/.libs/suricata &
> libhtp/htp/.libs/libhtp-0.5.11.so.1.0.0 (the unstripped files) over the
> installed binary and started suricata. I run with:
>
> SC_LOG_LEVEL=None SC_LOG_OP_FILTER="stream" suricata --user sguil --group
> sguil -c /etc/nsm/hera-na0-eth1/suricata.yaml --pfring=eth1 -F
> /etc/nsm/hera-na0-eth1/bpf-ids.conf -l /nsm/sensor_data/hera-na0-eth1 >
> /dev/null 2>&1
>
> (without the redirect to /dev/null I get a deluge of htp* output as it
> inspects my traffic). Testing out gdb on the process now, before any Detect
> thread is pegged:
>
> # gdb suricata 9812
> GNU gdb (Ubuntu/Linaro 7.4-2012.04-0ubuntu2.1) 7.4-2012.04
> ...
> Reading symbols from /usr/bin/suricata...done.
> Attaching to program: /usr/bin/suricata, process 9812
>
> warning: process 9812 is a cloned process
> Reading symbols from /usr/lib/libhtp-0.5.11.so.1...done.
> Loaded symbols for /usr/lib/libhtp-0.5.11.so.1
> Reading symbols from /usr/lib/x86_64-linux-gnu/libluajit-5.1.so.2...(no
> debugging symbols found)...done.
> Loaded symbols for /usr/lib/x86_64-linux-gnu/libluajit-5.1.so.2
> ...
> libraries with no debug symbols
> ...
> Reading symbols from /lib/x86_64-linux-gnu/libnss_files.so.2...(no debugging
> symbols found)...done.
> Loaded symbols for /lib/x86_64-linux-gnu/libnss_files.so.2
> 0x00007f58bd8fed84 in pthread_cond_wait@@GLIBC_2.3.2 () from
> /lib/x86_64-linux-gnu/libpthread.so.0
> (gdb) print *l
> No symbol "l" in current context. <--- What am I doing wrong here?
> (gdb) bt
> #0 0x00007f58bd8fed84 in pthread_cond_wait@@GLIBC_2.3.2 () from
> /lib/x86_64-linux-gnu/libpthread.so.0
> #1 0x00000000005bb58c in TmqhInputFlow (tv=<optimized out>) at
> tmqh-flow.c:93
> #2 0x00000000005c290f in TmThreadsSlotVar (td=0x64186d0) at
> tm-threads.c:810
> #3 0x00007f58bd8fae9a in start_thread () from
> /lib/x86_64-linux-gnu/libpthread.so.0
> #4 0x00007f58bd1c93fd in clone () from /lib/x86_64-linux-gnu/libc.so.6
> #5 0x0000000000000000 in ?? ()
>
> So, what am I missing with the 'print *l'?
>
> -dave
>
>
> On Fri, Jun 20, 2014 at 8:11 AM, Anoop Saldanha <anoopsaldanha at gmail.com>
> wrote:
>>
>> Dave,
>>
>> Can you stick gdb into the cycle-gobbling Detect thread (which is stuck
>> inside the libhtp list call), do a "print *l", and post back the
>> results here. That should clear things up.
>>
>> On Fri, Jun 20, 2014 at 6:29 PM, David Vasil <davidvasil at gmail.com> wrote:
>> > It is not always the Detect1 thread, though it was in that case. As I
>> > write
>> > this, my Detect6 thread has been running at 100% CPU utilization for
>> > about
>> > 30 minutes while the other 7 threads are between 0% and 5% -- no dropped
>> > packets reported though. I decreased my stream.reassembly.depth from
>> > 4mb to
>> > 2mb and have seen a drop in the frequency of Detect threads hitting 100%
>> > utilization (though my tcp.stream_depth_reached has increased, as
>> > expected I
>> > guess).
>> >
>> > Now that I decreased the stream.reassembly.depth, I saw instances of
>> > my
>> > FlowManagerThread hit 100% CPU utilization, during which time my
>> > flow_mgr.*pruned counters stopped increasing for the duration and when
>> > FlowManagerThread finished doing whatever it was doing there was a large
>> > spike in those counters. No other counters on the system seemed
>> > affected.
>> > I had increased the size of the flow.hash-size to 131072, but have since
>> > reverted it to the default as that did not seem to decrease the dropped
>> > packets.
>> >
>> > Using perf top I still see libhtp (htp_list_array_get) consuming a
>> > majority
>> > of the event cycles on my system -- close to 20% overall and about 80%
>> > of
>> > the cycles in the Suricata-Main thread. Is this something that is
>> > specific
>> > to my configuration or are others seeing similar libhtp utilization?
>> > Maybe
>> > Anoop is on to something?
>> >
>> > A couple of perf top screenshots attached. Thanks!
>> >
>> > stats.log: http://pastebin.com/13j0DV7E
>> > suricata.yaml: http://pastebin.com/QAib5dYZ
>> >
>> > -dave
>> >
>> >
>> > On Thu, Jun 19, 2014 at 5:31 AM, Victor Julien <lists at inliniac.net>
>> > wrote:
>> >>
>> >> On 06/18/2014 04:54 PM, David Vasil wrote:
>> >> > I have been trying to track down an issue I am having with Suricata
>> >> > dropping packets (seems to be a theme on this list), requiring a
>> >> > restart
>> >> > of the daemon to clear the condition. My environment is not large
>> >> > (average 40-80Mbps traffic, mostly user/http traffic) and I have
>> >> > Suricata
>> >> > 2.0.1 running on a base installation of Security Onion 12.04.4 on a
>> >> > Dell
>> >> > R610 (12GB RAM, Dual Intel X5570, Broadcom BCM5709 sniffing
>> >> > interface).
>> >> >
>> >> > About once a day, Zabbix shows that I am starting to see a large
>> >> > number
>> >> > of capture.kernel_drops and some corresponding tcp.reassembly_gap.
>> >> > Looking at htop, I can see that one of the Detect threads (Detect1
>> >> > in
>> >> > this screenshot) is pegged at 100% utilization. If I use 'perf top'
>> >> > to
>> >> > look at the perf events on the system, I see libhtp consuming a large
>> >> > number of the cycles (attached). Restarting suricata using
>> >> > 'nsm_sensor_stop --only-snort-alert' results in child threads
>> >> > exiting,
>> >> > but the main suricata process itself never stops (requiring a kill
>> >> > -9).
>> >> > Starting suricata again with 'nsm_sensor_start --only-snort-alert'
>> >> > starts up Suricata and shows that we are able to inspect traffic with
>> >> > no
>> >> > drops.
>> >> >
>> >> > In the attached screenshots, I am only inspecting ~2k packets/sec
>> >> > ~16Mbit/s when Suricata started dropping packets. As I write this,
>> >> > Suricata is processing ~7k packets/sec and ~40Mbit/s with no drops.
>> >> > I
>> >> > could not see anything that I can directly correlate to the drops and
>> >> > the various tuning steps I have taken have not helped alleviate the
>> >> > issue, so I was hoping to leverage the community's wisdom.
>> >> >
>> >> > Some observations I had:
>> >> >
>> >> > - Bro (running on the same system, on the same interface) drops 0%
>> >> > packets without issue all day
>> >> > - When I start seeing capture.kernel_drops, I also begin seeing an
>> >> > uptick in flow_mgr.new_pruned and tcp.reassembly_gap, changing the
>> >> > associated memcaps of each has not seemed to help
>> >> > - tcp.reassembly_memuse jumps to a peak of around 2.66G even though
>> >> > my
>> >> > reassembly memcap is set to 2gb
>> >> > - http.memcap is set to 256mb in my config and logfile, but the
>> >> > stats.log shows http.memcap = 0 (bug?)
>> >>
>> >> When this happens, do you see a peak in syn/synack and flow manager
>> >> pruned stats each time?
>> >>
>> >> The current flow timeout code has a weakness. When it injects fake
>> >> packets into the engine to do some final processing, it currently only
>> >> injects into Detect1. You might be seeing this here.
>> >>
>> >> --
>> >> ---------------------------------------------
>> >> Victor Julien
>> >> http://www.inliniac.net/
>> >> PGP: http://www.inliniac.net/victorjulien.asc
>> >> ---------------------------------------------
>> >>
>> >> _______________________________________________
>> >> Suricata IDS Users mailing list: oisf-users at openinfosecfoundation.org
>> >> Site: http://suricata-ids.org | Support:
>> >> http://suricata-ids.org/support/
>> >> List:
>> >> https://lists.openinfosecfoundation.org/mailman/listinfo/oisf-users
>> >> OISF: http://www.openinfosecfoundation.org/
>> >
>> >
>> >
>> > _______________________________________________
>> > Suricata IDS Users mailing list: oisf-users at openinfosecfoundation.org
>> > Site: http://suricata-ids.org | Support:
>> > http://suricata-ids.org/support/
>> > List:
>> > https://lists.openinfosecfoundation.org/mailman/listinfo/oisf-users
>> > OISF: http://www.openinfosecfoundation.org/
>>
>>
>>
>> --
>> -------------------------------
>> Anoop Saldanha
>> http://www.poona.me
>> -------------------------------
>
>
--
-------------------------------
Anoop Saldanha
http://www.poona.me
-------------------------------