[Oisf-users] Help with 99% CPU usage

Wed Jun 5 11:34:04 UTC 2013

On 06/05/2013 12:34 PM, Anoop Saldanha wrote:
> On Wed, Jun 5, 2013 at 2:45 PM, Duarte Silva
> <duarte.silva at serializing.me> wrote:
>> On Thursday 16 May 2013 10:01:26 Duarte Silva wrote:
>>> On Wednesday 15 May 2013 19:54:21 Anoop Saldanha wrote:
>>>> On Wed, May 15, 2013 at 3:55 PM, Duarte Silva
>>>> <duarte.silva at serializing.me> wrote:
>>>>> Hi all,
>>>>>
>>>>> I'm currently facing a problem with Suricata. After running for a while,
>>>>> there is always an AF_PACKET thread (workers mode) that hogs the CPU to
>>>>> which it is bound, creating an awful amount of packet loss. I have
>>>>> ruled out the following factors:
>>>>>
>>>>>  - Number of rules: it has also happened without rules;
>>>>>  - Amount of network traffic: I have seen Suricata handle ~18 MB/s
>>>>>    (150 Mbps) of traffic without problems with the current
>>>>>    configuration, and it has also happened with only ~2 MB/s of
>>>>>    traffic;
>>>>>  - Memory: Suricata was only using ~500 MB of it when the CPU usage
>>>>>    pegged at 100%.
>>>>>
>>>>> This happens repeatedly, and after it happens Suricata takes a long time
>>>>> to stop. Could someone tell me what I can do to debug this issue?
>>>>>
>>>>> Suricata is executed with the following command line:
>>>>>
>>>>> suricata -D -c /etc/suricata/suricata.yaml --pidfile
>>>>> /var/lock/subsys/suricata --af-packet=eth1 --user=suri --group=suri
>>>>>
>>>>> I have also attached any files that can help out in debugging.
>>>>
>>>> While this thread hogs the cpu, can you attach gdb to the suricata
>>>> process, and get a bt for the specified thread, and also all the
>>>> threads.
>>>
>>> Attached are the traces for the hogging thread (I had to wait almost
>>> eight hours for it to happen). I took three traces at different times
>>> while the AFPacketeth12 thread was hogging the CPU; all of them end up
>>> in list_array_get in dslib.c.
>>>
>>> I will investigate what is happening by looking at the code. When it
>>> happens again I will also take traces for the other threads.
>>
>> Hi,
>>
>> I have taken two more traces when it happened again. Could you please give a
>> little help on this? I think it has something to do with HTTP processing.
>>
> 
> @Duarte
> 
> What version of suricata are you running?
> 
> @Victor.
> 
> From the last bt that Duarte sent, it looks like the list has grown in
> size.  The size is around 4k.  Probably that's the reason for the
> slowdown?  Every time we inspect state we will end up looping through
> the whole array.

Looks like it ya. The array approach doesn't really seem to scale that
well. Maybe an optimization in the short run would be to walk it
backwards? If we have this many TXs we probably still only have the last
few active.
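
To make the backwards-walk idea concrete, here is a rough sketch in C. It
is not the actual Suricata or libhtp code: the tx_t type and both function
names are hypothetical, and the early exit assumes that transactions finish
in order, so that everything older than the newest finished entry is also
finished. That assumption would have to be checked against the real code.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical transaction record; NOT the real libhtp/Suricata type. */
typedef struct {
    int id;
    int done; /* nonzero once inspection of this transaction is finished */
} tx_t;

/* Forward walk: every entry is visited, so each inspection pass costs
 * O(n) in the total number of transactions seen on the flow. */
size_t count_active_forward(const tx_t *txs, size_t n, size_t *visited)
{
    assert(txs != NULL || n == 0);
    size_t active = 0;
    *visited = 0;
    for (size_t i = 0; i < n; i++) {
        (*visited)++;
        if (!txs[i].done)
            active++;
    }
    return active;
}

/* Backward walk: start at the newest entry and stop at the first finished
 * one. ASSUMPTION: transactions complete in order, so all older entries
 * are finished too and can be skipped. */
size_t count_active_backward(const tx_t *txs, size_t n, size_t *visited)
{
    assert(txs != NULL || n == 0);
    size_t active = 0;
    *visited = 0;
    for (size_t i = n; i-- > 0; ) {
        (*visited)++;
        if (txs[i].done)
            break;
        active++;
    }
    return active;
}
```

With 4096 entries of which only the last 3 are unfinished, the forward
walk visits all 4096 entries while the backward walk visits 4 (the 3
active ones plus the first finished one it hits). Both report 3 active
transactions.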

-- 
---------------------------------------------
Victor Julien
http://www.inliniac.net/
PGP: http://www.inliniac.net/victorjulien.asc
---------------------------------------------