[Oisf-users] Updated to 2.0.6 and experiencing event logging slow down

Peter Manev petermanev at gmail.com
Mon Feb 9 20:18:46 UTC 2015


On Fri, Feb 6, 2015 at 11:25 PM, Gary Faulkner <gfaulkner.nsm at gmail.com> wrote:
> I've been experimenting with your suggestions, but I think the culprit may
> have been an MTU of 9000 on the NIC without setting default-packet-size to
> match in suricata.yaml. Jumbo frames are enabled on at least some of the
> links we have tapped. I tried setting default-packet-size to 9000 and saw
> improved processing but high loss, so after some additional reading
> suggesting it might be better to use a lower MTU and discard jumbo frames
> (or at least let them fragment), I simply went back to the default MTU
> size on both the NICs and in suricata.yaml (there is a small sketch of
> that setting at the end of this message). I need to let it run some more
> to collect stats, but I think I'm closer now with the settings below:
>
> dns parser disabled for udp and tcp (disabling this didn't seem to change
> the behavior I was seeing, but I hadn't enabled dns.log since dns logging
> is handled elsewhere, so I didn't turn it back on)
>
> vlan.use-for-tracking set back to true (after trying "no" and seeing no
> change in behavior, and because the config file comments suggest it should
> stay on unless it causes trouble)
>
> flow-timeouts:
>
>   default:
>     new: 3
>     established: 30
>     closed: 0
>     emergency-new: 10
>     emergency-established: 10
>     emergency-closed: 0
>   tcp:
>     new: 6
>     established: 100
>     closed: 12
>     emergency-new: 1
>     emergency-established: 5
>     emergency-closed: 2
>   udp:
>     new: 3
>     established: 30
>     emergency-new: 3
>     emergency-established: 10
>   icmp:
>     new: 3
>     established: 30
>     emergency-new: 1
>     emergency-established: 10
>
> My original stats.log file had grown quite large, so I rotated it out and
> am currently collecting fresh stats with the current settings. I did
> notice what appears to be a cosmetic bug in how dna interfaces are shown
> in stats.log: dna0@1 is displayed as dna0@11, dna0@2 as dna0@21, and so
> on. I can provide a stats.log once I have fresh stats.
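>
> For reference, the default-packet-size knob mentioned above is a top-level
> setting in suricata.yaml. A minimal sketch of the two variants I tried
> (values are illustrative, and only one would be set at a time):
>
>   # match a jumbo-frame MTU on the capture NIC
>   default-packet-size: 9000
>
>   # or keep the default and run the NICs at a standard MTU
>   default-packet-size: 1514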
>
> Thanks for the help!
>
> On 2/6/2015 5:56 AM, Victor Julien wrote:
>>
>> On 02/06/2015 06:30 AM, Gary Faulkner wrote:
>>>
>>> I was recently updating some sensors (inherited from a previous
>>> admin) from Suricata 1.4.7 to 2.0.6. The 1.4.7 install was happily
>>> processing around 15Gbps of traffic (multiple sensors) with 15K rules
>>> and less than 1% loss prior to the update. After the update I'm noticing
>>> that Suricata logs fewer and fewer events as time goes on, and loss
>>> as recorded in stats.log increases substantially over time as well. As
>>> an example, we record HTTP logs with Suricata, and for the first few
>>> minutes the sensors will log thousands of events per second, but after
>>> 10 minutes or so that rate drops to a single event every few seconds.
>>> The box doesn't appear to be running out of memory, and Suricata will
>>> continue to run for hours in this state without crashing. Has anyone run
>>> into anything like this?
>>
>> Can you share a portion of your stats.log?
>>
>>> For some additional background, I'm using Suricata with PF_RING 6.0.2
>>> and the DNA drivers. PF_RING was also updated, from 5.6.0 to 6.0.2.
>>> Traffic is load balanced across two sensors that each have 16 cores (32
>>> with hyper-threading), 64GB RAM, and Intel 10G NICs. There were enough
>>> changes to the default config files between the two versions that I
>>> opted to diff the two and migrate our previous tuning settings to the
>>> new config file where appropriate, but I left the tuning settings
>>> largely untouched from what the previous setup was running. I've
>>> attached a config dump and build info if that is helpful. I suspect the
>>> config is suboptimal, and I have experimented with some of the high
>>> performance tuning settings I've seen on the list, but I've largely
>>> found that I run out of RAM with them and Suricata never finishes
>>> loading. I appreciate any insight the list can offer.
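>>>
>>> In case it helps, the capture side goes through the pfring section of
>>> suricata.yaml. A rough sketch of the shape of that section for DNA-style
>>> per-queue interfaces (interface names and counts here are illustrative,
>>> not my exact config):
>>>
>>>   pfring:
>>>     - interface: dna0@0
>>>       threads: 1
>>>     - interface: dna0@1
>>>       threads: 1
>>>     # ...one entry per DNA queue, typically one capture thread each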
>>
>> Most of the 1.4 -> 2.0 changes should make things work better, not the
>> opposite. So I'm curious what is going on.
>>
>> Couple of things to consider:
>>
>> 1. A DNS parser was added. You don't seem to be using DNS logging, so you
>> could disable it (set app-layer.protocols.dns.udp.enabled to no).
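>>
>> In yaml terms that is roughly the following (the surrounding key names may
>> differ slightly in your config; tcp has the same switch if you want the
>> parser fully off):
>>
>>   app-layer:
>>     protocols:
>>       dns:
>>         udp:
>>           enabled: no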
>>
>> 2. VLAN tracking was added. This is the most common source of upgrade
>> troubles from 1.4 to 2.0. Try with vlan.use-for-tracking set to no.
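>>
>> That is a single top-level switch in suricata.yaml, roughly:
>>
>>   vlan:
>>     use-for-tracking: false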
>>
>> 3. Some of your TCP timeouts look like the defaults. Most people lower
>> them (divide by 10) in high speed setups. Especially your
>> flow-timeouts.tcp.closed can be lowered.
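>>
>> In yaml terms, for example (the value is only an illustration, not a
>> recommendation for your traffic):
>>
>>   flow-timeouts:
>>     tcp:
>>       closed: 12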
>>
>
> _______________________________________________
> Suricata IDS Users mailing list: oisf-users at openinfosecfoundation.org
> Site: http://suricata-ids.org | Support: http://suricata-ids.org/support/
> List: https://lists.openinfosecfoundation.org/mailman/listinfo/oisf-users
> Training now available: http://suricata-ids.org/training/

Hi Gary,

Do you have any update with the new stats?

I also noticed that you have this in the yaml:
stream.max-sessions = 20000000
stream.prealloc-sessions = 10000000

Can you comment out stream.max-sessions and use only
stream.prealloc-sessions = 100000?
(The value is per stream thread.)
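
In yaml terms, roughly this (the 100000 figure is the per-thread value
suggested above):

  stream:
    #max-sessions: 20000000
    prealloc-sessions: 100000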


Thanks

-- 
Regards,
Peter Manev


