[Oisf-users] Updated to 2.0.6 and experiencing event logging slow down

Gary Faulkner <gfaulkner.nsm@gmail.com>
Fri Feb 6 22:25:17 UTC 2015

I've been experimenting with your suggestions, but I think the culprit 
may have been an MTU of 9000 on the NIC without setting 
default-packet-size to match in suricata.yaml. Jumbo frames are enabled 
on at least some of the links we have tapped. I tried setting 
default-packet-size to 9000 and saw improved processing, but high loss. 
After some additional reading suggesting it might be better to use a 
lower MTU and discard jumbo frames (or at least let them fragment), I 
simply went back to the default MTU size, both on the NICs and in 
suricata.yaml. I need to let it run some more to collect stats, but I 
think I'm closer now with the settings below:
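For reference, the relevant setting looks like this (a minimal sketch; 1514 is the value Suricata ships with, covering a 1500-byte MTU plus the 14-byte Ethernet header):

```yaml
# suricata.yaml (sketch): default-packet-size should cover the capture
# NIC's MTU plus link-layer overhead. With the NICs back at MTU 1500,
# the shipped default is sufficient:
default-packet-size: 1514
```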

dns parser disabled for udp and tcp (disabling this didn't seem to 
change the behavior I was seeing, but I hadn't enabled dns.log, as I 
have dns logging going on elsewhere, so I didn't turn it back on)
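For anyone following along, disabling the parser per the suggestion looks roughly like this in suricata.yaml:

```yaml
# suricata.yaml (sketch): disable the DNS app-layer parser entirely,
# since dns.log isn't in use here and DNS is logged elsewhere.
app-layer:
  protocols:
    dns:
      udp:
        enabled: no
      tcp:
        enabled: no
```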

vlan.use-for-tracking set back to true (after trying no, seeing no 
change in behavior, and noting the config file comments suggest it 
should stay on unless it causes trouble)
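The setting in question, back at its default:

```yaml
# suricata.yaml (sketch): use VLAN ids as part of flow tracking.
# Set to "no" only if traffic is tagged differently per direction,
# which can break flow matching after the 1.4 -> 2.0 upgrade.
vlan:
  use-for-tracking: true
```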


flow-timeouts:
  default:
    new: 3
    established: 30
    closed: 0
    emergency-new: 10
    emergency-established: 10
    emergency-closed: 0
  tcp:
    new: 6
    established: 100
    closed: 12
    emergency-new: 1
    emergency-established: 5
    emergency-closed: 2
  udp:
    new: 3
    established: 30
    emergency-new: 3
    emergency-established: 10
  icmp:
    new: 3
    established: 30
    emergency-new: 1
    emergency-established: 10

My original stats.log file had grown quite large, so I rotated it out 
and am currently collecting fresh stats with the current settings. I 
did notice what seems to be a cosmetic bug in how DNA interfaces are 
displayed in stats.log: dna0@1 displays as dna0@11, dna0@2 displays as 
dna0@21, and so on. I can provide a stats.log once I have fresh stats.

Thanks for the help!
On 2/6/2015 5:56 AM, Victor Julien wrote:
> On 02/06/2015 06:30 AM, Gary Faulkner wrote:
>> I was recently updating some sensors that I inherited from a previous
>> admin from Suricata 1.4.7 to 2.0.6. The 1.4.7 install was happily
>> processing around 15Gbps of traffic (multiple sensors) with 15K rules
>> and less than 1% loss prior to the update. After the update I'm noticing
>> that Suricata is logging fewer and fewer events as time goes on and loss
>> as recorded in stats.log increases substantially over time as well. As
>> an example, we record HTTP logs with Suricata and for the first few
>> minutes the sensors will log thousands of events per second, but after
>> 10 minutes or so that rate drops to a single event every few seconds.
>> The box doesn't appear to be running out of memory, and Suricata will
>> continue to run for hours in this state without crashing. Has anyone run
>> into anything like this?
> Can you share a portion of your stats.log?
>> For some additional background I'm using Suricata with PF_RING 6.0.2 and
>> the DNA drivers. PF_RING was also updated from 5.6.0 to 6.0.2. Traffic
>> is load balanced across two sensors that each have 16 cores (32
>> hyper-threading), 64GB RAM, and Intel 10G NICs. There were enough
>> changes to the default config files between the two versions that I
>> opted to diff the two and migrate our previous tuning settings to the
>> new config file where appropriate, but I left the tuning settings
>> largely untouched from what the previous set up was running. I've
>> attached a config dump and build info if that is helpful. I suspect the
>> config is suboptimal and I have experimented with some of the high
>> performance tuning settings I've seen on the list, but I've largely
>> found that I run out of RAM with them and Suricata never finishes
>> loading. I appreciate any insight the list can offer.
> Most of the 1.4 -> 2.0 changes should make things work better, not the
> opposite. So I'm curious what is going on.
> Couple of things to consider:
> 1. DNS parser was added. You don't seem to be using DNS logging so you
> could disable it (set app-layer.protocols.dns.udp.enabled to no)
> 2. VLAN tracking was added. This is the most common source of upgrade
> troubles from 1.4 to 2.0. Try with vlan.use-for-tracking set to no.
> 3. some of your TCP timeouts look like default. Most ppl lower them
> (slash by 10) in high speed setups. Esp your flow-timeouts.tcp.closed
> can be lowered.
