[Oisf-users] Suricata 4.0.3 with Napatech problems

Steve Castellarin steve.castellarin at gmail.com
Tue Jan 23 20:51:44 UTC 2018


Peter,

While reviewing my compile of Suricata 4.0.3, I noticed that I was using
Hyperscan version 4.7, as opposed to version 4.2 noted in the Suricata
documentation (
http://suricata.readthedocs.io/en/latest/performance/hyperscan.html).
After recompiling against Hyperscan 4.2 I was able to get Suricata 4.0.3 to
run for 42 minutes before it started dropping packets uncontrollably.

I then made a change to /proc/sys/vm/max_map_count based on a note in
Napatech's documentation: "Especially for large host buffer configurations
it is necessary to adjust the kernel sysctl "vm.max_map_count"
(/proc/sys/vm/max_map_count).  The kernel sysctl "vm.max_map_count"
(/proc/sys/vm/max_map_count) should be adjusted to (at least) the total
configured host buffer memory in MB multiplied by four.
Example for total host buffer size 128GB (131072MB): 131072*4 = 524288.
Hence the minimum value for "vm.max_map_count" is 524288."

In my case I'm using 17 host buffers at 2048MB each, so (17 * 2048) * 4 =
139264.  My vm.max_map_count was previously 65530 (I believe the default
for Ubuntu 14.04).  After raising it to 139264 and re-running Suricata
4.0.3, it ran for 45 minutes before the buffer/CPU issue came back.
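
For reference, here's roughly how I applied it (a sketch; the value is just
the (17 * 2048) * 4 calculation from above):

  # check the current value (65530 was the default here)
  sysctl vm.max_map_count

  # apply the new value immediately
  sysctl -w vm.max_map_count=139264

  # persist it across reboots
  echo "vm.max_map_count = 139264" >> /etc/sysctl.conf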

On Tue, Jan 23, 2018 at 9:49 AM, Steve Castellarin <
steve.castellarin at gmail.com> wrote:

> Hi Peter,
>
> I just realized I responded directly to you instead of the mailing list -
> so here's my response, updated.
>
> I made a change to my YAML file for 4.0.3, dropping the
> detect-thread-ratio from 1.5 to 1, and on Friday I was able to run Suricata
> 4.0.3 for five hours before the issue occurred.  That run handled
> sustained network traffic of 1.2 to 1.7 Gbps, so that is a step in the
> right direction.  I'm going to have a hard time running 4.0.3 without
> rules, as this unfortunately is our only Suricata instance running our rule
> set.
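>
> For reference, that change is just this in the threading section of my
> suricata.yaml (a sketch; everything else left as it was):
>
>   threading:
>     detect-thread-ratio: 1.0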
>
> I've noticed one thing that's strange.  In my YAML file I have
> "autofp-scheduler" set to "active-packets".  Yet every time I run Suricata I
> see this noted in suricata.log: "using flow hash instead of active
> packets".  When I comment out the "autofp-scheduler" setting in the YAML
> file, that message disappears.  Any idea what that is all about?
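>
> For completeness, the relevant line in my YAML is simply (a sketch of just
> that bit):
>
>   autofp-scheduler: active-packets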
>
> On Sun, Jan 21, 2018 at 3:49 PM, Peter Manev <petermanev at gmail.com> wrote:
>
>>
>>
>> On 18 Jan 2018, at 19:21, Steve Castellarin <steve.castellarin at gmail.com>
>> wrote:
>>
>> Also, the bandwidth utilization was just over 800Mbps.
>>
>>
>> Can you try the same run, but this time load no rules?  I would like to
>> see whether it makes a difference in the same amount of time.
>>
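>> One way to do that (just a sketch - reusing the command line you posted,
>> pointed at an empty rule file so nothing else changes) would be:
>>
>>   /usr/bin/suricata -vvv -c /etc/suricata/suricata.yaml --napatech \
>>     --runmode workers -S /dev/null -D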
>>
>> On Thu, Jan 18, 2018 at 1:16 PM, Steve Castellarin <
>> steve.castellarin at gmail.com> wrote:
>>
>>> Hey Peter,
>>>
>>> Those changes didn't help.  About 23 minutes into the run one worker
>>> CPU (#30) stayed at 100% while buffer NT11 dropped packets and would not
>>> recover.  I'm attaching a zip file that has the stats.log for that run, the
>>> suricata.log file, as well as the information seen at the command line after
>>> issuing "/usr/bin/suricata -vvv -c /etc/suricata/suricata.yaml --napatech
>>> --runmode workers -D".
>>>
>>> Steve
>>>
>>>
>>> On Thu, Jan 18, 2018 at 11:30 AM, Steve Castellarin <
>>> steve.castellarin at gmail.com> wrote:
>>>
>>>> We never see above 2Gbps.  When the issue occurred a little while ago I
>>>> was running the Napatech "monitoring" tool and it showed we were between
>>>> 650 and 900Mbps.  I'll note the bandwidth utilization when the next
>>>> issue occurs.
>>>>
>>>> On Thu, Jan 18, 2018 at 11:28 AM, Peter Manev <petermanev at gmail.com>
>>>> wrote:
>>>>
>>>>> On Thu, Jan 18, 2018 at 5:27 PM, Steve Castellarin
>>>>> <steve.castellarin at gmail.com> wrote:
>>>>> > When you say the "size of the traffic", are you asking what the
>>>>> > bandwidth utilization is at the time the issue begins?
>>>>>
>>>>> Sorry - I mean the traffic you sniff - 1/5/10... Gbps?
>>>>>
>>>>> >
>>>>> > I will set things up and send you any/all output after the issue
>>>>> > starts.
>>>>> >
>>>>> > On Thu, Jan 18, 2018 at 11:17 AM, Peter Manev <petermanev at gmail.com>
>>>>> wrote:
>>>>> >>
>>>>> >> On Thu, Jan 18, 2018 at 4:43 PM, Steve Castellarin
>>>>> >> <steve.castellarin at gmail.com> wrote:
>>>>> >> > Hey Peter,
>>>>> >> >
>>>>> >> > I tried as you asked.  Less than 15 minutes after I restarted
>>>>> >> > Suricata I saw my first CPU hitting 100% and one host buffer
>>>>> >> > dropping all packets.  Shortly after that the second CPU hit 100%
>>>>> >> > and a second host buffer began dropping all packets.  I'm attaching
>>>>> >> > the stats.log where you'll see at 10:31:11 the first host buffer
>>>>> >> > (nt1.drop) starts to register dropped packets, then at 10:31:51
>>>>> >> > you'll see host buffer nt6.drop begin to register dropped packets.
>>>>> >> > At that point I issued the kill.
>>>>> >> >
>>>>> >>
>>>>> >> What is the size of the traffic?
>>>>> >> Can you also try
>>>>> >> detect:
>>>>> >>   profile: high
>>>>> >>
>>>>> >> (as opposed to "custom")
>>>>> >>
>>>>> >> Also, if you can, run it in verbose mode (-vvv) and send me the
>>>>> >> complete output after you start having the issues.
>>>>> >>
>>>>> >> Thanks
>>>>> >>
>>>>> >>
>>>>> >>
>>>>> >> > Steve
>>>>> >> >
>>>>> >> > On Thu, Jan 18, 2018 at 10:05 AM, Peter Manev <
>>>>> petermanev at gmail.com>
>>>>> >> > wrote:
>>>>> >> >>
>>>>> >> >> On Wed, Jan 17, 2018 at 1:29 PM, Steve Castellarin
>>>>> >> >> <steve.castellarin at gmail.com> wrote:
>>>>> >> >> > Hey Pete,
>>>>> >> >> >
>>>>> >> >> > Here's the YAML file from the last time I attempted to run
>>>>> >> >> > 4.0.3 - with the network information removed.  Let me know if
>>>>> >> >> > you need anything else from our configuration.  I'll also go to
>>>>> >> >> > the redmine site to open a bug report.
>>>>> >> >> >
>>>>> >> >> > Steve
>>>>> >> >>
>>>>> >> >> Hi Steve,
>>>>> >> >>
>>>>> >> >> Can you try without -
>>>>> >> >>
>>>>> >> >>   midstream: true
>>>>> >> >>   async-oneside: true
>>>>> >> >> so
>>>>> >> >>   #midstream: true
>>>>> >> >>   #async-oneside: true
>>>>> >> >>
>>>>> >> >> and lower the "prealloc-sessions: 1000000" to 100000, for example.
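>>>>> >> >>
>>>>> >> >> i.e. roughly this in the stream section of suricata.yaml (just a
>>>>> >> >> sketch - keep your other stream settings as they are):
>>>>> >> >>
>>>>> >> >>   stream:
>>>>> >> >>     #midstream: true
>>>>> >> >>     #async-oneside: true
>>>>> >> >>     prealloc-sessions: 100000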
>>>>> >> >>
>>>>> >> >>
>>>>> >> >> Thank you.
>>>>> >> >>
>>>>> >> >> >
>>>>> >> >> > On Wed, Jan 17, 2018 at 6:36 AM, Peter Manev <
>>>>> petermanev at gmail.com>
>>>>> >> >> > wrote:
>>>>> >> >> >>
>>>>> >> >> >> On Tue, Jan 16, 2018 at 4:12 PM, Steve Castellarin
>>>>> >> >> >> <steve.castellarin at gmail.com> wrote:
>>>>> >> >> >> > Hey Peter, I didn't know if you had a chance to look at the
>>>>> >> >> >> > stats log and configuration file I sent.  So far, running
>>>>> >> >> >> > 3.1.1 with the updated Napatech drivers my system is running
>>>>> >> >> >> > without any issues.
>>>>> >> >> >> >
>>>>> >> >> >>
>>>>> >> >> >> The toughest part of the troubleshooting is that I don't have
>>>>> >> >> >> the setup to reproduce this.
>>>>> >> >> >> I didn't see anything in the stats log that could lead me to a
>>>>> >> >> >> definitive conclusion.
>>>>> >> >> >> Can you please open a bug report on our redmine with the
>>>>> >> >> >> details from this mail thread?
>>>>> >> >> >>
>>>>> >> >> >> Would it be possible to share the suricata.yaml (privately
>>>>> >> >> >> works too, if you would like; remove all networks)?
>>>>> >> >> >>
>>>>> >> >> >> Thank you
>>>>> >> >> >>
>>>>> >> >> >> > On Thu, Jan 11, 2018 at 12:54 PM, Steve Castellarin
>>>>> >> >> >> > <steve.castellarin at gmail.com> wrote:
>>>>> >> >> >> >>
>>>>> >> >> >> >> Here is the zipped stats.log.  I restarted the Napatech
>>>>> >> >> >> >> drivers before running Suricata 4.0.3 to clear out any
>>>>> >> >> >> >> previous drop counters, etc.
>>>>> >> >> >> >>
>>>>> >> >> >> >> The first time I saw a packet drop was at the 12:20:51 mark,
>>>>> >> >> >> >> and you'll see "nt12.drop" increment.  During this time one
>>>>> >> >> >> >> of the CPUs acting as a "worker" was at 100%.  But these
>>>>> >> >> >> >> drops recovered at the 12:20:58 mark, where "nt12.drop"
>>>>> >> >> >> >> stays constant at 13803.  The big issue triggered at the
>>>>> >> >> >> >> 12:27:05 mark in the file - where one worker CPU was stuck
>>>>> >> >> >> >> at 100% followed by packet drops in host buffer "nt3.drop".
>>>>> >> >> >> >> Then came a second CPU at 100% (another "worker" CPU) and
>>>>> >> >> >> >> packet drops in buffer "nt2.drop" at 12:27:33.  I finally
>>>>> >> >> >> >> killed Suricata just before 12:27:54, where you see all host
>>>>> >> >> >> >> buffers beginning to drop packets.
>>>>> >> >> >> >>
>>>>> >> >> >> >> I'm also including the output from the "suricata
>>>>> >> >> >> >> --dump-config" command.
>>>>> >> >> >> >>
>>>>> >> >> >> >> On Thu, Jan 11, 2018 at 11:40 AM, Peter Manev
>>>>> >> >> >> >> <petermanev at gmail.com>
>>>>> >> >> >> >> wrote:
>>>>> >> >> >> >>>
>>>>> >> >> >> >>> On Thu, Jan 11, 2018 at 8:02 AM, Steve Castellarin
>>>>> >> >> >> >>> <steve.castellarin at gmail.com> wrote:
>>>>> >> >> >> >>> > Peter, yes that is correct.  I worked for almost a
>>>>> >> >> >> >>> > couple of weeks with Napatech support and they believed
>>>>> >> >> >> >>> > the Napatech setup (ntservice.ini and custom NTPL script)
>>>>> >> >> >> >>> > is working as it should.
>>>>> >> >> >> >>> >
>>>>> >> >> >> >>>
>>>>> >> >> >> >>> Ok.
>>>>> >> >> >> >>>
>>>>> >> >> >> >>> One major difference between Suricata 3.x and 4.0.x in
>>>>> >> >> >> >>> terms of Napatech is that the code was updated, with some
>>>>> >> >> >> >>> fixes and updated counters.
>>>>> >> >> >> >>> There were a bunch of upgrades in Suricata too.
>>>>> >> >> >> >>> Is it possible to send over a stats.log from when the issue
>>>>> >> >> >> >>> starts occurring?
>>>>> >> >> >> >>>
>>>>> >> >> >> >>>
>>>>> >> >> >> >>> > On Thu, Jan 11, 2018 at 9:52 AM, Peter Manev
>>>>> >> >> >> >>> > <petermanev at gmail.com>
>>>>> >> >> >> >>> > wrote:
>>>>> >> >> >> >>> >>
>>>>> >> >> >> >>> >>
>>>>> >> >> >> >>> >> On 11 Jan 2018, at 07:19, Steve Castellarin
>>>>> >> >> >> >>> >> <steve.castellarin at gmail.com>
>>>>> >> >> >> >>> >> wrote:
>>>>> >> >> >> >>> >>
>>>>> >> >> >> >>> >> After my last email yesterday I decided to go back to
>>>>> >> >> >> >>> >> our 3.1.1 install of Suricata, with the upgraded
>>>>> >> >> >> >>> >> Napatech version.  Since then I've seen no packets
>>>>> >> >> >> >>> >> dropped with sustained bandwidth of between 1 and
>>>>> >> >> >> >>> >> 1.7Gbps.  So I'm not sure what is going on with my
>>>>> >> >> >> >>> >> configuration/setup of Suricata 4.0.3.
>>>>> >> >> >> >>> >>
>>>>> >> >> >> >>> >>
>>>>> >> >> >> >>> >>
>>>>> >> >> >> >>> >> So the only thing that you changed is the upgrade of
>>>>> >> >> >> >>> >> the Napatech drivers?
>>>>> >> >> >> >>> >> The Suricata config stayed the same - you just upgraded
>>>>> >> >> >> >>> >> to 4.0.3 (from 3.1.1) and the observed effect was that
>>>>> >> >> >> >>> >> after a while all (or most) CPUs get pegged at 100% - is
>>>>> >> >> >> >>> >> that correct?
>>>>> >> >> >> >>> >>
>>>>> >> >> >> >>> >>
>>>>> >> >> >> >>> >> On Wed, Jan 10, 2018 at 4:46 PM, Steve Castellarin
>>>>> >> >> >> >>> >> <steve.castellarin at gmail.com> wrote:
>>>>> >> >> >> >>> >>>
>>>>> >> >> >> >>> >>> Hey Peter, no, there are no error messages.
>>>>> >> >> >> >>> >>>
>>>>> >> >> >> >>> >>> On Jan 10, 2018 4:37 PM, "Peter Manev"
>>>>> >> >> >> >>> >>> <petermanev at gmail.com>
>>>>> >> >> >> >>> >>> wrote:
>>>>> >> >> >> >>> >>>
>>>>> >> >> >> >>> >>> On Wed, Jan 10, 2018 at 11:29 AM, Steve Castellarin
>>>>> >> >> >> >>> >>> <steve.castellarin at gmail.com> wrote:
>>>>> >> >> >> >>> >>> > Hey Peter,
>>>>> >> >> >> >>> >>>
>>>>> >> >> >> >>> >>> Are there any error msgs in suricata.log when that
>>>>> >> >> >> >>> >>> happens?
>>>>> >> >> >> >>> >>>
>>>>> >> >> >> >>> >>> Thank you
>>>>> >> >> >> >>> >>>
>>>>> >> >> >> >>> >>>
>>>>> >> >> >> >>> >>>
>>>>> >> >> >> >>> >>> --
>>>>> >> >> >> >>> >>> Regards,
>>>>> >> >> >> >>> >>> Peter Manev
>>>>> >> >> >> >>> >>>
>>>>> >> >> >> >>> >>>
>>>>> >> >> >> >>> >>
>>>>> >> >> >> >>> >
>>>>> >> >> >> >>>
>>>>> >> >> >> >>>
>>>>> >> >> >> >>>
>>>>> >> >> >> >>> --
>>>>> >> >> >> >>> Regards,
>>>>> >> >> >> >>> Peter Manev
>>>>> >> >> >> >>
>>>>> >> >> >> >>
>>>>> >> >> >> >
>>>>> >> >> >>
>>>>> >> >> >>
>>>>> >> >> >>
>>>>> >> >> >> --
>>>>> >> >> >> Regards,
>>>>> >> >> >> Peter Manev
>>>>> >> >> >
>>>>> >> >> >
>>>>> >> >>
>>>>> >> >>
>>>>> >> >>
>>>>> >> >> --
>>>>> >> >> Regards,
>>>>> >> >> Peter Manev
>>>>> >> >
>>>>> >> >
>>>>> >>
>>>>> >>
>>>>> >>
>>>>> >> --
>>>>> >> Regards,
>>>>> >> Peter Manev
>>>>> >
>>>>> >
>>>>>
>>>>>
>>>>>
>>>>> --
>>>>> Regards,
>>>>> Peter Manev
>>>>>
>>>>
>>>>
>>>
>>
>

