[Oisf-users] Suricata 4.0.3 with Napatech problems

Peter Manev petermanev at gmail.com
Wed Jan 24 14:12:38 UTC 2018


On Wed, Jan 24, 2018 at 2:44 PM, Steve Castellarin
<steve.castellarin at gmail.com> wrote:
> I'll give those a try now and let you know what happens.  In an earlier
> email I noted something and wanted to get your take on it...
>
> "I've noticed one thing that's strange.  In my YAML file I have the
> "autofp-scheduler" set to "active-packets".  Yet everytime I run Suricata I
> see this noted in suricata.log "using flow hash instead of active packets".
> When I comment out the "autofp-scheduler" setting in the YAML file then that
> message disappears.  Any idea on what that is all about?"
>

That is the default setting for autofp - see
https://redmine.openinfosecfoundation.org/projects/suricata/repository/revisions/master/entry/suricata.yaml.in#L1024
- but in your case you are running with "--napatech --runmode workers",
so the autofp scheduler is never used and the message should be
unrelated.


> On Tue, Jan 23, 2018 at 6:00 PM, Peter Manev <petermanev at gmail.com> wrote:
>>
>> On Tue, Jan 23, 2018 at 9:51 PM, Steve Castellarin
>> <steve.castellarin at gmail.com> wrote:
>> > Peter,
>> >
>> > Reviewing my compile of Suricata 4.0.3, I noticed that I was using
>> > Hyperscan
>> > version 4.7, as opposed to version 4.2 noted in the Suricata
>> > documentation
>> > (http://suricata.readthedocs.io/en/latest/performance/hyperscan.html).
>> > After recompiling with 4.2 I was able to get Suricata 4.0.3 to run for
>> > 42
>> > minutes before it started dropping packets uncontrollably.
>> >
>>
>> If that made a change in behavior - can you try mpm-algo: ac-ks and
>> spm-algo: bm in the suricata.yaml?
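>>
>> That is, a sketch of the two top-level keys in suricata.yaml (values
>> as I know them in 4.0.x - ac-ks is the Aho-Corasick "Ken Steele"
>> variant, bm is Boyer-Moore):
>>
>>   mpm-algo: ac-ks
>>   spm-algo: bm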
>>
>> > I then made a change to /proc/sys/vm/max_map_count based on a note in
>> > Napatech's documentation: "Especially for large host buffer
>> > configurations
>> > it is necessary to adjust the kernel sysctl "vm.max_map_count"
>> > (/proc/sys/vm/max_map_count).  The kernel sysctl "vm.max_map_count"
>> > (/proc/sys/vm/max_map_count) should be adjusted to (at least) the total
>> > configured host buffer memory in MB multiplied by four.
>> > Example for total host buffer size 128GB (131072MB): 131072*4 = 524288.
>> > Hence the minimum value for "vm.max_map_count" is 524288."
>> >
>> > In my case I'm using 17 host buffers at 2048 MB each, so ((17 * 2048) * 4)
>> > comes to 139264.  My vm.max_map_count was previously 65530 (I guess the
>> > default for Ubuntu 14.04).  After changing that and re-running Suricata
>> > 4.0.3 it
>> > ran
>> > for 45 minutes before the buffer/CPU issue came back.
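>> >
>> > For reference, a sketch of how that change is applied - standard
>> > sysctl usage, with the value computed above:
>> >
>> >   sysctl -w vm.max_map_count=139264
>> >   # add "vm.max_map_count = 139264" to /etc/sysctl.conf to persist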
>> >
>> > On Tue, Jan 23, 2018 at 9:49 AM, Steve Castellarin
>> > <steve.castellarin at gmail.com> wrote:
>> >>
>> >> Hi Peter,
>> >>
>> >> I just realized I responded directly to you instead of the mailing list
>> >> -
>> >> so here's my response, updated.
>> >>
>> >> I made a change to my YAML file for 4.0.3, dropping the
>> >> detect-thread-ratio from 1.5 to 1 and on Friday was able to run
>> >> Suricata
>> >> 4.0.3 for five hours before the issue occurred.  This run did handle
>> >> sustained network traffic of 1.2 to 1.7 Gbps.  So that is a step in
>> >> the
>> >> positive direction.  I'm going to have a hard time running 4.0.3
>> >> without
>> >> rules, as this unfortunately is our only Suricata instance running our
>> >> rule
>> >> set.
>> >>
>> >> I've noticed one thing that's strange.  In my YAML file I have the
>> >> "autofp-scheduler" set to "active-packets".  Yet everytime I run
>> >> Suricata I
>> >> see this noted in suricata.log "using flow hash instead of active
>> >> packets".
>> >> When I comment out the "autofp-scheduler" setting in the YAML file then
>> >> that
>> >> message disappears.  Any idea on what that is all about?
>> >>
>> >> On Sun, Jan 21, 2018 at 3:49 PM, Peter Manev <petermanev at gmail.com>
>> >> wrote:
>> >>>
>> >>>
>> >>>
>> >>> On 18 Jan 2018, at 19:21, Steve Castellarin
>> >>> <steve.castellarin at gmail.com>
>> >>> wrote:
>> >>>
>> >>> And also, the bandwidth utilization was just over 800Mbps.
>> >>>
>> >>>
>> >>> Can you try the same run, but this time load no rules?  I would
>> >>> like to see if it makes a difference in the same amount of time.
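>> >>>
>> >>> A quick way to do a no-rules run without touching the rule config
>> >>> (a sketch - -S loads the given rule file exclusively, and an empty
>> >>> file such as /dev/null should work):
>> >>>
>> >>>   suricata -S /dev/null -c /etc/suricata/suricata.yaml --napatech --runmode workers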
>> >>>
>> >>>
>> >>> On Thu, Jan 18, 2018 at 1:16 PM, Steve Castellarin
>> >>> <steve.castellarin at gmail.com> wrote:
>> >>>>
>> >>>> Hey Peter,
>> >>>>
>> >>>> Those changes didn't help.  Around 23+ minutes into the run one
>> >>>> worker
>> >>>> CPU (#30) stayed at 100% while buffer NT11 dropped packets and would
>> >>>> not
>> >>>> recover.  I'm attaching a zip file that has the stats.log for that
>> >>>> run, the
>> >>>> suricata.log file as well as the information seen at the command line
>> >>>> after
>> >>>> issuing "/usr/bin/suricata -vvv -c /etc/suricata/suricata.yaml
>> >>>> --napatech
>> >>>> --runmode workers -D".
>> >>>>
>> >>>> Steve
>> >>>>
>> >>>>
>> >>>> On Thu, Jan 18, 2018 at 11:30 AM, Steve Castellarin
>> >>>> <steve.castellarin at gmail.com> wrote:
>> >>>>>
>> >>>>> We never see above 2Gbps.  When the issue occurred a little bit ago
>> >>>>> I
>> >>>>> was running the Napatech "monitoring" tool and it was saying we were
>> >>>>> between
>> >>>>> 650-900Mbps.  I'll note the bandwidth utilization when the next
>> >>>>> issue
>> >>>>> occurs.
>> >>>>>
>> >>>>> On Thu, Jan 18, 2018 at 11:28 AM, Peter Manev <petermanev at gmail.com>
>> >>>>> wrote:
>> >>>>>>
>> >>>>>> On Thu, Jan 18, 2018 at 5:27 PM, Steve Castellarin
>> >>>>>> <steve.castellarin at gmail.com> wrote:
>> >>>>>> > When you mean the "size of the traffic", are you asking what the
>> >>>>>> > bandwidth
>> >>>>>> > utilization is at the time the issue begins?
>> >>>>>>
>> >>>>>> Sorry - I mean the traffic you sniff - 1/5/10... Gbps?
>> >>>>>>
>> >>>>>> >
>> >>>>>> > I will set things up and send you any/all output after the issue
>> >>>>>> > starts.
>> >>>>>> >
>> >>>>>> > On Thu, Jan 18, 2018 at 11:17 AM, Peter Manev
>> >>>>>> > <petermanev at gmail.com>
>> >>>>>> > wrote:
>> >>>>>> >>
>> >>>>>> >> On Thu, Jan 18, 2018 at 4:43 PM, Steve Castellarin
>> >>>>>> >> <steve.castellarin at gmail.com> wrote:
>> >>>>>> >> > Hey Peter,
>> >>>>>> >> >
>> >>>>>> >> > I tried as you asked.  Less than 15 minutes after I restarted
>> >>>>>> >> > Suricata I
>> >>>>>> >> > saw
>> >>>>>> >> > my first CPU hitting 100% and one host buffer dropping all
>> >>>>>> >> > packets.
>> >>>>>> >> > Shortly
>> >>>>>> >> > after that the second CPU hit 100% and a second host buffer
>> >>>>>> >> > began
>> >>>>>> >> > dropping
>> >>>>>> >> > all packets.  I'm attaching the stats.log where you'll see at
>> >>>>>> >> > 10:31:11
>> >>>>>> >> > the
>> >>>>>> >> > first host buffer (nt1.drop) starts to register dropped
>> >>>>>> >> > packets,
>> >>>>>> >> > then at
>> >>>>>> >> > 10:31:51 you'll see host buffer nt6.drop begin to register
>> >>>>>> >> > dropped
>> >>>>>> >> > packets.
>> >>>>>> >> > At that point I issued the kill.
>> >>>>>> >> >
>> >>>>>> >>
>> >>>>>> >> What is the size of the traffic?
>> >>>>>> >> Can you also try
>> >>>>>> >> detect:
>> >>>>>> >>   profile: high
>> >>>>>> >>
>> >>>>>> >> (as opposed to "custom")
>> >>>>>> >>
>> >>>>>> >> Also, if you can, run it in verbose mode (-vvv) and send me
>> >>>>>> >> the complete output after you start having the issues.
>> >>>>>> >>
>> >>>>>> >> Thanks
>> >>>>>> >>
>> >>>>>> >>
>> >>>>>> >>
>> >>>>>> >> > Steve
>> >>>>>> >> >
>> >>>>>> >> > On Thu, Jan 18, 2018 at 10:05 AM, Peter Manev
>> >>>>>> >> > <petermanev at gmail.com>
>> >>>>>> >> > wrote:
>> >>>>>> >> >>
>> >>>>>> >> >> On Wed, Jan 17, 2018 at 1:29 PM, Steve Castellarin
>> >>>>>> >> >> <steve.castellarin at gmail.com> wrote:
>> >>>>>> >> >> > Hey Pete,
>> >>>>>> >> >> >
>> >>>>>> >> >> > Here's the YAML file from the last time I attempted to run
>> >>>>>> >> >> > 4.0.3 -
>> >>>>>> >> >> > with
>> >>>>>> >> >> > the
>> >>>>>> >> >> > network information removed.  Let me know if you need
>> >>>>>> >> >> > anything
>> >>>>>> >> >> > else
>> >>>>>> >> >> > from
>> >>>>>> >> >> > our
>> >>>>>> >> >> > configuration.  I'll also go to the redmine site to open a
>> >>>>>> >> >> > bug
>> >>>>>> >> >> > report.
>> >>>>>> >> >> >
>> >>>>>> >> >> > Steve
>> >>>>>> >> >>
>> >>>>>> >> >> Hi Steve,
>> >>>>>> >> >>
>> >>>>>> >> >> Can you try without -
>> >>>>>> >> >>
>> >>>>>> >> >>   midstream: true
>> >>>>>> >> >>   async-oneside: true
>> >>>>>> >> >> so
>> >>>>>> >> >>   #midstream: true
>> >>>>>> >> >>   #async-oneside: true
>> >>>>>> >> >>
>> >>>>>> >> >> and lower "prealloc-session: 1000000" to 100000, for example.
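>> >>>>>> >> >>
>> >>>>>> >> >> i.e. a sketch of the resulting stream section (key names as
>> >>>>>> >> >> in the default 4.0.x suricata.yaml - adjust to match your
>> >>>>>> >> >> file):
>> >>>>>> >> >>
>> >>>>>> >> >>   stream:
>> >>>>>> >> >>     #midstream: true
>> >>>>>> >> >>     #async-oneside: true
>> >>>>>> >> >>     prealloc-sessions: 100000    # was 1000000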
>> >>>>>> >> >>
>> >>>>>> >> >>
>> >>>>>> >> >> Thank you.
>> >>>>>> >> >>
>> >>>>>> >> >> >
>> >>>>>> >> >> > On Wed, Jan 17, 2018 at 6:36 AM, Peter Manev
>> >>>>>> >> >> > <petermanev at gmail.com>
>> >>>>>> >> >> > wrote:
>> >>>>>> >> >> >>
>> >>>>>> >> >> >> On Tue, Jan 16, 2018 at 4:12 PM, Steve Castellarin
>> >>>>>> >> >> >> <steve.castellarin at gmail.com> wrote:
>> >>>>>> >> >> >> > Hey Peter, I didn't know if you had a chance to look at
>> >>>>>> >> >> >> > the
>> >>>>>> >> >> >> > stats
>> >>>>>> >> >> >> > log
>> >>>>>> >> >> >> > and
>> >>>>>> >> >> >> > configuration file I sent.  So far, running 3.1.1 with
>> >>>>>> >> >> >> > the
>> >>>>>> >> >> >> > updated
>> >>>>>> >> >> >> > Napatech
>> >>>>>> >> >> >> > drivers my system is running without any issues.
>> >>>>>> >> >> >> >
>> >>>>>> >> >> >>
>> >>>>>> >> >> >> The toughest part of the troubleshooting is that I don't
>> >>>>>> >> >> >> have the setup to reproduce this.
>> >>>>>> >> >> >> I didn't see anything in the stats log that could lead me
>> >>>>>> >> >> >> to a definitive conclusion.
>> >>>>>> >> >> >> Can you please open a bug report on our redmine with the
>> >>>>>> >> >> >> details from this mail thread?
>> >>>>>> >> >> >>
>> >>>>>> >> >> >> Would it be possible to share the suricata.yaml
>> >>>>>> >> >> >> (privately works too, if you prefer; remove all network
>> >>>>>> >> >> >> details)?
>> >>>>>> >> >> >>
>> >>>>>> >> >> >> Thank you
>> >>>>>> >> >> >>
>> >>>>>> >> >> >> > On Thu, Jan 11, 2018 at 12:54 PM, Steve Castellarin
>> >>>>>> >> >> >> > <steve.castellarin at gmail.com> wrote:
>> >>>>>> >> >> >> >>
>> >>>>>> >> >> >> >> Here is the zipped stats.log.  I restarted the Napatech
>> >>>>>> >> >> >> >> drivers
>> >>>>>> >> >> >> >> before
>> >>>>>> >> >> >> >> running Suricata 4.0.3 to clear out any previous drop
>> >>>>>> >> >> >> >> counters,
>> >>>>>> >> >> >> >> etc.
>> >>>>>> >> >> >> >>
>> >>>>>> >> >> >> >> The first time I saw a packet drop was at the 12:20:51
>> >>>>>> >> >> >> >> mark, and
>> >>>>>> >> >> >> >> you'll
>> >>>>>> >> >> >> >> see "nt12.drop" increment.  During this time one of the
>> >>>>>> >> >> >> >> CPUs
>> >>>>>> >> >> >> >> acting
>> >>>>>> >> >> >> >> as
>> >>>>>> >> >> >> >> a
>> >>>>>> >> >> >> >> "worker" was at 100%.  But these drops recovered at the
>> >>>>>> >> >> >> >> 12:20:58
>> >>>>>> >> >> >> >> mark,
>> >>>>>> >> >> >> >> where
>> >>>>>> >> >> >> >> "nt12.drop" stays constant at 13803.  The big issue
>> >>>>>> >> >> >> >> triggered at
>> >>>>>> >> >> >> >> the
>> >>>>>> >> >> >> >> 12:27:05 mark in the file - where one worker CPU was
>> >>>>>> >> >> >> >> stuck
>> >>>>>> >> >> >> >> at
>> >>>>>> >> >> >> >> 100%
>> >>>>>> >> >> >> >> followed
>> >>>>>> >> >> >> >> by packet drops in host buffer "nt3.drop".  Then came a
>> >>>>>> >> >> >> >> second
>> >>>>>> >> >> >> >> CPU
>> >>>>>> >> >> >> >> at
>> >>>>>> >> >> >> >> 100%
>> >>>>>> >> >> >> >> (another "worker" CPU) and packet drops in buffer
>> >>>>>> >> >> >> >> "nt2.drop" at
>> >>>>>> >> >> >> >> 12:27:33.  I
>> >>>>>> >> >> >> >> finally killed Suricata just before 12:27:54, where you
>> >>>>>> >> >> >> >> see all
>> >>>>>> >> >> >> >> host
>> >>>>>> >> >> >> >> buffers
>> >>>>>> >> >> >> >> beginning to drop packets.
>> >>>>>> >> >> >> >>
>> >>>>>> >> >> >> >> I'm also including the output from the "suricata
>> >>>>>> >> >> >> >> --dump-config"
>> >>>>>> >> >> >> >> command.
>> >>>>>> >> >> >> >>
>> >>>>>> >> >> >> >> On Thu, Jan 11, 2018 at 11:40 AM, Peter Manev
>> >>>>>> >> >> >> >> <petermanev at gmail.com>
>> >>>>>> >> >> >> >> wrote:
>> >>>>>> >> >> >> >>>
>> >>>>>> >> >> >> >>> On Thu, Jan 11, 2018 at 8:02 AM, Steve Castellarin
>> >>>>>> >> >> >> >>> <steve.castellarin at gmail.com> wrote:
>> >>>>>> >> >> >> >>> > Peter, yes that is correct.  I worked for almost a
>> >>>>>> >> >> >> >>> > couple
>> >>>>>> >> >> >> >>> > weeks
>> >>>>>> >> >> >> >>> > with
>> >>>>>> >> >> >> >>> > Napatech support, and they believe the Napatech
>> >>>>>> >> >> >> >>> > setup (ntservice.ini and custom NTPL script) is
>> >>>>>> >> >> >> >>> > working as it should.
>> >>>>>> >> >> >> >>> >
>> >>>>>> >> >> >> >>>
>> >>>>>> >> >> >> >>> Ok.
>> >>>>>> >> >> >> >>>
>> >>>>>> >> >> >> >>> One major difference between Suricata 3.x and 4.0.x
>> >>>>>> >> >> >> >>> in terms of Napatech support is that the code was
>> >>>>>> >> >> >> >>> updated - some fixes and updated counters.
>> >>>>>> >> >> >> >>> There were a bunch of upgrades in Suricata too.
>> >>>>>> >> >> >> >>> Is it possible to send over a stats.log from when the
>> >>>>>> >> >> >> >>> issue starts occurring?
>> >>>>>> >> >> >> >>>
>> >>>>>> >> >> >> >>>
>> >>>>>> >> >> >> >>> > On Thu, Jan 11, 2018 at 9:52 AM, Peter Manev
>> >>>>>> >> >> >> >>> > <petermanev at gmail.com>
>> >>>>>> >> >> >> >>> > wrote:
>> >>>>>> >> >> >> >>> >>
>> >>>>>> >> >> >> >>> >> On 11 Jan 2018, at 07:19, Steve Castellarin
>> >>>>>> >> >> >> >>> >> <steve.castellarin at gmail.com>
>> >>>>>> >> >> >> >>> >> wrote:
>> >>>>>> >> >> >> >>> >>
>> >>>>>> >> >> >> >>> >> After my last email yesterday I decided to go back
>> >>>>>> >> >> >> >>> >> to
>> >>>>>> >> >> >> >>> >> our
>> >>>>>> >> >> >> >>> >> 3.1.1
>> >>>>>> >> >> >> >>> >> install of
>> >>>>>> >> >> >> >>> >> Suricata, with the upgraded Napatech version.  Since then I've
>> >>>>>> >> >> >> >>> >> seen
>> >>>>>> >> >> >> >>> >> no
>> >>>>>> >> >> >> >>> >> packets
>> >>>>>> >> >> >> >>> >> dropped
>> >>>>>> >> >> >> >>> >> with sustained bandwidth of between 1 and 1.7Gbps.
>> >>>>>> >> >> >> >>> >> So
>> >>>>>> >> >> >> >>> >> I'm
>> >>>>>> >> >> >> >>> >> not
>> >>>>>> >> >> >> >>> >> sure
>> >>>>>> >> >> >> >>> >> what is
>> >>>>>> >> >> >> >>> >> going on with my configuration/setup of Suricata
>> >>>>>> >> >> >> >>> >> 4.0.3.
>> >>>>>> >> >> >> >>> >>
>> >>>>>> >> >> >> >>> >>
>> >>>>>> >> >> >> >>> >>
>> >>>>>> >> >> >> >>> >> So the only thing that you changed is the upgrade
>> >>>>>> >> >> >> >>> >> of the Napatech drivers?
>> >>>>>> >> >> >> >>> >> The Suricata config stayed the same -  you just
>> >>>>>> >> >> >> >>> >> upgraded to
>> >>>>>> >> >> >> >>> >> 4.0.3
>> >>>>>> >> >> >> >>> >> (from
>> >>>>>> >> >> >> >>> >> 3.1.1) and the observed effect was - after a while
>> >>>>>> >> >> >> >>> >> all
>> >>>>>> >> >> >> >>> >> (or
>> >>>>>> >> >> >> >>> >> most)
>> >>>>>> >> >> >> >>> >> cpus
>> >>>>>> >> >> >> >>> >> get
>> >>>>>> >> >> >> >>> >> pegged at 100% - is that correct?
>> >>>>>> >> >> >> >>> >>
>> >>>>>> >> >> >> >>> >>
>> >>>>>> >> >> >> >>> >> On Wed, Jan 10, 2018 at 4:46 PM, Steve Castellarin
>> >>>>>> >> >> >> >>> >> <steve.castellarin at gmail.com> wrote:
>> >>>>>> >> >> >> >>> >>>
>> >>>>>> >> >> >> >>> >>> Hey Peter, no, there are no error messages.
>> >>>>>> >> >> >> >>> >>>
>> >>>>>> >> >> >> >>> >>> On Jan 10, 2018 4:37 PM, "Peter Manev"
>> >>>>>> >> >> >> >>> >>> <petermanev at gmail.com>
>> >>>>>> >> >> >> >>> >>> wrote:
>> >>>>>> >> >> >> >>> >>>
>> >>>>>> >> >> >> >>> >>> On Wed, Jan 10, 2018 at 11:29 AM, Steve
>> >>>>>> >> >> >> >>> >>> Castellarin
>> >>>>>> >> >> >> >>> >>> <steve.castellarin at gmail.com> wrote:
>> >>>>>> >> >> >> >>> >>> > Hey Peter,
>> >>>>>> >> >> >> >>> >>>
>> >>>>>> >> >> >> >>> >>> Are there any error msgs in suricata.log when
>> >>>>>> >> >> >> >>> >>> that happens?
>> >>>>>> >> >> >> >>> >>>
>> >>>>>> >> >> >> >>> >>> Thank you
>> >>>>>> >> >> >> >>> >>>
>> >>>>>> >> >> >> >>> >>>
>> >>>>>> >> >> >> >>> >>>
>> >>>>>> >> >> >> >>> >>> --
>> >>>>>> >> >> >> >>> >>> Regards,
>> >>>>>> >> >> >> >>> >>> Peter Manev
>> >>>>>> >> >> >> >>> >>>
>> >>>>>> >> >> >> >>> >>>
>> >>>>>> >> >> >> >>> >>
>> >>>>>> >> >> >> >>> >
>> >>>>>> >> >> >> >>>
>> >>>>>> >> >> >> >>>
>> >>>>>> >> >> >> >>>
>> >>>>>> >> >> >> >>> --
>> >>>>>> >> >> >> >>> Regards,
>> >>>>>> >> >> >> >>> Peter Manev
>> >>>>>> >> >> >> >>
>> >>>>>> >> >> >> >>
>> >>>>>> >> >> >> >
>> >>>>>> >> >> >>
>> >>>>>> >> >> >>
>> >>>>>> >> >> >>
>> >>>>>> >> >> >> --
>> >>>>>> >> >> >> Regards,
>> >>>>>> >> >> >> Peter Manev
>> >>>>>> >> >> >
>> >>>>>> >> >> >
>> >>>>>> >> >>
>> >>>>>> >> >>
>> >>>>>> >> >>
>> >>>>>> >> >> --
>> >>>>>> >> >> Regards,
>> >>>>>> >> >> Peter Manev
>> >>>>>> >> >
>> >>>>>> >> >
>> >>>>>> >>
>> >>>>>> >>
>> >>>>>> >>
>> >>>>>> >> --
>> >>>>>> >> Regards,
>> >>>>>> >> Peter Manev
>> >>>>>> >
>> >>>>>> >
>> >>>>>>
>> >>>>>>
>> >>>>>>
>> >>>>>> --
>> >>>>>> Regards,
>> >>>>>> Peter Manev
>> >>>>>
>> >>>>>
>> >>>>
>> >>>
>> >>
>> >
>>
>>
>>
>> --
>> Regards,
>> Peter Manev
>
>



-- 
Regards,
Peter Manev


