[Oisf-users] Suricata 4.0.3 with Napatech problems
Peter Manev
petermanev at gmail.com
Tue Jan 23 23:00:14 UTC 2018
On Tue, Jan 23, 2018 at 9:51 PM, Steve Castellarin
<steve.castellarin at gmail.com> wrote:
> Peter,
>
> I reviewed my compile of Suricata 4.0.3 and noticed that I was using
> Hyperscan version 4.7, as opposed to version 4.2 noted in the Suricata
> documentation (http://suricata.readthedocs.io/en/latest/performance/hyperscan.html).
> After recompiling with 4.2 I was able to get Suricata 4.0.3 to run for 42
> minutes before it started dropping packets uncontrollably.
>
If that made a change in behavior - can you try mpm-algo: ac-ks and
spm-algo: bm in the suricata.yaml?
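For reference, a minimal sketch of those two settings in suricata.yaml
(both are top-level keys in 4.0.x):

  mpm-algo: ac-ks
  spm-algo: bm
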
> I then made a change to /proc/sys/vm/max_map_count based on a note in
> Napatech's documentation: "Especially for large host buffer configurations
> it is necessary to adjust the kernel sysctl "vm.max_map_count"
> (/proc/sys/vm/max_map_count). The kernel sysctl "vm.max_map_count"
> (/proc/sys/vm/max_map_count) should be adjusted to (at least) the total
> configured host buffer memory in MB multiplied by four.
> Example for total host buffer size 128GB (131072MB): 131072*4 = 524288.
> Hence the minimum value for "vm.max_map_count" is 524288."
>
> In my case I'm using 17 host buffers at 2048MB each, so (17 * 2048) * 4 =
> 139264. My vm.max_map_count was previously 65530 (I assume the default
> for Ubuntu 14.04). After changing that and re-running Suricata 4.0.3, it ran
> for 45 minutes before the buffer/CPU issue came back.
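>
> For reference, the change amounts to roughly the following (sysctl -w is a
> one-off; the /etc/sysctl.conf line makes it persist across reboots):
>
>   sysctl -w vm.max_map_count=139264    # (17 * 2048) * 4
>
>   # persistently, in /etc/sysctl.conf:
>   vm.max_map_count = 139264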
>
> On Tue, Jan 23, 2018 at 9:49 AM, Steve Castellarin
> <steve.castellarin at gmail.com> wrote:
>>
>> Hi Peter,
>>
>> I just realized I responded directly to you instead of the mailing list -
>> so here's my response, updated.
>>
>> I made a change to my YAML file for 4.0.3, dropping the
>> detect-thread-ratio from 1.5 to 1, and on Friday I was able to run Suricata
>> 4.0.3 for five hours before the issue occurred. That run handled
>> sustained network traffic of 1.2 to 1.7 Gbps, so that is a step in the
>> right direction. I'm going to have a hard time running 4.0.3 without
>> rules, as this unfortunately is our only Suricata instance running our rule
>> set.
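>>
>> For reference, that knob sits under the threading section of suricata.yaml
>> (a minimal sketch, assuming the 4.0.x layout):
>>
>>   threading:
>>     detect-thread-ratio: 1.0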
>>
>> I've noticed one strange thing. In my YAML file I have
>> "autofp-scheduler" set to "active-packets", yet every time I run Suricata I
>> see this noted in suricata.log: "using flow hash instead of active packets".
>> When I comment out the "autofp-scheduler" setting in the YAML file, the
>> message disappears. Any idea what that is about?
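>>
>> A sketch of what that log line suggests (an assumption: this build only
>> honors the flow-hash scheduler, so "active-packets" is ignored with that
>> warning):
>>
>>   # autofp-scheduler: active-packets   # logs "using flow hash instead..."
>>   autofp-scheduler: hash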
>>
>> On Sun, Jan 21, 2018 at 3:49 PM, Peter Manev <petermanev at gmail.com> wrote:
>>>
>>>
>>>
>>> On 18 Jan 2018, at 19:21, Steve Castellarin <steve.castellarin at gmail.com>
>>> wrote:
>>>
>>> And also, the bandwidth utilization was just over 800Mbps.
>>>
>>>
>>> Can you try the same run but this time load no rules? I would like to
>>> see whether it makes a difference over the same amount of time.
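>>>
>>> One way to do that without touching the rule config might be the
>>> following (an assumption on my part: -S loads the given file exclusively,
>>> so pointing it at an empty file such as /dev/null loads zero rules):
>>>
>>>   /usr/bin/suricata -vvv -c /etc/suricata/suricata.yaml --napatech \
>>>     --runmode workers -S /dev/null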
>>>
>>>
>>> On Thu, Jan 18, 2018 at 1:16 PM, Steve Castellarin
>>> <steve.castellarin at gmail.com> wrote:
>>>>
>>>> Hey Peter,
>>>>
>>>> Those changes didn't help. Around 23+ minutes into the run one worker
>>>> CPU (#30) stayed at 100% while buffer NT11 dropped packets and would not
>>>> recover. I'm attaching a zip file that has the stats.log for that run, the
>>>> suricata.log file as well as the information seen at the command line after
>>>> issuing "/usr/bin/suricata -vvv -c /etc/suricata/suricata.yaml --napatech
>>>> --runmode workers -D".
>>>>
>>>> Steve
>>>>
>>>>
>>>> On Thu, Jan 18, 2018 at 11:30 AM, Steve Castellarin
>>>> <steve.castellarin at gmail.com> wrote:
>>>>>
>>>>> We never see above 2 Gbps. When the issue occurred a little while ago I
>>>>> was running the Napatech "monitoring" tool and it showed we were between
>>>>> 650 and 900 Mbps. I'll note the bandwidth utilization when the next
>>>>> issue occurs.
>>>>>
>>>>> On Thu, Jan 18, 2018 at 11:28 AM, Peter Manev <petermanev at gmail.com>
>>>>> wrote:
>>>>>>
>>>>>> On Thu, Jan 18, 2018 at 5:27 PM, Steve Castellarin
>>>>>> <steve.castellarin at gmail.com> wrote:
>>>>>> > When you mean the "size of the traffic", are you asking what the
>>>>>> > bandwidth
>>>>>> > utilization is at the time the issue begins?
>>>>>>
>>>>>> Sorry - I mean the traffic you sniff - 1/5/10... Gbps?
>>>>>>
>>>>>> >
>>>>>> > I will set things up and send you any/all output after the issue
>>>>>> > starts.
>>>>>> >
>>>>>> > On Thu, Jan 18, 2018 at 11:17 AM, Peter Manev <petermanev at gmail.com>
>>>>>> > wrote:
>>>>>> >>
>>>>>> >> On Thu, Jan 18, 2018 at 4:43 PM, Steve Castellarin
>>>>>> >> <steve.castellarin at gmail.com> wrote:
>>>>>> >> > Hey Peter,
>>>>>> >> >
>>>>>> >> > I tried as you asked. Less than 15 minutes after I restarted
>>>>>> >> > Suricata I saw my first CPU hitting 100% and one host buffer
>>>>>> >> > dropping all packets. Shortly after that the second CPU hit 100%
>>>>>> >> > and a second host buffer began dropping all packets. I'm attaching
>>>>>> >> > the stats.log where you'll see at 10:31:11 the first host buffer
>>>>>> >> > (nt1.drop) start to register dropped packets, then at 10:31:51
>>>>>> >> > you'll see host buffer nt6.drop begin to register dropped packets.
>>>>>> >> > At that point I issued the kill.
>>>>>> >> >
>>>>>> >>
>>>>>> >> What is the size of the traffic?
>>>>>> >> Can you also try
>>>>>> >> detect:
>>>>>> >> - profile: high
>>>>>> >>
>>>>>> >> (as opposed to "custom")
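>>>>>> >>
>>>>>> >> In 4.0.x yaml terms that section would read roughly as follows (a
>>>>>> >> sketch; the custom-values block is only consulted when profile is
>>>>>> >> set to "custom"):
>>>>>> >>
>>>>>> >>   detect:
>>>>>> >>     profile: high
>>>>>> >>     custom-values:
>>>>>> >>       toclient-groups: 3
>>>>>> >>       toserver-groups: 25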
>>>>>> >>
>>>>>> >> Also, if you can, run it in verbose mode (-vvv) and send me the
>>>>>> >> complete output after you start having the issues.
>>>>>> >>
>>>>>> >> Thanks
>>>>>> >>
>>>>>> >>
>>>>>> >>
>>>>>> >> > Steve
>>>>>> >> >
>>>>>> >> > On Thu, Jan 18, 2018 at 10:05 AM, Peter Manev
>>>>>> >> > <petermanev at gmail.com>
>>>>>> >> > wrote:
>>>>>> >> >>
>>>>>> >> >> On Wed, Jan 17, 2018 at 1:29 PM, Steve Castellarin
>>>>>> >> >> <steve.castellarin at gmail.com> wrote:
>>>>>> >> >> > Hey Pete,
>>>>>> >> >> >
>>>>>> >> >> > Here's the YAML file from the last time I attempted to run
>>>>>> >> >> > 4.0.3, with the network information removed. Let me know if you
>>>>>> >> >> > need anything else from our configuration. I'll also go to the
>>>>>> >> >> > redmine site to open a bug report.
>>>>>> >> >> >
>>>>>> >> >> > Steve
>>>>>> >> >>
>>>>>> >> >> Hi Steve,
>>>>>> >> >>
>>>>>> >> >> Can you try without -
>>>>>> >> >>
>>>>>> >> >> midstream: true
>>>>>> >> >> async-oneside: true
>>>>>> >> >>
>>>>>> >> >> so
>>>>>> >> >>
>>>>>> >> >> #midstream: true
>>>>>> >> >> #async-oneside: true
>>>>>> >> >>
>>>>>> >> >> and lower "prealloc-sessions: 1000000" to 100000, for example?
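>>>>>> >> >>
>>>>>> >> >> As a sketch, the relevant stream section would then read roughly
>>>>>> >> >> like this (4.0.x option names assumed):
>>>>>> >> >>
>>>>>> >> >>   stream:
>>>>>> >> >>     prealloc-sessions: 100000
>>>>>> >> >>     #midstream: true
>>>>>> >> >>     #async-oneside: true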
>>>>>> >> >>
>>>>>> >> >>
>>>>>> >> >> Thank you.
>>>>>> >> >>
>>>>>> >> >> >
>>>>>> >> >> > On Wed, Jan 17, 2018 at 6:36 AM, Peter Manev
>>>>>> >> >> > <petermanev at gmail.com>
>>>>>> >> >> > wrote:
>>>>>> >> >> >>
>>>>>> >> >> >> On Tue, Jan 16, 2018 at 4:12 PM, Steve Castellarin
>>>>>> >> >> >> <steve.castellarin at gmail.com> wrote:
>>>>>> >> >> >> > Hey Peter, I didn't know if you had a chance to look at the
>>>>>> >> >> >> > stats log and configuration file I sent. So far, running
>>>>>> >> >> >> > 3.1.1 with the updated Napatech drivers, my system is running
>>>>>> >> >> >> > without any issues.
>>>>>> >> >> >> >
>>>>>> >> >> >>
>>>>>> >> >> >> The toughest part of the troubleshooting is that I don't have
>>>>>> >> >> >> the setup to reproduce this.
>>>>>> >> >> >> I didn't see anything in the stats log that could lead me to a
>>>>>> >> >> >> definitive conclusion.
>>>>>> >> >> >> Can you please open a bug report on our redmine with the
>>>>>> >> >> >> details from this mail thread?
>>>>>> >> >> >>
>>>>>> >> >> >> Would it be possible to share the suricata.yaml (privately
>>>>>> >> >> >> works too, if you prefer; remove all networks)?
>>>>>> >> >> >>
>>>>>> >> >> >> Thank you
>>>>>> >> >> >>
>>>>>> >> >> >> > On Thu, Jan 11, 2018 at 12:54 PM, Steve Castellarin
>>>>>> >> >> >> > <steve.castellarin at gmail.com> wrote:
>>>>>> >> >> >> >>
>>>>>> >> >> >> >> Here is the zipped stats.log. I restarted the Napatech
>>>>>> >> >> >> >> drivers before running Suricata 4.0.3 to clear out any
>>>>>> >> >> >> >> previous drop counters, etc.
>>>>>> >> >> >> >>
>>>>>> >> >> >> >> The first time I saw a packet drop was at the 12:20:51
>>>>>> >> >> >> >> mark, and you'll see "nt12.drop" increment. During this
>>>>>> >> >> >> >> time one of the CPUs acting as a "worker" was at 100%. But
>>>>>> >> >> >> >> these drops recovered at the 12:20:58 mark, where
>>>>>> >> >> >> >> "nt12.drop" stays constant at 13803. The big issue
>>>>>> >> >> >> >> triggered at the 12:27:05 mark in the file, where one
>>>>>> >> >> >> >> worker CPU was stuck at 100%, followed by packet drops in
>>>>>> >> >> >> >> host buffer "nt3.drop". Then came a second CPU at 100%
>>>>>> >> >> >> >> (another "worker" CPU) and packet drops in buffer
>>>>>> >> >> >> >> "nt2.drop" at 12:27:33. I finally killed Suricata just
>>>>>> >> >> >> >> before 12:27:54, where you see all host buffers beginning
>>>>>> >> >> >> >> to drop packets.
>>>>>> >> >> >> >>
>>>>>> >> >> >> >> I'm also including the output from the "suricata
>>>>>> >> >> >> >> --dump-config" command.
>>>>>> >> >> >> >>
>>>>>> >> >> >> >> On Thu, Jan 11, 2018 at 11:40 AM, Peter Manev
>>>>>> >> >> >> >> <petermanev at gmail.com>
>>>>>> >> >> >> >> wrote:
>>>>>> >> >> >> >>>
>>>>>> >> >> >> >>> On Thu, Jan 11, 2018 at 8:02 AM, Steve Castellarin
>>>>>> >> >> >> >>> <steve.castellarin at gmail.com> wrote:
>>>>>> >> >> >> >>> > Peter, yes that is correct. I worked for almost a couple
>>>>>> >> >> >> >>> > of weeks with Napatech support and they believed the
>>>>>> >> >> >> >>> > Napatech setup (ntservice.ini and custom NTPL script) is
>>>>>> >> >> >> >>> > working as it should.
>>>>>> >> >> >> >>> >
>>>>>> >> >> >> >>>
>>>>>> >> >> >> >>> Ok.
>>>>>> >> >> >> >>>
>>>>>> >> >> >> >>> One major difference between Suricata 3.x and 4.0.x in
>>>>>> >> >> >> >>> terms of Napatech is that the Napatech code was updated -
>>>>>> >> >> >> >>> some fixes and updated counters.
>>>>>> >> >> >> >>> There were a bunch of upgrades in Suricata too.
>>>>>> >> >> >> >>> Is it possible to send over a stats.log from when the
>>>>>> >> >> >> >>> issue starts occurring?
>>>>>> >> >> >> >>>
>>>>>> >> >> >> >>>
>>>>>> >> >> >> >>> > On Thu, Jan 11, 2018 at 9:52 AM, Peter Manev
>>>>>> >> >> >> >>> > <petermanev at gmail.com>
>>>>>> >> >> >> >>> > wrote:
>>>>>> >> >> >> >>> >>
>>>>>> >> >> >> >>> >> On 11 Jan 2018, at 07:19, Steve Castellarin
>>>>>> >> >> >> >>> >> <steve.castellarin at gmail.com>
>>>>>> >> >> >> >>> >> wrote:
>>>>>> >> >> >> >>> >>
>>>>>> >> >> >> >>> >> After my last email yesterday I decided to go back to
>>>>>> >> >> >> >>> >> our 3.1.1 install of Suricata, with the upgraded
>>>>>> >> >> >> >>> >> Napatech version. Since then I've seen no packets
>>>>>> >> >> >> >>> >> dropped with sustained bandwidth of between 1 and
>>>>>> >> >> >> >>> >> 1.7 Gbps. So I'm not sure what is going on with my
>>>>>> >> >> >> >>> >> configuration/setup of Suricata 4.0.3.
>>>>>> >> >> >> >>> >>
>>>>>> >> >> >> >>> >>
>>>>>> >> >> >> >>> >>
>>>>>> >> >> >> >>> >> So the only thing that you changed is the upgrade of
>>>>>> >> >> >> >>> >> the Napatech drivers?
>>>>>> >> >> >> >>> >> The Suricata config stayed the same - you just upgraded
>>>>>> >> >> >> >>> >> to 4.0.3 (from 3.1.1) and the observed effect was that
>>>>>> >> >> >> >>> >> after a while all (or most) CPUs get pegged at 100% -
>>>>>> >> >> >> >>> >> is that correct?
>>>>>> >> >> >> >>> >>
>>>>>> >> >> >> >>> >>
>>>>>> >> >> >> >>> >> On Wed, Jan 10, 2018 at 4:46 PM, Steve Castellarin
>>>>>> >> >> >> >>> >> <steve.castellarin at gmail.com> wrote:
>>>>>> >> >> >> >>> >>>
>>>>>> >> >> >> >>> >>> Hey Peter, no there is no error messages.
>>>>>> >> >> >> >>> >>>
>>>>>> >> >> >> >>> >>> On Jan 10, 2018 4:37 PM, "Peter Manev"
>>>>>> >> >> >> >>> >>> <petermanev at gmail.com>
>>>>>> >> >> >> >>> >>> wrote:
>>>>>> >> >> >> >>> >>>
>>>>>> >> >> >> >>> >>> On Wed, Jan 10, 2018 at 11:29 AM, Steve Castellarin
>>>>>> >> >> >> >>> >>> <steve.castellarin at gmail.com> wrote:
>>>>>> >> >> >> >>> >>> > Hey Peter,
>>>>>> >> >> >> >>> >>>
>>>>>> >> >> >> >>> >>> Are there any error msgs in suricata.log when that
>>>>>> >> >> >> >>> >>> happens?
>>>>>> >> >> >> >>> >>>
>>>>>> >> >> >> >>> >>> Thank you
>>>>>> >> >> >> >>> >>>
>>>>>> >> >> >> >>> >>>
>>>>>> >> >> >> >>> >>>
>>>>>> >> >> >> >>> >>> --
>>>>>> >> >> >> >>> >>> Regards,
>>>>>> >> >> >> >>> >>> Peter Manev
>>>>>> >> >> >> >>> >>>
>>>>>> >> >> >> >>> >>>
>>>>>> >> >> >> >>> >>
>>>>>> >> >> >> >>> >
>>>>>> >> >> >> >>>
>>>>>> >> >> >> >>>
>>>>>> >> >> >> >>>
>>>>>> >> >> >> >>> --
>>>>>> >> >> >> >>> Regards,
>>>>>> >> >> >> >>> Peter Manev
>>>>>> >> >> >> >>
>>>>>> >> >> >> >>
>>>>>> >> >> >> >
>>>>>> >> >> >>
>>>>>> >> >> >>
>>>>>> >> >> >>
>>>>>> >> >> >> --
>>>>>> >> >> >> Regards,
>>>>>> >> >> >> Peter Manev
>>>>>> >> >> >
>>>>>> >> >> >
>>>>>> >> >>
>>>>>> >> >>
>>>>>> >> >>
>>>>>> >> >> --
>>>>>> >> >> Regards,
>>>>>> >> >> Peter Manev
>>>>>> >> >
>>>>>> >> >
>>>>>> >>
>>>>>> >>
>>>>>> >>
>>>>>> >> --
>>>>>> >> Regards,
>>>>>> >> Peter Manev
>>>>>> >
>>>>>> >
>>>>>>
>>>>>>
>>>>>>
>>>>>> --
>>>>>> Regards,
>>>>>> Peter Manev
>>>>>
>>>>>
>>>>
>>>
>>
>
--
Regards,
Peter Manev