[Oisf-users] [EXT] Re: Packet loss and increased resource consumption after upgrade to 4.1.2 with Rust support

Eric Urban eurban at umn.edu
Mon Feb 18 22:35:23 UTC 2019


First, in case it was lost in the details, I want to call out something from
my first email: when compiling Suricata 4.1.2 without Rust, we have little
to no packet loss, which looks to be the same behavior as 4.0.6 compiled
without Rust.  I want to highlight this point since I probably should have
used a subject line to that effect instead of framing the upgrade itself as
the main difference behind the increased packet loss.  The upgrade to 4.1.2
likely just triggered the issue, since that is when Rust became enabled by
default.  Do note, though, that in 4.1.2, if I compile with Rust and
explicitly disable the 6 new parsers (while leaving SMB enabled), we still
see significant packet loss.  So it seems there is more to disabling Rust
than just disabling the new parsers in the config?
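
For reference, the change looks roughly like this in suricata.yaml.  This is
a sketch of the kind of edit described above, assuming the parsers that were
new in 4.1 apart from SMB; the exact list on our sensors may differ:

```yaml
app-layer:
  protocols:
    # Rust-based parsers new in 4.1, explicitly disabled (SMB left enabled)
    nfs:
      enabled: no
    ntp:
      enabled: no
    tftp:
      enabled: no
    krb5:
      enabled: no
    ikev2:
      enabled: no
    dhcp:
      enabled: no
```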

Now on to responding to your most recent email:

> This is very low (the lowest I have seen) for the "tcp.pkt_on_wrong_thread"
> counter, especially with a big run like the shared stats - over 20 days.
> Do you mind sharing a bit more info on your NIC (Myricom, I think, if I am
> not mistaken) - driver/version/any specific setup - we are trying to keep
> a record of that here -
> https://redmine.openinfosecfoundation.org/issues/2725#note-13


We are using Myricom cards with SNFv3 (Sniffer 3) drivers.  The driver
version on the system the stats log was taken from is 3.0.15.50857.  We were
previously running 3.0.14.50843 and experienced a similar volume of drops,
so we upgraded to this newer patch version to rule out issues with the
driver.  Would you like me to add these details directly to the Redmine
issue?  Are there any other specific details that would be helpful?


Observations:
> with 4.1.2 these counters seem odd -
> capture.kernel_packets                     | Total | 16345348068
> capture.kernel_drops                       | Total | 33492892572
> aka you have more kernel_drops than kernel_packets - seems odd.
> Which makes me think it may be a "counter" bug of some sort. Are the NIC
> driver versions the same on both boxes / same NIC config etc ?


When I look at the delta counters for packets and drops, they often appear
to overflow, resulting in large negative values.  These negative delta
values are being added to the running counters, which makes it look like we
have more drops than packets.  We see this more often with packets than with
drops, which is consistent with overflow since our packet counts are usually
much higher than our drop counts.  In one example I found,
stats.capture.kernel_packets_delta was -4293864433: the
stats.capture.kernel_packets count prior to this was 20713149739, then went
down to 16419285306 the next time stats were generated.  I had noticed this
behavior prior to 4.1.2, and am quite sure we saw it in 3.2.2 and possibly
earlier.  I typically just filter out these values when looking through the
delta data.  I can file a bug for this if you'd like, as we have plenty of
examples in our logs where this occurs with both packet and drop counters.
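
The arithmetic is consistent with a 32-bit wrap: 20713149739 - 4293864433 =
16419285306, and -4293864433 is exactly what plain subtraction gives when a
32-bit counter rolls over.  A minimal sketch (plain Python, not Suricata
code; the per-interval values are made up to reproduce that delta):

```python
# Minimal sketch of how an unsigned 32-bit counter wrap produces a large
# negative delta.  The sample values are illustrative, chosen to reproduce
# the delta seen in our stats logs.

U32 = 2 ** 32  # counters wrap modulo 2^32

def naive_delta(prev: int, curr: int) -> int:
    """Plain subtraction, as a stats consumer might compute it."""
    return curr - prev

def wrap_aware_delta(prev: int, curr: int) -> int:
    """Treat the counter as modulo 2^32 so a wrap still yields the real increase."""
    return (curr - prev) % U32

prev, curr = 4_294_000_000, 135_567  # counter wrapped past 2^32 - 1

print(naive_delta(prev, curr))       # -4293864433, like the logged delta
print(wrap_aware_delta(prev, curr))  # 1102863 packets actually counted
```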

Also, to answer your question about the NIC drivers/config, even though it
may not matter anymore given the info about the negative delta counters: at
the time I submitted the stats logs to you, we were running slightly
different Myricom driver versions.  The 4.0.6 sensors were running
3.0.14.50843 and the 4.1.2 ones were running 3.0.15.50857.  As I mentioned
above, we upgraded the driver on the 4.1.2 set only, to rule it out as a
potential cause.

Suggestions for the 4.1.2 setup:
> Try a run where you disable these (set to false) and see if it makes any
> difference:
>   midstream: true            # allow midstream session pickups
>   async-oneside: true        # enable async stream handling


I just made these changes today.  I will let it run for a few days and get
back to you with the results.
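
Concretely, the settings now read as follows (a sketch, assuming they live
under the stream section as in the default suricata.yaml):

```yaml
stream:
  midstream: false      # was true - no midstream session pickups
  async-oneside: false  # was true - no async stream handling
```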

Thank you for your assistance,
Eric


On Thu, Feb 14, 2019 at 3:59 PM Cloherty, Sean E <scloherty at mitre.org>
wrote:

> That also seems to be the case with me regarding high counts on
> tcp.pkt_on_wrong_thread.  I've reverted to 4.0.6 using the same setup and
> YAML and the stats look much better with no packet loss.  I will forward
> the data.
>
> Thanks.
>
> -----Original Message-----
> From: Peter Manev <petermanev at gmail.com>
> Sent: Wednesday, February 13, 2019 3:52 PM
> To: Eric Urban <eurban at umn.edu>
> Cc: Cloherty, Sean E <scloherty at mitre.org>; Open Information Security
> Foundation <oisf-users at lists.openinfosecfoundation.org>
> Subject: Re: [EXT] Re: [Oisf-users] Packet loss and increased resource
> consumption after upgrade to 4.1.2 with Rust support
>
> On Fri, Feb 8, 2019 at 6:34 PM Eric Urban <eurban at umn.edu> wrote:
> >
> > Peter, I emailed our config to you directly.  I mentioned in my original
> email that we did test having Rust enabled in 4.1.2 where I explicitly
> disabled the Rust parsers and still experienced significant packet loss.
> In that case I added the following config under app-layer.protocols but
> left the rest of the config the same:
> >
>
>
> Thank you for sharing all the requested information.
> Please find below my observations and some suggestions.
>
> The good news with 4.1.2:
> tcp.pkt_on_wrong_thread                    | Total                     |
> 100
>
> This is very low (the lowest I have seen) for the "tcp.pkt_on_wrong_thread"
> counter, especially with a big run like the shared stats - over 20 days.
> Do you mind sharing a bit more info on your NIC (Myricom, I think, if I am
> not mistaken) - driver/version/any specific setup - we are trying to keep
> a record of that here -
> https://redmine.openinfosecfoundation.org/issues/2725#note-13
>
>
> Observations:
> with 4.1.2 these counters seem odd -
> capture.kernel_packets                     | Total | 16345348068
> capture.kernel_drops                       | Total | 33492892572
> aka you have more kernel_drops than kernel_packets - seems odd.
> Which makes me think it may be a "counter" bug of some sort. Are the NIC
> driver versions the same on both boxes / same NIC config etc ?
>
>
> Suggestions for the 4.1.2 setup:
> Try a run where you disable these (set to false) and see if it makes any
> difference:
>   midstream: true            # allow midstream session pickups
>   async-oneside: true        # enable async stream handling
>
> Thank you
>

