<div dir="ltr"><div dir="ltr"><div dir="ltr"><div dir="ltr"><div dir="ltr"><div dir="ltr"><div dir="ltr"><div dir="ltr"><div dir="ltr"><div dir="ltr">Sorry for the delay.  I am responding to where we left off as I don't believe the additional info added later applies to my situation.  See inline responses below:</div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Wed, Feb 20, 2019 at 4:57 AM Peter Manev <<a href="mailto:petermanev@gmail.com" target="_blank">petermanev@gmail.com</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">On Mon, Feb 18, 2019 at 11:35 PM Eric Urban <<a href="mailto:eurban@umn.edu" target="_blank">eurban@umn.edu</a>> wrote:<br>

><br>

> First, just in case it was lost in the details I do want to call out something from my first email.  When compiling Suricata 4.1.2 without Rust, we have little to no packet loss so looks to be the same behavior as we have with 4.0.6 compiled without Rust.  I wanted to make sure to highlight this point more as I probably should have used a subject line to that extent instead of writing that the upgrade is the main difference where we saw this increase in packet loss.  The upgrade to 4.1.2 likely just triggered the issue since that is when Rust is enabled by default.  Do note though that in 4.1.2 if I compile with Rust and explicitly disable the 6 new parsers (not disabling SMB though), then we still do have significant packet loss.  So it seems there is more to disabling Rust than just disabling the new parsers in the config?<br>

<br>

Noted - thank you.<br>

To that point - what is your full compile / install line ?<br>

<br></blockquote><div><br></div><div>HAVE_PYTHON=/usr/bin/python3 ./configure --with-libpcap=/opt/snf --localstatedir=/var/ --with-libhs-includes=/usr/local/include/hs/ --with-libhs-libraries=/usr/local/lib64/</div><div> <br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">

><br>

> Now on to responding to your most recent email:<br>

><br>

>> This is very low (lowest i have seen) for the "tcp.pkt_on_wrong_thread " counter especially with a big run like the shared stats -over 20 days.<br>

>> Do you mind sharing a bit more info on your NIC (Myricom i think - if I am not mistaken) - driver/version/any specific set up - we are trying to keep a record for that here -<br>

>> <a href="https://redmine.openinfosecfoundation.org/issues/2725#note-13" rel="noreferrer" target="_blank">https://redmine.openinfosecfoundation.org/issues/2725#note-13</a><br>

><br>

><br>

> We are using Myricom cards with SNFv3 (Sniffer 3) drivers.  The driver version from the system where the stats log was taken from is 3.0.15.50857.  We were previously running 3.0.14.50843 and experienced a similar volume of drops, so we upgraded to this newer patch version to rule out issues with the driver.  Would you like me to add these details directly to the Redmine issue?  Are there any other specific details that would be helpful?<br>

><br>

<br>

Please feel free to update the issue and then i can further update the<br>

matrix/table if needed.<br>

I think info like:<br>

-  kernel (just uname -a)<br>

- ethtool -i iface<br>

- runmode (example : af-packet cluster_flow / pfring )<br>

- ethtool -x iface<br>

<br>

would help a lot! Thank you<br>

<br></blockquote><div><br></div><div>I updated the Redmine issue.  Please let me know if you need any additional info there.</div><div><br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">

><br>

>> Observations:<br>

>> with 4.1.2 these counters seem odd -<br>

>> capture.kernel_packets                     | Total<br>

>> | 16345348068<br>

>> capture.kernel_drops                        | Total<br>

>>  | 33492892572<br>

>> aka you have more kernel_drops than kernel_packets - seems odd.<br>

>> Which makes me think it may be a "counter" bug of some sort. Are the NIC driver versions the same on both boxes / same NIC config etc ?<br>

><br>

><br>

> When I look at the delta counters for packets and drops, there are many times these appear to possibly overflow as they are resulting in large negative values.  These negative delta values are being added to the running counters so make it look like we have more drops than packets.  We see this more often with packets than with drops.  If this is an issue with overflow that seems to make sense as we often have much higher packet counts than drops.  One example I found where stats.capture.kernel_packets_delta: -4293864433.  The stats.capture.kernel_packets count prior to this was 20713149739 then went down to 16419285306 the next time stats were generated.  I had noticed this behavior prior to 4.1.2, and am quite sure we saw this in 3.2.2 and possibly earlier than that.  I typically just filter out these values when looking through the delta data.  I can file a bug for this issue if you'd like as we have plenty of examples in our logs where this occurs both with packet and drop counters.<br>

<br>

Yes - could you please file a report including the details - OS/Suri<br>

versions affected/ NIC model used / runmode used (afpacket for<br>

example) etc<br>

<br></blockquote><div><br></div><div>I opened <a href="https://redmine.openinfosecfoundation.org/issues/2845" target="_blank">https://redmine.openinfosecfoundation.org/issues/2845</a> so will work from there on that issue.</div><div><br></div><div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">

><br>

> Also, to answer your question about the NIC drivers/config even though it may not matter anymore given the info about the negative delta counters, at the time I submitted the stats logs to you we were running slightly different Myricom driver versions.  The 4.0.6 sensors were running 3.0.14.50843 and the 4.1.2 ones were running 3.0.15.50857.  The reason for the different driver versions is, as I mentioned above, that we upgraded the driver version to rule that out as a potential cause and only did it on our one set running 4.1.2.<br>

><br>

>> Suggestions for the 4.1.2 set up:<br>

>> Try a run where you disable those (false) and run again to see if any difference (?) :<br>

>>   midstream: true            # allow midstream session pickups<br>

>>   async-oneside: true        # enable async stream handling<br>

><br>

><br>

> I just made these changes today.  I will let it run for a few days and get back to you with the results.<br>

><br>

<br>

Any diff observed yet ?<br></blockquote><div><br></div><div>It looks like the config changes you recommended did reduce the number of drops we saw, but did not eliminate them.</div><div><br></div><div>Right now we have three Suricata instances that are set up to get the same traffic for troubleshooting this issue.  We had a few periods of drops on the 4.1.2 sensors (with Rust) where the 4.0.6 (no Rust) instance had no drops.  The 4.1.2 instance with midstream and async disabled had some drops, but not as many as the 4.1.2 instance running the unmodified config that I provided to you earlier on this thread.  </div><div><br></div><div>I looked at one period (Feb 21 13:14 through Feb 21 13:26) where we had some heavy packet loss on our 4.1.2 sensor interfaces.  Unfortunately I don't have full stats logs to provide for this window as they were deleted before I got to them.  I assume you need the actual stats logs, so I will make sure to grab them before they are deleted next time.</div><div><br></div><div>To give you a summary of what I saw:</div><div>The first sensor running 4.0.6 had 0 packets dropped during this time.  </div><div>The second sensor running 4.1.2 with your recommended config options to disable midstream sessions and async stream handling still had a significant number of drops (1.2 million to 16 million per minute).  However, the total number of drops during this time period (77,782,338) was much lower than on the third sensor.</div><div>The third sensor running 4.1.2 and our unmodified config had the most drops overall during this period.  Drops ranged from about 1 million to 39 million per minute and totaled 173,550,987.</div><div><br></div><div>Something that is strange to me is that even though these should be getting the same traffic, the Suricata counters show a large spike in packets_received for the two 4.1.2 hosts, so they appear to have received significantly more packets than the 4.0.6 host.  
I checked our Myricom stats for this period and each sensor shows a very similar number of packets received, so from that point of view it does seem that they are getting the same traffic.</div><div><br></div><div>I will get back to you once this happens again and I can get stats logs from all three of these sensors to compare.</div><div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">

<br>

> Thank you for your assistance,<br>

> Eric<br>

<br>

Thank you for the feedback!<br>

<br>

><br>

><br>

> On Thu, Feb 14, 2019 at 3:59 PM Cloherty, Sean E <<a href="mailto:scloherty@mitre.org" target="_blank">scloherty@mitre.org</a>> wrote:<br>

>><br>

>> That also seems to be the case with me regarding high counts on tcp.pkt_on_wrong_thread.  I've reverted to 4.0.6 using the same setup and YAML and the stats look much better with no packet loss.  I will forward the data.<br>

>><br>

>> Thanks.<br>

>><br>

>> -----Original Message-----<br>

>> From: Peter Manev <<a href="mailto:petermanev@gmail.com" target="_blank">petermanev@gmail.com</a>><br>

>> Sent: Wednesday, February 13, 2019 3:52 PM<br>

>> To: Eric Urban <<a href="mailto:eurban@umn.edu" target="_blank">eurban@umn.edu</a>><br>

>> Cc: Cloherty, Sean E <<a href="mailto:scloherty@mitre.org" target="_blank">scloherty@mitre.org</a>>; Open Information Security Foundation <<a href="mailto:oisf-users@lists.openinfosecfoundation.org" target="_blank">oisf-users@lists.openinfosecfoundation.org</a>><br>

>> Subject: Re: [EXT] Re: [Oisf-users] Packet loss and increased resource consumption after upgrade to 4.1.2 with Rust support<br>

>><br>

>> On Fri, Feb 8, 2019 at 6:34 PM Eric Urban <<a href="mailto:eurban@umn.edu" target="_blank">eurban@umn.edu</a>> wrote:<br>

>> ><br>

>> > Peter, I emailed our config to you directly.  I mentioned in my original email that we did test having Rust enabled in 4.1.2 where I explicitly disabled the Rust parsers and still experienced significant packet loss.  In that case I added the following config under app-layer.protocols but left the rest of the config the same:<br>

>> ><br>

>><br>

>><br>

>> Thank you for sharing all the requested information.<br>

>> Please find below my observations and some suggestions.<br>

>><br>

>> The good news with 4.1.2:<br>

>> tcp.pkt_on_wrong_thread                    | Total                     | 100<br>

>><br>

>> This is very low (lowest i have seen) for the "tcp.pkt_on_wrong_thread " counter especially with a big run like the shared stats -over 20 days.<br>

>> Do you mind sharing a bit more info on your NIC (Myricom i think - if I am not mistaken) - driver/version/any specific set up - we are trying to keep a record for that here -<br>

>> <a href="https://redmine.openinfosecfoundation.org/issues/2725#note-13" rel="noreferrer" target="_blank">https://redmine.openinfosecfoundation.org/issues/2725#note-13</a><br>

>><br>

>><br>

>> Observations:<br>

>> with 4.1.2 these counters seem odd -<br>

>> capture.kernel_packets                     | Total<br>

>> | 16345348068<br>

>> capture.kernel_drops                        | Total<br>

>>  | 33492892572<br>

>> aka you have more kernel_drops than kernel_packets - seems odd.<br>

>> Which makes me think it may be a "counter" bug of some sort. Are the NIC driver versions the same on both boxes / same NIC config etc ?<br>

>><br>

>><br>

>> Suggestions for the 4.1.2 set up:<br>

>> Try a run where you disable those (false) and run again to see if any difference (?) :<br>

>>   midstream: true            # allow midstream session pickups<br>

>>   async-oneside: true        # enable async stream handling<br>

>><br>

>> Thank you<br>

<br>

<br>

<br>

-- <br>

Regards,<br>

Peter Manev<br>

</blockquote></div></div></div></div></div></div></div></div></div></div>
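<div dir="ltr">A small worked check of the negative delta quoted earlier in this thread (kernel_packets going from 20713149739 to 16419285306, reported delta -4293864433).  The sketch below assumes the underlying counter wraps at 32 bits; that width is only an assumption suggested by how close the delta is to 2**32, and is not confirmed anywhere in the thread.  The helper name unwrap_delta is hypothetical and is not part of Suricata.</div>
<pre>
```python
# Assumed counter width: 32 bits. This is a guess based on the size of the
# negative delta, not something stated in the thread or in the driver docs.
WRAP_32 = 2 ** 32


def unwrap_delta(previous, current, modulus=WRAP_32):
    """Forward distance from previous to current, modulo the assumed width.

    Hypothetical helper for illustration only.
    """
    # Python's % always returns a non-negative result for a positive modulus,
    # so a wrapped (negative) raw difference maps back into [0, modulus).
    return (current - previous) % modulus


# Values quoted in the thread for stats.capture.kernel_packets.
prev_total = 20713149739
next_total = 16419285306

raw_delta = next_total - prev_total
corrected = unwrap_delta(prev_total, next_total)

print("raw delta:      ", raw_delta)
print("corrected delta:", corrected)
```
</pre>
<div dir="ltr">Under that assumption, the raw delta matches the logged -4293864433 and the corrected value works out to roughly 1.1 million packets for the interval.  If the real counter width differs, the modulus would need to change accordingly.</div>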