[Oisf-users] Are kernel_drops a symptom of dysfunctional afpacket fanout and/or RSS?

Peter Manev petermanev at gmail.com
Thu Aug 3 15:28:49 UTC 2017


On Thu, Aug 3, 2017 at 3:15 PM, erik clark <philosnef at gmail.com> wrote:
> If you are trying to squash the rss queue to 1, it causes giant problems. I
> didn't squash the queue and let suri handle its fanout using tpacket-v3 (you
> are on the same kernel branch as I am, so this should be fine), and get
> about .01-.04% packet drops. I noticed you were working with the queues; you
> might be shooting yourself in the foot with this.
>
> YMMV, personal experience only. I shot an email about this earlier last
> month, but not sure where that went. Peter was the last one to comment, not
> sure if those guys are looking at that or not:
>
> https://lists.openinfosecfoundation.org/pipermail/oisf-users/2017-July/007323.html
>
>

I think there is more to it than just switching to or from 1 RSS queue.
Among other things, if the kernel (4.12 is the example mentioned here)
and the NIC driver have the patches, and testing (with the afpacket
fanout testing tool mentioned above) confirms that there is no packet
reordering and that the hash is symmetrical, you should be ok. That needs
to be (re)checked with every kernel and/or NIC driver upgrade.
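
To make "the hash is symmetrical" a bit more concrete: both directions
of a flow (A->B and B->A) have to land on the same queue/thread. A
minimal sketch of that property follows. It is only an illustration,
not the Toeplitz hash a NIC uses for RSS and not the kernel's fanout
hash; the function names and the use of crc32 are my own assumptions.

```python
import zlib


def symmetric_flow_hash(src_ip, src_port, dst_ip, dst_port, n_queues):
    """Map a flow to a queue so that both directions give the same result.

    Illustrative only: real RSS/fanout hashing is done by the NIC/kernel.
    """
    # Sort the two endpoints so that A->B and B->A build the same key.
    lo, hi = sorted([(src_ip, src_port), (dst_ip, dst_port)])
    key = f"{lo[0]}:{lo[1]}|{hi[0]}:{hi[1]}".encode()
    return zlib.crc32(key) % n_queues


def is_symmetric(hash_fn, flows, n_queues):
    """Check that every flow hashes identically in both directions."""
    return all(
        hash_fn(s, sp, d, dp, n_queues) == hash_fn(d, dp, s, sp, n_queues)
        for (s, sp, d, dp) in flows
    )
```

If the two directions of a flow end up on different queues, the worker
threads each see only half of the conversation, which is the situation
the testing is meant to rule out.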

If not, then you should try 1 RSS queue.
Some major things to watch out for there would be how busy the CPU
dedicated to the RSS queue is, CPU cache misses (having the worker
threads on the same NUMA node as the receiving NIC is essential), the
FIFO buffer on the NIC, coalescing/ring descriptors (NIC settings via
ethtool) and, of course, kernel drops in AFPv3, for example.
Kernel drops in AFPv2/3 would mean that there is no space left in the
buffers/rings (packets are not being processed fast enough) -
https://redmine.openinfosecfoundation.org/projects/suricata/repository/revisions/master/entry/suricata.yaml.in#L584
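
For tracking the drop rate over time, here is a small sketch. It
assumes you have already pulled the capture.kernel_packets and
capture.kernel_drops counters out of the Suricata stats, that they are
cumulative, and that kernel_packets includes the dropped packets. The
function names and the dict layout are hypothetical.

```python
def kernel_drop_percent(kernel_packets, kernel_drops):
    """Return kernel drops as a percentage of packets seen by the kernel."""
    if kernel_packets <= 0:
        # Nothing captured yet (or counters reset): report no drops.
        return 0.0
    return 100.0 * kernel_drops / kernel_packets


def drop_percent_between(prev, curr):
    """Drop percentage for the interval between two counter samples.

    prev and curr are dicts with 'kernel_packets' and 'kernel_drops'
    keys (assumed layout), taken at two points in time.
    """
    packets = curr["kernel_packets"] - prev["kernel_packets"]
    drops = curr["kernel_drops"] - prev["kernel_drops"]
    return kernel_drop_percent(packets, drops)
```

Looking at the per-interval number rather than the total since start
makes it easier to see whether drops line up with a traffic spike or a
busy CPU.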

Having said all that, even if you have everything very well tuned,
you need to be aware of and watch out for variance that can negatively
impact your performance. For example, a badly written or overly general
PCRE rule can significantly degrade your performance. You can
also be impacted by a change in traffic pattern that causes a certain
subset of rules to kick in that are not optimized for the setup you
have.


-- 
Regards,
Peter Manev


