[Oisf-users] [EXT] Re: Packet loss and increased resource consumption after upgrade to 4.1.2 with Rust support

Wed Jul 3 08:00:38 UTC 2019

1. af_packet flow defragments before calculating hash
2. cluster_flow ignores hardware hash and computes it on its own
3. cluster_qm uses skb_get_queue_mapping() and gets an index of a queue
from the hardware (so it uses the hardware hashing, although implicitly)
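
To make the fragment problem behind points 1 and 2 concrete, here is a rough sketch. It is not the kernel's or Suricata's actual hashing code: the Packet type, the key functions and the CRC32 stand-in hash are all made up for illustration. It only shows that a flow key which includes ports differs between the first fragment (which carries the L4 header) and later fragments (which do not), while an address-only key stays stable.

```python
import zlib
from dataclasses import dataclass


@dataclass(frozen=True)
class Packet:
    """Hypothetical, minimal view of a packet as a load balancer sees it."""
    src: str
    dst: str
    proto: int
    sport: int = 0  # 0 stands for "no L4 header visible" (non-first fragment)
    dport: int = 0


def key_with_ports(pkt):
    """Direction-independent flow key built from addresses AND ports."""
    endpoints = sorted([(pkt.src, pkt.sport), (pkt.dst, pkt.dport)])
    return (tuple(endpoints), pkt.proto)


def key_addresses_only(pkt):
    """Direction-independent flow key built from addresses only."""
    return (tuple(sorted([pkt.src, pkt.dst])), pkt.proto)


def worker_for(key, workers=4):
    """Map a flow key to a worker index; CRC32 is only a stand-in hash."""
    return zlib.crc32(repr(key).encode()) % workers


first_frag = Packet("10.0.0.1", "192.0.2.7", 6, 49152, 443)
later_frag = Packet("10.0.0.1", "192.0.2.7", 6)  # ports not visible
reply = Packet("192.0.2.7", "10.0.0.1", 6, 443, 49152)

for name, key_fn in (("with ports", key_with_ports),
                     ("addresses only", key_addresses_only)):
    same_key = key_fn(first_frag) == key_fn(later_frag)
    print(name, "-> fragments share a flow key:", same_key)
```

With the port-based key the two fragments may still land on the same worker by chance (there are only a few workers), but nothing guarantees it. Defragmenting before hashing, as in point 1, sidesteps this because the hash then sees the reassembled packet.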

On Wed, Jul 3, 2019 at 12:25 AM Michał Purzyński <michalpurzynski1 at gmail.com>
wrote:

> I will have more observations before the weekend, I've been running Zeek +
> Suricata at the same busy office, same traffic, on two different sensors
> 1. nsm1 - hardware hashing with the low-entropy key (keep reading why)
> 2. nsm2 - software hashing with cluster_flow
>
> I'll dig more into the source code tomorrow but from what I remember
>
> 1. The symmetric hash is disabled by default and cannot be enabled with
> ethtool without changes to ethtool itself. Victor proposed them once and
> they were rejected. Using the low-entropy key was the solution. I might ping
> Intel again about that.
>
> BTW - we do not know if the symmetric hardware hashing handles fragmented
> packets correctly, i.e. WHAT is hashed. I'll take a look at the X710 specs.
>
> 2. AFAIR cluster_flow ignores the card's hash? That will not help: it
> hashes ports as well, so it will fail to compute a consistent hash for
> fragmented packets. Now, I can see, from a quick peek at the code, that
> sometimes the HW hash is fetched from the SKB, where it was put by the
> card. I need to investigate further when it is used; I'm pretty sure that's
> only for the QM demux and not for cluster_flow.
>
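
On the low-entropy key: a small sketch of why the repeating 6D:5A key gives a symmetric hash. This is a plain software Toeplitz hash, and the input layout (source address, destination address, source port, destination port) is the usual RSS convention for TCP over IPv4. That the X710 feeds the fields in exactly this way is an assumption here, and it says nothing about what the card does with fragments. The function names are made up.

```python
import socket
import struct

# 52-byte key made of the repeating 16-bit pattern 6D:5A.
LOW_ENTROPY_KEY = bytes.fromhex("6d5a" * 26)


def toeplitz_hash(key, data):
    """Software Toeplitz hash: XOR a sliding 32-bit key window per set input bit."""
    key_bits = len(key) * 8
    if len(data) * 8 + 32 > key_bits:
        raise ValueError("key too short to cover every 32-bit window")
    key_int = int.from_bytes(key, "big")
    result = 0
    for i in range(len(data) * 8):
        # Walk the input MSB first; bit i selects the key window starting at bit i.
        if data[i // 8] & (0x80 >> (i % 8)):
            result ^= (key_int >> (key_bits - 32 - i)) & 0xFFFFFFFF
    return result


def rss_input(src, dst, sport, dport):
    """Pack an IPv4/TCP 4-tuple in the assumed RSS field order."""
    return (socket.inet_aton(src) + socket.inet_aton(dst)
            + struct.pack(">HH", sport, dport))


forward = toeplitz_hash(LOW_ENTROPY_KEY,
                        rss_input("10.0.0.1", "192.0.2.7", 49152, 443))
reverse = toeplitz_hash(LOW_ENTROPY_KEY,
                        rss_input("192.0.2.7", "10.0.0.1", 443, 49152))

# The two directions hash identically with this key.
print(hex(forward), hex(reverse), forward == reverse)
```

The key repeats every 16 bits, so the key window seen by an input bit depends only on its position modulo 16. Swapping the two addresses moves bits by 32 positions and swapping the two ports moves them by 16, so both directions XOR the same windows together. To compare, swap in whatever key your NIC reports by default.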
>
> On Tue, Jul 2, 2019 at 9:51 AM Nelson, Cooper <cnelson at ucsd.edu> wrote:
>
>> Ok that is good to know. Btw, I’m pretty sure the X710 supports
>> symmetric hashing in hardware so you don’t need to set the low-entropy
>> key. But of course it doesn’t hurt.
>>
>>
>>
>> At this point I think the best course of action to fix this for the older
>> cards (ixgbe) that don’t allow for ‘sd’ hashing is to simply update
>> ‘cluster_flow’ to ignore the rxhash from the card (or zero it out) and
>> compute a new hash on the header of the reassembled packets.  So you should
>> follow the SEPTun guides for the older cards and let Suricata do the load
>> balancing in software on cores isolated from the worker threads.
>>
>>
>>
>> Fragmented TCP packets will still be directed to the ‘wrong’ RSS queue,
>> but will ultimately be copied to the correct worker thread.  They may
>> arrive out-of-order; I’m not sure how much of an issue this is. Something you
>> could do to further test this would be to enable full file
>> logging/extraction and look for files tagged as “TRUNCATED” in the eve
>> logs.  That’s a ‘red flag’ that the stream tracking isn’t working properly
>> for big TCP flows.  Keep in mind you will always see some truncated files
>> organically; however if you see the same file being truncated that’s
>> indicative of a problem with TCP flow tracking for that particular
>> network.
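
A rough sketch of the check suggested above: count fileinfo events marked TRUNCATED in eve.json and list the filenames that are truncated repeatedly. It assumes the usual EVE layout where a fileinfo event carries fileinfo.state and fileinfo.filename; the function names and the threshold of two hits are arbitrary choices for illustration.

```python
import json
from collections import Counter


def truncated_file_counts(lines):
    """Count TRUNCATED fileinfo events per filename from EVE JSON lines."""
    counts = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            event = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip partial or corrupt lines
        if not isinstance(event, dict) or event.get("event_type") != "fileinfo":
            continue
        info = event.get("fileinfo") or {}
        if info.get("state") == "TRUNCATED":
            counts[info.get("filename", "<unknown>")] += 1
    return counts


def repeat_offenders(counts, min_hits=2):
    """Filenames truncated at least min_hits times (threshold is arbitrary)."""
    return {name: hits for name, hits in counts.items() if hits >= min_hits}


# Tiny inline sample; in practice, pass an open eve.json file object instead.
sample = [
    '{"event_type":"fileinfo","fileinfo":{"filename":"/a.iso","state":"TRUNCATED"}}',
    '{"event_type":"fileinfo","fileinfo":{"filename":"/a.iso","state":"TRUNCATED"}}',
    '{"event_type":"fileinfo","fileinfo":{"filename":"/b.zip","state":"CLOSED"}}',
    '{"event_type":"flow"}',
    'not json',
]
print(repeat_offenders(truncated_file_counts(sample)))
```

On a sensor this would be run as truncated_file_counts(open(path)) with the path of the local eve.json. A handful of one-off truncations is expected; the same filename showing up again and again is the thing to look at.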
>>
>>
>>
>> I’m “stuck” on the old hardware for the time being and I can’t get any of
>> the new eBPF or XDP modes working on my system, so it would be great if
>> cluster_flow could be updated to handle this edge case.
>>
>>
>>
>> Btw, one of the many reasons I’m such a fan of using a zero-trust
>> deployment with an explicit proxy is that the proxy will ‘streamline’ dirty
>> ‘Net traffic and deliver it to the inside interface fully defragmented and
>> in the correct order.  Then you can use the ‘autofp’ runmode and guarantee
>> that packets are delivered properly to the worker threads.
>>
>>
>>
>> -Coop
>>
>>
>>
>> *From:* Michał Purzyński <michalpurzynski1 at gmail.com>
>> *Sent:* Monday, July 1, 2019 4:48 PM
>> *To:* Nelson, Cooper <cnelson at ucsd.edu>
>> *Cc:* Peter Manev <petermanev at gmail.com>; Cloherty, Sean E <
>> scloherty at mitre.org>; Eric Urban <eurban at umn.edu>; Open Information
>> Security Foundation <oisf-users at lists.openinfosecfoundation.org>
>> *Subject:* Re: [Oisf-users] [EXT] Re: Packet loss and increased resource
>> consumption after upgrade to 4.1.2 with Rust support
>>
>>
>>
>> So here's what I did at one of our offices. Intel X710 cards are used there.
>>
>>
>>
>> ethtool -L enp17s0f0 combined 4
>>
>> ethtool -K enp17s0f0 rxhash on
>>
>> ethtool -K enp17s0f0 ntuple on
>>
>>
>>
>> ethtool -X enp17s0f0 hkey
>> 6D:5A:6D:5A:6D:5A:6D:5A:6D:5A:6D:5A:6D:5A:6D:5A:6D:5A:6D:5A:6D:5A:6D:5A:6D:5A:6D:5A:6D:5A:6D:5A:6D:5A:6D:5A:6D:5A:6D:5A:6D:5A:6D:5A:6D:5A:6D:5A:6D:5A:6D:5A
>> equal 4
>>
>>
>>
>> ethtool -N enp17s0f0 rx-flow-hash tcp4 sd
>>
>>
>>
>> The _qm fanout mode (cluster_qm in Suricata) has been used in both Zeek and Suricata.
>>
>>
>>
>> So far I can see traffic hashed correctly. If you have some ideas how I
>> could test this further, please let me know.
>>
>>
>>
>>
>>
>