[Oisf-devel] improve flow table in workers runmode
Victor Julien
victor at inliniac.net
Fri Apr 5 09:09:21 UTC 2013
(please don't top-post in discussions like this and also don't use HTML)
On 04/04/2013 02:44 PM, Vito Piserchia wrote:
> On Thu, Apr 4, 2013 at 12:10 PM, Victor Julien
> <victor at inliniac.net> wrote:
>
> On 04/04/2013 11:53 AM, Eric Leblond wrote:
> > On Thu, 2013-04-04 at 11:05 +0200, Victor Julien wrote:
> >> On 04/04/2013 10:25 AM, Eric Leblond wrote:
> >>> Hello,
> >>>
> >>> On Wed, 2013-04-03 at 15:40 +0200, Victor Julien wrote:
> >>>> On 04/03/2013 10:59 AM, Chris Wakelin wrote:
> >>>>> On 03/04/13 09:19, Victor Julien wrote:
> >>>>>>> On 04/03/2013 02:31 AM, Song liu wrote:
> >>>>>>>>>
> >>>>>>>>> Right now, all workers share one big flow table, and there
> >>>>>>>>> will be contention for it.
> >>>>>>>>> Assuming the network interface does flow-affine load
> >>>>>>>>> balancing, each worker will handle its own set of flows.
> >>>>>>>>> In that case, I think it makes more sense for each worker to
> >>>>>>>>> have its own flow table rather than one big table, to reduce
> >>>>>>>>> contention.
> >>>>>>>
> >>>>>>> We've discussed this before and I think it would make sense.
> >>>>>>> It does require quite a bit of refactoring though, especially
> >>>>>>> since we'd have to support the current setup as well for the
> >>>>>>> non-workers runmodes.
> >>>>>>>
> >>>>> It sounds like a good idea when things like PF_RING are supposed
> >>>>> to handle the flow affinity onto virtual interfaces for us
> >>>>> (PF_RING DNA + libzero clusters do, and there's the
> >>>>> PF_RING_DNA_SYMMETRIC_RSS flag for PF_RING DNA without libzero
> >>>>> and interfaces that support RSS).
> >>>>
> >>>> Actually, all workers implementations currently share the same
> >>>> assumption: flow-based load balancing in pf_ring, af_packet, nfq,
> >>>> etc. So I think it makes sense to have a flow engine per worker in
> >>>> all these cases.
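
To illustrate what a per-worker flow engine buys: a minimal sketch in C
with hypothetical names (this is not Suricata's actual flow code). With
flow-affine capture, a worker owns its table outright, so lookups need
no locks:

    #include <stddef.h>
    #include <stdint.h>

    #define FLOW_BUCKETS 65536

    /* Hypothetical flow record; real flows carry much more state. */
    typedef struct Flow_ {
        uint32_t hash;          /* flow hash of the 5-tuple */
        struct Flow_ *next;     /* bucket chain */
    } Flow;

    /* One table per worker thread: no other thread ever touches it,
     * so no bucket locks or atomics are needed. */
    typedef struct WorkerFlowTable_ {
        Flow *buckets[FLOW_BUCKETS];
    } WorkerFlowTable;

    static Flow *WorkerFlowLookup(WorkerFlowTable *ft, uint32_t hash)
    {
        Flow *f = ft->buckets[hash % FLOW_BUCKETS];
        while (f != NULL && f->hash != hash)
            f = f->next;
        return f;
    }

The shared-table variant would need at least a per-bucket lock around
that walk; dropping it is where the contention win comes from.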
> >>>
> >>> IPS mode may be a special case. For example, NFQ will soon provide
> >>> a CPU fanout mode where the worker is selected based on the CPU.
> >>> The idea is to have the NIC do the flow balancing. But this implies
> >>> that the return packet may arrive on a different CPU, depending on
> >>> the flow hash function used by the NIC.
> >>> We have the same behavior in af_packet IPS mode...
> >>
> >> I think this can lead to some weird packet order problems. T1
> >> inspects toserver, T2 toclient. If the T1 worker is held up for
> >> whatever reason, we may for example process ACKs in T2 for packets
> >> we've not processed in T1 yet. I'm pretty sure this won't work
> >> correctly.
> >
> > In the case of IPS mode, does inline streaming depend on ACKed packets?
>
> No, but the stream engine is written with the assumption that what we
> see is the order of the packets on the wire. TCP packets may still be
> out of order of course, but in that case the end host has to deal with
> it as well.
>
> In cases like window checks, sequence validation, SACK checks, etc., I
> can imagine problems. We'd possibly reject/accept packets in the stream
> handling that the end host will treat differently.
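
A hypothetical, heavily simplified illustration of the hazard (not
Suricata's actual stream code): a toserver in-window check whose verdict
silently depends on how far the thread handling toclient ACKs has
already advanced last_ack:

    #include <stdint.h>

    /* last_ack is advanced by toclient ACKs; win is the receive window.
     * If the toclient thread runs ahead and ACKs data the toserver
     * thread has not processed yet, a fresh segment can end up with
     * seq + len <= last_ack and be misjudged as already-ACKed old
     * data. */
    static int StreamSegInWindow(uint32_t seq, uint32_t len,
                                 uint32_t last_ack, uint32_t win)
    {
        /* signed sequence-space compares, safe across wraparound */
        return (int32_t)(seq + len - last_ack) > 0 &&
               (int32_t)(seq - (last_ack + win)) < 0;
    }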
>
> >
> >> This isn't limited to workers btw; in autofp, when using multiple
> >> capture threads, we can have the same issue: one side of a
> >> connection getting ahead of the other.
> >
> > Yes, I've observed this lead to strange behavior...
> >
> >> Don't think we can solve this in Suricata itself, as the OS has a
> >> lot of liberty in scheduling threads. A full packet reordering
> >> module would maybe work, but its performance impact would probably
> >> completely nix all gains from said capture methods.
> >
> > Sure
> >
> >>> In this case, we may want to disable the per-worker flow engine,
> >>> which is a really good idea for the other runmodes.
> >>
> >> Don't think it would be sufficient. The ordering problem won't be
> >> solved by it.
> >
> > Yes, it may be interesting to study a bit the hash functions used by
> > NICs to see if they behave symmetrically. In that case, this should
> > fix the issue (at least for NFQ). I will have a look into it.
>
> IMHO the key to success is having a symmetric RSS hash function. There
> are already experiments/studies about this, e.g.:
> http://www.ndsl.kaist.edu/~shinae/papers/TR-symRSS.pdf
Interesting, thanks.
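
The paper's core trick is a 40-byte Toeplitz key with a 16-bit period
(the byte pair 0x6d5a repeated). It can be checked with a small
standalone program; the Toeplitz loop follows the Microsoft RSS
pseudocode, and the tuple layout and addresses are just an example:

    #include <inttypes.h>
    #include <stdint.h>
    #include <stdio.h>

    /* Symmetric RSS key from the TR-symRSS paper: 0x6d5a repeated. */
    static const uint8_t sym_key[40] = {
        0x6d, 0x5a, 0x6d, 0x5a, 0x6d, 0x5a, 0x6d, 0x5a,
        0x6d, 0x5a, 0x6d, 0x5a, 0x6d, 0x5a, 0x6d, 0x5a,
        0x6d, 0x5a, 0x6d, 0x5a, 0x6d, 0x5a, 0x6d, 0x5a,
        0x6d, 0x5a, 0x6d, 0x5a, 0x6d, 0x5a, 0x6d, 0x5a,
        0x6d, 0x5a, 0x6d, 0x5a, 0x6d, 0x5a, 0x6d, 0x5a,
    };

    /* Standard Toeplitz hash: for each set input bit (MSB first),
     * XOR in the current 32-bit window of the key, then slide the
     * window one bit. */
    static uint32_t toeplitz(const uint8_t *key, const uint8_t *data,
                             size_t len)
    {
        uint32_t hash = 0;
        uint32_t window = ((uint32_t)key[0] << 24) |
                          ((uint32_t)key[1] << 16) |
                          ((uint32_t)key[2] << 8) | (uint32_t)key[3];
        size_t next_bit = 32;

        for (size_t i = 0; i < len; i++) {
            for (int b = 7; b >= 0; b--) {
                if (data[i] & (1u << b))
                    hash ^= window;
                window <<= 1;
                if (key[next_bit / 8] & (0x80u >> (next_bit % 8)))
                    window |= 1;
                next_bit++;
            }
        }
        return hash;
    }

    int main(void)
    {
        /* IPv4 RSS input: src ip, dst ip, src port, dst port. */
        uint8_t fwd[12] = { 10,0,0,1,  192,168,1,1,
                            0x1f,0x90, 0xc3,0x50 };
        uint8_t rev[12] = { 192,168,1,1,  10,0,0,1,
                            0xc3,0x50, 0x1f,0x90 };

        printf("fwd: %08" PRIx32 "\nrev: %08" PRIx32 "\n",
               toeplitz(sym_key, fwd, sizeof(fwd)),
               toeplitz(sym_key, rev, sizeof(rev)));
        return 0;
    }

Both directions print the same hash: swapping fields that sit a
multiple of 16 bits apart (32 bits for the IPs, 16 for the ports)
leaves the key window each input bit sees unchanged.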
> Obviously this could lead to unbalanced flow queues; think of
> long-standing flows which remain alive for a long time period... To
> account for this kind of situation, one could assign a group of
> processing threads to packets that arrive from the same RSS queue,
> losing, of course, the cache (and interrupt) affinity benefits in that
> case.
With our autofp mode this could be done. We could also consider a more
advanced autofp mode where, instead of one global load balancer over all
CPUs/threads, we'd have autofp-style load balancing over a select group
of threads that run on the same CPU.
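
A minimal sketch of that selection step, with illustrative names and a
hypothetical fixed group size (none of this is existing Suricata code):

    #include <stdint.h>

    #define DETECT_THREADS_PER_CPU 4   /* hypothetical group size */

    /* Balance a flow only across the detect threads pinned to the CPU
     * of the capture thread that received the packet: load is still
     * spread, but packets never leave their CPU's cache domain. */
    static int AutofpPickThread(uint32_t flow_hash, int capture_cpu)
    {
        int group_base = capture_cpu * DETECT_THREADS_PER_CPU;
        return group_base + (int)(flow_hash % DETECT_THREADS_PER_CPU);
    }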
--
---------------------------------------------
Victor Julien
http://www.inliniac.net/
PGP: http://www.inliniac.net/victorjulien.asc
---------------------------------------------