[Oisf-devel] improve flow table in workers runmode
vpiserchia at gmail.com
Wed Apr 17 14:35:33 UTC 2013
Hello all,
On 04/05/2013 11:09 AM, Victor Julien wrote:
> (please don't top-post in discussions like this and also don't use HTML)
damn Google Mail interface, sorry again
>
> On 04/04/2013 02:44 PM, Vito Piserchia wrote:
>> On Thu, Apr 4, 2013 at 12:10 PM, Victor Julien <victor at inliniac.net> wrote:
>>
>> On 04/04/2013 11:53 AM, Eric Leblond wrote:
>> > On Thu, 2013-04-04 at 11:05 +0200, Victor Julien wrote:
>> >> On 04/04/2013 10:25 AM, Eric Leblond wrote:
>> >>> Hello,
>> >>>
>> >>> On Wed, 2013-04-03 at 15:40 +0200, Victor Julien wrote:
>> >>>> On 04/03/2013 10:59 AM, Chris Wakelin wrote:
>> >>>>> On 03/04/13 09:19, Victor Julien wrote:
>> >>>>>>> On 04/03/2013 02:31 AM, Song liu wrote:
>> >>>>>>>>>
>> >>>>>>>>> Right now, all workers will share one big flow table, and
>> >>>>>>>>> there will be contention for it.
>> >>>>>>>>> Supposing that the network interface is flow-affine, each
>> >>>>>>>>> worker will handle individual flows.
>> >>>>>>>>> In this way, I think it makes more sense that each worker
>> >>>>>>>>> has its own flow table rather than one big table, to reduce
>> >>>>>>>>> contention.
>> >>>>>>>>>
>> >>>>>>>
>> >>>>>>> We've been discussing this before and I think it would make
>> >>>>>>> sense. It does require quite a bit of refactoring though,
>> >>>>>>> especially since we'd have to support the current setup as
>> >>>>>>> well for the non-workers runmodes.
>> >>>>>>>
Another approach could be something like a partitioned hash table with a
fine-grained locking strategy.
An even stronger option would be lock-free data structures; does anyone
have experience with this topic?
A last alternative could be to use the userspace RCU implementation [1].
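To make the partitioned-table idea concrete, here is a minimal sketch:
the flow hash selects a partition and only that partition's lock is
taken, so workers hitting different partitions never contend. All names
and sizes are made up for illustration, not Suricata's actual flow
structures.

/* Hypothetical partitioned flow table with per-partition locks. */
#include <pthread.h>
#include <stdint.h>
#include <stddef.h>

#define FLOW_PARTITIONS 64              /* power of two, tune to worker count */
#define BUCKETS_PER_PARTITION 4096

typedef struct Flow_ {
    uint32_t hash;
    struct Flow_ *next;
    /* ... per-flow state ... */
} Flow;

typedef struct FlowPartition_ {
    pthread_mutex_t lock;               /* fine-grained: one lock per partition */
    Flow *buckets[BUCKETS_PER_PARTITION];
} FlowPartition;

static FlowPartition flow_table[FLOW_PARTITIONS];

void FlowTableInit(void)
{
    for (int i = 0; i < FLOW_PARTITIONS; i++)
        pthread_mutex_init(&flow_table[i].lock, NULL);
}

/* Look up a flow by hash; a real engine would return it locked or
 * reference-counted so it stays valid after the partition lock drops. */
Flow *FlowLookup(uint32_t hash)
{
    FlowPartition *p = &flow_table[hash & (FLOW_PARTITIONS - 1)];
    pthread_mutex_lock(&p->lock);
    Flow *f = p->buckets[hash % BUCKETS_PER_PARTITION];
    while (f != NULL && f->hash != hash)
        f = f->next;
    pthread_mutex_unlock(&p->lock);
    return f;
}

In the per-worker-table design discussed above the partition would simply
be the worker itself, so the lock could even go away entirely.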
>> >>>>> It sounds like a good idea when things like PF_RING are
>> >>>>> supposed to handle the flow affinity onto virtual interfaces
>> >>>>> for us (PF_RING DNA + libzero clusters do, and there's the
>> >>>>> PF_RING_DNA_SYMMETRIC_RSS flag for PF_RING DNA without libzero
>> >>>>> and interfaces that support RSS).
>> >>>>
>> >>>> Actually, all worker implementations currently share the same
>> >>>> assumption: flow-based load balancing in pf_ring, af_packet,
>> >>>> nfq, etc. So I think it makes sense to have a flow engine per
>> >>>> worker in all these cases.
>> >>>
>> >>> IPS mode may be a special case. For example, NFQ will soon
>> >>> provide a cpu fanout mode where the worker will be selected based
>> >>> on CPU. The idea is to have the NIC do the flow balancing. But
>> >>> this implies that the return packet may come to a different CPU,
>> >>> depending on the flow hash function used by the NIC.
>> >>> We have the same behavior in af_packet IPS mode...
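If I read the af_packet fanout options right, the two behaviours are just
different fanout modes handed to setsockopt(): PACKET_FANOUT_HASH spreads
packets by a kernel flow hash, PACKET_FANOUT_CPU by the CPU that received
the packet, which is exactly what can send return traffic to a different
worker. Rough sketch, error handling and interface binding omitted:

/* Rough af_packet fanout sketch; a real setup would also bind the
 * socket to an interface before joining the fanout group. */
#include <sys/socket.h>
#include <linux/if_packet.h>
#include <linux/if_ether.h>
#include <arpa/inet.h>

int open_fanout_socket(int group_id, int cpu_fanout)
{
    int fd = socket(AF_PACKET, SOCK_RAW, htons(ETH_P_ALL));
    if (fd < 0)
        return -1;

    /* Low 16 bits: fanout group id; high 16 bits: load balancing mode. */
    int mode = cpu_fanout ? PACKET_FANOUT_CPU : PACKET_FANOUT_HASH;
    int arg = group_id | (mode << 16);
    if (setsockopt(fd, SOL_PACKET, PACKET_FANOUT, &arg, sizeof(arg)) < 0)
        return -1;
    return fd;
}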
>> >>
>> >> I think this can lead to some weird packet order problems. T1
>> >> inspects toserver, T2 toclient. If the T1 worker is held up for
>> >> whatever reason, we may for example process ACKs in T2 for packets
>> >> we've not processed in T1 yet. I'm pretty sure this won't work
>> >> correctly.
>> >
>> > In the case of IPS mode, does inline streaming depend on ACKed packets?
>>
>> No, but the stream engine is written with the assumption that what we
>> see is the order of packets on the wire. TCP packets may still be out of
>> order of course, but in this case the end-host has to deal with it
>> as well.
>>
>> In cases like window checks, sequence validation, SACK checks, etc I can
>> imagine problems. We'd possibly reject/accept packets in the stream
>> handling that the end host will treat differently.
>>
>> >
>> >> This isn't limited to workers btw; in autofp, when using multiple
>> >> capture threads, we can have the same issue: one side of a
>> >> connection getting ahead of the other.
>> >
>> > Yes, I've observed this leading to strange behavior...
>> >
>> >> Don't think we can solve this in Suricata itself, as the OS has a
>> >> lot of liberty in scheduling threads. A full packet reordering
>> >> module might work, but its performance impact would probably
>> >> completely nix all gains from the said capture methods.
>> >
>> > Sure
>> >
>> >>> In this case, we may want to disable the per-worker flow engine,
>> >>> which is a really good idea for the other runmodes.
>> >>
>> >> Don't think it would be sufficient. The ordering problem won't be
>> >> solved by it.
>> >
>> > Yes, it may be interesting to study a bit the hash functions used
>> > by NICs to see if they behave symmetrically. In that case, this
>> > should fix the issue (at least for NFQ). I will have a look into it.
>>
>> IMHO the key to success is having a symmetric RSS hash function.
>> Experiments/studies on this already exist, e.g.
>> http://www.ndsl.kaist.edu/~shinae/papers/TR-symRSS.pdf
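Even without a symmetric NIC hash, the same effect can be had in software
by putting the endpoints in canonical order before mixing, so both
directions of a connection map to the same value. Just an illustration of
the idea, not the hash Suricata actually uses:

/* Illustrative symmetric 5-tuple hash: canonical endpoint order first,
 * so client->server and server->client packets hash the same.
 * FNV-1a style mixing over 32-bit words. */
#include <stdint.h>

uint32_t symmetric_flow_hash(uint32_t src_ip, uint32_t dst_ip,
                             uint16_t src_port, uint16_t dst_port,
                             uint8_t proto)
{
    int swap = (src_ip > dst_ip) ||
               (src_ip == dst_ip && src_port > dst_port);
    uint32_t ip_a = swap ? dst_ip : src_ip;
    uint32_t ip_b = swap ? src_ip : dst_ip;
    uint16_t pt_a = swap ? dst_port : src_port;
    uint16_t pt_b = swap ? src_port : dst_port;

    uint32_t words[4] = { ip_a, ip_b,
                          ((uint32_t)pt_a << 16) | pt_b, proto };
    uint32_t h = 2166136261u;
    for (int i = 0; i < 4; i++) {
        h ^= words[i];
        h *= 16777619u;
    }
    return h;
}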
>
> Interesting, thanks.
>
>> Obviously this could lead to unbalanced flow queues; think of
>> long-standing flows which remain alive for long periods of time... To
>> account for this kind of situation, one could assign a group of
>> processing CPU threads to packets that arrive from the same RSS
>> queue, losing, of course, the cache (and interrupt) affinity benefits
>> in that case.
>
> With our autofp mode this could be done. We could also consider a more
> advanced autofp mode where, instead of one global load balancer over
> all cpus/threads, we'd have autofp-style load balancing over a select
> group of threads that run on the same cpu.
>
A finer-grained autofp mode would certainly help a lot here. I would
also suggest taking into consideration the "new players" in the market
with even more advanced packet distribution policies [2].
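Something like that two-level scheme could be as simple as: the RSS queue
(or receiving CPU) picks the group of threads pinned to the same
CPU/cache domain, and the flow hash picks a thread within that group, so
a busy queue still spreads its flows. All names here are hypothetical:

/* Hypothetical two-level balancing: the RSS queue selects a group of
 * worker threads sharing a CPU/cache domain, the flow hash selects one
 * thread within that group, keeping per-flow ordering while spreading
 * the load of a busy queue over several threads. */
#include <stdint.h>

#define CPU_GROUPS        4     /* e.g. one group per RSS queue / NUMA node */
#define THREADS_PER_GROUP 4

/* Returns a worker index in [0, CPU_GROUPS * THREADS_PER_GROUP). */
int pick_worker(int rss_queue, uint32_t flow_hash)
{
    int group  = rss_queue % CPU_GROUPS;          /* keeps cache affinity */
    int member = flow_hash % THREADS_PER_GROUP;   /* balances within the group */
    return group * THREADS_PER_GROUP + member;
}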
[1] urcu at http://lttng.org/urcu
[2] DPDK at http://dpdk.org/