[Oisf-devel] improve flow table in workers runmode

Victor Julien victor at inliniac.net
Thu Apr 4 09:05:11 UTC 2013


On 04/04/2013 10:25 AM, Eric Leblond wrote:
> Hello,
> 
> On Wed, 2013-04-03 at 15:40 +0200, Victor Julien wrote:
>> On 04/03/2013 10:59 AM, Chris Wakelin wrote:
>>> On 03/04/13 09:19, Victor Julien wrote:
>>>>> On 04/03/2013 02:31 AM, Song liu wrote:
>>>>>>>
>>>>>>> Right now, all workers share one big flow table, and there will be
>>>>>>> contention for it.
>>>>>>> Supposing that the network interface does flow-affine load balancing,
>>>>>>> each worker will handle its own set of flows.
>>>>>>> In this way, I think it makes more sense for each worker to have its own
>>>>>>> flow table rather than one big table, to reduce contention.
>>>>>>>
>>>>>
>>>>> We've been discussing this before and I think it would make sense. It
>>>>> does require quite a bit of refactoring though, especially since we'd
>>>>> have to support the current setup as well for the non-workers runmodes.
>>>>>
>>> It sounds like a good idea when things like PF_RING are supposed to
>>> handle the flow affinity onto virtual interfaces for us (PF_RING DNA +
>>> libzero clusters do, and there's the PF_RING_DNA_SYMMETRIC_RSS flag for
>>> PF_RING DNA without libzero and interfaces that support RSS).
>>
>> Actually, all workers implementations currently share the same assumption:
>> flow-based load balancing in pf_ring, af_packet, nfq, etc. So I think it
>> makes sense to have a flow engine per worker in all these cases.
> 
> There may be a special case in IPS mode. For example, NFQ will soon
> provide a cpu fanout mode where the worker is selected based on the CPU.
> The idea is to have the NIC do the flow balancing. But this implies
> that the return packet may come in on a different CPU, depending on the
> flow hash function used by the NIC.
> We have the same behavior in af_packet IPS mode...

I think this can lead to some weird packet ordering problems. Say T1
inspects the toserver side of a flow and T2 the toclient side. If the T1
worker is held up for whatever reason, we may for example process ACKs in
T2 for packets we've not yet processed in T1. I'm pretty sure this won't
work correctly.
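
To make the assumption explicit (just a rough sketch with hypothetical
names): the workers runmode relies on something like a symmetric flow
hash, so both directions of a flow end up on the same worker:

#include <stdint.h>

/* XOR of the 4-tuple is direction independent:
 * hash(a->b) == hash(b->a), so toserver and toclient
 * packets of one flow always pick the same worker. */
static uint32_t
SymFlowHash(uint32_t src, uint32_t dst, uint16_t sp, uint16_t dp)
{
    return (src ^ dst) ^ (uint32_t)(sp ^ dp);
}

static int
WorkerForPacket(uint32_t src, uint32_t dst,
                uint16_t sp, uint16_t dp, int nworkers)
{
    return (int)(SymFlowHash(src, dst, sp, dp) % (uint32_t)nworkers);
}

The usual NIC RSS hash is not symmetric like this, which is exactly why
the return direction can land on a different CPU in the cpu fanout case
Eric describes.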

This isn't limited to workers btw; in autofp with multiple capture
threads we can have the same issue: one side of a connection getting
ahead of the other.

Don't think we can solve this in Suricata itself, as the OS has a lot of
liberty in scheduling threads. A full packet reordering module might
work, but its performance impact would probably cancel out all the gains
from those capture methods.
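
Just to illustrate the kind of thing such a module would need
(hypothetical names, not something I'm proposing): a per-flow queue
shared between the threads, with locking and buffering on every packet:

#include <pthread.h>
#include <stdint.h>

typedef struct Packet_ Packet;   /* whatever the capture method hands us */

/* Every thread that sees the flow takes the lock, parks the packet,
 * and only packets in per-flow arrival order get released. The shared
 * lock and the buffering are exactly what the fanout modes try to avoid. */
typedef struct FlowReorderQ_ {
    pthread_mutex_t m;      /* taken by every thread touching the flow */
    Packet *held[64];       /* packets parked until they can be released */
    uint64_t next_idx;      /* next per-flow index we're allowed to emit */
} FlowReorderQ;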

> In this case, we may want to disable the per-worker flow engine, which
> is a really good idea for the other runmodes.

Don't think that would be sufficient; it won't solve the ordering
problem.
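
For reference, the per-worker flow engine we're discussing would look
roughly like this (hypothetical names, just a sketch): each worker owns
its own table, so lookups need no lock at all, but that only helps if
both directions of a flow reliably hit the same worker:

#include <stdint.h>
#include <stddef.h>

#define FLOW_BUCKETS 65536

typedef struct Flow_ {
    uint32_t src, dst;          /* IPv4 addresses */
    uint16_t sp, dp;            /* ports */
    struct Flow_ *next;         /* bucket chain */
} Flow;

typedef struct WorkerFlowTable_ {
    Flow *buckets[FLOW_BUCKETS];   /* private to a single worker thread */
} WorkerFlowTable;

static Flow *
WorkerFlowLookup(WorkerFlowTable *ft, uint32_t hash,
                 uint32_t src, uint32_t dst, uint16_t sp, uint16_t dp)
{
    /* no mutex: only the owning worker ever touches this table
     * (bidirectional matching left out of the sketch) */
    for (Flow *f = ft->buckets[hash % FLOW_BUCKETS]; f != NULL; f = f->next) {
        if (f->src == src && f->dst == dst && f->sp == sp && f->dp == dp)
            return f;
    }
    return NULL;
}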

-- 
---------------------------------------------
Victor Julien
http://www.inliniac.net/
PGP: http://www.inliniac.net/victorjulien.asc
---------------------------------------------



