[Oisf-devel] reliability of 'cluster_flow' in af_packet and pf_ring

Victor Julien victor at inliniac.net
Mon Mar 10 17:22:21 UTC 2014


On 03/10/2014 09:27 AM, Victor Julien wrote:
> On 03/08/2014 06:13 PM, Victor Julien wrote:
>> In the context of Duarte's report of #1011, I've been playing with some
>> validation code for the clustering.
>>
>> There is some debug code in
>> https://github.com/inliniac/suricata/pull/880, esp.
>> https://github.com/inliniac/suricata/commit/939e3e3fd9550916361e3f5ff4c54590d3949bbb
>>
>> If I run this in workers mode with more than one thread, it fails
>> almost immediately:
>>
>> suricata: stream-tcp.c:4180: StreamTcpPacket: Assertion
>> `!(p->flow->thread_id != tv->thread_id)' failed.
>> Aborted (core dumped)
>> # gdb ./src/suricata core
>> ...
>> Reading symbols from /usr/src/oisf/src/suricata...done.
>>
>> warning: core file may not match specified executable file.
>> [New LWP 3711]
>> [New LWP 3712]
>> [New LWP 3714]
>> [New LWP 3713]
>> [New LWP 3710]
>> [New LWP 3709]
>>
>> warning: Can't read pathname for load map: Input/output error.
>> [Thread debugging using libthread_db enabled]
>> Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
>> Core was generated by `./src/suricata -c copy-of-suricata-pfring.yaml
>> --af-packet=eth2 -S /dev/null -v'.
>> Program terminated with signal 6, Aborted.
>> #0  0x00007f82f353e425 in __GI_raise (sig=<optimized out>) at
>> ../nptl/sysdeps/unix/sysv/linux/raise.c:64
>> 64      ../nptl/sysdeps/unix/sysv/linux/raise.c: No such file or directory.
>> (gdb) f 4
>> #4  0x0000000000518ec3 in StreamTcpPacket (tv=0x226c33f0, p=0x2deae00,
>> stt=0x7f82e00138c0, pq=0x226c37a0) at stream-tcp.c:4180
>> 4180                BUG_ON(p->flow->thread_id != tv->thread_id);
>> (gdb) print p->flow->thread_id
>> $1 = 3710
>> (gdb) print tv->thread_id
>> $2 = 3711
>> (gdb)
>>
>> AF_PACKET is set to cluster_flow. None of the AF_PACKET options seem
>> to affect the failure.
>>
>> # ethtool -k eth2
>> Offload parameters for eth2:
>> rx-checksumming: off
>> tx-checksumming: off
>> scatter-gather: off
>> tcp-segmentation-offload: off
>> udp-fragmentation-offload: off
>> generic-segmentation-offload: off
>> generic-receive-offload: off
>> large-receive-offload: off
>> rx-vlan-offload: off
>> tx-vlan-offload: off
>> ntuple-filters: off
>> receive-hashing: off
>>
>> In autofp mode it does work correctly. There we have our own flow
>> distribution logic instead of relying on the NIC/driver/capture method...
>>
> 
> I've been working with Alfredo and Luca to understand what's happening on
> the pfring side of this. It turns out our pfring setup logic isn't correct.
> 
> The patch at PR 881 [1] fixes the start up for pfring by moving
> pfring_enable_ring to the start of ReceivePfringLoop() so that we don't
> enable the ring before all threads have registered to it with
> pfring_set_cluster.
> 
> 
> I suspect a similar issue in AF_PACKET, although there is one difference
> from pfring. In pfring, the problem would self-correct quickly after
> startup: after a few minutes all packets would be on the correct threads.
> In AF_PACKET this isn't the case.
> 
> AF_PACKET doesn't do most of its setup in the ThreadInit function; it
> does it in the Loop function, by calling AFPCreateSocket. This means that
> multiple threads may execute this logic concurrently. I wonder if that
> leads to a race condition. Eric, any ideas there?
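
For context on what the per-thread socket setup amounts to: joining a fanout group is a single setsockopt() per socket, so each capture thread joins on its own and the group grows one member at a time. This is a minimal sketch, assuming the Linux PACKET_FANOUT option and constants from <linux/if_packet.h>; fanout_arg() and join_fanout_group() are hypothetical helper names, not Suricata's AFPCreateSocket code.

```c
#include <assert.h>
#include <stdint.h>
#include <sys/socket.h>
#include <linux/if_packet.h>

/* The kernel reads the PACKET_FANOUT value as an int, so the packed
 * argument must be exactly int-sized. */
static_assert(sizeof(uint32_t) == sizeof(int), "fanout argument must be int-sized");

/* Pack the PACKET_FANOUT argument: group (cluster) id in the low 16 bits,
 * fanout mode plus optional flags (e.g. PACKET_FANOUT_FLAG_DEFRAG) in the
 * high 16 bits. */
static uint32_t fanout_arg(uint16_t cluster_id, uint16_t mode_and_flags)
{
    return (uint32_t)cluster_id | ((uint32_t)mode_and_flags << 16);
}

/* Hypothetical helper: add an AF_PACKET socket to fanout group `cluster_id`.
 * Every capture thread calls this on its own socket, so group membership
 * grows one socket at a time while threads start up.
 * Returns 0 on success, -1 with errno set on failure. */
static int join_fanout_group(int fd, uint16_t cluster_id, uint16_t mode_and_flags)
{
    uint32_t arg = fanout_arg(cluster_id, mode_and_flags);
    return setsockopt(fd, SOL_PACKET, PACKET_FANOUT, &arg, sizeof(arg));
}
```

With cluster_flow the mode would be PACKET_FANOUT_HASH; all sockets have to pass the same cluster id and mode to land in the same group.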

I've been moving the init code around a bit, but I don't see any
difference. It seems that the cluster_flow setting enables the 'skb rxhash'
fanout mode. Any idea what factors influence that?
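
One factor can be modeled directly. The sketch below assumes the kernel maps a packet to a group member by scaling the 32-bit rxhash with a multiply-and-shift over the current member count (as fanout_demux_hash() does in 3.x kernels). Under that assumption, the chosen socket depends on how many sockets are in the group when the packet arrives. A flow whose first packet is seen while threads are still joining can then be "owned" by one thread and later delivered to another, which is what BUG_ON(p->flow->thread_id != tv->thread_id) catches. fanout_hash_index() and count_moved_flows() are hypothetical names. This only covers the startup window. It does not by itself explain mismatches that persist once all threads have joined, so the rxhash value itself (NIC vs. software, and whether both directions of a connection hash the same) is the other input worth checking.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed model of hash fanout: scale the 32-bit rxhash into [0, num)
 * with a multiply and shift instead of a modulo. */
static unsigned int fanout_hash_index(uint32_t rxhash, unsigned int num)
{
    assert(num > 0);
    return (unsigned int)(((uint64_t)rxhash * num) >> 32);
}

/* Count how many of the given flow hashes would be delivered to a
 * different socket (thread) once the group grows from `before` to `after`
 * members, i.e. flows that would trip a per-flow thread ownership check. */
static unsigned int count_moved_flows(const uint32_t *hashes, unsigned int n,
                                      unsigned int before, unsigned int after)
{
    unsigned int moved = 0;
    for (unsigned int i = 0; i < n; i++) {
        if (fanout_hash_index(hashes[i], before) !=
            fanout_hash_index(hashes[i], after))
            moved++;
    }
    return moved;
}
```

For example, with a single member every hash maps to index 0. Once four sockets have joined, hashes spread evenly over the 32-bit range map to indexes 0 through 3, so three out of four such flows change threads.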

-- 
---------------------------------------------
Victor Julien
http://www.inliniac.net/
PGP: http://www.inliniac.net/victorjulien.asc
---------------------------------------------