[Oisf-devel] Linux af-packet::mmap enhancement

Victor Julien victor at inliniac.net
Mon Jun 27 07:11:10 UTC 2011


On 06/26/2011 06:39 PM, chetan loke wrote:
> On Sat, Jun 25, 2011 at 4:25 AM, Victor Julien <victor at inliniac.net> wrote:
>> On 06/24/2011 09:17 PM, chetan loke wrote:
>>> On Fri, Jun 24, 2011 at 11:25 AM, Victor Julien <victor at inliniac.net> wrote:
>>>> On 06/23/2011 06:30 PM, chetan loke wrote:
>>>>> On Thu, Jun 23, 2011 at 11:59 AM, Victor Julien <victor at inliniac.net> wrote:
>>>>>> The disadvantage is that multiple threads/CPUs/cores will suddenly be
>>>>>> handling the same packet, instead of just the one dealing with the
>>>>>> FANOUT socket.
>>>>>>
>>>>>
>>>>> My description wasn't so clear. Actually what I meant was:
>>>>>
>>>>> lookup_table : <rx_hash> <thread_id>;
>>>>>
>>>>> thread_id = get_worker_th_id(pkt_hdr->rx_hash);
>>>>>
>>>>> if (thread_id) {
>>>>>   /* This flow is HOT: steer it to the same thread. */
>>>>>   enqueue_pkt_to_thread(thread_id);
>>>>> } else {
>>>>>   /* This flow is COLD: no thread is handling it yet, */
>>>>>   /* so queue it to the next thread and remember the choice. */
>>>>>   thread_id = GET_NEXT_THREAD(ROUND_ROBIN_POLICY);
>>>>>   enqueue_pkt_to_thread(thread_id);
>>>>>   update lookup_table with <pkt_hdr->rx_hash, thread_id>;
>>>>> }
>>>>>
>>>>> So no two threads will work on the same flow.
>>>>
>>>> To be clear, this functionality would live in the kernel, right?
>>>
>>> Nope, it would reside in user space because the kernel doesn't know which
>>> fanout listener is under heavy load. Once 'rx_hash' is exported in the
>>> tpacket headers, we can spread the load in user space.
>>
>> So how would this work? Would Suricata talk to a userspace daemon
>> instead of the kernel directly?
> 
> Not really. I quickly looked at the code and my vague understanding is
> as follows:
> 
> 1) .func(pkt_rcv_function) of a receiver would be running in a thread.
> 2) .func  dumps packets in 'some-queue', correct?
> 3) decoder pulls packets from the above 'some-queue' and normalizes
> them, correct?

Yes and no. The way the threads and "thread modules" (modules like
decode, detect, stream) are arranged is controlled by the "runmode".
A runmode can consist of one single thread containing everything
from packet acquisition to alerting. Another runmode could have multiple
instances of that thread. Another runmode could have many threads that
each run a single module, with packets passed on to the next thread
through queues. Highly flexible, but it does impose some challenges.

Queues are controlled by queue handlers. The "queue handler"
implementations live in the tmqh-* files. There is "simple", which is a
mutexed FIFO. There is "flow", which does the load balancing: it hashes
the memory address of the packet's flow to get a per-flow value without
having to lock the flow. And there is "ringbuffer", an attempt at a
lockless buffer; we had some issues with it at higher compiler
optimization levels.

> We should introduce a new function called 'steer_packet' which will
> implement the above 'hash+load_balance' steering logic. The rx_hash
> field could be null for receivers that don't yet export rx_hash in the
> packet descriptor, so if the rx_hash field is null it should operate in
> the current mode. This also implies struct Packet should gain a new
> field called rx_hash.
> 
> steer_packet() can be called as part of the decoder logic, but before
> the pkt is picked up by the next thread. But that's just implementation
> details.

Such logic could be implemented as a "queue handler". Then we'd have
one or more packet acquisition threads passing packets to a queue of
this type, which would then distribute them over the other threads.

This is pretty much how the "autofp" PF_RING runmode currently works. It
has 1-N acquisition (including decode) threads that pass all packets to
a "flow" queue. The threads listening on that queue each contain stream,
detect and output. Note that if PF_RING's flow-based clustering is used,
the flow queue is doing duplicate work. Not completely optimal :)

Cheers,
Victor

-- 
---------------------------------------------
Victor Julien
http://www.inliniac.net/
PGP: http://www.inliniac.net/victorjulien.asc
---------------------------------------------
