[Oisf-devel] [PATCH V2 0/6] Align elements in 'trans_q' and 'data_queues' array

Tue Dec 10 12:21:52 UTC 2013

(seems I never hit 'send' for this, sorry about that!)

On 10/28/2013 09:35 AM, Holger Eitzenberger wrote:
> Hi Victor,
> 
>> I think it would be more useful if you would send the patches inline, so
>> I can comment on them line by line. Alternatively, a github pull request
>> works as well.
> 
> I am using 'quilt' for sending the patches out, and am a bit surprised
> that it doesn't do so already.  Will check if a github pull request is
> more convenient.

If it's not a problem to you, github is preferred. It's very easy for
me, both in review and taking in code.

>>> The first patch is just cleanup, as it only introduces NUM_QUEUES.
>>> But from grepping through the source I see that I may not have spotted
>>> all places.
>>
>> I need to check more carefully, but I'm almost sure TMQ_MAX_QUEUES used
>> in tm-queues.[ch] is related.
> 
> Ok, will check.
> 
>> These queues are used in the autofp runmode to transfer packets between
>> the pktacq thread and the workers, so there is one per thread. This
>> limits us to 256 threads. I think nfq may actually be worse, as it uses
>> more queues iirc. Making this dynamic has been on my list for quite some
>> time.
> 
> I am sure this would make sense.  I spotted this issue without
> actually knowing how often those queues are used in practice,
> and therefore can't make a statement about the expected performance
> increase.
> 
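To make the fixed 256-queue limit concrete, here is a minimal sketch of sizing the queue array at runtime from the number of worker threads, rather than from a compile-time maximum. ExampleQueue, ExampleQueues, ExampleQueuesInit() and ExampleQueuesFree() are hypothetical names for illustration only; this is not how tm-queues.[ch] is implemented.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for a packet queue; not Suricata's own type. */
typedef struct ExampleQueue_ {
    unsigned int id;
    unsigned int len;   /* packets currently queued */
} ExampleQueue;

/* Hypothetical container: the number of queues is decided at runtime. */
typedef struct ExampleQueues_ {
    ExampleQueue *q;
    size_t count;
} ExampleQueues;

/* Allocate one queue per worker thread instead of relying on a fixed
 * compile-time maximum. Returns 0 on success, -1 on failure. */
static int ExampleQueuesInit(ExampleQueues *qs, size_t nthreads)
{
    assert(qs != NULL);
    qs->q = NULL;
    qs->count = 0;
    if (nthreads == 0)
        return -1;

    /* calloc() zeroes every queue, so each starts with len == 0. */
    qs->q = calloc(nthreads, sizeof(*qs->q));
    if (qs->q == NULL)
        return -1;

    for (size_t i = 0; i < nthreads; i++)
        qs->q[i].id = (unsigned int)i;
    qs->count = nthreads;
    return 0;
}

static void ExampleQueuesFree(ExampleQueues *qs)
{
    free(qs->q);
    qs->q = NULL;
    qs->count = 0;
}
```

With this shape, a box running 1000 worker threads simply gets 1000 queues; the cost is one extra indirection through qs->q compared to a static array.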
>> The __ALIGN macro looks convenient, I think we should replace the
>> current __attribute__((__aligned__(a))) users with it as well.
> 
> Ok, I can do so in a subsequent patch.
> 
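A minimal sketch of what aligning the array elements could look like. The __ALIGN definition below is an assumption (the patch's actual macro is not quoted here), as are ExampleQueueSlot, EXAMPLE_NUM_QUEUES and the 64-byte cache line size. Aligning the element type to a cache line pads each element to a whole line, so two threads updating neighbouring queues do not contend for the same line (false sharing).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed cache line size: 64 bytes is common on x86-64, not universal. */
#define CACHE_LINE_SIZE 64

/* Assumed definition of the __ALIGN helper: a thin wrapper around the
 * GCC/Clang aligned attribute. */
#define __ALIGN(a) __attribute__((__aligned__(a)))

/* Hypothetical queue slot. Aligning the type also rounds sizeof() up to a
 * multiple of the alignment, so adjacent array elements never share a line. */
struct ExampleQueueSlot_ {
    void *top;
    void *bot;
    uint32_t len;
} __ALIGN(CACHE_LINE_SIZE);
typedef struct ExampleQueueSlot_ ExampleQueueSlot;

/* Compile-time check that each slot occupies whole cache lines. */
static_assert(sizeof(ExampleQueueSlot) % CACHE_LINE_SIZE == 0,
              "each slot must occupy whole cache lines");

/* Hypothetical fixed-size array standing in for trans_q / data_queues. */
#define EXAMPLE_NUM_QUEUES 256
static ExampleQueueSlot example_queues[EXAMPLE_NUM_QUEUES];
```

The trade-off is memory: 256 slots of 64 bytes each take 16 KiB. Also note that identifiers beginning with a double underscore are reserved for the implementation in ISO C, so a project-prefixed macro name may be a safer choice.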
>> For the allocation functions we use the wrappers from util-mem.h, so in
>> the case of posix_memalign, we would use SCMallocAligned. Although I see
>> that it is a wrapper for _mm_malloc().
> 
> Creating SCMallocAligned() surely makes sense, so I'll create that.

It's already there, though it is currently a wrapper around _mm_malloc.
Not sure if that is relevant.
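Heap-allocated arrays of aligned elements also need an aligned allocator, because malloc() only guarantees alignment suitable for fundamental types (typically 16 bytes on x86-64 with glibc), not a full cache line. A minimal sketch of a posix_memalign()-based helper follows; ExampleMallocAligned() and ExampleFreeAligned() are hypothetical names, and this is not the implementation of SCMallocAligned() in util-mem.h.

```c
#define _POSIX_C_SOURCE 200112L
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical helper built on posix_memalign() rather than _mm_malloc().
 * 'align' must be a power of two and a multiple of sizeof(void *).
 * Returns NULL on failure. */
static void *ExampleMallocAligned(size_t size, size_t align)
{
    void *ptr = NULL;

    assert(align != 0 && (align & (align - 1)) == 0);
    if (posix_memalign(&ptr, align, size) != 0)
        return NULL;
    return ptr;
}

/* Memory from posix_memalign() is released with plain free(); memory
 * obtained from _mm_malloc() must instead be released with _mm_free(). */
static void ExampleFreeAligned(void *ptr)
{
    free(ptr);
}
```

C11's aligned_alloc() is another option, although C11 requires the size to be a multiple of the alignment.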

>> Minimal testing on my dual core laptop (still traveling) suggests a
>> minor slowdown (17.2s to 17.6s for one test, consistently over multiple
>> runs). Will try again on bigger hardware later when I'm home.
> 
> As said above, I have no idea of the actual performance
> gain from the alignment itself, but I'd expect it to be
> visible on a large machine with many threads.  This change also
> goes in the direction of the 'dynamic queues' you described.
> 

Cheers,
Victor

-- 
---------------------------------------------
Victor Julien
http://www.inliniac.net/
PGP: http://www.inliniac.net/victorjulien.asc
---------------------------------------------