[Oisf-devel] [PATCH V2 0/6] Align elements in 'trans_q' and 'data_queues' array

Victor Julien victor at inliniac.net
Sun Oct 27 07:35:33 UTC 2013

Hi Holger,

First of all, thanks a lot for helping out!

I think it would be more useful if you would send the patches inline, so
I can comment on them line by line. Alternatively, a github pull request
works as well.

On 10/24/2013 11:32 PM, Holger Eitzenberger wrote:
> Hi Victor,
> what follows is a small patchset for RFC which tries to address the
> false sharing happening on both the 'trans_q' and 'data_queues'
> arrays.  They are therefore a performance improvement on installations
> having many queues.
> V1 -> V2
> - rediffed against current master (semantic patches applied again)
> - use CLS instead of CACHE_LINE_SIZE
> The first patch is just cleanup, as it only introduces NUM_QUEUES.
> But from grepping through the source I see that I may not have spotted
> all places.

I need to check more carefully, but I'm almost sure TMQ_MAX_QUEUES used
in tm-queues.[ch] is related.

The whole fixed size array solution isn't great. We've had ppl report
errors because of this.

These queues are used in the autofp runmode to transfer packets between
the pktacq thread and the workers, so there is one per thread. This
limits us to 256 threads. I think nfq may actually be worse, as it uses
more queues iirc. Making this dynamic has been on my list for quite some

> The 2nd patch changes the 'trans_q' array to allocate each instance
> individually, as AFAIK only then the alignment per element can be
> specified later.  I also add the __ALIGN() macro, which allows
> to align the array itself.  I have used a small semantic patch
> (coccinelle) to change the callers.

The __ALIGN macro looks convenient, I think we should replace the
current __attribute__((__aligned__(a))) users with it as well.

> The 3rd patch does the same to the 'data_queues' array.
> The 4th patch corrects the declaration of GlobalInits().
> The 5th patch introduces GlobalFrees(), which is then used in the
> error patch of GlobalInits().  The use there is currently a bit
> pointless, as the program is terminated shortly after.  So the
> function is mostly added for possible future users.

I like this addition.

> The last patch actually aligns the queues themself.

For the allocation functions we use the wrappers from util-mem.h, so in
the case of posix_memalign, we would use SCMallocAligned. Although I see
that it is a wrapper for _mm_malloc().

> These are my first patches for Suricata :).  I therefore appreciate
> any advise on coding style etc.

Style note: we use 4 space indentation.

> I wasn't able to actually test the performance improvent yet, but I
> can do so next week if need be.

Minimal testing on my dual core laptop (still traveling) suggests a
minor slowdown (17.2s to 17.6s for one test, consistently over multiple
runs). Will try again on bigger hardware later when I'm home.


Victor Julien
PGP: http://www.inliniac.net/victorjulien.asc

More information about the Oisf-devel mailing list