[Oisf-users] TCP reassembly gaps

Chris Wakelin c.d.wakelin at reading.ac.uk
Thu Apr 26 13:09:11 UTC 2012


On 23/04/12 13:05, Chris Wakelin wrote:
> On 21/04/12 13:02, Seth Hall wrote:
>>
>> On Apr 21, 2012, at 6:19 AM, Chris Wakelin wrote:
>>
>>> The other odd thing of course is that the switch is VLAN-tagging
>>> packets in one direction only, which might be confusing things.
>>
>>
>> I would kind of expect RSS to get messed up with the VLAN tagged
>> packets in one direction.  Have you tried disabling that?
>
> Yes, just tried it again to be sure. It seems to make no difference at
> all, which might mean RSS isn't actually helping in any case.
>

Actually, I think RSS doesn't use the VLAN tag by default; on a recent
ethtool, "ethtool -n eth1 rx-flow-hash tcp4" shows it's hashing on just
the src/dst addresses and ports. As I mentioned, I've also stopped
PF_RING from using the VLAN tag in its hash (perhaps we could give
Suricata more cluster-type options; PF_RING now offers various levels of
hashing you can specify).
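
For anyone wanting to check their own card, the commands on a reasonably
recent ethtool are roughly as below ("sdfn" being src IP, dst IP, src
port, dst port - I'm going from memory, so check your man page):

  # show which fields the NIC hashes on for TCP-over-IPv4 flows
  ethtool -n eth1 rx-flow-hash tcp4
  # hash on src/dst addresses and src/dst ports
  ethtool -N eth1 rx-flow-hash tcp4 sdfn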

> The number of gaps does vary with the traffic load. Our students are
> back now and using their full 800Mb/s in the evenings. PF_RING is
> happily reporting no lost packets though.
> 
> I'm still guessing this is an issue with PF_RING rather than the
> machine being unable to cope with the traffic. I might try updating to
> the latest SVN (there are apparently fixes for locking).

Well, the latest SVN panicked the system :( I guess I'll wait for a
release or find a machine that doesn't matter as much to test it on!

>> For Bro we have a script that does the gap counting and reporting in
>> production.  It's the same technique that Wireshark uses for its gap
>> reporting, but you can run it all the time on live traffic with Bro.
>> The script is named "misc/capture-loss" and I can help you out with
>> it if you're interested.

I've given Bro a try (lots of interesting stuff in there!) and it
reckons that 80% of streams are missing packets, even when only Bro is
running (in cluster mode with 3 workers).
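
(For anyone else wanting to try it, I believe enabling the script is
just a matter of one extra @load line in local.bro:

  # local.bro
  @load misc/capture-loss

and the per-worker loss estimates then turn up in capture_loss.log.)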

I've also tried Bro with the PF_RING DNA driver (I had to reduce the
capture-statistics frequency to fit within the 5-minute trial period of
the unlicensed driver), and that managed not to lose any significant
number of packets.

I ran Suricata with DNA (runmode=workers, 6 interfaces defined, dna0@0
through dna0@5, each with threads=1) and that saw only 1-2 gaps per
second with over 1Gb/s of traffic processed, but it filled up my syslog
and kernel log with error messages: "[PF_RING] Unable to activate two or
more DNA sockets on the same interface dna0/link direction" (I'm not
sure why that got called more than once per interface).
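
For reference, the relevant bits of my suricata.yaml were roughly as
follows (sketched from memory; the same stanza is repeated for dna0@2
through dna0@5):

  runmode: workers

  pfring:
    - interface: dna0@0
      threads: 1
    - interface: dna0@1
      threads: 1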

The upshot is that I still think it might be an issue with the PF_RING
non-DNA ixgbe driver (mainly because I see gaps at comparatively low
traffic loads), but it could just be that my current hardware can't cope
without using DNA or TNAPI (though the same hardware manages almost as
much traffic on the campus network through its e1000e 1Gb card without
significant gaps).

I've been "promised" a couple of Dell R610s in the summer to replace my
PowerEdge 2950s, which might give better performance (better chipset).

Best Wishes,
Chris

-- 
--+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+-
Christopher Wakelin,                           c.d.wakelin at reading.ac.uk
IT Services Centre, The University of Reading,  Tel: +44 (0)118 378 2908
Whiteknights, Reading, RG6 6AF, UK              Fax: +44 (0)118 975 3094


