[Oisf-users] Collecting recommendations for performance analysis - Community input needed

Cooper F. Nelson cnelson at ucsd.edu
Fri Feb 14 17:00:06 UTC 2020


This is interesting and very similar to my process for looking for top 
talkers to filter on our Arista switch.  Just dropping 
Netflix/YouTube/Twitch resulted in something like a 50% reduction in
port utilization with only a minor loss of visibility, as it is 100% 
encrypted "single origin" IP traffic.  I wrote some simple shell scripts 
like 'top_talkers', 'top_nets' and 'top_ports' to help find 'elephant' 
CIDR blocks and flows.  When I first set up my sensor I would just keep
an htop window open and run 'top_talkers' when I saw a core pegged at
100%.
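
A minimal sketch of the kind of aggregation such scripts might do,
written in Python rather than shell. The record format (source,
destination, byte count) and the function bodies are assumptions for
illustration, not the actual 'top_talkers' or 'top_nets' scripts, and
the addresses are documentation ranges.

```python
import ipaddress
from collections import Counter


def top_talkers(records, n=10):
    """Return the n source IPs that sent the most bytes."""
    totals = Counter()
    for src, _dst, nbytes in records:
        totals[src] += nbytes
    return totals.most_common(n)


def top_nets(records, prefix_len=24, n=10):
    """Return the n source networks (of the given prefix length) by bytes."""
    totals = Counter()
    for src, _dst, nbytes in records:
        # strict=False masks off the host bits, so any address maps to its net
        net = ipaddress.ip_network(f"{src}/{prefix_len}", strict=False)
        totals[str(net)] += nbytes
    return totals.most_common(n)


# Hypothetical flow records: (source IP, destination IP, bytes)
records = [
    ("192.0.2.10", "10.0.0.5", 9000),
    ("192.0.2.77", "10.0.0.6", 4000),
    ("198.51.100.3", "10.0.0.5", 5000),
]

print(top_talkers(records, n=2))
print(top_nets(records))
```

Grouping by network as well as by host matters here because a CDN
usually spreads its traffic over many addresses in the same block, so no
single host may stand out.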

You can also use ipinfo.io to get the full CIDR range for a specific
vendor's ASN.
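
Once you have a prefix list for an ASN, it can be turned into a filter.
The sketch below assumes the prefixes have already been collected by
hand (the ones shown are documentation ranges, not a real vendor's) and
builds a BPF expression of the kind tcpdump or Suricata's bpf-filter
accepts. A switch ACL would need its own syntax. build_drop_filter is a
hypothetical helper name.

```python
import ipaddress


def build_drop_filter(prefixes):
    """Build a BPF expression that excludes the given IPv4 prefixes."""
    if not prefixes:
        raise ValueError("need at least one prefix")
    nets = [ipaddress.ip_network(p) for p in prefixes]
    # Merge adjacent or overlapping prefixes so the filter stays short.
    # collapse_addresses needs all networks to be the same IP version.
    merged = ipaddress.collapse_addresses(nets)
    clause = " or ".join(f"net {net}" for net in merged)
    return f"not ({clause})"


# Hypothetical prefix list for one vendor
vendor_prefixes = ["198.51.100.0/25", "198.51.100.128/25", "192.0.2.0/24"]
print(build_drop_filter(vendor_prefixes))
```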

I have observed that there are some bugs/features with the flow bypass 
feature that cause it not to work as expected in all cases.  I may be 
wrong about this, but I think if packets are coming in from a flow at a 
rate faster than they can be processed by Hyperscan, this breaks the
flow tracking and the ring buffer will fill up before the flow can be 
properly bypassed. Unfortunately this seems to be a pretty common 
occurrence, especially on research networks with big bulk data 
transfers. Flow bypass also doesn't work with non-TCP flows, and
Google's QUIC protocol, which runs over UDP, seems to be a frequent
source of packet drops.

I have a couple of feature requests I periodically bring up in this area.

One, have an option to dump the ring buffer to a file whenever an 
"emergency-flush" is triggered.  Then you can just run perf tools on 
that pcap and easily identify elephant flows.
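
As a rough illustration of the analysis step, the sketch below sums
bytes per IP pair from a pcap held in memory. It assumes a classic
(non-pcapng) microsecond-resolution capture with Ethernet framing,
untagged IPv4 only. The dump itself is the requested feature and does
not exist, so the input here is hypothetical.

```python
import socket
import struct
from collections import Counter


def bytes_per_ip_pair(pcap_bytes):
    """Sum on-the-wire IPv4 bytes per (src, dst) pair in a classic pcap."""
    if len(pcap_bytes) < 24:
        raise ValueError("truncated pcap global header")
    magic = pcap_bytes[:4]
    if magic == b"\xd4\xc3\xb2\xa1":
        endian = "<"  # file written on a little-endian host
    elif magic == b"\xa1\xb2\xc3\xd4":
        endian = ">"  # file written on a big-endian host
    else:
        raise ValueError("not a classic microsecond pcap file")
    linktype = struct.unpack_from(endian + "I", pcap_bytes, 20)[0]
    if linktype != 1:
        raise ValueError("only Ethernet (linktype 1) is handled")

    totals = Counter()
    offset = 24  # skip the global header
    while offset + 16 <= len(pcap_bytes):
        _sec, _usec, incl_len, orig_len = struct.unpack_from(
            endian + "IIII", pcap_bytes, offset)
        offset += 16
        frame = pcap_bytes[offset:offset + incl_len]
        offset += incl_len
        # 14-byte Ethernet header, ethertype 0x0800 = IPv4, no VLAN tag.
        # The IPv4 source sits at bytes 12-15 of the IP header, dest at 16-19.
        if len(frame) >= 34 and frame[12:14] == b"\x08\x00":
            src = socket.inet_ntoa(frame[26:30])
            dst = socket.inet_ntoa(frame[30:34])
            totals[(src, dst)] += orig_len
    return totals
```

Calling bytes_per_ip_pair(data).most_common(10) on the file contents
would then list the heaviest IP pairs first.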

Two, replace the current flow bypass logic (TCP flow by size) with IP 
flows by packet count, using the existing clustering code.  So for 
example, in our deployment the IP flows would be tracked by cluster_flow 
on separate cores from the worker threads, which should eliminate any 
possible race conditions.
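
To make the second request concrete, here is a minimal single-threaded
sketch of bypassing by packet count per IP pair. The class name, the
threshold and the keying on an unordered (src, dst) pair are assumptions
for illustration. It does not model cluster_flow, core placement or any
existing Suricata code.

```python
from collections import Counter


class IpFlowBypass:
    """Stop inspecting an IP pair once it has sent enough packets."""

    def __init__(self, packet_threshold):
        self.packet_threshold = packet_threshold
        self.counts = Counter()
        self.bypassed = set()

    def should_inspect(self, src, dst):
        # Sort the endpoints so both directions share one counter.
        key = tuple(sorted((src, dst)))
        if key in self.bypassed:
            return False
        self.counts[key] += 1
        if self.counts[key] >= self.packet_threshold:
            # Threshold reached: this packet is still inspected,
            # every later packet for the pair is skipped.
            self.bypassed.add(key)
            del self.counts[key]
        return True


tracker = IpFlowBypass(packet_threshold=3)
decisions = [tracker.should_inspect("192.0.2.10", "10.0.0.5") for _ in range(4)]
print(decisions)
```

Because the key ignores ports and protocol, UDP traffic such as QUIC
would be counted the same way as TCP.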

-Coop

On 2/14/2020 1:38 AM, Andreas Herz wrote:
> I'm still looking for some input from others if you have anything to
> share. I'm especially interested in methods that try to deal with
> elephant flows (bulk traffic). If you have any experience with that,
> please feel free to discuss those with me and we can add those to the
> documentation as well.
>
> Andi

-- 
Cooper Nelson
Network Security Analyst
UCSD ITS Security Team
cnelson at ucsd.edu x41042


