[Oisf-users] Collecting recommendations for performance analysis - Community input needed
Cooper F. Nelson
cnelson at ucsd.edu
Fri Feb 14 17:00:06 UTC 2020
This is interesting and very similar to my process for looking for top
talkers to filter on our Arista switch. Just dropping
NetFlix/Youtube/Twitch resulted in something like a 50% reduction in
port utilization with only a minor loss of visibility, as it is 100%
encrypted "single origin" IP traffic. I wrote some simple shell scripts
like 'top_talkers', 'top_nets' and 'top_ports' to help find 'elephant'
CIDR blocks and flows. When I first set up my sensor I would just keep an
htop window open and run 'top_talkers' when I saw a core pegged at 100%.
You can also use ipinfo.io to get the full CIDR range for a specific
vendor's ASN.
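
The scripts themselves are not included in this message, so the following is
only a rough sketch of the same idea, not the originals. It assumes the input
is Suricata EVE JSON (one event per line) with "flow" events carrying src_ip,
dest_ip, flow.bytes_toserver and flow.bytes_toclient. The function names and
the /24 and /48 grouping prefixes are illustrative choices.

```python
import json
from collections import Counter
from ipaddress import ip_address, ip_network


def bytes_per_host(eve_lines):
    """Sum flow bytes per endpoint IP from EVE JSON lines (assumed format)."""
    totals = Counter()
    for line in eve_lines:
        try:
            event = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip blank, truncated, or otherwise unparsable lines
        if not isinstance(event, dict) or event.get("event_type") != "flow":
            continue
        flow = event.get("flow", {})
        nbytes = flow.get("bytes_toserver", 0) + flow.get("bytes_toclient", 0)
        # Credit both endpoints so a heavy sender shows up on either side.
        for field in ("src_ip", "dest_ip"):
            ip = event.get(field)
            if ip:
                totals[ip] += nbytes
    return totals


def top_talkers(eve_lines, n=10):
    """Return the n endpoint IPs with the most bytes."""
    return bytes_per_host(eve_lines).most_common(n)


def top_nets(eve_lines, n=10, v4_prefix=24, v6_prefix=48):
    """Roll per-host totals up into CIDR blocks (prefix lengths are assumptions)."""
    totals = Counter()
    for ip, nbytes in bytes_per_host(eve_lines).items():
        prefix = v4_prefix if ip_address(ip).version == 4 else v6_prefix
        net = ip_network(f"{ip}/{prefix}", strict=False)
        totals[str(net)] += nbytes
    return totals.most_common(n)
```

Something like top_nets(open("eve.json"), n=20) would then list candidate
'elephant' CIDR blocks; the file name here is just a placeholder.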
I have observed that there are some bugs/features with the flow bypass
feature that cause it not to work as expected in all cases. I may be
wrong about this, but I think if packets are coming in from a flow at a
rate faster than they can be processed by Hyperscan, this breaks the
flow tracking and the ring buffer will fill up before the flow can be
properly bypassed. Unfortunately this seems to be a pretty common
occurrence, especially on research networks with big bulk data
transfers. Flow bypass also doesn't work with non-TCP flows, and Google's
QUIC protocol seems to be a frequent source of packet drops.
I have a couple of feature requests I periodically bring up in this scope.
One, have an option to dump the ring buffer to a file whenever an
"emergency-flush" is triggered. Then you can just run perf tools on
that pcap and easily identify elephant flows.
Two, replace the current flow bypass logic (TCP flow by size) with IP
flows by packet count, using the existing clustering code. So for
example, in our deployment the IP flows would be tracked by cluster_flow
on separate cores from the worker threads, which should eliminate any
possible race conditions.
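
To make the second request concrete, here is a toy model of counting packets
per IP pair and bypassing the pair once a threshold is reached. It is not
Suricata code and says nothing about how cluster_flow is implemented; the
class name, the method names, and the threshold are all hypothetical.

```python
from collections import defaultdict


class IpPairBypass:
    """Toy model: bypass an IP pair after it exceeds a packet-count budget."""

    def __init__(self, max_packets=1000):
        self.max_packets = max_packets  # hypothetical threshold
        self.counts = defaultdict(int)
        self.bypassed = set()

    @staticmethod
    def key(src_ip, dst_ip):
        # Direction-independent key: both halves of a conversation share a counter.
        return tuple(sorted((src_ip, dst_ip)))

    def should_inspect(self, src_ip, dst_ip):
        """Return True if this packet should still go to the detection engine."""
        k = self.key(src_ip, dst_ip)
        if k in self.bypassed:
            return False
        self.counts[k] += 1
        if self.counts[k] >= self.max_packets:
            # Budget used up: later packets for this pair skip inspection.
            self.bypassed.add(k)
            del self.counts[k]
        return True
```

Because the key is only the IP pair, with no TCP state involved, the same
counter would apply to UDP traffic such as QUIC.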
-Coop
On 2/14/2020 1:38 AM, Andreas Herz wrote:
> I'm still looking for some input from others if you have anything to
> share. I'm especially interested in methods that try to deal with
> elephant flows (bulk traffic). If you have any experience with that,
> please feel free to discuss those with me and we can add those to the
> documentation as well.
>
> Andi
--
Cooper Nelson
Network Security Analyst
UCSD ITS Security Team
cnelson at ucsd.edu x41042