[Oisf-devel] [iovisor-dev] Best userspace programming API for XDP features query to kernel?

Thu Apr 5 20:51:33 UTC 2018

On Thu, 5 Apr 2018 12:37:19 +0200
Daniel Borkmann <daniel at iogearbox.net> wrote:

> On 04/04/2018 02:28 PM, Jesper Dangaard Brouer via iovisor-dev wrote:
> > Hi Suricata people,
> > 
> > When Eric Leblond (and I helped) integrated XDP in Suricata, we ran
> > into the issue, that at Suricata load/start time, we cannot determine
> > if the chosen XDP config options, like xdp-cpu-redirect[1], is valid on
> > this HW (e.g require driver XDP_REDIRECT support and bpf cpumap).
> > 
> > We would have liked a way to report that suricata.yaml config was
> > invalid for this hardware/setup.  Now, it just loads, and packets gets
> > silently dropped by XDP (well a WARN_ONCE and catchable via tracepoints).
> > 
> > My question to suricata developers: (Q1) Do you already have code that
> > query the kernel or drivers for features?
> > 
> > At the IOvisor call (2 weeks ago), we discussed two options of exposing
> > XDP features avail in a given driver.
> > 
> > Option#1: Extend existing ethtool -k/-K "offload and other features"
> > with some XDP features, that userspace can query. (Do you already query
> > offloads, regarding Q1)
> > 
> > Option#2: Invent a new 'ip link set xdp' netlink msg with a query option.  
> 
> I don't really mind if you go via ethtool, as long as we handle this
> generically from there and e.g. call the dev's ndo_bpf handler such that
> we keep all the information in one place. This can be a new ndo_bpf command
> e.g. XDP_QUERY_FEATURES or such.

Just to be clear: notice as Victor points out[2], they are programmable
going though the IOCTL (SIOCETHTOOL) and not using cmdline tools.

[2] https://github.com/OISF/suricata/blob/master/src/util-ioctl.c#L326

If you want everything to go through the drivers ndo_bpf call anyway
(which userspace API is netlink based) then at what point to you
want drivers to call their own ndo_bpf, when activated though their
ethtool_ops ? (Sorry, but I don't follow the flow you are proposing)

Notice, I'm not directly against using the drivers ndo_bpf call.  I can
see it does provide kernel more flexibility than the ethtool IOCTL.

> More specifically, how would such feature mask look like? How fine grained
> would this be? When you add a new minor feature to, say, cpumap that not
> all drivers support yet, we'd need a new flag each time, no?

No, CPUMAP is not a driver level feature, and thus does not require a
features flag exposed by the driver.  CPUMAP depends on the driver
feature XDP_REDIRECT.

It is important we separate driver-level features and BPF/XDP core
features.  I feel that we constantly talk past each-other when we mix
that up.

It is true that Suricata _also_ need to detect if the running kernel
support the map type called: BPF_MAP_TYPE_CPUMAP.  *BUT* that is a
completely separate mechanism.  It is a core kernel bpf feature, and I
have accepted that this can only be done via probing the kernel (simply
use the bpf syscall and try to create this map type).

Here, I want to discuss how drivers expose/tell userspace that they
support a given feature: Specifically a bit for: XDP_REDIRECT action
support.

> Same for meta data,

Well, not really.  It would be a "nice-to-have", but not strictly
needed as a feature bit.  XDP meta-data is controlled via a helper.
And the BPF-prog can detect/see runtime, that the helper bpf_xdp_adjust_meta
returns -ENOTSUPP (and need to check the ret value anyhow).  Thus,
there is that not much gained by exposing this to be detected setup
time, as all drivers should eventually support this, and we can detect
it runtime.

The missing XDP_REDIRECT action features bit it different, as the
BPF-prog cannot detect runtime that this is an unsupported action.
Plus, setup time we cannot query the driver for supported XDP actions.

> then potentially for the redirect memory return work,

I'm not sure the redirect memory return types, belong here.  First of
all it is per RX-ring.  Second, some other config method will likely be
config interface.  Like with AF_XDP-zero-copy it is a new NDO. So, it
is more losely coupled.

> or the af_xdp bits, 

No need for bit for AF_XDP in copy-mode (current RFC), as this only
depend on driver supporting XDP_REDIRECT action.

For AF_XDP in zero-copy mode, then yes we need a way for userspace to
"see" if this mode is supported by the driver.  But it might not need a
feature bit here... as the bind() call (which knows the ifindex) could
fail when it tried to enable this ZC mode.  It would make userspace's
live easier to add ZC as a driver feature bit.

> the xdp_rxq_info would have needed it, etc. 

Same comment as mem-type, not necessarily, as it is more losely coupled
to XDP.

> What about nfp in terms of XDP
> offload capabilities, should they be included as well or is probing to load
> the program and see if it loads/JITs as we do today just fine (e.g. you'd
> otherwise end up with extra flags on a per BPF helper basis)?

No, flags per BPF helper basis. As I've described above, helper belong
to the BPF core, not the driver.  Here I want to know what the specific
driver support.

> To make a
> somewhat reliable assertion whether feature xyz would work, this would
> explode in new feature bits long term. Additionally, if we end up with a
> lot of feature flags, it will be very hard for users to determine whether
> this particular set of features a driver supports actually represents a
> fully supported native XDP driver.

Think about it, what does a "fully supported native XDP driver" mean,
when the kernel evolve and new features gets added.  How will the
end-user know what XDP features are in their running kernel release?

> What about keeping this high level to users? E.g. say you have 2 options
> that drivers can expose as netdev_features_strings 'xdp-native-full' or
> 'xdp-native-partial'. If a driver truly supports all XDP features for a
> given kernel e.g. v4.16, then a query like 'ethtool -k foo' will say
> 'xdp-native-full', if at least one feature is missing to be feature complete
> from e.g. above list, then ethtool will tell 'xdp-native-partial', and if
> not even ndo_bpf callback exists then no 'xdp-native-*' is reported.

I use-to-be, an advocate for this.  I even think I send patches
implementing this. Later, I've realized that this model is flawed.

When e.g. suricata loads it need to look at both "xdp-native-full" and
the kernel version, to determine if XDP_REDIRECT action is avail.
Later when a new kernel version gets released, the driver is missing a
new XDP feature.  Then suricata, which doesn't use/need the new
feature, need to be updated, to check that kernel below this version,
with 'xdp-native-partial' and this NIC driver is still okay.  Can you
see the problem?

Even if Suricate goes though the pain of keeping track of kernel
version vs drivers vs xdp-native-full/partial.  Then, they also want to
run their product on distro kernels.  They might convince some distro,
to backport some XDP features they need.  So, now they also need to
keep track of distro kernel minor versions... and all they really
wanted as a single feature bit saying if the running NIC driver
supports the XDP_REDIRECT action code.

> Side-effect might be that it would give incentive to keep drivers in
> state 'xdp-native-full' instead of being downgraded to
> 'xdp-native-partial'. Potentially, in the 'xdp-native-partial' state,
> we can expose a high-level list of missing features that the driver
> does not support yet, which would over time converge towards 'zero'
> and thus 'xdp-native-full' again. ethtool itself could get a new XDP
> specific query option that, based on this info, can then dump the
> full list of supported and not supported features. In order for this
> to not explode, such features would need to be kept on a high-level
> basis, meaning if e.g. cpumap gets extended along with support for a
> number of drivers, then those that missed out would need to be
> temporarily re-flagged with e.g. 'cpumap not supported' until it gets
> also implemented there. That way, we don't explode in adding too
> fine-grained feature bit combinations long term and make it easier to
> tell whether a driver supports the full set in native XDP or not.
> Thoughts?

(I really liked creating an incentive for driver vendors)
Thoughs inlined above...

> > (Q2) Do Suricata devs have any preference (or other options/ideas)
> > for the way the kernel expose this info to userspace?
> > 
> > [1]
> > http://suricata.readthedocs.io/en/latest/capture-hardware/ebpf-xdp.html#the-xdp-cpu-redirect-case  

Regarding how fine grained features bits I want.  I also want to mention
that, I want the drivers XDP_REDIRECT action support to be decoupled
from whether the driver support ndo_xdp_xmit, and if ndo_xdp_xmit is
enabled or disabled.
  E.g. for the macvlan driver, I don't see much performance gain in
implementing the native XDP-RX actions "side", while there will be a
huge performance gain in implemeting ndo_xdp_xmit.

-- 
Best regards,
  Jesper Dangaard Brouer
  MSc.CS, Principal Kernel Engineer at Red Hat
  LinkedIn: http://www.linkedin.com/in/brouer