[Oisf-devel] [iovisor-dev] Best userspace programming API for XDP features query to kernel?

Fri Apr 6 10:36:18 UTC 2018

On 04/05/2018 10:51 PM, Jesper Dangaard Brouer wrote:
> On Thu, 5 Apr 2018 12:37:19 +0200
> Daniel Borkmann <daniel at iogearbox.net> wrote:
> 
>> On 04/04/2018 02:28 PM, Jesper Dangaard Brouer via iovisor-dev wrote:
>>> Hi Suricata people,
>>>
>>> When Eric Leblond (and I helped) integrated XDP in Suricata, we ran
>>> into the issue that at Suricata load/start time, we cannot determine
>>> if the chosen XDP config options, like xdp-cpu-redirect[1], are valid on
>>> this HW (e.g. they require driver XDP_REDIRECT support and bpf cpumap).
>>>
>>> We would have liked a way to report that the suricata.yaml config was
>>> invalid for this hardware/setup.  Now, it just loads, and packets get
>>> silently dropped by XDP (well, a WARN_ONCE and catchable via tracepoints).
>>>
>>> My question to suricata developers: (Q1) Do you already have code that
>>> queries the kernel or drivers for features?
>>>
>>> At the IOvisor call (2 weeks ago), we discussed two options for exposing
>>> the XDP features available in a given driver.
>>>
>>> Option#1: Extend existing ethtool -k/-K "offload and other features"
>>> with some XDP features, that userspace can query. (Do you already query
>>> offloads, regarding Q1?)
>>>
>>> Option#2: Invent a new 'ip link set xdp' netlink msg with a query option.  
>>
>> I don't really mind if you go via ethtool, as long as we handle this
>> generically from there and e.g. call the dev's ndo_bpf handler such that
>> we keep all the information in one place. This can be a new ndo_bpf command
>> e.g. XDP_QUERY_FEATURES or such.
> 
> Just to be clear: notice as Victor points out[2], they are programmatically
> going through the IOCTL (SIOCETHTOOL) and not using cmdline tools.

Sure, that was perfectly clear. (But at the same time if you extend the
ioctl, it's obvious to also add support to actual ethtool cmdline tool.)

> [2] https://github.com/OISF/suricata/blob/master/src/util-ioctl.c#L326
> 
> If you want everything to go through the driver's ndo_bpf call anyway
> (whose userspace API is netlink based) then at what point do you

Not really, that's the front end. ndo_bpf itself is a plain netdev op
and has no real tie to netlink.

> want drivers to call their own ndo_bpf, when activated through their
> ethtool_ops? (Sorry, but I don't follow the flow you are proposing)
> 
> Notice, I'm not directly against using the driver's ndo_bpf call.  I can
> see it does provide the kernel more flexibility than the ethtool IOCTL.

What I was saying is that even if you go via ethtool ioctl api, where
you end up in dev_ethtool() and have some new ETHTOOL_* query command,
then instead of adding a new ethtool_ops callback, we can and should
reuse ndo_bpf from there.
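
To make that flow concrete, here is a minimal userspace model in Python
(not kernel code). It assumes a hypothetical XDP_QUERY_FEATURES command
and made-up feature bits; neither exists today. The only point is that
the ethtool front end and the netlink front end both end up in the same
ndo_bpf handler, so the information lives in one place.

```python
from enum import Enum, auto

# Hypothetical feature bits; no such uapi constants are defined in this thread.
XDP_F_REDIRECT = 1 << 0
XDP_F_ADJUST_META = 1 << 1


class BpfCommand(Enum):
    XDP_SETUP_PROG = auto()
    XDP_QUERY_FEATURES = auto()  # proposed name only, assumption for this sketch


class NetDevice:
    """Stand-in for a netdev whose driver implements ndo_bpf."""

    def __init__(self, name, xdp_features):
        self.name = name
        self._xdp_features = xdp_features

    def ndo_bpf(self, command):
        # The single place where the driver answers BPF/XDP requests.
        if command is BpfCommand.XDP_QUERY_FEATURES:
            return self._xdp_features
        raise NotImplementedError(command.name)


def dev_ethtool_query_xdp(dev):
    """Models a new ETHTOOL_* query: no new ethtool_ops callback, just ndo_bpf."""
    return dev.ndo_bpf(BpfCommand.XDP_QUERY_FEATURES)


def netlink_query_xdp(dev):
    """Models a netlink front end asking the very same handler."""
    return dev.ndo_bpf(BpfCommand.XDP_QUERY_FEATURES)


dev = NetDevice("eth0", XDP_F_ADJUST_META)
supports_redirect = bool(dev_ethtool_query_xdp(dev) & XDP_F_REDIRECT)
```

Either front end returns the same bitmask, so a driver only has to
describe its XDP capabilities once.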

[...]
> Here, I want to discuss how drivers expose/tell userspace that they
> support a given feature: Specifically a bit for: XDP_REDIRECT action
> support.
> 
>> Same for meta data,
> 
> Well, not really.  It would be a "nice-to-have", but not strictly
> needed as a feature bit.  XDP meta-data is controlled via a helper.
> And the BPF-prog can detect/see runtime, that the helper bpf_xdp_adjust_meta
> returns -ENOTSUPP (and needs to check the ret value anyhow).  Thus,
> there is not that much gained by exposing this for detection at setup
> time, as all drivers should eventually support this, and we can detect
> it at runtime.
> 
> The missing XDP_REDIRECT action feature bit is different, as the
> BPF-prog cannot detect at runtime that this is an unsupported action.
> Plus, at setup time we cannot query the driver for supported XDP actions.

Ok, so with the example of meta data, you're arguing that it's okay
to load a native XDP program onto a driver and run actual traffic on
the NIC in order to probe for the availability of the feature, when you're
saying that it "can detect/see [at] runtime". I totally agree with you
that all drivers should eventually support this (same with XDP_REDIRECT),
but today there are even differences between drivers on bpf_xdp_adjust_meta()/
bpf_xdp_adjust_head() with regards to how much headroom they have available,
etc (e.g. some of them have none), so right now you can either go and
read the code or do a runtime test, running actual traffic through
the NIC, to check whether your BPF prog is supported or not. Theoretically,
you can do the same runtime test with XDP_REDIRECT (taking the warn in
bpf_warn_invalid_xdp_action() aside for a moment), as you do have the
trace_xdp_exception() tracepoint to figure it out. Yes, it's a painful
hassle, but overall it's not as different as you were trying to argue
here. For /both/ cases it would be nice to know at setup time whether
this would be supported or not. Hence, such a query is not limited to
XDP_REDIRECT alone. What I'm trying to say is that once such an interface
is agreed upon, the list of feature bits will undoubtedly grow; arguing
only about XDP_REDIRECT here would be taking the short-term view.
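
As a rough illustration of what a setup-time check could look like on
the application side, here is a small Python sketch. The feature bit
names and values are assumptions made up for this example (no such
query interface exists yet); only the xdp-cpu-redirect option and its
need for XDP_REDIRECT come from this thread. The cpumap requirement is
a kernel map type rather than a driver feature and is not modeled.

```python
# Hypothetical feature bits a driver might report; names are made up here.
XDP_F_REDIRECT = 1 << 0
XDP_F_ADJUST_META = 1 << 1
XDP_F_ADJUST_HEAD = 1 << 2

FEATURE_NAMES = {
    XDP_F_REDIRECT: "XDP_REDIRECT",
    XDP_F_ADJUST_META: "bpf_xdp_adjust_meta",
    XDP_F_ADJUST_HEAD: "bpf_xdp_adjust_head",
}

# Feature bits each config option depends on (illustrative mapping).
REQUIRED_BY_OPTION = {
    "xdp-cpu-redirect": XDP_F_REDIRECT,
}


def missing_features(driver_features, enabled_options):
    """Return the names of required features the driver does not report."""
    required = 0
    for option in enabled_options:
        required |= REQUIRED_BY_OPTION.get(option, 0)
    missing = required & ~driver_features
    return [name for bit, name in sorted(FEATURE_NAMES.items()) if missing & bit]


# A driver with head/meta adjustment but no redirect support.
problems = missing_features(XDP_F_ADJUST_META | XDP_F_ADJUST_HEAD,
                            ["xdp-cpu-redirect"])
```

A non-empty result would let the loader reject the config up front
instead of loading and dropping packets. New bits can be added to the
tables without touching the check itself.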

[...]
>> What about keeping this high level to users? E.g. say you have 2 options
>> that drivers can expose as netdev_features_strings 'xdp-native-full' or
>> 'xdp-native-partial'. If a driver truly supports all XDP features for a
>> given kernel e.g. v4.16, then a query like 'ethtool -k foo' will say
>> 'xdp-native-full', if at least one feature is missing to be feature complete
>> from e.g. above list, then ethtool will tell 'xdp-native-partial', and if
>> not even ndo_bpf callback exists then no 'xdp-native-*' is reported.
> 
> I used to be an advocate for this.  I even think I sent patches
> implementing this. Later, I've realized that this model is flawed.
> 
> When e.g. suricata loads it needs to look at both "xdp-native-full" and
> the kernel version, to determine if the XDP_REDIRECT action is avail.
> Later when a new kernel version gets released, the driver is missing a
> new XDP feature.  Then suricata, which doesn't use/need the new
> feature, needs to be updated to check that a kernel below this version,
> with 'xdp-native-partial' and this NIC driver, is still okay.  Can you
> see the problem?
> 
> Even if Suricata goes through the pain of keeping track of kernel
> version vs drivers vs xdp-native-full/partial, they also want to
> run their product on distro kernels.  They might convince some distro
> to backport some XDP features they need.  So, now they also need to
> keep track of distro kernel minor versions... and all they really
> wanted was a single feature bit saying if the running NIC driver
> supports the XDP_REDIRECT action code.

Yep, agree it's not pretty, not claiming any of this is. You kind of
need to be aware of the underlying kernel, similar to the tracing case.
The underlying problem is effectively the decoupling of program verification
that doesn't have/know the context of where it is being attached to in
this case. Thinking out loud for a sec on a couple of other options aside
from feature bits, what about i) providing the target ifindex to the
verifier for XDP programs, such that at verification time you have the
full context similar to the nfp offload case today, or ii) populating some
XDP specific auxiliary data to the BPF program at verification time such
that the driver can check at program attach time whether the requested
features are possible, and if not it will reject and respond with a netlink
extack message to the user (as we do in various XDP attach cases already
through XDP_SETUP_PROG{,_HW}).

This would, for example, avoid the need for feature bits, and do actual
rejection of the program while retaining flexibility (and avoid
exposing bits that will hopefully become obsolete over time anyway, once
all XDP aware drivers implement them). For both cases i) and ii), it
would mean we make the verifier a bit smarter with regards to keeping
track of driver related (XDP) requirements. Program return code checking
is already present in the verifier's check_return_code() and we could
extend it for XDP as well, for example. Seems cleaner and more extensible
than feature bits, imho.
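
To sketch option ii) in a few lines, here is a userspace model in
Python (not kernel code). The XdpProgAux and Driver types and the
attach() signature are hypothetical; the action and helper names are
the existing kernel ones. The verifier is assumed to have recorded
which return codes and driver-dependent helpers the program uses, and
the driver compares them against what it supports at attach time.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class XdpProgAux:
    """Requirements the verifier would record while checking the program."""
    actions: frozenset  # XDP return codes the program may return
    helpers: frozenset  # driver-dependent helpers the program calls


@dataclass(frozen=True)
class Driver:
    name: str
    actions: frozenset  # XDP actions this driver handles
    helpers: frozenset  # driver-dependent helpers this driver supports

    def attach(self, aux):
        """Return (ok, extack_msg); reject if any requirement is unsupported."""
        missing = sorted((aux.actions - self.actions) |
                         (aux.helpers - self.helpers))
        if missing:
            return False, f"{self.name}: unsupported by driver: {', '.join(missing)}"
        return True, ""


prog = XdpProgAux(actions=frozenset({"XDP_PASS", "XDP_REDIRECT"}),
                  helpers=frozenset({"bpf_xdp_adjust_meta"}))
old_nic = Driver("old_nic",
                 actions=frozenset({"XDP_PASS", "XDP_DROP", "XDP_TX"}),
                 helpers=frozenset())
ok, extack = old_nic.attach(prog)
```

Here the attach fails with a message naming exactly what is missing,
and the application never has to interpret feature bits or kernel
versions itself.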

Thanks,
Daniel