[Discussion] features (mainly dns)

Sun Dec 7 21:28:45 UTC 2008

On Sun, Dec 07, 2008 at 09:21:53PM +0100, Florian Weimer wrote:

> > I've found that linux hosts can be filtered by keeping state on AAAA
> > lookups, followed by A lookups (glibc has this behavior).
> 
> Nice idea.  But this depends on the application (it has to use
> getaddrinfo) and system configuration (GNU libc must detect some form
> of IPv6 connectivity).

I seem to recall lo0 having ::1/128 was enough to convince glibc that
it should perform AAAA lookups; even if an answer is returned, glibc
only later observes the 4-only interface needs an A record.  (I would
not consider this a bug per se, as I once did; it's just aggressive
forward compliance.)  
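
A minimal sketch of the "keep state on AAAA, then A" idea follows.
It is not taken from any existing tool.  The class name, the 5 second
window, and the entry bound are all assumptions made for illustration,
and a match is only a hint that the client's resolver behaves like the
glibc case described above.

```python
from collections import OrderedDict


class AAAAThenATracker:
    """Flag clients that query AAAA for a name and then A for the same name."""

    def __init__(self, window=5.0, max_entries=10000):
        self.window = window            # seconds to remember a AAAA query (assumed)
        self.max_entries = max_entries  # bound on tracked (client, name) pairs
        self.pending = OrderedDict()    # (client, qname) -> time of AAAA query

    def observe(self, client, qname, qtype, now):
        """Return True when this A query follows a recent AAAA query."""
        key = (client, qname.lower().rstrip("."))
        if qtype == "AAAA":
            self.pending.pop(key, None)   # re-insert so ordering tracks recency
            self.pending[key] = now
            while len(self.pending) > self.max_entries:
                self.pending.popitem(last=False)  # evict the oldest entry
            return False
        if qtype == "A":
            seen = self.pending.pop(key, None)
            return seen is not None and now - seen <= self.window
        return False
```

Feeding it the query stream in arrival order, with `now` taken from
the capture timestamps, is enough; applications that never issue the
AAAA query simply never match.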

This behavior is most acute in networks (e.g., a few cyber cafes)
doing DNS filtering/typo mining, and dropping all AAAA traffic.  So
even though the network only hands out ipv4 addresses, the linux users
experience timeouts for many common applications.  The XP users just
laugh.  (I think OSX is well behaved, prioritizing A over AAAA?)

But you're correct; there are some linux applications that will not
signal this way.

(I think someone needs to take on the enormous project of
stub/application fingerprinting and analysis.  Particularly now that
DNS prefetching is in Chrome and allegedly coming to Firefox some day.)

> >   -- Consider filters for RFC suggested limits on NS counts (7 I
> >      believe).
> 
> I don't think such a limit exists at the RFC level. 

Correct; RFC 1912 is Informational, not standards track.  It
summarizes general rules of thumb for DNS deployment (most still
relevant); 'violations' of those guidelines are useful first-order
filtering rules.
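
As an illustration of such a first-order rule, here is a small sketch
of an NS-count check.  The function name and the tuple layout are
hypothetical, and the default limit of 7 is only the figure recalled
in the quoted text above; it is not verified here and is left as a
parameter.

```python
def ns_count_exceeds(rrset, limit=7):
    """Return True if a response carries more distinct NS targets than `limit`.

    `rrset` is an iterable of (owner, rtype, rdata) tuples; this layout is
    an assumption for the example, not the format of any particular parser.
    """
    targets = {
        rdata.lower().rstrip(".")
        for _owner, rtype, rdata in rrset
        if rtype == "NS"
    }
    return len(targets) > limit
```

A hit is a reason to look closer, not a verdict, since the guideline
is advisory.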

To Matt and the others in the project:

  I think in summary I can suggest three categories of DNS module
filtering, based on the amount of time they require:

  -- rules that do inline per-packet checks (e.g., queries from DNSBL
     listed hosts; other IP-layer intel); this ideally works at line
     speed.  Some narrow state window, on the order of a small
     multiple of DNS timeouts, should let one do simple checks across DNS
     conversations.  (E.g., a class of SLBs can be found by noting a
     non-response to AAAA?, and an affirmative response to A?.)

     Such rules are IPS candidates, garden-wall capable, etc.

  -- rules that can be done inline, but should also be done on
     subsequent updates of data feeds.  (E.g., RRsets can be checked
     for dom and ip4 BL matches; subsequent updates should be
     *rechecked* so one can alert that a host was likely an early
     infection victim).  This works at line speed; a lower priority
     thread rechecks key records based on updated intelligence feeds,
     DNSBLs, etc.  You might expect DNSBLs will update once every
     15-20 minutes; a suitably small time-based LRU structure may let
     you recheck a window of previously seen DNS traffic.

     Similarly, this class includes rules that would trigger only
     after active traffic probes complete.  (E.g., after witnessing a
     cautionary RRset, the tool queries for MX, SOA, and triggers only
     after analyzing the answer.)  Assuming one finds heuristics that
     work and are not DoS contributors, these rules would need some
     time--multiples of average RTT times--to complete.  So a short
     window of old traffic is useful; something more than flow bits.

  -- rules that require keeping complex state on a DNS conversation,
     and are therefore much slower.  (E.g., delaying analysis of a
     truncated (TC) DNS packet, since we may later observe that the
     affected host is successfully doing a tcp connection, or that
     edns0 is being used, etc.)  Counts of NS and other metrics are
     best done against the larger data.

     Machine learning fits here, potentially, but really might be
     done off a span port or on dedicated hosts.
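
To make the second category concrete, here is a rough sketch of a
time-bounded window of recently seen answers that a low-priority
thread could recheck when a feed updates.  The class and method names
are hypothetical, the 1200 second default simply matches the upper end
of the 15-20 minute DNSBL update interval mentioned above, and the
code assumes timestamps arrive in non-decreasing order.

```python
from collections import OrderedDict


class RecheckWindow:
    """Remember recent answers so they can be rechecked against newer feeds."""

    def __init__(self, max_age=1200.0, max_entries=100000):
        self.max_age = max_age          # seconds to keep an entry (assumed)
        self.max_entries = max_entries  # hard bound on memory use (assumed)
        self.seen = OrderedDict()       # (client, qname, rdata) -> last seen

    def record(self, client, qname, rdata, now):
        key = (client, qname.lower().rstrip("."), rdata)
        self.seen.pop(key, None)        # re-insert so order tracks last-seen time
        self.seen[key] = now
        self._expire(now)

    def _expire(self, now):
        # The oldest entries sit at the front; drop them while too old or too many.
        while self.seen:
            key, ts = next(iter(self.seen.items()))
            if now - ts > self.max_age or len(self.seen) > self.max_entries:
                del self.seen[key]
            else:
                break

    def recheck(self, blocklist, now):
        """Return stored entries whose name or address is in the updated list."""
        self._expire(now)
        return [k for k in self.seen if k[1] in blocklist or k[2] in blocklist]
```

Each hit from `recheck` names a client that received a now-listed
answer before the listing existed, which is the "early infection
victim" alert described in the second category.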

I think everything in this thread, even if not yet a mature
stand-alone filter, can fall into one of three categories.  (Is there
a 4th?)  This might help you design general properties of the dns
detection module.  A user would potentially label which rules fire at
which stage of decoding: per-packet, per-DNS-conversation, or post-hoc
analysis.
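
One way such labeling could look, purely as a sketch: rules register
themselves under a stage, and the decoder runs only the rules for the
stage it has reached.  Every name here (Stage, rule, run_stage, the
event dictionary keys, and the two sample rules) is hypothetical and
not part of any existing module.

```python
from enum import Enum


class Stage(Enum):
    PER_PACKET = "per-packet"
    PER_CONVERSATION = "per-DNS-conversation"
    POST_HOC = "post-hoc"


# Registry of rule functions, keyed by the stage at which they may fire.
RULES = {stage: [] for stage in Stage}


def rule(stage):
    """Decorator that registers a rule function under the given stage."""
    def register(func):
        RULES[stage].append(func)
        return func
    return register


@rule(Stage.PER_PACKET)
def query_from_listed_host(event):
    # Fires when the query source appears in the supplied DNSBL set.
    return event.get("src") in event.get("dnsbl", ())


@rule(Stage.POST_HOC)
def many_ns_records(event):
    # The threshold of 7 is the figure recalled earlier in the thread.
    return event.get("ns_count", 0) > 7


def run_stage(stage, event):
    """Return the names of the rules at this stage that fire for the event."""
    return [func.__name__ for func in RULES[stage] if func(event)]
```

Keeping the stage on the rule rather than in the engine means a rule
can be moved to a cheaper or slower stage by changing one label.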

We're entering an era where DNS vendors are now building IDS-style
security into their offerings; even IPS capabilities.  So perhaps the
third category (certainly more difficult to code) is a lower priority
in your tool.  I almost think there's a need to create a DNS-specific
offering, since traffic aggregation for DNS clusters makes it easy to
gather large amounts of traffic.  The more traditional tcp and DPI
work seems better done elsewhere at the edge.  Perhaps your dns
module, when run exclusively, would fill that role.

-- 
David Dagon              /"\                          "When cryptography
dagon at cc.gatech.edu      \ /  ASCII RIBBON CAMPAIGN    is outlawed, bayl
Ph.D. Student             X     AGAINST HTML MAIL      bhgynjf jvyy unir
Georgia Inst. of Tech.   / \                           cevinpl."