[Discussion] Features - Designing for Scalability

Martin Holste mcholste at gmail.com
Sat Nov 29 00:56:24 UTC 2008


John,

I think you've hit on some key points here.  I manage about a dozen
sensors, and there is a definite line that gets crossed somewhere around
a half-dozen servers where managing the configuration and health of the
boxes requires some power tools.  I prefer Nagios for general system
health monitoring, so that has never been an issue.  Conversely, it has
been an incredible challenge to manage rules and granular Snort
performance on many boxes with multiple Snort instances.  I've taken the
same route you have, with many home-grown scripts, etc., but every new
Snort release seems to introduce something that makes management
difficult.  In particular, SO rules throw a huge wrench in the works
when you're dealing with mixed architectures.

My strategy thus far has been to keep my sensors completely
self-contained and manage them centrally using a home-grown system
written in Perl to make generic queries and issue commands.  The
databases which Snort writes to live on the sensor itself (performance
really isn't impacted at all by doing this).  The Perl agent receives
queries like "get all alerts in last X minutes," or "add this rule to
the running configuration," or "calculate stats for Mbps and alerts per
second for last 1000 ticks."  The key is that since all the data stays
on the sensor, it scales very well.  The central Perl client can then
parallelize queries so all sensors search at the same time, which is
much faster than if all the alerts were located in one central
database.  Since everything goes through the central management client,
it can easily log all the actions (and even queries) that are run, for
audit/change-management purposes.
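
Just to sketch the shape of that fan-out, here is a minimal example
(Python rather than Perl, purely for illustration; the newline-delimited
JSON protocol, the port number, and the hostnames below are made up, not
what my system actually speaks):

import json
import socket
from concurrent.futures import ThreadPoolExecutor

# Hypothetical sensor hostnames and agent port -- placeholders only.
SENSORS = ["sensor-dmz-1", "sensor-dmz-2", "sensor-lab-1"]
AGENT_PORT = 9999

def query_sensor(host, query):
    """Send one JSON query to a sensor agent and return its decoded reply."""
    with socket.create_connection((host, AGENT_PORT), timeout=30) as sock:
        sock.sendall((json.dumps(query) + "\n").encode())
        reply = sock.makefile().readline()
    return host, json.loads(reply)

def query_all(query):
    """Run the same query against every sensor at the same time."""
    with ThreadPoolExecutor(max_workers=len(SENSORS)) as pool:
        return dict(pool.map(lambda host: query_sensor(host, query), SENSORS))

if __name__ == "__main__":
    results = query_all({"cmd": "get_alerts", "last_minutes": 15})
    for host, reply in results.items():
        print(host, len(reply.get("alerts", [])), "alerts")

Because each sensor answers from its own local database, adding more
sensors adds workers to the search rather than load on a central box.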

For encryption, I run everything through OpenVPN.  This works really well,
especially for debugging, since it is much easier to tcpdump a tunnel
interface than get a hook into a TLS socket.

I'm working on a 2.0 version of this architecture that will be entirely
asynchronous, so that the sensors can push alerts to the central
management hub when predetermined criteria are met.
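
The sensor side of that push model would look roughly like the
following (Python again, for illustration only; the hub endpoint, the
port, and the drop-rate criterion are placeholders, and the real
criteria would be driven by Snort's performance counters):

import asyncio
import json
import time

# Hypothetical hub endpoint -- placeholder values.
HUB_HOST, HUB_PORT = "mgmt-hub.example.com", 9998

def read_drop_rate():
    """Stand-in for reading Snort's packet-drop rate from perfmon stats."""
    return 0.0

async def push_alert(event):
    """Open a connection to the hub and push one JSON-encoded event."""
    reader, writer = await asyncio.open_connection(HUB_HOST, HUB_PORT)
    writer.write((json.dumps(event) + "\n").encode())
    await writer.drain()
    writer.close()
    await writer.wait_closed()

async def watch_drop_rate(threshold=0.05, interval=60):
    """Every `interval` seconds, evaluate one criterion; push if it trips."""
    while True:
        drop_rate = read_drop_rate()
        if drop_rate > threshold:
            await push_alert({"sensor": "sensor-dmz-1",
                              "criterion": "packet_drop_rate",
                              "value": drop_rate,
                              "ts": time.time()})
        await asyncio.sleep(interval)

if __name__ == "__main__":
    asyncio.run(watch_drop_rate())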

For truly large deployments, I think you're right that a parent-child
setup might be necessary.  An exploration of peer-to-peer techniques
might be interesting, but I think that for simplicity, a tree hierarchy
would make the most sense.  That is, a "hub" may have X number of
servers assigned to it, and a hub higher up the tree would be able to
ask those hubs to delegate queries further down the hierarchy.  It would
be interesting to see if there would be performance gains versus
attempting parallelized queries over thousands of sensors from just one
hub.  My thinking is that some of the sensors could also function as
hubs.
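
As a toy model of the tree idea (the topology and class names are
invented for illustration; a real version would make network calls like
the fan-out sketch above), each hub just delegates a query to its
children, whether those are sensors or lower-level hubs, and merges
whatever comes back:

from concurrent.futures import ThreadPoolExecutor

class Sensor:
    """Leaf node: answers queries from its own local alert database."""
    def __init__(self, name):
        self.name = name

    def delegate(self, query):
        # A real sensor agent would run the query against its local DB.
        return ["%s: result for %r" % (self.name, query)]

class Hub:
    """Inner node: fans a query out to its children and merges results."""
    def __init__(self, name, children):
        self.name = name
        self.children = children  # a mix of Sensor and Hub objects

    def delegate(self, query):
        with ThreadPoolExecutor(max_workers=len(self.children)) as pool:
            partials = pool.map(lambda c: c.delegate(query), self.children)
        return [row for part in partials for row in part]

if __name__ == "__main__":
    # Two regional hubs, each fronting a couple of sensors, under one top hub.
    east = Hub("hub-east", [Sensor("sensor-e1"), Sensor("sensor-e2")])
    west = Hub("hub-west", [Sensor("sensor-w1"), Sensor("sensor-w2")])
    top = Hub("hub-top", [east, west])
    for row in top.delegate("alerts last 15m"):
        print(row)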

I think that we need to remember that the vast majority of users do not
deploy more than a few sensors, so we need to guard against spending too
much development time on features that will only serve a small percentage of the
community.  That said, audit, change management, archiving, and other
management features are things that benefit everyone.  As long as we keep it
all modular, users can mix and match to get the features they require.

--Martin

On Tue, Nov 18, 2008 at 12:47 PM, John Pritchard
<john.r.pritchard at gmail.com> wrote:

> Team,
>
> My apologies if I've missed this being covered previously, or if the
> Bro framework already takes some of these things into consideration.
>
> I'd like to suggest that we take very large deployments into
> consideration when designing our solution. The kinds of problems you
> encounter when managing an infrastructure with a handful or even a
> dozen different IDS sensors are very different from those involved in
> efficiently and consistently managing infrastructures with larger
> footprints (e.g. 100+ sensors).
>
> A couple of suggestions to help our design address these potential
> deployment and management scenarios:
>
> 1) Centralized sensor and policy management platform (or framework)
> --> Such a solution might consist of a single centralized server or
> multiple servers.
> --> There might be a parent -> child relationship among configuration
> servers to segregate business units, or simply replication among
> servers for disaster recovery / business continuity purposes
> --> an efficient, repeatable, and auditable methodology for making
> changes to both individual sensors and pre-defined groups of sensors
> (e.g. dmz sensors, dns sensors, development lab sensors, etc...)
> --> My experience to date has been performing these kinds of tasks with
> home-grown scripts, ssh, scp, audit logs, etc... However, it would be
> nice to find something a little more mature for this project.
>
> I have not used it, but there is a project called "Puppet" that looks
> like it might be a good candidate for trying to develop a framework
> along these lines:
> http://reductivelabs.com/trac/puppet/wiki/PuppetIntroduction
>
>
> 2) Centralized sensor and policy monitoring platform (or framework)
> --> This is similar to the "management framework" concept, but the
> focus is more on device health monitoring... and possibly policy
> integrity monitoring...
> --> For this piece, I'm thinking of something that provides functions
> such as looking at memory utilization, cpu utilization, hard-drive
> space, network interface stats... and other "bits" such as dates and
> checksums showing when critical configuration files have changed
> (e.g. detection policies, tuning policies, variable definitions, etc)...
>
> There are a number of open-source enterprise monitoring utilities out
> there. Here's one I've been playing with recently:
> http://www.zabbix.com/
>
>
> 3) Distributed Data Repositories
> I know we briefly touched on the database stuff when talking about
> schema design. I just wanted to add a plug for a couple of things
> here:
> --> encrypted communication channels (sensor -> DB or sensor -> sensor)
> --> ability to simultaneously log to 2 or more data repositories
>
> I strongly agree with the concept of designing modular solutions, so
> that these kinds of features can be "bolted on" if required.
>
> Look forward to everyone's thoughts on how we can most effectively
> tackle problems of scale for large deployments.
>
> Cheers, John