[Oisf-wg-configuration_language] Configuration Structure

Fri Aug 7 01:45:05 UTC 2009

*"This all seems overkill to me, xml and yaml are both designed for big
jobs, and machine sharing of lots of information."*

Not exactly true.  Both XML and YAML were designed to be used in a variety of
situations, one of which happens to be config files.

*"In the case of xml I think we've all seen some of that, not really user
friendly if you're editing files by hand - which in my experience is just a
reality of working with IDS systems day in day out."*

Agree.

*"YAML IMHO is in the end in the same boat as xml, it just seems less
daunting at first glance - check out the user and dev manuals. (I have to
admit I am not a big fan of indentation as a grouping delimiter - it always
seems more prone to mistakes - of course that's kind of subjective, I'll
admit that up front.)"*

I'm not sure which manuals you are referring to.  If you mean the YAML
specification, found here: http://www.yaml.org/spec/1.2/spec.html, then yes,
it is very daunting.  The specification lays out the entire YAML grammar so
that independent YAML parsers behave consistently.  Fortunately for us, many
YAML parsers have already been built (in nearly every mainstream language),
so reading the specification is mostly unnecessary unless you are working on
a YAML library.  The best place to start is a YAML tutorial, followed by the
documentation for the YAML library you want to use (which depends on the
language).

*"In the end, neither of these languages directly supports the primitives
needed, so another layer of code still has to be written to re-parse the
data (usually strings) into primitives."*

I am not sure what exactly you mean by "primitives".  YAML certainly
supports things like ints, floats, and strings.  Each parser is slightly
different here, and this is where reading the documentation for your
language's implementation is key.  In my experience with Python, the PyYAML
library was able to convert values to the correct data type automatically
(even though Python has no static type declarations).

>>> import yaml
>>> doc = """
... a: 5.55
... b: 4
... c: "hello"
... """
>>> y = yaml.safe_load(doc)
>>> print(yaml.dump(y, default_flow_style=True))
{a: 5.55, b: 4, c: hello}

>>> print(y['a'])
5.55
>>> print(y['b'])
4
>>> print(y['c'])
hello
>>> type(y['a'])
<class 'float'>
>>> type(y['b'])
<class 'int'>
>>> type(y['c'])
<class 'str'>

Here we see that the Python YAML library (PyYAML) has automatically loaded
our document (just a string here, but it could come from a file just as
easily).  As the document is parsed, each value is represented using the
correct type.  I have not studied it, but I would think the C implementation
behaves similarly.  YAML also has explicit tags (written with !!, e.g.
!!float) to load a value as a specific type.  PyYAML provides a safe_load
function that only constructs standard types such as numbers, strings,
lists and dictionaries, never arbitrary Python objects (for security
reasons).

*"Seems to me the end goal is a simple language that users can pick up and
use easily (5 minutes tops - even better if they can just see an example
and figure it out), and that lets developers access the data just as
easily, without writing volumes of code to access or re-parse it because
the library used was not designed with the IDS primitives needed for the
job.  My suggestion is to go for the basics needed for IDS, and not fret
over multitudes of other tasks that could be considered data parsing, but
are irrelevant to IDS."*

I agree.  The language should be simple, and YAML allows for precisely
that.  Developers do not have to re-parse the data; in fact they do not even
have to write the parsing library (unless they are using a language where a
parser isn't already available).  In most implementations, structures
retrieved from a config file are directly accessible in the program using
the native type system.  (Notice that in my example PyYAML returned an
ordinary Python dictionary.)

*"On a somewhat more constructive note, I think spelling out the basic data
needs of an IDS may provide some sense of what we need and thereby clarify
the features required.  Obviously we want to be a little bit forward
thinking, but planning too far ahead can be counterproductive as well."*

There is no real need to plan ahead.  If you need a feature in the future,
just add code to handle a new option.  That's the benefit of using something
like YAML: you do not need to edit a parsing library, and you do not need to
change the way users structure their config files.  All you have to do is
handle the new data.

*So here are some features I see as pretty common to IDS problems; please
chime in so we can build out the set or better define what a version 1 set
might be.  Use cases are good as well.*

*Primitives
 - signed/unsigned numbers
     e.g.  max-sessions 256K
 - ports, port ranges and port lists (or tables, take your pick of
   terminology)
     e.g.  http-ports [ 80 8080 8138 9100:9200 ]
 - floating point numbers, ranges, and possibly lists
     e.g.  1.5  1e+8
 - time values are nice to have, from seconds, minutes and hours to dates
     e.g.  0.5s  30m  etc.
 - ip addresses, ip address ranges, ip lists/tables
     e.g.  home-net [ 10.64/16 10.65/16 192.168.1/24 ]
 - identifiers and lists of identifiers
     e.g.  a rule may reference an iplist:  ....  rule ....  home-net
           here home-net appears as an identifier referring to the home net.*

I like these ideas, and they can all be implemented very easily using YAML.
Ints and floats are easy.  Durations such as 0.5s or 30m will have to be
parsed from strings (PyYAML does recognize ISO dates such as 2009-08-07 on
its own), although you could define some custom types in your code and then
have users write the config file according to those types.
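To make the "parse from strings" step concrete, here is a small Python sketch.  It assumes a loader hands back marc's http-ports and home-net lists as lists of plain numbers and strings, and that "lo:hi" means an inclusive port range.  parse_ports and parse_networks are hypothetical helpers, and padding shorthand like 10.64/16 with zero octets is my own reading of marc's notation (IPv4 only).

```python
import ipaddress


def parse_ports(items):
    """Expand a port list such as [80, "8080", "9100:9200"] into a set of ints.

    Hypothetical helper: "lo:hi" is treated as an inclusive range.
    """
    ports = set()
    for item in items:
        text = str(item).strip()
        if ":" in text:
            lo, hi = (int(part) for part in text.split(":", 1))
            if lo > hi:
                raise ValueError(f"bad port range: {text}")
            ports.update(range(lo, hi + 1))
        else:
            ports.add(int(text))
    for port in ports:
        if not 0 <= port <= 65535:
            raise ValueError(f"port out of range: {port}")
    return ports


def parse_networks(items):
    """Turn IPv4 shorthand like "10.64/16" into ipaddress network objects.

    Assumption: missing trailing octets are zero, so "192.168.1/24"
    means 192.168.1.0/24.
    """
    networks = []
    for item in items:
        addr, _, prefix = str(item).partition("/")
        octets = addr.split(".")
        octets += ["0"] * (4 - len(octets))  # pad shorthand to 4 octets
        full = ".".join(octets) + (f"/{prefix}" if prefix else "")
        networks.append(ipaddress.ip_network(full))
    return networks


http_ports = parse_ports([80, 8080, 8138, "9100:9200"])
home_net = parse_networks(["10.64/16", "10.65/16", "192.168.1/24"])
print(len(http_ports))                                  # 104
print(ipaddress.ip_address("10.65.3.7") in home_net[1])  # True
```

Returning a Python set keeps the sketch short; a C implementation would more likely use something like a 65536-entry bitmap for port lookups.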

*Data Structures
 - name/value pairs
     name  <= ascii data name
     value <= any data primitive, array or table of primitives
 - arrays/tables of data values - they provide grouping of data values
 - structures - provide grouping of name/value pairs; for instance,
   grouping tcp processing parameters might go something like:

     tcp
     {
         max-sessions        256K
         reassembly-timeout  900s
         max-memory          1.5G
     }*

Nested name/value pairs and arrays are the bread and butter of YAML.
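Here is a sketch of how marc's tcp block might look once loaded, assuming it is written as a nested YAML mapping (shown in the comment at the top).  The suffixed values would come back as strings, so the units still need converting.  parse_size, parse_duration and the 1024-based size multipliers are assumptions of mine, not anything YAML defines.

```python
# The tcp block written as YAML (assumed layout):
#
#   tcp:
#     max-sessions: 256K
#     reassembly-timeout: 900s
#     max-memory: 1.5G
#
# yaml.safe_load() on that text returns the nested dict below; the
# suffixed values come back as strings, so their units still need parsing.
tcp = {"tcp": {"max-sessions": "256K",
               "reassembly-timeout": "900s",
               "max-memory": "1.5G"}}

SIZE_UNITS = {"": 1, "K": 1024, "M": 1024 ** 2, "G": 1024 ** 3}
TIME_UNITS = {"s": 1, "m": 60, "h": 3600, "d": 86400}


def parse_size(value):
    """Convert 256K / 1.5G style values to an integer count (1024-based)."""
    if isinstance(value, (int, float)):
        return int(value)
    text = value.strip().upper()
    unit = text[-1] if text and text[-1] in SIZE_UNITS else ""
    number = text[:-1] if unit else text
    return int(float(number) * SIZE_UNITS[unit])


def parse_duration(value):
    """Convert 0.5s / 30m / 2h style values to seconds (float)."""
    if isinstance(value, (int, float)):
        return float(value)  # a bare number is taken as seconds
    text = value.strip().lower()
    if text and text[-1] in TIME_UNITS:
        return float(text[:-1]) * TIME_UNITS[text[-1]]
    return float(text)


settings = tcp["tcp"]
max_sessions = parse_size(settings["max-sessions"])        # 262144
timeout = parse_duration(settings["reassembly-timeout"])   # 900.0
max_memory = parse_size(settings["max-memory"])            # 1610612736
```

Whether "256K" sessions should mean 256 * 1024 or 256 * 1000 is a policy decision; the sketch simply picks 1024.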

*Programmer Data Access
 - data primitives should be trivially accessible and require no additional
   processing for most needs.  For instance, retrieving a port table
   referenced in a rule should provide a pointer to the table that can be
   stored with the rule data structure(s) for real-time processing.  It
   should be no different for a protocol processor.  IP addresses should
   work as fluidly, e.g.

       iptable_t *ip;
       ip = get_iptable("home-net", default_table_if_any, required_param_flag);

   Access to the data should provide good support for the common cases a
   developer must deal with, such as a parameter that was not specified (is
   that ok, or should we issue a message and exit?).  It should also support
   default values - this simplifies a developer's configuration loading.*

I am not a C programmer, so I can't really comment on this.
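That said, the idea itself is not C-specific.  Here is a rough Python rendering of marc's get_iptable( name, default, required-flag ) over an already-loaded config dictionary.  The get_param name, the dotted "tcp.max-memory" path syntax, and raising ConfigError (instead of printing a message and exiting) are my own assumptions.

```python
class ConfigError(Exception):
    """Raised when a required configuration parameter is missing."""


def get_param(config, path, default=None, required=False):
    """Return the value at a dotted path such as "tcp.max-memory".

    If any component is missing: raise ConfigError when required is True,
    otherwise return the caller's default.
    """
    node = config
    for key in path.split("."):
        if not isinstance(node, dict) or key not in node:
            if required:
                raise ConfigError(f"required parameter {path!r} is not set")
            return default
        node = node[key]
    return node


# Example config as a YAML loader might return it (values left unparsed).
config = {
    "home-net": ["10.64/16", "10.65/16", "192.168.1/24"],
    "tcp": {"max-sessions": "256K"},
}

# A rule that names home-net stores a reference to the same list object,
# the Python counterpart of keeping a pointer with the rule.
rule = {"action": "alert", "dst": get_param(config, "home-net", required=True)}

# Optional parameter that is absent, so the default is returned.
timeout = get_param(config, "tcp.reassembly-timeout", default="900s")

# Missing required parameter: report it; a daemon might log and exit here.
try:
    get_param(config, "external-net", required=True)
except ConfigError as err:
    print(f"config error: {err}")
```

Keeping the "missing / default / required" decision in one accessor means each protocol module does not have to reimplement it, which is the kind of bug source marc describes below.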

*Whether we build something or use a 3rd party library, wrapping the
common needs of developers around the core data loading and parsing
alleviates lots of potential bugs.  One of the biggest sources of bugs and
consumers of QA testing time in developing with snort has always been the
lack of a standardized parse engine and of a formal grammar.

that's my $.02 worth, sorry it kind of ranted on...

thoughts, more data types, use cases ..... other 3rd party libraries?

marc*

I agree.  The snort configuration is a mess.

Matt C