[Oisf-wg-configuration_language] Configuration Structure

Wed Aug 5 18:09:44 UTC 2009

This all seems like overkill to me. XML and YAML are both designed for
big jobs and for machine sharing of lots of information.  In the case of
XML I think we've all seen some of that: it is not really user friendly
if you're editing files by hand, which in my experience is just a reality
of working with IDS systems day in, day out.  YAML, IMHO, ends up in the
same boat as XML; it just seems less daunting at first glance (check
out the user and dev manuals).  I have to admit I am not a big fan of
indentation as a grouping delimiter. It always seems more prone to
mistakes, though of course that's kind of subjective, I'll admit that up
front.  In the end, neither language directly supports the primitives
needed, so another layer of code still has to be written to re-parse
the data, usually strings, into primitives.

Seems to me the end goal is a simple language that users can pick up and
use easily (5 minutes tops, even better if they can just see an example
and figure it out), and that is equally easy for developers to read data
from, without writing volumes of code to access or re-parse the data
because the library used was not designed with the primitives an IDS
needs.  My suggestion is to go for the basics needed for IDS, and not
fret over the multitude of other tasks that could be considered data
parsing but are irrelevant to IDS.

On a somewhat more constructive note, I think spelling out the basic
data needs of an IDS may provide some sense of what we need and thereby
clarify the features required.  Obviously we want to be a little bit
forward thinking, but planning too far ahead can be counterproductive
as well.

So here are some features I see as pretty common to IDS problems. Please
chime in so we can build the set out or better define what a version 1
set might be.  Use cases are good as well.

Primitives
 - signed/unsigned numbers
     e.g.  max-sessions 256K
 - ports, port ranges and port lists (or tables, take your pick of
   terminology)
     e.g.  http-ports [ 80 8080 8138 9100:9200 ]
 - floating point numbers, ranges, and possibly lists
     e.g.  1.5 1e+8
 - time values are nice to have, from seconds, minutes and hours to dates
     e.g.  0.5s    30m   etc.
 - ip addresses, ip address ranges, ip lists/tables
     e.g.  home-net [ 10.64/16 10.65/16 192.168.1/24 ]
 - identifiers and lists of identifiers
     e.g.  a rule may reference an iplist       ....   rule ....  home-net
           - here home-net appears as an identifier referring to the home net.
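
To make the port list primitive concrete, here is a minimal sketch in C of
turning the body of a list like [ 80 8080 8138 9100:9200 ] into a lookup
table.  The names port_table_t, port_table_parse and port_table_has are
made up for illustration, and the bitmap layout is just one possible
choice; nothing here comes from an existing library.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical port table: one bit per port, 65536 bits in total. */
typedef struct {
    uint8_t bits[65536 / 8];
} port_table_t;

/* Returns 1 if the port is in the table, 0 otherwise. */
int port_table_has(const port_table_t *t, unsigned port)
{
    return (t->bits[port >> 3] >> (port & 7)) & 1;
}

/* Parse one decimal port at *s and advance *s past it. 0 on success. */
static int parse_port(const char **s, unsigned *out)
{
    char *end;
    unsigned long v;

    if (**s < '0' || **s > '9')
        return -1;
    errno = 0;
    v = strtoul(*s, &end, 10);
    if (errno != 0 || v > 65535)
        return -1;
    *out = (unsigned)v;
    *s = end;
    return 0;
}

/* Parse a list body such as "80 8080 8138 9100:9200" (the brackets are
 * assumed to be stripped already). Returns 0 on success, -1 on a syntax
 * or range error; on error the table contents are unspecified. */
int port_table_parse(port_table_t *t, const char *text)
{
    const char *s = text;

    assert(t != NULL && text != NULL);
    memset(t, 0, sizeof(*t));

    for (;;) {
        unsigned lo, hi, p;

        while (*s == ' ' || *s == '\t')
            s++;
        if (*s == '\0')
            return 0;
        if (parse_port(&s, &lo) != 0)
            return -1;
        hi = lo;
        if (*s == ':') {                 /* lo:hi range */
            s++;
            if (parse_port(&s, &hi) != 0 || hi < lo)
                return -1;
        }
        /* an entry must end at whitespace or end of input */
        if (*s != '\0' && *s != ' ' && *s != '\t')
            return -1;
        for (p = lo; p <= hi; p++)
            t->bits[p >> 3] |= (uint8_t)(1u << (p & 7));
    }
}
```

With a table like this, a rule or protocol processor can keep a pointer to
the parsed table and test membership with a single bit lookup at run time.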

Data Structures
  name/value pairs
     name  <= ascii data name
     value <= any data primitive, or an array or table of primitives
  arrays/tables of data values - these provide grouping of data values
  structures - provide grouping of name/value pairs; for instance,
  grouping tcp processing parameters might go something like:
     tcp
     {
          max-sessions 256K
          reassembly-timeout   900s
          max-memory 1.5G  
     }
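
As an illustration of what "directly supporting the primitives" could mean
for values like 256K, 1.5G and 900s, here is a minimal sketch in C.  The
function names parse_size and parse_seconds are hypothetical.  It assumes
K, M and G mean powers of 1024 and that a bare time value is in seconds;
neither is settled anywhere in this thread.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/* Parse a count or size such as "256K" or "1.5G" into a plain integer.
 * Assumption: K, M and G are powers of 1024. Returns 0 on success. */
int parse_size(const char *text, uint64_t *out)
{
    char *end;
    double v, scale = 1.0;

    assert(text != NULL && out != NULL);
    errno = 0;
    v = strtod(text, &end);
    /* !(v >= 0.0) rejects negative values and NaN in one test */
    if (end == text || errno != 0 || !(v >= 0.0))
        return -1;
    switch (*end) {
    case 'K': scale = 1024.0;                   end++; break;
    case 'M': scale = 1024.0 * 1024.0;          end++; break;
    case 'G': scale = 1024.0 * 1024.0 * 1024.0; end++; break;
    default:  break;
    }
    if (*end != '\0')
        return -1;
    v *= scale;
    if (v >= 18446744073709551616.0)  /* 2^64: does not fit in uint64_t */
        return -1;
    *out = (uint64_t)v;               /* fractional part is truncated */
    return 0;
}

/* Parse a time value such as "0.5s", "30m" or "2h" into seconds.
 * Assumption: a value with no unit is already in seconds. */
int parse_seconds(const char *text, double *out)
{
    char *end;
    double v, scale = 1.0;

    assert(text != NULL && out != NULL);
    errno = 0;
    v = strtod(text, &end);
    if (end == text || errno != 0 || !(v >= 0.0))
        return -1;
    switch (*end) {
    case 's': scale = 1.0;    end++; break;
    case 'm': scale = 60.0;   end++; break;
    case 'h': scale = 3600.0; end++; break;
    default:  break;
    }
    if (*end != '\0')
        return -1;
    *out = v * scale;
    return 0;
}
```

The point is that the tcp block above could hand back 262144, 900.0 and
1610612736 directly, so no caller ever sees the strings.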

Programmer Data Access
  - data primitives should be trivially accessible and require no
    additional processing for most needs.  For instance, retrieving a port
    table referenced in a rule should provide a pointer to the table that
    can be stored with the rule data structure(s) for real time
    processing.  It should be no different for a protocol processor.  IP
    addresses should work as fluidly.
    e.g.
      iptable_t *ip;
      ip = get_iptable("home-net", default_table_if_any,
                       required_param_flag);

   Access to the data should give good support for the common cases a
   developer must deal with, such as a parameter that was not specified:
   is that ok, or should we issue a message and exit?  It should also
   support default values, which simplifies a developer's configuration
   loading.
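
Here is a minimal sketch in C of an accessor with that behaviour: found,
fall back to a default, or complain and exit.  Everything in it is
hypothetical (cfg_t, cfg_entry_t, cfg_get_iptable, the CFG_* flags), and
iptable_t is an empty stand-in; a real one would hold the parsed
addresses.  It takes an extra cfg argument compared with the get_iptable
call above just to avoid a global.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Stand-in for a parsed IP table. */
typedef struct { int placeholder; } iptable_t;

/* Hypothetical loaded configuration: a flat array of named tables. */
typedef struct {
    const char *name;
    iptable_t  *table;
} cfg_entry_t;

typedef struct {
    const cfg_entry_t *entries;
    size_t             count;
} cfg_t;

enum { CFG_OPTIONAL = 0, CFG_REQUIRED = 1 };

/* Look up a named IP table.
 *  - found:                      return the configured table
 *  - missing, default given:     return the default
 *  - missing, required, no dflt: print a message and exit
 *  - missing, optional, no dflt: return NULL */
iptable_t *cfg_get_iptable(const cfg_t *cfg, const char *name,
                           iptable_t *dflt, int required)
{
    size_t i;

    assert(cfg != NULL && name != NULL);
    for (i = 0; i < cfg->count; i++) {
        if (strcmp(cfg->entries[i].name, name) == 0)
            return cfg->entries[i].table;
    }
    if (dflt != NULL)
        return dflt;
    if (required) {
        fprintf(stderr, "config: required parameter '%s' is missing\n",
                name);
        exit(EXIT_FAILURE);
    }
    return NULL;
}
```

Letting a supplied default win over the required flag is a design choice
made for this sketch; the other order would be just as defensible.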

Whether we build something or use a 3rd party library, wrapping the
common needs of developers up around the core data loading and parsing
alleviates lots of potential bugs.  One of the biggest sources of bugs
and consumers of QA testing time in developing with snort has always
been the lack of a standardized parse engine and of a formal grammar.

That's my $.02 worth, sorry it kind of ranted on...

Thoughts, more data types, use cases ..... other 3rd party libraries?

marc

Matt Jonkman wrote:
> Have you ever hand-typed out a long xml doc? :) I have, it's not pretty.
> In fact it just plain sucks. :)
>
> Great machine language, but not a good human usable language.
>
> YAML looks good. Lots of support, human readable, not a lot of typing
> overhead. Relatively flexible structure.
>
> Matt
>
>
> Nick Rogness wrote:
>   
>> On Thu, Jul 30, 2009 at 8:39 AM, Victor Julien <victor at inliniac.net> wrote:
>>     
>>>> In other words, no configuration language exists in our codebase at this
>>>> point.
>>>>
>>>> Since our schedule is pretty tight, this probably means we should go for
>>>> existing code for this part of the engine. So suggestions for libraries
>>>> are very much appreciated.
>>>>         
>>> [SNIP]
>>> No one going to speak up about XML?
>>>       
>> This is a no brainer, config should be in XML. Maybe the better
>> question should be why NOT use XML for the config syntax?
>>
>> As I mentioned on the rules list, using XML gives the engine the
>> flexibility to make parsing, integration, and versioning a breeze.
>> Using a well known library like libXML2 makes parsing in C fairly
>> straight forward.  Additionally, every other language worth a mention
>> already have XML libraries so building GUIs and integrating with other
>> party's software could be straight forward.
>>
>> Nick Rogness