<i>"This all seems overkill to me, xml and yaml are both designed for big<br>
jobs, and machine sharing of lots of information."</i><br><br>Not exactly true. Both XML and YAML were designed to be used in a variety of situations, one of which happens to be config files. <br><i><br>"In the case of xml I think we've all seen some of that, not really user friendly if your<br>
editing files by hand - which in my experience is just a reality of<br>
working with IDS systems day in day out." </i><br><br>Agree.<br><br><i>"YAML IMHO is in the end in the same boat as xml, it just seems less daunting at first glance - checkout the user and dev manuals. (I have to admit I am not a big fan of indentation as a grouping delimiter - always seems more prone to mistakes - of course thats kind of subjective, I'll admit that up front)." </i><br>
<br>I'm not sure which manuals you are referring to. If you are talking about the YAML specification, found here: <a href="http://www.yaml.org/spec/1.2/spec.html">http://www.yaml.org/spec/1.2/spec.html</a>, then yes, it is very daunting. The specification document lays out the entire YAML grammar so that YAML parsers behave consistently. Fortunately for us, many YAML parsers have already been built (in nearly every mainstream language). Thus, reading the specification is mostly unnecessary unless you are working on a YAML library. The best place to start is to read tutorials about YAML, and then read the documentation for the YAML library that you want to use (which depends on the language). <br>
<i><br>"In the end, both of these languages do not directly support the primitives needed, so another layer of code still has to be written to re-parse the data usually strings into primitives."</i><br><br>I am not sure exactly what you mean by "primitives". YAML certainly supports things like ints, floats, and strings. Each parser is slightly different here, and this is where reading the documentation for your language's implementation is key. In my experience with Python, the PyYAML library was able to automatically convert values to the correct data type (even though Python doesn't really have a traditional type system).<br>
<br>>>> import yaml<br>>>> doc = """<br>... a: 5.55<br>... b: 4<br>... c: "hello"<br>... """<br>>>> y = yaml.load(doc)<br>>>> print yaml.dump(y)<br>
{a: 5.5499999999999998, b: 4, c: hello}<br><br>>>> print y['a']<br>5.55<br>>>> print y['b']<br>4<br>>>> print y['c']<br>hello<br>>>> type(y['a'])<br><type 'float'><br>
>>> type(y['b'])<br><type 'int'><br>>>> type(y['c'])<br><type 'str'><br>>>><br><br>Here we see that the Python YAML library (PyYAML) has automatically loaded our document (just a string here, but it could just as easily come from a file). As the document is parsed, the data is represented using the correct types. I have not studied it, but I would think the C implementation would do the same thing. YAML also has tags (written with !!) that load a value as a specific type. The PyYAML library provides a safe_load function that refuses to construct anything but the standard YAML types (for security reasons). <br>
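<br>To make the !! tags a little more concrete, here is a minimal sketch. It assumes Python 3 and a recent PyYAML, and the document keys are made up for illustration. An explicit tag overrides the type the parser would otherwise pick:<br>
<br>
```python
import yaml  # PyYAML, a third-party package

# Keys a, b and c are invented for this example.
doc = """
a: !!str 5.55
b: !!float 4
c: 7
"""

# safe_load only constructs the standard YAML types.
cfg = yaml.safe_load(doc)

for key in sorted(cfg):
    # Show each value together with the Python type it was loaded as.
    print(key, repr(cfg[key]), type(cfg[key]).__name__)
```
<br>
Here a is forced to a string and b to a float, while c is left for the parser to resolve on its own (an int).<br>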
<i><br>
"Seems to me the end goal is a simple language that users can pick up and use easily (5 minutes tops- even better if they can just see an example<br>
and figure it out), and is equally easy for developers to access the data without writing volumes of code to access or re-parse the data because the library used was not designed with IDS primitives needed for the job. My suggestion is to go for the basics needed for IDS, and not fret over multitudes of other tasks that could considered data parsing, but are irrelevant to ids."</i><br>
<br>I agree. The language should be simple. YAML allows for precisely that. Developers do not have to re-parse the data; in fact, they do not even have to write the parsing library (unless they are using a language for which a parser isn't already available). In most implementations, structures are retrieved from a config file and are directly accessible in the program using the native type system. (Notice that in my code PyYAML was able to use Python dictionaries.) <br>
<br>
<i>"On a somewhat more constructive note I think spelling out the the basic data needs of an IDS may provide some sense of what we need and thereby<br>
clarify the features required. Obviously we want to be a little bit forward thinking, but planning too far ahead can be counter productive<br>
as well."</i><br><br>No need to really plan ahead. If you need a feature in the future, just add code to handle a new option. That's the benefit of using something like YAML. You do not need to edit a parsing library, and you do not need to change the way that users structure their config files. All you have to do is handle the new data. <br>
<br>
<i>So heres some features I see pretty common to the IDS problems, please chime in so we can build the set out or better define what a version 1 set might be. Use cases are good as well.</i><br>
<br>
<i>Primitives<br>
- signed/unsigned numbers<br>
i.e. max-sessions 256K<br>
- ports, port ranges and port lists (or tables take your pick of<br>
terminology)<br>
i.e http-ports [ 80 8080 8138 9100:9200 ]<br>
- floating point numbers, ranges, and possibly lists<br>
i.e. 1.5 1e+8<br>
- time values are nice to have from seconds, minutes hours, to dates.<br>
i.e. 0.5s 30m etc..<br>
- ip addresses, ip address ranges, ip lists/tables<br>
i.e. home-net [ 10.64/16 10.65/16 192.168.1/24 ]<br>
- identifiers and list of identifiers<br>
i.e. a rule may reference an iplist .... rule .... home-net<br>
-here home-net appears as an identifier refering to the home net.</i><br><br>I like these ideas. They can all be implemented very easily using YAML. Ints and floats are easy. Dates may have to be parsed as strings, although you could define some custom objects in your code and then have users write the config file according to those custom objects. <br>
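<br>For the values that would arrive as strings, the conversion layer can stay small. The following is only a rough sketch using the Python standard library. The helper names are hypothetical, the K/M/G multipliers are assumed to be powers of 1024, and networks are written out in full (10.64.0.0/16 rather than 10.64/16) because Python's ipaddress module does not accept the abbreviated form:<br>
<br>
```python
import ipaddress
import re

# Assumption: K/M/G are powers of 1024; the original post does not say.
SIZE_SUFFIXES = {"K": 1024, "M": 1024 ** 2, "G": 1024 ** 3}
TIME_SUFFIXES = {"s": 1, "m": 60, "h": 3600}


def parse_size(text):
    """Turn '256K' or '1.5G' into a plain integer."""
    match = re.fullmatch(r"(\d+(?:\.\d+)?)([KMG]?)", text)
    if match is None:
        raise ValueError("bad size: %r" % text)
    number, suffix = match.groups()
    return int(float(number) * SIZE_SUFFIXES.get(suffix, 1))


def parse_duration(text):
    """Turn '0.5s', '30m' or '2h' into a number of seconds."""
    match = re.fullmatch(r"(\d+(?:\.\d+)?)([smh])", text)
    if match is None:
        raise ValueError("bad duration: %r" % text)
    number, suffix = match.groups()
    return float(number) * TIME_SUFFIXES[suffix]


def parse_ports(items):
    """Expand [80, '9100:9200'] into a set of port numbers."""
    ports = set()
    for item in items:
        # A single port has no ':' so high is empty and falls back to low.
        low, _, high = str(item).partition(":")
        ports.update(range(int(low), int(high or low) + 1))
    return ports


def parse_networks(items):
    """Turn CIDR strings into ipaddress network objects."""
    return [ipaddress.ip_network(item) for item in items]
```
<br>
With these helpers, parse_size("256K") gives 262144 and parse_duration("30m") gives 1800.0, and the port list from the example above expands to a set that can be tested with a simple "in".<br>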
<br>
<i>Data Structures<br>
name value pairs<br>
name <= ascii data name<br>
value <= any data primitive array or table of primtivies<br>
arrays/tables of data values - the provide grouping of data values<br>
structures - provide grouping of name/value pairs, for instance<br>
grouping tcp processing parameters might go something like:<br>
tcp<br>
{<br>
max-sessions 256K<br>
reassembly-timeout 900s<br>
max-memory 1.5G<br>
}</i><br><br>Nested Name/Value pairs and arrays. The bread and butter of YAML. <br><br>
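As a sketch of how the tcp block above might look in YAML (the exact layout is my assumption, and this uses Python 3 with PyYAML's safe_load), the structure comes back as nested dictionaries and lists:<br>
<br>
```python
import yaml  # PyYAML, a third-party package

# One possible YAML spelling of the tcp block and port list quoted above.
doc = """
tcp:
  max-sessions: 256K
  reassembly-timeout: 900s
  max-memory: 1.5G
http-ports: [80, 8080, 8138, "9100:9200"]
"""

config = yaml.safe_load(doc)

# The grouping becomes a plain dictionary nested inside the top-level one.
tcp = config["tcp"]
print(tcp["max-sessions"])
print(config["http-ports"])
```
<br>
Note that values carrying a unit suffix, such as 256K or 900s, are loaded as strings, so the application would still convert those itself. Plain numbers such as 80 arrive as ints.<br><br>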
<i><br>
Programmer Data Access<br>
- data primitives should be trivially accessible and require no<br>
additional processing for most needs. For instance retrieving a port<br>
table referenced<br>
in a rule should provide a pointer to the table, that can be stored<br>
with the rule data structure(s) for real time processing. It should be<br>
no different<br>
for a protocol processor. IP addresses should work as fluidly.<br>
i.e.<br>
iptable_t * ip<br>
ip = get_iptable( "home-net", default-table-if-any,<br>
required-param-flag );<br><br>
The access to the data should provide good support for the common<br>
use cases a developer must deal with such as a parameter was not<br>
specified - is that ok, or should we issue a message and exit, it<br>
should also support default values - this simplifies a developers<br>
configuration loading.</i>
<br><br>I am not a C programmer so I can't really comment on this.<br>
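<br>In Python terms, though, the default and required-parameter handling described above could look roughly like this. It is only a sketch: get_param and MissingParameter are hypothetical names, and the config is assumed to be the dictionary a YAML parser returns:<br>
<br>
```python
class MissingParameter(Exception):
    """Raised when a required configuration parameter is absent."""


def get_param(config, name, default=None, required=False):
    """Look up name in a dict-like config, honouring default and required."""
    if name in config:
        return config[name]
    if required:
        # The caller decides whether to report the error and exit.
        raise MissingParameter("required parameter %r is missing" % name)
    return default


# Stand-in for a loaded config file.
config = {"home-net": ["10.64.0.0/16", "10.65.0.0/16"]}

home_net = get_param(config, "home-net", required=True)
timeout = get_param(config, "reassembly-timeout", default="900s")
```
<br>
A missing optional parameter quietly falls back to its default, while a missing required one raises an exception that the application can turn into a message and an exit.<br>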
<br>
<i>Whether we build something or use a 3rd party library, wrapping the<br>
common needs of developers up around the core data loading and parsing<br>
alleviates lots of potential bugs. One the biggest sources of bugs and<br>
consumers of qa testing time in developing with snort has always been<br>
due to a lack of standardized parse engine and a lack of a formal grammar.<br><br>
thats my $.02 worth, sorry it kind of ranted on...<br><br>
thoughts, more data types, use cases ..... other 3rd party libraries ?<br><br>
marc<br><br></i>I agree. The Snort configuration is a mess. <br><br>Matt C<br>