[Oisf-users] Massive kernel drops with HTTP traffic

Wed Sep 26 07:08:11 UTC 2018

On Fri, Sep 21, 2018 at 11:07 AM Konstantin Klinger
<konstantin.klinger at dcso.de> wrote:
>
> Hi Peter,
>
> thanks for doing those test runs, they are helping a lot. We were
> thinking about the exact same solution: reducing the libmagic ruleset
> and running Suricata with a custom libmagic instead of the Debian
> built-in one. This will improve the situation and reduce the kernel drops.
> We have also done some tests with other filestore rules that do not
> use the keyword "filemagic", but it is difficult to cover all cases.
>
> We haven't had a lot of time in the past weeks, but this is our current state:
>
> 1. Massive kernel drops (as described in the past mails) with 4.1.0-dev
> (git 2018-06-20):
> -> disabling force-magic or using a customized libmagic reduces kernel drops
> -> filestore v2 has better performance than filestore v1
> -> SEPTun tuning helps to reduce kernel drops and to decrease CPU
> usage (almost no drops on the NIC side)
>
> => After all that, we see no kernel drops, fewer kernel drops, or high
> kernel drops depending on the activated rulesets, the traffic situation
> and the time/day. I have played a lot with activating/deactivating
> different rules on different sensors and it seems that the reason for
> high kernel drops is different for each sensor. You have to look at each
> sensor separately, which costs me a lot of time. For the moment I have
> not been able to figure out what exactly triggers the problem, because
> the behavior is somewhat non-deterministic.

Yes - the Suricata engine/HW and ruleset could be the same but the
traffic might differ.

>
> 2. Memcap problem
> -> We are now also seeing our old memcap problem (memcaps are reached
> without recovering) with version 4.1.0-dev, but not as often as
> before. I have also played around with activating/deactivating different
> rulesets and I can now manually trigger the problem on some sensors by
> activating specific rulesets. So it seems that the reason is also a
> mixture of the traffic situation, the activated rules and the day/time.
>

Do you restart Suricata or only do a rule reload?
From what I remember, you also don't have access to the port mirrors and
can't check if everything is alright, correct?

> I will do further research and hope to have better results by
> SuriCon.
>
> Cheers,
>
> Konstantin
>
> On 07.09.2018 11:13, Peter Manev wrote:
> > On Tue, Aug 28, 2018 at 9:55 AM Peter Manev <petermanev at gmail.com> wrote:
> >>
> >> On Tue, Aug 21, 2018 at 8:20 AM Konstantin Klinger
> >> <konstantin.klinger at dcso.de> wrote:
> >>>
> >>> Good morning all,
> >>>
> >>> I've made multiple tests with different settings and you can find the
> >>> results (drops in percentage) for each run in the attached table. We
> >>> will rewrite our filestore rules without the "filemagic" keyword and try
> >>> them in production. Further I will open a bug report.
> >>>
> >>
> >> Looking at the summary - it seems the biggest impact (responsible for
> >> 14-37% drops just by having it on, even with no rules) comes from the
> >> following combination in the config with filestore v1 -
> >>
> >> filestore (v1) = on
> >> force-magic = on
> >>
> >> filestore v2 seems to behave better, but for completeness of the
> >> tests I am curious how it would behave with rules loaded and
> >> filestore v2 off?
> >>
> >> Thanks for testing!
> >>
> >>
> >
> > Some feedback from some pcap runs.
> > What threw me off in the config (which I didn't notice or pay
> > attention to before) was that we had filestore v1 in use
> > with "force-magic = on" - this is quite a perf hitter.
> >
> > Furthermore - I made some tests for the purpose of explanation and
> > visualization with the latest git Suricata.
> > In my test I had a 150GB pcap with "goodies" in it.
> >
> > I did three pcap read runs (each multiple times for verification). The
> > first one used the regular default libmagic in Ubuntu LTS. The second
> > used a custom libmagic (using only the DBs for "linux", "msdos",
> > "msooxml", "pdf"). The third one used a minimal custom libmagic with
> > only "msdos".
> >
> > Rules were the following:
> > alert http any any -> any any (msg:"Windows executable- 111";
> > flow:established,to_client; file_data; content:"MZ"; within:2;
> > byte_jump:4,58,relative,little; content:"PE|00 00|"; distance:-64;
> > within:4;  sid:111;)
> > alert http any any -> any any (msg:"FILE magic -- windows - 222 ";
> > flow:established,to_client; filemagic:"PE32 executable (GUI) Intel
> > 80386"; sid:222;)
> > alert http any any -> any any (msg:"FILE magic -- windows - 333 ";
> > flow:established,to_client; filemagic:"executable"; sid:333;)
> >
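For readers less familiar with the content-based rule (sid 111): a rough Python sketch of the check it appears to express. It assumes the standard PE layout, where the 4-byte little-endian field at offset 0x3C (2 bytes of "MZ" plus the 58 skipped by byte_jump) holds the offset of the "PE\0\0" signature. The function and helper names are hypothetical, and this is only an approximation of the rule logic, not how Suricata evaluates it internally.

```python
import struct


def looks_like_pe(data: bytes) -> bool:
    """Approximate the sid 111 logic on a reassembled file body."""
    # content:"MZ"; within:2;  -> the body must start with the DOS magic.
    if len(data) < 64 or data[:2] != b"MZ":
        return False
    # byte_jump:4,58,relative,little -> read the 4-byte little-endian value
    # stored 58 bytes after "MZ" (offset 0x3C) and jump by it; the following
    # distance:-64 then lands on that value as an absolute offset.
    (pe_offset,) = struct.unpack_from("<I", data, 0x3C)
    # content:"PE|00 00|"; within:4; -> PE signature expected at that offset.
    return data[pe_offset:pe_offset + 4] == b"PE\x00\x00"


def make_sample(pe_offset: int = 0x80) -> bytes:
    """Build a tiny synthetic header for illustration (hypothetical helper)."""
    header = b"MZ" + b"\x00" * 58 + struct.pack("<I", pe_offset)
    return header + b"\x00" * (pe_offset - len(header)) + b"PE\x00\x00"


if __name__ == "__main__":
    print(looks_like_pe(make_sample()))
    print(looks_like_pe(b"MZ" + b"\x00" * 200))
```

The point of the comparison is that this kind of check only touches a few fixed bytes, whereas a filemagic match needs libmagic to classify the file first.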
> >
> > Results were:
> >
> > cat log-default-libmagic/perf.txt
> >   --------------------------------------------------------------------------
> >   Date: 9/7/2018 -- 02:41:30. Sorted by: max ticks.
> >   --------------------------------------------------------------------------
> >   Num  Rule  Gid  Rev  Ticks           %      Checks     Matches  Max Ticks  Avg Ticks  Avg Match  Avg No Match
> >   ---  ----  ---  ---  --------------  -----  ---------  -------  ---------  ---------  ---------  ------------
> >   1    333   1    0    22327658225264  98.45  118566538  47342    415093788  188313.32  791959.95  188072.19
> >   2    222   1    0    351732357712    1.55   118566538  30783    102916570  2966.54    3597.55    2966.38
> >   3    111   1    0    702230102       0.00   61477      38142    4501834    11422.65   16233.59   3558.96
> >
> > cat log-custom-libmagic/perf.txt
> >   --------------------------------------------------------------------------
> >   Date: 9/7/2018 -- 02:54:47. Sorted by: max ticks.
> >   --------------------------------------------------------------------------
> >   Num  Rule  Gid  Rev  Ticks           %      Checks     Matches  Max Ticks  Avg Ticks  Avg Match  Avg No Match
> >   ---  ----  ---  ---  --------------  -----  ---------  -------  ---------  ---------  ---------  ------------
> >   1    333   1    0    881204182756    92.60  118770249  46400    46191030   7419.40    253220.56  7323.34
> >   2    222   1    0    69774674892     7.33   118770249  30836    30659992   587.48     2627.87    586.95
> >   3    111   1    0    665754074       0.07   61419      38208    4832186    10839.55   15194.36   3671.02
> >
> > cat log-minimal-libmagic/perf.txt
> >   --------------------------------------------------------------------------
> >   Date: 9/7/2018 -- 03:07:58. Sorted by: max ticks.
> >   --------------------------------------------------------------------------
> >   Num  Rule  Gid  Rev  Ticks           %      Checks     Matches  Max Ticks  Avg Ticks  Avg Match  Avg No Match
> >   ---  ----  ---  ---  --------------  -----  ---------  -------  ---------  ---------  ---------  ------------
> >   1    333   1    0    763688478252    91.78  118768933  46408    47418122   6430.04    230035.02  6342.63
> >   2    222   1    0    67731921076     8.14   118768933  30844    34765770   570.28     2698.15    569.73
> >   3    111   1    0    643933350       0.08   61406      38216    3894478    10486.49   14792.65   3390.15
> >
> >
> > The worst case is clearly the default OS magic, which is rather perf
> > intensive, followed by custom magic with extended magic name
> > matching; sid 111 is the least perf hitter in this test.
> >
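To put numbers on the three runs, here is a small Python sketch that compares the total ticks per rule across the libmagic variants. The tick values are copied from the perf.txt outputs quoted in this mail; the dictionary layout and function names are only illustrative.

```python
# Total ticks per sid, copied from the three perf.txt outputs in this mail.
TICKS = {
    "default": {333: 22327658225264, 222: 351732357712, 111: 702230102},
    "custom": {333: 881204182756, 222: 69774674892, 111: 665754074},
    "minimal": {333: 763688478252, 222: 67731921076, 111: 643933350},
}


def tick_ratio(sid: int, baseline: str = "default", other: str = "custom") -> float:
    """How many times more ticks the baseline run spent on a rule."""
    return TICKS[baseline][sid] / TICKS[other][sid]


def report() -> list:
    """One summary line per rule, relative to the default libmagic run."""
    lines = []
    for sid in (333, 222, 111):
        lines.append(
            f"sid {sid}: default/custom = {tick_ratio(sid):.1f}x, "
            f"default/minimal = {tick_ratio(sid, other='minimal'):.1f}x"
        )
    return lines


if __name__ == "__main__":
    print("\n".join(report()))
```

In this data the two filemagic rules (sids 333 and 222) change a lot between libmagic builds, while the content-only rule (sid 111) stays roughly the same, which is what one would expect if the extra cost sits in libmagic itself.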
> > Tests with relevant traffic and different rule variations are
> > essential, of course, but in the test that I did for this specific
> > case, using
> > filemagic:"PE32 executable (GUI) Intel 80386";
> > instead of
> > filemagic:"executable";
> > with the custom magic DB scored much better in terms of filemagic cost.
> >
> > Sid 111 is basically a copy of ET's sid 2018959.
> > Bottom line - the default OS libmagic has such a perf hit
> > that the processing time of the pcap I was testing with went down from
> > 22 min to 12 min when using the custom-compiled magic.
> >
> > We actually teach/train/discuss all that in the Suricata Advanced
> > Deployment and Engineering class as well (SuriCon is our next one).
> > Would be happy to get feedback and discussion going in person as well
> > at any time at SuriCon in Vancouver this year :)
> >
> > Thought it was good to share anyway...
> >
>
> --
> Konstantin Klinger
> Security Content Engineer
> Threat Detection & Hunting (TDH)
>
> +49 160 95476260
> konstantin.klinger at dcso.de
>
> dcso.de
> blog.dcso.de
>
> PGP: 180D C5B3 3C68 5C9A FB58 6F33 400E 5A35 3307 8D46
>
> DCSO Deutsche Cyber-Sicherheitsorganisation GmbH • EUREF-Campus 22 •
> 10829 Berlin, Germany
> Geschäftsführer: Dr.-Ing. Gunnar Siebert, Sitz der Gesellschaft: Berlin,
> Amtsgericht Charlottenburg HRB 172382

-- 
Regards,
Peter Manev