[Oisf-users] Suffering Simultaneous Suricata Segfaults

Fri Sep 28 02:32:27 UTC 2018

Thanks Peter - 

I've recompiled 4.1rc1 using the following:

./configure --prefix=/usr --sysconfdir=/etc --localstatedir=/var --enable-lua --enable-unix-socket --enable-geoip --with-libhs-includes=/usr/local/include/hs/ --with-libhs-libraries=/usr/local/lib/ --with-liblzma-includes=/usr/include/lzma/ --with-liblzma-libraries=/usr/lib64/ --enable-profiling CFLAGS="-ggdb -O0"

Before starting the application I also entered the following settings:

ulimit -c unlimited
sysctl -w kernel.core_pattern=/tmp/core/-%e.%p.%h.%t
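
For reference, a minimal sketch of how these two settings might be sanity-checked before starting Suricata. The helper names are hypothetical, and it assumes the core_pattern is a plain path rather than a pipe to a crash handler. Two things worth keeping in mind: `ulimit -c` only applies to processes started from the same shell, and the kernel does not create the target directory, so /tmp/core has to exist and be writable by the Suricata process.

```shell
#!/usr/bin/env bash
# Sketch: sanity-check core dump settings before starting Suricata.
# core_dir_from_pattern and check_core_setup are hypothetical helpers,
# not part of Suricata or CentOS.

# Print the directory part of a core_pattern such as /tmp/core/-%e.%p.%h.%t
core_dir_from_pattern() {
    dirname -- "$1"
}

# Report the core size limit of this shell and whether the directory
# named by the pattern exists and is writable. Returns 1 if it is not.
check_core_setup() {
    local pattern="$1" dir
    dir="$(core_dir_from_pattern "$pattern")"
    echo "core size limit (this shell): $(ulimit -c)"
    if [ -d "$dir" ] && [ -w "$dir" ]; then
        echo "core directory ok: $dir"
    else
        echo "core directory missing or not writable: $dir"
        return 1
    fi
}

# Usage, reading the live kernel setting:
#   check_core_setup "$(cat /proc/sys/kernel/core_pattern)"
```

If the check fails for the pattern above, creating the directory first (for example with `mkdir -p /tmp/core`) is the likely fix.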

Sean

-----Original Message-----
From: Peter Manev <petermanev at gmail.com> 
Sent: Thursday, September 27, 2018 11:55 AM
To: Cloherty, Sean E <scloherty at mitre.org>
Cc: lists at inliniac.net; Open Information Security Foundation <oisf-users at lists.openinfosecfoundation.org>
Subject: Re: [Oisf-users] Suffering Simultaneous Suricata Segfaults

On Thu, Sep 27, 2018 at 3:51 PM Cloherty, Sean E <scloherty at mitre.org> wrote:
>
> Hello Victor -
>
> I am not sure if the actual fault messages came across in my previous email. Below is what I've got from syslog (apologies if the tabs and spaces mess up the faux table). No core dump, so I've gone back and reverted the two test servers to the settings that they had when they faulted, enabled. Now I need to puzzle through enabling core dumps on CentOS 7.
>
> TIME             HOST                 SURICATA  SEGFAULT
> 9/25/2018 18:26  production host #1   4.04      kernel: W#14-ens1f1[29348]: segfault at 0 ip 0000000000597207 sp 00007f918b7fbef0 error 4 in suricata[400000+256000]
> 9/25/2018 18:26  test-host #1         4.1rc1    kernel: W#03-ens1f1[24471]: segfault at 0 ip 00000000005b7787 sp 00007f6650b27cb0 error 4 in suricata[400000+28c000]
> 9/25/2018 18:26  production host #3   4.04      kernel: W#06-ens1f1[24268]: segfault at 0 ip 0000000000597207 sp 00007f3a077fbef0 error 4 in suricata[400000+256000]
> 9/25/2018 18:26  test-host #2         4.05      kernel: W#01-ens1f1[4720]: segfault at 0 ip 000000000059b557 sp 00007efc6e69cde0 error 4 in suricata[400000+265000]
> 9/25/2018 18:27  test-host #2         4.05      kernel: W#07-ens1f1[4406]: segfault at 0 ip 000000000059b557 sp 00007fc4c2504de0 error 4 in suricata[400000+265000]
>
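
As a rough illustration of reading the kernel lines above: "segfault at 0" is the faulting address (a NULL pointer), "error 4" is a user-mode read of an unmapped page, "ip" is the instruction pointer, and the bracketed pair is the mapping base and size of the suricata binary. The sketch below pulls those fields out with bash; `parse_segfault` is a hypothetical helper name, and the regular expression only assumes the line layout shown in the table.

```shell
#!/usr/bin/env bash
# Sketch: extract ip, mapping base and mapping size from a kernel
# segfault line and print the offset of ip inside the mapping.
# parse_segfault is a hypothetical helper name.

parse_segfault() {
    local line="$1" ip base size
    # Matches e.g. "... ip 0000000000597207 sp ... in suricata[400000+256000]"
    local re='ip ([0-9a-f]+) sp .*\[([0-9a-f]+)\+([0-9a-f]+)\]'
    if [[ "$line" =~ $re ]]; then
        ip=$((16#${BASH_REMATCH[1]}))
        base=$((16#${BASH_REMATCH[2]}))
        size=$((16#${BASH_REMATCH[3]}))
        # in_mapping is 1 when ip lies inside [base, base + size)
        printf 'ip=0x%x offset=0x%x in_mapping=%s\n' \
            "$ip" "$((ip - base))" "$(( ip >= base && ip < base + size ))"
    else
        return 1
    fi
}

# Usage:
#   parse_segfault "$(grep -m1 'segfault at' /var/log/messages)"
```

For a non-PIE executable mapped at 0x400000, as these lines suggest, the ip value itself can usually be passed to addr2line against the exact binary that crashed, for example `addr2line -e /usr/bin/suricata -f -C 0x597207` (the binary path is an assumption). For a PIE binary or a shared library the offset would be used instead. Both 4.04 hosts report the same ip, as do both 4.05 crashes, which suggests each version is faulting at a single code location.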

Sorry to jump in - just a thought, as an option: since you have a test
host in the mix with 4.1rc1, you could compile with debugging enabled and get a useful core dump for investigation (if/when it happens again).
https://redmine.openinfosecfoundation.org/projects/suricata/wiki/Reporting_Bugs
Basically, add ->  CFLAGS="-ggdb -O0"
to your configure line.
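
Once a debug build has produced a core file, something along these lines could be used to find the newest core and get a backtrace from it. This is only a sketch: `latest_core` and `gdb_bt_cmd` are hypothetical helper names, and the binary path and core directory in the usage comment are assumptions that depend on the configure prefix and the kernel.core_pattern in use.

```shell
#!/usr/bin/env bash
# Sketch: locate the newest core file in a directory and print a gdb
# command that dumps backtraces for all threads.
# latest_core and gdb_bt_cmd are hypothetical helper names.

# Print the most recently modified file in the given directory.
latest_core() {
    local dir="$1"
    # ls -t sorts newest first; errors are silenced if nothing matches
    ls -t -- "$dir"/* 2>/dev/null | head -n 1
}

# Print (not run) a non-interactive gdb invocation for binary + core.
gdb_bt_cmd() {
    local binary="$1" core="$2"
    printf 'gdb -batch -ex "thread apply all bt full" %s %s\n' "$binary" "$core"
}

# Usage (paths are assumptions):
#   gdb_bt_cmd /usr/bin/suricata "$(latest_core /tmp/core)"
```

The printed command can be run as-is; with `-ggdb -O0` the resulting backtrace should show function names, source lines and local variables for the crashing worker thread.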

Thanks

> -----Original Message-----
> From: Oisf-users <oisf-users-bounces at lists.openinfosecfoundation.org> 
> On Behalf Of Victor Julien
> Sent: Thursday, September 27, 2018 1:23 AM
> To: oisf-users at lists.openinfosecfoundation.org
> Subject: Re: [Oisf-users] Suffering Simultaneous Suricata Segfaults
>
> On 26-09-18 18:55, Cloherty, Sean E wrote:
> > I was troubleshooting instances of Suricata being down on multiple 
> > hosts and I found that 2 production hosts running 4.04 and 2 test 
> > hosts running 4.05 and 4.1rc1 faulted at roughly the same time.
> > Strangely, 2 additional production hosts running 4.04 on duplicate 
> > hardware have not had any issues to date.  Below is the outline of 
> > what I’ve been able to put together this morning.
> >
> >
>
> Did any of the instances dump a core file you can inspect?
>
> Another way to get more info based on the lines you posted is
> described here:
> https://stackoverflow.com/questions/2549214/interpreting-segfault-messages
> Could you try to see if you can get more info about where in the
> code the crash happens?
>
>
> >
> > What is the same across all platforms faulting or not:
> >
> >
> >
> > All use tpacket v3 & AF-PACKET
> >
> > All use workers mode
> >
> > All are in IDS mode
> >
> > All ingest traffic from Gigamon taps
> >
> > All are running CentOS 7.5 64bit
> >
> > All use Intel(R) 10GbE PCI Express Linux Network Driver 5.3.7
> >
> > All use Intel Corporation 82599ES 10-Gigabit SFI/SFP+
> >
> > What is different:
> >
> >
> >
> > NO FAULT:          #zero-copy-size: 128
> >
> > FAULT:                  zero-copy-size: 128
>
> This option is no longer used by any of the versions you are using.
>
>
> > NO FAULT:
> >
> >         prio:
> >
> > #          low: [ 0 ]
> >
> > #          medium: [ "1-2" ]
> >
> > #          high: [ 3 ]
> >
> >           default: "high"
> >
> >
> >
> > FAULT:
> >
> >         prio:
> >
> >           low: [ 0 ]
> >
> >           medium: [ "1-2" ]
> >
> >           high: [ 3 ]
> >
> >           default: "high"
> Would be weird if this did anything.
>
> --
> ---------------------------------------------
> Victor Julien
> http://www.inliniac.net/
> PGP: http://www.inliniac.net/victorjulien.asc
> ---------------------------------------------
>
> _______________________________________________
> Suricata IDS Users mailing list: oisf-users at openinfosecfoundation.org
> Site: http://suricata-ids.org | Support: 
> http://suricata-ids.org/support/
> List: 
> https://lists.openinfosecfoundation.org/mailman/listinfo/oisf-users
>
> Conference: https://suricon.net
> Trainings: https://suricata-ids.org/training/

--
Regards,
Peter Manev