[Oisf-users] Battling segfaults on 3.2.1

Duarte Silva duarte.silva at serializing.me
Thu Apr 13 04:52:59 UTC 2017


Hi Sean,

To debug situations like this, I do the following:
- install the Suricata debug symbols
- install gdb
- launch Suricata and attach gdb to it
- when the error occurs, look at the call trace and the stack to determine where the problem is and, ideally, why.
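
The steps above could be scripted roughly as follows. This is only a sketch: the helper name build_gdb_attach_command is made up for illustration, and it assumes the first signal gdb stops on is the crash itself (any other signal that stops the process would end the "continue" early). It only builds the gdb command line; running it is left to you.

```python
import shlex


def build_gdb_attach_command(pid):
    """Build a gdb command line that attaches to a running process,
    waits for it to stop (e.g. on SIGSEGV), and dumps backtraces.

    The helper name and the choice of commands are illustrative only.
    """
    return [
        "gdb",
        "-p", str(pid),                 # attach to the running Suricata process
        "-batch",                       # run the -ex commands, then exit
        "-ex", "set pagination off",    # do not wait for a keypress on long output
        "-ex", "continue",              # resume until the process stops on a signal
        "-ex", "bt full",               # backtrace with locals for the faulting thread
        "-ex", "thread apply all bt",   # backtraces for every thread
    ]


if __name__ == "__main__":
    # 1234 is a placeholder PID; substitute the real Suricata PID.
    print(shlex.join(build_gdb_attach_command(1234)))
```

Redirecting the output of that command to a file keeps the trace around for a bug report.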

To help reproduce the error, I would have tcpdump write a packet capture so that I could replay the traffic to Suricata.
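
A rough sketch of that capture-and-replay idea is below. The helper names, the output prefix, the ring-buffer sizes, the config path and the log directory are all assumptions for illustration; the interface name is taken from the thread names in the original message. The functions only build the command lines.

```python
def build_capture_command(interface, out_prefix="suricata-capture",
                          file_mb=1000, file_count=20):
    """tcpdump command that keeps a rotating ring of capture files.

    Sizes and names here are placeholders, not recommendations.
    """
    return [
        "tcpdump",
        "-i", interface,           # capture interface
        "-n",                      # do not resolve addresses to names
        "-s", "0",                 # capture full packets
        "-C", str(file_mb),        # rotate after this many million bytes
        "-W", str(file_count),     # keep at most this many files (ring buffer)
        "-w", f"{out_prefix}.pcap",
    ]


def build_replay_command(pcap_path, config="/etc/suricata/suricata.yaml",
                         log_dir="replay-logs"):
    """Suricata command that reads a pcap file offline instead of a live NIC."""
    return ["suricata", "-c", config, "-r", pcap_path, "-l", log_dir]


if __name__ == "__main__":
    print(" ".join(build_capture_command("ens1f1")))
    print(" ".join(build_replay_command("suricata-capture.pcap0")))
```

With a ring buffer, the file that was being written when the crash happened is the one worth replaying first.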

Cheers,
Duarte

From: Cloherty, Sean E
Sent: April 12, 2017 18:18
To: oisf-users at lists.openinfosecfoundation.org
Subject: [Oisf-users] Battling segfaults on 3.2.1

I am running 3.2.1 on 4 identical servers.  Two of them started having segfaults and traps.  

Troubleshooting - Compared the yamls and found an extra 0 (making the tracker 10x larger) in the SMTP mime section for inspected-tracker for the file data keyword.  Also, one system had 2gb vs. 4gb for the http memcap in the app layer protocol config.  I changed the yamls to match the less problematic server.  I also took the opportunity to recompile Suricata with Hyperscan (Thank you Derek Spransy and Justin Viiret!).
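
For anyone repeating the yaml comparison across servers, a small diff helper along these lines may save some eyeballing. It is a sketch using Python's standard difflib; the function name and file labels are made up, and the two sample snippets only mirror the 2gb vs. 4gb memcap difference described above rather than the real configuration files.

```python
import difflib


def diff_configs(a_text, b_text, a_name="server-a.yaml", b_name="server-b.yaml"):
    """Return only the changed lines between two config texts,
    prefixed with '-' (only in a) or '+' (only in b)."""
    diff = list(difflib.unified_diff(
        a_text.splitlines(), b_text.splitlines(),
        fromfile=a_name, tofile=b_name, lineterm="", n=0,
    ))
    # The first two lines are the ---/+++ file headers; skip them,
    # then drop the @@ hunk markers.
    return [line for line in diff[2:] if line.startswith(("+", "-"))]


if __name__ == "__main__":
    # Hypothetical fragments, not the real suricata.yaml contents.
    server_a = "http:\n  memcap: 4gb\n"
    server_b = "http:\n  memcap: 2gb\n"
    for line in diff_configs(server_a, server_b):
        print(line)
```

In practice the two texts would be read from the suricata.yaml of each server.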

On one box I’ve had no segfaults since April 7th (following the changes). The other one continues to have the problem 2-3 times a day at random hours: mid-morning, early evening, sometimes after midnight. Messages in the system log include only the actual fault message and nothing else. The fault always points to a worker thread, and the numbers vary: W#01-ensf1 or W#15-ens1f1, etc. Two types of errors come up from the segfaults:

       error 4 in suricata[400000+242000] or 
       error 5 in suricata[400000+242000]

Trap messages seem to have stopped on April 7th (following the changes), but they also carried error messages with the same info in the brackets:

       error:0 in suricata[400000+242000]
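
For reading these lines: on x86 Linux the number after "error" in a segfault message is the page-fault error code, a bitmask, and the bracketed part is the mapping's start address plus its size in hex. The sketch below decodes the low three bits and parses the bracket; the function names are made up for illustration. As far as I understand, the error:0 in the trap lines is a different kind of error code and is not covered by the decoder.

```python
import re

# (mask, meaning when the bit is set, meaning when it is clear)
PAGE_FAULT_BITS = [
    (0x01, "protection violation", "page not present"),
    (0x02, "write access", "read access"),
    (0x04, "user mode", "kernel mode"),
]


def decode_page_fault(code):
    """Describe the low three bits of an x86 page-fault error code."""
    return [set_text if code & mask else clear_text
            for mask, set_text, clear_text in PAGE_FAULT_BITS]


def parse_mapping(text):
    """Extract (name, base address, size) from a 'name[base+size]' fragment."""
    match = re.search(r"(\S+)\[([0-9a-fA-F]+)\+([0-9a-fA-F]+)\]", text)
    if match is None:
        raise ValueError("no name[base+size] fragment found")
    return match.group(1), int(match.group(2), 16), int(match.group(3), 16)


if __name__ == "__main__":
    for code in (4, 5):
        print(code, decode_page_fault(code))
    name, base, size = parse_mapping("error 4 in suricata[400000+242000]")
    print(name, hex(base), hex(base + size))
```

So error 4 is a user-mode read of an unmapped page, and error 5 is a user-mode read that hit a protection violation; both fall inside the main suricata binary rather than a shared library.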


I’ve attached a zip file of the startup script, suricata.yaml, the suricata.log, stats.log, a copy of the faults listed in /var/log/messages, and a text file with the time and date of the crashes.  The server details follow:

GENERAL SERVER INFO :

- CentOS Linux release 7.3.1611 (Core) 3.10.0-514.10.2.el7.x86_64 #1 SMP Fri Mar 3 00:04:05 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux
- Intel(R) Xeon(R) CPU E5-2667 v3 @ 3.20GHz - 16 cores / 32 threads 
- 128GB of RAM
- Capture NIC is a dual port Intel Corporation 82599ES 10-Gigabit SFI/SFP+ Network Connection (rev 01)
- NIC Driver is Intel(R) 10GbE PCI Express Linux Network Driver - version 4.6.4
- Max traffic seen on the interface in the last 4 months has been 1.2 Gb/s, but usually mid-day peaks are around 1.1 Gb/s


Any suggestions of what to check next?

Sean



