[Oisf-users] Suricata on 8 cores, ~70K packets/sec

Chris Wakelin c.d.wakelin at reading.ac.uk
Tue Feb 15 11:48:04 EST 2011


Apologies for the long post!

Here are some logs from the Suricata instance monitoring our student
residences (no prizes for guessing which rules they trigger most often
...). We have an identical machine monitoring the campus network.

> [7012] 15/2/2011 -- 01:02:57 - (flow-hash.c:324) <Warning> (FlowGetNew) -- [ERRCODE: SC_WARN_FLOW_EMERGENCY(158)] - Warning, engine running with FLOW_EMERGENCY bit set (ts.tv_sec: 1297731777, ts.tv_usec:777481)
> [7028] 15/2/2011 -- 01:02:57 - (flow.c:1099) <Info> (FlowManagerThread) -- Flow emergency mode over, back to normal... unsetting FLOW_EMERGENCY bit (ts.tv_sec: 1297731777, ts.tv_usec:799576) flow_spare_q status(): 31% flows at the queue
> [7012] 15/2/2011 -- 01:03:34 - (flow-hash.c:324) <Warning> (FlowGetNew) -- [ERRCODE: SC_WARN_FLOW_EMERGENCY(158)] - Warning, engine running with FLOW_EMERGENCY bit set (ts.tv_sec: 1297731814, ts.tv_usec:850460)
> [7028] 15/2/2011 -- 01:03:34 - (flow.c:1099) <Info> (FlowManagerThread) -- Flow emergency mode over, back to normal... unsetting FLOW_EMERGENCY bit (ts.tv_sec: 1297731814, ts.tv_usec:870498) flow_spare_q status(): 32% flows at the queue

We eventually get lots of these (on the campus instance too). We seem to
be leaking flows somewhere. Any idea where?
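
As far as I understand it, emergency mode kicks in when the flow engine
has hit its memcap with no spare flows left to hand out, and ends once
the spare queue has refilled past the "emergency_recovery" percentage,
which would explain the 31%/32% figures above given my setting of:

   flow:
     emergency_recovery: 30   # back to normal once >= 30% of the
                              # prealloc'd flows are in the spare queue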

> [7008] 15/2/2011 -- 08:29:36 - (suricata.c:1258) <Info> (main) -- signal received
> [7011] 15/2/2011 -- 08:29:36 - (source-pfring.c:311) <Info> (ReceivePfringThreadExitStats) -- (ReceivePfring) Packets 1256720045, bytes 999535465040
> [7008] 15/2/2011 -- 08:29:36 - (suricata.c:1288) <Info> (main) -- time elapsed 37487s
> [7011] 15/2/2011 -- 08:29:36 - (source-pfring.c:315) <Info> (ReceivePfringThreadExitStats) -- (ReceivePfring) Pfring Total:1256720045 Recv:1256720045 Drop:0 (0.0%).
> [7013] 15/2/2011 -- 08:29:36 - (stream-tcp.c:3465) <Info> (StreamTcpExitPrintStats) -- (Stream1) Packets 975933372

I guess that means Suricata doesn't think it missed anything.
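
Doing the sums on those totals (my arithmetic, so treat with due
suspicion):

   1,256,720,045 packets / 37,487 s = ~33,500 packets/sec average
   999,535,465,040 bytes / 37,487 s = ~26.7 MB/s = ~213 Mbit/s average

so the ~70K packets/sec in the subject line is presumably nearer our
peak rate than our average.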

> [7027] 15/2/2011 -- 08:29:36 - (alert-fastlog.c:324) <Info> (AlertFastLogExitPrintStats) -- (Outputs) Alerts 1656
> [7027] 15/2/2011 -- 08:29:36 - (alert-unified2-alert.c:603) <Info> (Unified2AlertThreadDeinit) -- Alert unified2 module wrote 1656 alerts
> [7027] 15/2/2011 -- 08:29:36 - (log-httplog.c:404) <Info> (LogHttpLogExitPrintStats) -- (Outputs) HTTP requests 51027
> [7027] 15/2/2011 -- 08:29:36 - (log-droplog.c:389) <Info> (LogDropLogExitPrintStats) -- (Outputs) Dropped Packets 0
> [7028] 15/2/2011 -- 08:29:36 - (flow.c:1141) <Info> (FlowManagerThread) -- 5916442 new flows, 2665165 established flows were timed out, 4425494 flows in closed state

Are these supposed to add up? (2,665,165 established + 4,425,494 closed
= 7,090,659, against only 5,916,442 new flows.)

> [7008] 15/2/2011 -- 08:29:36 - (stream-tcp-reassemble.c:352) <Info> (StreamTcpReassembleFree) -- Max memuse of the stream reassembly engine 67108863 (in use 0)
> [7008] 15/2/2011 -- 08:29:36 - (stream-tcp.c:466) <Info> (StreamTcpFreeConfig) -- Max memuse of stream engine 33554304 (in use 0)
> [7008] 15/2/2011 -- 08:29:36 - (detect.c:3335) <Info> (SigAddressCleanupStage1) -- cleaning up signature grouping structure... complete

I have tweaked these:

> max-pending-packets: 2000

> flow:
> #  memcap: 33554432
> #  hash_size: 65536
> #  prealloc: 10000
>   memcap: 268435456
>   hash_size: 262144
>   prealloc: 40000
>   emergency_recovery: 30
>   prune_flows: 5

and left these at the defaults:

> stream:
>   memcap: 33554432              # 32mb
>   checksum_validation: yes      # reject wrong csums
>   inline: no                    # no inline mode
>   reassembly:
>     memcap: 67108864            # 64mb for reassembly
>     depth: 1048576              # reassemble 1mb into a stream

Should these be increased? The machine has loads of memory (16GB).
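
If the answer is simply to scale them up, I'm tempted to try something
like this (numbers plucked out of the air for a 16GB box, so do say if
they're daft):

   stream:
     memcap: 268435456             # 256mb, up from 32mb
     reassembly:
       memcap: 536870912           # 512mb, up from 64mb
       depth: 1048576              # unchanged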

I have 8 cores, with "set_cpu_affinity: no" and "detect_thread_ratio:
1.5" (the yaml excerpt is below the top output). I'm getting roughly 60%
idle on each core (more at the moment, and the 99.7% use by Decode1 is
unusual):

> top - 16:34:47 up 4 days, 23:12,  5 users,  load average: 1.91, 2.00, 1.98
> Tasks: 226 total,   3 running, 223 sleeping,   0 stopped,   0 zombie
> Cpu0  :  1.7%us,  3.3%sy,  0.0%ni, 95.0%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
> Cpu1  :  0.3%us,  2.0%sy,  0.0%ni, 97.7%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
> Cpu2  :  0.6%us,  5.5%sy,  0.0%ni, 93.8%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
> Cpu3  : 99.7%us,  0.3%sy,  0.0%ni,  0.0%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
> Cpu4  :  2.0%us,  2.6%sy,  0.0%ni, 95.1%id,  0.0%wa,  0.0%hi,  0.3%si,  0.0%st
> Cpu5  : 11.9%us,  3.5%sy,  0.0%ni, 83.1%id,  0.0%wa,  0.0%hi,  1.5%si,  0.0%st
> Cpu6  :  1.6%us,  3.2%sy,  0.0%ni, 95.2%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
> Cpu7  :  2.6%us,  3.6%sy,  0.0%ni, 93.8%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
> Mem:  16465480k total, 14030348k used,  2435132k free,   247064k buffers
> Swap:  3905528k total,     9016k used,  3896512k free, 12264960k cached
> 
>   PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  P COMMAND                                                                                      
>  8280 snort     20   0 1063m 537m 1864 R 99.7  3.3  30:24.07 3 Decode1
>  8175 snort     20   0  238m 174m 1008 R 21.2  1.1  65:18.84 5 argus
>  8281 snort     20   0 1063m 537m 1864 S  4.3  3.3   3:07.09 7 Stream1
>  8279 snort     20   0 1063m 537m 1864 S  3.0  3.3   1:18.07 6 ReceivePfring
>  8296 snort     20   0 1063m 537m 1864 S  2.6  3.3   0:45.02 6 FlowManagerThre
>  8294 snort     20   0 1063m 537m 1864 S  2.3  3.3   1:19.98 6 RespondReject
>  8295 snort     20   0 1063m 537m 1864 S  2.0  3.3   1:14.83 2 Outputs
>  8174 snort     20   0  238m 174m 1008 S  1.3  1.1   4:33.75 1 argus
>  8282 snort     20   0 1063m 537m 1864 S  1.3  3.3   0:54.00 4 Detect1
>  8283 snort     20   0 1063m 537m 1864 S  1.3  3.3   0:54.16 4 Detect2
>  8288 snort     20   0 1063m 537m 1864 S  1.3  3.3   0:54.07 7 Detect7
>  8290 snort     20   0 1063m 537m 1864 S  1.3  3.3   0:54.14 7 Detect9
>  8292 snort     20   0 1063m 537m 1864 S  1.3  3.3   0:54.18 4 Detect11
>  8284 snort     20   0 1063m 537m 1864 S  1.0  3.3   0:53.96 0 Detect3
>  8285 snort     20   0 1063m 537m 1864 S  1.0  3.3   0:54.10 2 Detect4
>  8286 snort     20   0 1063m 537m 1864 S  1.0  3.3   0:53.94 7 Detect5
>  8287 snort     20   0 1063m 537m 1864 S  1.0  3.3   0:54.18 0 Detect6
>  8289 snort     20   0 1063m 537m 1864 S  1.0  3.3   0:54.01 0 Detect8
>  8291 snort     20   0 1063m 537m 1864 S  1.0  3.3   0:53.83 0 Detect10
>  8293 snort     20   0 1063m 537m 1864 S  1.0  3.3   0:54.09 2 Detect12

(The, ahem, user ID is a historical relic ...)
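
For reference, the threading settings I mentioned earlier come from this
part of my suricata.yaml (assuming I've copied it over faithfully):

   threading:
     set_cpu_affinity: no
     detect_thread_ratio: 1.5     # hence the 12 Detect threads on 8 cores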

I'm not sure whether setting CPU affinity would help; the comment "On
Intel Core2 and Nehalem CPU's enabling this will degrade performance"
put me off, though in fact our CPUs are slightly older:

> model name      : Intel(R) Xeon(R) CPU           X5355  @ 2.66GHz
> flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx lm constant_tsc arch_perfmon pebs bts rep_good aperfmperf pni dtes64 monitor ds_cpl vmx est tm2 ssse3 cx16 xtpr pdcm dca lahf_lm tpr_shadow

Any hints and pointers would be most welcome!

Best Wishes,
Chris

--+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+-
Christopher Wakelin,                           c.d.wakelin at reading.ac.uk
IT Services Centre, The University of Reading,  Tel: +44 (0)118 378 8439
Whiteknights, Reading, RG6 6AF, UK              Fax: +44 (0)118 975 3094

