[Oisf-users] Suricata on 8 cores, ~70K packets/sec

Victor Julien victor at inliniac.net
Tue Feb 15 17:30:17 UTC 2011


Hey Chris, thanks for your report. Comments inline.

On 02/15/2011 08:48 AM, Chris Wakelin wrote:
> Apologies for the long post!
> 
> Here's some logs from the Suricata instance monitoring our student
> residences (no prizes for guessing which rules they trigger most often
> ...). We have an identical machine monitoring the campus network.
> 
>> [7012] 15/2/2011 -- 01:02:57 - (flow-hash.c:324) <Warning> (FlowGetNew) -- [ERRCODE: SC_WARN_FLOW_EMERGENCY(158)] - Warning, engine running with FLOW_EMERGENCY bit set (ts.tv_sec: 1297731777, ts.tv_usec:777481)
>> [7028] 15/2/2011 -- 01:02:57 - (flow.c:1099) <Info> (FlowManagerThread) -- Flow emergency mode over, back to normal... unsetting FLOW_EMERGENCY bit (ts.tv_sec: 1297731777, ts.tv_usec:799576) flow_spare_q status(): 31% flows at the queue
>> [7012] 15/2/2011 -- 01:03:34 - (flow-hash.c:324) <Warning> (FlowGetNew) -- [ERRCODE: SC_WARN_FLOW_EMERGENCY(158)] - Warning, engine running with FLOW_EMERGENCY bit set (ts.tv_sec: 1297731814, ts.tv_usec:850460)
>> [7028] 15/2/2011 -- 01:03:34 - (flow.c:1099) <Info> (FlowManagerThread) -- Flow emergency mode over, back to normal... unsetting FLOW_EMERGENCY bit (ts.tv_sec: 1297731814, ts.tv_usec:870498) flow_spare_q status(): 32% flows at the queue
> 
> We get lots of that eventually (and on the campus instance). We seem to
> be leaking flows somewhere. Any idea how?

You might just try increasing your flow settings further. You could also
consider lowering the flow-timeouts.
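
For example, building on your current settings, something along these
lines (illustrative, untested numbers; I'm also assuming the stock
flow-timeouts defaults of new: 30 / established: 300 here, so check the
option names and defaults against your suricata.yaml):

  flow:
    memcap: 536870912     # 512mb, double the current 256mb
    hash_size: 262144
    prealloc: 100000      # more spare flows before emergency mode kicks in
    emergency_recovery: 30
    prune_flows: 5

  flow-timeouts:
    default:
      new: 15             # down from the assumed default of 30
      established: 120    # down from the assumed default of 300
      closed: 0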

> 
>> [7008] 15/2/2011 -- 08:29:36 - (suricata.c:1258) <Info> (main) -- signal received
>> [7011] 15/2/2011 -- 08:29:36 - (source-pfring.c:311) <Info> (ReceivePfringThreadExitStats) -- (ReceivePfring) Packets 1256720045, bytes 999535465040
>> [7008] 15/2/2011 -- 08:29:36 - (suricata.c:1288) <Info> (main) -- time elapsed 37487s
>> [7011] 15/2/2011 -- 08:29:36 - (source-pfring.c:315) <Info> (ReceivePfringThreadExitStats) -- (ReceivePfring) Pfring Total:1256720045 Recv:1256720045 Drop:0 (0.0%).
>> [7013] 15/2/2011 -- 08:29:36 - (stream-tcp.c:3465) <Info> (StreamTcpExitPrintStats) -- (Stream1) Packets 975933372
> 
> I guess that means Suricata doesn't think it missed anything.

Correct.

>> [7027] 15/2/2011 -- 08:29:36 - (alert-fastlog.c:324) <Info> (AlertFastLogExitPrintStats) -- (Outputs) Alerts 1656
>> [7027] 15/2/2011 -- 08:29:36 - (alert-unified2-alert.c:603) <Info> (Unified2AlertThreadDeinit) -- Alert unified2 module wrote 1656 alerts
>> [7027] 15/2/2011 -- 08:29:36 - (log-httplog.c:404) <Info> (LogHttpLogExitPrintStats) -- (Outputs) HTTP requests 51027
>> [7027] 15/2/2011 -- 08:29:36 - (log-droplog.c:389) <Info> (LogDropLogExitPrintStats) -- (Outputs) Dropped Packets 0
>> [7028] 15/2/2011 -- 08:29:36 - (flow.c:1141) <Info> (FlowManagerThread) -- 5916442 new flows, 2665165 established flows were timed out, 4425494 flows in closed state
> 
> Are these supposed to add up?

The flow numbers? I think they do actually, but I've never really tested it.

>> [7008] 15/2/2011 -- 08:29:36 - (stream-tcp-reassemble.c:352) <Info> (StreamTcpReassembleFree) -- Max memuse of the stream reassembly engine 67108863 (in use 0)

The stream reassembly code reached its memcap here; please try setting
it much higher.

>> [7008] 15/2/2011 -- 08:29:36 - (stream-tcp.c:466) <Info> (StreamTcpFreeConfig) -- Max memuse of stream engine 33554304 (in use 0)

Same here.

>> [7008] 15/2/2011 -- 08:29:36 - (detect.c:3335) <Info> (SigAddressCleanupStage1) -- cleaning up signature grouping structure... complete
> 
> I have (tweaked)
> 
>> max-pending-packets: 2000

I think this is fine if you're not dropping packets.

> 
>> flow:
>> #  memcap: 33554432
>> #  hash_size: 65536
>> #  prealloc: 10000
>>   memcap: 268435456
>>   hash_size: 262144
>>   prealloc: 40000
>>   emergency_recovery: 30
>>   prune_flows: 5
> 
> (left at the defaults)
> 
>> stream:
>>   memcap: 33554432              # 32mb
>>   checksum_validation: yes      # reject wrong csums
>>   inline: no                    # no inline mode
>>   reassembly:
>>     memcap: 67108864            # 64mb for reassembly
>>     depth: 1048576              # reassemble 1mb into a stream
> 
> Should these be increased? The machine has loads of memory (16GB).

Ya.
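
With 16GB of RAM you have plenty of headroom. As an untested example,
mirroring the layout of your current config:

  stream:
    memcap: 536870912             # 512mb instead of 32mb
    checksum_validation: yes      # reject wrong csums
    inline: no                    # no inline mode
    reassembly:
      memcap: 1073741824          # 1gb for reassembly instead of 64mb
      depth: 1048576              # reassemble 1mb into a stream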

> I have 8 cores, and have "set_cpu_affinity: no" and
> "detect_thread_ratio: 1.5". I'm getting roughly 60% idle on each core
> (more at the moment, and the 99.7% use by Decode1 is unusual):
> 
>> top - 16:34:47 up 4 days, 23:12,  5 users,  load average: 1.91, 2.00, 1.98
>> Tasks: 226 total,   3 running, 223 sleeping,   0 stopped,   0 zombie
>> Cpu0  :  1.7%us,  3.3%sy,  0.0%ni, 95.0%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
>> Cpu1  :  0.3%us,  2.0%sy,  0.0%ni, 97.7%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
>> Cpu2  :  0.6%us,  5.5%sy,  0.0%ni, 93.8%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
>> Cpu3  : 99.7%us,  0.3%sy,  0.0%ni,  0.0%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
>> Cpu4  :  2.0%us,  2.6%sy,  0.0%ni, 95.1%id,  0.0%wa,  0.0%hi,  0.3%si,  0.0%st
>> Cpu5  : 11.9%us,  3.5%sy,  0.0%ni, 83.1%id,  0.0%wa,  0.0%hi,  1.5%si,  0.0%st
>> Cpu6  :  1.6%us,  3.2%sy,  0.0%ni, 95.2%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
>> Cpu7  :  2.6%us,  3.6%sy,  0.0%ni, 93.8%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
>> Mem:  16465480k total, 14030348k used,  2435132k free,   247064k buffers
>> Swap:  3905528k total,     9016k used,  3896512k free, 12264960k cached
>>
>>   PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  P COMMAND                                                                                      
>>  8280 snort     20   0 1063m 537m 1864 R 99.7  3.3  30:24.07 3 Decode1
>>  8175 snort     20   0  238m 174m 1008 R 21.2  1.1  65:18.84 5 argus
>>  8281 snort     20   0 1063m 537m 1864 S  4.3  3.3   3:07.09 7 Stream1
>>  8279 snort     20   0 1063m 537m 1864 S  3.0  3.3   1:18.07 6 ReceivePfring
>>  8296 snort     20   0 1063m 537m 1864 S  2.6  3.3   0:45.02 6 FlowManagerThre
>>  8294 snort     20   0 1063m 537m 1864 S  2.3  3.3   1:19.98 6 RespondReject
>>  8295 snort     20   0 1063m 537m 1864 S  2.0  3.3   1:14.83 2 Outputs
>>  8174 snort     20   0  238m 174m 1008 S  1.3  1.1   4:33.75 1 argus
>>  8282 snort     20   0 1063m 537m 1864 S  1.3  3.3   0:54.00 4 Detect1
>>  8283 snort     20   0 1063m 537m 1864 S  1.3  3.3   0:54.16 4 Detect2
>>  8288 snort     20   0 1063m 537m 1864 S  1.3  3.3   0:54.07 7 Detect7
>>  8290 snort     20   0 1063m 537m 1864 S  1.3  3.3   0:54.14 7 Detect9
>>  8292 snort     20   0 1063m 537m 1864 S  1.3  3.3   0:54.18 4 Detect11
>>  8284 snort     20   0 1063m 537m 1864 S  1.0  3.3   0:53.96 0 Detect3
>>  8285 snort     20   0 1063m 537m 1864 S  1.0  3.3   0:54.10 2 Detect4
>>  8286 snort     20   0 1063m 537m 1864 S  1.0  3.3   0:53.94 7 Detect5
>>  8287 snort     20   0 1063m 537m 1864 S  1.0  3.3   0:54.18 0 Detect6
>>  8289 snort     20   0 1063m 537m 1864 S  1.0  3.3   0:54.01 0 Detect8
>>  8291 snort     20   0 1063m 537m 1864 S  1.0  3.3   0:53.83 0 Detect10
>>  8293 snort     20   0 1063m 537m 1864 S  1.0  3.3   0:54.09 2 Detect12
> 
> (The, ahem, user ID is a historical relic ...)

Haha!

> I'm not sure whether setting CPU affinity would help; the comment "On
> Intel Core2 and Nehalem CPU's enabling this will degrade performance"
> put me off, though in fact our CPUs are slightly older:
> 
>> model name      : Intel(R) Xeon(R) CPU           X5355  @ 2.66GHz
>> flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx lm constant_tsc arch_perfmon pebs bts rep_good aperfmperf pni dtes64 monitor ds_cpl vmx est tm2 ssse3 cx16 xtpr pdcm dca lahf_lm tpr_shadow

I guess you can just try it. Or, as Eric suggested, update to the
latest git master and use his new way of controlling the threading in
more detail...
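
For reference, the new threading section looks roughly like this (a
sketch from memory, so treat the option names and the CPU layout as an
example and check the suricata.yaml shipped with git master):

  threading:
    set_cpu_affinity: yes
    cpu_affinity:
      - management_cpu_set:
          cpu: [ 0 ]
      - receive_cpu_set:
          cpu: [ 0 ]
      - decode_cpu_set:
          cpu: [ 1 ]
      - detect_cpu_set:
          cpu: [ "2-7" ]    # keep detect threads off the capture/decode cores
          mode: "exclusive"
    detect_thread_ratio: 1.5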

Cheers,
Victor

-- 
---------------------------------------------
Victor Julien
http://www.inliniac.net/
PGP: http://www.inliniac.net/victorjulien.asc
---------------------------------------------



