[Oisf-users] Packet Loss

Yasha Zislin coolyasha at hotmail.com
Wed Jun 4 15:04:23 UTC 2014


To summarize my environment and setup:
I have one server (16 CPU cores, 132 GB of RAM) with two SPAN ports, which I monitor with PF_RING and Suricata 2.0.1.
I've configured all of the buffers pretty high; while Suricata is running it uses almost 70 GB of RAM.
It runs well, except that at times packet drops start to increase drastically.
I've noticed a correlation with the Free Num Packets value in the properties of each PF_RING thread: they all reach 0, and then the packet drops occur.
I've also found that I can configure PF_RING's Min Num Packets value higher than 65536 (or whatever the defined maximum is).
With it set to 400,000, each thread uses more slots. Sometimes /var/log/messages complains that this number is too high and that the memory reservation would wrap, or something to that effect, yet PF_RING keeps working. With the default 65k value, packet drops start much sooner (again once Free Num Packets reaches 0).
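For reference, the knob I'm changing is (I believe) the pf_ring kernel module's min_num_slots parameter, which seems to back the value shown in /proc/net/pf_ring. Reloading the module with a larger value looks roughly like this; treat it as a sketch, since parameter names can vary between PF_RING versions:

```shell
# Reload pf_ring with a larger per-ring slot count.
# min_num_slots is the module parameter I believe corresponds to
# the Min Num Packets value; 400000 is what I've been testing with.
rmmod pf_ring
modprobe pf_ring min_num_slots=400000

# Confirm the value took effect:
cat /sys/module/pf_ring/parameters/min_num_slots
```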

So my theory is that some buffer is getting overwhelmed by flows or sessions, and Suricata just can't keep all of these sessions in memory.
Most of my traffic is HTTP.
At peak times, one SPAN port sees about 180k packets per second and the other about 150k; bandwidth never exceeds 1 Gbit.
I am using PF_RING in mode 0.
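For completeness, the PF_RING side of my suricata.yaml looks roughly like this (trimmed to the relevant keys; cluster-id 99 with flow clustering matches the startup log below):

```yaml
pfring:
  - interface: eth17
    threads: 16
    cluster-id: 99
    cluster-type: cluster_flow
  - interface: eth15
    threads: 16
    cluster-id: 99
    cluster-type: cluster_flow
```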

I do have a lot of rules (22k), but this is the default ruleset from ET PRO.

I've followed a bunch of really good guides:
http://pevma.blogspot.se/2013/12/suricata-and-grand-slam-of-open-source.html
https://home.regit.org/2012/07/suricata-to-10gbps-and-beyond/

My server is beefy enough that I think it should handle this load with zero packet loss, and I have plenty of RAM to increase any buffers.
Here is the stats.log output for one of the threads:
capture.kernel_packets    | RxPFReth1516              | 86380604
capture.kernel_drops      | RxPFReth1516              | 195847
dns.memuse                | RxPFReth1516              | 12078
dns.memcap_state          | RxPFReth1516              | 0
dns.memcap_global         | RxPFReth1516              | 0
decoder.pkts              | RxPFReth1516              | 86380604
decoder.bytes             | RxPFReth1516              | 61642944370
decoder.invalid           | RxPFReth1516              | 8628951
decoder.ipv4              | RxPFReth1516              | 85716318
decoder.ipv6              | RxPFReth1516              | 523422
decoder.ethernet          | RxPFReth1516              | 86380604
decoder.raw               | RxPFReth1516              | 0
decoder.sll               | RxPFReth1516              | 0
decoder.tcp               | RxPFReth1516              | 72885123
decoder.udp               | RxPFReth1516              | 4365200
decoder.sctp              | RxPFReth1516              | 0
decoder.icmpv4            | RxPFReth1516              | 27438
decoder.icmpv6            | RxPFReth1516              | 40
decoder.ppp               | RxPFReth1516              | 0
decoder.pppoe             | RxPFReth1516              | 0
decoder.gre               | RxPFReth1516              | 0
decoder.vlan              | RxPFReth1516              | 0
decoder.vlan_qinq         | RxPFReth1516              | 0
decoder.teredo            | RxPFReth1516              | 309
decoder.ipv4_in_ipv6      | RxPFReth1516              | 0
decoder.ipv6_in_ipv6      | RxPFReth1516              | 0
decoder.avg_pkt_size      | RxPFReth1516              | 713
decoder.max_pkt_size      | RxPFReth1516              | 1514
defrag.ipv4.fragments     | RxPFReth1516              | 34145
defrag.ipv4.reassembled   | RxPFReth1516              | 125
defrag.ipv4.timeouts      | RxPFReth1516              | 0
defrag.ipv6.fragments     | RxPFReth1516              | 0
defrag.ipv6.reassembled   | RxPFReth1516              | 0
defrag.ipv6.timeouts      | RxPFReth1516              | 0
defrag.max_frag_hits      | RxPFReth1516              | 0
tcp.sessions              | RxPFReth1516              | 563119
tcp.ssn_memcap_drop       | RxPFReth1516              | 0
tcp.pseudo                | RxPFReth1516              | 42359
tcp.invalid_checksum      | RxPFReth1516              | 0
tcp.no_flow               | RxPFReth1516              | 0
tcp.reused_ssn            | RxPFReth1516              | 823
tcp.memuse                | RxPFReth1516              | 81792
tcp.syn                   | RxPFReth1516              | 649078
tcp.synack                | RxPFReth1516              | 556641
tcp.rst                   | RxPFReth1516              | 11970453
tcp.segment_memcap_drop   | RxPFReth1516              | 0
tcp.stream_depth_reached  | RxPFReth1516              | 66
tcp.reassembly_memuse     | RxPFReth1516              | 16419280000
tcp.reassembly_gap        | RxPFReth1516              | 150821
http.memuse               | RxPFReth1516              | 171054
http.memcap               | RxPFReth1516              | 0
detect.alert              | RxPFReth1516              | 872
flow_mgr.closed_pruned    | FlowManagerThread         | 40806816
flow_mgr.new_pruned       | FlowManagerThread         | 18269012
flow_mgr.est_pruned       | FlowManagerThread         | 8070620
flow.memuse               | FlowManagerThread         | 11742233656
flow.spare                | FlowManagerThread         | 40001391
flow.emerg_mode_entered   | FlowManagerThread         | 0
flow.emerg_mode_over      | FlowManagerThread         | 0

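For context, here is a quick back-of-the-envelope check of the loss from the counters above (numbers copied from the RxPFReth1516 thread):

```python
# Drop rate for the RxPFReth1516 thread, from the stats.log above.
kernel_packets = 86380604
kernel_drops = 195847
drop_pct = 100.0 * kernel_drops / kernel_packets
print(f"drop rate: {drop_pct:.2f}%")  # about 0.23%

# The TCP reassembly-gap rate is in the same ballpark:
tcp_pkts = 72885123
gaps = 150821
print(f"gap rate:  {100.0 * gaps / tcp_pkts:.2f}%")  # about 0.21%
```

So the overall loss is small in absolute terms, but as described above it is concentrated in bursts rather than spread evenly.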

Here is some info from Suricata's startup:
3/6/2014 -- 15:35:29 - <Notice> - This is Suricata version 2.0.1 RELEASE
3/6/2014 -- 15:35:29 - <Info> - CPUs/cores online: 16
3/6/2014 -- 15:35:29 - <Info> - Live rule reloads enabled
3/6/2014 -- 15:35:29 - <Info> - 'default' server has 'request-body-minimal-inspect-size' set to 33882 and 'request-body-inspect-window' set to 4053 after randomization.
3/6/2014 -- 15:35:29 - <Info> - 'default' server has 'response-body-minimal-inspect-size' set to 33695 and 'response-body-inspect-window' set to 4218 after randomization.
3/6/2014 -- 15:35:29 - <Info> - 'apache' server has 'request-body-minimal-inspect-size' set to 34116 and 'request-body-inspect-window' set to 3973 after randomization.
3/6/2014 -- 15:35:29 - <Info> - 'apache' server has 'response-body-minimal-inspect-size' set to 32229 and 'response-body-inspect-window' set to 4205 after randomization.
3/6/2014 -- 15:35:29 - <Info> - 'iis7' server has 'request-body-minimal-inspect-size' set to 32040 and 'request-body-inspect-window' set to 4118 after randomization.
3/6/2014 -- 15:35:29 - <Info> - 'iis7' server has 'response-body-minimal-inspect-size' set to 32694 and 'response-body-inspect-window' set to 4148 after randomization.
3/6/2014 -- 15:35:29 - <Info> - DNS request flood protection level: 500
3/6/2014 -- 15:35:29 - <Info> - DNS per flow memcap (state-memcap): 524288
3/6/2014 -- 15:35:29 - <Info> - DNS global memcap: 16777216
3/6/2014 -- 15:35:29 - <Info> - allocated 3670016 bytes of memory for the defrag hash... 65536 buckets of size 56
3/6/2014 -- 15:35:29 - <Info> - preallocated 65535 defrag trackers of size 152
3/6/2014 -- 15:35:29 - <Info> - defrag memory usage: 13631336 bytes, maximum: 2147483648
3/6/2014 -- 15:35:29 - <Info> - AutoFP mode using default "Active Packets" flow load balancer
3/6/2014 -- 15:35:30 - <Info> - preallocated 65000 packets. Total memory 227370000
3/6/2014 -- 15:35:30 - <Info> - allocated 6400000 bytes of memory for the host hash... 100000 buckets of size 64
3/6/2014 -- 15:35:30 - <Info> - preallocated 100000 hosts of size 112
3/6/2014 -- 15:35:30 - <Info> - host memory usage: 19200000 bytes, maximum: 10737418240
3/6/2014 -- 15:35:30 - <Info> - allocated 192000000 bytes of memory for the flow hash... 3000000 buckets of size 64
3/6/2014 -- 15:35:44 - <Info> - preallocated 40000000 flows of size 280
3/6/2014 -- 15:35:44 - <Info> - flow memory usage: 11712000000 bytes, maximum: 21474836480
3/6/2014 -- 15:35:44 - <Info> - IP reputation disabled
3/6/2014 -- 15:35:44 - <Info> - using magic-file /usr/share/file/magic
3/6/2014 -- 15:35:45 - <Info> - Delayed detect disabled

3/6/2014 -- 15:36:20 - <Info> - 50 rule files processed. 22607 rules successfully loaded, 8 rules failed
3/6/2014 -- 15:36:21 - <Info> - 22613 signatures processed. 1254 are IP-only rules, 9129 are inspecting packet payload, 15210 inspect application layer, 72 are decoder event only
3/6/2014 -- 15:36:21 - <Info> - building signature grouping structure, stage 1: preprocessing rules... complete
3/6/2014 -- 15:36:21 - <Info> - building signature grouping structure, stage 2: building source address list... complete
3/6/2014 -- 15:39:38 - <Info> - building signature grouping structure, stage 3: building destination address lists... complete
3/6/2014 -- 15:40:09 - <Info> - Threshold config parsed: 69 rule(s) found
3/6/2014 -- 15:40:09 - <Info> - Core dump size set to unlimited.
3/6/2014 -- 15:40:09 - <Info> - dropped the caps for main thread
3/6/2014 -- 15:40:09 - <Info> - Unified2-alert initialized: filename unified2.alert, limit 32 MB
3/6/2014 -- 15:40:09 - <Info> - Syslog output initialized
3/6/2014 -- 15:40:09 - <Info> - Adding interface eth17 from config file
3/6/2014 -- 15:40:09 - <Info> - Adding interface eth15 from config file
3/6/2014 -- 15:40:09 - <Info> - Using flow cluster mode for PF_RING (iface eth17)
3/6/2014 -- 15:40:09 - <Info> - Going to use 16 thread(s)
3/6/2014 -- 15:40:10 - <Info> - (RxPFReth171) Using PF_RING v.6.0.2, interface eth17, cluster-id 99
3/6/2014 -- 15:40:10 - <Info> - (RxPFReth172) Using PF_RING v.6.0.2, interface eth17, cluster-id 99
3/6/2014 -- 15:40:11 - <Info> - (RxPFReth173) Using PF_RING v.6.0.2, interface eth17, cluster-id 99
3/6/2014 -- 15:40:12 - <Info> - (RxPFReth174) Using PF_RING v.6.0.2, interface eth17, cluster-id 99
3/6/2014 -- 15:40:12 - <Info> - (RxPFReth175) Using PF_RING v.6.0.2, interface eth17, cluster-id 99
3/6/2014 -- 15:40:13 - <Info> - (RxPFReth176) Using PF_RING v.6.0.2, interface eth17, cluster-id 99
3/6/2014 -- 15:40:13 - <Info> - (RxPFReth177) Using PF_RING v.6.0.2, interface eth17, cluster-id 99
3/6/2014 -- 15:40:14 - <Info> - (RxPFReth178) Using PF_RING v.6.0.2, interface eth17, cluster-id 99
3/6/2014 -- 15:40:15 - <Info> - (RxPFReth179) Using PF_RING v.6.0.2, interface eth17, cluster-id 99
3/6/2014 -- 15:40:15 - <Info> - (RxPFReth1710) Using PF_RING v.6.0.2, interface eth17, cluster-id 99
3/6/2014 -- 15:40:16 - <Info> - (RxPFReth1711) Using PF_RING v.6.0.2, interface eth17, cluster-id 99
3/6/2014 -- 15:40:17 - <Info> - (RxPFReth1712) Using PF_RING v.6.0.2, interface eth17, cluster-id 99
3/6/2014 -- 15:40:17 - <Info> - (RxPFReth1713) Using PF_RING v.6.0.2, interface eth17, cluster-id 99
3/6/2014 -- 15:40:18 - <Info> - (RxPFReth1714) Using PF_RING v.6.0.2, interface eth17, cluster-id 99
3/6/2014 -- 15:40:19 - <Info> - (RxPFReth1715) Using PF_RING v.6.0.2, interface eth17, cluster-id 99
3/6/2014 -- 15:40:19 - <Info> - (RxPFReth1716) Using PF_RING v.6.0.2, interface eth17, cluster-id 99
3/6/2014 -- 15:40:19 - <Info> - Using flow cluster mode for PF_RING (iface eth15)
3/6/2014 -- 15:40:19 - <Info> - Going to use 16 thread(s)
3/6/2014 -- 15:40:20 - <Info> - (RxPFReth151) Using PF_RING v.6.0.2, interface eth15, cluster-id 99
3/6/2014 -- 15:40:21 - <Info> - (RxPFReth152) Using PF_RING v.6.0.2, interface eth15, cluster-id 99
3/6/2014 -- 15:40:22 - <Info> - (RxPFReth153) Using PF_RING v.6.0.2, interface eth15, cluster-id 99
3/6/2014 -- 15:40:22 - <Info> - (RxPFReth154) Using PF_RING v.6.0.2, interface eth15, cluster-id 99
3/6/2014 -- 15:40:23 - <Info> - (RxPFReth155) Using PF_RING v.6.0.2, interface eth15, cluster-id 99
3/6/2014 -- 15:40:24 - <Info> - (RxPFReth156) Using PF_RING v.6.0.2, interface eth15, cluster-id 99
3/6/2014 -- 15:40:24 - <Info> - (RxPFReth157) Using PF_RING v.6.0.2, interface eth15, cluster-id 99
3/6/2014 -- 15:40:25 - <Info> - (RxPFReth158) Using PF_RING v.6.0.2, interface eth15, cluster-id 99
3/6/2014 -- 15:40:26 - <Info> - (RxPFReth159) Using PF_RING v.6.0.2, interface eth15, cluster-id 99
3/6/2014 -- 15:40:27 - <Info> - (RxPFReth1510) Using PF_RING v.6.0.2, interface eth15, cluster-id 99
3/6/2014 -- 15:40:27 - <Info> - (RxPFReth1511) Using PF_RING v.6.0.2, interface eth15, cluster-id 99
3/6/2014 -- 15:40:28 - <Info> - (RxPFReth1512) Using PF_RING v.6.0.2, interface eth15, cluster-id 99
3/6/2014 -- 15:40:29 - <Info> - (RxPFReth1513) Using PF_RING v.6.0.2, interface eth15, cluster-id 99
3/6/2014 -- 15:40:29 - <Info> - (RxPFReth1514) Using PF_RING v.6.0.2, interface eth15, cluster-id 99
3/6/2014 -- 15:40:30 - <Info> - (RxPFReth1515) Using PF_RING v.6.0.2, interface eth15, cluster-id 99
3/6/2014 -- 15:40:31 - <Info> - (RxPFReth1516) Using PF_RING v.6.0.2, interface eth15, cluster-id 99
3/6/2014 -- 15:40:31 - <Info> - RunModeIdsPfringWorkers initialised
3/6/2014 -- 15:40:31 - <Info> - stream "prealloc-sessions": 2000000 (per thread)
3/6/2014 -- 15:40:31 - <Info> - stream "memcap": 85899345920
3/6/2014 -- 15:40:31 - <Info> - stream "midstream" session pickups: disabled
3/6/2014 -- 15:40:31 - <Info> - stream "async-oneside": disabled
3/6/2014 -- 15:40:31 - <Info> - stream "checksum-validation": disabled
3/6/2014 -- 15:40:31 - <Info> - stream."inline": disabled
3/6/2014 -- 15:40:31 - <Info> - stream "max-synack-queued": 5
3/6/2014 -- 15:40:31 - <Info> - stream.reassembly "memcap": 42949672960
3/6/2014 -- 15:40:31 - <Info> - stream.reassembly "depth": 4194304
3/6/2014 -- 15:40:31 - <Info> - stream.reassembly "toserver-chunk-size": 2617
3/6/2014 -- 15:40:31 - <Info> - stream.reassembly "toclient-chunk-size": 2631
3/6/2014 -- 15:40:31 - <Info> - stream.reassembly.raw: enabled
3/6/2014 -- 15:40:31 - <Info> - segment pool: pktsize 4, prealloc 15000
3/6/2014 -- 15:40:31 - <Info> - segment pool: pktsize 16, prealloc 20000
3/6/2014 -- 15:40:31 - <Info> - segment pool: pktsize 112, prealloc 100000
3/6/2014 -- 15:40:31 - <Info> - segment pool: pktsize 248, prealloc 100000
3/6/2014 -- 15:40:31 - <Info> - segment pool: pktsize 512, prealloc 100000
3/6/2014 -- 15:40:31 - <Info> - segment pool: pktsize 768, prealloc 100000
3/6/2014 -- 15:40:32 - <Info> - segment pool: pktsize 1448, prealloc 1000000
3/6/2014 -- 15:40:33 - <Info> - segment pool: pktsize 65535, prealloc 100000
3/6/2014 -- 15:40:33 - <Info> - stream.reassembly "chunk-prealloc": 2000000
3/6/2014 -- 15:40:40 - <Notice> - all 32 packet processing threads, 3 management threads initialized, engine started.

I am going crazy trying to figure out where the bottleneck is. Any help would be appreciated.

Thanks.