[Oisf-users] 3.2 - Wildly Dropping Packets

Cloherty, Sean E scloherty at mitre.org
Tue Dec 6 17:39:48 UTC 2016


I've implemented the suggested changes except where noted below.  At the midday traffic peak with these settings, the drop rate is now 11%.


"in the af-packet section try 14 threads as opposed to 28. When you do that - increase the ring-size: 150000 (as the value is per thread)" 
- The machine has 16 hyper-threaded cores, so 32 threads, and I'm not using any CPU affinity.  Also, the manual notes that more CPU is better than more RAM (7.5 High Performance Consideration).  Is there an optimal balance or ratio?  I've lowered it to 16 for now.
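For reference, if I do end up pinning threads, the threading section is where it would go. A minimal sketch with illustrative values only - the CPU ranges below are assumptions, not a recommendation:

```yaml
# Sketch: CPU affinity in suricata.yaml (illustrative values, untested here).
threading:
  set-cpu-affinity: yes
  cpu-affinity:
    - management-cpu-set:
        cpu: [ 0, 1 ]          # assumption: reserve two cores for management threads
    - worker-cpu-set:
        cpu: [ "2-15" ]        # assumption: remaining physical cores for workers
        mode: "exclusive"
```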

"- switch midstream and async-oneside off (false) - unless you explicitly want it that way for a reason. Also give depth a definitive number as opposed to 0 (unlimited)"
- I enabled them to test whether some rules were failing to fire because of unexamined traffic, or because the set depth was reached before an email attachment was examined (rules troubleshooting).  What impact will the change make?  I've left them as is for now.

"- switch mpm algos. ac-ks is the most performant out of the box with hyperscan being the best when/where available (at least in our tests)."
- I've tried a few algos to see if they would impact rules which content-match email attachments, but I've had no luck. I expect that this will impact CPU or RAM utilization and drops, but can different algorithms affect whether rules fire or not?

"- adjust max-pending-packets: 65534"
- https://home.regit.org/2012/07/suricata-to-10gbps-and-beyond/  is where I got the idea that a higher max-pending-packets wouldn't be useful in workers mode, based on a question from Victor Julien about workers mode. Have changes to Suricata since then invalidated that?

" I noticed that you load about 1500 rules - are those just selected/narrowed from a bigger ruleset, or do you write your own rules as well?"
- The rules are primarily our own rules.

" I think the fact that you have the counters displaying no drops at all is misleading, the result of a suboptimal config combination."
- Well that took the wind from my sails.  Should I always expect at least some drops no matter the server hardware and NIC config?



-----Original Message-----
From: Peter Manev [mailto:petermanev at gmail.com] 
Sent: Monday, December 05, 2016 19:11 PM
To: Cloherty, Sean E <scloherty at mitre.org>
Cc: oisf-users at lists.openinfosecfoundation.org
Subject: Re: [Oisf-users] 3.2 - Wildly Dropping Packets

On Mon, Dec 5, 2016 at 8:28 PM, Cloherty, Sean E <scloherty at mitre.org> wrote:
> Latest - at peak traffic today around noon, the drop rate was around 14% which is a vast improvement.  I have 4 other servers with the same hardware and similar traffic, all running 3.1.3 that have almost no packet drop at all.

I think the fact that you have the counters displaying no drops at all is misleading, the result of a suboptimal config combination.

I had gone through the yaml file provided. With the HW you have, I believe you should be able to grind through much more than 2Gbps.

I noticed that you load about 1500 rules - are those just selected/narrowed from a bigger ruleset, or do you write your own rules as well?

Some more suggestions and comments:

in the af-packet section
- the rollover option that was on is highly experimental atm.
- buffer-size should only be used in the case of non-mmap sockets (regit was just updating me). ring-size is the one that matters with af-packet and mmap - "To use the ring feature of AF_PACKET, set 'use-mmap' to yes" - and I have only seen better results with it.

Some more suggestions:

- in the af-packet section try 14 threads as opposed to 28. When you do that - increase the ring-size: 150000 (as the value is per thread)
- adjust max-pending-packets: 65534
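Put together, those af-packet suggestions plus max-pending-packets would land in suricata.yaml roughly as in this sketch. The interface name is taken from the ethtool example later in this mail; cluster-id and cluster-type are illustrative assumptions, not part of the original suggestions:

```yaml
# Sketch only - values per the suggestions above; tune for your box.
max-pending-packets: 65534

af-packet:
  - interface: ens1f1          # assumption: the capture NIC from the ethtool example
    threads: 14                # fewer threads than cores, as suggested
    cluster-id: 99             # illustrative
    cluster-type: cluster_flow
    use-mmap: yes
    ring-size: 150000          # per-thread, so 14 x 150000 slots in total
```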

- try custom number of groups
detect:
  profile: custom
  custom-values:
    toclient-groups: 500
    toserver-groups: 500

- and port groupings

  grouping:
    tcp-whitelist: 53, 80, 139, 443, 445, 1433, 3306, 3389, 6666, 6667, 8080
    udp-whitelist: 53, 135, 5060

This increases performance as a result of the major rewrite of the detection engine and the simplified rule grouping in 3.1.

- switch midstream and async-oneside off (false) - unless you explicitly want it that way for a reason. Also give depth a definitive number as opposed to 0 (unlimited)

stream:
  memcap: 12gb
  checksum-validation: no       # reject wrong csums
  inline: no                    # auto will use inline mode in IPS mode; yes or no sets it statically
  midstream: false           # don't allow midstream session pickups
  async-oneside: false
  reassembly:
    memcap: 32gb
    depth: 2mb

- switch mpm algos. ac-ks is the most performant out of the box with hyperscan being the best when/where available (at least in our tests).

mpm-algo: ac-ks

- make sure that line stays enabled. Suricata will switch things like gro/lro off, but not all of the below:

ethtool -K ens1f1 sg off gro off lro off tso off gso off rx off tx off rxvlan off txvlan off
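A small sketch of how that command could be generated from a startup script so the feature list stays in one place. It only prints the command for review rather than running it; the interface name mirrors the example above:

```shell
#!/bin/sh
# Build the offload-disable command from a single feature list and print it
# for review before running it (interface name from the example above).
IFACE=ens1f1
FEATURES="sg gro lro tso gso rx tx rxvlan txvlan"
CMD="ethtool -K $IFACE"
for f in $FEATURES; do
  CMD="$CMD $f off"
done
echo "$CMD"
```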

Please let us know how it goes.

Thanks


>
> -----Original Message-----
> From: Oisf-users 
> [mailto:oisf-users-bounces at lists.openinfosecfoundation.org] On Behalf 
> Of Cloherty, Sean E
> Sent: Friday, December 02, 2016 16:31 PM
> To: Peter Manev <petermanev at gmail.com>
> Cc: oisf-users at lists.openinfosecfoundation.org
> Subject: Re: [Oisf-users] 3.2 - Wildly Dropping Packets
>
> I've attached the latest copy after making the changes to the af-packet settings as you had noted.  Although the traffic tapered off later in the day, the stats appear to have improved.  I'm puzzled that commenting out the buffer-size setting (which I'd set to double the default) seems to have improved performance.  One observation of note - the number of CPUs in use and their % utilization increased significantly.  Previously I saw a few - maybe 5-6 - running at 30-40%.  Now there were 10-14 running at 30-70%.  I don't expect to get a good read under load until mid-morning Monday.
>
> Are there any other changes you'd recommend, or yaml settings I missed?  Should I re-enable the ethtool offload setting I had remarked out?
>
> 1/12/2016 -- 13:26:03 - <Notice> - Stats for 'ens1f1':  pkts: 3022883600, drop: 2562703071 (84.78%), invalid chksum: 0
> 1/12/2016 -- 15:11:44 - <Notice> - Stats for 'ens1f1':  pkts: 1116249532, drop: 1017784357 (91.18%), invalid chksum: 0
> 1/12/2016 -- 15:14:21 - <Notice> - Stats for 'ens1f1':  pkts: 1640124, drop: 989798 (60.35%), invalid chksum: 4
> 2/12/2016 -- 13:13:56 - <Notice> - Stats for 'ens1f1':  pkts: 7249810905, drop: 229091986 (3.16%), invalid chksum: 30830
> 2/12/2016 -- 16:19:43 - <Notice> - Stats for 'ens1f1':  pkts: 1846855151, drop: 343284745 (18.59%), invalid chksum: 0
>
> Thanks again.
>
> -----Original Message-----
> From: Peter Manev [mailto:petermanev at gmail.com]
> Sent: Friday, December 02, 2016 12:23 PM
> To: Cloherty, Sean E <scloherty at mitre.org>
> Cc: oisf-users at lists.openinfosecfoundation.org
> Subject: Re: [Oisf-users] 3.2 - Wildly Dropping Packets
>
> On Fri, Dec 2, 2016 at 2:21 PM, Cloherty, Sean E <scloherty at mitre.org> wrote:
>> Thanks for the quick response -
>>
>> My startup script, suricata.log and suricata.yaml are attached. One note - the log entries are from a modified yaml where I was testing the unix sockets, but if you ignore those errors, the output is pretty much identical to what is usually displayed.
>>
>> OS info is:
>> CentOS Linux release 7.2.1511 (Core) /  3.10.0-327.36.3.el7.x86_64
>> The server has 16 cores / 32 threads, 128GB of RAM, and a 10Gb Intel NIC running 5.2.15-k ixgbe drivers.
>> Max traffic seen on the interface in the last 4 months has been 1.9 Gb/s, but mid-day peaks are usually around 1.3 Gb/s.
>>
>
> I have a suggestion for you to try out - if you could.
>
> With Suricata 3.2 - in the af-packet section comment out:
> ->    rollover: yes
> ->    buffer-size: 65536
> like so
> ->    #rollover: yes
> ->    #buffer-size: 65536
>
>
> restart (or start) Suricata 3.2. Let it run for a bit and if you could please share the results from stats.log again?
>
> Thanks
>
>> Sean.
>>
>> -----Original Message-----
>> From: Peter Manev [mailto:petermanev at gmail.com]
>> Sent: Friday, December 02, 2016 02:51 AM
>> To: Cloherty, Sean E <scloherty at mitre.org>
>> Cc: oisf-users at lists.openinfosecfoundation.org
>> Subject: Re: [Oisf-users] 3.2 - Wildly Dropping Packets
>>
>> On Thu, Dec 1, 2016 at 8:51 PM, Cloherty, Sean E <scloherty at mitre.org> wrote:
>>>
>>> Thankfully this is a test box, but it has been cooking along with a 
>>> less than 1% drop rate until I upgraded from 3.1.3 to 3.2
>>>
>>>
>>>
>>> ------------------------------------------------------------------------------------
>>>
>>> Date: 12/1/2016 -- 13:18:53 (uptime: 0d, 04h 29m 24s)
>>>
>>> ------------------------------------------------------------------------------------
>>>
>>> Counter                                    | TM Name                   | Value
>>>
>>> ------------------------------------------------------------------------------------
>>>
>>> capture.kernel_packets                     | Total                     | 2926059934
>>>
>>> capture.kernel_drops                       | Total                     | 2471792091
>>>
>>> decoder.pkts                               | Total                     | 451535597
>>>
>>> decoder.bytes                              | Total                     | 273993787357
>>>
>>> decoder.ipv4                               | Total                     | 451533977
>>>
>>> decoder.ipv6                               | Total                     | 3194
>>>
>>> decoder.ethernet                           | Total                     | 451535597
>>>
>>> decoder.tcp                                | Total                     | 340732185
>>>
>>> decoder.udp                                | Total                     | 109126355
>>>
>>> decoder.sctp                               | Total                     | 5
>>>
>>> decoder.icmpv4                             | Total                     | 62733
>>>
>>> decoder.icmpv6                             | Total                     | 782
>>>
>>> decoder.gre                                | Total                     | 425
>>>
>>> decoder.teredo                             | Total                     | 2280
>>>
>>> decoder.avg_pkt_size                       | Total                     | 606
>>>
>>> decoder.max_pkt_size                       | Total                     | 1514
>>>
>>> defrag.ipv4.fragments                      | Total                     | 1495
>>>
>>> defrag.ipv4.reassembled                    | Total                     | 626
>>>
>>> defrag.ipv6.fragments                      | Total                     | 26
>>>
>>> tcp.sessions                               | Total                     | 9529307
>>>
>>> tcp.pseudo                                 | Total                     | 358711
>>>
>>> tcp.syn                                    | Total                     | 4198604
>>>
>>> tcp.synack                                 | Total                     | 2568583
>>>
>>> tcp.rst                                    | Total                     | 3300939
>>>
>>> tcp.reassembly_gap                         | Total                     | 3687801
>>>
>>> detect.alert                               | Total                     | 39
>>>
>>> detect.nonmpm_list                         | Total                     | 4
>>>
>>> app_layer.flow.http                        | Total                     | 435661
>>>
>>> app_layer.tx.http                          | Total                     | 1705795
>>>
>>> app_layer.tx.smtp                          | Total                     | 5009
>>>
>>> app_layer.flow.tls                         | Total                     | 245724
>>>
>>> app_layer.flow.ssh                         | Total                     | 835
>>>
>>> app_layer.flow.dcerpc_tcp                  | Total                     | 17
>>>
>>> app_layer.flow.dns_tcp                     | Total                     | 49
>>>
>>> app_layer.tx.dns_tcp                       | Total                     | 98
>>>
>>> app_layer.flow.failed_tcp                  | Total                     | 2754586
>>>
>>> app_layer.flow.dcerpc_udp                  | Total                     | 4
>>>
>>> app_layer.flow.dns_udp                     | Total                     | 265532
>>>
>>> app_layer.tx.dns_udp                       | Total                     | 281469
>>>
>>> app_layer.flow.failed_udp                  | Total                     | 2327184
>>>
>>> flow_mgr.closed_pruned                     | Total                     | 1628718
>>>
>>> flow_mgr.new_pruned                        | Total                     | 3996279
>>>
>>> flow_mgr.est_pruned                        | Total                     | 6816703
>>>
>>> flow.spare                                 | Total                     | 10278
>>>
>>> flow.tcp_reuse                             | Total                     | 204468
>>>
>>> flow_mgr.flows_checked                     | Total                     | 14525
>>>
>>> flow_mgr.flows_notimeout                   | Total                     | 13455
>>>
>>> flow_mgr.flows_timeout                     | Total                     | 1070
>>>
>>> flow_mgr.flows_timeout_inuse               | Total                     | 171
>>>
>>> flow_mgr.flows_removed                     | Total                     | 899
>>>
>>> flow_mgr.rows_checked                      | Total                     | 65536
>>>
>>> flow_mgr.rows_skipped                      | Total                     | 62883
>>>
>>> flow_mgr.rows_empty                        | Total                     | 3
>>>
>>> flow_mgr.rows_busy                         | Total                     | 1
>>>
>>> flow_mgr.rows_maxlen                       | Total                     | 15
>>>
>>> tcp.memuse                                 | Total                     | 66079224
>>>
>>> tcp.reassembly_memuse                      | Total                     | 16619040438
>>>
>>> dns.memuse                                 | Total                     | 2244149
>>>
>>> http.memuse                                | Total                     | 335827011
>>>
>>> flow.memuse                                | Total                     | 94227136
>>>
>>>
>>
>>
>> Interesting.
>> No memcap hits. Is the only change the upgrade from 3.1.3 to 3.2 (nothing else)?
>>
>> I would like to reproduce this.
>> Can you please share your suricata.log and your suricata.yaml (feel free to do it privately if you would like)?
>>
>> What is your start command and OS you are running on?
>>
>> Thank you
>>
>>
>>
>> --
>> Regards,
>> Peter Manev
>>
>
>
>
> --
> Regards,
> Peter Manev
>



--
Regards,
Peter Manev


