[Oisf-users] 3.2 - Wildly Dropping Packets

Peter Manev petermanev at gmail.com
Tue Dec 6 00:11:15 UTC 2016


On Mon, Dec 5, 2016 at 8:28 PM, Cloherty, Sean E <scloherty at mitre.org> wrote:
> Latest - at peak traffic today around noon, the drop rate was around 14%, which is a vast improvement.  I have 4 other servers with the same hardware and similar traffic, all running 3.1.3, that have almost no packet drop at all.

I think the fact that your counters display no drops at all is
misleading - it is likely the result of a suboptimal config combination.

I have gone through the yaml file you provided. With the HW you have,
I believe you should be able to handle much more than 2 Gbps.

I noticed that you load about 1500 rules - are those just
selected/narrowed down from a bigger ruleset, or do you write your own
rules as well?

Some more suggestions and comments:

In the af-packet section:
- the rollover option (which you had on) is highly experimental atm.
- buffer-size should only be used in the case of non-mmap sockets
(regit was just updating me). ring-size is the setting that matters
with af-packet and mmap - "To use the ring feature of AF_PACKET, set
'use-mmap' to yes" - and I have only seen better results with it.
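
For reference, that part of the af-packet section would then look
something along these lines (just a sketch of the relevant lines; the
interface name is taken from your stats output, keep the rest of your
settings as they are):

af-packet:
  - interface: ens1f1
    use-mmap: yes
    #rollover: yes        # highly experimental - leave disabled
    #buffer-size: 65536   # only applies to non-mmap sockets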

Some more suggestions:

- in the af-packet section try 14 threads as opposed to 28. When you
do that, increase ring-size to 150000 (the value is per thread) - see
the sketch below.
- adjust max-pending-packets: 65534
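
A sketch of just those settings (keep the rest of your af-packet
section as it is; note that max-pending-packets sits at the top level
of the yaml, not under af-packet):

af-packet:
  - interface: ens1f1
    threads: 14
    ring-size: 150000

max-pending-packets: 65534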

- try a custom number of groups:
detect:
  profile: custom
  custom-values:
    toclient-groups: 500
    toserver-groups: 500

- and port groupings

  grouping:
    tcp-whitelist: 53, 80, 139, 443, 445, 1433, 3306, 3389, 6666, 6667, 8080
    udp-whitelist: 53, 135, 5060

These can increase performance as a result of the major rewrite of the
detection engine and the simplified rule grouping in 3.1.

- switch midstream and async-oneside off (false) - unless you
explicitly want them on for a reason. Also give reassembly depth a
definite number as opposed to 0 (unlimited):

stream:
  memcap: 12gb
  checksum-validation: no       # reject wrong csums
  inline: no                    # auto will use inline mode in IPS mode, yes or no set it statically
  midstream: false              # don't allow midstream session pickups
  async-oneside: false
  reassembly:
    memcap: 32gb
    depth: 2mb

- switch mpm algos. ac-ks is the most performant out of the box, with
Hyperscan being the best when/where available (at least in our tests).

mpm-algo: ac-ks
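
To check whether your build has Hyperscan available, something like
this should work:

suricata --build-info | grep -i hyperscan

and if it does, switching is just:

mpm-algo: hs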

- make sure that line stays enabled. Suricata will switch things like
gro/lro off, but not all of the below:

ethtool -K ens1f1 sg off gro off lro off tso off gso off rx off tx off rxvlan off txvlan off
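
After restarting you can double-check that the offloads are actually
off - lowercase -k prints the current state:

ethtool -k ens1f1 | grep -E 'offload|segmentation'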

Please let us know how it goes.

Thanks


>
> -----Original Message-----
> From: Oisf-users [mailto:oisf-users-bounces at lists.openinfosecfoundation.org] On Behalf Of Cloherty, Sean E
> Sent: Friday, December 02, 2016 16:31 PM
> To: Peter Manev <petermanev at gmail.com>
> Cc: oisf-users at lists.openinfosecfoundation.org
> Subject: Re: [Oisf-users] 3.2 - Wildly Dropping Packets
>
> I've attached the latest copy after making the changes to af-packet settings as you had noted.  Although the traffic tapered off later in the day, the stats appear to have improved.  I'm puzzled that commenting out the buffers setting (which I'd set to double the default) seems to have improved performance.  One observation of note - the number of CPUs in use and % utilization increased significantly.  Previously I saw a few - maybe 5-6 running at 30-40%.  Now there were 10-14 running at 30-70 %.  Until mid-morning Monday, I don't expect to get a good read while under load.
>
> Are there any other changes that you'd recommend / yaml settings I missed? Should I re-enable the ethtool offload setting I had remarked out?
>
> 1/12/2016 -- 13:26:03 - <Notice> - Stats for 'ens1f1':  pkts: 3022883600, drop: 2562703071 (84.78%), invalid chksum: 0
> 1/12/2016 -- 15:11:44 - <Notice> - Stats for 'ens1f1':  pkts: 1116249532, drop: 1017784357 (91.18%), invalid chksum: 0
> 1/12/2016 -- 15:14:21 - <Notice> - Stats for 'ens1f1':  pkts: 1640124, drop: 989798 (60.35%), invalid chksum: 4
> 2/12/2016 -- 13:13:56 - <Notice> - Stats for 'ens1f1':  pkts: 7249810905, drop: 229091986 (3.16%), invalid chksum: 30830
> 2/12/2016 -- 16:19:43 - <Notice> - Stats for 'ens1f1':  pkts: 1846855151, drop: 343284745 (18.59%), invalid chksum: 0
>
> Thanks again.
>
> -----Original Message-----
> From: Peter Manev [mailto:petermanev at gmail.com]
> Sent: Friday, December 02, 2016 12:23 PM
> To: Cloherty, Sean E <scloherty at mitre.org>
> Cc: oisf-users at lists.openinfosecfoundation.org
> Subject: Re: [Oisf-users] 3.2 - Wildly Dropping Packets
>
> On Fri, Dec 2, 2016 at 2:21 PM, Cloherty, Sean E <scloherty at mitre.org> wrote:
>> Thanks for the quick response -
>>
>> My startup script, suricata.log and suricata.yaml are attached. One note - the entries on the log are with a modified yaml where I was testing the unix sockets but if you ignore those errors, the output is pretty much identical to what is usually displayed.
>>
>> OS info is:
>> CentOS Linux release 7.2.1511 (Core) / 3.10.0-327.36.3.el7.x86_64. The
>> server has 16 cores / 32 threads, 128GB of RAM, and a 10Gb Intel NIC running 5.2.15-k ixgbe drivers.
>> Max traffic seen on the interface in the last 4 months has been 1.9
>> Gb/s, but usually mid-day peaks are around 1.3 Gb/s
>>
>
> I have a suggestion for you to try out - if you could.
>
> With Suricata 3.2 - in the af-packet section comment out:
> ->    rollover: yes
> ->    buffer-size: 65536
> like so
> ->    #rollover: yes
> ->    #buffer-size: 65536
>
>
> restart (or start) Suricata 3.2. Let it run for a bit and if you could please share the results from stats.log again?
>
> Thanks
>
>> Sean.
>>
>> -----Original Message-----
>> From: Peter Manev [mailto:petermanev at gmail.com]
>> Sent: Friday, December 02, 2016 02:51 AM
>> To: Cloherty, Sean E <scloherty at mitre.org>
>> Cc: oisf-users at lists.openinfosecfoundation.org
>> Subject: Re: [Oisf-users] 3.2 - Wildly Dropping Packets
>>
>> On Thu, Dec 1, 2016 at 8:51 PM, Cloherty, Sean E <scloherty at mitre.org> wrote:
>>>
>>> Thankfully this is a test box, but it has been cooking along with a
>>> less than 1% drop rate until I upgraded from 3.1.3 to 3.2
>>>
>>>
>>>
>>> --------------------------------------------------------------------------------------
>>>
>>> Date: 12/1/2016 -- 13:18:53 (uptime: 0d, 04h 29m 24s)
>>>
>>> --------------------------------------------------------------------------------------
>>>
>>> Counter                                    | TM Name                   | Value
>>>
>>> --------------------------------------------------------------------------------------
>>>
>>> capture.kernel_packets                     | Total                     | 2926059934
>>>
>>> capture.kernel_drops                       | Total                     | 2471792091
>>>
>>> decoder.pkts                               | Total                     | 451535597
>>>
>>> decoder.bytes                              | Total                     | 273993787357
>>>
>>> decoder.ipv4                               | Total                     | 451533977
>>>
>>> decoder.ipv6                               | Total                     | 3194
>>>
>>> decoder.ethernet                           | Total                     | 451535597
>>>
>>> decoder.tcp                                | Total                     | 340732185
>>>
>>> decoder.udp                                | Total                     | 109126355
>>>
>>> decoder.sctp                               | Total                     | 5
>>>
>>> decoder.icmpv4                             | Total                     | 62733
>>>
>>> decoder.icmpv6                             | Total                     | 782
>>>
>>> decoder.gre                                | Total                     | 425
>>>
>>> decoder.teredo                             | Total                     | 2280
>>>
>>> decoder.avg_pkt_size                       | Total                     | 606
>>>
>>> decoder.max_pkt_size                       | Total                     | 1514
>>>
>>> defrag.ipv4.fragments                      | Total                     | 1495
>>>
>>> defrag.ipv4.reassembled                    | Total                     | 626
>>>
>>> defrag.ipv6.fragments                      | Total                     | 26
>>>
>>> tcp.sessions                               | Total                     | 9529307
>>>
>>> tcp.pseudo                                 | Total                     | 358711
>>>
>>> tcp.syn                                    | Total                     | 4198604
>>>
>>> tcp.synack                                 | Total                     | 2568583
>>>
>>> tcp.rst                                    | Total                     | 3300939
>>>
>>> tcp.reassembly_gap                         | Total                     | 3687801
>>>
>>> detect.alert                               | Total                     | 39
>>>
>>> detect.nonmpm_list                         | Total                     | 4
>>>
>>> app_layer.flow.http                        | Total                     | 435661
>>>
>>> app_layer.tx.http                          | Total                     | 1705795
>>>
>>> app_layer.tx.smtp                          | Total                     | 5009
>>>
>>> app_layer.flow.tls                         | Total                     | 245724
>>>
>>> app_layer.flow.ssh                         | Total                     | 835
>>>
>>> app_layer.flow.dcerpc_tcp                  | Total                     | 17
>>>
>>> app_layer.flow.dns_tcp                     | Total                     | 49
>>>
>>> app_layer.tx.dns_tcp                       | Total                     | 98
>>>
>>> app_layer.flow.failed_tcp                  | Total                     | 2754586
>>>
>>> app_layer.flow.dcerpc_udp                  | Total                     | 4
>>>
>>> app_layer.flow.dns_udp                     | Total                     | 265532
>>>
>>> app_layer.tx.dns_udp                       | Total                     | 281469
>>>
>>> app_layer.flow.failed_udp                  | Total                     | 2327184
>>>
>>> flow_mgr.closed_pruned                     | Total                     | 1628718
>>>
>>> flow_mgr.new_pruned                        | Total                     | 3996279
>>>
>>> flow_mgr.est_pruned                        | Total                     | 6816703
>>>
>>> flow.spare                                 | Total                     | 10278
>>>
>>> flow.tcp_reuse                             | Total                     | 204468
>>>
>>> flow_mgr.flows_checked                     | Total                     | 14525
>>>
>>> flow_mgr.flows_notimeout                   | Total                     | 13455
>>>
>>> flow_mgr.flows_timeout                     | Total                     | 1070
>>>
>>> flow_mgr.flows_timeout_inuse               | Total                     | 171
>>>
>>> flow_mgr.flows_removed                     | Total                     | 899
>>>
>>> flow_mgr.rows_checked                      | Total                     | 65536
>>>
>>> flow_mgr.rows_skipped                      | Total                     | 62883
>>>
>>> flow_mgr.rows_empty                        | Total                     | 3
>>>
>>> flow_mgr.rows_busy                         | Total                     | 1
>>>
>>> flow_mgr.rows_maxlen                       | Total                     | 15
>>>
>>> tcp.memuse                                 | Total                     | 66079224
>>>
>>> tcp.reassembly_memuse                      | Total                     | 16619040438
>>>
>>> dns.memuse                                 | Total                     | 2244149
>>>
>>> http.memuse                                | Total                     | 335827011
>>>
>>> flow.memuse                                | Total                     | 94227136
>>>
>>>
>>
>>
>> Interesting.
>> No memcap hits. Is the only change the upgrade from 3.1.3 to 3.2 (nothing
>> else)?
>>
>> I would like to reproduce this.
>> Can you please share your suricata.log and your suricata.yaml (feel free to do it privately if you would like)?
>>
>> What is your start command and OS you are running on?
>>
>> Thank you
>>
>>
>>
>> --
>> Regards,
>> Peter Manev
>>
>
>
>
> --
> Regards,
> Peter Manev
>



-- 
Regards,
Peter Manev


