[Oisf-users] [EXT] Re: Inconsistent packet dropped behaviour with the same config on several nodes

Peter Manev petermanev at gmail.com
Wed Feb 13 19:30:20 UTC 2019


On Thu, Feb 7, 2019 at 11:34 PM Cloherty, Sean E <scloherty at mitre.org> wrote:
>
> Peter,
>
> The rule that is firing out of control on 4.1.x but not on 4.0.6 is:
>
> alert tls any any -> any any (msg:"SURICATA TLS invalid heartbeat encountered, possible exploit attempt (heartbleed)"; flow:established; app-layer-event:tls.invalid_heartbeat_message; flowint:tls.anomaly.count,+,1; classtype:protocol-command-decode; reference:cve,2014-0160; sid:2230013; rev:1;)
>
> I see about 24 alerts in 48 hours on the production 4.0.6 host.  For the same src/dst/time I see about the same number of alerts on the 4.1.2 host.  So that is good.
>
> The total number of alerts on the 4.1.2 host in that time period is 7707.  The alerts that don't match those on 4.0.6 are all between our proxies and user laptops, but only when the laptops are attached to our network via the VPN.
>
> I am going to rebuild our test server back to 4.0.6 (see the Eric Urban thread) and see if the issue persists.  I don't plan to change the yaml / vars / startup at all, to keep things as similar as possible.

Ok - please keep us posted.
Thank you for your feedback!

>
> Sean
>
> -----Original Message-----
> From: Peter Manev <petermanev at gmail.com>
> Sent: Monday, January 21, 2019 3:13 AM
> To: Cloherty, Sean E <scloherty at mitre.org>
> Cc: lists at inliniac.net; oisf-users at lists.openinfosecfoundation.org
> Subject: Re: [EXT] Re: [Oisf-users] Inconsistent packet dropped behaviour with the same config on several nodes
>
> On Thu, Jan 17, 2019 at 8:05 PM Cloherty, Sean E <scloherty at mitre.org> wrote:
> >
> > Peter,
> >
> > Answers to your questions -
> >
> > 1. I use af-packet/workers/cluster-flow on all of my hosts.
> >
> > 2. The rule(s) firing way too often fire >99% of the time on traffic from VPN-connected laptops to our proxies.  Since they don't fire on the same traffic on 4.0.6 hosts with the same vars, rules and traffic, one thought is that something changed in the way Suricata parses the network address variables.  In my case I wonder about the double negation required because our tap is located just before the proxy.  Simplified example:
> >
>
> Is it a custom sig? Or ET/Open/Pro etc.?
>
> > PROXYB=[IP1,IP2,IP3]   <These IPs are within the ranges of CIDR1>
> > PROXYM=[IP4,IP5,IP6] <These IPs are within the ranges of CIDR2>
> > HOME_NET = [CIDR1,CIDR2,CIDR3,!$PROXYB,$PROXYM]
> > EXTERNAL_NET: "!$HOME_NET"
> >
> > Is there a more elegant way to express this? While on the topic of variables, is there a rule about when to use quotes vs. quotes and brackets around the values?  I didn't see any reference in the docs.
> >
>
> I have a very similar setup - it works quite well there.
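> For reference, mine follow the stock suricata.yaml style - the whole value in quotes, with brackets only around lists - roughly like this (the addresses below are just illustrative):
>
> vars:
>   address-groups:
>     PROXYB: "[10.1.1.1,10.1.1.2,10.1.1.3]"
>     HOME_NET: "[10.0.0.0/8,192.168.0.0/16,!$PROXYB]"
>     EXTERNAL_NET: "!$HOME_NET"
>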
> Can you reproduce this behavior - the sigs that fire too often (>99% on VPN traffic) - with a pcap?
>
> Thank you
>
> > Still seeing some odd variance between my two 4.1.2 test hosts: the host with more traffic and fewer CPUs has no drops, while the one with more CPUs and less traffic goes as high as 15%, with the same yaml / vars / etc.  I even did a diff on --dump-config and see little difference.
> > -----Original Message-----
> > From: Peter Manev <petermanev at gmail.com>
> > Sent: Tuesday, January 15, 2019 9:16 AM
> > To: Cloherty, Sean E <scloherty at mitre.org>
> > Cc: lists at inliniac.net; oisf-users at lists.openinfosecfoundation.org
> > Subject: Re: [EXT] Re: [Oisf-users] Inconsistent packet dropped
> > behaviour with the same config on several nodes
> >
> > On Fri, Jan 11, 2019 at 4:54 PM Cloherty, Sean E <scloherty at mitre.org> wrote:
> > >
> > > Going over everything carefully, I found that a recent OS update reverted to using the RHEL ixgbe drivers.  I replaced it with the 5.5.2 Intel one and now there have been no drops in 36 hours.  Well, OK, it says 6 drops, but I can live with that.
> > >
> >
> > Thank you for the feedback !
> >
> > > I still have
> > > tcp.pkt_on_wrong_thread                    | Total                     | 115035286
> > >
> >
> >  Do you use "cluster-type: cluster_flow"  for af-packet
> > https://github.com/OISF/suricata/blob/master/suricata.yaml.in#L644  ?
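> > i.e. something along these lines in the af-packet section (the interface name is just an example):
> >
> > af-packet:
> >   - interface: eth4
> >     cluster-id: 99
> >     cluster-type: cluster_flow
> >     defrag: yes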
> >
> > > and a couple of signatures firing way too often compared to 4.0.x to look into.
> > >
> >
> > If possible, could you please share the sig IDs? (after confirming they
> > fire way more often than they should)
> >
> > Thank you
> >
> >
> > > -----Original Message-----
> > > From: Cloherty, Sean E
> > > Sent: Wednesday, January 9, 2019 10:20 AM
> > > To: 'Peter Manev' <petermanev at gmail.com>
> > > Cc: lists at inliniac.net; oisf-users at lists.openinfosecfoundation.org
> > > Subject: RE: [EXT] Re: [Oisf-users] Inconsistent packet dropped
> > > behaviour with the same config on several nodes
> > >
> > > Thanks for pointing out the open issue.  I will definitely get some data there today. All sensors:
> > >
> > > - Run on Suricata 4.0.6 compiled with rust 1.30 on CentOS 7.6
> > > - Have 32 cores and 128 GB of RAM and use Intel IXGBE 5.5.2 drivers
> > > - Use a single RSS queue (set via ethtool; a sketch follows this list) and rarely see traffic volumes greater than 2GB/s.
> > > - Run in worker mode and use af-packet.
> > > - Isolate the CPU cores for Suricata  -
> > > isolcpus=9,10,11,12,13,14,15,24,25,26,27,28,29,30,31 in command line
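> > >
> > > For reference, the single combined queue is set with ethtool, roughly like this (the interface name is illustrative):
> > >
> > > ethtool -L eth4 combined 1
> > > ethtool -l eth4   # verify the current channel count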
> > >
> > > The sensor showing this problem is the same except that the issue appeared when updated to 4.1.1 and 4.1.2.  It may have been present in 4.1.0 but I don't have any notes on that.
> > >
> > > The CPU utilization issue I mentioned earlier was my fault.  I was compiling from the history buffer and used the configure string with debugging enabled from when I was chasing that segfault.
> > >
> > > -----Original Message-----
> > > From: Peter Manev <petermanev at gmail.com>
> > > Sent: Wednesday, January 9, 2019 6:49 AM
> > > To: Cloherty, Sean E <scloherty at mitre.org>
> > > Cc: lists at inliniac.net; oisf-users at lists.openinfosecfoundation.org
> > > Subject: [EXT] Re: [Oisf-users] Inconsistent packet dropped
> > > behaviour with the same config on several nodes
> > >
> > > On Mon, Jan 7, 2019 at 11:43 PM Cloherty, Sean E <scloherty at mitre.org> wrote:
> > > >
> > > > Thank you Victor.  I've just picked this back up again with 4.1.2 and the packet loss seems worse.  Regarding the rust-dependent parsers, do you mean to disable the parsers or recompile without rust?
> > > >
> > >
> > > Is this inconsistent behavior appearing only on some sensors, or on all that are upgraded?
> > > Also - (you have mentioned it before but I wasn't sure) do you have pie enabled on all builds?
> > > Can you share the sig IDs of the rules that fire much more with 4.1.2 than with 4.0.5?
> > >
> > > > Is it notable that I see in the stats that tcp.pkt_on_wrong_thread seems to be constantly growing and that CPU utilization on the worker threads is very high?
> > > >
> > >
> > > We are tracking  something similar here -
> > > https://redmine.openinfosecfoundation.org/issues/2725
> > > Feel free to update with some of your set up details if you would like.
> > >
> > > Thank you
> > >
> > > > -----Original Message-----
> > > > From: Oisf-users
> > > > <oisf-users-bounces at lists.openinfosecfoundation.org>
> > > > On Behalf Of Victor Julien
> > > > Sent: Friday, October 19, 2018 10:04 AM
> > > > To: oisf-users at lists.openinfosecfoundation.org
> > > > Subject: Re: [Oisf-users] Inconsistent packet dropped behaviour
> > > > with the same config on several nodes
> > > >
> > > > On 19-10-18 15:52, Cloherty, Sean E wrote:
> > > > > I am also seeing a significant difference between rc1 and rc2 - from very low (sub-0.01% packet drops) to 20% or more using the same rule set.  A couple of differences include compiling rc2 with rust enabled, compiling with --pie, and my attempt to put the existing suricata.yaml values into the new format.
> > > >
> > > > Could you disable rust to check if the difference with rc1 goes away? If so, it might be worth looking at disabling some of the new protocol parsers that are only available if Rust is enabled. With these parsers active Suricata does quite a bit more work and produces quite a bit more output. Both might lead to higher resource usage.
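> > > >
> > > > For example, something roughly like this in suricata.yaml would turn off a couple of the parsers that are only built when Rust is enabled (the exact set depends on your build/version):
> > > >
> > > > app-layer:
> > > >   protocols:
> > > >     nfs:
> > > >       enabled: no
> > > >     tftp:
> > > >       enabled: no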
> > > >
> > > > > My 1st attempt to alleviate that situation was reverting to the older yaml, which didn't seem to help.
> > > > >
> > > > > In addition to the packet drops, I've noticed that certain rules have been firing with a higher frequency than they do in 4.0.5, despite the test box getting the same traffic and having the same HW, OS (CentOS 7) and NIC drivers (Intel ixgbe 5.3.7).
> > > > >
> > > > > Specifically about the excessive alerts - the rules that have fired on rc2 are written for external_net to home_net traffic, but are firing on addresses that are local to local.  That makes me wonder if the vars get parsed differently in rc2, since this doesn't happen with the same vars on all the other IDSs.
> > > >
> > > > Are you able to provide a test case?
> > > >
> > > > Regards,
> > > > Victor
> > > >
> > > >
> > > > > I've attached stats files for with and without rules.
> > > > >
> > > > > -----Original Message-----
> > > > > From: Oisf-users
> > > > > <oisf-users-bounces at lists.openinfosecfoundation.org>
> > > > > On Behalf Of Peter Manev
> > > > > Sent: Friday, September 14, 2018 5:14 AM
> > > > > To: magmi.sec at gmail.com
> > > > > Cc: Open Information Security Foundation
> > > > > <oisf-users at lists.openinfosecfoundation.org>
> > > > > Subject: Re: [Oisf-users] Inconsistent packet dropped behaviour
> > > > > with the same config on several nodes
> > > > >
> > > > > On Thu, Sep 13, 2018 at 8:48 AM Magmi A <magmi.sec at gmail.com> wrote:
> > > > >>
> > > > >> Hi Peter,
> > > > >>
> > > > >> Yes, of course.
> > > > >> Here you have stats.log for Node2:
> > > > >>
> > > > >> Date:9/13/2018--06:37:42(uptime:6d,17h05m59s)
> > > > >> ------------------------------------------------------------------
> > > > >> Counter |TMName |Value
> > > > >> ------------------------------------------------------------------
> > > > >> capture.kernel_packets |Total |135374454
> > > > >> capture.kernel_drops |Total |11430384
> > > > >> decoder.pkts |Total |123946335
> > > > >> decoder.bytes |Total |122943123694
> > > > >> decoder.ipv4 |Total |123767936
> > > > >> decoder.ipv6 |Total |5036
> > > > >> decoder.ethernet |Total |123946335
> > > > >> decoder.tcp |Total |120082602
> > > > >> decoder.udp |Total |683567
> > > > >> decoder.icmpv4 |Total |114883
> > > > >> decoder.icmpv6 |Total |189
> > > > >> decoder.teredo |Total |5
> > > > >> decoder.avg_pkt_size |Total |991
> > > > >> decoder.max_pkt_size |Total |1514
> > > > >> flow.tcp |Total |780742
> > > > >> flow.udp |Total |305951
> > > > >> flow.icmpv6 |Total |162
> > > > >> tcp.sessions |Total |727356
> > > > >> tcp.syn |Total |771112
> > > > >> tcp.synack |Total |720764
> > > > >> tcp.rst |Total |549359
> > > > >> tcp.stream_depth_reached |Total |454
> > > > >> tcp.reassembly_gap |Total |7722
> > > > >> tcp.overlap |Total |483624
> > > > >> detect.alert |Total |4
> > > > >> app_layer.flow.http |Total |108080
> > > > >> app_layer.tx.http |Total |262748
> > > > >> app_layer.flow.smtp |Total |7
> > > > >> app_layer.tx.smtp |Total |7
> > > > >> app_layer.flow.tls |Total |6612
> > > > >> app_layer.flow.ssh |Total |13
> > > > >> app_layer.flow.smb |Total |55361
> > > > >> app_layer.flow.dcerpc_tcp |Total |202204
> > > > >> app_layer.flow.dns_tcp |Total |10
> > > > >> app_layer.tx.dns_tcp |Total |10
> > > > >> app_layer.flow.failed_tcp |Total |274419
> > > > >> app_layer.flow.dcerpc_udp |Total |4
> > > > >> app_layer.flow.dns_udp |Total |139325
> > > > >> app_layer.tx.dns_udp |Total |239577
> > > > >> app_layer.flow.failed_udp |Total |166622
> > > > >> flow_mgr.closed_pruned |Total |684943
> > > > >> flow_mgr.new_pruned |Total |290500
> > > > >> flow_mgr.est_pruned |Total |110950
> > > > >> flow.spare |Total |10000
> > > > >> flow.tcp_reuse |Total |6055
> > > > >> flow_mgr.flows_checked |Total |12
> > > > >> flow_mgr.flows_notimeout |Total |5
> > > > >> flow_mgr.flows_timeout |Total |7
> > > > >> flow_mgr.flows_timeout_inuse |Total |1
> > > > >> flow_mgr.flows_removed |Total |6
> > > > >> flow_mgr.rows_checked |Total |65536
> > > > >> flow_mgr.rows_skipped |Total |65522
> > > > >> flow_mgr.rows_empty |Total |2
> > > > >> flow_mgr.rows_maxlen |Total |1
> > > > >> tcp.memuse |Total |4587520
> > > > >> tcp.reassembly_memuse |Total |6650304
> > > > >> dns.memuse |Total |16386
> > > > >> http.memuse |Total |12142178
> > > > >> flow.memuse |Total |7207360
> > > > >>
> > > > >
> > > > > Can you do a zero test - run it for a bit with 0 rules loaded ( -S /dev/null ) - and see if you get the same percentage of drops?
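> > > > > Something like this (the config path is just an example):
> > > > >
> > > > > suricata -c /etc/suricata/suricata.yaml --af-packet -S /dev/null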
> > > > >
> > > > >
> > > > >
> > > > >> And just for a reference stats.log for Node1:
> > > > >>
> > > > >> Date:9/13/2018--06:35:39(uptime:6d,16h39m54s)
> > > > >> ------------------------------------------------------------------
> > > > >> Counter |TMName |Value
> > > > >> ------------------------------------------------------------------
> > > > >> capture.kernel_packets |Total |3577965800
> > > > >> capture.kernel_drops |Total |3416155
> > > > >> decoder.pkts |Total |3574589712
> > > > >> decoder.bytes |Total |3536139875210
> > > > >> decoder.invalid |Total |10070
> > > > >> decoder.ipv4 |Total |3571132083
> > > > >> decoder.ipv6 |Total |143756
> > > > >> decoder.ethernet |Total |3574589712
> > > > >> decoder.tcp |Total |3522243739
> > > > >> decoder.udp |Total |34827953
> > > > >> decoder.icmpv4 |Total |963831
> > > > >> decoder.icmpv6 |Total |33551
> > > > >> decoder.teredo |Total |1399
> > > > >> decoder.avg_pkt_size |Total |989
> > > > >> decoder.max_pkt_size |Total |1534
> > > > >> flow.tcp |Total |1144524
> > > > >> flow.udp |Total |202960
> > > > >> flow.icmpv6 |Total |439
> > > > >> decoder.ipv4.trunc_pkt |Total |10070
> > > > >> tcp.sessions |Total |341278
> > > > >> tcp.ssn_memcap_drop |Total |4446979
> > > > >> tcp.pseudo |Total |84
> > > > >> tcp.invalid_checksum |Total |4
> > > > >> tcp.syn |Total |6653717
> > > > >> tcp.synack |Total |2572744
> > > > >> tcp.rst |Total |1857715
> > > > >> tcp.segment_memcap_drop |Total |10
> > > > >> tcp.stream_depth_reached |Total |303
> > > > >> tcp.reassembly_gap |Total |95648
> > > > >> tcp.overlap |Total |3889304
> > > > >> tcp.insert_data_normal_fail |Total |3314483
> > > > >> detect.alert |Total |518
> > > > >> app_layer.flow.http |Total |34820
> > > > >> app_layer.tx.http |Total |60759
> > > > >> app_layer.flow.ftp |Total |20
> > > > >> app_layer.flow.smtp |Total |140
> > > > >> app_layer.tx.smtp |Total |177
> > > > >> app_layer.flow.tls |Total |43356
> > > > >> app_layer.flow.smb |Total |3430
> > > > >> app_layer.flow.dcerpc_tcp |Total |8894
> > > > >> app_layer.flow.dns_tcp |Total |48
> > > > >> app_layer.tx.dns_tcp |Total |46
> > > > >> app_layer.flow.failed_tcp |Total |107518
> > > > >> app_layer.flow.dcerpc_udp |Total |5
> > > > >> app_layer.flow.dns_udp |Total |114888
> > > > >> app_layer.tx.dns_udp |Total |482904
> > > > >> app_layer.flow.failed_udp |Total |88067
> > > > >> flow_mgr.closed_pruned |Total |259368
> > > > >> flow_mgr.new_pruned |Total |981024
> > > > >> flow_mgr.est_pruned |Total |107531
> > > > >> flow.spare |Total |10000
> > > > >> flow.tcp_reuse |Total |29932
> > > > >> flow_mgr.rows_checked |Total |65536
> > > > >> flow_mgr.rows_skipped |Total |65536
> > > > >> tcp.memuse |Total |4587520
> > > > >> tcp.reassembly_memuse |Total |655360
> > > > >> dns.memcap_global |Total |1086836
> > > > >> flow.memuse |Total |7074304
> > > > >>
> > > > >> Thank you for any suggestions.
> > > > >>
> > > > >> Best,
> > > > >> magmi
> > > > >>
> > > > >> On Wed, 12 Sep 2018 at 16:16, Peter Manev <petermanev at gmail.com> wrote:
> > > > >>>
> > > > >>> On Wed, Sep 12, 2018 at 10:57 AM Magmi A <magmi.sec at gmail.com> wrote:
> > > > >>>>
> > > > >>>>
> > > > >>>>>> * Node1 receives ~ 500Mbps of traffic (it's 1Gbps
> > > > >>>>>> interface), and gets in average 1-2% kernel packet dropped
> > > > >>>>>> while
> > > > >>>>>> * Node2 receives ~ 500kbps of traffic and gets in average
> > > > >>>>>> 10% kernel packet dropped
> > > > >>>>>
> > > > >>>>> What is different between node 1 and node 2? (same
> > > > >>>>> config/same suricata/same HW/same rules...?)
> > > > >>>>
> > > > >>>>
> > > > >>>> The nodes have the same HW, run the same config/ suricata version, have the same set of rules.
> > > > >>>>
> > > > >>>> The only difference is that they are exposed to different sources of traffic.
> > > > >>>> From Wireshark analysis the protocol hierarchies for both cases seem similar - there is no spectacular difference.
> > > > >>>>
> > > > >>>> So really the only difference is the captured traffic itself (MACs, IPs, partly protocols, data etc).
> > > > >>>>
> > > > >>>> That is why we are having such a problem figuring out how to approach and troubleshoot it.
> > > > >>>
> > > > >>>
> > > > >>> Can you share a full update of the latest stats.log?
> > > > >>>
> > > > >>>
> > > > >>>
> > > > >>> --
> > > > >>> Regards,
> > > > >>> Peter Manev
> > > > >
> > > > >
> > > > >
> > > > > --
> > > > > Regards,
> > > > > Peter Manev
> > > >
> > > >
> > > > --
> > > > ---------------------------------------------
> > > > Victor Julien
> > > > http://www.inliniac.net/
> > > > PGP: http://www.inliniac.net/victorjulien.asc
> > > > ---------------------------------------------
> > > >
> > >
> > >
> > >
> > > --
> > > Regards,
> > > Peter Manev
> >
> >
> >
> > --
> > Regards,
> > Peter Manev
>
>
>
> --
> Regards,
> Peter Manev



-- 
Regards,
Peter Manev

