[Oisf-users] [EXT] Re: Inconsistent packet dropped behaviour with the same config on several nodes

Cloherty, Sean E scloherty at mitre.org
Fri Jan 11 15:54:53 UTC 2019


Going over everything carefully, I found that a recent OS update had reverted the sensor to the RHEL ixgbe driver. I replaced it with the Intel 5.5.2 driver and have now seen no drops in 36 hours. Well, OK, it says 6 drops, but I can live with that.
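
A driver silently reverting after an OS update can be caught with a small check. The following is a minimal Python sketch, assuming the text comes from `ethtool -i <interface>`, which prints "key: value" lines including `driver` and `version`. The function names and the sample text are hypothetical and only for illustration; they are not taken from the sensors discussed here.

```python
def parse_ethtool_info(text):
    """Turn `ethtool -i` style "key: value" lines into a dict."""
    info = {}
    for line in text.splitlines():
        # Split on the first colon only, so values such as PCI
        # addresses (which contain colons) stay intact.
        key, sep, value = line.partition(":")
        if sep:
            info[key.strip()] = value.strip()
    return info


def driver_matches(text, driver, version):
    """Return True if the reported driver name and version are the expected ones."""
    info = parse_ethtool_info(text)
    return info.get("driver") == driver and info.get("version") == version


# Hypothetical sample output, for illustration only.
SAMPLE = """driver: ixgbe
version: 5.5.2
bus-info: 0000:03:00.0
"""

if __name__ == "__main__":
    print(driver_matches(SAMPLE, "ixgbe", "5.5.2"))
```

On a live sensor the text could be obtained with `subprocess.run(["ethtool", "-i", iface], capture_output=True, text=True).stdout`, where `iface` is whatever capture interface is in use.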

I still have 
tcp.pkt_on_wrong_thread                    | Total                     | 115035286

and a couple of signatures firing far more often than in 4.0.x, which I still need to look into.
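
Counters like the one above can be pulled out of stats.log and turned into a drop percentage. This is a minimal Python sketch, assuming the "Counter | TMName | Value" layout quoted in this thread. The function names are hypothetical. The sample mixes lines from different sensors in this thread (the capture counters are the Node2 figures quoted further down), purely to show both spacing styles being parsed.

```python
def parse_stats(text):
    """Parse 'counter | TMName | value' lines into a dict of Total counters."""
    counters = {}
    for line in text.splitlines():
        parts = [p.strip() for p in line.split("|")]
        # Keep only well-formed rows for the "Total" thread module;
        # header and separator lines fail the isdigit() check.
        if len(parts) == 3 and parts[1] == "Total" and parts[2].isdigit():
            counters[parts[0]] = int(parts[2])
    return counters


def drop_percent(counters):
    """Kernel drops as a percentage of packets seen by the kernel."""
    packets = counters.get("capture.kernel_packets", 0)
    drops = counters.get("capture.kernel_drops", 0)
    return 100.0 * drops / packets if packets else 0.0


# Sample lines copied from this thread (different sensors, see lead-in).
SAMPLE = """\
Counter |TMName |Value
capture.kernel_packets |Total |135374454
capture.kernel_drops |Total |11430384
tcp.pkt_on_wrong_thread                    | Total                     | 115035286
"""

if __name__ == "__main__":
    stats = parse_stats(SAMPLE)
    print(f"drops: {drop_percent(stats):.2f}%")
    print("pkt_on_wrong_thread:", stats.get("tcp.pkt_on_wrong_thread", 0))
```

Because stats.log counters are cumulative since start-up, comparing two dumps taken some time apart gives a better picture than a single snapshot.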

-----Original Message-----
From: Cloherty, Sean E 
Sent: Wednesday, January 9, 2019 10:20 AM
To: 'Peter Manev' <petermanev at gmail.com>
Cc: lists at inliniac.net; oisf-users at lists.openinfosecfoundation.org
Subject: RE: [EXT] Re: [Oisf-users] Inconsistent packet dropped behaviour with the same config on several nodes

Thanks for pointing out the open issue.  I will definitely add some data there today. All sensors:

- Run on Suricata 4.0.6 compiled with rust 1.30 on CentOS 7.6
- Have 32 cores and 128 GB of RAM and use Intel IXGBE 5.5.2 drivers
- Use a single RSS queue and rarely see traffic volumes of greater than 2GB/s.  
- Run in worker mode and use af-packet. 
- Isolate the CPU cores for Suricata - isolcpus=9,10,11,12,13,14,15,24,25,26,27,28,29,30,31 on the kernel command line
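
To confirm that a sensor really booted with the intended isolation, the `isolcpus=` value can be read back from the kernel command line. This is a minimal Python sketch, assuming the value contains only plain CPU numbers and ranges such as `9-15` (no optional flags). The function names are hypothetical.

```python
def parse_cpu_list(spec):
    """Expand a CPU list such as '9-15,24-31' into a sorted list of ints."""
    cpus = set()
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue
        if "-" in part:
            # Inclusive range, e.g. "9-15" covers 9 through 15.
            start, end = part.split("-", 1)
            cpus.update(range(int(start), int(end) + 1))
        else:
            cpus.add(int(part))
    return sorted(cpus)


def isolated_cpus(cmdline):
    """Return the CPUs named by isolcpus= in a kernel command line string."""
    for token in cmdline.split():
        if token.startswith("isolcpus="):
            return parse_cpu_list(token[len("isolcpus="):])
    return []


# The isolcpus value from the list above; the surrounding tokens are hypothetical.
CMDLINE = "ro quiet isolcpus=9,10,11,12,13,14,15,24,25,26,27,28,29,30,31"

if __name__ == "__main__":
    print(isolated_cpus(CMDLINE))
```

On a running system the string would come from `open("/proc/cmdline").read()`. The resulting list can then be compared by hand against the worker CPU set configured under `threading` in suricata.yaml.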

The sensor showing this problem is set up the same way, except that the issue appeared when it was updated to 4.1.1 and 4.1.2.  It may have been present in 4.1.0, but I don't have any notes on that.

The CPU utilization issue I mentioned earlier was my fault.  I was compiling from the history buffer and used the configure string with debugging enabled from when I was chasing that segfault. 

-----Original Message-----
From: Peter Manev <petermanev at gmail.com>
Sent: Wednesday, January 9, 2019 6:49 AM
To: Cloherty, Sean E <scloherty at mitre.org>
Cc: lists at inliniac.net; oisf-users at lists.openinfosecfoundation.org
Subject: [EXT] Re: [Oisf-users] Inconsistent packet dropped behaviour with the same config on several nodes

On Mon, Jan 7, 2019 at 11:43 PM Cloherty, Sean E <scloherty at mitre.org> wrote:
>
> Thank you Victor.  I've just picked this back up again with 4.1.2 and the packet loss seems worse.  Regarding the rust-dependent parsers, do you mean to disable the parsers or recompile without rust?
>

Is this inconsistent behavior appearing only on some sensors or on all that are upgraded?
Also (you have mentioned it before, but I wasn't sure): do you have PIE enabled on all builds?
Can you share the sig IDs of the rules that fire much more often with 4.1.2 than with 4.0.5?

> Is it notable that I see in the stats that tcp.pkt_on_wrong_thread seems to be constantly growing and CPU utilization on the worker threads is very high?
>

We are tracking something similar here:
https://redmine.openinfosecfoundation.org/issues/2725
Feel free to update it with some of your setup details if you would like.

Thank you

> -----Original Message-----
> From: Oisf-users <oisf-users-bounces at lists.openinfosecfoundation.org>
> On Behalf Of Victor Julien
> Sent: Friday, October 19, 2018 10:04 AM
> To: oisf-users at lists.openinfosecfoundation.org
> Subject: Re: [Oisf-users] Inconsistent packet dropped behaviour with 
> the same config on several nodes
>
> On 19-10-18 15:52, Cloherty, Sean E wrote:
> > I am also seeing a significant difference between rc1 and rc2. From very low (sub .01% packet drops) to 20% or more using the same rule set.  A couple of differences include compiling rc2 with rust enabled, compiling with --pie, and my attempt to put the existing Suricata.yaml values into the new format.
>
> Could you disable rust to check if the difference with rc1 goes away? If so, it might be worth looking at disabling some of the new protocol parsers that are only available if Rust is enabled. With these parsers active Suricata does quite a bit more work and produces quite a bit more output. Both might lead to higher resource usage.
>
> > My 1st attempt to alleviate that situation was reverting to the older yaml, which didn't seem to help.
> >
> > In addition to the packet drops, I've noticed that certain rules have been firing with a higher frequency than they do in 4.0.5, despite the test box getting the same traffic and having the same HW, OS (CentOS 7), and NIC drivers (Intel ixgbe 5.3.7).
> >
> > Specifically about the excessive alerts: the rules that have fired on rc2 are meant to fire on external_net to home_net, but they are firing on traffic that is local to local.  That makes me wonder if the vars get parsed differently in rc2, since this doesn't happen with the same vars on any of the other IDSs.
>
> Are you able to provide a test case?
>
> Regards,
> Victor
>
>
> > I've attached stats files for runs with and without rules.
> >
> > -----Original Message-----
> > From: Oisf-users
> > <oisf-users-bounces at lists.openinfosecfoundation.org>
> > On Behalf Of Peter Manev
> > Sent: Friday, September 14, 2018 5:14 AM
> > To: magmi.sec at gmail.com
> > Cc: Open Information Security Foundation 
> > <oisf-users at lists.openinfosecfoundation.org>
> > Subject: Re: [Oisf-users] Inconsistent packet dropped behaviour with 
> > the same config on several nodes
> >
> > On Thu, Sep 13, 2018 at 8:48 AM Magmi A <magmi.sec at gmail.com> wrote:
> >>
> >> Hi Peter,
> >>
> >> Yes, of course.
> >> Here you have stats.log for Node2:
> >>
> > >> Date: 9/13/2018 -- 06:37:42 (uptime: 6d, 17h 05m 59s)
> > >> ------------------------------------------------------------------------------------
> > >> Counter |TMName |Value
> > >> ------------------------------------------------------------------------------------
> > >> capture.kernel_packets |Total |135374454
> > >> capture.kernel_drops |Total |11430384
> > >> decoder.pkts |Total |123946335
> > >> decoder.bytes |Total |122943123694
> > >> decoder.ipv4 |Total |123767936
> > >> decoder.ipv6 |Total |5036
> > >> decoder.ethernet |Total |123946335
> > >> decoder.tcp |Total |120082602
> > >> decoder.udp |Total |683567
> > >> decoder.icmpv4 |Total |114883
> > >> decoder.icmpv6 |Total |189
> > >> decoder.teredo |Total |5
> > >> decoder.avg_pkt_size |Total |991
> > >> decoder.max_pkt_size |Total |1514
> > >> flow.tcp |Total |780742
> > >> flow.udp |Total |305951
> > >> flow.icmpv6 |Total |162
> > >> tcp.sessions |Total |727356
> > >> tcp.syn |Total |771112
> > >> tcp.synack |Total |720764
> > >> tcp.rst |Total |549359
> > >> tcp.stream_depth_reached |Total |454
> > >> tcp.reassembly_gap |Total |7722
> > >> tcp.overlap |Total |483624
> > >> detect.alert |Total |4
> > >> app_layer.flow.http |Total |108080
> > >> app_layer.tx.http |Total |262748
> > >> app_layer.flow.smtp |Total |7
> > >> app_layer.tx.smtp |Total |7
> > >> app_layer.flow.tls |Total |6612
> > >> app_layer.flow.ssh |Total |13
> > >> app_layer.flow.smb |Total |55361
> > >> app_layer.flow.dcerpc_tcp |Total |202204
> > >> app_layer.flow.dns_tcp |Total |10
> > >> app_layer.tx.dns_tcp |Total |10
> > >> app_layer.flow.failed_tcp |Total |274419
> > >> app_layer.flow.dcerpc_udp |Total |4
> > >> app_layer.flow.dns_udp |Total |139325
> > >> app_layer.tx.dns_udp |Total |239577
> > >> app_layer.flow.failed_udp |Total |166622
> > >> flow_mgr.closed_pruned |Total |684943
> > >> flow_mgr.new_pruned |Total |290500
> > >> flow_mgr.est_pruned |Total |110950
> > >> flow.spare |Total |10000
> > >> flow.tcp_reuse |Total |6055
> > >> flow_mgr.flows_checked |Total |12
> > >> flow_mgr.flows_notimeout |Total |5
> > >> flow_mgr.flows_timeout |Total |7
> > >> flow_mgr.flows_timeout_inuse |Total |1
> > >> flow_mgr.flows_removed |Total |6
> > >> flow_mgr.rows_checked |Total |65536
> > >> flow_mgr.rows_skipped |Total |65522
> > >> flow_mgr.rows_empty |Total |2
> > >> flow_mgr.rows_maxlen |Total |1
> > >> tcp.memuse |Total |4587520
> > >> tcp.reassembly_memuse |Total |6650304
> > >> dns.memuse |Total |16386
> > >> http.memuse |Total |12142178
> > >> flow.memuse |Total |7207360
> >>
> >
> > Can you do a zero test: run it for a bit with 0 rules loaded ( -S /dev/null ) and see whether you get the same percentage of drops?
> >
> >
> >
> >> And just for a reference stats.log for Node1:
> >>
> > >> Date: 9/13/2018 -- 06:35:39 (uptime: 6d, 16h 39m 54s)
> > >> ------------------------------------------------------------------------------------
> > >> Counter |TMName |Value
> > >> ------------------------------------------------------------------------------------
> > >> capture.kernel_packets |Total |3577965800
> > >> capture.kernel_drops |Total |3416155
> > >> decoder.pkts |Total |3574589712
> > >> decoder.bytes |Total |3536139875210
> > >> decoder.invalid |Total |10070
> > >> decoder.ipv4 |Total |3571132083
> > >> decoder.ipv6 |Total |143756
> > >> decoder.ethernet |Total |3574589712
> > >> decoder.tcp |Total |3522243739
> > >> decoder.udp |Total |34827953
> > >> decoder.icmpv4 |Total |963831
> > >> decoder.icmpv6 |Total |33551
> > >> decoder.teredo |Total |1399
> > >> decoder.avg_pkt_size |Total |989
> > >> decoder.max_pkt_size |Total |1534
> > >> flow.tcp |Total |1144524
> > >> flow.udp |Total |202960
> > >> flow.icmpv6 |Total |439
> > >> decoder.ipv4.trunc_pkt |Total |10070
> > >> tcp.sessions |Total |341278
> > >> tcp.ssn_memcap_drop |Total |4446979
> > >> tcp.pseudo |Total |84
> > >> tcp.invalid_checksum |Total |4
> > >> tcp.syn |Total |6653717
> > >> tcp.synack |Total |2572744
> > >> tcp.rst |Total |1857715
> > >> tcp.segment_memcap_drop |Total |10
> > >> tcp.stream_depth_reached |Total |303
> > >> tcp.reassembly_gap |Total |95648
> > >> tcp.overlap |Total |3889304
> > >> tcp.insert_data_normal_fail |Total |3314483
> > >> detect.alert |Total |518
> > >> app_layer.flow.http |Total |34820
> > >> app_layer.tx.http |Total |60759
> > >> app_layer.flow.ftp |Total |20
> > >> app_layer.flow.smtp |Total |140
> > >> app_layer.tx.smtp |Total |177
> > >> app_layer.flow.tls |Total |43356
> > >> app_layer.flow.smb |Total |3430
> > >> app_layer.flow.dcerpc_tcp |Total |8894
> > >> app_layer.flow.dns_tcp |Total |48
> > >> app_layer.tx.dns_tcp |Total |46
> > >> app_layer.flow.failed_tcp |Total |107518
> > >> app_layer.flow.dcerpc_udp |Total |5
> > >> app_layer.flow.dns_udp |Total |114888
> > >> app_layer.tx.dns_udp |Total |482904
> > >> app_layer.flow.failed_udp |Total |88067
> > >> flow_mgr.closed_pruned |Total |259368
> > >> flow_mgr.new_pruned |Total |981024
> > >> flow_mgr.est_pruned |Total |107531
> > >> flow.spare |Total |10000
> > >> flow.tcp_reuse |Total |29932
> > >> flow_mgr.rows_checked |Total |65536
> > >> flow_mgr.rows_skipped |Total |65536
> > >> tcp.memuse |Total |4587520
> > >> tcp.reassembly_memuse |Total |655360
> > >> dns.memcap_global |Total |1086836
> > >> flow.memuse |Total |7074304
> >>
> >> Thank you for any suggestions.
> >>
> >> Best,
> >> magmi
> >>
> >> On Wed, 12 Sep 2018 at 16:16, Peter Manev <petermanev at gmail.com> wrote:
> >>>
> >>> On Wed, Sep 12, 2018 at 10:57 AM Magmi A <magmi.sec at gmail.com> wrote:
> >>>>
> >>>>
> >>>>>> * Node1 receives ~ 500Mbps of traffic (it's 1Gbps interface), 
> >>>>>> and gets in average 1-2% kernel packet dropped while
> >>>>>> * Node2 receives ~ 500kbps of traffic and gets in average 10% 
> >>>>>> kernel packet dropped
> >>>>>
> > >>>>> What is different between node 1 and node 2? (same config/same 
> > >>>>> suricata/same HW/same rules...?)
> >>>>
> >>>>
> >>>> The nodes have the same HW, run the same config/ suricata version, have the same set of rules.
> >>>>
> >>>> The only difference is that they are exposed to different sources of traffic.
> >>>> From Wireshark analysis the protocol hierarchies for both cases seem similar - there is no spectacular difference.
> >>>>
> >>>> So really the only difference is the captured traffic itself (MACs, IPs, partly protocols, data etc).
> >>>>
> > >>>> That is why we are struggling with how to approach the problem and troubleshoot it.
> >>>
> >>>
> > >>> Can you share the full latest stats.log?
> >>>
> >>>
> >>>
> >>> --
> >>> Regards,
> >>> Peter Manev
> >
> >
> >
> > --
> > Regards,
> > Peter Manev
> > _______________________________________________
> > Suricata IDS Users mailing list: 
> > oisf-users at openinfosecfoundation.org
> > Site: http://suricata-ids.org | Support:
> > http://suricata-ids.org/support/
> > List:
> > https://lists.openinfosecfoundation.org/mailman/listinfo/oisf-users
> >
> > Conference: https://suricon.net
> > Trainings: https://suricata-ids.org/training/
> >
> >
> >
>
>
> --
> ---------------------------------------------
> Victor Julien
> http://www.inliniac.net/
> PGP: http://www.inliniac.net/victorjulien.asc
> ---------------------------------------------
>



--
Regards,
Peter Manev

