[Oisf-users] Inconsistent packet dropped behaviour with the same config on several nodes
Cloherty, Sean E
scloherty at mitre.org
Mon Jan 7 21:42:43 UTC 2019
Thank you Victor. I've just picked this back up again with 4.1.2 and the packet loss seems worse. Regarding the rust-dependent parsers, do you mean to disable the parsers or recompile without rust?
Is it notable that the stats show tcp.pkt_on_wrong_thread growing steadily, and that CPU utilization on the worker threads is very high?
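For anyone following along, one quick way to see whether that counter keeps climbing is to pull the latest value out of stats.log between dumps; a minimal sketch, assuming the default /var/log/suricata/stats.log path:

    # Print the most recent tcp.pkt_on_wrong_thread value (log path is an assumption)
    grep 'tcp\.pkt_on_wrong_thread' /var/log/suricata/stats.log | tail -n 1

    # Or watch it over time, e.g. every 60 seconds
    watch -n 60 "grep 'tcp\.pkt_on_wrong_thread' /var/log/suricata/stats.log | tail -n 1"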
-----Original Message-----
From: Oisf-users <oisf-users-bounces at lists.openinfosecfoundation.org> On Behalf Of Victor Julien
Sent: Friday, October 19, 2018 10:04 AM
To: oisf-users at lists.openinfosecfoundation.org
Subject: Re: [Oisf-users] Inconsistent packet dropped behaviour with the same config on several nodes
On 19-10-18 15:52, Cloherty, Sean E wrote:
> I am also seeing a significant difference between rc1 and rc2 - from very low (sub-0.01%) packet drops to 20% or more using the same rule set. A couple of differences include compiling rc2 with rust enabled, compiling with --pie, and my attempt to port the existing suricata.yaml values into the new format.
Could you disable rust to check if the difference with rc1 goes away? If so, it might be worth looking at disabling some of the new protocol parsers that are only available if Rust is enabled. With these parsers active Suricata does quite a bit more work and produces quite a bit more output. Both might lead to higher resource usage.
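For reference, a minimal sketch of what that could look like in suricata.yaml; the protocol names below are the parsers that were Rust-only in the 4.1-era default config, so treat the exact list as an assumption and check your own yaml:

    # suricata.yaml - turn off parsers that are only built when Rust is enabled (sketch)
    app-layer:
      protocols:
        nfs:
          enabled: no
        ntp:
          enabled: no
        tftp:
          enabled: no
        ikev2:
          enabled: no
        krb5:
          enabled: no
        dhcp:
          enabled: no

Rebuilding without Rust altogether should also be possible in that branch (./configure --disable-rust), if that option is present in your tree.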
> My first attempt to alleviate that was reverting to the older yaml, which didn't seem to help.
>
> In addition to the packet drops, I've noticed that certain rules have been firing with a higher frequency than they do in 4.0.5, despite the test box getting the same traffic and having the same HW, OS (CentOS 7), and NIC drivers (Intel ixgbe 5.3.7).
>
> Specifically about the excessive alerts - the rules that have fired on rc2 are written for EXTERNAL_NET-to-HOME_NET traffic, yet they are firing on addresses that are local to local. That makes me wonder if the vars get parsed differently in rc2, since this doesn't happen with the same vars in all the other IDSs.
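For context, the vars in question are the address-groups in suricata.yaml; a minimal sketch with placeholder values, not the actual ranges in use here:

    vars:
      address-groups:
        HOME_NET: "[10.0.0.0/8,172.16.0.0/12,192.168.0.0/16]"
        EXTERNAL_NET: "!$HOME_NET"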
Are you able to provide a test case?
Regards,
Victor
> I've attached stats files for with and without rules.
>
> -----Original Message-----
> From: Oisf-users <oisf-users-bounces at lists.openinfosecfoundation.org>
> On Behalf Of Peter Manev
> Sent: Friday, September 14, 2018 5:14 AM
> To: magmi.sec at gmail.com
> Cc: Open Information Security Foundation
> <oisf-users at lists.openinfosecfoundation.org>
> Subject: Re: [Oisf-users] Inconsistent packet dropped behaviour with
> the same config on several nodes
>
> On Thu, Sep 13, 2018 at 8:48 AM Magmi A <magmi.sec at gmail.com> wrote:
>>
>> Hi Peter,
>>
>> Yes, of course.
>> Here you have stats.log for Node2:
>>
>> Date: 9/13/2018 -- 06:37:42 (uptime: 6d, 17h 05m 59s)
>> ------------------------------------------------------------
>> Counter                        | TM Name    | Value
>> ------------------------------------------------------------
>> capture.kernel_packets         | Total      | 135374454
>> capture.kernel_drops           | Total      | 11430384
>> decoder.pkts                   | Total      | 123946335
>> decoder.bytes                  | Total      | 122943123694
>> decoder.ipv4                   | Total      | 123767936
>> decoder.ipv6                   | Total      | 5036
>> decoder.ethernet               | Total      | 123946335
>> decoder.tcp                    | Total      | 120082602
>> decoder.udp                    | Total      | 683567
>> decoder.icmpv4                 | Total      | 114883
>> decoder.icmpv6                 | Total      | 189
>> decoder.teredo                 | Total      | 5
>> decoder.avg_pkt_size           | Total      | 991
>> decoder.max_pkt_size           | Total      | 1514
>> flow.tcp                       | Total      | 780742
>> flow.udp                       | Total      | 305951
>> flow.icmpv6                    | Total      | 162
>> tcp.sessions                   | Total      | 727356
>> tcp.syn                        | Total      | 771112
>> tcp.synack                     | Total      | 720764
>> tcp.rst                        | Total      | 549359
>> tcp.stream_depth_reached       | Total      | 454
>> tcp.reassembly_gap             | Total      | 7722
>> tcp.overlap                    | Total      | 483624
>> detect.alert                   | Total      | 4
>> app_layer.flow.http            | Total      | 108080
>> app_layer.tx.http              | Total      | 262748
>> app_layer.flow.smtp            | Total      | 7
>> app_layer.tx.smtp              | Total      | 7
>> app_layer.flow.tls             | Total      | 6612
>> app_layer.flow.ssh             | Total      | 13
>> app_layer.flow.smb             | Total      | 55361
>> app_layer.flow.dcerpc_tcp      | Total      | 202204
>> app_layer.flow.dns_tcp         | Total      | 10
>> app_layer.tx.dns_tcp           | Total      | 10
>> app_layer.flow.failed_tcp      | Total      | 274419
>> app_layer.flow.dcerpc_udp      | Total      | 4
>> app_layer.flow.dns_udp         | Total      | 139325
>> app_layer.tx.dns_udp           | Total      | 239577
>> app_layer.flow.failed_udp      | Total      | 166622
>> flow_mgr.closed_pruned         | Total      | 684943
>> flow_mgr.new_pruned            | Total      | 290500
>> flow_mgr.est_pruned            | Total      | 110950
>> flow.spare                     | Total      | 10000
>> flow.tcp_reuse                 | Total      | 6055
>> flow_mgr.flows_checked         | Total      | 12
>> flow_mgr.flows_notimeout       | Total      | 5
>> flow_mgr.flows_timeout         | Total      | 7
>> flow_mgr.flows_timeout_inuse   | Total      | 1
>> flow_mgr.flows_removed         | Total      | 6
>> flow_mgr.rows_checked          | Total      | 65536
>> flow_mgr.rows_skipped          | Total      | 65522
>> flow_mgr.rows_empty            | Total      | 2
>> flow_mgr.rows_maxlen           | Total      | 1
>> tcp.memuse                     | Total      | 4587520
>> tcp.reassembly_memuse          | Total      | 6650304
>> dns.memuse                     | Total      | 16386
>> http.memuse                    | Total      | 12142178
>> flow.memuse                    | Total      | 7207360
>>
>
> Can you do a zero test - run it for a bit with 0 rules loaded (-S /dev/null) - and see whether you get the same percentage of drops?
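For example, something along these lines; the yaml path and af-packet interface are assumptions, the relevant part is -S /dev/null:

    # Zero test: same config and traffic, but an empty rule set, to isolate detection load
    suricata -c /etc/suricata/suricata.yaml --af-packet=eth1 -S /dev/null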
>
>
>
>> And just for a reference stats.log for Node1:
>>
>> Date: 9/13/2018 -- 06:35:39 (uptime: 6d, 16h 39m 54s)
>> ------------------------------------------------------------
>> Counter                        | TM Name    | Value
>> ------------------------------------------------------------
>> capture.kernel_packets         | Total      | 3577965800
>> capture.kernel_drops           | Total      | 3416155
>> decoder.pkts                   | Total      | 3574589712
>> decoder.bytes                  | Total      | 3536139875210
>> decoder.invalid                | Total      | 10070
>> decoder.ipv4                   | Total      | 3571132083
>> decoder.ipv6                   | Total      | 143756
>> decoder.ethernet               | Total      | 3574589712
>> decoder.tcp                    | Total      | 3522243739
>> decoder.udp                    | Total      | 34827953
>> decoder.icmpv4                 | Total      | 963831
>> decoder.icmpv6                 | Total      | 33551
>> decoder.teredo                 | Total      | 1399
>> decoder.avg_pkt_size           | Total      | 989
>> decoder.max_pkt_size           | Total      | 1534
>> flow.tcp                       | Total      | 1144524
>> flow.udp                       | Total      | 202960
>> flow.icmpv6                    | Total      | 439
>> decoder.ipv4.trunc_pkt         | Total      | 10070
>> tcp.sessions                   | Total      | 341278
>> tcp.ssn_memcap_drop            | Total      | 4446979
>> tcp.pseudo                     | Total      | 84
>> tcp.invalid_checksum           | Total      | 4
>> tcp.syn                        | Total      | 6653717
>> tcp.synack                     | Total      | 2572744
>> tcp.rst                        | Total      | 1857715
>> tcp.segment_memcap_drop        | Total      | 10
>> tcp.stream_depth_reached       | Total      | 303
>> tcp.reassembly_gap             | Total      | 95648
>> tcp.overlap                    | Total      | 3889304
>> tcp.insert_data_normal_fail    | Total      | 3314483
>> detect.alert                   | Total      | 518
>> app_layer.flow.http            | Total      | 34820
>> app_layer.tx.http              | Total      | 60759
>> app_layer.flow.ftp             | Total      | 20
>> app_layer.flow.smtp            | Total      | 140
>> app_layer.tx.smtp              | Total      | 177
>> app_layer.flow.tls             | Total      | 43356
>> app_layer.flow.smb             | Total      | 3430
>> app_layer.flow.dcerpc_tcp      | Total      | 8894
>> app_layer.flow.dns_tcp         | Total      | 48
>> app_layer.tx.dns_tcp           | Total      | 46
>> app_layer.flow.failed_tcp      | Total      | 107518
>> app_layer.flow.dcerpc_udp      | Total      | 5
>> app_layer.flow.dns_udp         | Total      | 114888
>> app_layer.tx.dns_udp           | Total      | 482904
>> app_layer.flow.failed_udp      | Total      | 88067
>> flow_mgr.closed_pruned         | Total      | 259368
>> flow_mgr.new_pruned            | Total      | 981024
>> flow_mgr.est_pruned            | Total      | 107531
>> flow.spare                     | Total      | 10000
>> flow.tcp_reuse                 | Total      | 29932
>> flow_mgr.rows_checked          | Total      | 65536
>> flow_mgr.rows_skipped          | Total      | 65536
>> tcp.memuse                     | Total      | 4587520
>> tcp.reassembly_memuse          | Total      | 655360
>> dns.memcap_global              | Total      | 1086836
>> flow.memuse                    | Total      | 7074304
>>
>> Thank you for any suggestions.
>>
>> Best,
>> magmi
>>
>> On Wed, 12 Sep 2018 at 16:16, Peter Manev <petermanev at gmail.com> wrote:
>>>
>>> On Wed, Sep 12, 2018 at 10:57 AM Magmi A <magmi.sec at gmail.com> wrote:
>>>>
>>>>
>>>>>> * Node1 receives ~ 500Mbps of traffic (it's 1Gbps interface), and
>>>>>> gets in average 1-2% kernel packet dropped while
>>>>>> * Node2 receives ~ 500kbps of traffic and gets in average 10%
>>>>>> kernel packet dropped
>>>>>
>>>>> What is different between node 1 and node 2 ? (same config/same
>>>>> suricata/same HW/same rules...?)
>>>>
>>>>
>>>> The nodes have the same HW, run the same config and Suricata version, and have the same set of rules.
>>>>
>>>> The only difference is that they are exposed to different sources of traffic.
>>>> From Wireshark analysis the protocol hierarchies for both cases seem similar - there is no spectacular difference.
>>>>
>>>> So really the only difference is the captured traffic itself (MACs, IPs, to some extent the protocols, the payload data, etc.).
>>>>
>>>> That is why we are struggling with how to approach and troubleshoot the problem.
>>>
>>>
>>> Can you share full update of the latest stats.log ?
>>>
>>>
>>>
>>> --
>>> Regards,
>>> Peter Manev
>
>
>
> --
> Regards,
> Peter Manev
> _______________________________________________
> Suricata IDS Users mailing list: oisf-users at openinfosecfoundation.org
> Site: http://suricata-ids.org | Support: http://suricata-ids.org/support/
> List: https://lists.openinfosecfoundation.org/mailman/listinfo/oisf-users
>
> Conference: https://suricon.net
> Trainings: https://suricata-ids.org/training/
>
--
---------------------------------------------
Victor Julien
http://www.inliniac.net/
PGP: http://www.inliniac.net/victorjulien.asc
---------------------------------------------
_______________________________________________
Suricata IDS Users mailing list: oisf-users at openinfosecfoundation.org
Site: http://suricata-ids.org | Support: http://suricata-ids.org/support/
List: https://lists.openinfosecfoundation.org/mailman/listinfo/oisf-users
Conference: https://suricon.net
Trainings: https://suricata-ids.org/training/