[Oisf-users] Performance with 3x600Mbps and up to 10Gbps

Jozef Mlích Jozef.Mlich at trustport.com
Tue Oct 20 12:19:48 UTC 2015


On Mon, 2015-10-19 at 17:03 +0200, Victor Julien wrote:
> On 16-10-15 13:34, Jozef Mlích wrote:
> > Dear suricata users,
> > 
> > we have done performance testing with 3x600 Mbit/s on SELKS and it is
> > working well with runmode: workers, but not with autofp. We see about
> > 0% packet drop in workers mode, but about 50% packet drop in autofp
> > mode.
> > 
> > For certain reasons we need to use autofp mode, but we are also
> > interested in the other modes.
> 
> Why do you need autofp? Workers is known to be much better for
> performance. It has less lock contention, better cache locality and 
> no queuing.

I have some extra processing in autofp mode. It is not easy to get rid
of it.
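
(For reference, switching between the two modes is just the runmode
setting in suricata.yaml, e.g.

    runmode: autofp   # or: workers

or --runmode=autofp / --runmode=workers on the command line.)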

> 
> > We were using the Suricata from SELKS (which is something between
> > 2.0.9 and current master). Additionally, we ran the tests on CentOS 7
> > with kernel 3.10+ and upstream Suricata (master branch) compiled from
> > last week, with the same results.
> > 
> > Our configuration is almost the default one for SELKS except:
> >   - af-packet cluster-type=cluster_round_robin
> 
> You'll really have to use cluster_flow instead (or cluster_cpu if you
> *really* know what you're doing). The same thread needs to be fed
> packets from the same flow.

I also played with the cluster_flow and cluster_cpu types, but I
haven't seen a significant improvement; I was mainly using cluster_flow.

In one case the machine utilization and throughput (number of drops)
were actually better with workers and the cluster_round_robin cluster
type.
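
For reference, the af-packet section we were experimenting with looked
roughly like this (interface name and values are illustrative, not our
exact config):

    af-packet:
      - interface: eth1
        threads: 16
        cluster-id: 98
        cluster-type: cluster_flow   # also tested cluster_round_robin and cluster_cpu
        defrag: yes
        use-mmap: yes
        ring-size: 100000            # we also varied buffer-size and the interface mtu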

> 
> Cheers,
> Victor
> 
> >   - af-packet threads 16
> >   - detect-engine profile
> >   - we have about 26k rules
> >   - we use just eve-log (JSON) as output, unified2 is disabled
> > 
> > Results:
> > 
> > runmode: workers; # 1st scenario
> > - packets replayed at 600 Mbit/s on 3 interfaces using tcpreplay
> >   (i.e. 3x600 Mbit/s)
> > - duration: 11m 58s
> > - threads: 48+4
> > - avg load: 12.76 11.22 6.87
> > - packet drop 0.57%; packets: 305908678, drops: 1748851
> > 
> > runmode: autofp; # 2nd scenario
> > - packets replayed at 600 Mbit/s on 3 interfaces using tcpreplay
> >   (i.e. 3x600 Mbit/s)
> > - duration: 11m 57s
> > - threads: 60+4
> > - avg load: 40.85 30.08 23.66
> > - packet drop 53.91%; packets: 302916555, drops: 163309837
> > 
> > hardware:
> > - Intel(R) Xeon(R) CPU E5-2690 v3 @ 2.60GHz (12 cores, HT enabled =
> >   24 logical cores)
> > - memory: 4x16 GB DIMMs @ 2133 MHz = 64 GB
> > - NICs: 4x Broadcom BCM5720 / 1 GBit / PCI Express (each dual-port
> >   card should manage 17 threads)
> > 
> > 
> > We have tried tuning the following settings, based on the links
> > below [1-4]:
> >   - detect-engine: the high profile and custom values
> >   - max-pending-packets
> >   - mpm-algo: ac (in almost every test)
> >     - but no acceleration (CUDA, Napatech)
> >   - af-packet
> >     - cluster-type cluster_flow/cluster_round_robin
> >     - ring-size
> >     - buffer-size
> >     - thread count
> >     - mtu
> >     - rollover [4]
> >   - sysctl
> >     - rmem_max
> >     - tcp_timestamps
> >     - tcp_sack
> >     - tcp_no_metrics_save
> >     - tcp_window_scaling
> >     - netdev_max_backlog
> >   - NIC offloading disabled (via ethtool)
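
Concretely, the sysctl/ethtool part of that tuning was along these
lines (example values only, not a recommendation):

    # host receive buffers / backlog (illustrative values)
    sysctl -w net.core.rmem_max=67108864
    sysctl -w net.core.netdev_max_backlog=250000
    # TCP options we toggled during the tests
    sysctl -w net.ipv4.tcp_timestamps=0
    sysctl -w net.ipv4.tcp_sack=0
    sysctl -w net.ipv4.tcp_no_metrics_save=1
    sysctl -w net.ipv4.tcp_window_scaling=0
    # NIC offloading disabled on each capture interface
    ethtool -K eth1 gro off lro off tso off gso off sg off rx off tx off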
> > 
> > 
> > 
> > What hardware would you recommend for our configuration and the
> > following scenarios:
> > - 1.5 Gbit/s traffic with 30k rules
> > - 10 Gbit/s traffic with 30k rules
> > 
> > I am also not sure about the memory requirements. I read some
> > information at links [3] and [5] and I would like to know more about
> > setting an upper memory limit.
> > 
> > It seems the memory usage is computed as follows:
> >   [af-packet memory allocations] + [detect-engine memory allocations] +
> >   ???
> > 
> > The af-packet memory allocation is given by:
> >   af-packet = threads * mtu * ring_size
> > 
> > max-pending-packets should also appear in this formula, but I am not
> > sure where.
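
(To make that concrete: if I read it right, then with e.g. 16 threads
per interface, an MTU-sized frame of about 1514 bytes and a ring_size
of 100000 frames per thread (illustrative numbers, not our exact
settings), that would be roughly 16 * 1514 * 100000 ≈ 2.4 GB per
interface.)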
> > 
> > The detect-engine memory usage is given by the multiplication and
> > addition of 8 values, which can be configured for a custom profile.
> > detect-engine.profile=high == ?
> > detect-engine.profile=low == ?
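
(By the 8 values I mean the custom-values block of the detect-engine
section in the 2.0-era suricata.yaml; if I remember it right, the
defaults there look roughly like:

    detect-engine:
      - profile: custom
      - custom-values:
          toclient-src-groups: 2
          toclient-dst-groups: 2
          toclient-sp-groups: 2
          toclient-dp-groups: 3
          toserver-src-groups: 2
          toserver-dst-groups: 4
          toserver-sp-groups: 2
          toserver-dp-groups: 25

and the question is how these translate into an upper memory bound for
profile=high vs profile=low.)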
> >   
> > 
> > 
> > [1] https://redmine.openinfosecfoundation.org/projects/suricata/wiki/High_Performance_Configuration
> > [2] https://redmine.openinfosecfoundation.org/projects/suricata/wiki/Tuning_Considerations
> > [3] https://redmine.openinfosecfoundation.org/projects/suricata/wiki/Suricatayaml
> > [4] https://lists.openinfosecfoundation.org/pipermail/oisf-users/2015-October/005243.html
> > [5] http://pevma.blogspot.cz/2014/05/playing-with-memory-consumption.html
> > 
> > 
> > regards,
> > 
> 
> 
-- 
Jozef Mlich <Jozef.Mlich at trustport.com>

