[Oisf-users] Unbalanced load on AFpacket threads
Fernando Sclavo
fsclavo at gmail.com
Mon Jun 3 18:38:04 UTC 2013
Of course. Attached is one record of stats.log.
It looks like there is only one thread per NIC doing (almost) all the work:
top - 15:36:56 up 16 min, 2 users, load average: 2.22, 2.71, 2.63
Tasks: 277 total, 3 running, 274 sleeping, 0 stopped, 0 zombie
Cpu(s): 3.4%us, 0.0%sy, 1.3%ni, 93.7%id, 0.0%wa, 0.0%hi, 1.6%si,
0.0%st
Mem: 198002932k total, 59683604k used, 138319328k free, 30632k buffers
Swap: 15624188k total, 0k used, 15624188k free, 296972k cached
   PID USER PR NI  VIRT  RES  SHR S %CPU %MEM   TIME+ COMMAND
* 2731 root 18 -2 55.8g  53g  51g R 99.9 28.6 8:05.85 AFPacketeth71
  2715 root 22  2 55.8g  53g  51g R 84.6 28.6 6:49.80 AFPacketeth51 *
  2747 root 20  0 55.8g  53g  51g S 15.9 28.6 1:25.74 FlowManagerThre
  2558 root 20  0  102m 6728 1232 S  0.5  0.0 0:01.76 barnyard2
  2740 root 18 -2 55.8g  53g  51g S  0.5 28.6 0:00.90 AFPacketeth710
     1 root 20  0 24460 2340 1352 S  0.0  0.0 0:03.91 init
     2 root 20  0     0    0    0 S  0.0  0.0 0:00.00 kthreadd
     3 root 20  0     0    0    0 S  0.0  0.0 0:03.22 ksoftirqd/0
     6 root RT  0     0    0    0 S  0.0  0.0 0:00.00 migration/0
     7 root RT  0     0    0    0 S  0.0  0.0 0:00.07 watchdog/0
Thanks a lot!
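For readers following along: the cluster_cpu setup being tested maps to an af-packet section roughly like the sketch below. This is an illustration only; the cluster-id values and per-interface layout are assumptions, not copied from this thread, while threads: 16 mirrors the RSS=16 queue setup described further down.

```yaml
af-packet:
  - interface: eth5
    # One capture thread per RSS queue (the NICs in this thread use RSS=16).
    threads: 16
    cluster-id: 98          # illustrative value
    # Hash flows to threads by the CPU that received the packet, so the
    # kernel-side RSS/IRQ affinity decides the balance.
    cluster-type: cluster_cpu
    defrag: yes
  - interface: eth7
    threads: 16
    cluster-id: 99          # illustrative value
    cluster-type: cluster_cpu
    defrag: yes
```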
2013/6/3 Victor Julien <lists at inliniac.net>
> On 06/03/2013 07:52 PM, Fernando Sclavo wrote:
> > We set "cluster-type: cluster_cpu" as suggested and CPU load dropped
> > from 30% (average) to 5%!! But the imbalance is still there. Also, the
> > UDP traffic is balanced now (sudo ethtool -N eth7 rx-flow-hash udp4
> > sdfn).
> >
> >  PID USER PR NI  VIRT RES SHR S %CPU %MEM   TIME+ COMMAND
> > 2299 root 22  2 55.8g 53g 51g R 80.1 28.6 5:12.04 AFPacketeth51
> > 2331 root 20  0 55.8g 53g 51g R 19.9 28.6 1:19.97 FlowManagerThre
> > 2324 root 18 -2 55.8g 53g 51g S 16.4 28.6 1:13.06 AFPacketeth710
> > 2315 root 18 -2 55.8g 53g 51g S 11.9 28.6 0:49.75 AFPacketeth71
> > 2328 root 18 -2 55.8g 53g 51g S 11.9 28.6 0:55.22 AFPacketeth714
> > 2316 root 18 -2 55.8g 53g 51g S 10.9 28.6 0:54.53 AFPacketeth72
> > 2326 root 18 -2 55.8g 53g 51g S 10.9 28.6 0:45.33 AFPacketeth712
> > 2317 root 18 -2 55.8g 53g 51g S 10.4 28.6 0:38.21 AFPacketeth73
> > 2323 root 18 -2 55.8g 53g 51g S  9.9 28.6 0:44.72 AFPacketeth79
> >
> >
> > Dropped kernel packets:
> >
> > capture.kernel_drops | AFPacketeth51 | 449774742
> > capture.kernel_drops | AFPacketeth52 | 48573
> > capture.kernel_drops | AFPacketeth53 | 104763
> > capture.kernel_drops | AFPacketeth54 | 108080
> > capture.kernel_drops | AFPacketeth55 | 95763
> > capture.kernel_drops | AFPacketeth56 | 105133
> > capture.kernel_drops | AFPacketeth57 | 103984
> > capture.kernel_drops | AFPacketeth58 | 100208
> > capture.kernel_drops | AFPacketeth59 | 86704
> > capture.kernel_drops | AFPacketeth510 | 95995
> > capture.kernel_drops | AFPacketeth511 | 89633
> > capture.kernel_drops | AFPacketeth512 | 94029
> > capture.kernel_drops | AFPacketeth513 | 95192
> > capture.kernel_drops | AFPacketeth514 | 106460
> > capture.kernel_drops | AFPacketeth515 | 109770
> > capture.kernel_drops | AFPacketeth516 | 108373
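Drop lists like the one above can be skimmed quickly to surface the outlier. A small sketch (a hypothetical helper, not part of the thread, assuming the "name | thread | value" layout shown above):

```shell
# Hypothetical helper (not from this thread): read stats.log counter lines
# ("name | thread | value") on stdin and print the N threads with the most
# kernel drops, largest first, to surface outliers like AFPacketeth51 above.
top_drops() {
    n="${1:-5}"
    grep 'capture\.kernel_drops' | sort -t '|' -k 3 -rn | head -n "$n"
}

# Live use: top_drops 5 < /var/log/suricata/stats.log
```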
>
> Can you share a full record from the stats.log?
>
> Cheers,
> Victor
>
> >
> > idsuser at suricata:/var/log/suricata$ cat /etc/rc.local
> > #!/bin/sh -e
> > #
> > # rc.local
> > #
> > # This script is executed at the end of each multiuser runlevel.
> > # Make sure that the script will "exit 0" on success or any other
> > # value on error.
> > #
> > # In order to enable or disable this script just change the execution
> > # bits.
> > #
> > # By default this script does nothing.
> >
> > sudo sysctl -w net.core.rmem_max=536870912
> > sudo sysctl -w net.core.wmem_max=67108864
> > sudo sysctl -w net.ipv4.tcp_window_scaling=1
> > sudo sysctl -w net.core.netdev_max_backlog=1000000
> >
> > # Set MMRBC size on the bus to 4K
> > sudo setpci -d 8086:10fb e6.b=2e
> >
> > # sudo sysctl -w net.ipv4.tcp_rmem="4096 87380 67108864"
> > # sudo sysctl -w net.ipv4.tcp_wmem="4096 87380 67108864"
> >
> > sleep 2
> > sudo rmmod ixgbe
> > sleep 2
> > sudo insmod \
> >   /lib/modules/3.2.0-45-generic/kernel/drivers/net/ethernet/intel/ixgbe/ixgbe.ko \
> >   FdirPballoc=3,3,3,3 RSS=16,16,16,16 DCA=2,2,2,2
> > sleep 2
> >
> > # Set the ring size
> > # sudo ethtool -G eth4 rx 4096
> > sudo ethtool -G eth5 rx 4096
> > # sudo ethtool -G eth6 rx 4096
> > sudo ethtool -G eth7 rx 4096
> >
> > # Load-balance UDP flows
> > # sudo ethtool -N eth4 rx-flow-hash udp4 sdfn
> > sudo ethtool -N eth5 rx-flow-hash udp4 sdfn
> > # sudo ethtool -N eth6 rx-flow-hash udp4 sdfn
> > sudo ethtool -N eth7 rx-flow-hash udp4 sdfn
> >
> > sleep 2
> > sudo ksh /home/idsuser/ixgbe-3.14.5/scripts/set_irq_affinity eth4 eth5
> > eth6 eth7
> > sleep 2
> > # sudo ifconfig eth4 up && sleep 1
> > sudo ifconfig eth5 up && sleep 1
> > # sudo ifconfig eth6 up && sleep 1
> > sudo ifconfig eth7 up && sleep 1
> > sleep 5
> > sudo suricata -D -c /etc/suricata/suricata.yaml --af-packet
> > sleep 10
> > sudo barnyard2 -c /etc/suricata/barnyard2.conf -d /var/log/suricata -f
> > unified2.alert -w /var/log/suricata/suricata.waldo -D
> > exit 0
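One way to sanity-check that the set_irq_affinity step in the script above actually spread the 16 queue IRQs over distinct cores is to look at /proc/interrupts. A sketch (a hypothetical helper, not part of the thread):

```shell
# Hypothetical helper (not from this thread): read /proc/interrupts-style
# lines on stdin and report, for each IRQ line matching the interface name,
# which CPU column has accumulated the most interrupts. If the affinity
# script worked, each queue IRQ should name a different CPU.
spread_summary() {
    awk -v pat="$1" '
        $0 ~ pat {
            irq = $1; sub(/:/, "", irq)
            max = -1; cpu = -1
            for (i = 2; i <= NF; i++) {
                if ($i !~ /^[0-9]+$/) break   # past the per-CPU counters
                if ($i + 0 > max) { max = $i + 0; cpu = i - 2 }
            }
            printf "IRQ %s -> CPU%d (%d interrupts)\n", irq, cpu, max
        }'
}

# Live use on the box from this thread:
#   spread_summary eth5 < /proc/interrupts
```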
> >
> >
> >
> > 2013/6/3 Fernando Sclavo <fsclavo at gmail.com>
> >
> > Correction to previous email: runmode IS set to workers
> >
> >
> > 2013/6/3 Fernando Sclavo <fsclavo at gmail.com>
> >
> > Hi Peter/Eric, I will try "flow per cpu" and mail the results.
> > The same goes for "workers", but if I'm not mistaken we already tried
> > that and CPU usage was very high.
> >
> >
> > Queues and IRQ affinity: each NIC has 16 queues, each with its IRQ
> > assigned to its own core (via the Intel driver script), and Suricata
> > has CPU affinity enabled; we confirmed that each thread stays on its
> > own core.
> >
> >
> >
> > 2013/6/3 Eric Leblond <eric at regit.org>
> >
> > Hi,
> >
> > On Monday, June 3, 2013 at 15:54 +0200, Peter Manev wrote:
> > >
> > >
> > >
> > > On Mon, Jun 3, 2013 at 3:34 PM, Fernando Sclavo
> > > <fsclavo at gmail.com> wrote:
> > > Hi all!
> > > We are running Suricata 1.4.2 with two Intel x520 cards, each one
> > > connected to a core switch on our datacenter network. The average
> > > traffic is about 1~2Gbps per port.
> > > As you can see in the following top output, some threads are
> > > significantly more loaded than others (AFPacketeth54, for example):
> > > these threads are continuously dropping kernel packets. We raised
> > > kernel parameters (buffers, rmem, etc.) and lowered Suricata flow
> > > timeouts to just a few seconds, but we can't keep the drops counter
> > > static when the CPU goes to 99.9% for a specific thread.
> > > How can we balance the load better across all threads to prevent
> > > this issue?
> > >
> > > The server is a Dell R715 with two 16-core AMD Opteron(tm) 6284
> > > processors and 192GB RAM.
> > >
> > > idsuser at suricata:~$ top -d2
> > >
> > > top - 10:24:05 up 1 min, 2 users, load average: 4.49, 1.14, 0.38
> > > Tasks: 287 total, 15 running, 272 sleeping, 0 stopped, 0 zombie
> > > Cpu(s): 30.3%us, 1.3%sy, 0.0%ni, 65.3%id, 0.0%wa, 0.0%hi, 3.1%si, 0.0%st
> > > Mem: 198002932k total, 59619020k used, 138383912k free, 25644k buffers
> > > Swap: 15624188k total, 0k used, 15624188k free, 161068k cached
> > >
> > >  PID USER PR NI  VIRT RES SHR S %CPU %MEM   TIME+ COMMAND
> > > 2309 root 18 -2 55.8g 54g 51g R 99.9 28.6 0:20.96 AFPacketeth54
> > > 2314 root 18 -2 55.8g 54g 51g R 99.9 28.6 0:18.29 AFPacketeth59
> > > 2318 root 18 -2 55.8g 54g 51g R 99.9 28.6 0:12.90 AFPacketeth513
> > > 2319 root 18 -2 55.8g 54g 51g R 77.6 28.6 0:12.78 AFPacketeth514
> > > 2307 root 20  0 55.8g 54g 51g S 66.6 28.6 0:21.25 AFPacketeth52
> > > 2338 root 20  0 55.8g 54g 51g R 58.2 28.6 0:09.94 FlowManagerThre
> > > 2310 root 18 -2 55.8g 54g 51g S 51.2 28.6 0:15.35 AFPacketeth55
> > > 2320 root 18 -2 55.8g 54g 51g R 50.2 28.6 0:07.83 AFPacketeth515
> > > 2313 root 18 -2 55.8g 54g 51g S 48.7 28.6 0:11.66 AFPacketeth58
> > > 2321 root 18 -2 55.8g 54g 51g S 47.7 28.6 0:07.75 AFPacketeth516
> > > 2315 root 18 -2 55.8g 54g 51g R 45.2 28.6 0:12.18 AFPacketeth510
> > > 2306 root 22  2 55.8g 54g 51g R 37.3 28.6 0:12.32 AFPacketeth51
> > > 2312 root 18 -2 55.8g 54g 51g S 35.8 28.6 0:11.90 AFPacketeth57
> > > 2308 root 20  0 55.8g 54g 51g R 34.8 28.6 0:16.69 AFPacketeth53
> > > 2317 root 18 -2 55.8g 54g 51g R 33.3 28.6 0:07.93 AFPacketeth512
> > > 2316 root 18 -2 55.8g 54g 51g S 28.8 28.6 0:08.03 AFPacketeth511
> > > 2311 root 18 -2 55.8g 54g 51g S 24.9 28.6 0:10.51 AFPacketeth56
> > > 2331 root 18 -2 55.8g 54g 51g R 19.9 28.6 0:02.41 AFPacketeth710
> > > 2323 root 18 -2 55.8g 54g 51g S 17.9 28.6 0:03.60 AFPacketeth72
> > > 2336 root 18 -2 55.8g 54g 51g S 16.9 28.6 0:01.50 AFPacketeth715
> > > 2333 root 18 -2 55.8g 54g 51g S 14.9 28.6 0:02.14 AFPacketeth712
> > > 2330 root 18 -2 55.8g 54g 51g S 13.9 28.6 0:02.12 AFPacketeth79
> > > 2324 root 18 -2 55.8g 54g 51g R 11.9 28.6 0:02.96 AFPacketeth73
> > > 2329 root 18 -2 55.8g 54g 51g S 11.9 28.6 0:01.90 AFPacketeth78
> > > 2335 root 18 -2 55.8g 54g 51g S 11.9 28.6 0:01.44 AFPacketeth714
> > > 2334 root 18 -2 55.8g 54g 51g R 10.9 28.6 0:01.68 AFPacketeth713
> > > 2325 root 18 -2 55.8g 54g 51g S  9.4 28.6 0:02.38 AFPacketeth74
> > > 2326 root 18 -2 55.8g 54g 51g S  8.9 28.6 0:02.71 AFPacketeth75
> > > 2327 root 18 -2 55.8g 54g 51g S  7.5 28.6 0:01.98 AFPacketeth76
> > > 2332 root 18 -2 55.8g 54g 51g S  7.5 28.6 0:01.53 AFPacketeth711
> > > 2337 root 18 -2 55.8g 54g 51g S  7.0 28.6 0:01.09 AFPacketeth716
> > > 2328 root 18 -2 55.8g 54g 51g S  6.0 28.6 0:02.11 AFPacketeth77
> > > 2322 root 18 -2 55.8g 54g 51g R  5.5 28.6 0:03.78 AFPacketeth71
> > >    3 root 20  0     0   0   0 S  4.5  0.0 0:01.25 ksoftirqd/0
> > >   11 root 20  0     0   0   0 S  0.5  0.0 0:00.14 kworker/0:1
> > >
> > > Regards
> > >
> > >
> > > _______________________________________________
> > > Suricata IDS Users mailing list:
> > > oisf-users at openinfosecfoundation.org
> > > Site: http://suricata-ids.org | Support:
> > > http://suricata-ids.org/support/
> > > List:
> > >
> >
> https://lists.openinfosecfoundation.org/mailman/listinfo/oisf-users
> > > OISF: http://www.openinfosecfoundation.org/
> > >
> > >
> > > Hi,
> > >
> > >
> > > You could try "runmode: workers".
> >
> > From the thread names it seems that is already the case.
> >
> > >
> > >
> > > What is your flow balance method?
> > >
> > > Can you try "flow per cpu" in the yaml section of afpacket?
> > > ("cluster-type: cluster_cpu")
> >
> > It could help indeed.
> >
> > A few questions:
> >
> > Are your IRQ affinity settings correct? (Meaning: are the NICs using
> > multiqueue, with the queues well balanced across the CPUs?)
> >
> > If you have a lot of UDP on your network, use ethtool to load-balance
> > it, as that is not done by default.
> >
> > BR,
> > >
> > >
> > >
> > >
> > >
> > > Thank you
> > >
> > >
> > > --
> > > Regards,
> > > Peter Manev
> >
> >
> >
> >
> >
> >
> >
>
>
> --
> ---------------------------------------------
> Victor Julien
> http://www.inliniac.net/
> PGP: http://www.inliniac.net/victorjulien.asc
> ---------------------------------------------
>
>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: stats.log
Type: application/octet-stream
Size: 86266 bytes
Desc: not available
URL: <http://lists.openinfosecfoundation.org/pipermail/oisf-users/attachments/20130603/bfae583f/attachment-0002.obj>