
Is it insane to see many more multicast drops after changing the Linux kernel from the desktop version to the server version?

(Last Updated On: April 4, 2012)

We just changed the Linux kernel on a new machine from the desktop version to the server version, 2.6.37.6-0.11-default, with the same NIC and settings, but we now see many more multicast drops. ethtool shows our driver/firmware is not fast enough. This looks insane, but I really cannot find any

==

 

Depending on what hardware you have and what parsing and feed mechanism you use, you may not be capturing the packets fast enough.

Without knowing your hardware (e.g. SMP configuration, bus, etc.) and your NIC capacity and configuration settings (are you using RDMA, or copying whatever is in the buffer?), it's hard to diagnose.

The solution can range from the simple to the complex:

Simple Solution: If you have one thread per receiver and more threads than CPUs, you may be oversubscribing your CPUs, causing a lot of expensive context switching. This could lead to dropped packets.

Complex Solution:
Rebuild the Linux kernel to eliminate all non-essential interrupts (e.g. power management). Modify the Linux kernel to use a non-preemptive scheduler, build a proprietary memory manager, and/or rewrite the NIC device driver to run in the non-preemptive kernel.
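
The oversubscription check from the simple solution above can be sketched in a few lines. This is only an illustration: the function name `oversubscription_ratio` is made up here, and the thread count is something you would supply from your own receiver setup.

```python
import os


def oversubscription_ratio(receiver_threads, usable_cpus=None):
    """Return threads per CPU; a value above 1.0 means more threads than CPUs."""
    if usable_cpus is None:
        if hasattr(os, "sched_getaffinity"):
            # On Linux this honours CPU affinity masks (e.g. set with taskset).
            usable_cpus = len(os.sched_getaffinity(0))
        else:
            # Fallback for platforms without sched_getaffinity.
            usable_cpus = os.cpu_count() or 1
    return receiver_threads / usable_cpus
```

For example, `oversubscription_ratio(8, 4)` gives 2.0, which would suggest reducing the number of receiver threads or pinning them to dedicated cores before looking at anything more complex.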

I’d need to know your hardware configuration, NIC settings, and Linux kernel parameters to make detailed suggestions.

 

==

The last time this occurred to me, the values in net.core.rmem_max and net.core.rmem_default had been reset to something tiny. Try setting both to something like 128 MB if they are not already.
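
One way to verify this after a kernel change is to read the two sysctls from procfs and then ask a UDP socket what receive buffer it was actually granted. The sketch below assumes Linux, where `net.core.rmem_default` and `net.core.rmem_max` are exposed under `/proc/sys/net/core`; the helper names are invented for illustration.

```python
import socket
from pathlib import Path

SYSCTL_DIR = Path("/proc/sys/net/core")  # Linux procfs location of net.core.*


def read_rmem_limits(sysctl_dir=SYSCTL_DIR):
    """Return rmem_default and rmem_max (in bytes) as a dict."""
    return {
        name: int((Path(sysctl_dir) / name).read_text().split()[0])
        for name in ("rmem_default", "rmem_max")
    }


def granted_rcvbuf(requested_bytes):
    """Request a receive buffer size and report what the kernel granted."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, requested_bytes)
        # On Linux the request is capped by rmem_max, and the reported value
        # is doubled to account for bookkeeping overhead.
        return sock.getsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF)
```

If `granted_rcvbuf(128 * 1024 * 1024)` comes back far below the requested size, `rmem_max` is probably the limiting factor on that kernel.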

==

 

Changing the kernel is like changing a car engine: unless you read the whole changelog between the two versions, you do not know all the differences (and not all are documented anyway).
First check the NIC kernel module: modinfo bnx2 (if Broadcom).
To see which driver is in use, type: ethtool -i eth0
Is it the same?
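
To compare the driver details between the two kernels, you could save the output of `ethtool -i eth0` under each one and diff the fields. A minimal sketch, assuming the usual `key: value` layout of that output; the function names are hypothetical.

```python
def parse_ethtool_info(text):
    """Parse `ethtool -i` style `key: value` lines into a dict."""
    info = {}
    for line in text.splitlines():
        # Split on the first colon only, so values such as a PCI bus
        # address (which contains colons) stay intact.
        key, sep, value = line.partition(":")
        if sep:
            info[key.strip()] = value.strip()
    return info


def changed_fields(old, new):
    """Return {field: (old_value, new_value)} for every field that differs."""
    keys = sorted(set(old) | set(new))
    return {k: (old.get(k), new.get(k)) for k in keys if old.get(k) != new.get(k)}
```

An empty result from `changed_fields` means the driver, its version and the firmware are reported identically under both kernels, which would point the investigation away from the NIC module.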

 

==

It seems that you are using openSUSE. The ‘-desktop’ kernel flavor you mention is a preemptible kernel with HZ set to 1000, while in ‘-default’ preemption is disabled and HZ is set to 250 (install both and compare the /boot/config-2.6* files). Most probably these two settings caused the mcast drops.
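
If a full diff of the two config files is too noisy, the options of interest can be pulled out directly. The sketch below parses the standard kernel config syntax (`CONFIG_X=value` and `# CONFIG_X is not set`); the helper names are made up for this example.

```python
import re

_SET = re.compile(r"^(CONFIG_\w+)=(.*)$")
_UNSET = re.compile(r"^# (CONFIG_\w+) is not set$")


def parse_kernel_config(text):
    """Map option name to its value; options marked 'is not set' map to None."""
    options = {}
    for line in text.splitlines():
        line = line.strip()
        match = _SET.match(line)
        if match:
            options[match.group(1)] = match.group(2)
            continue
        match = _UNSET.match(line)
        if match:
            options[match.group(1)] = None
    return options


def compare_options(config_a, config_b, names):
    """Return {name: (value_a, value_b)} for the named options that differ.

    An option missing from a file is treated the same as 'is not set'.
    """
    return {
        n: (config_a.get(n), config_b.get(n))
        for n in names
        if config_a.get(n) != config_b.get(n)
    }
```

Feeding it the contents of the two /boot/config-* files with names such as `CONFIG_HZ`, `CONFIG_PREEMPT` and `CONFIG_PREEMPT_NONE` would show just the scheduler-related differences.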

 

Kernel stack buffers and network card ring buffers are all values that have a nasty tendency to get reset or renamed between versions (for instance, in the last couple of years we’ve seen the introduction of UDP-specific kernel parameters alongside the overall IPv4 parameters).

One other thing that could be playing into this is that the -desktop versions of the kernel tend to have (depending on the distro’s feeling on the matter) more of the preemption flags turned on, since a user's desktop experience is oddly dependent on a consistent kernel response time (e.g. playing video).

 

==

It is a Myri10GE. rmem is big enough, and the ethtool -S report makes it certain that the gaps come from the kernel level. We had finely tuned the NIC on the desktop kernel and it worked very well, but we failed to tune it on the server kernel. We do not have much experience tuning kernel parameters. I have heard that desktops tend to be configured for lower latency than servers. Maybe we had better change back to the desktop kernel.
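
To see which counters in the `ethtool -S` report are actually moving, two snapshots taken a few seconds apart can be compared. This is a rough sketch: counter names vary by driver, so the `keyword` filter and the function names below are assumptions, not anything specific to the Myri10GE driver.

```python
def parse_ethtool_stats(text):
    """Parse `ethtool -S` style `name: integer` lines into a dict."""
    stats = {}
    for line in text.splitlines():
        name, sep, value = line.rpartition(":")
        if not sep:
            continue
        try:
            stats[name.strip()] = int(value.strip())
        except ValueError:
            # Skips the header line and any non-numeric entries.
            continue
    return stats


def growing_counters(before, after, keyword="drop"):
    """Return the increase of each counter whose name contains `keyword`."""
    return {
        name: after[name] - before.get(name, 0)
        for name in after
        if keyword in name.lower() and after[name] > before.get(name, 0)
    }
```

Running this under both kernels with the same multicast load would show whether the same counters grow in each case, or whether the server kernel drops at a different stage.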

 

==

You could set up two machines, one with the server kernel and one with the desktop kernel, listening to the same multicast, and compare. This is a really interesting case; please keep us posted.

 

==

diff config-2.6.37.6-0.11-default config-2.6.37.6-0.11-desktop

60c60
< # CONFIG_KERNEL_DESKTOP is not set
---
> CONFIG_KERNEL_DESKTOP=y
73c73
< CONFIG_LOCALVERSION="-0.11-default"
---
> CONFIG_LOCALVERSION="-0.11-desktop"
117,118c117,118
< CONFIG_TREE_RCU=y
< # CONFIG_PREEMPT_RCU is not set
---
> CONFIG_TREE_PREEMPT_RCU=y
> CONFIG_PREEMPT_RCU=y
122d121
< CONFIG_RCU_FAST_NO_HZ=y
151c150
< # CONFIG_SCHED_AUTOGROUP is not set
---
> CONFIG_SCHED_AUTOGROUP=y
196c195
< CONFIG_DEFAULT_VM_DIRTY_RATIO=40
---
> CONFIG_DEFAULT_VM_DIRTY_RATIO=20
204d202
< CONFIG_OPTPROBES=y
263c261
< CONFIG_INLINE_SPIN_UNLOCK=y
---
> # CONFIG_INLINE_SPIN_UNLOCK is not set
265c263
< CONFIG_INLINE_SPIN_UNLOCK_IRQ=y
---
> # CONFIG_INLINE_SPIN_UNLOCK_IRQ is not set
272c270
< CONFIG_INLINE_READ_UNLOCK=y
---
> # CONFIG_INLINE_READ_UNLOCK is not set
274c272
< CONFIG_INLINE_READ_UNLOCK_IRQ=y
---
> # CONFIG_INLINE_READ_UNLOCK_IRQ is not set
281c279
< CONFIG_INLINE_WRITE_UNLOCK=y
---
> # CONFIG_INLINE_WRITE_UNLOCK is not set
283c281
< CONFIG_INLINE_WRITE_UNLOCK_IRQ=y
---
> # CONFIG_INLINE_WRITE_UNLOCK_IRQ is not set
350c348
< CONFIG_PREEMPT_NONE=y
---
> # CONFIG_PREEMPT_NONE is not set
352c350
< # CONFIG_PREEMPT is not set
---
> CONFIG_PREEMPT=y
358c356
< CONFIG_X86_MCE_XEON75XX=m
---
> # CONFIG_X86_MCE_XEON75XX is not set
426c424
< CONFIG_HZ_250=y
---
> # CONFIG_HZ_250 is not set
428,429c426,427
< # CONFIG_HZ_1000 is not set
< CONFIG_HZ=250
---
> CONFIG_HZ_1000=y
> CONFIG_HZ=1000
523c521
< CONFIG_X86_PCC_CPUFREQ=m
---
> # CONFIG_X86_PCC_CPUFREQ is not set
5130a5124
> # CONFIG_DEBUG_PREEMPT is not set
5193a5188
> # CONFIG_PREEMPT_TRACER is not set

==

Check your security settings. The server version has more packet filtering to counter spoofing and DoS attacks.

FYI, here’s an old article that describes the packet flow in Linux:
http://www.linuxjournal.com/article/4852

 

==

29West has an open-source multicast tool called mtools that can help quantify and diagnose the issue. It may be worth a try.

 

==

Thanks for your suggestion. I just had a look at its source code; it seems to simply send and receive multicast messages carrying a plain sequence number or timestamp to check for gaps and latency.
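
The gap check described here is easy to reproduce for a quick test. The sketch below is not taken from mtools; it is a generic illustration that assumes each received message carries an increasing integer sequence number, and the function name is invented.

```python
def find_gaps(sequence_numbers):
    """Return a list of (first_missing, last_missing) ranges.

    Duplicates and late (out-of-order) arrivals are ignored rather than
    counted as gaps.
    """
    gaps = []
    expected = None
    for seq in sequence_numbers:
        if expected is not None and seq > expected:
            # Everything from `expected` up to `seq - 1` never arrived in order.
            gaps.append((expected, seq - 1))
        if expected is None or seq >= expected:
            expected = seq + 1
    return gaps
```

Logging the sequence numbers seen by a receiver under each kernel and passing them through `find_gaps` gives a direct count of lost messages that can be set against the NIC counters.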

 

==

Where are the drops happening? Is the machine the receiver or the sender? What sort of spec is it? What does utilisation look like while this is going on? If it's the receiver, then increasing the buffers would be the first thing to try. What else happens on the machine? Does it have TCP traffic as well?

 
