Tag Archives: kernel

Is it insane to see many more multicast drops after changing the Linux kernel from the desktop version to the server version?

We just changed the Linux kernel on a new machine from the desktop version to the server version, 2.6.37.6-0.11-default, with the same NIC and settings, but we now see many more multicast drops. ethtool shows that our driver/firmware is not fast enough. This looks insane, but I really cannot find any other stupid things we have done.

==

Depending on what hardware you have and the parsing and feed mechanism you use, you may not be capturing the packets fast enough.

Without knowing your hardware (e.g. SMP configuration, bus, etc.) and your NIC capacity and configuration settings (are you using RDMA, or copying whatever is in the buffer?), it's hard to diagnose.

The solution can range from the simple to the complex:

Simple Solution: If you have one thread per receiver and more threads than CPUs, you may be oversubscribing your threads, causing a lot of expensive context switching. This could lead to dropped packets.

Complex Solution:
Rebuild the Linux kernel to eliminate all non-essential interrupts (e.g. power management). Modify the kernel to use a non-preemptive scheduler, build a proprietary memory manager, and/or rewrite the NIC device driver to run in the non-preemptive kernel.

I’d need to know your hardware configuration, NIC settings, and Linux kernel parameters to make detailed suggestions.
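The simple check described above (more receiver threads than CPUs) could be sketched roughly as follows. The helper names `thread_count` and `oversubscribed` are made up for illustration, and the example inspects the current shell only because the real receiver PID is not known here.

```shell
#!/bin/sh
# Hypothetical helpers: compare a process's thread count with the online CPU count.

thread_count() {
    # Each thread of a process appears as one directory under /proc/PID/task.
    ls "/proc/$1/task" 2>/dev/null | wc -l | tr -d ' '
}

oversubscribed() {
    # $1 = number of threads, $2 = number of CPUs; prints a one-word verdict.
    if [ "$1" -gt "$2" ]; then
        echo oversubscribed
    else
        echo ok
    fi
}

# Example: inspect the current shell; substitute the PID of your receiver process.
pid=$$
cpus=$(getconf _NPROCESSORS_ONLN)
threads=$(thread_count "$pid")
echo "threads=$threads cpus=$cpus verdict=$(oversubscribed "$threads" "$cpus")"
```

A verdict of `oversubscribed` is only a hint: threads that mostly sleep cost little, so it is worth confirming with context-switch counts before restructuring the receiver.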

 

==

The last time this happened to me, the values of net.core.rmem_max and net.core.rmem_default had been reset to something tiny. Try setting both to something like 128 MB if they are not already set. My 2 cents.
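A minimal sketch of that check, assuming a Linux host with /proc mounted. The sysctls take bytes, so the 128 MB suggestion is converted first; `mb_to_bytes` is a helper invented for this example, and the `sysctl -w` commands are only printed because applying them needs root.

```shell
#!/bin/sh
# Convert megabytes to bytes, since the rmem sysctls are expressed in bytes.
mb_to_bytes() {
    echo $(( $1 * 1024 * 1024 ))
}

# Show the current limits (readable without root on Linux).
for key in rmem_max rmem_default; do
    f="/proc/sys/net/core/$key"
    if [ -r "$f" ]; then
        echo "net.core.$key = $(cat "$f")"
    fi
done

# Print, rather than run, the commands that would raise both limits to 128 MB.
target=$(mb_to_bytes 128)
echo "sysctl -w net.core.rmem_max=$target"
echo "sysctl -w net.core.rmem_default=$target"
```

Note that rmem_max only caps what an application may request with SO_RCVBUF, while rmem_default applies to sockets that never ask, so the receiver's own socket options matter as well.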

==

 

Changing the kernel is like changing a car engine: unless you have read every changelog between the two versions, you do not know all the differences (and not all of them are documented anyway).
First check the NIC kernel module: modinfo bnx2 (if Broadcom).
To see which driver is in use, type: ethtool -i eth0
Is it the same?
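To make the before/after comparison less error-prone, the `ethtool -i` output can be reduced to one line per kernel. This is only a sketch: `driver_summary` is a made-up helper and the interface name eth0 is an assumption.

```shell
#!/bin/sh
# Print "driver version firmware-version" from `ethtool -i` style output on stdin.
driver_summary() {
    awk -F': ' '
        $1 == "driver"           { d = $2 }
        $1 == "version"          { v = $2 }
        $1 == "firmware-version" { f = $2 }
        END { print d, v, f }
    '
}

# On a real host (the interface name eth0 is an assumption):
#   ethtool -i eth0 | driver_summary
# Run it once under each kernel and compare the two lines.
```

If the two lines differ, the driver shipped with the new kernel is the first suspect, before any kernel tuning.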

 

==

It seems that you are using openSUSE. The ‘-desktop’ kernel flavor you mention is a preemptible kernel with HZ set to 1000, while in ‘-default’ preemption is disabled and HZ is set to 250 (install both and compare the /boot/config-2.6* files). Most probably these two settings caused the mcast drops.
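A quick way to pull just the timer-frequency and preemption settings out of each installed config might look like this. `timing_options` is a name invented for the example, and the /boot/config-* layout is assumed to match openSUSE's.

```shell
#!/bin/sh
# Print the timer-frequency and preemption settings from a kernel config file.
timing_options() {
    grep -E '^CONFIG_(HZ|PREEMPT(_NONE|_VOLUNTARY)?)=' "$1" | sort
}

# Walk every kernel config installed under /boot and show the relevant lines.
for cfg in /boot/config-*; do
    [ -f "$cfg" ] || continue
    echo "== $cfg"
    timing_options "$cfg"
done
```

Options that are disabled appear in the config as comments ("# CONFIG_PREEMPT is not set"), so they are deliberately left out; only the settings that are actually in effect are listed.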

 

Kernel stack buffers and network card ring buffers are all values that have a nasty tendency to get reset / renamed between versions (for instance in the last couple of years we’ve seen the introduction of udp-specific kernel parameters, as well as the overall ipv4 parameters).

One other thing that could be playing into this is that the -desktop versions of the kernel tend to have (depending on the distro’s feeling on the matter) more of the preemption flags turned on since a user desktop experience is oddly dependent on a consistent kernel response time (eg playing video, etc).

 

==

It is a Myri10GE. rmem is big enough, and the ethtool -S report makes it certain that the gaps come from the kernel level. We had finely tuned the NIC on the desktop kernel and it worked very well, but we failed to tune it on the server kernel. We do not have much experience tuning kernel parameters. I have heard that desktops tend to be configured for lower latency than servers. Maybe we had better change back to the desktop kernel.
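Reading the `ethtool -S` report is easier when only the non-zero loss counters are shown. The sketch below assumes counter names containing words such as "drop", "miss" or "overrun"; the actual names are driver-specific, so the pattern, the helper name `nonzero_drops` and the interface name eth0 are all assumptions.

```shell
#!/bin/sh
# Keep only non-zero counters whose name hints at loss, reading `ethtool -S` output on stdin.
nonzero_drops() {
    awk -F': *' '
        tolower($1) ~ /drop|miss|overrun|no_buf|nobuf/ && $2 + 0 > 0 {
            gsub(/^[ \t]+/, "", $1)   # strip the leading indentation
            print $1 "=" $2
        }
    '
}

# On a real host (the interface name eth0 is an assumption):
#   ethtool -S eth0 | nonzero_drops
# Taking two snapshots a few seconds apart shows which counters are still growing.
```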

 

==

You could set up two machines, one with the server kernel and one with the desktop kernel, have both listen to the same multicast stream, and compare. This is a really interesting case, so please keep us posted.

 

==

diff config-2.6.37.6-0.11-default config-2.6.37.6-0.11-desktop

60c60
< # CONFIG_KERNEL_DESKTOP is not set
---
> CONFIG_KERNEL_DESKTOP=y
73c73
< CONFIG_LOCALVERSION="-0.11-default"
---
> CONFIG_LOCALVERSION="-0.11-desktop"
117,118c117,118
< CONFIG_TREE_RCU=y
< # CONFIG_PREEMPT_RCU is not set
---
> CONFIG_TREE_PREEMPT_RCU=y
> CONFIG_PREEMPT_RCU=y
122d121
< CONFIG_RCU_FAST_NO_HZ=y
151c150
< # CONFIG_SCHED_AUTOGROUP is not set
---
> CONFIG_SCHED_AUTOGROUP=y
196c195
< CONFIG_DEFAULT_VM_DIRTY_RATIO=40
---
> CONFIG_DEFAULT_VM_DIRTY_RATIO=20
204d202
< CONFIG_OPTPROBES=y
263c261
< CONFIG_INLINE_SPIN_UNLOCK=y
---
> # CONFIG_INLINE_SPIN_UNLOCK is not set
265c263
< CONFIG_INLINE_SPIN_UNLOCK_IRQ=y
---
> # CONFIG_INLINE_SPIN_UNLOCK_IRQ is not set
272c270
< CONFIG_INLINE_READ_UNLOCK=y
---
> # CONFIG_INLINE_READ_UNLOCK is not set
274c272
< CONFIG_INLINE_READ_UNLOCK_IRQ=y
---
> # CONFIG_INLINE_READ_UNLOCK_IRQ is not set
281c279
< CONFIG_INLINE_WRITE_UNLOCK=y
---
> # CONFIG_INLINE_WRITE_UNLOCK is not set
283c281
< CONFIG_INLINE_WRITE_UNLOCK_IRQ=y
---
> # CONFIG_INLINE_WRITE_UNLOCK_IRQ is not set
350c348
< CONFIG_PREEMPT_NONE=y
---
> # CONFIG_PREEMPT_NONE is not set
352c350
< # CONFIG_PREEMPT is not set
---
> CONFIG_PREEMPT=y
358c356
< CONFIG_X86_MCE_XEON75XX=m
---
> # CONFIG_X86_MCE_XEON75XX is not set
426c424
< CONFIG_HZ_250=y
---
> # CONFIG_HZ_250 is not set
428,429c426,427
< # CONFIG_HZ_1000 is not set
< CONFIG_HZ=250
---
> CONFIG_HZ_1000=y
> CONFIG_HZ=1000
523c521
< CONFIG_X86_PCC_CPUFREQ=m
---
> # CONFIG_X86_PCC_CPUFREQ is not set
5130a5124
> # CONFIG_DEBUG_PREEMPT is not set
5193a5188
> # CONFIG_PREEMPT_TRACER is not set

==

Check your security settings. The server version has more packet filtering to counter spoofing and DoS attacks.

FYI, here’s an old article that describes the packet flow in Linux:
http://www.linuxjournal.com/article/4852
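The post above does not name a specific setting. One anti-spoofing knob that is easy to inspect is reverse-path filtering (net.ipv4.conf.*.rp_filter), so the sketch below assumes that is a candidate worth ruling out; `rp_filter_mode` is a helper invented for the example.

```shell
#!/bin/sh
# Describe an rp_filter value: 0 = no source validation, 1 = strict, 2 = loose.
rp_filter_mode() {
    case "$1" in
        0) echo "off" ;;
        1) echo "strict" ;;
        2) echo "loose" ;;
        *) echo "unknown" ;;
    esac
}

# Report the setting for every interface the kernel knows about.
for f in /proc/sys/net/ipv4/conf/*/rp_filter; do
    [ -r "$f" ] || continue
    iface=${f%/rp_filter}
    iface=${iface##*/}
    echo "$iface: $(rp_filter_mode "$(cat "$f")")"
done
```

Strict mode can silently discard packets whose source address is not routable back through the receiving interface, which sometimes affects multicast feeds on multi-homed hosts. Firewall rules would need a separate check.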

 

==

29West has an open-source multicast tool called mtools that can help quantify and diagnose the issue. It may be worth a try.

 

==

Thanks for your suggestion. I just had a look at its source code; it seems to simply send and receive multicast messages containing a plain sequence number or a plain timestamp, to check for gaps and latency.

 

==

Where are the drops happening? Is the machine the receiver or the sender? What sort of spec is it? What does utilisation look like while this is going on? If it's the receiver, then increasing the buffers would be the first thing to try. What else happens on the machine? Does it have TCP traffic as well?
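One way to start answering "where" on the receiver is to look at the kernel's UDP counters in /proc/net/snmp, which are separate from the NIC counters that ethtool reports. The sketch below pairs the header line with the value line; `udp_counters` is a made-up helper name.

```shell
#!/bin/sh
# Pair the two "Udp:" lines of /proc/net/snmp (names, then values) into name=value lines.
udp_counters() {
    awk '
        $1 == "Udp:" && !seen { for (i = 2; i <= NF; i++) name[i] = $i; seen = 1; next }
        $1 == "Udp:"          { for (i = 2; i <= NF; i++) print name[i] "=" $i }
    '
}

# Show the live counters when running on Linux.
if [ -r /proc/net/snmp ]; then
    udp_counters < /proc/net/snmp
fi
```

A growing RcvbufErrors value means datagrams reached the socket layer but the socket receive buffer was full, which points at the application or buffer sizes; if it stays flat while the NIC's own drop counters climb, the loss is happening earlier, in the driver or firmware.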

 

NOTE I now post my TRADING ALERTS into my personal FACEBOOK ACCOUNT and TWITTER. Don't worry as I don't post stupid cat videos or what I eat!

Linux kernel Unix kernel PID and file descriptor fully explained during network administration

One of the trickier things in the Linux kernel is the relationship between process IDs (PIDs) and the file descriptors that belong to each executable you run or have access to. A running process can open all sorts of files:
=> Actual log file
=> /dev files
=> UNIX Sockets
=> Network sockets
=> Library files /lib /lib64
=> Executables and other programs etc
A handy way to find out about a process is the lsof command. There are two steps:
1. Get the PID by running ‘ps aux | grep prog’ or ‘pidof myprog’. This prints the process ID, for example 2892.
2. List the files opened by that PID with ‘lsof -p 2892’. You can also ‘cd /proc/2892/fd’ and then run ‘ls | wc -l’ to count the open file descriptors.
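The two steps above can be combined into a small sketch. `fd_count` is a helper invented for the example, and it inspects the current shell ($$) so that it runs without knowing a real target PID.

```shell
#!/bin/sh
# Count the file descriptors a process has open by listing /proc/PID/fd.
fd_count() {
    ls "/proc/$1/fd" 2>/dev/null | wc -l | tr -d ' '
}

# Example: the current shell. Replace $$ with the output of `pidof myprog`.
pid=$$
echo "PID $pid has $(fd_count "$pid") open file descriptors"

# To see what each descriptor points at, list the symlinks or use lsof:
#   ls -l "/proc/$pid/fd"
#   lsof -p "$pid"
```

Reading another user's /proc/PID/fd normally requires root, in which case the count comes back as 0 rather than failing.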
Understand that /proc is a virtual file system that the kernel generates on the fly; every running process gets its own /proc/PID directory. Several useful entries live there:
=> /proc/PID/cmdline : process arguments
=> /proc/PID/cwd : process current working directory (symlink)
=> /proc/PID/exe : path to the actual process executable file (symlink)
=> /proc/PID/environ : environment used by the process
=> /proc/PID/root : the root path as seen by the process. For most processes this will be a link to / unless the process is running in a chroot jail.
=> /proc/PID/status : basic information about a process, including its run state and memory usage.
=> /proc/PID/task : a directory with one subdirectory per thread (task) belonging to this process.
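As a rough illustration of reading these entries, the sketch below prints a short summary for one PID. `proc_summary` is a hypothetical helper, and the choice of State, VmRSS and Threads from the status file is just one reasonable selection.

```shell
#!/bin/sh
# Print a short summary of a process from its /proc/PID entries.
proc_summary() {
    p="/proc/$1"
    [ -d "$p" ] || { echo "no such process: $1"; return 1; }
    # cmdline is NUL-separated, so translate the NULs to spaces for display.
    echo "cmdline: $(tr '\0' ' ' < "$p/cmdline")"
    echo "cwd:     $(readlink "$p/cwd")"
    echo "exe:     $(readlink "$p/exe")"
    # status is plain "Key: value" text; pick a few informative fields.
    grep -E '^(State|VmRSS|Threads):' "$p/status"
}

# Example: summarise the current shell.
proc_summary $$ || true
```

The environ file uses the same NUL-separated layout as cmdline, so the same `tr` trick applies to it.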

Hopefully this will give you a better idea of how processes are managed by the Linux kernel as it executes.