HADES Network Tests

  • Get network card info: hwinfo --netcard
  • List loaded modules: lsmod
  • List all PCI buses and devices: lspci
  • Grep the kernel log for driver-related messages: dmesg | grep ixgbe
  • Get driver info: modinfo ixgbe
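
The checks above can be combined into one quick summary. The following is a minimal sketch, assuming the ixgbe driver used throughout this page; the helper names modinfo_field and driver_summary are hypothetical and not part of any HADES tool.

```shell
#!/bin/sh
# Print the value of one "key: value" field from modinfo-style output on stdin.
modinfo_field() {
    # $1 = field name, e.g. "version"
    awk -v key="$1" 'index($0, key ":") == 1 {
        sub(/^[^:]*:[ \t]*/, "")   # strip the "key:" prefix and padding
        print
        exit
    }'
}

# Summarize a driver: version of the module on disk and whether it is loaded.
driver_summary() {
    drv="${1:-ixgbe}"
    echo "version: $(modinfo "$drv" | modinfo_field version)"
    if lsmod | awk -v m="$drv" '$1 == m { found = 1 } END { exit !found }'; then
        echo "loaded: yes"
    else
        echo "loaded: no"
    fi
}
```

Usage: driver_summary ixgbe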

Exchange driver

  • Unload driver: rmmod ixgbe
  • Copy new driver: cp /home/hadaq/drivers/ixgbe-2.1.4/src/ixgbe.ko /lib/modules/2.6.26-2-amd64/kernel/drivers/net/ixgbe/
    • Or compile and install as root: cd /home/hadaq/drivers/ixgbe-2.1.4/src/; make CFLAGS_EXTRA="-DIXGBE_NO_HW_RSC" install
  • Set new dependencies: depmod -a
  • Check if dependencies have changed: ls -ltr /lib/modules/2.6.26-2-amd64/
  • Set the immutable flag on the driver so that even root cannot overwrite it: cd /lib/modules/2.6.26-2-amd64/kernel/drivers/net/ixgbe/; chattr +i ixgbe.ko
  • (To remove the immutable flag use: chattr -i ixgbe.ko)
  • Load driver: modprobe ixgbe MQ=0,0
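
The exchange steps above can be sketched as one script. This is only an illustration under stated assumptions: the paths and kernel version are the ones listed above, the helper names run and exchange_driver are hypothetical, and the initial chattr -i is an added assumption (it clears a flag left by a previous exchange, since cp cannot overwrite an immutable file). By default the script only prints the commands.

```shell
#!/bin/sh
# Sketch of the driver exchange sequence. Commands are only printed unless
# DRY_RUN=0 is set; real execution must be done as root.
KVER="2.6.26-2-amd64"
SRC="/home/hadaq/drivers/ixgbe-2.1.4/src/ixgbe.ko"
DEST="/lib/modules/$KVER/kernel/drivers/net/ixgbe"

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

exchange_driver() {
    run rmmod ixgbe || return 1
    # Assumption: clear an immutable flag from a previous exchange.
    run chattr -i "$DEST/ixgbe.ko" || true
    run cp "$SRC" "$DEST/" || return 1
    run depmod -a || return 1
    run chattr +i "$DEST/ixgbe.ko" || return 1
    run modprobe ixgbe MQ=0,0
}
```

Usage: call exchange_driver to review the command list, then DRY_RUN=0 exchange_driver as root to apply it.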

Statistics

  • Enable flow control on the server: ethtool -A eth3 rx on tx on autoneg on
  • Disable flow control on the server: ethtool -A eth3 rx off tx off autoneg off
  • Statistics at protocol level (UDP; netstat -s reports system-wide counters, not per-interface ones): netstat -s | grep "packet receive errors"
  • Statistics at protocol level (IP, system-wide): netstat -s | grep "packet reassembles failed"
  • Statistics at hardware level: ethtool -S eth3 | grep "rx_missed_errors"
    • Indicates the number of frames that were dropped because the adapter's FIFO filled up and overflowed. This suggests interrupt delivery problems or lost interrupts.
  • Statistics at hardware level: ethtool -S eth3 | grep "rx_no_buffer_count"
    • Indicates that the driver did not return buffers to the hardware soon enough, but the hardware was able to hold the packet in the FIFO at the time of reception and retry. This also suggests interrupt delivery problems or lost interrupts.
  • Slot settings for the NIC: lspci -v -v -s 07:00.1
  • Processor-related statistics: mpstat -P ALL 5
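
A single reading of rx_missed_errors or rx_no_buffer_count says little, since the counters are cumulative; what matters is how fast they grow. The following is a minimal sketch that samples a counter twice and prints the difference. The helper names ethtool_counter and counter_delta are hypothetical, and the 5 second default interval is an arbitrary choice.

```shell
#!/bin/sh
# Extract one counter value from "ethtool -S"-style output on stdin.
ethtool_counter() {
    # $1 = counter name, e.g. rx_missed_errors
    awk -v name="$1" -F: '{
        gsub(/[ \t]/, "", $1)          # drop the indentation before the name
        if ($1 == name) {
            gsub(/[ \t]/, "", $2)      # drop the padding before the value
            print $2
            exit
        }
    }'
}

# Print how much a counter grew over an interval.
counter_delta() {
    # $1 = interface, $2 = counter name, $3 = interval in seconds (default 5)
    before=$(ethtool -S "$1" | ethtool_counter "$2")
    sleep "${3:-5}"
    after=$(ethtool -S "$1" | ethtool_counter "$2")
    echo "$2 grew by $((after - before)) in ${3:-5}s"
}
```

Usage: counter_delta eth3 rx_missed_errors 5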

Troubleshooting

  • Event Builder ERROR: netmem.c, 645: NetTrans_create: failed to create UDP:0.0.0.0:50534: Address already in use.
    • Reason: port 50534 is already in use by another application (most likely an EPICS IOC)
    • Debug: lsof -i | grep 50534 => ebctrl 25724 scs 3u IPv4 8662658 UDP *:50534
    • Solution: Close all EBs and all IOCs, then start the EBs, start the IOCs, and finally restart the EBs (close, then start). Starting the IOCs after the EBs avoids the clash because the IOCs pick unused UDP ports dynamically. The final EB restart is needed because the EBs require running IOCs to start properly.
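
The debug step above can be wrapped in a small filter that names the process holding the port. This is a minimal sketch, assuming lsof output in the column layout shown above (command first, PID second, socket name last); the function name port_holders is hypothetical.

```shell
#!/bin/sh
# Read "lsof -i"-style lines on stdin and print "command pid" for every
# process whose socket name (last column) ends in :<port>.
port_holders() {
    # $1 = port number, e.g. 50534
    awk -v suffix=":$1" '{
        n = length($NF) - length(suffix) + 1
        if (n > 0 && substr($NF, n) == suffix) print $1, $2
    }'
}
```

Usage: lsof -i -P | port_holders 50534 (the -P option keeps port numbers numeric). Matching on the leading colon avoids false hits on ports that merely end in the same digits.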

-- SergeyYurevich - 01 Apr 2010
Topic revision: r7 - 2010-10-21, SergeyYurevich