hadeb05

Because of jumbo frames, hadeb05 can stop receiving DHCP discover requests from the TRBs. Solution:
  • ifconfig eth0 down
  • modprobe -r sk98lin; modprobe sk98lin
  • ifconfig eth0 192.168.0.1 netmask 255.255.255.0 up
  • rcdhcpd restart
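The four commands above can be wrapped in a small shell function so they always run in the same order. This is a minimal sketch that assumes a POSIX shell running as root on hadeb05. The names `run` and `reset_dhcp_nic` and the `DRY_RUN` switch are hypothetical additions, not existing tools.

```bash
# Sketch (hypothetical helper names): reload the sk98lin driver on hadeb05
# and restart dhcpd. With DRY_RUN=1 the commands are only printed.

# run: print the command when DRY_RUN=1, otherwise execute it
run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

reset_dhcp_nic() {
    run ifconfig eth0 down || return 1
    run modprobe -r sk98lin || return 1    # unload the driver ...
    run modprobe sk98lin || return 1       # ... and load it again
    run ifconfig eth0 192.168.0.1 netmask 255.255.255.0 up || return 1
    run rcdhcpd restart                    # restart the DHCP server
}
```

Source the file and call `reset_dhcp_nic`. `DRY_RUN=1 reset_dhcp_nic` only lists the commands.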

hadeb07

hadeb07 parameters:
  • SuSE 10.2
  • Hard disks: sda+sdb = 0.32 TB, sdc+sdd = 1 TB. The last two disks (sdc, sdd) were added later to serve as backup disks. They are not properly fixed inside the machine, so take care when moving it. Nagios monitors the temperature of both disks.
  • Two 3GHz processors
  • 2GB memory

hadeb04 (remote backup system)

hadeb04 serves as a remote backup system.
  • Software: rsnapshot, run once a day via cron (see crontab -e)
  • Config file: /etc/rsnapshot.conf
  • Backup disk: /data/hadeb04/backup
  • Test mode: rsnapshot -t hourly

Fixes for rsnapshot on hadeb04:
  • WARNING: Could not lchown() symlink
    • Reason: the Perl module Lchown is missing
    • Fix: perl -MCPAN -e 'install qw(Lchown)'
  • ERROR: rsync returned error 12 in rsync_cleanup_after_native_cp_al()
    • Reason: older versions of rsync hold the entire file list in memory at once, which fails when too many files have to be synced.
    • Fix: upgrade to rsync 3.0.0 or newer. These versions use an incremental recursion mode and do not need to hold the entire file list in memory.
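A short helper can check whether a host already has a recent enough rsync. This is a sketch. It assumes the version is the third word on the first line of `rsync --version`. `rsync_is_recent` and `installed_rsync_version` are hypothetical names. Incremental recursion is only used when both ends of the transfer run rsync 3.0.0 or newer, so check hadeb04 and the machine being backed up.

```bash
# Sketch (hypothetical helper names): check that rsync is at least 3.0.0,
# the first version with incremental recursion.

# rsync_is_recent VERSION: succeed if the major version is 3 or higher.
rsync_is_recent() {
    local version major
    version=${1#v}              # newer rsync prints e.g. "v3.2.7"
    major=${version%%.*}        # "2.6.9" -> "2", "3.0.0" -> "3"
    [ "$major" -ge 3 ] 2>/dev/null
}

# installed_rsync_version: third word of the first line of
# `rsync --version`, e.g. "rsync  version 3.0.9  protocol version 30".
installed_rsync_version() {
    rsync --version | awk 'NR == 1 { print $3 }'
}
```

Usage: `rsync_is_recent "$(installed_rsync_version)" || echo "rsync is older than 3.0.0"`.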

The following machines/directories are backed up:

Note: since kp1pc098 runs Ubuntu, the user 'marek' takes the role of 'root' there.

The backup of hades17 is located in hades25/home/hadaq/backups/hades17_hadaq_home/

hadeb04 is a 1.7 TB file server with an AMD 64 3200+ CPU, running 64-bit Linux. It has no system disk; its boot server is lxhadesdaq. Notes on the Asus board:
  • The BIOS upgrade to version 1013 does not work.
  • BIOS upgrades only work by pressing ALT-F2 during boot and providing the file on a floppy disk. All other methods failed with a checksum error!
  • The machine has to boot over the second Gigabit interface. For some reason, the kernel does not recognize the first one (NVIDIA).

UNAME_MACHINE="x86_64" make modules_install

Compile the kernel 2.6.12.5 for x86_64:
  1. Change to the directory lxhadesdaq:/var/diskless/hadeb04/usr/src/linux-2.6.12.5
  2. Use the cross-compiler located in the directory lxhadesdaq:/var/diskless/hadeb04/usr/src/linux-2.6.12.5/x86_64-unknown-linux-gnu
  3. Add it to the PATH: export PATH=$PATH:/var/diskless/hadeb04/usr/src/linux-2.6.12.5/x86_64-unknown-linux-gnu/gcc-3.4.0-glibc-2.3.2/bin
  4. make ARCH=x86_64 menuconfig
  5. make ARCH=x86_64 CROSS_COMPILE=x86_64-unknown-linux-gnu-
  6. INSTALL_MOD_PATH=/var/diskless/hadeb04 make ARCH=x86_64 CROSS_COMPILE=x86_64-unknown-linux-gnu- modules_install
  7. cp arch/x86_64/boot/bzImage /tftpboot/vmlinuz_2.6.12.5_64bit
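Steps 1 to 7 can be collected into one function. This is a sketch under three assumptions. First, it runs on lxhadesdaq with the paths shown above. Second, `INSTALL_MOD_PATH` is passed as a make variable rather than through the environment; the kernel Makefile accepts both forms. Third, the names `build_hadeb04_kernel`, `run` and `DRY_RUN` are hypothetical.

```bash
# Sketch (hypothetical names): steps 1-7 above in one function. Paths and the
# cross-compiler prefix are copied from the steps. The body runs in a
# subshell, so the cd and the PATH change do not leak into your shell.

SRC=/var/diskless/hadeb04/usr/src/linux-2.6.12.5
TOOLCHAIN=$SRC/x86_64-unknown-linux-gnu/gcc-3.4.0-glibc-2.3.2/bin
CROSS=x86_64-unknown-linux-gnu-

# run: print the command when DRY_RUN=1, otherwise execute it
run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then echo "$*"; else "$@"; fi
}

build_hadeb04_kernel() (
    export PATH="$PATH:$TOOLCHAIN"                        # steps 2-3
    run cd "$SRC" || exit 1                               # step 1
    run make ARCH=x86_64 menuconfig || exit 1             # step 4 (interactive)
    run make ARCH=x86_64 CROSS_COMPILE="$CROSS" || exit 1 # step 5
    run make INSTALL_MOD_PATH=/var/diskless/hadeb04 ARCH=x86_64 \
        CROSS_COMPILE="$CROSS" modules_install || exit 1  # step 6
    run cp arch/x86_64/boot/bzImage /tftpboot/vmlinuz_2.6.12.5_64bit  # step 7
)
```

`DRY_RUN=1 build_hadeb04_kernel` prints the five commands without building anything.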

-- SergeyYurevich - 04 Jun 2009

Miscellaneous

Hard disks for servers

  • Type: Seagate Barracuda ES.2 ST31000340NS 1TB
  • Bought by GSI: 12
  • Installed: lxhadeb01

  • Type: WD RE4-Green Power 2TB 2.5 SATA WD2002FYPS
  • Bought by GSI: 70 + 10 (which are not yet installed)
  • Bought by Coimbra: 16
  • Installed: lxhadeb01, 02, 03, 04

lxhadeb01

lxhadeb01 is our new powerful server for parallel event building.

Info:
  • 2x4 cores (Dual Quad-Core AMD Opteron Processor 2.3 GHz)
  • 32 GB memory (4x8 GB Kingston DDR2)
  • 24 slots for hard disks:
    • RAID 1 for 2 system disks (slots 0-1)
    • Standalone disks (slots 2-23)

kernel

Event Builders require the following settings:
  • kernel.sem="250 128000 32 512"
    • 250 : SEMMSL, the maximum number of semaphores per semaphore set (default)
    • 128000 : SEMMNS, the maximum total number of semaphores in the system (changed: 250x512)
    • 32 : SEMOPM, the maximum number of semaphore operations per semop() call (default)
    • 512 : SEMMNI, the maximum number of semaphore sets in the system (changed)
  • net.core.rmem_max=10485760 : Receive socket buffer size
  • net.core.wmem_max=10485760 : Send socket buffer size
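Writing these settings to a sysctl configuration file keeps them across reboots. This is a sketch. `write_evtbuild_sysctl` and the file name in the usage line are hypothetical. In the file, the kernel.sem value is written without the quotes used on the command line. On lxhadeb01, cfagent may overwrite configuration files (see Configuration).

```bash
# Sketch (hypothetical helper): write the event builder kernel settings
# listed above to FILE in sysctl.conf syntax (no shell quotes around values).
write_evtbuild_sysctl() {
    if [ -z "$1" ]; then
        echo "usage: write_evtbuild_sysctl FILE" >&2
        return 1
    fi
    cat > "$1" <<'EOF'
# Event builder settings: kernel.sem = SEMMSL SEMMNS SEMOPM SEMMNI
kernel.sem = 250 128000 32 512
net.core.rmem_max = 10485760
net.core.wmem_max = 10485760
EOF
}
```

Run as root: `write_evtbuild_sysctl /etc/sysctl.d/60-evtbuild.conf && sysctl -p /etc/sysctl.d/60-evtbuild.conf`. On systems without /etc/sysctl.d, add the same lines to /etc/sysctl.conf and run `sysctl -p`.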

Possible errors:
  • "No space left on device". This error occurs when the event builder application tries to create more semaphore sets than SEMMNI allows. The standard setting is kernel.sem="250 32000 32 128", which allows at most 128 sets. Each shared memory segment requires two semaphore sets, so 128 sets correspond to 64 segments. With the standard setting, daq_evtbuild -m 65 therefore fails with this error.
  • "File exists". This error occurs when a previous run of daq_evtbuild left semaphores behind that were not cleaned up. Remove them with ipcrm -s semid (or /home/hadaq/bin/ipcrm.pl).
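You can check the arithmetic behind the first error against the current kernel setting. This is a sketch. It assumes, as stated above, two semaphore sets per shared memory segment. It also assumes no other program holds semaphore sets, so the result is an upper bound. `sem_capacity` is a hypothetical helper.

```bash
# Sketch (hypothetical helper): given "SEMMSL SEMMNS SEMOPM SEMMNI", print the
# upper bound on shared memory segments (two semaphore sets per segment).
sem_capacity() {
    local semmsl semmns semmni
    set -- $1                     # split the four fields into $1..$4
    semmsl=$1 semmns=$2 semmni=$4
    # SEMMNS should cover SEMMNI full sets, e.g. 250 x 512 = 128000
    if [ "$semmns" -lt $(( semmsl * semmni )) ]; then
        echo "warning: SEMMNS ($semmns) is smaller than SEMMSL x SEMMNI" >&2
    fi
    echo $(( semmni / 2 ))
}
```

`sem_capacity "$(cat /proc/sys/kernel/sem)"` prints 64 for the standard setting and 256 for the event builder setting.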

Howto:
  • List open semaphores: ipcs -s
  • List open shared memory segments: ipcs -m
  • List all: ipcs -a
  • Remove semaphore: ipcrm -s semid
  • Remove all open semaphores: /home/hadaq/bin/ipcrm.pl
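If /home/hadaq/bin/ipcrm.pl is not at hand, you can approximate the same cleanup by parsing `ipcs -s`. This is a sketch. It assumes the usual Linux column layout (key, semid, owner, perms, nsems). `semids_of` and `remove_semaphores_of` are hypothetical helpers and are not the ipcrm.pl script.

```bash
# Sketch (hypothetical helpers) based on the Linux `ipcs -s` layout:
# key  semid  owner  perms  nsems

# semids_of OWNER: read `ipcs -s` output on stdin, print the semids of OWNER
semids_of() {
    awk -v owner="$1" '$3 == owner && $2 ~ /^[0-9]+$/ { print $2 }'
}

# remove_semaphores_of OWNER: remove every semaphore set owned by OWNER
remove_semaphores_of() {
    ipcs -s | semids_of "$1" | while read -r id; do
        ipcrm -s "$id"
    done
}
```

Run `remove_semaphores_of hadaq` only when no daq_evtbuild is running. Otherwise it removes semaphores that are still in use.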

RAID Array Controller

The Adaptec RAID controller was replaced on 04.06.2009.

Adaptec Storage Manager is a Java tool for managing RAID arrays. Start it as root by executing /usr/StorMan/StorMan.sh
  • How to rebuild a degraded logical device with a failed segment:
    • Click on lxhadeb01.gsi.de (Logical system) in the Enterprise view,
    • then click on Controller in Physical devices
    • Go to Actions -> Rescan
    • Wait until the rescan is finished. You will see that the failed disk is taken out of the logical device.
    • Replace the failed disk if needed
    • Click on the 'failed' disk -> Initialize
    • Wait a bit; the rebuild of the logical device should start automatically

arcconf is a command-line interface to the RAID controller.
  • To get information about RAID controller status, try the following:
    • For controller 1 and first RAID array: /root/bin/arcconf GETCONFIG 1 LD 0
    • For controller 1 and second RAID array: /root/bin/arcconf GETCONFIG 1 LD 1
  • To silence the alarm: arcconf setalarm 1 silence

Configuration

  • Many configuration files are overwritten by cfagent, which runs as a cron job at reboot and once per day (/etc/cron.d/gsi). To stop it, comment out the corresponding lines in /etc/cron.d/gsi.
  • To enable remote logins for new users, add them to /etc/security/access.conf (access.conf is also overwritten!)

IPMI Module

The IPMI module provides remote access to the machine. It is connected to the ITM 'yellow' network. Currently, hades30.gsi.de is our machine in the 'yellow' network for accessing the IPMI module.

How to access:

How to get the MAC address of the IPMI module:
  • Execute as 'root': ipmitool lan print
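A one-line filter is enough to extract just the MAC address from that output. This is a sketch. It assumes the line is printed as `MAC Address : xx:xx:...` with variable padding. `ipmi_mac` is a hypothetical name.

```bash
# Sketch (hypothetical helper): read `ipmitool lan print` output on stdin and
# print only the value of the "MAC Address" line.
ipmi_mac() {
    sed -n 's/^MAC Address[[:space:]]*:[[:space:]]*\([0-9A-Fa-f:]*\).*/\1/p'
}
```

Usage: `ipmitool lan print | ipmi_mac`. If nothing is printed, try an explicit channel, e.g. `ipmitool lan print 1`.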

Network

  • lxhadeb01b : 1 Gbps NIC
  • lxhadeb01 : Intel Corporation (vendor: 8086), 82599EB 10 Gigabit Network Connection (device: 10fb)
    • ixgbe driver version: ixgbe-2.1.4
    • Vendor-device table to recognize devices.
    • Source code (this link might be outdated): http://sourceforge.net/projects/e1000/files/ixgbe stable/
    • To load ixgbe at boot time with the single-queue setting, add the following line to /etc/modprobe.d/ixgbe:
        options ixgbe MQ=0,0
    • To reload the driver with this setting: rmmod ixgbe; modprobe ixgbe MQ=0,0
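After reloading, confirm which driver and version are actually bound to the interface. This is a sketch that parses `ethtool -i` output. `nic_driver` and the interface name in the usage line are hypothetical.

```bash
# Sketch (hypothetical helper): read `ethtool -i IFACE` output on stdin and
# print "<driver> <version>".
nic_driver() {
    awk -F': ' '$1 == "driver" { d = $2 }
                $1 == "version" { v = $2 }
                END { print d, v }'
}
```

`ethtool -i eth2 | nic_driver` prints the loaded driver name and its version. `modinfo -F version ixgbe` shows the version of the module on disk, which can differ from the loaded one until the driver is reloaded.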

lxhadeb02

kernel

Event Builders require the following settings:
  • kernel.sem="250 128000 32 512"
    • 250 : SEMMSL, the maximum number of semaphores per semaphore set (default)
    • 128000 : SEMMNS, the maximum total number of semaphores in the system (changed: 250x512)
    • 32 : SEMOPM, the maximum number of semaphore operations per semop() call (default)
    • 512 : SEMMNI, the maximum number of semaphore sets in the system (changed)
  • net.core.rmem_max=10485760 : Receive socket buffer size
  • net.core.wmem_max=10485760 : Send socket buffer size

IPMI Module

How to access:

Network

  • lxhadeb02b : 1 Gbps NIC
  • lxhadeb02 : Intel Corporation (vendor: 8086), 82598EB 10-Gigabit AF Dual Port Network Connection (device: 10f1)
    • ixgbe driver version: 2.0.75.7-NAPI

lxhadeb03

kernel

Event Builders require the following settings:
  • kernel.sem="250 128000 32 512"
    • 250 : SEMMSL, the maximum number of semaphores per semaphore set (default)
    • 128000 : SEMMNS, the maximum total number of semaphores in the system (changed: 250x512)
    • 32 : SEMOPM, the maximum number of semaphore operations per semop() call (default)
    • 512 : SEMMNI, the maximum number of semaphore sets in the system (changed)
  • net.core.rmem_max=10485760 : Receive socket buffer size
  • net.core.wmem_max=10485760 : Send socket buffer size

IPMI Module

How to access:

Network

  • lxhadeb03b : 1 Gbps NIC
  • lxhadeb03 : Intel Corporation (vendor: 8086), 82598EB 10-Gigabit AT Dual Port Network Connection (device: 10f1)
    • ixgbe driver version: ixgbe-2.1.4

lxhadeb04

kernel

Event Builders require the following settings:
  • kernel.sem="250 128000 32 512"
    • 250 : SEMMSL, the maximum number of semaphores per semaphore set (default)
    • 128000 : SEMMNS, the maximum total number of semaphores in the system (changed: 250x512)
    • 32 : SEMOPM, the maximum number of semaphore operations per semop() call (default)
    • 512 : SEMMNI, the maximum number of semaphore sets in the system (changed)
  • net.core.rmem_max=10485760 : Receive socket buffer size
  • net.core.wmem_max=10485760 : Send socket buffer size

IPMI Module

How to access:

Network

  • lxhadeb04b : 1 Gbps NIC
  • lxhadeb04 : Intel Corporation (vendor: 8086), 82599EB 10 Gigabit Network Connection (device: 10fb)
    • ixgbe driver version: ixgbe-2.1.4

lxhadeb05

Info:
  • 24 cores (AMD Opteron Processor 800 MHz)
  • 64 GB memory
  • 24 slots for hard disks:
    • RAID 1 for 2 system disks (slots 0-1)
    • Standalone disks (slots 2-23)

kernel

Event Builders require the following settings:
  • kernel.sem="250 128000 32 512"
    • 250 : SEMMSL, the maximum number of semaphores per semaphore set (default)
    • 128000 : SEMMNS, the maximum total number of semaphores in the system (changed: 250x512)
    • 32 : SEMOPM, the maximum number of semaphore operations per semop() call (default)
    • 512 : SEMMNI, the maximum number of semaphore sets in the system (changed)
  • net.core.rmem_max=10485760 : Receive socket buffer size
  • net.core.wmem_max=10485760 : Send socket buffer size

IPMI Module

To access lxhadeb05 through the IPMI module with its new firmware, the Sun Java libraries must be installed.

How to access:

Network

  • lxhadeb05b : 1 Gbps NIC
  • lxhadeb05 : Intel Corporation (vendor: 8086), 82599EB 10 Gigabit Network Connection (device: 10fb)
    • ixgbe driver version: ixgbe-2.1.4

-- SergeyYurevich - 31 Jan 2011

lxhadesdaq

lxhadesdaq is a "higher-availability computer", which means:
  • Raid 0 for the system disks
  • Raid 5 for the data disks
  • 3 redundant power supplies (connected to different fuses, but not to different phases!)
  • spare parts are available in the computer center

System: Debian "Sarge" (which had not yet been officially released at the time of writing, 2005)

How to check if the SATA-RAID is still OK:

https://lxhadesdaq:1082/

Username: user

Password: w**** (ask Michael)

Software

For installing packages, have a look at: http://wiki.gsi.de/cgi-bin/view/Linux/DebianPaketVerwaltung

Important Debian commands:
  • apt-cache search rpcinfo : searches for packages with "rpcinfo" in their name or description
  • apt-get install libreadline5-dev : installs a package
  • dpkg -S rpcinfo : searches the installed packages for a file named rpcinfo and prints the packages containing it

Settings done while booting:
lxhadesdaq:~# cat /etc/sysctl.conf
#
# /etc/sysctl.conf - Configuration file for setting system variables
# See sysctl.conf (5) for information.
#
#kernel.domainname = example.com
#net/ipv4/icmp_echo_ignore_broadcasts=1
#net.core.rmem_max = 2097152
net.core.rmem_max=1048576
kernel.shmmax=268435456
kernel.shmall=20971520

Linux networking kernel parameters (/proc/sys/net/core):
  • rmem_default : default receive socket buffer size
  • rmem_max : maximum receive socket buffer size
  • wmem_default : default send socket buffer size
  • wmem_max : maximum send socket buffer size
  • netdev_max_backlog : maximum number of unprocessed input packets queued before the kernel starts dropping them (default = 300)
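You can list the current values of these parameters by reading /proc directly. The output uses the same `name = value` form that sysctl prints. This is a sketch. `show_net_core` is hypothetical, and the directory argument exists only so that it can be tried on a copy.

```bash
# Sketch (hypothetical helper): print the net.core parameters listed above.
# DIR defaults to the live values in /proc/sys/net/core.
show_net_core() {
    local dir p
    dir=${1:-/proc/sys/net/core}
    for p in rmem_default rmem_max wmem_default wmem_max netdev_max_backlog; do
        printf 'net.core.%s = %s\n' "$p" "$(cat "$dir/$p")"
    done
}
```

Called without an argument, `show_net_core` shows the values currently active in the kernel.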

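How to set a parameter value at runtime (the change is lost at reboot; to make it permanent, add the same setting to /etc/sysctl.conf and load it with sysctl -p):
  • sysctl -w net.core.rmem_default=262144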

Backup-system for lxhadesdaq

The IT department cannot guarantee that lxhadesdaq will be back up and running within a few hours after a hardware failure. Therefore, a backup system should be prepared to replace lxhadesdaq in that case.

The backup system (hadeb06) will fetch all DAQ-related files from lxhadesdaq every night via rsync.

Directories and files on lxhadesdaq to be rsynced:

  • /home/hadaq
  • /home/scs
  • /var/diskless/linuxvme
  • /var/diskless/etrax
  • /etc/hosts
  • /etc/dhcp3/dhcpd.conf
  • /tftpboot
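The nightly pull could look like the sketch below. It assumes the job runs on hadeb06 with SSH access to lxhadesdaq. The destination `/backup/lxhadesdaq` and the names `pull_lxhadesdaq`, `run` and `DRY_RUN` are hypothetical. `rsync -aR` (`--relative`) keeps each source's full path under the destination.

```bash
# Sketch (hypothetical names and destination): pull the DAQ files listed above
# from lxhadesdaq. -a: recursive, keep permissions/times/links;
# -R: keep the full source path, e.g. /backup/lxhadesdaq/etc/hosts.

BACKUP_DIR=/backup/lxhadesdaq
SOURCES="/home/hadaq /home/scs /var/diskless/linuxvme /var/diskless/etrax
/etc/hosts /etc/dhcp3/dhcpd.conf /tftpboot"

# run: print the command when DRY_RUN=1, otherwise execute it
run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then echo "$*"; else "$@"; fi
}

pull_lxhadesdaq() {
    local status src
    status=0
    for src in $SOURCES; do
        # keep going if one source fails, but report failure at the end
        run rsync -aR "lxhadesdaq:$src" "$BACKUP_DIR/" || status=1
    done
    return $status
}
```

Without `--delete`, files removed on lxhadesdaq stay in the backup. Whether that is wanted for a standby system is a design choice.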

to be continued

Lustre mount

Thomas Roth installed the new kernel 2.6.22-gsi-lustre, and the Lustre cluster is mounted as /lustre_alpha.

-- SergeyYurevich - 13 May 2008

hadeb06a (hadeb06b)

  • 2 x 2GHz AMD CPUs
  • 16 GB RAM (12 GB available for use)
  • 130 GB hard disk.

hadeb06 runs 64-bit Linux. Currently it is used as a file server for QA, but it can also be used as an event builder.

If the RAM disk is not mounted automatically after a reboot, you can mount it manually with /root/start_ramdisk

Steps to remount the RAM disk (as root):
  • lsof /ramdisk (lists all processes that access the RAM disk)
  • /etc/start_res_services stop
  • /etc/init.d/xinetd stop
  • killall vsftpd
  • umount /ramdisk (if this succeeds, continue with the next step)
  • /root/start_ramdisk
  • /etc/init.d/xinetd start
  • /etc/init.d/start_res_services start
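The same sequence can run as one function that stops if `umount` fails. This is a sketch, to be run as root on hadeb06. `remount_ramdisk`, `run` and `DRY_RUN` are hypothetical. The two start_res_services paths are copied exactly as listed above: /etc/start_res_services for stop and /etc/init.d/start_res_services for start. Check which one exists before relying on the script.

```bash
# Sketch (hypothetical names): remount /ramdisk on hadeb06.
# NOTE: the stop and start paths of start_res_services differ in the
# original instructions; verify which one exists on the machine.

# run: print the command when DRY_RUN=1, otherwise execute it
run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then echo "$*"; else "$@"; fi
}

remount_ramdisk() {
    run lsof /ramdisk                  # show remaining users (informational)
    run /etc/start_res_services stop
    run /etc/init.d/xinetd stop
    run killall vsftpd
    if ! run umount /ramdisk; then
        echo "umount /ramdisk failed; stop the processes shown by lsof" >&2
        return 1
    fi
    run /root/start_ramdisk
    run /etc/init.d/xinetd start
    run /etc/init.d/start_res_services start
}
```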

-- SergeyYurevich - 16 Sep 2008

Data Sources

Data is transported to the event builder via Ethernet and UDP. The sources and their UDP ports are:

  Subsystem     UDP port
  TOF and MU    2222
  Shower        2223
  RICH0         2224
  RICH1         2225
  RICH2         2226
  MDC0          2227
  MDC1          2228
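When labelling captured traffic, the table can serve as a lookup. This is a sketch; `udp_source` is hypothetical.

```bash
# Sketch (hypothetical helper): map a UDP port to its data source,
# following the table above. Unknown ports return status 1.
udp_source() {
    case $1 in
        2222) echo "TOF and MU" ;;
        2223) echo "Shower" ;;
        2224) echo "RICH0" ;;
        2225) echo "RICH1" ;;
        2226) echo "RICH2" ;;
        2227) echo "MDC0" ;;
        2228) echo "MDC1" ;;
        *) echo "unknown"; return 1 ;;
    esac
}
```

For example, you can map packets captured with `tcpdump -n -i eth0 udp portrange 2222-2228` to their subsystem by destination port (the interface name is an assumption).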

Meaning of EvtId in Eventbuilder-Output on text-console

  evtId09: 4086    evtId11: 11k    evtId19: 894    evtId21: 3967    evtId31: 2247

The number is divided into two nibbles:
  • Lower nibble: trigger code as delivered by the CTU
  • Higher nibble: decision made by the MU

Bits:
  • 0 : downscaled event (0 => not downscaled)
  • 1-2 : 0 = negative, 1 = positive, 2 = positive but stopped due to too many leptons
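A decoder for these labels could look like this. It rests on two assumptions not stated above. First, the two digits after `evtId` are hexadecimal. Second, the bit table describes the higher nibble (the MU decision). `decode_evtid` is a hypothetical name.

```bash
# Sketch (hypothetical helper). ASSUMPTIONS: the evtId label is hexadecimal
# and the bit table applies to the higher nibble (MU decision).
decode_evtid() {
    local id trig mu down dec
    id=$(( 0x$1 ))                   # "31" -> 0x31 = 49
    trig=$(( id & 15 ))              # lower nibble: trigger code from the CTU
    mu=$(( (id >> 4) & 15 ))         # higher nibble: MU decision
    down=$(( mu & 1 ))               # bit 0: 1 = downscaled, 0 = not downscaled
    case $(( (mu >> 1) & 3 )) in     # bits 1-2: decision
        0) dec="negative" ;;
        1) dec="positive" ;;
        2) dec="positive, stopped (too many leptons)" ;;
        *) dec="undefined" ;;
    esac
    echo "trigger=$trig downscaled=$down decision=$dec"
}
```

Under these assumptions, `decode_evtid 19` prints `trigger=9 downscaled=1 decision=negative`.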

-- MichaelTraxler - 28 Jan 2005
Topic revision: r33 - 2011-01-31, SergeyYurevich
Copyright © by the contributing authors. All material on this collaboration platform is the property of the contributing authors.