hadeb05
Because of jumbo frames, hadeb05 can stop receiving DHCP discover packets from the TRBs. Solution:
- ifconfig eth0 down
- modprobe -r sk98lin; modprobe sk98lin
- ifconfig eth0 192.168.0.1 netmask 255.255.255.0 up
- rcdhcpd restart
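The four steps above can be wrapped into one small script. This is only a sketch: the function names and the DRY_RUN switch are assumptions added here for illustration. With DRY_RUN=1 the commands are printed instead of executed; otherwise the script must be run as root.

```shell
#!/bin/sh
# Sketch: reset the sk98lin NIC on hadeb05 and restart the DHCP server.
# run() and DRY_RUN are illustrative helpers, not part of the original setup.

run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"        # only show what would be executed
    else
        "$@"
    fi
}

reset_dhcp_interface() {
    iface="${1:-eth0}"
    # Stop at the first failing step so that dhcpd is not restarted
    # on a half-configured interface.
    run ifconfig "$iface" down &&
    run modprobe -r sk98lin &&
    run modprobe sk98lin &&
    run ifconfig "$iface" 192.168.0.1 netmask 255.255.255.0 up &&
    run rcdhcpd restart
}
```

Usage: after sourcing the file, call `DRY_RUN=1; reset_dhcp_interface eth0` to check the command list, then unset DRY_RUN and call it again as root.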
hadeb07
hadeb07 parameters:
- SuSE 10.2
- Hard disks: sda+sdb = 0.32TB, sdc+sdd = 1TB. The two last disks (sdc, sdd) were additionally put into the machine to serve as backup disks. They are not really fixed inside the machine: take care when moving the machine. Nagios monitors the temperature of both disks.
- Two 3GHz processors
- 2GB memory
hadeb04 (remote backup system)
hadeb04 serves as a remote backup system.
- Software: rsnapshot, executed once a day (crontab -e)
- Config file: /etc/rsnapshot.conf
- Backup disk: /data/hadeb04/backup
- Test mode: rsnapshot -t hourly
Fixes for rsnapshot on hadeb04:
- WARNING: Could not lchown() symlink
- Reason: perl Lchown module is missing
- Fix: perl -MCPAN -e 'install qw(Lchown)'
- ERROR: rsync returned error 12 in rsync_cleanup_after_native_cp_al()
- Reason: old versions of rsync cannot hold the entire file list in memory at once when there are too many files to be rsynced.
- Fix: upgrade to rsync 3.0.0 or newer. This uses an incremental recursion mode to avoid the need to hold the entire file list in memory.
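Before chasing error 12 it may help to check whether the installed rsync is already 3.0.0 or newer. The helper names below are assumptions made for this sketch. The parser expects the usual first line of `rsync --version` ("rsync  version X.Y.Z  protocol version N").

```shell
# Read `rsync --version` output on stdin and print the version number.
parse_rsync_version() {
    awk 'NR == 1 { v = $3; sub(/^v/, "", v); print v }'
}

# Return 0 if version $1 is at least version $2 (compares up to three fields).
version_at_least() {
    awk -v have="$1" -v want="$2" 'BEGIN {
        split(have, h, "."); split(want, w, ".")
        for (i = 1; i <= 3; i++) {
            a = h[i] + 0; b = w[i] + 0   # missing fields count as 0
            if (a > b) exit 0
            if (a < b) exit 1
        }
        exit 0
    }'
}

# Report whether the local rsync supports incremental recursion.
check_rsync() {
    v="$(rsync --version | parse_rsync_version)"
    if version_at_least "$v" 3.0.0; then
        echo "rsync $v: ok"
    else
        echo "rsync $v: too old, upgrade to 3.0.0 or newer"
    fi
}
```

Note that the check should be repeated on the machines being backed up, since rsync runs on both ends of the transfer.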
The following machines/directories are backed up:
Note: since kp1pc098 has Ubuntu installed, 'marek' is a 'root'.
the backup of hades17: hades25/home/hadaq/backups/hades17_hadaq_home/
hadeb04 is a 1.7TB fileserver with a 3200+ AMD 64 CPU. It is running a 64-bit Linux. It has no system disk.
The bootserver is lxhadesdaq.
Some things about the Asus-Board:
- The BIOS upgrade to version 1013 is not working.
- BIOS upgrades only work via "ALT-F2" during booting and providing the file on a Floppy-Disk. All other methods failed with a Checksum-Error!
- One has to boot over the second Gigabit interface. The first (NVIDIA) is for some reason not recognized by the kernel.
UNAME_MACHINE="x86_64" make modules_install
Compile the kernel 2.6.12.5 for x86_64:
- change directory to: lxhadesdaq:/var/diskless/hadeb04/usr/src/linux-2.6.12.5
- use the cross-compiler, which is in the directory lxhadesdaq:/var/diskless/hadeb04/usr/src/linux-2.6.12.5/x86_64-unknown-linux-gnu
- type: export PATH=$PATH:/var/diskless/hadeb04/usr/src/linux-2.6.12.5/x86_64-unknown-linux-gnu/gcc-3.4.0-glibc-2.3.2/bin
- make ARCH=x86_64 menuconfig
- make ARCH=x86_64 CROSS_COMPILE=x86_64-unknown-linux-gnu-
- INSTALL_MOD_PATH=/var/diskless/hadeb04 make ARCH=x86_64 CROSS_COMPILE=x86_64-unknown-linux-gnu- modules_install
- cp arch/x86_64/boot/bzImage /tftpboot/vmlinuz_2.6.12.5_64bit
--
SergeyYurevich - 04 Jun 2009
Miscellaneous
Hard disks for servers
- Type: Seagate Barracuda ES.2 ST31000340NS 1TB
- Bought by GSI: 12
- Installed: lxhadeb01
- Type: WD RE4-Green Power 2TB 2.5 SATA WD2002FYPS
- Bought by GSI: 70 + 10 (which are not yet installed)
- Bought by Coimbra: 16
- Installed: lxhadeb01, 02, 03, 04
lxhadeb01
lxhadeb01 is our new powerful server for parallel event building.
Info:
- 2x4 cores (Dual Quad-Core AMD Opteron Processor 2.3 GHz)
- 32 GB memory (4x8 GB Kingston DDR2)
- 24 slots for hard disks:
- RAID 1 for 2 system disks (slots 0-1)
- Stand alone disks (slots 2-23)
kernel
Event Builders require the following settings:
- kernel.sem="250 128000 32 512"
- 250 :SEMMSL is the maximum number of semaphores per semaphore set (default)
- 128000 : SEMMNS defines the total number of semaphores for the system (changed: 250x512)
- 32 : SEMOPM defines the maximum number of semaphore operations per semaphore call (default)
- 512 : SEMMNI defines the number of entire semaphore sets for the system (changed)
- net.core.rmem_max=10485760 : Receive socket buffer size
- net.core.wmem_max=10485760 : Send socket buffer size
Possible errors:
- "No space left on device". This error occurs when the event builder application tries to open more than 128 sets of semaphores (the standard setting is kernel.sem="250 32000 32 128"). 128 sets mean 64 shared memory segments since two semaphore sets are required per memory segment. In this case, daq_evtbuild -m 65 will lead to an error.
- "File exists". This error occurs when semaphores left over from a previous execution of daq_evtbuild were not cleaned up properly. Use ipcrm -s semid (or /home/hadaq/bin/ipcrm.pl).
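The relation between SEMMNI and the largest usable -m value can be checked with a few lines of shell. This is a sketch based only on the rule stated above (two semaphore sets per shared memory segment); the function name is an assumption, and the result is an upper bound because other processes may also hold semaphore sets.

```shell
# Print the largest daq_evtbuild -m value that a kernel.sem setting allows.
# $1: the four kernel.sem fields "SEMMSL SEMMNS SEMOPM SEMMNI"
max_shm_segments() {
    # Split the setting into positional parameters; $4 is SEMMNI.
    set -- $1
    echo $(( $4 / 2 ))   # two semaphore sets per shared memory segment
}
```

Usage on a running system: `max_shm_segments "$(cat /proc/sys/kernel/sem)"`. With the standard setting "250 32000 32 128" this gives 64, and with "250 128000 32 512" it gives 256.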
Howto:
- List open semaphores: ipcs -s
- List open shared memory segments: ipcs -m
- List all: ipcs -a
- Remove semaphore: ipcrm -s semid
- Remove all open semaphores: /home/hadaq/bin/ipcrm.pl
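For machines where ipcrm.pl is not at hand, a rough shell equivalent could look like the sketch below. The contents of ipcrm.pl are not shown on this page, so this is not a copy of it; the function names and the default owner 'hadaq' are assumptions. The parser expects the table layout printed by `ipcs -s` (columns key, semid, owner, perms, nsems).

```shell
# Read `ipcs -s` output on stdin and print the semid of every
# semaphore set owned by user $1.
semids_of() {
    awk -v owner="$1" '$1 ~ /^0x/ && $3 == owner { print $2 }'
}

# Remove all semaphore sets owned by user $1 (default: hadaq).
remove_semaphores_of() {
    owner="${1:-hadaq}"
    ipcs -s | semids_of "$owner" | while read -r id; do
        ipcrm -s "$id"
    done
}
```

Running `ipcs -s | semids_of hadaq` first shows which sets would be removed, which is a sensible check while an event builder may still be running.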
RAID Array Controller
The Adaptec RAID Controller was exchanged on 04.06.2009.
Adaptec Storage Manager is a Java tool to control RAID arrays. You can start it as root by executing /usr/StorMan/StorMan.sh
- How to rebuild degraded logical device with failed segment:
- Click on lxhadeb01.gsi.de (Logical system) in Enterprise view,
- then click on Controller in Physical devices
- Goto Actions -> Rescan
- Wait until rescan is finished. You will see that the failed disk is taken out of the logical device.
- Replace the failed disk if needed
- Click on the 'failed' disk -> Initialize
- Wait a bit; the rebuild of the logical device should start automatically
arcconf is a command line interface.
- To get information about RAID controller status, try the following:
- For controller 1 and first RAID array: /root/bin/arcconf GETCONFIG 1 LD 0
- For controller 1 and second RAID array: /root/bin/arcconf GETCONFIG 1 LD 1
- To silence the alarm: arcconf setalarm 1 silence
Configuration
- Many configuration files are overwritten by cfagent which is started as a cron job at reboot and once per day (/etc/cron.d/gsi). If you want to stop it, you should comment out a couple of lines in /etc/cron.d/gsi.
- To enable remote logins for new users you should add the user to /etc/security/access.conf (access.conf is also overwritten!)
IPMI Module
The IPMI module provides remote access to the machine. It is connected to the ITM 'yellow' network. Currently the machine hades30.gsi.de in the 'yellow' network is used to access the IPMI module.
How to access:
How to get MAC address of IPMI module:
- Execute as 'root': ipmitool lan print
Network
- lxhadeb01b : 1 Gbps NIC
- lxhadeb01 : Intel Corporation (vendor: 8086), device: 82599EB 10 Gigabit Network Connection (device: 10fb)
- ixgbe driver version: ixgbe-2.1.4
- Vendor-device table to recognize devices.
- This might be old: source code: http://sourceforge.net/projects/e1000/files/ixgbe stable/
- To start ixgbe at boot time with single queue setting: /etc/modprobe.d/ixgbe
options ixgbe MQ=0,0
- To load new driver: rmmod ixgbe; modprobe ixgbe MQ=0,0
lxhadeb02
kernel
Event Builders require the following settings:
- kernel.sem="250 128000 32 512"
- 250 :SEMMSL is the maximum number of semaphores per semaphore set (default)
- 128000 : SEMMNS defines the total number of semaphores for the system (changed: 250x512)
- 32 : SEMOPM defines the maximum number of semaphore operations per semaphore call (default)
- 512 : SEMMNI defines the number of entire semaphore sets for the system (changed)
- net.core.rmem_max=10485760 : Receive socket buffer size
- net.core.wmem_max=10485760 : Send socket buffer size
IPMI Module
How to access:
Network
- lxhadeb02b : 1 Gbps NIC
- lxhadeb02 : Intel Corporation (vendor: 8086), 82598EB 10-Gigabit AF Dual Port Network Connection (device: 10f1)
- ixgbe driver version: 2.0.75.7-NAPI
lxhadeb03
kernel
Event Builders require the following settings:
- kernel.sem="250 128000 32 512"
- 250 :SEMMSL is the maximum number of semaphores per semaphore set (default)
- 128000 : SEMMNS defines the total number of semaphores for the system (changed: 250x512)
- 32 : SEMOPM defines the maximum number of semaphore operations per semaphore call (default)
- 512 : SEMMNI defines the number of entire semaphore sets for the system (changed)
- net.core.rmem_max=10485760 : Receive socket buffer size
- net.core.wmem_max=10485760 : Send socket buffer size
IPMI Module
How to access:
Network
- lxhadeb03b : 1 Gbps NIC
- lxhadeb03 : Intel Corporation (vendor: 8086), 82598EB 10-Gigabit AT Dual Port Network Connection (device: 10f1)
- ixgbe driver version: ixgbe-2.1.4
lxhadeb04
kernel
Event Builders require the following settings:
- kernel.sem="250 128000 32 512"
- 250 :SEMMSL is the maximum number of semaphores per semaphore set (default)
- 128000 : SEMMNS defines the total number of semaphores for the system (changed: 250x512)
- 32 : SEMOPM defines the maximum number of semaphore operations per semaphore call (default)
- 512 : SEMMNI defines the number of entire semaphore sets for the system (changed)
- net.core.rmem_max=10485760 : Receive socket buffer size
- net.core.wmem_max=10485760 : Send socket buffer size
IPMI Module
How to access:
Network
- lxhadeb04b : 1 Gbps NIC
- lxhadeb04 : Intel Corporation (vendor: 8086), device: 82599EB 10 Gigabit Network Connection (device: 10fb)
- ixgbe driver version: ixgbe-2.1.4
lxhadeb05
Info:
- 24 cores (AMD Opteron Processor 800 MHz)
- 64 GB memory
- 24 slots for hard disks:
- RAID 1 for 2 system disks (slots 0-1)
- Stand alone disks (slots 2-23)
kernel
Event Builders require the following settings:
- kernel.sem="250 128000 32 512"
- 250 :SEMMSL is the maximum number of semaphores per semaphore set (default)
- 128000 : SEMMNS defines the total number of semaphores for the system (changed: 250x512)
- 32 : SEMOPM defines the maximum number of semaphore operations per semaphore call (default)
- 512 : SEMMNI defines the number of entire semaphore sets for the system (changed)
- net.core.rmem_max=10485760 : Receive socket buffer size
- net.core.wmem_max=10485760 : Send socket buffer size
IPMI Module
To access lxhadeb05 with the new firmware of the IPMI module, the SUN Java libraries must be installed.
How to access:
Network
- lxhadeb05b : 1 Gbps NIC
- lxhadeb05 : Intel Corporation (vendor: 8086), device: 82599EB 10 Gigabit Network Connection (device: 10fb)
- ixgbe driver version: ixgbe-2.1.4
--
SergeyYurevich - 31 Jan 2011
lxhadesdaq
It is a "higher-availability computer", that means:
- Raid 0 for the system disks
- Raid 5 for the data disks
- 3 redundant power supplies (we just connected them to different fuses not different phases!)
- spare parts in the computer center
System: Debian "Sarge" (not yet available as of 2005)
How to check if the SATA-RAID is still OK:
https://lxhadesdaq:1082/
Username: user
Password: w**** (ask Michael)
Software
For installing packages, have a look to:
http://wiki.gsi.de/cgi-bin/view/Linux/DebianPaketVerwaltung
Important Debian-commands:
- apt-cache search rpcinfo : searches for packages with "rpcinfo" in the name or description
- apt-get install libreadline5-dev : installs package
- dpkg -S rpcinfo : searches for a file rpcinfo and prints the corresponding packages
Settings done while booting:
lxhadesdaq:~# cat /etc/sysctl.conf
#
# /etc/sysctl.conf - Configuration file for setting system variables
# See sysctl.conf (5) for information.
#
#kernel.domainname = example.com
#net/ipv4/icmp_echo_ignore_broadcasts=1
#net.core.rmem_max = 2097152
net.core.rmem_max=1048576
kernel.shmmax=268435456
kernel.shmall=20971520
Linux networking kernel parameters (/proc/sys/net/core):
- rmem_default default receive socket buffer size
- rmem_max maximum receive socket buffer size
- wmem_default default send socket buffer size
- wmem_max maximum send socket buffer size
- netdev_max_backlog number of unprocessed input packets before kernel starts dropping them (default = 300)
How to set a parameter value at runtime (to make it persistent across reboots, add the setting to /etc/sysctl.conf):
- sysctl -w net.core.rmem_default=262144
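`sysctl -w` changes the value only in the running kernel; it is lost at the next reboot unless the setting is also written to /etc/sysctl.conf. The helper below is a sketch of one way to do that without leaving duplicate entries. The function name persist_sysctl is an assumption, and it only handles the dotted key notation.

```shell
# Write KEY=VALUE into a sysctl configuration file, replacing any earlier
# uncommented setting of the same key. $3 defaults to /etc/sysctl.conf.
persist_sysctl() {
    key="$1"; value="$2"; file="${3:-/etc/sysctl.conf}"
    tmp="$(mktemp)" || return 1
    # Keep every line that does not assign this key (comments are kept),
    # then append the new setting and copy the result back in place.
    awk -v key="$key" '{
        line = $0
        sub(/^[ \t]+/, "", line)
        split(line, parts, /[ \t]*=/)
        if (parts[1] != key) print $0
    }' "$file" > "$tmp" &&
    echo "$key=$value" >> "$tmp" &&
    cat "$tmp" > "$file"
    status=$?
    rm -f "$tmp"
    return $status
}
```

Usage (as root): `persist_sysctl net.core.rmem_default 262144`, followed by `sysctl -p` to load the file into the running kernel.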
Backup-system for lxhadesdaq
The IT department cannot guarantee that lxhadesdaq will be up and running again within a few hours of a hardware failure. Therefore, a backup system should be prepared to replace lxhadesdaq in case of hardware failure.
The backup system (hadeb06) will receive all DAQ-related files from lxhadesdaq nightly via rsync.
Directories to be rsynced on lxhadesdaq:
- /home/hadaq
- /home/scs
- /var/diskless/linuxvme
- /var/diskless/etrax
- /etc/hosts
- /etc/dhcp3/dhcpd.conf
- /tftpboot
to be continued
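A possible shape for the nightly job on hadeb06 is sketched below. It is not the script actually in use: the target directory DEST, the function name and the DRY_RUN switch are assumptions, and passwordless ssh access from hadeb06 to lxhadesdaq is assumed as well. The `--relative` option makes rsync recreate the full source path below DEST, so files and directories from different places do not collide.

```shell
#!/bin/sh
# Sketch: pull the DAQ-related paths from lxhadesdaq to a local mirror.
SOURCE_HOST="lxhadesdaq"
DEST="${DEST:-/data/lxhadesdaq_mirror}"   # hypothetical target directory
PATHS="/home/hadaq /home/scs /var/diskless/linuxvme /var/diskless/etrax /etc/hosts /etc/dhcp3/dhcpd.conf /tftpboot"

mirror_lxhadesdaq() {
    for path in $PATHS; do
        if [ "${DRY_RUN:-0}" = "1" ]; then
            # Only print the command that would be executed.
            echo "rsync -a --relative $SOURCE_HOST:$path $DEST/"
        else
            rsync -a --relative "$SOURCE_HOST:$path" "$DEST/" || return 1
        fi
    done
}
```

The sketch deliberately omits `--delete`, so files removed on lxhadesdaq stay in the mirror; whether that is wanted depends on how the replacement machine is supposed to be used.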
Lustre mount
The new kernel 2.6.22-gsi-lustre was installed by Thomas Roth and the Lustre cluster is mounted as /lustre_alpha.
--
SergeyYurevich - 13 May 2008
hadeb06a (hadeb06b)
- 2 x 2GHz AMD CPUs
- 16 GB RAM (12 GB available for use)
- 130 GB hard disk.
hadeb06 is running a 64bit Linux. Currently it is used as a file server for QA. In fact, it can be used as an Event Builder.
If after reboot the ram disk is not mounted automatically, you may try to mount it manually: /root/start_ramdisk
Steps to remount the ramdisk (as root):
- lsof /ramdisk (list all processes which access ramdisk)
- /etc/start_res_services stop
- /etc/init.d/xinetd stop
- killall vsftpd
- umount /ramdisk (try to unmount the disk; if it succeeds, go to the next step)
- /root/start_ramdisk
- /etc/init.d/xinetd start
- /etc/init.d/start_res_services start
--
SergeyYurevich - 16 Sep 2008
Data Sources
Data is transported via Ethernet and UDP to the eventbuilder. The sequence of sources is the following:
| Subsystem | UDP-port |
| TOF and MU | 2222 |
| Shower | 2223 |
| RICH0 | 2224 |
| RICH1 | 2225 |
| RICH2 | 2226 |
| MDC0 | 2227 |
| MDC1 | 2228 |
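When checking whether a subsystem delivers data at all, it can be handy to have the port assignment available in a script. The lookup below is a small sketch; the function name and the short names TOF and MU (both mapped to the shared port 2222) are conventions chosen here, not part of the DAQ software.

```shell
# Print the UDP port on which the eventbuilder receives a subsystem.
udp_port_of() {
    case "$1" in
        TOF|MU|"TOF and MU") echo 2222 ;;
        Shower)              echo 2223 ;;
        RICH0)               echo 2224 ;;
        RICH1)               echo 2225 ;;
        RICH2)               echo 2226 ;;
        MDC0)                echo 2227 ;;
        MDC1)                echo 2228 ;;
        *) echo "unknown subsystem: $1" >&2; return 1 ;;
    esac
}
```

One way to use it on the eventbuilder (as root, assuming tcpdump is installed): `tcpdump -n udp port "$(udp_port_of RICH1)"`.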
Meaning of EvtId in Eventbuilder-Output on text-console
evtId09: 4086 evtId11: 11k evtId19: 894
evtId21: 3967 evtId31: 2247
The number is divided into two nibbles:
Lower nibble: Trigger Code as delivered by CTU
Higher nibble: Decision made by MU
| Bits | Meaning |
| 0 | downscaled event, 0 => not downscaled |
| 1-2 | 0: negative, 1: positive, 2: positive but stopped due to too many leptons |
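The nibble layout can be illustrated with a short decoder. Two assumptions are made in this sketch and are not confirmed by the text above: the two digits after "evtId" are read as hexadecimal (one digit per nibble), and the bit table is taken to describe the higher nibble, i.e. the MU decision. The function name is also made up for the example.

```shell
# Decode a two-digit evtId such as 19 (assumed to be hexadecimal).
decode_evtid() {
    id=$(( 0x$1 ))
    trigger=$(( id & 0xF ))         # lower nibble: trigger code from the CTU
    mu=$(( (id >> 4) & 0xF ))       # higher nibble: decision made by the MU
    downscaled=$(( mu & 1 ))        # bit 0 (assumed to refer to the MU nibble)
    decision=$(( (mu >> 1) & 3 ))   # bits 1-2: 0 negative, 1 positive, 2 stopped
    echo "trigger=$trigger downscaled=$downscaled decision=$decision"
}
```

Under these assumptions, `decode_evtid 19` prints "trigger=9 downscaled=1 decision=0" and `decode_evtid 21` prints "trigger=1 downscaled=0 decision=1".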
--
MichaelTraxler - 28 Jan 2005