IOC for HADES SCS running on Linux
Introduction - HADES IOC
An EPICS IOC runs under Linux on the machine
hadesdaq02.
Currently it serves
- the HV for all detectors except the MDC,
- the LV power supplies,
- temperature monitoring (old and new),
- the connection to the hadcon boards and dreamplugs on the internal HADES VLAN,
- the stats of all running IOCs,
- a kind of gateway making process variables available on the GSI LAN,
- ...
Sources and Maintenance
CVS
- CVS repository:
:ext:scs@lx-pool.gsi.de:/misc/hadesprojects/slowcontrol/cvsroot
- CVS module:
EPICS/apps/hades
For small maintenance tasks:
- Connect to hadesdaq02 as user scs using the default scs password.
- The configuration is stored in the production directory /home/scs/apps/hades. There you will find a normal IOC directory tree for development and for booting the IOC.
- The IOC is named "hades".
- To change the database files,
- make your edits in hadesApp/Db
- and afterwards type
make
- Don't forget to commit the changes to CVS.
- (Re-)start the IOC (see "(Re-) Starting the IOC" below).
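The small-maintenance cycle above can be sketched as a shell function. This is a minimal sketch, assuming the production tree /home/scs/apps/hades, that make is run at the top of that tree, and that the CVS working copy there is the one to commit; the names run, update_hades_db, DRY_RUN and HADES_TOP are hypothetical. The IOC restart itself stays manual.

```shell
#!/bin/sh
# Hypothetical helper: print commands instead of running them when DRY_RUN is set.
run() {
    if [ -n "${DRY_RUN:-}" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Hypothetical helper for the small-maintenance cycle on hadesdaq02 (user scs).
# $1: CVS log message describing the database change.
# The body runs in a subshell, so the cd does not affect the caller.
update_hades_db() (
    msg=$1
    top=${HADES_TOP:-/home/scs/apps/hades}   # assumed production directory
    run cd "$top" || exit 1
    # rebuild after editing the files in hadesApp/Db (assumed: make at top level)
    run make || exit 1
    # commit the edited database files back to CVS
    run cvs commit -m "$msg" hadesApp/Db || exit 1
    echo "done; now (re-)start the IOC via procServ (telnet localhost 4813)"
)
```

DRY_RUN=1 update_hades_db "adjust limits" only prints the commands; without DRY_RUN they are executed.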
For bigger maintenance or development, first use the playground ...
Operation
(Re-) Starting the IOC
To restart the server while it is running:
1. Log in to hadesdaq02:
ssh scs@hadesdaq02
2. Check whether the procServ server, which itself controls the EPICS server, is running
by listing the processes:
ps x
PID TTY STAT TIME COMMAND
13061 ? S 0:02 procServ -L ioc-cave-hadesdaq02.log 4813 startEpicsIoc_includingHVSemaphoreCheck.sh ../../bin/linux-x86/hades st.cmd ### this is the process server
13062 pts/4 Ssl+ 1:51 startEpicsIoc_includingHVSemaphoreCheck.sh ../../bin/linux-x86/hades st.cmd ### this is the EPICS server
- NO (no procServ process):
- check for remaining semaphores (cf. "Known server problems")
- start the server from scratch (cf. the next section)
- YES: continue
3. Now you can log in to the server:
telnet localhost 4813
- Switch off the procServ auto-restart by toggling it to OFF with
<CTRL+T>
procServ confirms with "@@@ Toggled auto restart to ON/OFF"; make sure it reports OFF.
- Then hit
<CR>
- You should see the epics prompt:
epics>
- NO, you don't see it:
- quit the telnet session
<CTRL+]>
telnet> quit
- check for remaining semaphores (cf. "Known server problems")
- log in to the server again
telnet localhost 4813
- YES, you have the prompt, i.e. EPICS is running: exit it with
exit
- Wait.
- If the IOC has been running, you should first see it shutting down:
DEBUG: shutting down crate 0
Shutdown: successfully disconnected from crate x1
[...]
DEBUG: shutting down crate 6
Shutdown: successfully disconnected from crate x7
- then restart the IOC by using:
<CTRL+R>
- then you should see it restarting:
@@@ @@@ @@@ @@@ @@@
@@@ Received a sigChild for process WXYZ. The process was killed by signal 11
@@@ Current time: Sun Sep 12 19:29:51 2010
@@@ Child process is shutting down, auto restart is disabled
@@@ Use ^R to restart the child, ^Q to quit the server
@@@ Restarting child "../../bin/linux-x86/hades"
@@@ The PID of new child "../../bin/linux-x86/hades" is: abcdef
@@@ @@@ @@@ @@@ @@@
#!../../bin/linux-x86_64/hades
## You may have to change hades to something else
## everywhere it appears in this file
< envPaths
[...]
## Load record instances
[...]
iocInit
Starting iocInit
############################################################################
## EPICS R3.14.10 $R3-14-10$ $2008/10/27 19:39:04$
## EPICS Base built Oct 22 2009
############################################################################
Starting CAEN x527 driver
pthread_attr_setstacksize error Invalid argument
iocRun: All initialization complete
dbl > /home/scs/apps/hades/iocBoot/ioccave/hadesdaq02.dbl
[...]
- To leave the server without stopping it, use the telnet escape sequence (CTRL+]) and quit:
<CTRL+]>
telnet> quit
- Optionally continue with the further checks below.
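The process check in step 2 can be scripted. This is a minimal sketch, assuming the procServ console port 4813 and the command lines shown in the ps x listing above; ioc_status and its three result words are hypothetical. It reads ps output on standard input, so it can also be tried on saved output.

```shell
#!/bin/sh
# Hypothetical helper: classify `ps x` output read from stdin.
#   running  procServ on port 4813 and the IOC child are both present
#   no-ioc   procServ on port 4813 is present, but no IOC child was found
#   stopped  no procServ on port 4813 (step 2, answer NO)
# The regex [p]rocServ keeps this awk process from matching its own command
# line; the procServ line is skipped with `next` because it also contains
# the IOC command line.
ioc_status() {
    awk '
        /[p]rocServ/ && / 4813 / { ps = 1; next }
        /bin\/linux-x86[^ ]*\/hades st\.cmd/ { ioc = 1 }
        END {
            if (ps && ioc)  print "running"
            else if (ps)    print "no-ioc"
            else            print "stopped"
        }'
}
```

Usage: ps x | ioc_status. "stopped" corresponds to the NO branch of step 2 and "running" to YES; "no-ioc" would suggest logging in as in step 3 and reading the log.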
To start the server ((almost) from scratch)
Follow the same procedure that crontab performs during a restart:
- Log in to hadesdaq02:
ssh scs@hadesdaq02
- clean up
- make sure procServ is not running by checking for processes:
ps x
PID TTY STAT TIME COMMAND
13061 ? S 0:02 procServ -L ioc-cave-hadesdaq02.log 4813 startEpicsIoc_includingHVSemaphoreCheck.sh ../../bin/linux-x86/hades st.cmd
- if it is still there, either
- (brute force) kill the remaining procServ process, or
- try to restart the running server (cf. "(Re-) Starting the IOC")
- Change to scs' procServ directory and run the start script:
cd ~scs/procServ &&
./ioc-cave.sh
- Optionally continue with the further checks below.
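The brute-force variant of the procedure above can be sketched as follows. This is a minimal sketch, assuming procServ serves port 4813 and is started by ~scs/procServ/ioc-cave.sh as described on this page; run, procserv_pids, start_from_scratch and DRY_RUN are hypothetical names, and the 2 s pause is an assumption. The alternative of restarting the running server via telnet is not covered.

```shell
#!/bin/sh
# Hypothetical helper: print commands instead of running them when DRY_RUN is set.
run() {
    if [ -n "${DRY_RUN:-}" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Hypothetical helper: PIDs of procServ processes serving port 4813,
# read from `ps x` output on stdin (columns: PID TTY STAT TIME COMMAND).
procserv_pids() {
    awk '$5 ~ /(^|\/)procServ$/ && / 4813 / { print $1 }'
}

# Hypothetical helper: kill a left-over procServ, then use the crontab start script.
start_from_scratch() {
    pids=$(ps x | procserv_pids)
    for pid in $pids; do
        run kill "$pid"
    done
    # give a killed procServ a moment to exit (2 s is an assumption)
    if [ -n "$pids" ]; then
        run sleep 2
    fi
    run sh -c 'cd ~scs/procServ && ./ioc-cave.sh'
}
```

DRY_RUN=1 start_from_scratch shows what would be killed and started without doing it.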
Further checks
- Check the log
~/apps/hades/iocBoot/ioccave/ioc-cave-hadesdaq02.log
- If it contains
"Semaphore already present",
see the section "Semaphore hanging"
- (obsolete: this is now taken care of by the startup command
startEpicsIoc_includingHVSemaphoreCheck.sh ../../bin/linux-x86/hades st.cmd)
- For all other error messages, notify the experts:
- the system may be working, but not completely.
- Check whether the server is running
by listing the processes:
ps x
PID TTY STAT TIME COMMAND
13061 ? S 0:02 procServ -L ioc-cave-hadesdaq02.log 4813 startEpicsIoc_includingHVSemaphoreCheck.sh ../../bin/linux-x86/hades st.cmd
13062 pts/4 Ssl+ 1:51 startEpicsIoc_includingHVSemaphoreCheck.sh ../../bin/linux-x86/hades st.cmd
- You can log in to the server:
telnet localhost 4813
- Then hit Enter (carriage return):
<CR>
- You should see the epics prompt:
epics>
- With the command
dbl
you will get a very long list of all process variables.
- To leave the server without stopping it, use the telnet escape sequence (CTRL+]) and quit:
<CTRL+]>
telnet> quit
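Since the startup output above also shows dbl being written to /home/scs/apps/hades/iocBoot/ioccave/hadesdaq02.dbl, the PV list can be searched without the IOC console. This is a minimal sketch, assuming st.cmd still writes that file and that it reflects the last IOC start; find_pv, count_pvs and PV_DUMP are hypothetical names.

```shell
#!/bin/sh
# Assumed location of the dbl dump written by st.cmd.
PV_DUMP=${PV_DUMP:-/home/scs/apps/hades/iocBoot/ioccave/hadesdaq02.dbl}

# Hypothetical helper: list PV names matching a grep pattern.
# $1: pattern, $2: optional dump file (default: $PV_DUMP)
find_pv() {
    grep -e "$1" "${2:-$PV_DUMP}"
}

# Hypothetical helper: number of PVs in the dump (dbl prints one name per line).
count_pvs() {
    awk 'END { print NR }' "${1:-$PV_DUMP}"
}
```

For example, find_pv 'HV' lists all PV names containing "HV", and count_pvs gives the length of the list.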
Known server problems
If the server does not start properly, read the log file ~/apps/hades/iocBoot/ioccave/ioc-cave-hadesdaq02.log.
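A first pass over the log for the one message this page documents can be scripted. This is a minimal sketch: it only looks for "Semaphore already present"; all other errors still need to be read by eye. scan_ioc_log and IOC_LOG are hypothetical names, and checking only the last 200 lines is an assumption, made because the log may still contain messages from earlier starts.

```shell
#!/bin/sh
# Assumed log location, as given on this page.
IOC_LOG=${IOC_LOG:-$HOME/apps/hades/iocBoot/ioccave/ioc-cave-hadesdaq02.log}

# Hypothetical helper: look at the end of the procServ log for the
# known "Semaphore already present" problem.
# $1: number of trailing lines to inspect (default 200)
# Returns 0 if not found, 1 if found, 2 if the log cannot be read.
scan_ioc_log() {
    if [ ! -r "$IOC_LOG" ]; then
        echo "cannot read $IOC_LOG" >&2
        return 2
    fi
    if tail -n "${1:-200}" "$IOC_LOG" | grep -q 'Semaphore already present'; then
        echo 'found "Semaphore already present": see "Semaphore hanging"'
        return 1
    fi
    echo 'no semaphore message in the last lines; check them for other errors'
}
```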
Semaphore hanging
Automatic procedure at startup
This is taken care of by the startup command:
startEpicsIoc_includingHVSemaphoreCheck.sh ../../bin/linux-x86/hades st.cmd
But in case this did not work ...
Removing by hand
- Call the script
~/apps/hades/iocBoot/ioccave/removeDeadHVSemaphores.sh
- or do it really by hand:
A semaphore may not have been cleaned up after a previous start.
This is indicated by the message:
Semaphore already present
Either another process is still using the semaphore,
or a process using the semaphore exited abnormally.
In that case, release the semaphore manually with ipcrm as follows:
- Find the semaphore ID:
ipcs -s
------ Semaphore Arrays --------
key semid owner perms nsems
0x30222aea <Sem ID> scs 666 1
- Then delete this semaphore with
ipcrm sem <Sem ID>
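The manual steps above can be wrapped in a small sketch. It is not the content of removeDeadHVSemaphores.sh, whose logic is not shown on this page; sem_ids_of, remove_sems, run and DRY_RUN are hypothetical names, and the key 0x30222aea from the example output is only offered as an optional filter, not assumed to be fixed. ipcrm -s <id> is the option form of ipcrm sem <id>.

```shell
#!/bin/sh
# Hypothetical helper: print commands instead of running them when DRY_RUN is set.
run() {
    if [ -n "${DRY_RUN:-}" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Hypothetical helper: semaphore IDs from `ipcs -s` output on stdin.
# Columns: key semid owner perms nsems
# $1: owner (e.g. scs), $2: optional key to restrict to (e.g. 0x30222aea)
sem_ids_of() {
    awk -v owner="$1" -v key="${2:-}" \
        '$1 ~ /^0x/ && $3 == owner && (key == "" || $1 == key) { print $2 }'
}

# Hypothetical helper: remove the matching semaphores.
# Only use this while the IOC is stopped: a semaphore still held by a
# running process would be removed as well.
remove_sems() {
    for id in $(ipcs -s | sem_ids_of "${1:-scs}" "${2:-}"); do
        run ipcrm -s "$id"
    done
}
```

DRY_RUN=1 remove_sems scs lists the ipcrm calls that would be made.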
SY1527 hanging
If you see in the log file that one of the HV crates does not connect,
first check whether the crate is physically powered on and
has an Ethernet connection:
ping hadhvp05
PING hadhvp05.gsi.de (192.168.100.69) 56(84) bytes of data.
64 bytes from hadhvp05.gsi.de (192.168.100.69): icmp_seq=1 ttl=64 time=0.908 ms
64 bytes from hadhvp05.gsi.de (192.168.100.69): icmp_seq=2 ttl=64 time=0.873 ms
^C
- Log in and check for hanging CMD sessions:
telnet hadhvp05 1527
user admin
password admin
About menu --> Sessions (leftmost menu)
If you see a CMD TCP/IP connection there, you have to power-cycle the crate.
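The reachability check can be scripted for one or more crates. This is a minimal sketch, assuming Linux iputils ping (-c count, -W timeout in seconds); crate_reachable, check_crates and PING are hypothetical names. The login to the crate's Sessions menu stays manual.

```shell
#!/bin/sh
# Hypothetical helper: is the HV crate answering pings?
# $1: crate host name. PING may name another command (e.g. for testing).
crate_reachable() {
    "${PING:-ping}" -c 3 -W 2 "$1" >/dev/null 2>&1
}

# Hypothetical helper: first-level diagnosis for each crate given.
check_crates() {
    for host in "$@"; do
        if crate_reachable "$host"; then
            echo "$host reachable: if it still does not connect, check the Sessions menu for a hanging CMD TCP/IP connection"
        else
            echo "$host unreachable: check power and Ethernet connection"
        fi
    done
}
```

For example, check_crates hadhvp05 prints one diagnosis line for that crate.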
-- PeterZumbruch - 30 May 2012