Tests with 90 subsystems:
Test-1 with 90 subsystems (no writing to hard disk, no rfio)
The test system:
TOF-0, TOF-1, TOF-2, TOF-4 VME CPUs --> netgear 8-port switch --> foundry switch --> Event Builder
Conditions:
- I used the TOF-0, TOF-1, TOF-2 and TOF-4 VME CPUs, with 23/22 daq_rdosoft processes per CPU (90 data sources in total).
- The water mark was set to 32 kB.
- Subevent sizes were Gaussian-like smeared.
- No writing to hard disk, no RFIO to the Data Mover.
- net.core.netdev_max_backlog = 10000
- net.core.rmem_max = 4194304 (nettrans.c:openUdp() requests a 1 MB receive queue; see the sketch after this list)
- EB buffer = 8 MB (x2x2). Cannot go to 16 MB because 16 x 2 x 2 x 90 = 5.6 GB, which is more than the available memory.
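The interplay between the requested socket queue and net.core.rmem_max can be illustrated with a minimal sketch (this is not the actual nettrans.c:openUdp() code; function and variable names are illustrative):

/* Minimal sketch: open a UDP socket and request a large receive queue.
 * The kernel caps the request at net.core.rmem_max, which is why
 * rmem_max was raised to 4 MB for a 1 MB request per data stream. */
#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <sys/socket.h>
#include <netinet/in.h>

static int open_udp_sketch(unsigned short port, int rcvbuf_bytes)
{
    int fd = socket(AF_INET, SOCK_DGRAM, 0);
    if (fd < 0)
        return -1;

    /* Ask for e.g. 1 MB of socket receive buffer. */
    setsockopt(fd, SOL_SOCKET, SO_RCVBUF, &rcvbuf_bytes, sizeof(rcvbuf_bytes));

    /* Check what was actually granted (Linux caps the request at
     * net.core.rmem_max and getsockopt reports the internally doubled value). */
    int granted = 0;
    socklen_t len = sizeof(granted);
    getsockopt(fd, SOL_SOCKET, SO_RCVBUF, &granted, &len);
    fprintf(stderr, "requested %d bytes, granted %d bytes\n",
            rcvbuf_bytes, granted);

    struct sockaddr_in addr;
    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_ANY);
    addr.sin_port = htons(port);
    if (bind(fd, (struct sockaddr *) &addr, sizeof(addr)) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}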
The detailed results on Event Builder performance are here:
EventBuilderPermormance01
The following plots represent a summary of the performance:
- The Event Builder efficiency (z-axis) is plotted as a function of the event rate and the data rate. The efficiency is defined as 100% minus the fraction of discarded events (in %). The efficiency drops rapidly above a data rate of 100 MB/s. This is most likely related to the huge load on the VME CPUs, which each run 23/22 (x2) soft readout processes, and to the limited bandwidth of the netgear switch; the Event Builder itself seems to be able to go beyond this limit.
EB effic vs event rate and data rate:
- The Event Builder efficiency (z-axis) is plotted as a function of the subevent size and the data rate. This is a different representation of the same data.
EB effic vs subevt size and data rate:
- CPU usage of the Event Builder as a function of the data rate. At a data rate of about 100 MB/s the Event Builder tends to use all available CPU time. There is an interesting trend: the daq_evtbuild CPU usage depends on both the event rate and the data rate, while the daq_netmem CPU usage depends mostly on the data rate.
daq_evtbuild CPU usage vs data rate:
daq_netmem CPU usage vs data rate:
- Fill levels of the Event Builder buffers as a function of the data rate. The fill levels stay as low as 10-20% up to a data rate of 100 MB/s. Above 100 MB/s the test system goes out of control, which leads to a synchronization problem between the data sources; consequently, the buffers fill up.
daq_evtbuild: fill level of buffers vs data rate:
daq_netmem: fill level of buffers vs data rate:
Test-2 with 90 subsystems (no writing to hard disk, no rfio)
Difference from test-1: different sources now have different subevent sizes. The subevent sizes were still Gaussian-like smeared event by event, but their nominal values were kept constant (given in the table below); in this test only the event rate was increased. The Event Builder buffers were tuned accordingly.
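For reference, the "Gaussian-like smearing" of a nominal subevent size can be sketched as follows (illustrative only, not the actual daq_rdosoft generator; the 10% width and the minimum size are assumptions):

/* Illustrative sketch: smear a nominal subevent size with a Gaussian of
 * assumed 10% width using the Box-Muller transform. */
#include <math.h>
#include <stdlib.h>

#ifndef M_PI
#define M_PI 3.14159265358979323846
#endif

static double gauss_random(double mean, double sigma)
{
    /* Two uniform numbers in (0,1), avoiding log(0). */
    double u1 = (rand() + 1.0) / ((double) RAND_MAX + 2.0);
    double u2 = (rand() + 1.0) / ((double) RAND_MAX + 2.0);
    return mean + sigma * sqrt(-2.0 * log(u1)) * cos(2.0 * M_PI * u2);
}

static unsigned int smeared_subevt_size(unsigned int nominal)
{
    double size = gauss_random((double) nominal, 0.1 * nominal);
    if (size < 32.0)        /* assumed minimum: keep room for the header */
        size = 32.0;
    return (unsigned int) size;
}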
Configuration of data sources:
| Number of sources | Subevent size | Buffer size |
| 60 | 116 Bytes | 2x4 MB |
| 20 | 338 Bytes | 2x8 MB |
| 10 | 1.17 kBytes | 2x32 MB |
Event Builder performance:
| ev rate (Hz) | data rate (MB/s) | disc evt (%) | cpu evtbuild (%) | cpu netmem (%) | load | buff evtbuild (%) | buff netmem (%) |
| 1000 | 23 | 0 | 12 | 10 | 0.5 | 10 | 5 |
| 2000 | 46 | 0 | 20 | 19 | 0.6 | 10 | 10 |
| 3000 | 69 | 0 | 30 | 28 | 0.7 | 20 | 10/20 |
| 4000 | 95 | 0 | 44 | 32 | 0.9 | 20/30 | 20 |
Test-3 with 90 subsystems (no writing to hard disk, no rfio)
Difference from test-2:
Now a huge subevent (size = 100 kBytes) is additionally generated roughly once per 10000 events for every data source, asynchronously with respect to the other sources.
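A minimal sketch of such a spike generator (illustrative only; the actual daq_rdosoft code is not shown here, and the random draw is an assumption):

/* Illustrative sketch: on average once per ~10000 events a data source
 * emits a huge 100 kB subevent instead of its nominal one.  Each source
 * draws independently, so the spikes are asynchronous between sources. */
#include <stdlib.h>

#define HUGE_SUBEVT_SIZE   (100 * 1024)  /* 100 kBytes                    */
#define HUGE_SUBEVT_PERIOD 10000         /* on average 1 in 10000 events  */

static unsigned int next_subevt_size(unsigned int nominal)
{
    if (rand() % HUGE_SUBEVT_PERIOD == 0)
        return HUGE_SUBEVT_SIZE;
    return nominal;
}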
Event Builder performance:
| ev rate (Hz) | data rate (MB/s) | disc evt (%) | cpu evtbuild (%) | cpu netmem (%) | load | buff evtbuild (%) | buff netmem (%) |
| 1000 | 23 | 0 | 12 | 10 | 0.5 | 20 | 10 |
| 2000 | 46 | 0 | 20 | 18 | 0.5 | 20 | 15 |
| 3000 | 69 | 0 | 33 | 29 | 0.6 | 20 | 15 |
| 4000 | 95 | 0 | 42 | 35 | 0.9 | 20/30 | 20 |
Conclusions: the Event Builder shows stable operation. The fill levels of the buffers are far below critical. The fill levels for test-3 are given in MB in the table below for different event rates (fill level in MB = fill fraction times buffer size).
| Event rate (Hz) | Source numbers | Buffer size (MB) | evtbuild buffer fill level (MB) | netmem buffer fill level (MB) |
| 1000 | 0-59 | 8 | 1.6 | 0.8 |
| 1000 | 60-79 | 16 | 3.2 | 1.6 |
| 1000 | 80-89 | 64 | 12.8 | 6.4 |
| 2000 | 0-59 | 8 | 1.6 | 1.2 |
| 2000 | 60-79 | 16 | 3.2 | 2.4 |
| 2000 | 80-89 | 64 | 12.8 | 9.6 |
| 3000 | 0-59 | 8 | 1.6 | 1.2 |
| 3000 | 60-79 | 16 | 3.2 | 2.4 |
| 3000 | 80-89 | 64 | 12.8 | 9.6 |
| 4000 | 0-59 | 8 | 1.6 | 1.6 |
| 4000 | 60-79 | 16 | 4.8 | 3.2 |
| 4000 | 80-89 | 64 | 19.2 | 12.8 |
Test-4 with 90 subsystems
Difference from test-3: now I additionally write the data to a local hard disk and to a Data Mover via RFIO.
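A hedged sketch of how a built event buffer is written to a Data Mover through the RFIO client API, which mirrors the POSIX open/write/close interface (this is not the actual daq_evtbuild code; the header name, the remote path and the error handling are assumptions):

/* Hedged sketch: write a buffer to the Data Mover via RFIO. */
#include <fcntl.h>
#include <stddef.h>
#include "rfio_api.h"    /* RFIO client header; the exact name may differ */

static int write_to_data_mover(const char *rfio_path,
                               const void *buf, size_t nbytes)
{
    /* rfio_path is hypothetical, e.g. a file in the Data Mover's pool. */
    int fd = rfio_open((char *) rfio_path, O_WRONLY | O_CREAT, 0644);
    if (fd < 0) {
        rfio_perror("rfio_open");
        return -1;
    }
    if (rfio_write(fd, (char *) buf, (int) nbytes) != (int) nbytes) {
        rfio_perror("rfio_write");
        rfio_close(fd);
        return -1;
    }
    return rfio_close(fd);
}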
Event Builder performance:
| ev rate (Hz) | data rate (MB/s) | disc evt (%) | cpu evtbuild (%) | cpu netmem (%) | load | buff evtbuild (%) | buff netmem (%) |
| 1000 | 23 | 0 | 20 | 15 | 0.8 | 20 | 10 |
An open problem: if a huge subevent (size = 100 kBytes) is generated roughly once per 1000 events for every data source asynchronously, then daq_evtbuild gets blocked and the daq_netmem buffers get completely filled. Under investigation.
Tests with 60+60 subsystems, two Event Builders and one Data Mover
The final aim of these tests is to understand the performance of the EB system when writing to different mass storage systems.
Of special interest is the performance of the Data Mover (DM) when two Event Builders write to it simultaneously.
We will run Event Builders with the following settings:
- writing the data to /dev/null (the ideal case)
- lxhadesdaq: writes to Lustre.
- lxhadesdaq: writes to Lustre + to local disks.
- lxhadesdaq: writes to Lustre + to DM via RFIO, hadeb06: writes to the same DM via RFIO
The setup:
- EB-1 : lxhadesdaq (receives 25 MB/s in total from 60 sources)
- EB-2 : hadeb06 (receives 25 MB/s in total from 60 sources)
- Data Mover : a new type of server with two hardware RAID arrays (8 and 7 disks). Its performance is expected to be much better than that of the old Data Movers (which had only one RAID array).
Note: the sizes of the EB buffers are only 4(x2x2) MB!
The performance of the system without Data Mover:
| EB | Local disk | Lustre | DM | ev rate (kHz) | data rate (MB/s) | disc evt (%) | cpu evtbuild (%) | cpu netmem (%) | load | buff evtbuild (%) | buff netmem (%) |
| hadeb06 | - | - | - | 3 | 35 | 0 | - | - | 0.3 | 10 | 10 |
| lxhadesdaq | - | - | - | 3 | 35 | 0 | - | - | 0.4 | 10 | 10 |
| lxhadesdaq | - | yes | - | 2 | 25 | 0 | - | - | 0.4 | 10 | 10 |
| lxhadesdaq | yes | yes | - | 2 | 25 | 0 | 25 | 10 | 0.9 | 10 | 10 |
Some results:
- The EB alone is very stable: it ran with 60 sources (data rate of 25 MB/s) for one week without problems.
- Writing to Lustre also seems to be OK: during about 24 hours the EB wrote about 2 TB of data to Lustre without any problem.
Now the tests with Data Mover and 2 EBs:
| EB | Local disk | Lustre | DM | ev rate (kHz) | data rate (MB/s) | disc evt (%) | cpu evtbuild (%) | cpu netmem (%) | load | buff evtbuild (%) | buff netmem (%) |
| hadeb06 | - | - | yes | 1 | 15 | 0 | 8 | 4 | 0.2 | 10 | 10 |
| lxhadesdaq | - | yes | yes | 1 | 15 | 0 | 13 | 6 | 0.3 | 10 | 10 |
| hadeb06 | - | - | yes | 2 | 25 | 0 | 14 | 11 | 0.5 | 20 | 15 |
| lxhadesdaq | - | yes | yes | 2 | 25 | 0 | 24 | 10 | 0.5 | 20 | 15 |
Observations:
| Setup | Result |
| one EB -> DM -> Tape Drive | OK |
| two EBs -> DM | OK: 2.6 TB written to the pool during 20 hours |
| two EBs -> DM -> two Tape Drives | DM hangs after 20 min |
| two EBs -> DM -> one Tape Drive | DM hangs after 20 min |
Several attempts to run at a data rate of 25 MB/s ended up with a hanging Data Mover (when writing to tape is enabled); we also had one hang at a data rate of 15 MB/s. However, there seems to be no problem when only one EB writes to tape via RFIO.
Under investigation.
New tests (10.07.2009) showed good performance of writing to tape. Setup:
- 30+30 data streams to two EBs (30+30 MB/s) on lxhadeb01
- Each EB writes 30 MB/s of data to tape via RFIO.
- Parallel writing to Lustre from the Data Mover was also successful.
- This should still be tested with 90 data sources.
The aim of this test: to check the performance of the EB executables recompiled on a 64-bit platform.
Setup:
- Event Builder: hadeb06
- Data sources run on TOF CPUs
Results:
- Works fine with up to 95 data sources (data rate of 45 MB/s)
- Opens buffers with a total size of up to 12 GB (out of 12 GB available)
- RFIO connection to the Data Mover works
Problems:
- If there are 96 data sources, the EB does not switch a buffer for the data stream from the 96th source. This leads to filled buffers and discarded packets, whereas on the 32-bit platform the EB works fine with 100 sources (under investigation).
Tests of UDP packet loss
Tests with four 1Gbps NICs (bonding)
- General test with iperf: server on lxhadeb01 and clients on VME CPUs
- In the HADES VLAN with bonding we could only go up to 1 Gbps with four clients without UDP loss (4 Gbps was expected).
- Test with Event Builders on lxhadeb01 and soft readouts on VME CPUs
- UDP buffer overflow
- Measurement: netstat -su ("packet receive errors" in the UDP section); see the sketch after this list.
- Result: the "packet receive errors" counter does not increase. However, netstat displays statistics from the transport layer only; it does not account for datagram drops happening at lower layers (e.g. in the NIC or driver).
- Overflow of Event Builder buffers (never observed)
- Buffers of EBs are monitored, fill levels < 10%.
- Bonding leads to UDP loss???
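A minimal sketch of reading the same UDP counters that netstat -su reports directly from /proc/net/snmp (illustrative only; the exact field names depend on the kernel version):

/* Minimal sketch: print the kernel UDP counters from /proc/net/snmp.
 * Each protocol appears as a pair of lines: field names, then values. */
#include <stdio.h>
#include <string.h>

int main(void)
{
    FILE *f = fopen("/proc/net/snmp", "r");
    char line[512], header[512];
    int have_header = 0;

    if (!f) {
        perror("/proc/net/snmp");
        return 1;
    }
    while (fgets(line, sizeof(line), f)) {
        if (strncmp(line, "Udp:", 4) != 0)
            continue;
        if (!have_header) {
            strcpy(header, line);          /* first "Udp:" line = field names */
            have_header = 1;
        } else {
            printf("%s%s", header, line);  /* second "Udp:" line = values */
        }
    }
    fclose(f);
    return 0;
}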
Tests with 10Gbps NIC
- lxhadeb02 and lxhadeb03 are interconnected via 10Gbps switch
- Both servers have 10Gbps PCI Express cards
- Test setup:
- lxhadeb02 client: iperf -c 140.181.91.154 -u -l 60k -w 512k -b 5g -p 65005 -i 1
- lxhadeb03 server: iperf -s -u -p 65005 -l 60k -w 512k
# -w 512k = UDP buffer size (iperf opens a double of 512k = 1 MByte)
# -l 60k = UDP datagram size
# -b 5g = send with a speed of 5 Gbits/sec
# -i 1 = report each second
# -u = UDP protocol
| Expected rate (Gbits/sec) | TX rate, client (Gbits/sec) | RX rate, server (Gbits/sec) | UDP loss (%) |
| 4.03 | 4.03 | 4.03 | 0 |
| 5.02 | 5.02 | 5.02 | 0 |
| 6.07 | 6.07 | 6.07 | 0 |
| 7.02 | 7.02 | 7.02 | 0 |
| 7.23 | 7.23 | 7.23 | 0 |
| 7.44 | 7.44 | 6.91 | 7.2 |
| 7.64 | 7.64 | 6.49 | 15 |
| 7.80 | 7.75 | 6.06 | 22 |
| 8.40 | 8.03 | 5.83 | 27 |
| 9.00 | 8.14 | 5.78 | 29 |
- More tests of all four servers (lxhadeb01/02/03/04) with iperf. Setup: three servers send UDP packets to the fourth one (the three servers run three UDP clients and the fourth server runs three UDP servers).
Overall, the old 10 Gbit NICs (lxhadeb02/03) seem to perform better than the new 10 Gbit NICs (lxhadeb01/04). In most of the tests (up to 6 Gbit/s) the UDP loss for lxhadeb02/03 is 0%.
CPU usage vs water mark
- CPU usage vs. water mark is shown in the picture below. The CPU usage of netmem goes up as the water mark decreases, since netmem then has to handle a larger number of smaller UDP packets (at a fixed data rate the packet rate scales roughly as data rate divided by water mark). There is also a sudden jump in the CPU usage of evtbuild at a water mark of 1 kB; we do not yet have an explanation for it.
Test of complete system with 4 EB Servers, 16 data streams and RFIO to tape
- The EB setup: 16 EB processes receiving 16 data streams from the DAQ (11 MB/s per stream) and writing via RFIO to 5 Data Movers
- Results: very stable operation. The Data Movers did not show any hanging behavior, all streams were properly distributed over the Data Movers even after 20 back-to-back restarts without any delay, and there was no data loss after several hours of intensive testing.
SergeyYurevich - 12 Mar 2010