Fast SIMDized Kalman filter based track fit
Introduction
Finding particle trajectories is usually the most time-consuming part of event reconstruction in modern high energy physics experiments [1]. In many present experiments with high track densities and complicated event topologies, a Kalman filter [1], [2] based track fit is already used at this combinatorial stage of the event reconstruction. The speed of the track fitting algorithm therefore becomes very important for the total processing time.
CBM [3] is a dedicated heavy-ion experiment with fixed target to investigate the properties of highly compressed baryonic matter as it is produced in nucleus-nucleus collisions at the Facility for Antiproton and Ion Research (FAIR) in Darmstadt, Germany. Large track densities (on average 500 tracks in the main tracker for a typical central Au + Au collision) together with the presence of a non-homogeneous magnetic field make the reconstruction of events challenging. The track reconstruction procedure in the CBM experiment is based on the cellular automaton track finder and the Kalman filter track fitter [4], [5]. To achieve a high track finding efficiency the Kalman filter fitting algorithm is intensively used within the track finder.
Motivated by the idea of using the SIMD unit of modern processors (e.g., [6]), we have investigated here a chain of modifications of the Kalman filter based track fitting algorithm in order to increase the speed of the track finding stage of the event reconstruction. In the CBM experiment the track fitting algorithm based on the conventional Kalman filter is implemented using scalar instructions. Thus we have started with the double precision scalar version of the conventional Kalman filter based track fitting algorithm.
The algorithm uses a 70 MB map of the magnetic field and therefore constantly accesses main memory, which is slow relative to the cache. However, as in other high energy physics experiments, the non-homogeneous magnetic field of the CBM experiment is smooth enough to be locally approximated by a polynomial of fourth order. With the polynomial approximation of the magnetic field the algorithm operates within the cache, which results in a significant increase in speed without degradation of the tracking precision.
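The idea of replacing a table lookup by a polynomial can be sketched as follows. This is a minimal illustration, not the fitter's actual field code: it assumes a single field component that depends on one coordinate only, and the names `FieldPoly4`, `evalPoly` and `evalField` are hypothetical.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Hypothetical container: coefficients c[0..4] of the fourth-order polynomial
// c0 + c1*x + c2*x^2 + c3*x^3 + c4*x^4 approximating one field component.
using FieldPoly4 = std::array<float, 5>;

// Evaluate a polynomial with n coefficients using Horner's scheme.
// Only the coefficients are read, so the working set stays tiny.
inline float evalPoly(const float* c, std::size_t n, float x) {
    assert(n > 0);
    float b = c[n - 1];
    for (std::size_t i = n - 1; i-- > 0;) {
        b = b * x + c[i];  // one multiply and one add per coefficient
    }
    return b;
}

// Fourth-order case: four multiply-add steps instead of a lookup in a large map.
inline float evalField(const FieldPoly4& c, float x) {
    return evalPoly(c.data(), c.size(), x);
}
```

Five floats per component fit easily into the cache, whereas a lookup in a map of tens of megabytes generally does not.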
In order to further optimize memory usage, the precision of all data in the algorithm has been changed from double to single. As a result, twice as much data can be stored in the cache and, likewise, twice as much data can later be packed into a SIMD register, effectively doubling the throughput. The conventional Kalman filter algorithm exhibits unstable behavior when using only single precision numbers (see also [7], [8]). Therefore the Kalman filter algorithm has been specially investigated and modified in order to avoid such instability due to roundoff errors. In addition, the algorithm has been mathematically and numerically optimized, especially in the estimation of the initial track parameters and in the propagation in the magnetic field [5].
In the next step, the algorithm has been adapted for use with a SIMD instruction set. The adaptation has been done by inline operator overloading, which keeps the source code of the algorithm unchanged. Therefore both versions, scalar and SIMDized, are equivalent and can be selected by a compile time option. Furthermore, this approach gives a unified way of dealing with different CPU families which implement different SIMD instruction sets.
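A minimal sketch of the operator overloading approach is given below. It is an illustration under stated assumptions, not the code of the fitter: the type `Vec4f` and the function `extrapolateLine` are hypothetical, and the four lanes are stored in a plain array so that the sketch compiles on any platform. A real build would wrap a hardware SIMD register and implement the operators with the corresponding vector instructions.

```cpp
#include <cassert>

// Hypothetical 4-lane single precision vector. The plain array stands in
// for a SIMD register and is an assumption made for portability.
struct Vec4f {
    float v[4];

    // Read one lane, with a bounds check in debug builds.
    float at(int i) const {
        assert(i >= 0 && i < 4);
        return v[i];
    }
};

// Overloaded operators: the same expression syntax as for scalars,
// applied lane by lane.
inline Vec4f operator+(const Vec4f& a, const Vec4f& b) {
    Vec4f r;
    for (int i = 0; i < 4; ++i) r.v[i] = a.v[i] + b.v[i];
    return r;
}

inline Vec4f operator*(const Vec4f& a, const Vec4f& b) {
    Vec4f r;
    for (int i = 0; i < 4; ++i) r.v[i] = a.v[i] * b.v[i];
    return r;
}

// Algorithm code written once. With T = float it handles one track,
// with T = Vec4f it handles four tracks at the same time.
template <typename T>
inline T extrapolateLine(T x, T tx, T dz) {
    return x + tx * dz;  // straight-line step: position plus slope times distance
}
```

The template parameter plays the role of the compile time option here; a single typedef switched by a preprocessor macro would achieve the same selection.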
Finally, the SIMDized version of the algorithm has been ported to the Cell processor [9], [10]. Initially designed for a game console, the Cell processor promises extremely high computing capabilities. The Cell processor consists of a general-purpose PowerPC processor core (PPE) connected to eight special-purpose streamlined coprocessing synergistic processing elements (SPE), which greatly accelerate multimedia and vector processing applications, as well as many other forms of dedicated computation. Cell combines the considerable floating point resources required for demanding numerical algorithms with a power efficient software-controlled memory hierarchy. The current implementation of Cell is most often noted for its extremely high performance single precision arithmetic. Even though single precision is widely considered insufficient for many scientific applications, it is fully adequate for the reformulated Kalman algorithm. The Cell processor is particularly compelling because it is expected to be produced in high volumes and to be cost competitive with commodity PC CPUs.
Using the IBM Cell Broadband Engine SDK [9], [10], the algorithm has been first ported to the PPE and modified for use of the AltiVec vector instructions [11], and then ported to the SPE with the corresponding SPE specific vector instructions. After extensive tests on a Cell simulator, the algorithm has been run on a Cell Blade computer.
In the end, the performance of the SIMDized version of the Kalman filter based track fitting routine has been evaluated on three different computer architectures: Intel Xeon, AMD Opteron and Cell Broadband Engine.
Section snippets
SIMD architecture
There are three important classes of computer architectures based upon the number of concurrent instruction and data streams:
- Single instruction, single data stream (SISD): a single instruction stream on scalar data.
- Single instruction, multiple data streams (SIMD): multiple data streams against a single instruction stream to perform operations which may be naturally parallelized.
- Multiple instruction, multiple data (MIMD): many functional units perform different operations on different data.
The
Cell Broadband Engine
Cell [9], [10] is a microprocessor architecture jointly developed by a Sony, Toshiba, and IBM alliance known as STI. Cell combines a general-purpose Power-architecture core of modest performance with multiple streamlined coprocessing elements which greatly accelerate multimedia and vector processing applications, as well as many other forms of dedicated computation. The resulting
Kalman filter method
The Kalman filter method [1], [2] is intended for finding the optimum estimate r of an unknown vector $r^t$ according to the measurements $m_k$, $k = 1, \dots, n$, of the vector $r^t$.
The Kalman filter starts with a certain initial approximation and refines the vector r, consecutively adding one measurement after the other. The optimum value is attained after the addition of the last measurement.
The vector $r^t$ can change from one measurement to the next: $r^t_k = A_k r^t_{k-1} + \nu_k$, where $A_k$ is a linear operator, $\nu_k$ a process
Speedup of the algorithm
The Kalman filter method is used both in the track finding and track fitting routines of the CBM experiment. The track finder is based on the cellular automaton method [4]. The algorithm creates short track segments (triplets) locally in neighboring detector planes and links them into track candidates, which are then selected using the $\chi^2$-criterion. The Kalman filter based routines are used at all stages of the track finder in order to reliably estimate parameters of the track segments and
Results and discussion
The Kalman filter based track fitting algorithm has been tested on simulated data of the CBM experiment [3], [4].
In the CBM experiment with forward geometry the natural choice of the state vector is $r = (x, y, t_x, t_y, q/p)$, where x and y are track coordinates at the reference z-plane, $t_x$ is the track slope in the xz plane, $t_y$ is the track slope in the yz plane, $q/p$ is the inverse particle
Conclusion
The Kalman filter based track fitting algorithm, the core algorithm of the event reconstruction software in high energy physics experiments, has been significantly optimized and adapted to a vector form implementing different SIMD instruction sets: SSE2 of the Intel and AMD CPUs, AltiVec of the PPE and the specialized SIMD instruction set of the SPE of the Cell processor. Overloading basic scalar operators by corresponding vector instructions keeps the source of the algorithm unchanged, thus
Acknowledgements
We wish to thank M. Engler, J. Franz and J. Jordan from the IBM Laboratory, Böblingen, for giving us the possibility to test the algorithm on the Cell Blade system. We would like to thank also Drs. T. Steinbeck and R. Weis from the Kirchhoff Institute for Physics, University of Heidelberg, for their technical assistance.
We acknowledge the support of the European Community-Research Infrastructure Activity under the FP6 “Structuring the European Research Area” programme (HadronPhysics, contract
References (14)
- Event reconstruction in the CBM experiment, Nucl. Instr. Methods A (2006)
- et al., Analytic formula for track extrapolation in non-homogeneous magnetic field, Nucl. Instr. Methods A (2006)
- Data Analysis Techniques for High-Energy Physics (2000)
- A new approach to linear filtering and prediction problems, Trans. ASME, Series D, J. Basic Eng. (1960)
- Compressed Baryonic Matter Experiment, Technical Status Report, GSI, Darmstadt, 2005; 2006 Update
- IA-32 Intel Architecture Optimization Reference Manual, Intel, June...
- et al., Kalman Filtering: Theory and Practice using MATLAB (2001)