
Accelerating graph-based tracking tasks with symbolic regression


Published 15 November 2024. © 2024 The Author(s). Published by IOP Publishing Ltd.

Citation: Nathalie Soybelman et al 2024 Mach. Learn.: Sci. Technol. 5 045042. DOI: 10.1088/2632-2153/ad8f12


Abstract

The reconstruction of particle tracks from hits in tracking detectors is a computationally intensive task due to the large combinatorics of detector signals. Recent efforts have proven that ML techniques can be successfully applied to the tracking problem, extending and improving the conventional methods based on feature engineering. However, complex models can be challenging to implement on heterogeneous trigger systems, integrating architectures such as field programmable gate arrays (FPGAs). Deploying the network on an FPGA is feasible but challenging and limited by its resources. An efficient alternative can employ symbolic regression (SR). We propose a novel approach that uses SR to replace a graph-based neural network. Substituting each network block with a symbolic function preserves the graph structure of the data and enables message passing. The technique is well suited to heterogeneous hardware, as it can be implemented more easily on FPGAs and yields faster execution times on CPUs than conventional methods. While the tracking problem is the target for this work, it also provides a proof-of-principle for the method that can be applied to many use cases.


Original content from this work may be used under the terms of the Creative Commons Attribution 4.0 license. Any further distribution of this work must maintain attribution to the author(s) and the title of the work, journal citation and DOI.

1. Introduction

In High Energy Physics experiments, such as ATLAS [1] and CMS [2] at the LHC [3], data collection relies on sophisticated trigger systems [4–7] that select interesting events and reject others to significantly reduce the data rate from $\mathcal{O}$(40 MHz) to $\mathcal{O}$(1 kHz) for final storage. With the upcoming High Luminosity LHC upgrades, the instantaneous luminosity will increase from $1 \times 10^{34}$ to $7.5 \times 10^{34}\, \text{cm}^{-2}\,\text{s}^{-1}$, and the number of simultaneous collisions per bunch crossing (pile-up) will rise from 20–50 to 140–200, reaching unprecedented levels. These changes necessitate substantial improvements in data acquisition pipelines to handle the increased data rates efficiently. In this context, the inner tracking detector data, which captures hits from charged particles, plays a vital role in complementing calorimeter and muon detector data for triggering, enabling more precise particle reconstruction and identification. However, track reconstruction from these hits is challenging due to the combinatorics of detector signals, making it one of the most time-consuming trigger tasks [8].

Numerous machine learning (ML) algorithms have been explored to improve upon classical track reconstruction approaches, particularly in the context of the TrackML challenge [9, 10] and beyond. The most promising methods are based on graph neural networks (GNNs) [11–17], which provide a natural way of representing and processing tracking data.

However, full deployment of these methods—characterized by large memory requirements and non-negligible execution times—into the ATLAS and CMS trigger strategies remains an open challenge. Currently, both trigger systems perform tracking only in the software-based high-level trigger (HLT), which still primarily relies on CPU farms. CMS has recently introduced GPUs to offload part of the tracking task [18], though their potential is not yet fully exploited. At the same time, it is essential to explore alternative approaches based on heterogeneous hardware solutions, such as adopting field programmable gate arrays (FPGAs).

Speeding up the inference of ML tools through quantization and FPGA implementation [19–22] could extend tracking capabilities to the hardware-based early stages of trigger selection [23]. This approach can incorporate GNNs or transformer architectures, as demonstrated in [24–27]. However, it is only feasible for relatively small and simple networks (on the order of 10k parameters), as large-scale GNNs face memory limitations.

Symbolic regression (SR) can replace or approximate dense neural networks with analytical functions. Previous studies [28–30] have employed this technique to derive more interpretable and physics-motivated expressions. Moreover, it can accelerate inference and ease FPGA implementation without significant performance loss, as explored in [31].

In this work, we propose a novel extension of this method, applying SR to approximate a graph neural network for track finding. This task is challenging because the network input is not a simple list of variables but a set of objects with varying properties and no fixed cardinality. Thus, finding a single analytical function to map hits directly to tracks is infeasible. Instead, we decompose the network into its dense layers and fit them individually with symbolic expressions. This approach preserves the graph structure and message-passing (MP) mechanism, offering flexibility: the entire GNN or a subset of its components can be replaced with symbolic expressions, while the rest remains as neural networks.

To test this approach, we developed an algorithm to cluster hits originating from high transverse momentum tracks, separating them from hits produced by soft background tracks and detector noise. The clustered signal hits can be processed further to extract track parameters via a fit procedure.

For a proof-of-concept, we designed a simplified toy data simulator, enabling a direct comparison between the performance of the clustering GNN and the SR approximations.

2. Dataset

The toy data generator used in this study simulates a cylindrical detector consisting of 8 concentric layers. Its geometry is based on the barrel of the ATLAS Inner Detector, with layer radii matching those of the Pixel and SemiConductor Tracker (SCT) subdetectors, each composed of 4 layers. The emulated detector operates within a constant 2 T solenoidal magnetic field aligned with the beam axis, approximating the setup of the ATLAS experiment.

The generator tracks the trajectories of simulated charged particles in the detector volume and calculates their intersection with the detector layers to generate hits. To maintain simplicity, no particle interactions with the detector material are considered.
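As an illustration of the generation logic, the sketch below intersects an idealized helix with the detector layers, assuming a uniform solenoidal field along z, straight-line motion in the r − z plane, and no material effects. The function name and layer radii are hypothetical stand-ins, not the generator's actual code.

```python
import numpy as np

def generate_hits(pt, phi0, eta, z0, layer_radii, B=2.0, charge=+1):
    """Toy hit generation: intersect an ideal helix with cylindrical layers.

    pt in GeV, lengths in mm; assumes a uniform field along z, a straight
    line in the r-z plane, and no material effects.
    """
    R = pt / (0.3 * B) * 1000.0           # helix radius in mm: R[m] = pT[GeV] / (0.3 B[T])
    cot_theta = np.sinh(eta)              # dz/dr slope in the r-z plane
    hits = []
    for r in layer_radii:
        if r >= 2.0 * R:                  # track curls back before this layer
            break
        dphi = np.arcsin(r / (2.0 * R))   # azimuthal deflection at radius r
        phi = phi0 + charge * dphi
        hits.append((r * np.cos(phi), r * np.sin(phi), z0 + r * cot_theta))
    return np.array(hits)

# Example: a 20 GeV track crossing 8 layers (radii loosely inspired by the text)
layers = [33, 50, 88, 122, 299, 371, 443, 514]
print(generate_hits(pt=20.0, phi0=0.1, eta=0.5, z0=0.0, layer_radii=layers))
```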

In this study, we adopt the same coordinate system used by ATLAS and CMS (see footnote).

The φ and z coordinates of the hits are smeared based on the pitch of the emulated detector sensors to account for experimental resolution effects, while no smearing is applied on the r coordinate. Optionally, an average hit inefficiency can be randomly applied to each detector layer; in this study, all layers are assumed to be fully efficient.

The signal targeted in this study is a sample of single tracks originating from a simulated hard scattering vertex with $p_\textrm{T} \gt 20$ GeV. Tracks are generated within the $|\eta| \lt 1.4$ region to ensure full containment within the detector volume, thus avoiding acceptance effects on the hits. To emulate the typical beamspot shape in LHC running conditions, each track originates from a primary vertex (PV) with its position along the beam axis sampled from a Gaussian distribution centred at the origin with σ = 5 cm. The longitudinal (z0) impact parameter, evaluated with respect to the PV, is randomly sampled from a Gaussian distribution centred at 0 with σ = 100 µm, while the transverse (d0) impact parameter is set to 0.

Once an event containing a single signal track is simulated, two sources of background hits are overlaid. Pile-up collisions are simulated by generating events with a variable number of tracks, following the longitudinal beamspot shape described above. To approximately replicate LHC Run 3 pile-up conditions, the $p_\textrm{T}$ distribution for pile-up tracks is tuned to match the measurements reported by ATLAS [32]. An average of 25 simultaneous collisions ($\langle\mu\rangle = 25$) is overlaid on the signal sample. Additionally, random uncorrelated hits are added to each event to simulate detector noise, replicating what is typically observed in ATLAS-simulated samples.

Finally, to emulate the region of interest data selection mechanism used in HLT tracking [8], hits are preselected within a wedge with an opening of $|\Delta\eta|\times|\Delta\phi| = 0.1\times 0.1$ around the signal track, starting from ±5 mm around the PV. Only events that fully contain the signal track are considered to avoid potential acceptance effects in this preselection.

An example event display of a generated signal track, along with its corresponding signal and background hits, is shown in figure 1.

Figure 1. Display of a simulated event in the transverse (left) and longitudinal (right) views relative to the beam axis. Signal hits are shown in red, while pile-up and noise hits are displayed in grey. The dotted line represents the track, fitted as a circle in the x − y plane and as a straight line in the z − r plane.

The dataset used in this study consists of approximately 400k events for training and 50k events each for validation and testing. It is available at [33].

3. Network architecture

The task addressed by our GNN-based algorithm is twofold: to separate background hits, produced by pile-up and noise, from those produced by signal tracks, and to distinguish between signal hits originating from different tracks. We apply the condensation approach proposed for particle flow reconstruction [34, 35], with a few adjustments.

For demonstration purposes, the proposed network is intentionally designed to be small and compact. We adopt a graph-based method, where each graph node represents a hit, parametrized by its position in Cartesian coordinates, expressed in millimetres. The features are scaled to have a mean of 0 and a standard deviation of 1 across the entire dataset. For simplicity, the graphs are fully connected, though this can be generalized to larger datasets if the number of edges becomes incompatible with available memory.

The nodes are then processed by the GNN, which consists of three blocks (a minimal sketch in code follows the list):

  • Embedding. For each node, the input features are transformed into a hidden representation of dimension 5 using a multilayer perceptron (MLP). This dimension was chosen to be as small as possible while maintaining performance, given the subsequent SR step. In real-world applications, a higher dimension may be necessary.
  • MP. The hidden representations are passed through the edges as messages to all connecting nodes. The received messages are summed for each node, concatenated with its hidden representation, and processed through an MLP to update the representation.
  • Mapping to condensation space. The updated hidden representations are mapped into a condensation space with a dimensionality of 5.
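A minimal PyTorch sketch of this three-block structure is given below; the intermediate layer width and activation choices are illustrative assumptions, since the text fixes only the input features, the hidden and condensation dimensions, and the overall parameter budget.

```python
import torch
import torch.nn as nn

class CondensationGNN(nn.Module):
    """Minimal sketch of the three-block architecture described above.

    The hidden and condensation dimensions (5) follow the text; the
    intermediate layer width is an illustrative assumption.
    """
    def __init__(self, in_dim=3, hidden_dim=5, cond_dim=5, width=16):
        super().__init__()
        # Block 1: per-node embedding of the (x, y, z) hit features
        self.embed = nn.Sequential(nn.Linear(in_dim, width), nn.ReLU(),
                                   nn.Linear(width, hidden_dim))
        # Block 2: update from summed messages concatenated with own state
        self.update = nn.Sequential(nn.Linear(2 * hidden_dim, width), nn.ReLU(),
                                    nn.Linear(width, hidden_dim))
        # Block 3: map the updated state into the condensation space
        self.condense = nn.Sequential(nn.Linear(hidden_dim, width), nn.ReLU(),
                                      nn.Linear(width, cond_dim))

    def forward(self, x):                # x: (n_hits, 3), one event
        h = self.embed(x)                # per-node hidden representation
        msg = h.sum(dim=0).expand_as(h)  # fully connected graph: every node
                                         # receives the same summed message
        h = self.update(torch.cat([h, msg], dim=-1))
        return self.condense(h)          # (n_hits, cond_dim)
```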

The network has a total of 446 parameters, distributed approximately equally across the blocks. An illustration of the architecture is shown in figure 2.

Figure 2. Sketch of the proposed network consisting of three blocks: embedding, message passing and condensation.

To cluster signal and background hits, the loss function consists of attractive and repulsive potentials, determined by a separately predicted charge for each node. The charges are calculated as follows:

$q_i = \textrm{arctanh}^2(\beta_i) + q_\textrm{min} \qquad\qquad (1)$

where i is the node index, $\beta_i$ is the output of a separate MLP acting on the updated hidden representation, and $q_\textrm{min}$ is a hyperparameter optimized to 0.4 via grid search. The loss for charge prediction is a binary cross-entropy loss applied to β, classifying between signal and background hits. Ideally, background hits will have β close to 0, and thus a charge near the minimum $q_\textrm{min}$, while signal hits will have a high charge.

Using this charge, we define the potentials for a given track k by identifying the node with the maximal charge among all nodes associated with the track. This node is indexed as α.

The attractive and repulsive potentials $\check{V}_k(x)$ and $\hat{V}_k(x)$ for track k, with x representing the location in condensation space, are given by:

$\check{V}_k(x) = ||x - x_\alpha||^2 \, q_\alpha, \qquad \hat{V}_k(x) = \max(0,\, 1 - ||x - x_\alpha||) \, q_\alpha \qquad\qquad (2)$

where $||\cdot||$ represents the L2 norm. The attractive potential grows quadratically with the distance from the node with the highest charge, while the repulsive potential decreases linearly until it reaches 0.

The goal is to pull all nodes belonging to track k towards the same point using the attractive potential while pushing all other nodes away using the repulsive potential. To achieve this, we define a matrix $M_{ik}$, whose elements are set to 1 if node i belongs to track k, and 0 otherwise, and use it to define the potential loss $L_V$:

$L_V = \frac{1}{N} \sum_{i = 1}^{N} q_i \sum_{k = 1}^{K} \left[ M_{ik} \check{V}_k(x_i) + (1 - M_{ik}) \hat{V}_k(x_i) \right] + \frac{1}{N_\textrm{bkg}} \sum_{i = 1}^{N_\textrm{bkg}} ||x_i||^2 \qquad\qquad (3)$

where N is the total number of nodes in the event, and K is the total number of tracks. The first term in the loss, originally proposed in [34], represents the combined force from all tracks acting on each node in the event, summed and normalized across all nodes.

The second term applies only to the $N_\textrm{bkg}$ nodes that do not belong to any signal track and originate from pile-up or noise. This term equals zero if the noise is located at the origin of the condensation space. This setup simplifies postprocessing, as background removal can be easily achieved with a simple cut around the origin. The absence of signal hits in this area is enforced by the repulsive potential in the first loss term.
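Combining equations (1)–(3), a minimal sketch of the potential loss for one event could look as follows; the clamping of β, the exact form of the background term, and the helper name are our assumptions based on the description above.

```python
import torch

def condensation_loss(x, beta, M, q_min=0.4):
    """Sketch of the potential loss L_V of equation (3) for one event.

    x:    (N, 5) condensation-space coordinates
    beta: (N,)   output of the charge MLP, in (0, 1)
    M:    (N, K) assignment matrix, M[i, k] = 1 if node i belongs to track k
    """
    # Equation (1); beta is clamped to keep arctanh finite
    q = torch.atanh(beta.clamp(max=1.0 - 1e-6)).square() + q_min
    alpha = (q.unsqueeze(1) * M).argmax(dim=0)        # highest-charge node per track
    d = torch.cdist(x, x[alpha])                      # (N, K) distances to condensation points
    v_att = d.square() * q[alpha]                     # attractive potential, equation (2)
    v_rep = torch.clamp(1.0 - d, min=0.0) * q[alpha]  # repulsive potential
    l_pot = (q * (M * v_att + (1 - M) * v_rep).sum(dim=1)).mean()
    bkg = M.sum(dim=1) == 0                           # nodes belonging to no track
    l_bkg = x[bkg].square().sum(dim=1).mean() if bkg.any() else x.new_zeros(())
    return l_pot + l_bkg
```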

4. Symbolic regression

The main goal of this work is to demonstrate that the above neural network can be approximated by a functional expression. For this purpose, we use the SR algorithm provided by the PySR package [36]. This algorithm is based on genetic programming, where trees of functions are randomly constructed using the provided operators and constraints. At each iteration, mutations are applied to the function trees, and the new trees are evaluated based on the specified loss function. If the performance of the new trees is better than that of the original ones, they are retained in the equation pool; otherwise, they are discarded. The output provides the best equations for all equation sizes, allowing the user to balance between accuracy and compactness.

An initial implementation of SR expressions on FPGAs was proposed in [31] in the context of jet classification. There, a single MLP was used to process a set of jet variables. However, this approach is unsuitable for tracking tasks, as our input consists of set-valued data that cannot be processed by a simple MLP.

To address this challenge with a novel approach, we exploit the modularity of our model, which essentially consists of 3 MLPs with a minor processing step in between. We replace each MLP with a symbolic expression separately.

For the embedding network, the function operates uniformly on each node, taking the hit features as input and producing its hidden representation as output.

Given our current choice of a fully connected graph, the MP step simply sums the hidden representations of all hits. In future implementations, this may change, as an arbitrary graph would require differently masked sums for each hit, depending on its connectivity. Once the messages are summed, the MLP in the MP block again operates on each node individually, using the summed message as input.

For mapping into the condensation space, the function takes the updated hidden representation of each node and outputs its coordinates, similar to the first block.

We use default PySR parameters for the SR training. The selected operators include +, −, ·, ÷, square, cube, and ReLU(), with the maximal equation length set to 75 (150 for the final block). Prioritizing performance over compactness, we select the longest function from the output. Mean squared error is used for function optimization.
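A sketch of such a fit for a single output dimension of one block is shown below. It assumes a recent PySR version, where the per-sample objective is passed as elementwise_loss, and uses dummy data in place of the block's actual inputs and targets.

```python
import numpy as np
from pysr import PySRRegressor

# Dummy stand-ins for one block: inputs X and one output dimension y of
# the MLP being approximated (one SR fit is run per output dimension).
X = np.random.randn(5000, 5)
y = np.random.randn(5000)

model = PySRRegressor(
    binary_operators=["+", "-", "*", "/"],      # the operators quoted above
    unary_operators=["square", "cube", "relu"],
    maxsize=75,                                 # 150 for the final block
    elementwise_loss="loss(x, y) = (x - y)^2",  # mean squared error
)
model.fit(X, y)

# Prioritizing performance over compactness: take the longest expression.
print(model.equations_.iloc[-1].equation)
```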

Since PySR exhibits a significant decrease in speed for large datasets, we reduce the training dataset to 5000 hits to limit the computational cost. In the adopted dataset, only 17% of the hits come from signal tracks. To ensure PySR can correctly learn the signal features, we increase the ratio of signal to background hits to 50% by randomly downsampling the background hits.

We iteratively replace the network blocks with the learned symbolic expressions, starting with the embedding block. After each replacement, the remaining network is retrained to minimize performance loss.
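One replacement step could be sketched as follows, assuming PySR's torch export and the network object from the earlier architecture sketch; embedding_sr_models is a hypothetical list of fitted regressors, one per hidden dimension.

```python
import torch

class SymbolicBlock(torch.nn.Module):
    """Wrap one symbolic expression per output dimension as a torch module."""
    def __init__(self, sr_models):
        super().__init__()
        # PySR exports each fitted expression to torch via model.pytorch()
        self.exprs = torch.nn.ModuleList([m.pytorch() for m in sr_models])

    def forward(self, x):                  # x: (n_hits, in_dim)
        return torch.stack([f(x) for f in self.exprs], dim=-1)

# Swap in the symbolic embedding and keep it fixed during retraining.
gnn.embed = SymbolicBlock(embedding_sr_models)
for p in gnn.embed.parameters():
    p.requires_grad = False
# ... rerun the usual training loop so the remaining MLPs recalibrate ...
```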

5. Postprocessing

The proposed GNN model is not designed to address the track reconstruction task fully; instead, it focuses on providing background hit filtering and track seeding information. To complete the track reconstruction and fitting, we implemented additional postprocessing steps. These steps are kept intentionally simple, as they are intended only to evaluate the quality of the seeding and the positive impact of background rejection on reducing the combinatorial complexity of the tracking problem.

5.1. Hit selection

In the original object condensation proposal, objects are identified based on the charge of the condensation node, with everything within a radius around this node being matched to the object. We modified this approach so that the evaluation does not rely on charge prediction, as this would require another symbolic replacement, introducing an additional source of error.

Due to the additional loss term, pile-up and noise hits are ideally clustered at the origin of the condensation space. Thus, we can eliminate background hits by applying a radius cut $r_\textrm{cut}$ around the origin.

Next, we apply the Mean Shift clustering algorithm to identify signal hits. This clustering step will be critical for distinguishing hits from different tracks in multi-track datasets. For now, since each event contains only one signal track, the clustering primarily serves to remove background hits further. The clustering algorithm uses a bandwidth parameter that defines the size of the cluster. Along with the radius cut, these two parameters can be optimized.

If the radius cut is too large, potential signal hits may be missed. If it is too small, high background contamination may remain, complicating further processing. A similar trade-off applies to the bandwidth parameter. To optimize background rejection ($r_\textrm{b}$) and signal efficiency (εs), we perform a grid search over both parameters for each performance evaluation. Signal hit efficiency refers to the percentage of signal hits selected, while background rejection is the percentage of background hits removed.
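A minimal sketch of this selection with scikit-learn is shown below; treating the largest cluster as the signal candidate is our simplification for the single-track events considered here, and the default parameter values are those found for the GNN in table 1.

```python
import numpy as np
from sklearn.cluster import MeanShift

def select_hits(x_cond, r_cut=0.05, bandwidth=0.7):
    """Radius cut around the origin, then Mean Shift on the surviving hits.

    x_cond: (N, 5) condensation-space coordinates of one event's hits.
    Returns a boolean mask of hits kept as signal candidates.
    """
    keep = np.linalg.norm(x_cond, axis=1) > r_cut  # drop the background blob
    mask = np.zeros(len(x_cond), dtype=bool)
    if not keep.any():
        return mask
    labels = MeanShift(bandwidth=bandwidth).fit_predict(x_cond[keep])
    best = np.bincount(labels).argmax()            # single-track events:
    mask[np.flatnonzero(keep)[labels == best]] = True  # take the largest cluster
    return mask
```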

5.2. Track reconstruction and fit

After the hit selection, we apply a simplified track reconstruction algorithm inspired by the fast-tracking techniques developed by the LHC experiments. To achieve this, we build all possible triplets from the selected hits. For each triplet, we construct a circle in the x − y plane using the circle-fit library to obtain the track-candidate $p_\textrm{T}$. We then perform a linear fit in the r − z plane to extract z0 using the non-linear least-squares curve_fit method from SciPy [37].
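The per-triplet fits could be sketched as follows, assuming the circle-fit package's least_squares_circle interface and the standard relation $p_\textrm{T}\,[\textrm{GeV}] = 0.3\, B\,[\textrm{T}]\, R\,[\textrm{m}]$ for the 2 T field; selected_hits is a hypothetical (N, 3) array of hits surviving the selection.

```python
from itertools import combinations

import numpy as np
import circle_fit
from scipy.optimize import curve_fit

def fit_triplet(hits, B=2.0):
    """Circle fit in x-y for pT, straight-line fit in r-z for z0.

    hits: (3, 3) array of (x, y, z) positions in mm.
    """
    xc, yc, R, _ = circle_fit.least_squares_circle(hits[:, :2])
    pt = 0.3 * B * R * 1e-3              # pT [GeV] = 0.3 B[T] R[m], R in mm
    r = np.hypot(hits[:, 0], hits[:, 1])
    line = lambda rr, z0, slope: z0 + slope * rr
    (z0, _), _ = curve_fit(line, r, hits[:, 2])
    return pt, z0

# All possible triplets from the hits surviving the selection step
triplets = [fit_triplet(np.asarray(t)) for t in combinations(selected_hits, 3)]
```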

With these results, we fill a 2D histogram in the $p_\textrm{T}-z_0$ plane and select triplets within a window around the histogram peak. The window size depends on the peak's $p_\textrm{T}$ value and is chosen large enough to encompass all hits if the triplets were built using true signal hits. Only hits from triplets within the selected window are retained, further improving background rejection. At this stage, we reevaluate the signal efficiency and background rejection.
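A sketch of this peak selection follows; the binning and the window half-widths are illustrative assumptions, since the text specifies only that the window scales with the peak $p_\textrm{T}$.

```python
import numpy as np

# Histogram all triplet (pT, z0) estimates and keep triplets in a window
# around the most populated bin.
pts = np.array([t[0] for t in triplets])
z0s = np.array([t[1] for t in triplets])
H, pt_edges, z0_edges = np.histogram2d(pts, z0s, bins=(50, 50))
i, j = np.unravel_index(H.argmax(), H.shape)
pt_peak = 0.5 * (pt_edges[i] + pt_edges[i + 1])
z0_peak = 0.5 * (z0_edges[j] + z0_edges[j + 1])
w_pt, w_z0 = 0.1 * pt_peak, 2.0          # pT-dependent window, fixed z0 window (mm)
in_window = (np.abs(pts - pt_peak) < w_pt) & (np.abs(z0s - z0_peak) < w_z0)
```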

If, after the above steps, we do not have more than one hit per layer, we obtain the final track parameters by performing separate fits in r − z and x − y as described above. If at least one layer contains more than one hit, we apply simplified ambiguity solving. In this case, we perform the linear r − z fit for each combination and select the one with the best χ2. This process is repeated iteratively for each layer with more than one hit before performing the final circle fit.
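A sketch of the ambiguity solving is given below; for brevity it enumerates all hit combinations at once rather than iterating layer by layer as described above, and it uses an unweighted χ2 on the r − z line fit.

```python
from itertools import product

import numpy as np

def resolve_ambiguities(hits_per_layer):
    """Try one hit per layer in every combination; keep the best r-z fit.

    hits_per_layer: list of (n_i, 3) arrays of candidate (x, y, z) hits.
    """
    best, best_chi2 = None, np.inf
    for combo in product(*hits_per_layer):  # one hit from each layer
        h = np.asarray(combo)
        r, z = np.hypot(h[:, 0], h[:, 1]), h[:, 2]
        coeff = np.polyfit(r, z, 1)         # straight line in r-z
        chi2 = np.sum((z - np.polyval(coeff, r)) ** 2)
        if chi2 < best_chi2:
            best, best_chi2 = h, chi2
    return best
```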

We define hit purity as the percentage of signal hits selected for the final track fit.

After the fit, we examine the difference between the fitted values and true particle features. Due to finite detector resolution, even a fit on perfectly selected hits will not yield exact particle features. As a reference, we perform a truth fit, in which we fit the track using the truth information of the signal hits.

Furthermore, we aim to quantify the resolution of $p_\textrm{T}$. The expected resolution depends on $p_\textrm{T}$, as a high-energy track with little curvature is more sensitive to perturbations in hit positions than a low-energy track. To obtain a rough estimate of the ideal $p_\textrm{T}$ resolution, we divide the data into bins of true track $p_\textrm{T}$ and fit a Gaussian to the truth fit residuals for each bin, extracting σ. This allows us to measure the percentage of fits where the $p_\textrm{T}$ residual falls within the 1σ or 2σ range.
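The binned resolution estimate could be sketched as follows, assuming normalized residuals as in figure 5; the binning is left to the caller.

```python
import numpy as np
from scipy.stats import norm

def binned_resolution(pt_true, pt_fit, bin_edges):
    """Per-bin Gaussian sigma of the normalized truth-fit pT residuals."""
    residuals = (pt_fit - pt_true) / pt_true
    sigmas = []
    for lo, hi in zip(bin_edges[:-1], bin_edges[1:]):
        sel = (pt_true >= lo) & (pt_true < hi)
        _, sigma = norm.fit(residuals[sel])  # maximum-likelihood Gaussian fit
        sigmas.append(sigma)
    return np.array(sigmas)
```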

6. Results

We present the results for the full GNN to demonstrate its potential as a tracking tool. This serves as a baseline for comparison with all the symbolic expression replacement options. We denote the network with the first stage replaced as SR 1, the one with the first and second stages replaced as SR 2, and the fully approximated network as SR 3. For SR 1 and SR 2, the evaluations are conducted on the retrained networks.

First, we examine the pure condensation output. Figure 3 shows the condensation radius for signal and background hits across the different models. Performance degradation is evident in the progressive widening of the peaks, indicating a drop in precision from one model to the next.

Figure 3. Condensation radius for all signal and background hits for the different stages.

As mentioned earlier, the cut-off radius $r_\textrm{cut}$ and the clustering bandwidth are optimized via grid search for each model; the results are summarized in table 1. They demonstrate that achieving maximum signal efficiency in clustering requires loosening $r_\textrm{cut}$. If the signal and background distributions are not sufficiently separated, a tight radius cut may leave background contamination too high, potentially leading to the identification of a large cluster without signal hits. To balance high background rejection and reasonable clustering, an increase in $r_\textrm{cut}$ becomes necessary as more stages are replaced.

Table 1. Clustering parameters and performance metrics. The signal hit efficiencies (εs) and background rejections ($r_\textrm{b}$) are measured after the clustering and triplet filtering steps. Hit purity indicates the average percentage of signal hits used in the final best fit after resolving ambiguities among multiple hit candidates per layer. The track fit performance is assessed by comparing the fitted $p_\textrm{T}$ to the true track $p_\textrm{T}$, measuring how often their difference falls within 1 or 2 times the estimated ideal resolution (σ). 'Truth' refers to the fit results obtained using only signal hits.

                   Truth    GNN      SR 1     SR 2     SR 3
rcut               –        0.05     0.2      0.3      0.4
Bandwidth          –        0.7      0.7      0.6      0.7
εs cluster         –        96.3%    94.8%    92.7%    85.5%
rb cluster         –        93.9%    93.5%    93.0%    71.1%
εs triplets        –        95.0%    93.5%    91.3%    82.4%
rb triplets        –        98.5%    98.4%    98.2%    90.1%
Hit purity         –        96.8%    95.9%    94.9%    83.3%
pT fit in 1σ       39.8%    37.8%    37.5%    36.7%    29.7%
pT fit in 2σ       97.6%    95.0%    94.4%    93.6%    84.5%

Signal efficiencies and background rejections after clustering and triplet filtering, along with the hit purity after all processing steps, are detailed in table 1. The triplet filtering step improves background rejection by about 5% while only reducing signal efficiency by roughly 1%. Comparing the models, we observe minimal performance degradation for SR 1 and SR 2 but a noticeable drop for SR 3. Hit purity declines by only 2% for SR 2 compared to the original GNN but decreases by over 13% for the full SR model. This is likely due to the lack of retraining possibilities after the last replacement stage. Similar performance drops are observed for the earlier network blocks if no retraining is performed. Retraining helps recalibrate the SR outputs, recovering most of the performance losses.

The left panel of figure 4 shows the percentage of events retaining a given number of signal hits after clustering. The slight decrease in the probability of selecting all eight signal hits, compared to the individual signal hit efficiency reported in table 1, highlights a strong correlation in signal misclassification. The network tends to either classify a full track correctly or misclassify the entire track. This trend is also evident in the final hit purity distribution, shown in the right panel of figure 4, where two distinct peaks appear at 0 and 100%. For SR 1 and SR 2, these peaks remain sharp, whereas SR 3 shows a significant broadening.

Figure 4. Percentage of events with $\geqslant N$ signal hits selected after clustering (left). Percentage of signal in the hits selected for the final track fit (right).

The residuals between the fitted $p_\textrm{T}$ and z0 track parameters and the true particle kinematics are shown in figure 5, alongside the truth fit reference. As expected from the previous results, SR 1 and SR 2 show only minor deviations from the original GNN. When signal hit filtering fails completely, the circle fit tends to return a nearly null $p_\textrm{T}$ estimate, producing a small peak around 1 in the normalized residual distribution.

Figure 5. Residuals of the fitted track features: normalized $p_\textrm{T}$ residual (left) and z0 residual (right).

The $p_\textrm{T}$ resolution studies, shown at the bottom of table 1, indicate similar performances across the models. We acknowledge certain limitations in the approach used to obtain these values, including the flexibility in choosing the $p_\textrm{T}$ binning, low statistics in the high-energy range, and the Gaussian fit's inadequacy in modelling the one-sided tails present in high $p_\textrm{T}$ regions. Despite these limitations, we believe the current approach is sufficient for the proof-of-concept demonstrated here.

Future work will focus on refining these methods, incorporating hit position uncertainties into the fit to extract more precise uncertainties on all fitted parameters. This will provide a more robust evaluation of tracking performance.

A potential application of the proposed network lies in its ability to eliminate background hits, significantly reducing the combinatorics involved in the tracking problem. The benefit of this can be measured by counting the number of triplets constructed after hit filtering for different models.

For a rough estimate, an ideal hit selection retaining only the eight signal hits would yield $\binom{8}{3} = 56$ triplets. In the case of the full GNN, SR 1 and SR 2, we observe an average of approximately 100 triplets, while for SR 3 the average jumps to around 1000. Without any filtering, the number of triplets reaches approximately 13 000. This demonstrates that even with the limited background rejection of SR 3 (about 70%), there is still more than a factor of 10 reduction in triplet processing time.

This outcome highlights the trade-off between signal efficiency and background rejection and will be considered in future studies aimed at optimizing the network's performance, potentially prioritizing higher signal efficiency over stringent background rejection.

7. Conclusion

In this work, we introduced a new approach using SR to approximate complex neural networks for fast inference, with the goal of enabling feasible implementation on FPGAs. We began by breaking down a GNN into its three MLP components, deriving analytical functions for each.

Although fast ML packages like hls4ml [38] can implement neural networks on FPGAs, their resources become constrained for networks exceeding 10k parameters. Symbolic functions, by contrast, offer a more resource-efficient solution. Our main objective was to demonstrate that using symbolic approximations for such networks in track-finding tasks within the trigger system does not result in significant performance losses.

Our findings show that replacing the first two MLP blocks with symbolic functions results in minimal performance loss, while replacing the final block leads to more significant performance degradation since retraining is not possible after the last stage, preventing output calibration. The flexibility of our method allows for the decision to either replace all blocks with SR or retain a small network in the final step to minimize performance loss.

Additionally, the purpose of the network can be adapted to different use cases. In this paper, we employed the network for track seeding, combining it with external postprocessing steps, such as clustering and final track-fitting. Alternatively, the network could serve as a pre-filtering step to reduce combinatorics and accelerate conventional track-finding algorithms.

The feasibility of both approaches must be further evaluated on more complex and realistic datasets featuring multiple tracks. While the performance of SR on such datasets remains to be fully assessed, the more dispersed outputs produced by SR compared to the GNN could pose challenges for clustering in multi-track scenarios. However, increasing the network size and adding more dimensions may help address these challenges. It is important to note that while SR can easily handle a modest increase in dimensionality, scaling up significantly, such as by a factor of 100, would be less practical, as a symbolic expression has to be found for each dimension. Future work will also focus on implementing this method on FPGAs and conducting timing studies to evaluate its practical application.

Acknowledgments

N S and E G are supported by the BSF-NSF Grant 2020780 and the ISF Grant 2871/19.

Data availability statement

The data that support the findings of this study are openly available at the following URL/DOI: https://zenodo.org/records/12680558.

Footnotes

  • We adopt a right-handed coordinate system with the origin at the nominal interaction point (IP) in the centre of the detector, with the z-axis along the beam pipe. The x-axis points from the IP to the centre of the accelerator ring, and the y-axis points upwards. Polar coordinates $(r,\phi)$ are used in the transverse plane, where φ is the azimuthal angle around the z-axis. The pseudorapidity is defined in terms of the polar angle θ as $\eta = -\ln \tan(\theta/2)$.
