Accelerator optimisation, operation and modelling with Machine Learning techniques: PACMAN project

Machine Learning applied to the optimization of the Large Hadron Collider:

Instability detection/classification

 

LHC beam loss model

Operational data


 

Simulations

SixTrack is a single particle 6D symplectic tracking code optimized for long term tracking. Here, such simulations are employed to complement the LHC beam loss model created from operational data. The goal is to determine the loss rates on the primary collimators to perform parameter dependency and sensitivity studies. The parameters of interest are the machine knobs, such as the main tunes, chromaticity, octupole currents, etc. A second objective is to characterize the loss distribution in the horizontal and vertical planes to compare it to machine measurements.
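
As an illustration, such a parameter dependency study can be organized as a simple Cartesian product of knob settings, with one tracking job per grid point. The sketch below is schematic only: the knob names, the scanned values and the run_sixtrack() placeholder are illustrative and do not reflect the actual PySixDesk interface.

```python
from itertools import product

# Machine knobs of interest and the values to scan (illustrative numbers only)
knob_ranges = {
    "q_h": [62.26, 62.28, 62.31],        # horizontal tune
    "q_v": [60.29, 60.31, 60.33],        # vertical tune
    "chromaticity": [5.0, 10.0, 15.0],   # Q' in both planes
    "i_oct": [-40.0, 0.0, 40.0],         # octupole current [A]
}

def run_sixtrack(settings):
    """Placeholder for submitting one SixTrack/PySixDesk job and reading back
    the loss rate on the primary collimators for this knob configuration."""
    return 0.0  # dummy value so the sketch runs standalone

# One tracking job per point of the knob grid
for values in product(*knob_ranges.values()):
    settings = dict(zip(knob_ranges.keys(), values))
    loss_rate = run_sixtrack(settings)
    print(settings, loss_rate)
```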


Aperture model
For our purposes, a simple aperture model that includes the horizontal, vertical, and skew primary collimators near LHC IR7 is accurate enough to start with. The collimators are modelled as black absorbers, i.e. particle scattering off the collimator jaws is ignored. An illustration of the aperture model is shown in the figures below, both in 2D (left) and 3D (right), along with some particle losses.

Simple LHC aperture model with three primary collimators near IR7. Left: 2D / Right: 3D aperture visualizer
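
A black-absorber check of this kind reduces to a few lines: a particle is counted as lost on the first collimator whose half-gap it exceeds in the corresponding collimation plane. The collimator names below refer to the IR7 primaries, but the half-gap values and the purely geometric treatment are illustrative assumptions.

```python
import numpy as np

# Illustrative black-absorber model: a particle is absorbed as soon as its
# normalized transverse position exceeds the collimator half-gap in the
# collimation plane.  Half-gaps (in units of sigma) are made up here.
collimators = {
    "TCP.C6L7 (horizontal)": {"angle_deg": 0.0,   "halfgap_sigma": 5.7},
    "TCP.D6L7 (vertical)":   {"angle_deg": 90.0,  "halfgap_sigma": 5.7},
    "TCP.B6L7 (skew)":       {"angle_deg": 127.5, "halfgap_sigma": 5.7},
}

def first_absorber(x_sigma, y_sigma):
    """Return the collimator that absorbs a particle at normalized position
    (x, y), or None if it stays inside the aperture."""
    for name, coll in collimators.items():
        phi = np.deg2rad(coll["angle_deg"])
        # Project the particle position onto the collimation plane
        u = x_sigma * np.cos(phi) + y_sigma * np.sin(phi)
        if abs(u) >= coll["halfgap_sigma"]:
            return name
    return None

print(first_absorber(6.1, 0.0))   # lost on the horizontal primary
print(first_absorber(2.0, 1.0))   # survives
```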

 

Example simulation
The animation below shows an example of an LHC simulation over 200'000 turns in SixTrack using PySixDesk. The action space (Jx, Jy) is initially populated with a rectangular grid extending up to 8σ. Crosses mark the particles that have been lost; the color code indicates on which of the three collimators they were absorbed. The underlying colormesh plot shows the weights used to quantify the losses assuming a Gaussian distribution. The fractions of particles lost on each collimator for a Gaussian beam are shown on the right-hand side. The right plot illustrates where exactly the individual particles are lost.

LHC loss simulation with aperture model
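
The weighting itself is straightforward: each grid particle receives the value of the Gaussian beam density at its action coordinates, and the loss fraction on a collimator is the normalized sum of the weights of all particles absorbed there. The sketch below uses made-up grid limits and a random stand-in for the tracking result.

```python
import numpy as np

# Estimate collimator loss fractions for a Gaussian beam from particles
# launched on a uniform grid in action space (Jx, Jy).  Actions are expressed
# in multiples of the emittance, so the beam density is simply exp(-Jx - Jy).
n = 50
jx = np.linspace(0.0, 8.0**2 / 2.0, n)   # up to an amplitude of 8 sigma
jy = np.linspace(0.0, 8.0**2 / 2.0, n)
JX, JY = np.meshgrid(jx, jy)

weights = np.exp(-JX - JY)               # Gaussian weight of each grid particle
weights /= weights.sum()                 # normalize so all weights sum to 1

# lost_on would come from tracking: for each grid particle, the index of the
# collimator it was absorbed on, or -1 if it survived.  Random stand-in here.
rng = np.random.default_rng(0)
lost_on = rng.integers(-1, 3, size=JX.shape)

for coll in range(3):
    fraction = weights[lost_on == coll].sum()
    print(f"fraction of a Gaussian beam lost on collimator {coll}: {fraction:.3e}")
```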

 

 

Reinforcement learning

Particle accelerators are complex machines composed of a large number of interacting subsystems with a great many parameters to adjust. Typically the machine performance, i.e. essentially the beam brightness, depends non-linearly on the input parameters, making it difficult for humans to fully grasp the complexity of this dynamical system. Numerical optimizers can be employed for fast, automatic tuning, as demonstrated here, for example. The downside of such optimizers is, however, that they have no memory: the search for the best parameters has to restart from scratch every time the system is reset. As an alternative, we consider here the Reinforcement Learning (RL) approach and apply it to the alignment of the beam extraction septum in the CERN Super Proton Synchrotron (SPS).
 

Reinforcement Learning
In the RL framework an agent learns through trial and error by directly interacting with an environment. At each time step the agent performs an action based on the current state of the environment which brings the environment into a new state for which the agent receives a scalar reward. Given enough interactions, the agent gradually develops a strategy for optimal behavior, which is typically encoded in a deep neural network.
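
This interaction loop can be written down in a few lines. The toy environment and the hand-coded policy below are stand-ins used only to make the state-action-reward cycle explicit; in practice the policy is the output of the agent's neural network and is updated from the collected experience.

```python
import numpy as np

class ToyEnvironment:
    """Stand-in environment: the 'state' is a 1-D position and the goal is to
    drive it to zero; the reward is the negative distance to the goal."""
    def reset(self):
        self.state = np.random.uniform(-1.0, 1.0)
        return self.state

    def step(self, action):
        self.state += action                   # the action changes the state
        reward = -abs(self.state)              # scalar reward
        done = abs(self.state) < 0.01          # episode ends near the goal
        return self.state, reward, done

env = ToyEnvironment()
state = env.reset()
for step in range(100):
    action = -0.5 * state                      # hand-coded 'policy' for illustration
    next_state, reward, done = env.step(action)
    # A learning agent would store (state, action, reward, next_state) here
    # and gradually improve its policy from this experience.
    state = next_state
    if done:
        state = env.reset()
```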


SPS extraction septum
The North Area experiments at CERN require a constant proton flux from the SPS, which is achieved through resonant slow extraction. To perform this operation with minimum particle loss at the electrostatic septum (ZS), the anode wires of the five ZS tanks need to be properly aligned. The dependence of the total loss on the wire positions is non-linear due to multi-turn effects. The manual adjustment requires about 8 hours of beam time. Tests with a Powell optimizer have shown that this time can be reduced to about 40 minutes while achieving the same improvement in total beam loss. To reduce it further, the RL approach is explored in the following.
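
For reference, a Powell search over the anode positions can be set up with a standard derivative-free optimizer. The beam-loss function below is a synthetic stand-in for the measured BLM signal, since the real response is only available on the machine.

```python
import numpy as np
from scipy.optimize import minimize

# Synthetic stand-in for the total BLM loss as a function of the five ZS
# anode positions (arbitrary units); the true dependence is measured on the SPS.
optimum = np.array([0.3, -0.1, 0.2, 0.0, -0.2])

def total_blm_loss(anode_positions):
    offsets = np.asarray(anode_positions) - optimum
    return 1.0 + np.sum(offsets**2) + 0.3 * np.sum(np.abs(offsets))

# Derivative-free Powell search, starting from all anodes at zero
result = minimize(total_blm_loss, x0=np.zeros(5), method="Powell")
print("best anode positions:", np.round(result.x, 3))
print("residual loss:", result.fun)
```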


Agent training
To reduce the time required for training the RL agent on the accelerator environment itself, a surrogate model of the beam loss at the ZS was created first. The RL agent can then be pretrained offline for a warm start in the real environment, addressing the issue of sample efficiency.

1. Surrogate model
A surrogate model of the total beam loss on the anode wires of the ZS was created using a two-layer dense neural network with about 100 nodes in total and ReLU activations. The network was trained on experimental data from past manual and automatic ZS alignments. The input layer is given by the anode positions of the different ZS tanks, and the output is defined by the sum of the five beam loss monitors (BLMs) in close proximity to the ZS. Some results are displayed in the figures below.


Trained model performance on an independent test set.
 


Loss response functions of the trained model for different configurations of the neural network.
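
A surrogate network of this size can be expressed in a few lines of Keras. The exact layer sizes, optimizer settings and the synthetic training data below are assumptions; in the actual study the inputs and targets are taken from the recorded ZS alignment data.

```python
import numpy as np
from tensorflow import keras

# Small dense surrogate network: inputs are the 5 ZS anode positions,
# the output is the summed BLM signal.  Layer sizes are assumptions.
model = keras.Sequential([
    keras.Input(shape=(5,)),
    keras.layers.Dense(64, activation="relu"),
    keras.layers.Dense(32, activation="relu"),
    keras.layers.Dense(1),                    # predicted total BLM loss
])
model.compile(optimizer="adam", loss="mse")

# anode_positions / blm_sum would come from past ZS alignment campaigns;
# random stand-ins are used here so the snippet runs on its own.
anode_positions = np.random.uniform(-1.0, 1.0, size=(1000, 5))
blm_sum = np.sum(anode_positions**2, axis=1, keepdims=True)

model.fit(anode_positions, blm_sum, epochs=10, batch_size=32, verbose=0)
print(model.predict(np.zeros((1, 5))))
```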
 

2. RL training
An agent is now trained on the surrogate model described above using the Twin-Delayed Deep Deterministic Policy Gradient (TD3) algorithm (see S. Fujimoto et al., “Addressing Function Approximation Error in Actor-Critic Methods”). The figure below shows the evolution of the reward during agent training. Once the agent is fully trained (after about 20'000 interactions), it finds the optimum anode positions of the ZS in a single step for 99.95% of the possible initial configurations.


Evolution of reward during agent training
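
Schematically, the offline pretraining on the surrogate could be set up as follows with a standard TD3 implementation (here stable-baselines3). The ZSSurrogateEnv wrapper, its reward definition and its termination condition are hypothetical; only the choice of the TD3 algorithm and the order of 20'000 interactions come from the study described above.

```python
import numpy as np
import gymnasium as gym
from stable_baselines3 import TD3

class ZSSurrogateEnv(gym.Env):
    """Hypothetical Gym wrapper around the beam-loss surrogate: the state is
    the 5 anode positions, the action moves them, and the reward is the
    negative loss predicted by the surrogate model."""
    def __init__(self):
        self.observation_space = gym.spaces.Box(-1.0, 1.0, shape=(5,), dtype=np.float32)
        self.action_space = gym.spaces.Box(-0.2, 0.2, shape=(5,), dtype=np.float32)

    def _surrogate_loss(self, positions):
        # Stand-in for the trained neural-network surrogate
        return float(np.sum(positions**2))

    def reset(self, seed=None, options=None):
        super().reset(seed=seed)
        self.positions = self.np_random.uniform(-1.0, 1.0, size=5).astype(np.float32)
        return self.positions, {}

    def step(self, action):
        self.positions = np.clip(self.positions + action, -1.0, 1.0).astype(np.float32)
        reward = -self._surrogate_loss(self.positions)
        terminated = reward > -0.01            # close enough to the optimum
        return self.positions, reward, terminated, False, {}

# Pretrain a TD3 agent purely on the surrogate environment (offline warm start)
env = ZSSurrogateEnv()
agent = TD3("MlpPolicy", env, verbose=0)
agent.learn(total_timesteps=20_000)
agent.save("td3_zs_pretrained")
```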

The use of a surrogate model to pretrain an RL agent can potentially help overcome sample efficiency limitations, and save time and money on the actual system.


More details: https://indico.cern.ch/event/837438/