How much information is in a jet?

Journal of High Energy Physics, Jun 2017

Machine learning techniques are increasingly being applied to data analyses at the Large Hadron Collider, especially for discriminating jets with different originating particles. Previous applications of machine learning to jet physics have typically employed image recognition, natural language processing, or other algorithms that have been extensively developed in computer science. While these studies have demonstrated impressive discrimination power, often exceeding that of widely-used observables, they have been formulated in a non-constructive manner, and it is not clear what additional information the machines are learning. In this paper, we study machine learning for jet physics constructively, expressing all of the information in a jet in terms of sets of observables that completely and minimally span N-body phase space. For concreteness, we study the application of machine learning to discriminating boosted, hadronic decays of Z bosons from jets initiated by QCD processes. Our results demonstrate that the information in a jet that is useful for discriminating QCD jets from Z bosons is saturated by only considering observables that are sensitive to 4-body (8-dimensional) phase space.
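The parenthetical "4-body (8-dimensional)" follows from a standard counting argument. The sketch below assumes massless final-state particles and a jet whose total four-momentum is held fixed; both are stated assumptions for this illustration.

```latex
\begin{align}
  % each massless, on-shell momentum p_i has 3 free components
  \dim\{p_1,\dots,p_M\} &= 3M \\
  % fixing the jet's total four-momentum removes 4 degrees of freedom
  \dim \Phi_M &= 3M - 4 \\
  % the 4-body case quoted above
  \dim \Phi_4 &= 3\cdot 4 - 4 = 8
\end{align}
```

For comparison, the same counting gives 2 dimensions for a 2-body jet and 5 for a 3-body jet.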


https://link.springer.com/content/pdf/10.1007%2FJHEP06%282017%29073.pdf


Received: May
Physics Department, Reed College, Portland, OR, U.S.A.
Open Access, © The Authors.
Keywords: Jets, QCD Phenomenology

1 Introduction

The problem of discriminating and identifying high-energy jet-like objects observed at the Large Hadron Collider (LHC) is fundamental both for Standard Model physics and for searches for new physics, especially as the lower bounds on new-physics mass scales increase.
Heavy particles of the Standard Model, like the W, Z, and H bosons or the top quark, can be produced with large Lorentz boosts and decay predominantly to hadrons. Their decay products will therefore be collimated in the detector, resembling jets initiated by light QCD partons. The past several years have seen a huge number of observables and techniques devoted to jet identification [1–4], and many have become standard tools in the ATLAS and CMS experiments. The list of observables for jet discrimination is a bit dizzying, and in many cases there is no organizing principle for which observables work well in which situations.¹ Motivated by the large number of variables that define the structure of a jet, several groups have recently applied machine learning methods to the problem of jet identification [9–21]. Rather than developing clever observables that identify specific physical aspects of the jets, the machine learning approach has a computer construct an approximation to the optimal classifier that discriminates signal from background. For example, ref. [11] interpreted the jet detected by the calorimeter as an image, with the pixels corresponding to the calorimeter cells and the "color" of each pixel corresponding to the transverse momentum deposited in that cell. These techniques have outperformed standard jet discrimination observables and show that there is additional information in jets to exploit. However, this comes at a significant cost. Machine learning methods applied to jet physics typically have hundreds of input variables with thousands of correlations between them. Thus, in one sense this problem seems ideally suited for machine learning, but it also lacks the immediate physical interpretation and intuition that individual observables provide.

¹ There has been some effort in the past to identify and quantify (over)complete bases of jet observables (...truncated)
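To make the jet-image idea concrete, here is a minimal sketch of pixelating jet constituents: each particle's transverse momentum is summed into a grid cell in (Δη, Δφ) measured from the jet axis. The function name `jet_image`, the 25×25 grid, and the ±1.0 window are illustrative assumptions, not the settings used in ref. [11], and an idealized grid stands in for real calorimeter cells.

```python
import numpy as np


def jet_image(pt, eta, phi, jet_eta, jet_phi, n_pixels=25, half_width=1.0):
    """Sum constituent pT into an n_pixels x n_pixels grid centred on the jet axis.

    Hypothetical helper: grid size and window are illustrative defaults.
    Rows index delta-eta, columns index delta-phi; particles outside the
    +/- half_width window are not counted by np.histogram2d.
    """
    pt = np.asarray(pt, dtype=float)
    d_eta = np.asarray(eta, dtype=float) - jet_eta
    # Wrap the azimuthal difference into [-pi, pi) so jets near phi = +/-pi work.
    d_phi = (np.asarray(phi, dtype=float) - jet_phi + np.pi) % (2 * np.pi) - np.pi
    edges = np.linspace(-half_width, half_width, n_pixels + 1)
    # The pixel "intensity" is the total transverse momentum landing in that cell.
    image, _, _ = np.histogram2d(d_eta, d_phi, bins=[edges, edges], weights=pt)
    return image


# Toy jet: two constituents separated by 0.3 in eta.
img = jet_image(pt=[10.0, 5.0], eta=[0.0, 0.3], phi=[0.0, 0.0], jet_eta=0.0, jet_phi=0.0)
print(img.shape, img.sum())
```

The resulting array is what an image-recognition network would take as input; the hundreds of pixels are exactly the many correlated inputs the text describes.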
Previous studies have shown that the computer is learning information about what discriminates jets of different origins, but it has not been clearly demonstrated what information standard observables are missing. Along these same lines, the improvement in discrimination performance of machine learning over standard observables is relatively small, suggesting that standard observables capture the vast majority of the useful information.

In this paper, we approach machine learning for jet discrimination from a different perspective. We construct an observable basis that completely and minimally spans the phase space for the substructure of a jet.² For a jet with M particles, the phase space is $(3M-4)$-dimensional, and so we identify $3M-4$ infrared and collinear (IRC) safe jet substructure observables that span the phase space.³ These basis observables are then passed to a machine learning algorithm to identify the relevant discrimination information.⁴ A general jet will have an arbitrary number of particles in it, so we observe how the discrimination power depends on the dimension of phase space that we assume. That is, we assume that the jet has 2 particles, 3 particles, 4 particles, etc., as defined (...truncated)
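This excerpt is truncated before the basis is specified. As we understand the full paper, it is built from N-subjettiness observables τ_N^(β) with angular exponents β ∈ {0.5, 1, 2}: three for each N = 1, …, M−2, plus τ_{M−1}^(1) and τ_{M−1}^(2), for 3(M−2) + 2 = 3M−4 in total. The sketch below uses hypothetical helpers `nsubjettiness_basis` and `tau_n` to enumerate those labels and to evaluate τ_N^(β) for caller-supplied axes. The axis choice and the normalization by the scalar sum of constituent p_T are assumptions, not details taken from this excerpt.

```python
import numpy as np


def nsubjettiness_basis(m):
    """(N, beta) labels of an assumed 3M-4 element N-subjettiness basis, M >= 2."""
    if m < 2:
        raise ValueError("a jet needs at least 2 particles")
    # Three angular exponents for each N = 1, ..., M-2 ...
    labels = [(n, beta) for n in range(1, m - 1) for beta in (0.5, 1.0, 2.0)]
    # ... plus two for N = M-1, giving 3(M-2) + 2 = 3M - 4 labels in total.
    labels += [(m - 1, 1.0), (m - 1, 2.0)]
    return labels


def tau_n(pt, eta, phi, axes, beta):
    """tau_N^(beta): pT-weighted sum of min_k dR_ik^beta over N given axes, over total pT."""
    pt = np.asarray(pt, dtype=float)
    eta = np.asarray(eta, dtype=float)[:, None]        # shape (M, 1)
    phi = np.asarray(phi, dtype=float)[:, None]        # shape (M, 1)
    ax = np.atleast_2d(np.asarray(axes, dtype=float))  # shape (N, 2): (eta, phi) per axis
    d_eta = eta - ax[:, 0][None, :]                    # shape (M, N)
    # Wrap azimuthal differences into [-pi, pi).
    d_phi = (phi - ax[:, 1][None, :] + np.pi) % (2 * np.pi) - np.pi
    dr = np.sqrt(d_eta**2 + d_phi**2)
    # Each particle contributes its (pT-weighted) distance to the nearest axis.
    return float(np.sum(pt * np.min(dr**beta, axis=1)) / np.sum(pt))


for m in (2, 3, 4):
    print(m, len(nsubjettiness_basis(m)), 3 * m - 4)
```

For M = 4 the helper returns 8 labels, matching the 8-dimensional 4-body phase space quoted in the abstract. Using several β values per N is what lets a small number of axes resolve all 3M−4 directions of phase space.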



Kaustuv Datta and Andrew Larkoski, "How much information is in a jet?", Journal of High Energy Physics, Volume 2017, Issue 6, pp. 1–25, 2017. DOI: 10.1007/JHEP06(2017)073