Thinking outside the ROCs: Designing Decorrelated Taggers (DDT) for jet substructure

Journal of High Energy Physics, May 2016

We explore the scale-dependence and correlations of jet substructure observables to improve upon existing techniques in the identification of highly Lorentz-boosted objects. Modified observables are designed to remove correlations from existing theoretically well-understood observables, providing practical advantages for experimental measurements and searches for new phenomena. We study such observables in W jet tagging and provide recommendations for observables based on considerations beyond signal and background efficiencies.

https://link.springer.com/content/pdf/10.1007%2FJHEP05%282016%29156.pdf

James Dolen (3), Philip Harris (1), Simone Marzani (3), Salvatore Rappoccio (3), Nhan Tran (2)

(1) CERN, European Organization for Nuclear Research, Geneva, Switzerland
(2) Fermi National Accelerator Laboratory (FNAL), Batavia, IL 60510, U.S.A.
(3) Department of Physics, University at Buffalo, The State University of New York

Keywords: Jets; QCD Phenomenology

1 Introduction

[...] groomers and taggers on both background [16, 17] and signal jets [18, 19] have been performed. More recently, calculations have been extended to the interesting case in which a jet shape is measured in conjunction with a cut on the jet mass in [20–23] and [24]. Despite this enormous amount of progress, experimental collaborations have yet to fully exploit these advantages to reduce systematic uncertainties in analyses using substructure techniques.

Much study has been focused on the relationship among numerous identification observables in order to construct optimal heavy object taggers. Dedicated phenomenological studies [4] and detailed analyses by CMS [25–28] and ATLAS [29–32] employing multivariate techniques were performed in order to understand how best to identify boosted W/Z bosons, top quarks and Higgs bosons, optimizing the statistical discrimination power in terms of background rejection and signal efficiency.
Moreover, there has been recent interest in using computer vision techniques to combine individual calorimeter cells into non-linear optimal observables [33–35]. However, a quantitative study of the reduction of systematic uncertainties by taking advantage of theoretical improvements has not yet been performed.

In the following study, we aim to build a tagger based not only on statistical discrimination power, but also on the robust behavior of the inherent QCD background. This tagger will be designed such that, after applying a flat cut on the tagging variable, the shape of the QCD background jet mass distribution remains stable and flat. We demonstrate our methodology, entitled "designed decorrelated taggers (DDT)", by performing an example analysis in which hadronically decaying W boson jets are distinguished from quark- and gluon-initiated jets. The DDT approach is applicable to the identification of any heavy boosted objects, such as Z, H, and top jets.

Samples. The Monte Carlo samples used in this study were originally used for studies in the BOOST13 report [4]. Samples were generated at √s = 8 TeV for QCD dijets, and for W+W− pairs produced in the decay of a scalar resonance. The QCD events were split into subsamples of gg and qq̄ events, allowing for tests of discrimination of hadronic W bosons, quarks, and gluons. QCD samples were produced at leading order (LO) using MADGRAPH5 [36], while WW samples were generated using the JHU GENERATOR [37]. The samples were then showered through PYTHIA8 (version 8.176) [38] using the default tune 4C [39]. The samples were produced in exclusive pT bins of width 100 GeV at the parton level. The pT bins investigated in this report were 300–400 GeV, 500–600 GeV and 1.0–1.1 TeV. The stable particles in the generator-level events are clustered into jets with the anti-kT jet algorithm [40] with three different distance parameters, R = 0.4, 0.8, 1.2, using fastjet 3.1 [41, 42].
No multiple parton interactions (or pileup) are used in these samples, although previous LHC measurements [43, 44] have shown that grooming algorithms are more resilient to pileup effects than standard jet algorithms. Furthermore, it was shown in those measurements that the Monte Carlo simulation can accurately reproduce the data for regions of high jet mass, whereas there are disagreements below the Sudakov peak. The grooming algorithms, however, mitigate this disagreement very strongly as well. As such, we study jets with a grooming algorithm applied. The algorithms we have investigated are the "modified" mass-drop tagger (mMDT) [5, 16] with zcut = 0.1, jet trimming [10] with Rsub = 0.3 and fcut = 0.1, jet pruning [8, 9], and soft drop [12] with zcut = 0.1 for both β = 1 and β = 2 (note that the case of β = 0 is equivalent to the mMDT). We have found that the conclusions are not strongly dependent on the groomer used, so we have used soft drop with β = 0 (mMDT) for most of our comparisons because its scaling behavior is smoother than that of the other groomers [16].

2 Current taggers

Current heavy object jet substructure taggers employed by CMS and ATLAS often cut on some number of observables, either directly or through some algorithm. Take, for example, something similar to the CMS Run 1 W tagger, which uses simple cuts on the N-subjettiness ratio τ2/τ1 [11] and the soft drop jet mass [12]. In this study, we consider the τ2/τ1 variable where the subjet axes are chosen using the kT one-pass axes optimization technique.
Figure 1. Soft drop mass distributions (β = 0) for gluon jets after various cuts on τ2/τ1 (β = 1) for different jet pT bins: pT = 300–400 GeV (top left), pT = 500–600 GeV (top right), pT = 1–1.1 TeV (bottom left), and also for the signal (bottom right); distributions for signal are stable versus pT. The cuts in τ2/τ1 vary from 1.0 to 0.0 in steps of 0.02; the changing line styles for successive cuts are meant to visually aid the reader. [Plots not reproduced; x-axis: soft drop mass [GeV], successive τ2/τ1 cuts.]

In order to distinguish hadronically decaying W bosons (which give rise to jets that are intrinsically two-pronged) from QCD background, a flat cut on τ2/τ1 is typically performed. As expected, this procedure greatly reduces the background, but it also leads to an unwanted sculpting of the soft drop jet mass distribution (an undesirable feature also discussed in ref. [45]), as shown in figure 1. After cutting on τ2/τ1 to select jets which are two-pronged, the QCD background soft drop jet mass distribution becomes more peak-like in shape, making it harder to distinguish QCD jets from W jets, which also have a peak in the jet mass distribution. The shape of the sculpted jet mass distribution, and the location of this artificial peak, varies for different jet pT regions. This pT-dependent sculpting of the jet mass distributions makes sideband methods of background estimation more difficult. In this case and in further examples, we primarily consider gluon-initiated jets, though performance with quark-initiated jets is similar. Differences will be explored in greater detail in future studies.

In ref. [16] it was argued that flat QCD mass distributions could be obtained by tuning the value of the soft drop energy fraction threshold (zcut), and optimal values for quark- and gluon-initiated jets were analytically derived.
However, the presence of the τ2/τ1 cut makes this situation more complex and requires reconsidering the issue.

Therefore, we propose additional criteria for determining a better tagging observable, beyond pure statistical discrimination power. For similarly discriminant observables, we would like to find an observable which (1) is primarily uncorrelated with the groomed jet mass observable (or rather has complementary correlations as far as discrimination is concerned) and (2) maintains a desirable groomed mass behavior while scaling pT. Observables satisfying these criteria would, after applying a rectangular cut, still produce a flat groomed jet mass distribution.

3 Shape observable scaling in QCD

We start our study of the correlations of substructure variables with the jet mass and pT by introducing the appropriate scaling variable for QCD jets:

\[ \rho = \log(m^2/p_T^2) \tag{3.1} \]

Here we have differed from the typical definition of the jet ρ by removing the jet distance parameter R² from the denominator of the definition. For now we keep R = 0.8 fixed and leave this for future study. Note that when we apply soft drop, we take the mass in eq. (3.1) to be computed on the constituents of the soft-drop jet, while the transverse momentum is that of the original (ungroomed) jet.

We now compute, on both our background and signal samples, the average value of the N-subjettiness ratio τ2/τ1 (computed on the full jet) as a function of the soft-drop ρ. This is shown in figure 2, on the left. The signal W jets are shown in open circles while the background, here gluon jets, are shown in closed circles. The various colors are different bins in jet pT. We note the typical behavior showing τ2/τ1 for the signal tending to lower values than the background and at a given value in ρ due to the mass scale of the signal jet in a given pT bin. The signal tends to be fixed around the W mass and thus shifts for different values of pT and is otherwise most concentrated in the dip region.
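As a minimal sketch of how such a profile could be built, the snippet below evaluates the scaling variable of eq. (3.1) and averages τ2/τ1 in bins of ρ. The function names, the binning range and the toy jets are illustrative assumptions, not part of the original analysis; the inputs are assumed to be the groomed mass and ungroomed pT in GeV.

```python
import math
from collections import defaultdict


def rho(m_groomed, pt_ungroomed):
    """Scaling variable of eq. (3.1): rho = log(m^2 / pT^2), inputs in GeV."""
    return math.log(m_groomed ** 2 / pt_ungroomed ** 2)


def tau21_profile(jets, lo=-8.0, hi=0.0, nbins=16):
    """Average tau2/tau1 in equal-width bins of rho.

    `jets` is an iterable of (groomed mass, ungroomed pT, tau21) tuples.
    Returns {bin centre: mean tau21} for the non-empty bins.
    The binning range is an arbitrary choice for illustration.
    """
    width = (hi - lo) / nbins
    sums, counts = defaultdict(float), defaultdict(int)
    for m, pt, tau21 in jets:
        if m <= 0.0:  # groomed mass can vanish; the log is undefined there
            continue
        r = rho(m, pt)
        if lo <= r < hi:
            i = int((r - lo) / width)
            sums[i] += tau21
            counts[i] += 1
    return {lo + (i + 0.5) * width: sums[i] / counts[i] for i in sorted(counts)}


# Toy input (made-up numbers): two jets with m = 80 GeV and pT = 400 GeV.
toy = [(80.0, 400.0, 0.3), (80.0, 400.0, 0.5)]
print(tau21_profile(toy))
```

Running the profile separately per pT bin and per sample (signal, background) would give curves comparable in spirit to those described for figure 2.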
Now, let us focus on the background curves (solid points). We notice a strong dependence of τ2/τ1 on ρ, which is what causes the sculpting of the mass distributions shown in the previous section. However, we note that there exists a region in ρ for which this relationship is conspicuously linear. This is an interesting behavior, which we will exploit shortly in section 4. We also observe that, even in this linear region, there is still a residual pT dependence, which looks like, to a very good approximation, a constant shift.

The behavior observed in figure 2 for soft drop is also observed for other groomers, such as trimming and pruning, within the pT ranges considered. At lower values of ρ, differences in the groomers become more apparent, most likely because in that region trimming and pruning acquire further sensitivity to soft physics [16]. Thus, in the current study, we concentrate on the soft-drop mass due to its stable behavior.

Figure 2. Profile of τ2/τ1 as a function of ρ = log(m²/pT²) (left) and as a function of ρ' = log(m²/(pT μ)) (right). Solid dots correspond to background, while hollow ones to signal. The different colors correspond to different pT bins.

This approximate linear relation between τ2/τ1 and ρ can be (qualitatively) understood by noting that, in the case β = 2, τ2 essentially measures the subjet mass, while τ1 corresponds to the jet mass itself. This leads to an approximately linear relation between τ2/τ1 and ρ in the region of the (soft-collinear) phase-space where all-order effects can be neglected (see footnote 1). Furthermore, ref. [24] performed calculations for jet mass distributions in the presence of a τ2/τ1 cut to an accuracy which is close to next-to-leading logarithmic (NLL) accuracy. Although the calculation corresponding to the profile plot in figure 2 was not performed, it could in principle be derived because the authors do provide the double differential distribution in τ2/τ1 and ρ. However, some important differences between our current set-up and the one of ref.
[24] prevent us from using their results to get more quantitative insight into the behaviors we observe beyond the existence of a region with linear correlation. First, ref. [24] did not consider soft drop and, second, the definition of N-subjettiness differs in the two studies both in regard to the angular exponent (β = 1 versus β = 2) and the choice of axes.

We note that, at fixed coupling, all the transverse momentum dependence is accounted for in the definition of the shape and ρ. We have checked whether the origin of the pT dependence that we see in figure 2 (on the left) could be traced back to the transverse momentum used in the definition of ρ (ungroomed vs groomed), but this was found not to be the case. Running coupling contributions, as well as other subleading corrections, do introduce a pT dependence and they are likely to be responsible for the observed pT dependence. However, a quantitative understanding of these effects would require a calculation using the techniques of ref. [24]. This goes beyond the scope of this work and for this study we limit ourselves to a phenomenological solution, while leaving a first-principle analysis for future work. Thus, in order to remove the constant pT dependence in the τ2/τ1 profile, we introduce a modified version of ρ:

\[ \rho' = \rho + \log\frac{p_T}{\mu} = \log\frac{m^2}{p_T\,\mu} \tag{3.2} \]

This change of variable, together with the choice μ = 1 GeV, appears to do an excellent job of getting rid of the pT dependence, as shown in figure 2, on the right, though of course we note this is purely an empirical observation.

Footnote 1: We thank Andrew Larkoski for raising this point.

Figure 3. [...] Solid dots correspond to background, while hollow ones to signal. The different colors correspond to different pT bins.

So far, we have only considered τ2/τ1 versus soft drop mass. We also noted that a similar linear correlation exists between τ2/τ1 and other groomed masses, though not shown explicitly.
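The change of variable in eq. (3.2) can be checked numerically with a short sketch. The helper names and the toy jets below are assumptions made for illustration only; μ = 1 GeV is the value quoted in the text.

```python
import math

MU_GEV = 1.0  # the empirical choice mu = 1 GeV quoted in the text


def rho(m, pt):
    """Eq. (3.1): rho = log(m^2 / pT^2)."""
    return math.log(m ** 2 / pt ** 2)


def rho_prime(m, pt, mu=MU_GEV):
    """Eq. (3.2): rho' = rho + log(pT / mu) = log(m^2 / (pT * mu))."""
    return math.log(m ** 2 / (pt * mu))


# Hypothetical jets with the same m/pT (hence the same rho) in three pT bins:
# rho' differs between them by exactly log(pT / mu).
for pt in (350.0, 550.0, 1050.0):
    m = 0.2 * pt
    print(pt, round(rho(m, pt), 3), round(rho_prime(m, pt), 3))
```

The loop makes the intent of the transformation visible: jets at fixed ρ are shifted by log(pT/μ), which is the handle used to absorb the approximately constant pT offset seen in the background profile.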
We can also consider other shape variables, though we leave an exhaustive exploration of all shape variables to a later study. As an example, we also show the energy correlation functions C2 (β = 1) and D2 (β = 1) as a function of ρ in figure 3. On the left, C2 (β = 1) shows a relatively flat distribution versus ρ, which is desirable, although the behavior is not quite linear. On the right, D2 (β = 1) is highly correlated with ρ. In both cases, the correlations have some pT dependence that is not trivially empirically determined.

4 Designing decorrelated taggers (DDT)

4.1 Transforming τ2/τ1 → τ21'

By performing the transformation ρ → ρ', we have successfully accounted for most of the pT dependence of the profile distribution. Next we would like to perform a further transformation with the aim of flattening the profile dependence on ρ', with the idea that this will in turn reduce the mass sculpting discussed earlier. In order to determine the transformation we are after, we concentrate on the region in which the relationship between τ2/τ1 and ρ' is essentially linear. Thus, we introduce

\[ \tau_{21}' = \tau_2/\tau_1 - M \times \rho' \tag{4.1} \]

where the slope M is numerically fitted from figure 2 (red fit lines). The comparison between the τ2/τ1 and τ21' distributions is shown in figure 4, for different jet pT bins. The transformed variable, τ21', looks similar to the original variable τ2/τ1, although the correlation with the groomed mass is now practically removed. We note that a pT dependence of the signal shape is introduced which is, in hindsight, expected given that the transformation takes advantage of scaling properties of the background. This can cause a pT dependence in the signal efficiency with a cut on τ21' that is not present in the original τ2/τ1; however, we note this is not necessarily an undesirable feature.
For example, as backgrounds decrease at higher pT it may be desirable to allow a larger signal efficiency, and this should be studied in more detail in the experiments within the context of particular analyses. This can be seen in figure 5, which shows the profile of τ21' as a function of ρ' with the intended decorrelated behavior.

[Figures not reproduced; legend entries: pT = 300–400 GeV, pT = 500–600 GeV, pT = 1000–1100 GeV.]

Now, we can explore the sculpting of the mass distributions when making a flat cut in τ21'. This is shown in figure 6, which should be contrasted with figure 1, obtained with a flat cut in τ2/τ1. Notice that now the sculpting of the mass distribution is considerably reduced, particularly in the region of interest where the W boson peak is. With a simple transformation, we can now preserve mass sidebands for background estimations and make robust predictions of the pT dependence of the backgrounds. The practical consequences of a well-behaved background shape will be explored in section 5.

Generally speaking, a nonlinear dependence is not a technical obstacle to performing an observable transformation and we discuss this in section 6; however, studying the behavior in a simple analytic regime allows us to better understand the underlying physical behavior. The final component in evaluating the success of the observable transformation is to understand the performance of the new observable in terms of rejecting backgrounds.

Figure 6. [...] and also for the signal (bottom right); distributions for signal are stable versus pT. The cuts in τ21' vary from 1.0 to 0.0 in steps of 0.02; the changing line styles for successive cuts are meant to visually aid the reader. [Plots not reproduced; x-axis: soft drop mass [GeV], successive τ21' cuts.]

4.2 Performance of DDT

To evaluate the performance of the transformed variable we use the traditional receiver operating characteristic (ROC) curve, defined as the signal efficiency as a function of the background efficiency. A better discriminating tagger is characterized by higher signal efficiency and lower background efficiency. The discriminating performance of τ2/τ1 and the transformed τ21' is shown on the left of figure 7 for jets within a soft drop mass window of 60–120 GeV (corresponding to the W signal mass region). From the ROC curve, we note that after transforming the variable the discriminating power does not degrade and even shows modest improvement in this kinematic regime. We can see where this comes from in the right panel of figure 7. After cutting on raw τ2/τ1, the QCD soft drop jet mass distribution is sculpted such that many of the jets surviving the cut fall into the W mass region. In contrast, cutting on τ21' leaves a more linearly falling distribution which preserves the low sideband. The mass distributions on the right side of figure 7 are shown after making a cut on the shape observable to maintain a signal efficiency of 50%.

Figure 7. (Left) [...] for three pT regions for the transformed τ21' variable (solid) and the raw τ2/τ1 variable (dashed). Here efficiency is defined as the number of jets with mass satisfying 60 < mMDT < 120 GeV which are tagged. (Right) Soft drop mass distributions after a cut on the transformed τ21' variable (solid) and the raw τ2/τ1 variable (dashed), where the cut corresponds to 50% signal efficiency. Here the uncertainties on each bin signify the expected variation for a 10% uncertainty on the W boson tag efficiency.

5 Case studies

Currently, the systematic uncertainties in extracting the efficiency are large (and usually dominant) sources of uncertainty in SM and BSM analyses at the LHC [46–52]. There are
several places where the improved scaling behavior can reduce these systematics, in addition to the performance improvements in the ROC curves shown in figure 7. We will present two improvements: the preservation of mass sidebands in the kinematic fit to extract the W tagging efficiency from semileptonic tt̄ events, and the overall background estimate in diboson analyses. Both cases take advantage of the flatter background distributions to improve the uncertainties in shape-based fits.

5.1 Preservation of mass sidebands

The shape of the jet mass spectrum is used in the LHC experiments to determine the W tagging efficiency; for instance, CMS relies on a simultaneous fit to the jet mass in events that pass and fail the τ21 selection. However, as shown in figure 1, the τ21 selection significantly sculpts the background distribution in this variable. This can lead to significant fitted uncertainties when extracting the background normalization, and thus directly translates to large uncertainties in the W tagging efficiency measurement. By using τ21', a significant improvement is observed.

To demonstrate this, we examine two cases: modified mass drop tagging with τ21 < 0.45, and modified mass drop tagging with a scale-dependent selection τ21 < 0.6 − 0.08 ρ', where ρ' = log(m²/(pT μ)). This translates into a cut on τ21' < 0.6. These selections have approximately the same signal efficiency.

For simplicity, the same signal and background MC samples are used as in the previous sections, but the events are weighted with an easily specifiable fraction of background jets. In this case, the background fraction for the entire sample is 40%. This gives a fraction of merged to unmerged W bosons comparable to that in a semileptonic tt̄ selection at 13 TeV at the LHC, but allows us to easily tune the fraction. In addition, to mimic the approximate detector resolution, the intrinsic resolution of the W → qq̄ system is smeared with a Gaussian of width 10 GeV.
This is indicative of the resolutions obtained at the CMS and ATLAS experiments.

Figure 8. [...] τ21' < 0.6 (right), respectively. These two selections have approximately the same signal efficiency. The background fraction of the entire sample (for all jet masses) is set to 40%. The points are the observed MC events, after smearing the jet mass resolution to [...] purple dotted line corresponds to the smeared W signal jets. The red dashed line corresponds to the fitted background component, modeled as a Gaussian distribution. The blue band corresponds to a fit to the signal plus background, where the thickness of the line corresponds to the uncertainty in the fitted component.

[...] 120 GeV, after a selection on the N-subjettiness variable. The model is a double Gaussian, one for the QCD continuum and one for the W mass peak. The jet pT range considered is pT = 300–400 GeV, to give a typical pT range of the W bosons from top quark decays from SM tt̄ production. The first fit shows the modified mass drop algorithm after τ21 < 0.45. The second fit shows the modified mass drop algorithm after τ21' < 0.6.

The fits successfully capture the mass of the W and the input width of 10%. It is interesting to note that the jet masses of the QCD jets after the τ21' selection are significantly pushed below 10 GeV. In addition, the remaining distribution is flat. However, for the standard τ21 selection, the distribution is rising, with significantly more background under the W signal peak.

The background uncertainty on the fit is 6% when using the standard τ21 selection. However, it is reduced by a factor of two, to 3%, by using the τ21' selection. This is driven by the fact that the fitter can more easily handle sidebands that are flatter, so the τ21' variable outperforms the τ21 variable in this metric. This would translate directly into a decreased systematic uncertainty for the LHC experiments.
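The transformation of eq. (4.1) and the scale-dependent selection compared in this section can be sketched as follows. This is only an illustration under stated assumptions: the profile points are made up so that a straight-line fit returns a slope of −0.08, matching the coefficient in the selection τ21 < 0.6 − 0.08 ρ' quoted above, and the function names and the example jet are hypothetical. The actual slope in the paper is fitted to the background profile of figure 2.

```python
import math


def rho_prime(m, pt, mu=1.0):
    """Eq. (3.2): rho' = log(m^2 / (pT * mu)), with mu = 1 GeV by default."""
    return math.log(m ** 2 / (pt * mu))


def fit_line(xs, ys):
    """Ordinary least-squares fit y = slope * x + intercept."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def tau21_ddt(tau21, m, pt, slope, mu=1.0):
    """Eq. (4.1): tau21' = tau21 - M * rho'."""
    return tau21 - slope * rho_prime(m, pt, mu)


# Made-up background profile points in the linear region: (rho', <tau21>).
profile = [(1.0, 0.82), (2.0, 0.74), (3.0, 0.66), (4.0, 0.58)]
M, intercept = fit_line([x for x, _ in profile], [y for _, y in profile])

# A hypothetical jet: the flat cut on tau21' and the sliding cut on tau21
# are two ways of writing the same selection.
m, pt, tau21 = 80.0, 350.0, 0.35
passes_flat = tau21_ddt(tau21, m, pt, M) < 0.6
passes_sliding = tau21 < 0.6 + M * rho_prime(m, pt)
print(M, passes_flat, passes_sliding)
```

Because the slope is negative, the sliding threshold on τ21 tightens as ρ' grows, which is what keeps the background efficiency roughly independent of the groomed mass in the linear region.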
While newer and more clever algorithms can achieve better performance in MC simulations, this does not always translate directly to improvements in actual analyses due to the need to characterize the systematic uncertainties. We therefore propose this test as an appropriate metric to characterize the systematic performance of new substructure algorithms.

5.2 Diboson background estimate

The diboson background estimate for the LHC experiments is much the same as the extraction of the W tagging efficiency, except that the background fraction is significantly higher. We have chosen a value of 80% (integrated over the entire spectrum of events) as an indicative fraction, with the same number of events (5000). We have considered two different pT ranges, pT = 500–600 GeV and pT = 1000–1100 GeV.

Figure 9. [...] requiring τ21 < 0.45 and τ21' < 0.6, respectively. These two selections have approximately the same signal efficiency. The background fraction of the entire sample (for all jet masses) is set to 80%. The points are the observed MC events, after smearing the jet mass resolution to [...] purple dotted line corresponds to the smeared W signal jets. The red dashed line corresponds to the fitted background component, modeled as a Gaussian distribution. The blue band corresponds to a fit to the signal plus background, where the thickness of the line corresponds to the uncertainty in the fitted component. [Plots not reproduced; label shown on plot: Bkg Uncertainty = 15%.]

One somewhat obvious but important point is that as the pT increases, the Sudakov peak from QCD-generated jets shifts further to the right. As this occurs, the fits to discriminate boosted W bosons from QCD-generated jets are less and less able to distinguish between the categories.

[...] figure 8. However, the background fraction is raised from 40% to 80% (again integrated over the entire mass spectrum), and the pT ranges are set to pT = 500–600 GeV and pT = 1000–1100 GeV, respectively.
For the range pT = 500–600 GeV, it is plain to see that there is a significant improvement with the τ21' variable, where the background uncertainty decreases from 15% to 6%. This is even more apparent for the range pT = 1000–1100 GeV, where the uncertainty decreases from 23% to 6%.

Figure 10. [...] requiring τ21 < 0.45 and τ21' < 0.6, respectively. These two selections have approximately the same signal efficiency. The background fraction of the entire sample (for all jet masses) is set to 80%. The points are the observed MC events, after smearing the jet mass resolution to [...] purple dotted line corresponds to the smeared W signal jets. The red dashed line corresponds to the fitted background component, modeled as a Gaussian distribution. The blue band corresponds to a fit to the signal plus background, where the thickness of the line corresponds to the uncertainty in the fitted component. [Plots not reproduced; label shown on plot: σW = 9.6 +/- 1.4.]

6 Generalized scale invariance

Decorrelation schemes can be extended beyond a pair of variables to decorrelate classes of many variables. Such a procedure can be used to allow a class of variables to be merged into a single multivariate analysis (MVA) discriminator, while preserving decorrelation against one or a set of variables that are further used in the analysis. Consider, for example, building an MVA W tagger using both τ2/τ1 and C2 (β = 1). Both of these variables have correlations with pT and mass, so the resulting classifier that combines the variables will also be correlated with mass and pT. Decorrelating the space of variables against mass and pT before or during the construction of the MVA can thus preserve the mass and pT invariance, resulting in an uncorrelated tagger. This idea has previously been pursued in b-physics utilizing an MVA that minimizes the mass dependence, while simultaneously constructing a classifier [53].

In light of building an example based on previously presented studies, we split ρ = log(m²/pT²) into its components log(m) and log(pT).
Combining this with either C2 (β = 1) or τ2/τ1 gives a class of three variables, which we decorrelate into a set of three independent linear combinations of variables. The independent variables can be viewed as properties of the data which span the space of distinctive features. This space can be explored to further understand the behavior of the data. Additionally, a subset of the independent components can be merged through an MVA while maintaining the decorrelation of the remaining set of variables. In this way, mass sidebands or other sideband methods can be used on the merged MVA discriminator with the decorrelated variable.

As has previously been noted, decorrelating variables which are not implicitly linearly correlated is poorly defined [54]. We thus consider two generalized approaches that attempt to decorrelate discriminators that are not necessarily linearly correlated: Principal Component Analysis (PCA) of transformed variables and Independent Component Analysis (ICA).

Decorrelation by PCA and ICA. Given that a set of variables need not be linearly correlated, we consider a transformed variable v_i' of the original variable v_i, defined by

\[ v_i' = f(v_i) \tag{6.1} \]

For this transformation, we train a gradient boosted decision tree [55] with the boosted W boson as signal and a high-pT QCD jet as background. This transformation places the variables into a space that enables the possibility of linearized correlations of the original variables.

The resulting correlation matrix of the transformed variables can be decorrelated through principal component analysis by taking the eigenvectors of the matrix. This yields a set of n independent vectors for an n-dimensional correlation matrix. The decorrelated vectors for the triplet of transformed τ2/τ1, log(pT), and log(m) are shown in figure 11. The correlation of the resulting vectors is compared with a gradient boosted decision tree using all variables and with the transformed mass.
From this correlation, we observe two discriminating dimensions and the pT. These we can write as

\[ v_1 = \log(m/\mu_1) + K_1\,(\tau_2/\tau_1) \tag{6.2} \]

\[ v_2 = \tau_2/\tau_1 + K_2 \log\!\left(\frac{m^{3.5}}{p_T\,\mu_2^{2.5}}\right) \tag{6.3} \]

where K1,2 are coefficients and μ1,2 are scales, typically chosen to make the observables dimensionless. The first variable corresponds to the transformed mass and the second corresponds to the transformed τ2/τ1. The second variable is not too different from the ρ'-decorrelated τ2/τ1.

An alternative decorrelation approach, known as independent component analysis (ICA), involves diagonalization of the matrix constructed by computing the pairwise mutual information of each pair of variables on the sample of QCD jets. This differs from previous approaches, which rely on the mutual information to truth. Here, we focus on identifying features in the data and not necessarily discriminating power. We perform the ICA with an algorithm that uses k-nearest neighbors to expedite the diagonalization process (MILCA) [56]. The right panel of figure 11 shows the ICA decomposed vectors. As with the transformed PCA, the ICA decorrelates the pT; however, the mass–τ2/τ1 interdependence is stronger than in the transformed case.

Finally, the equivalent decorrelated matrix for a combined set of observables is shown in figure 12; here we show just the transformed PCA approach. From the combined set, we observe that the largest orthogonal set of discrimination power comes from C2 (β = 1) as opposed to τ2/τ1. When comparing the two approaches, we have found that the variable-transformed PCA yields a more consistent performance with our previous observations.

7 Conclusion and outlook

In this note, we explore the scale-dependence and correlations of jet substructure observables. The goal is not only to improve the statistical power of such observables, which we also demonstrate, but also to consider practical issues related to using such observables in searches for new physics.
In order to design decorrelated taggers (DDT), we transform the shape observable, here τ2/τ1 → τ21', by decorrelating it from groomed mass observables while also factoring in the pT scale-dependence. In addition to improving the statistical discrimination between signal and background, we also preserve a robust, flat background shape which behaves more stably when scaling the background from lower pT bins to higher pT bins. We demonstrate the advantages of such an approach in various case studies, such as predicting background normalizations and determining heavy object tagging scale factors related to new physics searches.

Figure 11. [...] The bottom panel corresponds to the vectors in columns with their relative fraction labeled by row. The top panel corresponds to the correlation to the soft dropped mass and a gradient boosted decision tree trained with all variables excluding the pT. [Matrix plots not reproduced; row labels: all (exc. pT), Mass, log(pT), log(mMDT), τ2/τ1.]

Figure 12. [...] studied in this paper. The bottom panel corresponds to the vectors in columns with their relative fraction labeled by row. The top panel corresponds to the correlation to the soft dropped mass and a gradient boosted decision tree trained with all variables excluding the pT. [Matrix plot not reproduced; row labels: all (exc. pT), Mass, log(pT), log(mMDT), τ2/τ1, C2 (β = 1).]

The intention of this note is not to perform a detailed study of all possible heavy object taggers, but instead to introduce further considerations when designing taggers and to propose a method by which all considerations can be addressed, namely via observable decorrelation.
We leave studies related to variations on jet mass groomers and shape observables, R-scaling, quark-gluon fractions, scaling background predictions, behavior at extremely high $p_T$, and top tagging to future work. We have explored more generic determinations of observable decorrelation with complex taggers using multivariate techniques and numerical principal component analysis.

Acknowledgments

We thank Andrew Larkoski and Petar Maksimovic for their critical reading of the manuscript. The authors would also like to thank Matteo Cremonesi, Matthew Low, Cristina Mantilla Suarez, Siddharth Narayanan, Gavin Salam, Gregory Soyez, and Michael Spannowsky for useful discussion and inputs. JD and SR are partially supported by the U.S. National Science Foundation, under grant PHY-1401223. PH is supported by CERN. The work of SM is partly supported by the U.S. National Science Foundation, under grant PHY-0969510, the LHC Theory Initiative. NT is supported by the Fermi Research Alliance, LLC under Contract No. DE-AC02-07CH11359 with the United States Department of Energy.

Open Access. This article is distributed under the terms of the Creative Commons Attribution License (CC-BY 4.0), which permits any use, distribution and reproduction in any medium, provided the original author(s) and source are credited.

References

[1] A. Abdesselam et al., Boosted objects: A Probe of beyond the Standard Model physics, Eur. Phys. J. C 71 (2011) 1661 [arXiv:1012.5412] [INSPIRE].
[2] A. Altheimer et al., Jet Substructure at the Tevatron and LHC: New results, new tools, new benchmarks, J. Phys. G 39 (2012) 063001 [arXiv:1201.0008] [INSPIRE].
[3] A. Altheimer et al., Boosted objects and jet substructure at the LHC. Report of BOOST2012, held at IFIC Valencia, 23rd-27th of July 2012, Eur. Phys. J. C 74 (2014) 2792 [arXiv:1311.2708] [INSPIRE].
[4] D. Adams et al., Towards an Understanding of the Correlations in Jet Substructure, Eur. Phys. J. C 75 (2015) 409 [arXiv:1504.00679] [INSPIRE].
[5] J.M. Butterworth, A.R. Davison, M. Rubin and G.P.
Salam, Jet substructure as a new Higgs search channel at the LHC, Phys. Rev. Lett. 100 (2008) 242001 [arXiv:0802.2470]
[6] D.E. Kaplan, K. Rehermann, M.D. Schwartz and B. Tweedie, Top Tagging: A Method for Identifying Boosted Hadronically Decaying Top Quarks, Phys. Rev. Lett. 101 (2008) 142001 [arXiv:0806.0848] [INSPIRE].
[7] CMS collaboration, A Cambridge-Aachen (C-A) based jet algorithm for boosted top-jet tagging, CMS-PAS-JME-09-001 (2009) [INSPIRE].
[8] S.D. Ellis, C.K. Vermilion and J.R. Walsh, Techniques for improved heavy particle searches with jet substructure, Phys. Rev. D 80 (2009) 051501 [arXiv:0903.5081] [INSPIRE].
[9] S.D. Ellis, C.K. Vermilion and J.R. Walsh, Recombination Algorithms and Jet Substructure: Pruning as a Tool for Heavy Particle Searches, Phys. Rev. D 81 (2010) 094023 [arXiv:0912.0033] [INSPIRE].
[10] D. Krohn, J. Thaler and L.-T. Wang, Jet Trimming, JHEP 02 (2010) 084 [arXiv:0912.1342] [INSPIRE].
[11] J. Thaler and K. Van Tilburg, Maximizing Boosted Top Identification by Minimizing N-subjettiness, JHEP 02 (2012) 093 [arXiv:1108.2701] [INSPIRE].
[12] A.J. Larkoski, S. Marzani, G. Soyez and J. Thaler, Soft Drop, JHEP 05 (2014) 146 [arXiv:1402.2657] [INSPIRE].
[13] D.E. Soper and M. Spannowsky, Finding physics signals with shower deconstruction, Phys. Rev. D 84 (2011) 074002 [arXiv:1102.3480] [INSPIRE].
[14] D.E. Soper and M. Spannowsky, Finding top quarks with shower deconstruction, Phys. Rev. D 87 (2013) 054012 [arXiv:1211.3140] [INSPIRE].
[15] D.E. Soper and M. Spannowsky, Finding physics signals with event deconstruction, Phys. Rev. D 89 (2014) 094005 [arXiv:1402.1189] [INSPIRE].
[16] M. Dasgupta, A. Fregoso, S. Marzani and G.P. Salam, Towards an understanding of jet substructure, JHEP 09 (2013) 029 [arXiv:1307.0007] [INSPIRE].
[17] M. Dasgupta, A. Fregoso, S. Marzani and A. Powling, Jet substructure with analytical methods, Eur. Phys. J. C 73 (2013) 2623 [arXiv:1307.0013] [INSPIRE].
[18] I. Feige, M.D. Schwartz, I.W. Stewart and J. Thaler, Precision Jet Substructure from Boosted Event Shapes, Phys. Rev. Lett. 109 (2012) 092001 [arXiv:1204.3898] [INSPIRE].
[19] M. Dasgupta, A. Powling and A. Siodmok, On jet substructure methods for signal jets, JHEP 08 (2015) 079 [arXiv:1503.01088] [INSPIRE].
[20] A.J. Larkoski, I. Moult and D. Neill, Toward Multi-Differential Cross Sections: Measuring Two Angularities on a Single Jet, JHEP 09 (2014) 046 [arXiv:1401.4458] [INSPIRE].
[21] A.J. Larkoski, I. Moult and D. Neill, Power Counting to Better Jet Observables, JHEP 12
[22] A.J. Larkoski, I. Moult and D. Neill, Building a Better Boosted Top Tagger, Phys. Rev. D 91
[23] A.J. Larkoski, I. Moult and D. Neill, Analytic Boosted Boson Discrimination, arXiv:1507.03018 [INSPIRE].
[24] M. Dasgupta, L. Schunk and G. Soyez, Jet shapes for boosted jet two-prong decays from first-principles, JHEP 04 (2016) 166 [arXiv:1512.00516] [INSPIRE].
[25] CMS collaboration, Identification techniques for highly boosted W bosons that decay into hadrons, JHEP 12 (2014) 017 [arXiv:1410.4227] [INSPIRE].
[26] CMS collaboration, Boosted Top Jet Tagging at CMS, CMS-PAS-JME-13-007 (2014) [INSPIRE].
[27] CMS collaboration, V Tagging Observables and Correlations, CMS-PAS-JME-14-002 (2014) [INSPIRE].
[28] CMS collaboration, Top Tagging with New Approaches, CMS-PAS-JME-15-002 (2016) [INSPIRE].
[29] ATLAS collaboration, Performance of jet substructure techniques for large-R jets in proton-proton collisions at √s = 7 TeV using the ATLAS detector, JHEP 09 (2013) 076 [arXiv:1306.4945] [INSPIRE].
[30] ATLAS collaboration, Identification of boosted, hadronically decaying W bosons and comparisons with ATLAS data taken at √s = 8 TeV, Eur. Phys. J. C 76 (2016) 154
[31] ATLAS collaboration, Boosted hadronic top identification at ATLAS for early 13 TeV data,
[32] ATLAS collaboration, Identification of boosted, hadronically-decaying W and Z bosons in √s = 13 TeV Monte Carlo Simulations for ATLAS, ATL-PHYS-PUB-2015-033 (2015).
[33] J. Cogan, M. Kagan, E. Strauss and A. Schwartzman, Jet-Images: Computer Vision Inspired Techniques for Jet Tagging, JHEP 02 (2015) 118 [arXiv:1407.5675] [INSPIRE].
[34] L.G. Almeida, M. Backovic, M. Cliche, S.J. Lee and M. Perelstein, Playing Tag with ANN: Boosted Top Identification with Pattern Recognition, JHEP 07 (2015) 086
[35] L. de Oliveira, M. Kagan, L. Mackey, B. Nachman and A. Schwartzman, Jet-Images - Deep Learning Edition, arXiv:1511.05190 [INSPIRE].
[36] J. Alwall et al., The automated computation of tree-level and next-to-leading order differential cross sections and their matching to parton shower simulations, JHEP 07 (2014) 079 [arXiv:1405.0301] [INSPIRE].
[37] Y. Gao, A.V. Gritsan, Z. Guo, K. Melnikov, M. Schulze and N.V. Tran, Spin determination of single-produced resonances at hadron colliders, Phys. Rev. D 81 (2010) 075022 [arXiv:1001.3396] [INSPIRE].
[38] T. Sjostrand, S. Mrenna and P.Z. Skands, A Brief Introduction to PYTHIA 8.1, Comput. Phys. Commun. 178 (2008) 852 [arXiv:0710.3820] [INSPIRE].
[39] A. Buckley et al., General-purpose event generators for LHC physics, Phys. Rept. 504 (2011) 145 [arXiv:1101.2599] [INSPIRE].
[40] M. Cacciari, G.P. Salam and G. Soyez, The anti-kt jet clustering algorithm, JHEP 04 (2008) 063 [arXiv:0802.1189] [INSPIRE].
[41] M. Cacciari and G.P. Salam, Dispelling the N^3 myth for the kt jet-finder, Phys. Lett. B 641 (2006) 57 [hep-ph/0512210] [INSPIRE].
[42] M. Cacciari, G.P. Salam and G. Soyez, FastJet User Manual, Eur. Phys. J. C 72 (2012) 1896 [arXiv:1111.6097] [INSPIRE].
[43] ATLAS collaboration, Jet mass and substructure of inclusive jets in √s = 7 TeV pp collisions with the ATLAS experiment, JHEP 05 (2012) 128 [arXiv:1203.4606] [INSPIRE].
[44] CMS collaboration, Studies of jet mass in dijet and W/Z + jet events, JHEP 05 (2013) 090 [arXiv:1303.4811] [INSPIRE].
[45] G. Kasieczka, T. Plehn, T. Schell, T. Strebler and G.P. Salam, Resonance Searches with an Updated Top Tagger, JHEP 06 (2015) 203 [arXiv:1503.05921] [INSPIRE].
[46] CMS collaboration, Search for a massive resonance decaying into a Higgs boson and a W or Z boson in hadronic final states in proton-proton collisions at √s = 8 TeV, JHEP 02 (2016) 145 [arXiv:1506.01443] [INSPIRE].
[47] CMS collaboration, Search for Narrow High-Mass Resonances in Proton-Proton Collisions at √s = 8 TeV Decaying to a Z and a Higgs Boson, Phys. Lett. B 748 (2015) 255
[48] CMS collaboration, Search for massive resonances decaying into pairs of boosted bosons in semi-leptonic final states at √s = 8 TeV, JHEP 08 (2014) 174 [arXiv:1405.3447] [INSPIRE].
[49] CMS collaboration, Search for massive resonances in dijet systems containing jets tagged as W or Z boson decays in pp collisions at √s = 8 TeV [arXiv:1405.1994] [INSPIRE].
[50] [...] pp collisions at 7 TeV, Phys. Lett. B 723 (2013) 280 [arXiv:1212.1910] [INSPIRE].
[51] ATLAS collaboration, Search for high-mass diboson resonances with boson-tagged jets in proton-proton collisions at √s = 8 TeV with the ATLAS detector, JHEP 12 (2015) 055
[52] ATLAS collaboration, Search for Higgs boson pair production in the bbbb final state from pp collisions at √s = 8 TeV with the ATLAS detector, Eur. Phys. J. C 75 (2015) 412
[53] A. Rogozhnikov, A. Bukva, V. Gligorov, A. Ustyuzhanin and M. Williams, New approaches for boosting to uniformity, 2015 JINST 10 T03002 [arXiv:1410.4140] [INSPIRE].
[54] A.J. Larkoski, J. Thaler and W.J. Waalewijn, Gaining (Mutual) Information about Quark/Gluon Discrimination, JHEP 11 (2014) 129 [arXiv:1408.3122] [INSPIRE].
[55] H. Voss, A. Hoecker, J. Stelzer and F. Tegenfeldt, TMVA, Toolkit for Multivariate Data Analysis with ROOT, PoS(ACAT)040 [physics/0703039] [INSPIRE].


James Dolen, Philip Harris, Simone Marzani. Thinking outside the ROCs: Designing Decorrelated Taggers (DDT) for jet substructure, Journal of High Energy Physics, 2016, 156, DOI: 10.1007/JHEP05(2016)156