First experience and adaptation of existing tools to ATLAS distributed analysis

The European Physical Journal C, Feb 2008

The ATLAS production system has been successfully used to run production of simulation data at an unprecedented scale in ATLAS. Up to 10 000 jobs were processed on about 100 sites in one day. The experience gained operating the system on several grid flavours was essential for performing user analysis on grid resources. First tests of the distributed analysis system were then performed. In the preparation phase, data were registered in the LHC file catalog (LFC) and replicated to external sites. For the main test, only a few resources were used. All these tests are only a first step towards the validation of the computing model. The ATLAS computing management board decided to consolidate the collaboration's distributed analysis efforts into a single project, GANGA. The goal is to test the reconstruction and analysis software in large-scale data production using the different grid flavours at several sites. GANGA allows trivial switching between running test jobs on a local batch system and running large-scale analyses on the grid; it provides job splitting and merging, and includes automated job monitoring and output retrieval.
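The kind of session this describes is sketched below, assuming the Python interface offered at a GANGA prompt of that era (where these classes are pre-loaded). The generic Executable application and ArgSplitter are used as stand-ins for the ATLAS-specific application and splitter, and the wrapper script and input names are invented; read the sketch as an illustration of backend switching and splitting, not as the exact configuration used in the paper.

```python
# Illustrative GANGA-style session (indicative of the GANGA Python interface,
# not the ATLAS setup of the paper). The same job definition is first
# validated locally, then only the backend is changed to run on the grid.
j = Job()
j.application = Executable(exe='run_analysis.sh',      # hypothetical wrapper script
                           args=['jobOptions.py'])     # hypothetical job options

j.backend = Local()         # quick test run on the local machine
j.submit()

# After validation, copy the job and switch only the backend to the LCG grid.
jg = j.copy()
jg.backend = LCG()
# Splitting: one subjob per argument list; the outputs can later be merged.
jg.splitter = ArgSplitter(args=[['input.%03d' % i] for i in range(10)])
jg.submit()

jobs                        # GANGA's job registry monitors all (sub)jobs
```

The separation claimed in the abstract is exactly this: the application and input definition stay fixed, while the backend object alone selects local or grid execution.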



S.G. De La Hoz (1), L.M. Ruiz (1), D. Liko (2)

(1) IFIC - Instituto de Física Corpuscular, Centro Mixto Universitat de València - CSIC, Apartado de Correos 22085, 46071 Valencia, Spain
(2) CERN - European Organization for Nuclear Research, 1211 Geneva, Switzerland

1 Introduction

[...] using grid resources. It provides a robust framework to ex[...] 1 PB (10^15 bytes) per year. Due to the size of this expected [...]

1.1 ATLAS production system

Job definitions are stored in a central database. A supervisor agent picks them up and sends their definition as an XML message to the various executors. Executors are specialized agents, able to convert the ATLAS-specific XML job description into a grid-specific language. Three executors were developed, for LCG (Lexor and CondorG), NorduGrid (Dulcinea) and OSG (Capone). All the data management operations are performed using a central service, Don Quijote (DQ) [7]. DQ moves files from their temporary output locations to their final destination on some Storage Element and registers this location in the Replica Location Service of the corresponding grid flavour. Thus all the copy and registration operations are performed through an abstraction layer provided by DQ. This allows operating the different replica catalogues of the three grid flavours in a similar way.

The ATLAS ProdSys has been used since the ATLAS Data Challenge 2 (DC2) and is currently being used for the ATLAS Data Challenge 3 (DC3, also called Computing System Commissioning, CSC).
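The supervisor/executor split described above can be pictured with a short sketch. This is not ProdSys code: the XML layout, class names, file names and field names are invented for illustration; only the idea of one grid-neutral job description being translated by flavour-specific executors (the role of Lexor/CondorG, Dulcinea and Capone) into the target grid's own job language follows the text.

```python
# Minimal sketch of the supervisor/executor split (illustrative, not ProdSys).
import xml.etree.ElementTree as ET

JOB_XML = """
<job name="simul.005145._00042">
  <transformation>csc_simul_trf.py</transformation>
  <inputfile>EVNT.005145._00042.pool.root</inputfile>
</job>
"""  # invented structure; the real ProdSys schema differs

class Executor:
    """Converts the neutral job description into a grid-specific one."""
    def translate(self, job):
        raise NotImplementedError

class LCGExecutor(Executor):
    def translate(self, job):
        # LCG jobs were described in JDL; only the flavour of the mapping is shown
        return 'Executable = "%s";\nInputData = {"lfn:%s"};' % (
            job.findtext('transformation'), job.findtext('inputfile'))

class NorduGridExecutor(Executor):
    def translate(self, job):
        # NorduGrid used xRSL; again, schematic only
        return '&(executable="%s")(inputFiles=("%s" ""))' % (
            job.findtext('transformation'), job.findtext('inputfile'))

def supervisor(executor):
    """Supervisor role: pick up a stored job definition, hand it to an executor."""
    job = ET.fromstring(JOB_XML)
    return executor.translate(job)

print(supervisor(LCGExecutor()))
print(supervisor(NorduGridExecutor()))
```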
The ATLAS ProdSys distinguishes between two levels of abstraction: task and job. A task transforms input datasets into output datasets by applying a task transformation. Datasets are usually quite large and consist of many logical files. In this case, a job transforms input logical files into output logical files by applying a job transformation. In this way, one could say that a task is split into several jobs, and these jobs are managed by the ATLAS ProdSys on the different grid-flavour resources, so the overall production system relies on the performance of the individual grid systems.

According to the ATLAS full simulation chain, one can classify these jobs as event generation (evgen), simulation (simul), digitization (digit) and reconstruction (recon) jobs. The ATLAS full simulation requires a chain [...] Typically, a simulation (long) job runs for 24 h, whereas a digitization or reconstruction (short) job runs for 3 to 4 h.

The ATLAS production system was successfully used in DC2 to run production jobs at an unprecedented scale for a system deployed on about 100 sites around the world. On successful days more than 10 000 jobs were processed. In the ATLAS DC2 exercise a total of 10 million events were processed in 260 thousand jobs, consuming 200 kSI2k years of CPU and producing 60 TB of data. During the DC2 period, which took 6 months, the automatic production system submitted about 235 000 jobs, reaching an average of approximately 2500 to 3500 jobs per day, distributed over the three grid flavours. Overall, they consumed 1.5 million SI2k-months of CPU (about 5000 CPU months on the average present-day CPU) and produced more than 30 TB of physics data. About 6 TB of these data were moved using DQ servers.

2 Distributed analysis strategy

The ATLAS distributed analysis system is still evolving, and several approaches are being studied and evaluated. ATLAS has adopted a multi-pronged approach to distributed analysis, exploiting its existing grid infrastructure via the various supported grid flavours and, indirectly, via the ATLAS production system (see Fig. 2). Figure 2 shows the various front-end clients enabling distributed analysis on the existing grid infrastructures. These front-end clients (PanDA [8]/Pathena, GANGA [9] and ATCOM [10]) are intended to perform distributed analysis as shown in Fig. 2.

PanDA (production and distributed analysis) is a (...truncated)
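Returning to the task and job abstraction of Sect. 1.1, the following sketch shows how a task over a dataset of many logical files reduces to a list of jobs, each applying the same transformation to a small group of files. The file names, output-naming convention and transformation name are assumptions for illustration only, not ProdSys code.

```python
# Illustrative task -> job splitting (assumed names, not ProdSys code).
def split_task(input_lfns, files_per_job=2):
    """Group the dataset's logical file names (LFNs) into job definitions."""
    jobs = []
    for i in range(0, len(input_lfns), files_per_job):
        chunk = input_lfns[i:i + files_per_job]
        jobs.append({
            "transformation": "csc_recon_trf.py",                 # hypothetical
            "input_lfns": chunk,
            "output_lfns": [lfn.replace("RDO", "ESD") for lfn in chunk],
        })
    return jobs

dataset = ["RDO.005145._%05d.pool.root" % n for n in range(1, 7)]  # 6 logical files
for job in split_task(dataset):
    print(job["input_lfns"], "->", job["output_lfns"])
```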


Full text (PDF): https://link.springer.com/content/pdf/10.1140%2Fepjc%2Fs10052-007-0499-9.pdf

S.G. De La Hoz, L.M. Ruiz, D. Liko, "First experience and adaptation of existing tools to ATLAS distributed analysis", The European Physical Journal C 53(3), 467-471 (2008). DOI: 10.1140/epjc/s10052-007-0499-9