Wednesday, June 29 • 1:36pm - 1:54pm
Adding R, Jupyter and Spark to the toolset for understanding the complex computing systems at CERN's Large Hadron Collider

High Energy Physics (HEP) has a decades-long tradition of statistical data analysis and of using large computing infrastructures. CERN's current flagship project, the Large Hadron Collider (LHC), has collected over 100 PB of data, which is analysed in a world-wide distributed computing grid by millions of jobs daily. As a community of several thousand scientists, HEP also has a tradition of developing its own analysis toolset. In this contribution we will briefly outline the core physics analysis tasks. We will then focus on applying the same data analysis methods to understand and optimise the large, distributed computing systems in the CERN computer centre and the world-wide LHC computing grid. We will describe the approach and tools chosen to analyse metrics on job performance, disk and network I/O, and the geographical distribution of, and access to, physics data. Finally, we will present the technical and non-technical challenges of optimising a distributed infrastructure for large-scale science projects and summarise the first results obtained.

Rasmus Arnling Bååth

Data Scientist, King
I'm a data scientist at King, interested in all things stats, but if it's Bayesian I'm especially interested.

Dirk Duellmann

Analysis & Design - Storage Group, CERN
Quantitative understanding of large computing and storage systems

Wednesday June 29, 2016 1:36pm - 1:54pm PDT
McCaw Hall