Wednesday, June 29 • 2:12pm - 2:30pm
Providing Digital Provenance: from Modeling through Production

Reproducibility is important throughout the entire data science process. As recent studies have shown, subconscious biases in the exploratory analysis phase of a project can have vast repercussions on final conclusions. The problems with managing the deployment and life cycle of models in production are vast and varied, and reproducibility often stops at the level of the individual analyst. Though R has best-in-class support for reproducible research, with tools ranging from knitr to packrat, these tools are limited in scope. In this talk we present a solution we have developed at Domino that gives every model in production full reproducibility, from EDA to the training run and the exact datasets used to generate it. We discuss how we leverage Docker as our reproducibility engine, and how this allows us to provide irrefutable provenance for a model.


Hana Ševčíková

University of Washington

Eduardo Ariño de la Rubia

Chief Data Scientist in Residence, Domino Data Lab
Eduardo Ariño de la Rubia is Chief Data Scientist at Domino Data Lab. Eduardo is a lifelong technologist with a passion for data science who thrives on effectively communicating data-driven insights throughout an organization. He is a graduate of the MTSU Computer Science department...

Wednesday June 29, 2016 2:12pm - 2:30pm PDT
Barnes & McDowell & Cranston