Wednesday, June 29 • 1:54pm - 2:12pm
How to do one's taxes with R


This talk shows how to generate a tax return (German VAT) with R and send it over the internet to the tax administration. As this is certainly not a standard application for R (special software exists for this purpose), it is worthwhile to take a closer look at the techniques used to realize this kind of transaction and to reveal any analogies to distributed data analysis. If confidential data cannot be analysed in the environment where it is created or stored, it has to be transferred over the internet to some kind of execution service, e.g. a cluster system. Encryption is necessary to protect the data, and a digital signature must be appended to guarantee ownership and prevent modification. Additionally, the data has to be packaged together with metadata that tells the receiver how to handle the delivery. The same techniques are used when returning the result, so privacy and authorship are again ensured. For the tax example, all these procedures have to observe well-established cryptographic standards for encryption, hashing and digital signatures, which change from time to time according to new results in cryptographic research. I demonstrate an implementation in R for this kind of transaction in a data science context, trying to use the same rigorous standards mentioned above whenever possible. This leads to an overview of existing R packages and external software useful and necessary to realize a corresponding program. Finally, some proposals for a possible standardization of a secure distributed data analysis scenario are presented.
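
The packaging step described in the abstract (payload plus metadata, protected against modification) can be illustrated with a minimal sketch. It is written in Python with only the standard library, not in R, and it is not the speaker's implementation. It makes two loud assumptions: an HMAC-SHA256 tag with a shared key stands in for a real public-key digital signature (so it detects modification but does not prove authorship to third parties), and encryption of the payload is left out entirely. The function names `pack` and `unpack`, the envelope layout, and the metadata fields are hypothetical.

```python
import base64
import hashlib
import hmac
import json


def _canonical(body):
    # Serialize deterministically so sender and receiver hash identical bytes.
    return json.dumps(body, sort_keys=True, separators=(",", ":")).encode("utf-8")


def pack(payload, key, metadata):
    """Wrap payload bytes and metadata in a JSON envelope with an integrity tag.

    ASSUMPTION: HMAC-SHA256 with a shared key replaces a digital signature,
    and the payload is only base64-encoded, not encrypted.
    """
    body = {
        "metadata": metadata,
        "payload": base64.b64encode(payload).decode("ascii"),
    }
    tag = hmac.new(key, _canonical(body), hashlib.sha256).hexdigest()
    return json.dumps({"body": body, "tag": tag})


def unpack(envelope, key):
    """Verify the tag, then return (payload bytes, metadata dict)."""
    outer = json.loads(envelope)
    expected = hmac.new(key, _canonical(outer["body"]), hashlib.sha256).hexdigest()
    # compare_digest avoids leaking information through timing differences.
    if not hmac.compare_digest(expected, outer["tag"]):
        raise ValueError("envelope was modified or the key is wrong")
    payload = base64.b64decode(outer["body"]["payload"])
    return payload, outer["body"]["metadata"]
```

The receiver calls `unpack` with the same key and reads the metadata to decide how to process the payload; a response can be returned with `pack` in the same way. A setup meeting the standards the abstract mentions would replace the HMAC with a public-key signature and encrypt the payload for the receiver.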


Rasmus Arnling Bååth

Data Scientist, King
I'm a data scientist at King, interested in all things stats, but if it's Bayesian I'm especially interested.


Benno Süselbeck

University of Muenster, Center for Information Processing

Wednesday June 29, 2016 1:54pm - 2:12pm PDT
McCaw Hall