Wednesday, June 29 • 2:30pm - 3:30pm
Plotting for Marketers - Seeing the Story


Poster #22

The first step with a new data set is to check the data for completeness, accuracy, and reasonableness. This talk describes a method based on Jack Olson's Data Quality - The Accuracy Dimension. The input can be a raw text or spreadsheet file, or a source with columnar metadata such as a SQL table or an R data frame. The only setup is connecting to the data source. Using RMarkdown, dplyr, grid, and ggplot2, we produce a report in which each column is profiled by data type, summary statistics (if numeric or date), a distribution plot, counts, and head and tail values. This allows a quick visual scan of each column for data quality issues. The simple visual format also helps when working with the data provider to dig into quality issues and, ideally, clean up the data set before time and effort are wasted on an analysis flawed by bad data. We provide examples of both good and suspect columns.
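
As a rough illustration of the per-column profile described above, here is a minimal sketch. It is not the presenter's code: the talk uses R (RMarkdown, dplyr, grid, ggplot2), while this sketch uses only the Python standard library and leaves out the distribution plot. The function names (`infer_type`, `profile_column`, `profile_csv`), the sample data, the ISO date format, and the reading of "head and tail" as the lowest and highest sorted values are all assumptions made for the example.

```python
import csv
import io
import statistics
from collections import Counter
from datetime import datetime


def infer_type(values):
    """Guess a column type from its non-empty string values (hypothetical helper)."""
    if not values:
        return "empty"
    try:
        for v in values:
            float(v)
        return "numeric"
    except ValueError:
        pass
    try:
        for v in values:
            datetime.strptime(v, "%Y-%m-%d")  # assumes ISO dates
        return "date"
    except ValueError:
        return "text"


def profile_column(name, raw, n=3):
    """Profile one column: type, missing and distinct counts, summary, head, tail."""
    values = [v.strip() for v in raw if v is not None and v.strip() != ""]
    kind = infer_type(values)
    profile = {
        "column": name,
        "type": kind,
        "rows": len(raw),
        "missing": len(raw) - len(values),
        "distinct": len(set(values)),
        "top_counts": Counter(values).most_common(n),
    }
    if kind == "numeric":
        ordered = sorted(float(v) for v in values)
        profile["summary"] = {
            "min": ordered[0],
            "median": statistics.median(ordered),
            "mean": statistics.mean(ordered),
            "max": ordered[-1],
        }
    else:
        ordered = sorted(values)  # ISO dates also sort correctly as strings
        if kind == "date":
            profile["summary"] = {"min": ordered[0], "max": ordered[-1]}
    # Assumption: "head" and "tail" mean the lowest and highest sorted values.
    profile["head"] = ordered[:n]
    profile["tail"] = ordered[-n:]
    return profile


def profile_csv(text, n=3):
    """Profile every column of a CSV document supplied as a string."""
    rows = list(csv.DictReader(io.StringIO(text)))
    if not rows:
        return []
    return [profile_column(col, [r[col] for r in rows], n) for col in rows[0]]


# Made-up sample data: one clean column and two suspect ones.
sample = """customer_id,signup_date,spend
1,2016-01-05,20.5
2,2016-02-11,
3,2016-02-30,35
4,2016-03-01,-999
"""

profiles = profile_csv(sample)
for p in profiles:
    print(p["column"], p["type"], "missing:", p["missing"], "head:", p["head"])
```

In this made-up sample, `signup_date` falls back to text because one value is not a valid calendar date, and `spend` has both a missing value and a minimum that looks like a sentinel code. These are the kinds of suspect columns a visual scan of the report is meant to surface.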


Jim Porzak

Principal, DS4CI
I am a (semi-)retired data scientist specializing in customer insights. I have been using R since 2002 and have presented at all but two useR! conferences, starting with the first, useR! 2004 in Vienna. See my archives at ds4ci.org/archives/ for past presentations including tutorials at...

Wednesday June 29, 2016 2:30pm - 3:30pm PDT
Sponsor Pavilion