Thursday, June 30 • 11:25am - 11:30am
Outlier Detection Methods


This talk reviews some of the most relevant approaches to outlier and anomaly detection in R. It covers statistical approaches, clustering-based approaches, nearest-neighbor approaches, random forests, and autoencoders, drawing on a variety of existing R packages including DMwR, fclust, dbscan, isolation forest, and autoencoder.

The talk is relevant because outlier detection is often a necessary step in cleaning data, and in other instances the outliers may be of interest themselves. For example, identifying credit card fraud involves identifying outliers. As the need for outlier detection has increased, so has the number of techniques for identifying outliers.

This talk provides a theoretical background for each approach, giving an understanding of its assumptions and limitations. Each approach is then demonstrated on different datasets to show how performance varies across approaches. The specific methods include: Extreme Value, Expectation Maximization, K-means, Fuzzy Clustering, DBSCAN, Isolation Forests, and Autoencoders.

This talk includes a demonstration of two interactive Shiny apps that illustrate these algorithms. The first is for low-dimensional data and is available at: http://projects.rajivshah.com/shiny/outlier/ The second, for higher-dimensional methods, must be run locally because of its computational requirements. The Shiny applications let people try these different methods out on datasets. In my experience, these sorts of talks resonate much better because the audience can be directly involved.

Moderators
Max

Software Engineer, former scientist, RStudio
Mayor of Crazytown

Speakers
Rajiv Shah

Data Scientist, DataRobot
Rajiv Shah is a data scientist at a large insurance company and an Adjunct Assistant Professor at the University of Illinois at Chicago. He is an active member of the data science community in Chicago with projects and publications related to surveillance and red light cameras and...


Thursday June 30, 2016 11:25am - 11:30am PDT
Econ 140