Thursday, June 30 • 10:45am - 10:50am
Clustering of Hierarchically-Linked Multivariate Datasets

We present the growclusters package for R, which implements maximum a posteriori (MAP) estimation of partitions (clusters) using a penalized optimization function. The function is derived from a Bayesian probability model with a multivariate Gaussian mixture on the mean, under either a Dirichlet process (DP) or a hierarchical DP (HDP) mixing measure, by taking the limit as a function of the global variance goes to zero. We illustrate the package using data collected from a federal survey of business establishments. A special feature of these data is that they are collected under an informative sampling design, in which the probability of inclusion depends on the surveyed response. We demonstrate a feature of the growclusters package that incorporates the sampling weights to “undo” the effects of the informative design and yield asymptotically unbiased estimation of the clusters.
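
To make the idea of a penalized, sampling-weighted clustering objective concrete, here is a minimal sketch in Python. It is not the growclusters API, which the abstract does not describe, and it covers only a DP-style (non-hierarchical) case. It assumes the objective is the weighted within-cluster squared error plus a penalty `lam` per cluster, in the spirit of small-variance ("DP-means"-style) limits. The function name `weighted_dp_means`, the parameter `lam`, and the rule for opening a new cluster are illustrative assumptions, not the package's method.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def weighted_dp_means(points, weights, lam, max_iter=100):
    """Hypothetical sketch: hard clustering that locally minimizes
    sum_i w_i * ||x_i - mu_{z_i}||^2 + lam * K,
    where w_i are sampling weights and K is the number of clusters.
    """
    n = len(points)
    dim = len(points[0])
    total_w = sum(weights)
    # Start with a single cluster at the weighted grand mean.
    centers = [[sum(w * p[d] for w, p in zip(weights, points)) / total_w
                for d in range(dim)]]
    labels = [0] * n
    for _ in range(max_iter):
        changed = False
        # Assignment step: nearest center, or a new cluster when the
        # weighted cost of staying exceeds the per-cluster penalty.
        for i, (p, w) in enumerate(zip(points, weights)):
            dists = [sq_dist(p, c) for c in centers]
            k = min(range(len(centers)), key=dists.__getitem__)
            if w * dists[k] > lam:
                centers.append(list(p))
                k = len(centers) - 1
            if labels[i] != k:
                labels[i] = k
                changed = True
        # Update step: weighted means; clusters left empty are dropped.
        new_centers, remap = [], {}
        for k in range(len(centers)):
            members = [i for i in range(n) if labels[i] == k]
            if not members:
                continue
            wk = sum(weights[i] for i in members)
            remap[k] = len(new_centers)
            new_centers.append(
                [sum(weights[i] * points[i][d] for i in members) / wk
                 for d in range(dim)])
        centers = new_centers
        labels = [remap[z] for z in labels]
        if not changed:
            break
    return labels, centers


# Toy usage with made-up data: two well-separated groups, equal weights.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
labels, centers = weighted_dp_means(points, [1.0] * 6, lam=10.0)
print(labels, centers)
```

In this sketch the sampling weights enter both the decision to open a new cluster and the cluster means, so units that represent more of the population pull the estimated centers toward themselves. A larger `lam` yields fewer clusters.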

Moderators
Max

principal software engineer, Posit PBC
Max Kuhn is a software engineer at Posit PBC, where he works on improving R’s modeling capabilities and maintains about 30 packages, including caret and tidymodels. He has a Ph.D. in Biostatistics. Max was a Senior Director of Nonclinical Statistics at Pfizer Global R&D and...

Speakers
Jeffrey Mark Gonzalez

US Bureau of Labor Statistics


Thursday June 30, 2016 10:45am - 10:50am PDT
Econ 140