Thursday, June 30 • 10:45am - 10:50am
Clustering of Hierarchically-Linked Multivariate Datasets


We present the growclusters package for R, which implements maximum a posteriori estimation of partitions (clusters). The estimator uses a penalized optimization function derived from a Bayesian probability model, a multivariate Gaussian mixture on the mean under either a Dirichlet process (DP) or a hierarchical DP (HDP) mixing measure, in the limit as a function of the global variance goes to zero. We illustrate the package using data collected from a federal survey of business establishments. A special feature of these data is that they are collected under an informative sampling design, in which the probability of inclusion depends on the surveyed response. We demonstrate a feature of the growclusters package that incorporates the sampling weights to “undo” the effects of the informative design and yield asymptotically unbiased estimation of the clusters.
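To make the idea of a penalized clustering objective with sampling weights concrete, here is a minimal Python sketch. It is not the growclusters API or its algorithm. It assumes a DP-means-style objective (the weighted sum of squared distances to cluster centers plus a penalty `lam` per cluster), which is the usual form obtained when the variance of a DP Gaussian mixture is taken to zero. The function names, the parameter `lam`, and the way weights enter the objective are all illustrative assumptions.

```python
from typing import List, Sequence, Tuple

Point = Sequence[float]


def sq_dist(a: Point, b: Point) -> float:
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def weighted_mean(points: Sequence[Point], weights: Sequence[float]) -> List[float]:
    """Weighted mean of a non-empty set of points."""
    total = sum(weights)
    dim = len(points[0])
    return [sum(w * p[d] for p, w in zip(points, weights)) / total
            for d in range(dim)]


def weighted_dp_means(points: Sequence[Point],
                      weights: Sequence[float],
                      lam: float,
                      max_iter: int = 100) -> Tuple[List[int], List[List[float]]]:
    """Hypothetical weighted DP-means sketch (an assumption, not growclusters).

    Greedily reduces  sum_i w_i * ||x_i - mu_{z_i}||^2 + lam * K,
    where K is the number of clusters and w_i are sampling weights.
    """
    # Start with a single cluster at the weighted global mean.
    centers = [weighted_mean(points, weights)]
    labels = [0] * len(points)
    for _ in range(max_iter):
        changed = False
        for i, (p, w) in enumerate(zip(points, weights)):
            dists = [sq_dist(p, c) for c in centers]
            k = min(range(len(centers)), key=dists.__getitem__)
            # Open a new cluster when the weighted cost of staying with the
            # nearest center exceeds the per-cluster penalty.
            if w * dists[k] > lam:
                centers.append(list(p))
                k = len(centers) - 1
            if labels[i] != k:
                labels[i] = k
                changed = True
        # Recompute centers as weighted means and drop empty clusters.
        new_centers: List[List[float]] = []
        remap = {}
        for k in range(len(centers)):
            members = [i for i, lab in enumerate(labels) if lab == k]
            if members:
                remap[k] = len(new_centers)
                new_centers.append(weighted_mean(
                    [points[i] for i in members],
                    [weights[i] for i in members]))
        centers = new_centers
        labels = [remap[lab] for lab in labels]
        if not changed:
            break
    return labels, centers
```

In this sketch a larger `lam` yields fewer clusters, and units with larger sampling weights pull the cluster centers toward themselves, which is one simple way a weighted objective can offset unequal inclusion probabilities.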

Moderators

Max

Software Engineer, former scientist, RStudio
Mayor of Crazytown

Speakers

Jeffrey Mark Gonzalez

US Bureau of Labor Statistics


Thursday June 30, 2016 10:45am - 10:50am PDT
Econ 140