Tuesday, May 10 • 9:00am - 9:50am
Random Forest Clustering with Apache Spark - Erik Erlandson, Red Hat, Inc.


Analytics applications often boil down to grouping objects into two or more clusters having similar elements. Defining what “similar” means can be surprisingly difficult when data elements have many columns or dimensions. Having tools at hand to generate quality clusters from high-dimensional data greatly increases the variety of applications that can successfully leverage clustering.

In this presentation, Erik Erlandson will introduce the basic principles and advantages of Random Forest learning models and Random Forest clustering. He will explain how to build up an implementation of Random Forest clustering in the Apache Spark analytics framework, based on the Spark MLlib Random Forest modeling API.
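
The abstract does not spell out the method, so the following is only an illustrative sketch under an assumption: one common formulation of Random Forest clustering first builds a synthetic data set by sampling each column independently from its observed values, then trains a forest to tell real rows from synthetic ones, and finally clusters rows by how the trained trees route them. The sketch covers only the first step, in plain Python rather than Spark; the function names and the toy data are hypothetical and are not taken from the talk.

```python
import random


def synthesize_from_marginals(rows, rng):
    """Build synthetic rows by sampling each column independently.

    Each synthetic value is drawn from the observed values of its own
    column, so per-column distributions are preserved while any
    correlation between columns is destroyed.
    """
    columns = list(zip(*rows))  # transpose: one tuple per column
    return [tuple(rng.choice(col) for col in columns) for _ in rows]


def labeled_training_set(rows, seed=0):
    """Label real rows 1 and synthetic rows 0 for a binary classifier."""
    rng = random.Random(seed)
    synthetic = synthesize_from_marginals(rows, rng)
    return [(row, 1) for row in rows] + [(row, 0) for row in synthetic]


# Hypothetical toy data: each row is (package_a_installed, package_b_installed).
real = [(1, 1), (1, 1), (0, 0), (0, 0)]
training = labeled_training_set(real)
```

In a Spark setting, the labeled rows would presumably be handed to a Random Forest classifier; that step is left out here because the abstract gives no details about it.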

The presentation will include examples of Random Forest clustering applied to VM installed-package profiles and a discussion of practical issues encountered along the way.


Erik Erlandson

Red Hat, Inc.
Erik Erlandson is a Senior Software Engineer at Red Hat, where he investigates analytics use cases and scalable deployments for Apache Spark on clustering and cloud-enabled environments. Erik also consults on internal data science and analytics projects. He is a contributor to Apache Spark and other open source projects in the Spark ecosystem. His presentations include talks at ApacheCon Big Data 2016, Data Science PHX meetup and Condor Week...

Tuesday May 10, 2016 9:00am - 9:50am
Plaza C