Apache: Big Data 2016 has ended
Thursday, May 12 • 9:00am - 12:00pm
Getting Started with Machine Learning & Spark - Holden Karau, IBM (Additional Fee)

Apache Spark is a fast and general engine for distributed computing and big data processing, with APIs in Scala, Java, Python, and R. Apache Spark ships with built-in libraries for a variety of purposes, including SQL, Streaming, Graph Analysis, and Machine Learning. This talk will focus on how to use Spark for Machine Learning.

Apache Spark has two APIs for Machine Learning, the newer of which is focused on creating Machine Learning Pipelines. This talk will explore a simple classification problem in both APIs, followed by a tour of some of the different machine learning models. We will then talk about loading and saving models and the challenges faced when attempting to construct a real-time serving solution from Spark ML’s models. From there we will explore some of the work being done inside Spark to improve machine learning performance.

Speakers
Holden Karau

Developer Advocate, Google
Holden is a transgender Canadian open source developer advocate @ Google with a focus on Apache Spark, Airflow, and related "big data" tools. She is the co-author of Learning Spark, High Performance Spark, and another Spark book that's a bit more out of date. She is a committer and...


Thursday May 12, 2016 9:00am - 12:00pm
Constable
