Apache: Big Data 2016 has ended
Wednesday, May 11 • 2:00pm - 2:50pm
Combining Machine Learning Frameworks with Apache Spark - Tim Hunter, Databricks, Inc.

Machine Learning (ML) workflows involve a sequence of processing and learning stages. Realistic workflows combine specialized ML libraries with more general data management steps.

Apache Spark is well known as a powerful platform for the iterative computations that ML requires. This talk shows how to combine the strengths of Spark’s ML library (MLlib) with popular packages such as scikit-learn and TensorFlow. Scikit-learn is the de facto standard ML library for Python, and TensorFlow is a deep learning library recently open-sourced by Google.

We also discuss the improvements to MLlib in Spark 2.0 and the future of MLlib’s APIs. The roadmap includes more algorithms and features for users, as well as more utilities and abstractions to aid developers.


Tim Hunter

Databricks, Inc.
Tim Hunter is a software engineer at Databricks and contributes to the Spark MLlib project. He has been building distributed Machine Learning systems with Spark since version 0.5, before Spark was an Apache Software Foundation project.

Wednesday May 11, 2016 2:00pm - 2:50pm PDT
Georgia A