Apache: Big Data 2016 has ended
Tuesday, May 10 • 10:00am - 10:50am
Breaking Spark: Top 5 Mistakes to Avoid When Using Apache Spark in Production - Neelesh Srinivas Salian, Cloudera

Apache Spark deployments have grown steadily over the past year. The volume of data analyzed and processed through the framework is massive, and it continues to push the boundaries of the engine.

This talk will focus on common issues observed in cluster environments set up with Apache Spark, based on the presenter’s experiences across 150+ production deployments.

When planning an Apache Spark deployment in a cluster, it is recommended to follow certain guidelines to help set up a real-world environment. The issues that can occur fall into the following categories:

1) Scaling of the Architecture
2) Memory Configurations
3) End-User Code
4) Incompatible Dependencies
5) Administration/Operation-Related Issues

These observations help improve the usability and supportability of Apache Spark and can help avoid such issues in future deployments.

Neelesh Srinivas Salian

Software Engineer, Stitch Fix
Neelesh Srinivas Salian is a Software Engineer on the Data Platform team at Stitch Fix, where he works on the compute infrastructure used by data scientists. He helps build services that are part of Stitch Fix’s Data Warehouse ecosystem. Currently he is working to build Data Lineage...

Tuesday May 10, 2016 10:00am - 10:50am PDT
Georgia B