Apache: Big Data 2016 has ended
Tuesday, May 10 • 11:20am - 12:10pm
Spark Cyborgs - Deep Integration of Spark with Parallel Relational Engines - Torsten Steinbach & Gustavo Arocena, IBM


In this session we describe a family of hybrid engines that result from a deep two-way integration between Spark and parallel RDBMSs. This integration differs from projects like Hive on Spark, which use Spark purely as an execution framework. It also goes beyond what’s possible with the current version of the DataSources API in exploiting the capabilities of the storage backend. In our presentation you will learn about four essential building blocks of the hybrid engines:
1. Derive DataFrame partitioning implicitly from parallel RDBMS partitioning
2. Colocation and efficient data movement between Spark and RDBMS processes
3. Hybrid queries by augmenting parallel RDBMS with Spark
4. Spark machine learning integrated in RDBMS for relational data


Gustavo Arocena

Big Data Architect, IBM
Gustavo Arocena is a Big Data Architect at the IBM Toronto Lab, with more than 10 years of experience in database technology and language processing. Recently he has led the design and implementation of several components of the Big SQL engine, including the Hive-compatible IO layer...

Torsten Steinbach

Torsten has been a software architect for database technology at IBM for many years. He led product development for DB2 performance management tooling, Netezza workload management and in-database analytics. Currently he works on IBM’s cloud data warehouse dashDB and its integrated...

Tuesday May 10, 2016 11:20am - 12:10pm PDT
Plaza C