Apache: Big Data 2016 has ended
Wednesday, May 11 • 11:50am - 12:40pm
Boost Spark ML Performance with Project Mnemonic - Yanping Wang & Gang Wang, Intel Corp.

Project Mnemonic is an open-source, structured data in-place persistence library for Java-based applications and frameworks. It provides unified interfaces for data manipulation on heterogeneous block/byte-addressable devices, such as DRAM, SSD, NVMe, and Cloud/network devices.
In this presentation, we will first introduce Project Mnemonic and its non-volatile Java object model, which defines in-memory non-volatile objects that can be stored directly in persistent memory. We will discuss how it can be used to allocate and reclaim heterogeneous memory and storage resources directly on DRAM, NVMe, other persistent memories, and SSD. Then we will show how in-memory non-volatile RDDs can be implemented in Spark. Finally, we will show that a performance boost of more than 2x can be achieved on a Spark ML workload by removing RDD SerDe (serialization/de-serialization), caching hot data, and dramatically reducing GC pause time.

Yanping Wang

Software Engineer, Intel Corp
As a Senior Software Performance Engineer at Intel, Yanping has been working on Java and Big Data application performance for the past 15 years. Currently, she is focusing on improving Big Data application performance by reducing garbage collection and serialization/de-serialization...

Wednesday May 11, 2016 11:50am - 12:40pm PDT
Georgia A