Apache: Big Data 2016 has ended

New Projects
Monday, May 9

10:40am PDT

The Evolution of Apache Kylin: Realtime and Plugin Architecture in Kylin2 - Luke Han, Apache Kylin
After a successful MOLAP implementation, Apache Kylin is evolving to enable realtime analysis, to support different input and output data sources, and to leverage different computing engines. In Apache Kylin2, the newly designed architecture supports pluggable adaptors for Hive, SparkSQL, Kafka, and others, and makes it possible to store data in storage systems other than HBase, such as Kudu. This session will introduce the details of these changes and upcoming features. It will also cover one production use case that already uses streaming.
1. Apache Kylin Overview
2. Plugin Architecture
3. Streaming Cubing
4. Realtime Analysis
5. Use Cases

Luke Han

Co-Founder & CEO, Kyligence
Luke Han is Co-Founder and CEO at Kyligence, and the co-creator and VP of the open source Apache Kylin project, contributing his passion to driving the project's strategy, roadmap and product design. For the past few years he has been working on growing Apache Kylin's community...

Monday May 9, 2016 10:40am - 11:30am PDT
Regency E

11:40am PDT

Apache Trafodion Brings Operational Workloads to Hadoop - Rohit Jain, Esgyn
Apache Trafodion is a world class Transactional SQL RDBMS running on HBase/Hadoop, currently in Apache incubation.

In this talk we will discuss:
• How operational workloads are different from BI and analytical workloads
• The operational (OLTP & Operational Data Store) use cases Trafodion addresses
• Why Trafodion is the right solution for these use cases. That is, what is the recipe for a world class database engine, and how Trafodion implements the ingredients that make up that recipe:
1. Time, money, and talent!
2. World class query optimizer
3. World class parallel data flow execution engine
4. World class distributed transaction management system
• Other important aspects such as performance, scale, availability, and future directions


Rohit Jain

CTO, Esgyn
Rohit Jain is Co-Founder and CTO at Esgyn, an open source database company. Rohit provided the vision behind Apache Trafodion, an enterprise-class MPP SQL Database for Big Data, donated to the Apache Software Foundation by HP in 2015. A veteran database technologist over the past...

Monday May 9, 2016 11:40am - 12:30pm PDT
Regency E

3:00pm PDT

Introduction to Apache Kudu (Incubating) for Timeseries Storage - Dan Burkert, Cloudera
Apache Kudu (Incubating) is a new columnar storage engine for the Hadoop ecosystem. Kudu is designed to handle the stresses of the modern analytics pipeline, enabling real time ingestion with instant querying capability at

This talk will introduce Kudu, giving an overview of the architecture and internals. After discussing what makes Kudu different from existing Hadoop storage platforms, we will discuss why Kudu is particularly well suited for storing and querying large timeseries datasets. The talk will conclude by demonstrating a realtime timeseries analytics dashboard powered by Kudu.


Dan Burkert

Dan Burkert is a software engineer at Cloudera and committer on Apache Kudu (Incubating). Prior to joining Cloudera, Dan worked on data processing pipelines for machine learning, search, and analytics. Dan received his bachelor’s degree from the University of Virginia.

Monday May 9, 2016 3:00pm - 3:50pm PDT
Regency E

4:10pm PDT

Everyone Plays: Collaborative Data Science with Zeppelin - Trevor Grant, Market6
Data Science is best played as a team sport. Zeppelin facilitates this collaboration via a web-based notebook interface to state-of-the-art big data tools (Flink, Spark, Hive, Cassandra, and many more), with custom visualization powered by AngularJS built in. Markdown allows for rich notation in-line with the code. Work can be shared seamlessly across the organization. Further, interactive visualizations can be shared with business analysts and sales reps, which is great for prototyping and proofs of concept. But the collaboration also runs between technologies, by leveraging the Zeppelin Context to share variables between contexts. For example, the results of a Flink paragraph can be passed to a Spark paragraph; the best tool for the job can be used at each step in the analytics pipeline, and a data scientist who loves Scala Flink can easily work with a data scientist who loves pyspark.

Trevor Grant

Open Source AI / IoT Evangelist, IBM
Trevor is an open source evangelist at IBM in Watson IoT. He is also a PMC member on the Apache Mahout, Apache Streams, and Apache Community Development projects. He has spoken at conferences and Meetups internationally.

Monday May 9, 2016 4:10pm - 5:00pm PDT
Regency E
Wednesday, May 11

10:50am PDT

Apache Yetus - Helping Solve the Last Mile Problem - Allen Wittenauer, Altiscale
In this time of rapidly growing software projects and software capabilities, where “software” is expected “to eat the world,” there is still a huge challenge in going from source code to a tested, fully functional release. This is the “last mile problem”: ensuring that vision and coding become real, deployable software. To help address this problem, members of the extended Apache Hadoop/“big data” ecosystem have joined forces to create tools that reduce the burden of pre-commit testing, release note compilation and interface documentation. In this talk, Allen Wittenauer, a PMC member of the Apache Yetus project, will discuss the various components that make up the Yetus toolset, as well as how Apache Hadoop and other projects are using Apache Yetus to improve release quality.

Allen Wittenauer

Apache Yetus PMC Member, Apache Software Foundation
Allen Wittenauer has been involved with Apache Hadoop since May 2007, when he was hired by Yahoo! to bring large-scale operational experience to the fledgling project. His work there helped create the basic blueprints that almost all Hadoop deployments follow today. At LinkedIn, his...

Wednesday May 11, 2016 10:50am - 11:40am PDT
Regency A

2:00pm PDT

Apache REEF - Stdlib for Big Data - Sergiy Matusevych, Microsoft
Resource managers like Apache YARN and Mesos have emerged as a critical layer in the cloud computing system stack, but the developer abstractions for leasing cluster resources and instantiating application logic are very low-level. We present Apache REEF, a powerful yet simple framework that helps developers of big data systems retain fine-grained control over cloud resources and address common problems of fault tolerance, task scheduling and coordination, caching, interprocess communication, and bulk-data transfers. We will guide developers through a simple REEF application and discuss the current state of the Apache REEF project and its place in the Hadoop ecosystem.

Sergiy Matusevych

Sr. Research Engineer, Microsoft
Sergiy is a research engineer at Microsoft Cloud and Information Services Lab, where he is building large scale distributed systems for big data and machine learning. He is a committer to the Apache REEF project. Prior to Microsoft, Sergiy worked as a data research engineer at Yahoo...

Wednesday May 11, 2016 2:00pm - 2:50pm PDT
Regency A

4:10pm PDT

Graph Processing with Apache TinkerPop - Jason Plurad, IBM
Graphs are growing in popularity, but the landscape is becoming a hairball. Learn how to unravel it with the Apache TinkerPop graph computing framework and Gremlin, a functional, data flow language for traversing graphs. This session helps you distinguish between OLTP and OLAP graph processing as well as how to bridge the gap between graph databases and graph engines. We will offer TinkerPop alternatives for effective graph processing that go beyond Spark GraphX. We will also cover how to spin up a graph development environment quickly with Apache Ambari.


Jason Plurad

Software Engineer, IBM
Jason Plurad is a software engineer from IBM Open Technology. He is a PMC member and committer on Apache TinkerPop, an open source graph computing framework. Jason engages in various development (including front end, web tier, NoSQL databases, and big data analytics) and promotes...

Wednesday May 11, 2016 4:10pm - 5:00pm PDT
Regency A