Apache: Big Data 2016 has ended


Interfacing with Big Data
Monday, May 9

11:40am PDT

Building Large Scale Applications in Apache Hadoop YARN with Apache Twill - Henry Saputra & Terence Yim, Apache Software Foundation
Apache Twill (incubating) is a new Apache Incubator project that provides a higher-level abstraction for building distributed applications on top of Apache Hadoop YARN. Developing distributed applications on YARN is hard because YARN does not provide higher-level APIs, and a lot of boilerplate code needs to be duplicated to deploy each application. YARN applications are usually developed by framework developers, such as those of Apache Flink or Apache Spark, who need to leverage YARN as the resource manager for deploying their framework in a distributed way.
Using Apache Twill, application developers only need to know the basic concepts of the Java programming model when using the Apache Twill APIs, so they can focus on solving business problems. In this talk I would also like to present the example of the Cask Data Application Platform (CDAP), which heavily uses Apache Twill for resource management.

Henry Saputra

Software Engineer, ASF
Member of the Apache Software Foundation (ASF) PMC, Committer, and contributor to several Apache Software Foundation projects: Incubator, Aurora, MetaModel, Flink, Gora, Tajo, Twill. Mentor and former mentor to some Apache Incubator projects: Aurora, MetaModel, Spark, Kylin...

Terence Yim

Software Engineer, Cask Data Inc.
Terence Yim is a Software Engineer at Cask, responsible for designing and building the Cask Data Application Platform (CDAP). He is also the lead developer and a PPMC member of the Apache Twill and Apache Tephra projects. Prior to joining Cask, Terence worked at both LinkedIn and...

Monday May 9, 2016 11:40am - 12:30pm PDT
Georgia A

2:00pm PDT

SMACK Stack - Data Done Right - Stefan Siprell, codecentric AG
A talk covering the best-of-breed platform consisting of Spark, Mesos, Akka, Cassandra and Kafka. SMACK is more of a toolbox of technologies for building resilient ingestion pipelines, offering a high degree of freedom in the selection of analysis and query possibilities and baked-in support for flow control. More and more customers are using this stack, which is rapidly becoming the new industry standard for Big Data solutions.

Stefan Siprell

General Manager, codecentric AG
Anything which required integrating has been integrated by Stefan in his career. Currently he is working as an Architect at codecentric. There, his projects have become much more demanding, which has resulted in more resilient platforms supporting previously unimaginable data transfer...

Monday May 9, 2016 2:00pm - 2:50pm PDT
Georgia A

3:00pm PDT

Druid: Interactive Exploratory Analytics at Scale - Fangjin Yang, Imply
Cluster computing frameworks such as Hadoop or Spark are tremendously beneficial in processing and deriving insights from data. However, long query latencies make these frameworks sub-optimal choices to power interactive applications. Organizations frequently rely on dedicated query layers, such as relational databases and key/value stores, for faster query latencies, but these technologies suffer many drawbacks for analytic use cases. In this session, we discuss using Druid for analytics, and why the architecture is well suited to power analytic dashboards.


Fangjin Yang

CEO, Imply
Fangjin is a co-author of the open source Druid project and a co-founder of Imply, a San Francisco-based technology company. Fangjin previously held senior engineering positions at Metamarkets (acquired by Snap, Inc.) and Cisco. He holds a BASc in Electrical Engineering and a MASc...

Monday May 9, 2016 3:00pm - 3:50pm PDT
Georgia A

4:10pm PDT

Next Gen Big Data Analytics with Apache Apex - Thomas Weise, DataTorrent
Apache Apex is a next-gen big data analytics platform. Originally developed at DataTorrent, it comes with a powerful stream processing engine, a rich set of functional building blocks and an easy-to-use API for developers to build real-time and batch applications. Apex runs natively on YARN and HDFS and is used in production in various industries. You will learn about the Apex architecture, including its unique features for scalability, fault tolerance and processing guarantees, as well as its programming model and use cases.

Thomas Weise

CTO, Atrato.io
Thomas is Apache Apex PMC Chair and CTO at Atrato. Prior to founding Atrato he was an Architect at DataTorrent and led the development of Apex from the beginning of the project. Before that he was a member of the Hadoop Team at Yahoo! and contributed to several of the big data ecosystem...

Monday May 9, 2016 4:10pm - 5:00pm PDT
Georgia A

5:10pm PDT

Hands-on Apache NiFi - Oleg Zhurakousky, Hortonworks
While Apache NiFi provides out-of-the-box support to build powerful and scalable directed graphs of data routing, transformation, and system mediation logic, sometimes "the world is not enough".
Roll up your sleeves and put your hands on the keyboard: this hands-on talk, structured as a set of quick tutorials, will take you on a journey of developing in NiFi. It will cover extension points such as Processor, ControllerService and ReportingTask, as well as other lesser-known areas of NiFi internals, sharing some tips and tricks along the way.


Oleg Zhurakousky

Open source practitioner with over 17 years of experience in software engineering across multiple disciplines including Big Data, software architecture and design, consulting, business analysis and application development. Speaker who presented at dozens of conferences worldwide (i.e...

Monday May 9, 2016 5:10pm - 6:00pm PDT
Georgia A