Apache: Big Data 2016 has ended

Interfacing with Big Data
Monday, May 9
 

10:40am

Seven Habits of a Highly Effective Big Data Programmer - Rekha Joshi, Intuit Inc
With examples of big data application use cases, this talk will delve into the Seven Habits of a Highly Effective Big Data Programmer:
1. Not just linearly: Sometimes doing more of the same helps, while other times what is required is to break the mould.
2. Explore: Whenever you are stuck, explore.
3. Design and Redesign: Do it in at least 3 different ways; debate.
4. Newton SAW the apple: Really observe what happens. Check the metrics, performance, security. Monitor everything! Every technology has a strong card and an Achilles heel. Deliberate on fitting!
5. Evaluate and Reevaluate: Technology will change, and so will your customer needs.
6. Get Your Mathematics KungFu on: Being savvy with napkin mathematics/modeling can take the scare out of big numbers.
7. Networking is the King: In distributed computing, it's distribution that's critical. Often the edge goes to the one who understands networking better.
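
As a rough illustration of the napkin mathematics in habit 6, the sketch below estimates how long a single full scan of a dataset might take. It is not taken from the talk: the function name, the 10 TB dataset size, the 100 MB/s per-node read rate and the node counts are all assumed figures chosen only to show the arithmetic, and the model assumes perfect parallelism with no network or scheduling overhead.

```python
# Back-of-the-envelope estimate: how long does one full scan of a dataset take?
# Every figure here is an illustrative assumption, not a measurement.

ONE_TB = 1_000_000_000_000  # decimal terabyte, in bytes


def scan_time_seconds(dataset_bytes, nodes, mb_per_sec_per_node):
    """Seconds to read the dataset once, assuming perfectly parallel reads."""
    aggregate_bytes_per_sec = nodes * mb_per_sec_per_node * 1_000_000
    return dataset_bytes / aggregate_bytes_per_sec


# Hypothetical workload: 10 TB, each node reading at 100 MB/s.
single = scan_time_seconds(10 * ONE_TB, nodes=1, mb_per_sec_per_node=100)
cluster = scan_time_seconds(10 * ONE_TB, nodes=100, mb_per_sec_per_node=100)

print(f"1 node:    {single / 3600:.1f} hours")    # 27.8 hours
print(f"100 nodes: {cluster / 60:.1f} minutes")   # 16.7 minutes
```

Under these assumptions the same scan drops from roughly a day to a quarter of an hour, which is the kind of quick sanity check that makes a large number less intimidating before any cluster is provisioned.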

Speakers

Rekha Joshi

Principal Software Engineer, Intuit
Rekha Joshi is a Principal Software Engineer at Intuit, and is getting amazing work done in finance on the Big Data ecosystem. Previously at Yahoo!, she worked on Apache Hadoop since its initial versions. She has worked in the diverse domains of advertising, supply chain and research. She is an open...



Monday May 9, 2016 10:40am - 11:30am
Georgia A

11:40am

Building Large Scale Applications in Apache Hadoop YARN with Apache Twill - Henry Saputra & Terence Yim, Apache Software Foundation
Apache Twill (incubating) is a new Apache Incubator project that provides a higher-level abstraction for building distributed applications on top of Apache Hadoop YARN. Developing distributed applications directly on YARN is hard because YARN does not provide higher-level APIs, and a lot of boilerplate code needs to be duplicated to deploy each application. YARN applications are therefore usually developed by framework developers, such as those of Apache Flink or Apache Spark, who need to leverage YARN for resource management when deploying the framework in a distributed way.
Using Apache Twill, application developers only need to know the basic concepts of the Java programming model to use the Apache Twill APIs, so they can focus on solving business problems. In this talk I would also like to present an example, the Cask Data Application Platform (CDAP), which heavily uses Apache Twill for resource management.

Speakers

Henry Saputra

Software Engineer, ASF
Member of the Apache Software Foundation (ASF) PMC, Committer, and contributor to several Apache Software Foundation projects: Incubator, Aurora, MetaModel, Flink, Gora, Tajo, Twill. Mentor and former mentor to some Apache Incubator projects: Aurora, MetaModel, Spark, Kylin...

Terence Yim

Software Engineer, Cask Data Inc.
Terence Yim is a Software Engineer at Cask, responsible for designing and building the Cask Data Application Platform (CDAP). He is also the lead developer and PPMC of the Apache Twill and the Apache Tephra projects. Prior to joining Cask, Terence worked at both LinkedIn and...



Monday May 9, 2016 11:40am - 12:30pm
Georgia A

2:00pm

SMACK Stack - Data Done Right - Stefan Siprell, codecentric AG
A talk covering the best-of-breed platform consisting of Spark, Mesos, Akka, Cassandra and Kafka. SMACK is more of a toolbox of technologies for building resilient ingestion pipelines, offering a high degree of freedom in the selection of analysis and query possibilities and baked-in support for flow control. More and more customers are using this stack, which is rapidly becoming the new industry standard for Big Data solutions.

Speakers

Stefan Siprell

General Manager, codecentric AG
Anything which required integrating has been integrated by Stefan in his career. Currently he is working as an Architect at codecentric. There, projects have become much more demanding, which has resulted in more resilient platforms supporting previously unimaginable data transfer...


Monday May 9, 2016 2:00pm - 2:50pm
Georgia A

3:00pm

Druid: Interactive Exploratory Analytics at Scale - Fangjin Yang, Imply
Cluster computing frameworks such as Hadoop or Spark are tremendously beneficial in processing and deriving insights from data. However, long query latencies make these frameworks sub-optimal choices to power interactive applications. Organizations frequently rely on dedicated query layers, such as relational databases and key/value stores, for faster query latencies, but these technologies suffer many drawbacks for analytic use cases. In this session, we discuss using Druid for analytics, and why the architecture is well suited to power analytic dashboards.

Speakers

Fangjin Yang

CEO, Imply
Fangjin is a co-author of the open source Druid project and a co-founder of Imply, a San Francisco based technology company. Fangjin previously held senior engineering positions at Metamarkets (acquired by Snap, Inc.) and Cisco. He holds a BASc in Electrical Engineering and a MASc...


Monday May 9, 2016 3:00pm - 3:50pm
Georgia A

4:10pm

Next Gen Big Data Analytics with Apache Apex - Thomas Weise, DataTorrent
Apache Apex is a next gen big data analytics platform. Originally developed at DataTorrent, it comes with a powerful stream processing engine, a rich set of functional building blocks and an easy-to-use API for the developer to build real-time and batch applications. Apex runs natively on YARN and HDFS and is used in production in various industries. You will learn about the Apex architecture, including its unique features for scalability, fault tolerance and processing guarantees, as well as its programming model and use cases.

Speakers

Thomas Weise

CTO, Atrato.io
Thomas is Apache Apex PMC Chair and CTO at Atrato. Prior to founding Atrato he was Architect at DataTorrent and led the development of Apex from the beginning of the project. Before that he was a member of the Hadoop Team at Yahoo! and contributed to several of the big data ecosystem...



Monday May 9, 2016 4:10pm - 5:00pm
Georgia A

5:10pm

Hands-on Apache NiFi - Oleg Zhurakousky, Hortonworks
While Apache NiFi provides out-of-the-box support to build powerful and scalable directed graphs of data routing, transformation, and system mediation logic, sometimes "the world is not enough".
Roll up your sleeves and put your hands on the keyboard: this hands-on talk, structured as a set of quick tutorials, will take you on a journey of developing in NiFi. It will cover extension points such as Processor, ControllerService and ReportingTasks, as well as other lesser-known areas of NiFi internals, sharing some tips and tricks along the way.

Speakers

Oleg Zhurakousky

Hortonworks
Open source practitioner with over 17 years of experience in software engineering across multiple disciplines including Big Data, software architecture and design, consulting, business analysis and application development. Speaker who has presented at dozens of conferences worldwide (i.e...


Monday May 9, 2016 5:10pm - 6:00pm
Georgia A
 
Tuesday, May 10
 

10:00am

Apache BigTop Hadoop Dev Test Benchmark on Your Favorite Cloud or Laptop - Antonio Rosales, Canonical & Konstantin Boudnik, Memcore
Apache Bigtop is the foundation for open source Big Data projects. In this talk, we discuss how you can deploy, with a single command, an Apache Bigtop multi-node Hadoop cluster with Ganglia monitoring to your favorite cloud or onto containers on your laptop, then dev/test and benchmark your cluster, all with open source tools.

Speakers

Antonio Rosales

Canonical
Collaborating with communities to help folks get answers faster.


Tuesday May 10, 2016 10:00am - 10:50am
Plaza A