Apache: Big Data 2016 has ended


Tutorial
Thursday, May 12

9:00am PDT

Getting Started with Apache OODT - Tom Barber, Meteorite Consulting (Additional Fee)
As data becomes more and more prevalent, along with the requirement to store it, managing it becomes an ever greater problem. How can Apache OODT fill that void?

Apache OODT is a distributed data processing and management platform. In this talk we’ll go through installation and configuration, then how to start, deploy and test a project. We’ll run through the various components you’re likely to use, how to customise them, and how to make your users embrace data management. We’ll also take a look at workflows and resources, and at how to build simple workflows. During this presentation we’ll also connect Apache OODT to a number of different data sources to demonstrate data ingestion and metadata capture. Finally, it’s all well and good capturing data, but how do you get data out to your end users? We’ll go through the options for data extraction and dissemination.

Tom Barber

Technical Director, Spicule LTD
Tom Barber is the director of Meteorite BI and Spicule BI. A member of the Apache Software Foundation and regular speaker at ApacheCon, Tom has a passion for simplifying technology. The creator of Saiku Analytics and open source stalwart, when not working for NASA, Tom currently deals...

Thursday May 12, 2016 9:00am - 12:00pm PDT
Plaza A

Getting Started with Machine Learning & Spark - Holden Karau, IBM (Additional Fee)
Apache Spark is a fast and general engine for distributed computing and big data processing, with APIs in Scala, Java, Python, and R. Apache Spark ships with built-in libraries for a variety of purposes, including SQL, Streaming, Graph Analysis, and Machine Learning. This talk will focus on how to use Spark for Machine Learning.

Apache Spark has two APIs for Machine Learning, the newer of which is focused on creating Machine Learning Pipelines. This talk will explore a simple classification problem in both of the APIs, followed by a tour of some of the different machine learning models. We will then talk about loading and saving models and the challenges faced when attempting to construct a real-time serving solution from Spark ML’s models. From there we will explore some of the performance work being done inside Spark to improve machine learning.

Holden Karau

Developer Advocate, Google
Holden Karau is a transgender Canadian open source developer advocate at Google focusing on Apache Spark, Beam, and related big data tools. Previously, she worked at IBM, Alpine, Databricks, Google (yes, this is her second time), Foursquare, and Amazon. Holden is the coauthor of Learning...

Thursday May 12, 2016 9:00am - 12:00pm PDT

Interactive Data Science from Scratch with Apache Zeppelin and Apache Spark - Felix Cheung (Additional Fee)
How do you find the needle in the haystack?

With Big Data, finding insight is a big problem. Visualization and exploratory analysis help you converge on insights, and Apache Zeppelin (incubating) is an essential tool for that.

In this tutorial, Felix Cheung will introduce you to Apache Zeppelin and provide step-by-step guides to get you up and running with it for Big Data analysis with Apache Spark.

This is going to be a heavily hands-on session; no previous experience with Zeppelin, Data Science, or Statistics is necessary. Bring your laptop - attendees are expected to be able to handle some software installation steps.

You can view the materials here:

Felix Cheung

Engineering Manager, Uber
Felix started in the big data space about 5 years ago with the then state-of-the-art MapReduce. Since then, he has (re-)built Hadoop clusters from metal more times than he would like, created a Hadoop “distro” from two dozen or so projects into .rpm/.deb, and kicked off clusters in...

Thursday May 12, 2016 9:00am - 12:00pm PDT
Lord Byron

Mission to NARs with Apache NiFi - Aldrin Piri, Hortonworks (Additional Fee)
Apache NiFi is both an application and a platform for creating and developing powerful, reliable dataflows to process and distribute data. During the course of this tutorial, Aldrin will showcase creating a dataflow using out-of-the-box components and determining where custom components can help create more robust and expressive dataflows. Aldrin will illustrate the process and ease of creating new components and bundles (NiFi Archives, or NARs) for Apache NiFi, allowing developers to focus on core functionality while getting framework features, including concurrency, provenance, metrics, and the associated UI components, for free.

Aldrin Piri

Aldrin is a Senior Member of Technical Staff at Hortonworks working on Hortonworks Data Flow (HDF). Following the open source release of NiFi by the NSA in late 2014, Aldrin has become a PMC member and committer for Apache NiFi and helps run the DMV Apache NiFi Users Group. Long time...

Thursday May 12, 2016 9:00am - 1:00pm PDT