Apache: Big Data 2016 has ended

Streams
Monday, May 9


Streaming SQL with Apache Calcite - Julian Hyde, Hortonworks
With the rise of the Internet of Things (IoT) and low-latency analytics, streaming data becomes ever more important. Surprisingly, one of the most promising approaches for processing streaming data is SQL. In this presentation, Julian Hyde shows how to build streaming SQL analytics that deliver results with low latency, adapt to network changes, and play nicely with BI tools and stored data. He also describes how Apache Calcite optimizes streaming queries, and the ongoing collaborations between Calcite and the Storm, Flink and Samza projects.


Julian Hyde

Julian Hyde is an expert in query optimization and in-memory analytics. He is PMC chair of Apache Calcite, an engine for query optimization and data virtualization. He also founded Mondrian, the most popular open source OLAP engine. He is an architect at Hortonworks.

Monday May 9, 2016 10:40am - 11:30am
Plaza A


SAMOA: A Platform for Mining Big Data Streams - Nicolas Kourtellis, Telefonica
In this talk, Nicolas Kourtellis will introduce Apache SAMOA (Scalable Advanced Massive Online Analysis), an open-source platform for mining big data streams (http://samoa.incubator.apache.org). Apache SAMOA provides a collection of distributed streaming algorithms for data mining tasks such as classification, regression, and clustering. The models built can be updated as new data arrive without the need to define data batches or update frequencies. The platform features a pluggable architecture that can run on existing and well-tested distributed stream processing engines such as Storm, S4, Samza and Flink, for scalability and fault tolerance.

Nicolas Kourtellis

Researcher, Telefonica I+D
Nicolas Kourtellis is a Researcher at Telefonica Research. Previously he was a Researcher in the Web Mining Research Group at Yahoo Labs, Barcelona. He holds a Ph.D. in Computer Science and Engineering from the University of South Florida (2012), an MSc in Computer Science from the...

Monday May 9, 2016 11:40am - 12:30pm
Plaza A


Will It Scale? The Secrets Behind Scaling Stream-processing Applications - Navina Ramesh, LinkedIn
Scaling stream-processing applications is sometimes seen as akin to scaling batch-processing applications. You may re-partition your input stream to scale throughput, similar to re-sharding a batch. However, it becomes challenging for "stateful" applications to "stay realtime", as they frequently require fault-tolerant state management. Providing low-latency, fault-tolerant processing for high-volume input streams is fundamentally governed by the state-management primitives provided by the stream-processing system. In this talk, we will discuss how such stateful applications are supported in open-source stream-processing systems such as Apache Flink, Spark Streaming and Apache Samza. We will then provide a deep dive into Apache Samza’s approach to state management and fault tolerance, and discuss how it can be used effectively to scale stateful applications.

Navina Ramesh

Navina Ramesh started her career at Yahoo! India, where she spent 3 years contributing to scaling the Yahoo! Search clusters. At LinkedIn, she has worked on developing the Feed Personalization pipeline and improved the caching and pagination models in the Feed Infrastructure. She has...

Monday May 9, 2016 2:00pm - 2:50pm
Plaza A


Generating Many Resources from One Set of Schemas with Apache Streams - Steve Blackmon, People Pattern
Apache has many good programming languages, databases, and analytics libraries. Most have some unique competency or value that justifies their application in certain situations. Use the right tool for the right job. However, mastering the data definition file formats of multiple platforms and keeping representations of your data (and partner data) current can be challenging and tedious.

Apache Streams (incubating) contains libraries and patterns for specifying, publishing, and inter-linking data schemas, and can convert data between the representation, format, and encoding preferred by supported platforms. The talk will cover using Streams to specify your object schemas, bind them across languages (Java, Scala), serializations (JSON, XML), databases (Cassandra, Elasticsearch, Mongo, HBase), and analytics tools (Spark, Pig, Hive), as well as re-use object definitions created by others.

Steve Blackmon

VP Technology, People Pattern, Inc.
VP Technology at People Pattern, previously Director of Data Science at W2O Group, co-founder of Ravel, with stints at Boeing, Lockheed Martin, and Accenture. Committer and PMC member for Apache Streams (incubating). Experienced user of Spark, Storm, Hadoop, Pig, Hive, Nutch, Cassandra, Tinkerpop...

Monday May 9, 2016 3:00pm - 3:50pm
Plaza A


Designing Workflows with OODT - Tom Barber, Meteorite Consulting
When building a data management platform, flexible and effective workflows are key to the scalability and effectiveness of the platform.

OODT (originally developed by NASA JPL) has a very flexible and powerful workflow engine, which is at the core of almost any data processing you will do within the platform, but understanding it can sometimes be a challenge.

In this talk we’ll take a deep dive into the guts of workflows inside OODT, using CAS PGE to help lower the barrier to entry. We’ll run through a number of real-world examples: how you build them, and how you deploy and trigger them.

We’ll also look at monitoring and feedback. Lastly, we’ll tackle resource management and how you make sure your workflows run in the correct server pool without swamping your resources.

Tom Barber

Technical Director, Spicule LTD
Tom Barber is the director of Meteorite BI and Spicule BI. A member of the Apache Software Foundation and regular speaker at ApacheCon, Tom has a passion for simplifying technology. The creator of Saiku Analytics and open source stalwart, when not working for NASA, Tom currently deals...

Monday May 9, 2016 4:10pm - 5:00pm
Plaza A


Speaking the Language of Big Data - With Apache Avro and Apache Thrift - Ranganathan Balashanmugam, ThoughtWorks
With the advent of feature-based teams, software architecture styles like microservices and deployment practices like DevOps are taking over. Each team makes autonomous decisions about the technologies it uses, but there is always a need to define a common language for the services to communicate with each other. A common language gives the services a shared wire format and avoids a lot of mappers across the application. The other common scenario is in big data projects, where the nodes of a cluster need to communicate efficiently and effectively through an easy-to-use API.
This talk highlights Apache Avro and Apache Thrift, which are used in big data solutions as a common language across different services and nodes. These technologies provide a language- and platform-neutral way of serializing structured data. The talk also includes examples and demos highlighting the pain points they solve.

Ranganathan Balashanmugam

Head of Engineering - India, Aconex
Ranganathan has nearly twelve years of experience developing awesome products and loves to work on the full stack, from front end to back end and scale. He is Head of Engineering - India at Aconex and prior to that was Technology Lead at ThoughtWorks. He is Microsoft MVP for Data...

Monday May 9, 2016 5:10pm - 6:00pm
Plaza A