Apache: Big Data 2016 has ended


Kafka
Tuesday, May 10

10:00am PDT

Kafka at Peak Performance - Todd Palino, LinkedIn
Big Data means big hardware, and the less of it we can use to do the job properly, the better the bottom line. Apache Kafka makes up the core of the data pipelines at many organizations, including LinkedIn, and we are on a perpetual quest to squeeze as much as we can out of our systems, from ZooKeeper to the brokers to the various client applications. This means we need to know how well the system is running before we can start turning the knobs to optimize it. In this talk, we will explore how best to monitor Kafka and its clients to ensure they are working well. Then we will dive into how to get the best performance from Kafka, including how to pick hardware and the effects of a variety of configurations in both the brokers and clients. We’ll also talk about setting up Kafka for no data loss.
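The "no data loss" setup the abstract mentions usually comes down to a handful of broker and producer settings. As a hedged sketch (these are standard Kafka configuration keys from the era of this talk; the specific values are illustrative, not the speaker's recommendations):

```properties
# Broker side (server.properties): keep enough in-sync replicas and
# never hand leadership to a replica that has fallen behind.
default.replication.factor=3
min.insync.replicas=2
unclean.leader.election.enable=false

# Producer side: wait for acknowledgement from all in-sync replicas
# and retry on transient failures rather than dropping messages.
acks=all
retries=2147483647
```

The trade-off is latency and throughput for durability, which is exactly the kind of knob-turning the talk promises to cover.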


Todd Palino

Staff Site Reliability Engineer, http://linkedin.com/
Todd Palino is a Staff Site Reliability Engineer at LinkedIn, tasked with keeping Zookeeper, Kafka, and Samza deployments fed and watered. He is responsible for architecture, day-to-day operations, and tools development, including the creation of an advanced monitoring and notification...

Tuesday May 10, 2016 10:00am - 10:50am PDT
Regency A

11:20am PDT

Streaming Data Integration at Scale with Kafka - Ewen Cheslack-Postava, Confluent
The last decade has seen a dramatic shift in the complexity of data pipelines. Data is stored in more systems, queried in more ways, and comes from more sources. Complex data pipelines, combined with the need for applications that can analyze and respond to that data in real time, leave traditional approaches to data integration struggling to keep up.

This talk will describe how data integration is shifting to a streaming model and how Kafka supports this new model. Specifically, it will focus on a new tool included with Kafka, Kafka Connect, that handles streaming "E" and "L" (the extract and load stages of ETL). It will describe Kafka Connect’s data and execution models, which provide scalable fault-tolerant import and export between Kafka and other data systems. Finally, it will show how this can be combined with other tools such as stream processing frameworks to create a complete streaming data integration solution.
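To make the "import" half concrete: Kafka Connect ships with a small demonstration connector, FileStreamSource, that streams lines from a file into a topic. A minimal standalone connector configuration might look like this (the file path and topic name here are hypothetical):

```properties
# connect-file-source.properties — minimal standalone source connector.
# FileStreamSource is bundled with Kafka as a demonstration connector.
name=local-file-source
connector.class=FileStreamSource
tasks.max=1
file=/var/log/app.log
topic=app-log-lines
```

Run with the standalone worker, e.g. `bin/connect-standalone.sh config/connect-standalone.properties connect-file-source.properties`; the distributed mode the talk covers uses the same connector configuration but submits it to a worker cluster for fault-tolerant execution.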


Ewen Cheslack-Postava

Ewen Cheslack-Postava is a Kafka committer and engineer at Confluent building a stream data platform based on Apache Kafka to help organizations reliably and robustly capture and leverage all their real-time data. He received his PhD from Stanford University where he developed Sirikata...

Tuesday May 10, 2016 11:20am - 12:10pm PDT
Regency A

2:00pm PDT

Building a Self-serve Kafka Ecosystem - Joel Koshy, LinkedIn
Apache Kafka has enjoyed widespread adoption as a messaging backbone for data pipelines and stream processing platforms.

LinkedIn runs one of the largest known deployments of Kafka and serves hundreds of applications within the company. As new use cases for Kafka emerge, it is becoming increasingly critical to provide self-serve features for users without always having to engage Kafka specialists. Users need to create topics with non-default configurations and easily examine various topic metadata and schemas; topic owners may want to specify authorization rules and be able to encrypt their data. Furthermore, Kafka brokers need mechanisms in place to protect against rogue clients that impact the cluster and other clients.
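For reference, the operations described above map onto Kafka's stock admin tooling, which a self-serve layer would wrap. As an illustrative sketch (the topic name, configuration values, and principal below are hypothetical, and these commands require a running cluster):

```shell
# Create a topic with a non-default retention setting.
bin/kafka-topics.sh --zookeeper localhost:2181 --create \
  --topic member-activity --partitions 16 --replication-factor 3 \
  --config retention.ms=172800000

# Grant a consuming application read access (an authorization rule).
bin/kafka-acls.sh --authorizer-properties zookeeper.connect=localhost:2181 \
  --add --allow-principal User:activity-consumer \
  --operation Read --topic member-activity
```

A self-serve portal would perform these steps on the user's behalf, adding ownership tracking and quota enforcement on top.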

In this talk we will describe how we are addressing these practical challenges in providing a truly multi-tenant messaging service.


Joel Koshy

Joel Koshy is a Staff Software Engineer in LinkedIn’s Data Infrastructure team. He is also a PMC member and committer on the Apache Kafka project. Joel has worked on distributed systems infrastructure and applications for the past eight years. Prior to LinkedIn, he was with the...

Tuesday May 10, 2016 2:00pm - 2:50pm PDT
Regency A

3:00pm PDT

Apache Flume or Apache Kafka? How About Both? - Jayesh Thakrar, Conversant
Flume and Kafka are seen by some as serving the same function and are often considered mutually exclusive. This presentation is about an implementation where both are used together as parts of a heterogeneous streaming data pipeline.
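One common way the two combine is to use Flume for edge collection and Kafka as the durable transport. As a hedged sketch, a Flume agent can tail a log and deliver events to Kafka via the stock KafkaSink that ships with Flume 1.6+ (the log path and topic name here are hypothetical, not the pipeline described in the talk):

```properties
# flume-kafka.conf — minimal Flume agent feeding a Kafka topic.
agent.sources = logSrc
agent.channels = memCh
agent.sinks = kafkaSink

agent.sources.logSrc.type = exec
agent.sources.logSrc.command = tail -F /var/log/app.log
agent.sources.logSrc.channels = memCh

agent.channels.memCh.type = memory
agent.channels.memCh.capacity = 10000

agent.sinks.kafkaSink.type = org.apache.flume.sink.kafka.KafkaSink
agent.sinks.kafkaSink.brokerList = localhost:9092
agent.sinks.kafkaSink.topic = app-logs
agent.sinks.kafkaSink.channel = memCh
```

Downstream consumers then read from Kafka independently of the collection tier, which is what makes the heterogeneous pipeline the abstract describes possible.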

The presentation will cover the evolution of the pipeline and how it grew from being designed to handle 20 billion log lines a day to handling more than 90 billion. It will also cover Flume customizations for ensuring data uniqueness, as well as for allowing fractional bifurcation of data from production to QA systems for continuous regression testing.

Finally, the presentation will cover monitoring of the pipeline from a holistic view as well as a detailed drill-down and associated alerting.


Jayesh Thakrar

Sr. Software Engineer, Conversant
Jayesh Thakrar is a Sr. Data Engineer at Conversant (http://www.conversantmedia.com/). He is a data geek who gets to build and play with large data systems consisting of Hadoop, Spark, HBase, Cassandra, Flume and Kafka. To rest after a good day's work, he uses OpenTSDB with 500+ million...

Tuesday May 10, 2016 3:00pm - 3:50pm PDT
Regency A