Apache: Big Data 2016 has ended

Managing Distributed Systems
Monday, May 9

10:40am PDT

Building Distributed Systems Using Apache Helix - Aditya Auradkar, LinkedIn
Building distributed data systems is hard, especially systems that store and process vast quantities of data and yet need to be operable. Problems like partition management, resource distribution, state management, and leader election are often solved in isolation by various big data systems. Wouldn’t it be nice if we had one mechanism with which to model these concepts across systems?

Apache Helix aims to do exactly that. It is a generic cluster management framework that can be used to manage resources spread across a pool of nodes. This talk will provide an overview of Apache Helix and how different data systems at LinkedIn leverage it to solve some of their hardest problems in a uniform manner.


Aditya Auradkar

Engineering Manager, Uber
Aditya manages the Streaming Data platform team at Uber. Example use cases include pub-sub style event transport, streaming/batch analytics, and ingestion. Previously at LinkedIn, he managed the Apache Kafka engineering team and was one of the earliest members of the...

Monday May 9, 2016 10:40am - 11:30am PDT
Regency A

4:10pm PDT

YARN: A Resource Manager for Analytic Platform - Tsuyoshi Ozawa, NTT
Hadoop Yet Another Resource Negotiator (YARN) is a resource manager for processing big data. YARN can run various major distributed processing frameworks including not only MapReduce, but also Spark, Tez, Flink and so on to support various workloads.
I will talk about the architecture of YARN, how YARN manages resources in a cluster, and how YARN is integrated with these processing frameworks. In particular, I will introduce best practices for maximizing the throughput of processing frameworks on YARN, using the optimization techniques of Spark on YARN and Tez on YARN as examples. I will also talk about the areas where our YARN community needs more help and feedback.

Tsuyoshi Ozawa

I’m a Research Engineer working on distributed computing at NTT (Nippon Telegraph and Telephone Corporation), one of the largest carriers in Japan. I’ve been a committer and PMC member on the Apache Hadoop project. Prior to working on Hadoop, I researched the time...

Monday May 9, 2016 4:10pm - 5:00pm PDT
Regency A

5:10pm PDT

Large Scale SolrCloud Cluster Management via APIs - Anshum Gupta, IBM Watson
Apache Solr is widely used by organizations to power their search platforms, which often support multiple users. Many cluster management APIs were introduced over the last few releases, allowing users to manage operations ranging from replica placement to forcing leader elections via API calls. By the end of this talk, intermediate Solr users will understand what is available and when they can avoid direct interference with the system, leading to more stable clusters and a lower chance of nodes going down. Attendees will also be much better equipped to build their own SolrCloud cluster management tools. I will also talk about when not to use these APIs and what is planned in the near future to handle specific operational use cases.

Anshum Gupta

Sr. Software Engineer, IBM Watson
Anshum Gupta is a Lucene/Solr committer and PMC member with over 10 years of experience with search. He is a part of the search team at IBM Watson, where he works on extending the limits and improving SolrCloud. Prior to this, he was a part of the open source team at Lucidworks and...

Monday May 9, 2016 5:10pm - 6:00pm PDT
Regency A