Apache: Big Data 2016 has ended

Monitoring-Benchmarking
Wednesday, May 11

2:00pm PDT

HiBench - The Benchmark Suite for Hadoop, Spark and Streaming - Carson Wang, Intel
HiBench is an open source, Apache-licensed big data benchmark suite that helps evaluate different big data frameworks in terms of speed, throughput and system resource utilization. It contains a set of Hadoop, Spark and streaming workloads, including Sort, WordCount, TeraSort, PageRank, Bayes, Kmeans, enhanced DFSIO, etc. It also contains several streaming workloads for Spark Streaming, Storm and Samza. In this presentation, Carson Wang will introduce the features of HiBench and walk through how to use it to benchmark different big data frameworks. The talk will also cover tuning guides for workloads with different characteristics.


Carson Wang

Carson Wang is a software engineer on Intel's big data team. He is an active open source contributor to the Spark and Tachyon projects.

Wednesday May 11, 2016 2:00pm - 2:50pm PDT
Georgia B

3:00pm PDT

Monitoring in a Distributed World - Felix Massem, codecentric AG
The IT infrastructure for distributed applications is getting bigger and more complex every day. As a result, the sheer volume of observed events keeps growing. To ensure safe IT operations, we also need a distributed and scalable monitoring architecture to evaluate these events. This session shows how to build such an architecture on open source software.
Starting with some basics of monitoring IT infrastructure and applications, we will look at key terms such as monitoring, alerting, diagnostics and reporting. Based on this, we will build up a monitoring architecture step by step.
We will elaborate on and integrate the following modules: log file shipping and analysis (logstash), system monitoring (collectD), event storage (elasticsearch), metric generator and storage (statsd and graphite) as well as different dashboards (grafana, seyren, kibana).
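
As a rough illustration of the metric-generation piece, here is a minimal Python sketch of how an application might emit a metric to StatsD, which would then typically forward aggregates to Graphite. It is not taken from the talk. The metric names, the host, and the helper functions are hypothetical; the sketch assumes a StatsD daemon listening on its customary UDP port 8125 and uses the plain-text line format `<name>:<value>|<type>`.

```python
import socket


def format_statsd_metric(name: str, value: float, metric_type: str = "c") -> bytes:
    """Build one StatsD line of the form <name>:<value>|<type>.

    Common types are "c" (counter), "g" (gauge) and "ms" (timer).
    """
    return f"{name}:{value}|{metric_type}".encode("ascii")


def send_metric(payload: bytes, host: str = "127.0.0.1", port: int = 8125) -> None:
    """Send the payload as a single UDP datagram (fire and forget).

    The host and port are assumptions for this sketch; adjust them to
    wherever the StatsD daemon actually runs.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(payload, (host, port))


if __name__ == "__main__":
    # Hypothetical metric names, chosen only for illustration.
    send_metric(format_statsd_metric("app.logins", 1))
    send_metric(format_statsd_metric("app.request_time", 12.5, "ms"))
```

Because the transport is UDP, the application does not wait for an acknowledgement, which keeps the instrumentation overhead small at the cost of possibly losing individual datagrams.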


Felix Massem

codecentric AG
Felix Massem works as a consultant for codecentric AG. His main focus is Continuous Delivery and technologies around infrastructure as code and log analysis. Besides this, he is most interested in topics like DevOps, Data Mining and Big Data technologies. As an author...

Wednesday May 11, 2016 3:00pm - 3:50pm PDT
Georgia B

4:10pm PDT

Effective HBase Healthcheck and Troubleshooting - Jayesh Thakrar, Conversant
We all know HBase as a robust, resilient, scalable, and performant big data datastore. Once configured well, it can run hands-off for months without any maintenance or care and feeding. The only occasional attention needed is hardware maintenance and system troubleshooting. Since an HBase cluster is often made up of several servers and the system could be on "auto-pilot", it's the applications that may notice problems first when they occur. At those times, the root cause or symptom needs to be identified and resolved quickly.

Other than HDFS itself, HBase is probably the oldest and most mature component of the Hadoop ecosystem, and it is bundled with a number of tools and utilities. This presentation will cover how to effectively make them part of your troubleshooting toolbox, as well as how to formulate your own key performance and health indicators.

Jayesh Thakrar

Sr. Software Engineer, Conversant
Jayesh Thakrar is a Sr. Data Engineer at Conversant (http://www.conversantmedia.com/). He is a data geek who gets to build and play with large data systems consisting of Hadoop, Spark, HBase, Cassandra, Flume and Kafka. To rest after a good day's work, he uses OpenTSDB with 500+ million...

Wednesday May 11, 2016 4:10pm - 5:00pm PDT
Georgia B