Apache: Big Data 2016 has ended


SQL Interaction
Tuesday, May 10
 

9:00am

SQL on Hadoop/Big Data - Architecture, Technology and Roadmap - Sumit Pal, Big Data Consultant
Talk Topic - "SQL on Hadoop - Architecture, Technology and Road Ahead"

This talk gives an exhaustive overview of how SQL is done on Hadoop, with a focus
on low-latency SQL on Hadoop.
It covers the various open source and commercial tools for SQL on Hadoop and their
internal architectures, including Hive, Hive on Tez, Spark SQL, Impala, Apache
Drill, Presto and Tachyon-based architectures.

The talk also explains how SQL can be applied to structured, unstructured and streaming
data and the concepts behind each, with demos of SQL over JSON, structured and
streaming data.

The talk also covers upcoming changes in this field, with products such as OLAP
on Hadoop, BlinkDB, NuoDB and HTAP-based solutions.

Speakers

Sumit Pal

Big Data Consultant
Sumit has more than 22 years of experience in the software industry in various roles, at companies ranging from startups to enterprises. He is a big data, visualisation and data science consultant, a software architect and a big data enthusiast, and builds end-to-end data-driven analytic...



Tuesday May 10, 2016 9:00am - 9:50am
Georgia A

10:00am

Apache Hive 2.0: SQL, Speed, Scale - Alan Gates, Hortonworks
Apache Hive is the most commonly used SQL interface for Hadoop. To meet users' data warehousing needs it must scale to petabytes of data, provide the necessary SQL, and perform in interactive time. The Hive community is working towards a 2.0 release of Hive that includes significant improvements. These include:
* LLAP, a daemon layer that enables sub-second response time.
* HBase to store Hive’s metadata, resulting in significantly reduced planning time.
* Expanding Hive’s support for managing changing data in a transactionally consistent way with SQL MERGE.
* Using Apache Calcite to enable Hive to use multiple storage engines (e.g. HBase).
This talk will cover the use cases these changes enable, the architectural changes being made in Hive as part of building these features, and share performance test results on how these improvements are speeding up Hive.
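The SQL MERGE item above combines inserts, updates and deletes against a table in a single statement. As a hedged illustration of the general MERGE semantics only (not of how Hive executes it), the toy Python model below applies WHEN MATCHED (update or delete) and WHEN NOT MATCHED (insert) clauses to an in-memory table keyed on one column; the `merge` function, its parameters and the sample inventory data are hypothetical.

```python
from typing import Callable, Dict, List, Optional

Row = Dict[str, int]

def merge(
    target: Dict[int, Row],
    source: List[Row],
    key: str,
    when_matched: Callable[[Row, Row], Optional[Row]],
    when_not_matched: Callable[[Row], Row],
) -> Dict[int, Row]:
    """Toy model of SQL MERGE over an in-memory table keyed by `key` (hypothetical helper).

    when_matched returns the updated row, or None to delete the target row;
    when_not_matched builds the row to insert for a source row with no match.
    """
    result = dict(target)  # leave the caller's table untouched
    seen = set()
    for src in source:
        k = src[key]
        # SQL MERGE rejects a target row matched by more than one source row;
        # this simplified model rejects any duplicate source key.
        if k in seen:
            raise ValueError(f"duplicate source key {k!r}")
        seen.add(k)
        if k in target:
            new_row = when_matched(target[k], src)
            if new_row is None:
                del result[k]                    # WHEN MATCHED ... THEN DELETE
            else:
                result[k] = new_row              # WHEN MATCHED ... THEN UPDATE
        else:
            result[k] = when_not_matched(src)    # WHEN NOT MATCHED THEN INSERT
    return result

# Hypothetical example: apply stock changes, deleting items whose change qty is 0.
inventory = {1: {"id": 1, "qty": 10}, 2: {"id": 2, "qty": 5}}
changes = [{"id": 1, "qty": 3}, {"id": 2, "qty": 0}, {"id": 3, "qty": 7}]
merged = merge(
    inventory, changes, "id",
    when_matched=lambda t, s: None if s["qty"] == 0 else {"id": t["id"], "qty": t["qty"] + s["qty"]},
    when_not_matched=lambda s: dict(s),
)
print(merged)  # {1: {'id': 1, 'qty': 13}, 3: {'id': 3, 'qty': 7}}
```

The point of MERGE is that one pass over the changed data decides per row whether to update, delete or insert, instead of issuing three separate statements.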

Speakers

Alan Gates

Co-founder and Architect, Hortonworks
Alan is a founder of Hortonworks and an original member of the engineering team that took Pig from a Yahoo! Labs research project to a successful Apache open source project. Alan has done extensive work in Hive, including adding ACID transactions. Alan has a BS in Mathematics from...


Tuesday May 10, 2016 10:00am - 10:50am
Georgia A

11:20am

Get the Best Out of Hive and Spark - Xuefu Zhang, Uber
Apache Hive is widely used for batch-oriented SQL workloads for ETL and data analytics in the Hadoop ecosystem. No other available SQL-on-Hadoop tool has matched its rich feature set; in fact, many of these tools are tied to and depend on Hive in one way or another. Apache Spark, on the other hand, offers a general data processing framework positioned to replace MapReduce with faster data processing and more efficient memory utilization. Moreover, one doesn't have to abandon one for the other or juggle between the two to get both sets of benefits: Hive on Spark retains Hive’s feature richness while providing faster SQL-on-Hadoop execution. As Hive on Spark is increasingly adopted for production use, this presentation shares best practices for deployment and performance tuning that let you get the best out of both projects.

Speakers

Xuefu Zhang

Software Engineer, Uber Technologies
Xuefu Zhang has over 10 years’ experience in software development. Earlier this year he joined Uber as a software engineer from Cloudera, where he focused mainly on Apache Hive and Pig. He also worked in the Hadoop team at Yahoo when the majority of the development on...


Tuesday May 10, 2016 11:20am - 12:10pm
Georgia A

2:00pm

Hive on ACID - Alan Gates, Hortonworks
Apache Hive provides SQL access for data in Hadoop. Traditionally, data in Hadoop is write-once, read-many. But with traditional data warehousing use cases moving to Hadoop, there is a need to support transactional update and delete of records. Hive has recently implemented ACID-compliant row-level insert, update, and delete, as well as very low latency ingestion of streaming data from tools like Storm and Flume. This is done with snapshot isolation between queries. This talk will cover the intended use cases, the architectural challenges of implementing updates and deletes in a write-once file system, and details of the changes to the file storage formats and transaction management system.
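The abstract names the core difficulty: rows cannot be changed in place on a write-once file system. Hive's general approach is to leave base files immutable, write changes into separate delta files, and merge them at read time, with compaction folding deltas back into new base files. The toy Python model below sketches that merge-on-read idea together with snapshot isolation: a reader sees only transactions at or below a high-water mark that are not in a set of open or aborted transactions. It is a simplified illustration under those assumptions, not Hive's file format or API; `DeltaEvent`, `read_snapshot` and the sample data are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, List, Optional

@dataclass(frozen=True)
class DeltaEvent:
    txn_id: int                   # transaction that wrote the event
    op: str                       # "insert", "update" or "delete"
    row_id: int                   # stable identity of the logical row
    value: Optional[str] = None   # new value for insert/update

def read_snapshot(
    base: Dict[int, str],
    deltas: List[DeltaEvent],
    high_water_mark: int,
    invalid_txns: FrozenSet[int],
) -> Dict[int, Optional[str]]:
    """Merge-on-read (hypothetical helper): overlay visible delta events on the immutable base.

    A transaction is visible if its id is at or below the high-water mark
    and it is not in the invalid (open or aborted) set.
    """
    rows: Dict[int, Optional[str]] = dict(base)  # base data is never rewritten in place
    # Apply events in transaction order; sorted() is stable, so events from
    # the same transaction keep their original relative order.
    for ev in sorted(deltas, key=lambda e: e.txn_id):
        if ev.txn_id > high_water_mark or ev.txn_id in invalid_txns:
            continue  # not committed as of this snapshot
        if ev.op == "delete":
            rows.pop(ev.row_id, None)
        else:
            rows[ev.row_id] = ev.value
    return rows

base = {1: "alice", 2: "bob"}
deltas = [
    DeltaEvent(5, "update", 1, "alice-v2"),
    DeltaEvent(6, "delete", 2),
    DeltaEvent(7, "insert", 3, "carol"),
    DeltaEvent(8, "insert", 4, "dave"),
]
# Snapshot taken while txn 6 was still open and before txn 8 started.
print(read_snapshot(base, deltas, high_water_mark=7, invalid_txns=frozenset({6})))
# {1: 'alice-v2', 2: 'bob', 3: 'carol'}
```

Because the snapshot is just a (high-water mark, invalid set) pair, concurrent writers can keep appending delta files without disturbing queries that are already running.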

Speakers

Alan Gates

Co-founder and Architect, Hortonworks
Alan is a founder of Hortonworks and an original member of the engineering team that took Pig from a Yahoo! Labs research project to a successful Apache open source project. Alan has done extensive work in Hive, including adding ACID transactions. Alan has a BS in Mathematics from...


Tuesday May 10, 2016 2:00pm - 2:50pm
Georgia A

3:00pm

Using Kafka and Kudu for Fast, Low-latency SQL Analytics on Streaming Data - Mike Percy & Ashish Singh, Cloudera
Apache Kudu (incubating) is a fast new columnar data store for the Hadoop ecosystem designed to enable high-performing, flexible analytic pipelines. In this talk, Mike Percy and Ashish Singh will demonstrate how Apache Kafka can be combined with Kudu to achieve low latency, high throughput analytics on streaming data. We will compare various approaches to building such a solution and demonstrate a working system for analyzing tweets in real time by combining Kafka, Kudu, and Apache Impala (incubating).

Speakers

Mike Percy

Software Engineer, Cloudera
Mike Percy is a software engineer at Cloudera and a PMC member on Apache Kudu, an open source distributed column store for the Hadoop ecosystem. He is also a PMC member on Apache Flume. Prior to joining Cloudera, Mike worked at Yahoo! building machine learning infrastructure for Big...

Ashish Singh

Software Engineer, Cloudera
Ashish Singh is a software engineer at Cloudera, working to empower the Hadoop ecosystem to answer bigger questions. Ashish studied Computer Science and Engineering at Ohio State University. Before working in the Big Data space, he worked on optimizing MPI collective communications...



Tuesday May 10, 2016 3:00pm - 3:50pm
Georgia A