Apache: Big Data 2016 has ended

State-Future of $foo
Monday, May 9

10:40am PDT

Apache Hadoop 3 Current Status - Akira Ajisaka, NTT DATA
Do you want a Hadoop 3 release? It has been over 4 years since Hadoop 3 and Hadoop 2 diverged, and Hadoop 3 contains many significant improvements, such as the Shell Script Rewrite and MapReduce Native Optimization. Once Hadoop 3 is released, users will be able to benefit from these new features.
In this session, we will introduce the new features and incompatible changes in Hadoop 3 and explain how the release is being discussed in the Apache Hadoop community. In addition, Akira Ajisaka would like to discuss releasing Hadoop 3 with the participants, if possible.

Akira Ajisaka

Software Engineer, NTT DATA Corporation
Akira Ajisaka is a software engineer working at NTT DATA, Japan. He belongs to the OSS Professional Services team, where he deploys and operates Hadoop clusters for customers. He sometimes troubleshoots them by investigating the source code and creating patches to fix the problem. He is an Apache...

Monday May 9, 2016 10:40am - 11:30am PDT
Regency B

11:40am PDT

Recent Development in HBase - Zhihong Yu, Hortonworks
HBase has been powering a variety of applications for the past 8 years.
In this presentation, I will talk about the following recent developments:

Enhancement to compaction: Properly selecting and tuning the compaction strategy is at the heart of providing consistent performance. The FIFO compaction policy collects expired store files. Since no real compaction is done, no CPU or IO (disk and network) is consumed. This results in improved throughput and latency for both writes and reads.

Region normalization feature: This serves time series data well (in combination with the FIFO compaction policy) where a non-default TTL is specified. As aging data is archived, adjacent empty regions are continuously merged. This keeps the table in well-managed shape.

Bulk Loaded HFile Replication: HBase replication is enhanced to support replication of bulk loaded data. This completes the disaster tolerance scenario.
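
To make the first two items concrete, here is a minimal Python sketch of the ideas as the abstract describes them. It is a toy simulation, not HBase code: StoreFile, Region, fifo_compact, and merge_adjacent_empty are hypothetical names invented for illustration, and the real normalizer likely considers more than whether a region is empty.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class StoreFile:
    """Hypothetical stand-in for a store file (not the real HBase class)."""
    name: str
    max_timestamp: int  # newest cell timestamp in the file, in seconds


def fifo_compact(files: List[StoreFile], now: int, ttl: int) -> List[StoreFile]:
    """Keep only files that still hold at least one unexpired cell.

    A file is dropped whole once even its newest cell is older than the TTL,
    so no data is read or rewritten.
    """
    return [f for f in files if now - f.max_timestamp <= ttl]


@dataclass
class Region:
    """Hypothetical stand-in for a region covering [start_key, end_key)."""
    start_key: str
    end_key: str
    size_mb: int


def merge_adjacent_empty(regions: List[Region]) -> List[Region]:
    """Merge runs of adjacent empty regions; input must be sorted by start key."""
    merged: List[Region] = []
    for region in regions:
        prev = merged[-1] if merged else None
        if prev is not None and prev.size_mb == 0 and region.size_mb == 0:
            # Extend the previous empty region so it also covers this one.
            merged[-1] = Region(prev.start_key, region.end_key, 0)
        else:
            merged.append(region)
    return merged


files = [StoreFile("f1", 100), StoreFile("f2", 900)]
print([f.name for f in fifo_compact(files, now=1000, ttl=300)])  # prints ['f2']

regions = [Region("a", "g", 0), Region("g", "m", 0),
           Region("m", "t", 512), Region("t", "z", 0)]
print([(r.start_key, r.end_key) for r in merge_adjacent_empty(regions)])
# prints [('a', 'm'), ('m', 't'), ('t', 'z')]
```

In this sketch, expiry costs only a metadata check per file, which is the reason the abstract gives for the savings in CPU and IO.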


Zhihong Yu

Staff Engineer, VMware
I have been an Apache HBase PMC member for five and a half years. I am also a committer for Apache Slider and Apache Bahir. I contribute to Apache Phoenix and Apache Spark. I have presented at the past 3 ApacheCon NA events.

Monday May 9, 2016 11:40am - 12:30pm PDT
Regency B

2:00pm PDT

Apache Bigtop: Overview and 2016 Community Update - Nate DAmico, Reactor 8 & Konstantin Boudnik, Memcore
Apache Bigtop is setting the standard for the integration, testing and deployment of the leading open source Big Data components. In this presentation we will give an introductory overview of Bigtop and its origins, including usage in the wild with various commercial vendors and industry standards based on it. We will also cover how the project is evolving in 2016 and where the community is driving it with lots of new participating components being added, such as Apache Apex (Incubating), Apache Zeppelin (Incubating) and Flink.

Konstantin Boudnik

CEO, Memcore
Dr. Konstantin Boudnik, co-founder and CEO of Memcore Inc, is one of the early developers of Hadoop and a co-author of Apache Bigtop, the open source framework and the community around creation of software stacks for data processing projects. With more than 20 years of experience in...

Nate DAmico

Nate has been working in the enterprise and mobile software industry for 14 years in various capacities. In recent years his tech efforts have focused around areas of mobile computer vision as well as the rise of the consumerization of IT Operations. Three years ago he started Reactor8...

Monday May 9, 2016 2:00pm - 2:50pm PDT
Regency B

3:00pm PDT

Dockerized Hadoop Platform and Recent Updates in Apache Bigtop - Amir Sanjar, IBM & Yu-Hsin Yeh, Trend Micro
Apache Bigtop is a project that focuses on packaging, testing, and configuration management solutions for the Hadoop ecosystem. In this presentation, we’ll talk about how Bigtop Provisioner integrates with Docker Swarm, Docker Compose, and Docker Machine to give you the ability to run a fully distributed Hadoop cluster on Docker anywhere. In addition, the newly developed image pre-build feature substantially improves the user experience by cutting the provisioning time to less than a minute. Another exciting piece of work in Bigtop over the past few months is the IBM PowerPC integration. To sum up, this talk covers:
1) How Bigtop Provisioner integrates with the Docker ecosystem to achieve multi-host Hadoop cluster deployment.
2) The integration of IBM PowerPC with Apache Bigtop.
3) Newly added Hadoop ecosystem components and some new features we’ve developed recently.

Amir Sanjar

Sr. Software Eng, IBM - Apache Bigtop PMC
Amir Sanjar has many years of experience in big data software and solution development at companies including IBM and Canonical. He is the inventor of several patents in areas of enterprise solution automation and wireless/cell technology. Currently, he leads big data ecosystem and...
Evans Ye

ASF member, Apache Bigtop Committer/PMC member/Former VP, Director of Taiwan Data Engineering Association, Apache Software Foundation
Yu-Hsin Yeh (Evans Ye) is a former VP of Apache Bigtop and currently a committer and PMC member. He loves to code, automate things, and tackle big data challenges. Aside from engineering, he is also enthusiastic about giving talks to share software innovations and cutting-edge technologies...

Monday May 9, 2016 3:00pm - 3:50pm PDT
Regency B

4:10pm PDT

On the Bleeding Edge - Cassandra 3.4 and Beyond - Jonathan Haddad, DataStax
Cassandra is recognized as the best distributed database leveraging continuous availability and partition-tolerance for global deployments. With a strong open source history that began at Facebook to solve problems of absurdly massive scale, Cassandra has grown to be a huge project with a bright future. In this talk we will unpack exactly what that future is all about. With a brand new, high performance Secondary Index implementation, SSTable encryptions, a paradigm shift in architecture moving away from SEDA and towards threads per core, Materialized Views and Aggregations, Cassandra is maturing as a powerful front-runner on the bleeding edge of the NoSQL space.

Jon Haddad

Evangelist for Apache Cassandra, DataStax
Jon has 15 years of experience in both development and operations. For 10 years he’s worked at various startups in southern California. For 2 years he was the maintainer of cqlengine, the Python object mapper for Cassandra, now integrated into the native Cassandra driver. He’s...

Monday May 9, 2016 4:10pm - 5:00pm PDT
Regency B

5:10pm PDT

Apache Tika - What’s New with 2.0? - Nick Burch, Quanticate
Apache Tika detects and extracts metadata and text from a huge range of file formats and types. From Search to Big Data, single file to internet scale, if you’ve got files, Tika can help you get out useful information!

Apache Tika has been around for nearly 10 years now, and with the passage of all that time, plus the new 2.0 release, a lot has changed. Not only has there been a huge increase in the number of supported formats, but the ways of using Tika have expanded, and some of the philosophies on the best way to handle things have altered with experience. Tika has also gained support for a wide range of programming languages and, more recently, Big Data-scale support.

Whether you’re an old hand with Tika looking to know what’s hot or different with 2.0, or someone new looking to learn more about the power of Tika, this talk will have something in it for you!


Nick Burch

CTO, Quanticate
Nick began contributing to Apache projects in 2003, and hasn't looked back since! He's mostly involved in "Content" projects like Apache POI, Apache Tika and Apache Chemistry, as well as foundation-wide activities like Conferences and Travel Assistance. Nick is CTO at Quanticate, a...

Monday May 9, 2016 5:10pm - 6:00pm PDT
Regency B