Apache: Big Data 2016 has ended

HDFS-Storage
Tuesday, May 10
 

9:00am

Shared or Distributed HDFS - What’s Right for Me? - Janet George, SanDisk
Currently, there are two competing architectures for implementing HDFS. The original HDFS approach uses storage colocated with the compute servers. An emerging alternative relies on dedicated storage resources shared by the compute cluster. This talk will compare and contrast these two approaches and provide definitive quantitative guidelines to help planners and architects identify the best solutions for their needs.

Speakers
Janet George

Fellow, Chief Data Scientist Big Data Platform/Data Science/Cognitive Computing, SanDisk
At SanDisk, Janet is involved with building global core competencies, shaping, driving and implementing the Big Data platform, products and technologies, using advanced analytics and pattern matching with semiconductor manufacturing data from the ground up. Janet's industry experience...


Tuesday May 10, 2016 9:00am - 9:50am
Regency B

11:20am

HDFS and Private Cloud - Janet George, SanDisk
Traditionally, Hadoop clusters have been built using dedicated hardware separated from the rest of the data center IT infrastructure. The rapid growth of HDFS/Hadoop/Spark/YARN applications makes it desirable to share the services-oriented virtualized infrastructure commonly known as private cloud. While virtualizing compute and network interconnectivity is a relatively well-solved problem, virtualizing the HDFS storage component into the private cloud has unique challenges. This talk explores those challenges and offers multiple prescriptive solutions, along with criteria that allow planners and architects to meaningfully compare and contrast the different approaches.

Speakers
Janet George

Fellow, Chief Data Scientist Big Data Platform/Data Science/Cognitive Computing, SanDisk
At SanDisk, Janet is involved with building global core competencies, shaping, driving and implementing the Big Data platform, products and technologies, using advanced analytics and pattern matching with semiconductor manufacturing data from the ground up. Janet's industry experience...


Tuesday May 10, 2016 11:20am - 12:10pm
Regency B

2:00pm

Leveraging YCSB for Your Project - Sean Busbey, Cloudera
YCSB is an open source framework for evaluating data storage systems that has become the de facto standard for use with NoSQL projects. After several quiet years, the project has returned to life as a community effort, which has led to substantial improvements in utility and system coverage. Sean Busbey will review the last six months of development, explain how users of Apache projects can get a better understanding of their preferred storage system, and discuss ways some folks proactively use YCSB to improve the quality of their storage project.

Speakers
Sean Busbey

Cloudera
Sean Busbey currently works at Cloudera as a software engineer on distributed storage systems. In addition to being a Member of the Apache Software Foundation, he is actively involved in several projects, including HBase, Yetus, Avro, NiFi, and Accumulo. Outside of the ASF, he is...


Tuesday May 10, 2016 2:00pm - 2:50pm
Regency B