Apache: Big Data 2016 has ended

NoSQL
Tuesday, May 10

9:00am PDT

Wikimedia Content API: A Cassandra Use-case - Eric Evans, Wikimedia Foundation
The Wikimedia Foundation is a charitable organization with a vision of a world where everyone can freely share in the sum of all knowledge. Each month it serves over 18 billion page views to 500 million unique visitors around the world.

Among the resources offered by Wikimedia is an API providing low-latency access to full-history content, in many formats. Its results are often the product of computationally intensive transforms, and must be pre-generated and stored to meet latency expectations. Unsurprisingly, there are many challenges to providing low-latency access to such a large data-set, in a demanding, globally distributed environment.

This talk will cover the Wikimedia content API and its use of Apache Cassandra as storage for a diverse and growing set of use-cases. Trials, tribulations, and triumphs, of both a development and operational nature will be discussed.

Eric Evans

Senior Software Engineer, Wikimedia Foundation
Eric has more than a decade of experience with the engineering and operations of large-scale distributed systems. He joined Rackspace as a startup, and implemented a global DNS infrastructure utilizing IP anycast (possibly the first), and a novel data-center-wide IDS for which a patent...

Tuesday May 10, 2016 9:00am - 9:50am PDT
Plaza B

10:00am PDT

SASI - A Revolution for Secondary Indexes in Cassandra - Hanneli Tavante, Codeminer 42
Secondary indexes in Cassandra are a frequent topic of discussion. Alongside the benefits they bring, such as better views of and sorting for data, they can introduce performance issues that you have to handle.
Several strategies have been presented, and in 2015 Apple open-sourced its secondary index implementation, called SSTableAttachedSecondaryIndex, or just SASI (GitHub: https://github.com/xedin/sasi). The main goal of this talk is to show the benefits of this implementation and how it can be used to reduce performance issues. A step-by-step walkthrough of the implementation will also be provided, explaining the insights behind the chosen data structures and the project's general architecture.

Hanneli Tavante

Hanneli is a software developer at Codeminer 42. She enjoys learning new programming languages, blowing capacitors and helping the community by organising meetups (Neo4j, Cassandra, Rust, Science) and presenting talks around the globe. She also likes Math, Lego, dogs, hardware and...

Tuesday May 10, 2016 10:00am - 10:50am PDT
Plaza B

11:20am PDT

Cassandra Multi-datacenter Operations Essentials - Julien Anguenot, iland Internet Solutions, Corp
Apache Cassandra operations have a reputation for being quite simple on single-datacenter and/or low-volume clusters, but they become far more complex on high-latency multi-datacenter clusters: basic operations such as repair, compaction, or hints delivery can have dramatic consequences even on a healthy cluster.

In this presentation, Julien will go through Cassandra operations in detail: bootstrapping new nodes and/or datacenters, repair strategies, compaction strategies, GC tuning, OS tuning, large-batch data removal, and Apache Cassandra upgrade strategy.

Julien will give you tips and techniques for anticipating issues inherent to multi-datacenter clusters: how and what to monitor, hardware and network considerations, and data-model and application-level design anti-patterns that can affect your multi-datacenter cluster's performance.

Julien Anguenot

VP Software Engineering, iland Internet Solutions, Corp
Julien is an accomplished and organized software craftsman with a creative and entrepreneurial spirit. Julien serves as iland’s Vice President of Software Engineering and is responsible for the strategic vision and development of iland’s Cloud Services platform. Under his leadership...

Tuesday May 10, 2016 11:20am - 12:10pm PDT
Plaza B

2:00pm PDT

Zipkin & Apache Cassandra: A Big Data Tracing Case Study - Mick Semb Wever, The Last Pickle
Monitoring provides information on system performance; however, tracing is necessary to understand the performance of individual requests.

Systems such as Zipkin, Dapper, and HTrace provide distributed tracing, as does CQL request tracing in Apache Cassandra. Such tracing is invaluable when diagnosing individual requests, yet knowing which database queries to trace and why they were made still requires deep technical knowledge. And while each solution provides insight, the problem of providing a single tracing view across a distributed application stack remains.

This talk will introduce using Zipkin to record Cassandra request traces, to provide a single tracing view from HTTP server to database. Starting with CQL request tracing, we will move on to Zipkin, ongoing work to record request traces via Zipkin, and the efforts of the OpenTracing community to create a common tracing API.

Mick Semb Wever

Team Member, The Last Pickle
Mick Semb Wever works at The Last Pickle helping customers deliver and improve Apache Cassandra based solutions. Prior to TLP he spent seven years at FINN.no building their Microservices platform utilizing Apache Cassandra, Hadoop, Spark and Kafka. He is the PMC Chair for Apache Tiles...

Tuesday May 10, 2016 2:00pm - 2:50pm PDT
Plaza B

3:00pm PDT

From Big Data to Mobile Data with Apache CouchDB and PouchDB - Bradley Holt, IBM Cloudant
It’s all too easy for mobile app developers to assume that their apps will run on fast and reliable networks. The reality for end users, though, is often slow, unreliable networks with spotty coverage. What happens when the network doesn’t work, or when a device is in airplane mode? You get unhappy, frustrated users. One solution is to take an offline-first approach. An offline-first app is an app that works, without error, when there is no network connection. Offline-first apps built with Apache CouchDB and PouchDB (an open source JavaScript database) can provide better, faster user experiences by storing data locally and then synchronizing with a cloud database when a network connection is available.

Bradley Holt

Developer Advocate, IBM Cloud Data Services
Bradley Holt is a Developer Advocate with IBM Cloud Data Services. He is the author of several publications including Scaling CouchDB and Writing and Querying MapReduce Views in CouchDB (both published by O'Reilly Media). He has spoken at numerous conferences including the O'Reilly...

Tuesday May 10, 2016 3:00pm - 3:50pm PDT
Plaza B
Wednesday, May 11

3:00pm PDT

Scylla: A Revolutionary Design for NoSQL Performs at 1.8M TPS/node - Don Marti & Tzach Livyatan, ScyllaDB
Scylla is a new NoSQL database, compatible with Apache Cassandra, that is capable of a 10x improvement in throughput on the same hardware, with predictable low latency that dramatically improves the performance of analytics originally developed for Cassandra. The database is now in use in production and in pilot projects internationally.

Scylla applies kernel programming techniques to a horizontally scalable NoSQL design to achieve extreme performance improvements and the elimination of garbage collection pauses. The Scylla design is based on a modern shared-nothing approach. A new architecture for the NoSQL server is necessary because of new growth in, and limitations of, modern server hardware. As CPU core counts continue to grow, along with the raw speed of networking and storage devices available on a modern system, software design approaches that were valid and safe even a few years ago are no longer sustainable. Scylla runs multiple engines, one per core, each with its own memory, CPU and multi-queue NIC.

With extra performance to work with, NoSQL projects can have more flexibility to focus on other concerns, such as functionality and time to market. Scylla enables faster cluster scaling, more overhead to handle complex queries, and the power to do complex analytics tasks at the same time as routine administration operations.

Tzach Livyatan

VP Product, ScyllaDB
Tzach Livyatan has a B.A. and MSc in Computer Science (Technion, Summa Cum Laude), and has had a 15-year career in development, system engineering and product management. In the past he worked in the Telecom domain, focusing on carrier-grade systems, signalling, policy and charging...

Don Marti

Don Marti has written for Linux Weekly News, Linux Journal, and other publications. He co-founded the Linux consulting firm Electric Lichen. Don is a strategic advisor for Mozilla, and has previously served as president and vice president of the Silicon Valley Linux Users Group and...

Wednesday May 11, 2016 3:00pm - 3:50pm PDT
Plaza B