Apache: Big Data 2016 has ended


NoSQL
Tuesday, May 10

9:00am PDT

Wikimedia Content API: A Cassandra Use-case - Eric Evans, Wikimedia Foundation
The Wikimedia Foundation is a charitable organization with a vision of a world where everyone can freely share in the sum of all knowledge. Each month it serves over 18 billion page views to 500 million unique visitors around the world.

Among the resources offered by Wikimedia is an API providing low-latency access to full-history content, in many formats. Its results are often the product of computationally intensive transforms, and must be pre-generated and stored to meet latency expectations. Unsurprisingly, there are many challenges to providing low-latency access to such a large data-set, in a demanding, globally distributed environment.

This talk will cover the Wikimedia content API and its use of Apache Cassandra as storage for a diverse and growing set of use-cases. Trials, tribulations, and triumphs of both a development and an operational nature will be discussed.


Eric Evans

Senior Software Engineer, Wikimedia Foundation
Eric has more than a decade of experience with the engineering and operations of large-scale distributed systems. He joined Rackspace as a startup, and implemented a global DNS infrastructure utilizing IP anycast (possibly the first), and a novel data-center-wide IDS for which a patent...

Tuesday May 10, 2016 9:00am - 9:50am PDT
Plaza B

11:20am PDT

Cassandra Multi-datacenter Operations Essentials - Julien Anguenot, iland Internet Solutions, Corp
Apache Cassandra operations have a reputation for being quite simple on single-datacenter or low-volume clusters, but they become far more complex on high-latency multi-datacenter clusters: basic operations such as repair, compaction, or hint delivery can have dramatic consequences even on a healthy cluster.

In this presentation, Julien will go through Cassandra operations in detail: bootstrapping new nodes or datacenters, repair strategies, compaction strategies, GC tuning, OS tuning, removal of large batches of data, and Apache Cassandra upgrade strategy.

Julien will give you tips and techniques for anticipating issues inherent to multi-datacenter clusters: how and what to monitor, hardware and network considerations, and the data-model and application-level design anti-patterns that can affect your multi-datacenter cluster's performance.


Julien Anguenot

VP Software Engineering, iland Internet Solutions, Corp
Julien is an accomplished and organized software craftsman with a creative and entrepreneurial spirit. Julien serves as iland’s Vice President of Software Engineering and is responsible for the strategic vision and development of iland’s Cloud Services platform. Under his leadership...

Tuesday May 10, 2016 11:20am - 12:10pm PDT
Plaza B

2:00pm PDT

Zipkin & Apache Cassandra: A Big Data Tracing Case Study - Mick Semb Wever, The Last Pickle
Monitoring provides information on system performance; however, tracing is necessary to understand the performance of individual requests.

Systems such as Zipkin, Dapper, and HTrace provide distributed tracing, as does CQL request tracing in Apache Cassandra. Such tracing is invaluable when diagnosing individual requests, yet knowing which database queries to trace and why they were made still requires deep technical knowledge. And while each solution provides insight, the problem of providing a single tracing view across a distributed application stack remains.

This talk will introduce using Zipkin to record Cassandra request traces, providing a single tracing view from HTTP server to database. Starting with CQL request tracing, we will move on to Zipkin, ongoing work to record request traces via Zipkin, and the efforts of the OpenTracing community to create a common tracing API.


Mick Semb Wever

Team Member, The Last Pickle
Mick Semb Wever works at The Last Pickle helping customers deliver and improve Apache Cassandra based solutions. Prior to TLP he spent seven years at FINN.no building their Microservices platform utilizing Apache Cassandra, Hadoop, Spark and Kafka. He is the PMC Chair for Apache Tiles...

Tuesday May 10, 2016 2:00pm - 2:50pm PDT
Plaza B

3:00pm PDT

From Big Data to Mobile Data with Apache CouchDB and PouchDB - Bradley Holt, IBM Cloudant
It’s all too easy for mobile app developers to assume that their apps will run on fast and reliable networks. The reality for end users, though, is often slow, unreliable networks with spotty coverage. What happens when the network doesn’t work, or when a device is in airplane mode? You get unhappy, frustrated users. One solution is to take an offline-first approach. An offline-first app is an app that works, without error, when there is no network connection. Offline-first apps built with Apache CouchDB and PouchDB (an open source JavaScript database) can provide better, faster user experiences by storing data locally and then synchronizing with a cloud database when a network connection is available.


Bradley Holt

Developer Advocate, IBM Cloud Data Services
Bradley Holt is a Developer Advocate with IBM Cloud Data Services. He is the author of several publications including Scaling CouchDB and Writing and Querying MapReduce Views in CouchDB (both published by O'Reilly Media). He has spoken at numerous conferences including the O'Reilly...

Tuesday May 10, 2016 3:00pm - 3:50pm PDT
Plaza B