Apache: Big Data 2016 has ended

Math & Standards
Monday, May 9

10:40am PDT

How ODPi Leveraged Apache Bigtop to Get to Market Faster (and You Can Too!) - Roman Shaposhnik & Konstantin Boudnik, Pivotal Inc.
Apache Bigtop has always tried to be to the Apache Big Data ecosystem what Debian has been to the Linux universe, so it is no surprise that ODPi.org has leveraged it to produce its first official release. Come get an overview of the origins of Apache Bigtop, why organizations like ODPi, Cloudera, Wandisco, and Amazon Web Services rely on Bigtop for their own big data component distribution efforts, and where the project is going after its 1.0 release. You will also learn how contributions from ODPi members are helping Bigtop get even stronger and provide an integration platform for next-generation big data technologies.

Konstantin Boudnik

CEO, Memcore
Dr. Konstantin Boudnik, co-founder and CEO of Memcore Inc, is one of the early developers of Hadoop and a co-author of Apache Bigtop, the open source framework and community for creating software stacks for data processing projects. With more than 20 years of experience in...

Roman Shaposhnik

Director of Open Source, Linux Foundation
Apache Software Foundation and Data, oh but also unikernels

Monday May 9, 2016 10:40am - 11:30am PDT
Plaza B

11:40am PDT

Using the SDACK Architecture to Build a Big Data Product - Yu-Hsin Yeh, Trend Micro
You have probably heard of the SMACK architecture, which stands for Spark, Mesos, Akka, Cassandra, and Kafka. It’s especially suitable for building a lambda architecture system. But what is SDACK? It’s very similar to SMACK, except the “D” stands for Docker. While SMACK is an enterprise-scale solution with multi-tenant support, the SDACK architecture is particularly suitable for building a data product. In this talk, I’ll cover the advantages of the SDACK architecture and how Trend Micro uses it to build an anomaly detection data product. The talk will cover:
1) The architecture we designed based on SDACK to support both batch and streaming workloads.
2) The data pipeline built on Akka Streams, which is flexible, scalable, and self-healing.
3) The Cassandra data model designed to support time series data writes and reads.

Evans Ye

ASF member, Apache Bigtop Committer/PMC member/Former VP, Director of Taiwan Data Engineering Association, Apache Software Foundation
Yu-Hsin Yeh (Evans Ye) is a former VP, and currently a committer and PMC member, of Apache Bigtop. He loves to code, automate things, and tackle big data challenges. Aside from engineering stuff, he is also an enthusiast in giving talks to share software innovations and cutting-edge technologies...

Monday May 9, 2016 11:40am - 12:30pm PDT
Plaza B

2:00pm PDT

Convergence Rank and Its Applications - Dalmo Cirne, mParticle, Inc.
In this paper we explore an algorithm to determine the relevance of each item in a finite set of items in reference to each other, where in order to address an item you first have to go through a convergence or proxy item. If we imagine a media streaming company (convergence item) and all its available genres for playback (items in a finite set), how relevant is each music genre at different moments in time? Or, for a sports media company and the sports it covers, how does the relevance of each sport change throughout the year as sports seasons begin and end?

The applications of this algorithm are immediate and plentiful: When should investments be made during the lifecycle of a sports season? Where should resources be allocated in a financial portfolio based on ranks and trends? How should artists be compensated given the relevance of the traffic they are generating?

Dalmo Cirne

Senior Director of Mobile Engineering, mParticle, Inc.
Dalmo Cirne is a software engineer and mathematician with more than 10 years of experience (in startups and large corporations) creating applications from their conception to architecture definition, development, deployment, and adoption by users worldwide.

Monday May 9, 2016 2:00pm - 2:50pm PDT
Plaza B

3:00pm PDT

Big Data Analytics Using R and PySpark for Business, Finance and Marketing - Dirk Van den Poel, Ghent University
In this talk, we share our experience in researching and practicing Business Analytics with a strong emphasis on predictive and prescriptive analytics. We present our findings using a series of platforms ranging from (1) large shared memory systems (e.g. for open-source R code, #rstats), over (2) dedicated Apache Spark clusters using Python Jupyter Notebooks to (3) very large HPC settings with hundreds of nodes (using HOD, https://github.com/hpcugent/hanythingondemand).
More specifically, we discuss our experience (a) running huge equity price-direction prediction models for S&P 100 stocks, (b) analyzing analytical CRM databases for a large retailer, and (c) researching the link between tweets and weblog data.

Dirk Van den Poel

Professor of Data Analytics, Ghent University
Dirk Van den Poel (PhD) is Senior Full Professor of Data Analytics/Big Data at Ghent University, Belgium. He teaches courses such as Statistical Computing, Big Data, Predictive and Prescriptive Analytics. He co-authored 80+ international peer-reviewed publications in journals such...

Monday May 9, 2016 3:00pm - 3:50pm PDT
Plaza B

4:10pm PDT

ODPi: Advancing Open Data for the Enterprise - A Panel Discussion Moderated by Roman Shaposhnik, Pivotal Inc.
This panel will be an opportunity for members of the Open Data Platform Initiative to share the benefits of ODP with the Apache community.

Roman Shaposhnik

Director of Open Source, Linux Foundation
Apache Software Foundation and Data, oh but also unikernels

Milind Bhandarkar

Founder, Ampool
Milind Bhandarkar was the founding member of the team at Yahoo! that took Apache Hadoop from a 20-node prototype to a datacenter-scale production system. Parallel programming languages and paradigms have been his area of focus for over 20 years. He worked at several HPC companies, Yahoo...

Alan Gates

Co-founder and Architect, Hortonworks
Alan is a founder of Hortonworks and an original member of the engineering team that took Pig from a Yahoo! Labs research project to a successful Apache open source project. Alan has done extensive work in Hive, including adding ACID transactions. Alan has a BS in Mathematics from...

Susan Malaika

Susan Malaika is Senior Technical Staff in IBM’s Open Technologies team focusing on data initiatives. Her background spans software development, data modeling, open data, creating and delivering workshops. Susan loves reading, writing and participating in meet-ups.

John Mertic

Director of Program Management, The Linux Foundation
John Mertic is the Director of Program Management for The Linux Foundation. Under his leadership, he has helped ASWF, ODPi, Open Mainframe Project, and R Consortium accelerate open source innovation and transform industries. John has an open source career spanning two decades, both...

Monday May 9, 2016 4:10pm - 5:00pm PDT
Plaza B

5:10pm PDT

Next-Gen Decision Making in Under 2ms - Ilya Ganelin, Capital One Data Innovation Lab
What if we had reached that point where open source can handle massively difficult streaming problems with enterprise-grade durability?

Today, Ilya presents Capital One’s novel solution for real-time decisioning on Apache Apex. With an analysis of the dominant streaming frameworks, he’ll show how Apex provides unique capabilities ensuring less than 2ms latency in an enterprise-grade solution on Hadoop.

He’ll first take a detailed dive into the business requirements of a new real-time decisioning platform for model building, feature computation, and model scoring. Next, he’ll survey the leading open source technologies for stream processing and the tradeoffs we considered when selecting our technology stack. Lastly, he’ll show how Apex provides unparalleled performance and meets the stringent performance, scalability, and durability requirements necessary for enterprise-grade decisioning.

Ilya Ganelin

Senior Data Engineer, Capital One Data Innovation Lab
Ilya is a roboticist turned data engineer. At the University of Michigan he built self-discovering robots and then worked on embedded DSP software with cell phone radios at Boeing. Today, he drives innovation at Capital One. Ilya is a contributor to the core components of Apache Spark...

Monday May 9, 2016 5:10pm - 6:00pm PDT
Plaza B