Name: Cancer Outlier Profile Analysis Using Spark - Mahmoud Parsian, Illumina, Inc.
Start: 2016-05-10T09:00:00-0700
End: 2016-05-10T09:50:00-0700

Cancer Outlier Profile Analysis (COPA) is a method to find genes
that undergo recurrent fusion in a given cancer type by finding
pairs of genes that have mutually exclusive outlier profiles.
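The pair-finding idea in the sentence above can be illustrated with a small sketch. It assumes that each gene's outlier samples have already been called upstream, and it reports gene pairs whose outlier sample sets do not overlap. The function name, the `min_samples` parameter, the gene names and the sample IDs are all hypothetical. This is not the speaker's production code.

```python
from itertools import combinations


def mutually_exclusive_pairs(outliers, min_samples=1):
    """Return gene pairs whose outlier sample sets are disjoint.

    outliers: dict mapping gene name -> set of sample IDs in which that gene
              is an outlier (assumed to be computed upstream).
    min_samples: hypothetical minimum number of outlier samples a gene needs
                 before it is considered at all.
    """
    # Drop genes with too few outlier samples to be informative.
    candidates = {g: s for g, s in outliers.items() if len(s) >= min_samples}
    pairs = []
    for a, b in combinations(sorted(candidates), 2):
        # Mutually exclusive: no sample is an outlier for both genes.
        if candidates[a].isdisjoint(candidates[b]):
            pairs.append((a, b))
    return pairs


# Hypothetical toy input.
example = {
    "GENE_A": {"s1", "s2", "s3"},
    "GENE_B": {"s4", "s5"},
    "GENE_C": {"s2", "s6"},
    "GENE_D": set(),
}
print(mutually_exclusive_pairs(example, min_samples=2))
# prints [('GENE_A', 'GENE_B'), ('GENE_B', 'GENE_C')]
```

GENE_A and GENE_C share sample `s2`, so that pair is rejected. GENE_D is dropped before pairing because it has no outlier samples.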
COPA is used for detecting translocations of the second type
using microarray data. Its goal is to identify genes for which
a subset of disease samples shows unusually high or low values.
We have built a production implementation of COPA in Spark that
can process millions of biomarkers for both one-sided and
two-sided analysis, where each biomarker may involve thousands
of genes. Spark was a natural choice for implementing COPA: it
offers high-level join and filter operations (the main steps in
a COPA implementation), which the traditional MapReduce API
lacks. This presentation will show how we used Spark to solve
a complex COPA problem.
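The outlier scoring and filtering steps can be sketched in plain Python. The sketch assumes the commonly cited COPA formulation. Each gene's values are median-centered across samples and divided by their median absolute deviation (MAD). The gene's score is then a chosen percentile of the transformed values (the 90th here). The two-sided variant, the zero-MAD handling, all function names and the toy data are assumptions for illustration only. In a Spark job, the per-gene scoring would plausibly become a map over (gene, values) pairs and the threshold an `RDD.filter`. This version avoids Spark so that it stays self-contained.

```python
import math
from statistics import median


def percentile(values, q):
    """Linear-interpolation percentile, the same convention as NumPy's default."""
    xs = sorted(values)
    pos = (len(xs) - 1) * q / 100.0
    lo, hi = math.floor(pos), math.ceil(pos)
    return xs[lo] + (xs[hi] - xs[lo]) * (pos - lo)


def copa_transform(values):
    """Median-center one gene's values across samples and scale by the MAD."""
    m = median(values)
    mad = median(abs(v - m) for v in values)
    if mad == 0:
        # Assumption: a gene whose MAD is zero is treated as having no outlier signal.
        return [0.0] * len(values)
    return [(v - m) / mad for v in values]


def copa_score(values, q=90.0, two_sided=False):
    """COPA-style score: the q-th percentile of the transformed values.

    two_sided=True (an assumed variant) also checks the lower tail at the
    (100 - q)-th percentile and keeps whichever tail is more extreme.
    """
    t = copa_transform(values)
    upper = percentile(t, q)
    if not two_sided:
        return upper
    lower = percentile(t, 100.0 - q)
    return max(upper, -lower)


def rank_genes(expression, q=90.0, two_sided=False, min_score=None):
    """Score every gene (a map step), optionally filter, then sort descending."""
    scores = [(gene, copa_score(vals, q, two_sided)) for gene, vals in expression.items()]
    if min_score is not None:
        # The filter step: keep only genes at or above the (hypothetical) threshold.
        scores = [(g, s) for g, s in scores if s >= min_score]
    return sorted(scores, key=lambda gs: gs[1], reverse=True)


# Hypothetical toy data: 10 samples per gene.
x = [0, 1, 2, 3, 4, 5, 6, 7, 8, 100]   # one strongly high sample
y = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]     # no outlier
z = [-91, 1, 2, 3, 4, 5, 6, 7, 8, 9]   # one strongly low sample

for gene, score in rank_genes({"GENE_X": x, "GENE_Y": y}):
    print(gene, round(score, 2))
print("GENE_Z two-sided:", round(copa_score(z, two_sided=True), 2))
```

On the toy data, GENE_X (about 5.08) ranks above GENE_Y (about 1.44). The two-sided score picks up GENE_Z's low outlier (about 5.08), which the one-sided score (about 1.44) misses. Using the median and MAD rather than the mean and standard deviation keeps the scaling from being dominated by the very outliers COPA is looking for.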

Speakers

Mahmoud Parsian

Illumina, Inc.

Mahmoud Parsian, Ph.D. in Computer Science, is a practicing software professional with 30 years of experience as a developer, designer, architect, and author. For the past 15 years, he has been involved in Java server-side, databases, MapReduce, Hadoop, Spark, and distributed...

Attachment: Cancer Outlier Profile Analysis Using Spark (PDF)

Tuesday May 10, 2016 9:00am - 9:50am PDT
Georgia B

Operations-Use Cases, Intermediate

Apache: Big Data 2016

Mahmoud Parsian

Attendees (23)