Abstract


BIG DATA ANALYTICS FOR VIRAL ECOLOGY

Big data is pervasive in biology and can be used to discover new insights into interconnected biological processes. This is particularly true for research in viral ecology, wherein large-scale metagenomic datasets are uncovering the extensive genetic diversity of viruses and their role in host-driven nutrient and energy cycles in aquatic systems. Yet, despite innovations in sequencing technology, bottlenecks still exist in analyzing these massive and highly contextualized datasets. Specifically, ecosystem-wide analyses require the harmonization, integration and analysis of multiple types of biological data, such as genes, protein function, pathways and environmental or host-related factors. Here we describe a strategy to perform massive comparative metagenomic sequence analysis using the Hadoop big data architecture, and to interconnect these data with biological annotations stored in a scalable Neo4j graph database for functional, taxonomic and ecosystem-level analyses. We demonstrate the utility of our toolkit using a large-scale viral metagenomics dataset from the TARA Oceans Expedition. This work represents a first step in storing, comparing, and querying massive metagenomic datasets using scalable big data architectures, with the goal of understanding viruses and their impact on host processes in the ocean.

Authors

Hurwitz, B. L., University of Arizona, USA, bhurwitz@email.arizona.edu

Choi, I., University of Arizona, USA, iychoi@email.arizona.edu

Youens-Clark, C. K., University of Arizona, USA, kyclark@email.arizona.edu

Hartman, J. H., University of Arizona, USA, jhh@cs.arizona.edu

Details

Oral presentation

Session #: 105
Date: 2/27/2015
Time: 15:00
Location: Auditorium Manuel de Falla (Floor 1)

Presentation is given by student: No