Track Descriptions

Track #1 Business Solutions (Business Strategists and Decision Makers)

Sessions in this track focus on the motivations behind the rapidly growing adoption of Apache Hadoop across a variety of industries. Speakers will present innovative Hadoop use cases and uncover how the technology fits into their existing data management environments. Attendees will learn how to leverage Hadoop to improve their own infrastructures and profit from increasing opportunities using all their data.

Track #2 Enterprise Architecture (Enterprise Architects)

Sessions in this track focus on enterprise architecture with an emphasis on how Hadoop is powering today’s advanced data management ecosystems and how Hadoop fits into modern enterprise environments. Speakers will discuss architecture and models, demonstrating how Hadoop connects to surrounding platforms. Sessions will focus on Hadoop deployment design patterns; enterprise models and system architecture; types of systems managing data that is transferred to Hadoop using Apache Sqoop and Apache Flume; and how to publish data via Apache Hive, Apache HBase and Apache Sqoop to systems that consume data from Hadoop.

Track #3 Operations (IT/Operations Managers and Practitioners)

Sessions in this track focus on the practices IT organizations employ to adopt and run Apache Hadoop, with special emphasis on people, processes and technology. Presentations will include case studies of initial deployments, production scenarios and expansion scenarios. Speakers will discuss advances in reducing the cost of Hadoop deployment and increasing availability and performance.

Track #4 Applications (Data Scientists)

Hadoop is primarily used for two classes of applications: advanced analytics and data processing. Sessions in this track focus on algorithms, solutions and tools for solving data science and integration tasks using Hadoop. Speakers will reveal how Hadoop is used to solve problems across various industries, focusing on particular issues, unveiling relevant tips and discussing lessons learned from solving real-world challenges. Attendees should have a background in statistics, data mining, machine learning or big data management. Talks will include algorithm examples and code samples, and will offer valuable insights from experts in the field.

Track #5 Development (Developers)

This track is a technical deep dive dedicated to discussion about Hadoop and application development for Hadoop. You will hear committers, contributors and expert users from various Hadoop projects discuss the finer points of building applications with Hadoop and the related ecosystem. The sessions will touch on foundational topics such as HDFS, HBase, Pig, Hive, Flume and other related technologies. In addition, speakers will address key development areas, including tools, performance, bringing the stack together and testing the stack. Sessions in this track are for users of all levels who want to learn more about upcoming features and enhancements, new tools, advanced techniques and best practices.

Tue, Nov 8

Rooms: Met Ballroom, Empire East, Empire West, New York East, New York West
7:30AM – 5:30PM

Registration

7:30AM – 8:30AM

Breakfast

8:30AM – 10:00AM

    General Session — Keynote Speakers

    Michael Olson, Chief Executive Officer, Cloudera
    Larry Feinsmith, Managing Director, Office of the CIO, JPMorgan Chase & Co
    Hugh Williams, VP, Experience, Search, and Platforms, eBay

    In compliance with JPMorgan Chase & Co. policy, Larry Feinsmith's keynote presentation will not be made available as a Hadoop World 2011 resource. Hugh Williams will discuss building Cassini, a new search engine at eBay that processes over 250 million search queries and serves more than 2 billion page views each day. [...]

10:00AM – 10:15AM

Break

10:15AM – 11:05AM
  • Building Web Analytics Processing on Hadoop at CBS Interactive

    Michael Sun, Lead Software Engineer, CBS Interactive
    Track #5 Development (Developers)

    Room: Met Ballroom

    We successfully adopted Hadoop as the web analytics platform at CBS Interactive, processing one billion weblogs daily from hundreds of web site properties. After introducing Lumberjack, the extraction, transformation and loading framework we built on Python and Hadoop Streaming (currently under review for open-source release), I will talk about web metrics processing on [...]
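
    The talk's pipeline (Lumberjack) is not yet public, but the general shape of a streaming-style weblog aggregation can be sketched. The log layout and field positions below are hypothetical, purely for illustration:

```python
from collections import Counter

def map_weblog(lines):
    """Map step: emit (url, 1) for each tab-separated weblog line.
    Field positions are hypothetical; real log formats vary."""
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) >= 3:          # e.g. ip, timestamp, url, ...
            yield fields[2], 1

def reduce_counts(pairs):
    """Reduce step: sum the hit count per URL."""
    totals = Counter()
    for url, n in pairs:
        totals[url] += n
    return dict(totals)

# In an actual Hadoop Streaming job, the map step would read sys.stdin
# and print "url\t1" lines; Hadoop shuffles them to reducers by key.
```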

  • Building Realtime Big Data Services at Facebook with Hadoop and HBase

    Jonathan Gray, Software Engineer, Facebook
    Track #2 Enterprise Architecture (Enterprise Architects)

    Room: Empire East

    Facebook has one of the largest Apache Hadoop data warehouses in the world, primarily queried through Apache Hive for offline data processing and analytics. However, the need for realtime analytics and end-user access has led to the development of several new systems built using Apache HBase. This talk will cover specific use cases and the [...]

  • Hadoop in a Mission Critical Environment

    Jim Haas, Director Data Warehouse ETL, CBS Interactive
    Track #3 Operations (IT/Operations Managers and Practitioners)

    Room: Empire West

    Our need for better scalability in processing weblogs is illustrated by the change in requirements: processing 250 million vs. 1 billion web events a day (and growing). The data warehouse group at CBSi has been transitioning core processes to re-architected Hadoop processes for two years. We will cover strategies used for successfully transitioning core ETL [...]

  • Hadoop's Life in Enterprise Systems

    Y Masatani, Senior Specialist, NTT DATA
    Track #2 Enterprise Architecture (Enterprise Architects)

    Room: New York East

    NTT DATA has been providing Hadoop professional services for enterprise customers for years. In this talk we will categorize Hadoop integration cases based on our experience and illustrate archetypal design practices for how Hadoop clusters are deployed into existing infrastructure and services. We will also present enhancement cases motivated by customer demand, including GPU for big [...]

  • Completing the Big Data Picture: Understanding Why and Not Just What

    Sid Probstein, Chief Technology Officer, Attivio
    Track #1 Business Solutions (Business Strategists and Decision Makers)

    Room: New York West

    It's increasingly clear that Big Data is not just about volume – but also the variety, complexity and velocity of enterprise information. Integrating data with insights from unstructured information such as documents, call logs, and web content is essential to driving sustainable business value. Aggregating and analyzing unstructured content is challenging because human expression is [...]

11:15AM – 12:05PM
  • The Hadoop Stack - Then, Now and In The Future

    Charles Zedlewski, Vice President, Product, Cloudera
    Eli Collins, Software Engineer, Cloudera
    Track #5 Development (Developers)

    Room: Met Ballroom

    Many people refer to Apache Hadoop as their system of choice for big data management but few actually use just Apache Hadoop. Hadoop has become a proxy for a much larger system which has HDFS storage at its core. The Apache Hadoop based "big data stack" has changed dramatically over the past 24 months and [...]

  • Storing and Indexing Social Media Content in the Hadoop Ecosystem

    Lance Riedel, Principal Engineer, Jive Software
    Track #2 Enterprise Architecture (Enterprise Architects)

    Room: Empire East

    Jive is using Flume to deliver the content of a social web (250M messages/day) to HDFS and HBase. Flume's flexible architecture allows us to stream data to our production data center as well as Amazon's Web Services datacenter. We periodically build and merge Lucene indices with Hadoop jobs and deploy these to Katta to provide [...]

  • Hadoop Troubleshooting 101

    Kate Ting, Customer Operations Engineer, Cloudera
    Track #3 Operations (IT/Operations Managers and Practitioners)

    Room: Empire West

    Attend this session and walk away armed with solutions to the most common customer problems. Learn proactive configuration tweaks and best practices to keep your cluster free of fetch failures, job tracker hangs, and other common issues.

  • Building Relational Event History Model with Hadoop

    Josh Lospinoso, Owner, University of Oxford
    Track #4 Applications (Data Scientists)

    Room: New York East

    In this session we will look at Reveal, a statistical network analysis library built on Hadoop that uses relational event history analysis to grapple with the complexity, temporal causality, and uncertainty associated with dynamically evolving, growing, and changing networks. There are a broad range of applications for this work, from finance to social network analysis [...]

  • The Blind Men and the Elephant

    Matthew Aslett, Senior Analyst, The 451 Group
    Track #1 Business Solutions (Business Strategists and Decision Makers)

    Room: New York West

    Who is contributing to the Hadoop ecosystem, what are they contributing, and why? Who are the vendors that are supplying Hadoop-related products and services and what do they want from Hadoop? How is the expanding ecosystem benefiting or damaging the Apache Hadoop project? What are the emerging alternatives to Hadoop and what chance do they [...]

12:05PM – 1:15PM

Lunch

1:15PM – 2:05PM
  • Lily: Smart Data at Scale, Made Easy

    Steven Noels, CEO, Outerthought
    Track #5 Development (Developers)

    Room: Met Ballroom

    Lily is a repository made for the age of Data, combining CDH, HBase and Solr into a powerful, high-level, developer-friendly backing store for content-centric applications with the ambition to scale. In this session, we highlight why we chose HBase as the foundation for Lily, and how Lily will allow users to not only store, index [...]

  • Security Considerations for Hadoop Deployments

    Jeremy Glesner, Chief Technology Officer, Berico Technologies
    Richard Clayton, Chief Engineer, Berico Technologies
    Track #2 Enterprise Architecture (Enterprise Architects)

    Room: Empire East

    Security in a distributed environment is a growing concern for most industries. Few face security challenges like the Defense Community, who must balance complex security constraints with timeliness and accuracy. We propose to briefly discuss the security paradigms defined in DCID 6/3 by NSA for secure storage and access of data (the “Protection Level” system). [...]

  • Unlocking the Value of Big Data with Oracle

    Jean-Pierre Dijcks, Senior Principal Product Manager, Oracle
    Track #3 Operations (IT/Operations Managers and Practitioners)

    Room: Empire West

    Analyzing new and diverse digital data streams can reveal new sources of economic value, provide fresh insights into customer behavior and identify market trends early on. But this influx of new data can create challenges for IT departments. To derive real business value from Big Data, you need the right tools to capture and organize [...]

  • Raptor - Real-time Analytics on Hadoop

    Soundar Velu, Product Architect, Sungard
    Track #4 Applications (Data Scientists)

    Room: New York East

    Raptor combines Hadoop & HBase with machine learning models for adaptive data segmentation, partitioning, bucketing, and filtering to enable ad-hoc queries and real-time analytics. Raptor has intelligent optimization algorithms that switch query execution between HBase and MapReduce. Raptor can create per-block dynamic bloom filters for adaptive filtering. A policy manager allows optimized indexing and autosharding. [...]
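
    Raptor's per-block bloom filters are proprietary, but the underlying data structure is standard: a bit array plus k hash functions, where membership tests can give false positives but never false negatives. A minimal sketch (bit-array size, hash count and hashing scheme are all illustrative choices, not Raptor's):

```python
import hashlib

class BloomFilter:
    """Minimal bloom filter: k hashes over an m-bit array."""

    def __init__(self, m_bits=1024, k_hashes=3):
        self.m = m_bits
        self.k = k_hashes
        self.bits = 0  # a Python int doubles as a growable bit array

    def _positions(self, item):
        # Derive k positions by salting the item with the hash index.
        for i in range(self.k):
            digest = hashlib.sha256(f"{i}:{item}".encode()).hexdigest()
            yield int(digest, 16) % self.m

    def add(self, item):
        for pos in self._positions(item):
            self.bits |= 1 << pos

    def might_contain(self, item):
        # True means "possibly present"; False means "definitely absent".
        return all(self.bits >> pos & 1 for pos in self._positions(item))
```

    In a setting like the one the abstract describes, a filter per data block lets a scan skip blocks that definitely do not contain the queried key.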

  • Hadoop Trends & Predictions

    Vanessa Alverez, Analyst, Infrastructure and Operations, Forrester
    Track #1 Business Solutions (Business Strategists and Decision Makers)

    Room: New York West

    Hadoop is making its way into the enterprise, as organizations look to extract valuable information and intelligence from the mountains of data in their storage environments. The way in which this data is analyzed and stored is changing, and Hadoop has become a critical part of this transformation. In this session, Vanessa will cover the trends we are [...]

2:15PM – 3:05PM
  • HDFS Name Node High Availability

    Aaron Myers, Software Engineer, Cloudera
    Suresh Srinivas, Founder and Architect, Hortonworks
    Track #5 Development (Developers)

    Room: Met Ballroom

    HDFS HA has been a highly sought after feature for years. Through collaboration between Cloudera, Facebook, Yahoo!, and others, a high availability system for the HDFS Name Node is actively being worked on. This talk will discuss the architecture and setup of this system.

  • Hadoop Network and Compute Architecture Considerations

    Jacob Rapp, Manager, Technical Marketing, Cisco
    Track #2 Enterprise Architecture (Enterprise Architects)

    Room: Empire East

    Hadoop is a popular framework for web 2.0 and enterprise businesses that are challenged to store, process and analyze large amounts of data as part of their business requirements. Hadoop’s framework brings a new set of challenges related to the compute infrastructure and underlying network architectures. This session reviews the state of Hadoop enterprise environments, [...]

  • Life in Hadoop Ops - Tales From the Trenches

    Eric Sammer, Solutions Architect and Training Instructor, Cloudera
    Gregory Baker, Lead Software Engineer, AT&T Interactive
    Karthik Ranganathan, Software Engineer, Facebook
    Nicholas Evans, System Engineer, AOL Advertising
    Track #2 Enterprise Architecture (Enterprise Architects)

    Room: Empire West

    This session will be a panel discussion with experienced Hadoop operations practitioners from several different organizations. We'll discuss the role, its challenges, and how both will change in the coming years.

  • Building a Model of Organic Link Traffic

    Brian David Eoff, Scientist, Bit.ly
    Track #4 Applications (Data Scientists)

    Room: New York East

    At bitly we study behaviour on the internet by capturing clicks on shortened URLs. This link traffic comes in many forms yet, when studying human behaviour, we're only interested in using 'organic' traffic: the traffic patterns caused by actual humans clicking on links that have been shared on the social web. To extract these patterns, [...]

  • The State of Big Data Adoption in the Enterprise

    Tony Baer, Principal Analyst, Ovum IT Software
    Track #1 Business Solutions (Business Strategists and Decision Makers)

    Room: New York West

    As Big Data has captured attention as one of “the next big things” in enterprise IT, most of the spotlight has focused on early adopters. But what is the state of Big Data adoption across the enterprise mainstream? Ovum recently surveyed 150 global organizations in a variety of vertical industries with revenue of $500 million+ [...]

3:05PM – 3:30PM

Break

3:30PM – 4:20PM
  • Integrating Hadoop with Enterprise RDBMS Using Apache SQOOP and Other Tools

    Guy Harrison, Senior Director of Research and Development, Quest Software
    Arvind Prabhakar, Software Engineer, Cloudera
    Track #5 Development (Developers)

    Room: Met Ballroom

    As Hadoop graduates from pilot project to a mission critical component of the enterprise IT infrastructure, integrating information held in Hadoop and in Enterprise RDBMS becomes imperative. We’ll look at key scenarios driving Hadoop and RDBMS integration and review technical options. In particular, we’ll deep dive into the Apache SQOOP project, which expedites data movement [...]

  • WibiData: Building Personalized Applications with HBase

    Aaron Kimball, Founder and CTO, Odiago
    Garrett Wu, Director of Engineering, Odiago
    Track #2 Enterprise Architecture (Enterprise Architects)

    Room: Empire East

    WibiData is a collaborative data mining and predictive modeling platform for large-scale, multi-structured, user-centric data. It leverages HBase to combine batch analysis and real time access within the same system, and integrates with existing BI, reporting and analysis tools. WibiData offers a set of libraries for common user-centric analytic tasks, and more advanced data mining [...]

  • Hadoop and Graph Data Management: Challenges and Opportunities

    Daniel Abadi, Yale University + Hadapt
    Track #2 Enterprise Architecture (Enterprise Architects)

    Room: Empire West

    As Hadoop rapidly becomes the universal standard for scalable data analysis and processing, it is increasingly important to understand its strengths and weaknesses for particular application scenarios in order to avoid inefficiency pitfalls. For example, Hadoop has great potential to perform scalable graph analysis if it is used correctly. Recent benchmarking has shown that simple [...]

  • Data Mining in Hadoop, Making Sense Of It in Mahout!

    Michael Cutler, Senior Research Engineer, British Sky Broadcasting
    Track #4 Applications (Data Scientists)

    Room: New York East

    Much of Hadoop adoption thus far has been for use cases such as processing log files, text mining, and storing masses of file data -- all very necessary, but largely not exciting. In this presentation, Michael Cutler presents a selection of methodologies, primarily using Mahout, that will enable you to derive real insight into your [...]

  • The Hadoop Award for Government Excellence

    Bob Gourley, CTO, Crucial Point LLC
    Track #1 Business Solutions (Business Strategists and Decision Makers)

    Room: New York West

    Federal, state and local governments and the development community surrounding them are busy creating solutions leveraging Apache Hadoop. This session will highlight the top five solutions picked by an all-star panel of judges. Who will take home the coveted Government Big Data Solutions Award for 2011? This presentation will also highlight key [...]

4:30PM – 5:20PM
  • Next Generation Apache Hadoop MapReduce

    Mahadev Konar, Co Founder, Hortonworks
    Track #5 Development (Developers)

    Room: Met Ballroom

    The Apache Hadoop MapReduce framework has hit a scalability limit around 4,000 machines. We are developing the next generation of Apache Hadoop MapReduce that factors the framework into a generic resource scheduler and a per-job, user-defined component that manages the application execution. Since downtime is more expensive at scale, high availability is built in from the beginning; [...]

  • Hadoop and Netezza Deployment Models and Case Study

    Krishnan Parasuraman, CTO, Digital Media, Netezza
    Greg Rokita, Director, Software Architecture, Edmunds
    Track #2 Enterprise Architecture (Enterprise Architects)

    Room: Empire East

    Hadoop has rapidly emerged as a viable platform for Big Data analytics. Many experts believe Hadoop will subsume many of the data warehousing tasks presently done by traditional relational systems. In this session, you will learn about the similarities and differences of Hadoop and parallel data warehouses, and typical best practices. Edmunds will discuss how [...]

  • I Want to Be BIG - Lessons Learned at Scale

    David "Sunny" Sundstrom, Director, Software Products, SGI
    Track #3 Operations (IT/Operations Managers and Practitioners)

    Room: Empire West

    SGI has been a leading commercial vendor of Hadoop clusters since 2008. Leveraging its experience with high-performance clusters at scale, SGI has delivered individual Hadoop clusters of up to 4,000 nodes. In this presentation, through the discussion of representative customer use cases, you’ll explore major design considerations for performance and power optimization, how integrated [...]

  • Data Mining for Product Search Ranking

    Aaron Beppu, Software Engineer, Etsy
    Track #4 Applications (Data Scientists)

    Room: New York East

    How can you rank product search results when you have very little data about how past shoppers have interacted with the products? Through large scale analysis of its clickstream data, Etsy is automatically discovering product attributes (things like materials, prices, or text features) which signal that a search result is particularly relevant (or irrelevant) to [...]
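
    One simple way to surface attributes that "signal relevance", in the spirit of the abstract (the actual Etsy pipeline is a large-scale Hadoop job and is not described here), is to compare clickthrough rates per attribute across impressions. The record layout below is hypothetical:

```python
from collections import defaultdict

def attribute_ctr(impressions):
    """Compute clickthrough rate per product attribute.

    `impressions` is an iterable of (attributes, clicked) pairs, where
    `attributes` is a collection of strings and `clicked` is a bool.
    """
    shown = defaultdict(int)
    clicked = defaultdict(int)
    for attrs, was_clicked in impressions:
        for a in attrs:
            shown[a] += 1
            if was_clicked:
                clicked[a] += 1
    # Attributes with unusually high (or low) CTR are candidate
    # relevance (or irrelevance) signals for the ranker.
    return {a: clicked[a] / shown[a] for a in shown}
```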

  • From Big Data to Lives Saved: HBase in HealthCare

    Doug Meil, Chief Software Architect, Explorys
    Charlie Lougheed, Co-founder, President, and CTO, Explorys
    Track #1 Business Solutions (Business Strategists and Decision Makers)

    Room: New York West

    Explorys, founded in 2009 in partnership with the Cleveland Clinic, is one of the largest clinical repositories in the United States with 10 million lives under contract. HBase and Hadoop are at the center of Explorys. The Explorys healthcare platform is based upon a massively parallel computing model that enables subscribers to search and analyze [...]

5:10PM – 7:00PM

Networking Exhibitor Reception

Wed, Nov 9

Rooms: Met Ballroom, Empire East, Empire West, New York East, New York West
7:30AM – 5:00PM

Registration

7:30AM – 8:30AM

Breakfast

8:30AM – 9:45AM

    General Session — Keynote Speakers

    Doug Cutting, Architect, Cloudera
    James Markarian, Executive Vice President and CTO, Informatica

    James Markarian Keynote: The Future of the Data Management Market, and What that Means to You. James Markarian will discuss historical trends and technology shifts in data management and how the data deluge has contributed to the emergence of Apache Hadoop. James will showcase examples of how forward-looking organizations are leveraging Hadoop to maximize their [...]

9:45AM – 10:00AM

Break

10:00AM – 10:50AM
  • Hadoop and Performance

    Todd Lipcon, Software Engineer, Cloudera
    Yanpei Chen, Software Engineer, Cloudera
    Track #5 Development (Developers)

    Room: Met Ballroom

    Performance is something you can never have too much of, but it is a nebulous concept in Hadoop. Unlike databases, Hadoop has no equivalent of TPC, and different use cases experience performance differently. This talk will discuss advances in how Hadoop performance is measured and will also talk about recent and [...]

  • Architecting a Business-Critical Application in Hadoop

    Stephen Daniel, Technical Director, Database Platform and Performance Technology, NetApp
    Track #2 Enterprise Architecture (Enterprise Architects)

    Room: Empire East

    NetApp is in the process of moving a petabyte-scale database of customer support information from a traditional relational data warehouse to a Hadoop-based application stack. This talk will explore the application requirements and the resulting hardware and software architecture. Particular attention will be paid to trade-offs in the storage stack, along with data on the [...]

  • Preview of the New Cloudera Management Suite

    Henry Robinson, Software Engineer, Cloudera
    Phil Zeyliger, Software Engineer, Cloudera
    Vinithra Varadharajan, Software Engineer, Cloudera
    Track #3 Operations (IT/Operations Managers and Practitioners)

    Room: Empire West

    This session will preview what is new in the latest release of the Cloudera Management Suite. We will cover the common problems we've seen in Hadoop management and will do a demonstration of several new features designed to address these problems.

  • Big Data Analytics – Data Professionals: the New Enterprise Rock Stars

    Martin Hall, Co-Founder and EVP of Corporate Development, Karmasphere
    Track #4 Applications (Data Scientists)

    Room: New York East

    In this session, we will explore how Hadoop and Big Data are re-inventing enterprise workflows and the pivotal role of the Data Analyst. We will examine the changing face of analytics and the streamlining of iterative queries through evolved user interfaces. We will explain how combining Hadoop and SQL-based analytics help companies discover emergent trends [...]

  • BI on Hadoop in Financial Services

    Stefan Groschupf, CEO, Datameer
    Track #1 Business Solutions (Business Strategists and Decision Makers)

    Room: New York West

    This session is designed for banking and other financial services managers with technical experience, and for engineers. It will discuss business intelligence platform deployments on Hadoop, including cost performance, customer analytics, value-at-risk analytics and IT SLAs.

11:00AM – 11:50AM
  • HDFS Federation

    Suresh Srinivas, Founder and Architect, Hortonworks
    Track #5 Development (Developers)

    Room: Met Ballroom

    Scalability of the NameNode has been a key issue for HDFS clusters. Because the entire file system metadata is stored in memory on a single NameNode, and all metadata operations are processed on this single system, the NameNode both limits the growth in size of the cluster and makes the NameService a bottleneck for the [...]
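
    The abstract's point, that the entire namespace lives in one NameNode's heap, can be made concrete with a back-of-the-envelope estimate. The ~150 bytes per metadata object used below is a commonly cited rule of thumb, not an exact figure for any particular Hadoop version:

```python
def namenode_heap_bytes(n_files, blocks_per_file, bytes_per_object=150):
    """Rough NameNode heap estimate for a single-NameNode HDFS cluster.

    Each file contributes one inode object plus one object per block;
    the ~150 bytes/object default is a rule of thumb, not a measurement.
    """
    objects = n_files * (1 + blocks_per_file)
    return objects * bytes_per_object

# e.g. 100M files averaging 1.5 blocks each needs roughly 37.5 GB of heap,
# which is why federation splits the namespace across NameNodes.
```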

  • Hadoop vs. RDBMS for Big Data Analytics... Why Choose?

    Mingsheng Hong, Field CTO, HP Vertica
    Track #2 Enterprise Architecture (Enterprise Architects)

    Room: Empire East

    When working with structured, semi-structured, and unstructured data, there is often a tendency to try and force one tool - either Hadoop or a traditional DBMS - to do all the work. At Vertica, we've found that there are reasons to use Hadoop for some analytics projects, and Vertica for others, and the magic comes [...]

  • Big Data Architecture: Integrating Hadoop with Other Enterprise Analytic Systems

    Tasso Argyros, Vice President of Marketing and Product Management, Aster Data
    Track #2 Enterprise Architecture (Enterprise Architects)

    Room: Empire West

    Recent research has pointed out the complementary nature of Hadoop and other data management solutions and the importance of leveraging existing systems, SQL, engineering, and operational skills, as well as incorporating novel uses of MapReduce to improve analytic processing. Come to this session to learn how companies optimize the use of Hadoop with other enterprise [...]

  • Leveraging Hadoop to Transform Raw Data into Rich Features at LinkedIn

    Abhishek Gupta, Software Engineer, Recommendation Engine, LinkedIn
    Track #4 Applications (Data Scientists)

    Room: New York East

    This presentation focuses on the design and evolution of the LinkedIn recommendations platform. It currently computes more than 100 billion personalized recommendations every week, powering an ever growing assortment of products, including Jobs You May be Interested In, Groups You May Like, News Relevance, and Ad Targeting. We will describe how we leverage Hadoop to [...]

  • Advancing Disney’s Data Infrastructure with Hadoop

    Matt Estes, Director Data Architecture, The Walt Disney Company
    Track #1 Business Solutions (Business Strategists and Decision Makers)

    Room: New York West

    This is the story of why and how Hadoop was integrated into the Disney data infrastructure. Providing data infrastructure for Disney’s, ABC’s and ESPN’s Internet presences is challenging: doing so requires cost-effective, performant, scalable and highly available solutions. Information requirements from the business add the need for these solutions to work together, providing consistent acquisition, [...]

11:50AM – 1:00PM

Lunch

1:00PM – 1:50PM
  • HBase Roadmap

    Jonathan Gray, Software Engineer, Facebook
    Track #5 Development (Developers)

    Room: Met Ballroom

    This technical session will provide a quick review of the Apache HBase project, looking at it from the past to the future. It will cover the imminent HBase 0.92 release as well as what is slated for 0.94 and beyond. A number of companies and use cases will be used as examples to describe the [...]

  • Replacing RDB/DW with Hadoop and Hive for Telco Big Data

    Jason Han, Founder and CEO, NexR
    Ja-Hyung Koo, Senior Manager, Korea Telecom
    Track #2 Enterprise Architecture (Enterprise Architects)

    Room: Empire East

    This session will focus on the challenges of replacing existing relational database and data warehouse technologies with open-source components. Jason Han will base his presentation on his experience migrating Korea Telecom’s (KT’s) CDR data from Oracle to Hadoop, which required converting many Oracle SQL queries to Hive HQL queries. He will cover the differences [...]

  • Proven Tools to Simplify Hadoop Environments

    Joey Jablonski, Principal Solution Architect, Dell
    Vin Sharma, Enterprise Software Strategist, Intel
    Track #2 Enterprise Architecture (Enterprise Architects)

    Room: Empire West

    Do you see great potential in Hadoop but you also have questions or challenges to overcome? Come to this session to get answers to your questions and advice. Dell Big Data Architect Joey Jablonski and Intel Enterprise Software Strategist Vin Sharma will answer frequently asked questions about Hadoop, and share proven ways you can overcome [...]

  • Radoop: A Graphical Analytics Tool for Big Data

    Gábor Makrai, Chief Technology Officer, Radoop
    Track #4 Applications (Data Scientists)

    Room: New York East

    Hadoop is an excellent environment for analyzing large data sets, but it lacks an easy-to-use graphical interface for building data pipelines and performing advanced analytics. RapidMiner is an excellent open-source tool for data analytics, but is limited to running on a single machine. In this presentation, we will introduce Radoop, an extension to RapidMiner that lets [...]

  • How Hadoop is Revolutionizing Business Intelligence and Advanced Data Analytics

    Dr. Amr Awadallah, Co-founder and CTO, Cloudera
    Track #1 Business Solutions (Business Strategists and Decision Makers)

    Room: New York West

    The introduction of Apache Hadoop is changing the business intelligence data stack. In this presentation, Dr. Amr Awadallah, chief technology officer at Cloudera, will discuss how the architecture is evolving and the advanced capabilities it lends to solving key business challenges. Awadallah will illustrate how enterprises can leverage Hadoop to derive complete value from both [...]

2:00PM – 2:50PM
  • Hadoop 0.23

    Arun Murthy, Founder, Architect, Hortonworks
    Track #5 Development (Developers)

    Room: Met Ballroom

    Apache Hadoop is the de-facto Big Data platform for data storage and processing. The current stable, production release of Hadoop is "hadoop-0.20". The Apache Hadoop community is preparing to release "hadoop-0.23" with several major improvements including HDFS Federation and NextGen MapReduce. In this session, Arun Murthy, who is the Apache Hadoop Release Master for "hadoop.next", [...]

  • Data Ingestion, Egression, and Preparation for Hadoop

    Sanjay Kaluskar, Sr. Architect, Informatica
    David Teniente, Lead Data Architect, Rackspace
    Track #2 Enterprise Architecture (Enterprise Architects)

    Room: Empire East

    One of the first challenges Hadoop developers face is accessing all the data they need and getting it into Hadoop for analysis. Informatica PowerExchange accesses a variety of data types and structures at different latencies (e.g. batch, real-time, or near real-time) and ingests data directly into Hadoop.  The next step is to parse the data in preparation [...]

  • Advanced HBase Schema Design

    Lars George, Solutions Architect, Cloudera
    Track #3 Operations (IT/Operations Managers and Practitioners)

    Room: Empire West

    While running a simple key/value based solution on HBase usually requires an equally simple schema, designing for an application that has to insert thousands of records per second is less trivial. This talk will address the architectural challenges HBase imposes when designing for either read or write performance. It will include examples [...]
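
    A classic example of the write-side schema problem the talk addresses is sequential row keys, which funnel all inserts into one "last" region. A common mitigation is key salting; the sketch below is a generic illustration of the pattern, not material from the talk, and the salt function is an arbitrary choice:

```python
def salted_key(row_key, n_buckets=16):
    """Prefix a row key with a deterministic salt bucket.

    Sequential keys (timestamps, counters) hotspot a single HBase
    region; a fixed-width salt prefix spreads writes across up to
    n_buckets regions. Readers must fan out over all buckets, so this
    trades scan convenience for write throughput.
    """
    salt = sum(row_key.encode()) % n_buckets  # illustrative hash
    return f"{salt:02d}-{row_key}"
```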

  • Large Scale Log Data Analysis for Marketing in NTT Communications

    Kenji Hara, NTT Communications
    Track #4 Applications (Data Scientists)

    Room: New York East

    NTT Communications built a log analysis system for marketing using Hadoop, which explores internet users' interests in, and feedback about, specified products or themes from access logs, query/click logs and CGM data. Our system provides three features: sentiment analysis, co-occurring keyword extraction, and user interest estimation. For large scale analysis, we use Hadoop with customized [...]
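    Of the three features listed, co-occurring keyword extraction is the most straightforward to sketch. The following is a minimal single-machine illustration (not NTT's implementation; the function and data are hypothetical) of the "pairs" counting pattern that a MapReduce job would distribute across a cluster:

```python
from collections import Counter

def cooccurring_keywords(documents, target):
    """Count keywords that appear in the same tokenized document as
    `target`, returning (keyword, count) pairs, most frequent first."""
    counts = Counter()
    for tokens in documents:
        uniq = set(tokens)  # count each keyword once per document
        if target in uniq:
            for word in uniq - {target}:
                counts[word] += 1
    return counts.most_common()
```

    At scale, the mapper would emit (keyword, 1) pairs for each document containing the target term and reducers would sum them, which is the same logic the Counter performs here in memory.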

  • Practical Knowledge for Your First Hadoop Project

    Mark Slusar, Manager of Location Content, NAVTEQ
    Boris Lublinsky, Principal Architect, NAVTEQ
    Mike Segel, NAVTEQ
    Track #1 Business Solutions (Business Strategists and Decision Makers)

    Room: New York West

    A collection of guidelines and advice to help a technologist successfully complete their first Hadoop project. This presentation is based on our experiences initiating and executing several successful Hadoop projects. Part 1 focuses on tactics to “sell” Hadoop to stakeholders and senior management, including understanding what Hadoop is and what its “sweet spots” are, [...]

2:50PM – 3:20PM

Break

3:20PM – 4:10PM
  • SHERPASURFING - Open Source Cyber Security Solution

    Wayne Wheeles, Cyber Security Defensive Analytic Developer, Novii Design
    Track #5 Development (Developers)

    Room: Met Ballroom

    Every day, billions of packets, some benign and some malicious, flow in and out of networks. Every day, it is an essential task for the modern defensive cyber security organization to reliably process the sheer volume of data: bring the NETFLOW data to rest, enrich it, correlate it and analyze it. SHERPASURFING is [...]

  • Leveraging Hadoop for Legacy Systems

    Mathias Herberts, Disruptive Engineer - BigData Advocate, Credit Mutuel Arkea
    Track #2 Enterprise Architecture (Enterprise Architects)

    Room: Empire East

    Since many companies in the financial sector still rely on legacy systems for their daily operations, Hadoop can only be truly useful in those environments if it fits nicely among COBOL, VSAM, MVS and other legacy technologies. In this session, we will detail how Crédit Mutuel Arkéa solved this challenge and successfully mixed the [...]

  • Practical HBase

    Ravi Veeramchaneni, Big Data Product Specialist, Enterprise Architecture Services, Informatica
    Track #3 Operations (IT/Operations Managers and Practitioners)

    Room: Empire West

    Many developers have experience working with relational databases and SQL. The transition to NoSQL data stores, however, is challenging and often confusing. This session will share experiences of using HBase, from hardware selection and deployment to the design, implementation and tuning of HBase. At the end of the session, the audience will be in a better position [...]

  • Leveraging Big Data in the Fight Against Spam and Other Security Threats

    Wade Chambers, Executive Vice President, Development/Operations, Proofpoint
    Track #4 Applications (Data Scientists)

    Room: New York East

    In 2004, Bill Gates told a select group of participants at the World Economic Forum that “two years from now, the spam issue will be solved.” Eight years later, the spam problem is only getting worse, with no sign of relief. Big Data technologies such as Hadoop, MapReduce, Cassandra, and real-time stream processing can [...]

  • Changing Company Culture with Hadoop

    Amy O'Connor, Senior Director, Analytics, Nokia
    Track #1 Business Solutions (Business Strategists and Decision Makers)

    Room: New York West

    We are living in a time of tremendous convergence: convergence of mobile, cloud and social. This convergence is forcing companies to change. At Nokia, we are changing the way we make decisions, moving from a manufacturing model to a data-driven one. Yet making cultural changes is one of the hardest things to accomplish. In this [...]

4:20PM – 5:10PM
  • Gateway: Cluster Virtualization Framework

    Konstantin Shvachko, Principal Hadoop Architect, eBay
    Track #5 Development (Developers)

    Room: Met Ballroom

    Access to Hadoop clusters through dedicated portal nodes (typically located behind firewalls and performing user authentication and authorization) can have several drawbacks -- as shared multitenant resources they can create contention among users and increase the maintenance overhead for cluster administrators. This session will discuss the Gateway system, a cluster virtualization framework that provides multiple [...]

  • Extending the Enterprise Data Warehouse with Hadoop

    Jonathan Seidman, Lead Engineer, Orbitz Worldwide
    Rob Lancaster, Orbitz Worldwide
    Track #2 Enterprise Architecture (Enterprise Architects)

    Room: Empire East

    Hadoop provides the ability to extract business intelligence from extremely large, heterogeneous data sets that were previously impractical to store and process in traditional data warehouses. The challenge now is in bridging the gap between the data warehouse and Hadoop. In this talk we’ll discuss some steps that Orbitz has taken to bridge this gap, [...]

  • Hadoop as a Service in Cloud

    Junping Du, Member of Technical Staff, VMware
    Richard McDougall, Application Infrastructure CTO and Principal Engineer, VMware
    Track #3 Operations (IT/Operations Managers and Practitioners)

    Room: Empire West

    By its original design, the Hadoop framework typically runs natively on commodity hardware. With the growing adoption of cloud computing, however, there is increasing demand to build Hadoop clusters on public or private clouds so that customers can benefit from virtualization and multi-tenancy. This session discusses how to address some of the challenges of providing [...]

  • The Powerful Marriage of R and Hadoop

    David Champagne, CTO, Revolution Analytics
    Track #4 Applications (Data Scientists)

    Room: New York East

    When two of the most powerful innovations in modern analytics come together, the result is revolutionary. This session will cover: An overview of R, the Open Source programming language used by more than 2 million users that was specifically developed for statistical analysis and data visualization. The ways that R and Hadoop have been integrated. [...]

  • Indexing the Earth - Large Scale Satellite Image Processing Using Hadoop

    Oliver Guinan, Vice President, Ground Systems, Skybox Imaging
    Track #1 Business Solutions (Business Strategists and Decision Makers)

    Room: New York West

    Skybox Imaging is using Hadoop as the engine of its satellite image processing system. Using CDH to store and process vast quantities of raw satellite image data enables Skybox to build a system that scales as it launches larger numbers of ever more complex satellites. Skybox has developed a CDH-based framework that allows image [...]