Big Data Made Easy

A Working Guide to the Complete Hadoop Toolset

By Michael Frampton


Big Data Made Easy: A Working Guide to the Complete Hadoop Toolset is an introduction for
developers, architects, and anyone else interested in big data to using
the Apache Hadoop toolset. It includes a description of all tool capabilities
as well as in-depth instructions to build and test a working system.


  • ISBN13: 978-1-484200-95-7
  • 375 Pages
  • User Level: Beginner to Advanced
  • Publishing November 25, 2014, but available now as part of the Alpha Program
  • Available eBook Formats: EPUB, MOBI, PDF
  • Print Book Price: $44.99
  • eBook Price: $31.99


Full Description

Many corporations are finding that the size of their data sets is outgrowing the capability of their systems to store and process them. The data is becoming too big to manage and use with traditional tools. The solution: implementing a big data system.

As Big Data Made Easy: A Working Guide to the Complete Hadoop Toolset shows, Apache Hadoop offers a scalable, fault-tolerant system for storing and processing data in parallel. It has a very rich toolset that allows for storage (Hadoop), configuration (Yarn and ZooKeeper), collection (Nutch and Solr), processing (Storm, Pig, and Map Reduce), scheduling (Oozie), moving (Sqoop and Avro), monitoring (Chukwa, Ambari, and Hue), testing (Big Top), and analysis (Hive).

The problem is that the internet offers IT pros wading into big data many versions of the truth and some outright falsehoods born of ignorance. What is needed is a book just like this one: a wide-ranging but easily understood set of instructions to explain where to get Hadoop tools, what they can do, how to install them, how to configure them, how to integrate them, and how to use them successfully. And you need an expert who has worked in this area for a decade—someone just like author and big data expert Mike Frampton.

Big Data Made Easy approaches the problem of managing massive data sets from a systems perspective, explaining the roles within each project (such as architect and tester) and showing how the Hadoop toolset can be used at each system stage. It explains, in an easily understood manner and through numerous examples, how to use each tool. The book also explains the sliding scale of tools available depending upon data size, and when and how to use them. Big Data Made Easy shows developers and architects, as well as testers and project managers, how to:

  • Store big data
  • Configure big data
  • Process big data
  • Schedule processes
  • Move data between SQL and NoSQL systems
  • Monitor data
  • Perform big data analytics
  • Report on big data processes and projects
  • Test big data systems
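To make the "process big data" item above concrete: Hadoop's processing tools are built around the MapReduce model, in which a map phase emits key–value pairs, a shuffle phase groups them by key, and a reduce phase aggregates each group. The following is a minimal sketch of that data flow in plain Python with made-up sample input; it stands in for a real Hadoop job only to illustrate the concept.

```python
from collections import defaultdict

# Map phase: emit a (word, 1) pair for every word in every input line.
def map_phase(lines):
    for line in lines:
        for word in line.split():
            yield (word.lower(), 1)

# Shuffle phase: group values by key, as Hadoop does between map and reduce.
def shuffle_phase(pairs):
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

# Reduce phase: aggregate each group (here, summing counts per word).
def reduce_phase(groups):
    return {word: sum(counts) for word, counts in groups.items()}

# Hypothetical sample input standing in for files stored in HDFS.
lines = ["big data made easy", "big data big tools"]
counts = reduce_phase(shuffle_phase(map_phase(lines)))
print(counts["big"])  # → 3
```

On a real cluster, Hadoop runs the map and reduce functions in parallel across many nodes and performs the shuffle over the network, but the logical data flow is the same.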

The best part, as Big Data Made Easy explains, is that this toolset is free. Anyone can download it and, with the help of this book, start to use it within a day. With these skills under your belt, you will add value to your company or client immediately, not to mention your career.

What you’ll learn

  • How to install and employ Hadoop
  • How to install and use Hadoop-related tools like Hive, Storm, Pig, Solr, Oozie, Ambari, and many others
  • How to set up and test a big data system
  • How to scale the system for the amount of data at hand and the data you expect to accumulate
  • How those who have spent their careers in the SQL database world can apply their skills to building big data systems

Who this book is for

This book is for developers, architects, IT project managers, database administrators, and others charged with developing or supporting a big data system. It is also for a general IT audience, anyone interested in Hadoop or big data, and those experiencing problems with data size. It’s also for anyone who would like to further their career in this area by adding big data skills.

Table of Contents

Chapter 1: The Problem with Data

Chapter Goal:

  1. Explain the big data problem
  2. Explain how Hadoop tools can help
  3. Explain my method of Hadoop tool use
  4. Explain how these tools fit together using a data warehouse as a metaphor
  5. Explain to people how using these tools can save them time and money while "futureproofing" their organizations.

Chapter 2: Storing and Configuring Data with Hadoop, Yarn, and ZooKeeper

Chapter Goal:

  1. Provide a Hadoop platform overview
  2. Explain how Hadoop can be installed and configured
  3. Explain how Hadoop can be used via examples
  4. Explain configuration tools with examples
  5. Briefly explain the wider command set.

Chapter 3: Collecting Data with Nutch and Solr

Chapter Goal:

  1. Explain how big data can be modified and imported into Hadoop
  2. Explain how ETL streams can quickly become very long and complex
  3. Explain the Hadoop collection tools with worked examples

Chapter 4: Processing Data with Storm, Pig, and Map Reduce

Chapter Goal:

  1. Explain how big data can be processed using Hadoop tools
  2. Give examples of processing tool use and when and why they might be useful
  3. Show results and compare tools

Chapter 5: Scheduling Using Oozie

Chapter Goal:

  1. Explain how important scheduling is to system management
  2. Explain monitoring and problem alerting
  3. Explain the tools used via example

Chapter 6: Moving Data with Sqoop and Avro

Chapter Goal:

  1. Explain the special problems that big data brings to data movement
  2. Explain the tools used to move big data
  3. Give worked examples for tool installation and use

Chapter 7: Monitoring the System with Chukwa, Ambari, and Hue

Chapter Goal:

  • Explain the need to monitor a big data system, which may contain millions of files
  • Explain the systems and tools available to monitor
  • Give worked examples for tool installation and use

Chapter 8: Analyzing and Querying Data with Hive and MongoDB

Chapter Goal:

  • Explain how to query data
  • Explain the tools available to the analyst/manager/tester
  • Show how to install and use analytics tools, with examples

Chapter 9: Reporting with Hadoop and Other Software

Chapter Goal:

  • Explain how you can assist management via reports
  • Explain the tools that Hadoop and other software provide
  • Show how to install and use reporting tools, with examples

Chapter 10: Testing with Big Top

Chapter Goal:

  • Explain how to test a big data system
  • Explain what testing tools are available
  • Show how to install and use them, with examples

Chapter 11: Hadoop Present and Future

Chapter Goal:

  • Explain that data sizes will just keep growing
  • Explain that financial and regulatory pressures will push for greater data retention
  • Explain that this is already happening in the energy and banking sectors
  • Explain how Hadoop, a free tool, will help solve these problems going forward
  • Explain to readers that getting involved now could build them a new career and will certainly help their company now and in the future.