Name: Practical Hive
ISBN: 978-1-4842-0271-5

Authors:

Scott Shaw,
Andreas François Vermeulen,
Ankur Gupta,
David Kjerrumgaard

Scott Shaw
1. Saint Louis, USA
Andreas François Vermeulen
1. West Kilbride North Ayrshire, United Kingdom
Ankur Gupta
1. Uxbridge, United Kingdom
David Kjerrumgaard
1. Henderson, USA

Comprehensive guide to using Hive in the real world
Written by acknowledged experts in big data and Hive
Covers all aspects of Hive, from getting started to advanced techniques such as performance tuning and security

Table of contents (13 chapters)

Front Matter (pages i-xxi)
Setting the Stage for Hive: Hadoop (pages 1-22)
Introducing Hive (pages 23-35)
Hive Architecture (pages 37-48)
Hive Tables DDL (pages 49-76)
Data Manipulation Language (DML) (pages 77-98)
Loading Data into Hive (pages 99-114)
Querying Semi-Structured Data (pages 115-131)
Hive Analytics (pages 133-217)
Performance Tuning: Hive (pages 219-232)
Hive Security (pages 233-243)
The Future of Hive (pages 245-247)
Building a Big Data Team (pages 249-252)
Hive Functions (pages 253-262)
Back Matter (pages 263-265)

All chapters are by Scott Shaw, Andreas François Vermeulen, Ankur Gupta, and David Kjerrumgaard.

About this book

Dive into the world of SQL on Hadoop and get the most out of your Hive data warehouses. This book is your go-to resource for using Hive: authors Scott Shaw, Ankur Gupta, David Kjerrumgaard, and Andreas François Vermeulen take you through learning HiveQL, the SQL-like language specific to Hive, to analyze, export, and massage the data stored across your Hadoop environment. From deploying Hive on your hardware or virtual machine and setting up its initial configuration to learning how Hive interacts with Hadoop, MapReduce, Tez, and other big data technologies, Practical Hive gives you a detailed treatment of the software.

In addition, this book discusses the value of open source software, Hive performance tuning, and how to leverage semi-structured and unstructured data.

What You Will Learn

Install and configure Hive for new and existing datasets
Perform DDL operations
Execute efficient DML operations
Use tables, partitions, buckets, and user-defined functions
Discover performance tuning tips and Hive best practices

Who This Book Is For

Developers, companies, and professionals who deal with large amounts of data and could use software that can efficiently manage large volumes of input. It is assumed that readers have the ability to work with SQL.

Authors and Affiliations

Scott Shaw: Saint Louis, USA
Andreas François Vermeulen: West Kilbride North Ayrshire, United Kingdom
Ankur Gupta: Uxbridge, United Kingdom
David Kjerrumgaard: Henderson, USA

About the authors

Scott Shaw has over fifteen years of data management experience. He has worked as both an Oracle and SQL Server DBA and as a consultant on Microsoft business intelligence projects utilizing both Tabular and OLAP models, and he co-authored two T-SQL books published by Apress. Scott also enjoys speaking across the country about distributed computing, Big Data concepts, business intelligence, Hive, and the value of Hadoop. Scott works as a Sr. Solutions Engineer for Hortonworks and lives in Saint Louis with his wife and two kids.

Andreas François Vermeulen is Consulting Manager of Business Intelligence, Big Data, Data Science, and Computational Analytics at Sopra-Steria, and a doctoral researcher at the University of Dundee and St Andrews working on future concepts in massive distributed computing, mechatronics, big data, business intelligence, and deep learning. He owns and incubates the "Rapid Information Factory" data processing framework. He is active in developing next-generation processing frameworks and mechatronics engineering, with over thirty-five years of international experience in data processing, software development, and system architecture. Andre is a data scientist, doctoral trainer, corporate consultant, principal systems architect, and speaker/author/columnist on data science, distributed computing, big data, business intelligence, and deep learning. Andre took his bachelor's at the North West University at Potchefstroom, his Master of Business Administration at the University of Manchester, his Master of Business Intelligence and Data Science at the University of Dundee, and his Doctor of Philosophy at the University of Dundee and St Andrews.

Ankur Gupta is a Senior Solutions Engineer at Hortonworks. He has over fourteen years of experience in data management, working as a Data Architect and Oracle DBA. Before joining the world of big data, he worked as an Oracle consultant for investment banks in the UK. He is a regular speaker on big data concepts, Hive, Hadoop, and Oracle at various events and is the author of Oracle GoldenGate 11g Complete Cookbook. Ankur has a master's degree in Computer Science & International Business. He is a Hadoop Certified Administrator and Oracle Certified Professional and lives in London with his wife.

David Kjerrumgaard is a systems architect at Hortonworks. He has 20 years of experience in software development and is a Certified Developer for Apache Hadoop (CCDH). Kjerrumgaard is the author of Data Governance with Apache Falcon and Cloudera Developer Training for Apache Hadoop. He took his BS and MS in Computer Science from Kent State University.

Bibliographic Information

Book Title: Practical Hive
Book Subtitle: A Guide to Hadoop's Data Warehouse System
Authors: Scott Shaw, Andreas François Vermeulen, Ankur Gupta, David Kjerrumgaard
DOI: https://doi.org/10.1007/978-1-4842-0271-5
Publisher: Apress, Berkeley, CA
eBook Packages: Professional and Applied Computing, Apress Access Books, Professional and Applied Computing (R0)
Copyright Information: Scott Shaw, Andreas François Vermeulen, Ankur Gupta, David Kjerrumgaard 2016
Softcover ISBN: 978-1-4842-0272-2
Softcover Published: 28 August 2016
eBook ISBN: 978-1-4842-0271-5
eBook Published: 27 August 2016
Edition Number: 1
Number of Pages: XXI, 265
Number of Illustrations: 12 b/w illustrations, 73 illustrations in colour
Topics: Big Data, Computer Science, general, Data Storage Representation, Systems and Data Security, Data Structures, Database Management
