  • Book
  • © 2016

Beginning Apache Pig

Big Data Processing Made Easy

Apress
  • The only Pig book that covers scheduling Pig jobs using Oozie
  • The only Pig book that covers submitting Pig jobs using Hue
  • A one-stop shop for all Apache Pig needs

Buying options

eBook USD 29.99
Price excludes VAT (USA)
  • Available as EPUB and PDF
  • Read on any device
  • Instant download
  • Own it forever
Softcover Book USD 37.99
Price excludes VAT (USA)
  • Compact, lightweight edition
  • Dispatched in 3 to 5 business days
  • Free shipping worldwide - see info

Table of contents (17 chapters)

  1. Front Matter
     Pages i-xxiii
  2. MapReduce and Its Abstractions
     Balaswamy Vaddeman, pages 1-20
  3. Data Types
     Balaswamy Vaddeman, pages 21-31
  4. Grunt
     Balaswamy Vaddeman, pages 33-40
  5. Pig Latin Fundamentals
     Balaswamy Vaddeman, pages 41-67
  6. Joins and Functions
     Balaswamy Vaddeman, pages 69-87
  7. HCatalog
     Balaswamy Vaddeman, pages 103-113
  8. Pig Latin in Hue
     Balaswamy Vaddeman, pages 115-122
  9. Pig Latin Scripts in Apache Falcon
     Balaswamy Vaddeman, pages 123-136
  10. Macros
     Balaswamy Vaddeman, pages 137-145
  11. User-Defined Functions
     Balaswamy Vaddeman, pages 147-155
  12. Writing Eval Functions
     Balaswamy Vaddeman, pages 157-169
  13. Writing Load and Store Functions
     Balaswamy Vaddeman, pages 171-186
  14. Troubleshooting
     Balaswamy Vaddeman, pages 187-199
  15. Data Formats
     Balaswamy Vaddeman, pages 201-208
  16. Optimization
     Balaswamy Vaddeman, pages 209-223
  17. Hadoop Ecosystem Tools
     Balaswamy Vaddeman, pages 225-248
  18. Back Matter
     Pages 249-274

About this book

Learn to use Apache Pig to develop lightweight big data applications easily and quickly. This book shows you many optimization techniques and covers every context where Pig is used in big data analytics. Beginning Apache Pig shows you that Pig is easy to learn and that developing big data applications with it takes relatively little time.

The book is divided into four parts: the complete features of Apache Pig; integration with other tools; how to solve complex business problems; and optimization of tools.

You'll discover topics such as MapReduce and why it cannot meet every business need; the features of Pig Latin, such as data types, load, store, joins, groups, and ordering; how Pig workflows can be created; submitting Pig jobs using Hue; and working with Oozie. You'll also see how to extend the framework by writing UDFs and custom load, store, and filter functions. Finally, you'll cover different optimization techniques, such as gathering statistics about a Pig script, joining strategies, parallelism, and the role of data formats in good performance.


What You Will Learn

• Use all the features of Apache Pig
• Integrate Apache Pig with other tools
• Extend Apache Pig
• Optimize Pig Latin code
• Solve different use cases for Pig Latin

Who This Book Is For

All levels of IT professionals: architects, big data enthusiasts, engineers, developers, and big data administrators

Authors and Affiliations

  • Balaswamy Vaddeman, Hyderabad, India

About the author

Balaswamy Vaddeman: thinker, blogger, and serious, self-motivated big data evangelist with 9 years of experience in IT and 4 years of experience in the big data space. My big data experience covers multiple areas, including delivery of analytical applications, product development, consulting, training, book reviews, hackathons, mentoring, and helping people on forums. I have proved myself while delivering analytical applications in the retail, banking, and finance domains in three aspects (development, administration, and architecture) of Hadoop-related technologies. At a startup company, I developed a Hadoop-based product that was used to deliver analytical applications without writing code.
In 2013, I won the Hadoop Hackathon event for Hyderabad conducted by Cloudwick Technologies. As a top contributor at stackoverflow.com, I have helped many people with big data on websites such as stackoverflow.com and quora.com. With so much passion for big data, I went on to work as an independent trainer and consultant, training hundreds of people and setting up big data teams in a couple of companies.

