Apress

Practical Apache Spark

Using the Scala API

  • Book
  • © 2018

Overview

  • Contains extensive coverage of machine learning algorithms, with practical code implementations using Spark MLlib
  • Explains SparkR with code implementations
  • Covers Spark Streaming, with examples of integrating Spark with other big data components such as Kafka


Table of contents (10 chapters)


About this book

Work with Apache Spark using Scala to set up and deploy single-node, multi-node, and high-availability clusters. This book discusses various components of Spark, such as Spark Core, DataFrames, Datasets and SQL, Spark Streaming, Spark MLlib, and R on Spark, with practical code snippets for each topic. Practical Apache Spark also covers the integration of Apache Spark with Kafka, with examples. You'll follow a hands-on, do-it-yourself approach: learn the concepts, practice the code snippets in Scala, and complete the assignments to gain overall exposure.


On completion, you'll have knowledge of the functional programming aspects of Scala and hands-on expertise in various Spark components. You'll also become familiar with machine learning algorithms and their real-world usage.

What You Will Learn
  • Discover the functional programming features of Scala
  • Understand the complete architecture of Spark and its components
  • Integrate Apache Spark with Hive and Kafka 
  • Use Spark SQL, DataFrames, and Datasets to process data using traditional SQL queries
  • Work with different machine learning concepts and libraries using Spark's MLlib packages



Who This Book Is For


Developers and professionals who deal with batch and stream data processing. 



Authors and Affiliations

  • Bangalore, India

    Subhashini Chellappan

  • Krishnagiri, India

    Dharanitharan Ganesan

About the authors

Subhashini Chellappan is a technology enthusiast with expertise in the big data and cloud space. She has rich experience in both academia and the software industry. Her areas of interest and expertise are centered on business intelligence, big data analytics and cloud computing.

Dharanitharan Ganesan is a senior analyst with five years of experience in IT. He has a high level of exposure and experience in big data – Apache Hadoop, Apache Spark and various Hadoop ecosystem components. He has a proven track record of improving efficiency and productivity through the automation of various routine and administrative functions in business intelligence and big data technologies. His areas of interest and expertise are centered on machine learning algorithms, statistical modelling and predictive analysis.


