Overview

Authors:

Subhashini Chellappan (Bangalore, India)
Dharanitharan Ganesan (Krishnagiri, India)

Contains extensive coverage of machine learning algorithms with real-time code implementations using Spark MLlib
Explains the SparkR real-time module with code implementation
Covers Spark Streaming and examples of integrating Spark with other big data components such as Kafka

14k Accesses
4 Citations

Table of contents (10 chapters)

All chapters are by Subhashini Chellappan and Dharanitharan Ganesan.

Front Matter (pages i-xvi)
Scala: Functional Programming Aspects (pages 1-37)
Single and Multinode Cluster Setup (pages 39-77)
Introduction to Apache Spark and Spark Core (pages 79-113)
Spark SQL, DataFrames, and Datasets (pages 115-139)
Introduction to Spark Streaming (pages 141-156)
Spark Structured Streaming (pages 157-174)
Spark Streaming with Kafka (pages 175-187)
Spark Machine Learning Library (pages 189-236)
Working with SparkR (pages 237-260)
Spark Real-Time Use Case (pages 261-273)
Back Matter (pages 275-280)

About this book

Work with Apache Spark using Scala to deploy and set up single-node, multi-node, and high-availability clusters. This book discusses the main components of Spark, such as Spark Core, DataFrames, Datasets and SQL, Spark Streaming, Spark MLlib, and R on Spark, with practical code snippets for each topic. Practical Apache Spark also covers the integration of Apache Spark with Kafka, with examples. You’ll follow a learn-to-do-by-yourself approach: learn the concepts, practice the code snippets in Scala, and complete the assignments to gain overall exposure.

On completion, you’ll have knowledge of the functional programming aspects of Scala, and hands-on expertise in various Spark components. You’ll also become familiar with machine learning algorithms with real-time usage.

What You Will Learn

Discover the functional programming features of Scala
Understand the complete architecture of Spark and its components
Integrate Apache Spark with Hive and Kafka
Use Spark SQL, DataFrames, and Datasets to process data using traditional SQL queries
Work with different machine learning concepts and libraries using Spark's MLlib packages

Who This Book Is For

Developers and professionals who deal with batch and stream data processing.

Authors and Affiliations

Bangalore, India
Subhashini Chellappan

Krishnagiri, India
Dharanitharan Ganesan

About the authors

Subhashini Chellappan is a technology enthusiast with expertise in the big data and cloud space. She has rich experience in both academia and the software industry. Her areas of interest and expertise are centered on business intelligence, big data analytics and cloud computing.

Dharanitharan Ganesan is a senior analyst with five years of experience in IT. He has a high level of exposure and experience in big data – Apache Hadoop, Apache Spark and various Hadoop ecosystem components. He has a proven track record of improving efficiency and productivity through the automation of various routine and administrative functions in business intelligence and big data technologies. His areas of interest and expertise are centered on machine learning algorithms, statistical modelling and predictive analysis.

Bibliographic Information

Book Title: Practical Apache Spark
Book Subtitle: Using the Scala API
Authors: Subhashini Chellappan, Dharanitharan Ganesan
DOI: https://doi.org/10.1007/978-1-4842-3652-9
Publisher: Apress, Berkeley, CA
eBook Packages: Professional and Applied Computing, Apress Access Books, Professional and Applied Computing (R0)
Copyright Information: Subhashini Chellappan, Dharanitharan Ganesan 2018
Softcover ISBN: 978-1-4842-3651-2 (Published: 13 December 2018)
eBook ISBN: 978-1-4842-3652-9 (Published: 12 December 2018)
Edition Number: 1
Number of Pages: XVI, 280
Number of Illustrations: 303 b/w illustrations
Topics: Big Data, Open Source, Programming Languages, Compilers, Interpreters
