Practical Apache Spark

Using the Scala API

Authors: Subhashini Chellappan, Dharanitharan Ganesan

  • Contains extensive coverage of machine-learning algorithms with real-time code implementation using Spark MLlib
  • Explains the SparkR real-time module with code implementation
  • Covers Spark Streaming and Spark Integration examples with other big data components such as Kafka

Buy this book

eBook 29,99 €
price for China (P.R.) (gross)
  • ISBN 978-1-4842-3652-9
  • Digitally watermarked, DRM-free
  • Included format: EPUB, PDF
  • ebooks can be used on all reading devices
  • Immediate eBook download after purchase
Softcover 37,99 €
price for China (P.R.) (gross)
  • ISBN 978-1-4842-3651-2
  • Free shipping for individuals worldwide
  • Usually dispatched within 3 to 5 business days.
About this book

Work with Apache Spark using Scala to deploy and set up single-node, multi-node, and high-availability clusters. This book discusses various components of Spark such as Spark Core, DataFrames, Datasets and SQL, Spark Streaming, Spark MLlib, and R on Spark with the help of practical code snippets for each topic. Practical Apache Spark also covers the integration of Apache Spark with Kafka, with examples. You’ll follow a learn-to-do-by-yourself approach: learn the concepts, practice the code snippets in Scala, and complete the assignments to gain overall exposure.
On completion, you’ll have knowledge of the functional programming aspects of Scala, and hands-on expertise in various Spark components. You’ll also become familiar with machine learning algorithms with real-time usage.
What You Will Learn

  • Discover the functional programming features of Scala
  • Understand the complete architecture of Spark and its components
  • Integrate Apache Spark with Hive and Kafka 
  • Use Spark SQL, DataFrames, and Datasets to process data using traditional SQL queries
  • Work with different machine learning concepts and libraries using Spark's MLlib packages

Who This Book Is For
Developers and professionals who deal with batch and stream data processing. 

About the authors

Subhashini Chellappan is an associate manager and technology enthusiast. She has rich experience in both academia and the software industry. She has published two books: Big Data Analytics and Pro Tableau. Her areas of interest and expertise are centered on business intelligence, big data analytics and cloud computing.

Bharath Kumar Dasa is a technology lead with expertise in the big data space and core expertise in the complete Hadoop stack. He has worked on the HDP distribution and has architected multiple data management and data life cycle auto service management projects for financial institutions. He has been working in machine learning and the integration of machine learning with big data technologies for the past few years. His areas of interest and expertise are centered on big data and analytics, machine learning, data visualization and deep learning.

Dharanitharan Ganesan is a senior analyst with five years of experience in IT. He has a high level of exposure and experience in big data – Apache Hadoop, Apache Spark and various Hadoop ecosystem components. He has a proven track record of improving efficiency and productivity through the automation of various routine and administrative functions in business intelligence and big data technologies. His areas of interest and expertise are centered on machine learning algorithms, statistical modelling and predictive analysis.

Table of contents (10 chapters)

Bibliographic Information
Book Title
Practical Apache Spark
Book Subtitle
Using the Scala API
Authors
Subhashini Chellappan, Dharanitharan Ganesan
Copyright
2018
Publisher
Apress
Copyright Holder
Subhashini Chellappan, Dharanitharan Ganesan
Distribution Rights
Apress Standard
eBook ISBN
978-1-4842-3652-9
DOI
10.1007/978-1-4842-3652-9
Softcover ISBN
978-1-4842-3651-2
Edition Number
1
Number of Pages
XVI, 280
Number of Illustrations
303 b/w illustrations
Topics