Name: Beginning Apache Spark Using Azure Databricks
ISBN: 978-1-4842-5781-4

Authors:

Robert Ilijason
Viken, Sweden

Teaches you how to extract value from massive datasets, using a toolset that can be up and running the same day
Shows you why Azure Databricks is an up-and-coming, fast-growing tool that anyone in data should know about
Aimed at data analysts and business analysts who are curious about the hype surrounding cloud technology and Apache Spark

17k Accesses
5 Citations

eBook USD 34.99

Price excludes VAT (USA)

Softcover Book USD 44.99

Table of contents (11 chapters)

Front Matter
Introduction to Large-Scale Data Analytics
Spark and Databricks
Getting Started with Databricks
Workspaces, Clusters, and Notebooks
Getting Data into Databricks
Querying Data Using SQL
The Power of Python
ETL and Advanced Data Wrangling
Connecting to and from Databricks
Running in Production
Bits and Pieces
Back Matter

About this book

Analyze vast amounts of data in record time using Apache Spark with Databricks in the Cloud. Learn the fundamentals, and more, of running analytics on large clusters in Azure and AWS, using Apache Spark with Databricks on top. Discover how to squeeze the most value out of your data at a mere fraction of what classical analytics solutions cost, while at the same time getting the results you need, incrementally faster.

This book explains how the confluence of these pivotal technologies gives you enormous power, and cheaply, when it comes to huge datasets. You will begin by learning how cloud infrastructure makes it possible to scale your code to a large number of processing units, without having to pay for the machinery in advance. From there you will learn how Apache Spark, an open source framework, can put all those CPUs to work on data analytics. Finally, you will see how services such as Databricks provide the power of Apache Spark without you having to know anything about configuring hardware or software. By removing the need for expensive experts and hardware, your resources can instead be allocated to actually finding business value in the data.

This book guides you through some advanced topics such as analytics in the cloud, data lakes, data ingestion, architecture, machine learning, and tools, including Apache Spark, Apache Hadoop, Apache Hive, Python, and SQL. Valuable exercises help reinforce what you have learned.

What You Will Learn

Discover the value of big data analytics that leverage the power of the cloud
Get started with Databricks using SQL and Python in either Microsoft Azure or AWS
Understand the underlying technology, and how the cloud and Apache Spark fit into the bigger picture
See how these tools are used in the real world
Run basic analytics, including machine learning, on billions of rows at a fraction of the cost, or for free

Who This Book Is For

Data engineers, data scientists, and cloud architects who want or need to run advanced analytics in the cloud. It is assumed that the reader has data experience, but perhaps minimal exposure to Apache Spark and Azure Databricks. The book is also recommended for people who want to get started in the analytics field, as it provides a strong foundation.

Authors and Affiliations

Viken, Sweden

Robert Ilijason

About the author

Robert Ilijason is a 20-year veteran in the business intelligence (BI) segment. He has worked as a contractor for some of Europe’s biggest companies and has conducted large-scale analytics projects within the areas of retail, telecom, banking, government, and more. He has seen his share of analytic trends come and go over the years, and he strongly believes that, unlike most of them, Apache Spark in the cloud, especially with Azure Databricks, is a game changer.

Bibliographic Information

Book Title: Beginning Apache Spark Using Azure Databricks
Book Subtitle: Unleashing Large Cluster Analytics in the Cloud
Authors: Robert Ilijason
DOI: https://doi.org/10.1007/978-1-4842-5781-4
Publisher: Apress, Berkeley, CA
eBook Packages: Business and Management, Apress Access Books, Business and Management (R0)
Softcover ISBN: 978-1-4842-5780-7
Softcover Published: 12 June 2020
eBook ISBN: 978-1-4842-5781-4
eBook Published: 11 June 2020
Edition Number: 1
Number of Pages: XVII, 274
Number of Illustrations: 14 b/w illustrations
Topics: Big Data/Analytics, Microsoft and .NET, Open Source
