  • Book
  • © 2019

PySpark SQL Recipes

With HiveQL, Dataframe and Graphframes

Apress
  • Explains PySpark SQL and DataFrames in detail
  • Includes I/O operations using PySpark SQL with the most frequently used SQL and NoSQL databases
  • Provides a detailed discussion of data preprocessing using PySpark SQL
  • Takes a problem-solution approach to graph-based algorithms using GraphFrames

Buying options

eBook USD 34.99
Price excludes VAT (USA)
  • Available as EPUB and PDF
  • Read on any device
  • Instant download
  • Own it forever
Softcover Book USD 44.99
Price excludes VAT (USA)
  • Compact, lightweight edition
  • Dispatched in 3 to 5 business days
  • Free shipping worldwide - see info

Table of contents (9 chapters)

  1. Front Matter
  2. Introduction to PySpark SQL

    • Raju Kumar Mishra, Sundar Rajan Raman
  3. Installation

    • Raju Kumar Mishra, Sundar Rajan Raman
  4. IO in PySpark SQL

    • Raju Kumar Mishra, Sundar Rajan Raman
  5. Operations on PySpark SQL DataFrames

    • Raju Kumar Mishra, Sundar Rajan Raman
  6. Data Merging and Data Aggregation Using PySparkSQL

    • Raju Kumar Mishra, Sundar Rajan Raman
  7. SQL, NoSQL, and PySparkSQL

    • Raju Kumar Mishra, Sundar Rajan Raman
  8. Optimizing PySpark SQL

    • Raju Kumar Mishra, Sundar Rajan Raman
  9. Structured Streaming

    • Raju Kumar Mishra, Sundar Rajan Raman
  10. GraphFrames

    • Raju Kumar Mishra, Sundar Rajan Raman
  11. Back Matter

About this book

Carry out data analysis with PySpark SQL, GraphFrames, and graph data processing using a problem-solution approach. This book provides solutions to problems related to DataFrames, data manipulation, summarization, and exploratory analysis. You will improve your skills in graph data analysis using GraphFrames and see how to optimize your PySpark SQL code.


PySpark SQL Recipes starts with recipes on creating DataFrames from different types of data sources, data aggregation and summarization, and exploratory data analysis using PySpark SQL. You’ll also discover how to solve problems in graph analysis using GraphFrames.


On completing this book, you’ll have ready-made code for common PySpark SQL tasks, including creating DataFrames using data from different file formats as well as from SQL or NoSQL databases.


What You Will Learn


  • Understand PySpark SQL and its advanced features
  • Use SQL and HiveQL with PySpark SQL
  • Work with structured streaming
  • Optimize PySpark SQL 
  • Master GraphFrames and graph processing



Who This Book Is For
Data scientists, Python programmers, and SQL programmers.









Authors and Affiliations

  • Raju Kumar Mishra

    Bangalore, India

  • Sundar Rajan Raman

    Chennai, India

About the authors

Raju Kumar Mishra has strong interests in data science and in systems capable of handling large amounts of data and running complex mathematical models through computational programming. He was inspired to pursue an M.Tech in computational sciences at the Indian Institute of Science in Bangalore, India. Raju primarily works in the areas of data science and its different applications. Working as a corporate trainer, he has developed unique insights that help him teach and explain complex ideas with ease. Raju is also a data science consultant who solves complex industrial problems. He works with programming tools such as R, Python, scikit-learn, Statsmodels, Hadoop, Hive, Pig, Spark, and many others. His venture, Walsoul Private Ltd, provides training in data science, programming, and big data.

Sundar Rajan Raman is an artificial intelligence practitioner currently working at Bank of America. He holds a Bachelor of Technology degree from the National Institute of Technology, India. A seasoned Java and J2EE programmer, he has worked on critical applications for companies such as AT&T, Singtel, and Deutsche Bank. He is also a seasoned big data architect. His current focus is on the artificial intelligence space, including machine learning and deep learning.


