University of OXford & University of camBRiDGE
...is a collective term for characteristics that the two institutions share.
PySpark for Beginners
Published by: Packt
Published: Jun 2018
Scale: 13x50
Lecture Date: 10/2018
Lecture Link: https://www.safaribooksonline.com/ https://www.packtpub.com
| Build data-intensive applications locally and deploy at scale
| using the combined powers of Python and Spark 2.0
| About This Video
| - Learn why and how you can efficiently use Python to process
|   data and build machine learning models in Apache Spark 2.0.
| - Develop and deploy efficient, scalable real-time Spark
|   solutions.
| - Take your understanding of using Spark with Python to the next
|   level with this jump-start guide.
| In Detail
| Apache Spark is an open-source framework for efficient cluster
| computing with a strong interface for data parallelism and fault
| tolerance. This course will show you how to leverage the power of
| Python and put it to use in the Spark ecosystem. You will start by
| getting a firm understanding of the Spark 2.0 architecture and how
| to set up a Python environment for Spark. You will get familiar
| with the modules available in PySpark. You will learn how to
| abstract data with RDDs and DataFrames and understand the
| streaming capabilities of PySpark. Also, you will get a thorough
| overview of the machine learning capabilities of PySpark using ML and
| MLlib, graph processing using GraphFrames, and polyglot
| persistence using Blaze. Finally, you will learn how to deploy
| your applications to the cloud using the spark-submit command. By
| the end of this course, you will have established a firm
| understanding of the Spark Python API and how it can be used to
| build data-intensive applications.
| All the code and supporting files for this course are available on
| GitHub at https://github.com/PacktPublishing/PySpark-for-Beginners