Data Science with Spark

VIDEO TUTORIAL Data Science with Spark

Packt
109,00€

Get started with Spark for data science using this unique video tutorial

About This Data Science with Spark video course

  • Explore various facets of data science with Spark using this example-rich video
  • Learn how to tell a compelling story in data science using Spark's ecosystem
  • Get up and running with Apache Spark and clean, analyze, and visualize data with ease

Apache Spark video course in detail

The real power and value proposition of Apache Spark lie in its speed and in its platform for executing data science tasks. Spark's unique strength is that it combines ETL, batch analytics, real-time stream analysis, machine learning, graph processing, and visualization, which lets data scientists tackle the complexities that come with raw, unstructured datasets. Spark embraces this approach and aims to ease the transition from working on a single machine to working on a cluster, which makes data science tasks much more agile.

In this video course, you’ll get a hands-on technical resource that will help you become comfortable and confident working with Spark for data science. We won't just explore Spark’s data science libraries; we’ll dive deeper and expand on the topics.

This video training starts by taking you through Spark and the steps needed to build machine learning applications. You will learn to collect, clean, and visualize data coming from Twitter with Spark Streaming. Then, you will get acquainted with Spark's machine learning algorithms and with different machine learning techniques. You will also learn to apply statistical analysis and mining operations to the tweet dataset. Finally, the course ends with ideas for more advanced analyses, including graph processing.

By the end of the course, you will be able to do your job as a data scientist in a highly visual way that is comprehensible and appealing to business and other stakeholders.

What will you learn in this course?

Course plan
Chapter 1: Your Spark and Visualization Toolkit
Chapter 2: First Steps with Spark Visualization
Chapter 3: The Spark Machine Learning Algorithms
Chapter 4: Collecting and Cleansing the Dirty Tweets
Chapter 5: Statistical Analysis on Tweets
Chapter 6: Extracting Features from the Tweets
Chapter 7: Mine Data and Share Results
Chapter 8: Classifying the Tweets
Chapter 9: Clustering Users
Chapter 10: Your Next Data Challenges

Detailed course plan

Chapter 1: Your Spark and Visualization Toolkit (16m25s)
Lesson 1: The Course Overview
Lesson 2: Spark: Origins and Ecosystem for Big Data Scientists, the Scala, Python, and R Flavors
Lesson 3: Install Spark on Your Laptop with Docker, or Scale Fast in the Cloud
Lesson 4: Apache Zeppelin, a Web-Based Notebook for Spark with matplotlib and ggplot2

Chapter 2: First Steps with Spark Visualization (25m31s)
Lesson 1: Manipulating Data with the Core RDD API
Lesson 2: Using DataFrame, Dataset, and SQL – Natural and Easy!
Lesson 3: Manipulating Rows and Columns
Lesson 4: Dealing with File Formats
Lesson 5: Visualizing More – ggplot2, matplotlib, and Angular.js to the Rescue

Chapter 3: The Spark Machine Learning Algorithms (31m12s)
Lesson 1: Discovering spark.ml and spark.mllib - and Other Libraries
Lesson 2: Wrapping Up Basic Statistics and Linear Algebra
Lesson 3: Cleansing Data and Engineering the Features
Lesson 4: Reducing the Dimensionality
Lesson 5: Pipeline for a Life

Chapter 4: Collecting and Cleansing the Dirty Tweets (19m11s)
Lesson 1: Streaming Tweets to Disk
Lesson 2: Streaming Tweets on a Map
Lesson 3: Cleansing and Building Your Reference Dataset
Lesson 4: Querying and Visualizing Tweets with SQL

Chapter 5: Statistical Analysis on Tweets (19m12s)
Lesson 1: Indicators, Correlations, and Sampling
Lesson 2: Validating Statistical Relevance
Lesson 3: Running SVD and PCA
Lesson 4: Extending the Basic Statistics for Your Needs

Chapter 6: Extracting Features from the Tweets (19m21s)
Lesson 1: Analyzing Free Text from the Tweets
Lesson 2: Dealing with Stemming, Syntax, Idioms, and Hashtags
Lesson 3: Detecting Tweet Sentiment
Lesson 4: Identifying Topics with LDA

Chapter 7: Mine Data and Share Results (18m39s)
Lesson 1: Word Cloudify Your Dataset
Lesson 2: Locating Users and Displaying Heatmaps with GeoHash
Lesson 3: Collaborating on the Same Note with Peers
Lesson 4: Create Visual Dashboards for Your Business Stakeholders

Chapter 8: Classifying the Tweets (22m11s)
Lesson 1: Building the Training and Test Datasets
Lesson 2: Training a Logistic Regression Model
Lesson 3: Evaluating Your Classifier
Lesson 4: Selecting Your Model

Chapter 9: Clustering Users (10m31s)
Lesson 1: Clustering Users by Followers and Friends
Lesson 2: Clustering Users by Location
Lesson 3: Running KMeans on a Stream

Chapter 10: Your Next Data Challenges (17m54s)
Lesson 1: Recommending Similar Users
Lesson 2: Analyzing Mentions with GraphX
Lesson 3: Where to Go from Here

Your questions about the course

What is the required level to follow this tutorial?

Intermediate
