Book Detail: Machine Learning with Spark

Book Title: 
Machine Learning with Spark
Resource Category: 
Publisher: 
Publication Year: 
2015
Number of Pages: 
338
ISBN: 
978-1-78328-851-9
Language: 
English
WishList: 
yes
Available at Shelf: 
No
Description: 

Create scalable machine learning applications to power a modern data-driven business using Spark.

Table of Contents (Summary): 
  1. Getting Up and Running with Spark

  2. Designing a Machine Learning System

  3. Obtaining, Processing, and Preparing Data with Spark

  4. Building a Recommendation Engine with Spark

  5. Building a Classification Model with Spark 

  6. Building a Regression Model with Spark

  7. Building a Clustering Model with Spark 

  8. Dimensionality Reduction with Spark  

  9. Advanced Text Processing with Spark 

  10. Real-time Machine Learning with Spark Streaming 

Table of Contents (Expanded): 
  1. Getting Up and Running with Spark

    • Installing and setting up Spark locally

    • Spark clusters  

    • The Spark programming model  

      • SparkContext and SparkConf 

      • The Spark shell  

      • Resilient Distributed Datasets 

        • Creating RDDs  

        • Spark operations 

        • Caching RDDs 

      • Broadcast variables and accumulators  

    • The first step to a Spark program in Scala  

    • The first step to a Spark program in Java 

    • The first step to a Spark program in Python 

    • Getting Spark running on Amazon EC2 

      • Launching an EC2 Spark cluster 

  2. Designing a Machine Learning System

    • Introducing MovieStream

    • Business use cases for a machine learning system 

      • Personalization  

      • Targeted marketing and customer segmentation 

      • Predictive modeling and analytics  

    • Types of machine learning models 

    • The components of a data-driven machine learning system  

      • Data ingestion and storage  

      • Data cleansing and transformation 

      • Model training and testing loop 

      • Model deployment and integration 

      • Model monitoring and feedback  

      • Batch versus real time  

    • An architecture for a machine learning system 

      • Practical exercise  

  3. Obtaining, Processing, and Preparing Data with Spark

    • Accessing publicly available datasets

      • The MovieLens 100k dataset  

    • Exploring and visualizing your data  

      • Exploring the user dataset 

      • Exploring the movie dataset 

      • Exploring the rating dataset 

    • Processing and transforming your data  

      • Filling in bad or missing data  

    • Extracting useful features from your data 

      • Numerical features 

      • Categorical features  

      • Derived features 

        • Transforming timestamps into categorical features 

      • Text features  

        • Simple text feature extraction  

      • Normalizing features 

        • Using MLlib for feature normalization 

      • Using packages for feature extraction 

  4. Building a Recommendation Engine with Spark

    • Types of recommendation models

      • Content-based filtering 

      • Collaborative filtering 

        • Matrix factorization 

    • Extracting the right features from your data 

      • Extracting features from the MovieLens 100k dataset 

    • Training the recommendation model 

      • Training a model on the MovieLens 100k dataset  

        • Training a model using implicit feedback data  

    • Using the recommendation model 

      • User recommendations 

        • Generating movie recommendations from the MovieLens 100k dataset 

      • Item recommendations 

        • Generating similar movies for the MovieLens 100k dataset  

    • Evaluating the performance of recommendation models 

      • Mean Squared Error 

      • Mean average precision at K 

      • Using MLlib's built-in evaluation functions 

        • RMSE and MSE 

        • MAP 

  5. Building a Classification Model with Spark 

    • Types of classification models

      • Linear models 

        • Logistic regression 

        • Linear support vector machines  

      • The naïve Bayes model 

      • Decision trees 

    • Extracting the right features from your data 

      • Extracting features from the Kaggle/StumbleUpon evergreen classification dataset 

    • Training classification models 

      • Training a classification model on the Kaggle/StumbleUpon evergreen classification dataset  

    • Using classification models 

      • Generating predictions for the Kaggle/StumbleUpon evergreen classification dataset 

    • Evaluating the performance of classification models  

      • Accuracy and prediction error 

      • Precision and recall  

      • ROC curve and AUC 

    • Improving model performance and tuning parameters  

      • Feature standardization 

      • Additional features 

      • Using the correct form of data 

      • Tuning model parameters 

        • Linear models 

        • Decision trees 

        • The naïve Bayes model 

      • Cross-validation 

  6. Building a Regression Model with Spark

    • Types of regression models

      • Least squares regression 

      • Decision trees for regression  

    • Extracting the right features from your data 

      • Extracting features from the bike sharing dataset 

        • Creating feature vectors for the linear model 

        • Creating feature vectors for the decision tree  

    • Training and using regression models 

      • Training a regression model on the bike sharing dataset  

    • Evaluating the performance of regression models 

      • Mean Squared Error and Root Mean Squared Error 

      • Mean Absolute Error 

      • Root Mean Squared Log Error 

      • The R-squared coefficient 

      • Computing performance metrics on the bike sharing dataset 

        • Linear model  

        • Decision tree  

    • Improving model performance and tuning parameters  

      • Transforming the target variable 

        • Impact of training on log-transformed targets  

      • Tuning model parameters 

        • Creating training and testing sets to evaluate parameters 

        • The impact of parameter settings for linear models 

        • The impact of parameter settings for the decision tree  

  7. Building a Clustering Model with Spark 

    • Types of clustering models

      • K-means clustering 

        • Initialization methods  

        • Variants  

      • Mixture models 

      • Hierarchical clustering  

    • Extracting the right features from your data 

      • Extracting features from the MovieLens dataset  

        • Extracting movie genre labels 

        • Training the recommendation model  

        • Normalization 

    • Training a clustering model 

      • Training a clustering model on the MovieLens dataset  

    • Making predictions using a clustering model 

      • Interpreting cluster predictions on the MovieLens dataset 

        • Interpreting the movie clusters  

    • Evaluating the performance of clustering models 

      • Internal evaluation metrics  

      • External evaluation metrics 

      • Computing performance metrics on the MovieLens dataset  

    • Tuning parameters for clustering models 

      • Selecting K through cross-validation  

  8. Dimensionality Reduction with Spark  

    • Types of dimensionality reduction

      • Principal Components Analysis  

      • Singular Value Decomposition 

      • Relationship with matrix factorization 

      • Clustering as dimensionality reduction  

    • Extracting the right features from your data 

      • Extracting features from the LFW dataset 

        • Exploring the face data 

        • Visualizing the face data  

        • Extracting facial images as vectors 

        • Normalization 

    • Training a dimensionality reduction model 

      • Running PCA on the LFW dataset 

        • Visualizing the Eigenfaces  

        • Interpreting the Eigenfaces  

    • Using a dimensionality reduction model  

      • Projecting data using PCA on the LFW dataset 

      • The relationship between PCA and SVD  

    • Evaluating dimensionality reduction models  

      • Evaluating k for SVD on the LFW dataset 

  9. Advanced Text Processing with Spark 

    • What's so special about text data?

    • Extracting the right features from your data 

      • Term weighting schemes 

      • Feature hashing 

      • Extracting the TF-IDF features from the 20 Newsgroups dataset 

        • Exploring the 20 Newsgroups data 

        • Applying basic tokenization 

        • Improving our tokenization  

        • Removing stop words 

        • Excluding terms based on frequency 

        • A note about stemming 

        • Training a TF-IDF model  

        • Analyzing the TF-IDF weightings 

    • Using a TF-IDF model 

      • Document similarity with the 20 Newsgroups dataset and TF-IDF features 

      • Training a text classifier on the 20 Newsgroups dataset using TF-IDF  

    • Evaluating the impact of text processing 

      • Comparing raw features with processed TF-IDF features on the 20 Newsgroups dataset 

    • Word2Vec models 

      • Word2Vec on the 20 Newsgroups dataset 

  10. Real-time Machine Learning with Spark Streaming 

    • Online learning

    • Stream processing 

      • An introduction to Spark Streaming 

        • Input sources 

        • Transformations 

        • Actions 

        • Window operators 

      • Caching and fault tolerance with Spark Streaming  

    • Creating a Spark Streaming application 

      • The producer application 

      • Creating a basic streaming application  

      • Streaming analytics  

      • Stateful streaming 

    • Online learning with Spark Streaming  

      • Streaming regression 

      • A simple streaming regression program 

        • Creating a streaming data producer  

        • Creating a streaming regression model 

      • Streaming K-means  

    • Online model evaluation 

      • Comparing model performance with Spark Streaming 

 

Index

