Book Detail: Fast Data Processing with Spark, 2nd Edition

Book Title: 
Fast Data Processing with Spark, 2nd Edition
Resource Category: 
Publisher: 
Publication Year: 
2015
Number of Pages: 
184
ISBN: 
978-1-78439-257-4
Language: 
English
Edition: 
Second
WishList: 
yes
Available at Shelf: 
No
Description: 

Perform real-time analytics using Spark in a fast, distributed, and scalable way

Table of Contents (Summary): 
  1. Installing Spark and Setting up your Cluster 

  2. Using the Spark Shell 

  3. Building and Running a Spark Application 

  4. Creating a SparkContext 

  5. Loading and Saving Data in Spark

  6. Manipulating your RDD 

  7. Spark SQL 

  8. Spark with Big Data 

  9. Machine Learning Using Spark MLlib 

  10. Testing 

  11. Tips and Tricks 

Table of Contents (Expanded): 
  1. Installing Spark and Setting up your Cluster 

    • Directory organization and convention 

    • Installing prebuilt distribution 

    • Building Spark from source 

      • Downloading the source  

      • Compiling the source with Maven  

      • Compilation switches

      • Testing the installation  

    • Spark topology 

    • A single machine

    • Running Spark on EC2  

      • Running Spark on EC2 with the scripts

      • Deploying Spark on Elastic MapReduce

    • Deploying Spark with Chef (Opscode)  

    • Deploying Spark on Mesos  

    • Spark on YARN 

    • Spark Standalone mode

  2. Using the Spark Shell 

    • Loading a simple text file  

    • Using the Spark shell to run logistic regression

    • Interactively loading data from S3

      • Running Spark shell in Python

  3. Building and Running a Spark Application 

    • Building your Spark project with sbt 

    • Building your Spark job with Maven 

    • Building your Spark job with something else 

  4. Creating a SparkContext 

    • Scala 

    • Java

    • SparkContext – metadata 

    • Shared Java and Scala APIs

    • Python 

  5. Loading and Saving Data in Spark

    • RDDs 

    • Loading data into an RDD 

    • Saving your data 

  6. Manipulating your RDD 

    • Manipulating your RDD in Scala and Java

      • Scala RDD functions 

      • Functions for joining PairRDDs

      • Other PairRDD functions 

      • Double RDD functions  

      • General RDD functions

      • Java RDD functions 

        • Spark Java function classes 

        • Common Java RDD functions 

        • Methods for combining JavaRDDs

        • Functions on JavaPairRDDs

    • Manipulating your RDD in Python

      • Standard RDD functions  

      • PairRDD functions

  7. Spark SQL 

    • The Spark SQL architecture 

      • Spark SQL how-to in a nutshell  

      • Spark SQL programming 

        • SQL access to a simple data table 

        • Handling multiple tables with Spark SQL  

        • Aftermath 

  8. Spark with Big Data 

    • Parquet – an efficient and interoperable big data format

      • Saving files to the Parquet format  

      • Loading Parquet files

      • Saving processed RDD in the Parquet format

    • Querying Parquet files with Impala 

    • HBase

      • Loading from HBase

      • Saving to HBase 

      • Other HBase operations

  9. Machine Learning Using Spark MLlib 

    • The Spark machine learning algorithm table  

    • Spark MLlib examples 

      • Basic statistics 

      • Linear regression

      • Classification  

      • Clustering 

      • Recommendation

  10. Testing 

    • Testing in Java and Scala 

      • Making your code testable 

      • Testing interactions with SparkContext  

    • Testing in Python 

  11. Tips and Tricks 

    • Where to find logs 

    • Concurrency limitations

      • Memory usage and garbage collection  

      • Serialization 

      • IDE integration

    • Using Spark with other languages

    • A quick note on security 

    • Community developed packages 

    • Mailing lists 

 

Index

