Book Detail: Hadoop MapReduce v2 Cookbook

Book Title: 
Hadoop MapReduce v2 Cookbook
Resource Category: 
Publication Year: 
2015
Number of Pages: 
322
ISBN: 
978-1-78328-547-1
Language: 
English
Edition: 
2
WishList: 
Yes
Available at Shelf: 
No
Description: 

A recipe-based guide to processing large and complex datasets, and solving real-world data problems, using next-generation Hadoop (MapReduce v2 on YARN).
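
To give a flavour of the material, here is a minimal WordCount sketch using the standard org.apache.hadoop.mapreduce API, the kind of program Chapter 1 writes, bundles, and then extends with a combiner. It is a generic illustration of the technique, not code from the book.

import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {

  // Emits (word, 1) for every whitespace-separated token in the input.
  public static class TokenizerMapper extends Mapper<Object, Text, Text, IntWritable> {
    private static final IntWritable ONE = new IntWritable(1);
    private final Text word = new Text();

    @Override
    protected void map(Object key, Text value, Context context)
        throws IOException, InterruptedException {
      StringTokenizer tokens = new StringTokenizer(value.toString());
      while (tokens.hasMoreTokens()) {
        word.set(tokens.nextToken());
        context.write(word, ONE);
      }
    }
  }

  // Sums the counts for each word; the same class also serves as the combiner.
  public static class IntSumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
    private final IntWritable result = new IntWritable();

    @Override
    protected void reduce(Text key, Iterable<IntWritable> values, Context context)
        throws IOException, InterruptedException {
      int sum = 0;
      for (IntWritable value : values) {
        sum += value.get();
      }
      result.set(sum);
      context.write(key, result);
    }
  }

  public static void main(String[] args) throws Exception {
    Job job = Job.getInstance(new Configuration(), "word count");
    job.setJarByClass(WordCount.class);
    job.setMapperClass(TokenizerMapper.class);
    job.setCombinerClass(IntSumReducer.class); // the combiner step covered in Chapter 1
    job.setReducerClass(IntSumReducer.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);
    FileInputFormat.addInputPath(job, new Path(args[0]));   // input directory
    FileOutputFormat.setOutputPath(job, new Path(args[1])); // output directory
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}

Packaged into a JAR, this can be launched with the hadoop jar command against either the local filesystem (Hadoop local mode) or an HDFS path; the JAR name would be whatever you bundle it as.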

Table of Contents (Summary): 
  1. Getting Started with Hadoop v2

  2. Cloud Deployments – Using Hadoop YARN on Cloud Environments

  3. Hadoop Essentials – Configurations, Unit Tests, and Other APIs 

  4. Developing Complex Hadoop MapReduce Applications 

  5. Analytics

  6. Hadoop Ecosystem – Apache Hive

  7. Hadoop Ecosystem II – Pig, HBase, Mahout, and Sqoop

  8. Searching and Indexing

  9. Classifications, Recommendations, and Finding Relationships 

  10. Mass Text Data Processing


Index 

Table of Contents (Expanded): 
  1. Getting Started with Hadoop v2

    • Introduction

      • Hadoop Distributed File System – HDFS

      • Hadoop YARN

      • Hadoop MapReduce

      • Hadoop installation modes

    • Setting up Hadoop v2 on your local machine

      • Getting ready

      • How to do it…

      • How it works…

    • Writing a WordCount MapReduce application, bundling it, and running it using the Hadoop local mode

      • Getting ready

      • How to do it…

      • How it works…

      • There’s more…

      • See also

    • Adding a combiner step to the WordCount MapReduce program

      • How to do it…

      • How it works…

      • There’s more…

    • Setting up HDFS

      • Getting ready

      • How to do it…

      • See also

    • Setting up Hadoop YARN in a distributed cluster environment using Hadoop v2

      • Getting ready

      • How to do it…

      • How it works…

      • See also

    • Setting up Hadoop ecosystem in a distributed cluster environment using a Hadoop distribution

      • Getting ready

      • How to do it…

      • There’s more…

    • HDFS command-line file operations

      • Getting ready

      • How to do it…

      • How it works…

      • There’s more…

    • Running the WordCount program in a distributed cluster environment

      • Getting ready

      • How to do it…

      • How it works…

      • There’s more…

    • Benchmarking HDFS using DFSIO

      • Getting ready

      • How to do it…

      • How it works…

      • There’s more…

    • Benchmarking Hadoop MapReduce using TeraSort

      • Getting ready

      • How to do it…

      • How it works…

  2. Cloud Deployments – Using Hadoop YARN on Cloud Environments

    • Introduction

    • Running Hadoop MapReduce v2 computations using Amazon Elastic MapReduce

      • Getting ready

      • How to do it…

      • See also

    • Saving money using Amazon EC2 Spot Instances to execute EMR job flows

      • How to do it…

      • There’s more…

      • See also

    • Executing a Pig script using EMR

      • How to do it…

      • There’s more…

        • Starting a Pig interactive session

    • Executing a Hive script using EMR

      • How to do it…

      • There’s more…

        • Starting a Hive interactive session

      • See also

    • Creating an Amazon EMR job flow using the AWS Command Line Interface

      • Getting ready

      • How to do it…

      • There’s more…

      • See also

    • Deploying an Apache HBase cluster on Amazon EC2 using EMR

      • Getting ready

      • How to do it…

      • See also

    • Using EMR bootstrap actions to configure VMs for the Amazon EMR jobs

      • How to do it…

      • There’s more…

    • Using Apache Whirr to deploy an Apache Hadoop cluster in a cloud environment

      • How to do it…

      • How it works…

      • See also

  3. Hadoop Essentials – Configurations, Unit Tests, and Other APIs

    • Introduction

    • Optimizing Hadoop YARN and MapReduce configurations for cluster deployments

      • Getting ready

      • How to do it…

      • How it works…

      • There’s more…

    • Shared user Hadoop clusters – using Fair and Capacity schedulers

      • How to do it…

      • How it works…

      • There’s more…

    • Setting classpath precedence to user-provided JARs

      • How to do it…

      • How it works…

    • Speculative execution of straggling tasks

      • How to do it…

      • There’s more…

    • Unit testing Hadoop MapReduce applications using MRUnit

      • Getting ready

      • How to do it…

      • See also

    • Integration testing Hadoop MapReduce applications using MiniYarnCluster

      • Getting ready

      • How to do it…

      • See also

    • Adding a new DataNode

      • Getting ready

      • How to do it…

      • There’s more…

        • Rebalancing HDFS

      • See also

    • Decommissioning DataNodes

      • How to do it…

      • How it works…

      • See also

    • Using multiple disks/volumes and limiting HDFS disk usage

      • How to do it…

    • Setting the HDFS block size

      • How to do it…

      • There’s more…

      • See also

    • Setting the file replication factor

      • How to do it…

      • How it works…

      • There’s more…

      • See also

    • Using the HDFS Java API

      • How to do it…

      • How it works…

      • There’s more…

        • Configuring the FileSystem object

        • Retrieving the list of data blocks of a file

  4. Developing Complex Hadoop MapReduce Applications

    • Introduction

    • Choosing appropriate Hadoop data types

      • How to do it…

      • There’s more…

      • See also

    • Implementing a custom Hadoop Writable data type

      • How to do it…

      • How it works…

      • There’s more…

      • See also

    • Implementing a custom Hadoop key type

      • How to do it…

      • How it works…

      • See also

    • Emitting data of different value types from a Mapper

      • How to do it…

      • How it works…

      • There’s more…

      • See also

    • Choosing a suitable Hadoop InputFormat for your input data format

      • How to do it…

      • How it works…

      • There’s more…

      • See also

    • Adding support for new input data formats – implementing a custom InputFormat

      • How to do it…

      • How it works…

      • There’s more…

      • See also

    • Formatting the results of MapReduce computations – using Hadoop OutputFormats

      • How to do it…

      • How it works…

      • There’s more…

    • Writing multiple outputs from a MapReduce computation

      • How to do it…

      • How it works…

        • Using multiple input data types and multiple Mapper implementations in a single MapReduce application

      • See also

    • Hadoop intermediate data partitioning

      • How to do it…

      • How it works…

      • There’s more…

        • TotalOrderPartitioner

        • KeyFieldBasedPartitioner

    • Secondary sorting – sorting Reduce input values

      • How to do it…

      • How it works…

      • See also

    • Broadcasting and distributing shared resources to tasks in a MapReduce job – Hadoop DistributedCache

      • How to do it…

      • How it works…

      • There’s more…

        • Distributing archives using the DistributedCache

        • Adding resources to the DistributedCache from the command line

        • Adding resources to the classpath using the DistributedCache

    • Using Hadoop with legacy applications – Hadoop streaming

      • How to do it…

      • How it works…

      • There’s more…

      • See also

    • Adding dependencies between MapReduce jobs

      • How to do it…

      • How it works…

      • There’s more…

    • Hadoop counters to report custom metrics

      • How to do it…

      • How it works…

  5. Analytics

    • Introduction

    • Simple analytics using MapReduce

      • Getting ready

      • How to do it…

      • How it works…

      • There’s more…

    • Performing GROUP BY using MapReduce

      • Getting ready

      • How to do it…

      • How it works…

    • Calculating frequency distributions and sorting using MapReduce

      • Getting ready

      • How to do it…

      • How it works…

      • There’s more…

    • Plotting the Hadoop MapReduce results using gnuplot

      • Getting ready

      • How to do it…

      • How it works…

      • There’s more…

    • Calculating histograms using MapReduce

      • Getting ready

      • How to do it…

      • How it works…

    • Calculating Scatter plots using MapReduce

      • Getting ready

      • How to do it…

      • How it works…

    • Parsing a complex dataset with Hadoop

      • Getting ready

      • How to do it…

      • How it works…

      • There’s more…

    • Joining two datasets using MapReduce

      • Getting ready

      • How to do it…

      • How it works…

  6. Hadoop Ecosystem – Apache Hive

    • Introduction

    • Getting started with Apache Hive

      • How to do it…

      • See also

    • Creating databases and tables using Hive CLI

      • Getting ready

      • How to do it…

      • How it works…

      • There’s more…

        • Hive data types

        • Hive external tables

        • Using the describe formatted command to inspect the metadata of Hive tables 

    • Simple SQL-style data querying using Apache Hive

      • Getting ready

      • How to do it…

      • How it works…

      • There’s more…

        • Using Apache Tez as the execution engine for Hive

      • See also

    • Creating and populating Hive tables and views using Hive query results

      • Getting ready

      • How to do it…

    • Utilizing different storage formats in Hive – storing table data using ORC files

      • Getting ready

      • How to do it…

      • How it works…

    • Using Hive built-in functions

      • Getting ready

      • How to do it…

      • How it works…

      • There’s more…

      • See also

    • Hive batch mode – using a query file

      • How to do it…

      • How it works…

      • There’s more…

      • See also

    • Performing a join with Hive

      • Getting ready

      • How to do it…

      • How it works…

      • See also

    • Creating partitioned Hive tables

      • Getting ready

      • How to do it…

    • Writing Hive User-defined Functions (UDF)

      • Getting ready

      • How to do it…

      • How it works…

    • HCatalog – performing Java MapReduce computations on data mapped to Hive tables

      • Getting ready

      • How to do it…

      • How it works…

    • HCatalog – writing data to Hive tables from Java MapReduce computations

      • Getting ready

      • How to do it…

      • How it works…

  7. Hadoop Ecosystem II – Pig, HBase, Mahout, and Sqoop

    • Introduction

    • Getting started with Apache Pig

      • Getting ready

      • How to do it…

      • How it works…

      • There’s more…

      • See also

    • Joining two datasets using Pig

      • How to do it…

      • How it works…

      • There’s more…

    • Accessing a Hive table data in Pig using HCatalog

      • Getting ready

      • How to do it…

      • There’s more…

      • See also

    • Getting started with Apache HBase

      • Getting ready

      • How to do it…

      • There’s more…

      • See also

    • Data random access using Java client APIs

      • Getting ready

      • How to do it…

      • How it works…

    • Running MapReduce jobs on HBase

      • Getting ready

      • How to do it…

      • How it works…

    • Using Hive to insert data into HBase tables

      • Getting ready

      • How to do it…

      • See also

    • Getting started with Apache Mahout

      • How to do it…

      • How it works…

      • There’s more…

    • Running K-means with Mahout

      • Getting ready

      • How to do it…

      • How it works…

    • Importing data to HDFS from a relational database using Apache Sqoop

      • Getting ready

      • How to do it…

    • Exporting data from HDFS to a relational database using Apache Sqoop

      • Getting ready

      • How to do it…

  8. Searching and Indexing

    • Introduction

    • Generating an inverted index using Hadoop MapReduce

      • Getting ready

      • How to do it…

      • How it works…

      • There’s more…

        • Outputting a random accessible indexed InvertedIndex

      • See also

    • Intradomain web crawling using Apache Nutch

      • Getting ready

      • How to do it…

      • See also

    • Indexing and searching web documents using Apache Solr

      • Getting ready

      • How to do it…

      • How it works…

      • See also

    • Configuring Apache HBase as the backend data store for Apache Nutch

      • Getting ready

      • How to do it…

      • How it works…

      • See also

    • Whole web crawling with Apache Nutch using a Hadoop/HBase cluster

      • Getting ready

      • How to do it…

      • How it works…

      • See also

    • Elasticsearch for indexing and searching

      • Getting ready

      • How to do it…

      • How it works…

      • See also

    • Generating the in-links graph for crawled web pages

      • Getting ready

      • How to do it…

      • How it works…

      • See also

  9. Classifications, Recommendations, and Finding Relationships

    • Introduction

    • Performing content-based recommendations

      • How to do it…

      • How it works…

      • There’s more…

    • Classification using the naïve Bayes classifier

      • How to do it…

      • How it works…

    • Assigning advertisements to keywords using the Adwords balance algorithm

      • How to do it…

      • How it works…

      • There’s more…

  10. Mass Text Data Processing

    • Introduction

    • Data preprocessing using Hadoop streaming and Python

      • Getting ready

      • How to do it…

      • How it works…

      • There’s more…

      • See also

    • De-duplicating data using Hadoop streaming

      • Getting ready

      • How to do it…

      • How it works…

      • See also

    • Loading large datasets to an Apache HBase data store – importtsv and bulkload

      • Getting ready

      • How to do it…

      • How it works…

      • There’s more…

        • Data de-duplication using HBase

      • See also

    • Creating TF and TF-IDF vectors for the text data

      • Getting ready

      • How to do it…

      • How it works…

      • See also

    • Clustering text data using Apache Mahout

      • Getting ready

      • How to do it…

      • How it works…

      • See also

    • Topic discovery using Latent Dirichlet Allocation (LDA)

      • Getting ready

      • How to do it…

      • How it works…

      • See also

    • Document classification using Mahout Naive Bayes Classifier

      • Getting ready

      • How to do it…

      • How it works…

      • See also

Index
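
As a second taste of the recipes listed above, the following is a minimal, generic sketch of the territory covered by Chapter 3's "Using the HDFS Java API" recipe: writing a file to HDFS, reading it back, and inspecting its block size. The class name and file path are illustrative assumptions, not the book's code.

import java.io.BufferedReader;
import java.io.InputStreamReader;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsJavaApiExample {
  public static void main(String[] args) throws Exception {
    // Picks up fs.defaultFS from the core-site.xml found on the classpath.
    Configuration conf = new Configuration();
    FileSystem fs = FileSystem.get(conf);

    Path file = new Path("/tmp/hdfs-api-demo.txt"); // hypothetical path
    try (FSDataOutputStream out = fs.create(file, true)) { // true = overwrite
      out.writeBytes("Hello HDFS\n");
    }

    // Read the first line back.
    try (BufferedReader reader =
             new BufferedReader(new InputStreamReader(fs.open(file)))) {
      System.out.println(reader.readLine());
    }

    // Block size and replication factor are per-file metadata in HDFS.
    System.out.println("Block size: " + fs.getFileStatus(file).getBlockSize());
  }
}

Because FileSystem.get(conf) resolves fs.defaultFS from the Hadoop configuration on the classpath, the same code runs unchanged against a local filesystem or a distributed HDFS cluster.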
