AVAILABILITY OF THE JOBTRACKER MACHINE IN HADOOP/MAP-REDUCE IMPLEMENTATIONS


ABSTRACT

Due to the growing demand for Cloud Computing services, the need for and importance of Distributed Systems cannot be underestimated. However, it is difficult to use the traditional Message Passing Interface (MPI) approach to implement synchronization and coordination and to prevent deadlocks in distributed systems. This difficulty is lessened by the use of Apache's Hadoop/MapReduce and Zookeeper to provide Fault Tolerance in a Homogeneously Distributed Hardware/Software environment.

In this thesis, a mathematical model for the availability of the JobTracker in Hadoop/MapReduce using Zookeeper's Leader Election Service is examined. Though the availability is less than what is expected in a k Fault Tolerance system for higher values of the hardware failure rate, this approach makes coordination and synchronization easy, reduces the effect of Crash failures, and provides Fault Tolerance for distributed systems.

The availability model starts with a Markov state diagram for a general case of N Zookeeper servers, followed by specific cases of 3, 4, and 5 servers. Both software and hardware faults are considered, in addition to the effect of hardware and software repair rates. Comparisons show that the system availability changes with the number of Zookeeper servers, with 3 servers having the highest availability.


The model presented in this study can be used to decide how many servers are optimal for maximum availability and from which vendor they should be purchased. It can also help determine when to use a Zookeeper-coordinated Hadoop cluster to perform critical tasks.
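
As a rough illustration of the kind of Markov availability calculation the abstract refers to, the sketch below treats the simplest textbook case: a single repairable host with two states (UP and DOWN), a constant failure rate and a constant repair rate. This is not the thesis's N-server Zookeeper model; the function names and the numeric rates are hypothetical and chosen only for illustration. It compares the closed-form availability with a forward-Euler numerical solution of the same differential equation.

```python
import math


def availability_closed_form(t, lam, mu):
    """Probability that a single repairable host is up at time t.

    Two-state Markov chain (UP <-> DOWN), starting in UP at t = 0.
    lam: failure rate, mu: repair rate (assumed constant, per unit time).
    """
    steady = mu / (lam + mu)  # long-run (steady-state) availability
    return steady + (lam / (lam + mu)) * math.exp(-(lam + mu) * t)


def availability_euler(t_end, lam, mu, steps=100_000):
    """Integrate dP_up/dt = -lam * P_up + mu * (1 - P_up) with forward Euler."""
    dt = t_end / steps
    p_up = 1.0  # the host is assumed to start in the UP state
    for _ in range(steps):
        p_up += dt * (-lam * p_up + mu * (1.0 - p_up))
    return p_up


if __name__ == "__main__":
    lam, mu = 0.01, 0.5  # hypothetical rates, for illustration only
    for t in (0.0, 1.0, 10.0, 100.0):
        exact = availability_closed_form(t, lam, mu)
        approx = availability_euler(t, lam, mu)
        print(f"t={t:6.1f}  closed form={exact:.6f}  Euler={approx:.6f}")
```

In this simplified setting the availability decays from 1 toward mu / (lam + mu) as time grows. A multi-server model with separate hardware and software rates, such as the one the thesis describes, would need more states and a system of such equations.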


TABLE OF CONTENTS

List of Tables
List of Figures
Abstract

CHAPTER ONE
1  Introduction
1.1 Problem Statement
1.2 Objectives
1.3 Thesis Organization

CHAPTER TWO
2  Cloud Computing and Fault Tolerance
2.1 Cloud Computing
2.2 Types of Clouds
2.3 Virtualization in the Cloud
2.3.1 Advantages of virtualization
2.4 Fault, Error and Failure
2.4.1 Fault Types
2.5 Fault Tolerance
2.5.1 Fault-tolerance Properties
2.5.2 K Fault Tolerant Systems
2.5.3 Hardware Fault Tolerance
2.5.4 Software Fault Tolerance
2.6 Properties of a Fault Tolerant Cloud
2.6.1 Availability
2.6.2 Reliability
2.6.3 Scalability

CHAPTER THREE
3 Hadoop/MapReduce Architecture
3.1 Hadoop/MapReduce
3.2 MapReduce
3.3 Hadoop/MapReduce versus other Systems
3.3.1 Relational Database Management Systems (RDBMS)
3.3.2 Grid Computing
3.3.3 Volunteer Computing
3.4 Features of MapReduce
3.4.1 Automatic Parallelization and Distribution of Work
3.4.2 Fault Tolerance in Hadoop/MapReduce
3.4.3 Cost Efficiency
3.4.4 Simplicity
3.5 Limitations of Hadoop/MapReduce
3.6 Apache's ZooKeeper
3.6.1 ZooKeeper Data Model
3.6.2 Zookeeper Guarantees
3.6.3 Zookeeper Primitives
3.6.4 Zookeeper Fault Tolerance
3.7 Related Work

CHAPTER FOUR
4 Availability Model
4.1 JobTracker Availability Model
4.1.1 Related Work
4.2 Model Assumptions
4.3 Markov Model for a Multi-Host System
4.3.1 The Parameter  s(t)
4.4 Markov Model for a Three-Host (N = 3) Hadoop/MapReduce Cluster Using Zookeeper as Coordinating Service
4.5 Numerical Solution to the System of Differential Equations
4.5.1 Interpretation of Availability plot of the JobTracker
4.6 Discussion of Results
4.6.1 Sensitivity Analysis

CHAPTER FIVE
5  Conclusion and Future Work
5.1 Conclusion
5.2 Future Work
Appendix


Chapter 1

Introduction

The effectiveness of most modern information (data) processing depends on the ability to process huge datasets in parallel to meet stringent time constraints and organizational needs. A major challenge facing organizations today is the ability to organize and process the large volumes of data generated by customers. According to Nielsen Online [1], there are more than 1,733,993,741 internet users. How much data these users generate and how it is processed largely determines the success of the organization concerned. Consider the social networking site Facebook: as of August 2011, it had over 750 million active users [2] who spent 700 billion minutes per month on the network. They installed over 20 million applications every day and interacted with 30 billion pieces of content (web links, news stories, blog posts, notes, photo albums, etc.) each month. Since April 2010, when social plugins were launched, an average of 10,000 new websites have integrated with Facebook. The amount of data generated at Facebook is estimated as follows [3]:

12 TB of compressed data added per day

800 TB of compressed data scanned per day

25,000 map-reduce jobs per day

65 million files in HDFS

30,000 simultaneous clients to the HDFS NameNode


It was a similar demand to process large datasets at Google that inspired engineers there to introduce MapReduce [4]. At Google, MapReduce is used to build the index for Google Search, to cluster articles for Google News, and to perform statistical machine translation. At Yahoo!, it is used to build the index for Yahoo! Search and for spam detection. At Facebook, MapReduce is used for data mining, ad optimization, and spam detection [5]. MapReduce is designed to use commodity nodes (it runs on cheaper machines) that can fail at any time. Its performance does not reduce significantly due to.....


This is a Postgraduate Thesis.

