TABLE OF CONTENTS
CHAPTER ONE
Introduction
1.1 Background
1.2 Problem Statement
1.3 Objective
1.3.1 General Objective
1.3.2 Specific Objective
1.4 Methodology
1.5 Thesis Outline
CHAPTER TWO
Literature Review
CHAPTER THREE
Hash and Encryption Standards
3.1 Secure Hash Standard (SHS)
3.2 Advanced Encryption Standard (AES)
3.2.1 Description of AES Algorithm
3.2.2 High-level description of the AES algorithm
CHAPTER FOUR
Client Side De-duplication
4.1 The Challenge of Client Side De-duplication
4.2 Convergent Encryption
CHAPTER FIVE
Analysis, Design and Implementation
5.1 Overview of the Proposed Scheme
Case I: First upload
Case II: Subsequent uploads
5.1.1 Storage Manager
5.1.2 Proof of Ownership Verifier
5.2 Implementation of the System
5.2.1 The Server Side Application
5.2.2 The Client Component
CHAPTER SIX
Results and Discussion
CHAPTER SEVEN
Conclusion and Recommendation
Bibliography
Appendix
Abstract
According to a recent survey by the International Data Corporation (IDC) [63], 75% of today's digital data are duplicated copies. To reduce this unnecessary redundancy, storage servers perform de-duplication (either at the file level or on chunks of data 4 KB and larger). De-duplication can be managed at both the server side and the client side. To identify duplicate copies, the files must be stored unencrypted. Users, however, may be concerned about the confidentiality of their files and want their data encrypted. Encryption makes ciphertext indistinguishable from random data: identical plaintexts encrypted under independently generated random keys will almost certainly yield different ciphertexts, which cannot be de-duplicated. In this research, a method that resolves this conflict between de-duplication and encryption is presented.
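One well-known building block for reconciling the two requirements is convergent encryption, revisited in Section 4.2: the encryption key is derived from the content itself, so identical plaintexts always produce identical ciphertexts and remain de-duplicable. The fragment below is only an illustrative Python sketch of that idea, not the scheme proposed in this thesis; the use of AES-GCM, the deterministic nonce, and the function name are assumptions made for the example.

import hashlib
from cryptography.hazmat.primitives.ciphers.aead import AESGCM  # pip install cryptography

def convergent_encrypt(plaintext: bytes):
    """Illustrative convergent encryption: the key is H(M), so equal
    plaintexts yield equal ciphertexts that a server can de-duplicate."""
    key = hashlib.sha256(plaintext).digest()       # content-derived key, K = H(M)
    nonce = hashlib.sha256(key).digest()[:12]      # deterministic nonce (illustration only)
    ciphertext = AESGCM(key).encrypt(nonce, plaintext, None)
    tag = hashlib.sha256(ciphertext).hexdigest()   # index tag the server can compare
    return key, ciphertext, tag

Because every holder of the same file derives the same key and the same ciphertext, the server can apply ordinary de-duplication to the encrypted data.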
Chapter One
Introduction
1.1. Background
Commercial large-scale storage services such as Microsoft SkyDrive, Amazon, and Google Drive have attracted millions of users. While data redundancy was once an acceptable operational part of the backup process, the rapid growth of digital content in the data center has pushed organizations to rethink how they approach this issue and to look for ways to optimize storage capacity utilization across the enterprise. Explosive data growth in recent years has put considerable pressure on infrastructure and storage management.
Systems such as the Flud backup system [4] and Google [6] save on storage costs by removing duplication. According to a recent survey by IDC [63], 75% of today's digital data are duplicated copies. To reduce this unnecessary redundancy, storage servers perform de-duplication (either at the file level or on chunks of data 4 KB and larger), keeping only one or a few copies of each file and creating a link to that copy for every user who asks to store it, regardless of how many users upload it. The redundant copies are replaced by pointers that reference the original block of data in a way that is seamless to the user, who continues to use the file as if all of the blocks of data it contains were his or hers alone.
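As a concrete illustration of that pointer-based bookkeeping, the toy Python class below keeps one physical copy per unique SHA-256 digest and a per-user pointer for every logical copy; the class and its methods are hypothetical and do not correspond to any of the systems cited above.

import hashlib

class DedupStore:
    """Toy file-level de-duplication index (illustration only)."""

    def __init__(self):
        self.blobs = {}    # SHA-256 digest -> file bytes (one physical copy)
        self.catalog = {}  # (user, filename) -> digest (per-user pointer)

    def put(self, user, filename, data):
        digest = hashlib.sha256(data).hexdigest()
        if digest not in self.blobs:              # first copy: store the bytes
            self.blobs[digest] = data
        self.catalog[(user, filename)] = digest   # every later copy: just a pointer
        return digest

    def get(self, user, filename):
        return self.blobs[self.catalog[(user, filename)]]

If two users store the same file, the bytes are held once in blobs while catalog records two pointers to the same digest, which is where the storage saving comes from.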
De-duplication can be managed at both the server side and the client side; client-side de-duplication is best known for the following benefits (a brief sketch of the client-side check follows the list):
1. Reduced bandwidth requirements
2. Reduced storage space requirements
3. Lower electricity consumption (hence a greener environment)
4. Lower overall cost of storage…
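A minimal sketch of the client-side check behind these savings, again in Python for illustration; the server interface (has, link, upload) is hypothetical and stands in for whatever protocol a real service exposes.

import hashlib

def client_upload(path, server):
    """Client-side de-duplication: hash locally, send bytes only when needed."""
    with open(path, "rb") as f:
        data = f.read()
    digest = hashlib.sha256(data).hexdigest()
    if server.has(digest):           # server already holds this content
        server.link(digest)          # register a pointer for this user; no upload
    else:
        server.upload(digest, data)  # first uploader pays the bandwidth cost

Skipping the transfer for content the server already holds is what saves the bandwidth listed above; the security questions this check raises, and the proof-of-ownership verification added by the proposed scheme, are taken up in Chapters Four and Five.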