TABLE OF CONTENTS
CHAPTER ONE
1.1 Introduction
1.2 Problem of the study
1.3 Aim and Objectives of the study
1.4 Scope of the study
1.5 Significance of the study
1.6 Limitation of the study
1.7 Definition of terms
CHAPTER TWO: Literature Review
2.1 Search Engines
2.2 Building block of search engine
2.3 Search Engine Component
2.3.1 Text Acquisition
2.3.2 Text Transformation
2.3.5 Ranking
2.4 Issue in Search Engine Research
CHAPTER THREE: System Analysis and Design
3.0 System Analysis
3.1 System Overview
3.2 System Feature
3.3 Methods of Data Entry
CHAPTER FOUR
4.1 Choice and Justification of the programming language used
4.2 Implementation Plan
4.3 Program flowchart
4.4 Procedure chart
CHAPTER FIVE: Summary, Recommendation and Conclusion
5.1 Summary
5.2 Recommendation
5.3 Conclusion
References
Source Code
CHAPTER ONE
1.1 INTRODUCTION
This project deals with the design and implementation of a content-based search
engine. Content-based means that the system uses the information available in
web documents in a holistic manner to determine what might be interesting to the
user. We focus on textual content written in a natural language as opposed to,
say, images included in the documents. We call the presented system a search
engine because it contains components to retrieve and index web documents, and
it provides a mechanism to return a ranked subset of the documents in response
to the user's requests. The system should be able to process millions of
documents in a reasonable time and respond to queries with a low average
latency. The starting point is a Web crawler (or spider) that retrieves Web
pages: it traverses the entire Web, or a certain subset of it, downloads the
pages or files it encounters, and saves them for other components to use. The
actual traversal algorithm depends on the implementation; depth-first,
breadth-first, and random traversal are all used to meet different design goals.
The parser takes the downloaded raw results, analyzes them, and tries to make
sense of them. In a text search engine, this is done by extracting keywords and
checking their locations and/or frequencies. Hidden HTML meta tags, such as
KEYWORDS and DESCRIPTION, are also considered. Usually a scoring system assigns
a final score to each keyword on each page. Simple or complicated, a search
engine must have a way to determine which pages are more important than others
and to present them to users in a particular order. This is called the ranking
system. The most famous one is the PageRank algorithm published by Google's
founders [Brin 1998].
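The crawling and ranking steps described above can be illustrated with a small
sketch. It is not the implementation used in this project: the link graph
LINKS, the function names crawl and pagerank, and the damping value of 0.85 are
assumptions chosen for illustration, and no real pages are downloaded. The
ranking function is a simplified power-iteration form of PageRank.

```python
from collections import deque

# Hypothetical in-memory "web": each URL maps to the URLs it links to.
# A real crawler would download and parse pages instead of reading a dict.
LINKS = {
    "a.example": ["b.example", "c.example"],
    "b.example": ["c.example"],
    "c.example": ["a.example"],
    "d.example": ["c.example"],  # not reachable from a.example
}


def crawl(seed, links, breadth_first=True):
    """Return the pages reachable from `seed`, in the order they are visited."""
    frontier = deque([seed])
    seen = {seed}
    order = []
    while frontier:
        # popleft() gives FIFO (breadth-first) order; pop() gives LIFO order,
        # which approximates a depth-first traversal.
        url = frontier.popleft() if breadth_first else frontier.pop()
        order.append(url)
        for target in links.get(url, []):
            if target not in seen:
                seen.add(target)
                frontier.append(target)
    return order


def pagerank(links, damping=0.85, iterations=50):
    """Simplified PageRank by power iteration over a link graph."""
    pages = sorted(set(links) | {t for ts in links.values() for t in ts})
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page shares its rank equally among the pages it links to.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page with no outgoing links spreads its rank over all pages.
                share = damping * rank[page] / n
                for other in pages:
                    new_rank[other] += share
        rank = new_rank
    return rank


if __name__ == "__main__":
    print(crawl("a.example", LINKS))
    print(crawl("a.example", LINKS, breadth_first=False))
    ranks = pagerank(LINKS)
    for page in sorted(ranks, key=ranks.get, reverse=True):
        print(page, round(ranks[page], 3))
```

Switching between the two traversal orders only changes which end of the
frontier is consumed, which is why the choice can be left to the design goals
of a particular crawler.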
A reliable repository system is critical for any application. A search engine
also requires everything to be stored efficiently to ensure maximum
performance. The choice of database vendor and the schema design can make a big
difference in performance for metadata such as URL description, crawling date,
keywords, etc. A more challenging part is the huge volume of downloaded files
that must be saved before they are picked up by other modules.
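As a rough illustration of the metadata storage discussed above, the sketch
below keeps the URL, description, crawling date, and keywords of each page in
an in-memory SQLite database. The table names, column names, and helper
functions are hypothetical and chosen only for this example; the project's
actual database vendor and schema are not specified here.

```python
import sqlite3
from datetime import datetime, timezone

# In-memory database for illustration; a real repository would live on disk.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE page (
        id          INTEGER PRIMARY KEY,
        url         TEXT NOT NULL UNIQUE,
        description TEXT,
        crawled_at  TEXT NOT NULL
    );
    CREATE TABLE page_keyword (
        page_id INTEGER NOT NULL REFERENCES page(id),
        keyword TEXT NOT NULL,
        PRIMARY KEY (page_id, keyword)
    );
    -- Index on keyword so lookups do not scan the whole table.
    CREATE INDEX idx_page_keyword_keyword ON page_keyword(keyword);
    """
)


def save_page(conn, url, description, keywords):
    """Store one crawled page with its keywords and return its row id."""
    crawled_at = datetime.now(timezone.utc).isoformat()
    cur = conn.execute(
        "INSERT INTO page (url, description, crawled_at) VALUES (?, ?, ?)",
        (url, description, crawled_at),
    )
    page_id = cur.lastrowid
    conn.executemany(
        "INSERT INTO page_keyword (page_id, keyword) VALUES (?, ?)",
        [(page_id, keyword) for keyword in sorted(set(keywords))],
    )
    conn.commit()
    return page_id


def urls_for_keyword(conn, keyword):
    """Return the URLs of all stored pages tagged with `keyword`."""
    rows = conn.execute(
        "SELECT page.url FROM page "
        "JOIN page_keyword ON page.id = page_keyword.page_id "
        "WHERE page_keyword.keyword = ? ORDER BY page.url",
        (keyword,),
    )
    return [row[0] for row in rows]


save_page(conn, "https://a.example/", "Intro to search", ["search", "engine"])
save_page(conn, "https://b.example/", "Crawler notes", ["crawler", "search"])
print(urls_for_keyword(conn, "search"))
```

Keeping keywords in their own indexed table is one possible design; it trades
extra rows for faster keyword lookups than scanning a text column would allow.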
Finally, there is a front-end interface for users: this is the face and
presentation of the search engine. When a user submits a query, usually in the
form of a list of textual terms, an internal scoring function is applied to each
Web page in the repository [Pandey 2005], and the list of results is presented,
usually in order of relevance and importance. Google is known for its simple and
straightforward interface, while some more recent competitors, such as Ask.com,
provide a much richer user experience by adding features like previews or
hierarchical display. This project will focus on how to build a search engine
system that can gather information from all corners of the web, and index and
rank it, while maintaining a simple yet rich user interface for users to query
information.
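The query step described above, where a scoring function is applied to each
page and the results are returned in order, might look like the following
minimal sketch. The scoring rule (a plain count of query-term occurrences), the
sample REPOSITORY, and the function names are assumptions made for
illustration; they are not the scoring function of [Pandey 2005] or of any
production search engine.

```python
import re
from collections import Counter

# Hypothetical repository: URL -> extracted page text.
REPOSITORY = {
    "x.example/py": "Python search engine in Python",
    "x.example/law": "law and government",
    "x.example/se": "search engine basics",
}


def tokenize(text):
    """Lower-case the text and split it into alphanumeric terms."""
    return re.findall(r"[a-z0-9]+", text.lower())


def score(query_terms, page_text):
    """Score a page as the total number of occurrences of the query terms."""
    counts = Counter(tokenize(page_text))
    return sum(counts[term] for term in query_terms)


def search(query, repository, limit=10):
    """Return up to `limit` URLs with a non-zero score, best score first."""
    terms = tokenize(query)
    scored = [(score(terms, text), url) for url, text in repository.items()]
    hits = [(s, url) for s, url in scored if s > 0]
    # Sort by descending score, then by URL so that ties are deterministic.
    hits.sort(key=lambda pair: (-pair[0], pair[1]))
    return [url for _, url in hits[:limit]]


print(search("python search", REPOSITORY))
```

A real ranking function would typically combine such a relevance score with an
importance score for each page; the sketch covers relevance only.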
1.2 PROBLEM OF THE STUDY
The problem is to design a system for retrieving relevant information from the
internet based on a user-supplied query. The internet carries an extensive range
of information, such as inter-linked hypertext documents. The manual method of
retrieving information requires the user to know the domain name, which is the
address that helps users retrieve information from a website. The amount of time
users spend searching for information is very high and the probability of
finding it is low. This search engine system will automatically crawl hypertext
documents from various websites and store them on disk. When the user searches
for information using keywords, the system will take the keywords, compare them
with the hypertext documents that reside in its database, and return the URL
addresses where the information can be found.
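One common way to compare keywords against stored documents without scanning
every document is an inverted index, which maps each term to the URLs that
contain it. The sketch below is an assumption about how such a lookup could
work, not a description of this project's database; the DOCUMENTS sample and
the function names are hypothetical.

```python
import re
from collections import defaultdict

# Hypothetical stored documents: URL -> text of the crawled page.
DOCUMENTS = {
    "m.example/algebra": "Linear algebra for physics students",
    "m.example/health": "Public health and physiology",
    "m.example/mechanics": "Physics of classical mechanics",
}


def tokenize(text):
    """Lower-case the text and split it into alphanumeric terms."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(documents):
    """Build an inverted index: term -> set of URLs containing that term."""
    index = defaultdict(set)
    for url, text in documents.items():
        for term in tokenize(text):
            index[term].add(url)
    return index


def lookup(index, query):
    """Return the sorted URLs that contain every keyword in the query."""
    terms = tokenize(query)
    if not terms:
        return []
    # Intersect the URL sets of all terms; unknown terms yield an empty set.
    matches = set.intersection(*(index.get(term, set()) for term in terms))
    return sorted(matches)


index = build_index(DOCUMENTS)
print(lookup(index, "physics"))
print(lookup(index, "physics algebra"))
```

Requiring every keyword to match keeps the result list short; a lookup that
accepts any keyword would use a set union instead of an intersection.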
1.3 AIM AND OBJECTIVES OF THE STUDY
The aim of this project is to create a search engine system that retrieves
information from the internet based on the keywords the user supplies, in an
environment that is very simple for user interaction. The method for retrieving
information will be easy for non-technical users.
The objectives of this research work are to:
1. Embed the system in an environment that the user is already familiar with.
2. Enable users to retrieve information with a simple application such as a Web
browser.
3. Reduce the amount of time internet users spend searching for information.
1.4 SCOPE OF THE STUDY
Users will be able to retrieve general information relating to Mathematics,
Physics, Government, Physiology, Programming, Law, Health, Science, etc. The
system will cover any text information that can be found on the internet.
1.5 SIGNIFICANCE OF THE STUDY
1. Examine the practical method by which the system will be implemented.
2. Internet users will be able to search for information easily.
3. The amount of time that the average user spends reaching information will be
reduced.
4. Users will be able to filter information based on their needs.
5. This project will also serve as solid ground for researchers who want to
improve on this system.
1.6 LIMITATION OF THE STUDY
We focused on providing information to the user as fast as the system can
deliver it. There is one major restriction that will be hard to handle during
the implementation of the system. The system will crawl and scrape information
from hypertext documents as much as it can handle, but the storage provided is
very small compared to the amount of information the system is expected to
crawl.