Title: A Focused Linked Data Crawler based on HTML Link Analysis
Authors: Reihaneh Emamdadi, Mohsen Kahani, Fattane Zarrinkalam
Abstract
Linked Data can be published as RDF documents or embedded in HTML documents. A linked data crawler is a program that discovers published linked data on the web by following RDF links. Some RDF documents, however, are surrounded by HTML documents. Linked data crawlers therefore need to follow HTML links in addition to RDF links, both to discover such RDF documents and to harvest the linked data embedded in HTML documents. Many HTML documents, however, neither embed linked data nor point to any RDF document. Crawling such HTML documents lowers the discovery rate of RDF documents per unit of network bandwidth and wastes computational resources on non-RDF documents. In this paper, a focused linked data crawler is proposed to address this problem. The proposed crawler analyzes and prioritizes HTML links by estimating the likelihood that a link will lead to an RDF document. The experimental evaluation shows that the proposed approach increases the discovery rate of RDF documents compared with a non-focused linked data crawler.
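The link prioritization described in the abstract can be illustrated with a small priority-queue frontier. This is only a minimal sketch under stated assumptions: the abstract does not say which link features the crawler analyzes, so the scoring heuristic (`rdf_likelihood`, based on a declared MIME type and the URL file extension), the score values, and the `Frontier` class are all hypothetical and are not the authors' method.

```python
import heapq
from urllib.parse import urlparse

# Hypothetical hints: the features used in the paper are not given in the abstract.
RDF_EXTENSIONS = (".rdf", ".ttl", ".nt", ".n3", ".owl", ".jsonld")
RDF_MIME_TYPES = {
    "application/rdf+xml",
    "text/turtle",
    "application/n-triples",
    "application/ld+json",
}


def rdf_likelihood(url, link_type=None):
    """Return an assumed heuristic score in [0, 1] that a link leads to RDF."""
    # A link that declares an RDF media type is the strongest hint.
    if link_type in RDF_MIME_TYPES:
        return 1.0
    # Otherwise fall back to the file extension of the URL path.
    path = urlparse(url).path.lower()
    if path.endswith(RDF_EXTENSIONS):
        return 0.9
    # Plain HTML links get a low default score (arbitrary value).
    return 0.1


class Frontier:
    """Crawl frontier that returns the highest-scoring unseen link first."""

    def __init__(self):
        self._heap = []
        self._seen = set()
        self._counter = 0  # tie-breaker so equal scores pop in insertion order

    def push(self, url, link_type=None):
        if url in self._seen:
            return  # never enqueue the same URL twice
        self._seen.add(url)
        score = rdf_likelihood(url, link_type)
        # heapq is a min-heap, so the score is negated to pop the best link first.
        heapq.heappush(self._heap, (-score, self._counter, url))
        self._counter += 1

    def pop(self):
        return heapq.heappop(self._heap)[2]

    def __len__(self):
        return len(self._heap)
```

A non-focused crawler would correspond to a plain first-in, first-out queue. The only difference in this sketch is the ordering of the frontier, which is what allows likely RDF sources to be fetched before HTML pages that are unlikely to lead to any linked data.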
Keywords
linked data crawler; focused crawler; RDF link; HTML link; discovery rate
@inproceedings{paperid:1046209,
author = {Emamdadi, Reihaneh and Kahani, Mohsen and Zarrinkalam, Fattane},
title = {A Focused Linked Data Crawler based on HTML Link Analysis},
booktitle = {4th International eConference on Computer and Knowledge Engineering (ICCKE2014)},
year = {2014},
location = {Mashhad, IRAN},
keywords = {linked data crawler; focused crawler; RDF link; HTML link; discovery rate},
}
%0 Conference Proceedings
%T A Focused Linked Data Crawler based on HTML Link Analysis
%A Emamdadi, Reihaneh
%A Kahani, Mohsen
%A Zarrinkalam, Fattane
%J 4th International eConference on Computer and Knowledge Engineering (ICCKE2014)
%D 2014