Extracting Information from Web Content and Structure

Tesar, Roman

Keywords: classification, web mining, information retrieval, ranking algorithms

Abstract

The Web is a vast data repository. By mining this data efficiently, we can gain valuable knowledge. Unfortunately, alongside useful content there are also many Web documents considered harmful (e.g. pornography, terrorism, illegal drugs). Web mining, which comprises three main areas (content, structure, and usage mining), may help us detect and eliminate these sites. In this paper, we concentrate on applications of Web content mining and Web structure mining. First, we introduce a system for detecting pornographic textual Web pages. We discuss its classification methods and describe its architecture. Second, we present an analysis of the relations among Czech academic computer science Web sites. We give an overview of ranking algorithms and determine the importance of the sites we analyzed.

More information

Publication date: 2006
Start/end dates: 25-26 April 2006
First page: 133
Last page: 140
Language: English
URL: https://www.kiv.zcu.cz/~dalfia/publications/ISIM2006final.pdf