Using SOFM to improve web site text content

Ríos, S.A.; Yasuda, H.; Aoki, T.; Velásquez, J.D.; Vera, E.S.

Keywords: information, vectors, algorithms, world, text, analysis, clustering, retrieval, web, reverse, wide, page, contents, SOFM

Abstract

We introduce a new method to improve web site text content by identifying the most relevant free text in the web pages. To understand how web page text varies, we collect pages over a period of time. The text content of each page is then transformed into a feature vector and used as input to a clustering algorithm, a self-organizing feature map (SOFM), which groups the vectors by common text content. In each cluster, a centroid and its neighbor vectors are extracted. Then, using a reverse clustering analysis, the pages represented by each vector are reviewed in order to find similar text content. Furthermore, the proposed method was tested on a real web site, demonstrating the effectiveness of this approach. © Springer-Verlag Berlin Heidelberg 2005.
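
The pipeline outlined in the abstract (page text to feature vectors, SOFM clustering, then mapping clusters back to pages) can be sketched roughly as follows. This is a minimal illustration and not the authors' implementation: the abstract does not specify the vector weighting, map size, or training schedule, so the tf-idf-style weighting, the 2x2 grid, the linear decay of learning rate and radius, and the sample pages are all assumptions made for the example. It uses only the Python standard library.

```python
import math
import random
from collections import Counter, defaultdict

def tfidf_vectors(pages):
    """Turn {page_id: text} into {page_id: vector} over a shared vocabulary.
    The tf-idf-style weighting is an assumption, not taken from the paper."""
    tokens = {pid: text.lower().split() for pid, text in pages.items()}
    vocab = sorted({w for words in tokens.values() for w in words})
    n = len(pages)
    df = Counter(w for words in tokens.values() for w in set(words))
    vectors = {}
    for pid, words in tokens.items():
        tf = Counter(words)
        vectors[pid] = [tf[w] * math.log(1 + n / df[w]) for w in vocab]
    return vocab, vectors

def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def train_sofm(data, rows=2, cols=2, epochs=50, lr0=0.5, seed=0):
    """Train a small rectangular SOFM; returns {(row, col): weight vector}."""
    rng = random.Random(seed)
    dim = len(data[0])
    weights = {(r, c): [rng.random() for _ in range(dim)]
               for r in range(rows) for c in range(cols)}
    radius0 = max(rows, cols) / 2
    for epoch in range(epochs):
        frac = 1 - epoch / epochs              # linear decay (assumed schedule)
        lr = lr0 * frac
        radius = max(radius0 * frac, 0.5)
        for vec in rng.sample(data, len(data)):    # shuffled pass over the data
            # Best matching unit: the neuron whose weights are closest to vec.
            bmu = min(weights, key=lambda k: sq_dist(weights[k], vec))
            for node, w in weights.items():
                grid_d2 = (node[0] - bmu[0]) ** 2 + (node[1] - bmu[1]) ** 2
                h = math.exp(-grid_d2 / (2 * radius ** 2))  # Gaussian neighborhood
                for i in range(dim):
                    w[i] += lr * h * (vec[i] - w[i])
    return weights

def reverse_clusters(vectors, weights):
    """Map each winning neuron back to its page ids, nearest page first."""
    clusters = defaultdict(list)
    for pid, vec in vectors.items():
        bmu = min(weights, key=lambda k: sq_dist(weights[k], vec))
        clusters[bmu].append((sq_dist(weights[bmu], vec), pid))
    return {node: [pid for _, pid in sorted(members)]
            for node, members in clusters.items()}

# Hypothetical page texts, invented purely for illustration.
pages = {
    "home.html": "bank loans credit cards savings accounts",
    "loans.html": "loans credit mortgage rates loans",
    "cards.html": "credit cards rewards cards fees",
    "contact.html": "contact phone address branch offices",
    "branches.html": "branch offices address opening hours",
}

vocab, vectors = tfidf_vectors(pages)
weights = train_sofm(list(vectors.values()))
clusters = reverse_clusters(vectors, weights)
for node, page_ids in sorted(clusters.items()):
    print(node, page_ids)
```

In each printed cluster, the first page id is the one closest to the neuron's weight vector, which plays the role of the centroid's nearest representative; the remaining ids are its neighbors in that cluster. Reading the pages behind those ids is the manual review step that the abstract calls reverse clustering analysis.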

More information

Journal title: BIO-INSPIRED SYSTEMS AND APPLICATIONS: FROM ROBOTICS TO AMBIENT INTELLIGENCE, PT II
Volume: 3611
Issue: PART II
Publisher: SPRINGER INTERNATIONAL PUBLISHING AG
Publication date: 2005
Start page: 622
End page: 626
URL: http://www.scopus.com/inward/record.url?eid=2-s2.0-26844528966&partnerID=q2rCbXpz