Method and system for classifying web sites using query-based web site models

Poblete, Barbara; Spiliopoulou, Maria; Mendoza, Marcelo.

Abstract

Web sites are grouped by generating feature space representations of documents, and aggregating the feature space representations into web site vectors. A document vector may be generated for each document of a plurality of documents associated with a set of web sites according to a query-based feature space model. The query-based feature space model defines features of the documents. Each document vector includes weights determined for features associated with the corresponding document. A web site vector is generated for each of the web sites using the plurality of document vectors. The web sites are grouped according to the web site vectors.
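
The pipeline in the abstract can be illustrated with a minimal sketch. It is not the patented method: the abstract does not specify the features, the weighting, the aggregation, or the grouping algorithm, so everything concrete here is an assumption made for illustration. The sketch assumes each feature is a query string, each weight is a click count taken from a hypothetical click log of (query, document, site) tuples, a web site vector is the sum of its document vectors, and sites are grouped greedily by cosine similarity against an arbitrary threshold. All function names are hypothetical.

```python
from collections import defaultdict
from math import sqrt


def document_vectors(click_log):
    """Build one sparse vector per document from (query, doc, site) tuples.

    Assumption: a feature is a query, and its weight is the number of
    clicks that query produced for the document.
    """
    vectors = defaultdict(lambda: defaultdict(float))
    site_of = {}
    for query, doc, site in click_log:
        vectors[doc][query] += 1.0
        site_of[doc] = site
    return vectors, site_of


def site_vectors(doc_vecs, site_of):
    """Aggregate document vectors into one vector per web site (assumed: sum)."""
    sites = defaultdict(lambda: defaultdict(float))
    for doc, vec in doc_vecs.items():
        for feature, weight in vec.items():
            sites[site_of[doc]][feature] += weight
    return {site: dict(vec) for site, vec in sites.items()}


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(f, 0.0) for f, w in a.items())
    norm_a = sqrt(sum(w * w for w in a.values()))
    norm_b = sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def group_sites(site_vecs, threshold=0.5):
    """Greedy grouping (assumed): join the first group whose first member
    is similar enough, otherwise start a new group."""
    groups = []
    for site in sorted(site_vecs):
        for group in groups:
            if cosine(site_vecs[site], site_vecs[group[0]]) >= threshold:
                group.append(site)
                break
        else:
            groups.append([site])
    return groups
```

With a log in which two sites receive clicks from the same queries and a third site from an unrelated query, `group_sites` places the first two sites in one group and the third in its own group. A real system would likely replace the greedy step with a standard clustering algorithm.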

More information

Publication date: 2012

US 20120166439 A1

Notes: Patent pending