Method and system for classifying web sites using query-based web site models
Abstract
Web sites are grouped by generating feature space representations of documents, and aggregating the feature space representations into web site vectors. A document vector may be generated for each document of a plurality of documents associated with a set of web sites according to a query-based feature space model. The query-based feature space model defines features of the documents. Each document vector includes weights determined for features associated with the corresponding document. A web site vector is generated for each of the web sites using the plurality of document vectors. The web sites are grouped according to the web site vectors.
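The pipeline described in the abstract (document vectors, then web site vectors, then grouping) can be sketched as follows. This is a minimal illustration under stated assumptions, not the method claimed in the patent: the abstract does not specify how weights are computed, how document vectors are aggregated, or how sites are grouped. Here each feature is assumed to be a query (a set of terms), a weight is the fraction of query terms found in the document, a site vector is the mean of its document vectors, and grouping is a greedy cosine-similarity threshold. The names `QUERIES`, `document_vector`, `site_vectors`, and `group_sites`, along with the sample data, are hypothetical.

```python
import math
from collections import defaultdict

# Assumption: each feature of the query-based feature space is a query,
# represented here as a set of terms. These queries are made up.
QUERIES = [{"cheap", "flights"}, {"hotel", "booking"}, {"python", "tutorial"}]


def document_vector(text, queries=QUERIES):
    """Weight each query feature by the fraction of its terms present in the text."""
    tokens = set(text.lower().split())
    return [len(q & tokens) / len(q) for q in queries]


def site_vectors(docs):
    """Aggregate document vectors per site by averaging (an assumed aggregation).

    docs is an iterable of (site, text) pairs.
    """
    by_site = defaultdict(list)
    for site, text in docs:
        by_site[site].append(document_vector(text))
    return {
        site: [sum(col) / len(vecs) for col in zip(*vecs)]
        for site, vecs in by_site.items()
    }


def cosine(a, b):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def group_sites(vectors, threshold=0.5):
    """Greedy grouping: a site joins the first group whose first member is similar enough."""
    groups = []
    for site in sorted(vectors):
        for group in groups:
            if cosine(vectors[site], vectors[group[0]]) >= threshold:
                group.append(site)
                break
        else:
            groups.append([site])
    return groups


docs = [
    ("a.example", "cheap flights to rome"),
    ("a.example", "hotel booking deals"),
    ("b.example", "find cheap flights and hotel booking"),
    ("c.example", "python tutorial for beginners"),
]
groups = group_sites(site_vectors(docs))
print(groups)  # → [['a.example', 'b.example'], ['c.example']]
```

In this toy data the two travel-related sites end up in one group because their site vectors point in the same direction, while the programming site forms its own group. Any other weighting scheme or clustering algorithm could be substituted without changing the overall three-step structure.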
More information
Publication date: | 2012 |
DOI: | US 20120166439 A1 |
Notes: | patent pending |