Method and system for classifying web sites using query-based web site models
Abstract
Web sites are grouped by generating feature space representations of documents, and aggregating the feature space representations into web site vectors. A document vector may be generated for each document of a plurality of documents associated with a set of web sites according to a query-based feature space model. The query-based feature space model defines features of the documents. Each document vector includes weights determined for features associated with the corresponding document. A web site vector is generated for each of the web sites using the plurality of document vectors. The web sites are grouped according to the web site vectors.
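The pipeline the abstract describes (document vectors, then web site vectors, then grouping) can be illustrated with a minimal Python sketch. Everything specific in it is an assumption made for illustration and is not taken from the patent: each query is treated as one feature, a feature's weight is the fraction of the query's terms found in the document, a web site vector is the component-wise mean of its document vectors, and sites are grouped greedily by cosine similarity against a hypothetical threshold of 0.8.

```python
from collections import defaultdict
from math import sqrt


def document_vector(text, queries):
    """One weight per query (assumed weighting: share of query terms present)."""
    words = set(text.lower().split())
    vector = []
    for query in queries:
        terms = query.lower().split()
        hits = sum(1 for term in terms if term in words)
        vector.append(hits / len(terms) if terms else 0.0)
    return vector


def site_vector(doc_vectors):
    """Aggregate a site's document vectors by component-wise mean (assumed)."""
    count = len(doc_vectors)
    return [sum(column) / count for column in zip(*doc_vectors)]


def cosine(a, b):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = sqrt(sum(x * x for x in a)) * sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def group_sites(site_vectors, threshold=0.8):
    """Greedy grouping: a site joins the first group whose first member is similar enough."""
    groups = []
    for site, vector in site_vectors.items():
        for group in groups:
            if cosine(vector, site_vectors[group[0]]) >= threshold:
                group.append(site)
                break
        else:
            # No existing group matched, so this site starts a new one.
            groups.append([site])
    return groups


def group_web_sites(documents, queries, threshold=0.8):
    """documents: iterable of (site, text) pairs; returns lists of site names."""
    per_site = defaultdict(list)
    for site, text in documents:
        per_site[site].append(document_vector(text, queries))
    vectors = {site: site_vector(vs) for site, vs in per_site.items()}
    return group_sites(vectors, threshold)
```

With the hypothetical queries `["cheap flights", "python tutorial"]`, two sites whose pages mention cheap flights would fall into one group and a site with a Python tutorial page into another. A real system would likely use a richer weighting scheme and a proper clustering algorithm; the greedy pass here only keeps the example short.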
More information
| Publication date: | 2012 |
| DOI: | US 20120166439 A1 |
| Notes: | patent pending |