Advanced techniques in web data pre-processing and cleaning
Abstract
Central to successful e-business is the construction of web sites that attract users, capture their preferences, and entice them into making a purchase. Web mining is the application of diverse data mining techniques to categorize both the content and the structure of web sites, with the goal of aiding e-business. Web mining requires knowledge of the web site structure (hyperlink graph), the web content (vector model), and user sessions (the sequence of pages visited by each user to a site). Much of the data for web mining can be noisy. The noise comes from many sources, for example, undocumented changes to the web site structure and content, differing interpretations of the semantics of text and media, and web logs without individual user identification. There may also be no record of how many times a specific page was visited in a session, because the page may be served from a proxy or web browser cache. Such noise presents a challenge for web mining. This chapter presents issues with and approaches for cleaning web data in preparation for web mining analysis. © 2010 Springer-Verlag Berlin Heidelberg.
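As an illustration of the session problem described in the abstract, the following is a minimal sketch of one common heuristic for rebuilding user sessions from logs that lack individual user identification. It is not taken from the chapter. It assumes that the pair (IP address, user agent) approximates a visitor and that a gap of more than 30 minutes starts a new session; the function name `sessionize`, the record layout, and the timeout value are all hypothetical choices. Page views served from a proxy or browser cache never reach the log, so they cannot be recovered by this approach.

```python
from collections import defaultdict
from datetime import timedelta

# Assumed inactivity threshold; a conventional choice, not a value from the chapter.
SESSION_TIMEOUT = timedelta(minutes=30)


def sessionize(records, timeout=SESSION_TIMEOUT):
    """Group log records into sessions.

    records: iterable of (ip, user_agent, timestamp, url) tuples, where
    timestamp is a datetime. The (ip, user_agent) pair stands in for the
    missing user identifier (an assumption that can merge or split real users).
    Returns a list of ((ip, user_agent), [url, ...]) tuples.
    """
    hits_by_visitor = defaultdict(list)
    for ip, agent, ts, url in records:
        hits_by_visitor[(ip, agent)].append((ts, url))

    sessions = []
    for visitor, hits in hits_by_visitor.items():
        hits.sort(key=lambda hit: hit[0])  # order each visitor's hits by time
        current = [hits[0][1]]
        for (prev_ts, _), (ts, url) in zip(hits, hits[1:]):
            if ts - prev_ts > timeout:
                # Long gap: close the running session and start a new one.
                sessions.append((visitor, current))
                current = []
            current.append(url)
        sessions.append((visitor, current))
    return sessions
```

The timeout is exposed as a parameter because the appropriate threshold depends on the site; a shorter value splits long visits, while a longer one merges separate visits.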
More information
Journal title: | SERVICE ORIENTATION IN HOLONIC AND MULTI-AGENT MANUFACTURING |
Volume: | 311 |
Publisher: | SPRINGER-VERLAG BERLIN |
Publication date: | 2010 |
Start page: | 19 |
End page: | 48 |
URL: | http://www.scopus.com/inward/record.url?eid=2-s2.0-77956537722&partnerID=q2rCbXpz |