The Viuva Negra crawler

File: 06-21.pdf (367.78 KB, Adobe PDF)

Abstract

This report discusses architectural aspects of web crawlers and details the design, implementation, and evaluation of the Viuva Negra (VN) crawler. VN has been in use for 4 years, feeding a search engine and an archive of the Portuguese web. In our experiments, it crawled over 2 million documents per day, corresponding to 63 GB of data. We describe situations found on the web that are hazardous to crawling and the solutions adopted to mitigate their effects. The gathered information was integrated into a web warehouse that supports its automatic processing by text mining applications.
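
To put the reported crawl rate in perspective, the short sketch below converts the abstract's daily figures into per-second rates and a mean document size. It rests on assumptions the abstract does not state: it takes the count to be exactly 2,000,000 documents (the abstract says "over 2 million", so the real rates are somewhat higher) and reads GB as decimal gigabytes (10^9 bytes). The function and constant names are illustrative only.

```python
# Back-of-the-envelope figures derived from the abstract's numbers.
# Assumptions (not stated in the report): exactly 2,000,000 documents per day
# and decimal gigabytes (1 GB = 10**9 bytes).
DOCS_PER_DAY = 2_000_000
BYTES_PER_DAY = 63 * 10**9
SECONDS_PER_DAY = 24 * 60 * 60


def crawl_rates(docs_per_day, bytes_per_day):
    """Return (documents per second, bytes per second, mean bytes per document)."""
    docs_per_second = docs_per_day / SECONDS_PER_DAY
    bytes_per_second = bytes_per_day / SECONDS_PER_DAY
    mean_doc_bytes = bytes_per_day / docs_per_day
    return docs_per_second, bytes_per_second, mean_doc_bytes


docs_s, bytes_s, mean_bytes = crawl_rates(DOCS_PER_DAY, BYTES_PER_DAY)
print(f"{docs_s:.1f} documents/s")         # 23.1 documents/s
print(f"{bytes_s / 1e6:.2f} MB/s")         # 0.73 MB/s
print(f"{mean_bytes / 1e3:.1f} KB per document")  # 31.5 KB per document
```

Under these assumptions, the crawl sustains roughly 23 documents per second at an average of about 31.5 KB per document.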

Keywords

Crawler, design, tumba!, web partitioning, experiments, harvesting, Tomba

Publisher

Department of Informatics, University of Lisbon