Collecting Statistics about the Portuguese Web

Abstract(s)

This report presents a characterization of text documents from the Portuguese Web. The characterization was produced from a 2003 crawl of over 4 million URLs and 131 thousand sites. We describe the rules we established for defining the boundaries of the Portuguese Web and the methodology used to gather statistics. We also show how crawling constraints and abnormal situations on the Web can influence the results.

Keywords

Web characterization; Portuguese; Portugal; tumba!; statistics; crawling

Publisher

Department of Informatics, University of Lisbon
