Abstract
This report presents a statistical study of WPT-03, a text corpus built from the pages of the "Portuguese Web" collected in the repository of the tumba! search engine. We present a statistical analysis of the textual content available on the Portuguese Web, covering the distribution of page sizes, the languages in which pages are written, and the terms they contain.
