<?xml version="1.0" encoding="UTF-8" ?><xb:digital_entity_call xmlns:xb="http://com/exlibris/digitool/repository/api/xmlbeans"><xb:digital_entity><pid>16989</pid><control><label>Geographically aware Web text mining</label><note>Martins, Bruno Emanuel da Graça, 1975-</note><ingest_id>ing1365</ingest_id><ingest_name>ulsd27jan09/1</ingest_name><entity_type xsi:nil="true" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"/><entity_group xsi:nil="true" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"/><usage_type>VIEW</usage_type><preservation_level>critical</preservation_level><partition_a>OAI-RUL</partition_a><partition_b xsi:nil="true" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"/><partition_c xsi:nil="true" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"/><status xsi:nil="true" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"/><creation_date>2009-01-27 11:49:14</creation_date><creator>creator:MARTA</creator><modification_date>2009-08-10 13:10:40</modification_date><modified_by>viewer:internal</modified_by><admin_unit>DUL01</admin_unit></control><mds><md shared="false"><mid>15604</mid><description> </description><name>descriptive</name><type>dc</type><value><![CDATA[<?xml version="1.0" encoding="UTF-8"?><record xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:dcterms="http://purl.org/dc/terms/" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">	<dc:language>eng</dc:language>	<dc:type>info:eu-repo/semantics/doctoralThesis</dc:type>	<dc:format>application/pdf</dc:format>	<dc:title>Geographically aware Web text mining</dc:title>	<dc:creator>Martins, Bruno Emanuel da Graça, 1975-</dc:creator>	<dcterms:advisor>Silva, Mário Jorge Gaspar da, 1961-</dcterms:advisor>	<dc:date>2009</dc:date>	<dcterms:abstract>Text mining and search have become important research areas over the past few years, mostly due to the large popularity of the Web. 
A natural extension of these technologies is the development of methods for exploring the geographic context of Web information. Human information needs often present specific geographic constraints. Many Web documents also refer to specific locations. However, relatively little effort has been spent on developing the facilities required for geographic access to unstructured textual information. Geographically aware text mining and search remain relatively unexplored. This thesis addresses this new area, arguing that Web text mining can be applied to extract geographic context information, and that this information can be exploited for information retrieval. Fundamental questions investigated include handling geographic references in text, assigning geographic scopes to documents, and building retrieval applications that handle and use geographic scopes. The thesis presents appropriate solutions for each of these challenges, together with a comprehensive evaluation of their effectiveness. By investigating these questions, the thesis presents several findings on how the geographic context can be effectively handled by text processing tools.</dcterms:abstract>	<dcterms:accessRights>open access</dcterms:accessRights>	<dc:description>Tese de doutoramento em Informática (Engenharia Informática), apresentada à Universidade de Lisboa através da Faculdade de Ciências, 2009</dc:description>	<dc:link>http://sibul.reitoria.ul.pt/F/?func=item-global&amp;doc_library=ULB01&amp;type=03&amp;doc_number=000545977</dc:link><dc:subject>Engenharia informática</dc:subject><dc:subject>Teses de doutoramento - 2009</dc:subject><dcterms:abstract>A pesquisa e prospecção de texto tornaram-se nos últimos anos importantes áreas de pesquisa, em grande parte devido à popularidade da Web. Uma extensão natural destas tecnologias é a criação de métodos para explorar o contexto geográfico da informação na Web. 
As necessidades de informação são muitas vezes expressas com um dado constrangimento geográfico e muitos documentos referem-se também a locais específicos. No entanto, tem sido dedicado relativamente pouco esforço ao desenvolvimento dos mecanismos necessários para a exploração geográfica de informação textual não estruturada. A prospecção e pesquisa de informação textual, cientes da geografia, permanecem relativamente inexploradas. Esta tese aborda esta problemática, levantando a hipótese de poderem ser usadas técnicas de prospecção de textos Web para extrair informação relativa ao contexto geográfico, podendo ainda esta informação ser usada na pesquisa de documentos. As questões fundamentais sob investigação incluem o processamento de referências geográficas, a atribuição de âmbitos geográficos a documentos, e a construção de aplicações de pesquisa suportando âmbitos geográficos. São descritas soluções adequadas a cada um destes desafios, e é feita uma avaliação pormenorizada da sua eficácia. Como resultado da investigação, a tese apresenta algumas conclusões sobre como pode o contexto geográfico ser considerado em aplicações de processamento de texto.</dcterms:abstract></record>]]></value></md><md shared="false"><mid>15609</mid><description xsi:nil="true" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"/><name>technical</name><type>text_md</type><value><![CDATA[<?xml version="1.0" encoding="utf-8"?>
<textmd:textMD xmlns:textmd="http://www.loc.gov/METS/" xmlns="http://www.loc.gov/METS/" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:jhove="http://hul.harvard.edu/ois/xml/ns/jhove" xsi:schemaLocation="http://www.loc.gov/METS http://dlib.nyu.edu/METS/textmd.xsd">
   <textmd:encoding QUALITY="1">
      <textmd:encoding_platform linebreak=""/>
      <textmd:encoding_software/>
      <textmd:encoding_agent role=""/>
   </textmd:encoding>
   <textmd:character_info>
      <textmd:charset/>
      <textmd:byte_order/>
      <textmd:character_size encoding=""/>
      <textmd:linebreak/>
   </textmd:character_info>
   <textmd:language/>
   <textmd:alt_language authority=""/>
   <textmd:font_script/>
   <textmd:markup_basis version=""/>
   <textmd:markup_language version=""/>
   <textmd:processingNote/>
   <textmd:printRequirements/>
   <textmd:textNote/>
   <textmd:viewingRequirements/>
   <textmd:file>
      <textmd:fileSize>10235951</textmd:fileSize>
      <textmd:checksum>
         <textmd:checksumMethod>CRC32</textmd:checksumMethod>
         <textmd:checksumValue>4ad59431</textmd:checksumValue>
      </textmd:checksum>
      <textmd:checksum>
         <textmd:checksumMethod>MD5</textmd:checksumMethod>
         <textmd:checksumValue>a42cc7847ba7a23a561776168ead9791</textmd:checksumValue>
      </textmd:checksum>
      <textmd:checksum>
         <textmd:checksumMethod>SHA-1</textmd:checksumMethod>
         <textmd:checksumValue>3c7bc61e982ac2691990511324776b62933c58fb</textmd:checksumValue>
      </textmd:checksum>
      <textmd:mimeType>application/octet-stream</textmd:mimeType>
      <textmd:extension/>
   </textmd:file>
</textmd:textMD>]]></value></md><md shared="false"><mid>15610</mid><description xsi:nil="true" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"/><name>changehistory</name><type>changehistory_md</type><value><![CDATA[<xb:history xmlns:xb="http://com/exlibris/digitool/repository/api/xmlbeans"><events><event xmlID="1"><eventIdentifier><eventIdentifierType>Service</eventIdentifierType><eventIdentifierValue>TechnicalMetadataExtractor</eventIdentifierValue></eventIdentifier><eventType>Technical Metadata Extraction</eventType><eventDateTime>2009-01-27T14:14:54.702Z</eventDateTime><eventDetail>Add some metadata to the digital entity using jhove</eventDetail><linkingAgentIdentifier><linkingAgentIdentifierType>software_used</linkingAgentIdentifierType><linkingAgentIdentifierValue>JHove</linkingAgentIdentifierValue></linkingAgentIdentifier></event></events></xb:history>]]></value></md></mds><relations/><stream_ref><file_name>16989_thesis.pdf</file_name><file_extension>pdf</file_extension><mime_type>application/pdf</mime_type><directory_path>/digitool_storage/deposit-master/2009/01/27/file_1/16989</directory_path><file_id xsi:nil="true" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"/><storage_id>1006</storage_id><external_type>-1</external_type><file_size_bytes>10235951</file_size_bytes></stream_ref></xb:digital_entity></xb:digital_entity_call>