Cruz, João Carlos Caetano de Freitas Pires da
Melo, Hygor Piaget Monteiro
Cordeiro, Luís António Rodrigues Gaspar
2024-06-18
2024-06-18
2022
2021
http://hdl.handle.net/10451/65068
Integrated Master's thesis, Engineering Physics, 2022, Universidade de Lisboa, Faculdade de Ciências
Natural languages, as typical complex systems, exhibit distinctive properties, such as nonlinearity and emergence, that arise from the relationships between their elements. Such properties, together with the high dimensionality inherent in extensive vocabularies, make natural languages intrinsically difficult to model. Word embedding models have tackled these difficulties by using distributional semantics along with neural-based models to compute vector representations of words in a space of reduced dimension. In particular, the word2vec model makes use of a 3-layer neural network that generates a vector space, Γ, where a quantitative notion of meaning is recovered. In this work, we use the word2vec architecture to show that, in the space of reduced dimension, in addition to meaning, it is also possible to recover a notion of word attractiveness. In this framework, we define in Γ the quantity mass, M, for each of the V words that form the vocabulary. It was found that M is positively correlated with the word frequencies in the text, f, and that both f and M are distributed according to power laws. It was also found that when the text is shuffled, that is, keeping word frequencies but changing their order, practically all words have M = 0, which suggests that mass is a property that does not bypass the text's emergent structure. In addition, we have extended the definition of mass to serve as a connection criterion for a new linguistic network (a model for languages in terms of a graph structure).
It was found that this network exhibits scale-free and small-world properties and that its topology is significantly affected by text shuffling, in contrast to what is observed for other unsupervised linguistic networks. We also suggest that the total mass of the system may function as a uniquely defined measure that represents an intuitive concept of information.
eng
Quantitative Linguistics
Complex Systems
Linguistic Networks
Lexical vectorization
Master's theses - 2022
Dealing with language emergent behavior using vectors of reduced dimension
master thesis
203201949
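The shuffling control mentioned in the abstract (keeping word frequencies but changing word order) can be illustrated with a minimal Python sketch. This is not the thesis's code: the function name `shuffle_text`, the whitespace tokenization, and the sample sentence are assumptions made here for illustration only.

```python
import random
from collections import Counter


def shuffle_text(tokens, seed=None):
    """Return the tokens in random order (hypothetical helper).

    Every token is kept exactly once, so word frequencies are
    unchanged; only the word order, and with it any structure
    that depends on order, is destroyed.
    """
    rng = random.Random(seed)
    shuffled = list(tokens)  # copy, so the original order is preserved
    rng.shuffle(shuffled)
    return shuffled


# Assumed toy corpus with naive whitespace tokenization.
text = "the cat sat on the mat and the dog sat on the rug"
tokens = text.split()
shuffled = shuffle_text(tokens, seed=0)

# The frequency table f is identical before and after shuffling.
same_frequencies = Counter(tokens) == Counter(shuffled)
print(same_frequencies)
```

Because the frequency table is the same for both versions, any quantity that differs between the original and the shuffled text must depend on word order rather than on word counts alone.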