Affect Lexicon Induction For the Github Subculture Using Distributed Word Representations

Jiao, Yuwei

Files

Jiao_Yuwei.pdf (1.23 MB)

Date

2018-11-06

Authors

Jiao, Yuwei

Advisor

Hoey, Jesse

Publisher

University of Waterloo

Abstract

Sentiments and emotions play essential roles in small group interactions, especially in self-organized collaborative groups. Many people view sentiments as universal constructs; however, some aspects of sentiment differ across cultures. Understanding the features of the sentiment space in small group cultures provides essential insights into the dynamics of self-organized collaborations. However, because carefully human-annotated data are limited, it is hard to describe how sentiments diverge across cultures. In this thesis, we present a new approach for inspecting cultural differences at the level of sentiments and for comparing a subculture with the general social environment. We use Github, a collaborative software development network, as an example of a self-organized subculture. First, we train word embeddings on large corpora and align the embeddings using a linear transformation. Then we model finer-grained human sentiment in the Evaluation-Potency-Activity (EPA) space and extend the subculture EPA lexicon using neural networks with two dense layers. Finally, we apply a Long Short-Term Memory (LSTM) network to analyze the sentiments of identities triggered by event-based sentences. We evaluate the predicted EPA lexicon for the Github community using a recently collected dataset, and the results show that our approach can capture subtle changes in affective dimensions. Moreover, our induced sentiment lexicon shows that individuals from the two environments have different understandings of sentiment-related words and phrases but agree on nouns and adjectives. The sentiment features of “Github culture” suggest that people in self-organized groups tend to reduce personal sentiment in order to improve group collaboration.
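
The embedding alignment step mentioned in the abstract can be illustrated with a minimal sketch. The abstract only says that a linear transformation is used; the least-squares fit below, the function names `fit_linear_alignment` and `align`, and the toy random vectors are all assumptions made for illustration and are not taken from the thesis.

```python
import numpy as np


def fit_linear_alignment(source_vecs, target_vecs):
    """Fit W minimizing ||source_vecs @ W - target_vecs||_F (ordinary least squares).

    Both arrays hold one row per anchor word shared by the two vocabularies.
    Hypothetical helper: the thesis may fit its transformation differently.
    """
    W, _, _, _ = np.linalg.lstsq(source_vecs, target_vecs, rcond=None)
    return W


def align(vecs, W):
    """Map vectors from the source embedding space into the target space."""
    return vecs @ W


# Toy data standing in for real embeddings (assumption: 5 dimensions, 50 anchor words).
rng = np.random.default_rng(0)
dim, n_anchor = 5, 50
github_anchor = rng.normal(size=(n_anchor, dim))   # subculture-space vectors
true_map = rng.normal(size=(dim, dim))             # unknown map we try to recover
general_anchor = github_anchor @ true_map          # general-space vectors, noise-free

W = fit_linear_alignment(github_anchor, general_anchor)
aligned = align(github_anchor, W)

# Mean squared error between aligned subculture vectors and general vectors.
alignment_error = float(np.mean((aligned - general_anchor) ** 2))
```

With real embeddings the anchor pairs would be words assumed to keep the same meaning in both corpora, and the fit would not be exact. Once the two spaces are aligned, a regressor from embeddings to EPA values can be applied to words from either corpus.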