Opinion Mining for Biomedical Data: Feature Space Design and Selection
Oral Defence Date:
Profs. Yang, Singh and Petkovic
Unstructured text (e.g., journal articles) remains the primary means of publishing biomedical research results. Text mining is routinely applied to extract and integrate knowledge from such data. One important task is extracting relationships between bio-entities such as foods and diseases. Most existing studies, however, stop short of analyzing the extracted relationships further, for example their polarity and the level of certainty with which the authors report a given relationship. The latter is termed the relationship strength and is marked at three levels: weak, medium and strong. In this work we detail our studies on constructing an effective feature space for predicting the polarity and strength of a relationship. We consider four types of relationships, namely positive, negative, neutral and no-relationship. Another contribution is that, in addition to the commonly accepted lexicon-based features, we have identified a set of novel features that capture both the semantic and structural aspects of a relationship. Our extensive evaluations demonstrate that combining these new features with the lexicon-based ones achieves the best accuracy for polarity prediction (~0.91). This is not the case for strength prediction, however, where lexicon-based features alone are sufficient (~0.96).
Text mining, natural language processing, opinion mining, sentiment analysis, machine learning, regression