CE 18.05


Spammer Detection on Social Networks


Lijie Zhou

Oral Defence Date: Wednesday, May 9, 2018 - 13:00


TH 434


Asst. Prof. Hao Yue and Assoc. Prof. Hui Yang


Twitter spam is a challenging and critical problem. Researchers have proposed many machine learning based methods to address it, but existing solutions share two limitations. First, labeled data is often scarce. Second, there is no universal standard for feature selection in spam detection: researchers often choose features manually, based on individual experience and observation, which can easily introduce bias into the model and cause overfitting. In this project, we propose a deep learning based approach to address both problems. First, we apply self-taught learning (STL) to learn features from unlabeled data: we train a sparse auto-encoder through unsupervised feature learning and obtain the trained parameter set. Then, instead of performing feature selection, we use the trained sparse auto-encoder for feature compression. Finally, we link the auto-encoder to a softmax classifier to classify accounts as spammers or non-spammers. We compare our strategy with a traditional machine learning based method that uses feature selection; our method outperforms the latter by 6% in terms of accuracy. Our project is innovative in applying self-taught learning (STL) and a stacked auto-encoder to unlabeled data in the spam detection realm.
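The pipeline described above (unsupervised training of a sparse auto-encoder, feature compression, then a softmax classifier) can be illustrated with a minimal sketch. This is not the project's implementation: the data is synthetic, the auto-encoder has a single hidden layer rather than a stack, and every function name and hyperparameter (rho, beta, lam, lr, epochs, layer sizes) is an assumption chosen only for illustration.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_sparse_autoencoder(X, n_hidden, rho=0.05, beta=3.0, lam=1e-4,
                             lr=0.5, epochs=500, seed=0):
    """Fit a one-hidden-layer sparse auto-encoder on unlabeled rows of X.

    Cost: mean squared reconstruction error + L2 weight decay (lam)
    + KL sparsity penalty (beta) pulling mean activations toward rho.
    Returns only the encoder parameters (W1, b1).
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    W1 = rng.normal(0.0, 0.1, (d, n_hidden)); b1 = np.zeros(n_hidden)
    W2 = rng.normal(0.0, 0.1, (n_hidden, d)); b2 = np.zeros(d)
    for _ in range(epochs):
        H = sigmoid(X @ W1 + b1)            # hidden (compressed) features
        X_hat = sigmoid(H @ W2 + b2)        # reconstruction of the input
        rho_hat = np.clip(H.mean(axis=0), 1e-6, 1 - 1e-6)
        d_out = (X_hat - X) * X_hat * (1 - X_hat)
        sparsity = beta * (-rho / rho_hat + (1 - rho) / (1 - rho_hat))
        d_hid = (d_out @ W2.T + sparsity) * H * (1 - H)
        W2 -= lr * (H.T @ d_out / n + lam * W2)
        b2 -= lr * d_out.mean(axis=0)
        W1 -= lr * (X.T @ d_hid / n + lam * W1)
        b1 -= lr * d_hid.mean(axis=0)
    return W1, b1

def train_softmax(F, y, n_classes, lam=1e-4, lr=0.5, epochs=500):
    """Fit a softmax classifier on features F with integer labels y."""
    n, d = F.shape
    W = np.zeros((d, n_classes)); b = np.zeros(n_classes)
    Y = np.eye(n_classes)[y]                # one-hot labels
    for _ in range(epochs):
        logits = F @ W + b
        logits -= logits.max(axis=1, keepdims=True)   # numerical stability
        P = np.exp(logits)
        P /= P.sum(axis=1, keepdims=True)
        G = (P - Y) / n                     # gradient of mean cross-entropy
        W -= lr * (F.T @ G + lam * W)
        b -= lr * G.sum(axis=0)
    return W, b

# Synthetic stand-in data (assumption): 8 account features scaled to [0, 1].
rng = np.random.default_rng(0)
X_unlabeled = rng.random((200, 8))          # plentiful unlabeled accounts
X_labeled = rng.random((40, 8))             # small labeled set
y_labeled = (X_labeled[:, 0] > 0.5).astype(int)   # 1 = spammer (toy rule)

# Step 1: learn the encoder from unlabeled data only.
W1, b1 = train_sparse_autoencoder(X_unlabeled, n_hidden=4)
# Step 2: compress labeled inputs with the trained encoder.
features = sigmoid(X_labeled @ W1 + b1)
# Step 3: train the softmax classifier on the compressed features.
W, b = train_softmax(features, y_labeled, n_classes=2)
preds = np.argmax(features @ W + b, axis=1)
```

In this sketch the encoder never sees a label, which is the point of self-taught learning: only the final softmax stage consumes the scarce labeled set. A stacked variant would repeat step 1 on the hidden features of the previous layer.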


security, spam detection, deep learning, sparse auto-encoder, self-taught learning