Trip Advisor as a corpus for aspect-based sentiment analysis

Bibliographic Details
Main Author: Ξεκαλάκης, Αργύρης
Other Authors: Μαραγκουδάκης, Εμμανουήλ
Language: el_GR
Published: 2018
Online Access: http://hdl.handle.net/11610/18408
Description
Summary: This thesis briefly describes and trains several machine learning methods, namely Naïve Bayes, Neural Networks, Deep Neural Networks, Support Vector Machines (SVM) and Gradient Boosted Trees (GBT), in order to perform aspect-based sentiment analysis on real reviews that users posted to the Tripadvisor website (www.tripadvisor.com), which were extracted using the Aylien API. Supervised machine learning techniques were used in all cases. Special emphasis is given to deep neural networks and deep learning, the approach that gave the best results of all the machine learning methods and algorithms used, and to the Word2Vec model, which is illustrated with a practical example. In all cases, including the deep learning method, features were selected and extracted from each sentence. The most common feature selection methods in the literature are described in general terms. In addition, the thesis briefly describes the categories of sentiment analysis, the types of features, and the possible polarity values that can characterize a sentence according to the sentiment it expresses.
The sentences of the hotel reviews were extracted with the Aylien API. A Microsoft Excel file was created with columns for the link to the source review, the review text, its translation into Greek, the serial number of each review and of each of its sentences, the sentences themselves, the aspect that each sentence addresses, and the sentiment polarity of each sentence, i.e. whether the sentence was positive, negative or neutral according to the Aylien API. Then, the best classification results of each machine learning method were compared with the classification results produced by the Aylien API.
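The summary states that features were extracted from each sentence, and it credits TF-IDF weighting of the term vectors for the best result. As a hedged illustration only, the sketch below computes plain TF-IDF vectors for a few invented review sentences with the Python standard library. The tokenizer, the names `tokenize` and `tfidf_vectors`, and the example sentences are assumptions, not the thesis's code. It uses the textbook weighting tf(t, d) · log(N / df(t)); libraries such as scikit-learn apply smoothed variants, so their numbers will differ.

```python
import math
import re
from collections import Counter


def tokenize(sentence: str) -> list[str]:
    # Assumed, simplistic tokenizer: lowercase and keep alphabetic runs only.
    return re.findall(r"[a-z]+", sentence.lower())


def tfidf_vectors(sentences: list[str]) -> tuple[list[str], list[list[float]]]:
    """Return the sorted vocabulary and one TF-IDF vector per sentence.

    tf(t, d) = count of t in d / number of tokens in d
    idf(t)   = log(N / df(t)), df(t) = number of sentences containing t
    """
    docs = [tokenize(s) for s in sentences]
    vocab = sorted({tok for doc in docs for tok in doc})
    n_docs = len(docs)
    # Document frequency: count each term at most once per sentence.
    df = Counter(tok for doc in docs for tok in set(doc))
    idf = {t: math.log(n_docs / df[t]) for t in vocab}

    vectors = []
    for doc in docs:
        counts = Counter(doc)
        length = len(doc) or 1  # avoid division by zero for empty sentences
        vectors.append([counts[t] / length * idf[t] for t in vocab])
    return vocab, vectors


if __name__ == "__main__":
    # Hypothetical hotel-review sentences, invented for illustration.
    sample = ["The room was clean", "The staff was rude", "Clean room, friendly staff"]
    terms, vecs = tfidf_vectors(sample)
    for sentence, vec in zip(sample, vecs):
        print(sentence, [round(x, 3) for x in vec])
```

Terms that occur in every sentence get an IDF of zero and drop out, which is why TF-IDF tends to emphasize aspect- and opinion-bearing words such as "rude" over function words.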
All experiments used 10-fold cross-validation, and for each machine learning algorithm the best-performing combination of feature selection choices was retained. A confusion matrix was produced for each algorithm, together with the corresponding precision, recall, accuracy and F-measure values. Support Vector Machines gave the worst results in terms of accuracy and F-measure, despite the optimizations that were made; their accuracy and F-measure were in fact significantly lower than those of the Aylien API classification. Gradient Boosted Trees and Naïve Bayes gave similar results, although those of Gradient Boosted Trees were slightly worse than the Aylien API results. Neural Networks and Deep Neural Networks produced clearly better results than the Aylien API. Finally, the best results in terms of accuracy and F-measure were achieved with deep neural networks (deep learning), using TF-IDF weighting of the term vectors for feature selection.
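The evaluation protocol described above (10-fold cross-validation, a confusion matrix per algorithm, and precision, recall, accuracy and F-measure) can be sketched in plain Python. This is a minimal sketch under stated assumptions: the three labels are the polarities named in the summary, folds are contiguous and unshuffled (practical setups usually shuffle or stratify), the helper names are hypothetical, and the classifiers themselves are not shown.

```python
LABELS = ("positive", "negative", "neutral")  # assumed label set


def k_fold_indices(n_samples: int, k: int = 10) -> list[tuple[list[int], list[int]]]:
    """Split range(n_samples) into k contiguous folds; return (train, test) pairs."""
    # The first n_samples % k folds get one extra sample.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    splits, start = [], 0
    for size in fold_sizes:
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_samples))
        splits.append((train, test))
        start += size
    return splits


def confusion_matrix(y_true, y_pred, labels=LABELS):
    """Rows are true classes, columns are predicted classes."""
    index = {label: i for i, label in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for t, p in zip(y_true, y_pred):
        matrix[index[t]][index[p]] += 1
    return matrix


def metrics(matrix, labels=LABELS):
    """Return per-class precision, recall and F-measure, plus overall accuracy."""
    n = len(labels)
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(n))
    report = {}
    for i, label in enumerate(labels):
        tp = matrix[i][i]
        predicted = sum(matrix[r][i] for r in range(n))  # column sum
        actual = sum(matrix[i])  # row sum
        precision = tp / predicted if predicted else 0.0
        recall = tp / actual if actual else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        report[label] = {"precision": precision, "recall": recall, "f_measure": f1}
    accuracy = correct / total if total else 0.0
    return report, accuracy


if __name__ == "__main__":
    # Invented predictions for four sentences, for illustration only.
    y_true = ["positive", "positive", "negative", "neutral"]
    y_pred = ["positive", "negative", "negative", "positive"]
    cm = confusion_matrix(y_true, y_pred)
    print(cm)
    print(metrics(cm))
```

The summary does not say how per-fold results were combined; one common option is to sum the ten per-fold confusion matrices and call `metrics` once on the total.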