Paper 2
Paper link: https://www.sciencedirect.com/science/article/pii/S0925231216305604
Learning algorithms for text classification require human experts to organize and label documents.
This paper presents a methodology that uses Self-Organizing Maps (SOM) and automatic clustering based on the Correlation Coefficient (CorrCoef).
These clusters are used as labels for training the Support Vector Machine (SVM). Experiments are presented to verify the accuracy of the proposed scheme,
and results show that the proposed combination has better accuracy compared to training the learning machine using expert knowledge.
In certain situations, training a text classifier using well-organized and labeled documents can be expensive due to limited expert knowledge. Unsupervised learning schemes are often used,
but these techniques suffer from inaccuracies due to the Curse of Dimensionality (COD),
which affects matching accuracy. The Euclidean distance formula's "averaging" action dilutes the effect of individual features,
making it difficult to differentiate between classes with different dominant features. To achieve proper classification,
a training scheme free from expert knowledge is required, which is immune to the COD.
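One quick way to see the curse of dimensionality at work is to measure how much Euclidean distances from a single query point differ as the number of features grows. The sketch below is an illustration on uniformly random vectors, not an experiment from the paper; the function name `relative_contrast` and all parameter values are assumptions made for this note.

```python
import math
import random

def relative_contrast(dim, n_points=200, seed=0):
    """(farthest - nearest) / nearest Euclidean distance from one query point."""
    rng = random.Random(seed)
    query = [rng.random() for _ in range(dim)]
    dists = []
    for _ in range(n_points):
        point = [rng.random() for _ in range(dim)]
        dists.append(math.dist(query, point))
    return (max(dists) - min(dists)) / min(dists)

# The contrast shrinks as the number of features grows.
for dim in (2, 10, 100, 1000):
    print(dim, round(relative_contrast(dim), 3))
```

As the dimension grows, the nearest and farthest points end up at almost the same distance from the query, so a distance-based match has less and less to discriminate on.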
This paper proposes a technique using unsupervised approaches like Self-Organizing Maps (SOM) and Correlation Coefficient (CorrCoef) to group unlabeled text documents and use them as a machine-labeled training set for the Support Vector Machine (SVM).
This method eliminates the effect of the Curse of Dimensionality (COD) and the need for human experts to label all training documents.
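The idea of grouping unlabeled documents by correlation coefficient can be sketched as follows. This is a minimal greedy grouping under stated assumptions, not the paper's exact clustering procedure: the threshold of 0.7, the "first matching seed" rule, and the toy term-frequency vectors are all hypothetical.

```python
import math

def pearson(a, b):
    """Pearson correlation coefficient between two equal-length vectors."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0  # constant vector: correlation undefined, treat as unrelated
    return cov / math.sqrt(var_a * var_b)

def group_by_correlation(docs, threshold=0.7):
    """Greedy grouping: a document joins the first cluster whose seed document
    correlates with it at or above `threshold`, else it starts a new cluster."""
    seeds, labels = [], []
    for vec in docs:
        for cluster_id, seed in enumerate(seeds):
            if pearson(vec, seed) >= threshold:
                labels.append(cluster_id)
                break
        else:
            seeds.append(vec)
            labels.append(len(seeds) - 1)
    return labels

# Toy term-frequency vectors (hypothetical): each column counts one term.
docs = [
    [5, 4, 0, 0],
    [4, 5, 1, 0],
    [0, 0, 5, 4],
    [0, 1, 4, 5],
]
machine_labels = group_by_correlation(docs)
print(machine_labels)
```

The resulting cluster ids play the role of machine-made labels: no human assigns them, and they can be passed to a supervised learner as if they were class labels.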
SVM is widely used in text classification applications,
but organizing and labeling training documents is tedious, time-consuming, and costly.
Unsupervised clustering approaches replace human involvement in this task,
allowing for a hybrid model that inherits the strong classification performance of the supervised SVM while making the system immune to the COD.
This hybrid model offers an unsupervised classification approach that can be used in text classification applications.
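The supervised half of the hybrid can be sketched by training a linear SVM on cluster ids instead of human labels. The sketch below uses plain batch subgradient descent on the hinge loss; this summary does not state which SVM solver or kernel the paper uses, so the optimizer, the hyperparameters (`lam`, `lr`, `epochs`), and the toy data are assumptions.

```python
def train_linear_svm(X, y, lam=0.01, lr=0.1, epochs=200):
    """Minimise lam/2 * ||w||^2 + mean hinge loss by batch subgradient descent.
    Labels in y must be +1 or -1."""
    n, dim = len(X), len(X[0])
    w, b = [0.0] * dim, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [lam * wj for wj in w], 0.0
        for xi, yi in zip(X, y):
            margin = yi * (sum(wj * xj for wj, xj in zip(w, xi)) + b)
            if margin < 1:  # inside the margin or misclassified
                for j in range(dim):
                    grad_w[j] -= yi * xi[j] / n
                grad_b -= yi / n
        w = [wj - lr * gj for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b

def predict(w, b, x):
    """Return +1 or -1 depending on the side of the hyperplane."""
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) + b >= 0 else -1

# Toy 2-D feature vectors (hypothetical, centred around the origin) and
# machine-made cluster ids standing in for human labels.
docs = [[2, 2], [3, 3], [2, 3], [-2, -2], [-3, -3], [-2, -3]]
cluster_ids = [0, 0, 0, 1, 1, 1]
y = [1 if c == 0 else -1 for c in cluster_ids]  # cluster 0 -> +1, cluster 1 -> -1
w, b = train_linear_svm(docs, y)
print([predict(w, b, x) for x in docs])
```

The SVM never sees an expert label here; the quality of its decision boundary depends entirely on how well the clustering step separated the documents.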
The paper examines hybrid classification systems, SOM-SVM and CorrCoef-SVM, to understand their algorithms, properties, and characteristics. The proposed approach is presented, describing its structure and algorithm.
The prototype hybrid classifiers were tested using three benchmark datasets of text classification.
The performance of the proposed approaches was measured by obtaining the accuracy of classification tasks on each dataset with different parameters applied to the classifiers.
Accuracy was determined using a test dataset, which was manually examined for conformity to the features of the training data.
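Because an SVM trained on machine-made labels predicts cluster ids rather than named classes, scoring it against a hand-checked test set needs some mapping between the two. The sketch below uses a majority vote per cluster, which is an assumption of this note and not a procedure described in the summary; the class names and predictions are hypothetical.

```python
from collections import Counter

def map_clusters_to_classes(cluster_ids, true_classes):
    """Map each cluster id to the class it most often coincides with."""
    votes = {}
    for cid, cls in zip(cluster_ids, true_classes):
        votes.setdefault(cid, Counter())[cls] += 1
    return {cid: counts.most_common(1)[0][0] for cid, counts in votes.items()}

def accuracy(predicted, expected):
    """Fraction of positions where the prediction equals the expected class."""
    correct = sum(p == e for p, e in zip(predicted, expected))
    return correct / len(expected)

# Hypothetical SVM outputs (cluster ids) and manually checked classes.
predicted_clusters = [0, 0, 1, 1, 1, 0]
checked_classes = ["sport", "sport", "politics", "politics", "sport", "sport"]
mapping = map_clusters_to_classes(predicted_clusters, checked_classes)
predicted_classes = [mapping[c] for c in predicted_clusters]
print(accuracy(predicted_classes, checked_classes))
```

Deriving the mapping from the same documents that are scored gives an optimistic figure; a held-out mapping set would be the more careful choice.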
The automatically labeled data was used to train the SVM, and experiments were conducted to investigate the effects of varying the Euclidean distance threshold.
The results showed that different clusters produced by the SOM and CorrCoef approaches directly and significantly impacted the SVM's performance.
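The sensitivity to a Euclidean distance threshold can be illustrated with a simple leader-style clustering, where the threshold decides whether a point joins an existing cluster or starts a new one. How the paper applies its threshold is not spelled out in this summary, so the algorithm, the points, and the threshold values below are assumptions for illustration only.

```python
import math

def leader_clusters(points, threshold):
    """Assign each point to the first cluster leader within `threshold`
    (Euclidean distance); otherwise the point becomes a new leader."""
    leaders, labels = [], []
    for p in points:
        for cid, leader in enumerate(leaders):
            if math.dist(p, leader) <= threshold:
                labels.append(cid)
                break
        else:
            leaders.append(p)
            labels.append(len(leaders) - 1)
    return labels

points = [(0, 0), (0, 1), (1, 0), (5, 5), (5, 6), (10, 0)]
for threshold in (0.5, 2.0, 8.0, 20.0):
    labels = leader_clusters(points, threshold)
    print(threshold, len(set(labels)), labels)
```

A small threshold fragments the data into many clusters and a large one merges everything, so the label set handed to the SVM changes with the threshold even though the documents do not.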
Support vector machine (SVM) classification has been a dominant supervised machine learning technique for the past decade due to its outstanding classification accuracy.
Its generalization ability is attributed to the implementation of the Structural Risk Minimization (SRM) principle,
which focuses on finding an optimal separating hyperplane for the lowest classification error.
SVM considers only the support vectors when determining this hyperplane, so a set of training vectors can be represented by its support vectors alone.
The paper illustrates this with a figure showing the optimal separating hyperplane, the parallel hyperplanes, and the support vectors in the vector space.
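The geometry can be made concrete with a small sketch. Given a separating hyperplane w.x + b = 0 scaled so that the closest points satisfy y(w.x + b) = 1, the support vectors are the training points lying on the two parallel hyperplanes w.x + b = +1 and w.x + b = -1, and the margin between those hyperplanes is 2 / ||w||. The data and the hyperplane below are hand-picked for illustration and are not taken from the paper.

```python
def decision_value(w, b, x):
    """Signed score w.x + b; its sign gives the predicted class."""
    return sum(wj * xj for wj, xj in zip(w, x)) + b

def support_vectors(w, b, X, y, tol=1e-9):
    """Training points lying on the parallel hyperplanes w.x + b = +/-1."""
    return [x for x, yi in zip(X, y)
            if abs(yi * decision_value(w, b, x) - 1.0) <= tol]

def margin_width(w):
    """Distance between the two parallel hyperplanes: 2 / ||w||."""
    return 2.0 / sum(wj * wj for wj in w) ** 0.5

# Hand-picked separable toy data; the hyperplane x1 = 1 separates the classes.
X = [(2, 0), (3, 1), (4, 0), (0, 0), (-1, 1), (-2, 0)]
y = [1, 1, 1, -1, -1, -1]
w, b = (1.0, 0.0), -1.0

print(support_vectors(w, b, X, y))
print(margin_width(w))
```

Removing any of the other four points leaves the hyperplane unchanged, which is the sense in which the support vectors alone represent the training set.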