Comparative Study for News Categorization by Multinomial Naïve Bayes and Support Vector Machine
Keywords: Multinomial Naive Bayes, Support Vector Machine, TF-IDF vectorizer
Online news is published in very large volumes, and searching it takes a long time. Because these
articles span many genres (such as politics, sports, business, entertainment, technology, health,
economics, real estate, and art), users find it difficult to reach the news that suits their interests.
In this paper, a collection of BBC News articles is categorized into five categories: sports, business,
politics, entertainment, and technology. The objective is to assign each news article to its category
using machine learning classification methods, so that users can reach relevant news quickly and
easily. The classification algorithms Multinomial Naive Bayes (MNB) and Support Vector Machine (SVM)
were applied to the news data set after features were extracted from it with the count vectorizer
and the TF-IDF vectorizer. The SVM algorithm outperformed MNB with both feature sets, reaching an
accuracy of 99.1% with the count vectorizer and 98.2% with the TF-IDF vectorizer.
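As a rough illustration of one branch of the pipeline described above (count features fed to Multinomial Naive Bayes), the following is a minimal pure-Python sketch. It is not the authors' implementation. The whitespace tokenizer, the Laplace smoothing value `alpha=1.0`, the function names, and the six-sentence toy corpus are all assumptions made for illustration, and the TF-IDF weighting and SVM classifier are not reproduced here.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase and split on whitespace (a deliberately naive tokenizer)."""
    return text.lower().split()


def train_mnb(docs, labels, alpha=1.0):
    """Fit a multinomial Naive Bayes model on raw term counts.

    alpha is the additive (Laplace) smoothing constant; 1.0 is an assumed default.
    """
    class_docs = Counter(labels)          # number of documents per class
    term_counts = defaultdict(Counter)    # class -> term -> count
    for text, label in zip(docs, labels):
        term_counts[label].update(tokenize(text))
    vocab = {t for counts in term_counts.values() for t in counts}

    model = {}
    for label, n_docs in class_docs.items():
        total = sum(term_counts[label].values())
        denom = total + alpha * len(vocab)
        model[label] = {
            "log_prior": math.log(n_docs / len(docs)),
            "log_lik": {
                t: math.log((term_counts[label][t] + alpha) / denom) for t in vocab
            },
        }
    return model


def predict(model, text):
    """Return the class with the highest log posterior; unseen terms are ignored."""
    scores = {}
    for label, params in model.items():
        score = params["log_prior"]
        for term in tokenize(text):
            if term in params["log_lik"]:
                score += params["log_lik"][term]
        scores[label] = score
    return max(scores, key=scores.get)


# Toy corpus invented for illustration; it is NOT the BBC News data set.
train_docs = [
    "team wins the cup final",
    "striker scores in the final match",
    "shares fall as the market closes",
    "bank raises interest rates",
    "new phone chip boosts speed",
    "software update fixes phone bug",
]
train_labels = ["sport", "sport", "business", "business", "tech", "tech"]

model = train_mnb(train_docs, train_labels)
prediction = predict(model, "the market and the bank")
print(prediction)
```

On a real corpus, the same idea would be applied to a held-out test split, and accuracy would be the fraction of test articles whose predicted category matches the true one.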
The Authors submitting a manuscript do so on the understanding that if accepted for publication, copyright of the article shall be assigned to Journal of education for Pure Science (Jeds), University of Thi-Qar as publisher of the journal.