Comparative Study for News Categorization by Multinomial Naïve Bayes and Support Vector Machine
DOI:
https://doi.org/10.32792/jeps.v12i2.203
Keywords:
Multinomial Naive Bayes, Support Vector Machine, TF-IDF vectorizer
Abstract
Online news is published in very large volumes, and searching it takes a long time. Because these
articles span many genres (politics, sports, business, entertainment, technology, health, economics,
real estate, art, etc.), it is difficult for users to reach the important news that suits their
interests. In this paper, the BBC News dataset is classified into five categories: sports, business,
politics, entertainment, and technology. The objective is to assign each news item to its own category
using machine learning classification methods, so that users can reach relevant news quickly and easily.
The classification algorithms Multinomial Naive Bayes (MNB) and Support Vector Machine (SVM) were
applied to the news dataset after extracting features from it with the count vectorizer and the TF-IDF
vectorizer. SVM proved superior to MNB with the count vectorizer, with an accuracy of 99.1%, and with
the TF-IDF vectorizer, with an accuracy of 98.2%.
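The feature extraction and MNB steps named in the abstract can be illustrated with a minimal from-scratch sketch. It is not the authors' implementation: the whitespace tokenizer, the smoothed IDF formula, the Laplace smoothing parameter `alpha`, and the four toy documents are all assumptions made for illustration, and the SVM classifier is omitted because it does not fit in a short example.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase and split on whitespace (a deliberately simple tokenizer)."""
    return text.lower().split()


def tf_idf(docs):
    """Return one {term: weight} dict per document: raw count * smoothed IDF."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    # Document frequency: in how many documents each term occurs.
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    weights = []
    for tokens in tokenized:
        counts = Counter(tokens)
        weights.append({
            term: count * (math.log((1 + n_docs) / (1 + doc_freq[term])) + 1)
            for term, count in counts.items()
        })
    return weights


class MultinomialNB:
    """Multinomial Naive Bayes over raw term counts with Laplace smoothing."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha  # assumed smoothing value, not taken from the paper

    def fit(self, docs, labels):
        self.class_counts = Counter(labels)
        self.term_counts = defaultdict(Counter)
        for doc, label in zip(docs, labels):
            self.term_counts[label].update(tokenize(doc))
        self.vocab = {t for c in self.term_counts.values() for t in c}
        return self

    def predict(self, doc):
        n_docs = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, n_label in self.class_counts.items():
            total = sum(self.term_counts[label].values())
            denom = total + self.alpha * len(self.vocab)
            score = math.log(n_label / n_docs)  # log prior of the class
            for term in tokenize(doc):
                if term in self.vocab:  # terms never seen in training are skipped
                    count = self.term_counts[label][term]
                    score += math.log((count + self.alpha) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Toy corpus (hypothetical, not the BBC News data).
train_docs = [
    "the team won the football match",
    "the striker scored a late goal",
    "shares fell as the market closed",
    "the bank raised interest rates",
]
train_labels = ["sport", "sport", "business", "business"]

model = MultinomialNB().fit(train_docs, train_labels)
prediction = model.predict("the market and the bank")
weights = tf_idf(train_docs)
```

In this sketch the classifier works on raw counts, which corresponds to the count vectorizer setting; the `tf_idf` function shows how the same counts would be reweighted so that a term occurring in every document, such as "the", contributes less than a rarer term.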
License
Copyright Policy
Authors retain copyright of their articles published in the Journal of Education for Pure Science (JEPS).
By submitting their work, authors grant the journal a non-exclusive license to publish, distribute, and archive the article in all formats and media.
License
All articles published in JEPS are licensed under the Creative Commons Attribution 4.0 International License (CC BY 4.0).
This license permits unrestricted use, distribution, and reproduction in any medium, provided that the original author(s) and the source are properly credited.
Author Rights
Authors have the right to:
- Share their articles on personal websites, institutional repositories, and academic platforms
- Reuse their work in future research and publications
- Distribute the published version without restriction
Journal Rights
The journal retains the right to:
- Publish and archive the articles
- Include them in indexing and archiving systems such as LOCKSS and CLOCKSS
- Promote and disseminate the published work
Responsibility
The contents of all articles are the sole responsibility of the authors. The journal, editors, and editorial board are not responsible for any errors, opinions, or statements expressed in the published articles.
Open Access Statement
JEPS provides immediate open access to its content, supporting the principle that making research freely available to the public enhances global knowledge exchange.
This work is licensed under a Creative Commons Attribution 4.0 International License.
https://creativecommons.org/licenses/by/4.0/