Comparative Study for News Categorization by Multinomial Naïve Bayes and Support Vector Machine

Authors

  • Melad Farawn Ogaib
  • Kadhim Mahdi Hashim

DOI:

https://doi.org/10.32792/jeps.v12i2.203

Keywords:

Multinomial Naive Bayes, Support Vector Machine, TF-IDF vectorizer

Abstract

Online news is published in very large volumes, and searching it takes a long time. Because these
articles span many genres (such as politics, sports, business, entertainment, technology, health,
economics, real estate, and art), it is difficult for users to reach the news that suits their
interests. In this paper, BBC News articles are categorized into five categories: sports, business,
politics, entertainment, and technology. The objective is to assign each news article to its category
so that users can reach relevant news quickly and easily, using classification methods based on
machine learning algorithms. The classification algorithms Multinomial Naive Bayes (MNB) and Support
Vector Machine (SVM) were applied to the news dataset after extracting features from it with the
count vectorizer and TF-IDF vectorizer methods. The SVM algorithm outperformed MNB with both feature
sets, with an accuracy of 99.1% using the count vectorizer and 98.2% using the TF-IDF vectorizer.
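
The abstract does not give implementation details, so the following is only a minimal sketch of one of the two pipelines it names: bag-of-words counts fed into a Multinomial Naive Bayes classifier with Laplace smoothing. It is written in plain Python for illustration. The function names (`tokenize`, `train_mnb`, `predict`), the smoothing value `alpha=1.0`, and the four-sentence toy corpus are assumptions made here, not the authors' code or the BBC News data.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase and split on whitespace (a deliberately crude tokenizer)."""
    return text.lower().split()


def train_mnb(docs, labels, alpha=1.0):
    """Fit Multinomial Naive Bayes on raw term counts with Laplace smoothing."""
    class_docs = Counter(labels)        # number of documents per class
    term_counts = defaultdict(Counter)  # class -> term -> count
    for text, label in zip(docs, labels):
        term_counts[label].update(tokenize(text))
    vocab = {t for counts in term_counts.values() for t in counts}

    model = {}
    for label, n_docs in class_docs.items():
        total = sum(term_counts[label].values())
        denom = total + alpha * len(vocab)
        model[label] = {
            "log_prior": math.log(n_docs / len(docs)),
            "log_lik": {
                t: math.log((term_counts[label][t] + alpha) / denom)
                for t in vocab
            },
        }
    return model


def predict(model, text):
    """Return the class with the highest log posterior; unseen terms are skipped."""
    scores = {}
    for label, params in model.items():
        score = params["log_prior"]
        for term in tokenize(text):
            if term in params["log_lik"]:
                score += params["log_lik"][term]
        scores[label] = score
    return max(scores, key=scores.get)


# Hypothetical toy corpus (not the BBC News dataset).
train_docs = [
    "the team won the match",
    "the striker scored a late goal",
    "shares fell as the market closed",
    "the bank raised interest rates",
]
train_labels = ["sport", "sport", "business", "business"]

model = train_mnb(train_docs, train_labels)
prediction = predict(model, "the goal decided the match")
print(prediction)
```

Swapping the raw counts for TF-IDF weights, or the classifier for a linear SVM, would follow the same train-then-predict shape; a library such as scikit-learn is the more practical choice for a dataset of realistic size.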

Published

2023-02-14