News Classification by N-Gram and Machine Learning Algorithms


  • Department of Computer Science, College of Education for Pure Sciences, University of Thi-Qar, Iraq; Imam Ja'afar Al-Sadiq University, Iraq


Multinomial Naïve Bayes, Decision Tree, N-gram


News is information obtained from different sources such as television, the internet, newspapers and
magazines. Online news is published in very large volumes, and with so many articles available it is
challenging for users to find the pertinent information that matches their preferences. In this paper,
news articles are categorized so that articles in a specific category can be retrieved quickly and easily. The
BBC news data set was used, with its five categories: sports, politics, business, technology and entertainment.
The classification algorithms Multinomial Naïve Bayes (MNB) and decision tree (DT) were applied to the news
data set after extracting features from it using the n-gram method. The Multinomial Naïve Bayes algorithm
outperformed the decision tree, achieving an accuracy of 98.2%.
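
The pipeline described above (n-gram feature extraction followed by a Multinomial Naïve Bayes classifier) can be illustrated with a minimal, dependency-free sketch. It is not the authors' implementation: the whitespace tokenizer, the use of word unigrams plus bigrams, the Laplace smoothing value `alpha=1.0`, the helper names `word_ngrams` and `TinyMultinomialNB`, and the four toy sentences are all assumptions made for illustration, since the abstract does not state the n-gram range or the preprocessing used. The decision tree is omitted for brevity.

```python
import math
from collections import Counter, defaultdict


def word_ngrams(text, n_max=2):
    """Return all word n-grams of length 1..n_max (assumed: whitespace tokens)."""
    tokens = text.lower().split()
    grams = []
    for n in range(1, n_max + 1):
        for i in range(len(tokens) - n + 1):
            grams.append(" ".join(tokens[i:i + n]))
    return grams


class TinyMultinomialNB:
    """Hypothetical minimal Multinomial Naive Bayes over n-gram counts."""

    def __init__(self, alpha=1.0, n_max=2):
        self.alpha = alpha    # Laplace smoothing constant (assumed value)
        self.n_max = n_max    # longest n-gram to extract (assumed value)

    def fit(self, texts, labels):
        self.class_docs = Counter(labels)        # documents per class
        self.gram_counts = defaultdict(Counter)  # n-gram counts per class
        self.vocab = set()
        for text, label in zip(texts, labels):
            grams = word_ngrams(text, self.n_max)
            self.gram_counts[label].update(grams)
            self.vocab.update(grams)
        self.n_docs = len(labels)
        return self

    def predict(self, text):
        # n-grams never seen in training are ignored at prediction time.
        grams = [g for g in word_ngrams(text, self.n_max) if g in self.vocab]
        best_label, best_score = None, -math.inf
        for label, doc_count in self.class_docs.items():
            total = sum(self.gram_counts[label].values())
            denom = total + self.alpha * len(self.vocab)
            # log prior + sum of smoothed log likelihoods
            score = math.log(doc_count / self.n_docs)
            for g in grams:
                score += math.log((self.gram_counts[label][g] + self.alpha) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Toy training data (invented for illustration, not taken from the BBC data set).
train_texts = [
    "the team won the football match",
    "the striker scored a late goal",
    "shares fell as the bank cut rates",
    "the company reported strong profit growth",
]
train_labels = ["sport", "sport", "business", "business"]

model = TinyMultinomialNB().fit(train_texts, train_labels)
print(model.predict("the bank reported profit"))
print(model.predict("the striker won the match"))
```

On a real corpus the same idea is usually expressed with a library vectorizer and classifier; the sketch only makes the counting and smoothing steps explicit.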

