Understanding the basics of machine learning-based NLP


In our last article we discussed the heuristic-based approach to NLP. In this article we will understand how machine learning plays a key role in solving NLP challenges. Just as machine learning is applied to other data problems, it follows a similar approach for NLP. At a broad level, we can break down machine learning-based NLP tasks into the following categories.

Supervised and unsupervised learning in NLP:

First is supervised machine learning. Supervised learning enables a model to learn from historical labeled data and then produce an output for data it has not seen. Machine learning techniques such as classification and regression are also used for NLP. An example of supervised learning is an ML model that learns to classify news articles by topic from their text.
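To make this concrete, here is a minimal sketch of supervised text classification using scikit-learn (an assumption: the library is installed; the tiny labeled dataset is made up for illustration):

```python
# Minimal supervised-learning sketch: learn from labeled news snippets,
# then predict the topic of text the model has not seen before.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

train_texts = [
    "the team won the cup after extra time",
    "coach praises midfield after the victory",
    "shares climb on strong quarterly earnings",
    "inflation report moves the bond markets",
]
train_labels = ["sports", "sports", "finance", "finance"]

model = make_pipeline(TfidfVectorizer(), LogisticRegression())
model.fit(train_texts, train_labels)  # learn from historical labeled data

# Produce an output for unseen data.
print(model.predict(["the team celebrates the victory"]))  # e.g. ['sports']
```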

Next, let's talk about unsupervised machine learning. Unsupervised learning enables a model to analyze and cluster unlabeled data. Unsupervised algorithms are given data and can discover hidden patterns or data groupings without additional human intervention.

An example of unsupervised learning is a machine learning model that goes through an unlabeled set of data and is able to categorize it into groups. Unsupervised problem-solving in NLP is generally harder than supervised learning. A minimal clustering sketch follows; after that, we will look at the overall steps involved in an ML-based approach to NLP.
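Here is one way this could look, again with scikit-learn and made-up texts: K-means groups the documents without ever seeing a label.

```python
# Minimal unsupervised sketch: cluster unlabeled texts into groups
# without any human-provided labels.
from sklearn.cluster import KMeans
from sklearn.feature_extraction.text import TfidfVectorizer

texts = [
    "the striker scored a late goal",
    "the striker missed an easy goal",
    "stocks fell after the earnings report",
    "stocks rose before the earnings report",
]

X = TfidfVectorizer().fit_transform(texts)
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)

# The model discovers the grouping; it never sees the labels
# "sports" or "finance".
print(kmeans.labels_)  # e.g. [0 0 1 1]
```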

How do machine learning models apply to NLP?

Let's go over the high-level process of any machine learning approach. Any ML approach to NLP, supervised or unsupervised, consists of three common steps. First is extracting features from text: based on the desired output, we extract the required features from the text, and these features are what the machine learning model learns from, as sketched below.
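A minimal sketch of this first step, using scikit-learn's bag-of-words counts on toy sentences:

```python
# Minimal feature-extraction sketch: turn raw text into numeric
# bag-of-words count features that a model can learn from.
from sklearn.feature_extraction.text import CountVectorizer

texts = ["the dog barks", "the cat sleeps", "the dog sleeps"]
vectorizer = CountVectorizer()
X = vectorizer.fit_transform(texts)

print(vectorizer.get_feature_names_out())  # the vocabulary used as features
print(X.toarray())                         # one count vector per text
```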

The second step is using the feature representation to learn a model. Once extraction is done, the model learns from this feature representation, building increasingly abstract representations of the data. And finally, the third step is evaluating and improving the model.

After feature extraction and representation, we evaluate how well the model performs and work towards improving the results. This is an iterative process, and it applies in almost all scenarios where machine learning is used for NLP.
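As a sketch of the full loop, the example below trains and evaluates a classifier on scikit-learn's built-in 20 Newsgroups corpus (assumed available; it is downloaded on first use):

```python
# Minimal evaluate-and-improve sketch: train on one split, measure
# accuracy on a held-out split, then iterate on features or model choice.
from sklearn.datasets import fetch_20newsgroups
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.pipeline import make_pipeline

cats = ["rec.sport.hockey", "sci.space"]
train = fetch_20newsgroups(subset="train", categories=cats)
test = fetch_20newsgroups(subset="test", categories=cats)

model = make_pipeline(TfidfVectorizer(), LogisticRegression(max_iter=1000))
model.fit(train.data, train.target)

# The held-out score tells us whether a change actually helped.
print(accuracy_score(test.target, model.predict(test.data)))
```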

Now that we have understood the overall approach to applying machine learning to natural language processing, let's briefly look at some of the commonly used supervised machine learning models in NLP.

We will discuss these in greater detail in a following article as well. The models are Naïve Bayes, support vector machines, Hidden Markov Models, and conditional random fields. Let's discuss each of them one by one.

Naïve Bayes model:

The Naïve Bayes algorithm relies on Bayes' theorem to calculate the probability of a class label given a set of features in the input data. In the example of news classification, the assumption is that, within each class, the counts of individual words are not correlated with one another; that is, the word-count features are conditionally independent. Naïve Bayes is a great starting algorithm, as it is simple to understand and fast to train and run.
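A minimal Naïve Bayes sketch for news classification, again with scikit-learn and a made-up toy dataset:

```python
# Word counts are the features; MultinomialNB applies Bayes' theorem
# under the conditional-independence assumption described above.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

headlines = [
    "team wins the championship final",
    "striker scores twice in the derby",
    "stocks rally as markets rebound",
    "central bank raises interest rates",
]
labels = ["sports", "sports", "finance", "finance"]

vectorizer = CountVectorizer()
X = vectorizer.fit_transform(headlines)
model = MultinomialNB().fit(X, labels)

print(model.predict(vectorizer.transform(["markets fall on rate fears"])))
# e.g. ['finance']
```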

Support vector machine:

Next is the Support Vector Machine (SVM), one of the most popular classification algorithms. It works on the principle of learning a decision boundary that acts as a separator between the various categories of text.

In the example of news classification, it could differentiate between sports and finance, using either a linear or a non-linear decision boundary. The boundary should be placed so that the distance between the nearest points of the different classes is at a maximum.
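A minimal SVM sketch for the same sports-versus-finance setup (scikit-learn's LinearSVC, with made-up data):

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

texts = [
    "team clinches the title in overtime",
    "veteran striker announces retirement",
    "bank profits beat analyst forecasts",
    "currency slides on trade deficit news",
]
labels = ["sports", "sports", "finance", "finance"]

# LinearSVC places a linear boundary that maximizes the margin
# between the classes in TF-IDF feature space.
model = make_pipeline(TfidfVectorizer(), LinearSVC())
model.fit(texts, labels)

print(model.predict(["striker signs a new contract"]))  # e.g. ['sports']
```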

The strength of SVMs is that the models are robust to variation and noise in the data. They are built to handle high-dimensional data, which is common in NLP. The weakness of SVMs is that they take a lot of time to train, so scaling to large amounts of training data becomes difficult. Another drawback is that the results are not easily explainable.

Hidden Markov model:

The next model is the Hidden Markov Model. A Hidden Markov Model is a statistical model that assumes the observed data is generated by an underlying hidden state, and that we can only model the hidden states once the data has been generated.

For instance, in POS tagging, which assigns a part of speech to each word in a sentence, Hidden Markov Models are used to tag the words of a sentence with parts of speech such as nouns, verbs, adjectives, and adverbs. The assumption made here is that each hidden state, which in our case is the part of speech, depends on the previous state.
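A minimal HMM POS-tagging sketch using NLTK (assumptions: the nltk package is installed and the tagged treebank corpus can be downloaded):

```python
import nltk
from nltk.corpus import treebank
from nltk.probability import LidstoneProbDist
from nltk.tag import hmm

nltk.download("treebank")  # fetch the tagged corpus on first run

# Training data: sentences as lists of (word, tag) pairs.
train_sents = treebank.tagged_sents()[:3000]

# The hidden states are the POS tags; the observed data is the words.
# Lidstone smoothing avoids zero probabilities for unseen words.
trainer = hmm.HiddenMarkovModelTrainer()
tagger = trainer.train_supervised(
    train_sents, estimator=lambda fd, bins: LidstoneProbDist(fd, 0.1, bins)
)

print(tagger.tag("The dog runs quickly".split()))
```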

Conditional Random Fields:

The next model is Conditional Random Fields, or CRFs as they are popularly known, which have the ability to model sequential data. Conditional Random Fields often outperform Hidden Markov Models on tasks that rely on the sequential nature of language.
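A minimal CRF tagging sketch using the third-party sklearn-crfsuite package (an assumption that it is installed; the two toy sentences stand in for a real tagged corpus):

```python
import sklearn_crfsuite

# Training data: sentences as lists of (word, tag) pairs.
train_sents = [
    [("the", "DET"), ("dog", "NOUN"), ("barks", "VERB")],
    [("a", "DET"), ("cat", "NOUN"), ("sleeps", "VERB")],
]

def word_features(sent, i):
    # Features for the i-th word: the word itself plus its neighbors,
    # which is what lets a CRF use the surrounding context.
    feats = {"word": sent[i][0], "is_first": i == 0}
    if i > 0:
        feats["prev_word"] = sent[i - 1][0]
    if i < len(sent) - 1:
        feats["next_word"] = sent[i + 1][0]
    return feats

X_train = [[word_features(s, i) for i in range(len(s))] for s in train_sents]
y_train = [[tag for _, tag in s] for s in train_sents]

crf = sklearn_crfsuite.CRF(algorithm="lbfgs", max_iterations=50)
crf.fit(X_train, y_train)

test_sent = [("the", ""), ("cat", ""), ("barks", "")]
print(crf.predict_single([word_features(test_sent, i) for i in range(3)]))
# e.g. ['DET', 'NOUN', 'VERB']
```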

For the example of POS tagging, CRFs perform better than Hidden Markov Models because a CRF can tag each word by classifying it against the pool of all POS tags while drawing on features of the surrounding context. We now have a brief understanding of machine learning and how it can be used for NLP applications. In the next article, we'll try to understand how deep learning can be used for NLP.
