Saved 100s of hours of manual processes when predicting game viewership when using Domo’s automated dataflow engine.
What is NLP (Natural Language Processing)?

In the alphabet soup of artificial intelligence (AI) acronyms, NLP is one of many tools that could have a huge impact on data science. NLP stands for natural language processing. It’s the process of teaching computers to comprehend and interpret human language. And it’s often something that feels like more of a pipe dream than reality, having the potential to unlock a massive trove of unstructured data for analysis.
But human language comes with more than just words. It has intent, dialect, slang, and sarcasm that are all difficult for computers to understand and translate into meaningful and consistent results. However, there are tools making strides into NLP. In this article, we’ll go over some of the key terms you need to know, how NLP works, the benefits and challenges, and how it’s used in real-world applications.
Key Terms in Natural Language Processing
There are several different ways NLP tools approach reading and understanding language. Here are some of the main terms:
- Tokenization: Segmenting text into sentences or words while removing punctuation and extra symbols.
- Stop Words Removal: Removing common words like “and,” “the,” or “a” to focus only on meaningful terms.
- Stemming: Reducing words into their root form so machines can recognize that “stop,” “stopped,” and “stopping” all mean the same thing.
- Word embeddings: Representing words as numbers where similar words share similar values.
- TF-IDF: A statistical method to measure how important a word is in a document relative to a collection of documents.
- Topic modeling: Extracting main topics from text, useful for trend detection and classification.
- Sentiment analysis: Identifying subjective information like opinions, attitudes, or emotions in text.

How does NLP work?
NLP works by combining machine learning, linguistics, and computer science to transform unstructured language into structured data. A typical NLP pipeline includes:
- Text preprocessing: Cleaning data, tokenizing, removing stop words, and standardizing formats.
- Feature extraction: Converting words into numbers using TF-IDF, embeddings, or Bag-of-Words.
- Text analysis: Using techniques like part-of-speech tagging, named entity recognition, sentiment analysis, and topic modeling.
- Model training: Training algorithms to learn from language patterns and make predictions or generate new text.
- Deployment: Applying trained models in tools like chatbots, search engines, or business intelligence dashboards.
Benefits of natural language processing
Natural Language Processing (NLP) is transforming industries by enabling computers to understand everyday language. Here’s how it’s driving innovation:
- Automation: Simplifies repetitive tasks such as document processing and customer support.
- Faster Insights: Identifies trends in reviews, surveys, and social media data with ease.
- Smarter Search: Goes beyond keywords to interpret the true intent behind queries.
- Content Generation: Supports the creation of reports, marketing materials, and more.
- Enhanced AI and ML Models: Provides clean, structured text to boost machine learning accuracy and performance.
With its wide applications, NLP is reshaping how businesses harness data and streamline operations.
What impact does NLP have on data science?
In many instances, NLP can feel like the Holy Grail of AI tools in data science–so powerful and searched for by so many data scientists, but difficult to actually find. Part of the reason NLP is desirable is that it scales across industries and departments. Every data science team deals with unstructured, free text data that has the potential to provide impactful insights.
There are two ways NLP will greatly impact data science. The first is on the query end: translating questions users ask into actionable intelligence. If any analyst or business user could type in queries and NLP tools could understand the free-text question and serve up actionable insight, it could be business-changing. Opening up data tools and insights to the unsophisticated business users to find their own insights without needing to understand the tools or data behind it. Some data science tools out there, like Domo, have chatbots already working on their platforms.
The second way NLP impacts data science is on the data ingestion side. For example, in the medical field, there is data abounding on people’s health. But the critical intersection of a person’s interaction with their medical provider can be difficult to get because so often doctors are recording notes on visits as free text. NLP tools can be a critical cog in helping get the complete data picture on a person’s health.
There are other job functions that use NLP:
- News websites can use NLP to track what users are searching for and serve up more relevant content on their front pages.
- HR teams can use NLP to filter candidate resumes and online job posting responses to better prioritize candidates beyond just how well they match keywords.
- Financial traders would benefit from real-time NLP analysis of conversations about key companies and potential trades or mergers.
From marketing teams that want to analyze brand sentiment on social media to researchers parsing through free text responses on surveys, it’s easy to imagine what data science could do with NLP tools that give them easy access to unstructured text data.

Approaches to NLP
NLP techniques have come a long way, evolving through three major stages:
- Rules-based NLP: Early NLP systems relied on predefined grammars and if-then logic rules to process language. While innovative for their time, these systems were rigid and lacked the flexibility to handle the complexities and nuances of natural language.
- Statistical NLP: The introduction of probability models revolutionized the field. Techniques like part-of-speech tagging, word sense disambiguation, and text prediction became possible by leveraging large datasets and statistical algorithms. This stage marked a significant step forward in understanding patterns in language, but still struggled with deeper context and meaning.
- Deep learning NLP: Modern NLP systems now use neural networks and transformer-based models like BERT, GPT, and other state-of-the-art architectures. These models can process vast amounts of data, understand context, and generate human-like responses. They have powered applications like chatbots, machine translation, and content generation, offering results that are more accurate, nuanced, and context-aware than ever before.
This evolution has transformed NLP into a critical tool for applications across industries, from customer service to healthcare and beyond.
Common NLP Tasks
- Named entity recognition (NER): The process of identifying specific entities in text, such as names (e.g., "John"), organizations (e.g., "Google"), and locations (e.g., "Paris"). It helps structure unstructured data by classifying these key terms into predefined categories, making the information easier to analyze and use.
- Part-of-speech tagging: Assigning grammatical roles to words in a sentence, such as identifying nouns (e.g., "dog"), verbs (e.g., "run"), adjectives (e.g., "happy"), and more, based on their context. This is essential for understanding the structure and meaning of sentences.
- Coreference resolution: Determining when different words or phrases refer to the same entity in a text. For example, in the sentences "Maria went to the store. She bought apples," coreference resolution identifies that "She" refers to "Maria." This is crucial for maintaining context and coherence in natural language understanding.
- Word sense disambiguation: Resolving the correct meaning of a word when it has multiple definitions, based on its context. For instance, the word "bank" could mean a financial institution or the side of a river, and word sense disambiguation ensures the correct interpretation is applied. This enhances the accuracy of
Best practices for natural language processing in data science.
Don’t rely on it entirely. It’s still in its infancy.
There are high-profile examples of how NLP tools show a lot of promise but still require a lot of tweaking. Google’s flu trends tool was released with much fanfare in 2009, using search data to predict the flu. But it struggled with accuracy and disappeared quickly.
Consider the ethics of NLP.
Microsoft once demonstrated that by analyzing search engine queries they could identify users who had pancreatic cancer, some before they had received a diagnosis. As with all AI, the potential for good here is mind-blowing. However, the ethics of telling someone they may or may not have a life-threatening disease have huge implications. And AI analysis of unstructured data can come with a huge error rate.
Start small.
As with all AI tools, the potential for NLP is huge. But the reality currently requires users to focus on smaller and more niche applications of the technology. Rather than looking at every part of your unstructured, free text data, train NLP tools to look in specific areas for specific things.
Challenges of NLP
Natural Language Processing (NLP) has seen remarkable progress, but significant challenges remain:
- Bias in data: Training data often reflects societal biases, which can lead to reinforcement of stereotypes in NLP models. Addressing this requires careful data curation and bias mitigation strategies.
- Ambiguity: Language is inherently complex, with words often having multiple meanings depending on the context. For example, "bank" can refer to a financial institution or the side of a river, making disambiguation a key challenge.
- Evolving vocabulary: NLP models struggle to keep up with new vocabulary, slang, and shifting grammar rules as language evolves rapidly, particularly in online and informal communication.
- Tone and sarcasm: Understanding tone, sarcasm, and subtle nuances in language is particularly difficult for machines, as they often lack the cultural and contextual understanding humans rely on.
Overcoming these hurdles is essential to improving NLP systems and ensuring they are more accurate, fair, and adaptable in real-world applications.
NLP use cases by industry
- Finance: Interpreting reports and market trends to identify opportunities or detect potential fraud.
- Healthcare: Gleaning valuable insights from patient records and medical research.
- Insurance: Simplifying claims processing and enhancing fraud detection capabilities.
- Legal: Accelerating contract analysis and streamlining case discovery processes.
Final thoughts
NLP makes it possible for machines to interact with us in our own language, powering everything from chatbots to predictive analytics. For businesses, it transforms unstructured data into structured insights, enabling smarter decisions and new efficiencies.
Organizations that invest in NLP responsibly by combining automation, governance, and domain expertise can unlock enormous value from the language data all around them.