How to use Google Brain’s newest library to easily train and build models for Natural Language Processing

Trax is the newest book in the already thriving library of NLP. Image by Alfons Morales on Unsplash.

After reading this article you’ll be able to:

Machine Learning Engineers and Data Scientists already have a handful of tools…

Getting Started

A comprehensive guide to using NLP for Machine Learning

Simple question: How to turn text into features? Image by author

Imagine you’ve been tasked with building a Sentiment Analysis tool for your company’s product reviews. As a seasoned Data Scientist, you have produced many insights about future sales predictions and were even able to classify customers based on their purchase behavior.

But now, you’re intrigued: you have this bunch of text entries and have to turn them into features for a Machine Learning model. How can that be done? That’s a common question when Data Scientists meet text for the first time.
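One common first answer is a bag-of-words representation: each text becomes a vector of word counts over a shared vocabulary. The sketch below is a minimal illustration under simple assumptions (lowercasing and whitespace splitting), not the method used in the article; the function names and the sample reviews are hypothetical.

```python
from collections import Counter


def build_vocabulary(texts):
    """Map every distinct lowercase token to a column index (sorted for stability)."""
    tokens = sorted({token for text in texts for token in text.lower().split()})
    return {token: index for index, token in enumerate(tokens)}


def bag_of_words(text, vocabulary):
    """Count how often each vocabulary token occurs in one text."""
    counts = Counter(text.lower().split())
    # Iterating a dict yields keys in insertion order, i.e. column order.
    return [counts[token] for token in vocabulary]


# Hypothetical product reviews, made up for illustration.
reviews = ["great product great price", "bad product"]
vocabulary = build_vocabulary(reviews)
features = [bag_of_words(review, vocabulary) for review in reviews]
```

Each row of `features` can then be fed to a classifier as a numeric feature vector. Real pipelines usually add proper tokenization and normalization before counting.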

As simple as it may look for experienced NLP Data Scientists, turning text into features is…

Image by Markus Winkler from Unsplash, edited by the author.

Why, what, and how

In the last few articles we spent some time explaining and implementing some of the most important preprocessing techniques in NLP. However, we worked too little with real text. Now it is time to do some of that.

We talked about Text Normalization in the article about stemming. However, stemming is not the most important (or even the most used) task in Text Normalization. We also covered some other Normalization techniques earlier, such as Tokenization, Sentencizing and Lemmatization. …


And why

If you’re into NLP, you’ve probably stumbled over a dozen tools that have this neat feature named “lemmatization”. In this article, I’ll do my best to guide you through what Lemmatization is, why it is useful and how we can build a Lemmatizer!

If you’re coming from my previous article on how to make a PoS Tagger, you’ve already grasped the important prerequisites for Lemmatization. If not, I’ll gently present them throughout this article, so let’s get started!

What is Lemmatization?

PoS Tagging — what, when, why and how.

Time to dive a little deeper into grammar.

In this article, following the series on NLP, we’ll understand and create a Part of Speech (PoS) Tagger. The idea is to be able to extract “hidden” information from our text and also enable future use of Lemmatization, a text normalization tool that depends on PoS tags for correction.

In this article, we’ll touch on some more advanced topics, such as Machine Learning algorithms and a bit of grammar and syntax. …
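To see why Lemmatization depends on PoS tags, consider a toy lookup where the same surface word maps to different lemmas depending on its tag. This is only a sketch: the `LEMMAS` table, the tag names and the `lemmatize` function are hypothetical and stand in for whatever lexicon and tagger a real pipeline would use.

```python
# Hypothetical toy lexicon: (word, PoS tag) -> lemma.
LEMMAS = {
    ("meeting", "VERB"): "meet",
    ("meeting", "NOUN"): "meeting",
    ("saw", "VERB"): "see",
    ("saw", "NOUN"): "saw",
}


def lemmatize(word, pos):
    """Return the lemma for a word given its PoS tag.

    Falls back to the lowercased word when the pair is not in the lexicon.
    """
    key = (word.lower(), pos)
    return LEMMAS.get(key, word.lower())
```

Without the tag, “saw” in “I saw it” and “a rusty saw” would be indistinguishable, which is exactly the ambiguity a PoS Tagger resolves.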

An Intro to Text Normalization — let all look alike!

It is time to talk about stems.

Stems are the main body or stalk of a plant or shrub, typically rising above ground but occasionally subterranean. Well, that’s what Google says, and it is right!

But here we’re going to talk about Word Stems. If you’re coming from my previous articles, you know that this is an optional step in the NLP Preprocessing Pipeline.

In this article we’ll implement the Porter Stemmer, probably the most famous algorithm for stemming out there, created by Martin Porter in 1979 (yes, it is old!). …
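To give a flavor of how the Porter Stemmer works, here is a sketch of just its first rule group (step 1a), which handles plural endings. The full algorithm has several more steps and conditions based on the word’s consonant/vowel structure; the function name below is an assumption for illustration, and this fragment is not a complete stemmer.

```python
def porter_step_1a(word):
    """Apply step 1a of the Porter algorithm (plural suffixes only).

    Rules are tried in order and only the first match is applied:
    SSES -> SS, IES -> I, SS -> SS, S -> (nothing).
    """
    if word.endswith("sses"):
        return word[:-2]  # caresses -> caress
    if word.endswith("ies"):
        return word[:-2]  # ponies -> poni
    if word.endswith("ss"):
        return word       # caress -> caress
    if word.endswith("s"):
        return word[:-1]  # cats -> cat
    return word
```

Note that the result is a stem, not necessarily a dictionary word: “ponies” becomes “poni”.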

Understanding the underlying bones that give NLP its structure


So we’ve learned about the many distinct steps that a Preprocessing Pipeline can take. If you’re coming from my previous article (or an NLP class), you probably have a general idea of what Tokenization is and what it is for. The purpose of this article is to clarify what Tokenization is and how it works and, most importantly, to implement a Tokenizer.

If you came from my previous article, you might also be wondering about what happened to the “Bare String Preprocessing” step. …
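As a rough idea of what a Tokenizer does, the sketch below splits text into word and punctuation tokens with a single regular expression. It is a simplification under stated assumptions (words are runs of word characters, optionally joined by apostrophes; every other non-space character is its own token) and is not the implementation built in the article. The names `TOKEN_PATTERN` and `tokenize` are hypothetical.

```python
import re

# A word (optionally with internal apostrophes, straight or curly),
# or any single character that is neither a word character nor whitespace.
TOKEN_PATTERN = re.compile(r"\w+(?:['’]\w+)*|[^\w\s]")


def tokenize(text):
    """Split text into word and punctuation tokens, discarding whitespace."""
    return TOKEN_PATTERN.findall(text)


tokens = tokenize("Hello, world! It's NLP.")
# tokens == ['Hello', ',', 'world', '!', "It's", 'NLP', '.']
```

Keeping contractions such as “It's” together is a design choice; other tokenizers split them into separate tokens.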

This article is part of a series that aims to clarify the most important details of NLP. You can refer to the main article here.

After a bit of history, we got to see when and why to apply NLP. Along the way, there’s an important concept called “preprocessing”, one that is common to any area of Data Science (you want your data to be neat and clean, right?).

But while with numerical data you’ll usually apply some normalization rules (reducing the difference between max and min values), drop or fill NaNs (that is, empty values) and detect outliers (points out of…

Natural Language Processing (NLP) is probably one of the most turbulent fields today under the Computer Sciences umbrella.

Don’t look at me like that C-3PO! I just want to explain NLP more easily!

While it is not something new, technological advancements, new algorithms and data abundance have made getting computers to read, write, listen and speak almost mundane (not to mention the attempts to make computers really understand what is written, which is the domain of Natural Language Understanding, or NLU).

This story is the starting point for a series that aims to present the field of Natural Language Processing without being either too math-tech-bot or too languages-theories-worm. …

The skin

So you’re into Natural Language Processing, or NLP (not to be mistaken for Neuro-Linguistic Programming, whatever that is; it keeps appearing in my search results…).

It’s 1950, in the journal Mind. A prominent scientist (and mathematician) named Alan Turing, after a long discussion on theoretical ways to make a machine learn, wrote the following words:

it is best to provide the machine with the best sense organs that money can buy, and then teach it to understand and speak English.[1]

And so NLP was born. Okay, not so abruptly, but the idea was there. …

Tiago Duque

A Data Scientist passionate about data and text. Trying to understand and clearly explain all important nuances of Natural Language Processing.
