What is Tokenization?

Tokenization breaks text into smaller pieces (words, sentences, characters, or subwords) so that it is easier for a program to process and understand.

import nltk

# Download the Punkt tokenizer data (only needed once per environment)
nltk.download('punkt_tab')

from nltk.tokenize import word_tokenize, sent_tokenize

punkt_tab is the pre-trained Punkt model data. It handles punctuation and helps determine where a sentence begins and ends.
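
Here is a minimal sketch of both tokenizers in action, continuing from the imports above (the sample string is made up for illustration):

from nltk.tokenize import sent_tokenize, word_tokenize

text = "NLTK is a leading platform. It makes tokenization easy!"

# sent_tokenize uses the Punkt model to find sentence boundaries
print(sent_tokenize(text))
# ['NLTK is a leading platform.', 'It makes tokenization easy!']

# word_tokenize splits the text into word and punctuation tokens
print(word_tokenize(text))
# ['NLTK', 'is', 'a', 'leading', 'platform', '.', 'It', 'makes', 'tokenization', 'easy', '!']

Notice that punctuation marks come out as their own tokens, which is why the Punkt data matters: it lets the tokenizer tell a sentence-ending period apart from one inside an abbreviation.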