Tokenization breaks text into smaller pieces (words, sentences, characters, or subwords) so that it is easier for a program to process and analyze.
import nltk
nltk.download('punkt_tab')
from nltk.tokenize import word_tokenize, sent_tokenize
punkt_tab is a pre-trained model for sentence boundary detection: it uses punctuation cues to figure out where a sentence begins and ends.
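Here is a minimal sketch of both tokenizers in action (the sample text is my own; expected outputs are shown as comments):

# Split a short passage into sentences, then into word-level tokens
text = "NLTK makes tokenization easy. It splits text into tokens."

print(sent_tokenize(text))
# ['NLTK makes tokenization easy.', 'It splits text into tokens.']

print(word_tokenize(text))
# ['NLTK', 'makes', 'tokenization', 'easy', '.', 'It', 'splits', 'text', 'into', 'tokens', '.']

Note that word_tokenize treats punctuation marks like the period as separate tokens rather than attaching them to the preceding word.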