Tokenization is the process in NLP of splitting text into smaller units called tokens. Sentence tokenization refers to splitting a text or paragraph into sentences.
For tokenizing, we will import sent_tokenize from the nltk package:
from nltk.tokenize import sent_tokenize
We will use the below paragraph for sentence tokenization:
Para = "Hi Guys. Welcome to Intellipaat. This is a blog on the NLP interview questions and answers."
sent_tokenize(Para)
Output:
['Hi Guys.',
 'Welcome to Intellipaat.',
 'This is a blog on the NLP interview questions and answers.']
Word tokenization refers to splitting a sentence into words.
To tokenize words, we will import word_tokenize from the nltk package:
from nltk.tokenize import word_tokenize
Para = "Hi Guys. Welcome to Intellipaat. This is a blog on the NLP interview questions and answers."
word_tokenize(Para)
Output:
['Hi', 'Guys', '.', 'Welcome', 'to', 'Intellipaat', '.', 'This', 'is', 'a', 'blog', 'on', 'the', 'NLP', 'interview', 'questions', 'and', 'answers', '.']