How to tokenize a sentence using the nltk package?

1 Answer


Tokenization is the process in NLP of splitting text into smaller units called tokens. Sentence tokenization refers to splitting a text or paragraph into sentences.

For tokenizing, we will import sent_tokenize from the nltk package:

  from nltk.tokenize import sent_tokenize

We will use the following paragraph for sentence tokenization:

  Para = "Hi Guys. Welcome to Intellipaat. This is a blog on the NLP interview questions and answers."

  sent_tokenize(Para)

Output:

  ['Hi Guys.',
   'Welcome to Intellipaat.',
   'This is a blog on the NLP interview questions and answers.']

Word tokenization refers to splitting a sentence into words.

To tokenize words, we will import word_tokenize from the nltk package:

  from nltk.tokenize import word_tokenize

  Para = "Hi Guys. Welcome to Intellipaat. This is a blog on the NLP interview questions and answers."

  word_tokenize(Para)

Output:

  ['Hi', 'Guys', '.', 'Welcome', 'to', 'Intellipaat', '.', 'This', 'is', 'a', 'blog', 'on', 'the', 'NLP', 'interview', 'questions', 'and', 'answers', '.']
