A tokenizer divides a stream of text into a series of tokens, where each token is a subsequence of the characters in the text. Each newly created token is then passed through a chain of token filters, each of which may add, remove, or update tokens.
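To make the pipeline concrete, here is a minimal, self-contained Python sketch of the idea, not tied to any particular library: a simple tokenizer followed by a chain of filters that respectively update tokens (lowercasing), remove them (stop words), and add new ones (a hypothetical synonym expansion). All function names, the stop-word set, and the synonym table are illustrative assumptions.

```python
import re

def tokenize(text):
    """Split a text stream into tokens (subsequences of its characters)."""
    return re.findall(r"\w+", text)

def lowercase_filter(tokens):
    """Update each token by normalizing it to lower case."""
    return [t.lower() for t in tokens]

def stopword_filter(tokens, stopwords=frozenset({"a", "an", "the", "of"})):
    """Remove tokens that appear in the stop-word set."""
    return [t for t in tokens if t not in stopwords]

def synonym_filter(tokens, synonyms={"fast": ["quick"]}):
    """Add tokens: emit each token plus any configured synonyms."""
    out = []
    for t in tokens:
        out.append(t)
        out.extend(synonyms.get(t, []))
    return out

def analyze(text, filters):
    """Run the tokenizer, then pass its tokens through each filter in turn."""
    tokens = tokenize(text)
    for f in filters:
        tokens = f(tokens)
    return tokens

print(analyze("The Fast car", [lowercase_filter, stopword_filter, synonym_filter]))
# ['fast', 'quick', 'car']
```

The ordering of the chain matters: lowercasing runs first so that the stop-word and synonym filters only need to match lower-case forms.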