opennlp.tools.tokenize
Interface Tokenizer

All Known Implementing Classes:
SimpleTokenizer, Tokenizer, Tokenizer, Tokenizer, Tokenizer, TokenizerME, WhitespaceTokenizer

public interface Tokenizer

The interface for tokenizers, which segment raw text into individual tokens.

Version:
$Revision: 1.2 $, $Date: 2004/01/26 14:16:37 $
Author:
Jason Baldridge

Method Summary
 java.lang.String[] tokenize(java.lang.String s)
          Tokenize a string.
 Span[] tokenizePos(java.lang.String s)
          Tokenize a string, returning the span (offsets) of each token.
 

Method Detail

tokenize

java.lang.String[] tokenize(java.lang.String s)
Tokenize a string.

Parameters:
s - The string to be tokenized.
Returns:
The String[] with the individual tokens as the array elements.
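As a rough illustration of the contract described above, the following sketch splits a string on runs of whitespace and returns the tokens as a String[]. The names SketchTokenizer, WhitespaceSketch and TokenizeDemo are hypothetical stand-ins invented for this example; they only mirror the documented signature and are not the library's own classes or its tokenization algorithm.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in that mirrors the documented method signature.
// It is NOT the opennlp.tools.tokenize.Tokenizer type itself.
interface SketchTokenizer {
    String[] tokenize(String s);
}

// Illustrative implementation (an assumption): tokens are maximal runs
// of non-whitespace characters.
class WhitespaceSketch implements SketchTokenizer {
    @Override
    public String[] tokenize(String s) {
        List<String> tokens = new ArrayList<>();
        int start = -1; // index where the current token began, or -1 between tokens
        for (int i = 0; i < s.length(); i++) {
            if (Character.isWhitespace(s.charAt(i))) {
                if (start >= 0) {
                    tokens.add(s.substring(start, i)); // close the current token
                    start = -1;
                }
            } else if (start < 0) {
                start = i; // first character of a new token
            }
        }
        if (start >= 0) {
            tokens.add(s.substring(start)); // token that runs to the end of the string
        }
        return tokens.toArray(new String[0]);
    }
}

class TokenizeDemo {
    public static void main(String[] args) {
        SketchTokenizer tokenizer = new WhitespaceSketch();
        for (String token : tokenizer.tokenize("The  quick brown fox.")) {
            System.out.println(token);
        }
    }
}
```

Because this sketch only looks at whitespace, punctuation stays attached to the neighbouring word ("fox." is a single token). Other implementations of the interface may segment differently.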

tokenizePos

Span[] tokenizePos(java.lang.String s)
Tokenize a string, returning the span (offsets) of each token rather than the token text.

Parameters:
s - The string to be tokenized.
Returns:
The Span[] with the spans (offsets into s) for each token as the individual array elements.
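To show how offset-based results relate to the input string, here is a minimal sketch that returns one span per whitespace-delimited token. SketchSpan, WhitespaceSpanSketch and TokenizePosDemo are hypothetical names made up for this example, and the convention that the start offset is inclusive and the end offset is exclusive is an assumption of the sketch, not something this page states.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical minimal span type for illustration only.
// Assumption: start is inclusive, end is exclusive.
final class SketchSpan {
    final int start;
    final int end;

    SketchSpan(int start, int end) {
        this.start = start;
        this.end = end;
    }

    // Recover the token text this span covers in the original string.
    String coveredText(String s) {
        return s.substring(start, end);
    }
}

class WhitespaceSpanSketch {
    // Illustrative analogue of tokenizePos: one span per maximal run
    // of non-whitespace characters.
    static SketchSpan[] tokenizePos(String s) {
        List<SketchSpan> spans = new ArrayList<>();
        int start = -1; // start offset of the current token, or -1 between tokens
        for (int i = 0; i < s.length(); i++) {
            if (Character.isWhitespace(s.charAt(i))) {
                if (start >= 0) {
                    spans.add(new SketchSpan(start, i));
                    start = -1;
                }
            } else if (start < 0) {
                start = i;
            }
        }
        if (start >= 0) {
            spans.add(new SketchSpan(start, s.length()));
        }
        return spans.toArray(new SketchSpan[0]);
    }
}

class TokenizePosDemo {
    public static void main(String[] args) {
        String s = "Hello,  world";
        for (SketchSpan span : WhitespaceSpanSketch.tokenizePos(s)) {
            System.out.println(span.start + ".." + span.end + " " + span.coveredText(s));
        }
    }
}
```

Returning offsets instead of strings lets a caller map each token back to its exact position in s, which is useful when the original whitespace or character positions must be preserved.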


Copyright 2008 Jason Baldridge, Gann Bierner, and Thomas Morton. All Rights Reserved.