opennlp.tools.ngram
Class NGramModel

java.lang.Object
  extended by opennlp.tools.ngram.NGramModel

public class NGramModel
extends java.lang.Object

The NGramModel can be used to create ngrams and character ngrams.

Version:
$Revision: 1.8 $, $Date: 2007/08/16 00:28:19 $
Author:
Joern Kottmann

Field Summary
protected static java.lang.String COUNT
           
 
Constructor Summary
NGramModel()
          Initializes an empty instance.
NGramModel(java.io.InputStream in)
          Initializes the current instance.
 
Method Summary
 void add(java.lang.String chars, int minLength, int maxLength)
          Adds character NGrams to the current instance.
 void add(TokenList ngram)
          Adds one NGram; if it already exists, its count is increased by one.
 void add(TokenList ngram, int minLength, int maxLength)
          Adds all NGrams with lengths from minLength to maxLength to the current instance.
 boolean contains(TokenList tokens)
          Checks if the given tokens are contained in the current instance.
 void cutoff(int cutoffUnder, int cutoffOver)
          Deletes all ngrams which appear less often than the cutoffUnder value or more often than the cutoffOver value.
 boolean equals(java.lang.Object obj)
           
 int getCount(TokenList ngram)
          Retrieves the count of the given ngram.
 int hashCode()
           
 java.util.Iterator iterator()
          Retrieves an Iterator over all TokenList entries.
 int numberOfGrams()
          Retrieves the total count of all ngrams.
 void remove(TokenList tokens)
          Removes the specified tokens from the NGram model; they are just dropped.
 void serialize(java.io.OutputStream out)
          Writes the ngram instance to the given OutputStream.
 void setCount(TokenList ngram, int count)
          Sets the count of an existing ngram.
 int size()
          Retrieves the number of TokenList entries in the current instance.
 Dictionary toDictionary()
           
 Dictionary toDictionary(boolean caseSensitive)
          Creates a dictionary which contains all TokenLists which are in the current NGramModel.
 java.lang.String toString()
           
 
Methods inherited from class java.lang.Object
clone, finalize, getClass, notify, notifyAll, wait, wait, wait
 

Field Detail

COUNT

protected static final java.lang.String COUNT
See Also:
Constant Field Values
Constructor Detail

NGramModel

public NGramModel()
Initializes an empty instance.


NGramModel

public NGramModel(java.io.InputStream in)
           throws java.io.IOException,
                  InvalidFormatException
Initializes the current instance.

Parameters:
in -
Throws:
java.io.IOException
InvalidFormatException
Method Detail

getCount

public int getCount(TokenList ngram)
Retrieves the count of the given ngram.

Parameters:
ngram -
Returns:
count of the ngram or 0 if it is not contained

setCount

public void setCount(TokenList ngram,
                     int count)
Sets the count of an existing ngram.

Parameters:
ngram -
count -

add

public void add(TokenList ngram)
Adds one NGram; if it already exists, its count is increased by one.

Parameters:
ngram -
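
The counting behaviour described for add(TokenList) and getCount(TokenList) can be sketched with a plain map. This is only an illustration under stated assumptions: CountingSketch is a hypothetical stand-in that uses List<String> in place of TokenList, and it is not the OpenNLP implementation.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for illustration only; this is not the OpenNLP class.
final class CountingSketch {

    // Maps each n-gram (a list of tokens) to the number of times it was added.
    private final Map<List<String>, Integer> counts = new HashMap<>();

    // Adds one n-gram: a new entry starts at 1, an existing entry is incremented.
    void add(List<String> ngram) {
        counts.merge(ngram, 1, Integer::sum);
    }

    // Returns the stored count, or 0 when the n-gram is not contained.
    int getCount(List<String> ngram) {
        return counts.getOrDefault(ngram, 0);
    }

    public static void main(String[] args) {
        CountingSketch model = new CountingSketch();
        List<String> gram = Arrays.asList("new", "york");
        model.add(gram);
        model.add(gram);
        System.out.println(model.getCount(gram));
        System.out.println(model.getCount(Arrays.asList("old", "york")));
    }
}
```

In this sketch, adding the same n-gram twice yields a count of 2, and a lookup of an n-gram that was never added yields 0, matching the return value documented for getCount.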

add

public void add(TokenList ngram,
                int minLength,
                int maxLength)
Adds all NGrams with lengths from minLength to maxLength to the current instance.

Parameters:
ngram - the tokens to build the uni-grams, bi-grams, tri-grams, ... from.
minLength - minimal length
maxLength - maximal length
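
As a rough illustration of what building uni-grams, bi-grams and tri-grams from one token sequence involves, the sketch below slides a window of each length over the tokens. It assumes both length bounds are inclusive; WordNGramSketch is a hypothetical class that uses List<String> in place of TokenList, not the library code.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in, not the OpenNLP class: counts word n-grams in a plain map.
final class WordNGramSketch {

    // Maps each n-gram to the number of times it was generated.
    final Map<List<String>, Integer> counts = new LinkedHashMap<>();

    // Adds every contiguous token window whose length lies in [minLength, maxLength].
    // Assumption: both bounds are inclusive.
    void add(List<String> tokens, int minLength, int maxLength) {
        for (int length = minLength; length <= maxLength; length++) {
            for (int start = 0; start + length <= tokens.size(); start++) {
                // Copy the window so the key does not depend on the caller's list.
                List<String> gram = new ArrayList<>(tokens.subList(start, start + length));
                counts.merge(gram, 1, Integer::sum);
            }
        }
    }

    public static void main(String[] args) {
        WordNGramSketch model = new WordNGramSketch();
        model.add(Arrays.asList("the", "quick", "fox"), 1, 2);
        System.out.println(model.counts);
    }
}
```

For the three tokens above with lengths 1 to 2, the sketch produces three uni-grams and two bi-grams, five entries in total.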

add

public void add(java.lang.String chars,
                int minLength,
                int maxLength)
Adds character NGrams to the current instance.

Parameters:
chars -
minLength -
maxLength -
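
Character ngrams can be pictured the same way, with substrings instead of token windows. The following is a minimal sketch assuming inclusive length bounds; CharNGramSketch is a hypothetical class, and whether the real method normalizes the input (for example its case) is not stated on this page, so the sketch leaves the characters unchanged.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical stand-in, not the OpenNLP class: counts character n-grams of a string.
final class CharNGramSketch {

    // Maps each character n-gram to the number of times it occurred.
    final Map<String, Integer> counts = new LinkedHashMap<>();

    // Adds every substring whose length lies in [minLength, maxLength].
    // Assumption: both bounds are inclusive and no case normalization is applied.
    void add(String chars, int minLength, int maxLength) {
        for (int length = minLength; length <= maxLength; length++) {
            for (int start = 0; start + length <= chars.length(); start++) {
                counts.merge(chars.substring(start, start + length), 1, Integer::sum);
            }
        }
    }

    public static void main(String[] args) {
        CharNGramSketch model = new CharNGramSketch();
        model.add("aaa", 1, 2);
        System.out.println(model.counts);
    }
}
```

With the input "aaa" and lengths 1 to 2, the sketch records "a" three times and "aa" twice, which shows how repeated substrings accumulate a count rather than creating new entries.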

remove

public void remove(TokenList tokens)
Removes the specified tokens from the NGram model; they are just dropped.

Parameters:
tokens -

contains

public boolean contains(TokenList tokens)
Checks if the given tokens are contained in the current instance.

Parameters:
tokens -
Returns:
true if the ngram is contained

size

public int size()
Retrieves the number of TokenList entries in the current instance.

Returns:
number of different grams

iterator

public java.util.Iterator iterator()
Retrieves an Iterator over all TokenList entries.

Returns:
iterator over all grams

numberOfGrams

public int numberOfGrams()
Retrieves the total count of all ngrams.

Returns:
total count of all ngrams
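
The difference between size() (number of different grams) and numberOfGrams() (total count of all ngrams) is easy to miss. A small sketch over a plain count map may help; TotalsSketch and its method names are hypothetical and only mirror the two documented return values.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper for illustration; it only mirrors the two documented return values.
final class TotalsSketch {

    // Number of distinct n-grams, analogous to size().
    static int distinct(Map<String, Integer> counts) {
        return counts.size();
    }

    // Sum of the counts of all n-grams, analogous to numberOfGrams().
    static int total(Map<String, Integer> counts) {
        int sum = 0;
        for (int count : counts.values()) {
            sum += count;
        }
        return sum;
    }

    public static void main(String[] args) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        counts.put("a", 3);
        counts.put("b", 1);
        System.out.println(distinct(counts) + " distinct, " + total(counts) + " in total");
    }
}
```

Here two distinct grams with counts 3 and 1 give a distinct value of 2 and a total of 4.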

cutoff

public void cutoff(int cutoffUnder,
                   int cutoffOver)
Deletes all ngrams which appear less often than the cutoffUnder value or more often than the cutoffOver value.

Parameters:
cutoffUnder -
cutoffOver -
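
The cutoff rule can be sketched as a filter over a count map. This is a minimal sketch, assuming that an entry is dropped when its count is strictly below cutoffUnder or strictly above cutoffOver, so counts equal to either bound are kept; CutoffSketch is a hypothetical helper, not the library code.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper for illustration; not the OpenNLP implementation.
final class CutoffSketch {

    // Removes every entry whose count is below cutoffUnder or above cutoffOver.
    // Assumption: counts equal to either bound are kept.
    static void cutoff(Map<String, Integer> counts, int cutoffUnder, int cutoffOver) {
        // Removing from the values view also removes the corresponding map entries.
        counts.values().removeIf(count -> count < cutoffUnder || count > cutoffOver);
    }

    public static void main(String[] args) {
        Map<String, Integer> counts = new HashMap<>();
        counts.put("rare", 1);
        counts.put("typical", 3);
        counts.put("frequent", 10);
        cutoff(counts, 2, 5);
        System.out.println(counts);
    }
}
```

With cutoffUnder = 2 and cutoffOver = 5, only the entry with count 3 survives. Passing 0 and Integer.MAX_VALUE would leave the map unchanged in this sketch.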

toDictionary

public Dictionary toDictionary()

toDictionary

public Dictionary toDictionary(boolean caseSensitive)
Creates a dictionary which contains all TokenLists which are in the current NGramModel.

Parameters:
caseSensitive - Specifies whether case distinctions should be kept in the creation of the dictionary.
Returns:
the new dictionary

serialize

public void serialize(java.io.OutputStream out)
               throws java.io.IOException
Writes the ngram instance to the given OutputStream.

Parameters:
out -
Throws:
java.io.IOException - if an I/O error occurs during writing

equals

public boolean equals(java.lang.Object obj)
Overrides:
equals in class java.lang.Object

toString

public java.lang.String toString()
Overrides:
toString in class java.lang.Object

hashCode

public int hashCode()
Overrides:
hashCode in class java.lang.Object


Copyright 2008 Jason Baldridge, Gann Bierner, and Thomas Morton. All Rights Reserved.