Package word_completion :: Module word_collection :: Class WordCollection

Class WordCollection

                   object --+    
                            |    
ternarytree.TernarySearchTree --+
                                |
                               WordCollection

Word lookup based on a ternary search tree, a trie-like prefix tree (the class derives from ternarytree.TernarySearchTree). This data structure can be searched efficiently by word prefix: a prefix search takes a string prefix and returns all dictionary words that begin with that prefix.

This class ingests rank/word pair files in a given directory. The ranks are intended to be relative usage frequencies. The class manages these frequency ranks.
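
To make the prefix-search idea concrete, here is a minimal, self-contained sketch of a ternary search tree in Python 3. It is an illustrative stand-in, not the code of ternarytree.TernarySearchTree; the names TinyTernaryTree and _Node are hypothetical, and returning an empty list for an empty prefix is an assumption of the sketch.

```python
class _Node:
    """One character cell of a ternary search tree (hypothetical helper)."""
    __slots__ = ("ch", "lo", "eq", "hi", "is_word")

    def __init__(self, ch):
        self.ch = ch
        self.lo = self.eq = self.hi = None
        self.is_word = False


class TinyTernaryTree:
    """Illustrative stand-in, not ternarytree.TernarySearchTree itself."""

    def __init__(self):
        self.root = None

    def add(self, word):
        if not word:
            raise ValueError("word must be non-empty")
        self.root = self._add(self.root, word, 0)

    def _add(self, node, word, i):
        ch = word[i]
        if node is None:
            node = _Node(ch)
        if ch < node.ch:
            node.lo = self._add(node.lo, word, i)
        elif ch > node.ch:
            node.hi = self._add(node.hi, word, i)
        elif i + 1 < len(word):
            node.eq = self._add(node.eq, word, i + 1)
        else:
            node.is_word = True
        return node

    def prefix_search(self, prefix):
        """Return every stored word that begins with prefix."""
        if not prefix:
            return []  # assumption of this sketch
        node, i = self.root, 0
        # Walk down to the node that matches the last prefix character.
        while node is not None:
            ch = prefix[i]
            if ch < node.ch:
                node = node.lo
            elif ch > node.ch:
                node = node.hi
            elif i + 1 < len(prefix):
                node, i = node.eq, i + 1
            else:
                break
        if node is None:
            return []
        results = [prefix] if node.is_word else []
        self._collect(node.eq, prefix, results)
        return results

    def _collect(self, node, path, results):
        """Gather all words in the subtree rooted at node."""
        if node is None:
            return
        self._collect(node.lo, path, results)
        if node.is_word:
            results.append(path + node.ch)
        self._collect(node.eq, path + node.ch, results)
        self._collect(node.hi, path, results)
```

A lookup first walks one node per prefix character (plus any lo/hi sidesteps), then enumerates only the subtree below the last matched character, which is why prefix queries do not need to scan the whole dictionary.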

Public methods:

Instance Methods

__init__(self, dictDir=None, userDictFilePath=None)
Keep track of a Python dict mapping from word to its frequency rank, of the total number of entries, and the number of word files ingested from disk.

createDictStructureFromFiles(self)
Goes through the self.dictDir directory on disk, and reads all the files there.

addToUserDict(self, newWord, rankInt=0)
Given a word, checks whether the word is already in the in-memory dictionary.

insert(self, word, rankInt=None)
Insert one word into the word collection.

rank(self, word)
Return the frequency rank of the given word in the collection.

prefix_search(self, word, cutoffRank=None)
Returns a list of all dictionary entries that begin with the string word.

startsWith(self, word, prefix)
True if word starts with, or is equal to, prefix.

__len__(self)
Return number of words in the collection.

Inherited from ternarytree.TernarySearchTree: __new__, add, contains

Inherited from object: __delattr__, __format__, __getattribute__, __hash__, __reduce__, __reduce_ex__, __repr__, __setattr__, __sizeof__, __str__, __subclasshook__

Class Variables
  DEFAULT_USER_DICT_FILE_NAME = 'dictUserRankAndWord.txt'
  USER_DICT_FILE_PATH = None

Properties

Inherited from ternarytree.TernarySearchTree: root, size

Inherited from object: __class__

Method Details

__init__(self, dictDir=None, userDictFilePath=None)
(Constructor)

Keep track of a Python dict mapping from word to its frequency rank, of the total number of entries, and the number of word files ingested from disk.

Parameters:
  • dictDir (string) - full path to directory that contains the dictionary files. If None, a built-in dictionary of 6000 words is used.
  • userDictFilePath (string) - full path to a user dictionary file. That file must be organized like the other dictionary files.
Overrides: object.__init__

createDictStructureFromFiles(self)

Goes through the self.dictDir directory on disk and reads all the files there. Each file must be a list of whitespace-separated frequency-rank / word pairs. Assumes that self.dictDir is set to the directory that holds the dictionary files.

Raises:
  • ValueError - if a rank in any of the files cannot be read as an integer.
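
The ingestion step described above can be sketched as follows. This is a Python 3 approximation under stated assumptions, not the class's actual code: the function name load_rank_word_files and the demo file name are hypothetical, and the sketch assumes that tokens simply alternate rank, word, rank, word regardless of line breaks.

```python
import os
import tempfile


def load_rank_word_files(dict_dir):
    """Read every regular file in dict_dir into a {word: rank} dict.

    Returns (word_to_rank, num_files_read).
    """
    word_to_rank = {}
    num_files = 0
    for name in sorted(os.listdir(dict_dir)):
        path = os.path.join(dict_dir, name)
        if not os.path.isfile(path):
            continue
        with open(path, encoding="utf-8") as fd:
            tokens = fd.read().split()
        # Assumed layout: tokens alternate rank, word, rank, word, ...
        for rank_str, word in zip(tokens[0::2], tokens[1::2]):
            # int() raises ValueError if the rank is not an integer.
            word_to_rank[word] = int(rank_str)
        num_files += 1
    return word_to_rank, num_files


# Demo on a throwaway directory with one made-up dictionary file.
with tempfile.TemporaryDirectory() as demo_dir:
    demo_path = os.path.join(demo_dir, "dictRankAndWord1.txt")
    with open(demo_path, "w", encoding="utf-8") as fd:
        fd.write("0 the\n1 of\n2 and\n")
    ranks, file_count = load_rank_word_files(demo_dir)
```

Relying on int() for the conversion means a malformed rank surfaces as the ValueError listed under Raises, without any extra validation code.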

addToUserDict(self, newWord, rankInt=0)

Given a word, checks whether the word is already in the in-memory dictionary. If so, does nothing and returns False. Otherwise, appends the word to dict_files/dictUserRankAndWord.txt with the provided rank and returns True.

Parameters:
  • newWord (string) - word to be added to the user dictionary.
  • rankInt (int) - frequency rank of the word. Rank 0 is most important; 1 is second-most important, etc. OK to have ties.
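
A minimal sketch of this check-then-append behavior, in Python 3. The function add_to_user_dict is a hypothetical stand-in for the method; the tab-separated line layout and the update of the in-memory dict are assumptions, since the documentation states only that the file holds rank/word pairs.

```python
import os
import tempfile


def add_to_user_dict(word_to_rank, user_dict_path, new_word, rank_int=0):
    """Append new_word to the user dictionary file unless it is already known.

    Returns False (and writes nothing) if new_word is already in word_to_rank;
    returns True after recording the word in memory and appending it to the file.
    """
    if new_word in word_to_rank:
        return False
    word_to_rank[new_word] = rank_int  # assumption: memory is updated too
    # Assumed line layout: "<rank><TAB><word>".
    with open(user_dict_path, "a", encoding="utf-8") as fd:
        fd.write("%d\t%s\n" % (rank_int, new_word))
    return True


# Demo against a temporary file standing in for the user dictionary.
known = {"the": 0}
with tempfile.TemporaryDirectory() as demo_dir:
    user_file = os.path.join(demo_dir, "dictUserRankAndWord.txt")
    first = add_to_user_dict(known, user_file, "blog", 4)   # new word
    second = add_to_user_dict(known, user_file, "blog", 4)  # already known
    with open(user_file, encoding="utf-8") as fd:
        file_text = fd.read()
```

Opening the file in append mode keeps earlier user entries intact, and the membership check prevents the same word from being written twice.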

insert(self, word, rankInt=None)

Insert one word into the word collection.

Parameters:
  • word (string) - word to insert.
  • rankInt (int) - optional frequency rank of the word. If None, no rank is recorded, and subsequent calls to rank() for this word raise KeyError.
Raises:
  • ValueError - if word is invalid or empty.

rank(self, word)

Return the frequency rank of the given word in the collection. It is an error to request the rank of a word that is not in the collection, or of a word whose rank was never specified in an ingestion file or as part of an insert() call.

Parameters:
  • word (string) - the word whose frequency rank is requested.
Raises:
  • KeyError - if the word, or its rank, is not present in the word collection.
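
The interplay between insert() and rank() can be illustrated with a small stand-in class. This Python 3 sketch is not the library's implementation: the class name RankedWords is hypothetical, and treating "valid" as "a non-empty string" is an assumption.

```python
class RankedWords:
    """Illustrative stand-in for the insert()/rank() contract."""

    def __init__(self):
        self._words = set()
        self._ranks = {}

    def insert(self, word, rank_int=None):
        # Assumption: a valid word is any non-empty string.
        if not isinstance(word, str) or not word:
            raise ValueError("word must be a non-empty string")
        self._words.add(word)
        if rank_int is not None:
            self._ranks[word] = rank_int

    def rank(self, word):
        # Raises KeyError for unknown words and for words stored without a rank.
        return self._ranks[word]

    def __len__(self):
        return len(self._words)


words = RankedWords()
words.insert("the", 0)
words.insert("zyzzyva")  # stored, but without a rank
```

With this setup, words.rank("the") returns 0, while words.rank("zyzzyva") and words.rank("missing") both raise KeyError, so callers that insert unranked words should be prepared to catch it.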

prefix_search(self, word, cutoffRank=None)

Returns all dictionary entries that begin with the string word. If the optional cutoffRank is specified, the returned list is limited to the top cutoffRank words. For example, if cutoffRank=5, only the five most highly ranked dictionary entries are returned. When cutoffRank is specified, the returned list is also sorted from most highly ranked to least highly ranked. If cutoffRank is not specified, or is None, the returned list is unsorted.

Parameters:
  • word (string) - prefix to search by.
  • cutoffRank (int) - Number of most highly ranked dictionary entries to return in rank-sorted order.
Returns: list
Overrides: ternarytree.TernarySearchTree.prefix_search
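
The cutoff step can be sketched separately from the tree lookup. In this Python 3 sketch, top_ranked is a hypothetical helper, not part of the class. It assumes that a numerically smaller rank means a more important word (rank 0 first, as stated for addToUserDict), and that words without a recorded rank are left out when a cutoff is given.

```python
import heapq


def top_ranked(matches, word_to_rank, cutoff_rank=None):
    """Trim prefix matches to the cutoff_rank most highly ranked words."""
    if cutoff_rank is None:
        # No cutoff: return all matches in whatever order they arrived.
        return list(matches)
    # Assumption: unranked words cannot be ordered, so they are skipped.
    ranked = [w for w in matches if w in word_to_rank]
    # Smallest rank number first, i.e. most important word first.
    return heapq.nsmallest(cutoff_rank, ranked, key=word_to_rank.__getitem__)


matches = ["then", "the", "there", "they"]
ranks = {"the": 0, "they": 3, "there": 5, "then": 9}
best_two = top_ranked(matches, ranks, cutoff_rank=2)
everything = top_ranked(matches, ranks)
```

Here best_two is ["the", "they"], and everything keeps the original match order. heapq.nsmallest avoids sorting the full match list when only a few entries are needed.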

startsWith(self, word, prefix)

Returns True if word starts with, or is equal to, prefix; otherwise False.

Parameters:
  • word (string) - word to examine.
  • prefix (string) - string that word is required to start with for a return of True.
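
For plain Python strings the same test is available as the built-in str.startswith. The wrapper below, starts_with, is a hypothetical Python 3 equivalent shown only to pin down the documented semantics; how the original method treats an empty prefix is not stated.

```python
def starts_with(word, prefix):
    """True if word begins with, or equals, prefix."""
    return word.startswith(prefix)


checks = (
    starts_with("there", "the"),   # proper prefix
    starts_with("the", "the"),     # equal strings
    starts_with("the", "there"),   # prefix longer than word
)
```

The three checks evaluate to True, True and False respectively.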