Package word_completion :: Module word_collection :: Class TelPadEncodedWordCollection

Class TelPadEncodedWordCollection

                   object --+        
                            |        
ternarytree.TernarySearchTree --+    
                                |    
                   WordCollection --+
                                    |
                                   TelPadEncodedWordCollection

Instances behave like the superclass WordCollection, except that every added word is encoded as if it had been entered via a telephone pad. Each letter group of the telephone pad is represented by its first letter. Example: "and" --> "amd" (phone buttons 1, 5, 2).

The class can thus be used to search for words by entering, for each letter c of the real word, the first letter of the telephone pad group that contains c. For word input, clients need not concern themselves with this encoding; the transformation occurs automatically.

However, calls to prefix_search() or contains() must pass the encoded version of the real word. Thus, instead of calling myColl.contains("and"), the client would call myColl.contains("amd"). Method encodeWord() takes a real word and returns the encoded version.

Method prefix_search() will usually return a larger number of 'remaining possible words' than a regular WordCollection, because the mapping from encoded to real words is one-to-many.

Instance Methods

__init__(self)
Maintain a data structure that maps each encoded word to all the possible equivalent real words.

prefix_search(self, encWord)
Prefix search operates as for the WordCollection superclass, but takes as input a telephone pad encoded prefix. Returns a list.

encodeTelPadLabel(self, label)
Given a string label as seen on the JBoard button pad, return the single letter that represents the group of label chars in this class.

decodeTelPadLabel(self, encLetter)
Given the encoding of a button label, return the original label.

encodeWord(self, word)
Given a real word, return its telephone pad encoded equivalent.

insert(self, newRealWord, newRankInt)
Takes a real (that is, unencoded) word, encodes it, and inserts it into the (in-memory) tree.

addToUserDict(self, newRealWord, rankInt=0)
Given an unencoded word, checks whether the word is already in the in-memory dictionary.

Inherited from WordCollection: __len__, createDictStructureFromFiles, rank, startsWith

Inherited from ternarytree.TernarySearchTree: __new__, add, contains

Inherited from object: __delattr__, __format__, __getattribute__, __hash__, __reduce__, __reduce_ex__, __repr__, __setattr__, __sizeof__, __str__, __subclasshook__

Class Variables
  symbolToEnc = {'ABC': 'a', 'DEF': 'd', 'GHI': 'g', 'JKL': 'j',...
  encToSymbol = {'a': 'ABC', 'd': 'DEF', 'g': 'GHI', 'j': 'JKL',...
  alphabet = {'a': 'a', 'b': 'a', 'c': 'a', 'd': 'd', 'e': 'd', ...

Inherited from WordCollection: DEFAULT_USER_DICT_FILE_NAME, USER_DICT_FILE_PATH

Properties

Inherited from ternarytree.TernarySearchTree: root, size

Inherited from object: __class__

Method Details

__init__(self)
(Constructor)

Maintain a data structure that maps each encoded word to all the possible equivalent real words. We call these multiple words 'collisions.'

Parameters:
  • dictDir - full path to directory that contains the dictionary files. If None, a built-in dictionary of 6000 words is used.
  • userDictFilePath - full path to a user dictionary file. That file must be organized like the other dictionary files.
Overrides: object.__init__

prefix_search(self, encWord)

Prefix search operates as for the WordCollection superclass, but takes as input a telephone pad encoded prefix. Returns a list of all real words that could complete the given prefix.

Parameters:
  • encWord (string) - the encoded prefix
Returns: list
Raises:
  • ValueError - if the mapping from encoded words to collisions is corrupted. Never caused by caller.
Overrides: ternarytree.TernarySearchTree.prefix_search
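
The behavior described above can be illustrated with a minimal sketch. It assumes the collisions are held in a plain dict from encoded word to real words and uses a linear scan instead of the ternary search tree the class actually inherits from. The function name prefix_search_sketch and the sample data are hypothetical.

```python
def prefix_search_sketch(collisions, enc_prefix):
    """Return every real word whose encoded form starts with enc_prefix.

    collisions: dict mapping an encoded word to the list of real words
    that share that encoding (an assumed layout, not the class's own).
    """
    matches = []
    for enc_word in sorted(collisions):
        if enc_word.startswith(enc_prefix):
            matches.extend(collisions[enc_word])
    return matches

# Sample data: "and" and "cod" both encode to "amd"; "andy" encodes to "amdw".
sample = {"amd": ["and", "cod"], "amdw": ["andy"], "gdjjm": ["hello"]}

print(prefix_search_sketch(sample, "am"))   # ['and', 'cod', 'andy']
```

Because several real words share one encoded key, a short encoded prefix tends to yield more candidates than the same search on unencoded words would.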

encodeTelPadLabel(self, label)

Given a string label as seen on the JBoard button pad, return the single letter that represents the group of label chars in this class. Ex: "ABC" returns symbolToEnc["ABC"] == 'a'.

Parameters:
  • label (string) - group of characters shown on a JBoard button (more or less a telephone pad), e.g. "ABC".
Raises:
  • KeyError - if passed-in button label is not a true button label.

decodeTelPadLabel(self, encLetter)

Given the encoding of a button label, return the original label. Ex.: 'a' ==> 'ABC', 's' ==> 'STUV'

Parameters:
  • encLetter (string) - label encoding.
Raises:
  • KeyError - if passed-in letter is not an encoded label.
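
As a rough sketch of how the two label methods relate, assuming both are plain dictionary lookups and that encToSymbol is the inverse of symbolToEnc: the function names encode_label and decode_label are hypothetical stand-ins, while the dictionary contents are copied from the class variables.

```python
# Button labels and their one-letter encodings, copied from symbolToEnc.
SYMBOL_TO_ENC = {'ABC': 'a', 'DEF': 'd', 'GHI': 'g', 'JKL': 'j',
                 'MNO': 'm', 'PQR': 'p', 'STUV': 's', 'WXYZ': 'w'}

# Inverse table; it has the same contents as encToSymbol.
ENC_TO_SYMBOL = {enc: label for label, enc in SYMBOL_TO_ENC.items()}

def encode_label(label):
    """Button label -> encoding letter; raises KeyError for unknown labels."""
    return SYMBOL_TO_ENC[label]

def decode_label(enc_letter):
    """Encoding letter -> button label; raises KeyError for unknown letters."""
    return ENC_TO_SYMBOL[enc_letter]

print(encode_label('ABC'))    # a
print(decode_label('s'))      # STUV
```

A lookup with anything that is not a key, such as encode_label('XYZ') or decode_label('b'), raises KeyError, matching the Raises entries above.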

encodeWord(self, word)

Given a real word, return its telephone pad encoded equivalent.

Parameters:
  • word (string) - the real word to encode.
Returns:
the encoded equivalent string.
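
A minimal sketch of the encoding step, assuming each letter is looked up in a per-letter table like the alphabet class variable. The name encode_word and the lowercasing of input are assumptions, not the module's actual code.

```python
# Button labels and their one-letter encodings, copied from symbolToEnc.
SYMBOL_TO_ENC = {'ABC': 'a', 'DEF': 'd', 'GHI': 'g', 'JKL': 'j',
                 'MNO': 'm', 'PQR': 'p', 'STUV': 's', 'WXYZ': 'w'}

# Per-letter table built from the labels, analogous to the alphabet variable.
LETTER_TO_ENC = {ch.lower(): enc
                 for label, enc in SYMBOL_TO_ENC.items()
                 for ch in label}

def encode_word(word):
    """Hypothetical stand-in for encodeWord().

    Assumption: input is lowercased first; characters outside a-z
    raise KeyError.
    """
    return ''.join(LETTER_TO_ENC[ch] for ch in word.lower())

print(encode_word("and"))   # amd
print(encode_word("cod"))   # amd
```

Both calls produce the same result, which is the collision case the class description refers to.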

insert(self, newRealWord, newRankInt)

Takes a real (that is, unencoded) word, encodes it, and inserts it into the (in-memory) tree. Updates the mapping from encoded words to their collisions.

Parameters:
  • newRealWord (string) - the unencoded word to insert.
  • newRankInt (int) - the new word's frequency rank.
Raises:
  • ValueError - if the encoded-word to collisions data structure is corrupted. Not caused by caller.
Overrides: WordCollection.insert
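
The bookkeeping described above might look roughly like the following sketch. It assumes the collisions are kept as rank-sorted lists in a dict and leaves out the ternary tree itself. The class name CollisionIndex and its layout are hypothetical.

```python
from collections import defaultdict

# Button labels and their one-letter encodings, copied from symbolToEnc.
SYMBOL_TO_ENC = {'ABC': 'a', 'DEF': 'd', 'GHI': 'g', 'JKL': 'j',
                 'MNO': 'm', 'PQR': 'p', 'STUV': 's', 'WXYZ': 'w'}
LETTER_TO_ENC = {ch.lower(): enc
                 for label, enc in SYMBOL_TO_ENC.items()
                 for ch in label}

class CollisionIndex:
    """Hypothetical stand-in for the encoded-word -> collisions mapping."""

    def __init__(self):
        # encoded word -> list of (rank, real word), kept sorted by rank
        self.collisions = defaultdict(list)

    def insert(self, new_real_word, new_rank_int):
        """Encode the word, record it under its encoding, return the encoding."""
        encoded = ''.join(LETTER_TO_ENC[ch] for ch in new_real_word.lower())
        entry = (new_rank_int, new_real_word)
        if entry not in self.collisions[encoded]:
            self.collisions[encoded].append(entry)
            self.collisions[encoded].sort()
        return encoded

index = CollisionIndex()
for rank, word in [(0, "and"), (7, "cod"), (3, "bod")]:
    index.insert(word, rank)

print(index.collisions["amd"])   # [(0, 'and'), (3, 'bod'), (7, 'cod')]
```

Keeping each collision list sorted by rank is a design choice of this sketch; it lets the most important candidate (rank 0) come first when the list is shown to the user.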

addToUserDict(self, newRealWord, rankInt=0)

Given an unencoded word, checks whether the word is already in the in-memory dictionary. If so, does nothing and returns False. Otherwise, appends the word to dict_files/dictUserRankAndWord.txt with the provided rank and returns True.

Parameters:
  • newRealWord (string) - word to be added to the user dictionary.
  • rankInt - frequency rank of the word. Rank 0 is most important; 1 is second-most important, etc. OK to have ties.
Overrides: WordCollection.addToUserDict
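
The check-then-append flow can be sketched as follows. The on-disk line format (rank, a tab, then the word) is an assumption, since the layout of the dictionary files is not documented here. The function name, the known_words set, and updating that set after the write are likewise hypothetical.

```python
import os
import tempfile

def add_to_user_dict(known_words, user_dict_path, new_real_word, rank_int=0):
    """Append the word to the user dictionary file unless it is already known.

    known_words stands in for the in-memory dictionary. Returns False if
    the word was already present, True after a successful append.
    """
    if new_real_word in known_words:
        return False
    # Assumed line format: "<rank>\t<word>\n"
    with open(user_dict_path, "a", encoding="utf-8") as f:
        f.write(f"{rank_int}\t{new_real_word}\n")
    known_words.add(new_real_word)   # assumption: memory is updated as well
    return True

# Usage against a throwaway file rather than dict_files/dictUserRankAndWord.txt.
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "userDict.txt")
    words = {"and"}
    print(add_to_user_dict(words, path, "and"))       # False
    print(add_to_user_dict(words, path, "zebra", 5))  # True
```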

Class Variable Details

symbolToEnc

Value:
{'ABC': 'a',
 'DEF': 'd',
 'GHI': 'g',
 'JKL': 'j',
 'MNO': 'm',
 'PQR': 'p',
 'STUV': 's',
 'WXYZ': 'w'}

encToSymbol

Value:
{'a': 'ABC',
 'd': 'DEF',
 'g': 'GHI',
 'j': 'JKL',
 'm': 'MNO',
 'p': 'PQR',
 's': 'STUV',
 'w': 'WXYZ'}

alphabet

Value:
{'a': 'a',
 'b': 'a',
 'c': 'a',
 'd': 'd',
 'e': 'd',
 'f': 'd',
 'g': 'g',
 'h': 'g',
...