Pronunciation dictionary manager.
A pronunciation dictionary contains a list of tokens, each one with a list
of possible pronunciations.
sppasDictPron can load the dictionary from an HTK-ASCII file. Each line of
such a file looks like the following:
acted [acted] { k t e d
acted(2) [acted] { k t i d
The first column contains the token, optionally followed by the variant
number in parentheses. The second column (in brackets) is ignored; it should
contain the token. The remaining columns are the phones, separated by
whitespace.
sppasDictPron accepts missing variant numbers, empty brackets, or missing
brackets.
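The line format described above can be sketched with a regular expression.
This is only an illustration: parse_htk_line is a hypothetical helper, not
part of sppasDictPron, and the actual loader may handle more cases.

```python
import re

# Hypothetical pattern for one HTK-ASCII entry (an assumption for this sketch).
_ENTRY = re.compile(
    r"^(?P<token>\S+?)"            # token, matched lazily so "(2)" is left over
    r"(?:\((?P<variant>\d+)\))?"   # optional variant number in parentheses
    r"\s+(?:\[[^\]]*\]\s*)?"       # optional bracketed column, ignored
    r"(?P<phones>.*)$"             # remaining columns: the phones
)


def parse_htk_line(line):
    """Return (token, variant number or None, list of phones)."""
    match = _ENTRY.match(line.strip())
    if match is None:
        raise ValueError("not a dictionary entry: %r" % line)
    variant = match.group("variant")
    return (
        match.group("token"),
        int(variant) if variant else None,
        match.group("phones").split(),
    )


print(parse_htk_line("acted(2) [acted] { k t i d"))
```

The same helper accepts 'acted { k t e d' (missing brackets) and
'acted [] { k t e d' (empty brackets), which mirrors the tolerance
described above.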
Example
>>> d = sppasDictPron('eng.dict')
>>> d.add_pron('acted', '{ k t e')
>>> d.add_pron('acted', '{ k t i')
Then, the phonetization of a token can be accessed with the get_pron() method:
Example
>>> print(d.get_pron('acted'))
{-k-t-e-d|{-k-t-i-d|{-k-t-e|{-k-t-i
The following convention is adopted to represent the pronunciation
variants:
- '-' separates the phones (X-SAMPA standard)
- '|' separates the variants
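The two separators can be illustrated with a pair of conversion functions.
join_variants and split_variants are hypothetical names chosen for this
sketch; they are not claimed to exist in sppasDictPron.

```python
PHONE_SEP = "-"     # separates the phones of one variant
VARIANT_SEP = "|"   # separates the variants of one token


def join_variants(variants):
    """Build the pronunciation string from a list of phone lists."""
    return VARIANT_SEP.join(PHONE_SEP.join(phones) for phones in variants)


def split_variants(pron):
    """Split a pronunciation string back into a list of phone lists."""
    return [variant.split(PHONE_SEP) for variant in pron.split(VARIANT_SEP)]


variants = [["{", "k", "t", "e", "d"], ["{", "k", "t", "i", "d"]]
pron = join_variants(variants)
print(pron)                              # {-k-t-e-d|{-k-t-i-d
print(split_variants(pron) == variants)  # True
```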
Note that tokens in the dictionary are case-insensitive.
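One way to obtain case-insensitive tokens is to lowercase every key before
storing or looking it up. The class below is a hypothetical stand-in written
for illustration only; lowercasing, skipping duplicate variants, and returning
None for an unknown token are assumptions, not documented sppasDictPron
behaviour.

```python
class TinyPronDict:
    """Hypothetical stand-in for illustration; not the sppasDictPron code."""

    def __init__(self):
        self._prons = {}

    def add_pron(self, token, phones):
        """Append a whitespace-separated pronunciation to a token."""
        key = token.strip().lower()   # assumption: case folded by lowercasing
        variant = "-".join(phones.split())
        known = self._prons.get(key)
        if known is None:
            self._prons[key] = variant
        elif variant not in known.split("|"):
            self._prons[key] = known + "|" + variant

    def get_pron(self, token):
        """Return the stored variants, or None if the token is unknown."""
        return self._prons.get(token.strip().lower())


d = TinyPronDict()
d.add_pron("Acted", "{ k t e d")
d.add_pron("ACTED", "{ k t i d")
print(d.get_pron("acted"))   # {-k-t-e-d|{-k-t-i-d
```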