Skip to content

HfstTransducer

eaxelson edited this page Aug 29, 2017 · 25 revisions

class HfstTransducer

A synchronous finite-state transducer.

Argument handling

Transducer functions modify their calling object and return a reference to the calling object after modification, unless otherwise mentioned. Transducer arguments are usually not modified.

# transducer is reversed
transducer.reverse()
# transducer2 is not modified, but a copy of it is disjuncted with
# transducer1
transducer1.disjunct(transducer2)
# a chain of functions is possible
transducer.reverse().determinize().reverse().determinize()

Implementation types

Currently, an HfstTransducer has three implementation types that are well supported. When an HfstTransducer is created, its type is defined with an argument. For functions that take a transducer as an argument, the type of the calling transducer must be the same as the type of the argument transducer:

# this will cause a TransducerTypeMismatchException:
tropical_transducer.disjunct(foma_transducer)
# this works, but weights are lost in the conversion
tropical_transducer.convert(hfst.ImplementationType.SFST_TYPE).disjunct(sfst_transducer)
# this works, information is not lost
tropical_transducer.disjunct(sfst_transducer.convert(hfst.ImplementationType.TROPICAL_OPENFST_TYPE))

Creating transducers

With HfstTransducer constructors it is possible to create empty, epsilon, one-transition and single-path transducers. Transducers can also be created from scratch with #hfst.HfstBasicTransducer and converted to an HfstTransducer. More complex transducers can be combined from simple ones with various functions. class HfstTransducer:

Whether HFST is linked to the transducer library needed by implementation type \a type. is_implementation_type_available(type):

init(self)

Create an empty transducer.

tr = hfst.HfstTransducer()
assert(tr.compare(hfst.empty_fst()))

init(self, another)

Create a deep copy of HfstTransducer \a another or a transducer equivalent to HfstBasicTransducer \a another. @param another An HfstTransducer or HfstBasicTransducer.

An example:

tr1 = hfst.regex('foo bar foo')
tr2 = hfst.HfstTransducer(tr1)
tr2.substitute('foo','FOO')
tr1.concatenate(tr2)

init(self, t, type)

Create an HFST transducer equivalent to HfstBasicTransducer \a t. The type of the created transducer is defined by \a type. @param t An HfstBasicTransducer. @param type The type of the resulting transducer. If you want to use the default type, you can just call hfst.HfstTransducer(fsm)

copy(self)

Return a deep copy of the transducer.

tr = hfst.regex('[foo:bar::0.3]*')
TR = tr.copy()
assert(tr.compare(TR))

set_name(self, name)

Rename the transducer \a name. @param name The name of the transducer. @see #get_name

get_name(self)

Get the name of the transducer. @see #set_name

str(self)

An AT&T representation of the transducer. Defined for print command. An example:

>>> print(hfst.regex('[foo:bar::2]+'))
0       1       foo     bar     2.000000
1       1       foo     bar     2.000000
1       0.000000
@todo Works only for small transducers.

prune(self)

Make transducer coaccessible. A transducer is coaccessible iff there is a path from every state to a final state.

set_property(self, property, value)

Set arbitrary string property \a property to \a value. @param property A string naming the property. @param value A string expressing the value of \a property.

set_property('name', 'name of the transducer') equals set_name('name of the transducer').

@note While this function is capable of creating endless amounts of arbitrary metadata, it is suggested that property names are drawn from central repository, or prefixed with "x-". A property that does not follow this convention may affect the behavior of transducer in future releases.

get_property(self, property)

Get arbitrary string propert @a property. @param property The name of the property whose value is returned. get_property('name') works like get_name().

get_properties(self)

Get all properties from the transducer. @return A dictionary whose keys are properties and whose values are the values of those properties.

get_alphabet(self)

Get the alphabet of the transducer.

The alphabet is defined as the set of symbols known to the transducer. @return A tuple of strings.

insert_to_alphabet(self, symbol)

Explicitly insert \a symbol to the alphabet of the transducer. @param symbol The symbol (string) to be inserted.

@note Usually this function is not needed since new symbols are added to the alphabet by default.

remove_from_alphabet(self, symbol)

Remove \a symbol from the alphabet of the transducer. @param symbol The symbol (string) to be removed.

@pre \a symbol does not occur in any transition of the transducer. @note Use with care, removing a symbol that occurs in a transition of the transducer can have unexpected results.

eliminate_flag(self, symbol)

Eliminate flag diacritic \a symbol from the transducer. @param symbol The flag to be eliminated. TODO: explain more.

An equivalent transducer with no flags \a symbol.

eliminate_flags(self, symbols)

Eliminate flag diacritics listed in \a symbols from the transducer. @param symbols The flags to be eliminated. TODO: explain more.

An equivalent transducer with no flags listed in \a symbols.

is_automaton(self)

Whether each transition in the transducer has equivalent input and output symbols. @note Transition with hfst.UNKNOWN on both sides IS NOT a transition with equivalent input and output symbols. @note Transition with hfst.IDENTITY on both sides IS a transition with equivalent input and output symbols.

is_cyclic(self)

Whether the transducer is cyclic.

get_type(self)

The implementation type of the transducer. @return hfst.ImplementationType

compare(self, another)

Whether this transducer and \a another are equivalent. @param another The compared transducer. @pre \a self and \a another must have the same implementation type.

Two transducers are equivalent iff they accept the same input/output string pairs with the same weights and the same alignments. @note For weighted transducers, the function often returns false negatives due to weight precision issues.

remove_epsilons(self)

Remove all epsilon:epsilon transitions from the transducer so that the resulting transducer is equivalent to the original one.

determinize(self)

Determinize the transducer.

Determinizing a transducer yields an equivalent transducer that has no state with two or more transitions whose input:output symbol pairs are the same.

number_of_states(self):

The number of states in the transducer.

number_of_arcs(self):

The number of transitions in the transducer.

write(self, ostr):

Write the transducer in binary format to \a ostr. @param ostr A hfst.HfstOutputStream where the transducer is written.

write_att(self, f, write_weights=True):

Write the transducer in AT&T format to file \a f, \a write_weights defined whether weights are written. @param f A python file where transducer is written. @param write_weights Whether weights are written.

write_prolog(self, f, name, write_weights=True):

Write the transducer in prolog format with name \a name to file \a f, \a write_weights defined whether weights are written. @param f A python file where the transducer is written. @param name The name of the transducer that must be given in a prolog file. @param write_weights Whether weights are written.

minimize(self):

Minimize the transducer.

Minimizing a transducer yields an equivalent transducer with the smallest number of states.

@bug OpenFst's minimization algorithm seems to add epsilon transitions to weighted transducers?

n_best(self, n):

Extract \a n best paths of the transducer.

In the case of a weighted transducer (#hfst.ImplementationType.TROPICAL_OPENFST_TYPE or #hfst.ImplementationType.LOG_OPENFST_TYPE), best paths are defined as paths with the lowest weight. In the case of an unweighted transducer (#hfst.ImplementationType.SFST_TYPE or #hfst.ImplementationType.FOMA_TYPE), the function returns random paths.

This function is not implemented for #hfst.ImplementationType.FOMA_TYPE or #hfst.ImplementationType.SFST_TYPE. If this function is called by an HfstTransducer of type #hfst.ImplementationType.FOMA_TYPE or #hfst.ImplementationType.SFST_TYPE, it is converted to #hfst.ImplementationType.TROPICAL_OPENFST_TYPE, paths are extracted and it is converted back to #hfst.ImplementationType.FOMA_TYPE or #hfst.ImplementationType.SFST_TYPE. If HFST is not linked to OpenFst library, an #hfst.exceptions.ImplementationTypeNotAvailableException is thrown.

repeat_star(self):

A concatenation of N transducers where N is any number from zero to infinity.

repeat_plus(self):

A concatenation of N transducers where N is any number from one to infinity.

repeat_n(self, n):

A concatenation of \a n transducers.

repeat_n_minus(self, n):

A concatenation of N transducers where N is any number from zero to \a n, inclusive.

repeat_n_plus(self, n):

A concatenation of N transducers where N is any number from \a n to infinity, inclusive.

repeat_n_to_k(self, n, k):

A concatenation of N transducers where N is any number from \a n to \a k, inclusive.

optionalize(self):

Disjunct the transducer with an epsilon transducer.

invert(self):

Swap the input and output symbols of each transition in the transducer.

reverse(self):

Reverse the transducer.

A reverted transducer accepts the string 'n(0) n(1) ... n(N)' iff the original transducer accepts the string 'n(N) n(N-1) ... n(0)'

input_project(self):

Extract the input language of the transducer.

All transition symbol pairs isymbol:osymbol are changed to isymbol:isymbol.

output_project(self):

Extract the output language of the transducer.

All transition symbol pairs isymbol:osymbol are changed to osymbol:osymbol.

compose(self, another):

Compose this transducer with \a another. @param another The second argument in the composition. Not modified.

lenient_composition(self, another):

Perform a lenient composition on this transducer and \a another. TODO: explain more.

compose_intersect(self, v, invert=False):

Compose this transducer with the intersection of transducers in \a v. If \a invert is true, then compose the intersection of the transducers in \a v with this transducer.

The algorithm used by this function is faster than intersecting all transducers one by one and then composing this transducer with the intersection.

@pre The transducers in \a v are deterministic and epsilon-free. @param v A tuple of transducers. @param invert Whether the intersection of the transducers in \a v is composed with this transducer.

concatenate(self, another):

Concatenate this transducer with \a another.

disjunct(self, another):

Disjunct this transducer with \a another.

intersect(self, another):

Intersect this transducer with \a another.

subtract(self, another):

Subtract transducer \a another from this transducer.

minus(self, another):

Alias for subtract. @see hfst.HfstTransducer.subtract

conjunct(self, another):

Alias for intersect. @see hfst.HfstTransducer.intersect

convert(self, type, options=''):

Convert the transducer into an equivalent transducer in format \a type.

If a weighted transducer is converted into an unweighted one, all weights are lost. In the reverse case, all weights are initialized to the semiring's one.

A transducer of type #hfst.ImplementationType.SFST_TYPE, #hfst.ImplementationType.TROPICAL_OPENFST_TYPE, #hfst.ImplementationType.LOG_OPENFST_TYPE or #hfst.ImplementationType.FOMA_TYPE can be converted into an #hfst.ImplementationType.HFST_OL_TYPE or #hfst.ImplementationType.HFST_OLW_TYPE transducer, but an #hfst.ImplementationType.HFST_OL_TYPE or #hfst.ImplementationType.HFST_OLW_TYPE transducer cannot be converted to any other type.

@note For conversion between HfstBasicTransducer and HfstTransducer, see #hfst.HfstTransducer.init and #hfst.HfstBasicTransducer.init

write_att(self, ofile, write_weights=True):

Write the transducer in AT&T format to file \a ofile, \a write_weights defines whether weights are written.

The fields in the resulting AT&T format are separated by tabulator characters.

NOTE: If the transition symbols contain space characters,the spaces are printed as '@SPACE@' because whitespace characters are used as field separators in AT&T format. Epsilon symbols are printed as '@0@'.

If several transducers are written in the same file, they must be separated by a line of two consecutive hyphens "--", so that they will be read correctly by hfst.read_att.

An example:

tr1 = hfst.regex('[foo:bar baz:0 " "]::0.3')
tr2 = hfst.empty_fst()
tr3 = hfst.epsilon_fst(0.5)
tr4 = hfst.regex('[foo]')
tr5 = hfst.empty_fst()

f = hfst.hfst_open('testfile.att', 'w')
for tr in [tr1, tr2, tr3, tr4]:
    tr.write_att(f)
    f.write('--\n')
tr5.write_att(f)
f.close()

This will yield a file 'testfile.att' that looks as follows:

0       1       foo     bar     0.299805
1       2       baz     @0@     0.000000
2       3       @_SPACE_@       @_SPACE_@       0.000000
3       0.000000
--
--
0       0.500000
--
0       1       foo     foo     0.000000
1       0.000000
--

@throws StreamCannotBeWrittenException @throws StreamIsClosedException

@see #hfst.HfstOutputStream.write @see #hfst.HfstTransducer.init

write_att(self, filename, write_weights=True):

Write the transducer in AT&T format to file named \a filename. \a write_weights defines whether weights are written.

If the file exists, it is overwritten. If the file does not exist, it is created.

priority_union(self, another):

Make priority union of this transducer with \a another.

For the operation t1.priority_union(t2), the result is a union of t1 and t2, except that whenever t1 and t2 have the same string on left side, the path in t2 overrides the path in t1.

Example

Transducer 1 (t1):
a : a
b : b

Transducer 2 (t2):
b : B
c : C

Result ( t1.priority_union(t2) ):
a : a
b : B
c : C
For more information, read fsmbook.

cross_product(self, another):

Make cross product of this transducer with \a another. It pairs every string of this with every string of \a another. If strings are not the same length, epsilon padding will be added in the end of the shorter string. @pre Both transducers must be automata, i.e. map strings onto themselves.

shuffle(self, another):

Shuffle this transducer with transducer \a another.

If transducer A accepts string 'foo' and transducer B string 'bar', the transducer that results from shuffling A and B accepts all strings [(f|b)(o|a)(o|r)].

@pre Both transducers must be automata, i.e. map strings onto themselves.

insert_freely(self, ins):

Freely insert a transition or a transducer into the transducer. @param ins The transition or transducer to be inserted.

If \a ins is a transition, i.e. a 2-tuple of strings: A transition is added to each state in this transducer. The transition leads from that state to itself with input and output symbols defined by \a ins. The weight of the transition is zero.

If \a ins is an #hfst.HfstTransducer: A copy of \a ins is attached with epsilon transitions to each state of this transducer. After the operation, for each state S in this transducer, there is an epsilon transition that leads from state S to the initial state of \a ins, and for each final state of \a ins, there is an epsilon transition that leads from that final state to state S in this transducer. The weights of the final states in \a ins are copied to the epsilon transitions leading to state S.

set_final_weights(self, weight):

Set the weights of all final states to \a weight. If the HfstTransducer is of unweighted type (#hfst.ImplementationType.SFST_TYPE or #hfst.ImplementationType.FOMA_TYPE), nothing is done.

push_weights_to_start(self):

Push weights towards initial state.

If the HfstTransducer is of unweighted type (#hfst.ImplementationType.SFST_TYPE or #hfst.ImplementationType.FOMA_TYPE), nothing is done.

An example:

>>> import hfst
>>> tr = hfst.regex('[a::1 a:b::0.3 (b::0)]::0.7;')
>>> tr.push_weights_to_start()
>>> print(tr)
0       1       a       a       2.000000
1       2       a       b       0.000000
2       3       b       b       0.000000
2       0.000000
3       0.000000
# @see hfst.HfstTransducer.push_weights_to_end

push_weights_to_end(self):

Push weights towards final state(s).

If the HfstTransducer is of unweighted type (#hfst.ImplementationType.SFST_TYPE or #hfst.ImplementationType.FOMA_TYPE), nothing is done.

An example:

>>> import hfst
>>> tr = hfst.regex('[a::1 a:b::0.3 (b::0)]::0.7;')
>>> tr.push_weights_to_end()
>>> print(tr)
0       1       a       a       0.000000
1       2       a       b       0.000000
2       3       b       b       0.000000
2       2.000000
3       2.000000

@see hfst.HfstTransducer.push_weights_to_start

substitute(self, s, S=None, **kwargs):

Substitute symbols or transitions in the transducer.

@param s The symbol or transition to be substituted. Can also be a dictionary of substitutions, if S == None. @param S The symbol, transition, a tuple of transitions or a transducer (hfst.HfstTransducer) that substitutes \a s. @param kwargs Arguments recognized are 'input' and 'output', their values can be False or True, True being the default. These arguments are valid only if \a s and \a S are strings, else they are ignored. @param input Whether substitution is performed on input side, defaults to True. Valid only if \a s and \a S are strings. @param output Whether substitution is performed on output side, defaults to True. Valid only if \a s and \ S are strings.

For more information, see hfst.HfstBasicTransducer.substitute. The function works similarly, with the exception of argument \a S, which must be hfst.HfstTransducer instead of hfst.HfstBasicTransducer.

@see hfst.HfstBasicTransducer.substitute

lookup_optimize(self):

Lookup string \a input. @param input The input. A string or a pre-tokenized tuple of symbols (i.e. a tuple of strings). @param kwargs Possible parameters and their default values are: obey_flags=True, max_number=-1, time_cutoff=0.0, output='tuple' @param obey_flags Whether flag diacritics are obeyed. Always True for HFST_OL(W)_TYPE transducers. @param max_number Maximum number of results returned, defaults to -1, i.e. infinity. @param time_cutoff How long the function can search for results before returning, expressed in seconds. Defaults to 0.0, i.e. infinitely. Always 0.0 for transducers that are not of HFST_OL(W)_TYPE. @param output Possible values are 'tuple', 'text' and 'raw', 'tuple' being the default.

@note This function has an efficient implementation only for optimized lookup format (hfst.ImplementationType.HFST_OL_TYPE or hfst.ImplementationType.HFST_OLW_TYPE). Other formats perform the lookup via composition. Consider converting the transducer to optimized lookup format or to a HfstBasicTransducer. Conversion to HFST_OL(W)_TYPE might take a while but the lookup is fast. Conversion to HfstBasicTransducer is quick but lookup is slower.

Optimize the transducer for lookup. This effectively converts the transducer into #hfst.ImplementationType.HFST_OL_TYPE.

remove_optimization(self):

Remove lookup optimization. This effectively converts transducer (back) into default fst type.

extract_paths(self, **kwargs):

Extract paths that are recognized by the transducer.

@param kwargs Arguments recognized are filter_flags, max_cycles, max_number, obey_flags, output, random. @param filter_flags Whether flags diacritics are filtered out from the result (default True). @param max_cycles Indicates how many times a cycle will be followed, with negative numbers indicating unlimited (default -1 i.e. unlimited). @param max_number The total number of resulting strings is capped at this value, with 0 or negative indicating unlimited (default -1 i.e. unlimited). @param obey_flags Whether flag diacritics are validated (default True). @param output Output format. Values recognized: 'text', 'raw', 'dict' (the default). 'text' returns a string where paths are separated by newlines and each path is represented as input_string + ":" + output_string + "\t" t weight. 'raw' yields a tuple of all paths where each path is a 2-tuple consisting of a weight and a tuple of all transition symbol pairs, each symbol pair being a 2-tuple of an input and an output symbol. 'dict' gives a dictionary that maps each input string into a list of possible outputs, each output being a 2-tuple of an output string and a weight. @param random Whether result strings are fetched randomly (default False). @return The extracted strings. \a output controls how they are represented.

@pre The transducer must be acyclic, if both \a max_number and \a max_cycles have unlimited values. Else a hfst.exceptions.TransducerIsCyclicException will be thrown.

An example:

>>> tr = hfst.regex('a:b+ (a:c+)')
>>> print(tr)
0       1       a       b       0.000000
1       1       a       b       0.000000
1       2       a       c       0.000000
1       0.000000
2       2       a       c       0.000000
2       0.000000

>>> print(tr.extract_paths(max_cycles=1, output='text'))
a:b     0
aa:bb   0
aaa:bbc 0
aaaa:bbcc       0
aa:bc   0
aaa:bcc 0

>>> print(tr.extract_paths(max_number=4, output='text'))
a:b     0
aa:bc   0
aaa:bcc 0
aaaa:bccc       0

>>> print(tr.extract_paths(max_cycles=1, max_number=4, output='text'))
a:b     0
aa:bb   0
aa:bc   0
aaa:bcc 0

@throws TransducerIsCyclicException @see #hfst.HfstTransducer.n_best @note Special symbols are printed as such. @todo a link to flag diacritics

extract_shortest_paths(self):

Extract shortest paths of the transducer. @return A dictionary.

extract_longest_paths(self, **kwargs):

Extract longest paths of the transducer. @return A dictionary.

longest_path_size(self, **kwargs):

Get length of longest path of the transducer.

lookup(input, limit=-1):

Lookup or apply a single tokenized string \a tok_input and return a maximum of \a limit results.

TODO: This is a version of lookup that handles flag diacritics as ordinary symbols and does not validate the sequences prior to outputting. Currently, this function calls lookup_fd.

@todo Handle flag diacritics as ordinary symbols instead of calling lookup_fd. @sa lookup_fd @return HfstOneLevelPaths pointer @param tok_input A tuple of consecutive symbols (strings). @param limit Number of strings to look up. -1 tries to look up all and may get stuck if infinitely ambiguous. lookup(tok_input, limit=-1):

Lookup or apply a single string \a input and return a maximum of \a limit results.

This is an overloaded lookup function that leaves tokenizing to the transducer. @return HfstOneLevelPaths pointer

lookup(tok, input, limit=-1):

Lookup or apply a single string \a input and return a maximum of \a limit results. \a tok defined how \a s is tokenized.

This is an overloaded lookup function that leaves tokenizing to \a tok. @return HfstOneLevelPaths pointer

lookup_fd(tok_input, limit = -1):

Lookup or apply a single string \a tok_input minding flag diacritics properly and return a maximum of \a limit results.

Traverse all paths on logical first level of the transducer to produce all possible outputs on the second. This is in effect a fast composition of single path from left hand side.

This is a version of lookup that handles flag diacritics as epsilons and validates the sequences prior to outputting. Epsilons on the second level are represented by empty strings in the results.

@pre The transducer must be of type #hfst.ImplementationType.HFST_OL_TYPE or #hfst.ImplementationType.HFST_OLW_TYPE. This function is not implemented for other transducer types.

@param tok_input A tuple of consecutive symbols (strings) to look up. @param limit (Currently ignored.) Number of strings to look up. -1 tries to look up all and may get stuck if infinitely ambiguous.

@see #is_lookup_infinitely_ambiguous @return HfstOneLevelPaths pointer

@todo Do not ignore argument \a limit.

lookup_fd(input, limit = -1):

Lookup or apply a single string \a s minding flag diacritics properly and return a maximum of \a limit results.

This is an overloaded lookup function that leaves tokenizing to the transducer. @return HfstOneLevelPaths pointer

lookup_fd(tok, input, limit = -1):

Lookup or apply a single string \a input minding flag diacritics properly and return a maximum of \a limit results. \a tok defines how s is tokenized.

This is an overloaded lookup function that leaves tokenizing to \a tok. @return HfstOneLevelPaths pointer

is_lookup_infinitely_ambiguous(self, tok_input):

Whether lookup of path \a input will have infinite results.

Currently, this function will return whether the transducer is infinitely ambiguous on any lookup path found in the transducer, i.e. the argument \a input is ignored.

@todo Do not ignore the argument \a input

is_infinitely_ambiguous(self):

Whether the transducer is infinitely ambiguous.

A transducer is infinitely ambiguous if there exists an input that will yield infinitely many results, i.e. there are input epsilon loops that are traversed with that input.

has_flag_diacritics(self):

Whether the transducer has flag diacritics in its transitions.

Clone this wiki locally