Abstract
We discuss large-scale grammatical inference in the context of the Tenjinno competition, focusing on the inference of deterministic finite-state transducers. We describe the design of the algorithms and the design and implementation of the program that solved the first problem. Although the OSTIA algorithm has good asymptotic guarantees for this class of problems, the amount of data it requires is prohibitive. We therefore developed a new strategy for inferring large-scale transducers that is better suited to large random instances of the type in question. It combines traditional state-merging algorithms for the inference of finite-state automata with EM-based alignment algorithms and state-splitting algorithms.
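
For readers unfamiliar with the target class, the following is a minimal sketch of what a deterministic finite-state transducer computes. It is not the program described in the abstract and does not implement OSTIA, state merging, EM alignment, or state splitting. The names `transduce`, `TOY_TRANSITIONS`, and `TOY_FINAL`, and the toy machine itself, are illustrative assumptions.

```python
from typing import Dict, Optional, Tuple

# Hypothetical encoding: (state, input symbol) -> (next state, output string).
# Determinism means each (state, symbol) pair has at most one entry.
Transitions = Dict[Tuple[int, str], Tuple[int, str]]


def transduce(
    transitions: Transitions,
    final_outputs: Dict[int, str],
    word: str,
    start: int = 0,
) -> Optional[str]:
    """Return the output for `word`, or None if the machine rejects it."""
    state = start
    out = []
    for symbol in word:
        step = transitions.get((state, symbol))
        if step is None:
            return None  # no transition defined: input rejected
        state, emitted = step
        out.append(emitted)
    if state not in final_outputs:
        return None  # ended in a non-accepting state
    out.append(final_outputs[state])  # output emitted on acceptance
    return "".join(out)


# Toy two-state machine (an assumption for illustration only).
TOY_TRANSITIONS: Transitions = {
    (0, "a"): (1, "x"),
    (1, "b"): (0, "yz"),
    (1, "a"): (1, ""),
}
TOY_FINAL: Dict[int, str] = {0: "", 1: "#"}

print(transduce(TOY_TRANSITIONS, TOY_FINAL, "ab"))
print(transduce(TOY_TRANSITIONS, TOY_FINAL, "b"))
```

The inference task is the reverse direction: given only input/output string pairs such as `("ab", "xyz")`, recover a machine of this form. The variable-length outputs on each transition are why aligning input symbols with output segments is a natural subproblem.
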
Original language | Undefined/Unknown |
---|---|
Title of host publication | Proceedings of the 8th International Colloquium on Grammatical Inference (ICGI) |
Pages | 227-239 |
Number of pages | 13 |
Volume | 4201 LNAI |
Publication status | Published - 2006 |