Abstract
The Syntactic Concept Lattice is a residuated lattice based on the distributional structure of a language; the natural representation based on this is a context sensitive formalism. Here we examine the possibility of basing a context free grammar (sc cfg) on the structure of this lattice; in particular by choosing non-terminals to correspond to concepts in this lattice. We present a learning algorithm for context free grammars which uses positive data and membership queries, and prove its correctness under the identification in the limit paradigm. Since the lattice itself may be infinite, we consider only a polynomially bounded subset of the set of concepts, in order to get an efficient algorithm. We compare this on the one hand to learning algorithms for context free grammars, where the non-terminals correspond to congruence classes, and on the other hand to the use of context sensitive techniques such as Binary Feature Grammars and Distributional Lattice Grammars. The class of sc cfgs that can be learned in this way includes inherently ambiguous and thus non-deterministic languages; this approach therefore breaks through an important barrier in sc cfg inference. ERRATA: there are two significant errors in this paper. I am grateful to Hans Leiss for pointing these out to me, and proposing modifications to the algorithm that correct the errors.
Original language | Undefined/Unknown |
---|---|
Title of host publication | Grammatical Inference: Theoretical Results and Applications. Proceedings of the International Colloquium on Grammatical Inference |
Editors | José Sempere, Pedro Garcia |
Publisher | Springer |
Pages | 38-51 |
Number of pages | 14 |
Volume | 6339 LNAI |
Publication status | Published - 2010 |