Distributional learning of parallel multiple context-free grammars

Alexander Clark, Ryo Yoshinaka

Research output: Contribution to journal › Article › peer-review

19 Citations (Scopus)
46 Downloads (Pure)

Abstract

Natural languages require grammars beyond context-free for their description. Here we extend a family of distributional learning algorithms for context-free grammars to the class of Parallel Multiple Context-Free Grammars (PMCFGs). These grammars have two operations beyond the simple context-free operation of concatenation: the ability to interleave strings of symbols, and the ability to copy or duplicate strings. This allows the grammars to generate some non-semilinear languages, which lie outside the class of mildly context-sensitive languages. These grammars, if augmented with a suitable feature mechanism, are capable of representing all of the syntactic phenomena that have been claimed to exist in natural language.
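To see why the copy operation breaks semilinearity, consider a minimal sketch (an illustrative example, not a construction from the paper): a single PMCFG-style rule N(xx) ← N(x) that duplicates its argument. Starting from N("a"), every derivation step doubles the yield, so the generated length set {1, 2, 4, 8, …} is not semilinear.

```python
# Hypothetical sketch of a PMCFG "copy" rule N(xx) <- N(x).
# Each application doubles the yield, so the generated language is
# { a^(2^n) : n >= 0 }, whose length set {1, 2, 4, 8, ...} is not semilinear.

def generate(depth):
    """Return the yields of derivations up to the given depth."""
    s = "a"                # base case: N("a")
    results = [s]
    for _ in range(depth):
        s = s + s          # copy rule: N(xx) <- N(x)
        results.append(s)
    return results

print([len(w) for w in generate(4)])  # lengths double each step
```

No context-free grammar can produce such exponential length gaps, which is what places these languages beyond the mildly context-sensitive class.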

We present a learning algorithm for a large subclass of these grammars, one that includes all regular languages but not all context-free languages. The algorithm relies on a generalisation of the notion of distribution, from a function on strings to a function from tuples of strings to entire sentences; we define nonterminals using finite sets of these functions. Our learning algorithm uses a nonprobabilistic learning paradigm that allows membership queries as well as positive samples, and it runs in polynomial time.
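The generalised notion of distribution can be pictured as follows. This is a minimal sketch under assumed examples (the copy language {ww : w ∈ {a,b}*}, a hand-picked finite context set, and a direct membership test standing in for the membership oracle), not the paper's construction: the distribution of a string is the set of contexts that wrap it into a sentence of the language.

```python
# Hypothetical sketch: distributions as sets of contexts mapping strings
# into full sentences. L = { ww : w in {a,b}* } is an assumed example
# language; the contexts below are an assumed finite context set.

def in_copy_language(s):
    """Membership oracle for L = { ww }."""
    n = len(s)
    return n % 2 == 0 and s[:n // 2] == s[n // 2:]

# Each context is a function plugging a string u into a sentence template.
contexts = {
    "u.u":   lambda u: u + u,                    # copy u into both halves
    "au.au": lambda u: "a" + u + "a" + u,        # prefix both copies with a
    "u.ub":  lambda u: u + u + "b",              # trailing b breaks the copy
}

def distribution(u):
    """Names of the contexts that map u into the language."""
    return {name for name, f in contexts.items() if in_copy_language(f(u))}

print(distribution("ab"))
```

In the algorithm itself, each nonterminal is characterised by a finite set of such context functions, and membership queries to the oracle decide which strings (or tuples of strings) share a distribution with respect to that set.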
Original language: English
Pages (from-to): 5-31
Number of pages: 27
Journal: Machine Learning
Volume: 96
Issue number: 1-2
Early online date: 3 Oct 2013
DOIs
Publication status: Published - Jul 2014

Keywords

  • Mildly context-sensitive
  • Grammatical inference
  • Semilinearity
