Some Alternatives to Parikh Matrices Using String Kernels

Alexander Clark, Chris Watkins

Research output: Contribution to journal › Article › peer-review

4 Citations (Scopus)

Abstract

We describe methods of representing strings as real-valued vectors or matrices; we show how to integrate two separate lines of enquiry: string kernels, developed in machine learning, and Parikh matrices [8], which have been studied intensively over the last few years as a powerful tool in the study of combinatorics on words. In the field of machine learning, there is widespread use of string kernels, which use analogous mappings into high-dimensional feature spaces based on the occurrences of subwords or factors. In this paper we show how one can use string kernels to construct two alternatives to Parikh matrices that overcome some of the limitations of the Parikh matrix construction. These are morphisms from the free monoid to rings of real-valued matrices under multiplication: one is based on the subsequence kernel and the other on the gap-weighted string kernel. For the latter kernel we demonstrate that for many values of the gap-weight hyperparameter the resulting morphism is injective.
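
As background for the abstract above, the following is a minimal sketch of the standard Parikh matrix mapping, not of the paper's kernel-based constructions, which the abstract does not spell out. It assumes an ordered alphabet and the usual definition in which each letter maps to the identity matrix plus a single 1 on the superdiagonal. The function names `parikh_matrix` and `count_subsequence` are illustrative only. The sketch also shows one commonly cited limitation, namely that the standard mapping is not injective.

```python
def parikh_matrix(word, alphabet):
    """Standard Parikh matrix of `word` over the ordered `alphabet`.

    The result is a (k+1) x (k+1) upper triangular integer matrix,
    where k = len(alphabet). Entry (i, j+1) counts the occurrences of
    alphabet[i..j] as a scattered subword (subsequence) of `word`.
    """
    k = len(alphabet)
    index = {c: i for i, c in enumerate(alphabet)}
    # Start from the identity matrix (the image of the empty word).
    m = [[int(i == j) for j in range(k + 1)] for i in range(k + 1)]
    for ch in word:
        q = index[ch]
        # Right-multiplying by (I + E[q][q+1]) adds column q to column q+1.
        for row in m:
            row[q + 1] += row[q]
    return m


def count_subsequence(word, pattern):
    """Number of occurrences of `pattern` as a scattered subword of `word`."""
    # dp[j] = number of ways to match pattern[:j] in the prefix read so far.
    dp = [1] + [0] * len(pattern)
    for ch in word:
        for j in range(len(pattern), 0, -1):
            if pattern[j - 1] == ch:
                dp[j] += dp[j - 1]
    return dp[-1]


# The top-right entry over the alphabet "ab" counts occurrences of "ab".
m = parikh_matrix("abba", "ab")
assert m[0][2] == count_subsequence("abba", "ab")

# Two distinct words with the same Parikh matrix: the mapping is not injective.
assert parikh_matrix("abba", "ab") == parikh_matrix("baab", "ab")
```

Because the mapping is a morphism, the matrix of a concatenation is the product of the matrices of its parts; the column update in the loop is that product written out for a single letter.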
Original language: English
Pages (from-to): 291-303
Number of pages: 13
Journal: Fundamenta Informaticae
Volume: 84
Issue number: 3-4
Publication status: Published - 2008
