Linguistics 158: Computer-aided methods in linguistics

John B. Lowe / Department of Linguistics
University of California, Berkeley - Spring 1997

Finite State Transducers for morphology and phonology

week /class: 4 / 8 : Lab Th , 13 Feb 1997

DEMO : "FST" a Hypercard stack implementing a Türing Machine


Q: What is a transducer, anyway?


DEMO : Regular Expressions in PERL


Two-level Phonology

(from PC-KIMMO web page...)

"...phonological alternations are treated as direct correspondences between the underlying (or lexical) representation of words and their realization on the surface level." Example from English orthography spies = spy + -s (plural allomorph)

    Lexical Representation:   ` s p y + 0 s
    Surface Representation:   0 s p i 0 e s
(where ` indicates stress, + indicates a morpheme boundary, and 0 indicates a null element):

(i.e. there are two tapes...one for each representation. As the transducer moves along one tape, it creates the other...

Rules must be written to account for the special correspondences `:0, y:i, +:0, and 0:e. For example, the two-level rule for the y:i correspondence looks like this (somewhat simplified):

          y:i => @:C___+:0
A PC-KIMMO description of a language consists of two files provided by the
  1. a rules file, which specifies the alphabet and the phonological (or spelling) rules, and

  2. a lexicon file, which lists lexical items (words and morphemes) and their glosses, and encodes morphotactic constraints.

Main components of PC-KIMMO

           +-----------+        +-----------+
           |  RULES    |        |  LEXICON  |
           +----+------+        +------+----+
                |-------+      +-------|
                        |      |
                        v      v
    Surface Form:  +------------------+      Lexical Form:
     spies ------->|    Recognizer    |----> `spy+s
                   +----+-------------+      [N(spy)+PLURAL]
                        |
                        v
                   +------------------+
     spies <-------|    Generator     |<----- `spy+s
                   +------------------+
The rules and lexicon are implemented computationally using finite state machines. For example, the two-level rule shown above for the y:i correspondence must be translated into the following finite state table for PC-KIMMO to use:
          |@ y + @
          |C i 0 @
        --+-------
        1:|2 0 1 1
        2:|2 3 2 1
        3.|0 0 1 0

DEMO: Morphological Parsing of English Words using PC-KIMMO


DEMO: Part of Speech tagging using KTagger


Homework 2 : WWW Content (due: Today!)
Homework answers
[Ling 158 Home Page | Linguistics 158 schedule]