Linguistics 158: Computer-aided methods in linguistics

John B. Lowe / Department of Linguistics
University of California, Berkeley - Spring 1997

Computational Lexical Semantics

Week / class: 11 / 19: Lec Tu, 1 Apr 1997

WordNet

Lexical relations

WordNet provides a tool for examining sophisticated relationships between word forms. Like a thesaurus, it can represent relations such as synonymy. Additionally, hyponymic and hypernymic relationships can be given. These represent the IS-A relation and its inverse.

All of the WordNet noun synsets are organized into hierarchies. Each synset is part of at least one hierarchy, headed by a synset called a unique beginner. All of these synsets originate in the lexicographer file noun.Tops. From any noun synset (excluding the unique beginners), the hypernym pointers can be traced up to one of the following unique beginners:

{ entity, (something having concrete existence; living or nonliving) }
{ psychological_feature, (a feature of the mental life of a living organism) }
{ abstraction, (a concept formed by extracting common features from examples) }
{ location, space,#p (a point or extent in space) }
{ shape, form, (the spatial arrangement of something as distinct from its substance) }
{ state, (the way something is with respect to its main attributes; "the current state of knowledge"; "his state of health"; "in a weak financial state") }
{ event, (something that happens at a given place and time) }
{ act, human_action, human_activity, (something that people do or cause to happen) }
{ group, grouping, (any number of entities (members) considered as a unit) }
{ possession, (anything owned or possessed) }
{ phenomenon, (any state or process known through the senses rather than by intuition or reasoning) }
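Tracing hypernym pointers up to a unique beginner can be sketched in a few lines. This is only an illustration: the HYPERNYM table is a made-up toy chain rather than data copied from WordNet, and it assumes a single parent per synset, which a real database need not guarantee.

```python
# Toy hypernym table: each synset name maps to its parent (IS-A) synset.
# The chains below are illustrative assumptions, not copied from WordNet.
HYPERNYM = {
    "dog": "canine",
    "canine": "animal",
    "animal": "organism",
    "organism": "entity",
    "tremor": "event",
}

# Head words of the unique beginners listed above.
UNIQUE_BEGINNERS = {
    "entity", "psychological_feature", "abstraction", "location", "shape",
    "state", "event", "act", "group", "possession", "phenomenon",
}


def hypernym_chain(synset):
    """Follow hypernym pointers from `synset` up to a unique beginner."""
    chain = [synset]
    while chain[-1] not in UNIQUE_BEGINNERS:
        # A KeyError here means the toy table has no parent for this synset.
        chain.append(HYPERNYM[chain[-1]])
    return chain


print(" -> ".join(hypernym_chain("dog")))
```

A unique beginner is its own one-element chain, which matches the "excluding the unique beginners" caveat in the text.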

Synsets: example from VIBRATION

...
Sense 2
shaking, tremor, trembling, quiver, quivering, vibration, palpitation

Database organization

The lexical database format consists of two text file types: an index file and a data file. The index file contains sorted word records. Each record contains the word form, which is the indexing key, and pointers to all the synsets that contain that word. These pointers are represented by lists of offsets into the data files. The data file is made up of a collection of synsets. Each synset comprises a list of synonymous word forms and a list of pointers representing semantic relationships (antonymic, hyponymic, etc.) between this synset and other synsets. These pointers are again represented as offsets into the data files. Additionally, records for verbs contain a list of applicable verb frames (e.g. somebody hits somebody). This relationship between the index and data records is shown in the figure.
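The index/data arrangement can be illustrated with a toy two-file database. The record layout below (words, a " | " separator, then relation:offset pairs with 8-digit offsets) is an assumption made for this sketch and is not the actual WordNet file format; the synsets and the function names build, render and read_synset are likewise hypothetical.

```python
import io

# Hypothetical synsets: (word forms, [(relation, position of target in this list)]).
SYNSETS = [
    (["vibration", "tremor", "quiver"], [("hypernym", 1)]),
    (["motion", "movement"], []),
]


def render(words, pointers, offsets):
    """Render one data record; pointers become fixed-width byte offsets."""
    ptrs = " ".join(f"{rel}:{offsets[target]:08d}" for rel, target in pointers)
    return f"{' '.join(words)} | {ptrs}\n".encode("ascii")


def build(synsets):
    """Return (data_bytes, index) where index maps word -> list of offsets."""
    # Pass 1: offsets are fixed width, so dummy values give the true lengths.
    dummy = [0] * len(synsets)
    offsets, pos = [], 0
    for words, pointers in synsets:
        offsets.append(pos)
        pos += len(render(words, pointers, dummy))
    # Pass 2: render again with the real offsets.
    data = b"".join(render(w, p, offsets) for w, p in synsets)
    index = {}
    for (words, _), offset in zip(synsets, offsets):
        for word in words:
            index.setdefault(word, []).append(offset)
    return data, dict(sorted(index.items()))  # index records sorted by word


def read_synset(data, offset):
    """Seek to `offset` in the data file and decode the record found there."""
    stream = io.BytesIO(data)
    stream.seek(offset)
    line = stream.readline().decode("ascii").rstrip("\n")
    words, _, ptrs = line.partition(" | ")
    pointers = [(rel, int(off)) for rel, off in (p.split(":") for p in ptrs.split())]
    return words.split(), pointers


data, index = build(SYNSETS)
words, pointers = read_synset(data, index["tremor"][0])
print(words, pointers)
```

Looking up a word costs one index access plus one seek per synset, and following a semantic pointer is another seek, which is the point of storing offsets rather than names.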

COMLEX Syntax

Adam Meyers, Catherine Macleod and Ralph Grishman
http://cs.nyu.edu/cs/faculty/grishman/comlex.html

Each lexical entry is organized as a typed feature structure, using a Lisp-style notation which can be mapped into other forms, e.g. Prolog, SGML-marked text, etc.

(verb           :orth "build" 
                :subc ((np) (np-for-np) (part-np :adval ("up")))) 
(noun           :orth "assertion" 
                :subc ((noun-that-s) (noun-be-that-s))) 
(adverb         :orth "even") 
(adjective      :orth "above-mentioned" 
                :features ((apreq) (attributive))) 
(verb           :orth "abbreviate" 
                :subc ((np-pp :pval ("to")) (np) (np-np-pred) (np-as-np))
                :features ((vveryving :pastpart t))) 
(noun           :orth "Prof." 
                :features ((ntitle))) 

FrameNet

A lexicon of English semantics based on Frame Semantic principles


Fillmore et al., UC Berkeley
http://www.linguistics.berkeley.edu/lingdept/research/FrameNet/

Part of Frame-semantic "Tagset" for the Health Frame

label       meaning
healer      individual who tries to bring about an improvement in the patient
patient     individual whose physical well-being is low
disease     sickness or health condition that needs to be removed or relieved
wound       tissue damage in the body of the patient
bodypart    limb, organ, etc. affected by the disease or wound
symptom     evidence indicating the presence of the disease
treatment   process aimed at bringing about recovery
medicine    substance applied or ingested in order to bring about recovery

KWIC for some corpus sentences for the lemma CURE

the medical system can cure any illness
eating yogurt will ... help ... cure certain diseases
why can't we cure schizophrenia?
Zolman Waksman ... won the Nobel Prize for curing TB
drugs that previously cured the disease
a course of an antimicrobial drug ... would cure the ulcer
ridding these patients of the bacterium will cure their gastritis
an antibody combined with a cancer drug cured mice of transplanted human cancers
A gene therapy technique cured mice of muscular dystrophy
she has cured children of asthma, allergies and other common ailments
Swedish doctors cured a deaf man by removing a 47-year-old bus ticket from his ear
(cold and flu sufferers) they can cure themselves without professional advice
Mr. Hyman's herbs cured Shiloh, her dog
people who cure with crystals
peptic ulcers ... can be permanently cured by antibiotics
Mice were cured of muscular dystrophy by scientists using gene therapy

Frame Semantic Enhancements of WordNet Sentence Frames

Existing WordNet Frame | Valence Formulas (frame element / grammatical function / semantic type) | Example | Corpus freq.
Sb ----s sth | healer / SUBJ / person, disease / DIR-OBJ / disease | she cured my disease | .150
Sth ----s sb | medicine / SUBJ / medicine, patient / DIR-OBJ / being | Mr Hyman's herbs cured Shiloh, her dog. | .013
Sb ----s sb of sth | healer / SUBJ / person, patient / DIR-OBJ / being, disease / OF-OBLQ / disease | she has cured children of infections, asthma, allergies etc. | .013
Sb ----s with sth | healer / SUBJ / person, treatment / WITH-OBLIQUE / therapy | people who cure with crystals | .007
Sth ----s sth | treatment / SUBJ / therapy, disease / DIR-OBJ / disease | ridding these patients of the bacterium will cure their gastritis | .320
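The valence formulas lend themselves to a simple data representation. The sketch below assumes each formula reads as a sequence of (frame element, grammatical function, semantic type) triples, which is an interpretation of the table rather than something the handout states outright; the name freq_by_element is hypothetical. It totals corpus frequency by the frame element that fills a given grammatical function.

```python
from collections import defaultdict

# (WordNet frame, [(frame element, grammatical function, semantic type)], corpus freq.)
# Values are taken from the table above; the triple reading is an assumption.
VALENCE = [
    ("Sb ----s sth",
     [("healer", "SUBJ", "person"), ("disease", "DIR-OBJ", "disease")], 0.150),
    ("Sth ----s sb",
     [("medicine", "SUBJ", "medicine"), ("patient", "DIR-OBJ", "being")], 0.013),
    ("Sb ----s sb of sth",
     [("healer", "SUBJ", "person"), ("patient", "DIR-OBJ", "being"),
      ("disease", "OF-OBLQ", "disease")], 0.013),
    ("Sb ----s with sth",
     [("healer", "SUBJ", "person"), ("treatment", "WITH-OBLIQUE", "therapy")], 0.007),
    ("Sth ----s sth",
     [("treatment", "SUBJ", "therapy"), ("disease", "DIR-OBJ", "disease")], 0.320),
]


def freq_by_element(function):
    """Sum corpus frequency per frame element realized as `function`."""
    totals = defaultdict(float)
    for _frame, slots, freq in VALENCE:
        for element, func, _semantic_type in slots:
            if func == function:
                totals[element] += freq
    return dict(totals)


for element, freq in sorted(freq_by_element("SUBJ").items()):
    print(f"{element:10s} {freq:.3f}")
```

Grouping by frame element rather than by surface frame separates patterns that the WordNet frames conflate: "Sb ----s sth" and "Sth ----s sth" share a shape but have different frame elements in subject position.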

A subframe can inherit elements and semantics from its parent.
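Inheritance of frame elements can be sketched with a parent link between frames. Everything specific here is hypothetical: the Frame class, the "Cure" subframe and its "result" element are invented for illustration, and only the inheritance of elements is modeled, not of semantics.

```python
class Frame:
    """A frame with its own elements and an optional parent frame."""

    def __init__(self, name, elements, parent=None):
        self.name = name
        self.own_elements = dict(elements)
        self.parent = parent

    def elements(self):
        """Own elements merged over everything inherited from ancestors."""
        inherited = self.parent.elements() if self.parent else {}
        return {**inherited, **self.own_elements}


# Two elements from the Health frame tagset above.
health = Frame("Health", {
    "healer": "individual who tries to bring about an improvement in the patient",
    "patient": "individual whose physical well-being is low",
})

# Hypothetical subframe with one hypothetical extra element.
cure = Frame("Cure", {"result": "hypothetical element for this sketch"}, parent=health)

print(sorted(cure.elements()))
```

Because own elements are merged last, a subframe could also redefine an inherited element by reusing its label.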


Homework 6: Database (due: Today!)