Research Specialist

Sino-Tibetan Etymological Dictionary and Thesaurus Project
University of California at Berkeley
November 2010 to Present
Resumed work on this long-running NSF- and NEH-funded effort, which I started with others in 1987. Collaborate on the development and evangelizing of the STEDT database. Specific issues and tasks include process improvement in the development of the UI, designing and coding new features, my own research in the methodology of historical reconstruction, and so on. The STEDT database now contains over 455,000 words in more than 500 language varieties and is still growing. I share responsibility for the day-to-day management of the project with the PI, including staff supervision, evaluation and purchase of hardware and software, budgeting, and planning.
Co-founder and CTO

San Francisco, CA
June 2009 to Present
Co-founded a startup to provide novel analysis of news stories; designed and developed (with a small team) the front- and back-end systems to support the project. Delivered several versions of the product.
Senior Scientist

SkyGrid, Inc.
Sunnyvale, CA
February 2008 to March 2009
Designed and developed pipeline components for Named Entity Recognition and document grouping. Also worked on engineering process improvements, QA, reliability and availability, and other infrastructure issues. Developed evaluation metrics. Articulated and promoted metrics-based feature development in the company.
Manager of Knowledge Resources

Powerset, Inc.
San Francisco, CA
June 2006 to January 2008
Managed the engineering team that produced and maintained the lexical, semantic, and onomastic resources in the company. Integrated other knowledge resources into the Powerset suite, including WordNet, Freebase, and other smaller datasets.
Vice President of Language Engineering & Chief Linguist

Ask Jeeves, Inc.
5858 Horton
Emeryville, CA 94608
February 2000 to October 2001
Set strategic direction for technology development. Presented and evangelized the company's message at professional and academic forums. Guided and critiqued linguistic aspects of engineering initiatives. Participated in evaluation of emerging technology and the competitive landscape. Assisted corporate divisions such as the international and web properties divisions in launching new initiatives. Oversaw the work of the technology advisory board.
Director of Core Engineering

Ask Jeeves, Inc.
5858 Horton
Emeryville, CA 94608
May 1999 to February 2000
Led the Ask Jeeves software engineering team, initially composed of ten software engineers, in the development of the next generation of the Ask Jeeves question-answering system, a unique suite of both high-performance web-centric C++ components and database-centric GUI components written in C++ with MFC. Supported and backed up the CTO on product direction, interactions with senior management, technology review, competitive analysis, and other tasks. Hired and integrated additional engineering staff, eventually supervising a team of twenty-five developers and support staff. Jumpstarted the Quality Assurance team; developed and implemented (with others) the product lifecycle process; coordinated engineering efforts with technical publications, training, production, sales, marketing, and other groups. Designed and reviewed new features of AJ systems.
Senior Software Engineer

Ask Jeeves, Inc.
5858 Horton
Emeryville, CA 94608
February 1999 to May 1999
Designed and implemented software components to support dictionary and other language functionality in the Ask Jeeves question-answering system. Instructed staff in computational linguistic techniques. Analyzed user input (queries) and system performance. Critiqued designs and specifications.
Assistant Researcher

Institute of Cognitive Studies &
Center for South and Southeast Asian Studies
University of California at Berkeley
May 1995 to 1999
Continued my research program on both of the etymological database projects mentioned below. I also worked with Charles Fillmore to construct a database of lexical representations based on frame semantics (FrameNet) and with Sharon Inkelas to create the Turkish Electronic Living Lexicon (TELL). Both projects are supported by NSF under different programs. I assisted Johanna Nichols and Balthasar Bickel in improving the typological databases used in their comparative work and consulted with faculty on projects in Russian and Hindi.
Membre Associé

Centre National de la Recherche Scientifique / Laboratoire de langues et civilisation à tradition orale (LACITO/CNRS)
44 rue de l'Amiral Mouchez
75014 Paris, FRANCE
December 1991 to Present
I have been appointed an associate member of the lab in order to continue my work with researchers in Tibeto-Burman languages there. We have been working under an NSF/CNRS collaborative grant for the past two years on a project to produce automated tools for research in historical linguistics. Our efforts are documented (in part) in the list of publications below. I am also advising the lab on computing and telecommunications and developing funding for several other projects.
Research Assistant

Comparative Bantu Online Dictionary Project
University of California at Berkeley
February 1994 to May 1995
In collaboration with Larry Hyman, professor of linguistics and principal investigator of the CBOLD project, an international collaboration funded by NSF, I am creating a cross-linguistic lexical database for 100 to 200 of the 500+ Bantu languages of Central and Southern Africa. Building on experience gained at STEDT and elsewhere, we are attempting to refine the tools for corpus-based comparative phonological research. Most of the effort to date has been directed towards data acquisition (scanning and OCR processing of existing sources), database preparation (including data design and parsing/tagging of dictionary entries into SGML), and database design. The CBOLD database now (6/95) contains over 235,000 words in 118 languages. I am now developing cross-platform access and editing software for the database using the FoxPro database package for the Macintosh and Windows environments. I share responsibility for the day-to-day management of the project, including staff hiring and supervision, evaluation and purchase of hardware and software, budgeting, and planning.
Research Assistant

Sino-Tibetan Etymological Dictionary and Thesaurus Project
University of California at Berkeley
August 1987 to May 1995
In collaboration with James A. Matisoff, professor of linguistics and principal investigator of the STEDT project, funded by NSF and NEH grants, I have designed and developed software to aid in the publication of the dictionary and thesaurus. I have prepared HyperCard stacks for the collection and analysis of lexical data and designed database structures and algorithms to facilitate the storage and retrieval of a variety of types of linguistic and bibliographic data. The STEDT database now contains over 232,000 words in more than 200 languages and is still growing. I am continuing to develop and refine software to access and update the database using the Foxbase database package for the Macintosh. I share responsibility for the day-to-day management of the project, including staff supervision, evaluation and purchase of hardware and software, budgeting, and planning.

Library Systems Office
University of California at Berkeley
March 1994 to September 1994
Developed a LAN-based database supporting the operation of the Library's new 'trouble desk.' The Help Desk is a telephone service for Library staff to report and track problems with computer equipment used in the library. The Berkeley Library supports several local area networks connecting some 500 PCs and a comparable number of online-catalog terminals around the campus. The program I wrote in Microsoft Access permits Help Desk and technical support staff responsible for maintaining the hardware and software to enter and retrieve information about reported problems and outstanding work orders. The multiuser database is available everywhere on the Berkeley network, allowing technicians in the field to keep current with the constantly changing demands for service and repair. I reported to Bernie Hurley, director of the Library Systems Office.
Research Assistant
Instructional Technology Program
University of California at Berkeley
March 1994 to September 1994
Consulted with ITP staff on the development of interactive software for teaching foreign languages, in particular Hindi. The effort involved digitizing sound and video and integrating these components into HyperCard to make them useful to language learners. A prototype of an Interactive Ramayana was produced, and work continues (though I am no longer on staff) on an Interactive Intermediate Hindi Reader. I am working on the project with Steve Thorne of ITP and Bruce Pray and Usha Jain of the South and Southeast Asian languages department.
Research Assistant

Phonology Lab
University of California at Berkeley
January 1992 to July 1992
Working with lab director Steven Greenberg and others, I upgraded and improved the UCB phonology lab research environment and carried out experiments in speech perception. With others working under subcontract to SRI (PI: Jared Bernstein), I developed, implemented, and maintained UNIX-based software to gather data on the perception of English utterances by Japanese language students. I also designed and performed statistical analysis of these data using SAS and other programs and drafted progress reports and other documents in support of the project. Generally in the lab, I worked on the installation, networking, and operation of Sun workstations, Macintosh computers, and 'IBM-compatibles,' trained and supervised other staff and students, and oversaw the acquisition and installation of other equipment for speech research, including the Kay Elemetrics CSL system and Entropics Waves. I assisted and instructed students and researchers in the lab in the use of the other software and equipment, including SoundEdit, Uppsala SoundWave, our TCP/IP-based telecommunications programs, and other Unix-, Mac-, and Windows-based programs.
Visiting Researcher

Vakgroep Vergelijkende Taalwetenschappen
Rijksuniversiteit Leiden
Postbus 9515
2300 RA Leiden
Kingdom of the Netherlands
July 1991 to October 1991 & December 1992 to February 1993
At the invitation of the Tangut Dictionary Project, I created a font and database system for editing and printing the characters of Tangut, an extinct Tibeto-Burman language written in an ideographic script of about 6,000 characters which superficially resembles Chinese. The TDP is funded by the Dutch government to produce an English-Tangut-Russian dictionary. I created the system that is being used to enter the Tangut characters themselves on Apple Macintoshes.
Chercheur Associé

Centre National de la Recherche Scientifique / Laboratoire de langues et civilisation à tradition orale (LACITO/CNRS)
44 rue de l'Amiral Mouchez
75014 Paris, FRANCE
March 1990 to July 1990
At the invitation of colleagues studying Himalayish languages of Nepal, I worked for three months in a poste rouge on a project to create software tools to assist in the historical analysis of groups of related languages. The project is being carried out as part of a collaborative effort between French and American linguists sponsored jointly by the National Science Foundation and the CNRS. The software suite is called a "reconstruction engine" and provides a means to analyze lexical corpora (machine-readable dictionaries) of different languages and test hypotheses concerning the nature of the phonological relationships between them. I also assisted in instructing students in the use of computers in linguistics and provided programming assistance and consultation to French linguists working in other areas.
Principal Programmer

University of California
Division of Library Automation, Office of the President
University of California (Systemwide)
June 1984 to August 1987
I had primary responsibility for the implementation of optical disk technology for large databases. Optical storage is an emerging technology which permits storage and retrieval of large amounts of data (usually many gigabytes, and millions of records). DLA was engaged in R&D projects utilizing 12-inch write-once devices, 5.25-inch OROM, and CDROM. The project required the development of embedded software utilizing Ethernet-based local area networking technology and the TCP/IP telecommunications protocol. Specific duties in this area included the identification and evaluation of hardware and software, the design and implementation of prototype systems (in both the IBM mainframe and IBM PC and PC/AT environments), and eventual integration of these devices into existing information retrieval systems. I also advised management in the areas of applications, database systems, and training for both anticipated and continuing DLA operations. DLA supported a multi-processor IBM 4381 environment running OS/MVT 21.8F under ASP version 3.2. During the conversion of the Division's operating environment to MVS/XA, I shared responsibility for conversion of the database and associated software, particularly ADABAS. Such responsibility required competence in the fields of hardware and software support, disaster-recovery planning, performance prediction and analysis, evaluation and implementation of new technologies, and systems analysis. I also advised on development strategies and options for the use of microcomputers in the library environment. I wrote applications and systems software for microcomputers used by DLA and associated institutions and trained staff in the use of microcomputer hardware and software. I also acted as an ad hoc liaison between applications and operations staff, monitoring production projects and identifying processing bottlenecks and potential solutions. 
In particular, I monitored use of DLA's database-management system, ADABAS, to ensure high performance and database integrity. I provided systems support for installing and testing new production systems.
Programmer/Senior Programmer

Division of Library Automation, Office of the President
University of California (Systemwide)
May 1979 to June 1984
As an applications and systems programmer for the Division of Library Automation, my primary duties involved maintaining and enhancing large database applications written in PL/I, using the ADABAS database management system. Later duties were expanded to include designing and implementing new applications. Other responsibilities included installation and maintenance of ADABAS, our database management system, participation in the design of an interactive online catalog, the MELVYL catalog, and work in IBM Assembly language on systems-oriented tasks (i.e., OS maintenance, installing and customizing software packages, and writing assembly language interfaces to high level languages). During this period I was the database administrator (DBA) for our site. I developed the REMARC retrieval system, used by hundreds of libraries around the world. This system, a joint effort of the University and a private firm, permits libraries to retrieve machine-readable records from the Library of Congress's databases. The system is a result of two years of work on my part, and required the development of mainframe database software, telecommunications software, and Apple II microcomputer software.
Assistant Programmer

Division of Library Studies and Research, Office of the Assistant Vice President, Library Plans and Policies
University of California (Systemwide)
July 1977 to April 1979
In collaboration with analysts, I was responsible for the design and execution of statistical studies of library-related problems, design of data collection and analysis methodologies, budgeting computing activities, supervising coders and key-entry operators, and providing and maintaining documentation on the software and systems used. The job involved substantial programming in PL/I, SPSS, and APL. Most work was carried out in a teleprocessing environment, using such systems as WYLBUR, CMS, ATS, and TSO. Mainframes included IBM 360 models 65 and 91; IBM 370 models 145 and 158; PDP 11 models 34 and 70. Operating systems included VM and OS/MVT under HASP (on the IBM machines) and RSTS and UNIX (on the DEC machines).
Contract Programmer

Office of the Executive Director for Library Planning
University of California (Systemwide)
April 1977 to July 1977
Applications programming for the Universitywide Library Automation program. Designed and programmed an interactive simulation model of library space utilization for the systemwide administration. Programming was done in APL and run on an IBM 360/91 under OS/MVT and an IBM 370/148 under VM. Two versions of the model were produced, one in STSC's APL*PLUS and one in VSAPL. I also performed other analysis tasks in support of the development of the University libraries' master plan, The University of California Libraries: Plan for Development.
Scientific Programmer

Department of Psychology
Yale University
New Haven, Connecticut 06520
1975 to 1976
Scientific programming in support of dissertation research. The job, part of a work-study program, involved creating, maintaining, and analyzing a statistical database for ongoing experiments with the circadian rhythms of rats. Programs were written in APL and run on Yale's IBM 370/148 under MVT and later MVS.


