| Research Specialist
Sino-Tibetan Etymological Dictionary and Thesaurus Project University of California at Berkeley |
November 2010 to Present |
Resumed work on this long-running NSF- and NEH-funded effort, which I started with others in 1987. Collaborate on the development and evangelizing of the STEDT database. Specific issues and tasks include process improvement in the development of the UI, designing and coding new features, and my own research in the methodology of historical reconstruction. The STEDT database now contains over 455,000 words in more than 500 language varieties and is still growing. I share responsibility for the day-to-day management of the project with the PI, including staff supervision, evaluation and purchase of hardware and software, budgeting, and planning. |
|
| Co-founder and CTO
Qewz, Inc. San Francisco, CA |
June 2009 to Present |
Co-founded a startup to provide novel analysis of news stories; designed and developed (with a small team) the front- and back-end systems to support the project. Delivered several versions of the product. |
|
| Senior Scientist
SkyGrid, Inc. Sunnyvale, CA |
February 2008 to March 2009 |
Designed and developed pipeline components for named entity recognition and document grouping. Also worked on engineering process improvements, QA, reliability and availability, and other infrastructure issues. Developed evaluation metrics. Articulated and promoted metrics-based feature development in the company. |
|
| Manager of Knowledge Resources
Powerset, Inc. San Francisco, CA |
June 2006 to January 2008 |
Managed the engineering team that produced and maintained the lexical, semantic, and onomastic resources in the company. Integrated other knowledge resources into the Powerset suite, including WordNet, Freebase, and other smaller datasets. |
|
| Vice President of Language
Engineering & Chief Linguist
Ask Jeeves, Inc. 5858 Horton Emeryville, CA 94608 |
February 2000 to October 2001 |
Set strategic direction for technology development. Presented and evangelized the company's message at professional and academic forums. Guided and critiqued linguistic aspects of engineering initiatives. Participated in evaluation of emerging technology and the competitive landscape. Assisted corporate divisions such as the international and web properties divisions in launching new initiatives. Oversaw the work of the technology advisory board. |
|
| Director of Core Engineering
Ask Jeeves, Inc. 5858 Horton Emeryville, CA 94608 |
May 1999 to February 2000 |
Led the Ask Jeeves software engineering team, initially composed of ten software engineers, in the development of the next generation of the Ask Jeeves question-answering system, a unique suite of both high-performance web-centric C++ components and database-centric GUI components written in C++ with MFC. Supported and backed up the CTO on product direction, interactions with senior management, technology review, competitive analysis, and other tasks. Hired and integrated additional engineering staff, eventually supervising a team of twenty-five developers and support staff. Jumpstarted the Quality Assurance team; developed and implemented (with others) the product lifecycle process; coordinated engineering efforts with technical publications, training, production, sales, marketing, and other groups. Designed and reviewed new features of AJ systems. |
|
| Senior Software Engineer
Ask Jeeves, Inc. 5858 Horton Emeryville, CA 94608 |
February 1999 to May 1999 |
Designed and implemented software components to support dictionary and other language functionality in the Ask Jeeves question-answering system. Instructed staff in computational linguistic techniques. Analyzed user input (queries) and system performance. Critiqued designs and specifications. |
|
| Assistant Researcher
Institute of Cognitive Studies & Center for South and Southeast Asian Studies University of California at Berkeley |
May 1995 to 1999 |
Continued my research program on both of the etymological database projects mentioned below. I also worked with Charles Fillmore to construct a database of lexical representations based on frame semantics (FrameNet) and with Sharon Inkelas to create the Turkish Electronic Living Lexicon (TELL). Both projects are supported by NSF under different programs. I assisted Johanna Nichols and Balthasar Bickel in improving the typological databases used in their comparative work, and consulted with faculty on projects in Russian and Hindi. |
|
| Membre Associé
Centre National de Recherche Scientifique / Laboratoire de langues et civilisation à tradition orale (LACITO/CNRS) 44 rue de l'Amiral Mouchez 75014 Paris, FRANCE |
December 1991 to Present |
I have been appointed an associate member of the lab in order to continue my work with researchers in Tibeto-Burman languages there. We have been working under an NSF/CNRS collaborative grant for the past two years on a project to produce automated tools for research in historical linguistics. Our efforts are documented (in part) in the list of publications below. I am also advising the lab on computing and telecommunications and developing funding for several other projects. |
|
| Research Assistant
Comparative Bantu Online Dictionary Project University of California at Berkeley |
February 1994 to May 1995 |
In collaboration with Larry Hyman, professor of linguistics and principal investigator of the CBOLD project, an international collaboration funded by NSF, I am creating a cross-linguistic lexical database for 100 to 200 of the 500+ Bantu languages of Central and Southern Africa. Building on experience gained at STEDT and elsewhere we are attempting to refine the tools for corpus-based comparative phonological research. Most of the effort to date has been directed towards data acquisition (scanning and OCR processing of existing sources), database preparation (including data design and parsing/tagging of dictionary entries into SGML), and database design. The CBOLD database now (6/95) contains over 235,000 words in 118 languages. I am now developing cross-platform access and editing software for the database using the FoxPro database package for the Macintosh and Windows environment. I share responsibility for the day-to-day management of the project, including staff hiring and supervision, evaluation and purchase of hardware and software, budgeting, and planning. |
|
| Research Assistant
Sino-Tibetan Etymological Dictionary and Thesaurus Project University of California at Berkeley |
August 1987 to May 1995 |
In collaboration with James A. Matisoff, professor of linguistics and principal investigator of the STEDT project, funded by NSF and NEH grants, I have designed and developed software to aid in the publication of the dictionary thesaurus. I have prepared HyperCard stacks for the collection and analysis of lexical data and designed database structures and algorithms to facilitate the storage and retrieval of a variety of types of linguistic and bibliographic data. The STEDT database now contains over 232,000 words in more than 200 languages and is still growing. I am continuing to develop and refine software to access and update the database using the Foxbase database package for the Macintosh. I share responsibility for the day-to-day management of the project, including staff supervision, evaluation and purchase of hardware and software, budgeting, and planning. |
|
| Programmer/Analyst
Library Systems Office University of California at Berkeley |
March 1994 to September 1994 |
Developed a LAN-based database supporting the operation of the Library's new 'trouble desk.' The Help Desk is a telephone service for Library staff to report and track problems with computer equipment used in the library. The Berkeley Library supports several local area networks connecting some 500 PCs and a comparable number of online-catalog terminals around the campus. The program I wrote in Microsoft Access permits Help Desk and technical support staff responsible for maintaining the hardware and software to enter and retrieve information about reported problems and outstanding work orders. The multiuser database is available everywhere on the Berkeley network, allowing technicians in the field to keep current with the constantly changing demands for service and repair. I reported to Bernie Hurley, director of the Library Systems Office. |
|
| Research Assistant
Instructional Technology Program University of California at Berkeley |
March 1994 to September 1994 |
Consulted with ITP staff on the development of interactive software for teaching foreign languages, in particular Hindi. The effort involved digitizing sound and video and integrating these components into HyperCard to make them useful to language learners. A prototype of an Interactive Ramayana was produced, and work continues (though I am no longer on staff) on an Interactive Intermediate Hindi Reader. I worked on the project with Steve Thorne of ITP and Bruce Pray and Usha Jain of the South and Southeast Asian languages department. |
|
| Research Assistant
Phonology Lab University of California at Berkeley |
January 1992 to July 1992 |
Working with lab director Steven Greenberg and others, I upgraded and improved the UCB phonology lab research environment and carried out experiments in speech perception. With others working under subcontract to SRI (PI: Jared Bernstein), I developed, implemented, and maintained UNIX-based software to gather data on the perception of English utterances by Japanese language students. I also designed and performed statistical analysis of these data using SAS and other programs and drafted progress reports and other documents in support of the project. Generally in the lab, I worked on the installation, networking, and operation of Sun workstations, Macintosh computers, and 'IBM-compatibles,' trained and supervised other staff and students and oversaw the acquisition and installation of other equipment for speech research including the Kay Elemetrics CSL system and Entropics Waves. I assisted and instructed students and researchers in the lab in the use of the other software and equipment, including Soundedit, Uppsala SoundWave, our TCP/IP based telecommunications programs, and other Unix-, Mac-, and Windows-based programs. |
|
| Visiting Researcher
Vakgroep Vergelijkende Taalwetenschappen Rijksuniversiteit Leiden Postbus 9515 2300 RA Leiden Kingdom of the Netherlands |
July 1991 to October 1991 & December 1992 to February 1993 |
At the invitation of the Tangut Dictionary Project I created a font and database system for editing and printing the characters of Tangut, an extinct Tibeto-Burman language written in an ideographic script of about 6,000 characters which superficially resembles Chinese. The TDP is funded by the Dutch government to produce an English-Tangut-Russian dictionary. I created the system which is being used to enter the Tangut characters themselves on Apple Macintoshes. |
|
| Chercheur Associé
Centre National de Recherche Scientifique Laboratoire de langues et civilisation à tradition orale (LACITO/CNRS) 44 rue de l'Amiral Mouchez 75014 Paris, FRANCE |
March 1990 to July 1990 |
At the invitation of colleagues studying Himalayish languages of Nepal, I worked for three months in a poste rouge on a project to create software tools to assist in the historical analysis of groups of related languages. The project is being carried out as part of a collaborative effort between French and American linguists sponsored jointly by the National Science Foundation and the CNRS. The software suite is called a "reconstruction engine" and provides a means to analyze lexical corpora (machine-readable dictionaries) of different languages and test hypotheses concerning the nature of the phonological relationships between them. I also assisted in instructing students in the use of computers in linguistics and provided programming assistance and consultation to French linguists working in other areas. |
|
| Principal Programmer
University of California Division of Library Automation, Office of the President University of California (Systemwide) |
June 1984 to August 1987 |
I had primary responsibility for the implementation of optical disk technology for large databases. Optical storage is an emerging technology which permits storage and retrieval of large amounts of data (usually many gigabytes, and millions of records). DLA was engaged in R&D projects utilizing 12-inch write-once devices, 5.25-inch OROM, and CDROM. The project required the development of embedded software utilizing Ethernet-based local area networking technology and the TCP/IP telecommunications protocol. Specific duties in this area included the identification and evaluation of hardware and software, the design and implementation of prototype systems (in both the IBM mainframe and IBM PC and PC/AT environments), and eventual integration of these devices into existing information retrieval systems. I also advised management in the areas of applications, database systems, and training for both anticipated and continuing DLA operations. DLA supported a multi-processor IBM 4381 environment running OS/MVT 21.8F under ASP version 3.2. During the conversion of the Division's operating environment to MVS/XA, I shared responsibility for conversion of the database and associated software, particularly ADABAS. Such responsibility required competence in the fields of hardware and software support, disaster-recovery planning, performance prediction and analysis, evaluation and implementation of new technologies, and systems analysis. I also advised on development strategies and options for the use of microcomputers in the library environment. I wrote applications and systems software for microcomputers used by DLA and associated institutions and trained staff in the use of microcomputer hardware and software. I also acted as an ad hoc liaison between applications and operations staff, monitoring production projects and identifying processing bottlenecks and potential solutions. 
In particular, I monitored use of DLA's database-management system, ADABAS, to ensure high performance and database integrity. I provided systems support for installing and testing new production systems. |
|
| Programmer/Senior Programmer
Division of Library Automation, Office of the President University of California (Systemwide) |
May 1979 to June 1984 |
As an applications and systems programmer for the Division of Library Automation, my primary duties involved maintaining and enhancing large database applications written in PL/I, using the ADABAS database management system. Later duties were expanded to include designing and implementing new applications. Other responsibilities included installation and maintenance of ADABAS, our database management system; participation in the design of an interactive online catalog, the MELVYL catalog; and work in IBM assembly language on systems-oriented tasks (i.e., OS maintenance, installing and customizing software packages, and writing assembly language interfaces to high level languages). During this period I was the database administrator (DBA) for our site. I developed the REMARC retrieval system, used by hundreds of libraries around the world. This system, a joint effort of the University and a private firm, permits libraries to retrieve machine-readable records from the Library of Congress's databases. The system is a result of two years of work on my part, and required the development of mainframe database software, telecommunications software, and Apple II microcomputer software. |
|
| Assistant Programmer
Division of Library Studies and Research, Office of the Assistant Vice President, Library Plans and Policies University of California (Systemwide) |
July 1977 to April 1979 |
In collaboration with analysts, I was responsible for the design and execution of statistical studies of library-related problems, design of data collection and analysis methodologies, budgeting computing activities, supervising coders and key-entry operators, and providing and maintaining documentation on the software and systems used. The job involved substantial programming in PL/I, SPSS, and APL. Most work was carried out in a teleprocessing environment, using such systems as WYLBUR, CMS, ATS, and TSO. Mainframes included IBM 360 models 65 and 91; IBM 370 models 145 and 158; PDP 11 models 34 and 70. Operating systems included VM and OS/MVT under HASP (on the IBM machines) and RSTS and UNIX (on the DEC machines). |
|
| Contract Programmer
Office of the Executive Director for Library Planning University of California (Systemwide) |
April 1977 to July 1977 |
Applications programming for the Universitywide Library Automation program. Designed and programmed an interactive simulation model of library space utilization for the systemwide administration. Programming was done in APL and run on an IBM 360/91 under OS/MVT and an IBM 370/148 under VM. Two versions of the model were produced, one in STSC's APL*PLUS and one in VSAPL. I also performed other analysis tasks in support of the development of the University libraries' master plan, The University of California Libraries: Plan for Development. |
|
| Scientific Programmer
Department of Psychology Yale University New Haven, Connecticut 06520 |
1975 to 1976 |
Scientific programming in support of dissertation research. The job, part of a work-study program, involved creating, maintaining, and analyzing a statistical database for ongoing experiments with the circadian rhythms of rats. Programs were written in APL and run on Yale's IBM 370/148 under MVT and later MVS. |
In addition, I had major responsibility (in design, programming, and production) for the following reference works: