Linguistics 158: Computer-aided methods in linguistics
John B. Lowe
Department of Linguistics
University of California, Berkeley - Spring 1997
Statistical Software
Week / class: 13 / 24
Lab
Th, 17 Apr 1997
Overview
We'll look at using a spreadsheet program (Microsoft Excel) and Microsoft Graph (part of MS Word) to compute and plot simple statistics on word frequencies.
Computing word frequencies with CONC
- Open CONC
- New...
- Options...Index (0 references)
- Build...Index
- Export Index as...
- Edit file
- Copy into Clipboard
Computing word frequencies with PERL
- Open wordcount
- Modify program (include code for output file, modify print statement)
- Run the program
- Edit file
- Copy into clipboard
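For comparison, here is a minimal sketch of the same word-frequency step in Python. It is not the course's wordcount program: the function names and the sample sentence are made up for illustration, and the tokenization rule (lowercase runs of letters and apostrophes) is an assumption that may differ from what CONC or wordcount does. The output is two tab-separated columns, which is the shape the spreadsheet exercise expects.

```python
import re
from collections import Counter


def word_frequencies(text):
    """Return a Counter mapping each lowercased word type to its token count."""
    # Assumption: a "word" is a run of letters or apostrophes.
    tokens = re.findall(r"[a-z']+", text.lower())
    return Counter(tokens)


def as_two_columns(freqs):
    """Format counts as 'word<TAB>count' lines, most frequent first."""
    # Ties are broken alphabetically so the output is reproducible.
    rows = sorted(freqs.items(), key=lambda kv: (-kv[1], kv[0]))
    return "\n".join(f"{word}\t{count}" for word, count in rows)


# Hypothetical sample text, standing in for a real corpus file.
sample = "the cat saw the dog and the dog saw the cat"
freqs = word_frequencies(sample)
table = as_two_columns(freqs)
print(table)
```

To use a real text, read the file contents into `sample` and write `table` to an output file, then copy the result into the clipboard as in the steps above.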
Spreadsheet example: computing the Zipf constant
- Paste the two columns into Worksheet
- Determine number of rows (= number of types) using COUNT
- SORT: $B$10:$B$2000, by Rows, Descending
- Determine number of tokens (SUM over range)
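- Compute Relative Frequency: fi / N (frequency of type i divided by the number of tokens), expressed as a Percentage
- NB: Something like =B10/E1 is not robust when copied-and-pasted! The relative reference E1 shifts along with the row; assuming E1 holds N, use an absolute reference instead, e.g. =B10/$E$1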
- Format Relative Frequency as a percentage
- Compute Z, Zipf's constant (Zi = i * fi, where i is the rank)
- Compute the mean of Z (for all Zi)
  - Sum(Zi)
  - N (here the number of Zi values, not the number of tokens)
  - Sum(Zi) / N (i.e. mean)
  - AVERAGE (using Excel)
- Compute the mean of Z (for i = 1 to 200)
  - Sum(Zi)
  - N
  - Sum(Zi) / N (i.e. mean)
  - AVERAGE (using Excel)
- Compute the standard deviation by hand
  - Compute X - mean(X)
  - Compute (X - mean(X)) squared
  - Sum the squares and divide by N - 1 (variance)
  - Take the square root (standard deviation)
- Do the same thing with STDEV
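The spreadsheet steps above can also be sketched in Python, which makes it easy to check the Excel results. The counts below are invented for illustration, not taken from any corpus. One assumption to flag: the sketch uses absolute frequencies for fi in Zi = i * fi; with relative frequencies every Zi is simply divided by the number of tokens.

```python
import math
import statistics

# Hypothetical absolute frequencies (tokens per word type), invented for illustration.
counts = [120, 62, 39, 31, 23, 20]

freqs = sorted(counts, reverse=True)        # SORT, descending
n_types = len(freqs)                        # COUNT: number of types
n_tokens = sum(freqs)                       # SUM over range: number of tokens
rel_freq = [f / n_tokens for f in freqs]    # fi / N

# Zi = i * fi, with rank i starting at 1 (assumption: fi is the absolute frequency).
z = [rank * f for rank, f in enumerate(freqs, start=1)]

# The handout also asks for ranks 1 to 200 only; with real data, use z[:200] here.
top = z[:200]

mean_z = sum(top) / len(top)                     # Sum(Zi) / N, same as AVERAGE
deviations = [zi - mean_z for zi in top]         # X - mean(X)
squared = [d * d for d in deviations]            # (X - mean(X)) squared
variance = sum(squared) / (len(top) - 1)         # divide by N - 1
std_dev = math.sqrt(variance)                    # standard deviation

print("types:", n_types, "tokens:", n_tokens)
print("Z:", z)
print("mean:", mean_z, "s.d.:", round(std_dev, 3))
# statistics.stdev also divides by N - 1, so it should agree with the hand computation.
print("stdev check:", round(statistics.stdev(top), 3))
```

Dividing by N - 1 gives the sample standard deviation, which is what Excel's STDEV computes, so the two results should agree to rounding.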
Graphs
In Excel:
- Plot Fabs (absolute frequency) vs. Rank (how does this variable look?)
- Plot Zi vs. Rank as a scattergram (is the s.d. constant?)
In Microsoft Graph:
- Plot Fabs (absolute frequency) vs. Rank (how does this variable look?)
- Experiment with other plot types
Homework 8: Statistics (due: Tuesday, April 22)
Homework answers