talks.info
A simple text classification problem: classify postings to /usr/msgs
as talk announcements or "other". There are 818 messages, going back
to Aug 93. I used messages numbered 500 and up as test cases.
Average words/message is around 160. I used a simple lex program to
tokenize the data. Class labels were obtained by manual inspection.

Doug McIlroy suggested this dataset, and also suggested that
egrep -i 'talk|abstract' would be hard to beat. (Which is correct.)

                  recall   prec.  #errors   %err   fp   fn
doug's egrep                           22   6.77    8   14
rocchio -m acc      75.9    96.9       22   6.77    2   20
rocchio -m F        95.2    86.8       16   4.92   12    4
ripper -L0.06       83.1    87.3       24   7.38   10   14
ripper -L0.125      89.2    89.2       18   5.54    9    9
ripper -L0.25       95.2    91.9       11   3.38    7    4
ripper -L0.5        94.0    91.8       12   3.69    7    5
ripper -L1          89.2    92.5       15   4.62    6    9
ripper -L1.5        79.5    95.7       20   6.15    3   17
ripper -L2          77.1   100.0       19   5.85    0   19
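
The columns of the table are tied together by simple arithmetic, which the Python sketch below tries to make explicit. It also includes a rough stand-in for the egrep baseline. Two numbers used in the usage note are not stated in this file and are assumptions inferred from the reported percentages: a test set of 325 messages, of which 83 are talk announcements. The names looks_like_talk and metrics are hypothetical.

```python
import re

# Rough stand-in for: egrep -i 'talk|abstract'
# egrep tests each line separately; because the pattern has no anchors,
# searching the whole message answers the same question
# ("does any line match?").
TALK_PATTERN = re.compile(r"talk|abstract", re.IGNORECASE)


def looks_like_talk(message: str) -> bool:
    """Return True if the message would be labeled a talk announcement."""
    return TALK_PATTERN.search(message) is not None


def metrics(fp: int, fn: int, n_pos: int, n_total: int):
    """Derive the table columns from false positives and false negatives.

    n_pos   = number of true talk announcements in the test set (assumed)
    n_total = number of test messages (assumed)
    Returns (recall %, precision %, #errors, error rate %).
    """
    tp = n_pos - fn                      # talks that were correctly found
    recall = 100.0 * tp / n_pos
    precision = 100.0 * tp / (tp + fp)
    errors = fp + fn
    pct_err = 100.0 * errors / n_total
    return recall, precision, errors, pct_err
```

Under those assumed totals, metrics(2, 20, 83, 325) gives about 75.9, 96.9, 22 and 6.77, which agrees with the "rocchio -m acc" row.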