Peter Seibel - Practical Common Lisp

Title: Practical Common Lisp
Author: Peter Seibel
Publisher: Apress
Genre: Programming
Language: English
Year: 2005
ISBN: 1-59059-239-5

However, READ-SEQUENCE returns the number of characters actually read. So, you can attempt to read the number of characters reported by FILE-LENGTH and return a substring if the actual number of characters read was smaller.

(defun start-of-file (file max-chars)
  (with-open-file (in file)
    (let* ((length (min (file-length in) max-chars))
           (text (make-string length))
           (read (read-sequence text in)))
      (if (< read length)
          (subseq text 0 read)
          text))))
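To see the return value of READ-SEQUENCE at work without touching the file system, you can try the same idea on a string stream. The following is only a minimal sketch: read-prefix is a hypothetical helper invented for this illustration, not part of the chapter's code.

```lisp
;; Hypothetical helper (not from the book): read at most MAX-CHARS
;; characters from STREAM and return only the part actually filled in.
(defun read-prefix (stream max-chars)
  (let* ((buffer (make-string max-chars))
         ;; READ-SEQUENCE returns the index of the first element of
         ;; BUFFER it did not update; starting at 0, that is the number
         ;; of characters read.
         (n-read (read-sequence buffer stream)))
    (if (< n-read max-chars)
        (subseq buffer 0 n-read)   ; stream ran out early: trim the buffer
        buffer)))

;; The stream holds only 5 characters, so the 10-character buffer is trimmed.
(with-input-from-string (in "hello")
  (read-prefix in 10))             ; => "hello"
```

The trimming step matters because the unread tail of the buffer holds whatever MAKE-STRING happened to put there, which is implementation-dependent.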

Analyzing the Results

Now you're ready to write some code to analyze the results generated by test-classifier. Recall that test-classifier returns the list returned by test-from-corpus, in which each element is a plist representing the result of classifying one file. This plist contains the name of the file, the actual type of the file, the classification, and the score returned by classify. The first bit of analytical code you should write is a function that returns a symbol indicating whether a given result was correct, a false positive, a false negative, a missed ham, or a missed spam. You can use DESTRUCTURING-BIND to pull out the :type and :classification elements of an individual result list (using &allow-other-keys to tell DESTRUCTURING-BIND to ignore any other key/value pairs it sees) and then use nested ECASE to translate the different pairings into a single symbol.

(defun result-type (result)
  (destructuring-bind (&key type classification &allow-other-keys) result
    (ecase type
      (ham
       (ecase classification
         (ham 'correct)
         (spam 'false-positive)
         (unsure 'missed-ham)))
      (spam
       (ecase classification
         (ham 'false-negative)
         (spam 'correct)
         (unsure 'missed-spam))))))

You can test out this function at the REPL.

SPAM> (result-type '(:FILE #p"foo" :type ham :classification ham :score 0))
CORRECT
SPAM> (result-type '(:FILE #p"foo" :type spam :classification spam :score 0))
CORRECT
SPAM> (result-type '(:FILE #p"foo" :type ham :classification spam :score 0))
FALSE-POSITIVE
SPAM> (result-type '(:FILE #p"foo" :type spam :classification ham :score 0))
FALSE-NEGATIVE
SPAM> (result-type '(:FILE #p"foo" :type ham :classification unsure :score 0))
MISSED-HAM
SPAM> (result-type '(:FILE #p"foo" :type spam :classification unsure :score 0))
MISSED-SPAM
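The two building blocks of result-type can also be tried in isolation. In the sketch below, type-and-classification and ham-or-spam are hypothetical names made up for the illustration. The first shows &allow-other-keys letting the :file and :score entries pass unnoticed. The second shows that ECASE signals a TYPE-ERROR when no clause matches, which is why an unexpected type or classification fails loudly instead of quietly returning NIL.

```lisp
;; Hypothetical helper: pull two keys out of a plist, ignoring the rest.
(defun type-and-classification (result)
  (destructuring-bind (&key type classification &allow-other-keys) result
    (list type classification)))

(type-and-classification
 '(:file "foo" :type ham :classification spam :score 0.5))
;; => (HAM SPAM)

;; Hypothetical helper: ECASE with no clause for unexpected values.
(defun ham-or-spam (type)
  (ecase type
    (ham 'is-ham)
    (spam 'is-spam)))

;; No clause matches BOGUS, so ECASE signals a TYPE-ERROR,
;; which the handler turns into a symbol for demonstration purposes.
(handler-case (ham-or-spam 'bogus)
  (type-error () 'no-matching-clause))
;; => NO-MATCHING-CLAUSE
```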

Having this function makes it easy to slice and dice the results of test-classifier in a variety of ways. For instance, you can start by defining predicate functions for each type of result.

(defun false-positive-p (result)
  (eql (result-type result) 'false-positive))

(defun false-negative-p (result)
  (eql (result-type result) 'false-negative))

(defun missed-ham-p (result)
  (eql (result-type result) 'missed-ham))

(defun missed-spam-p (result)
  (eql (result-type result) 'missed-spam))

(defun correct-p (result)
  (eql (result-type result) 'correct))

With those functions, you can easily use the list and sequence manipulation functions I discussed in Chapter 11 to extract and count particular kinds of results.

SPAM> (count-if #'false-positive-p *results*)
6
SPAM> (remove-if-not #'false-positive-p *results*)
((:FILE #p"ham/5349" :TYPE HAM :CLASSIFICATION SPAM :SCORE 0.9999983107355541d0)
 (:FILE #p"ham/2746" :TYPE HAM :CLASSIFICATION SPAM :SCORE 0.6286468956619795d0)
 (:FILE #p"ham/3427" :TYPE HAM :CLASSIFICATION SPAM :SCORE 0.9833753501352983d0)
 (:FILE #p"ham/7785" :TYPE HAM :CLASSIFICATION SPAM :SCORE 0.9542788587998488d0)
 (:FILE #p"ham/1728" :TYPE HAM :CLASSIFICATION SPAM :SCORE 0.684339162891261d0)
 (:FILE #p"ham/10581" :TYPE HAM :CLASSIFICATION SPAM :SCORE 0.9999924537959615d0))

You can also use the symbols returned by result-type as keys into a hash table or an alist. For instance, you can write a function to print a summary of the counts and percentages of each type of result using an alist that maps each type plus the extra symbol total to a count.

(defun analyze-results (results)
  (let* ((keys '(total correct false-positive
                 false-negative missed-ham missed-spam))
         (counts (loop for x in keys collect (cons x 0))))
    (dolist (item results)
      (incf (cdr (assoc 'total counts)))
      (incf (cdr (assoc (result-type item) counts))))
    (loop with total = (cdr (assoc 'total counts))
          for (label . count) in counts
          do (format t "~&~@(~a~):~20t~5d~,5t: ~6,2f%~%"
                     label count (* 100 (/ count total))))))

This function will give output like this when passed a list of results generated by test-classifier:

SPAM> (analyze-results *results*)
Total:               3761 : 100.00%
Correct:             3689 :  98.09%
False-positive:         4 :   0.11%
False-negative:         9 :   0.24%
Missed-ham:            19 :   0.51%
Missed-spam:           40 :   1.06%
NIL
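The same tally can be kept in a hash table rather than an alist. The following is a rough sketch of that alternative, not the book's code: count-by is a hypothetical helper that counts items by whatever key function you pass it. With the chapter's code loaded you could presumably call it as (count-by #'result-type *results*). The example here uses plain symbols so it stands on its own.

```lisp
;; Hypothetical helper: return a hash table mapping each value that
;; KEY-FN produces to the number of items that produced it.
(defun count-by (key-fn items)
  (let ((counts (make-hash-table)))     ; default EQL test suits symbol keys
    (dolist (item items counts)
      ;; The third argument to GETHASH supplies 0 for keys not yet seen,
      ;; so INCF works without pre-populating the table.
      (incf (gethash (funcall key-fn item) counts 0)))))

(let ((counts (count-by #'identity '(correct correct false-positive))))
  (list (gethash 'correct counts)
        (gethash 'false-positive counts)))
;; => (2 1)
```

Unlike the alist version, this one needs no list of keys up front, but it also gives no control over the order in which the entries are later printed.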

And as a last bit of analysis you might want to look at why an individual message was classified the way it was. The following functions will show you:

(defun explain-classification (file)
  (let* ((text (start-of-file file *max-chars*))
         (features (extract-features text))
         (score (score features))
         (classification (classification score)))
    (show-summary file text classification score)
    (dolist (feature (sorted-interesting features))
      (show-feature feature))))

(defun show-summary (file text classification score)
  (format t "~&~a" file)
  (format t "~2%~a~2%" text)
  (format t "Classified as ~a with score of ~,5f~%" classification score))

(defun show-feature (feature)
  (with-slots (word ham-count spam-count) feature
    (format
     t "~&~2t~a~30thams: ~5d; spams: ~5d;~,10tprob: ~,f~%"
     word ham-count spam-count (bayesian-spam-probability feature))))

(defun sorted-interesting (features)
  (sort (remove-if #'untrained-p features) #'< :key #'bayesian-spam-probability))
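The filter-then-sort pattern in sorted-interesting can be tried on simpler data. In the sketch below, sorted-by-score is a hypothetical stand-in that works on (name . score) pairs instead of feature objects. It also adds a COPY-LIST that the original doesn't have: SORT is destructive, and the result of REMOVE-IF is allowed to share structure with its argument, so copying is a cautious choice if the caller still needs the original list intact.

```lisp
;; Hypothetical stand-in for sorted-interesting, using (name . score)
;; pairs: drop pairs with no score, then sort ascending by score.
(defun sorted-by-score (pairs)
  (sort (copy-list (remove-if #'null pairs :key #'cdr)) ; fresh list for SORT
        #'<
        :key #'cdr))

(sorted-by-score (list (cons 'a 0.9) (list 'b) (cons 'c 0.5)))
;; => ((C . 0.5) (A . 0.9))
```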

What's Next

Obviously, you could do a lot more with this code. To turn it into a real spam-filtering application, you'd need to find a way to integrate it into your normal e-mail infrastructure. One approach that would make it easy to integrate with almost any e-mail client is to write a bit of code to act as a POP3 proxy—that's the protocol most e-mail clients use to fetch mail from mail servers. Such a proxy would fetch mail from your real POP3 server and serve it to your mail client after either tagging spam with a header that your e-mail client's filters can easily recognize or simply putting it aside. Of course, you'd also need a way to communicate with the filter about misclassifications—as long as you're setting it up as a server, you could also provide a Web interface. I'll talk about how to write Web interfaces in Chapter 26, and you'll build one, for a different application, in Chapter 29.