Peter Seibel - Practical Common Lisp

Title: Practical Common Lisp
Author: Peter Seibel
Publisher: Apress
Genre: Programming / in English
Year: 2005
ISBN: 1-59059-239-5

Within the loop, you can use the function untrained-p to skip features extracted from the message that were never seen during training. These features will have spam counts and ham counts of zero. The untrained-p function is trivial.

(defun untrained-p (feature)
  (with-slots (spam-count ham-count) feature
    (and (zerop spam-count) (zerop ham-count))))
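To see untrained-p at work outside the scoring loop, here is a minimal sketch. It assumes a word-feature class with word, spam-count, and ham-count slots, which is a stand-in for the feature class defined earlier in the chapter, and trained-features is a hypothetical helper introduced only for this illustration.

```lisp
;; Assumed stand-in for the chapter's feature class.
(defclass word-feature ()
  ((word       :initarg :word       :accessor word)
   (spam-count :initarg :spam-count :initform 0 :accessor spam-count)
   (ham-count  :initarg :ham-count  :initform 0 :accessor ham-count)))

(defun untrained-p (feature)
  "True if FEATURE was never seen in either spam or ham during training."
  (with-slots (spam-count ham-count) feature
    (and (zerop spam-count) (zerop ham-count))))

;; Hypothetical helper: keep only the features that carry some evidence.
(defun trained-features (features)
  (remove-if #'untrained-p features))

;; One feature seen once in spam, one never seen at all.
(defparameter *example-features*
  (list (make-instance 'word-feature :word "money" :spam-count 1)
        (make-instance 'word-feature :word "zeppelin")))

;; Only the "money" feature survives the filter.
(mapcar #'word (trained-features *example-features*))
```

REMOVE-IF returns a new list and leaves the original untouched, so the same features can still be passed to train later.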

The only other new function is fisher itself. Assuming you already had an inverse-chi-square function, fisher is conceptually simple.

(defun fisher (probs number-of-probs)
  "The Fisher computation described by Robinson."
  (inverse-chi-square
   (* -2 (log (reduce #'* probs)))
   (* 2 number-of-probs)))

Unfortunately, there's a small problem with this straightforward implementation. While using REDUCE is a concise and idiomatic way of multiplying a list of numbers, in this particular application there's a danger the product will be too small to be represented as a floating-point number. In that case, the result will underflow to zero. And if the product of the probabilities underflows, all bets are off because taking the LOG of zero will either signal an error or, in some implementations, result in a special negative-infinity value, which will render all subsequent calculations essentially meaningless. This is particularly unfortunate in this function because the Fisher method is most sensitive when the input probabilities are low (near zero) and therefore in the most danger of causing the multiplication to underflow.

Luckily, you can use a bit of high-school math to avoid this problem. Recall that the log of a product is the same as the sum of the logs of the factors. So instead of multiplying all the probabilities and then taking the log, you can sum the logs of each probability. And since REDUCE takes a :key keyword parameter, you can use it to perform the whole calculation. Instead of this:

(log (reduce #'* probs))

write this:

(reduce #'+ probs :key #'log)
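As a quick check of that identity, the following sketch wraps the two expressions in functions so you can compare them at the REPL. The names log-of-product and sum-of-logs are made up for this illustration and aren't part of the spam filter.

```lisp
(defun log-of-product (probs)
  "Multiply first, then take the log. The product can underflow for long lists."
  (log (reduce #'* probs)))

(defun sum-of-logs (probs)
  "Take the log of each element via :key and add the results."
  (reduce #'+ probs :key #'log))

;; For a short list the two agree up to rounding error.
(let ((probs '(0.5d0 0.25d0 0.125d0)))
  (list (log-of-product probs) (sum-of-logs probs)))

;; For a long list of tiny probabilities the product is far too small to be
;; represented as a double float, but the sum of logs is an ordinary number
;; close to (* 1000 (log 1d-5)).
(sum-of-logs (make-list 1000 :initial-element 1d-5))
```

What log-of-product does with the long list depends on the implementation, as described above, so the sketch calls only sum-of-logs on it.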

Inverse Chi Square

The implementation of inverse-chi-square in this section is a fairly straightforward translation of a version written in Python by Robinson. The exact mathematical meaning of this function is beyond the scope of this book, but you can get an intuitive sense of what it does by thinking about how the values you pass to fisher will affect the result: the more low probabilities you pass to fisher, the smaller the product of the probabilities will be. The log of a small product will be a negative number with a large absolute value, which is then multiplied by -2, making it an even larger positive number. Thus, the more low probabilities were passed to fisher, the larger the value it'll pass to inverse-chi-square. Of course, the number of probabilities involved also affects the value passed to inverse-chi-square. Since probabilities are, by definition, less than or equal to 1, the more probabilities that go into a product, the smaller it'll be and the larger the value passed to inverse-chi-square. Thus, inverse-chi-square should return a low probability when the Fisher combined value is abnormally large for the number of probabilities that went into it. The following function does exactly that:

(defun inverse-chi-square (value degrees-of-freedom)
  (assert (evenp degrees-of-freedom))
  (min
   (loop with m = (/ value 2)
         for i below (/ degrees-of-freedom 2)
         for prob = (exp (- m)) then (* prob (/ m i))
         summing prob)
   1.0))

Recall from Chapter 10 that EXP raises e to the argument given. Thus, the larger value is, the smaller the initial value of prob will be. But that initial value will then be adjusted upward slightly for each degree of freedom as long as m is greater than the number of degrees of freedom. Since the value returned by inverse-chi-square is supposed to be another probability, it's important to clamp the value returned with MIN since rounding errors in the multiplication and exponentiation may cause the LOOP to return a sum just a shade over 1.
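To get a feel for how the two functions behave together, you can try a few inputs. The sketch below repeats inverse-chi-square so that it can be loaded on its own and pairs it with a version of fisher that uses the sum-of-logs form. The sample probability lists are invented for illustration.

```lisp
(defun inverse-chi-square (value degrees-of-freedom)
  (assert (evenp degrees-of-freedom))
  (min
   (loop with m = (/ value 2)
         for i below (/ degrees-of-freedom 2)
         ;; First term is e^-m; each later term multiplies the previous by m/i.
         for prob = (exp (- m)) then (* prob (/ m i))
         summing prob)
   1.0))

(defun fisher (probs number-of-probs)
  "Fisher combination using the sum of logs rather than the log of the product."
  (inverse-chi-square
   (* -2 (reduce #'+ probs :key #'log))
   (* 2 number-of-probs)))

;; Several low probabilities together: the combined value comes out very small.
(fisher '(0.01d0 0.01d0 0.01d0) 3)

;; Middling probabilities: the combined value is unremarkable.
(fisher '(0.5d0 0.5d0) 2)

;; Probabilities of 1 contribute nothing, and the result is clamped at 1.
(fisher '(1.0d0 1.0d0 1.0d0) 3)
```

For the middle case you can follow the arithmetic by hand: m is -(log 0.25), about 1.386, so the two terms summed are 0.25 and 0.25 times 1.386, for a total of about 0.597.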

Training the Filter

Since you wrote classify and train to take a string argument, you can test them easily at the REPL. If you haven't yet, you should switch to the package in which you've been writing this code by evaluating an IN-PACKAGE form at the REPL or using the SLIME shortcut change-package. To use the SLIME shortcut, type a comma at the REPL and then type the name at the prompt. Pressing Tab while typing the package name will autocomplete based on the packages your Lisp knows about. Now you can invoke any of the functions that are part of the spam application. You should first make sure the database is empty.

SPAM> (clear-database)

Now you can train the filter with some text.

SPAM> (train "Make money fast" 'spam)

And then see what the classifier thinks.

SPAM> (classify "Make money fast")
SPAM
SPAM> (classify "Want to go to the movies?")
UNSURE

While ultimately all you care about is the classification, it'd be nice to be able to see the raw score too. The easiest way to get both values without disturbing any other code is to change classification to return multiple values.

(defun classification (score)
  (values
   (cond
     ((<= score *max-ham-score*) 'ham)
     ((>= score *min-spam-score*) 'spam)
     (t 'unsure))
   score))

You can make this change and then recompile just this one function. Because classify returns whatever classification returns, it'll also now return two values. But since the primary return value is the same, callers of either function who expect only one value won't be affected. Now when you test classify, you can see exactly what score went into the classification.

SPAM> (classify "Make money fast")
SPAM
0.863677101854273D0
SPAM> (classify "Want to go to the movies?")
UNSURE
0.5D0
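The claim that existing callers aren't affected by the second return value is easy to check in isolation. The following sketch assumes thresholds of 0.4 and 0.6 for *max-ham-score* and *min-spam-score*. Treat those numbers as placeholders if your definitions differ.

```lisp
;; Assumed thresholds, for illustration only.
(defparameter *max-ham-score* 0.4)
(defparameter *min-spam-score* 0.6)

(defun classification (score)
  (values
   (cond
     ((<= score *max-ham-score*) 'ham)
     ((>= score *min-spam-score*) 'spam)
     (t 'unsure))
   score))

;; A caller that binds a single variable sees only the primary value;
;; the score is silently discarded.
(let ((label (classification 0.9)))
  label)

;; A caller that wants both values asks for them with MULTIPLE-VALUE-BIND.
(multiple-value-bind (label score) (classification 0.5)
  (list label score))
```

This is why you can recompile classification alone: code written against the single-value version keeps working unchanged.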

And now you can see what happens if you train the filter with some more ham text.

SPAM> (train "Do you have any money for the movies?" 'ham)
1
SPAM> (classify "Make money fast")
SPAM
0.7685351219857626D0

It's still spam but a bit less certain since money was seen in ham text.

SPAM> (classify "Want to go to the movies?")
HAM
0.17482223132078922D0

And now this is clearly recognizable ham thanks to the presence of the word movies, now a hammy feature.

However, you don't really want to train the filter by hand. What you'd really like is an easy way to point it at a bunch of files and train it on them. And if you want to test how well the filter actually works, you'd like to then use it to classify another set of files of known types and see how it does. So the last bit of code you'll write in this chapter will be a test harness that tests the filter on a corpus of messages of known types, using a certain fraction for training and then measuring how accurate the filter is when classifying the remainder.
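The basic shape of such a harness can be sketched in a few lines. This is only an illustration under simplifying assumptions, not the harness itself: the corpus is a plain list of (text . type) pairs rather than files, it's split in order without shuffling, and the training and classifying functions are passed in as arguments. The names split-corpus and measure-accuracy are hypothetical.

```lisp
(defun split-corpus (corpus training-fraction)
  "Return two values: the first TRAINING-FRACTION of CORPUS and the remainder."
  (let ((n (floor (* (length corpus) training-fraction))))
    (values (subseq corpus 0 n) (subseq corpus n))))

(defun measure-accuracy (corpus training-fraction train-fn classify-fn)
  "Train on the first part of CORPUS, classify the rest, and return the
fraction classified correctly, or NIL if nothing was left to test."
  (multiple-value-bind (training testing)
      (split-corpus corpus training-fraction)
    ;; Feed every training message to the trainer along with its known type.
    (loop for (text . type) in training
          do (funcall train-fn text type))
    ;; Count how many held-out messages get their known type back.
    (unless (null testing)
      (/ (count-if (lambda (entry)
                     (eq (funcall classify-fn (car entry)) (cdr entry)))
                   testing)
         (length testing)))))
```

With the filter loaded, a call might look like (measure-accuracy corpus 0.5 #'train #'classify), which works because only the primary value of classify is compared against the known type.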