Peter Seibel - Practical Common Lisp
- Title: Practical Common Lisp
- Publisher: Apress
- Year: 2005
- ISBN: 1-59059-239-5
- Genre: Programming; language: English
Within the loop, you can use the function untrained-p to skip features extracted from the message that were never seen during training. These features will have spam counts and ham counts of zero. The untrained-p function is trivial.
(defun untrained-p (feature)
  (with-slots (spam-count ham-count) feature
    (and (zerop spam-count) (zerop ham-count))))
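A quick way to check untrained-p in isolation is against a stand-in feature object. The class feature-stub and the helper trained-features below are hypothetical, not part of the chapter's code; the only requirement they satisfy is that the object has slots named spam-count and ham-count, since those are the names the WITH-SLOTS form reads.

```lisp
;; Hypothetical stand-in for the chapter's feature objects: UNTRAINED-P
;; only needs slots named SPAM-COUNT and HAM-COUNT.
(defclass feature-stub ()
  ((word :initarg :word :reader feature-word)
   (spam-count :initarg :spam-count :initform 0)
   (ham-count :initarg :ham-count :initform 0)))

;; Same definition as in the text.
(defun untrained-p (feature)
  (with-slots (spam-count ham-count) feature
    (and (zerop spam-count) (zerop ham-count))))

(defun trained-features (features)
  "Hypothetical helper: drop features never seen during training,
the way the scoring loop skips them."
  (remove-if #'untrained-p features))
```

A feature created with no counts is untrained, while one seen even once in spam is kept.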
The only other new function is fisher itself. Assuming you already had an inverse-chi-square function, fisher is conceptually simple.
(defun fisher (probs number-of-probs)
  "The Fisher computation described by Robinson."
  (inverse-chi-square
   (* -2 (log (reduce #'* probs)))
   (* 2 number-of-probs)))
Unfortunately, there's a small problem with this straightforward implementation. While using REDUCE is a concise and idiomatic way of multiplying a list of numbers, in this particular application there's a danger the product will be too small a number to be represented as a floating-point number. In that case, the result will underflow to zero. And if the product of the probabilities underflows, all bets are off because taking the LOG of zero will either signal an error or, in some implementations, result in a special negative-infinity value, which will render all subsequent calculations essentially meaningless. This is particularly unfortunate in this function because the Fisher method is most sensitive when the input probabilities are low, near zero, and therefore in the most danger of causing the multiplication to underflow.
Luckily, you can use a bit of high-school math to avoid this problem. Recall that the log of a product is the same as the sum of the logs of the factors. So instead of multiplying all the probabilities and then taking the log, you can sum the logs of each probability. And since REDUCE takes a :key keyword parameter, you can use it to perform the whole calculation. Instead of this:
(log (reduce #'* probs))
write this:
(reduce #'+ probs :key #'log)
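To see the difference concretely, here is a small sketch with the two expressions wrapped in functions. The names log-of-product-naive, log-of-product-stable, and *tiny-probs* are hypothetical, chosen for illustration only.

```lisp
(defun log-of-product-naive (probs)
  "Multiply first, then take the log; the product can underflow to zero."
  (log (reduce #'* probs)))

(defun log-of-product-stable (probs)
  "Sum the logs instead, using log(a*b) = log(a) + log(b)."
  (reduce #'+ probs :key #'log))

;; 500 probabilities of 0.01 have a true product of 1e-1000, far below the
;; smallest positive double-float, yet the sum of their logs is about -2302.59.
(defparameter *tiny-probs* (make-list 500 :initial-element 0.01d0))
```

For short lists the two functions agree to within rounding error. For *tiny-probs*, the stable version returns an ordinary number near -2302.59, while in an implementation that does not trap floating-point underflow, (reduce #'* *tiny-probs*) evaluates to 0.0d0, so the naive version would end up taking the log of zero.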
Inverse Chi Square
The implementation of inverse-chi-square in this section is a fairly straightforward translation of a version written in Python by Robinson. The exact mathematical meaning of this function is beyond the scope of this book, but you can get an intuitive sense of what it does by thinking about how the values you pass to fisher will affect the result: the more low probabilities you pass to fisher, the smaller the product of the probabilities will be. The log of a small product will be a negative number with a large absolute value, which is then multiplied by -2, making it an even larger positive number. Thus, the more low probabilities were passed to fisher, the larger the value it'll pass to inverse-chi-square. Of course, the number of probabilities involved also affects the value passed to inverse-chi-square. Since probabilities are, by definition, less than or equal to 1, the more probabilities that go into a product, the smaller it'll be and the larger the value passed to inverse-chi-square. Thus, inverse-chi-square should return a low probability when the Fisher combined value is abnormally large for the number of probabilities that went into it. The following function does exactly that:
(defun inverse-chi-square (value degrees-of-freedom)
  (assert (evenp degrees-of-freedom))
  (min
   (loop with m = (/ value 2)
         for i below (/ degrees-of-freedom 2)
         for prob = (exp (- m)) then (* prob (/ m i))
         summing prob)
   1.0))
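In symbols, assuming fisher is passed n probabilities p_1 through p_n, the value X and degrees of freedom ν it hands to inverse-chi-square, and the terms the loop accumulates, are as follows; t_k is the value of prob on the iteration where i = k. Written this way, the initial term is exactly the product of the probabilities, and the sum is the standard closed form of the chi-square upper-tail probability for an even number of degrees of freedom, which is presumably why the function asserts that degrees-of-freedom is even.

```latex
\begin{aligned}
X &= -2 \sum_{j=1}^{n} \ln p_j, \qquad \nu = 2n, \qquad m = \frac{X}{2} = -\ln \prod_{j=1}^{n} p_j, \\
t_0 &= e^{-m} = \prod_{j=1}^{n} p_j, \qquad t_k = t_{k-1} \cdot \frac{m}{k} = e^{-m} \frac{m^k}{k!}, \\
Q(X, \nu) &= \min\Bigl(1,\ \sum_{k=0}^{\nu/2 - 1} t_k\Bigr).
\end{aligned}
```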
Recall from Chapter 10 that EXP raises e to the argument given. Thus, the larger value is, the smaller the initial value of prob will be. But each subsequent iteration multiplies prob by m/i, so that initial value will then be adjusted upward for each degree of freedom as long as m is greater than the number of degrees of freedom. Since the value returned by inverse-chi-square is supposed to be another probability, it's important to clamp the value returned with MIN since rounding errors in the multiplication and exponentiation may cause the LOOP to return a sum just a shade over 1.
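Putting the pieces together, here is a consolidated sketch of fisher with the log-summing fix applied, alongside inverse-chi-square exactly as given above, so the pair can be loaded and tried on its own. A useful sanity check falls out of the math: with a single probability p, the degrees of freedom are 2, the loop runs once, and the result is e^(ln p) = p.

```lisp
;; As given in the text.
(defun inverse-chi-square (value degrees-of-freedom)
  (assert (evenp degrees-of-freedom))
  (min
   (loop with m = (/ value 2)
         for i below (/ degrees-of-freedom 2)
         for prob = (exp (- m)) then (* prob (/ m i))
         summing prob)
   1.0))

;; FISHER with (log (reduce #'* probs)) replaced by the sum of logs,
;; so a long list of small probabilities cannot underflow the product.
(defun fisher (probs number-of-probs)
  "The Fisher computation described by Robinson."
  (inverse-chi-square
   (* -2 (reduce #'+ probs :key #'log))
   (* 2 number-of-probs)))
```

With two probabilities of 0.5, the loop runs twice and sums 0.25 and 0.25 * 2 ln 2, giving about 0.5966.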
Training the Filter
Since you wrote classify and train to take a string argument, you can test them easily at the REPL. If you haven't yet, you should switch to the package in which you've been writing this code by evaluating an IN-PACKAGE form at the REPL or using the SLIME shortcut change-package. To use the SLIME shortcut, type a comma at the REPL and then type the name at the prompt. Pressing Tab while typing the package name will autocomplete based on the packages your Lisp knows about. Now you can invoke any of the functions that are part of the spam application. You should first make sure the database is empty.
SPAM> (clear-database)
Now you can train the filter with some text.
SPAM> (train "Make money fast" 'spam)
And then see what the classifier thinks.
SPAM> (classify "Make money fast")
SPAM
SPAM> (classify "Want to go to the movies?")
UNSURE
While ultimately all you care about is the classification, it'd be nice to be able to see the raw score too. The easiest way to get both values without disturbing any other code is to change classification to return multiple values.
(defun classification (score)
  (values
   (cond
     ((<= score *max-ham-score*) 'ham)
     ((>= score *min-spam-score*) 'spam)
     (t 'unsure))
   score))
You can make this change and then recompile just this one function. Because classify returns whatever classification returns, it'll also now return two values. But since the primary return value is the same, callers of either function who expect only one value won't be affected. Now when you test classify, you can see exactly what score went into the classification.
SPAM> (classify "Make money fast")
SPAM
0.863677101854273D0
SPAM> (classify "Want to go to the movies?")
UNSURE
0.5D0
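To see how a caller gets at the second value, here is a self-contained sketch. classification is copied from the text; the threshold values 0.4 and 0.6 are assumptions chosen only to be consistent with the REPL output shown (0.5 comes out UNSURE, about 0.17 HAM, about 0.77 SPAM), and describe-score is a hypothetical caller.

```lisp
;; Assumed thresholds for illustration only.
(defparameter *max-ham-score* 0.4)
(defparameter *min-spam-score* 0.6)

(defun classification (score)
  (values
   (cond
     ((<= score *max-ham-score*) 'ham)
     ((>= score *min-spam-score*) 'spam)
     (t 'unsure))
   score))

(defun describe-score (score)
  "Hypothetical caller that binds both return values explicitly."
  (multiple-value-bind (kind raw-score) (classification score)
    (format nil "~a (score ~,2f)" kind raw-score)))
```

A caller that simply uses (classification 0.75d0) as an ordinary argument sees only the primary value, SPAM, which is why existing callers are unaffected; MULTIPLE-VALUE-BIND is needed to see the score as well.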
And now you can see what happens if you train the filter with some more ham text.
SPAM> (train "Do you have any money for the movies?" 'ham)
1
SPAM> (classify "Make money fast")
SPAM
0.7685351219857626D0
It's still spam but a bit less certain since money was seen in ham text.
SPAM> (classify "Want to go to the movies?")
HAM
0.17482223132078922D0
And now this is clearly recognizable ham thanks to the presence of the word movies, now a hammy feature.
However, you don't really want to train the filter by hand. What you'd really like is an easy way to point it at a bunch of files and train it on them. And if you want to test how well the filter actually works, you'd like to then use it to classify another set of files of known types and see how it does. So the last bit of code you'll write in this chapter will be a test harness that tests the filter on a corpus of messages of known types, using a certain fraction for training and then measuring how accurate the filter is when classifying the remainder.
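As a rough, hypothetical sketch of the idea only (not the harness this chapter builds), test-on-corpus below takes a list of (text . type) pairs, trains on the first fraction, and returns the fraction of the remaining messages classified correctly. It takes the training and classification functions as arguments (train-fn and classify-fn are assumptions) so it can run without the rest of the spam code, and it assumes the corpus has already been shuffled.

```lisp
(defun test-on-corpus (corpus training-fraction train-fn classify-fn)
  "Hypothetical harness: train on the first TRAINING-FRACTION of CORPUS,
a list of (text . type) conses, then return the fraction of the rest
that CLASSIFY-FN labels correctly, or NIL if nothing is left to test."
  (let* ((split (floor (* (length corpus) training-fraction)))
         (training (subseq corpus 0 split))
         (testing (nthcdr split corpus)))
    ;; Training phase: feed each message and its known type to TRAIN-FN.
    (loop for (text . kind) in training
          do (funcall train-fn text kind))
    ;; Testing phase: compare CLASSIFY-FN's primary value with the known type.
    (when testing
      (/ (count-if (lambda (entry)
                     (eq (funcall classify-fn (car entry)) (cdr entry)))
                   testing)
         (length testing)))))
```

With the chapter's functions, a call might look like (test-on-corpus corpus 9/10 #'train #'classify); only the primary value of classify is compared, so the extra score value does no harm.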