LibCat » Книги » Компьютеры и интернет » Программирование » Peter Siebel - Practical Common Lisp

Peter Siebel - Practical Common Lisp

Здесь есть возможность читать онлайн «Peter Siebel - Practical Common Lisp» весь текст электронной книги совершенно бесплатно (целиком полную версию без сокращений). В некоторых случаях можно слушать аудио, скачать через торрент в формате fb2 и присутствует краткое содержание. Год выпуска: 2005, ISBN: 2005, Издательство: Apress, Жанр: Программирование, на английском языке. Описание произведения, (предисловие) а так же отзывы посетителей доступны на портале библиотеки ЛибКат.

Читать книгу

Название:
Practical Common Lisp
Автор:
Peter Siebel
Издательство:
Apress
Жанр:
Программирование / на английском языке
Год:
2005
ISBN:
1-59059-239-5
Рейтинг книги:
4 / 5. Голосов: 1
Избранное:

Добавить в избранное
Отзывы:
Написать комментарий
Ваша оценка:
- 80
- 1
- 2
- 3
- 4
- 5

Practical Common Lisp: краткое содержание, описание и аннотация

Предлагаем к чтению аннотацию, описание, краткое содержание или предисловие (зависит от того, что написал сам автор книги «Practical Common Lisp»). Если вы не нашли необходимую информацию о книге — напишите в комментариях, мы постараемся отыскать её.

Practical Common Lisp — читать онлайн бесплатно полную книгу (весь текст) целиком

Ниже представлен текст книги, разбитый по страницам. Система сохранения места последней прочитанной страницы, позволяет с удобством читать онлайн бесплатно книгу «Practical Common Lisp», без необходимости каждый раз заново искать на чём Вы остановились. Поставьте закладку, и сможете в любой момент перейти на страницу, на которой закончили чтение.

Тёмная тема

Шрифт:

↓

↑

Сбросить

Интервал:

↓

↑

Закладка:

Сделать

• Execute the forms provided by any initiallyclauses—the prologue—in the order they appear in the loop.

• Iterate, executing the body of the loop as described in the next paragraph.

• Execute the forms provided by any finallyclauses—the epilogue—in the order they appear in the loop.

While the loop is iterating, the body is executed by first stepping any iteration control variables and then executing any conditional or unconditional execution, accumulation, or termination test clauses in the order they appear in the loop code. If any of the clauses in the loop body terminate the loop, the rest of the body is skipped and the loop returns, possibly after running the epilogue.

And that's pretty much all there is to it. [245] The one aspect of LOOP I haven't touched on at all is the syntax for declaring the types of loop variables. Of course, I haven't discussed type declarations outside of LOOP either. I'll cover the general topic a bit in Chapter 32. For information on how they work with LOOP , consult your favorite Common Lisp reference. You'll use LOOP fairly often in the code later in this book, so it's worth having some knowledge of it. Beyond that, it's up to you how much you use it.

And with that, you're ready to dive into the practical chapters that make up the rest of the book—up first, writing a spam filter.

23. Practical: A Spam Filter

In 2002 Paul Graham, having some time on his hands after selling Viaweb to Yahoo, wrote the essay "A Plan for Spam" [246] Available at http://www.paulgraham.com/spam.html and also in Hackers & Painters: Big Ideas from the Computer Age (O'Reilly, 2004) that launched a minor revolution in spam-filtering technology. Prior to Graham's article, most spam filters were written in terms of handcrafted rules: if a message has XXX in the subject, it's probably a spam; if a message has a more than three or more words in a row in ALL CAPITAL LETTERS, it's probably a spam. Graham spent several months trying to write such a rule-based filter before realizing it was fundamentally a soul-sucking task.

To recognize individual spam features you have to try to get into the mind of the spammer, and frankly I want to spend as little time inside the minds of spammers as possible.

To avoid having to think like a spammer, Graham decided to try distinguishing spam from nonspam, a.k.a. ham , based on statistics gathered about which words occur in which kinds of e-mails. The filter would keep track of how often specific words appear in both spam and ham messages and then use the frequencies associated with the words in a new message to compute a probability that it was either spam or ham. He called his approach Bayesian filtering after the statistical technique that he used to combine the individual word frequencies into an overall probability. [247] There has since been some disagreement over whether the technique Graham described was actually "Bayesian." However, the name has stuck and is well on its way to becoming a synonym for "statistical" when talking about spam filters.

The Heart of a Spam Filter

In this chapter, you'll implement the core of a spam-filtering engine. You won't write a soup-to-nuts spam-filtering application; rather, you'll focus on the functions for classifying new messages and training the filter.

This application is going to be large enough that it's worth defining a new package to avoid name conflicts. For instance, in the source code you can download from this book's Web site, I use the package name COM.GIGAMONKEYS.SPAM, defining a package that uses both the standard COMMON-LISPpackage and the COM.GIGAMONKEYS.PATHNAMESpackage from Chapter 15, like this:

(defpackage :com.gigamonkeys.spam

(:use :common-lisp :com.gigamonkeys.pathnames))

Any file containing code for this application should start with this line:

(in-package :com.gigamonkeys.spam)

You can use the same package name or replace com.gigamonkeyswith some domain you control. [248] It would, however, be poor form to distribute a version of this application using a package starting with com.gigamonkeys since you don't control that domain.

You can also type this same form at the REPL to switch to this package to test the functions you write. In SLIME this will change the prompt from CL-USER>to SPAM>like this:

CL-USER> (in-package :com.gigamonkeys.spam)

#

SPAM>

Once you have a package defined, you can start on the actual code. The main function you'll need to implement has a simple job—take the text of a message as an argument and classify the message as spam, ham, or unsure. You can easily implement this basic function by defining it in terms of other functions that you'll write in a moment.

(defun classify (text)

(classification (score (extract-features text))))

Reading from the inside out, the first step in classifying a message is to extract features to pass to the scorefunction. In scoreyou'll compute a value that can then be translated into one of three classifications—spam, ham, or unsure—by the function classification. Of the three functions, classificationis the simplest. You can assume scorewill return a value near 1 if the message is a spam, near 0 if it's a ham, and near .5 if it's unclear.

Thus, you can implement classificationlike this:

(defparameter *max-ham-score* .4)

(defparameter *min-spam-score* .6)

(defun classification (score)

(cond

((<= score *max-ham-score*) 'ham)

((>= score *min-spam-score*) 'spam)

(t 'unsure)))

The extract-featuresfunction is almost as straightforward, though it requires a bit more code. For the moment, the features you'll extract will be the words appearing in the text. For each word, you need to keep track of the number of times it has been seen in a spam and the number of times it has been seen in a ham. A convenient way to keep those pieces of data together with the word itself is to define a class, word-feature, with three slots.

(defclass word-feature ()

((word

:initarg :word

:accessor word

:initform (error "Must supply :word")

:documentation "The word this feature represents.")

(spam-count

:initarg :spam-count

:accessor spam-count

:initform 0

:documentation "Number of spams we have seen this feature in.")

(ham-count

:initarg :ham-count

:accessor ham-count

:initform 0

:documentation "Number of hams we have seen this feature in.")))

You'll keep the database of features in a hash table so you can easily find the object representing a given feature. You can define a special variable, *feature-database*, to hold a reference to this hash table.

(defvar *feature-database* (make-hash-table :test #'equal))

You should use DEFVAR rather than DEFPARAMETER because you don't want *feature-database*to be reset if you happen to reload the file containing this definition during development—you might have data stored in *feature-database*that you don't want to lose. Of course, that means if you do want to clear out the feature database, you can't just reevaluate the DEFVAR form. So you should define a function clear-database.