LibCat » Книги » Компьютеры и интернет » Программирование » Peter Siebel - Practical Common Lisp

Peter Siebel - Practical Common Lisp

Здесь есть возможность читать онлайн «Peter Siebel - Practical Common Lisp» весь текст электронной книги совершенно бесплатно (целиком полную версию без сокращений). В некоторых случаях можно слушать аудио, скачать через торрент в формате fb2 и присутствует краткое содержание. Год выпуска: 2005, ISBN: 2005, Издательство: Apress, Жанр: Программирование, на английском языке. Описание произведения, (предисловие) а так же отзывы посетителей доступны на портале библиотеки ЛибКат.

Читать книгу

Название:
Practical Common Lisp
Автор:
Peter Siebel
Издательство:
Apress
Жанр:
Программирование / на английском языке
Год:
2005
ISBN:
1-59059-239-5
Рейтинг книги:
4 / 5. Голосов: 1
Избранное:

Добавить в избранное
Отзывы:
Написать комментарий
Ваша оценка:
- 80
- 1
- 2
- 3
- 4
- 5

Practical Common Lisp: краткое содержание, описание и аннотация

Предлагаем к чтению аннотацию, описание, краткое содержание или предисловие (зависит от того, что написал сам автор книги «Practical Common Lisp»). Если вы не нашли необходимую информацию о книге — напишите в комментариях, мы постараемся отыскать её.

Practical Common Lisp — читать онлайн бесплатно полную книгу (весь текст) целиком

Ниже представлен текст книги, разбитый по страницам. Система сохранения места последней прочитанной страницы, позволяет с удобством читать онлайн бесплатно книгу «Practical Common Lisp», без необходимости каждый раз заново искать на чём Вы остановились. Поставьте закладку, и сможете в любой момент перейти на страницу, на которой закончили чтение.

Тёмная тема

Шрифт:

↓

↑

Сбросить

Интервал:

↓

↑

Закладка:

Сделать

To get the next octet, you'd use a byte specifier of (byte 8 8)like this:

(ldb (byte 8 8) #xabcd) ==> 171 ; 171 is #xab

You can use LDB with SETF to set the specified bits of an integer stored in a SETF able place.

CL-USER> (defvar *num* 0)

*NUM*

CL-USER> (setf (ldb (byte 8 0) *num*) 128)

128

CL-USER> *num*

128

CL-USER> (setf (ldb (byte 8 8) *num*) 255)

255

CL-USER> *num*

65408

Thus, you can also write read-u2like this: [264] Common Lisp also provides functions for shifting and masking the bits of integers in a way that may be more familiar to C and Java programmers. For instance, you could write read-u2 yet a third way, using those functions, like this: (defun read-u2 (in) (logior (ash (read-byte in) 8) (read-byte in))) which would be roughly equivalent to this Java method: public int readU2 (InputStream in) throws IOException { return (in.read() << 8) | (in.read()); } The names LOGIOR and ASH are short for LOGical Inclusive OR and Arithmetic SHift . ASH shifts an integer a given number of bits to the left when its second argument is positive or to the right if the second argument is negative. LOGIOR combines integers by logically or ing each bit. Another function, LOGAND , performs a bitwise and , which can be used to mask off certain bits. However, for the kinds of bit twiddling you'll need to do in this chapter and the next, LDB and BYTE will be both more convenient and more idiomatic Common Lisp style.

(defun read-u2 (in)

(let ((u2 0))

(setf (ldb (byte 8 8) u2) (read-byte in))

(setf (ldb (byte 8 0) u2) (read-byte in))

u2))

To write a number out as a 16-bit integer, you need to extract the individual 8-bit bytes and write them one at a time. To extract the individual bytes, you just need to use LDB with the same byte specifiers.

(defun write-u2 (out value)

(write-byte (ldb (byte 8 8) value) out)

(write-byte (ldb (byte 8 0) value) out))

Of course, you can also encode integers in many other ways—with different numbers of bytes, with different endianness, and in signed and unsigned format.

Strings in Binary Files

Textual strings are another kind of primitive data type you'll find in many binary formats. When you read files one byte at a time, you can't read and write strings directly—you need to decode and encode them one byte at a time, just as you do with binary-encoded numbers. And just as you can encode an integer in several ways, you can encode a string in many ways. To start with, the binary format must specify how individual characters are encoded.

To translate bytes to characters, you need to know both what character code and what character encoding you're using. A character code defines a mapping from positive integers to characters. Each number in the mapping is called a code point . For instance, ASCII is a character code that maps the numbers from 0-127 to particular characters used in the Latin alphabet. A character encoding, on the other hand, defines how the code points are represented as a sequence of bytes in a byte-oriented medium such as a file. For codes that use eight or fewer bits, such as ASCII and ISO-8859-1, the encoding is trivial—each numeric value is encoded as a single byte.

Nearly as straightforward are pure double-byte encodings, such as UCS-2, which map between 16-bit values and characters. The only reason double-byte encodings can be more complex than single-byte encodings is that you may also need to know whether the 16-bit values are supposed to be encoded in big-endian or little-endian format.

Variable-width encodings use different numbers of octets for different numeric values, making them more complex but allowing them to be more compact in many cases. For instance, UTF-8, an encoding designed for use with the Unicode character code, uses a single octet to encode the values 0-127 while using up to four octets to encode values up to 1,114,111. [265] Originally, UTF-8 was designed to represent a 31-bit character code and used up to six bytes per code point. However, the maximum Unicode code point is #x10ffff , so a UTF-8 encoding of Unicode requires at most four bytes per code point.

Since the code points from 0-127 map to the same characters in Unicode as they do in ASCII, a UTF-8 encoding of text consisting only of characters also in ASCII is the same as the ASCII encoding. On the other hand, texts consisting mostly of characters requiring four bytes in UTF-8 could be more compactly encoded in a straight double-byte encoding.

Common Lisp provides two functions for translating between numeric character codes and character objects: CODE-CHAR , which takes an numeric code and returns as a character, and CHAR-CODE , which takes a character and returns its numeric code. The language standard doesn't specify what character encoding an implementation must use, so there's no guarantee you can represent every character that can possibly be encoded in a given file format as a Lisp character. However, almost all contemporary Common Lisp implementations use ASCII, ISO-8859-1, or Unicode as their native character code. Because Unicode is a superset ofISO-8859-1, which is in turn a superset of ASCII, if you're using a Unicode Lisp, CODE-CHAR and CHAR-CODE can be used directly for translating any of those three character codes. [266] If you need to parse a file format that uses other character codes, or if you need to parse files containing arbitrary Unicode strings using a non-Unicode-Common-Lisp implementation, you can always represent such strings in memory as vectors of integer code points. They won't be Lisp strings, so you won't be able to manipulate or compare them with the string functions, but you'll still be able to do anything with them that you can with arbitrary vectors.

In addition to specifying a character encoding, a string encoding must also specify how to encode the length of the string. Three techniques are typically used in binary file formats.

The simplest is to not encode it but to let it be implicit in the position of the string in some larger structure: a particular element of a file may always be a string of a certain length, or a string may be the last element of a variable-length data structure whose overall size determines how many bytes are left to read as string data. Both these techniques are used in ID3 tags, as you'll see in the next chapter.

The other two techniques can be used to encode variable-length strings without relying on context. One is to encode the length of the string followed by the character data—the parser reads an integer value (in some specified integer format) and then reads that number of characters. Another is to write the character data followed by a delimiter that can't appear in the string such as a null character.

The different representations have different advantages and disadvantages, but when you're dealing with already specified binary formats, you won't have any control over which encoding is used. However, none of the encodings is particularly more difficult to read and write than any other. Here, as an example, is a function that reads a null-terminated ASCII string, assuming your Lisp implementation uses ASCII or one of its supersets such as ISO-8859-1 or full Unicode as its native character encoding: