//: C04:HTMLStripper2.cpp
//{L} ../C03/ReplaceAll
// Filter to remove html tags and markers
#include <cstddef>
#include <cstdlib>
#include <fstream>
#include <iostream>
#include <sstream>
#include <stdexcept>
#include <string>
#include "../require.h"
using namespace std;

// Defined in ../C03/ReplaceAll (linked in by the {L} line above):
// replaces every occurrence of "from" in "context" with "to".
string& replaceAll(string& context, const string& from,
  const string& to);

// Erases every <...> tag, then decodes a few character entities.
// Throws runtime_error if a '<' has no matching '>'.
string& stripHTMLTags(string& s) {
  size_t leftPos;
  while ((leftPos = s.find('<')) != string::npos) {
    size_t rightPos = s.find('>', leftPos+1);
    if (rightPos == string::npos) {
      ostringstream msg;
      msg << "Incomplete HTML tag starting in position "
          << leftPos;
      throw runtime_error(msg.str());
    }
    s.erase(leftPos, rightPos - leftPos + 1);
  }
  // Replace special HTML character entities with the characters they denote
  replaceAll(s, "&lt;", "<");
  replaceAll(s, "&gt;", ">");
  replaceAll(s, "&amp;", "&");
  replaceAll(s, "&nbsp;", " ");
  // Etc...
  return s;
}

int main(int argc, char* argv[]) {
  requireArgs(argc, 1,
    "usage: HTMLStripper2 InputFile");
  ifstream in(argv[1]);
  assure(in, argv[1]);
  // Read entire file into string; then strip
  ostringstream ss;
  ss << in.rdbuf();
  try {
    string s = ss.str();
    cout << stripHTMLTags(s) << endl;
    return EXIT_SUCCESS;
  }
  catch (const runtime_error& x) {
    cout << x.what() << endl;
    return EXIT_FAILURE;
  }
} ///:~
This program reads the entire file into a string by inserting the result of calling rdbuf( ) on the file stream into an ostringstream. With the whole file in one string, it is easy to search for HTML delimiter pairs and erase them. We no longer need to worry about tags that cross line boundaries, which the previous version in Chapter 3 had to handle.
The following example shows how to use a bidirectional (that is, read/write) string stream.
//: C04:StringSeeking.cpp
// Reads and writes a string stream
//{-bor}
#include <cassert>
#include <sstream>
#include <string>
using namespace std;

int main() {
  string text = "We will sell no wine";
  stringstream ss(text);
  // Move the put pointer to the end so the insertion appends
  ss.seekp(0, ios::end);
  ss << " before its time.";
  assert(ss.str() ==
    "We will sell no wine before its time.");
  // Change "sell" to "ship" (position 9 is the 'e' in "sell")
  ss.seekg(9, ios::beg);
  string word;
  ss >> word;
  assert(word == "ell");
  ss.seekp(9, ios::beg);
  ss << "hip";
  // Change "wine" to "code" (position 16 is the 'w' in "wine")
  ss.seekg(16, ios::beg);
  ss >> word;
  assert(word == "wine");
  ss.seekp(16, ios::beg);
  ss << "code";
  assert(ss.str() ==
    "We will ship no code before its time.");
  // Replace the entire contents of the underlying buffer
  ss.str("A horse of a different color.");
  assert(ss.str() == "A horse of a different color.");
} ///:~
As always, you call seekp( ) to move the put pointer and seekg( ) to reposition the get pointer. This example always repositions explicitly, but string streams are a little more forgiving than file streams: you can switch from reading to writing, or vice versa, at any time. You don’t need to reposition the get or put pointers or flush the stream first. This program also illustrates the overload of str( ) that replaces the contents of the stream’s underlying stringbuf with a new string.
The goal of the iostreams design is to let you easily move and format characters. It wouldn’t be very useful if you couldn’t do most of the formatting provided by C’s printf( ) family of functions. In this section, you’ll learn about all the output formatting functions available for iostreams, so you can format your bytes the way you want them.
The formatting functions in iostreams can be somewhat confusing at first because there’s often more than one way to control the formatting: through both member functions and manipulators. To further confuse things, a generic member function, setf( ), sets state flags that control formatting choices. These include left or right justification, uppercase letters for hex notation, and always showing a decimal point for floating-point values. On the other hand, separate member functions set and read values for the fill character, the field width, and the precision.
To clarify all this, we’ll first examine the internal formatting data of an iostream, along with the member functions that can modify that data. (Everything can be controlled through the member functions, if desired.) We’ll cover the manipulators separately.
The class ios contains data members that store all the formatting information for a stream. Some of this data takes a range of values and is stored in variables: the floating-point precision, the output field width, and the character used to pad the output (normally a space). The rest of the formatting is determined by flags. These are usually combined to save space and are referred to collectively as the format flags. You can read the format flags with the ios::flags( ) member function. It takes no arguments and returns an object of type fmtflags containing the current format flags. fmtflags is a bitmask type whose exact definition is implementation-defined; older libraries typically used long. Each of the remaining functions changes the format flags and returns their previous value:
fmtflags ios::flags(fmtflags newflags);
fmtflags ios::setf(fmtflags ored_flag);
fmtflags ios::unsetf(fmtflags clear_flag);
fmtflags ios::setf(fmtflags bits, fmtflags field);
The first function replaces all the flags at once, which you sometimes need to do. More often, you change one flag at a time using the remaining three functions.
The use of setf( ) can seem somewhat confusing. To know which overloaded version to use, you must know what type of flag you’re changing. There are two types of flags: those that are simply on or off, and those that work in a group with other flags. The on/off flags are the simplest to understand because you turn them on with setf(fmtflags) and off with unsetf(fmtflags). These flags are shown in the following table.
on/off flag | Effect
ios::skipws | Skip white space. (For input; this is the default.)
ios::showbase | Indicate the numeric base (as set, for example, by dec, oct, or hex) when printing an integral value. (On input, whether a base prefix such as 0x is recognized depends on the basefield setting, not on showbase.)
ios::showpoint | Show decimal point and trailing zeros for floating-point values.
ios::uppercase | Display uppercase A-F for hexadecimal values and E for scientific values.
ios::showpos | Show plus sign (+) for positive values.
ios::unitbuf | "Unit buffering." The stream is flushed after each insertion.
For example, to show the plus sign for cout, you say cout.setf(ios::showpos). To stop showing the plus sign, you say cout.unsetf(ios::showpos).
The unitbuf flag controls unit buffering, which means that each insertion is flushed to its output stream immediately. This is handy for error tracing: if the program crashes, everything inserted before the crash has already been written to the log file. The following program illustrates unit buffering.
//: C04:Unitbuf.cpp