"" // Must also test the empty string
};
class TrimTest : public TestSuite::Test {
public:
  void testTrim() {
    test_(trim(s[0]) == "abcdefghijklmnop");
    test_(trim(s[1]) == "abcdefghijklmnop");
    test_(trim(s[2]) == "abcdefghijklmnop");
    test_(trim(s[3]) == "a");
    test_(trim(s[4]) == "ab");
    test_(trim(s[5]) == "abc");
    test_(trim(s[6]) == "a b c");
    test_(trim(s[7]) == "a b c");
    test_(trim(s[8]) == "a \t b \t c");
    test_(trim(s[9]) == "");
    test_(trim(s[10]) == "");
  }
  void run() {
    testTrim();
  }
};
int main() {
  TrimTest t;
  t.run();
  return t.report();
} ///:~
In the array of strings, you can see that the character arrays are automatically converted to string objects. This array provides cases to check the removal of spaces and tabs from both ends, as well as ensuring that spaces and tabs are not removed from the middle of a string.
Removing characters from strings
Removing characters is easy and efficient with the erase( ) member function, which takes two arguments: where to start removing characters (which defaults to 0) and how many to remove (which defaults to string::npos). If you specify more characters than remain in the string, the remaining characters are all erased anyway (so calling erase( ) without any arguments removes all characters from a string). Sometimes it's useful to take an HTML file and strip its tags and special characters so that you have something approximating the text that would be displayed in the Web browser, only as a plain text file. The following uses erase( ) to do the job:
//: C03:HTMLStripper.cpp
//{L} ReplaceAll
// Filter to remove html tags and markers
#include <cstddef>
#include <fstream>
#include <iostream>
#include <string>
#include "../require.h"
using namespace std;
string& replaceAll(string& context, const string& from,
  const string& to);
string& stripHTMLTags(string& s) {
  static bool inTag = false;
  bool done = false;
  while (!done) {
    if (inTag) {
      // The previous line started an HTML tag
      // but didn't finish. Must search for '>'.
      size_t rightPos = s.find('>');
      if (rightPos != string::npos) {
        inTag = false;
        s.erase(0, rightPos + 1);
      }
      else {
        done = true;
        s.erase();
      }
    }
    else {
      // Look for start of tag:
      size_t leftPos = s.find('<');
      if (leftPos != string::npos) {
        // See if tag close is in this line. Search from
        // leftPos so a stray '>' before the tag is ignored.
        size_t rightPos = s.find('>', leftPos);
        if (rightPos == string::npos) {
          inTag = done = true;
          s.erase(leftPos);
        }
        else
          s.erase(leftPos, rightPos - leftPos + 1);
      }
      else
        done = true;
    }
  }
  // Remove all special HTML characters
  replaceAll(s, "&lt;", "<");
  replaceAll(s, "&gt;", ">");
  replaceAll(s, "&amp;", "&");
  replaceAll(s, "&nbsp;", " ");
  // Etc...
  return s;
}
int main(int argc, char* argv[]) {
  requireArgs(argc, 1,
    "usage: HTMLStripper InputFile");
  ifstream in(argv[1]);
  assure(in, argv[1]);
  string s;
  while(getline(in, s))
    if (!stripHTMLTags(s).empty())
      cout << s << endl;
} ///:~
This example will even strip HTML tags that span multiple lines. This is accomplished with the static flag, inTag, which is true whenever the start of a tag is found but the accompanying tag end is not found in the same line. (To keep the exposition simple, this version does not handle nested tags, such as comments.) All forms of erase( ) appear in the stripHTMLTags( ) function. It is tempting to use mathematics here to factor out some of these calls to erase( ), but since in some cases one of the operands is string::npos (the largest value the unsigned size type can hold), integer overflow occurs and wrecks the algorithm.
The version of getline( ) we use here is a global function declared in the <string> header and is handy because it stores an arbitrarily long line in its string argument. You don't have to worry about the dimension of a character array as you do with istream::getline( ). Notice that this program uses the replaceAll( ) function from earlier in this chapter. In the next chapter, we'll use string streams to create a more elegant solution.
Comparing strings
Comparing strings is inherently different from comparing numbers. Numbers have constant, universally meaningful values. To evaluate the relationship between the magnitudes of two strings, you must make a lexical comparison. Lexical comparison means that when you test a character to see if it is "greater than" or "less than" another character, you are actually comparing the numeric representation of those characters as specified in the collating sequence of the character set being used. Most often this will be the ASCII collating sequence, which assigns the printable characters for the English language numbers in the range 32 through 126 decimal. In the ASCII collating sequence, the first "character" in the list is the space, followed by several common punctuation marks, then the digits, and then uppercase and lowercase letters. With respect to the alphabet, this means that the letters nearer the front have lower ASCII values than those nearer the end. With these details in mind, it becomes easier to remember that when a lexical comparison reports that s1 is "greater than" s2, it simply means that when the two were compared, the first differing character in s1 came later in the collating sequence than the character in that same position in s2.
C++ provides several ways to compare strings, and each has advantages. The simplest to use are the nonmember, overloaded operator functions: operator ==, operator !=, operator >, operator <, operator >=, and operator <=.
//: C03:CompStr.cpp
//{L} ../TestSuite/Test
#include <string>
#include "../TestSuite/Test.h"
using namespace std;
class CompStrTest : public TestSuite::Test {
public:
  void run() {
    // Strings to compare
    string s1("This");
    string s2("That");
    test_(s1 == s1);
    test_(s1 != s2);
    test_(s1 > s2);
    test_(s1 >= s2);
    test_(s1 >= s1);
    test_(s2 < s1);
    test_(s2 <= s1);
    test_(s1 <= s1);
  }
};
int main() {
  CompStrTest t;
  t.run();
  return t.report();
} ///:~
The overloaded comparison operators are useful for comparing both full strings and individual string character elements.
Notice in the following code fragment the flexibility of argument types on both the left and right side of the comparison operators. For efficiency, the string class provides overloaded operators for the direct comparison of string objects, quoted literals, and pointers to C-style strings without having to create temporary string objects.
// The left operand is a quoted literal and
// the right operand is a string
if("That" == s2)
  cout << "A match" << endl;
// The left operand below is a string and the right is a
// pointer to a C-style null terminated string
if(s1 != s2.c_str())
  cout << "No match" << endl;
The c_str( ) function returns a const char* that points to a C-style, null-terminated string equivalent to the contents of the string object. This comes in handy when you want to pass a string to a standard C function, such as atoi( ) or any of the functions defined in the <cstring> header. It is an error to use the value returned by c_str( ) as a non-const argument to any function.