No More String Errors - String(C++)


The following functions have been included for users who are especially interested in efficiency and are willing to understand some implementation details in order to achieve it. Note, however, that these implementation details are not officially part of our contract with users; if, in the future, we come up with a better way to implement Strings, these functions may cease to have any effect on programs that use them (they will become harmless no-ops).

Strings are implemented using a ``copy-on-write'' scheme. This means that memory is shared where possible and copies are performed only when absolutely necessary. For example, assigning one String to another simply increases a reference count and does not do any copying. If a change is subsequently made to either of the Strings, the copy occurs at that time. The advantage to this scheme is that a copy can often be avoided altogether, for example, when passing and returning Strings to and from functions. One disadvantage is that a copy can happen at a time when the user does not expect it, for example, when appending to a String.
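The scheme can be sketched with a small toy class. This is only an illustration under stated assumptions: CowString, append, shares_with and detach are hypothetical names, and std::shared_ptr stands in for whatever reference counting the String package uses internally.

```cpp
#include <cassert>
#include <memory>
#include <string>

// Hypothetical toy class, not the String implementation described here.
class CowString {
public:
    explicit CowString(const char* s)
        : buf_(std::make_shared<std::string>(s)) {}

    // The compiler-generated copy constructor and assignment copy only
    // the shared_ptr, so they bump a reference count and copy no characters.

    void append(const char* s) {
        detach();             // copy first if the buffer is shared
        buf_->append(s);
    }

    const std::string& str() const { return *buf_; }
    bool shares_with(const CowString& o) const { return buf_ == o.buf_; }

private:
    void detach() {
        if (buf_.use_count() > 1)
            buf_ = std::make_shared<std::string>(*buf_);
    }
    std::shared_ptr<std::string> buf_;
};

// Small usage example: the copy is deferred until the first modification.
void cow_demo() {
    CowString a("hello");
    CowString b = a;               // shares storage with a
    assert(a.shares_with(b));
    b.append(" world");            // the deferred copy happens here
    assert(!a.shares_with(b));
    assert(a.str() == "hello");
}
```

Note that the cost of the copy lands inside append, which is the surprise mentioned above.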

Another situation in which an unexpected copy may take place is when more memory is needed to perform an operation. When space is first allocated for a String, a little extra is grabbed so that operations which lengthen the String can be done without going back for more memory or copying the String. If the amount of space which has been allocated for a String is less than that which will be needed for any particular operation, however, new space is allocated and the contents of the String are copied.
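The effect of running out of allocated space can be observed with the standard std::string, used here only as a stand-in for String. The helper count_regrowths is a hypothetical name; it counts how often the capacity changes while characters are appended, each change standing for a new allocation plus a copy.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Append n characters one at a time and count capacity changes.
// If reserve_first is true, all the space is requested up front.
std::size_t count_regrowths(std::size_t n, bool reserve_first) {
    std::string s;
    if (reserve_first)
        s.reserve(n);
    std::size_t regrowths = 0;
    std::size_t cap = s.capacity();
    for (std::size_t i = 0; i < n; ++i) {
        s.push_back('x');
        if (s.capacity() != cap) {   // storage was reallocated
            ++regrowths;
            cap = s.capacity();
        }
    }
    return regrowths;
}

// Usage example: reserving up front avoids every regrowth.
void regrowth_demo() {
    assert(count_regrowths(10000, true) == 0);
    assert(count_regrowths(10000, false) > 0);
}
```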

The String package provides a set of functions which help the user avoid some of these surprises and determine to some extent when copying takes place. The following operations are defined for users:

void String::uniq();

If s shares memory with another String, s.uniq() allocates new space for the String and copies its contents to this space. If s does not share memory with another String, it does nothing. A user might call uniq() immediately after assigning one String to another to force the copy to happen at the assignment, rather than at some later and less predictable point.
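The behavior described for uniq() can be modeled as follows. SharedText is a hypothetical stand-in for a reference-counted String, and its uniq() only mirrors the description above; it is not the library's code.

```cpp
#include <cassert>
#include <memory>
#include <string>

// Hypothetical stand-in for a reference-counted string handle.
struct SharedText {
    std::shared_ptr<std::string> buf;

    explicit SharedText(const char* s)
        : buf(std::make_shared<std::string>(s)) {}

    bool shared() const { return buf.use_count() > 1; }

    // Modeled on the description of uniq(): copy only when shared.
    void uniq() {
        if (shared())
            buf = std::make_shared<std::string>(*buf);
    }
};

// Usage example: pay for the copy right after the assignment.
void uniq_demo() {
    SharedText a("abc");
    SharedText b = a;      // shares storage with a
    assert(b.shared());
    b.uniq();              // the copy happens here, at a known point
    assert(!a.shared() && !b.shared());
    const std::string* before = b.buf.get();
    b.uniq();              // already unique: nothing happens
    assert(b.buf.get() == before);
}
```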

unsigned String::max() const;

s.max() reports the amount of space available for use by that String. This operation can be used to check the amount of space available before an operation.

void String::reserve(Stringsize(int));

s.reserve() provides a way to reserve space for a String temporarily. At least as much space as requested is reserved for the String. The contents of the String remain unchanged. Important: The space reserved for a String s is guaranteed to be available only until the next time s is assigned to another String, or until another String (or an expression whose value is a String) is assigned to it.
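A typical pattern is to check the available space and reserve more only when needed. The sketch below uses std::string as an analogue: capacity() plays the role of max() and std::string::reserve plays the role of String::reserve. The helper append_with_room is a hypothetical name, and the rule about reservations lapsing on assignment is specific to String and is not modeled here.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Append extra to s, reserving room first if the current capacity is short.
// Returns true if a reservation was needed.
bool append_with_room(std::string& s, const std::string& extra) {
    const std::size_t needed = s.size() + extra.size();
    bool reserved = false;
    if (s.capacity() < needed) {   // analogue of checking s.max()
        s.reserve(needed);         // analogue of s.reserve()
        reserved = true;
    }
    s += extra;                    // fits in the space already allocated
    return reserved;
}

// Usage example.
void reserve_demo() {
    std::string s = "abc";
    assert(append_with_room(s, std::string(1000, 'y')));
    assert(s.size() == 1003);

    std::string t;
    t.reserve(64);
    assert(!append_with_room(t, "hi"));   // room was already there
    assert(t == "hi");
}
```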

Unsafe Programming with Strings

The functions described above are completely safe; they do not expose your programs to the risk of mysterious core dumps. In order to achieve maximum efficiency with Strings, however, it is sometimes necessary to enter into the realm of unsafe programming. The programming technique described in this section is unsafe because it involves casting away a constant in order to directly manipulate the internal String representation. Unsafe programming can only be justified in the innermost loops of time-critical programs. It is risky even when you know it ``works correctly'' because it may no longer do so if the String implementation changes.

We saw in the section, ``Basic Operations'', how the conversion from String to const char* allows a C++ programmer to pass Strings to C functions that expect null-terminated character arrays. The conversion works by providing read-only access to the String's internal storage area. What if you were to cast this pointer to a non-constant pointer?

       String x;
       char* p = (char*)(const char*)x;

This would give you unlimited access to the String's internal storage, permitting an unsafe operation like this:

       strcat(p, "xxxx");

If you are lucky, this will dump core. If you are unlucky, it will have a subtle side effect that doesn't show up until much later, making the problem extremely hard to diagnose. In short, when you program unsafely, you must have a solid understanding of String internals.

If x shares storage with other Strings, then an unsafe modification to x will change the value of all such Strings as a side effect. You must therefore make sure that the storage for the value of x is not shared:

       x.uniq();
       char* p = (char*)(const char*)x;
       strcat(p, "xxxx");

Unfortunately, this is not enough. You must also guarantee that the internal storage of x is large enough to hold the extra five characters written by strcat (four x's plus the null character). We will use the function reserve to increase the String's capacity:

       x.reserve(x.length() + 5);
       char* p = (char*)(const char*)x;
       strcat(p, "xxxx");

We still have a problem. If we print x, it will appear not to have changed at all! The problem is that x doesn't "know" that its length has changed. To adjust the length of x to the correct value, we use pad:

       x.reserve(x.length() + 5);
       char* p = (char*)(const char*)x;
       strcat(p, "xxxx");
       // x.pad is then called to bring the recorded length up to date
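Why the length goes stale can be seen in a toy model in which the storage and the recorded length are tracked separately. Everything here is hypothetical: ToyBuffer is not the String class, raw() stands in for the cast, and set_length() stands in for the length adjustment (the actual arguments of pad are not reproduced here).

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

// Hypothetical toy buffer with a length field kept apart from the storage.
class ToyBuffer {
public:
    explicit ToyBuffer(const char* s)
        : len_(std::strlen(s)), store_(len_ + 1) {
        std::memcpy(store_.data(), s, len_ + 1);
    }
    // Make sure at least n bytes of storage exist (contents unchanged).
    void reserve(std::size_t n) {
        if (store_.size() < n)
            store_.resize(n);
    }
    std::size_t length() const { return len_; }
    char* raw() { return store_.data(); }          // stands in for the cast
    void set_length(std::size_t n) { len_ = n; }   // stands in for the fix

private:
    std::size_t len_;          // declared first: initialized before store_
    std::vector<char> store_;
};

// Usage example: writing behind the object's back leaves length() stale.
void stale_length_demo() {
    ToyBuffer x("abc");
    x.reserve(x.length() + 5);            // four x's plus the null
    std::strcat(x.raw(), "xxxx");
    assert(x.length() == 3);              // the object has not noticed
    x.set_length(std::strlen(x.raw()));
    assert(x.length() == 7);
}
```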

If the above seems artificial, consider a practical example:

   void readstring(String& x, istream& is) {
       //  Read the next line into String x
       const int BLOCKSIZE = 128;
       int current_size = 0;
       x.uniq();
           // make sure the storage is not shared
       do {
           x.reserve(current_size + BLOCKSIZE);
               // make room for one more block
               // before taking the pointer
           char* p = (char*)(const char*)x
               + current_size;
           is.getline(p, BLOCKSIZE, '\n');
               // see iostream(C++) --
               // this reads a line
           current_size += is.gcount();
               // see iostream(C++) --
               // number of chars read
       } while (is.gcount() == BLOCKSIZE-1);
           // break out of the loop when a block
           // that is not full has been read
       // adjust the length of the String here,
       // using pad as described above
   }

As an exercise, try writing a ``safe'' version of readstring that is as efficient as this one.
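One possible starting point for the exercise is sketched below. It is written against the standard std::string and std::istream rather than String, so it shows only the shape of a safe version; readstring_safe is a hypothetical name. It reads each block into a local array and appends it, which costs one extra copy per block, so no claim is made that it matches the unsafe version's efficiency.

```cpp
#include <cassert>
#include <istream>
#include <sstream>
#include <string>

// Read the next line into x without touching x's internal storage.
void readstring_safe(std::string& x, std::istream& is) {
    const std::streamsize BLOCKSIZE = 128;
    char block[BLOCKSIZE];
    x.clear();
    for (;;) {
        block[0] = '\0';
        is.getline(block, BLOCKSIZE, '\n');
        x.append(block);   // getline null-terminates what it stored
        // A full block with no newline sets failbit (but not eofbit)
        // and leaves gcount() at BLOCKSIZE-1; only then keep reading.
        const bool block_full =
            is.fail() && !is.eof() && is.gcount() == BLOCKSIZE - 1;
        if (!block_full)
            break;
        is.clear();        // reset failbit so the next getline can proceed
    }
}

// Usage example: two successive lines from an in-memory stream.
void readstring_demo() {
    std::istringstream in("first\nsecond");
    std::string line;
    readstring_safe(line, in);
    assert(line == "first");
    readstring_safe(line, in);
    assert(line == "second");
}
```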

© 2005 The SCO Group, Inc. All rights reserved.
SCO OpenServer Release 6.0.0 -- 02 June 2005