A Class for Efficient Key Lookup in C++ - Symbol(C++)

Remembering the hash

The problem with the above declaration of Symbol_table is that, in a compiler, an individual symbol is typically looked up in more than one symbol table. (Consider searching for a symbol in all scopes nesting a given program point, where each scope has its own symbol table containing the declarations in that scope.) The sequence of calls to the lookup() routine is inefficient, since each call to lookup() will rehash the symbol. (Hashing is not a particularly fast operation.) Having an efficient symbol table lookup routine is critical to the speed of a compiler, which typically spends much of its time looking up symbols. This problem can be solved by caching the hash value of a symbol, and passing it around with the symbol. Here is a class that does just that:

       class Hstring {
           String s;
           unsigned long hash;
           Hstring(const char* s_) : s(s_) {
               hash = s.hashval();
           unsigned long hashval() const {
               return hash;
           // ...

Hstring (short for ``hashed string'') hashes the string once, upon construction. Users of Hstring use the cached value, rather than re-hashing. Hstring can also provide whatever String-like operations the user wishes; notice, however, the operations that change the string must recompute the hash. As an aside, notice that we did not make Hstring a (publically) derived class of String. This would imply that Hstring is a subtype of String, which it is not. If Hstring were derived from String, we could do the following:

       // warning: hypothetical, "what-if" code
       // assume Hstring is derived from String
       void f(String& s) {
           s = "bar";
       main() {
           Hstring s("foo");
           // ...

This has the effect of changing the string the Hstring represents without updating the cached hash value. A symbol table using Hstring would look like this:

       class Symbol_table {
           void enter(const Hstring& s, Info* i);
           Info* lookup(const Hstring& s);
           // ...

Now searching for an individual symbol in more than one symbol table will not result in repeated hashes.

Next topic: A perfect hash
Previous topic: Symbol tables

© 2005 The SCO Group, Inc. All rights reserved.
SCO OpenServer Release 6.0.0 -- 02 June 2005