Perform an initialization the first time it is needed.
Initializing data structures lazily is a common technique. Not only does it avoid the cost of initializing unused data structures, it is often a more convenient way to structure a program.
Threads share access to an object.
The object should not be created until the first access.
The second force covers several possible motivations:
The object is expensive to create and creating it early would slow down program startup.
It is not used in every run of the program.
Early initialization would require adding code where it is undesirable for readability or structural reasons.
A parallel solution is substantially trickier, because it must deal with several concurrency issues.
Races: If two threads attempt to simultaneously access to the object for the first time, and thus cause creation of the object, the race must be resolved in a way that both threads end up with a reference to the same object of type T.
Memory leaks: In the event of a race, the implementation must ensure that any extra transient T objects are cleaned up.
Memory consistency: If thread X executes value=new T(), all other threads must see stores by new T() occur before the assignment value=.
Deadlock: What if the constructor of T() requires acquiring a lock, but the current holder of that lock is also racing to access the object for the first time?
There are two solutions. One is based on double-check locking. The other relies on compare-and-swap. Because the tradeoffs and issues are subtle, most of the discussion is in the following examples section.
An Intel® Threading Building Blocks (Intel® TBB) implementation of the "double-check" pattern is shown below.
template<typename T, typename Mutex=tbb::mutex>
class lazy {
   tbb::atomic<T*> value;
   Mutex mut;
public:
   lazy() : value() {}                    // Initializes value to NULL
   ~lazy() {delete value;}
   T& get() {
       if( !value ) {                     // Read of value has acquire semantics. 
           Mutex::scoped_lock lock(mut);
           if( !value ) value = new T();. // Write of value has release semantics 
       }
       return *value;
   }
}; 
		The name comes from the way that the pattern deals with races. There is one check done without locking and one check done after locking. The first check handles the presumably common case that the initialization has already been done, without any locking. The second check deals with cases where two threads both see an uninitialized value, and both try to acquire the lock. In that case, the second thread to acquire the lock will see that the initialization has already occurred.
If T() throws an exception, the solution is correct because value will still be NULL and the mutex unlocked when object lock is destroyed.
The solution correctly addresses memory consistency issues. A write to a tbb::atomic value has release semantics, which means that all of its prior writes will be seen before the releasing write. A read from tbb::atomic value has acquire semantics, which means that all of its subsequent reads will happen after the acquiring read. Both of these properties are critical to the solution. The releasing write ensures that the construction of T() is seen to occur before the assignment to value. The acquiring read ensures that when the caller reads from *value, the reads occur after the "if(!value)" check. The release/acquire is essentially the Fenced Data Transfer pattern, where the "message" is the fully constructed instance T(), and the "ready" flag is the pointer value.
The solution described involves blocking threads while initialization occurs. Hence it can suffer the usual pathologies associated with blocking. For example, if the thread first acquires the lock is suspended by the OS, all other threads will have to wait until that thread resumes. A lock-free variation avoids this problem by making all contending threads attempt initialization, and atomically deciding which attempt succeeds.
An Intel® TBB implementation of the non-blocking variant follows. It also uses double-check, but without a lock.
template<typename T>
class lazy {
   tbb::atomic<T*> value;
public:
   lazy() : value() {}                    // Initializes value to NULL
   ~lazy() {delete value;}
   T& get() {
       if( !value ) {
           T* tmp = new T();
           if( value.compare_and_swap(tmp,NULL)!=NULL )
               // Another thread installed the value, so throw away mine.
               delete tmp;
       }
       return *value;
   }
}; 
		The second check is performed by the expression value.compare_and_swap(tmp,NULL)!=NULL, which conditionally assigns value=tmp if value==NULL, and returns true if the old value was NULL. Thus if multiple threads attempt simultaneous initialization, the first thread to execute the compare_and_swap will set value to point to its T object. Other contenders that execute the compare_and_swap will get back a non-NULL pointer, and know that they should delete their transient T objects.
As with the locking solution, memory consistency issues are addressed by the semantics of tbb::atomic. The first check has acquire semantics and the compare_and_swap has both acquire and release semantics.
Lawrence Crowl, "Dynamic Initialization and Destruction with Concurrency", http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2008/n2660.htm