Posts tagged ‘auto_buffer’

May 8, 2011

C and C++ Arrays on Stack and Heap

Simple table lists choices for different arrays in C and C++ regarding array length and allocation cost on storage. In this context, array means a linear sequential container of element objects of the same type that are next to its neighbors in memory. The array may or may not be resized. The colors are assigned based on the flexibility of array length, speed of allocation on storage, and extra benefits and pitfalls in features.

Array

Array Length

Storage (auto variable)

Notes

C/C++ array fixed at compile-time stack example: int a[10];
boost::array fixed at compile-time stack container-like support
C99 Variable Length Array fixed after created at run-time stack unspecified behavior on error
C++ Variable Length Array on stack with alloca() fixed after created at run-time stack See previous post. non-portable alloca().
std::vector with fixed_block_allocator<T,N> Dynamic at run-time, but no larger than N stack STL compliant
stlsoft::auto_buffer Dynamic at run-time stack if size within a static length; heap otherwise covered in Imperfect C++ (book)
boost::auto_buffer Dynamic at run-time stack if size within a static length; heap otherwise more versatile than  stlsoft::auto_buffer. Still in boost review process.
eastl::fixed_vector Dynamic at run-time stack if size within a static length; heap otherwise EASTL replaces STL for high performance game development mostly on memory allocation. Download
rde::fixed_vector Dynamic at run-time stack if size within a static length; heap otherwise RDESTL is similar to EASTL.
std::vector (plain) Dynamic at run-time heap explicit elements construction (slow)
TR2 dynarray fixed after created at run-time heap, but compiler may use stack for auto variable optimization in fact a std::vector but run-time size is fixed when vector’s variable length is unnecessary.
std::valarray Dynamic at run-time heap mostly for mathematic use; no iterator
C++0x std::array fixed at compile-time stack see boost::array

It is quite clear that when high performance is needed, auto_buffers and fixed_vectors are winners, because it provides a fast stack allocation/deallocation when array length is less than the static template parameter, and falls back to heap storage in the worst case.

In fact, the same problem exists on other containers (and string as a special container). EASTL and RDESTL provide fixed_xxxx containers and fixed_string primarily based on stack. STLSOFT has basic_simple_string and basic_static_string for strings.

Leveraging Allocator of STL Containers

Other than std::vector with fixed_block_allocator<T,N> and std::vector, all approaches above are exactly STL containers. Given all the benefits of STL, it would be great to have a STL compliant approach.

STL containers have an allocator template argument for customizing memory allocation for the container. So other than creating new array classes, it is interesting to use the STL container types by using a custom allocator class to get the stack memory for container, for example, for vector.

The C++ allocator for containers is still quite limited with a lot of restrictions (see Effective STL Item 10: Be aware of allocator conventions and restrictions, and Improving Performance with Custom Pool Allocators for STL). For example, an allocator should be able to deallocate memory allocated by another instance of the same allocator class; this is important to implement swap() and list::splice correctly. This would require all allocator types to not have instance data members, but have only class-static data or access global variables. This requirement would make allocator customization a real pain. One can imagine such an allocator is really very restricted.

Fortunately, one can make the operator == of the allocator class to return false when one has instance data in the class. Most STL implementation would know that free swapping is not available for containers with this allocator, and would copy elements one by one. Therefore, by signifying an always unequal allocator, one can get much freedom in designing an allocator.

Even though the standard does not explicitly require the container to have an allocator member variable, it implies so in Allocator requirements. Especially, C++0x enhanced the the allocator model; unlike C++98/03, C++0x explicitly allows one to use an allocator with state (scoped allocator). The container can use meta-programming to take out the allocator member completely if an allocator does not have any data and compact the container’s footprint. If the allocator has state, however, the container has to store it as a data member.

This basically means that it is possible to create upfront an allocator with a big size, for example a fixed length array of T as reserved for container elements. When an instance of the container with a data member of such an allocator type is created on stack, it has that reserved array on stack too.

C++ Programming: Stack Allocators for STL Containers is an example. The fixed_block_allocator<T,N> would house up to N elements. It simply uses max_size() to indicate that fixed length as its memory limit, and trying to size over that would result in std::length_error in container. Because the array member in the allocator is obviously non-static, the allocator uses a false-returning operator ==. Currently it works fine with Visual C++ 10.0 in Release build, but with Debug build the vector has run-time error in the default Visual C++ setting, probably because VC adds debugging bells and rings out-run the buffer.

I think it is possible to extend fixed_block_allocator to use heap as a fall back when more memory is required, such that it does not have the severe limitation of capacity N. This will bring it on par with the auto_buffers and fixed_vectors, but still remain STL compliant.

May 8, 2011

Create Variable Length Array on the Stack in C++

It is easy to use static size (compile-time fixed length) arrays, such as C array and boost::array, on the stack in a function.

Not all the time one knows the size of array at compile-time. If the size of the array can only be known at run-time, there are a few options.

The standard C++ vector container can be used as a very powerful variable length array, but it allocates memory from heap, which may be slow depending on your application requirements. It is often a lot more efficient if an array is allocated from stack, not only because the stack allocation is normally simply a bump on the stack frame, but also because each thread has its own stack so no synchronization is required when multiple threads try to allocate memory. Due to limited space in stack compared to heap, generally the array cannot be too big in the context of stack allocation.

C99 provides Variable Length Array (VLA) on stack:

  1 void foo(int size) 
  2 {
  3     int arr[size]; /* variable size at run-time */
  4     for(int i=0; i<size; ++i) 
  5         arr[i] = i;
  6 }

Although it is regarded by some as evil, because if the requested size is too big the behavior of Line 3 is implementation defined, which could mean crash, it works fine with reasonable array sizes.

In C++, however, VLA is not supported. Ferruccio gives an example of using the non-standard alloca() to create C++ VLA on stack. This post is based on his work, in an effort to fix a few things in his contribution.

The stack memory allocation alloca() is not part of standard. This means it:

  • may not be supported on some platform;
  • may have different alignments on different platforms.
  • may behave differently when error happens on different platform. For example, in Visual C++ it would raise Windows-specific structured exception requiring Structured Exception Handling (SEH) on Windows. On some *NIX platforms, it may just go haywire with overflowed pointer on stack that is totally undefined behavior.

See also What’s wrong with alloca()? for an ancient summary of pitfalls of alloca().

However, alloca()’s attractiveness is enticing: it is the only way to get dynamic size of memory from stack with literally no allocation cost!

The code below is the C++ VLA support:

  1 #include <new>		// placement new
  2 #include <cassert>	// assert
  3 #include <malloc.h>	// alloca on Windows
  4 
  5 namespace dsa {
  6 
  7 /****************************************************************************************************
  8 stack_array<T> constructs n elements of T type on a given memory buffer at initialization in ctor,
  9 and destructs these elements in its dtor. It does not allocate or deallocate memory.
 10 
 11 The given memory buffer should be at least sizeof(T)*n bytes.
 12 T must be default constructible and its dtor should not throw.
 13 
 14 Its main purpose is to simulate T arr[n]; on stack using alloca with n unknown at compile time.
 15 This allocates the array really fast compared to using dynamic memory allocation from heap.
 16 *****************************************************************************************************/
 17 template <typename T>
 18 class stack_array
 19 {
 20 public:
 21 	stack_array(T* p, size_t n) : p_(p), n_(n) { init_elements(); }
 22 	~stack_array() { uninit_elements(n_); }
 23 
 24 	// use like T ts[n], ts+i, ts[i] etc.
 25 	operator T* () { return p_; }
 26 
 27 	// very limited container like support support.
 28 	size_t size() { return n_; }
 29 	T* begin() { return p_; }
 30 	T* end() { return p_ + n_; }
 31 
 32 private:
 33 	// uninitialize the first n elements in reverse order.
 34 	void uninit_elements(size_t n)
 35 	{
 36 		while(n>0)
 37 			p_[--n].~T();
 38 	}
 39 
 40 	// default construct all elements, or none at exception.
 41 	void init_elements()
 42 	{
 43 		size_t i = 0;
 44 		try
 45 		{
 46 			for(; i<n_; ++i)
 47 			{
 48 				// note: boost::has_trivial_default_constructor avoids unnecessary ctor calls explicitly
 49 				T* pi_ = new (p_+i) T;
 50 				// if this breaks, initial buffer may not hold n elements for element alignment problems
 51 				assert( pi_==p_+i );
 52 			}
 53 		}
 54 		catch(...)
 55 		{	// when T ctor throws
 56 			uninit_elements(i);	// clean up initialized objects.
 57 			throw;	// re-throw the exception.
 58 		}
 59 	}
 60 
 61 private:
 62 	T*       p_;
 63 	size_t   n_;
 64 };
 65 
 66 } // dsa
 67 
 68 // allocate from stack a memory buffer that holds n T objects.
 69 // This does not handle the case that alloca fails. Handling that is platform specific.
 70 #define NEW_ON_STACK(T,n)                static_cast<T*>(alloca(sizeof(T) * (n)))
 71 
 72 // auxilliary macro to be used with stack_array:
 73 //   dsa::stack_array<T> arr(NEW_STACK_ARRAY(T,n));
 74 #define NEW_STACK_ARRAY(T,n)             NEW_ON_STACK(T,(n)),(n)
 75 
 76 // convenience macro to create a stack_array object arr of n T objects on stack.
 77 #define DYNAMIC_STACK_ARRAY(T, arr, n)   dsa::stack_array<T> arr(NEW_STACK_ARRAY(T,n))
 78 
 79 // in the usage above, you do not need to clean up stack_array or memory from alloca.
 80 // It destructs automatically when going out of scope.

It basically provides a class stack_array to manage the elements, and uses alloca() to acquire memory from current stack frame. Compared to Ferruccio’s VLA, by taking comments of the code into account, this code has the following improvements.

  • It uses placement new to call constructors to initialize the elements. Using array placement new on a raw buffer assumes a compiler specific memory layout and is therefore non-portable (or more non-portable on top of alloca). As pointed out by Fabio, Visual C++ requires an extra size_t at the memory buffer head to keep the size of array, but the call to alloca() does not reflect that. Since there is no portable way to find out what layout this could be for different compilers, it is better not to use array placement new here. Therefore, the improved code simply allocate sizeof(T)*n bytes, and use a for loop to initialize individual elements.
  • It is exception safe. Since alloca() itself is not portable, one can only do the best to make sure stack_array is exception safe. The class is relatively simple. Assuming the T destructor does not throw (which is reasonable), it is just to make the stack_array to be exception neutral and safe. If an exception is thrown when constructing an element, the code would destruct all constructed elements and re-throw the same exception.
  • It destructs the elements in reverse order. This is just a good practice.

Because the focus is using memory allocated on the stack, this VLA does not support array initializers or element constructors with one or more parameters. Another limitation is that the length cannot be changed once it is created.

This is the sample code to use it:

  1 void foo(size_t n)
  2 {
  3 	{
  4 		DYNAMIC_STACK_ARRAY(int, arrI, n);	// int arrI[n];
  5 		// or this line below has the same effect with some extra typing:
  6 		// dsa::stack_array<int> arr(NEW_STACK_ARRAY(int,n));
  7 		assert( arrI.size()==n );
  8 		arrI[0] = 313;
  9 		*(arrI+1) = 572;
 10 		// out of scope: arrI to destruct: so do all elements in it.
 11 	}
 12 	// function to return: the memory on stack from alloca is to be reclaimed.
 13 }

Once the VLA is created, it is similar to a C99 VLA in usage. The difference is that it is a true C++ construct, meaning you can put arbitrary objects, including your own classes in the array. The only requirement for the element class is that it must be default constructible and the destructor does not throw. For types with trivial constructor, for example, POD types like int, double etc, the stack VLA is very efficient and does not bother to explicitly default construct each elements as in std::vector. But similar to C99 VLA, due to its use of alloca(), this C++ VLA has the same non-portable behavior, especially when it cannot allocate the requested buffer from stack.

If you can have a good guess for the run-time length of the array at compile time, C++ VLA above can always be replaced by stlsoft::auto_buffer or  boost::auto_buffer. Plus, auto_buffer is truly portable, and allows changes of array length at run-time, and in the worst case, it would use heap when the array is too big to fit in the predefined stack buffer. boost::auto_buffer also offers various element initialization options, buffer policies and grow policies.

Follow

Get every new post delivered to your Inbox.

Join 43 other followers