Learnings on C++: Alignment

In fact this post is sort of an extension from the previous Alignment - first thoughts.
In general, sizeof( T ) is not the optimal alignment modulus for type T.
It's coarse even for some basic types and specially for user-defined types.
One can verify this even on a Windows XP and DJGPP C++11:

sizeof( long double ) = 12
alignof( long double ) = 4

sizeof( aggregate< double > ) = 8
alignof( aggregate< double > ) = 4

with

template< typename T >
struct aggregate
{
 T element;
};

The larger in size T gets, the coarser is the alignment based on its full size.
For aggregate types larger than the most stringent type the waste can be well over 100%.
For example:

struct T
{
 char member_1[ 5 ];
 long double member_2;
 char member_3[ 2 ];
 void * member_4;
};

 TYPE OFFSET SIZE ALIGN
member_1 : char [ 5 ] 0 5 1
member_2 : long double 8 12 4
member_3 : char [ 2 ] 20 2 1
member_4 : void * 24 4 4
 28 4

28 is a valid, although sub-optimal, alignment modulus for type T.
align< T * >( (void *) 1 ) yields 28, thus wasting 27 bytes!
This problem only gets worse and worse as the size of T gets bigger and bigger.
Had we known that the optimal alignment modulus is 4 we would waste just 3 bytes!
Had we known the size of the most stringent type that fits inside T we'd waste 11 bytes.

As suggested above, there are 2 solutions for this problem.
One of them is optimal and the other although sub-optimal is bounded.
They are respectively:

The advent of C++11 alignof() which is a portable and standard way of querying the optimal alignment modulus for a given type. Up to C++03, as far as I know, the only way out is by using some non-standard non-portable compiler extension, such as the GNU's __align__(). The importance of this case is for obtaining the optimal alignment for a specific data type. A specialized memory allocator would benefit from this.
The advent of C++11 alignof( std::max_align_t ) or the even coarser but still bounded sizeof( std::max_align_t ). As far as I know, there's no GNU extension supporting this. The importance of this case is for obtaining the most general alignment suitable to any data type. A generalized memory allocator would benefit from this.

SPECIALIZED SOLUTION

For more detail on the following two quick examples of the optimal solution using respectively the C++11 standards and GNU's extensions, please, refer to my previous post. The hereby excerpts were slightly adjusted by replacing the sizeof( T ) in order to get away with its original inefficiency.

template< typename P >
inline P align( void * p )
{
 const std::size_t alignment =
 alignof( typename pointed_by::type );

 const std::intptr_t pointer =
reinterpret_cast< std::intptr_t >( p ) +
 alignment - 1;

 return reinterpret_cast
( pointer - pointer % alignment );
}

or...

template< typename P >
inline P align( void * p )
{
 const std::size_t alignment =
 __alignof__( typename pointed_by::type );

 const std::intptr_t pointer =
reinterpret_cast< std::intptr_t >( p ) +
 alignment - 1;

 return reinterpret_cast
( pointer - pointer % alignment );
}

GENERALIZED SOLUTION

template< typename P >
inline P align( void * p )
{
 const std::size_t alignment =
 alignof( std::max_align_t );

 const std::intptr_t pointer =
reinterpret_cast< std::intptr_t >( p ) +
 alignment - 1;

 return reinterpret_cast
( pointer - pointer % alignment );
}

ADDENDUM

As an addendum to this post I'd like to mention one basic technique (which may be useful in other completely different scenarios that also needs processing an arbitrary list of types) which I believe led to or greatly contributed to the absolutely great aforementioned C++11 alignment facilities of nowadays. It's all about dealing with template in order to statically compute the size of the most stringent basic type which is not greater than the size of T (specially an aggregate/user-defined type). Such computed size value must also be, by the language well-formed array rules, a valid (but not necessarily optimal) alignment modulus for T. What's also good on all this is that it also implies that we have a fixed upper-bound value to rely on instead of sizeof( T ) which can be considerably larger. This aforementioned static computation is only possible due to a few clever template tricks which I first heard of on a Dr. Dobb's article on Discriminated Unions authored by Andrei Alexandrescu. The main adaptation I did was simplifying it by replacing typelists with the C++11 variadic templates.

The first part is the abstraction of list of types for which I'm replacing the typelists for variadic templates. As guided explanation the following example is a list of the so-called basic types, but not on a strict sense specially due to the presence of some sort of "minimal" aggregates.

struct unknown;

template< typename T >
struct aggregate
{
 T element;
};

template< typename... >
struct type;

using basic_types = type
<
 char,
 short int,
 int,
 long int,
 long long int,

 aggregate< char >,
 aggregate< short int >,
 aggregate< int >,
 aggregate< long int >,
 aggregate< long long int >,

 char *,
 short int *,
 int *,
 long int *,
 long long int *,

 aggregate< char * >,
 aggregate< short int * >,
 aggregate< int * >,
 aggregate< long int * >,
 aggregate< long long int * >,

 float,
 double,
 long double,

 aggregate< float >,
 aggregate< double >,
 aggregate< long double >,

 float *,
 double *,
 long double *,

 aggregate< float * >,
 aggregate< double * >,
 aggregate< long double * >,

 void *,

 aggregate< void * >,

 unknown * unknown::*,
 unknown ( * )( unknown ),
 unknown ( unknown::* )( unknown ),

 aggregate< unknown * unknown::* >,
 aggregate< unknown ( * ) ( unknown ) >,
 aggregate< unknown ( unknown::* ) ( unknown ) >
>;

Next step is to be able to process such "lists" of types.
The processing will select types which sizes are not greater than a certain value.
In particular, I will show 2 ways of dealing with variadic templates:

The popular way which recursively consumes the first type.
In doing so the basic_types will be gradually processed.
So this represents a sort of input to the processing.
The possibly unusual way which is logically inverse to the popular way.
An initially empty variadic template will be gradually populated with selected types.
So this represents a sort of output of the processing.

The entire process of consuming the input and producing the selected output will be handled by a single specialized template called not_greater_than where T is the data type which size is the key controlling factor (selection threshold). I didn't invest any time on refining ideas such as further factorization or generalization because I think the complexity would explode beyond my interest or possibilities.

The raw usage would be as follows:
( result_type is like a selected subset of basic_types )

typename not_greater_than
<
 sizeof(T), // the selection threshold
 basic_types, // the input
 type<> // a technicality
>
::result_type // the output

In fact, another wrapping template could hide all of the above (see the conclusion).
But for now, the implementation would be as follows:

template
<
 std::size_t,
 typename,
 typename
>
struct not_greater_than;

//
// recursion
//
template<
 std::size_t SIZE,
 typename T, typename... IN_PACK,
 typename... OUT_PACK
>
struct not_greater_than
 < SIZE, type< T, IN_PACK... >, type< OUT_PACK... > >
{
 using result_type =
 typename not_greater_than< SIZE, type< IN_PACK... >,
 typename IF< sizeof(T) <= SIZE,
 type< T, OUT_PACK... >, // push_front T
 type< OUT_PACK... > // skip T
 >
 ::result_type
 >
 ::result_type;
};

//
// termination
//
template
<
 std::size_t SIZE,
 typename T,
 typename... OUT_PACK
>
struct not_greater_than
 < SIZE, type< T >, type< OUT_PACK... > >
{
 using result_type =
 typename IF< sizeof(T) <= SIZE,
 type< T, OUT_PACK... >, // push_front T
 type< OUT_PACK... > // skip T
 >
 ::result_type;
};

The selection depends on a well-known and straightforward helper template idiom:

template< bool, typename, typename >
struct IF;

template< typename T1, typename T2 >
struct IF< true, T1, T2 >
{
 using result_type = T1;
};

template< typename T1, typename T2 >
struct IF< false, T1, T2 >
{
 using result_type = T2;
};

Putting everything together you can consider a template device that produces a union large enough (and suitably aligned) to accommodate all the output types produced by the above example:

template< typename >
union merge_of;

template< typename T, typename... PACK >
union merge_of< type< T, PACK... > >
{
 T element;
 merge_of< type< PACK... > > pack;
};

template< typename T >
union merge_of< type< T > >
{
 T element;
};

which is used as follows:

template< typename T >
...
 // the following is a union
 // large enough and suitably aligned
 // to all basic_types not greater than sizeof(T)
 merge_of
 <
 typename not_greater_than
 < sizeof(T), basic_types, type<> >::result_type
 >;
...

It can, for example, be nested in another union together with some other data type one wants to possess the same alignment as merge_of (which alignment is that of the most stringent type not larger in size than sizeof( T ) ). That is, merge_of could be used as an aligner for some other type T. Or yet sizeof( merge_of< ... > ) could be used as coarser (upper-bounded) general alignment modulus.

Learnings on C++

Pages

Thursday, February 25, 2016

Alignment - second thoughts