Learnings on C++: 2016

Thursday, March 10, 2016

The SemWare Editor

If you do not yet know The SemWare Editor get to know it ASAP!
It isn't a full-blown IDE and, in reality, it was never intended to be one.
What it really is, is a powerful professional text editor.
Sadly, it only has versions for Windows and DOS.
Even under Windows it isn't a GUI application.
It runs in a console window and that's OK!
No regrets, unless you're too lazy! ;-)

TSE:

Is fast! Very fast! Very much faster!
Much faster than any GUI editor you may know.

Is very, if not completely, customizable!
Its degree of customization is unbelievable.

Is possibly more powerful than VIM.
But certainly orders of magnitude easier to master.

TSE's internal macro language is very familiar and easy to learn.
To better understand its capabilities, check its features!

Trying to compare NetBeans' editor against TSE is unfair!
NetBeans' degree of keyboard customization is a joke.
TSE wins by far. But NetBeans wins as an IDE, of course.
But regarding text editing the game is over, TSE wins.

Even if you have a UNIX box for compiling and running, it would be advisable to consider a (local or remote, virtual or not) Windows box just for the editing activity, taking advantage of the superior text editing capabilities of TSE.

OPINION

In fact, the more professional you get, the less you need all those bells and whistles for text editing and building. Your power and productivity will lean toward the bare-bones tools, without all those layers of GUI fat. Nevertheless, IDEs such as NetBeans survive because they integrate quite a lot of other goodies that may be important at some point of a project, bearing in mind that if a project is growing extremely complex, then there's probably some flaw or something wrong with its original design and/or engineering. The industry is rich in such defects, ranging from operating systems to all kinds of services and applications. That's all about Quality, which is frequently sacrificed to Economic Pressures. And, by the way, much of this has to do with why I'm not fond of DOS, Windows, .NET, Java, Visual Studio, NetBeans and so on, on and on... All full of fat, overheads, inefficiencies and flaws, which frequently require a lot of fixes, including security patches. One generally spends a great deal of time and money on maintenance to get a proportionally much lower ROI. A nightmare, in fact, and a regularly employed deceptive strategy. I do not agree with this status quo. I believe that systems and applications could be much more efficient and present much greater quality than they exhibit today.

NetBeans 8.1 under Solaris

If you are up to GCC under Solaris, you may eventually be tempted to try a GUI development environment, a.k.a. an Integrated Development Environment (IDE). The only major player that works on Solaris is NetBeans, but even if it weren't the only one, it would certainly be amongst the best ones. In fact, the only other major IDE that competes with NetBeans on other platforms is Eclipse, but NetBeans seems to have an edge in terms of software flexibility and openness.

The platform-independent installation option of NetBeans is extremely simple to deal with. Basically, just download it and extract it to /opt. The extraction may be tied to a subdirectory named netbeans, so if you intend to keep multiple versions side by side, this would probably be a problem. To work around this lack of flexibility in the delivery method, I suggest appending a version suffix to the directory name, such as -81. Then, before starting the extraction, make sure you have a symbolic link from netbeans to netbeans-81.

NOTE

I'm completely ignoring here any slightly more sophisticated ZFS approach, such as dedicating a dataset under rpool/VARSHARE to be mounted under or linked at /opt. The next-to-last sentence of the 1st paragraph and the 1st sentence of the 2nd paragraph of an example of a more elaborate file-system setup can shed some light on what I mean by "a slightly more sophisticated file-system approach".

$ zfs list -o name,mountpoint .../netbeans-81
NAME MOUNTPOINT
.../netbeans-81 /opt/netbeans-81

Please note that rpool/VARSHARE/netbeans-81 was used above just as an example. It's not the recommended dataset for this. In general, there should be one or more dedicated datasets to hold /opt subdirectories.

That is:

# cd /opt
# mkdir -p netbeans-81
# ln -s netbeans-81 netbeans
# unzip -q /tmp/netbeans-8.1-201510222201-cpp.zip
# rm netbeans

$ ls -lhtr /opt/netbeans-81/
total 670
-rw-r--r-- 1 root root ... THIRDPARTYLICENSE.txt
-rw-r--r-- 1 root root ... README.html
-rw-r--r-- 1 root root ... netbeans.css
-rw-r--r-- 1 root root ... LICENSE.txt
-rw-r--r-- 1 root root ... CREDITS.html
drwxr-xr-x 7 root root ... nb
drwxr-xr-x 2 root root ... etc
drwxr-xr-x 11 root root ... harness
drwxr-xr-x 9 root root ... ide
drwxr-xr-x 9 root root ... platform
drwxr-xr-x 7 root root ... cnd
drwxr-xr-x 7 root root ... dlight
drwxr-xr-x 2 root root ... bin

Perform the first run as root, but before that, properly set up the X11 authorization and the DISPLAY environment variable as described in this other post, X11 & SSH & SU. Launch NetBeans by executing the file /opt/netbeans-81/bin/netbeans, accept the terms of the agreement and then exit at your own discretion. At this point you're pretty much done.

I recommend setting up the default project folder; otherwise you'll get a ~/NetBeansProjects, which may be less than ideal. To address this, after a first launch, look for the projectui.properties file under ~/.netbeans and edit the value of the projectsFolder entry (and any other occurrences of the unwanted path):

$ find ~/.netbeans -name projectui.properties

One final detail you may like to set up is a GNOME launcher under the menu Application | Developer Tools. In addition to referencing /opt/netbeans-81/bin/netbeans, you'll probably want to reference the nice icon located at /opt/netbeans-81/nb/netbeans.png.

GCC under DOS

It's almost incredible, but GCC 5.2.0 runs under DOS on an Intel 386!
Of course, there's a 32-bit DOS extender under the hood, but anyway!
That was made possible thanks to DJ Delorie's heroic DJGPP project.
That's an incredible achievement constrained by DOS limitations.
Naturally, there are no threads under DOS, please!
But it allows you to have a great start in C++11.

Nowadays, having a virtualized DOS is easy. You can use VirtualBox, Virtual PC, and so on... The list is quite extensive. And given today's powerful machines and sophisticated operating systems, the requirements of a virtual DOS are trivially fulfilled! Booting a virtual DOS is ridiculously fast. Aside from the obvious DOS restrictions (such as the lack of threads), the only slightly strange thing is that the generated executables are excessively large (as if many libraries were statically linked in), over 1 MB, even for the proverbial "Hello, world!" minimal program. But never mind: if you follow the proper steps you can live with it, and compilation times aren't that bad (for DOS).

In order to get DJGPP, the recommended approach is to visit:
http://www.delorie.com/djgpp/zip-picker.html

I think it's best not to download too much stuff, so I stuck to the minimal selection. Once the choices are made, just click "Tell me which files I need" and follow the instructions. By the way, there's nothing hard about the instructions. Just note that you should use the default directory C:\DJGPP and the provided unzip32, and make a couple of very simple environment variable settings (one for DJGPP and another for your PATH).

By the way, note that, unlike on a Linux or UNIX platform, the C++ compiler executable is called gpp and not the usual g++, since the latter amounts to an invalid file name in the wonderful DOS world.

NOTE

In addition to DOS, as long as you stick to a 32-bit world, even one provided by Windows XP or Windows 7 (32-bit), you can still use the same approach described above, though you should select on the drop-down list box the closest match to the operating system you'll be using: Windows 2000/XP. You should get a slightly different set of files, but not that different. I know that by doing this you'll even get GCC 5.3.0 (at the time of this writing).

GCC under Solaris

Solaris has been evolving and being modernized a lot.
That's very welcome, since it has always been a great OS!
Thus it's now much easier to get GCC working under Solaris!
The Solaris 11.3 generally available release offers GCC 4.8.2.
That's more than enough to get our feet wet with C++11.
Once a local package repository is available you just have to do:

# pkg install gcc
...

# pkg install gdb
...

NOTE

Instead of the two commands above, it may be slightly better to adopt the following alternative:

# pkg install --be-name developer-gnu developer-gnu

The --be-name switch is what makes this alternative a little more cautious: it designates a new boot environment (BE) under which the gcc and gdb packages are to be installed. If you take this path, then after the command completes successfully you'll have to reboot the system into the newly created BE.

In the end you can confirm what's at your disposal as follows:

# pkg info -r gcc
          Name: developer/gcc
       Summary: GCC
      Category: Development/C (...)
                Development/C++ (...)
                Development/Fortran (...)
                Development/GNU (...)
                Development/Objective C (...)
         State: Installed
     Publisher: solaris
       Version: 4.8.2
 Build Release: 5.11
        Branch: 0.175.3.0.0.30.0
Packaging Date: August 21, 2015 04:45:27 PM
          Size: 5.46 kB
          FMRI: pkg://.../gcc@4.8.2,5.11-0.175.3.0.0.30...

# pkg info -r gdb
          Name: developer/debug/gdb
       Summary: GDB 7.6
   Description: GDB, the GNU Debugger, is a ...
      Category: Development/System
         State: Installed
     Publisher: solaris
       Version: 7.6
 Build Release: 5.11
        Branch: 0.175.3.0.0.30.0
Packaging Date: August 21, 2015 04:35:57 PM
          Size: 8.93 MB
          FMRI: pkg://.../gdb@7.6,5.11-0.175.3.0.0.30...

Pretty fast, simple and cool, isn't it?
By the way, the IPS takes care of everything!
You don't have to manually tweak absolutely anything!

NOTE

Yes, it's true that GCC is not the native compiler for Solaris and, as such, it may not generate the most streamlined and efficient code for the platform. But as a tool for learning and for getting a project started, it's quite a good solution. Later on, if your project becomes successful, you can consider the platform's native compiler and tools, Oracle Solaris Studio.

Newlines: DOS/Windows to UNIX

If you're on UNIX and happen to have a source file originally created on DOS/Windows, then you most definitely have to get rid of the extra \r (a.k.a. ^M) that DOS/Windows inserts before each \n of a text file in order to mark line breaks.

While in vim the extra ^M are easily noticed:

#ifndef ..._BASICS_HXX^M
#define ..._BASICS_HXX^M
^M
namespace ...^M
{^M
^M
//^M
//^M
//^M
using byte = unsigned char;^M
^M
//^M
//^M
//^M
using capacity_type = unsigned long long int;^M
^M
} // namespace ...^M
^M
#endif // ..._BASICS_HXX^M
^M
~
...
~
"basics.hxx" 20 lines, 224 characters

While still in vim simply issue the following command:

:%s/\r//g

That is: on every line (%), substitute (s) each \r with nothing, for every occurrence within the line (g); in other words, remove all the \r. Just save the file and you're done. Naturally, you'll find a dozen other ways to accomplish the same thing, even more efficiently, by means of a myriad of external shell commands. So, suit yourself :-)
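For those who'd rather stay in C++, here is a minimal sketch of the same transformation. Like the vim command, it removes every \r, not only those that precede a \n; the name strip_cr is just an illustrative choice.

```cpp
#include <algorithm>
#include <cassert>
#include <istream>
#include <iterator>
#include <ostream>
#include <string>

// Removes every '\r' from a string: the same effect as vim's :%s/\r//g.
std::string strip_cr( std::string text )
{
   text.erase( std::remove( text.begin(), text.end(), '\r' ), text.end() );
   assert( text.find( '\r' ) == std::string::npos );   // postcondition
   return text;
}

// Streaming variant: copies 'in' to 'out' byte by byte, dropping every '\r'.
void strip_cr( std::istream & in, std::ostream & out )
{
   std::remove_copy( std::istreambuf_iterator< char >( in ),
                     std::istreambuf_iterator< char >(),
                     std::ostreambuf_iterator< char >( out ),
                     '\r' );
}
```

On UNIX, where text and binary streams behave the same, calling strip_cr( std::cin, std::cout ) from a main() turns a tiny program into a filter usable with shell redirections.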

Thursday, February 25, 2016

C++11 and C++14

The C++03 standard stayed among us for a long, long time, perhaps a decade or so.
While we had plenty of time to get to know it reasonably well, this led to stagnation.
But away from the eyes of the crowd, C++ continued its evolution.
It's true that there were some clues from good libraries, most notably Boost.
In fact, Boost is a very good library with significant influence on the C++ standards.
But Boost wasn't the standard itself and so was usually treated as experimental.
When the new C++11 standard finally saw the light of day, it was overwhelming.
Reasonably conformant, generally available C++11 compilers appeared 2 to 4 years later.
Anyway, the multitude of new stuff in the C++11 standard is great and very welcome.
But I dare say it's almost impossible to absorb it in less than a year or so.
What seems to be true is that a wealth of previous limitations are gone.
The C++14, as far as I know, is a "slight" refinement of C++11.

The ISO C++11 standard is officially known as ISO International Standard ISO/IEC 14882:2011(E), and the ISO C++14 standard as ISO International Standard ISO/IEC 14882:2014(E).
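In code, the edition a compiler is targeting can be told from the predefined macro __cplusplus, whose value each edition fixes: 199711L for C++98 (and C++03), 201103L for C++11 and 201402L for C++14. A minimal sketch, where standard_year is a hypothetical helper name:

```cpp
// Maps a __cplusplus value to the year of the ISO/IEC 14882 edition it denotes:
// 199711L -> 1998 (C++98, also used by C++03), 201103L -> 2011, 201402L -> 2014.
// Any other value (a later or pre-release standard, for instance) maps to 0.
constexpr int standard_year( long value )
{
   return value == 199711L ? 1998
        : value == 201103L ? 2011
        : value == 201402L ? 2014
        : 0;
}
```

For instance, with GCC invoked as g++ -std=c++11 (or gpp -std=c++11 under DJGPP), __cplusplus is 201103L, so standard_year( __cplusplus ) yields 2011.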

Instead of mastering every corner of the new standards, it seems better to concentrate just on the enabling new features particular to a certain objective at hand, while being aware that later on one may surprisingly come to discover some built-in ready-made industrial-strength feature that is highly portable and efficient and does exactly what you're trying to do as well as or better than your individual efforts.

So, it will take a lot of time to tackle the C++11 and C++14 standards.
Hopefully much less than it took to know C++03.
Let's see, C++17 seems to be under way!

Alignment - second thoughts

In fact, this post is sort of an extension of the previous one, Alignment - first thoughts.
In general, sizeof( T ) is not the optimal alignment modulus for type T.
It's coarse even for some basic types and especially for user-defined types.
One can verify this even on Windows XP with DJGPP in C++11 mode:

sizeof( long double ) = 12
alignof( long double ) = 4

sizeof( aggregate< double > ) = 8
alignof( aggregate< double > ) = 4

with

template< typename T >
struct aggregate
{
   T element;
};
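A minimal sketch to reproduce such figures on whatever compiler is at hand (report is a hypothetical helper). The numbers above are DJGPP's; on other targets, such as x86-64, expect different values for long double.

```cpp
#include <iostream>

// The wrapper from the text: an aggregate holding a single T.
template< typename T >
struct aggregate
{
   T element;
};

// Prints the size and the alignment modulus of type U.
template< typename U >
void report( const char * name )
{
   std::cout << "sizeof( " << name << " ) = " << sizeof( U ) << '\n'
             << "alignof( " << name << " ) = " << alignof( U ) << "\n\n";
}

// sizeof is always a multiple of alignof (array elements must all stay
// aligned), so sizeof( T ) is always *a* valid modulus, merely a coarse one.
static_assert( sizeof( long double ) % alignof( long double ) == 0, "" );
static_assert( sizeof( aggregate< double > ) % alignof( aggregate< double > ) == 0, "" );
```

Calling report< long double >( "long double" ) and report< aggregate< double > >( "aggregate< double >" ) from a main() prints the two pairs of lines shown above, with your platform's values.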

The larger in size T gets, the coarser is the alignment based on its full size.
For aggregate types larger than the most stringent type the waste can be well over 100%.
For example:

struct T
{
   char        member_1[ 5 ];
   long double member_2;
   char        member_3[ 2 ];
   void *      member_4;
};

           TYPE          OFFSET  SIZE  ALIGN
member_1 : char [ 5 ]         0     5      1
member_2 : long double        8    12      4
member_3 : char [ 2 ]        20     2      1
member_4 : void *            24     4      4
T        : (whole)                 28      4
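A minimal sketch to print this layout on your own compiler via the standard offsetof macro; LAYOUT_ROW and print_layout are hypothetical names. The figures in the table are DJGPP's; elsewhere (x86-64, for instance) long double and void * are larger, so offsets and sizes will differ.

```cpp
#include <cstddef>   // offsetof
#include <iostream>

// The struct T from the text (standard-layout, so offsetof is well-defined).
struct T
{
   char        member_1[ 5 ];
   long double member_2;
   char        member_3[ 2 ];
   void *      member_4;
};

// Prints one row: a macro because offsetof needs the member's name.
#define LAYOUT_ROW( member )                                         \
   std::cout << #member << " : offset " << offsetof( T, member )      \
             << ", size " << sizeof( T::member )                      \
             << ", align " << alignof( decltype( T::member ) ) << '\n'

// Prints OFFSET, SIZE and ALIGN for each member, then SIZE and ALIGN for T.
void print_layout()
{
   LAYOUT_ROW( member_1 );
   LAYOUT_ROW( member_2 );
   LAYOUT_ROW( member_3 );
   LAYOUT_ROW( member_4 );
   std::cout << "T : size " << sizeof( T ) << ", align " << alignof( T ) << '\n';
}

// Padding puts member_2 on its own boundary, and sizeof( T ) is a multiple of
// alignof( T ) so that arrays of T stay aligned.
static_assert( offsetof( T, member_2 ) % alignof( long double ) == 0, "" );
static_assert( sizeof( T ) % alignof( T ) == 0, "" );
```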

28 is a valid, although sub-optimal, alignment modulus for type T.
align< T * >( (void *) 1 ) yields 28, thus wasting 27 bytes!
This problem only gets worse and worse as the size of T gets bigger and bigger.
Had we known that the optimal alignment modulus is 4 we would waste just 3 bytes!
Had we known the size of the most stringent type that fits inside T we'd waste 11 bytes.

As suggested above, there are 2 solutions to this problem.
One of them is optimal and the other, although sub-optimal, is bounded.
They are, respectively:

The advent of C++11's alignof, which is a portable and standard way of querying the optimal alignment modulus for a given type. Up to C++03, as far as I know, the only way out is by using some non-standard, non-portable compiler extension, such as GNU's __alignof__. This case matters for obtaining the optimal alignment for a specific data type; a specialized memory allocator would benefit from it.
The advent of C++11's alignof( std::max_align_t ), or the even coarser but still bounded sizeof( std::max_align_t ). As far as I know, there's no GNU extension supporting this. This case matters for obtaining the most general alignment suitable for any data type; a general-purpose memory allocator would benefit from it.
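Both solutions can be sketched side by side, assuming a standard library that provides std::max_align_t in <cstddef>; specific_modulus and general_modulus are illustrative names.

```cpp
#include <cstddef>   // std::size_t, std::max_align_t

// Solution 1: the optimal, type-specific modulus (specialized allocators).
template< typename T >
constexpr std::size_t specific_modulus() { return alignof( T ); }

// Solution 2: a bounded, general modulus, good for every scalar type
// (general-purpose allocators).
constexpr std::size_t general_modulus() { return alignof( std::max_align_t ); }

// max_align_t is, by definition, at least as strictly aligned as any scalar type.
static_assert( general_modulus() >= specific_modulus< long double >(), "" );
static_assert( general_modulus() >= specific_modulus< void * >(), "" );

// The standard requires every alignment value to be a power of 2.
static_assert( ( general_modulus() & ( general_modulus() - 1 ) ) == 0, "" );
```

The power-of-2 guarantee is what makes the bit-mask rounding technique applicable to any value either function returns.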

The smallest 2ⁿ-multiple ≥ B

When dealing with quantities that are powers of 2, the binary (base-2) arithmetic as well as the typical binary representation of integer values can allow interesting alternatives to the usual base-10 arithmetic we are commonly used to.

The obvious prerequisites are binary arithmetic and logical operations.
The relevant formula is:

[ B + ( A - 1 ) ] & [ ~( A - 1 ) ], where A = 2ⁿ for some n ≥ 0.

The above formula will compute the smallest multiple of A greater than or equal to B.
B is a positive integer value which, if large enough, could represent a pointer.
In this process it's imperative that A be a (positive) power of 2.

One particular application is adjusting C++ pointers to a certain alignment modulus.
In other words, this technique is very important to C++ memory management.
B would be the value of a pointer to be aligned to an A-multiple boundary.
If an alignment modulus is 4, then A = 4 and thus n = 2.
You know: alignment is vital.
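The formula translates almost literally into C++. A minimal sketch, assuming std::uintptr_t is available (it's optional in the standard, though ubiquitous) and that converting a pointer to that integer and back preserves the address, which is implementation-defined but holds on mainstream flat-memory platforms. round_up and align_pointer are illustrative names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// [ B + ( A - 1 ) ] & ~( A - 1 ): the smallest multiple of A that is >= B.
// Precondition: A is a power of 2, and B + ( A - 1 ) does not overflow.
constexpr std::uintptr_t round_up( std::uintptr_t b, std::uintptr_t a )
{
   return ( b + ( a - 1 ) ) & ~( a - 1 );
}

// True when a = 2^n for some n >= 0: a power of 2 has exactly one bit set.
constexpr bool is_power_of_2( std::uintptr_t a )
{
   return a != 0 && ( a & ( a - 1 ) ) == 0;
}

// Aligns a raw address p up to the modulus a via the pointer <-> integer
// round trip described above.
inline void * align_pointer( void * p, std::size_t a )
{
   assert( is_power_of_2( a ) );
   return reinterpret_cast< void * >(
      round_up( reinterpret_cast< std::uintptr_t >( p ), a ) );
}

static_assert( round_up( 13, 8 ) == 16, "" );
static_assert( round_up( 16, 8 ) == 16, "" );
static_assert( round_up( 1, 1 ) == 1, "" );    // n = 0: B itself
static_assert( round_up( 1, 4 ) == 4, "" );    // n = 2: waste of 3 bytes
```

C++11 also offers std::align in <memory>, which performs this kind of adjustment inside a buffer of known size.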

Of course I shall prove the formula with a minimal (but enough) formalism.
But first let's get to know better how it works.

The trivial case is when n = 0 ( A = 1 ), which just yields B, no big deal.
In this case we are just aligning to a single byte boundary as A = 1.
So no adjustment to B is required and hence B itself is fine.

For a 4-byte (32-bit) boundary alignment we would use n = 2.
That means we are interested in multiples of 4, that is, A = 4 = 2².

To get the whole idea of the formula, let's just work with n = 3, that is, A = 8.
Let's assume, for the sake of simplicity, a single-byte magnitude for the B value.
In a memory alignment scenario, B would be the value of a multi-byte pointer.
As a power of 2, the binary representation of A is a 1 followed by n zeros.
As a direct consequence ( A - 1 ) is always a sequence of n ones.
Hence ~( A - 1 ) is a bit-mask that clears out the n least significant bits.

Look:

If n = 3,
then A = 8 = 1000₂ and ( A - 1 ) = 111₂ and ~( A - 1 ) = 11111000₂.

And so what?
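Here's the n = 3 case traced bit by bit with a concrete value, B = 13. A minimal sketch (trace is a hypothetical helper) restricted to 8-bit values to match the masks written above:

```cpp
#include <bitset>
#include <cassert>
#include <iostream>

// Traces [ B + ( A - 1 ) ] & ~( A - 1 ) for n = 3 (A = 8) on 8-bit values,
// printing every step in base 2, and returns the result.
unsigned trace( unsigned b, std::ostream & out )
{
   const unsigned a    = 8;              // 2^3 = 1000 in base 2
   const unsigned low  = a - 1;          // 00000111: n = 3 ones
   const unsigned mask = ~low & 0xFFu;   // 11111000: clears the 3 low bits
   assert( b + low <= 0xFFu );           // keep the toy example within a byte

   const unsigned sum    = b + low;
   const unsigned result = sum & mask;

   out << "B           = " << std::bitset< 8 >( b )      << " (" << b      << ")\n"
       << "B + ( A-1 ) = " << std::bitset< 8 >( sum )    << " (" << sum    << ")\n"
       << "~( A-1 )    = " << std::bitset< 8 >( mask )   << '\n'
       << "result      = " << std::bitset< 8 >( result ) << " (" << result << ")\n";
   return result;
}
```

For trace( 13, std::cout ) the output is:

B           = 00001101 (13)
B + ( A-1 ) = 00010100 (20)
~( A-1 )    = 11111000
result      = 00010000 (16)

Adding A - 1 pushes any B that isn't already a multiple of 8 past the next multiple, and the mask then drops the remainder bits.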

Alignment - first thoughts

Memory alignment is an important and unavoidable low-level fact.
The trade-off is space overhead due to padding which is to be minimized.
Managing alignment presents challenges, especially before C++11.
As you know, C++11 is a major advance over C++03.

A few important things to keep in mind are:

In main memory, every data type must (as an absolute requirement) or should (at least for optimization) be located at a characteristic address multiple, a.k.a. the alignment modulus, due to hardware design and engineering issues. In fact, it's common (with exceptions) for the alignments of the arithmetic data types and of the void * type to dictate the alignments of all the other dependent data types.
Type T is more restrictive (or stringent) than type S if the alignment modulus for type T is greater than (or equal to, a technicality for the reflexive property) the alignment modulus for type S.
In general, if type T is more restrictive than type S, then converting a T * to an S * and back yields the original pointer, without misaligned addressing or faults to the detriment of a program or application.
By means of an appropriate C or C++ typecast, a void * can be converted into a valid T * for any object type T, as long as the address is suitably aligned for T. Similarly, a const void * can safely be converted to a const T * under the same condition. (Converting between void * and function pointers such as void (*)(), however, is only conditionally supported by C++11, although POSIX platforms rely on it.) These facts are quite powerful because (low-level) allocators generally return a void *, which in turn they obtain from a platform low-level system call, including ::shmat(), and malloc(3C) variants such as mtmalloc(3MALLOC), umem_alloc(3MALLOC) and bsdmalloc(3MALLOC), not excluding their respective implementations of memalign(). Such allocators return an address suitable for the most restrictive type; the typecast itself does not adjust the address, so any finer placement within the block is up to the program.
An aggregate data type's alignment modulus is frequently a multiple of the alignment modulus of the most stringent of its members, which ultimately is the alignment modulus of the most stringent primitive data type involved. Aside from hardware constraints, the basic fact behind this is the language's requirements for well-formed arrays.

Naturally, I understand that platform- and/or compiler-specific pragmas, such as pragma align and pragma pack, as well as unions including the most restrictive type, are very useful tools for assuring a certain alignment constraint, but they constitute a different approach from what I'm talking about here, which (although not space-efficient) is more portable in all aspects. My goal is to obtain a properly aligned value of a type-specific pointer (T *) within an arbitrary raw chunk of memory (a raw block of memory pages), which, after all, is typical in the context of memory allocators.

Thus, the trick is how to obtain proper alignment and avoid any portability problems.
I'd say the answer is based on the following equivalent rules:

Rely on the standards and be aware of industry-wide facts.
Assume nothing too specific about a platform and its system and compilers.

I always kept these rules in mind, but in practice I never really found a sufficiently clear example of how to adhere to them. Perhaps it is a matter of highly confidential treasures of a few. At the same time, the scenarios were apparently so ugly that I was discouraged from trying to tackle anything by myself. In addition, third-party solutions claiming to address the issue were so clumsy that I became uninterested in embracing any of them. I just wanted some really simple approach that could get the job done in the most straightforward scenarios.
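For what it's worth, C++11 added a standard tool aimed at exactly the goal stated above: std::align, in <memory>, adjusts a pointer inside a raw buffer of known size to a requested alignment, or returns a null pointer if there's no room. A minimal sketch of carving aligned storage for a T out of such a buffer; carve is a hypothetical name, and some older libstdc++ releases may lack std::align.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>

// Returns uninitialized storage for one T inside [ raw, raw + space ),
// suitably aligned for T, or nullptr if it doesn't fit (raw and space are
// then left untouched). On success, raw and space are advanced past the
// carved slot, ready for the next request. The caller still has to
// construct the object there (e.g. with placement new).
template< typename T >
T * carve( void *& raw, std::size_t & space )
{
   if ( std::align( alignof( T ), sizeof( T ), raw, space ) == nullptr )
      return nullptr;

   assert( reinterpret_cast< std::uintptr_t >( raw ) % alignof( T ) == 0 );

   T * slot = static_cast< T * >( raw );
   raw = static_cast< unsigned char * >( raw ) + sizeof( T );
   space -= sizeof( T );
   return slot;
}
```

Given an arbitrary raw chunk, repeated calls hand out properly aligned T * values one after another, with std::align doing the rounding up and the bookkeeping of the remaining space.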

Kicking off

This web log is to help myself and perhaps contribute to demystify some C++ topics.
The key to success is to balance performance, simplicity and elegance.

I've been trying out a few other posts on C++, mixed up with other subjects, on another blog of mine, but I intend to gradually revise and migrate all of them to this new blog dedicated to C++ that I'm just kicking off.

C++ is really great, but I'm still skeptical about many corners of it, especially its libraries. My opinion is that enabling industrial-strength technologies is "more interesting" than attempting to provide ready-made solutions for every imaginable thing or fashion.
