A safer printf with variadic templates

Jan. 10 2018 by Daniel Grumberg

Introduction

How many of you use the printf family of functions? Quite a lot, I assume. The functions come with some advantages over the standard output streams (cout, cerr, etc): - They are plain C and write directly to the underlying POSIX file descriptor, they are thus thread-safe.

  • They are somewhat faster as they don’t rely on operator overloading and thus dynamic dispatch through virtual function tables.

This StackOverflow post should give you some numbers if you care. Most people probably don’t/shouldn’t care about this.

  • They are ubiquitous, well-known, and specifying precision and width is arguably simpler than it is in the stream idiom.

However, printf and all its friends suffer from one big drawback, type safety, which keeps bringing people back to streams. Have you ever provided printf or worse scanf the wrong format string for what you were trying to achieve? I know I have…

All standard C libraries implement printf using the va_arg family of macros. Beyond being cumbersome, and error-prone to use, they delegate all the argument typing and “checking” to the run-time program. Most people agree that typing is better left to compile-time, where mistakes and errors are caught before the program gets a chance to run. Since C++11, we have variadic templates to help us with implementing such functions.

Quite a few people have proposed implementations of type-safe printf alternatives. I will not consider here things like boost::format or the fmt library as they implement a new formatting string language and I am only exploring in drop-in replacements for printf. The most notable discussion of this topic was done by Andrei Alexandrescu in this talk at Going Native 2012. He proposes a nice and simple two-step approach with one traversal of the string and arguments to type check them, and then just delegates the functionality to std::printf. He argues the checks can be easily disabled in release mode through the use of the NDEBUG macro to avoid any performance overhead. This approach works great, as no one ever uses printf to print large strings. However, I will be presenting a skeleton implementation for checking the arguments in place to explore implementing a safe printf from first principles as a go to facility for outputting to stdout. The main drawback of my approach is that all the characters in the format string until the invalid format specifier are outputted anyway. If you do care about this a lot, Alexandrescu’s approach is better suited to your needs.

Before you proceed any further you need to make sure you know how to use variadic templates in C++, if you are new to the topic or if you need a quick refresher you might want to check out my introductory post to the topic.

Argument normalisation

The first thing we have to notice, that printf performs argument normalization. Indeed, any integral type is considered to be a long (unless specified otherwise with width modifiers) and every floating point number is a double (again unless specified otherwise). Furthermore, we want to allow users to natively be able to format std::string. I choose to implement this functionality through a templated function as follows:

template <typename T>
typename std::enable_if<std::is_integral<T>::value, long>::type
normalize_arg(T arg) { return arg; }

template <typename T>
typename std::enable_if<std::is_floating_point<T>::value, double>::type
normalize_arg(T arg) { return arg; }

template <typename T>
typename std::enable_if<std::is_pointer<T>::value, T>::type
normalize_arg(T arg) { return arg; }

const char* normalize_arg(std::string const& str) { return str.c_str(); }

The aim here is to cast types having a certain trait to one of the types discussed above. We use the convenient std::enable_if to overload the return type of normalize_args based on traits of the template type. We also provide an overload that fully specialises the template where we convert std::string to its underlying C-style string. We need to do this because for simplicity reasons as we plan to delegate the actual formatting to printf. Of course, if we were going to implement the functionality from first principles this is an unnecessary restriction.

Printing to standard out

Once we have the argument normalisation building block we are able to define the shape of our top level safe_printf as follows:

template <typename ...Params>
void safe_printf(const char *str, Params const& ...parameters)
{
    flockfile(stdout);
    safe_printf_impl(str, normalize_arg(parameters)...);
    funlockfile(stdout);
}

As mentioned earlier, the printf family of functions works well with concurrent processes because they acquire the file lock associated with stdout. The simplest implementation of this functionality I could think of is wrapping the main body of work with calls to flockfile and funlockfile from the C standard library in the stdio.h header. Let’s now take a look at the main implementation:

void safe_printf_impl(const char *str) {
    // We already own the lock so we might as well use the unlocked version
    for(; *str && (*str != '%' || *(++str) == '%'); ++str) putchar_unlocked(*str);

    if (*str) throw std::runtime_error("Too few arguments were passed to safe_printf");
}

The above code snippet represents the base case of the compile-time recursion. The purpose here, is to keep printing the rest of the format string if we run out of formatting parameters. The last line of this function throws a std::runtime_error if we detect a format specifier in the format string. This is because we cannot format missing parameters and thus the call to safe_printf is invalid and does not type-check.

template <typename Param, typename ...Params>
void safe_printf_impl(const char *str, Param parameter, Params... parameters)
{
    // We already own the lock so we might as well use the unlocked version
    for(; *str && (*str != '%' || *(++str) == '%'); ++str) putchar_unlocked(*str);

    validate_type_parameter<Param>(*str);
    const char format[3] = {'%', *str, '\0'};
    printf(format, parameter);

    safe_printf_impl(++str, parameters...);
}

If we have a list of at least one parameter to format, we want to output the characters in the format string as usual until we hit a format specifier. We then proceed to make sure that the specifier agrees with the type of the first parameter. If this check succeeds we can format the parameter as usual. The last line is interesting as we now “recurse” (we are calling a different template instantiation) to process the remainder of the string with the remainder of the parameters.

I chose to use type traits to implement validate_type_parameter I find it a more expressive technique for checking type properties. An alternative to this scheme is to specify a template specialisation for each possible type check inside each one if the format specifier is correct. My implementation is given below:

#define ENFORCE(A) if (!(A)) throw std::runtime_error("Type did not match format specifier")

template <typename Param>
void validate_type_parameter(char format_specifier)
{
    switch(format_specifier)
    {
        default: throw std::runtime_error("Invalid format specifier, only f, d and s are allowed");
        case 'f':
            ENFORCE(std::is_floating_point<Param>::value);
            break;
        case 'd':
            ENFORCE(std::is_integral<Param>::value);
            break;
        case 's':
            constexpr bool is_valid_c_str
                = std::is_same<Param, const char *>::value || std::is_same<Param, char *>::value;
            ENFORCE(is_valid_c_str);
            break;
    }
}

Conclusion

The cool thing is that C++ supports template argument deduction, which allows you to to truly use this implementation as a drop-in replacement for printf. This achieves exactly what we want in the sense that we just aim to check if the arguments we supplied can be correctly printed using the supplied format string. Here is a quick example of how this implementation behaves:

std::string world("world");
safe_printf("Hello %s!. I am %f%% sure this works.\n", world, 0.99f); // This works as expected
safe_printf("Hello %s!. I am %f%% sure this works%s.\n", world, 0.99f); // This fails
safe_printf("Hello %s!. I am %d%% sure this works.\n", world, 0.99f); // This fails as well

The implementation of safe_printf I just presented does not implement, the full printf functionality. But the main idea for a drop-in replacement of classic printf is shown. Also, I would like to find a way of avoiding to produce output for invalid calls without otherwise introducing buffering that would not happen in the first place, if anyone knows how get in touch via email or in the comments section of this gist that has the full code.