Celebrating

7

years

of expertise

in software

development

Latest news

We wish you a pleasant reading

Qt containers and COW

For as long as Qt exists, there has been a big question: should I use Qt or std containers? QVector or std::vector?

In terms of API, such talk mostly gets reduced to terms of personal preferences. But there is a killer feature which brings Qt containers on top of std single-handedly. Copy-on-Write, or COW.

What is COW?

There’s a good article within Qt’s own documentation, which gives the details. Here, I will provide a brief intro.
Take the following code:

std::vector<int> v1 {1, 2, 3};
std::vector<int> v2 = v1;

When we assign v1 to v2, a deep copy is performed. We effectively get two different pieces of memory which do not have anything in common.

QVector<int> v1 {1, 2, 3};
QVector<int> v2 = v1;

Here, the situation is very different. There’s a single memory chunk which both vectors have pointers to. This yields a performance boost without writing any additional line of code.
Just like that. If you don’t need to have a copy of a vector, you won’t. You don’t have to create shared pointers for that, it’s given as is.
Okay, but what if we actually want to update the initial vector by assigning some value to one of its elements?
Then Copy-on-Write happens. The second vector detaches, and the deep copy is performed. Now, we have two memory chunks, just as with std vectors. And only when the write actually happens. So, you see, in the worst case you get the same performance as std, but a better one when possible.

Sounds cool, right?

Not so fast.
That was a preamble. Now let’s ask ourselves several questions. When is this deep copy actually performed? Is it always Copy-on-Write? What can we do to avoid it?
That’s what the article is actually about.

What’s wrong about COW?

Most of articles about Qt’s implementation of COW say something like this: “You should be aware that a deep copy may be performed on your containers”.
I don’t buy this “may”. If I care about performance, I need to know the exact cases when it happens and when it does not.
The Qt’s article says: “only a pointer to the data is passed around, and the data is copied only if and when a function writes to it”.
So, I expect the following to happen:

QVector<int> v1 {1, 2, 3};
QVector<int> v2 = v1;
int i = v2[0]; // No detach, we’re reading the element
v2[0] = 10; // Detach, we’re actually writing to the element

Wait.
But we don’t have different [] operators for reading and writing in C++! It’s the very same code called in both cases! How does it know that it should not detach when reading?
It doesn’t.
It always detaches.

// We’re reading the vector, but it enforces a deep copy!
const int i = v2[0];

Yes, you got it right. Copy-on-Write is a lie. It’s more like Copy-on-Write-or-Read-or-Whatever.
Let’s prove our point with an investigation.
Here is how operator[] is defined in QVector:

template <typename T>
class QVector
{
   inline T &operator[](int i)
   {
       return data()[i];
   }

inline T *data() {
       detach();
       return d->begin();
   }
}

It’s an unconditional detach. No checks, no precautions. Just a detach before returning a pointer to the data.
That’s actually logical. You return a pointer. You never know what a developer would do to that pointer, but you have to give guarantees.
(Homework: read the QVector code and find all places where detach() is called. You won’t like it.)
This has rough consequences. For example, if you just passed your QVectors by value as parameters hoping that implicit sharing would do the trick, you were only half-right.

void readVector(QVector<int> v) {
   // At this point, v is implicitly shared, and everything is fine.
   // But v gets automatically detached on the first iteration below
   for(int i : v) {
       // do something
   }
}

QVector<int> vec1 {1, 2, 3};
readVector(vec1);

Normally, you would pass such vector by reference (everything is obviously fine in such case), but if you pass it to a queued signal, it’d be tempting to pass by value and praise COW instead of messing with memory management or getting a crash due to an outdated reference. This is a good approach, but it requires a lot of attention to the code within those slots. Safety is implied, but no guarantee given.

What can we do about it?

Okay, COW doesn’t do what it promised to. Is there a way to access a container without having it detached?
Yes, there is. Remember how operator[] was defined in QVector? Here’s how const operator[] is defined:

template <typename T>
inline const T &QVector<T>::operator[](int i) const
{
   return d->begin()[i];
}

No detach. The operator knows that the data is not to be modified, so it does not do a precautious deep copy.

So, here’s the ultimate answer: detach always happens for non-const access, and never — for const access.
This could be the end of it, but let’s consider the actual use cases.
One would think: how does C++ compiler know when to call a const operator and when a non-const one? It’s simple. A const operator is called for a const object, and vice versa.
So, if you want to avoid detaching, you have to make your secondary vector const.
Getting back to our original example:

QVector<int> v1 {1, 2, 3};
const QVector<int> v2 = v1;
int i = v2[0]; // No detach, v2 is const

Note that constness matters when you try accessing the data, not every other time. So, you can easily have a non-const vector and just cast it to const when you read a value (qAsConst was added to Qt exactly for that).
If you can’t mess with constness or don’t want to, there are API calls which don’t detach by default. These are all member functions which are marked as const. For example, QVector::at(), QVector::value(), QVector::constData() or const iterators.
I am often asked, why we should be so vigilant about using constness extensively in our C++ code. This is one of the reasons, the one which has immediate measurable consequences.
If you’re thinking of an automated check for such hidden detach scenarios, clazy is your best friend.

Examples

Iterating

The most commonly used and an underestimated one as well: iterating over a container.

QVector<int> v1 {1, 2, 3};
for(const int i : v1) {
   int a = i;
}

This is what I regularly see even from the experienced developers. It mostly comes from the expectation that if we’re reading values, no detach happens. As we could see in the previous chapters, that’s utterly wrong.
There are two ways to correct this:

  1. If you can afford to mark the container const, do it. You might have found one issue with implicit detach, but you could’ve skipped others. The safest way to handle this is to mark the container const in the first place.
    QVector<int> v1 {1, 2, 3};
    const QVector<int> v2 = v1;
    for(const int i : v2) {
       int a = i;
    }
  2. Obviously, this does not work for dynamic containers.
    QVector<int> v1 {1, 2, 3};
    // Populate the vector
    QVector<int> v2 = v1;
    for(int i = 0; i < v2.size(); ++i) {
       v1[i]++;
    }

    for(const int i : qAsConst(v1)) {
       int a = i;\
    }

Calling qAsConst or std::as_const helps here, and this is the most common situation with iterations over Qt containers. Rule of thumb: if you don’t update your container in the loop, always use qAsConst. If you do update it, change your algorithm and still use it. Explicit copy is almost always better than implicit detach: at least in terms of clarity and thus maintainability.

Solution: mark the v1 const either explicitly or with qAsConst.

Function arguments and return values

QVector<int> processData(QVector<int> data) {
   QVector<int> dataCopy = data;
   for(int i = 0; i < 1000; ++i) {
       dataCopy = preprocessData(dataCopy);
   }
   return dataCopy;
}

QVector<int> preprocessData(QVector<int> data) {
   QVector<int> processedData;
   processedData.reserve(data.size());
   for(int i = 0; i < data.size(); ++i) {
       processedData[i] = data[i] * 2; // Yeah, something that simple
   }
   return processedData;
}

The code above makes 1000 iterations of preprocessing a vector of data. For every iteration, an internal copy is created, populated and returned as a value.
If you got too scared by these “Detaches! Detaches are everywhere!”, you might want to avoid passing vectors this way at all. But passing a vector as a value is pretty harmless. Actually, this is where COW is extremely useful: only a tiny fraction of vector’s data is allocated on stack. The vector’s body is implicitly shared and not copied when a call is made. So, this example is actually good.
But only in this term. Every time preprocessData() starts querying the data vector, the passed copy detaches. And yes, this happens 1000 times, because every time a different physical object is copied onto the function’s stack.

Solution: change the function’s signature to QVector<int> preprocessData(const QVector<int> data)

The following line may be a bit tricky for comprehending:

dataCopy = preprocessData(dataCopy);

One might think that we could miss a lot of implicit detaches here, as this happens in a loop. But this it not true. We pass dataCopy to preprocessData() by value, and we get a different vector as the result by value as well. This resulting vector we assign to the old dataCopy var. The old value of dataCopy is now gone, and its associated heap memory freed. If this is unacceptable to you as well, you might want to reuse the old container, but it has nothing to do with the detaches we’re fighting with in this article.
As you see, in a relatively complex example with vectors passed by value in nested calls in loops, the only issue was about adding const to a parameter. The rest is handled by COW pretty well, and this is where COW actually shines.

Signals and slots

class DataNode : public QObject {
public:
   void generateData()
   {
       QVector<int> data;
       // Populate the vector
       emit dataReady(data);
   }
signals:
   void dataReady(QVector<int> data);
slots:
   void processData(QVector<int> data)
   {
       long sum = 0; // Let’s just calculate vector’s sum as an example
       for(int i : qAsConst(data)) {
           sum += i;
       }
   }
}

DataNode *a1 = new DataNode;
DataNode *a2 = new DataNode;
connect(a1, &A::dataReady, a2, &A::processData);
a1->generateData();

Don’t look for an error here. The code is silly, but good.
This example is very similar to the previous one. It’s still translated to functions calls, and the vectors are passed by value. It’s still just about making the vectors const where needed.
This example is given to show that nothing changes when you switch from pure functions calls to signals and slots.
Note that even though we don’t mark the parameters as const, we used qAsConst when the detachment could have happened. That’s pretty enough.
If you make the connection queued and process slots in different threads, nothing changes. It’s still all about constness.
This example stands exceptional though. We can have multiple connections to the same signal. Imagine a thousand of nodes connected to dataReady, some implementation of Map-Reduce could utilize such concept. There’s still nothing wrong with passing a vector by value even in this case. But if you actually miss an accidental detach in one of these connected slots, your performance drops thousandfold (a deliberate exaggeration).
So, if you have one-to-many connection in your application and you pass containers around, be extremely careful about the actual logic in these slots, and if you can afford const arguments, do it. Just as you would in the previous example.

Solution: just be attentive and know where to look.

It’s worth noting that in all these examples we used vectors of int, which is as lightweight as it can be. If you have containers of your custom classes, especially when the classes are heavy enough, the cost of an accidental detach could be devastating.

Summary

Copy-on-Write happens not only on actual write.
It happens on any non-const access to the container, even on reading. COW in Qt is actually CONCA, Copy-on-Non-Const-Access.
There are two ways to avoid deep copy in the COW containers:

  1. Mark the container as const, e.g. with qAsConst..
  2. Use const access functions.

If you do it this way, COW provides a nice performance boost for free.
And in the end, here’s a good talk for you to watch.

Comments