std::vector and std::remove

2008-06-28

You might have known this but I just found this out yesterday.

I knew std::remove doesn't actually remove anything. The way it is used goes something like this.

std::vector<int> int_array;

int_array.push_back(1);
int_array.push_back(1);
int_array.push_back(1);
int_array.push_back(2);
int_array.push_back(3);
int_array.push_back(4);

// call remove to remove all the elements that equal 1
// remove reorders all elements so that the things you
// want removed are at the end of the vector
std::vector<int>::iterator erase_from = std:remove(int_array.begin(),
                                                   int_array.end(),
                                                   1);

<!-- more -->

// actually erase the elements at the end
std::erase(erase_from, int_array.end());

Those comments are how I understood it to work. But that's not the entire story.

It turns out if you actually check the contents of the vector after the std::remove above but before the erase it will be

2,3,4,2,3,4

In fact the stl docs state that the contents of the range between erase_from and int_array.end() are undefined!

Why is this important? Well, a couple of things

(*) if you look above you'll see there are two 2s, two 3s and two 4s. If you're storing objects and 2, 3 or 4 are large you'll now have 2 of each.

Why is this important? Imagine of you had a vector of vectors or a vector of std::strings where your strings were the contents of entire files. Maybe you were storing unique strings in your vector yet after the call to remove not everything in your vector will still be unique.

(*) The 1s actually got deleted. Remove *can* "remove" the 1s. Erase then erases the last elements which are no longer 1s. In other words, the idea that std::remove doesn't actually delete anything is mis−leading.

Why is this important? Well, if you were storing smart pointers in your vector then something might have just been deleted. Of course if you had an array like this 2,3,4,1,1,1 the ones would still be in your list at the end of remove in which case they didn't get removed.

(*) If you check most standard implementations, the way it works it effectively:

iter remove(iter start, iter end, int value)
{
  iter copy_pos = start;
  while (start < end)
  {
     if (*start != value)
     {
        // this is a value we still need so copy toward the begining
        *copy_pos = *start;
        ++copy_pos;
     }
     ++start;
   }

   return copy_pos;
}

What that code effectively does walk the entire vector and copy each element we want to keep toward the start of the list while skipping the ones we asked to be removed. The implication is that assignment operator will be called for each one. If your assignment operator does anything important that could be significant.

If you are using stl::list these issues go away as it can just bubble things to the beginning of the list, no copying needed.

Comments
Stop Pasting Code
Soul Bubbles