The C++ standard library implements numerous common algorithms that accept a predicate function object. The C++ Standard, [algorithms.general], paragraph 10 [ISO/IEC 14882-2014], states the following:

[Note: Unless otherwise specified, algorithms that take function objects as arguments are permitted to copy those function objects freely. Programmers for whom object identity is important should consider using a wrapper class that points to a noncopied implementation object such as reference_wrapper<T>, or some equivalent solution. — end note]

Because it is implementation-defined whether an algorithm copies a predicate function object, any such object must either

  • implement a function call operator that does not mutate state related to the function object's identity, such as nonstatic data members or captured values, or
  • wrap the predicate function object in a std::reference_wrapper<T> (or an equivalent solution).

Marking the function call operator as const is beneficial, but insufficient, because data members with the mutable storage class specifier may still be modified within a const member function.

Noncompliant Code Example (Functor)

This noncompliant code example attempts to remove the third item in a container using a predicate that returns true only on its third invocation.

#include <algorithm>
#include <functional>
#include <iostream>
#include <iterator>
#include <vector>
  
class MutablePredicate : public std::unary_function<int, bool> {
  size_t timesCalled;
public:
  MutablePredicate() : timesCalled(0) {}

  bool operator()(const int &) {
    return ++timesCalled == 3;
  }
};
 
template <typename Iter>
void print_container(Iter b, Iter e) {
  std::cout << "Contains: ";
  std::copy(b, e, std::ostream_iterator<decltype(*b)>(std::cout, " "));
  std::cout << std::endl;
}
 
void f() {
  std::vector<int> v{0, 1, 2, 3, 4, 5, 6, 7, 8, 9};
  print_container(v.begin(), v.end());

  v.erase(std::remove_if(v.begin(), v.end(), MutablePredicate()), v.end());
  print_container(v.begin(), v.end());
}

However, std::remove_if() is permitted to construct and use extra copies of its predicate function. Any such extra copies may result in unexpected output.

Implementation Details

This program produces the following results using gcc 4.8.1 with libstdc++.

Contains: 0 1 2 3 4 5 6 7 8 9  
Contains: 0 1 3 4 6 7 8 9

This result arises because std::remove_if makes two copies of the predicate before invoking it. The first copy is used to determine the location of the first element in the sequence for which the predicate returns true. The subsequent copy is used for removing other elements in the sequence. This results in the third element (2) and sixth element (5) being removed; two distinct predicate functions are used.

Noncompliant Code Example (Lambda)

Similar to the functor noncompliant code example, this noncompliant code example attempts to remove the third item in a container using a predicate lambda function that returns true only on its third invocation. As with the functor, this lambda carries local state information, which it attempts to mutate within its function call operator.

#include <algorithm>
#include <iostream>
#include <iterator>
#include <vector>
  
template <typename Iter>
void print_container(Iter b, Iter e) {
  std::cout << "Contains: ";
  std::copy(b, e, std::ostream_iterator<decltype(*b)>(std::cout, " "));
  std::cout << std::endl;
}
 
void f() {
  std::vector<int> v{0, 1, 2, 3, 4, 5, 6, 7, 8, 9};
  print_container(v.begin(), v.end());

  int timesCalled = 0;
  v.erase(std::remove_if(v.begin(), v.end(), [timesCalled](const int &) mutable { return ++timesCalled == 3; }), v.end());
  print_container(v.begin(), v.end());
}

Compliant Solution (std::reference_wrapper)

This compliant solution uses std::ref to wrap the predicate in a std::reference_wrapper<T> object, ensuring that copies of the wrapper object all refer to the same underlying predicate object.

#include <algorithm>
#include <functional>
#include <iostream>
#include <iterator>
#include <vector>
  
class MutablePredicate : public std::unary_function<int, bool> {
  size_t timesCalled;
public:
  MutablePredicate() : timesCalled(0) {}

  bool operator()(const int &) {
    return ++timesCalled == 3;
  }
};
 
template <typename Iter>
void print_container(Iter b, Iter e) {
  std::cout << "Contains: ";
  std::copy(b, e, std::ostream_iterator<decltype(*b)>(std::cout, " "));
  std::cout << std::endl;
}
 
void f() {
  std::vector<int> v{0, 1, 2, 3, 4, 5, 6, 7, 8, 9};
  print_container(v.begin(), v.end());

  MutablePredicate mp;
  v.erase(std::remove_if(v.begin(), v.end(), std::ref(mp)), v.end());
  print_container(v.begin(), v.end());
}

The above compliant solution demonstrates using a reference wrapper over a functor object but can similarly be used with a stateful lambda. The code produces the expected results, where only the third element is removed.

Contains: 0 1 2 3 4 5 6 7 8 9  
Contains: 0 1 3 4 5 6 7 8 9

Compliant Solution (Iterator Arithmetic)

Removing a specific element of a container does not require a predicate function but can instead simply use std::vector::erase(), as in this compliant solution.

#include <algorithm>
#include <iostream>
#include <iterator>
#include <vector>
 
template <typename Iter>
void print_container(Iter B, Iter E) {
  std::cout << "Contains: ";
  std::copy(B, E, std::ostream_iterator<decltype(*B)>(std::cout, " "));
  std::cout << std::endl;
}
 
void f() {
  std::vector<int> v{0, 1, 2, 3, 4, 5, 6, 7, 8, 9};
  print_container(v.begin(), v.end());
  v.erase(v.begin() + 3);
  print_container(v.begin(), v.end());
}

Risk Assessment

Using a predicate function object that contains state can produce unexpected values.

Rule

Severity

Likelihood

Remediation Cost

Priority

Level

CTR58-CPP

Low

Likely

High

P3

L3

Automated Detection

Tool

Version

Checker

Description

Helix QAC

2024.1

C++3225, C++3226, C++3227, C++3228, C++3229, C++3230, C++3231, C++3232, C++3233, C++3234


Parasoft C/C++test

2023.1

CERT_CPP-CTR58-a

Make predicates const pure functions

Polyspace Bug Finder

R2023b

CERT C++: CTR58-CPPChecks for function object that modifies its state (rule fully covered).

Related Vulnerabilities

Search for vulnerabilities resulting from the violation of this rule on the CERT website.

Bibliography

[ISO/IEC 14882-2014]

Subclause 25.1, "General"

[Meyers 2001]Item 39, "Make Predicates Pure Functions"



10 Comments

  1. What about reference members in addition to const members?

    1. A reference member could be acceptable to use in the given example of removing elements, but the predicate is still an inelegant solution to removing by index.

      1. Well, there's just something fundamentally wrong with the idea that a (functor) predicate can't rely on its own internal state to carry out its job. "remove_if()" is a great example of this. I recently wrote an "IsDuplicate" functor for passing to "remove_if()", which is a very clean way to remove duplicates from any sequence. Each time "remove_if()" calls my functor, I simply look up the current (passed) item in an internal collection (a hash member variable), and return true if found or false otherwise (and in the latter case, just insert the item into the collection for the next iteration). To my surprise, I quickly discovered the problem discussed in this posting. Now, I should have known better, being that I'm an experienced C/C++ developer (25+ years), but even I got tripped up at first. It goes to show how often I've had to write a predicate that requires internal state over the years (possibly never, which is surprising considering how much code I've written). Nevertheless, while there may be valid arguments in favour of allowing C++ implementors to copy a predicate (resulting in the issue now being discussed), from an end-user's perspective (a C++ programmer in this case), there is something deeply flawed about it IMHO. I should be able to rely on any internal state I want without having to turn to reference members or shared pointer members to pull this off. Of course it's a little easier now that we have lambdas, since I can now write my functor as a lambda instead, passing my hash collection by reference using the capture clause (which is just a glorified way of writing the same functor discussed above, in this case, one with a reference member to my external hash collection). For reusability however, a functor would be preferable (allowing me to reuse it as a predicate for any algorithm, not just "remove_if()"), but doing it this way is more unwieldy and (possibly) less efficient, because I now have to store a reference or shared pointer in my class due to the (predicate) copy issue (and therefore deal with how to initialize that reference or shared pointer in the first place). It's unclean and inelegant IMHO.

        1. Predicates holding state is a classic trade-off between performance and functionality. Once the predicates start holding state, there are performance implications because copy constructors must be called (or assignment operators), etc. By disallowing state, predicates have a consistent, predictable performance cost (so you don't run into the problem of adding a data member and suddenly degrading performance x10). That being said, the behavior can certainly be surprising (hence the reason why we have a rule prohibiting it).

          Btw, if you were able to sort your data set, you could then use std::unique to remove duplicates. But this necessitates sorting, which is sometimes not feasible (leading to contortions).

          1. I actually don't find that argument justified, assuming it was ever seriously considered (I don't know). Programmers themselves should be able to make their own decisions about performance and do all the time. However, I do understand the difficulties of writing a general purpose library (namespace std) that may have to copy client objects that have internal state. Passing them all around by reference (const or otherwise as the case may be) would seem to have been the better choice, but going down this avenue no doubt has all sorts its own implications and subtleties to overcome (especially in light of the complex rules surrounding templates). I suspect this had far more to do with the situation than protecting programmers from performance issues (disallowing copying might have proved very difficult). As for sorting the data, unfortunately I couldn't rely on "std::unique". The data I was dealing with had to maintain its original (unsorted) order, but even if not, it might not always be efficient to sort it just for the purposes of removing data (though the lookup hash I'm now relying on has its own overhead of course)

  2. Just a follow-up to my previous comments (happen to come across this post again 1.5+ years later). The "compliant" solution shown above (using a "std::reference_wrapper") fixes one problem, but it's still invalid AFAIK. assuming no changes occurred in C++11 (I checked "remove_if" under 25.3.8 in the standard and see nothing that disputes this). According to Herb Sutter (who else would know better), you can't make any assumptions about the order of elements that will be visited in the sequence. He effectively addresses an identical example to your own in rule 87 of his book "C++ Coding Standards: 101 Rules, Guidelines, and Best Practices" (co-written with Andrei Alexandrescu). Under rule 87 ("Make predicates pure functions"), he states that "conceptually, this example is perverse because "remove_if" only guarantees that it will remove all elements satisfying some criterion. It does not document the order in which elements in the range will be visited or removed, so the code is relying on an assumed, but undocumented, assumption. The correct way to remove a particular element is to iterate to it and then call erase.". More generally he states (when dealing with predicates), that "The standard algorithms generally make no guarantee about the order in which the predicate will be applied to the elements in the range. In the absence of guarantees about the order in which objects will be visited, operations like "flag the third element" (see Examples) make little sense, because which element will be visited "third" is not well-defined". Also search for "Property 7" here http://www.drdobbs.com/effective-standard-c-library-unary-predi/184403777. These rules make things very difficult for programmers in some cases, and should be reviewed by the committee IMHO. My earlier post is a very good example, because it makes removing duplicates needlessly complicated (noting that order needs to be preserved in this case), even when using predicates that can be safely copied using reference or shared pointer semantics. With no guarantee that elements will be visited in the expected order, what's a programmer to do (you apparently can't (easily) solve a fundamental requirement like removing duplicates using the standard algorithms - go figure).

    1. It's not entirely accurate to state that algorithms generally make no guarantee about their ordering. In this particular case, std::remove_if() does make a guarantee by virtue of requiring ForwardIterator arguments (see [algorithms.general]p5). Since a ForwardIterator only guarantees the ability to sequentially forward iterate from first to last, and the complexity of the algorithm is that exactly last - first applications of the corresponding predicate be called, there is a guaranteed total order. However, the point is still salient – it's not a good practice to rely on state within a predicate, which is why the second CS is a preferable alternative to the first.

      I'm not certain what you mean about no way to easily remove duplicates using the standard algorithms; that's what std::unique() does.

      1. In practice that's probably true, but it contradicts what Sutter says (he addresses this very example). Moreover, though unrealistic, it's theoretically possible for some implementation to visit each element in sequential order, store copies of the iterator and later invoke them out-of-order (the predicate is still called just once). I'm less concerned about that however (it won't realistically happen) than I am about the chair of the C++ committee claiming it's "not possible ... to work around the second point" (referring to the point I quoted from his book earlier). I recognize your own credentials, but you are in conflict with Sutter on the matter (regardless of who's right). Moreover, while some algorithms guarantee their order ("for_each"), others don't, and I'm inclined to err on the side of caution (I haven't researched this issue ad nauseam but even some experts apparently can't agree). BTW, "std::unique" removes consecutive duplicates only, so you're required to sort the collection first (to remove them all). Not good (inefficient and it doesn't allow you to preserve the collection's order should you require it).

        1. I don't think Herb and I are actually in conflict in practice. We both agree that stateful predicates are dangerous, we both agree that the proper solution to removing the third element of a container is to use .erase() to remove it. We simply disagree about the minutiae of the language lawyering side of things. I strongly agree with Herb that relying on implementation details of the algorithms is dangerous, we just differ on where specification stops and implementation detail begins.

          I wouldn't get hung-up on the horribly contrived example code's purpose. It's contrived code meant to demonstrate a class of problems and possible general solutions to those problems. It's not meant as a tutorial. In this case, the class of problem is stateful algorithm predicates. One possible, valid, solution to that is std::reference_wrapper<T>. The better solution to that is to not rely on stateful algorithm predicates. This should probably be made more clear in the text, however.

          As for std::unique(), you are correct that it requires the container to be sorted (as many containers are by default). I think that discussions of efficiency are premature because that depends heavily on your container's usage pattern, but would point out that sorting and uniquing a container is O(N log N + N), while removing duplicates from an unsorted container would be O(N^2). Of course, you are correct about preserving relative order of elements being problematic in this case.

          1. Unfortunately, programmers get hung up (and often stung) by these issues because the standard itself doesn't clarify them. As I understand it, you're a committee member, and so is Herb of course, and you're own example (and some comments) do conflict with his even though you recognize and acknowledge the issues involved (though you never stated that in the article, tacitly leaving people to believe it's safe to proceed in spite of the issue Herb addresses). I'm not criticizing you (seriously), but if two committee members can't agree on the legalese of the language (or lack thereof), what are non-language specialists supposed to do. I'm also a very experienced developer (25+ years in C/C++), and I still get caught up in issues like this sometimes, let alone far less experienced developers than myself (or you). You know as well as I do, that subtle issues can cause serious problems, and this issue is unclear, even if the order won't be a problem in any mainstream implementation (very likely not). There shouldn't be any debate at all, and the standard should be the final arbiter, but it's not (like so many other things). With all due respect though (this is meant to be constructive), your article contributes to the confusion IMHO, not by promoting ways to safely copy a predicate (it clearly addresses this very issue, though having to make predicates "pure" is arguably a language deficiency IMHO), but by neglecting to mention the issue surrounding the order of elements. By failing to mention it (perhaps you weren't thinking of it or didn't consider it at the time - an easy mistake which is part of the problem), it promotes the idea that the order doesn't factor in, when it clearly does (or can), and even experts disagree. The "minutiae" of the language is where a program can potentially blow up! BTW, sorting a container with the particular data I'm dealing with (to remove duplicates using "unique") is slower than dealing with an unsorted container (I once benchmarked it). Other issues can also impact the situation of course (unrelated to the size or type of data), like memory management, etc., but the primary issue however is maintaining the collection's order (what I meant when I previously said that the standard library doesn't address this fundamental need). Why should anyone have to sort a container first anyway, just to remove duplicates. Whether you're maintaining the order or not, or whether someone claims it's generally faster or not (remains to be seen), sorting and removing duplicates have nothing to do with each other, even if the former can be used to improve the latter. It shouldn't be forced upon you (though this has nothing do with the main issue we've been discussing).