Makes it an error to pass flags that are not defined in the libconsensus API.
There has been some confusion as to what to pass to libconsensus, and this change (combined with a mention in the release notes) should clear it up.
Using undocumented flags is risky because their meaning, and which combinations are allowed, change from release to release. E.g. since the segwit changes it is no longer possible to pass (CLEANSTACK | P2SH) without running into an assertion.
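As a rough illustration of the check, here is a minimal sketch that rejects any flag bit outside the documented set before script evaluation starts. All names and bit positions below (`FLAG_*`, `FLAGS_API_ALL`, `VerifyError`, `check_flags`) are hypothetical and chosen for the example; they are not taken from the libconsensus header.

```cpp
#include <cstdint>

// Hypothetical flag bits, for illustration only. The real values live in the
// libconsensus header and may differ.
constexpr std::uint32_t FLAG_NONE    = 0;
constexpr std::uint32_t FLAG_P2SH    = 1U << 0;
constexpr std::uint32_t FLAG_DERSIG  = 1U << 2;
constexpr std::uint32_t FLAG_WITNESS = 1U << 11;

// Union of every flag this (hypothetical) public API defines.
constexpr std::uint32_t FLAGS_API_ALL = FLAG_P2SH | FLAG_DERSIG | FLAG_WITNESS;

// Stand-in for an interpreter-internal flag that is NOT part of the API.
constexpr std::uint32_t FLAG_INTERNAL_CLEANSTACK = 1U << 8;

enum class VerifyError { OK, INVALID_FLAGS };

// True only if every set bit corresponds to a defined API flag.
constexpr bool flags_are_defined(std::uint32_t flags)
{
    return (flags & ~FLAGS_API_ALL) == 0;
}

// Entry-point sketch: report an error for undefined flags instead of
// letting them reach the interpreter, where they could hit an assertion.
inline VerifyError check_flags(std::uint32_t flags)
{
    return flags_are_defined(flags) ? VerifyError::OK
                                    : VerifyError::INVALID_FLAGS;
}
```

With this shape, a caller passing `FLAG_INTERNAL_CLEANSTACK | FLAG_P2SH` gets an error code back rather than a crash, while any combination of the defined flags is accepted.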
However, this currently fails the tests because our own tests rely on this undocumented behavior(!). I am not entirely sure what the best way to solve this would be. Should we skip tests that use unsupported combinations of flags? Update: done. This does mean that only 322 of 1327 script tests are applied to libconsensus, but those are the ones that test consensus behavior.
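The skipping of tests with unsupported flags could look roughly like the sketch below: only cases whose flags fall entirely within the API mask are selected for the libconsensus run. `ScriptTestCase`, `API_FLAGS_MASK` and `select_libconsensus_cases` are hypothetical names, and the mask value is an assumption for the example, not the actual test harness code.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical test-case record; the script data itself is omitted.
struct ScriptTestCase {
    int id;
    std::uint32_t flags;
};

// Assumed mask of flags exposed by the public API (illustrative values).
constexpr std::uint32_t API_FLAGS_MASK = (1U << 0) | (1U << 2) | (1U << 11);

// Keep only the cases that use no flag bits outside the API mask. The
// remaining cases would still run against the internal interpreter.
std::vector<ScriptTestCase> select_libconsensus_cases(
    const std::vector<ScriptTestCase>& all)
{
    std::vector<ScriptTestCase> selected;
    for (const auto& tc : all) {
        if ((tc.flags & ~API_FLAGS_MASK) == 0) {
            selected.push_back(tc);
        }
    }
    return selected;
}
```

Filtering on the flag mask, rather than maintaining a separate list of test cases, means newly added tests are picked up or skipped automatically based on the flags they declare.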