The actual behaviour would be better pinned down by a test that sets 16 bytes to 0xFF and the remaining bytes randomly, so that it detects a change to a behaviour that always sets the output to all 0xFF.
The documentation doesn’t promise one behaviour over the other, but preserving any entropy in the bad input is the safer choice (e.g. the data might get fed into an RNG hash somewhere else), and the test should catch it if that changes.
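To illustrate why the all-0xFF input is a weak probe, here is a minimal Python sketch rather than the project's actual test code. It assumes the scalar is checked against the secp256k1 group order, which this comment does not state. `preserve_input` and `saturate` are hypothetical stand-ins for "keeps the bad input's bytes" and "forces the output to all 0xFF"; neither is a real library function.

```python
import random

# Assumption: the overflow check is against the secp256k1 group order.
N = 0xFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFEBAAEDCE6AF48A03BBFD25E8CD0364141


def preserve_input(b32: bytes) -> bytes:
    """Hypothetical stand-in: on overflow, the output keeps the input bytes."""
    return bytes(b32)


def saturate(b32: bytes) -> bytes:
    """Hypothetical regression: on overflow, the output is forced to all 0xFF."""
    return b"\xff" * 32


def make_overflowing_input(rng: random.Random) -> bytes:
    """16 leading 0xFF bytes force overflow; the random tail carries entropy."""
    tail = bytes(rng.getrandbits(8) for _ in range(16))
    return b"\xff" * 16 + tail


rng = random.Random(1)  # fixed seed only so the sketch is reproducible
all_ff = b"\xff" * 32
mixed = make_overflowing_input(rng)

# Both inputs exceed the order, so both exercise the overflow path.
assert int.from_bytes(all_ff, "big") >= N
assert int.from_bytes(mixed, "big") >= N

# The all-0xFF input cannot tell the two behaviours apart...
assert preserve_input(all_ff) == saturate(all_ff)
# ...whereas the partly random input can.
assert preserve_input(mixed) != saturate(mixed)
```

Under that assumption, 16 leading 0xFF bytes are enough to guarantee overflow, because the order's sixteenth byte is 0xFE.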
A more extensive test could also check the boundary conditions (largest non-overflowing, smallest overflowing, maximum value). I don’t think that’s particularly important so long as scalar-set’s overflow behaviour is checked precisely elsewhere (I didn’t look).
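For concreteness, the boundary cases could be enumerated roughly as below. This is only a sketch: it assumes the secp256k1 group order and big-endian 32-byte inputs, and reads the boundary cases as the values on either side of the order plus the maximum. `overflows` is a hypothetical reference check, not the library's scalar-set routine.

```python
# Assumption: the overflow check is against the secp256k1 group order.
N = 0xFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFEBAAEDCE6AF48A03BBFD25E8CD0364141


def overflows(b32: bytes) -> bool:
    """Hypothetical reference check: does the big-endian value reach the order?"""
    if len(b32) != 32:
        raise ValueError("expected exactly 32 bytes")
    return int.from_bytes(b32, "big") >= N


# (description, value, expected overflow flag)
BOUNDARY_CASES = [
    ("largest non-overflowing", N - 1, False),
    ("smallest overflowing", N, True),
    ("maximum value", 2**256 - 1, True),
]

for name, value, expected in BOUNDARY_CASES:
    b32 = value.to_bytes(32, "big")
    assert overflows(b32) is expected, name
```

A real test would feed the same three byte strings to the function under test and compare its overflow flag against the expected column.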