I think you have it backwards, kind of: we don't care about ordering at all. We are keeping a "partition" of processed and unprocessed elements.
Essentially we have a queue containing processed and unprocessed elements, plus a pointer to the next element to process.
// start
[P1 P2 P3 P4 *U5 U6 U7 U8]

// process element
[P1 P2 P3 P4 P5 *U6 U7 U8]

// insert processed element using push_back and swap
[P1 P2 P3 P4 P5 *U6 U7 U8 P9]
[P1 P2 P3 P4 P5 *P9 U7 U8 U6]
[P1 P2 P3 P4 P5 P9 *U7 U8 U6]

// insert unprocessed element
[P1 P2 P3 P4 P5 P9 *U7 U8 U6 U10]
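The diagram above can be sketched roughly like this in C++ (a minimal sketch; the type and member names are my own, not from any particular codebase):

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// A vector partitioned into [processed | unprocessed], with `cursor`
// pointing at the next unprocessed element (the `*` in the diagram).
template <typename T>
struct PartitionQueue {
    std::vector<T> items;
    std::size_t cursor = 0;  // index of the next element to process

    bool done() const { return cursor == items.size(); }

    // Process the element under the cursor and advance the partition point.
    T& process_next() { return items[cursor++]; }

    // Add an already-processed element: push_back, then swap it with the
    // element at the cursor so it lands in the processed prefix. O(1)
    // amortized; does not preserve the order of the unprocessed suffix.
    void push_processed(T value) {
        items.push_back(std::move(value));
        std::swap(items[cursor], items.back());
        ++cursor;
    }

    // Add an unprocessed element: just push_back. O(1) amortized.
    void push_unprocessed(T value) { items.push_back(std::move(value)); }
};
```

Note that `push_processed` still works when everything is already processed: `cursor` then equals the old size, so the swap is a self-swap and the new element simply becomes the last processed one.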
If we tried to do the same with inserts, it would cause O(N^2) behavior overall. As you can see, the swap-back approach also does not preserve the order of the unprocessed elements.
With insert it would look like:
[P1 P2 P3 P4 P5 *U6 U7 U8]

// insert processed element using insert
[P1 P2 P3 P4 P5 *P9 U6 U7 U8]
[P1 P2 P3 P4 P5 P9 *U6 U7 U8]

And the shifting costs O(N) per insertion.
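For contrast, here is a sketch of what the insert-based version would look like (again with hypothetical names). It preserves the order of the unprocessed suffix, but every insertion shifts everything after the cursor:

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// Same [processed | unprocessed] layout, but order-preserving insertion.
template <typename T>
struct OrderedPartitionQueue {
    std::vector<T> items;
    std::size_t cursor = 0;  // index of the next element to process

    // Insert a processed element at the cursor. This shifts
    // items[cursor..end) right by one slot: O(N) per call, so processing
    // N elements this way is O(N^2) total.
    void push_processed(T value) {
        items.insert(items.begin() + static_cast<std::ptrdiff_t>(cursor),
                     std::move(value));
        ++cursor;
    }
};
```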
A deque is a good data structure, but it has a lot of extra overhead, which makes it a poor fit for a performance-critical code section. One of the key benefits of this approach is that the data structure stays quick to iterate over and append to.
The swap approach is O(1) (amortized) per element added, which is fine. If we used insert and shifting, it would be O(N) per element, which would cause a quadratic blowup.