Use shuffle() implementation from zlib_into
Description
This should speed up compression of output data by using the dedicated shuffle()
implementation from zlib_into instead of transposing a NumPy view.
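
For reviewers unfamiliar with the technique, here is a minimal sketch of what "transposing a NumPy view" means for a byte shuffle. The function names `shuffle_via_transpose` and `unshuffle` are hypothetical and are not taken from this code base. The exact signature of zlib_into's shuffle() is not shown here, so this only illustrates the approach being replaced.

```python
import numpy as np


def shuffle_via_transpose(data: np.ndarray) -> bytes:
    """Byte-shuffle an array by transposing a uint8 view of it.

    Hypothetical helper for illustration only: the output holds byte 0 of
    every element, then byte 1 of every element, and so on.
    """
    itemsize = data.dtype.itemsize
    # Flatten to a contiguous 1-D array so the byte view is well defined.
    flat = np.ascontiguousarray(data).reshape(-1)
    # One row per element, one column per byte within the element.
    as_bytes = flat.view(np.uint8).reshape(-1, itemsize)
    # Transposing groups bytes by position; tobytes() copies in C order.
    return as_bytes.T.tobytes()


def unshuffle(buf: bytes, dtype) -> np.ndarray:
    """Inverse of shuffle_via_transpose, returning a flat array."""
    dtype = np.dtype(dtype)
    # One row per byte position, one column per element.
    planes = np.frombuffer(buf, dtype=np.uint8).reshape(dtype.itemsize, -1)
    # Transpose back, make contiguous, and reinterpret as the original dtype.
    return np.ascontiguousarray(planes.T).reshape(-1).view(dtype)
```

The transpose forces a strided copy through generic NumPy machinery, which is presumably the overhead a dedicated shuffle routine avoids. A dedicated implementation should produce the same bytes, so a round trip through `unshuffle` is one way to check equivalence.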
How Has This Been Tested?
To be tested in CI.
Types of changes
Checklist:
- My code follows the code style of this project.