The superfluous last iteration when start = end wasn't harmful, since the
iteration body winds up as a no-op in that case anyway, but wasn't
intended or needed.
Calls to sha3-update that did not completely fill an already partially
filled buffer were handled incorrectly, in that the buffer-index wasn't
properly updated. Thanks to Orivej Desh for the bug report.
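
The buffering behaviour described above can be sketched roughly as follows.
This is an illustration in Python, not the library's Lisp code: SpongeBuffer,
RATE and absorbed are hypothetical names, the block size is shrunk to 8 bytes,
and the permutation is replaced by a list that simply records each full block.

```python
RATE = 8  # illustrative block size in bytes (assumption; real SHA-3 rates are larger)


class SpongeBuffer:
    """Toy stand-in for a hash state: records full blocks instead of permuting."""

    def __init__(self):
        self.buffer = bytearray(RATE)
        self.buffer_index = 0   # number of bytes currently held in buffer
        self.absorbed = []      # full blocks "absorbed" so far

    def update(self, data, start=0, end=None):
        end = len(data) if end is None else end
        # 1. Top up a partially filled buffer first.
        if self.buffer_index > 0:
            take = min(RATE - self.buffer_index, end - start)
            self.buffer[self.buffer_index:self.buffer_index + take] = \
                data[start:start + take]
            # The index must advance even if the buffer is still not full;
            # skipping this step is the kind of bug described above.
            self.buffer_index += take
            start += take
            if self.buffer_index < RATE:
                return
            self.absorbed.append(bytes(self.buffer))
            self.buffer_index = 0
        # 2. Absorb whole blocks directly from the input.
        while end - start >= RATE:
            self.absorbed.append(bytes(data[start:start + RATE]))
            start += RATE
        # 3. Stash the remaining tail (possibly empty) for the next call.
        tail = end - start
        self.buffer[:tail] = data[start:end]
        self.buffer_index = tail
```

Feeding b"abc", b"de" and b"fghij" in three calls should leave the same state
as a single call with b"abcdefghij". The second call is the interesting one,
since it adds to a partially filled buffer without filling it.
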
Quick and dirty benchmarks seem to imply that the 32-bit implementation is
faster than the 16-bit implementation on 64-bit LispWorks, even though it
causes more consing.