Delphi FastSnappy64

A fast compressor/decompressor

8 Aug. 2016, from the Google project page: Snappy is a compression/decompression library. It does not aim for maximum compression, or compatibility with any other compression library; instead, it aims for very high speeds and reasonable compression. For instance, compared to the fastest mode of zlib, Snappy is an order of magnitude faster for most inputs, but the resulting compressed files are anywhere from 20% to 100% bigger. On a single core of a Core i7 processor in 64-bit mode, Snappy compresses at about 250 MB/sec or more and decompresses at about 500 MB/sec or more.
Snappy is widely used inside Google, in everything from BigTable and MapReduce to our internal RPC systems. It is also useful for higher-level framing and encapsulation of data, e.g. for transporting compressed data across HTTP in a streaming fashion.

The C objects are compiled with LLVM 3.8.1 Clang for WIN32/WIN64, BCCIOSARM 7.20 for iOS, and BCCAARM 7.20 for Android.
A basic sample is provided, tested with DCC32/DCC64/DCCIOSARM/DCCAARM 31.0 and FPC 3.0.

Test: 50 KB HTTP JSON file held in a TMemoryStream
Intel Core i7 2.6 GHz, Windows 10 Pro

Compression ratio 6x

Snappy 64bit WIN64
compress in 237.33ms, ratio=85%, 1.6 GB/s
uncompress in 92.43ms, 4.3 GB/s

Snappy 32bit WIN32
compress in 269.96ms, ratio=85%, 1.4 GB/s
uncompress in 135.88ms, 2.9 GB/s

Zlib fastest mode 64bit WIN64
compress in 1.77s, ratio=89%, 231.7 MB/s
uncompress in 961.10ms, 427.6 MB/s

Zlib fastest mode 32bit WIN32
compress in 2.12s, ratio=89%, 193.6 MB/s
uncompress in 1.43s, 286.1 MB/s
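
The time, ratio and throughput figures above can be reproduced in spirit with a small harness. The following is only a sketch in Python (standard library only), not the Delphi test program used for these numbers. It assumes that "ratio=" means the percentage of space saved, and the payload generator and the `rounds` count are hypothetical, since the original iteration count is not stated. Only the zlib fastest mode (level 1) is measured, because Snappy has no binding in the Python standard library.

```python
import json
import time
import zlib


def make_json_payload(rows: int = 1200) -> bytes:
    """Build a repetitive JSON document of roughly 50 KB (hypothetical test data)."""
    records = [{"id": i, "name": f"user{i}", "active": i % 2 == 0} for i in range(rows)]
    return json.dumps(records).encode("utf-8")


def bench_zlib(data: bytes, level: int = 1, rounds: int = 200) -> dict:
    """Time repeated compress/decompress passes and derive ratio and throughput."""
    start = time.perf_counter()
    for _ in range(rounds):
        packed = zlib.compress(data, level)  # level 1 is zlib's fastest mode
    compress_s = time.perf_counter() - start

    start = time.perf_counter()
    for _ in range(rounds):
        unpacked = zlib.decompress(packed)
    decompress_s = time.perf_counter() - start

    if unpacked != data:
        raise ValueError("round trip failed")

    total_mb = len(data) * rounds / 1e6  # uncompressed megabytes processed
    return {
        "saved_pct": 100.0 * (1.0 - len(packed) / len(data)),
        "compress_mb_s": total_mb / compress_s,
        "decompress_mb_s": total_mb / decompress_s,
    }


if __name__ == "__main__":
    result = bench_zlib(make_json_payload())
    print("ratio={saved_pct:.0f}%, compress {compress_mb_s:.1f} MB/s, "
          "uncompress {decompress_mb_s:.1f} MB/s".format(**result))
```

Throughput is computed on the uncompressed size in both directions, which is the usual convention; the absolute numbers depend heavily on the machine and on the data.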

Using TParallel.For from System.Threading WIN64
Snappy compress in 54.94ms, ratio=85%, 7.3 GB/s
Snappy uncompress in 46.05ms, 8.7 GB/s
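
How the work was divided for TParallel.For is not described here. One common approach, shown below purely as an assumption, is to split the input into independent chunks and compress each chunk on its own worker. The sketch is in Python with zlib level 1 standing in for Snappy (the standard library has no Snappy binding); `split_chunks`, `compress_parallel`, `decompress_parallel`, the chunk size and the worker count are all hypothetical names and values.

```python
import zlib
from concurrent.futures import ThreadPoolExecutor
from typing import List


def split_chunks(data: bytes, chunk_size: int) -> List[bytes]:
    """Cut the input into independent, fixed-size pieces (the last may be shorter)."""
    return [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]


def compress_parallel(data: bytes, chunk_size: int = 256 * 1024,
                      workers: int = 4) -> List[bytes]:
    """Compress every chunk on a worker thread; map() keeps the input order."""
    chunks = split_chunks(data, chunk_size)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(lambda chunk: zlib.compress(chunk, 1), chunks))


def decompress_parallel(blocks: List[bytes], workers: int = 4) -> bytes:
    """Decompress the blocks in parallel and join them back in order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return b"".join(pool.map(zlib.decompress, blocks))


if __name__ == "__main__":
    original = b'{"k": "v"}' * 100_000
    blocks = compress_parallel(original, chunk_size=64 * 1024)
    assert decompress_parallel(blocks) == original
    print(len(blocks), "blocks,", sum(len(b) for b in blocks), "compressed bytes")
```

Compressing chunks independently usually costs a little compression ratio, because matches cannot cross chunk boundaries, in exchange for near-linear scaling across cores.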

Link to Delphi FastSnappy64 v.1.1.3 (stable)

Polly (dev): WIN64 static object built with Clang 4.0 and Polly, a high-level loop and data-locality polyhedral optimizer for LLVM (tiling, vectorization and parallelization).
(In my single-threaded tests the speed gain on large data is negligible; consider this a test build only, I am waiting for a final release.)

Hint: take a look at SnappyJS for the HTTP client side, e.g. to uncompress a JSON GET response. On the server side, add a class helper for TWebResponse (unit Web.HTTPApp) with a method that compresses the "Content" field, so as to obtain a complete web compression pipeline.

Feel free to test it and/or enhance it.
Please check internal comments, thank you.

----------

Delphi FastZlib64 and FastJpeg64

26 June 2017: download an LLVM 4.0 enhanced version of Zlib 1.2.11 for Delphi 64-bit, compiled with 0 errors and 0 warnings. Put the OBJ files into the "\lib\win64\release" folder (back up the old files first), then add the "\source\rtl\common" and "\source\vcl" folders to the library path. You will get a speed improvement in System.Zlib, e.g. when used in a WebBroker DLL, and also in the I/O of PNG files in the VCL. The compression quality is the same as with the default Zlib configuration.

From my tests, compressing a 650 MB PST file (text with attachments):
using TStream classes: 21200msec with Delphi original, 15300msec with LLVM4 optimized version
using direct memory: 17875msec with Delphi original, 13565msec with LLVM4 optimized version
using direct memory enhanced: 10687msec with LLVM4 + SSE 4.2 (from Cloudflare fork)
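
To make the gains easier to compare, the timings above can be turned into speedup factors and throughput. This is a small arithmetic sketch in Python that uses only the numbers listed above; the function names are illustrative.

```python
PST_SIZE_MB = 650  # size of the test file, as stated above


def speedup(baseline_ms: float, optimized_ms: float) -> float:
    """How many times faster the optimized run is than the baseline."""
    return baseline_ms / optimized_ms


def throughput_mb_s(size_mb: float, elapsed_ms: float) -> float:
    """Uncompressed megabytes processed per second."""
    return size_mb / (elapsed_ms / 1000.0)


if __name__ == "__main__":
    print("TStream, LLVM4 vs original:      %.2fx" % speedup(21200, 15300))
    print("direct memory, LLVM4 vs original: %.2fx" % speedup(17875, 13565))
    print("direct memory, SSE 4.2 vs original: %.2fx" % speedup(17875, 10687))
    print("SSE 4.2 throughput: %.1f MB/s" % throughput_mb_s(PST_SIZE_MB, 10687))
```

This works out to roughly 1.39x for the TStream path, 1.32x for direct memory, and 1.67x for the SSE 4.2 build against the original direct-memory run, with the SSE 4.2 build processing about 60.8 MB/s.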

Download ZlibLLVM4 for Delphi 64bit
Download ZlibLLVM4SSE42 for Delphi 64bit, extended for SSE 4.2 systems (check the readme for the patch)
Download JpegLLVM4 for Delphi 64bit (loading then saving a 3340 x 2504 px file in sequence runs at about 350 cycles per second on my i7, i.e. about 0.0028 sec per load-and-save cycle)

----------

Delphi RTL 64bit SIMD enhanced

20 July 2017: download an enhanced Delphi 64-bit RTL patch, highly scalable for multithreaded applications such as web and TCP servers.

From my multithreaded benchmarks on an i7-6700 with Windows 10 64-bit:
using BrainMM: not reliable under WIN64
using Google TCMalloc: not reliable under WIN64
using FastMM4 NoThreadContention: 22098 msec
using ScaleMM2: 22393 msec
using Windows 10 / Windows 2016 Heap: 5102 msec
using Intel TBB + Intel IPP: 3975 msec

Download Delphi 64bit RTL speedup

Write me at Roberto Della Pasqua