Discussion:
[PATCH 1/3] android,configure,meson: define HAVE_ZLIB
(too old to reply)
Grazvydas Ignotas
2017-12-29 01:56:36 UTC
Permalink
Raw Message
The next change wants to use some optional zlib functionality, however
not all platforms currently use zlib. Based on earlier Jordan Justen's
patches and their review feedback.

Signed-off-by: Grazvydas Ignotas <***@gmail.com>
---
Android.common.mk | 1 +
configure.ac | 1 +
meson.build | 1 +
3 files changed, 3 insertions(+)

diff --git a/Android.common.mk b/Android.common.mk
index d9f871c..52dc7bf 100644
--- a/Android.common.mk
+++ b/Android.common.mk
@@ -68,10 +68,11 @@ LOCAL_CFLAGS += \
-DHAVE___BUILTIN_UNREACHABLE \
-DHAVE_PTHREAD=1 \
-DHAVE_DLADDR \
-DHAVE_DL_ITERATE_PHDR \
-DHAVE_LINUX_FUTEX_H \
+ -DHAVE_ZLIB \
-DMAJOR_IN_SYSMACROS \
-fvisibility=hidden \
-Wno-sign-compare

LOCAL_CPPFLAGS += \
diff --git a/configure.ac b/configure.ac
index 79f275d..e236a3c 100644
--- a/configure.ac
+++ b/configure.ac
@@ -904,10 +904,11 @@ esac
dnl See if posix_memalign is available
AC_CHECK_FUNC([posix_memalign], [DEFINES="$DEFINES -DHAVE_POSIX_MEMALIGN"])

dnl Check for zlib
PKG_CHECK_MODULES([ZLIB], [zlib >= $ZLIB_REQUIRED])
+DEFINES="$DEFINES -DHAVE_ZLIB"

dnl Check for pthreads
AX_PTHREAD
if test "x$ax_pthread_ok" = xno; then
AC_MSG_ERROR([Building mesa on this platform requires pthreads])
diff --git a/meson.build b/meson.build
index d9f7ea9..9d9d074 100644
--- a/meson.build
+++ b/meson.build
@@ -922,10 +922,11 @@ if dep_libdrm.found()
endif
endif

# TODO: some of these may be conditional
dep_zlib = dependency('zlib', version : '>= 1.2.3')
+pre_args += '-DHAVE_ZLIB'
dep_thread = dependency('threads')
if dep_thread.found() and host_machine.system() != 'windows'
pre_args += '-DHAVE_PTHREAD'
endif
if with_amd_vk or with_gallium_radeonsi or with_gallium_r600 # TODO: clover
--
2.7.4
Grazvydas Ignotas
2017-12-29 01:56:38 UTC
Permalink
Raw Message
zlib provides a faster slice-by-4 CRC32 implementation than the
traditional single byte lookup one used by mesa. As most supported
platforms now link zlib unconditionally, we can easily use it.
For small buffers the old implementation is still used as it's faster
with cold cache (first call), as indicated by some throughput
benchmarking (avg MB/s, n=100, zlib 1.2.8):

i5-6600K C2D E4500
size mesa zlib mesa zlib
4 66 43 -35% +/- 4.8% 43 22 -49% +/- 9.6%
32 193 171 -11% +/- 5.8% 129 49 -61% +/- 7.2%
64 256 267 4% +/- 4.1% 171 63 -63% +/- 5.4%
128 317 389 22% +/- 5.8% 253 89 -64% +/- 4.2%
256 364 596 63% +/- 5.6% 304 166 -45% +/- 2.8%
512 401 838 108% +/- 5.3% 338 296 -12% +/- 3.1%
1024 420 1036 146% +/- 7.6% 375 461 23% +/- 3.7%
1M 443 1443 225% +/- 2.1% 403 1175 191% +/- 0.9%
100M 448 1452 224% +/- 0.3% 406 1214 198% +/- 0.3%

With hot cache (repeated calls) zlib almost always wins on both CPUS.
It has been verified the calculation results stay the same after this
change.

Signed-off-by: Grazvydas Ignotas <***@gmail.com>
---
src/util/crc32.c | 13 +++++++++++++
1 file changed, 13 insertions(+)

diff --git a/src/util/crc32.c b/src/util/crc32.c
index f2e01c6..0cffa49 100644
--- a/src/util/crc32.c
+++ b/src/util/crc32.c
@@ -31,12 +31,20 @@
*
* @author Jose Fonseca
*/


+#ifdef HAVE_ZLIB
+#include <zlib.h>
+#endif
#include "crc32.h"

+/* For small buffers it's faster to avoid the library call.
+ * The optimal threshold depends on CPU characteristics, it is hoped
+ * the choice below is reasonable for typical modern CPU.
+ */
+#define ZLIB_SIZE_THRESHOLD 64

static const uint32_t
util_crc32_table[256] = {
0x00000000, 0x77073096, 0xee0e612c, 0x990951ba,
0x076dc419, 0x706af48f, 0xe963a535, 0x9e6495a3,
@@ -112,10 +120,15 @@ uint32_t
util_hash_crc32(const void *data, size_t size)
{
const uint8_t *p = data;
uint32_t crc = 0xffffffff;

+#ifdef HAVE_ZLIB
+ if (size >= ZLIB_SIZE_THRESHOLD && (uInt)size == size)
+ return ~crc32(0, data, size);
+#endif
+
while (size--)
crc = util_crc32_table[(crc ^ *p++) & 0xff] ^ (crc >> 8);

return crc;
}
--
2.7.4
Ian Romanick
2018-01-02 21:38:48 UTC
Permalink
Raw Message
Post by Grazvydas Ignotas
zlib provides a faster slice-by-4 CRC32 implementation than the
traditional single byte lookup one used by mesa. As most supported
platforms now link zlib unconditionally, we can easily use it.
For small buffers the old implementation is still used as it's faster
with cold cache (first call), as indicated by some throughput
i5-6600K C2D E4500
size mesa zlib mesa zlib
4 66 43 -35% +/- 4.8% 43 22 -49% +/- 9.6%
32 193 171 -11% +/- 5.8% 129 49 -61% +/- 7.2%
64 256 267 4% +/- 4.1% 171 63 -63% +/- 5.4%
128 317 389 22% +/- 5.8% 253 89 -64% +/- 4.2%
256 364 596 63% +/- 5.6% 304 166 -45% +/- 2.8%
512 401 838 108% +/- 5.3% 338 296 -12% +/- 3.1%
1024 420 1036 146% +/- 7.6% 375 461 23% +/- 3.7%
1M 443 1443 225% +/- 2.1% 403 1175 191% +/- 0.9%
100M 448 1452 224% +/- 0.3% 406 1214 198% +/- 0.3%
With hot cache (repeated calls) zlib almost always wins on both CPUS.
It has been verified the calculation results stay the same after this
change.
---
src/util/crc32.c | 13 +++++++++++++
1 file changed, 13 insertions(+)
diff --git a/src/util/crc32.c b/src/util/crc32.c
index f2e01c6..0cffa49 100644
--- a/src/util/crc32.c
+++ b/src/util/crc32.c
@@ -31,12 +31,20 @@
*
*/
+#ifdef HAVE_ZLIB
+#include <zlib.h>
+#endif
#include "crc32.h"
+/* For small buffers it's faster to avoid the library call.
+ * The optimal threshold depends on CPU characteristics, it is hoped
+ * the choice below is reasonable for typical modern CPU.
+ */
+#define ZLIB_SIZE_THRESHOLD 64
For the actual users of this function in Mesa, is it even possible to
pass less than 64 bytes (I'm assuming that's the units)?
Post by Grazvydas Ignotas
static const uint32_t
util_crc32_table[256] = {
0x00000000, 0x77073096, 0xee0e612c, 0x990951ba,
0x076dc419, 0x706af48f, 0xe963a535, 0x9e6495a3,
@@ -112,10 +120,15 @@ uint32_t
util_hash_crc32(const void *data, size_t size)
{
const uint8_t *p = data;
uint32_t crc = 0xffffffff;
+#ifdef HAVE_ZLIB
+ if (size >= ZLIB_SIZE_THRESHOLD && (uInt)size == size)
^^^^^^^^^^^^^^^^^^
This comparison is always true (unless sizeof(uInt) != sizeof(size_t)?).
I'm not 100% sure what you're trying to accomplish here, but I think
you want 'size > 0'. I'm not familiar with this zlib function, so I
don't know what it's expectations for size are.
Post by Grazvydas Ignotas
+ return ~crc32(0, data, size);
+#endif
+
while (size--)
crc = util_crc32_table[(crc ^ *p++) & 0xff] ^ (crc >> 8);
return crc;
}
Grazvydas Ignotas
2018-01-03 00:52:28 UTC
Permalink
Raw Message
Post by Ian Romanick
Post by Grazvydas Ignotas
zlib provides a faster slice-by-4 CRC32 implementation than the
traditional single byte lookup one used by mesa. As most supported
platforms now link zlib unconditionally, we can easily use it.
For small buffers the old implementation is still used as it's faster
with cold cache (first call), as indicated by some throughput
i5-6600K C2D E4500
size mesa zlib mesa zlib
4 66 43 -35% +/- 4.8% 43 22 -49% +/- 9.6%
32 193 171 -11% +/- 5.8% 129 49 -61% +/- 7.2%
64 256 267 4% +/- 4.1% 171 63 -63% +/- 5.4%
128 317 389 22% +/- 5.8% 253 89 -64% +/- 4.2%
256 364 596 63% +/- 5.6% 304 166 -45% +/- 2.8%
512 401 838 108% +/- 5.3% 338 296 -12% +/- 3.1%
1024 420 1036 146% +/- 7.6% 375 461 23% +/- 3.7%
1M 443 1443 225% +/- 2.1% 403 1175 191% +/- 0.9%
100M 448 1452 224% +/- 0.3% 406 1214 198% +/- 0.3%
With hot cache (repeated calls) zlib almost always wins on both CPUS.
It has been verified the calculation results stay the same after this
change.
---
src/util/crc32.c | 13 +++++++++++++
1 file changed, 13 insertions(+)
diff --git a/src/util/crc32.c b/src/util/crc32.c
index f2e01c6..0cffa49 100644
--- a/src/util/crc32.c
+++ b/src/util/crc32.c
@@ -31,12 +31,20 @@
*
*/
+#ifdef HAVE_ZLIB
+#include <zlib.h>
+#endif
#include "crc32.h"
+/* For small buffers it's faster to avoid the library call.
+ * The optimal threshold depends on CPU characteristics, it is hoped
+ * the choice below is reasonable for typical modern CPU.
+ */
+#define ZLIB_SIZE_THRESHOLD 64
For the actual users of this function in Mesa, is it even possible to
pass less than 64 bytes (I'm assuming that's the units)?
Hmm why wouldn't it be? The unit is a byte, and you can compute CRC32
of a single byte.
Post by Ian Romanick
Post by Grazvydas Ignotas
static const uint32_t
util_crc32_table[256] = {
0x00000000, 0x77073096, 0xee0e612c, 0x990951ba,
0x076dc419, 0x706af48f, 0xe963a535, 0x9e6495a3,
@@ -112,10 +120,15 @@ uint32_t
util_hash_crc32(const void *data, size_t size)
{
const uint8_t *p = data;
uint32_t crc = 0xffffffff;
+#ifdef HAVE_ZLIB
+ if (size >= ZLIB_SIZE_THRESHOLD && (uInt)size == size)
^^^^^^^^^^^^^^^^^^
This comparison is always true (unless sizeof(uInt) != sizeof(size_t)?).
I'm not 100% sure what you're trying to accomplish here, but I think
you want 'size > 0'. I'm not familiar with this zlib function, so I
don't know what it's expectations for size are.
zlib's uInt is always 32bit while size_t is 64bit on most (all?) 64bit
architectures, so if someone decides to CRC32 >= 4GiB buffer, this
function will do the wrong thing without such check. Newer zlib has
crc32_z that takes size_t, but I was trying to avoid build system
complications of detecting that function...

Gražvydas
Ian Romanick
2018-01-03 01:09:20 UTC
Permalink
Raw Message
Post by Grazvydas Ignotas
Post by Ian Romanick
Post by Grazvydas Ignotas
zlib provides a faster slice-by-4 CRC32 implementation than the
traditional single byte lookup one used by mesa. As most supported
platforms now link zlib unconditionally, we can easily use it.
For small buffers the old implementation is still used as it's faster
with cold cache (first call), as indicated by some throughput
i5-6600K C2D E4500
size mesa zlib mesa zlib
4 66 43 -35% +/- 4.8% 43 22 -49% +/- 9.6%
32 193 171 -11% +/- 5.8% 129 49 -61% +/- 7.2%
64 256 267 4% +/- 4.1% 171 63 -63% +/- 5.4%
128 317 389 22% +/- 5.8% 253 89 -64% +/- 4.2%
256 364 596 63% +/- 5.6% 304 166 -45% +/- 2.8%
512 401 838 108% +/- 5.3% 338 296 -12% +/- 3.1%
1024 420 1036 146% +/- 7.6% 375 461 23% +/- 3.7%
1M 443 1443 225% +/- 2.1% 403 1175 191% +/- 0.9%
100M 448 1452 224% +/- 0.3% 406 1214 198% +/- 0.3%
With hot cache (repeated calls) zlib almost always wins on both CPUS.
It has been verified the calculation results stay the same after this
change.
---
src/util/crc32.c | 13 +++++++++++++
1 file changed, 13 insertions(+)
diff --git a/src/util/crc32.c b/src/util/crc32.c
index f2e01c6..0cffa49 100644
--- a/src/util/crc32.c
+++ b/src/util/crc32.c
@@ -31,12 +31,20 @@
*
*/
+#ifdef HAVE_ZLIB
+#include <zlib.h>
+#endif
#include "crc32.h"
+/* For small buffers it's faster to avoid the library call.
+ * The optimal threshold depends on CPU characteristics, it is hoped
+ * the choice below is reasonable for typical modern CPU.
+ */
+#define ZLIB_SIZE_THRESHOLD 64
For the actual users of this function in Mesa, is it even possible to
pass less than 64 bytes (I'm assuming that's the units)?
Hmm why wouldn't it be? The unit is a byte, and you can compute CRC32
of a single byte.
It can be done, but that's not my question. My question is: would any
of the existing users actually do this? I thought the main user was the
various disk cache / GetProgramBinary kind of things. You won't have a
shader binary that's a single byte... less than 64 also seems unlikely.
Unless there are other users?
Post by Grazvydas Ignotas
Post by Ian Romanick
Post by Grazvydas Ignotas
static const uint32_t
util_crc32_table[256] = {
0x00000000, 0x77073096, 0xee0e612c, 0x990951ba,
0x076dc419, 0x706af48f, 0xe963a535, 0x9e6495a3,
@@ -112,10 +120,15 @@ uint32_t
util_hash_crc32(const void *data, size_t size)
{
const uint8_t *p = data;
uint32_t crc = 0xffffffff;
+#ifdef HAVE_ZLIB
+ if (size >= ZLIB_SIZE_THRESHOLD && (uInt)size == size)
^^^^^^^^^^^^^^^^^^
This comparison is always true (unless sizeof(uInt) != sizeof(size_t)?).
I'm not 100% sure what you're trying to accomplish here, but I think
you want 'size > 0'. I'm not familiar with this zlib function, so I
don't know what it's expectations for size are.
zlib's uInt is always 32bit while size_t is 64bit on most (all?) 64bit
architectures, so if someone decides to CRC32 >= 4GiB buffer, this
function will do the wrong thing without such check. Newer zlib has
crc32_z that takes size_t, but I was trying to avoid build system
complications of detecting that function...
Ok. That makes sense. I'll bet this adds a warning about tautological
compares only on 32-bit, then. I was going to suggest comparing with
0xffffffff instead, but GCC emits worse code for that... and it probably
still results in the warning on 32-bit. Maybe just a comment (based on
your reply) that describes what's happening for the next person that
reads the code.
Post by Grazvydas Ignotas
Gražvydas
Grazvydas Ignotas
2018-01-03 01:30:29 UTC
Permalink
Raw Message
Post by Ian Romanick
Post by Grazvydas Ignotas
Post by Ian Romanick
Post by Grazvydas Ignotas
zlib provides a faster slice-by-4 CRC32 implementation than the
traditional single byte lookup one used by mesa. As most supported
platforms now link zlib unconditionally, we can easily use it.
For small buffers the old implementation is still used as it's faster
with cold cache (first call), as indicated by some throughput
i5-6600K C2D E4500
size mesa zlib mesa zlib
4 66 43 -35% +/- 4.8% 43 22 -49% +/- 9.6%
32 193 171 -11% +/- 5.8% 129 49 -61% +/- 7.2%
64 256 267 4% +/- 4.1% 171 63 -63% +/- 5.4%
128 317 389 22% +/- 5.8% 253 89 -64% +/- 4.2%
256 364 596 63% +/- 5.6% 304 166 -45% +/- 2.8%
512 401 838 108% +/- 5.3% 338 296 -12% +/- 3.1%
1024 420 1036 146% +/- 7.6% 375 461 23% +/- 3.7%
1M 443 1443 225% +/- 2.1% 403 1175 191% +/- 0.9%
100M 448 1452 224% +/- 0.3% 406 1214 198% +/- 0.3%
With hot cache (repeated calls) zlib almost always wins on both CPUS.
It has been verified the calculation results stay the same after this
change.
---
src/util/crc32.c | 13 +++++++++++++
1 file changed, 13 insertions(+)
diff --git a/src/util/crc32.c b/src/util/crc32.c
index f2e01c6..0cffa49 100644
--- a/src/util/crc32.c
+++ b/src/util/crc32.c
@@ -31,12 +31,20 @@
*
*/
+#ifdef HAVE_ZLIB
+#include <zlib.h>
+#endif
#include "crc32.h"
+/* For small buffers it's faster to avoid the library call.
+ * The optimal threshold depends on CPU characteristics, it is hoped
+ * the choice below is reasonable for typical modern CPU.
+ */
+#define ZLIB_SIZE_THRESHOLD 64
For the actual users of this function in Mesa, is it even possible to
pass less than 64 bytes (I'm assuming that's the units)?
Hmm why wouldn't it be? The unit is a byte, and you can compute CRC32
of a single byte.
It can be done, but that's not my question. My question is: would any
of the existing users actually do this? I thought the main user was the
various disk cache / GetProgramBinary kind of things. You won't have a
shader binary that's a single byte... less than 64 also seems unlikely.
Unless there are other users?
You're most likely right, I'll just drop this.
Post by Ian Romanick
Post by Grazvydas Ignotas
Post by Ian Romanick
Post by Grazvydas Ignotas
static const uint32_t
util_crc32_table[256] = {
0x00000000, 0x77073096, 0xee0e612c, 0x990951ba,
0x076dc419, 0x706af48f, 0xe963a535, 0x9e6495a3,
@@ -112,10 +120,15 @@ uint32_t
util_hash_crc32(const void *data, size_t size)
{
const uint8_t *p = data;
uint32_t crc = 0xffffffff;
+#ifdef HAVE_ZLIB
+ if (size >= ZLIB_SIZE_THRESHOLD && (uInt)size == size)
^^^^^^^^^^^^^^^^^^
This comparison is always true (unless sizeof(uInt) != sizeof(size_t)?).
I'm not 100% sure what you're trying to accomplish here, but I think
you want 'size > 0'. I'm not familiar with this zlib function, so I
don't know what it's expectations for size are.
zlib's uInt is always 32bit while size_t is 64bit on most (all?) 64bit
architectures, so if someone decides to CRC32 >= 4GiB buffer, this
function will do the wrong thing without such check. Newer zlib has
crc32_z that takes size_t, but I was trying to avoid build system
complications of detecting that function...
Ok. That makes sense. I'll bet this adds a warning about tautological
compares only on 32-bit, then.
I don't seem to be getting a warning on gcc 7.2.0 or 5.4.0 as well as
clang 3.8.0 at least.
Post by Ian Romanick
I was going to suggest comparing with
0xffffffff instead, but GCC emits worse code for that... and it probably
still results in the warning on 32-bit. Maybe just a comment (based on
your reply) that describes what's happening for the next person that
reads the code.
ok

Grazvydas Ignotas
2017-12-29 01:56:37 UTC
Permalink
Raw Message
Signed-off-by: Grazvydas Ignotas <***@gmail.com>
---
src/util/crc32.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/util/crc32.c b/src/util/crc32.c
index 44d637c..f2e01c6 100644
--- a/src/util/crc32.c
+++ b/src/util/crc32.c
@@ -109,11 +109,11 @@ util_crc32_table[256] = {
* @sa http://www.w3.org/TR/PNG/#D-CRCAppendix
*/
uint32_t
util_hash_crc32(const void *data, size_t size)
{
- uint8_t *p = (uint8_t *)data;
+ const uint8_t *p = data;
uint32_t crc = 0xffffffff;

while (size--)
crc = util_crc32_table[(crc ^ *p++) & 0xff] ^ (crc >> 8);
--
2.7.4
Loading...