Discussion:
[PATCH 0/5] i965: ASTC5x5 workaround
Add Reply
k***@intel.com
2017-12-01 17:19:17 UTC
Reply
Permalink
Raw Message
From: Kevin Rogovin <***@intel.com>

This patch series implements a needed workaround for Gen9 for ASTC5x5
sampler reads. The crux of the work around is to make sure that the
sampler does not read an ASTC5x5 texture and a surface with an auxilary
buffer without having a texture cache invalidate between such accesses.


Kevin Rogovin (5):
i965: define astc5x5 workaround infrastructure
i965: ASTC5x5 workaround logic for blorp
i965: set ASTC5x5 workaround texture type tracking on texture validate
i965: use ASTC5x5 workaround in brw_draw
i965: use ASTC5x5 workaround in brw_compute

src/mesa/drivers/dri/i965/brw_compute.c | 6 +++
src/mesa/drivers/dri/i965/brw_context.c | 63 ++++++++++++++++++++++++
src/mesa/drivers/dri/i965/brw_context.h | 23 +++++++++
src/mesa/drivers/dri/i965/brw_draw.c | 6 +++
src/mesa/drivers/dri/i965/brw_wm_surface_state.c | 5 ++
src/mesa/drivers/dri/i965/genX_blorp_exec.c | 5 ++
src/mesa/drivers/dri/i965/intel_batchbuffer.c | 1 +
src/mesa/drivers/dri/i965/intel_tex_image.c | 16 ++++--
src/mesa/drivers/dri/i965/intel_tex_validate.c | 13 +++++
9 files changed, 134 insertions(+), 4 deletions(-)
--
2.14.2
k***@intel.com
2017-12-01 17:19:19 UTC
Reply
Permalink
Raw Message
From: Kevin Rogovin <***@intel.com>

Blorp will only read from an ASTC5x5 texture if it copies from
such a surface, that can only if an application is fetching
such pixels. Because an ASTC5x3 texture can never be a render
target, we do not need to worry about blorp reading such surfaces
on framebuffer blits, or any other copying from a framebuffer.

Signed-off-by: Kevin Rogovin <***@intel.com>
---
src/mesa/drivers/dri/i965/genX_blorp_exec.c | 5 +++++
src/mesa/drivers/dri/i965/intel_tex_image.c | 16 ++++++++++++----
2 files changed, 17 insertions(+), 4 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/genX_blorp_exec.c b/src/mesa/drivers/dri/i965/genX_blorp_exec.c
index 87e90fde91..73f72d2603 100644
--- a/src/mesa/drivers/dri/i965/genX_blorp_exec.c
+++ b/src/mesa/drivers/dri/i965/genX_blorp_exec.c
@@ -230,6 +230,11 @@ genX(blorp_exec)(struct blorp_batch *batch,
struct gl_context *ctx = &brw->ctx;
bool check_aperture_failed_once = false;

+ if (brw->astc5x5_wa.blorp_sampling_from_astc5x5) {
+ brw_set_astc5x5_wa_mode(brw, BRW_ASTC5x5_WA_MODE_HAS_ASTC5x5);
+ } else {
+ brw_set_astc5x5_wa_mode(brw, BRW_ASTC5x5_WA_MODE_HAS_AUX);
+ }
/* Flush the sampler and render caches. We definitely need to flush the
* sampler cache so that we get updated contents from the render cache for
* the glBlitFramebuffer() source. Also, we are sometimes warned in the
diff --git a/src/mesa/drivers/dri/i965/intel_tex_image.c b/src/mesa/drivers/dri/i965/intel_tex_image.c
index 37c8e24f03..60028bb67a 100644
--- a/src/mesa/drivers/dri/i965/intel_tex_image.c
+++ b/src/mesa/drivers/dri/i965/intel_tex_image.c
@@ -759,10 +759,18 @@ intel_get_tex_sub_image(struct gl_context *ctx,
DBG("%s\n", __func__);

if (_mesa_is_bufferobj(ctx->Pack.BufferObj)) {
- if (intel_gettexsubimage_blorp(brw, texImage,
- xoffset, yoffset, zoffset,
- width, height, depth, format, type,
- pixels, &ctx->Pack))
+ bool blorp_success;
+
+ brw->astc5x5_wa.blorp_sampling_from_astc5x5 =
+ (texImage->TexFormat == MESA_FORMAT_RGBA_ASTC_5x5 ||
+ texImage->TexFormat == MESA_FORMAT_SRGB8_ALPHA8_ASTC_5x5);
+ blorp_success = intel_gettexsubimage_blorp(brw, texImage,
+ xoffset, yoffset, zoffset,
+ width, height, depth,
+ format, type, pixels,
+ &ctx->Pack);
+ brw->astc5x5_wa.blorp_sampling_from_astc5x5 = false;
+ if (blorp_success)
return;

perf_debug("%s: fallback to CPU mapping in PBO case\n", __func__);
--
2.14.2
k***@intel.com
2017-12-01 17:19:18 UTC
Reply
Permalink
Raw Message
From: Kevin Rogovin <***@intel.com>

Some GEN's have a bug in the sample where if the sampler accesses
a texture with an auxialry surface and an ASTC5x5 texture without
having the texture cache invalidated between such accesses, then
the GPU will hang. This patch defines the infrastructure to
implement the needed workaround for such hardware.

Signed-off-by: Kevin Rogovin <***@intel.com>
---
src/mesa/drivers/dri/i965/brw_context.c | 63 +++++++++++++++++++++++++++
src/mesa/drivers/dri/i965/brw_context.h | 23 ++++++++++
src/mesa/drivers/dri/i965/intel_batchbuffer.c | 1 +
3 files changed, 87 insertions(+)

diff --git a/src/mesa/drivers/dri/i965/brw_context.c b/src/mesa/drivers/dri/i965/brw_context.c
index dd55b43669..f2e9b9779a 100644
--- a/src/mesa/drivers/dri/i965/brw_context.c
+++ b/src/mesa/drivers/dri/i965/brw_context.c
@@ -1060,6 +1060,12 @@ brwCreateContext(gl_api api,
if (ctx->Extensions.INTEL_performance_query)
brw_init_performance_queries(brw);

+ brw->astc5x5_wa.required = (devinfo->gen == 9);
+ brw->astc5x5_wa.mode = BRW_ASTC5x5_WA_MODE_NONE;
+ brw->astc5x5_wa.texture_astc5x5_present = false;
+ brw->astc5x5_wa.texture_with_auxilary_present = false;
+ brw->astc5x5_wa.blorp_sampling_from_astc5x5 = false;
+
vbo_use_buffer_objects(ctx);
vbo_always_unmap_buffers(ctx);

@@ -1134,6 +1140,63 @@ intelDestroyContext(__DRIcontext * driContextPriv)
driContextPriv->driverPrivate = NULL;
}

+void
+brw_set_astc5x5_wa_mode(struct brw_context *brw,
+ enum brw_astc5x5_wa_mode_t mode)
+{
+ if (!brw->astc5x5_wa.required ||
+ mode == BRW_ASTC5x5_WA_MODE_NONE ||
+ brw->astc5x5_wa.mode == mode) {
+ return;
+ }
+
+ if (brw->astc5x5_wa.mode != BRW_ASTC5x5_WA_MODE_NONE) {
+ brw_emit_pipe_control_flush(brw, PIPE_CONTROL_TEXTURE_CACHE_INVALIDATE);
+ }
+
+ brw->astc5x5_wa.mode = mode;
+}
+
+static void
+resolve_to_disable_aux_on_samplers(struct brw_context *brw)
+{
+ struct gl_context *ctx = &brw->ctx;
+ const int max_enabled_unit = ctx->Texture._MaxEnabledTexImageUnit;
+
+ for (int unit = 0; unit <= max_enabled_unit; unit++) {
+ struct gl_texture_unit *tex_unit = &ctx->Texture.Unit[unit];
+ struct gl_texture_object *tex_obj = tex_unit->_Current;
+ if (tex_obj) {
+ struct intel_mipmap_tree *mt = intel_texture_object(tex_obj)->mt;
+ if (mt && mt->aux_usage != ISL_AUX_USAGE_NONE) {
+ intel_miptree_prepare_access(brw, mt,
+ 0, INTEL_REMAINING_LEVELS,
+ 0, INTEL_REMAINING_LAYERS,
+ ISL_AUX_USAGE_NONE, false);
+ }
+ }
+ }
+}
+
+void
+brw_astc5x5_perform_wa(struct brw_context *brw)
+{
+ if (!brw->astc5x5_wa.required) {
+ return;
+ }
+
+ if (brw->astc5x5_wa.texture_astc5x5_present) {
+ if (brw->astc5x5_wa.texture_with_auxilary_present) {
+ /* resolve so that auxilary buffers are not needed
+ * by any sampler */
+ resolve_to_disable_aux_on_samplers(brw);
+ }
+ brw_set_astc5x5_wa_mode(brw, BRW_ASTC5x5_WA_MODE_HAS_ASTC5x5);
+ } else if (brw->astc5x5_wa.texture_with_auxilary_present) {
+ brw_set_astc5x5_wa_mode(brw, BRW_ASTC5x5_WA_MODE_HAS_AUX);
+ }
+}
+
GLboolean
intelUnbindContext(__DRIcontext * driContextPriv)
{
diff --git a/src/mesa/drivers/dri/i965/brw_context.h b/src/mesa/drivers/dri/i965/brw_context.h
index b3d7c6baf8..37dfe45592 100644
--- a/src/mesa/drivers/dri/i965/brw_context.h
+++ b/src/mesa/drivers/dri/i965/brw_context.h
@@ -165,6 +165,12 @@ enum brw_cache_id {
BRW_MAX_CACHE
};

+enum brw_astc5x5_wa_mode_t {
+ BRW_ASTC5x5_WA_MODE_NONE,
+ BRW_ASTC5x5_WA_MODE_HAS_ASTC5x5,
+ BRW_ASTC5x5_WA_MODE_HAS_AUX,
+};
+
enum brw_state_id {
/* brw_cache_ids must come first - see brw_program_cache.c */
BRW_STATE_URB_FENCE = BRW_MAX_CACHE,
@@ -1230,6 +1236,18 @@ struct brw_context
*/
bool draw_aux_buffer_disabled[MAX_DRAW_BUFFERS];

+ /* Certain GEN's have a hardware bug where the sampler hangs if it attempts
+ * to access auxilary buffers and an ASTC5x5 compressed buffer. The workaround
+ * is to invalidate the texture cache between such access.
+ */
+ struct {
+ bool required;
+ enum brw_astc5x5_wa_mode_t mode;
+ bool texture_astc5x5_present;
+ bool texture_with_auxilary_present;
+ bool blorp_sampling_from_astc5x5;
+ } astc5x5_wa;
+
__DRIcontext *driContext;
struct intel_screen *screen;
};
@@ -1663,6 +1681,11 @@ void brw_query_internal_format(struct gl_context *ctx, GLenum target,
GLenum internalFormat, GLenum pname,
GLint *params);

+/* brw_context::astc5x5_wa */
+void brw_set_astc5x5_wa_mode(struct brw_context *brw,
+ enum brw_astc5x5_wa_mode_t mode);
+void brw_astc5x5_perform_wa(struct brw_context *brw);
+
#ifdef __cplusplus
}
#endif
diff --git a/src/mesa/drivers/dri/i965/intel_batchbuffer.c b/src/mesa/drivers/dri/i965/intel_batchbuffer.c
index 216073129b..042ed1fe62 100644
--- a/src/mesa/drivers/dri/i965/intel_batchbuffer.c
+++ b/src/mesa/drivers/dri/i965/intel_batchbuffer.c
@@ -618,6 +618,7 @@ brw_new_batch(struct brw_context *brw)

/* Create a new batchbuffer and reset the associated state: */
intel_batchbuffer_reset_and_clear_render_cache(brw);
+ brw->astc5x5_wa.mode = BRW_ASTC5x5_WA_MODE_NONE;

/* If the kernel supports hardware contexts, then most hardware state is
* preserved between batches; we only need to re-emit state that is required
--
2.14.2
k***@intel.com
2017-12-01 17:19:20 UTC
Reply
Permalink
Raw Message
From: Kevin Rogovin <***@intel.com>

One of the presteps in each draw (and compute) call is to validate
the textures. This is the perfect place (since all texture units
are looped through) to see if ASTC5x5 and/or textures with an
auxilary surface are accessed by the GPU.

Signed-off-by: Kevin Rogovin <***@intel.com>
---
src/mesa/drivers/dri/i965/intel_tex_validate.c | 13 +++++++++++++
1 file changed, 13 insertions(+)

diff --git a/src/mesa/drivers/dri/i965/intel_tex_validate.c b/src/mesa/drivers/dri/i965/intel_tex_validate.c
index 2b7798c940..812c0c7793 100644
--- a/src/mesa/drivers/dri/i965/intel_tex_validate.c
+++ b/src/mesa/drivers/dri/i965/intel_tex_validate.c
@@ -188,11 +188,24 @@ brw_validate_textures(struct brw_context *brw)
struct gl_context *ctx = &brw->ctx;
const int max_enabled_unit = ctx->Texture._MaxEnabledTexImageUnit;

+ brw->astc5x5_wa.texture_astc5x5_present = false;
+ brw->astc5x5_wa.texture_with_auxilary_present = false;
for (int unit = 0; unit <= max_enabled_unit; unit++) {
struct gl_texture_unit *tex_unit = &ctx->Texture.Unit[unit];

if (tex_unit->_Current) {
+ struct intel_texture_object *tex =
+ intel_texture_object(tex_unit->_Current);
+ struct intel_mipmap_tree *mt = tex->mt;
+
intel_finalize_mipmap_tree(brw, unit);
+ if (mt && mt->aux_usage != ISL_AUX_USAGE_NONE) {
+ brw->astc5x5_wa.texture_with_auxilary_present = true;
+ }
+ if (tex->_Format == MESA_FORMAT_RGBA_ASTC_5x5 ||
+ tex->_Format == MESA_FORMAT_SRGB8_ALPHA8_ASTC_5x5) {
+ brw->astc5x5_wa.texture_astc5x5_present = true;
+ }
}
}
}
--
2.14.2
k***@intel.com
2017-12-01 17:19:21 UTC
Reply
Permalink
Raw Message
From: Kevin Rogovin <***@intel.com>

Perform the ASTC5x5 workaround tasks for drawing; note that
the function does not do anything and immediately returns
if the bug is not present on the hardware.

Signed-off-by: Kevin Rogovin <***@intel.com>
---
src/mesa/drivers/dri/i965/brw_draw.c | 6 ++++++
src/mesa/drivers/dri/i965/brw_wm_surface_state.c | 5 +++++
2 files changed, 11 insertions(+)

diff --git a/src/mesa/drivers/dri/i965/brw_draw.c b/src/mesa/drivers/dri/i965/brw_draw.c
index 7e29dcfd4e..f335c2bd64 100644
--- a/src/mesa/drivers/dri/i965/brw_draw.c
+++ b/src/mesa/drivers/dri/i965/brw_draw.c
@@ -684,6 +684,12 @@ brw_prepare_drawing(struct gl_context *ctx,
brw_predraw_resolve_inputs(brw, true);
brw_predraw_resolve_framebuffer(brw);

+ /* if necessary, perform astc5x5 workarounds to make sure any sampler does
+ * not sample sample from a surface using an auxilary buffer within the
+ * same batch of sampling from a surface with an ASTC5x5 format
+ */
+ brw_astc5x5_perform_wa(brw);
+
/* Bind all inputs, derive varying and size information:
*/
brw_merge_inputs(brw, arrays);
diff --git a/src/mesa/drivers/dri/i965/brw_wm_surface_state.c b/src/mesa/drivers/dri/i965/brw_wm_surface_state.c
index adf60a840b..ccdb537227 100644
--- a/src/mesa/drivers/dri/i965/brw_wm_surface_state.c
+++ b/src/mesa/drivers/dri/i965/brw_wm_surface_state.c
@@ -447,6 +447,11 @@ brw_aux_surface_disabled(const struct brw_context *brw,
{
const struct gl_framebuffer *fb = brw->ctx.DrawBuffer;

+ if (brw->astc5x5_wa.required &&
+ brw->astc5x5_wa.texture_astc5x5_present) {
+ return true;
+ }
+
for (unsigned i = 0; i < fb->_NumColorDrawBuffers; i++) {
const struct intel_renderbuffer *irb =
intel_renderbuffer(fb->_ColorDrawBuffers[i]);
--
2.14.2
k***@intel.com
2017-12-01 17:19:22 UTC
Reply
Permalink
Raw Message
From: Kevin Rogovin <***@intel.com>

Perform the ASTC5x5 workaround tasks for compute; note that
the function does not do anything and immediately returns
if the bug is not present on the hardware.

Signed-off-by: Kevin Rogovin <***@intel.com>
---
src/mesa/drivers/dri/i965/brw_compute.c | 6 ++++++
1 file changed, 6 insertions(+)

diff --git a/src/mesa/drivers/dri/i965/brw_compute.c b/src/mesa/drivers/dri/i965/brw_compute.c
index 9be7523bab..e338539dc1 100644
--- a/src/mesa/drivers/dri/i965/brw_compute.c
+++ b/src/mesa/drivers/dri/i965/brw_compute.c
@@ -179,6 +179,12 @@ brw_dispatch_compute_common(struct gl_context *ctx)

brw_predraw_resolve_inputs(brw, false);

+ /* if necessary, perform astc5x5 workarounds to make sure any sampler does
+ * not sample sample from a surface using an auxilary buffer within the
+ * same batch of sampling from a surface with an ASTC5x5 format
+ */
+ brw_astc5x5_perform_wa(brw);
+
/* Flush the batch if the batch/state buffers are nearly full. We can
* grow them if needed, but this is not free, so we'd like to avoid it.
*/
--
2.14.2
Ilia Mirkin
2017-12-01 17:37:54 UTC
Reply
Permalink
Raw Message
Post by k***@intel.com
This patch series implements a needed workaround for Gen9 for ASTC5x5
sampler reads. The crux of the work around is to make sure that the
sampler does not read an ASTC5x5 texture and a surface with an auxilary
buffer without having a texture cache invalidate between such accesses.
Presumably anv needs something like this too? What happens if you have
a single draw which samples from both an ASTC5x5 texture and one with
an aux buffer? [Just curious.]
Rogovin, Kevin
2017-12-01 18:06:08 UTC
Reply
Permalink
Raw Message
Hi,

Yes ANV will need something like this as well. If the GPU samples from both an ASTC5x5 texture and one with an aux buffer without a texture cache invalidate between such accesses, then the GPU hangs, which in turn makes the system unresponsive for a few seconds until the kernel resets the GPU; then an ioctl will fail in i965 which means things are very bad usually and (for me atleast on my system with how I build mesa) the application crashes.

-Kevin

-----Original Message-----
From: ***@gmail.com [mailto:***@gmail.com] On Behalf Of Ilia Mirkin
Sent: Friday, December 1, 2017 7:38 PM
To: Rogovin, Kevin <***@intel.com>
Cc: mesa-***@lists.freedesktop.org
Subject: Re: [Mesa-dev] [PATCH 0/5] i965: ASTC5x5 workaround
Post by k***@intel.com
This patch series implements a needed workaround for Gen9 for ASTC5x5
sampler reads. The crux of the work around is to make sure that the
sampler does not read an ASTC5x5 texture and a surface with an
auxilary buffer without having a texture cache invalidate between such accesses.
Presumably anv needs something like this too? What happens if you have a single draw which samples from both an ASTC5x5 texture and one with an aux buffer? [Just curious.]
Matt Turner
2017-12-01 18:24:44 UTC
Reply
Permalink
Raw Message
Post by Rogovin, Kevin
Hi,
Yes ANV will need something like this as well. If the GPU samples from both an ASTC5x5 texture and one with an aux buffer without a texture cache invalidate between such accesses, then the GPU hangs, which in turn makes the system unresponsive for a few seconds until the kernel resets the GPU; then an ioctl will fail in i965 which means things are very bad usually and (for me atleast on my system with how I build mesa) the application crashes.
I think his question is -- is there anything we can do about the case
where a single shader program samples ASTC5x5 and a texture with an
aux buffer? Presumably there's no way to invalidate the texture cache
during a shader program, so what can you do?
Rogovin, Kevin
2017-12-01 19:13:21 UTC
Reply
Permalink
Raw Message
Hi,

For ANV I do not know as I have not really poked into its code. For i965, this patch series handles the situation as to what to do if a draw of dispatch compute accesses both an ASTC5x5 texture and a texture with an auxiliary buffer. It does this by checking if there are both such textures and ASTC5x5 textures in the list of currently bound textures. If the answer is yes, then it resolves all such auxiliary requiring textures that use an auxiliary buffer so that the sampler does not need them when it reads from the surfaces. The resolve stuff is handled in the function brw_astc5x5_perform_wa(() in brw_context.c of the first patch, the checking is handled in the 3rd patch by modifying brw_tex_validate() in intel_tex_validate.c. The 4'th and 5'th patches are deceptively small since all they do is add a call to brw_astc5x5_perform_wa(() in brw_draw.c and brw_compute.c respectively. The 4th patch also has a small addition to prevent surface state for sampler state to have the auxiliary surface given in the call.

As to how to do similar auto-resolve and tweak of state on ANV, I need to dive quite deep into the code to see how to do it.

-Kevin

-----Original Message-----
From: Matt Turner [mailto:***@gmail.com]
Sent: Friday, December 1, 2017 8:25 PM
To: Rogovin, Kevin <***@intel.com>
Cc: Ilia Mirkin <***@alum.mit.edu>; mesa-***@lists.freedesktop.org
Subject: Re: [Mesa-dev] [PATCH 0/5] i965: ASTC5x5 workaround
Post by Rogovin, Kevin
Hi,
Yes ANV will need something like this as well. If the GPU samples from both an ASTC5x5 texture and one with an aux buffer without a texture cache invalidate between such accesses, then the GPU hangs, which in turn makes the system unresponsive for a few seconds until the kernel resets the GPU; then an ioctl will fail in i965 which means things are very bad usually and (for me atleast on my system with how I build mesa) the application crashes.
I think his question is -- is there anything we can do about the case where a single shader program samples ASTC5x5 and a texture with an aux buffer? Presumably there's no way to invalidate the texture cache during a shader program, so what can you do?
Pohjolainen, Topi
2017-12-02 10:42:24 UTC
Reply
Permalink
Raw Message
Post by k***@intel.com
This patch series implements a needed workaround for Gen9 for ASTC5x5
sampler reads. The crux of the work around is to make sure that the
sampler does not read an ASTC5x5 texture and a surface with an auxilary
buffer without having a texture cache invalidate between such accesses.
The solution here is to store the types of read access (aux, astc5x5) into
context. This information is then used for two purposes:

1) do extra tex cache flush when needed and specifically only when needed
2) resolve surfaces when undesired combination is about to be sampled

Latter case could be addressed also with on-the-fly check in
brw_predraw_resolve_inputs(). There one goes thru all the active textures for
the next draw call and resolves when necessary. One could check for the
undesired combination of textures there. I understand we would need to walk
the textures twice, first to check for any occurences of one type and
then for the other. This, however, would fit to the way we resolve other
texture types and prevent from adding more things to check into the context.

In the first case, if one didn't optimize, i.e., do the extra flush only when
needed, one could instead check if ASTC5x5 texture is going to be sampled and
flush before and after the draw call to be safe. This is not optimal but also
code-wise that simple that I'd be curious to know how much the optimized
flushing helps.

I think the non-optimized cases should be doable also in Vulkan. Jason, what
do you think?
Post by k***@intel.com
i965: define astc5x5 workaround infrastructure
i965: ASTC5x5 workaround logic for blorp
i965: set ASTC5x5 workaround texture type tracking on texture validate
i965: use ASTC5x5 workaround in brw_draw
i965: use ASTC5x5 workaround in brw_compute
src/mesa/drivers/dri/i965/brw_compute.c | 6 +++
src/mesa/drivers/dri/i965/brw_context.c | 63 ++++++++++++++++++++++++
src/mesa/drivers/dri/i965/brw_context.h | 23 +++++++++
src/mesa/drivers/dri/i965/brw_draw.c | 6 +++
src/mesa/drivers/dri/i965/brw_wm_surface_state.c | 5 ++
src/mesa/drivers/dri/i965/genX_blorp_exec.c | 5 ++
src/mesa/drivers/dri/i965/intel_batchbuffer.c | 1 +
src/mesa/drivers/dri/i965/intel_tex_image.c | 16 ++++--
src/mesa/drivers/dri/i965/intel_tex_validate.c | 13 +++++
9 files changed, 134 insertions(+), 4 deletions(-)
--
2.14.2
_______________________________________________
mesa-dev mailing list
https://lists.freedesktop.org/mailman/listinfo/mesa-dev
Rogovin, Kevin
2017-12-04 09:01:44 UTC
Reply
Permalink
Raw Message
Hi,
Post by Pohjolainen, Topi
1) do extra tex cache flush when needed and specifically only when needed
2) resolve surfaces when undesired combination is about to be sampled
Latter case could be addressed also with on-the-fly check in brw_predraw_resolve_inputs(). There one goes thru all the active textures for the
next draw call and resolves when necessary. One could check for the undesired combination of textures there. I understand we would need to
walk the textures twice, first to check for any occurrences of one type and then for the other. This, however, would fit to the way we resolve
other texture types and prevent from adding more things to check into the context.
This patch series does handle the second case, as follows:
a) Checking if there are astc5x5 textures and/or textures with auxiliary is done in brw_validate_textures(); this was a really nice place because
the function loops over all textures anyways
b) the resolve operation is handled by the added function brw_atsc_perform_wa(); this also sets the mode correctly too after the resolve.

I chose to make a dedicated function that does the mode setting and resolve. Putting the resolve logic into brw_predraw_resolve_inputs()
is not a bad idea and I am open to it; the interior of the loop would then look ickier as it would had the check on each texture unit of if
the workaround is necessary and if there is an astc5x5 sampler handing around. It would also kill off a member from the astc5x5_wa
struct that tracks if there are auxiliary textures.

At any rate, I'd appreciate some review for the code so it can land in some form shortly.

-Kevin
Post by Pohjolainen, Topi
i965: define astc5x5 workaround infrastructure
i965: ASTC5x5 workaround logic for blorp
i965: set ASTC5x5 workaround texture type tracking on texture validate
i965: use ASTC5x5 workaround in brw_draw
i965: use ASTC5x5 workaround in brw_compute
src/mesa/drivers/dri/i965/brw_compute.c | 6 +++
src/mesa/drivers/dri/i965/brw_context.c | 63 ++++++++++++++++++++++++
src/mesa/drivers/dri/i965/brw_context.h | 23 +++++++++
src/mesa/drivers/dri/i965/brw_draw.c | 6 +++
src/mesa/drivers/dri/i965/brw_wm_surface_state.c | 5 ++
src/mesa/drivers/dri/i965/genX_blorp_exec.c | 5 ++
src/mesa/drivers/dri/i965/intel_batchbuffer.c | 1 +
src/mesa/drivers/dri/i965/intel_tex_image.c | 16 ++++--
src/mesa/drivers/dri/i965/intel_tex_validate.c | 13 +++++
9 files changed, 134 insertions(+), 4 deletions(-)
--
2.14.2
_______________________________________________
mesa-dev mailing list
https://lists.freedesktop.org/mailman/listinfo/mesa-dev
Pohjolainen, Topi
2017-12-04 10:48:41 UTC
Reply
Permalink
Raw Message
Post by Rogovin, Kevin
Hi,
Post by Pohjolainen, Topi
1) do extra tex cache flush when needed and specifically only when needed
2) resolve surfaces when undesired combination is about to be sampled
Latter case could be addressed also with on-the-fly check in brw_predraw_resolve_inputs(). There one goes thru all the active textures for the
next draw call and resolves when necessary. One could check for the undesired combination of textures there. I understand we would need to
walk the textures twice, first to check for any occurrences of one type and then for the other. This, however, would fit to the way we resolve
other texture types and prevent from adding more things to check into the context.
a) Checking if there are astc5x5 textures and/or textures with auxiliary is done in brw_validate_textures(); this was a really nice place because
the function loops over all textures anyways
b) the resolve operation is handled by the added function brw_atsc_perform_wa(); this also sets the mode correctly too after the resolve.
I chose to make a dedicated function that does the mode setting and resolve. Putting the resolve logic into brw_predraw_resolve_inputs()
is not a bad idea and I am open to it; the interior of the loop would then look ickier as it would had the check on each texture unit of if
the workaround is necessary and if there is an astc5x5 sampler handing around. It would also kill off a member from the astc5x5_wa
struct that tracks if there are auxiliary textures.
I wanted to see how it would look using brw_predraw_resolve_inputs(). See the
patch I sent as reply and let me know how you feel about doing something of
that sort.

Writing that reminded me of something I meant to ask: do we hit the sampler
bug with both directions: sampling astc5x5 first and then aux, and vice versa?
Or is only one direction problematic?
Post by Rogovin, Kevin
At any rate, I'd appreciate some review for the code so it can land in some form shortly.
-Kevin
Post by Pohjolainen, Topi
i965: define astc5x5 workaround infrastructure
i965: ASTC5x5 workaround logic for blorp
i965: set ASTC5x5 workaround texture type tracking on texture validate
i965: use ASTC5x5 workaround in brw_draw
i965: use ASTC5x5 workaround in brw_compute
src/mesa/drivers/dri/i965/brw_compute.c | 6 +++
src/mesa/drivers/dri/i965/brw_context.c | 63 ++++++++++++++++++++++++
src/mesa/drivers/dri/i965/brw_context.h | 23 +++++++++
src/mesa/drivers/dri/i965/brw_draw.c | 6 +++
src/mesa/drivers/dri/i965/brw_wm_surface_state.c | 5 ++
src/mesa/drivers/dri/i965/genX_blorp_exec.c | 5 ++
src/mesa/drivers/dri/i965/intel_batchbuffer.c | 1 +
src/mesa/drivers/dri/i965/intel_tex_image.c | 16 ++++--
src/mesa/drivers/dri/i965/intel_tex_validate.c | 13 +++++
9 files changed, 134 insertions(+), 4 deletions(-)
--
2.14.2
_______________________________________________
mesa-dev mailing list
https://lists.freedesktop.org/mailman/listinfo/mesa-dev
Topi Pohjolainen
2017-12-04 10:48:32 UTC
Reply
Permalink
Raw Message
This is just drafting some thoughts and only compile tested.

CC: "Rogovin, Kevin" <***@intel.com>
---
src/mesa/drivers/dri/i965/brw_blorp.c | 8 +++++
src/mesa/drivers/dri/i965/brw_context.h | 10 ++++++
src/mesa/drivers/dri/i965/brw_draw.c | 54 ++++++++++++++++++++++++++++++++-
3 files changed, 71 insertions(+), 1 deletion(-)

diff --git a/src/mesa/drivers/dri/i965/brw_blorp.c b/src/mesa/drivers/dri/i965/brw_blorp.c
index 680121b6ab..b3f84ab8ca 100644
--- a/src/mesa/drivers/dri/i965/brw_blorp.c
+++ b/src/mesa/drivers/dri/i965/brw_blorp.c
@@ -186,11 +186,19 @@ blorp_surf_for_miptree(struct brw_context *brw,
surf->aux_addr.buffer = mt->hiz_buf->bo;
surf->aux_addr.offset = mt->hiz_buf->offset;
}
+
+ if (!is_render_target && brw->screen->devinfo.gen == 9)
+ gen9_astc5x5_sampler_wa(brw, GEN9_ASTC5X5_WA_TEX_TYPE_AUX);
} else {
surf->aux_addr = (struct blorp_address) {
.buffer = NULL,
};
memset(&surf->clear_color, 0, sizeof(surf->clear_color));
+
+ if (!is_render_target && brw->screen->devinfo.gen == 9 &&
+ (mt->format == MESA_FORMAT_RGBA_ASTC_5x5 ||
+ mt->format == MESA_FORMAT_SRGB8_ALPHA8_ASTC_5x5))
+ gen9_astc5x5_sampler_wa(brw, GEN9_ASTC5X5_WA_TEX_TYPE_ASTC5x5);
}
assert((surf->aux_usage == ISL_AUX_USAGE_NONE) ==
(surf->aux_addr.buffer == NULL));
diff --git a/src/mesa/drivers/dri/i965/brw_context.h b/src/mesa/drivers/dri/i965/brw_context.h
index 0670483806..44602c23c0 100644
--- a/src/mesa/drivers/dri/i965/brw_context.h
+++ b/src/mesa/drivers/dri/i965/brw_context.h
@@ -165,6 +165,11 @@ enum brw_cache_id {
BRW_MAX_CACHE
};

+enum gen9_astc5x5_wa_tex_type {
+ GEN9_ASTC5X5_WA_TEX_TYPE_ASTC5x5 = 1 << 0,
+ GEN9_ASTC5X5_WA_TEX_TYPE_AUX = 1 << 1,
+};
+
enum brw_state_id {
/* brw_cache_ids must come first - see brw_program_cache.c */
BRW_STATE_URB_FENCE = BRW_MAX_CACHE,
@@ -1262,6 +1267,8 @@ struct brw_context
*/
bool draw_aux_buffer_disabled[MAX_DRAW_BUFFERS];

+ enum gen9_astc5x5_wa_tex_type gen9_sampler_wa_tex_mask;
+
__DRIcontext *driContext;
struct intel_screen *screen;
};
@@ -1286,6 +1293,9 @@ void intel_update_renderbuffers(__DRIcontext *context,
__DRIdrawable *drawable);
void intel_prepare_render(struct brw_context *brw);

+void gen9_astc5x5_sampler_wa(struct brw_context *brw,
+ enum gen9_astc5x5_wa_tex_type curr_mask);
+
void brw_predraw_resolve_inputs(struct brw_context *brw, bool rendering);

void intel_resolve_for_dri2_flush(struct brw_context *brw,
diff --git a/src/mesa/drivers/dri/i965/brw_draw.c b/src/mesa/drivers/dri/i965/brw_draw.c
index 7e29dcfd4e..929f806eb3 100644
--- a/src/mesa/drivers/dri/i965/brw_draw.c
+++ b/src/mesa/drivers/dri/i965/brw_draw.c
@@ -371,6 +371,50 @@ intel_disable_rb_aux_buffer(struct brw_context *brw,
return found;
}

+static enum gen9_astc5x5_wa_tex_type
+gen9_astc5x5_wa_get_tex_mask(const struct brw_context *brw)
+{
+ enum gen9_astc5x5_wa_tex_type mask = 0;
+ const struct gl_context *ctx = &brw->ctx;
+ const struct intel_texture_object *tex_obj;
+
+ const int maxEnabledUnit = ctx->Texture._MaxEnabledTexImageUnit;
+ for (int i = 0; i <= maxEnabledUnit; i++) {
+ if (!ctx->Texture.Unit[i]._Current)
+ continue;
+ tex_obj = intel_texture_object(ctx->Texture.Unit[i]._Current);
+ if (!tex_obj || !tex_obj->mt)
+ continue;
+
+ if (tex_obj->mt->aux_usage != ISL_AUX_USAGE_NONE)
+ mask |= GEN9_ASTC5X5_WA_TEX_TYPE_ASTC5x5;
+
+ if (tex_obj->_Format == MESA_FORMAT_RGBA_ASTC_5x5 ||
+ tex_obj->_Format == MESA_FORMAT_SRGB8_ALPHA8_ASTC_5x5)
+ mask |= GEN9_ASTC5X5_WA_TEX_TYPE_AUX;
+ }
+
+ return mask;
+}
+
+/* TODO: Do we actually need this both ways: astc5x5 followed by aux
+ * and vice-versa? Or is only one direction problematic?
+ */
+void
+gen9_astc5x5_sampler_wa(struct brw_context *brw,
+ enum gen9_astc5x5_wa_tex_type curr_mask)
+{
+ if ((brw->gen9_sampler_wa_tex_mask & GEN9_ASTC5X5_WA_TEX_TYPE_ASTC5x5) &&
+ (curr_mask & GEN9_ASTC5X5_WA_TEX_TYPE_AUX))
+ brw_emit_pipe_control_flush(brw, PIPE_CONTROL_TEXTURE_CACHE_INVALIDATE);
+
+ if ((brw->gen9_sampler_wa_tex_mask & GEN9_ASTC5X5_WA_TEX_TYPE_AUX) &&
+ (curr_mask & GEN9_ASTC5X5_WA_TEX_TYPE_ASTC5x5))
+ brw_emit_pipe_control_flush(brw, PIPE_CONTROL_TEXTURE_CACHE_INVALIDATE);
+
+ brw->gen9_sampler_wa_tex_mask = curr_mask;
+}
+
/**
* \brief Resolve buffers before drawing.
*
@@ -383,6 +427,12 @@ brw_predraw_resolve_inputs(struct brw_context *brw, bool rendering)
struct gl_context *ctx = &brw->ctx;
struct intel_texture_object *tex_obj;

+ const enum gen9_astc5x5_wa_tex_type curr_wa_mask =
+ (brw->screen->devinfo.gen == 9) ? gen9_astc5x5_wa_get_tex_mask(brw) : 0;
+
+ if (brw->screen->devinfo.gen == 9)
+ gen9_astc5x5_sampler_wa(brw, curr_wa_mask);
+
memset(brw->draw_aux_buffer_disabled, 0,
sizeof(brw->draw_aux_buffer_disabled));

@@ -413,6 +463,8 @@ brw_predraw_resolve_inputs(struct brw_context *brw, bool rendering)
num_layers = INTEL_REMAINING_LAYERS;
}

+ const bool sampler_wa_disable_aux =
+ curr_wa_mask & GEN9_ASTC5X5_WA_TEX_TYPE_ASTC5x5;
const bool disable_aux = rendering &&
intel_disable_rb_aux_buffer(brw, tex_obj->mt, min_level, num_levels,
"for sampling");
@@ -420,7 +472,7 @@ brw_predraw_resolve_inputs(struct brw_context *brw, bool rendering)
intel_miptree_prepare_texture(brw, tex_obj->mt, view_format,
min_level, num_levels,
min_layer, num_layers,
- disable_aux);
+ disable_aux || sampler_wa_disable_aux);

brw_cache_flush_for_read(brw, tex_obj->mt->bo);
--
2.14.1
Rogovin, Kevin
2017-12-05 05:41:56 UTC
Reply
Permalink
Raw Message
Hi,

The patch series I have submitted handles the case of needing to resolve texture surfaces when a draw (or compute) accesses a texture which is astc5x5. As it is quite clear you understand the issue and know the code of i965 the patch centers on, you are an excellent person to review the code.

Here are my comments of the patch posted:

1. it is essentially replication and moving around of the code of the patch series posted earlier but missing various
important bits: preventing the sampler from using the auxiliary buffer (this requires to modify surface state
sent in brw_wm_surfaces.c). It also does not cover blorp sufficiently (blorp might read from an ASTC5x5
and there are more paths in blorp than blorp_surf_for_miptree() that sample from surfaces.

2. using the check that the GPU is gen9 (and for that matter, using the name gen9_ astc5x5_sampler_wa())
is not ideal; I would not be surprised that the bug might not be present on various re-spins of Gen9 and/or
it might linger on to future Gens. I do NOT know; using a Boolean assigned earlier makes the code more
future-ish proof.

3. the nature of GPU fragment dispatch is going to require a texture invalidate even if the sampler only
have the bug in one direction; this is because a subslice is not guaranteed to process fragments in any
order. The crux is that a single sampler serves an entire sub-slice which has more than 1 EU. It is quite
possible that one EU has threads of a draw call ahead of the other and depending on the timing, portions
of those fragments' coming after might be processed by the sampler of those before of those fragments
coming before in batchbuffer order. Indeed a single EU might have threads from separate draws as well.
A texture invalidate places a barrier in the pipeline preventing the mixing (and means that efficiency of
GPU drops quite a bit with every texture invalidate sadly).

4. With 3 in mind, using the bit-masks is not a good idea as we want to then enforce at the code level
that only one of the two is possible without texture invalidates.

-Kevin


-----Original Message-----
From: Topi Pohjolainen [mailto:***@gmail.com]
Sent: Monday, December 4, 2017 12:49 PM
To: mesa-***@lists.freedesktop.org
Cc: Rogovin, Kevin <***@intel.com>
Subject: [PATCH] RFC: Workaround for gen9 hw astc5x5 sampler bug

This is just drafting some thoughts and only compile tested.

CC: "Rogovin, Kevin" <***@intel.com>
---
src/mesa/drivers/dri/i965/brw_blorp.c | 8 +++++
src/mesa/drivers/dri/i965/brw_context.h | 10 ++++++
src/mesa/drivers/dri/i965/brw_draw.c | 54 ++++++++++++++++++++++++++++++++-
3 files changed, 71 insertions(+), 1 deletion(-)

diff --git a/src/mesa/drivers/dri/i965/brw_blorp.c b/src/mesa/drivers/dri/i965/brw_blorp.c
index 680121b6ab..b3f84ab8ca 100644
--- a/src/mesa/drivers/dri/i965/brw_blorp.c
+++ b/src/mesa/drivers/dri/i965/brw_blorp.c
@@ -186,11 +186,19 @@ blorp_surf_for_miptree(struct brw_context *brw,
surf->aux_addr.buffer = mt->hiz_buf->bo;
surf->aux_addr.offset = mt->hiz_buf->offset;
}
+
+ if (!is_render_target && brw->screen->devinfo.gen == 9)
+ gen9_astc5x5_sampler_wa(brw, GEN9_ASTC5X5_WA_TEX_TYPE_AUX);
} else {
surf->aux_addr = (struct blorp_address) {
.buffer = NULL,
};
memset(&surf->clear_color, 0, sizeof(surf->clear_color));
+
+ if (!is_render_target && brw->screen->devinfo.gen == 9 &&
+ (mt->format == MESA_FORMAT_RGBA_ASTC_5x5 ||
+ mt->format == MESA_FORMAT_SRGB8_ALPHA8_ASTC_5x5))
+ gen9_astc5x5_sampler_wa(brw,
+ GEN9_ASTC5X5_WA_TEX_TYPE_ASTC5x5);
}
assert((surf->aux_usage == ISL_AUX_USAGE_NONE) ==
(surf->aux_addr.buffer == NULL)); diff --git a/src/mesa/drivers/dri/i965/brw_context.h b/src/mesa/drivers/dri/i965/brw_context.h
index 0670483806..44602c23c0 100644
--- a/src/mesa/drivers/dri/i965/brw_context.h
+++ b/src/mesa/drivers/dri/i965/brw_context.h
@@ -165,6 +165,11 @@ enum brw_cache_id {
BRW_MAX_CACHE
};

+enum gen9_astc5x5_wa_tex_type {
+ GEN9_ASTC5X5_WA_TEX_TYPE_ASTC5x5 = 1 << 0,
+ GEN9_ASTC5X5_WA_TEX_TYPE_AUX = 1 << 1,
+};
+
enum brw_state_id {
/* brw_cache_ids must come first - see brw_program_cache.c */
BRW_STATE_URB_FENCE = BRW_MAX_CACHE, @@ -1262,6 +1267,8 @@ struct brw_context
*/
bool draw_aux_buffer_disabled[MAX_DRAW_BUFFERS];

+ enum gen9_astc5x5_wa_tex_type gen9_sampler_wa_tex_mask;
+
__DRIcontext *driContext;
struct intel_screen *screen;
};
@@ -1286,6 +1293,9 @@ void intel_update_renderbuffers(__DRIcontext *context,
__DRIdrawable *drawable); void intel_prepare_render(struct brw_context *brw);

+void gen9_astc5x5_sampler_wa(struct brw_context *brw,
+ enum gen9_astc5x5_wa_tex_type curr_mask);
+
void brw_predraw_resolve_inputs(struct brw_context *brw, bool rendering);

void intel_resolve_for_dri2_flush(struct brw_context *brw, diff --git a/src/mesa/drivers/dri/i965/brw_draw.c b/src/mesa/drivers/dri/i965/brw_draw.c
index 7e29dcfd4e..929f806eb3 100644
--- a/src/mesa/drivers/dri/i965/brw_draw.c
+++ b/src/mesa/drivers/dri/i965/brw_draw.c
@@ -371,6 +371,50 @@ intel_disable_rb_aux_buffer(struct brw_context *brw,
return found;
}

+static enum gen9_astc5x5_wa_tex_type
+gen9_astc5x5_wa_get_tex_mask(const struct brw_context *brw) {
+ enum gen9_astc5x5_wa_tex_type mask = 0;
+ const struct gl_context *ctx = &brw->ctx;
+ const struct intel_texture_object *tex_obj;
+
+ const int maxEnabledUnit = ctx->Texture._MaxEnabledTexImageUnit;
+ for (int i = 0; i <= maxEnabledUnit; i++) {
+ if (!ctx->Texture.Unit[i]._Current)
+ continue;
+ tex_obj = intel_texture_object(ctx->Texture.Unit[i]._Current);
+ if (!tex_obj || !tex_obj->mt)
+ continue;
+
+ if (tex_obj->mt->aux_usage != ISL_AUX_USAGE_NONE)
+ mask |= GEN9_ASTC5X5_WA_TEX_TYPE_ASTC5x5;
+
+ if (tex_obj->_Format == MESA_FORMAT_RGBA_ASTC_5x5 ||
+ tex_obj->_Format == MESA_FORMAT_SRGB8_ALPHA8_ASTC_5x5)
+ mask |= GEN9_ASTC5X5_WA_TEX_TYPE_AUX;
+ }
+
+ return mask;
+}
+
+/* TODO: Do we actually need this both ways: astc5x5 followed by aux
+ * and vice-versa? Or is only one direction problematic?
+ */
+void
+gen9_astc5x5_sampler_wa(struct brw_context *brw,
+ enum gen9_astc5x5_wa_tex_type curr_mask) {
+ if ((brw->gen9_sampler_wa_tex_mask & GEN9_ASTC5X5_WA_TEX_TYPE_ASTC5x5) &&
+ (curr_mask & GEN9_ASTC5X5_WA_TEX_TYPE_AUX))
+ brw_emit_pipe_control_flush(brw,
+PIPE_CONTROL_TEXTURE_CACHE_INVALIDATE);
+
+ if ((brw->gen9_sampler_wa_tex_mask & GEN9_ASTC5X5_WA_TEX_TYPE_AUX) &&
+ (curr_mask & GEN9_ASTC5X5_WA_TEX_TYPE_ASTC5x5))
+ brw_emit_pipe_control_flush(brw,
+ PIPE_CONTROL_TEXTURE_CACHE_INVALIDATE);
+
+ brw->gen9_sampler_wa_tex_mask = curr_mask; }
+
/**
* \brief Resolve buffers before drawing.
*
@@ -383,6 +427,12 @@ brw_predraw_resolve_inputs(struct brw_context *brw, bool rendering)
struct gl_context *ctx = &brw->ctx;
struct intel_texture_object *tex_obj;

+ const enum gen9_astc5x5_wa_tex_type curr_wa_mask =
+ (brw->screen->devinfo.gen == 9) ?
+ gen9_astc5x5_wa_get_tex_mask(brw) : 0;
+
+ if (brw->screen->devinfo.gen == 9)
+ gen9_astc5x5_sampler_wa(brw, curr_wa_mask);
+
memset(brw->draw_aux_buffer_disabled, 0,
sizeof(brw->draw_aux_buffer_disabled));

@@ -413,6 +463,8 @@ brw_predraw_resolve_inputs(struct brw_context *brw, bool rendering)
num_layers = INTEL_REMAINING_LAYERS;
}

+ const bool sampler_wa_disable_aux =
+ curr_wa_mask & GEN9_ASTC5X5_WA_TEX_TYPE_ASTC5x5;
const bool disable_aux = rendering &&
intel_disable_rb_aux_buffer(brw, tex_obj->mt, min_level, num_levels,
"for sampling"); @@ -420,7 +472,7 @@ brw_predraw_resolve_inputs(struct brw_context *brw, bool rendering)
intel_miptree_prepare_texture(brw, tex_obj->mt, view_format,
min_level, num_levels,
min_layer, num_layers,
- disable_aux);
+ disable_aux ||
+ sampler_wa_disable_aux);

brw_cache_flush_for_read(brw, tex_obj->mt->bo);

--
2.14.1
Pohjolainen, Topi
2017-12-05 09:31:48 UTC
Reply
Permalink
Raw Message
Post by Rogovin, Kevin
Hi,
The patch series I have submitted handles the case of needing to resolve texture surfaces when a draw (or compute) accesses a texture which is astc5x5. As it is quite clear you understand the issue and know the code of i965 the patch centers on, you are an excellent person to review the code.
We discussed earlier if we could handle the needed resolves in
brw_predraw_resolve_inputs(). I understood you had reservations on how it
would turn out and therefore I thought I try writing it out in code. I think
we could fit it there nicely.

I have to admit that I went quite a bit further. One thing I was looking for
was getting the code recording the needed bits into context closer to the
point consuming them. That looks doable as well, now reacting to the bits and
recording are both in gen9_astc5x5_sampler_wa(). It also becomes clear that
brw_predraw_resolve_inputs() does not depend on the bits in the context but
solely on the current texture settings.
This is important to me as I'm quite sure I'll be debugging things later on
where I need to pay attention to what happens with this workaround (among
other things).
Post by Rogovin, Kevin
1. it is essentially replication and moving around of the code of the patch series posted earlier but missing various
important bits: preventing the sampler from using the auxiliary buffer (this requires to modify surface state
sent in brw_wm_surfaces.c). It also does not cover blorp sufficiently (blorp might read from an ASTC5x5
and there are more paths in blorp than blorp_surf_for_miptree() that sample from surfaces.
Can you explain both more in detail? Resolves done in
brw_predraw_resolve_inputs() mean that there is nothing interesting in the
aux buffers and surface setup won't therefore enable auxiliary for texture
surfaces.

In case of blorp, as far as I know all operations sampling something should go
thru blorp_surf_for_miptree(). Can you point out cases that don't?
Post by Rogovin, Kevin
2. using the check that the GPU is gen9 (and for that matter, using the name gen9_ astc5x5_sampler_wa())
is not ideal; I would not be surprised that the bug might not be present on various re-spins of Gen9 and/or
it might linger on to future Gens. I do NOT know; using a Boolean assigned earlier makes the code more
future-ish proof.
This is quite common style of ours, workaround has the name of the platform
it was first introduced in.
Post by Rogovin, Kevin
3. the nature of GPU fragment dispatch is going to require a texture invalidate even if the sampler only
have the bug in one direction; this is because a subslice is not guaranteed to process fragments in any
order. The crux is that a single sampler serves an entire sub-slice which has more than 1 EU. It is quite
possible that one EU has threads of a draw call ahead of the other and depending on the timing, portions
of those fragments' coming after might be processed by the sampler of those before of those fragments
coming before in batchbuffer order. Indeed a single EU might have threads from separate draws as well.
A texture invalidate places a barrier in the pipeline preventing the mixing (and means that efficiency of
GPU drops quite a bit with every texture invalidate sadly).
Right. In the case of sampling both aux and astc5x5 in the same draw cycle the
only thing to do is to disable aux. With my question of direction I meant the
texture cache flush between two cycles. Do we need to flush in both cases

1) ASTC5x5 in first cycle and AUX in the following
2) AUX in first cycle and ASTC5x5 in the following
Post by Rogovin, Kevin
4. With 3 in mind, using the bit-masks is not a good idea as we want to then enforce at the code level
that only one of the two is possible without texture invalidates.
Can you elaborate this a little more? It tells if aux is/was used and it tells
if astc5x5 is/was used. That is all we need, right?
Post by Rogovin, Kevin
-Kevin
-----Original Message-----
Sent: Monday, December 4, 2017 12:49 PM
Subject: [PATCH] RFC: Workaround for gen9 hw astc5x5 sampler bug
This is just drafting some thoughts and only compile tested.
---
src/mesa/drivers/dri/i965/brw_blorp.c | 8 +++++
src/mesa/drivers/dri/i965/brw_context.h | 10 ++++++
src/mesa/drivers/dri/i965/brw_draw.c | 54 ++++++++++++++++++++++++++++++++-
3 files changed, 71 insertions(+), 1 deletion(-)
diff --git a/src/mesa/drivers/dri/i965/brw_blorp.c b/src/mesa/drivers/dri/i965/brw_blorp.c
index 680121b6ab..b3f84ab8ca 100644
--- a/src/mesa/drivers/dri/i965/brw_blorp.c
+++ b/src/mesa/drivers/dri/i965/brw_blorp.c
@@ -186,11 +186,19 @@ blorp_surf_for_miptree(struct brw_context *brw,
surf->aux_addr.buffer = mt->hiz_buf->bo;
surf->aux_addr.offset = mt->hiz_buf->offset;
}
+
+ if (!is_render_target && brw->screen->devinfo.gen == 9)
+ gen9_astc5x5_sampler_wa(brw, GEN9_ASTC5X5_WA_TEX_TYPE_AUX);
} else {
surf->aux_addr = (struct blorp_address) {
.buffer = NULL,
};
memset(&surf->clear_color, 0, sizeof(surf->clear_color));
+
+ if (!is_render_target && brw->screen->devinfo.gen == 9 &&
+ (mt->format == MESA_FORMAT_RGBA_ASTC_5x5 ||
+ mt->format == MESA_FORMAT_SRGB8_ALPHA8_ASTC_5x5))
+ gen9_astc5x5_sampler_wa(brw,
+ GEN9_ASTC5X5_WA_TEX_TYPE_ASTC5x5);
}
assert((surf->aux_usage == ISL_AUX_USAGE_NONE) ==
(surf->aux_addr.buffer == NULL)); diff --git a/src/mesa/drivers/dri/i965/brw_context.h b/src/mesa/drivers/dri/i965/brw_context.h
index 0670483806..44602c23c0 100644
--- a/src/mesa/drivers/dri/i965/brw_context.h
+++ b/src/mesa/drivers/dri/i965/brw_context.h
@@ -165,6 +165,11 @@ enum brw_cache_id {
BRW_MAX_CACHE
};
+enum gen9_astc5x5_wa_tex_type {
+ GEN9_ASTC5X5_WA_TEX_TYPE_ASTC5x5 = 1 << 0,
+ GEN9_ASTC5X5_WA_TEX_TYPE_AUX = 1 << 1,
+};
+
enum brw_state_id {
/* brw_cache_ids must come first - see brw_program_cache.c */
*/
bool draw_aux_buffer_disabled[MAX_DRAW_BUFFERS];
+ enum gen9_astc5x5_wa_tex_type gen9_sampler_wa_tex_mask;
+
__DRIcontext *driContext;
struct intel_screen *screen;
};
@@ -1286,6 +1293,9 @@ void intel_update_renderbuffers(__DRIcontext *context,
__DRIdrawable *drawable); void intel_prepare_render(struct brw_context *brw);
+void gen9_astc5x5_sampler_wa(struct brw_context *brw,
+ enum gen9_astc5x5_wa_tex_type curr_mask);
+
void brw_predraw_resolve_inputs(struct brw_context *brw, bool rendering);
void intel_resolve_for_dri2_flush(struct brw_context *brw, diff --git a/src/mesa/drivers/dri/i965/brw_draw.c b/src/mesa/drivers/dri/i965/brw_draw.c
index 7e29dcfd4e..929f806eb3 100644
--- a/src/mesa/drivers/dri/i965/brw_draw.c
+++ b/src/mesa/drivers/dri/i965/brw_draw.c
@@ -371,6 +371,50 @@ intel_disable_rb_aux_buffer(struct brw_context *brw,
return found;
}
+static enum gen9_astc5x5_wa_tex_type
+gen9_astc5x5_wa_get_tex_mask(const struct brw_context *brw) {
+ enum gen9_astc5x5_wa_tex_type mask = 0;
+ const struct gl_context *ctx = &brw->ctx;
+ const struct intel_texture_object *tex_obj;
+
+ const int maxEnabledUnit = ctx->Texture._MaxEnabledTexImageUnit;
+ for (int i = 0; i <= maxEnabledUnit; i++) {
+ if (!ctx->Texture.Unit[i]._Current)
+ continue;
+ tex_obj = intel_texture_object(ctx->Texture.Unit[i]._Current);
+ if (!tex_obj || !tex_obj->mt)
+ continue;
+
+ if (tex_obj->mt->aux_usage != ISL_AUX_USAGE_NONE)
+ mask |= GEN9_ASTC5X5_WA_TEX_TYPE_ASTC5x5;
+
+ if (tex_obj->_Format == MESA_FORMAT_RGBA_ASTC_5x5 ||
+ tex_obj->_Format == MESA_FORMAT_SRGB8_ALPHA8_ASTC_5x5)
+ mask |= GEN9_ASTC5X5_WA_TEX_TYPE_AUX;
+ }
+
+ return mask;
+}
+
+/* TODO: Do we actually need this both ways: astc5x5 followed by aux
+ * and vice-versa? Or is only one direction problematic?
+ */
+void
+gen9_astc5x5_sampler_wa(struct brw_context *brw,
+ enum gen9_astc5x5_wa_tex_type curr_mask) {
+ if ((brw->gen9_sampler_wa_tex_mask & GEN9_ASTC5X5_WA_TEX_TYPE_ASTC5x5) &&
+ (curr_mask & GEN9_ASTC5X5_WA_TEX_TYPE_AUX))
+ brw_emit_pipe_control_flush(brw,
+PIPE_CONTROL_TEXTURE_CACHE_INVALIDATE);
+
+ if ((brw->gen9_sampler_wa_tex_mask & GEN9_ASTC5X5_WA_TEX_TYPE_AUX) &&
+ (curr_mask & GEN9_ASTC5X5_WA_TEX_TYPE_ASTC5x5))
+ brw_emit_pipe_control_flush(brw,
+ PIPE_CONTROL_TEXTURE_CACHE_INVALIDATE);
+
+ brw->gen9_sampler_wa_tex_mask = curr_mask; }
+
/**
* \brief Resolve buffers before drawing.
*
@@ -383,6 +427,12 @@ brw_predraw_resolve_inputs(struct brw_context *brw, bool rendering)
struct gl_context *ctx = &brw->ctx;
struct intel_texture_object *tex_obj;
+ const enum gen9_astc5x5_wa_tex_type curr_wa_mask =
+ (brw->screen->devinfo.gen == 9) ?
+ gen9_astc5x5_wa_get_tex_mask(brw) : 0;
+
+ if (brw->screen->devinfo.gen == 9)
+ gen9_astc5x5_sampler_wa(brw, curr_wa_mask);
+
memset(brw->draw_aux_buffer_disabled, 0,
sizeof(brw->draw_aux_buffer_disabled));
@@ -413,6 +463,8 @@ brw_predraw_resolve_inputs(struct brw_context *brw, bool rendering)
num_layers = INTEL_REMAINING_LAYERS;
}
+ const bool sampler_wa_disable_aux =
+ curr_wa_mask & GEN9_ASTC5X5_WA_TEX_TYPE_ASTC5x5;
const bool disable_aux = rendering &&
intel_disable_rb_aux_buffer(brw, tex_obj->mt, min_level, num_levels,
intel_miptree_prepare_texture(brw, tex_obj->mt, view_format,
min_level, num_levels,
min_layer, num_layers,
- disable_aux);
+ disable_aux ||
+ sampler_wa_disable_aux);
brw_cache_flush_for_read(brw, tex_obj->mt->bo);
--
2.14.1
Rogovin, Kevin
2017-12-05 10:26:33 UTC
Reply
Permalink
Raw Message
Hi,
Post by Pohjolainen, Topi
Post by Rogovin, Kevin
1. it is essentially replication and moving around of the code of the patch series posted earlier but missing various
important bits: preventing the sampler from using the auxiliary buffer (this requires to modify surface state
sent in brw_wm_surfaces.c). It also does not cover blorp sufficiently (blorp might read from an ASTC5x5
and there are more paths in blorp than blorp_surf_for_miptree() that sample from surfaces.
Can you explain both more in detail? Resolves done in
brw_predraw_resolve_inputs() mean that there is nothing interesting in the aux buffers and surface setup won't therefore enable auxiliary for texture surfaces.
That there is nothing interesting is irrelevant to the sampler if the SURFACE_STATE fed includes the auxiliary buffer, thus when one sets up the SURFACE_STATE for sampler, the auxiliary buffer cannot be mentioned in the GPU command; The sampler will always try to read the auxiliary buffer if it is given the opportunity to do so. Indeed, it is quite feasible that less bandwidth is consumed if the sampler is given the chance to read an auxiliary buffer in place of the buffer; as such even if the surface is resolved one may wish to feed the sampler the auxiliary buffer. Indeed, for HiZ, i965 programs to use the HiZ auxiliary buffer even if the depth buffer is fully resolved (see inte_mipmap_tree_sample_with_hiz() in intel_mipmap_tree.c).
Post by Pohjolainen, Topi
In case of blorp, as far as I know all operations sampling something should go thru blorp_surf_for_miptree(). Can you point out cases that don't?
Blorp is used in more than blorp_surf_for_miptree(), for example implementing GetTexImage(). Indeed, it is possible for blorp to sample from an ASTC5x5 (you can see this handled in the patch series I posted). I chose the hammer that the default is to just assume blorp is going to access auxiliary buffers unless a flag is set when the caller knows that blorp is going to sample from an astc5x5 (against see my patch series).
Post by Pohjolainen, Topi
Right. In the case of sampling both aux and astc5x5 in the same draw cycle the only thing to do is to disable aux. With my question of direction I meant the texture
cache flush between two cycles. Do we need to flush in both cases
1) ASTC5x5 in first cycle and AUX in the following
2) AUX in first cycle and ASTC5x5 in the following
YES we need to flush in both cases. What is happening is that the sampler hardware is bugged. Let us suppose it was bugged in only 1 direction, take 1. Then if the sampler first samples from an ASTC5x5 then an AUX it would not hang, but the other way it would. However, if there are multiple draws in flight where one samples from an ASTC5x5 and the other does not, the command buffer order gives ZERO guarantee that the sampler will sample in that order because fragments get executed out-of-order even across draw calls even within a subslice (this is why sendc is needed at end of shader in GEN).
Post by Pohjolainen, Topi
Post by Rogovin, Kevin
4. With 3 in mind, using the bit-masks is not a good idea as we want to then enforce at the code level
that only one of the two is possible without texture invalidates.
Can you elaborate this a little more? It tells if aux is/was used and it tells if astc5x5 is/was used. That is all we need, right?
WRONG. We must enforce that a given draw call can have neither or only one. By having bitmasks it is possible to support a state having both.

At any rate, please review the patch series I have posted and I am happy to take suggestions to improve that patch series that I have tested.

-Kevin
Pohjolainen, Topi
2017-12-05 11:00:28 UTC
Reply
Permalink
Raw Message
Post by Rogovin, Kevin
Hi,
Post by Pohjolainen, Topi
Post by Rogovin, Kevin
1. it is essentially replication and moving around of the code of the patch series posted earlier but missing various
important bits: preventing the sampler from using the auxiliary buffer (this requires to modify surface state
sent in brw_wm_surfaces.c). It also does not cover blorp sufficiently (blorp might read from an ASTC5x5
and there are more paths in blorp than blorp_surf_for_miptree() that sample from surfaces.
Can you explain both more in detail? Resolves done in
brw_predraw_resolve_inputs() mean that there is nothing interesting in the aux buffers and surface setup won't therefore enable auxiliary for texture surfaces.
That there is nothing interesting is irrelevant to the sampler if the SURFACE_STATE fed includes the auxiliary buffer, thus when one sets up the SURFACE_STATE for sampler, the auxiliary buffer cannot be mentioned in the GPU command; The sampler will always try to read the auxiliary buffer if it is given the opportunity to do so. Indeed, it is quite feasible that less bandwidth is consumed if the sampler is given the chance to read an auxiliary buffer in place of the buffer; as such even if the surface is resolved one may wish to feed the sampler the auxiliary buffer. Indeed, for HiZ, i965 programs to use the HiZ auxiliary buffer even if the depth buffer is fully resolved (see inte_mipmap_tree_sample_with_hiz() in intel_mipmap_tree.c).
If you take a look at brw_update_texture_surface(), just in the end before
brw_emit_surface_state() the logic explictly consults for
intel_miptree_texture_aux_usage(). This in turn tells if the auxiliary buffer
is resolved and it doesn't need to be programmed.
Post by Rogovin, Kevin
Post by Pohjolainen, Topi
In case of blorp, as far as I know all operations sampling something should go thru blorp_surf_for_miptree(). Can you point out cases that don't?
Blorp is used in more than blorp_surf_for_miptree(), for example implementing GetTexImage(). Indeed, it is possible for blorp to sample from an ASTC5x5 (you can see this handled in the patch series I posted). I chose the hammer that the default is to just assume blorp is going to access auxiliary buffers unless a flag is set when the caller knows that blorp is going to sample from an astc5x5 (against see my patch series).
This path goes thru brw_blorp_download_miptree() which in turn takes either
brw_blorp_copy_miptrees() or brw_blorp_blit_miptrees(). Both in turn consult
blorp_surf_for_miptree().
Post by Rogovin, Kevin
Post by Pohjolainen, Topi
Right. In the case of sampling both aux and astc5x5 in the same draw cycle the only thing to do is to disable aux. With my question of direction I meant the texture
cache flush between two cycles. Do we need to flush in both cases
1) ASTC5x5 in first cycle and AUX in the following
2) AUX in first cycle and ASTC5x5 in the following
YES we need to flush in both cases. What is happening is that the sampler hardware is bugged. Let us suppose it was bugged in only 1 direction, take 1. Then if the sampler first samples from an ASTC5x5 then an AUX it would not hang, but the other way it would. However, if there are multiple draws in flight where one samples from an ASTC5x5 and the other does not, the command buffer order gives ZERO guarantee that the sampler will sample in that order because fragments get executed out-of-order even across draw calls even within a subslice (this is why sendc is needed at end of shader in GEN).
This would be a nice a piece of documentation to add into the code. Thanks
for explaining.
Post by Rogovin, Kevin
Post by Pohjolainen, Topi
Post by Rogovin, Kevin
4. With 3 in mind, using the bit-masks is not a good idea as we want to then enforce at the code level
that only one of the two is possible without texture invalidates.
Can you elaborate this a little more? It tells if aux is/was used and it tells if astc5x5 is/was used. That is all we need, right?
WRONG. We must enforce that a given draw call can have neither or only one. By having bitmasks it is possible to support a state having both.
I don't see how the bit mask prevents one from forcing anything. I now do see
a flaw in the RFC related to this. In brw_predraw_resolve_inputs() one would
actually need to smash down the recorded AUX mask bit when it reacts to ASTC5x5
being present.
Post by Rogovin, Kevin
At any rate, please review the patch series I have posted and I am happy to take suggestions to improve that patch series that I have tested.
Well, if nothing else, I would really like to see you used
brw_predraw_resolve_inputs() for the resolves.
Post by Rogovin, Kevin
-Kevin
Pohjolainen, Topi
2017-12-05 11:14:03 UTC
Reply
Permalink
Raw Message
Post by Pohjolainen, Topi
Post by Rogovin, Kevin
Hi,
Post by Pohjolainen, Topi
Post by Rogovin, Kevin
1. it is essentially replication and moving around of the code of the patch series posted earlier but missing various
important bits: preventing the sampler from using the auxiliary buffer (this requires to modify surface state
sent in brw_wm_surfaces.c). It also does not cover blorp sufficiently (blorp might read from an ASTC5x5
and there are more paths in blorp than blorp_surf_for_miptree() that sample from surfaces.
Can you explain both more in detail? Resolves done in
brw_predraw_resolve_inputs() mean that there is nothing interesting in the aux buffers and surface setup won't therefore enable auxiliary for texture surfaces.
That there is nothing interesting is irrelevant to the sampler if the SURFACE_STATE fed includes the auxiliary buffer, thus when one sets up the SURFACE_STATE for sampler, the auxiliary buffer cannot be mentioned in the GPU command; The sampler will always try to read the auxiliary buffer if it is given the opportunity to do so. Indeed, it is quite feasible that less bandwidth is consumed if the sampler is given the chance to read an auxiliary buffer in place of the buffer; as such even if the surface is resolved one may wish to feed the sampler the auxiliary buffer. Indeed, for HiZ, i965 programs to use the HiZ auxiliary buffer even if the depth buffer is fully resolved (see inte_mipmap_tree_sample_with_hiz() in intel_mipmap_tree.c).
If you take a look at brw_update_texture_surface(), just in the end before
brw_emit_surface_state() the logic explictly consults for
intel_miptree_texture_aux_usage(). This in turn tells if the auxiliary buffer
is resolved and it doesn't need to be programmed.
Post by Rogovin, Kevin
Post by Pohjolainen, Topi
In case of blorp, as far as I know all operations sampling something should go thru blorp_surf_for_miptree(). Can you point out cases that don't?
Blorp is used in more than blorp_surf_for_miptree(), for example implementing GetTexImage(). Indeed, it is possible for blorp to sample from an ASTC5x5 (you can see this handled in the patch series I posted). I chose the hammer that the default is to just assume blorp is going to access auxiliary buffers unless a flag is set when the caller knows that blorp is going to sample from an astc5x5 (against see my patch series).
This path goes thru brw_blorp_download_miptree() which in turn takes either
brw_blorp_copy_miptrees() or brw_blorp_blit_miptrees(). Both in turn consult
blorp_surf_for_miptree().
Post by Rogovin, Kevin
Post by Pohjolainen, Topi
Right. In the case of sampling both aux and astc5x5 in the same draw cycle the only thing to do is to disable aux. With my question of direction I meant the texture
cache flush between two cycles. Do we need to flush in both cases
1) ASTC5x5 in first cycle and AUX in the following
2) AUX in first cycle and ASTC5x5 in the following
YES we need to flush in both cases. What is happening is that the sampler hardware is bugged. Let us suppose it was bugged in only 1 direction, take 1. Then if the sampler first samples from an ASTC5x5 then an AUX it would not hang, but the other way it would. However, if there are multiple draws in flight where one samples from an ASTC5x5 and the other does not, the command buffer order gives ZERO guarantee that the sampler will sample in that order because fragments get executed out-of-order even across draw calls even within a subslice (this is why sendc is needed at end of shader in GEN).
This would be a nice a piece of documentation to add into the code. Thanks
for explaining.
Post by Rogovin, Kevin
Post by Pohjolainen, Topi
Post by Rogovin, Kevin
4. With 3 in mind, using the bit-masks is not a good idea as we want to then enforce at the code level
that only one of the two is possible without texture invalidates.
Can you elaborate this a little more? It tells if aux is/was used and it tells if astc5x5 is/was used. That is all we need, right?
WRONG. We must enforce that a given draw call can have neither or only one. By having bitmasks it is possible to support a state having both.
I don't see how the bit mask prevents one from forcing anything. I now do see
a flaw in the RFC related to this. In brw_predraw_resolve_inputs() one would
actually need to smash down the recorded AUX mask bit when it reacts to ASTC5x5
being present.
Post by Rogovin, Kevin
At any rate, please review the patch series I have posted and I am happy to take suggestions to improve that patch series that I have tested.
Well, if nothing else, I would really like to see you used
brw_predraw_resolve_inputs() for the resolves.
I would anyway wait a bit. Other people might have entirely different way of
looking at this. I'd be interested to hear an opinion or two.
Rogovin, Kevin
2017-12-05 11:27:49 UTC
Reply
Permalink
Raw Message
Post by Pohjolainen, Topi
If you take a look at brw_update_texture_surface(), just in the end before
brw_emit_surface_state() the logic explictly consults for intel_miptree_texture_aux_usage().
This in turn tells if the auxiliary buffer is resolved and it doesn't need to be programmed.
The full stack trace is this:

brw_update_texture_surface() calls intel_miptree_texture_usage() which for HiZ calls the function intel_miptree_sample_with_hiz() which, as the name suggest, returns true if and only if it is ok to use the HiZ to speed up depth texture reads. Indeed, the function calls intel_miptree_texture_usage() switches on intel_mipmap_tree::aux_usage which the documentation states is about the intended usage of the auxiliary buffer. Indeed, if the value is ISL_AUX_USAGE_NONE that means, quoting the comment tag above the field, "that auxiliary compression is permanently disabled". The conclusion then is that even if a buffer is fully resolved, the value of aux_usage is not ISL_AUX_USAGE_NONE and that the return value of calls intel_miptree_texture_usage() gives the return value assuming that the auxiliary buffer can be used with respect to that it holds good values enough for the sampler.
Post by Pohjolainen, Topi
This path goes thru brw_blorp_download_miptree() which in turn takes either
brw_blorp_copy_miptrees() or brw_blorp_blit_miptrees(). Both in turn consult blorp_surf_for_miptree().
If you are 100% sure (because I am not) that ALL blorp paths walk through that, then I have no real objection except it needs to do the magic of checking if it is reading from an ASTC5x5 or a surface with an auxiliary and update the enumeration appropriately.
Post by Pohjolainen, Topi
This would be a nice a piece of documentation to add into the code. Thanks for explaining.
I can add this, though I do freely admit I take this for granted and an axiom of HW accelerated graphics.
Post by Pohjolainen, Topi
I don't see how the bit mask prevents one from forcing anything.
By not being able to encode such a state (both are present) such a state is impossible to store and cannot be reached. From the point of view of code, an enum is slightly more pleasant to read than bitmaks IMO.
Post by Pohjolainen, Topi
I now do see a flaw in the RFC related to this. In brw_predraw_resolve_inputs() one would
actually need to smash down the recorded AUX mask bit when it reacts to ASTC5x5 being present.
Indeed, really two passes are needed over the textures. The first pass to detect if ASTC5x5 is present and a second to resolve auxiliary using textures if they are present.
Post by Pohjolainen, Topi
Well, if nothing else, I would really like to see you used brw_predraw_resolve_inputs() for the resolves.
I am happy with that as that walks through the textures anyways BUT it is called before brw_predraw_resolve_framebuffer() which might do some resolving too. The easy way out is to permute their calling order unless there is some other dangling assumption preventing permuting the call order of those two.
Post by Pohjolainen, Topi
-Kevin
Pohjolainen, Topi
2017-12-05 18:17:12 UTC
Reply
Permalink
Raw Message
Post by Rogovin, Kevin
Hi,
Post by Pohjolainen, Topi
Post by Rogovin, Kevin
1. it is essentially replication and moving around of the code of the patch series posted earlier but missing various
important bits: preventing the sampler from using the auxiliary buffer (this requires to modify surface state
sent in brw_wm_surfaces.c). It also does not cover blorp sufficiently (blorp might read from an ASTC5x5
and there are more paths in blorp than blorp_surf_for_miptree() that sample from surfaces.
Can you explain both more in detail? Resolves done in
brw_predraw_resolve_inputs() mean that there is nothing interesting in the aux buffers and surface setup won't therefore enable auxiliary for texture surfaces.
That there is nothing interesting is irrelevant to the sampler if the SURFACE_STATE fed includes the auxiliary buffer, thus when one sets up the SURFACE_STATE for sampler, the auxiliary buffer cannot be mentioned in the GPU command; The sampler will always try to read the auxiliary buffer if it is given the opportunity to do so. Indeed, it is quite feasible that less bandwidth is consumed if the sampler is given the chance to read an auxiliary buffer in place of the buffer; as such even if the surface is resolved one may wish to feed the sampler the auxiliary buffer. Indeed, for HiZ, i965 programs to use the HiZ auxiliary buffer even if the depth buffer is fully resolved (see inte_mipmap_tree_sample_with_hiz() in intel_mipmap_tree.c).
Post by Pohjolainen, Topi
In case of blorp, as far as I know all operations sampling something should go thru blorp_surf_for_miptree(). Can you point out cases that don't?
Blorp is used in more than blorp_surf_for_miptree(), for example implementing GetTexImage(). Indeed, it is possible for blorp to sample from an ASTC5x5 (you can see this handled in the patch series I posted). I chose the hammer that the default is to just assume blorp is going to access auxiliary buffers unless a flag is set when the caller knows that blorp is going to sample from an astc5x5 (against see my patch series).
Post by Pohjolainen, Topi
Right. In the case of sampling both aux and astc5x5 in the same draw cycle the only thing to do is to disable aux. With my question of direction I meant the texture
cache flush between two cycles. Do we need to flush in both cases
1) ASTC5x5 in first cycle and AUX in the following
2) AUX in first cycle and ASTC5x5 in the following
YES we need to flush in both cases. What is happening is that the sampler hardware is bugged. Let us suppose it was bugged in only 1 direction, take 1. Then if the sampler first samples from an ASTC5x5 then an AUX it would not hang, but the other way it would. However, if there are multiple draws in flight where one samples from an ASTC5x5 and the other does not, the command buffer order gives ZERO guarantee that the sampler will sample in that order because fragments get executed out-of-order even across draw calls even within a subslice (this is why sendc is needed at end of shader in GEN).
I need to clarify this bit a little more. In the current setup we have only
one draw cycle in flight per context that uses sample engine. There may be
blitter calls in parallel (although I'm not sure) but they don't use sampling
engine anyway.

The draw cycle itself may have multiple draws programmed into it but as they
all are based on the same surface setup there is naturally no flushing problem.
Surfaces with auxiliary would be resolved before the draw and programmed
without auxiliary buffer.

Are you saying that this bug extends over hardware context?
Jason Ekstrand
2017-12-05 18:24:46 UTC
Reply
Permalink
Raw Message
On Tue, Dec 5, 2017 at 10:17 AM, Pohjolainen, Topi <
Post by Rogovin, Kevin
Post by Rogovin, Kevin
Hi,
Post by Pohjolainen, Topi
Post by Rogovin, Kevin
1. it is essentially replication and moving around of the code of
the patch series posted earlier but missing various
Post by Rogovin, Kevin
Post by Pohjolainen, Topi
Post by Rogovin, Kevin
important bits: preventing the sampler from using the auxiliary
buffer (this requires to modify surface state
Post by Rogovin, Kevin
Post by Pohjolainen, Topi
Post by Rogovin, Kevin
sent in brw_wm_surfaces.c). It also does not cover blorp
sufficiently (blorp might read from an ASTC5x5
Post by Rogovin, Kevin
Post by Pohjolainen, Topi
Post by Rogovin, Kevin
and there are more paths in blorp than blorp_surf_for_miptree()
that sample from surfaces.
Post by Rogovin, Kevin
Post by Pohjolainen, Topi
Can you explain both more in detail? Resolves done in
brw_predraw_resolve_inputs() mean that there is nothing interesting in
the aux buffers and surface setup won't therefore enable auxiliary for
texture surfaces.
Post by Rogovin, Kevin
That there is nothing interesting is irrelevant to the sampler if the
SURFACE_STATE fed includes the auxiliary buffer, thus when one sets up the
SURFACE_STATE for sampler, the auxiliary buffer cannot be mentioned in the
GPU command; The sampler will always try to read the auxiliary buffer if it
is given the opportunity to do so. Indeed, it is quite feasible that less
bandwidth is consumed if the sampler is given the chance to read an
auxiliary buffer in place of the buffer; as such even if the surface is
resolved one may wish to feed the sampler the auxiliary buffer. Indeed, for
HiZ, i965 programs to use the HiZ auxiliary buffer even if the depth buffer
is fully resolved (see inte_mipmap_tree_sample_with_hiz() in
intel_mipmap_tree.c).
We do have code which checks for the resolve state and disables the
auxiliary buffer. However, I don't like relying on it for correctness as
that's bitten us before. I think it'd be better to check the W/A state for
ASTC5x5, assert that everything is resolved, and manually disable aux in
that case.
Post by Rogovin, Kevin
Post by Rogovin, Kevin
Post by Pohjolainen, Topi
In case of blorp, as far as I know all operations sampling something
should go thru blorp_surf_for_miptree(). Can you point out cases that don't?
Post by Rogovin, Kevin
Blorp is used in more than blorp_surf_for_miptree(), for example
implementing GetTexImage(). Indeed, it is possible for blorp to sample from
an ASTC5x5 (you can see this handled in the patch series I posted). I chose
the hammer that the default is to just assume blorp is going to access
auxiliary buffers unless a flag is set when the caller knows that blorp is
going to sample from an astc5x5 (against see my patch series).
This isn't true. 100% of the intel_mipmap_tree -> blorp_surf translations
are handled by that function. It's a perfectly reasonable place to handle
these things. It could also be handled in genX(blorp_exec) if that makes
someone more comfortable.
Post by Rogovin, Kevin
Post by Rogovin, Kevin
Right. In the case of sampling both aux and astc5x5 in the same draw
cycle the only thing to do is to disable aux. With my question of direction
I meant the texture
Post by Rogovin, Kevin
Post by Pohjolainen, Topi
cache flush between two cycles. Do we need to flush in both cases
1) ASTC5x5 in first cycle and AUX in the following
2) AUX in first cycle and ASTC5x5 in the following
YES we need to flush in both cases. What is happening is that the
sampler hardware is bugged. Let us suppose it was bugged in only 1
direction, take 1. Then if the sampler first samples from an ASTC5x5 then
an AUX it would not hang, but the other way it would. However, if there are
multiple draws in flight where one samples from an ASTC5x5 and the other
does not, the command buffer order gives ZERO guarantee that the sampler
will sample in that order because fragments get executed out-of-order even
across draw calls even within a subslice (this is why sendc is needed at
end of shader in GEN).
I need to clarify this bit a little more. In the current setup we have only
one draw cycle in flight per context that uses sample engine. There may be
blitter calls in parallel (although I'm not sure) but they don't use sampling
engine anyway.
The draw cycle itself may have multiple draws programmed into it but as they
all are based on the same surface setup there is naturally no flushing problem.
Surfaces with auxiliary would be resolved before the draw and programmed
without auxiliary buffer.
The problem is that a single invalidate doesn't actually cause a
synchronization point in the rendering operation. All it does is torch the
texture cache and it does so immediately and in parallel with currently
active rendering. If you can't have ASTC5x5 in the texture cache with a
CCS_E surface, then we need to do a CS stall to ensure that the previous
draw is complete before we invalidate. Otherwise, if the draw with ASTC5x5
is still in-flight, the texture cache will immediately start to get
populated with ASTC5x5 data again.
Rogovin, Kevin
2017-12-05 18:34:42 UTC
Reply
Permalink
Raw Message
Hi,
This isn't true.  100% of the intel_mipmap_tree -> blorp_surf translations are handled by that function. 
It's a perfectly reasonable place to handle these things.  It could also be handled in genX(blorp_exec) if that makes someone more comfortable.
This is where I placed the ASTC enumeration setting, in genX(blorp_exec). I added a boolean signaling if the caller to blorp believed it would sample from an ASTC, if it did it sets the enum as ASTC otherwise as AUX.

I confess I am not super familiar with blorp, but as far as I can tell, the only way for blorp to read an ASTC is for GetTexImage/GetTexSubImage since an ASTC surface cannot be used for an FBO.
The problem is that a single invalidate doesn't actually cause a synchronization point in the rendering operation.  All it does is torch the texture cache and it
does so immediately and in parallel with currently active rendering.  If you can't have ASTC5x5 in the texture cache with a CCS_E surface, then we need to
do a CS stall to ensure that the previous draw is complete before we invalidate.  Otherwise, if the draw with ASTC5x5 is still in-flight, the texture cache will
immediately start to get populated with ASTC5x5 data again.
Ahh futz, I forget to add that stall.. by luck the guilty application worked anyways (when I first wrote the work I did intel_batchbuffer_flush() and relaxed it down to texture invalidate).
Jason Ekstrand
2017-12-05 18:48:17 UTC
Reply
Permalink
Raw Message
Hi,
Post by Jason Ekstrand
This isn't true. 100% of the intel_mipmap_tree -> blorp_surf
translations are handled by that function.
Post by Jason Ekstrand
It's a perfectly reasonable place to handle these things. It could
also be handled in genX(blorp_exec) if that makes someone more comfortable.
This is where I placed the ASTC enumeration setting, in genX(blorp_exec).
I added a boolean signaling if the caller to blorp believed it would sample
from an ASTC, if it did it sets the enum as ASTC otherwise as AUX.
I confess I am not super familiar with blorp, but as far as I can tell,
the only way for blorp to read an ASTC is for GetTexImage/GetTexSubImage
since an ASTC surface cannot be used for an FBO.
I'd have to think about it harder but even those are not likely to actually
get ASTC decode. Maybe for some sort of decompression thing but if you're
doing GetCompressedTexImage, it'll probably turn into a blorp_copy which
will use R32G32B32A32_UINT.
Post by Jason Ekstrand
The problem is that a single invalidate doesn't actually cause a
synchronization point in the rendering operation. All it does is torch the
texture cache and it
Post by Jason Ekstrand
does so immediately and in parallel with currently active rendering. If
you can't have ASTC5x5 in the texture cache with a CCS_E surface, then we
need to
Post by Jason Ekstrand
do a CS stall to ensure that the previous draw is complete before we
invalidate. Otherwise, if the draw with ASTC5x5 is still in-flight, the
texture cache will
Post by Jason Ekstrand
immediately start to get populated with ASTC5x5 data again.
Ahh futz, I forget to add that stall.. by luck the guilty application
worked anyways (when I first wrote the work I did intel_batchbuffer_flush()
and relaxed it down to texture invalidate).
Rogovin, Kevin
2017-12-07 07:45:29 UTC
Reply
Permalink
Raw Message
I'd have to think about it harder but even those are not likely to actually get ASTC decode.  Maybe for some sort of decompression thing but if you're doing
GetCompressedTexImage, it'll probably turn into a blorp_copy which will use R32G32B32A32_UINT.
I am thinking for the case where an application calls glGetTexImage() or glGetTextureImage(), which I think is legal in GL and the GL implementation is expected to do the decompress.

-Kevin
 

Rogovin, Kevin
2017-12-05 18:25:42 UTC
Reply
Permalink
Raw Message
Post by Pohjolainen, Topi
Are you saying that this bug extends over hardware context?
Different HW contexts imply different execbuffer2 ioctl's. The kernel inserts a full blown flush of everything after (or before, I cannot remember) each execbuffer2 call. This way there is context isolation in HW buggineness.

-Kevin
Loading...