Discussion:
[Mesa-dev] [PATCH] i965: Simplify MOCS mashing in genX_state_upload.c.
Kenneth Graunke
2017-08-23 22:57:37 UTC
Permalink
Instead of having a proliferation of generation checks and MOCS values,
we can just #define MOCS_ALL to the generation-specific value for "use
as many caches as possible" and use that in various places.

This should make it easier to change MOCS values, as there are fewer
places that need updating.
---
src/mesa/drivers/dri/i965/genX_state_upload.c | 26 +++++++++++++-------------
1 file changed, 13 insertions(+), 13 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/genX_state_upload.c b/src/mesa/drivers/dri/i965/genX_state_upload.c
index f1e9fa38ffc..f2bbe4e9897 100644
--- a/src/mesa/drivers/dri/i965/genX_state_upload.c
+++ b/src/mesa/drivers/dri/i965/genX_state_upload.c
@@ -143,6 +143,16 @@ KSP(struct brw_context *brw, uint32_t offset)
}
#endif

+#if GEN_GEN == 10
+#define MOCS_ALL CNL_MOCS_WB
+#elif GEN_GEN == 9
+#define MOCS_ALL SKL_MOCS_WB
+#elif GEN_GEN == 8
+#define MOCS_ALL BDW_MOCS_WB
+#elif GEN_GEN == 7
+#define MOCS_ALL GEN7_MOCS_L3
+#endif
+
#include "genxml/genX_pack.h"

#define _brw_cmd_length(cmd) cmd ## _length
@@ -323,6 +333,7 @@ genX(emit_vertex_buffer_state)(struct brw_context *brw,

#if GEN_GEN >= 7
.AddressModifyEnable = true,
+ .VertexBufferMOCS = MOCS_ALL,
#endif

#if GEN_GEN < 8
@@ -331,16 +342,6 @@ genX(emit_vertex_buffer_state)(struct brw_context *brw,
#if GEN_GEN >= 5
.EndAddress = ro_bo(bo, end_offset - 1),
#endif
-#endif
-
-#if GEN_GEN == 10
- .VertexBufferMOCS = CNL_MOCS_WB,
-#elif GEN_GEN == 9
- .VertexBufferMOCS = SKL_MOCS_WB,
-#elif GEN_GEN == 8
- .VertexBufferMOCS = BDW_MOCS_WB,
-#elif GEN_GEN == 7
- .VertexBufferMOCS = GEN7_MOCS_L3,
#endif
};

@@ -847,7 +848,7 @@ genX(emit_index_buffer)(struct brw_context *brw)
ib.IndexFormat = brw_get_index_type(index_buffer->index_size);
ib.BufferStartingAddress = ro_bo(brw->ib.bo, 0);
#if GEN_GEN >= 8
- ib.IndexBufferMOCS = GEN_GEN >= 9 ? SKL_MOCS_WB : BDW_MOCS_WB;
+ ib.IndexBufferMOCS = MOCS_ALL;
ib.BufferSize = brw->ib.size;
#else
ib.BufferEndingAddress = ro_bo(brw->ib.bo, brw->ib.size - 1);
@@ -3599,7 +3600,6 @@ genX(upload_3dstate_so_buffers)(struct brw_context *brw)
#else
struct brw_transform_feedback_object *brw_obj =
(struct brw_transform_feedback_object *) xfb_obj;
- uint32_t mocs_wb = GEN_GEN >= 9 ? SKL_MOCS_WB : BDW_MOCS_WB;
#endif

/* Set up the up to 4 output buffers. These are the ranges defined in the
@@ -3634,7 +3634,7 @@ genX(upload_3dstate_so_buffers)(struct brw_context *brw)
sob.SOBufferEnable = true;
sob.StreamOffsetWriteEnable = true;
sob.StreamOutputBufferOffsetAddressEnable = true;
- sob.SOBufferMOCS = mocs_wb;
+ sob.SOBufferMOCS = MOCS_ALL;

sob.SurfaceSize = MAX2(xfb_obj->Size[i] / 4, 1) - 1;
sob.StreamOutputBufferOffsetAddress =
--
2.14.1
Lionel Landwerlin
2017-08-24 11:04:26 UTC
Permalink
Looks good, but it looks like you could replace an additional one in
upload_push_constant_packets().
Also why not name it GEN_MOCS ? (so it's a bit more consistent with
other macros defined per gen).

Thanks!
Post by Kenneth Graunke
Instead of having a proliferation of generation checks and MOCS values,
we can just #define MOCS_ALL to the generation-specific value for "use
as many caches as possible" and use that in various places.
This should make it easier to change MOCS values, as there are fewer
places that need updating.
---
src/mesa/drivers/dri/i965/genX_state_upload.c | 26 +++++++++++++-------------
1 file changed, 13 insertions(+), 13 deletions(-)
diff --git a/src/mesa/drivers/dri/i965/genX_state_upload.c b/src/mesa/drivers/dri/i965/genX_state_upload.c
index f1e9fa38ffc..f2bbe4e9897 100644
--- a/src/mesa/drivers/dri/i965/genX_state_upload.c
+++ b/src/mesa/drivers/dri/i965/genX_state_upload.c
@@ -143,6 +143,16 @@ KSP(struct brw_context *brw, uint32_t offset)
}
#endif
+#if GEN_GEN == 10
+#define MOCS_ALL CNL_MOCS_WB
+#elif GEN_GEN == 9
+#define MOCS_ALL SKL_MOCS_WB
+#elif GEN_GEN == 8
+#define MOCS_ALL BDW_MOCS_WB
+#elif GEN_GEN == 7
+#define MOCS_ALL GEN7_MOCS_L3
+#endif
+
#include "genxml/genX_pack.h"
#define _brw_cmd_length(cmd) cmd ## _length
@@ -323,6 +333,7 @@ genX(emit_vertex_buffer_state)(struct brw_context *brw,
#if GEN_GEN >= 7
.AddressModifyEnable = true,
+ .VertexBufferMOCS = MOCS_ALL,
#endif
#if GEN_GEN < 8
@@ -331,16 +342,6 @@ genX(emit_vertex_buffer_state)(struct brw_context *brw,
#if GEN_GEN >= 5
.EndAddress = ro_bo(bo, end_offset - 1),
#endif
-#endif
-
-#if GEN_GEN == 10
- .VertexBufferMOCS = CNL_MOCS_WB,
-#elif GEN_GEN == 9
- .VertexBufferMOCS = SKL_MOCS_WB,
-#elif GEN_GEN == 8
- .VertexBufferMOCS = BDW_MOCS_WB,
-#elif GEN_GEN == 7
- .VertexBufferMOCS = GEN7_MOCS_L3,
#endif
};
@@ -847,7 +848,7 @@ genX(emit_index_buffer)(struct brw_context *brw)
ib.IndexFormat = brw_get_index_type(index_buffer->index_size);
ib.BufferStartingAddress = ro_bo(brw->ib.bo, 0);
#if GEN_GEN >= 8
- ib.IndexBufferMOCS = GEN_GEN >= 9 ? SKL_MOCS_WB : BDW_MOCS_WB;
+ ib.IndexBufferMOCS = MOCS_ALL;
ib.BufferSize = brw->ib.size;
#else
ib.BufferEndingAddress = ro_bo(brw->ib.bo, brw->ib.size - 1);
@@ -3599,7 +3600,6 @@ genX(upload_3dstate_so_buffers)(struct brw_context *brw)
#else
struct brw_transform_feedback_object *brw_obj =
(struct brw_transform_feedback_object *) xfb_obj;
- uint32_t mocs_wb = GEN_GEN >= 9 ? SKL_MOCS_WB : BDW_MOCS_WB;
#endif
/* Set up the up to 4 output buffers. These are the ranges defined in the
@@ -3634,7 +3634,7 @@ genX(upload_3dstate_so_buffers)(struct brw_context *brw)
sob.SOBufferEnable = true;
sob.StreamOffsetWriteEnable = true;
sob.StreamOutputBufferOffsetAddressEnable = true;
- sob.SOBufferMOCS = mocs_wb;
+ sob.SOBufferMOCS = MOCS_ALL;
sob.SurfaceSize = MAX2(xfb_obj->Size[i] / 4, 1) - 1;
sob.StreamOutputBufferOffsetAddress =
Kenneth Graunke
2017-08-24 15:15:37 UTC
Permalink
Post by Lionel Landwerlin
Looks good, but it looks like you could replace an additional one in
upload_push_constant_packets().
That one is a bit weird - it uses 0 on Gen8+. I've wondered about that,
actually - the docs claim that you must use 0 - but at least on Skylake,
0 is an entry in the table that means uncached. So is the requirement
that the bits be 0, or the requirement that you bypass caching?

Things we'll never know I guess. I'm not sure if it matters, though,
since it's just pulling the data into a segment of the L3 anyway...so
it's only read one time...

At any rate, I left it open coded because it's different than the others.
Post by Lionel Landwerlin
Also why not name it GEN_MOCS ? (so it's a bit more consistent with
other macros defined per gen).
Thanks!
I like this. Changed locally.
Lionel Landwerlin
2017-08-24 15:22:01 UTC
Permalink
Post by Kenneth Graunke
Post by Lionel Landwerlin
Looks good, but it looks like you could replace an additional one in
upload_push_constant_packets().
That one is a bit weird - it uses 0 on Gen8+. I've wondered about that,
actually - the docs claim that you must use 0 - but at least on Skylake,
0 is an entry in the table that means uncached. So is the requirement
that the bits be 0, or the requirement that you bypass caching?
Things we'll never know I guess. I'm not sure if it matters, though,
since it's just pulling the data into a segment of the L3 anyway...so
it's only read one time...
At any rate, I left it open coded because it's different than the others.
Post by Lionel Landwerlin
Also why not name it GEN_MOCS ? (so it's a bit more consistent with
other macros defined per gen).
Thanks!
I like this. Changed locally.
Reviewed-by: Lionel Landwerlin <***@intel.com>

Loading...