Discussion:
[RFC] r600/evergreen compute shader + glsl 4.30 support
Dave Airlie
2017-11-29 04:36:09 UTC
This set of patches enables compute shaders on r600 and exposes GLSL 4.30
support. They are pretty alpha level, but I'd like to land some of them
(maybe disabled) so I can avoid the rebasing fun with the more intrusive
ones.

It is based on the previous ssbo support patch.

It may not be stable. I have a few patches sitting on top locally that
flush various things; I want to figure out whether they are required or
whether I can fix things properly.

For some reason it fails to launch compute on Cayman and hangs instead.
I've got some traces from fglrx and just need to take the time to work out
what crashes. I've tested it mostly on CAICOS.

Along with robustness, I think it also lets us expose GLES 3.1, so we get
deqp to test stuff. (deqp *compute* is at about 90% passing.)

Dave.
Dave Airlie
2017-11-29 04:36:10 UTC
From: Dave Airlie <***@redhat.com>

Just reuse the cs atomics bit and emit the hw atomic state.
---
src/mesa/state_tracker/st_atom_atomicbuf.c | 4 ++++
src/mesa/state_tracker/st_context.c | 2 +-
2 files changed, 5 insertions(+), 1 deletion(-)

diff --git a/src/mesa/state_tracker/st_atom_atomicbuf.c b/src/mesa/state_tracker/st_atom_atomicbuf.c
index d01c227..eda9e51 100644
--- a/src/mesa/state_tracker/st_atom_atomicbuf.c
+++ b/src/mesa/state_tracker/st_atom_atomicbuf.c
@@ -123,6 +123,10 @@ st_bind_tes_atomics(struct st_context *st)
void
st_bind_cs_atomics(struct st_context *st)
{
+ if (st->has_hw_atomics) {
+ st_bind_hw_atomic_buffers(st);
+ return;
+ }
struct gl_program *prog =
st->ctx->_Shader->CurrentProgram[MESA_SHADER_COMPUTE];

diff --git a/src/mesa/state_tracker/st_context.c b/src/mesa/state_tracker/st_context.c
index da1cca4..7564a53 100644
--- a/src/mesa/state_tracker/st_context.c
+++ b/src/mesa/state_tracker/st_context.c
@@ -302,7 +302,7 @@ st_init_driver_flags(struct st_context *st)
/* Shader resources */
f->NewTextureBuffer = ST_NEW_SAMPLER_VIEWS;
if (st->has_hw_atomics)
- f->NewAtomicBuffer = ST_NEW_HW_ATOMICS;
+ f->NewAtomicBuffer = ST_NEW_HW_ATOMICS | ST_NEW_CS_ATOMICS;
else
f->NewAtomicBuffer = ST_NEW_ATOMIC_BUFFER;
f->NewShaderStorageBuffer = ST_NEW_STORAGE_BUFFER;
--
2.9.5
Nicolai Hähnle
2017-11-30 15:22:03 UTC
Post by Dave Airlie
Just reuse the cs atomics bit and emit the hw atomic state.
--
Learn how the world really is,
but never forget how it should be.
Dave Airlie
2017-11-29 04:36:11 UTC
From: Dave Airlie <***@redhat.com>

This just lets us see packets marked for compute.

Signed-off-by: Dave Airlie <***@redhat.com>
---
src/gallium/drivers/r600/eg_debug.c | 13 +++++++------
1 file changed, 7 insertions(+), 6 deletions(-)

diff --git a/src/gallium/drivers/r600/eg_debug.c b/src/gallium/drivers/r600/eg_debug.c
index 43c4f41..ceb7c16 100644
--- a/src/gallium/drivers/r600/eg_debug.c
+++ b/src/gallium/drivers/r600/eg_debug.c
@@ -148,6 +148,7 @@ static uint32_t *ac_parse_packet3(FILE *f, uint32_t *ib, int *num_dw,
unsigned count = PKT_COUNT_G(ib[0]);
unsigned op = PKT3_IT_OPCODE_G(ib[0]);
const char *predicate = PKT3_PREDICATE(ib[0]) ? "(predicate)" : "";
+ const char *compute_mode = (ib[0] & 0x2) ? "(C)" : "";
int i;

/* Print the name first. */
@@ -162,14 +163,14 @@ static uint32_t *ac_parse_packet3(FILE *f, uint32_t *ib, int *num_dw,
op == PKT3_SET_CONFIG_REG ||
op == PKT3_SET_UCONFIG_REG ||
op == PKT3_SET_SH_REG)
- fprintf(f, COLOR_CYAN "%s%s" COLOR_CYAN ":\n",
- name, predicate);
+ fprintf(f, COLOR_CYAN "%s%s%s" COLOR_CYAN ":\n",
+ name, compute_mode, predicate);
else
- fprintf(f, COLOR_GREEN "%s%s" COLOR_RESET ":\n",
- name, predicate);
+ fprintf(f, COLOR_GREEN "%s%s%s" COLOR_RESET ":\n",
+ name, compute_mode, predicate);
} else
- fprintf(f, COLOR_RED "PKT3_UNKNOWN 0x%x%s" COLOR_RESET ":\n",
- op, predicate);
+ fprintf(f, COLOR_RED "PKT3_UNKNOWN 0x%x%s%s" COLOR_RESET ":\n",
+ op, compute_mode, predicate);

/* Print the contents. */
switch (op) {
--
2.9.5
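The debug change above can be sketched in isolation as follows. This is only an illustration, not the driver's code: the SKETCH_* macros are hypothetical stand-ins for the real PKT3 accessors, the header layout (type in bits 31:30, predicate in bit 0) is an assumption, and the compute bit is simply the 0x2 mask that the patch tests.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Assumed type-3 packet header fields; bit 1 is the 0x2 mask from the patch. */
#define SKETCH_PKT_TYPE(h)       (((h) >> 30) & 0x3u)
#define SKETCH_PKT3_COMPUTE(h)   (((h) >> 1) & 0x1u)
#define SKETCH_PKT3_PREDICATE(h) ((h) & 0x1u)

/* Build "<name>(C)(predicate)", keeping the suffix order used by the patched
 * fprintf calls. Returns the number of characters snprintf wanted to write. */
static int sketch_format_pkt3(char *out, size_t len, const char *name,
                              uint32_t header)
{
    assert(SKETCH_PKT_TYPE(header) == 3); /* only type-3 headers are handled */
    return snprintf(out, len, "%s%s%s", name,
                    SKETCH_PKT3_COMPUTE(header) ? "(C)" : "",
                    SKETCH_PKT3_PREDICATE(header) ? "(predicate)" : "");
}
```

With this, a header that has bit 1 set is printed with a "(C)" suffix, so compute-mode packets can be told apart from graphics packets in a dump.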
Dave Airlie
2017-11-29 04:36:16 UTC
From: Dave Airlie <***@redhat.com>

---
src/gallium/drivers/r600/r600_state_common.c | 2 ++
1 file changed, 2 insertions(+)

diff --git a/src/gallium/drivers/r600/r600_state_common.c b/src/gallium/drivers/r600/r600_state_common.c
index b6a4728..e312b33 100644
--- a/src/gallium/drivers/r600/r600_state_common.c
+++ b/src/gallium/drivers/r600/r600_state_common.c
@@ -820,6 +820,8 @@ static inline void r600_shader_selector_key(const struct pipe_context *ctx,
key->tcs.prim_mode = rctx->tes_shader->info.properties[TGSI_PROPERTY_TES_PRIM_MODE];
key->tcs.first_atomic_counter = r600_get_hw_atomic_count(ctx, PIPE_SHADER_TESS_CTRL);
break;
+ case PIPE_SHADER_COMPUTE:
+ break;
default:
assert(0);
}
--
2.9.5
Dave Airlie
2017-11-29 04:36:14 UTC
From: Dave Airlie <***@redhat.com>

---
src/gallium/drivers/r600/evergreen_compute.c | 13 -------------
1 file changed, 13 deletions(-)

diff --git a/src/gallium/drivers/r600/evergreen_compute.c b/src/gallium/drivers/r600/evergreen_compute.c
index 7831b43..ff51ea3 100644
--- a/src/gallium/drivers/r600/evergreen_compute.c
+++ b/src/gallium/drivers/r600/evergreen_compute.c
@@ -898,11 +898,6 @@ void evergreen_init_atom_start_compute_cs(struct r600_context *rctx)
r600_init_command_buffer(cb, 256);
cb->pkt_flags = RADEON_CP_PACKET3_COMPUTE_MODE;

- /* This must be first. */
- r600_store_value(cb, PKT3(PKT3_CONTEXT_CONTROL, 1, 0));
- r600_store_value(cb, 0x80000000);
- r600_store_value(cb, 0x80000000);
-
/* We're setting config registers here. */
r600_store_value(cb, PKT3(PKT3_EVENT_WRITE, 0, 0));
r600_store_value(cb, EVENT_TYPE(EVENT_TYPE_CS_PARTIAL_FLUSH) | EVENT_INDEX(4));
@@ -952,14 +947,6 @@ void evergreen_init_atom_start_compute_cs(struct r600_context *rctx)
break;
}

- /* Config Registers */
- if (rctx->b.chip_class < CAYMAN)
- evergreen_init_common_regs(rctx, cb, rctx->b.chip_class, rctx->b.family,
- rctx->screen->b.info.drm_minor);
- else
- cayman_init_common_regs(cb, rctx->b.chip_class, rctx->b.family,
- rctx->screen->b.info.drm_minor);
-
/* The primitive type always needs to be POINTLIST for compute. */
r600_store_config_reg(cb, R_008958_VGT_PRIMITIVE_TYPE,
V_008958_DI_PT_POINTLIST);
--
2.9.5
Dave Airlie
2017-11-29 04:36:13 UTC
From: Dave Airlie <***@redhat.com>

This appears to be bad, compute shaders hang without it.
---
src/gallium/drivers/r600/r600_shader.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/gallium/drivers/r600/r600_shader.c b/src/gallium/drivers/r600/r600_shader.c
index 41af0f5..e72215f 100644
--- a/src/gallium/drivers/r600/r600_shader.c
+++ b/src/gallium/drivers/r600/r600_shader.c
@@ -3839,7 +3839,7 @@ static int r600_shader_from_tgsi(struct r600_context *rctx,
last = r600_isa_cf(ctx.bc->cf_last->op);

/* alu clause instructions don't have EOP bit, so add NOP */
- if (!last || last->flags & CF_ALU)
+ if (!last || last->flags & CF_ALU || ctx.bc->cf_last->op == CF_OP_LOOP_END || ctx.bc->cf_last->op == CF_OP_POP)
r600_bytecode_add_cfinst(ctx.bc, CF_OP_NOP);

ctx.bc->cf_last->end_of_program = 1;
--
2.9.5
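The condition changed by the patch above can be sketched on its own. This is a simplified model under stated assumptions: the sketch_* types and enum values are hypothetical and do not match the real r600 bytecode structures, and "no last instruction" stands in for the `!last` case in the diff. It shows a NOP being appended so that the end-of-program marker never lands on an ALU clause, a LOOP_END or a POP.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the r600 CF opcodes named in the patch. */
enum sketch_cf_op {
    SKETCH_CF_OP_NOP,
    SKETCH_CF_OP_ALU,
    SKETCH_CF_OP_TEX,
    SKETCH_CF_OP_LOOP_END,
    SKETCH_CF_OP_POP
};

struct sketch_cf {
    enum sketch_cf_op op;
    bool is_alu_clause; /* models the CF_ALU flag test */
};

#define SKETCH_MAX_CF 8

struct sketch_program {
    struct sketch_cf cf[SKETCH_MAX_CF];
    size_t count;
};

/* True when the last CF instruction must not carry end_of_program. */
static bool sketch_needs_trailing_nop(const struct sketch_cf *last)
{
    if (last == NULL || last->is_alu_clause)
        return true;
    return last->op == SKETCH_CF_OP_LOOP_END || last->op == SKETCH_CF_OP_POP;
}

/* Append a NOP if required; return the index that gets end_of_program. */
static size_t sketch_finish_program(struct sketch_program *p)
{
    const struct sketch_cf *last = p->count ? &p->cf[p->count - 1] : NULL;

    if (sketch_needs_trailing_nop(last)) {
        assert(p->count < SKETCH_MAX_CF);
        p->cf[p->count].op = SKETCH_CF_OP_NOP;
        p->cf[p->count].is_alu_clause = false;
        p->count++;
    }
    return p->count - 1;
}
```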
Dave Airlie
2017-11-29 04:36:18 UTC
From: Dave Airlie <***@redhat.com>

---
src/gallium/drivers/r600/r600_shader.c | 14 ++++++++++++++
1 file changed, 14 insertions(+)

diff --git a/src/gallium/drivers/r600/r600_shader.c b/src/gallium/drivers/r600/r600_shader.c
index 83b70b0..b3c29b9 100644
--- a/src/gallium/drivers/r600/r600_shader.c
+++ b/src/gallium/drivers/r600/r600_shader.c
@@ -190,6 +190,7 @@ int r600_pipe_shader_create(struct pipe_context *ctx,
}
use_sb &= (shader->shader.processor_type != PIPE_SHADER_TESS_CTRL);
use_sb &= (shader->shader.processor_type != PIPE_SHADER_TESS_EVAL);
+ use_sb &= (shader->shader.processor_type != PIPE_SHADER_COMPUTE);

/* disable SB for shaders using doubles */
use_sb &= !shader->shader.uses_doubles;
@@ -279,6 +280,9 @@ int r600_pipe_shader_create(struct pipe_context *ctx,
r600_update_ps_state(ctx, shader);
}
break;
+ case PIPE_SHADER_COMPUTE:
+ evergreen_update_ls_state(ctx, shader);
+ break;
default:
r = -EINVAL;
goto error;
@@ -1361,6 +1365,10 @@ static void tgsi_src(struct r600_shader_ctx *ctx,
r600_src->swizzle[2] = 0;
r600_src->swizzle[3] = 0;
r600_src->sel = 0;
+ } else if (ctx->info.system_value_semantic_name[tgsi_src->Register.Index] == TGSI_SEMANTIC_THREAD_ID) {
+ r600_src->sel = 0;
+ } else if (ctx->info.system_value_semantic_name[tgsi_src->Register.Index] == TGSI_SEMANTIC_BLOCK_ID) {
+ r600_src->sel = 1;
} else if (ctx->type != PIPE_SHADER_TESS_CTRL && ctx->info.system_value_semantic_name[tgsi_src->Register.Index] == TGSI_SEMANTIC_INVOCATIONID) {
r600_src->swizzle[0] = 3;
r600_src->swizzle[1] = 3;
@@ -3109,6 +3117,10 @@ static int r600_shader_from_tgsi(struct r600_context *rctx,
shader->rat_base = key.ps.nr_cbufs;
shader->image_size_const_offset = key.ps.image_size_const_offset;
break;
+ case PIPE_SHADER_COMPUTE:
+ shader->rat_base = 0;
+ shader->image_size_const_offset = 0;
+ break;
default:
break;
}
@@ -3193,6 +3205,8 @@ static int r600_shader_from_tgsi(struct r600_context *rctx,
if (add_tess_inout)
ctx.file_offset[TGSI_FILE_INPUT]+=2;
}
+ if (ctx.type == PIPE_SHADER_COMPUTE)
+ ctx.file_offset[TGSI_FILE_INPUT] = 2;

ctx.file_offset[TGSI_FILE_OUTPUT] =
ctx.file_offset[TGSI_FILE_INPUT] +
--
2.9.5
Dave Airlie
2017-11-29 04:36:19 UTC
From: Dave Airlie <***@redhat.com>

---
src/gallium/drivers/r600/r600_pipe_common.c | 4 ++++
1 file changed, 4 insertions(+)

diff --git a/src/gallium/drivers/r600/r600_pipe_common.c b/src/gallium/drivers/r600/r600_pipe_common.c
index 23f7d74..b611783 100644
--- a/src/gallium/drivers/r600/r600_pipe_common.c
+++ b/src/gallium/drivers/r600/r600_pipe_common.c
@@ -993,6 +993,10 @@ const char *r600_get_llvm_processor_name(enum radeon_family family)
static unsigned get_max_threads_per_block(struct r600_common_screen *screen,
enum pipe_shader_ir ir_type)
{
+ if (ir_type != PIPE_SHADER_IR_TGSI)
+ return 256;
+ if (screen->chip_class >= EVERGREEN)
+ return 2048;
return 256;
}
--
2.9.5
Dave Airlie
2017-11-29 04:36:15 UTC
From: Dave Airlie <***@redhat.com>

This just adds the compute paths to the state handling for the
main objects.
---
src/gallium/drivers/r600/evergreen_state.c | 79 ++++++++++++++++++++++------
src/gallium/drivers/r600/r600_hw_context.c | 2 +
src/gallium/drivers/r600/r600_pipe.h | 6 ++-
src/gallium/drivers/r600/r600_state_common.c | 4 +-
4 files changed, 72 insertions(+), 19 deletions(-)

diff --git a/src/gallium/drivers/r600/evergreen_state.c b/src/gallium/drivers/r600/evergreen_state.c
index 4a5c1aa..fe4892a 100644
--- a/src/gallium/drivers/r600/evergreen_state.c
+++ b/src/gallium/drivers/r600/evergreen_state.c
@@ -1683,14 +1683,13 @@ static void evergreen_emit_msaa_state(struct r600_context *rctx, int nr_samples,
}

static void evergreen_emit_image_state(struct r600_context *rctx, struct r600_atom *atom,
- int immed_id_base, int res_id_base, int offset)
+ int immed_id_base, int res_id_base, int offset, uint32_t pkt_flags)
{
struct r600_image_state *state = (struct r600_image_state *)atom;
struct pipe_framebuffer_state *fb_state = &rctx->framebuffer.state;
struct radeon_winsys_cs *cs = rctx->b.gfx.cs;
struct r600_texture *rtex;
struct r600_resource *resource;
- uint32_t pkt_flags = 0;
int i;

for (i = 0; i < R600_MAX_IMAGES; i++) {
@@ -1698,7 +1697,8 @@ static void evergreen_emit_image_state(struct r600_context *rctx, struct r600_at
unsigned reloc, immed_reloc;
int idx = i + offset;

- idx += fb_state->nr_cbufs + (rctx->dual_src_blend ? 1 : 0);
+ if (!pkt_flags)
+ idx += fb_state->nr_cbufs + (rctx->dual_src_blend ? 1 : 0);
if (!image->base.resource)
continue;

@@ -1720,7 +1720,10 @@ static void evergreen_emit_image_state(struct r600_context *rctx, struct r600_at
RADEON_USAGE_READWRITE,
RADEON_PRIO_SHADER_RW_BUFFER);

- radeon_set_context_reg_seq(cs, R_028C60_CB_COLOR0_BASE + idx * 0x3C, 13);
+ if (pkt_flags)
+ radeon_compute_set_context_reg_seq(cs, R_028C60_CB_COLOR0_BASE + idx * 0x3C, 13);
+ else
+ radeon_set_context_reg_seq(cs, R_028C60_CB_COLOR0_BASE + idx * 0x3C, 13);

radeon_emit(cs, image->cb_color_base); /* R_028C60_CB_COLOR0_BASE */
radeon_emit(cs, image->cb_color_pitch); /* R_028C64_CB_COLOR0_PITCH */
@@ -1748,7 +1751,11 @@ static void evergreen_emit_image_state(struct r600_context *rctx, struct r600_at
radeon_emit(cs, PKT3(PKT3_NOP, 0, 0)); /* R_028C84_CB_COLOR0_FMASK */
radeon_emit(cs, reloc);

- radeon_set_context_reg(cs, R_028B9C_CB_IMMED0_BASE + (idx * 4), resource->immed_buffer->gpu_address >> 8);
+ if (pkt_flags)
+ radeon_compute_set_context_reg(cs, R_028B9C_CB_IMMED0_BASE + (idx * 4), resource->immed_buffer->gpu_address >> 8);
+ else
+ radeon_set_context_reg(cs, R_028B9C_CB_IMMED0_BASE + (idx * 4), resource->immed_buffer->gpu_address >> 8);
+
radeon_emit(cs, PKT3(PKT3_NOP, 0, 0)); /**/
radeon_emit(cs, immed_reloc);

@@ -1777,7 +1784,15 @@ static void evergreen_emit_fragment_image_state(struct r600_context *rctx, struc
{
evergreen_emit_image_state(rctx, atom,
R600_IMAGE_IMMED_RESOURCE_OFFSET,
- R600_IMAGE_REAL_RESOURCE_OFFSET, 0);
+ R600_IMAGE_REAL_RESOURCE_OFFSET, 0, 0);
+}
+
+static void evergreen_emit_compute_image_state(struct r600_context *rctx, struct r600_atom *atom)
+{
+ evergreen_emit_image_state(rctx, atom,
+ EG_FETCH_CONSTANTS_OFFSET_CS + R600_IMAGE_IMMED_RESOURCE_OFFSET,
+ EG_FETCH_CONSTANTS_OFFSET_CS + R600_IMAGE_REAL_RESOURCE_OFFSET,
+ 0, RADEON_CP_PACKET3_COMPUTE_MODE);
}

static void evergreen_emit_fragment_buffer_state(struct r600_context *rctx, struct r600_atom *atom)
@@ -1785,7 +1800,16 @@ static void evergreen_emit_fragment_buffer_state(struct r600_context *rctx, stru
int offset = util_bitcount(rctx->fragment_images.enabled_mask);
evergreen_emit_image_state(rctx, atom,
R600_IMAGE_IMMED_RESOURCE_OFFSET,
- R600_IMAGE_REAL_RESOURCE_OFFSET, offset);
+ R600_IMAGE_REAL_RESOURCE_OFFSET, offset, 0);
+}
+
+static void evergreen_emit_compute_buffer_state(struct r600_context *rctx, struct r600_atom *atom)
+{
+ int offset = util_bitcount(rctx->compute_images.enabled_mask);
+ evergreen_emit_image_state(rctx, atom,
+ EG_FETCH_CONSTANTS_OFFSET_CS + R600_IMAGE_IMMED_RESOURCE_OFFSET,
+ EG_FETCH_CONSTANTS_OFFSET_CS + R600_IMAGE_REAL_RESOURCE_OFFSET,
+ offset, RADEON_CP_PACKET3_COMPUTE_MODE);
}

static void evergreen_emit_framebuffer_state(struct r600_context *rctx, struct r600_atom *atom)
@@ -2323,7 +2347,7 @@ static void evergreen_emit_ps_sampler_views(struct r600_context *rctx, struct r6
static void evergreen_emit_cs_sampler_views(struct r600_context *rctx, struct r600_atom *atom)
{
evergreen_emit_sampler_views(rctx, &rctx->samplers[PIPE_SHADER_COMPUTE].views,
- EG_FETCH_CONSTANTS_OFFSET_CS + 2, RADEON_CP_PACKET3_COMPUTE_MODE);
+ EG_FETCH_CONSTANTS_OFFSET_CS + R600_MAX_CONST_BUFFERS, RADEON_CP_PACKET3_COMPUTE_MODE);
}

static void evergreen_emit_sampler_states(struct r600_context *rctx,
@@ -3900,11 +3924,14 @@ static void evergreen_set_shader_buffers(struct pipe_context *ctx,
unsigned old_mask;
bool skip_reloc = false;

- if (shader != PIPE_SHADER_FRAGMENT && count == 0)
+ if (shader != PIPE_SHADER_FRAGMENT &&
+ shader != PIPE_SHADER_COMPUTE && count == 0)
return;

- assert(shader == PIPE_SHADER_FRAGMENT);
- istate = &rctx->fragment_buffers;
+ if (shader == PIPE_SHADER_FRAGMENT)
+ istate = &rctx->fragment_buffers;
+ else if (shader == PIPE_SHADER_COMPUTE)
+ istate = &rctx->compute_buffers;

old_mask = istate->enabled_mask;
for (i = start_slot, idx = 0; i < start_slot + count; i++, idx++) {
@@ -4012,12 +4039,16 @@ static void evergreen_set_shader_images(struct pipe_context *ctx,
bool skip_reloc = false;
struct r600_image_state *istate = NULL;
int idx;
- if (shader != PIPE_SHADER_FRAGMENT && count == 0)
+ if (shader != PIPE_SHADER_FRAGMENT && shader != PIPE_SHADER_COMPUTE && count == 0)
return;

- istate = &rctx->fragment_images;
+ if (shader == PIPE_SHADER_FRAGMENT)
+ istate = &rctx->fragment_images;
+ else if (shader == PIPE_SHADER_COMPUTE)
+ istate = &rctx->compute_images;
+
+ assert (shader == PIPE_SHADER_FRAGMENT || shader == PIPE_SHADER_COMPUTE);

- assert (shader == PIPE_SHADER_FRAGMENT);
old_mask = istate->enabled_mask;
for (i = start_slot, idx = 0; i < start_slot + count; i++, idx++) {
unsigned res_type;
@@ -4202,7 +4233,9 @@ void evergreen_init_state_functions(struct r600_context *rctx)
}
r600_init_atom(rctx, &rctx->framebuffer.atom, id++, evergreen_emit_framebuffer_state, 0);
r600_init_atom(rctx, &rctx->fragment_images.atom, id++, evergreen_emit_fragment_image_state, 0);
+ r600_init_atom(rctx, &rctx->compute_images.atom, id++, evergreen_emit_compute_image_state, 0);
r600_init_atom(rctx, &rctx->fragment_buffers.atom, id++, evergreen_emit_fragment_buffer_state, 0);
+ r600_init_atom(rctx, &rctx->compute_buffers.atom, id++, evergreen_emit_compute_buffer_state, 0);
/* shader const */
r600_init_atom(rctx, &rctx->constbuf_state[PIPE_SHADER_VERTEX].atom, id++, evergreen_emit_vs_constant_buffers, 0);
r600_init_atom(rctx, &rctx->constbuf_state[PIPE_SHADER_GEOMETRY].atom, id++, evergreen_emit_gs_constant_buffers, 0);
@@ -4581,6 +4614,7 @@ void eg_trace_emit(struct r600_context *rctx)
}

bool evergreen_emit_atomic_buffer_setup(struct r600_context *rctx,
+ struct r600_pipe_shader *cs_shader,
struct r600_shader_atomic *combined_atomics,
uint8_t *atomic_used_mask_p)
{
@@ -4589,12 +4623,19 @@ bool evergreen_emit_atomic_buffer_setup(struct r600_context *rctx,
unsigned pkt_flags = 0;
uint8_t atomic_used_mask = 0;
int i, j, k;
+ bool is_compute = cs_shader ? true : false;

- for (i = 0; i < EG_NUM_HW_STAGES; i++) {
+ if (is_compute)
+ pkt_flags = RADEON_CP_PACKET3_COMPUTE_MODE;
+
+ for (i = 0; i < (is_compute ? 1 : EG_NUM_HW_STAGES); i++) {
uint8_t num_atomic_stage;
struct r600_pipe_shader *pshader;

- pshader = rctx->hw_shader_stages[i].shader;
+ if (is_compute)
+ pshader = cs_shader;
+ else
+ pshader = rctx->hw_shader_stages[i].shader;
if (!pshader)
continue;

@@ -4647,6 +4688,7 @@ bool evergreen_emit_atomic_buffer_setup(struct r600_context *rctx,
}

void evergreen_emit_atomic_buffer_save(struct r600_context *rctx,
+ bool is_compute,
struct r600_shader_atomic *combined_atomics,
uint8_t *atomic_used_mask_p)
{
@@ -4658,6 +4700,11 @@ void evergreen_emit_atomic_buffer_save(struct r600_context *rctx,
uint64_t dst_offset;
unsigned reloc;

+ if (is_compute) {
+ pkt_flags = RADEON_CP_PACKET3_COMPUTE_MODE;
+ event = EVENT_TYPE_CS_DONE;
+ }
+
mask = *atomic_used_mask_p;
if (!mask)
return;
diff --git a/src/gallium/drivers/r600/r600_hw_context.c b/src/gallium/drivers/r600/r600_hw_context.c
index d9e4123..a4b3b66 100644
--- a/src/gallium/drivers/r600/r600_hw_context.c
+++ b/src/gallium/drivers/r600/r600_hw_context.c
@@ -351,6 +351,8 @@ void r600_begin_new_cs(struct r600_context *ctx)
if (ctx->b.chip_class >= EVERGREEN) {
r600_mark_atom_dirty(ctx, &ctx->fragment_images.atom);
r600_mark_atom_dirty(ctx, &ctx->fragment_buffers.atom);
+ r600_mark_atom_dirty(ctx, &ctx->compute_images.atom);
+ r600_mark_atom_dirty(ctx, &ctx->compute_buffers.atom);
}
r600_mark_atom_dirty(ctx, &ctx->hw_shader_stages[R600_HW_STAGE_PS].atom);
r600_mark_atom_dirty(ctx, &ctx->poly_offset_state.atom);
diff --git a/src/gallium/drivers/r600/r600_pipe.h b/src/gallium/drivers/r600/r600_pipe.h
index e54fada..711accc 100644
--- a/src/gallium/drivers/r600/r600_pipe.h
+++ b/src/gallium/drivers/r600/r600_pipe.h
@@ -38,7 +38,7 @@

#include "tgsi/tgsi_scan.h"

-#define R600_NUM_ATOMS 54
+#define R600_NUM_ATOMS 56

#define R600_MAX_IMAGES 8
/*
@@ -522,7 +522,9 @@ struct r600_context {
struct r600_atomic_buffer_state atomic_buffer_state;
/* only have images on fragment shader */
struct r600_image_state fragment_images;
+ struct r600_image_state compute_images;
struct r600_image_state fragment_buffers;
+ struct r600_image_state compute_buffers;
/* Shaders and shader resources. */
struct r600_cso_state vertex_fetch_shader;
struct r600_shader_state hw_shader_stages[EG_NUM_HW_STAGES];
@@ -1023,9 +1025,11 @@ void eg_dump_debug_state(struct pipe_context *ctx, FILE *f,

struct r600_shader_atomic;
bool evergreen_emit_atomic_buffer_setup(struct r600_context *rctx,
+ struct r600_pipe_shader *cs_shader,
struct r600_shader_atomic *combined_atomics,
uint8_t *atomic_used_mask_p);
void evergreen_emit_atomic_buffer_save(struct r600_context *rctx,
+ bool is_compute,
struct r600_shader_atomic *combined_atomics,
uint8_t *atomic_used_mask_p);

diff --git a/src/gallium/drivers/r600/r600_state_common.c b/src/gallium/drivers/r600/r600_state_common.c
index 3b2f445..b6a4728 100644
--- a/src/gallium/drivers/r600/r600_state_common.c
+++ b/src/gallium/drivers/r600/r600_state_common.c
@@ -1891,7 +1891,7 @@ static void r600_draw_vbo(struct pipe_context *ctx, const struct pipe_draw_info
: info->mode;

if (rctx->b.chip_class >= EVERGREEN)
- evergreen_emit_atomic_buffer_setup(rctx, combined_atomics, &atomic_used_mask);
+ evergreen_emit_atomic_buffer_setup(rctx, NULL, combined_atomics, &atomic_used_mask);

if (index_size) {
index_offset += info->start * index_size;
@@ -2175,7 +2175,7 @@ static void r600_draw_vbo(struct pipe_context *ctx, const struct pipe_draw_info


if (rctx->b.chip_class >= EVERGREEN)
- evergreen_emit_atomic_buffer_save(rctx, combined_atomics, &atomic_used_mask);
+ evergreen_emit_atomic_buffer_save(rctx, false, combined_atomics, &atomic_used_mask);

if (rctx->trace_buf)
eg_trace_emit(rctx);
--
2.9.5
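The slot calculation in evergreen_emit_image_state() after this patch can be sketched roughly as below. The function names are hypothetical and the sketch leaves out everything except the index arithmetic; the base address is the one implied by the register name R_028C60_CB_COLOR0_BASE and the 0x3C stride is the one used in the diff. On the graphics path images are placed after the bound colorbuffers (plus one slot for dual-source blending), while on the compute path they start at slot 0.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_CB_COLOR0_BASE 0x028C60u /* from R_028C60_CB_COLOR0_BASE */
#define SKETCH_CB_STRIDE      0x3Cu     /* per-colorbuffer stride in the patch */

/* Slot used for image i; mirrors the "if (!pkt_flags)" adjustment. */
static int sketch_image_slot(int i, int offset, bool is_compute,
                             int nr_cbufs, bool dual_src_blend)
{
    int idx = i + offset;

    if (!is_compute)
        idx += nr_cbufs + (dual_src_blend ? 1 : 0);
    assert(idx >= 0);
    return idx;
}

/* Register address of CB_COLORn_BASE for the first eight slots. */
static uint32_t sketch_cb_color_base_reg(int idx)
{
    assert(idx >= 0 && idx < 8); /* the stride is not valid for CB8-11 */
    return SKETCH_CB_COLOR0_BASE + (uint32_t)idx * SKETCH_CB_STRIDE;
}
```

For example, with two colorbuffers bound and dual-source blending enabled, the first fragment image lands in slot 3, whereas the first compute image lands in slot 0.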
Dave Airlie
2017-11-29 04:36:27 UTC
From: Dave Airlie <***@redhat.com>

This adds support to compute for the resq workarounds (buffer/cube sizes)

Signed-off-by: Dave Airlie <***@redhat.com>
---
src/gallium/drivers/r600/evergreen_compute.c | 7 +++++++
src/gallium/drivers/r600/r600_pipe.h | 2 ++
src/gallium/drivers/r600/r600_state_common.c | 16 ++++++++++++----
3 files changed, 21 insertions(+), 4 deletions(-)

diff --git a/src/gallium/drivers/r600/evergreen_compute.c b/src/gallium/drivers/r600/evergreen_compute.c
index c9e649e..cf86440 100644
--- a/src/gallium/drivers/r600/evergreen_compute.c
+++ b/src/gallium/drivers/r600/evergreen_compute.c
@@ -721,6 +721,13 @@ static void compute_emit_cs(struct r600_context *rctx,
r600_set_atom_dirty(rctx, &rctx->cs_shader_state.atom, true);
}

+ bool need_buf_const = current->shader.uses_tex_buffers ||
+ current->shader.has_txq_cube_array_z_comp;
+
+ if (need_buf_const) {
+ eg_setup_buffer_constants(rctx, PIPE_SHADER_COMPUTE);
+ r600_update_driver_const_buffers(rctx, true);
+ }
if (evergreen_emit_atomic_buffer_setup(rctx, current, combined_atomics, &atomic_used_mask)) {
radeon_emit(cs, PKT3(PKT3_EVENT_WRITE, 0, 0));
radeon_emit(cs, EVENT_TYPE(EVENT_TYPE_CS_PARTIAL_FLUSH) | EVENT_INDEX(4));
diff --git a/src/gallium/drivers/r600/r600_pipe.h b/src/gallium/drivers/r600/r600_pipe.h
index 4028d98..65d1185 100644
--- a/src/gallium/drivers/r600/r600_pipe.h
+++ b/src/gallium/drivers/r600/r600_pipe.h
@@ -1044,4 +1044,6 @@ void evergreen_emit_atomic_buffer_save(struct r600_context *rctx,
uint8_t *atomic_used_mask_p);
void r600_update_compressed_resource_state(struct r600_context *rctx, bool compute_only);

+void eg_setup_buffer_constants(struct r600_context *rctx, int shader_type);
+void r600_update_driver_const_buffers(struct r600_context *rctx, bool compute_only);
#endif
diff --git a/src/gallium/drivers/r600/r600_state_common.c b/src/gallium/drivers/r600/r600_state_common.c
index 0464a8e..bddda6b 100644
--- a/src/gallium/drivers/r600/r600_state_common.c
+++ b/src/gallium/drivers/r600/r600_state_common.c
@@ -1216,12 +1216,17 @@ static void r600_set_sample_mask(struct pipe_context *pipe, unsigned sample_mask
r600_mark_atom_dirty(rctx, &rctx->sample_mask.atom);
}

-static void r600_update_driver_const_buffers(struct r600_context *rctx)
+void r600_update_driver_const_buffers(struct r600_context *rctx, bool compute_only)
{
int sh, size;
void *ptr;
struct pipe_constant_buffer cb;
- for (sh = 0; sh < PIPE_SHADER_TYPES; sh++) {
+ int start, end;
+
+ start = compute_only ? PIPE_SHADER_COMPUTE : 0;
+ end = compute_only ? PIPE_SHADER_TYPES : PIPE_SHADER_COMPUTE;
+
+ for (sh = start; sh < end; sh++) {
struct r600_shader_driver_constants_info *info = &rctx->driver_consts[sh];
if (!info->vs_ucp_dirty &&
!info->texture_const_dirty &&
@@ -1341,7 +1346,7 @@ static void r600_setup_buffer_constants(struct r600_context *rctx, int shader_ty
* 1. buffer size for TXQ
* 2. number of cube layers in a cube map array.
*/
-static void eg_setup_buffer_constants(struct r600_context *rctx, int shader_type)
+void eg_setup_buffer_constants(struct r600_context *rctx, int shader_type)
{
struct r600_textures_info *samplers = &rctx->samplers[shader_type];
struct r600_image_state *images = NULL;
@@ -1355,6 +1360,9 @@ static void eg_setup_buffer_constants(struct r600_context *rctx, int shader_type
if (shader_type == PIPE_SHADER_FRAGMENT) {
images = &rctx->fragment_images;
buffers = &rctx->fragment_buffers;
+ } else if (shader_type == PIPE_SHADER_COMPUTE) {
+ images = &rctx->compute_images;
+ buffers = &rctx->compute_buffers;
}

if (!samplers->views.dirty_buffer_constants &&
@@ -1781,7 +1789,7 @@ static bool r600_update_derived_state(struct r600_context *rctx)
}
}

- r600_update_driver_const_buffers(rctx);
+ r600_update_driver_const_buffers(rctx, false);

if (rctx->b.chip_class < EVERGREEN && rctx->ps_shader && rctx->vs_shader) {
if (!r600_adjust_gprs(rctx)) {
--
2.9.5
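The compute_only split in r600_update_driver_const_buffers() can be sketched as a range selection. The enum below is a hypothetical stand-in and assumes, as the patch itself relies on, that the compute stage is the last entry before the stage count; the draw path then walks every stage except compute and the compute path walks only compute.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stage enum; compute is assumed to be the last real stage. */
enum sketch_stage {
    SKETCH_STAGE_VERTEX,
    SKETCH_STAGE_FRAGMENT,
    SKETCH_STAGE_GEOMETRY,
    SKETCH_STAGE_TESS_CTRL,
    SKETCH_STAGE_TESS_EVAL,
    SKETCH_STAGE_COMPUTE,
    SKETCH_STAGE_TYPES
};

struct sketch_range {
    int start; /* first stage visited */
    int end;   /* one past the last stage visited */
};

/* Same start/end choice as the patched loop bounds. */
static struct sketch_range sketch_stage_range(bool compute_only)
{
    struct sketch_range r;

    r.start = compute_only ? SKETCH_STAGE_COMPUTE : 0;
    r.end = compute_only ? SKETCH_STAGE_TYPES : SKETCH_STAGE_COMPUTE;
    assert(r.start < r.end);
    return r;
}
```

A side effect worth noting is that the two ranges are disjoint, so a draw never updates the compute driver constants and a dispatch never touches the graphics ones.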
Dave Airlie
2017-11-29 04:36:30 UTC
From: Dave Airlie <***@redhat.com>

---
src/gallium/drivers/r600/r600_pipe.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/gallium/drivers/r600/r600_pipe.c b/src/gallium/drivers/r600/r600_pipe.c
index b013d69..e285608 100644
--- a/src/gallium/drivers/r600/r600_pipe.c
+++ b/src/gallium/drivers/r600/r600_pipe.c
@@ -332,7 +332,7 @@ static int r600_get_param(struct pipe_screen* pscreen, enum pipe_cap param)

case PIPE_CAP_GLSL_FEATURE_LEVEL:
if (family >= CHIP_CEDAR)
- return 420;
+ return 430;
/* pre-evergreen geom shaders need newer kernel */
if (rscreen->b.info.drm_minor >= 37)
return 330;
--
2.9.5
Dave Airlie
2017-11-29 04:36:21 UTC
From: Dave Airlie <***@redhat.com>

This just moves some code around to make it easier to add compute.
---
src/gallium/drivers/r600/r600_pipe.h | 10 ++++++++++
src/gallium/drivers/r600/r600_state_common.c | 24 +++++++++++++++++-------
2 files changed, 27 insertions(+), 7 deletions(-)

diff --git a/src/gallium/drivers/r600/r600_pipe.h b/src/gallium/drivers/r600/r600_pipe.h
index 4af87e1..4028d98 100644
--- a/src/gallium/drivers/r600/r600_pipe.h
+++ b/src/gallium/drivers/r600/r600_pipe.h
@@ -1023,6 +1023,16 @@ void eg_trace_emit(struct r600_context *rctx);
void eg_dump_debug_state(struct pipe_context *ctx, FILE *f,
unsigned flags);

+struct r600_pipe_shader_selector *r600_create_shader_state_tokens(struct pipe_context *ctx,
+ const struct tgsi_token *tokens,
+ unsigned pipe_shader_type);
+int r600_shader_select(struct pipe_context *ctx,
+ struct r600_pipe_shader_selector* sel,
+ bool *dirty);
+
+void r600_delete_shader_selector(struct pipe_context *ctx,
+ struct r600_pipe_shader_selector *sel);
+
struct r600_shader_atomic;
bool evergreen_emit_atomic_buffer_setup(struct r600_context *rctx,
struct r600_pipe_shader *cs_shader,
diff --git a/src/gallium/drivers/r600/r600_state_common.c b/src/gallium/drivers/r600/r600_state_common.c
index 7c09086..0464a8e 100644
--- a/src/gallium/drivers/r600/r600_state_common.c
+++ b/src/gallium/drivers/r600/r600_state_common.c
@@ -829,7 +829,7 @@ static inline void r600_shader_selector_key(const struct pipe_context *ctx,

/* Select the hw shader variant depending on the current state.
* (*dirty) is set to 1 if current variant was changed */
-static int r600_shader_select(struct pipe_context *ctx,
+int r600_shader_select(struct pipe_context *ctx,
struct r600_pipe_shader_selector* sel,
bool *dirty)
{
@@ -897,17 +897,27 @@ static int r600_shader_select(struct pipe_context *ctx,
return 0;
}

+struct r600_pipe_shader_selector *r600_create_shader_state_tokens(struct pipe_context *ctx,
+ const struct tgsi_token *tokens,
+ unsigned pipe_shader_type)
+{
+ struct r600_pipe_shader_selector *sel = CALLOC_STRUCT(r600_pipe_shader_selector);
+ int i;
+
+ sel->type = pipe_shader_type;
+ sel->tokens = tgsi_dup_tokens(tokens);
+ tgsi_scan_shader(tokens, &sel->info);
+ return sel;
+}
+
static void *r600_create_shader_state(struct pipe_context *ctx,
const struct pipe_shader_state *state,
unsigned pipe_shader_type)
{
- struct r600_pipe_shader_selector *sel = CALLOC_STRUCT(r600_pipe_shader_selector);
int i;
+ struct r600_pipe_shader_selector *sel = r600_create_shader_state_tokens(ctx, state->tokens, pipe_shader_type);

- sel->type = pipe_shader_type;
- sel->tokens = tgsi_dup_tokens(state->tokens);
sel->so = state->stream_output;
- tgsi_scan_shader(state->tokens, &sel->info);

switch (pipe_shader_type) {
case PIPE_SHADER_GEOMETRY:
@@ -1048,8 +1058,8 @@ static void r600_bind_tes_state(struct pipe_context *ctx, void *state)
rctx->b.streamout.stride_in_dw = rctx->tes_shader->so.stride;
}

-static void r600_delete_shader_selector(struct pipe_context *ctx,
- struct r600_pipe_shader_selector *sel)
+void r600_delete_shader_selector(struct pipe_context *ctx,
+ struct r600_pipe_shader_selector *sel)
{
struct r600_pipe_shader *p = sel->current, *c;
while (p) {
--
2.9.5
Dave Airlie
2017-11-29 04:36:24 UTC
From: Dave Airlie <***@redhat.com>

---
src/gallium/drivers/r600/evergreen_compute.c | 28 ++++++++++++++++++++--------
1 file changed, 20 insertions(+), 8 deletions(-)

diff --git a/src/gallium/drivers/r600/evergreen_compute.c b/src/gallium/drivers/r600/evergreen_compute.c
index b976b61..7df1c55 100644
--- a/src/gallium/drivers/r600/evergreen_compute.c
+++ b/src/gallium/drivers/r600/evergreen_compute.c
@@ -630,14 +630,26 @@ static void evergreen_emit_dispatch(struct r600_context *rctx,
radeon_compute_set_context_reg(cs, R_0288E8_SQ_LDS_ALLOC,
lds_size | (num_waves << 14));

- /* Dispatch packet */
- radeon_emit(cs, PKT3C(PKT3_DISPATCH_DIRECT, 3, 0));
- radeon_emit(cs, info->grid[0]);
- radeon_emit(cs, info->grid[1]);
- radeon_emit(cs, info->grid[2]);
- /* VGT_DISPATCH_INITIATOR = COMPUTE_SHADER_EN */
- radeon_emit(cs, 1);
-
+ if (info->indirect) {
+ struct r600_resource *resource = r600_resource(info->indirect);
+ unsigned reloc = radeon_add_to_buffer_list(&rctx->b, &rctx->b.gfx,
+ resource,
+ RADEON_USAGE_READ,
+ RADEON_PRIO_SHADER_RW_BUFFER);
+ radeon_emit(cs, PKT3C(PKT3_DISPATCH_INDIRECT, 1, 0));
+ radeon_emit(cs, resource->gpu_address + info->indirect_offset);
+ radeon_emit(cs, 1);
+ radeon_emit(cs, PKT3(PKT3_NOP, 0, 0));
+ radeon_emit(cs, reloc);
+ } else {
+ /* Dispatch packet */
+ radeon_emit(cs, PKT3C(PKT3_DISPATCH_DIRECT, 3, 0));
+ radeon_emit(cs, info->grid[0]);
+ radeon_emit(cs, info->grid[1]);
+ radeon_emit(cs, info->grid[2]);
+ /* VGT_DISPATCH_INITIATOR = COMPUTE_SHADER_EN */
+ radeon_emit(cs, 1);
+ }
if (rctx->is_debug)
eg_trace_emit(rctx);
}
--
2.9.5
Dave Airlie
2017-11-29 04:36:12 UTC
From: Dave Airlie <***@redhat.com>

Splitting the colorbuffer setup out into its own function just makes it
easier to bypass for TGSI later.

Signed-off-by: Dave Airlie <***@redhat.com>
---
src/gallium/drivers/r600/evergreen_compute.c | 50 ++++++++++++++++------------
1 file changed, 28 insertions(+), 22 deletions(-)

diff --git a/src/gallium/drivers/r600/evergreen_compute.c b/src/gallium/drivers/r600/evergreen_compute.c
index 48c4a9c..7831b43 100644
--- a/src/gallium/drivers/r600/evergreen_compute.c
+++ b/src/gallium/drivers/r600/evergreen_compute.c
@@ -615,31 +615,11 @@ static void evergreen_emit_dispatch(struct r600_context *rctx,
eg_trace_emit(rctx);
}

-static void compute_emit_cs(struct r600_context *rctx,
- const struct pipe_grid_info *info)
+static void compute_setup_cbs(struct r600_context *rctx)
{
struct radeon_winsys_cs *cs = rctx->b.gfx.cs;
unsigned i;

- /* make sure that the gfx ring is only one active */
- if (radeon_emitted(rctx->b.dma.cs, 0)) {
- rctx->b.dma.flush(rctx, RADEON_FLUSH_ASYNC, NULL);
- }
-
- /* Initialize all the compute-related registers.
- *
- * See evergreen_init_atom_start_compute_cs() in this file for the list
- * of registers initialized by the start_compute_cs_cmd atom.
- */
- r600_emit_command_buffer(cs, &rctx->start_compute_cs_cmd);
-
- /* emit config state */
- if (rctx->b.chip_class == EVERGREEN)
- r600_emit_atom(rctx, &rctx->config_state.atom);
-
- rctx->b.flags |= R600_CONTEXT_WAIT_3D_IDLE | R600_CONTEXT_FLUSH_AND_INV;
- r600_flush_emit(rctx);
-
/* Emit colorbuffers. */
/* XXX support more than 8 colorbuffers (the offsets are not a multiple of 0x3C for CB8-11) */
for (i = 0; i < 8 && i < rctx->framebuffer.state.nr_cbufs; i++) {
@@ -673,8 +653,34 @@ static void compute_emit_cs(struct r600_context *rctx,

/* Set CB_TARGET_MASK XXX: Use cb_misc_state */
radeon_compute_set_context_reg(cs, R_028238_CB_TARGET_MASK,
- rctx->compute_cb_target_mask);
+ rctx->compute_cb_target_mask);
+}
+
+static void compute_emit_cs(struct r600_context *rctx,
+ const struct pipe_grid_info *info)
+{
+ struct radeon_winsys_cs *cs = rctx->b.gfx.cs;
+
+ /* make sure that the gfx ring is only one active */
+ if (radeon_emitted(rctx->b.dma.cs, 0)) {
+ rctx->b.dma.flush(rctx, RADEON_FLUSH_ASYNC, NULL);
+ }
+
+ /* Initialize all the compute-related registers.
+ *
+ * See evergreen_init_atom_start_compute_cs() in this file for the list
+ * of registers initialized by the start_compute_cs_cmd atom.
+ */
+ r600_emit_command_buffer(cs, &rctx->start_compute_cs_cmd);
+
+ /* emit config state */
+ if (rctx->b.chip_class == EVERGREEN)
+ r600_emit_atom(rctx, &rctx->config_state.atom);
+
+ rctx->b.flags |= R600_CONTEXT_WAIT_3D_IDLE | R600_CONTEXT_FLUSH_AND_INV;
+ r600_flush_emit(rctx);

+ compute_setup_cbs(rctx);

/* Emit vertex buffer state */
rctx->cs_vertex_buffer_state.atom.num_dw = 12 * util_bitcount(rctx->cs_vertex_buffer_state.dirty_mask);
--
2.9.5
Dave Airlie
2017-11-29 04:36:26 UTC
From: Dave Airlie <***@redhat.com>

---
src/gallium/drivers/r600/evergreen_compute.c | 10 +++++++++-
1 file changed, 9 insertions(+), 1 deletion(-)

diff --git a/src/gallium/drivers/r600/evergreen_compute.c b/src/gallium/drivers/r600/evergreen_compute.c
index b8e1c20..c9e649e 100644
--- a/src/gallium/drivers/r600/evergreen_compute.c
+++ b/src/gallium/drivers/r600/evergreen_compute.c
@@ -709,6 +709,8 @@ static void compute_emit_cs(struct r600_context *rctx,
rctx->b.dma.flush(rctx, RADEON_FLUSH_ASYNC, NULL);
}

+ r600_update_compressed_resource_state(rctx, true);
+
r600_need_cs_space(rctx, 0, true);
if (rctx->cs_shader_state.shader->ir_type == PIPE_SHADER_IR_TGSI) {
r600_shader_select(&rctx->b.b, rctx->cs_shader_state.shader->sel, &compute_dirty);
@@ -771,7 +773,13 @@ static void compute_emit_cs(struct r600_context *rctx,
/* Emit sampler view (texture resource) state */
r600_emit_atom(rctx, &rctx->samplers[PIPE_SHADER_COMPUTE].views.atom);

- /* Emit compute shader state */
+ /* Emit images state */
+ r600_emit_atom(rctx, &rctx->compute_images.atom);
+
+ /* Emit buffers state */
+ r600_emit_atom(rctx, &rctx->compute_buffers.atom);
+
+ /* Emit shader state */
r600_emit_atom(rctx, &rctx->cs_shader_state.atom);

/* Emit dispatch state and dispatch packet */
--
2.9.5
Dave Airlie
2017-11-29 04:36:22 UTC
From: Dave Airlie <***@redhat.com>

Forcing linear tiling for 1D and very thin 2D textures appears to cause
hangs with compute images. Unless we can find more specifics, just don't
do this for now.

Signed-off-by: Dave Airlie <***@redhat.com>
---
src/gallium/drivers/r600/r600_texture.c | 8 --------
1 file changed, 8 deletions(-)

diff --git a/src/gallium/drivers/r600/r600_texture.c b/src/gallium/drivers/r600/r600_texture.c
index 07782ff..02d6505 100644
--- a/src/gallium/drivers/r600/r600_texture.c
+++ b/src/gallium/drivers/r600/r600_texture.c
@@ -1053,14 +1053,6 @@ r600_choose_tiling(struct r600_common_screen *rscreen,
if (templ->bind & PIPE_BIND_LINEAR)
return RADEON_SURF_MODE_LINEAR_ALIGNED;

- /* Textures with a very small height are recommended to be linear. */
- if (templ->target == PIPE_TEXTURE_1D ||
- templ->target == PIPE_TEXTURE_1D_ARRAY ||
- /* Only very thin and long 2D textures should benefit from
- * linear_aligned. */
- (templ->width0 > 8 && templ->height0 <= 2))
- return RADEON_SURF_MODE_LINEAR_ALIGNED;
-
/* Textures likely to be mapped often. */
if (templ->usage == PIPE_USAGE_STAGING ||
templ->usage == PIPE_USAGE_STREAM)
--
2.9.5
Dave Airlie
2017-11-29 04:36:20 UTC
From: Dave Airlie <***@redhat.com>

This just adds support for decompressing compute resources.

Signed-off-by: Dave Airlie <***@redhat.com>
---
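Not part of the patch: a minimal standalone sketch of the compute_only
filtering that the diff below adds to r600_update_compressed_resource_state().
The enum and helper names here are hypothetical stand-ins, not Mesa
identifiers; the real function walks rctx->samplers[] and the image state
rather than returning booleans.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the PIPE_SHADER_* stage indices. */
enum shader_stage {
	STAGE_VERTEX,
	STAGE_FRAGMENT,
	STAGE_GEOMETRY,
	STAGE_TESS_CTRL,
	STAGE_TESS_EVAL,
	STAGE_COMPUTE,
	STAGE_COUNT
};

/* Sampler views: with compute_only set, only the compute stage is visited. */
static bool stage_views_checked(enum shader_stage stage, bool compute_only)
{
	assert((int)stage >= 0 && stage < STAGE_COUNT);
	return !compute_only || stage == STAGE_COMPUTE;
}

/* Fragment images are skipped on the compute-only path. */
static bool fragment_images_checked(bool compute_only)
{
	return !compute_only;
}

/* Compute images are checked on both paths in the diff. */
static bool compute_images_checked(bool compute_only)
{
	(void)compute_only;
	return true;
}

/* Count how many stages would have their sampler views inspected. */
static int count_stages_checked(bool compute_only)
{
	int n = 0;

	for (int i = 0; i < STAGE_COUNT; i++)
		if (stage_views_checked((enum shader_stage)i, compute_only))
			n++;
	return n;
}
```

The compute dispatch path passes true, so a dispatch only touches compute
state, while the draw path passes false and keeps the old behaviour plus
the new compute image handling.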
src/gallium/drivers/r600/r600_pipe.h | 1 +
src/gallium/drivers/r600/r600_state_common.c | 31 ++++++++++++++++++++++------
2 files changed, 26 insertions(+), 6 deletions(-)

diff --git a/src/gallium/drivers/r600/r600_pipe.h b/src/gallium/drivers/r600/r600_pipe.h
index 711accc..4af87e1 100644
--- a/src/gallium/drivers/r600/r600_pipe.h
+++ b/src/gallium/drivers/r600/r600_pipe.h
@@ -1032,5 +1032,6 @@ void evergreen_emit_atomic_buffer_save(struct r600_context *rctx,
bool is_compute,
struct r600_shader_atomic *combined_atomics,
uint8_t *atomic_used_mask_p);
+void r600_update_compressed_resource_state(struct r600_context *rctx, bool compute_only);

#endif
diff --git a/src/gallium/drivers/r600/r600_state_common.c b/src/gallium/drivers/r600/r600_state_common.c
index e312b33..7c09086 100644
--- a/src/gallium/drivers/r600/r600_state_common.c
+++ b/src/gallium/drivers/r600/r600_state_common.c
@@ -1526,7 +1526,7 @@ static void r600_generate_fixed_func_tcs(struct r600_context *rctx)
ureg_create_shader_and_destroy(ureg, &rctx->b.b);
}

-static void r600_update_compressed_resource_state(struct r600_context *rctx)
+void r600_update_compressed_resource_state(struct r600_context *rctx, bool compute_only)
{
unsigned i;
unsigned counter;
@@ -1535,15 +1535,25 @@ static void r600_update_compressed_resource_state(struct r600_context *rctx)
if (counter != rctx->b.last_compressed_colortex_counter) {
rctx->b.last_compressed_colortex_counter = counter;

- for (i = 0; i < PIPE_SHADER_TYPES; ++i) {
- r600_update_compressed_colortex_mask(&rctx->samplers[i].views);
+ if (compute_only) {
+ r600_update_compressed_colortex_mask(&rctx->samplers[PIPE_SHADER_COMPUTE].views);
+ } else {
+ for (i = 0; i < PIPE_SHADER_TYPES; ++i) {
+ r600_update_compressed_colortex_mask(&rctx->samplers[i].views);
+ }
}
- r600_update_compressed_colortex_mask_images(&rctx->fragment_images);
+ if (!compute_only)
+ r600_update_compressed_colortex_mask_images(&rctx->fragment_images);
+ r600_update_compressed_colortex_mask_images(&rctx->compute_images);
}

/* Decompress textures if needed. */
for (i = 0; i < PIPE_SHADER_TYPES; i++) {
struct r600_samplerview_state *views = &rctx->samplers[i].views;
+
+ if (compute_only)
+ if (i != PIPE_SHADER_COMPUTE)
+ continue;
if (views->compressed_depthtex_mask) {
r600_decompress_depth_textures(rctx, views);
}
@@ -1554,7 +1564,16 @@ static void r600_update_compressed_resource_state(struct r600_context *rctx)

{
struct r600_image_state *istate;
- istate = &rctx->fragment_images;
+
+ if (!compute_only) {
+ istate = &rctx->fragment_images;
+ if (istate->compressed_depthtex_mask)
+ r600_decompress_depth_images(rctx, istate);
+ if (istate->compressed_colortex_mask)
+ r600_decompress_color_images(rctx, istate);
+ }
+
+ istate = &rctx->compute_images;
if (istate->compressed_depthtex_mask)
r600_decompress_depth_images(rctx, istate);
if (istate->compressed_colortex_mask)
@@ -1603,7 +1622,7 @@ static bool r600_update_derived_state(struct r600_context *rctx)
struct r600_pipe_shader *clip_so_current = NULL;

if (!rctx->blitter->running)
- r600_update_compressed_resource_state(rctx);
+ r600_update_compressed_resource_state(rctx, false);

SELECT_SHADER_OR_FAIL(ps);
--
2.9.5
Dave Airlie
2017-11-29 04:36:17 UTC
Reply
Permalink
Raw Message
From: Dave Airlie <***@redhat.com>

LDS (TGSI_FILE_MEMORY) load, store and atomic support is needed for
compute shaders.

v1.1: make work for vectors, fix missing lds ops.

Signed-off-by: Dave Airlie <***@redhat.com>
---
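Not part of the patch: a minimal standalone sketch of how tgsi_store_lds()
in the diff below appears to choose between single and paired LDS writes
from the TGSI write mask. The names plan_lds_writes, struct lds_write and
enum lds_write_kind are hypothetical; the real code emits r600 ALU
bytecode (LDS_OP2_LDS_WRITE / LDS_OP3_LDS_WRITE_REL) instead of filling
an array.

```c
#include <assert.h>

/* Hypothetical stand-ins for the two LDS write flavours used in the diff. */
enum lds_write_kind {
	LDS_WRITE_SINGLE, /* one dword, like LDS_OP2_LDS_WRITE */
	LDS_WRITE_PAIR    /* two adjacent dwords, like LDS_OP3_LDS_WRITE_REL */
};

struct lds_write {
	enum lds_write_kind kind;
	int chan;             /* first channel written (0 = x ... 3 = w) */
	unsigned byte_offset; /* offset added to the base LDS address */
};

/* Fill 'out' with the writes needed for a 4-bit mask; returns the count. */
static int plan_lds_writes(unsigned write_mask, struct lds_write out[4])
{
	int n = 0;

	assert(write_mask <= 0xf);
	for (int i = 0; i < 4; i++) {
		if (!(write_mask & (1u << i)))
			continue;

		/* xy or zw both enabled: one paired write covers two channels. */
		if ((i == 0 && (write_mask & 0x3) == 0x3) ||
		    (i == 2 && (write_mask & 0xc) == 0xc)) {
			out[n].kind = LDS_WRITE_PAIR;
			out[n].chan = i;
			out[n].byte_offset = 4u * i;
			n++;
			i++; /* skip the channel the pair already covered */
			continue;
		}

		out[n].kind = LDS_WRITE_SINGLE;
		out[n].chan = i;
		out[n].byte_offset = 4u * i;
		n++;
	}
	return n;
}
```

For a full .xyzw store this plans two paired writes; for .yzw it plans a
single write for y followed by a pair for zw.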
src/gallium/drivers/r600/r600_shader.c | 165 +++++++++++++++++++++++++++++++++
1 file changed, 165 insertions(+)

diff --git a/src/gallium/drivers/r600/r600_shader.c b/src/gallium/drivers/r600/r600_shader.c
index e72215f..83b70b0 100644
--- a/src/gallium/drivers/r600/r600_shader.c
+++ b/src/gallium/drivers/r600/r600_shader.c
@@ -970,6 +970,7 @@ static int tgsi_declaration(struct r600_shader_ctx *ctx)
case TGSI_FILE_ADDRESS:
case TGSI_FILE_BUFFER:
case TGSI_FILE_IMAGE:
+ case TGSI_FILE_MEMORY:
break;

case TGSI_FILE_HW_ATOMIC:
@@ -8032,6 +8033,30 @@ static int tgsi_load_rat(struct r600_shader_ctx *ctx)
return 0;
}

+static int tgsi_load_lds(struct r600_shader_ctx *ctx)
+{
+ struct tgsi_full_instruction *inst = &ctx->parse.FullToken.FullInstruction;
+ struct r600_bytecode_alu alu;
+ int r;
+ int temp_reg = r600_get_temp(ctx);
+
+ memset(&alu, 0, sizeof(struct r600_bytecode_alu));
+ alu.op = ALU_OP1_MOV;
+ r600_bytecode_src(&alu.src[0], &ctx->src[1], 0);
+ alu.dst.sel = temp_reg;
+ alu.dst.write = 1;
+ alu.last = 1;
+ r = r600_bytecode_add_alu(ctx->bc, &alu);
+ if (r)
+ return r;
+
+ r = do_lds_fetch_values(ctx, temp_reg,
+ ctx->file_offset[inst->Dst[0].Register.File] + inst->Dst[0].Register.Index, inst->Dst[0].Register.WriteMask);
+ if (r)
+ return r;
+ return 0;
+}
+
static int tgsi_load(struct r600_shader_ctx *ctx)
{
struct tgsi_full_instruction *inst = &ctx->parse.FullToken.FullInstruction;
@@ -8041,6 +8066,8 @@ static int tgsi_load(struct r600_shader_ctx *ctx)
return tgsi_load_gds(ctx);
if (inst->Src[0].Register.File == TGSI_FILE_BUFFER)
return tgsi_load_buffer(ctx);
+ if (inst->Src[0].Register.File == TGSI_FILE_MEMORY)
+ return tgsi_load_lds(ctx);
return 0;
}

@@ -8188,11 +8215,82 @@ static int tgsi_store_rat(struct r600_shader_ctx *ctx)
return 0;
}

+static int tgsi_store_lds(struct r600_shader_ctx *ctx)
+{
+ struct tgsi_full_instruction *inst = &ctx->parse.FullToken.FullInstruction;
+ struct r600_bytecode_alu alu;
+ int r, i, lasti;
+ int write_mask = inst->Dst[0].Register.WriteMask;
+ int temp_reg = r600_get_temp(ctx);
+
+ /* LDS write */
+ memset(&alu, 0, sizeof(struct r600_bytecode_alu));
+ alu.op = ALU_OP1_MOV;
+ r600_bytecode_src(&alu.src[0], &ctx->src[0], 0);
+ alu.dst.sel = temp_reg;
+ alu.dst.write = 1;
+ alu.last = 1;
+ r = r600_bytecode_add_alu(ctx->bc, &alu);
+ if (r)
+ return r;
+
+ lasti = tgsi_last_instruction(write_mask);
+ for (i = 1; i <= lasti; i++) {
+ if (!(write_mask & (1 << i)))
+ continue;
+ r = single_alu_op2(ctx, ALU_OP2_ADD_INT,
+ temp_reg, i,
+ temp_reg, 0,
+ V_SQ_ALU_SRC_LITERAL, 4 * i);
+ if (r)
+ return r;
+ }
+ for (i = 0; i <= lasti; i++) {
+ if (!(write_mask & (1 << i)))
+ continue;
+
+ if ((i == 0 && ((write_mask & 3) == 3)) ||
+ (i == 2 && ((write_mask & 0xc) == 0xc))) {
+ memset(&alu, 0, sizeof(struct r600_bytecode_alu));
+ alu.op = LDS_OP3_LDS_WRITE_REL;
+
+ alu.src[0].sel = temp_reg;
+ alu.src[0].chan = i;
+ r600_bytecode_src(&alu.src[1], &ctx->src[1], i);
+ r600_bytecode_src(&alu.src[2], &ctx->src[1], i + 1);
+ alu.last = 1;
+ alu.is_lds_idx_op = true;
+ alu.lds_idx = 1;
+ r = r600_bytecode_add_alu(ctx->bc, &alu);
+ if (r)
+ return r;
+ i += 1;
+ continue;
+ }
+ memset(&alu, 0, sizeof(struct r600_bytecode_alu));
+ alu.op = LDS_OP2_LDS_WRITE;
+
+ alu.src[0].sel = temp_reg;
+ alu.src[0].chan = i;
+ r600_bytecode_src(&alu.src[1], &ctx->src[1], i);
+
+ alu.last = 1;
+ alu.is_lds_idx_op = true;
+
+ r = r600_bytecode_add_alu(ctx->bc, &alu);
+ if (r)
+ return r;
+ }
+ return 0;
+}
+
static int tgsi_store(struct r600_shader_ctx *ctx)
{
struct tgsi_full_instruction *inst = &ctx->parse.FullToken.FullInstruction;
if (inst->Dst[0].Register.File == TGSI_FILE_BUFFER)
return tgsi_store_buffer_rat(ctx);
+ else if (inst->Dst[0].Register.File == TGSI_FILE_MEMORY)
+ return tgsi_store_lds(ctx);
else
return tgsi_store_rat(ctx);
}
@@ -8410,6 +8508,71 @@ static int tgsi_atomic_op_gds(struct r600_shader_ctx *ctx)
return 0;
}

+static int get_lds_op(int opcode)
+{
+ switch (opcode) {
+ case TGSI_OPCODE_ATOMUADD:
+ return LDS_OP2_LDS_ADD_RET;
+ case TGSI_OPCODE_ATOMAND:
+ return LDS_OP2_LDS_AND_RET;
+ case TGSI_OPCODE_ATOMOR:
+ return LDS_OP2_LDS_OR_RET;
+ case TGSI_OPCODE_ATOMXOR:
+ return LDS_OP2_LDS_XOR_RET;
+ case TGSI_OPCODE_ATOMUMIN:
+ return LDS_OP2_LDS_MIN_UINT_RET;
+ case TGSI_OPCODE_ATOMUMAX:
+ return LDS_OP2_LDS_MAX_UINT_RET;
+ case TGSI_OPCODE_ATOMIMIN:
+ return LDS_OP2_LDS_MIN_INT_RET;
+ case TGSI_OPCODE_ATOMIMAX:
+ return LDS_OP2_LDS_MAX_INT_RET;
+ case TGSI_OPCODE_ATOMXCHG:
+ return LDS_OP2_LDS_XCHG_RET;
+ case TGSI_OPCODE_ATOMCAS:
+ return LDS_OP3_LDS_CMP_XCHG_RET;
+ default:
+ return -1;
+ }
+}
+
+static int tgsi_atomic_op_lds(struct r600_shader_ctx *ctx)
+{
+ struct tgsi_full_instruction *inst = &ctx->parse.FullToken.FullInstruction;
+ int lds_op = get_lds_op(inst->Instruction.Opcode);
+ int r;
+
+ struct r600_bytecode_alu alu;
+ memset(&alu, 0, sizeof(struct r600_bytecode_alu));
+ alu.op = lds_op;
+ alu.is_lds_idx_op = true;
+ alu.last = 1;
+ r600_bytecode_src(&alu.src[0], &ctx->src[1], 0);
+ r600_bytecode_src(&alu.src[1], &ctx->src[2], 0);
+ if (lds_op == LDS_OP3_LDS_CMP_XCHG_RET)
+ r600_bytecode_src(&alu.src[2], &ctx->src[3], 0);
+ else
+ alu.src[2].sel = V_SQ_ALU_SRC_0;
+ r = r600_bytecode_add_alu(ctx->bc, &alu);
+ if (r)
+ return r;
+
+ /* then read from LDS_OQ_A_POP */
+ memset(&alu, 0, sizeof(alu));
+
+ alu.op = ALU_OP1_MOV;
+ alu.src[0].sel = EG_V_SQ_ALU_SRC_LDS_OQ_A_POP;
+ alu.src[0].chan = 0;
+ tgsi_dst(ctx, &inst->Dst[0], 0, &alu.dst);
+ alu.dst.write = 1;
+ alu.last = 1;
+ r = r600_bytecode_add_alu(ctx->bc, &alu);
+ if (r)
+ return r;
+
+ return 0;
+}
+
static int tgsi_atomic_op(struct r600_shader_ctx *ctx)
{
struct tgsi_full_instruction *inst = &ctx->parse.FullToken.FullInstruction;
@@ -8419,6 +8582,8 @@ static int tgsi_atomic_op(struct r600_shader_ctx *ctx)
return tgsi_atomic_op_gds(ctx);
if (inst->Src[0].Register.File == TGSI_FILE_BUFFER)
return tgsi_atomic_op_rat(ctx);
+ if (inst->Src[0].Register.File == TGSI_FILE_MEMORY)
+ return tgsi_atomic_op_lds(ctx);
return 0;
}
--
2.9.5
Dave Airlie
2017-11-29 04:36:28 UTC
From: Dave Airlie <***@redhat.com>

We just pass the block and grid sizes in from outside in a constant buffer.

The shader side loads each of them into a register on first access and
reuses that register afterwards.

Signed-off-by: Dave Airlie <***@redhat.com>
---
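Not part of the patch: a minimal standalone sketch of the 8-dword layout
that compute_emit_cs() fills in the diff below, and of the byte offsets
load_block_grid_size() fetches from. The helper names
pack_block_grid_sizes and block_grid_fetch_offset are hypothetical; the
layout (block size first, grid size second, one pad dword after each) is
read off the diff.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Eight dwords, matching R600_CS_BLOCK_GRID_SIZE (8 * 4 bytes) in the diff. */
#define CS_BLOCK_GRID_DWORDS 8

/* Pack block[3] into dwords 0..2 and grid[3] into dwords 4..6; pads are 0. */
static void pack_block_grid_sizes(const uint32_t block[3],
				  const uint32_t grid[3],
				  uint32_t out[CS_BLOCK_GRID_DWORDS])
{
	assert(block && grid && out);
	memset(out, 0, CS_BLOCK_GRID_DWORDS * sizeof(uint32_t));
	for (int i = 0; i < 3; i++) {
		out[i] = block[i];
		out[i + 4] = grid[i];
	}
}

/* Byte offset of the vec4 the shader fetches: block at 0, grid at 16. */
static unsigned block_grid_fetch_offset(bool load_block)
{
	return load_block ? 0 : 16;
}
```

Keeping each size in its own 16-byte slot lets the shader pull either one
with a single four-component fetch.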
src/gallium/drivers/r600/evergreen_compute.c | 9 +++-
src/gallium/drivers/r600/r600_pipe.h | 3 ++
src/gallium/drivers/r600/r600_shader.c | 62 ++++++++++++++++++++++++++++
src/gallium/drivers/r600/r600_state_common.c | 16 ++++++-
4 files changed, 88 insertions(+), 2 deletions(-)

diff --git a/src/gallium/drivers/r600/evergreen_compute.c b/src/gallium/drivers/r600/evergreen_compute.c
index cf86440..4c888a2 100644
--- a/src/gallium/drivers/r600/evergreen_compute.c
+++ b/src/gallium/drivers/r600/evergreen_compute.c
@@ -724,10 +724,17 @@ static void compute_emit_cs(struct r600_context *rctx,
bool need_buf_const = current->shader.uses_tex_buffers ||
current->shader.has_txq_cube_array_z_comp;

+ for (int i = 0; i < 3; i++) {
+ rctx->cs_block_grid_sizes[i] = info->block[i];
+ rctx->cs_block_grid_sizes[i + 4] = info->grid[i];
+ }
+ rctx->cs_block_grid_sizes[3] = rctx->cs_block_grid_sizes[7] = 0;
+ rctx->driver_consts[PIPE_SHADER_COMPUTE].cs_block_grid_size_dirty = true;
if (need_buf_const) {
eg_setup_buffer_constants(rctx, PIPE_SHADER_COMPUTE);
- r600_update_driver_const_buffers(rctx, true);
}
+ r600_update_driver_const_buffers(rctx, true);
+
if (evergreen_emit_atomic_buffer_setup(rctx, current, combined_atomics, &atomic_used_mask)) {
radeon_emit(cs, PKT3(PKT3_EVENT_WRITE, 0, 0));
radeon_emit(cs, EVENT_TYPE(EVENT_TYPE_CS_PARTIAL_FLUSH) | EVENT_INDEX(4));
diff --git a/src/gallium/drivers/r600/r600_pipe.h b/src/gallium/drivers/r600/r600_pipe.h
index 65d1185..0f5dc6b 100644
--- a/src/gallium/drivers/r600/r600_pipe.h
+++ b/src/gallium/drivers/r600/r600_pipe.h
@@ -78,6 +78,7 @@
/* start driver buffers after user buffers */
#define R600_BUFFER_INFO_CONST_BUFFER (R600_MAX_USER_CONST_BUFFERS)
#define R600_UCP_SIZE (4*4*8)
+#define R600_CS_BLOCK_GRID_SIZE (8 * 4)
#define R600_BUFFER_INFO_OFFSET (R600_UCP_SIZE)

#define R600_LDS_INFO_CONST_BUFFER (R600_MAX_USER_CONST_BUFFERS + 1)
@@ -396,6 +397,7 @@ struct r600_shader_driver_constants_info {
bool vs_ucp_dirty;
bool texture_const_dirty;
bool ps_sample_pos_dirty;
+ bool cs_block_grid_size_dirty;
};

struct r600_constbuf_state
@@ -575,6 +577,7 @@ struct r600_context {
struct r600_isa *isa;
float sample_positions[4 * 16];
float tess_state[8];
+ uint32_t cs_block_grid_sizes[8]; /* 3 for block + 1 pad, 3 for grid + 1 pad */
bool tess_state_dirty;
struct r600_pipe_shader_selector *last_ls;
struct r600_pipe_shader_selector *last_tcs;
diff --git a/src/gallium/drivers/r600/r600_shader.c b/src/gallium/drivers/r600/r600_shader.c
index b3c29b9..ee6f613 100644
--- a/src/gallium/drivers/r600/r600_shader.c
+++ b/src/gallium/drivers/r600/r600_shader.c
@@ -346,6 +346,8 @@ struct r600_shader_ctx {
boolean clip_vertex_write;
unsigned cv_output;
unsigned edgeflag_output;
+ int cs_block_size_reg;
+ int cs_grid_size_reg;
int fragcoord_input;
int native_integers;
int next_ring_offset;
@@ -1308,6 +1310,60 @@ static int load_sample_position(struct r600_shader_ctx *ctx, struct r600_shader_
return t1;
}

+static int load_block_grid_size(struct r600_shader_ctx *ctx, bool load_block)
+{
+ struct r600_bytecode_vtx vtx;
+ int r, t1;
+
+ if (load_block && ctx->cs_block_size_reg != -1)
+ return ctx->cs_block_size_reg;
+ if (!load_block && ctx->cs_grid_size_reg != -1)
+ return ctx->cs_grid_size_reg;
+ t1 = r600_get_temp(ctx);
+
+ struct r600_bytecode_alu alu;
+ memset(&alu, 0, sizeof(struct r600_bytecode_alu));
+ alu.op = ALU_OP1_MOV;
+ alu.src[0].sel = V_SQ_ALU_SRC_0;
+ alu.dst.sel = t1;
+ alu.dst.write = 1;
+ alu.last = 1;
+ r = r600_bytecode_add_alu(ctx->bc, &alu);
+ if (r)
+ return r;
+
+ memset(&vtx, 0, sizeof(struct r600_bytecode_vtx));
+ vtx.op = FETCH_OP_VFETCH;
+ vtx.buffer_id = R600_BUFFER_INFO_CONST_BUFFER;
+ vtx.fetch_type = SQ_VTX_FETCH_NO_INDEX_OFFSET;
+ vtx.src_gpr = t1;
+ vtx.src_sel_x = 0;
+
+ vtx.mega_fetch_count = 16;
+ vtx.dst_gpr = t1;
+ vtx.dst_sel_x = 0;
+ vtx.dst_sel_y = 1;
+ vtx.dst_sel_z = 2;
+ vtx.dst_sel_w = 7;
+ vtx.data_format = FMT_32_32_32_32;
+ vtx.num_format_all = 1;
+ vtx.format_comp_all = 0;
+ vtx.use_const_fields = 0;
+ vtx.offset = load_block ? 0 : 16; // first element is size of buffer
+ vtx.endian = r600_endian_swap(32);
+ vtx.srf_mode_all = 1; /* SRF_MODE_NO_ZERO */
+
+ r = r600_bytecode_add_vtx(ctx->bc, &vtx);
+ if (r)
+ return r;
+
+ if (load_block)
+ ctx->cs_block_size_reg = t1;
+ else
+ ctx->cs_grid_size_reg = t1;
+ return t1;
+}
+
static void tgsi_src(struct r600_shader_ctx *ctx,
const struct tgsi_full_src_register *tgsi_src,
struct r600_shader_src *r600_src)
@@ -1413,6 +1469,10 @@ static void tgsi_src(struct r600_shader_ctx *ctx,
r600_src->swizzle[1] = 3;
r600_src->swizzle[2] = 3;
r600_src->swizzle[3] = 3;
+ } else if (ctx->info.system_value_semantic_name[tgsi_src->Register.Index] == TGSI_SEMANTIC_GRID_SIZE) {
+ r600_src->sel = load_block_grid_size(ctx, false);
+ } else if (ctx->info.system_value_semantic_name[tgsi_src->Register.Index] == TGSI_SEMANTIC_BLOCK_SIZE) {
+ r600_src->sel = load_block_grid_size(ctx, true);
}
} else {
if (tgsi_src->Register.Indirect)
@@ -3139,6 +3199,8 @@ static int r600_shader_from_tgsi(struct r600_context *rctx,
ctx.face_gpr = -1;
ctx.fixed_pt_position_gpr = -1;
ctx.fragcoord_input = -1;
+ ctx.cs_block_size_reg = -1;
+ ctx.cs_grid_size_reg = -1;
ctx.colors_used = 0;
ctx.clip_vertex_write = 0;

diff --git a/src/gallium/drivers/r600/r600_state_common.c b/src/gallium/drivers/r600/r600_state_common.c
index bddda6b..bd40774 100644
--- a/src/gallium/drivers/r600/r600_state_common.c
+++ b/src/gallium/drivers/r600/r600_state_common.c
@@ -1230,7 +1230,8 @@ void r600_update_driver_const_buffers(struct r600_context *rctx, bool compute_on
struct r600_shader_driver_constants_info *info = &rctx->driver_consts[sh];
if (!info->vs_ucp_dirty &&
!info->texture_const_dirty &&
- !info->ps_sample_pos_dirty)
+ !info->ps_sample_pos_dirty &&
+ !info->cs_block_grid_size_dirty)
continue;

ptr = info->constants;
@@ -1257,6 +1258,17 @@ void r600_update_driver_const_buffers(struct r600_context *rctx, bool compute_on
info->ps_sample_pos_dirty = false;
}

+ if (info->cs_block_grid_size_dirty) {
+ assert(sh == PIPE_SHADER_COMPUTE);
+ if (!size) {
+ ptr = rctx->cs_block_grid_sizes;
+ size = R600_CS_BLOCK_GRID_SIZE;
+ } else {
+ memcpy(ptr, rctx->cs_block_grid_sizes, R600_CS_BLOCK_GRID_SIZE);
+ }
+ info->cs_block_grid_size_dirty = false;
+ }
+
if (info->texture_const_dirty) {
assert (ptr);
assert (size);
@@ -1264,6 +1276,8 @@ void r600_update_driver_const_buffers(struct r600_context *rctx, bool compute_on
memcpy(ptr, rctx->clip_state.state.ucp, R600_UCP_SIZE);
if (sh == PIPE_SHADER_FRAGMENT)
memcpy(ptr, rctx->sample_positions, R600_UCP_SIZE);
+ if (sh == PIPE_SHADER_COMPUTE)
+ memcpy(ptr, rctx->cs_block_grid_sizes, R600_CS_BLOCK_GRID_SIZE);
}
info->texture_const_dirty = false;
--
2.9.5
Dave Airlie
2017-11-29 04:36:25 UTC
From: Dave Airlie <***@redhat.com>

---
src/gallium/drivers/r600/evergreen_compute.c | 9 +++++++++
1 file changed, 9 insertions(+)

diff --git a/src/gallium/drivers/r600/evergreen_compute.c b/src/gallium/drivers/r600/evergreen_compute.c
index 7df1c55..b8e1c20 100644
--- a/src/gallium/drivers/r600/evergreen_compute.c
+++ b/src/gallium/drivers/r600/evergreen_compute.c
@@ -701,6 +701,8 @@ static void compute_emit_cs(struct r600_context *rctx,
struct radeon_winsys_cs *cs = rctx->b.gfx.cs;
bool compute_dirty = false;
struct r600_pipe_shader *current;
+ struct r600_shader_atomic combined_atomics[8];
+ uint8_t atomic_used_mask;

/* make sure that the gfx ring is only one active */
if (radeon_emitted(rctx->b.dma.cs, 0)) {
@@ -716,6 +718,11 @@ static void compute_emit_cs(struct r600_context *rctx,
r600_context_add_resource_size(&rctx->b.b, (struct pipe_resource *)current->bo);
r600_set_atom_dirty(rctx, &rctx->cs_shader_state.atom, true);
}
+
+ if (evergreen_emit_atomic_buffer_setup(rctx, current, combined_atomics, &atomic_used_mask)) {
+ radeon_emit(cs, PKT3(PKT3_EVENT_WRITE, 0, 0));
+ radeon_emit(cs, EVENT_TYPE(EVENT_TYPE_CS_PARTIAL_FLUSH) | EVENT_INDEX(4));
+ }
}

/* Initialize all the compute-related registers.
@@ -770,6 +777,8 @@ static void compute_emit_cs(struct r600_context *rctx,
/* Emit dispatch state and dispatch packet */
evergreen_emit_dispatch(rctx, info);

+ if (rctx->cs_shader_state.shader->ir_type == PIPE_SHADER_IR_TGSI)
+ evergreen_emit_atomic_buffer_save(rctx, true, combined_atomics, &atomic_used_mask);
/* XXX evergreen_flush_emit() hardcodes the CP_COHER_SIZE to 0xffffffff
*/
rctx->b.flags |= R600_CONTEXT_INV_CONST_CACHE |
--
2.9.5
Dave Airlie
2017-11-29 04:36:23 UTC
From: Dave Airlie <***@redhat.com>

This adds paths to handle TGSI compute shaders and shader selection.

It also avoids emitting certain things on the TGSI path: CBs,
vertex buffers, and config reg init (not required).
---
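Not part of the patch: a minimal standalone sketch summarising which
pieces of state the diff below emits on the TGSI path versus the existing
non-TGSI path. The enum, struct and function names are hypothetical; the
real code branches inline on shader->ir_type inside compute_emit_cs() and
evergreen_emit_dispatch().

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the ir_type check against PIPE_SHADER_IR_TGSI. */
enum compute_ir {
	COMPUTE_IR_OTHER, /* the existing non-TGSI compute path */
	COMPUTE_IR_TGSI
};

struct compute_emit_plan {
	bool emit_colorbuffers;      /* compute_setup_cbs() */
	bool emit_vertex_buffers;    /* cs_vertex_buffer_state atom */
	bool emit_config_atom;       /* config_state atom */
	bool emit_direct_gpr_config; /* SQ_GPR_RESOURCE_MGMT_* written directly */
	bool emit_rat_target_mask;   /* CB_TARGET_MASK built from the RAT counts */
	bool add_bytecode_lds;       /* add bc.nlds_dw to the LDS size */
};

/* Decide what to emit; is_evergreen_class mirrors chip_class == EVERGREEN. */
static struct compute_emit_plan plan_compute_emit(enum compute_ir ir,
						  bool is_evergreen_class)
{
	struct compute_emit_plan plan;
	bool tgsi = (ir == COMPUTE_IR_TGSI);

	assert(ir == COMPUTE_IR_OTHER || ir == COMPUTE_IR_TGSI);

	plan.emit_colorbuffers = !tgsi;
	plan.emit_vertex_buffers = !tgsi;
	plan.emit_config_atom = is_evergreen_class && !tgsi;
	plan.emit_direct_gpr_config = is_evergreen_class && tgsi;
	plan.emit_rat_target_mask = tgsi;
	plan.add_bytecode_lds = !tgsi;
	return plan;
}
```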
src/gallium/drivers/r600/evergreen_compute.c | 125 ++++++++++++++++-----
.../drivers/r600/evergreen_compute_internal.h | 6 +
2 files changed, 103 insertions(+), 28 deletions(-)

diff --git a/src/gallium/drivers/r600/evergreen_compute.c b/src/gallium/drivers/r600/evergreen_compute.c
index ff51ea3..b976b61 100644
--- a/src/gallium/drivers/r600/evergreen_compute.c
+++ b/src/gallium/drivers/r600/evergreen_compute.c
@@ -41,6 +41,7 @@
#include "util/u_memory.h"
#include "util/u_inlines.h"
#include "util/u_framebuffer.h"
+#include "tgsi/tgsi_parse.h"
#include "pipebuffer/pb_buffer.h"
#include "evergreend.h"
#include "r600_shader.h"
@@ -412,7 +413,20 @@ static void *evergreen_create_compute_state(struct pipe_context *ctx,
const char *code;
void *p;
boolean use_kill;
+#endif
+
+ shader->ctx = rctx;
+ shader->local_size = cso->req_local_mem;
+ shader->private_size = cso->req_private_mem;
+ shader->input_size = cso->req_input_mem;
+
+ shader->ir_type = cso->ir_type;

+ if (shader->ir_type == PIPE_SHADER_IR_TGSI) {
+ shader->sel = r600_create_shader_state_tokens(ctx, cso->prog, PIPE_SHADER_COMPUTE);
+ return shader;
+ }
+#ifdef HAVE_OPENCL
COMPUTE_DBG(rctx->screen, "*** evergreen_create_compute_state\n");
header = cso->prog;
code = cso->prog + sizeof(struct pipe_llvm_program_header);
@@ -429,11 +443,6 @@ static void *evergreen_create_compute_state(struct pipe_context *ctx,
rctx->b.ws->buffer_unmap(shader->code_bo->buf);
#endif

- shader->ctx = rctx;
- shader->local_size = cso->req_local_mem;
- shader->private_size = cso->req_private_mem;
- shader->input_size = cso->req_input_mem;
-
return shader;
}

@@ -447,22 +456,37 @@ static void evergreen_delete_compute_state(struct pipe_context *ctx, void *state
if (!shader)
return;

+ if (shader->ir_type == PIPE_SHADER_IR_TGSI) {
+ r600_delete_shader_selector(ctx, shader->sel);
+ } else {
#ifdef HAVE_OPENCL
- radeon_shader_binary_clean(&shader->binary);
+ radeon_shader_binary_clean(&shader->binary);
#endif
- r600_destroy_shader(&shader->bc);
+ r600_destroy_shader(&shader->bc);

- /* TODO destroy shader->code_bo, shader->const_bo
- * we'll need something like r600_buffer_free */
+ /* TODO destroy shader->code_bo, shader->const_bo
+ * we'll need something like r600_buffer_free */
+ }
FREE(shader);
}

static void evergreen_bind_compute_state(struct pipe_context *ctx, void *state)
{
struct r600_context *rctx = (struct r600_context *)ctx;
-
+ struct r600_pipe_compute *cstate = (struct r600_pipe_compute *)state;
COMPUTE_DBG(rctx->screen, "*** evergreen_bind_compute_state\n");

+ if (!state) {
+ rctx->cs_shader_state.shader = (struct r600_pipe_compute *)state;
+ return;
+ }
+
+ if (cstate->ir_type == PIPE_SHADER_IR_TGSI) {
+ bool compute_dirty;
+
+ r600_shader_select(ctx, cstate->sel, &compute_dirty);
+ }
+
rctx->cs_shader_state.shader = (struct r600_pipe_compute *)state;
}

@@ -486,7 +510,7 @@ static void evergreen_compute_upload_input(struct pipe_context *ctx,
/* We need to reserve 9 dwords (36 bytes) for implicit kernel
* parameters.
*/
- unsigned input_size = shader->input_size + 36;
+ unsigned input_size;
uint32_t *num_work_groups_start;
uint32_t *global_size_start;
uint32_t *local_size_start;
@@ -494,10 +518,12 @@ static void evergreen_compute_upload_input(struct pipe_context *ctx,
struct pipe_box box;
struct pipe_transfer *transfer = NULL;

+ if (!shader)
+ return;
if (shader->input_size == 0) {
return;
}
-
+ input_size = shader->input_size + 36;
if (!shader->kernel_param) {
/* Add space for the grid dimensions */
shader->kernel_param = (struct r600_resource *)
@@ -555,9 +581,10 @@ static void evergreen_emit_dispatch(struct r600_context *rctx,
unsigned wave_divisor = (16 * num_pipes);
int group_size = 1;
int grid_size = 1;
- unsigned lds_size = shader->local_size / 4 +
- shader->bc.nlds_dw;
+ unsigned lds_size = shader->local_size / 4;

+ if (shader->ir_type != PIPE_SHADER_IR_TGSI)
+ lds_size += shader->bc.nlds_dw;

/* Calculate group_size/grid_size */
for (i = 0; i < 3; i++) {
@@ -660,12 +687,25 @@ static void compute_emit_cs(struct r600_context *rctx,
const struct pipe_grid_info *info)
{
struct radeon_winsys_cs *cs = rctx->b.gfx.cs;
+ bool compute_dirty = false;
+ struct r600_pipe_shader *current;

/* make sure that the gfx ring is only one active */
if (radeon_emitted(rctx->b.dma.cs, 0)) {
rctx->b.dma.flush(rctx, RADEON_FLUSH_ASYNC, NULL);
}

+ r600_need_cs_space(rctx, 0, true);
+ if (rctx->cs_shader_state.shader->ir_type == PIPE_SHADER_IR_TGSI) {
+ r600_shader_select(&rctx->b.b, rctx->cs_shader_state.shader->sel, &compute_dirty);
+ current = rctx->cs_shader_state.shader->sel->current;
+ if (compute_dirty) {
+ rctx->cs_shader_state.atom.num_dw = current->command_buffer.num_dw;
+ r600_context_add_resource_size(&rctx->b.b, (struct pipe_resource *)current->bo);
+ r600_set_atom_dirty(rctx, &rctx->cs_shader_state.atom, true);
+ }
+ }
+
/* Initialize all the compute-related registers.
*
* See evergreen_init_atom_start_compute_cs() in this file for the list
@@ -674,17 +714,34 @@ static void compute_emit_cs(struct r600_context *rctx,
r600_emit_command_buffer(cs, &rctx->start_compute_cs_cmd);

/* emit config state */
- if (rctx->b.chip_class == EVERGREEN)
- r600_emit_atom(rctx, &rctx->config_state.atom);
+ if (rctx->b.chip_class == EVERGREEN) {
+ if (rctx->cs_shader_state.shader->ir_type == PIPE_SHADER_IR_TGSI) {
+ radeon_set_config_reg_seq(cs, R_008C04_SQ_GPR_RESOURCE_MGMT_1, 3);
+ radeon_emit(cs, S_008C04_NUM_CLAUSE_TEMP_GPRS(rctx->r6xx_num_clause_temp_gprs));
+ radeon_emit(cs, 0);
+ radeon_emit(cs, 0);
+ radeon_set_config_reg(cs, R_008D8C_SQ_DYN_GPR_CNTL_PS_FLUSH_REQ, (1 << 8));
+ } else
+ r600_emit_atom(rctx, &rctx->config_state.atom);
+ }

rctx->b.flags |= R600_CONTEXT_WAIT_3D_IDLE | R600_CONTEXT_FLUSH_AND_INV;
r600_flush_emit(rctx);

- compute_setup_cbs(rctx);
+ if (rctx->cs_shader_state.shader->ir_type != PIPE_SHADER_IR_TGSI) {
+
+ compute_setup_cbs(rctx);

- /* Emit vertex buffer state */
- rctx->cs_vertex_buffer_state.atom.num_dw = 12 * util_bitcount(rctx->cs_vertex_buffer_state.dirty_mask);
- r600_emit_atom(rctx, &rctx->cs_vertex_buffer_state.atom);
+ /* Emit vertex buffer state */
+ rctx->cs_vertex_buffer_state.atom.num_dw = 12 * util_bitcount(rctx->cs_vertex_buffer_state.dirty_mask);
+ r600_emit_atom(rctx, &rctx->cs_vertex_buffer_state.atom);
+ } else {
+ uint32_t rat_mask;
+
+ rat_mask = ((1ULL << ((unsigned)rctx->cb_misc_state.nr_image_rats + rctx->cb_misc_state.nr_buffer_rats * 4)) - 1);
+ radeon_compute_set_context_reg(cs, R_028238_CB_TARGET_MASK,
+ rat_mask);
+ }

/* Emit constant buffer state */
r600_emit_atom(rctx, &rctx->constbuf_state[PIPE_SHADER_COMPUTE].atom);
@@ -744,10 +801,17 @@ void evergreen_emit_cs_shader(struct r600_context *rctx,
struct r600_resource *code_bo;
unsigned ngpr, nstack;

- code_bo = shader->code_bo;
- va = shader->code_bo->gpu_address + state->pc;
- ngpr = shader->bc.ngpr;
- nstack = shader->bc.nstack;
+ if (shader->ir_type == PIPE_SHADER_IR_TGSI) {
+ code_bo = shader->sel->current->bo;
+ va = shader->sel->current->bo->gpu_address;
+ ngpr = shader->sel->current->shader.bc.ngpr;
+ nstack = shader->sel->current->shader.bc.nstack;
+ } else {
+ code_bo = shader->code_bo;
+ va = shader->code_bo->gpu_address + state->pc;
+ ngpr = shader->bc.ngpr;
+ nstack = shader->bc.nstack;
+ }

radeon_compute_set_context_reg_seq(cs, R_0288D0_SQ_PGM_START_LS, 3);
radeon_emit(cs, va >> 8); /* R_0288D0_SQ_PGM_START_LS */
@@ -771,10 +835,15 @@ static void evergreen_launch_grid(struct pipe_context *ctx,
struct r600_pipe_compute *shader = rctx->cs_shader_state.shader;
boolean use_kill;

- rctx->cs_shader_state.pc = info->pc;
- /* Get the config information for this kernel. */
- r600_shader_binary_read_config(&shader->binary, &shader->bc,
- info->pc, &use_kill);
+ if (shader->ir_type != PIPE_SHADER_IR_TGSI) {
+ rctx->cs_shader_state.pc = info->pc;
+ /* Get the config information for this kernel. */
+ r600_shader_binary_read_config(&shader->binary, &shader->bc,
+ info->pc, &use_kill);
+ } else {
+ use_kill = false;
+ rctx->cs_shader_state.pc = 0;
+ }
#endif

COMPUTE_DBG(rctx->screen, "*** evergreen_launch_grid: pc = %u\n", info->pc);
diff --git a/src/gallium/drivers/r600/evergreen_compute_internal.h b/src/gallium/drivers/r600/evergreen_compute_internal.h
index 32f53ad..db3f24d 100644
--- a/src/gallium/drivers/r600/evergreen_compute_internal.h
+++ b/src/gallium/drivers/r600/evergreen_compute_internal.h
@@ -34,6 +34,12 @@ struct r600_pipe_compute {
struct r600_context *ctx;

struct ac_shader_binary binary;
+
+ enum pipe_shader_ir ir_type;
+
+ /* tgsi selector */
+ struct r600_pipe_shader_selector *sel;
+
struct r600_resource *code_bo;
struct r600_bytecode bc;
--
2.9.5
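
For anyone following the rat_mask computation in the evergreen_compute.c hunk above, here is a minimal, self-contained sketch of the idea in C. It assumes that each RAT occupies a 4-bit channel-enable field in CB_TARGET_MASK, so the mask simply turns on every field for the bound RATs. The helper name rat_mask_for and the limit of 8 targets are assumptions made for illustration and are not part of the patch. Note also that the sketch multiplies the combined image + buffer RAT count by 4, which groups the terms differently from the expression in the hunk; treat that grouping as an assumption about the intent, not as a statement of what the patch does.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper (not in the patch): build a CB_TARGET_MASK-style
 * value that enables all four channel bits for each bound RAT.
 * Assumes 4 bits per target and at most 8 targets in a 32-bit register. */
static uint32_t rat_mask_for(unsigned nr_image_rats, unsigned nr_buffer_rats)
{
    unsigned nr_rats = nr_image_rats + nr_buffer_rats;

    assert(nr_rats <= 8);

    /* Use a 64-bit intermediate so that 8 RATs (a 32-bit shift)
     * does not overflow before the subtraction. */
    return (uint32_t)((1ULL << (nr_rats * 4)) - 1);
}
```

With one image RAT and one buffer RAT this yields 0xff (two 4-bit fields enabled); with no RATs bound it yields 0.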
Dave Airlie
2017-11-29 04:36:29 UTC
From: Dave Airlie <***@redhat.com>

---
src/gallium/drivers/r600/r600_pipe.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/src/gallium/drivers/r600/r600_pipe.c b/src/gallium/drivers/r600/r600_pipe.c
index 01f9bf6..b013d69 100644
--- a/src/gallium/drivers/r600/r600_pipe.c
+++ b/src/gallium/drivers/r600/r600_pipe.c
@@ -602,7 +602,7 @@ static int r600_get_shader_param(struct pipe_screen* pscreen,
return PIPE_SHADER_IR_TGSI;
}
case PIPE_SHADER_CAP_SUPPORTED_IRS:
- return 0;
+ return (1 << PIPE_SHADER_IR_TGSI);
case PIPE_SHADER_CAP_TGSI_FMA_SUPPORTED:
if (rscreen->b.family == CHIP_ARUBA ||
rscreen->b.family == CHIP_CAYMAN ||
@@ -619,7 +619,7 @@ static int r600_get_shader_param(struct pipe_screen* pscreen,
case PIPE_SHADER_CAP_MAX_SHADER_BUFFERS:
case PIPE_SHADER_CAP_MAX_SHADER_IMAGES:
if (rscreen->b.family >= CHIP_CEDAR &&
- (shader == PIPE_SHADER_FRAGMENT))
+ (shader == PIPE_SHADER_FRAGMENT || shader == PIPE_SHADER_COMPUTE))
return 8;
return 0;
case PIPE_SHADER_CAP_MAX_HW_ATOMIC_COUNTERS:
--
2.9.5
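
The PIPE_SHADER_CAP_SUPPORTED_IRS change above switches the return value from 0 to a bitmask with one bit per supported IR. A rough sketch of how such a mask is built and tested is below. The enum shader_ir and both helper functions are stand-ins invented for this example; they only mirror the shape of gallium's enum pipe_shader_ir and are not the real definitions.

```c
#include <stdbool.h>
#include <stdint.h>

/* Stand-in for gallium's enum pipe_shader_ir; names and values here
 * are illustrative only. */
enum shader_ir {
    SHADER_IR_TGSI = 0,
    SHADER_IR_NATIVE,
    SHADER_IR_NIR
};

/* Mirrors the patched return value: one bit set per supported IR. */
static uint32_t supported_irs(void)
{
    return 1u << SHADER_IR_TGSI;
}

/* Hypothetical caller-side check: is the bit for this IR set? */
static bool ir_is_supported(uint32_t mask, enum shader_ir ir)
{
    return (mask & (1u << ir)) != 0;
}
```

With a mask of 0 (the value returned before the patch) no IR is reported at all, which is why the TGSI bit has to be set explicitly.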
Gert Wollny
2017-11-29 12:46:11 UTC
Post by Dave Airlie
This set of patches enables compute shaders on r600 and exposes GLSL
4.30 support. They are pretty alpha level, but I'd like to land some
of them (maybe disabled) so I can avoid the rebasing fun with the
more intrusive ones.
It is based on the previous ssbo support patch.
It may not be stable, I have a few patches sitting on top locally
for flushing various things I want to figure out if they are required
or if I can fix things properly.
I ran the arb_compute_shader piglits on BARTS; the piglits

basic-texelfetch
border-color
multiple-workgroups
basic-uniform-access
multiple-texture-reading
simple-barrier

result in GPU lockups and, consequently, fail. The other 20 tests pass.

Best,
Gert
Dave Airlie
2017-11-29 23:30:46 UTC
Post by Gert Wollny
Post by Dave Airlie
This set of patches enables compute shaders on r600 and exposes GLSL
4.30 support. They are pretty alpha level, but I'd like to land some
of them (maybe disabled) so I can avoid the rebasing fun with the
more intrusive ones.
It is based on the previous ssbo support patch.
It may not be stable, I have a few patches sitting on top locally
for flushing various things I want to figure out if they are required
or if I can fix things properly.
I run the arb_compute_shader piglits on BARTS, the piglits
basic-texelfetch
border-color
multiple-workgroups
basic-uniform-access
multiple-texture-reading
simple-barrier
result in GPU lockups and, consequently, fail. The other 20 tests pass.
Does the attached patch help with the lockups at all?

Dave.
Gert Wollny
2017-11-30 07:20:12 UTC
Post by Dave Airlie
Post by Gert Wollny
I run the arb_compute_shader piglits on BARTS, the piglits
   basic-texelfetch
   border-color
   multiple-workgroups
   basic-uniform-access
   multiple-texture-reading
   simple-barrier
result in GPU lockups and, consequently, fail. The other 20 tests pass.
Does the attached patch help with the lockups at all?
No, no change with the arb_compute_shader tests.

Best,
Gert
Dave Airlie
2017-11-30 07:56:45 UTC
Post by Gert Wollny
Post by Dave Airlie
Post by Gert Wollny
I run the arb_compute_shader piglits on BARTS, the piglits
basic-texelfetch
border-color
multiple-workgroups
basic-uniform-access
multiple-texture-reading
simple-barrier
result in GPU lockups and, consequently, fail. The other 20 tests pass.
Does the attached patch help with the lockups at all?
no, no changes with the arb_compute_shader tests,
Could you give
https://cgit.freedesktop.org/~airlied/mesa/log/?h=r600-wip-cs a spin?

I'm guessing the WIP hacks might fix it, but I really want to avoid flushing.
It might be necessary to re-emit a bunch of graphics state.

Dave.
Gert Wollny
2017-11-30 12:06:56 UTC
Post by Dave Airlie
Post by Gert Wollny
Post by Dave Airlie
Post by Gert Wollny
I run the arb_compute_shader piglits on BARTS, the piglits
   basic-texelfetch
   border-color
   multiple-workgroups
   basic-uniform-access
   multiple-texture-reading
   simple-barrier
result in GPU lockups and, consequently, fail. The other 20
tests
pass.
Does the attached patch help with the lockups at all?
no, no changes with the arb_compute_shader tests,
https://cgit.freedesktop.org/~airlied/mesa/log/?h=r600-wip-cs a spin?
I'm guessing WIP hacks might fix it, but I really want to avoid
flushing. it might be necessary to reemit a bunch of graphics state.
To compile this I had to switch to LLVM 3.9, because the tree doesn't
compile with LLVM >= 4.0.

Anyway, it fixes all the arb_compute_shader piglits and, by using
MESA_GL_VERSION_OVERRIDE=4.3, I am even able to walk around in a
properly rendered "Alien Isolation" (25-45 FPS, GPU: Radeon 6850 HD,
CPU: FX 6300).
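
As a small illustration of the override mentioned above, the following C sketch sets MESA_GL_VERSION_OVERRIDE in the current process's environment. The helper name set_gl_version_override is made up for this example. It also assumes the variable is set before any GL context is created, on the understanding that Mesa reads it at that point. In practice, exporting the variable in the shell before launching the program has the same effect.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200112L /* for setenv() */
#endif
#include <stdlib.h>

/* Hypothetical helper: request a GL version override for this process.
 * Assumed to be called before the GL context is created.
 * Returns 0 on success and -1 on failure, as setenv() does. */
static int set_gl_version_override(const char *version)
{
    return setenv("MESA_GL_VERSION_OVERRIDE", version, 1 /* overwrite */);
}
```

Calling set_gl_version_override("4.3") early in main() would correspond to the setting used in the test above.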

Best,
Gert
Dave Airlie
2017-12-06 23:35:36 UTC
Post by Gert Wollny
Post by Dave Airlie
Post by Gert Wollny
Post by Dave Airlie
Post by Gert Wollny
I run the arb_compute_shader piglits on BARTS, the piglits
basic-texelfetch
border-color
multiple-workgroups
basic-uniform-access
multiple-texture-reading
simple-barrier
result in GPU lockups and, consequently, fail. The other 20
tests
pass.
Does the attached patch help with the lockups at all?
no, no changes with the arb_compute_shader tests,
https://cgit.freedesktop.org/~airlied/mesa/log/?h=r600-wip-cs a spin?
I'm guessing WIP hacks might fix it, but I really want to avoid
flushing. it might be necessary to reemit a bunch of graphics state.
To compile this I had to switch to LLVM 3.9, because the tree doesn't
compile with LLVM >= 4.0.
Anyway, it fixes all the arb_compute_shader piglits and I by using
MESA_GL_VERSION_OVERRIDE=4.3 I am even able to walk around in a
properly rendered "Alien Isolation" (25-45 FPS, GPU: Radeon 6850 HD,
CPU: FX 6300).
I've pushed a bunch of the code, except for the final enables.

Care to give my r600-gl-4.3 branch a spin?

It has the workaround I think I need to stabilise stuff for now.

Dave.
Gert Wollny
2017-12-07 10:06:38 UTC
Post by Dave Airlie
I've pushed a bunch of the code, except for the final enables,
Care to give my r600-gl-4.3 branch a spin?
It shows quite a few regressions w.r.t. the WIP branch:

                                        WIP    gl-4.3

execution/atomic-counter:               pass   fail
execution/basic-global-id:              pass   fail
execution/basic-group-id:               pass   fail
execution/basic-group-id-x:             pass   fail
execution/basic-group-id-z:             pass   fail
execution/basic-local-id-atomic:        pass   fail
execution/basic-local-index:            pass   fail
execution/basic-ssbo:                   pass   fail
execution/basic-texelfetch:             pass   fail
execution/basic-uniform-access-atomic:  pass   fail
execution/border-color:                 pass   fail
execution/multiple-texture-reading:     pass   fail
execution/separate-global-id:           pass   fail
execution/separate-global-id-2:         pass   fail
execution/shared-atomics:               pass   fail
execution/simple-barrier-atomics:       pass   fail

(all other shader piglits equal)
Post by Dave Airlie
It has the workaround I think I need to stabilise stuff for now.
Best,
Gert

Dave Airlie
2017-11-30 04:26:30 UTC
Post by Dave Airlie
This set of patches enables compute shaders on r600 and exposes GLSL 4.30
support. They are pretty alpha level, but I'd like to land some of them
(maybe disabled) so I can avoid the rebasing fun with the more intrusive
ones.
It is based on the previous ssbo support patch.
It may not be stable, I have a few patches sitting on top locally
for flushing various things I want to figure out if they are required or
if I can fix things properly.
It for some reason fails to launch compute on cayman and hangs instead,
I've got some traces from fglrx, just need to take time to work out what
crashes, I've tested it on CAICOS mostly.
FYI, I got cayman shaders running today, but it appears cayman uses GDS
for atomics, not the append/consume counters; at least the current code
doesn't work, and tracing fglrx shows it is using GDS.

I've written the code but haven't debugged it into working yet.

Dave.