Discussion:
[PATCH v3 00/10] Add precise/invariant semantics to TGSI
Add Reply
Karol Herbst
2017-06-16 19:08:13 UTC
Reply
Permalink
Raw Message
Running Tomb Raider on Nouveau I found some flicker caused by ignoring precise
modifiers on variables inside Nouveau.

This series add precise/invariant handling to TGSI, which can be then used by
drivers to disable certain unsafe optimisations which may otherwise alter
calculations, which depend on having the same result across shaders.

This series fixes this bug in Tomb Raider and one CTS test for 4.4 and 4.5

No piglit regression on my nve6

Changes since v2:
* documentation for precise modifier
* disable TGSI MAD peephole for precise instructions
* drop MAD split patch in nv50ir

Karol Herbst (10):
tgsi: add precise flag to tgsi_instruction
tgsi/dump: print _PRECISE modifier on Instructions
st/glsl_to_tgsi: handle precise modifier
tgsi: populate precise
tgsi/text: parse _PRECISE modifier
gallium/docs: add precise instruction modifier
st/glsl_to_tgsi: don't optimize mul+add to mad if expression is
precise
nv50/ir: add precise field to Instruction
nv50/ir/tgsi: handle precise for most ALU instructions
nv50/ir: disable mul+add to mad for precise instructions

src/gallium/auxiliary/tgsi/tgsi_build.c | 4 +++
src/gallium/auxiliary/tgsi/tgsi_dump.c | 4 +++
src/gallium/auxiliary/tgsi/tgsi_text.c | 17 ++++++++--
src/gallium/auxiliary/tgsi/tgsi_ureg.c | 8 ++++-
src/gallium/auxiliary/tgsi/tgsi_ureg.h | 14 ++++++++-
src/gallium/auxiliary/util/u_simple_shaders.c | 2 +-
src/gallium/docs/source/tgsi.rst | 8 ++++-
src/gallium/drivers/nouveau/codegen/nv50_ir.h | 1 +
.../drivers/nouveau/codegen/nv50_ir_from_tgsi.cpp | 2 ++
.../drivers/nouveau/codegen/nv50_ir_peephole.cpp | 6 +++-
src/gallium/include/pipe/p_shader_tokens.h | 3 +-
src/gallium/state_trackers/nine/nine_shader.c | 6 ++--
src/mesa/state_tracker/st_atifs_to_tgsi.c | 36 +++++++++++-----------
src/mesa/state_tracker/st_glsl_to_tgsi.cpp | 21 ++++++++++---
src/mesa/state_tracker/st_mesa_to_tgsi.c | 6 ++--
15 files changed, 101 insertions(+), 37 deletions(-)
--
2.13.1
Karol Herbst
2017-06-16 19:08:14 UTC
Reply
Permalink
Raw Message
Signed-off-by: Karol Herbst <***@gmail.com>
Reviewed-by: Nicolai Hähnle <***@amd.com>
Reviewed-by: Roland Scheidegger <***@vmware.com>
---
src/gallium/auxiliary/tgsi/tgsi_build.c | 1 +
src/gallium/include/pipe/p_shader_tokens.h | 3 ++-
2 files changed, 3 insertions(+), 1 deletion(-)

diff --git a/src/gallium/auxiliary/tgsi/tgsi_build.c b/src/gallium/auxiliary/tgsi/tgsi_build.c
index 00843241f8..55e4d064ed 100644
--- a/src/gallium/auxiliary/tgsi/tgsi_build.c
+++ b/src/gallium/auxiliary/tgsi/tgsi_build.c
@@ -642,6 +642,7 @@ tgsi_default_instruction( void )
instruction.Label = 0;
instruction.Texture = 0;
instruction.Memory = 0;
+ instruction.Precise = 0;
instruction.Padding = 0;

return instruction;
diff --git a/src/gallium/include/pipe/p_shader_tokens.h b/src/gallium/include/pipe/p_shader_tokens.h
index 1e08d97329..aa0fb3e3b3 100644
--- a/src/gallium/include/pipe/p_shader_tokens.h
+++ b/src/gallium/include/pipe/p_shader_tokens.h
@@ -638,7 +638,8 @@ struct tgsi_instruction
unsigned Label : 1;
unsigned Texture : 1;
unsigned Memory : 1;
- unsigned Padding : 2;
+ unsigned Precise : 1;
+ unsigned Padding : 1;
};

/*
--
2.13.1
Karol Herbst
2017-06-16 19:08:16 UTC
Reply
Permalink
Raw Message
all subexpression inside an ir_assignment needs to be tagged as precise.

v2: make precise handling more global inside the visitor

Signed-off-by: Karol Herbst <***@gmail.com>
---
src/mesa/state_tracker/st_glsl_to_tgsi.cpp | 13 +++++++++++++
1 file changed, 13 insertions(+)

diff --git a/src/mesa/state_tracker/st_glsl_to_tgsi.cpp b/src/mesa/state_tracker/st_glsl_to_tgsi.cpp
index 24d417d670..ce092347c3 100644
--- a/src/mesa/state_tracker/st_glsl_to_tgsi.cpp
+++ b/src/mesa/state_tracker/st_glsl_to_tgsi.cpp
@@ -87,6 +87,13 @@ static int swizzle_for_type(const glsl_type *type, int component = 0)
return swizzle;
}

+static unsigned is_precise(const ir_variable *ir)
+{
+ if (!ir)
+ return 0;
+ return ir->data.precise || ir->data.invariant;
+}
+
/**
* This struct is a corresponding struct to TGSI ureg_src.
*/
@@ -296,6 +303,7 @@ public:
ir_instruction *ir;

unsigned op:8; /**< TGSI opcode */
+ unsigned precise:1;
unsigned saturate:1;
unsigned is_64bit_expanded:1;
unsigned sampler_base:5;
@@ -435,6 +443,7 @@ public:
bool have_fma;
bool use_shared_memory;
bool has_tex_txf_lz;
+ bool precise;

variable_storage *find_variable_storage(ir_variable *var);

@@ -691,6 +700,7 @@ glsl_to_tgsi_visitor::emit_asm(ir_instruction *ir, unsigned op,
STATIC_ASSERT(TGSI_OPCODE_LAST <= 255);

inst->op = op;
+ inst->precise = this->precise;
inst->info = tgsi_get_opcode_info(op);
inst->dst[0] = dst;
inst->dst[1] = dst1;
@@ -3147,6 +3157,8 @@ glsl_to_tgsi_visitor::visit(ir_assignment *ir)
st_dst_reg l;
st_src_reg r;

+ /* all generated instructions need to be flaged as precise */
+ this->precise = is_precise(ir->lhs->variable_referenced());
ir->rhs->accept(this);
r = this->result;

@@ -3238,6 +3250,7 @@ glsl_to_tgsi_visitor::visit(ir_assignment *ir)
} else {
emit_block_mov(ir, ir->rhs->type, &l, &r, NULL, false);
}
+ this->precise = 0;
}
--
2.13.1
Karol Herbst
2017-06-16 19:08:17 UTC
Reply
Permalink
Raw Message
Only implemented for glsl->tgsi. Other converters just set precise to 0.

v2: remove precise paramter from ureg_tex_insn and ureg_memory_insn

Signed-off-by: Karol Herbst <***@gmail.com>
---
src/gallium/auxiliary/tgsi/tgsi_build.c | 3 +++
src/gallium/auxiliary/tgsi/tgsi_ureg.c | 8 +++++-
src/gallium/auxiliary/tgsi/tgsi_ureg.h | 14 ++++++++++-
src/gallium/auxiliary/util/u_simple_shaders.c | 2 +-
src/gallium/state_trackers/nine/nine_shader.c | 6 ++---
src/mesa/state_tracker/st_atifs_to_tgsi.c | 36 +++++++++++++--------------
src/mesa/state_tracker/st_glsl_to_tgsi.cpp | 6 ++---
src/mesa/state_tracker/st_mesa_to_tgsi.c | 6 ++---
8 files changed, 51 insertions(+), 30 deletions(-)

diff --git a/src/gallium/auxiliary/tgsi/tgsi_build.c b/src/gallium/auxiliary/tgsi/tgsi_build.c
index 55e4d064ed..144a017768 100644
--- a/src/gallium/auxiliary/tgsi/tgsi_build.c
+++ b/src/gallium/auxiliary/tgsi/tgsi_build.c
@@ -651,6 +651,7 @@ tgsi_default_instruction( void )
static struct tgsi_instruction
tgsi_build_instruction(unsigned opcode,
unsigned saturate,
+ unsigned precise,
unsigned num_dst_regs,
unsigned num_src_regs,
struct tgsi_header *header)
@@ -665,6 +666,7 @@ tgsi_build_instruction(unsigned opcode,
instruction = tgsi_default_instruction();
instruction.Opcode = opcode;
instruction.Saturate = saturate;
+ instruction.Precise = precise;
instruction.NumDstRegs = num_dst_regs;
instruction.NumSrcRegs = num_src_regs;

@@ -1061,6 +1063,7 @@ tgsi_build_full_instruction(

*instruction = tgsi_build_instruction(full_inst->Instruction.Opcode,
full_inst->Instruction.Saturate,
+ full_inst->Instruction.Precise,
full_inst->Instruction.NumDstRegs,
full_inst->Instruction.NumSrcRegs,
header);
diff --git a/src/gallium/auxiliary/tgsi/tgsi_ureg.c b/src/gallium/auxiliary/tgsi/tgsi_ureg.c
index d2a0507d29..ca31bc4a75 100644
--- a/src/gallium/auxiliary/tgsi/tgsi_ureg.c
+++ b/src/gallium/auxiliary/tgsi/tgsi_ureg.c
@@ -1211,6 +1211,7 @@ struct ureg_emit_insn_result
ureg_emit_insn(struct ureg_program *ureg,
unsigned opcode,
boolean saturate,
+ unsigned precise,
unsigned num_dst,
unsigned num_src)
{
@@ -1224,6 +1225,7 @@ ureg_emit_insn(struct ureg_program *ureg,
out[0].insn = tgsi_default_instruction();
out[0].insn.Opcode = opcode;
out[0].insn.Saturate = saturate;
+ out[0].insn.Precise = precise;
out[0].insn.NumDstRegs = num_dst;
out[0].insn.NumSrcRegs = num_src;

@@ -1352,7 +1354,8 @@ ureg_insn(struct ureg_program *ureg,
const struct ureg_dst *dst,
unsigned nr_dst,
const struct ureg_src *src,
- unsigned nr_src )
+ unsigned nr_src,
+ unsigned precise )
{
struct ureg_emit_insn_result insn;
unsigned i;
@@ -1367,6 +1370,7 @@ ureg_insn(struct ureg_program *ureg,
insn = ureg_emit_insn(ureg,
opcode,
saturate,
+ precise,
nr_dst,
nr_src);

@@ -1404,6 +1408,7 @@ ureg_tex_insn(struct ureg_program *ureg,
insn = ureg_emit_insn(ureg,
opcode,
saturate,
+ 0,
nr_dst,
nr_src);

@@ -1440,6 +1445,7 @@ ureg_memory_insn(struct ureg_program *ureg,
insn = ureg_emit_insn(ureg,
opcode,
FALSE,
+ 0,
nr_dst,
nr_src);

diff --git a/src/gallium/auxiliary/tgsi/tgsi_ureg.h b/src/gallium/auxiliary/tgsi/tgsi_ureg.h
index 54f95ba565..ed8c177d51 100644
--- a/src/gallium/auxiliary/tgsi/tgsi_ureg.h
+++ b/src/gallium/auxiliary/tgsi/tgsi_ureg.h
@@ -546,7 +546,8 @@ ureg_insn(struct ureg_program *ureg,
const struct ureg_dst *dst,
unsigned nr_dst,
const struct ureg_src *src,
- unsigned nr_src );
+ unsigned nr_src,
+ unsigned precise );


void
@@ -586,6 +587,7 @@ struct ureg_emit_insn_result
ureg_emit_insn(struct ureg_program *ureg,
unsigned opcode,
boolean saturate,
+ unsigned precise,
unsigned num_dst,
unsigned num_src);

@@ -632,6 +634,7 @@ static inline void ureg_##op( struct ureg_program *ureg ) \
opcode, \
FALSE, \
0, \
+ 0, \
0); \
ureg_fixup_insn_size( ureg, insn.insn_token ); \
}
@@ -646,6 +649,7 @@ static inline void ureg_##op( struct ureg_program *ureg, \
opcode, \
FALSE, \
0, \
+ 0, \
1); \
ureg_emit_src( ureg, src ); \
ureg_fixup_insn_size( ureg, insn.insn_token ); \
@@ -661,6 +665,7 @@ static inline void ureg_##op( struct ureg_program *ureg, \
opcode, \
FALSE, \
0, \
+ 0, \
0); \
ureg_emit_label( ureg, insn.extended_token, label_token ); \
ureg_fixup_insn_size( ureg, insn.insn_token ); \
@@ -677,6 +682,7 @@ static inline void ureg_##op( struct ureg_program *ureg, \
opcode, \
FALSE, \
0, \
+ 0, \
1); \
ureg_emit_label( ureg, insn.extended_token, label_token ); \
ureg_emit_src( ureg, src ); \
@@ -694,6 +700,7 @@ static inline void ureg_##op( struct ureg_program *ureg, \
insn = ureg_emit_insn(ureg, \
opcode, \
dst.Saturate, \
+ 0, \
1, \
0); \
ureg_emit_dst( ureg, dst ); \
@@ -713,6 +720,7 @@ static inline void ureg_##op( struct ureg_program *ureg, \
insn = ureg_emit_insn(ureg, \
opcode, \
dst.Saturate, \
+ 0, \
1, \
1); \
ureg_emit_dst( ureg, dst ); \
@@ -733,6 +741,7 @@ static inline void ureg_##op( struct ureg_program *ureg, \
insn = ureg_emit_insn(ureg, \
opcode, \
dst.Saturate, \
+ 0, \
1, \
2); \
ureg_emit_dst( ureg, dst ); \
@@ -756,6 +765,7 @@ static inline void ureg_##op( struct ureg_program *ureg, \
insn = ureg_emit_insn(ureg, \
opcode, \
dst.Saturate, \
+ 0, \
1, \
2); \
ureg_emit_texture( ureg, insn.extended_token, target, \
@@ -780,6 +790,7 @@ static inline void ureg_##op( struct ureg_program *ureg, \
insn = ureg_emit_insn(ureg, \
opcode, \
dst.Saturate, \
+ 0, \
1, \
3); \
ureg_emit_dst( ureg, dst ); \
@@ -806,6 +817,7 @@ static inline void ureg_##op( struct ureg_program *ureg, \
insn = ureg_emit_insn(ureg, \
opcode, \
dst.Saturate, \
+ 0, \
1, \
4); \
ureg_emit_texture( ureg, insn.extended_token, target, \
diff --git a/src/gallium/auxiliary/util/u_simple_shaders.c b/src/gallium/auxiliary/util/u_simple_shaders.c
index 5874d0e9aa..79331b5638 100644
--- a/src/gallium/auxiliary/util/u_simple_shaders.c
+++ b/src/gallium/auxiliary/util/u_simple_shaders.c
@@ -954,7 +954,7 @@ util_make_geometry_passthrough_shader(struct pipe_context *pipe,
}

/* EMIT IMM[0] */
- ureg_insn(ureg, TGSI_OPCODE_EMIT, NULL, 0, &imm, 1);
+ ureg_insn(ureg, TGSI_OPCODE_EMIT, NULL, 0, &imm, 1, 0);

/* END */
ureg_END(ureg);
diff --git a/src/gallium/state_trackers/nine/nine_shader.c b/src/gallium/state_trackers/nine/nine_shader.c
index 40fb6be88f..f405090811 100644
--- a/src/gallium/state_trackers/nine/nine_shader.c
+++ b/src/gallium/state_trackers/nine/nine_shader.c
@@ -1879,7 +1879,7 @@ DECL_SPECIAL(IFC)
struct ureg_dst tmp = ureg_writemask(tx_scratch(tx), TGSI_WRITEMASK_X);
src[0] = tx_src_param(tx, &tx->insn.src[0]);
src[1] = tx_src_param(tx, &tx->insn.src[1]);
- ureg_insn(tx->ureg, cmp_op, &tmp, 1, src, 2);
+ ureg_insn(tx->ureg, cmp_op, &tmp, 1, src, 2, 0);
ureg_IF(tx->ureg, ureg_scalar(ureg_src(tmp), TGSI_SWIZZLE_X), tx_cond(tx));
return D3D_OK;
}
@@ -1897,7 +1897,7 @@ DECL_SPECIAL(BREAKC)
struct ureg_dst tmp = ureg_writemask(tx_scratch(tx), TGSI_WRITEMASK_X);
src[0] = tx_src_param(tx, &tx->insn.src[0]);
src[1] = tx_src_param(tx, &tx->insn.src[1]);
- ureg_insn(tx->ureg, cmp_op, &tmp, 1, src, 2);
+ ureg_insn(tx->ureg, cmp_op, &tmp, 1, src, 2, 0);
ureg_IF(tx->ureg, ureg_scalar(ureg_src(tmp), TGSI_SWIZZLE_X), tx_cond(tx));
ureg_BRK(tx->ureg);
tx_endcond(tx);
@@ -3029,7 +3029,7 @@ NineTranslateInstruction_Generic(struct shader_translator *tx)

ureg_insn(tx->ureg, tx->insn.info->opcode,
dst, tx->insn.ndst,
- src, tx->insn.nsrc);
+ src, tx->insn.nsrc, 0);
return D3D_OK;
}

diff --git a/src/mesa/state_tracker/st_atifs_to_tgsi.c b/src/mesa/state_tracker/st_atifs_to_tgsi.c
index 338ced56ed..13e013cebc 100644
--- a/src/mesa/state_tracker/st_atifs_to_tgsi.c
+++ b/src/mesa/state_tracker/st_atifs_to_tgsi.c
@@ -105,18 +105,18 @@ apply_swizzle(struct st_translate *t,
imm[0] = src;
imm[1] = ureg_imm4f(t->ureg, 1.0f, 1.0f, 0.0f, 0.0f);
imm[2] = ureg_imm4f(t->ureg, 0.0f, 0.0f, 1.0f, 1.0f);
- ureg_insn(t->ureg, TGSI_OPCODE_MAD, &tmp[0], 1, imm, 3);
+ ureg_insn(t->ureg, TGSI_OPCODE_MAD, &tmp[0], 1, imm, 3, 0);

if (swizzle == GL_SWIZZLE_STR_DR_ATI) {
imm[0] = ureg_scalar(src, TGSI_SWIZZLE_Z);
} else {
imm[0] = ureg_scalar(src, TGSI_SWIZZLE_W);
}
- ureg_insn(t->ureg, TGSI_OPCODE_RCP, &tmp[1], 1, &imm[0], 1);
+ ureg_insn(t->ureg, TGSI_OPCODE_RCP, &tmp[1], 1, &imm[0], 1, 0);

imm[0] = ureg_src(tmp[0]);
imm[1] = ureg_src(tmp[1]);
- ureg_insn(t->ureg, TGSI_OPCODE_MUL, &tmp[0], 1, imm, 2);
+ ureg_insn(t->ureg, TGSI_OPCODE_MUL, &tmp[0], 1, imm, 2, 0);

return ureg_src(tmp[0]);
}
@@ -170,35 +170,35 @@ prepare_argument(struct st_translate *t, const unsigned argId,
src = ureg_scalar(src, TGSI_SWIZZLE_W);
break;
}
- ureg_insn(t->ureg, TGSI_OPCODE_MOV, &arg, 1, &src, 1);
+ ureg_insn(t->ureg, TGSI_OPCODE_MOV, &arg, 1, &src, 1, 0);

if (srcReg->argMod & GL_COMP_BIT_ATI) {
struct ureg_src modsrc[2];
modsrc[0] = ureg_imm1f(t->ureg, 1.0f);
modsrc[1] = ureg_negate(ureg_src(arg));

- ureg_insn(t->ureg, TGSI_OPCODE_ADD, &arg, 1, modsrc, 2);
+ ureg_insn(t->ureg, TGSI_OPCODE_ADD, &arg, 1, modsrc, 2, 0);
}
if (srcReg->argMod & GL_BIAS_BIT_ATI) {
struct ureg_src modsrc[2];
modsrc[0] = ureg_src(arg);
modsrc[1] = ureg_imm1f(t->ureg, -0.5f);

- ureg_insn(t->ureg, TGSI_OPCODE_ADD, &arg, 1, modsrc, 2);
+ ureg_insn(t->ureg, TGSI_OPCODE_ADD, &arg, 1, modsrc, 2, 0);
}
if (srcReg->argMod & GL_2X_BIT_ATI) {
struct ureg_src modsrc[2];
modsrc[0] = ureg_src(arg);
modsrc[1] = ureg_src(arg);

- ureg_insn(t->ureg, TGSI_OPCODE_ADD, &arg, 1, modsrc, 2);
+ ureg_insn(t->ureg, TGSI_OPCODE_ADD, &arg, 1, modsrc, 2, 0);
}
if (srcReg->argMod & GL_NEGATE_BIT_ATI) {
struct ureg_src modsrc[2];
modsrc[0] = ureg_src(arg);
modsrc[1] = ureg_imm1f(t->ureg, -1.0f);

- ureg_insn(t->ureg, TGSI_OPCODE_MUL, &arg, 1, modsrc, 2);
+ ureg_insn(t->ureg, TGSI_OPCODE_MUL, &arg, 1, modsrc, 2, 0);
}
return ureg_src(arg);
}
@@ -217,25 +217,25 @@ emit_special_inst(struct st_translate *t, const struct instruction_desc *desc,
tmp[0] = get_temp(t, MAX_NUM_FRAGMENT_REGISTERS_ATI + 2); /* re-purpose a3 */
src[0] = ureg_imm1f(t->ureg, 0.5f);
src[1] = ureg_negate(args[2]);
- ureg_insn(t->ureg, TGSI_OPCODE_ADD, tmp, 1, src, 2);
+ ureg_insn(t->ureg, TGSI_OPCODE_ADD, tmp, 1, src, 2, 0);
src[0] = ureg_src(tmp[0]);
src[1] = args[0];
src[2] = args[1];
- ureg_insn(t->ureg, TGSI_OPCODE_CMP, dst, 1, src, 3);
+ ureg_insn(t->ureg, TGSI_OPCODE_CMP, dst, 1, src, 3, 0);
} else if (!strcmp(desc->name, "CND0")) {
src[0] = args[2];
src[1] = args[1];
src[2] = args[0];
- ureg_insn(t->ureg, TGSI_OPCODE_CMP, dst, 1, src, 3);
+ ureg_insn(t->ureg, TGSI_OPCODE_CMP, dst, 1, src, 3, 0);
} else if (!strcmp(desc->name, "DOT2_ADD")) {
/* note: DP2A is not implemented in most pipe drivers */
tmp[0] = get_temp(t, MAX_NUM_FRAGMENT_REGISTERS_ATI); /* re-purpose a1 */
src[0] = args[0];
src[1] = args[1];
- ureg_insn(t->ureg, TGSI_OPCODE_DP2, tmp, 1, src, 2);
+ ureg_insn(t->ureg, TGSI_OPCODE_DP2, tmp, 1, src, 2, 0);
src[0] = ureg_src(tmp[0]);
src[1] = ureg_scalar(args[2], TGSI_SWIZZLE_Z);
- ureg_insn(t->ureg, TGSI_OPCODE_ADD, dst, 1, src, 2);
+ ureg_insn(t->ureg, TGSI_OPCODE_ADD, dst, 1, src, 2, 0);
}
}

@@ -249,7 +249,7 @@ emit_arith_inst(struct st_translate *t,
return;
}

- ureg_insn(t->ureg, desc->TGSI_opcode, dst, 1, args, argcount);
+ ureg_insn(t->ureg, desc->TGSI_opcode, dst, 1, args, argcount, 0);
}

static void
@@ -292,7 +292,7 @@ emit_dstmod(struct st_translate *t,
if (dstMod & GL_SATURATE_BIT_ATI) {
dst = ureg_saturate(dst);
}
- ureg_insn(t->ureg, TGSI_OPCODE_MUL, &dst, 1, src, 2);
+ ureg_insn(t->ureg, TGSI_OPCODE_MUL, &dst, 1, src, 2, 0);
}

/**
@@ -336,7 +336,7 @@ compile_setupinst(struct st_translate *t,
ureg_tex_insn(t->ureg, TGSI_OPCODE_TEX, dst, 1, TGSI_TEXTURE_2D,
TGSI_RETURN_TYPE_FLOAT, NULL, 0, src, 2);
} else if (texinst->Opcode == ATI_FRAGMENT_SHADER_PASS_OP) {
- ureg_insn(t->ureg, TGSI_OPCODE_MOV, dst, 1, src, 1);
+ ureg_insn(t->ureg, TGSI_OPCODE_MOV, dst, 1, src, 1, 0);
}

t->regs_written[t->current_pass][r] = true;
@@ -408,11 +408,11 @@ finalize_shader(struct st_translate *t, unsigned numPasses)
/* copy the result into the OUT slot */
dst[0] = t->outputs[t->outputMapping[FRAG_RESULT_COLOR]];
src[0] = ureg_src(t->temps[0]);
- ureg_insn(t->ureg, TGSI_OPCODE_MOV, dst, 1, src, 1);
+ ureg_insn(t->ureg, TGSI_OPCODE_MOV, dst, 1, src, 1, 0);
}

/* signal the end of the program */
- ureg_insn(t->ureg, TGSI_OPCODE_END, dst, 0, src, 0);
+ ureg_insn(t->ureg, TGSI_OPCODE_END, dst, 0, src, 0, 0);
}

/**
diff --git a/src/mesa/state_tracker/st_glsl_to_tgsi.cpp b/src/mesa/state_tracker/st_glsl_to_tgsi.cpp
index ce092347c3..6cc5a39510 100644
--- a/src/mesa/state_tracker/st_glsl_to_tgsi.cpp
+++ b/src/mesa/state_tracker/st_glsl_to_tgsi.cpp
@@ -5959,7 +5959,7 @@ compile_tgsi_instruction(struct st_translate *t,
case TGSI_OPCODE_IF:
case TGSI_OPCODE_UIF:
assert(num_dst == 0);
- ureg_insn(ureg, inst->op, NULL, 0, src, num_src);
+ ureg_insn(ureg, inst->op, NULL, 0, src, num_src, inst->precise);
return;

case TGSI_OPCODE_TEX:
@@ -6063,14 +6063,14 @@ compile_tgsi_instruction(struct st_translate *t,

case TGSI_OPCODE_SCS:
dst[0] = ureg_writemask(dst[0], TGSI_WRITEMASK_XY);
- ureg_insn(ureg, inst->op, dst, num_dst, src, num_src);
+ ureg_insn(ureg, inst->op, dst, num_dst, src, num_src, inst->precise);
break;

default:
ureg_insn(ureg,
inst->op,
dst, num_dst,
- src, num_src);
+ src, num_src, inst->precise);
break;
}
}
diff --git a/src/mesa/state_tracker/st_mesa_to_tgsi.c b/src/mesa/state_tracker/st_mesa_to_tgsi.c
index 984ff92130..f6eb5effc6 100644
--- a/src/mesa/state_tracker/st_mesa_to_tgsi.c
+++ b/src/mesa/state_tracker/st_mesa_to_tgsi.c
@@ -566,7 +566,7 @@ compile_instruction(
ureg_insn( ureg,
translate_opcode( inst->Opcode ),
dst, num_dst,
- src, num_src );
+ src, num_src, 0 );
break;

case OPCODE_XPD:
@@ -574,7 +574,7 @@ compile_instruction(
ureg_insn( ureg,
translate_opcode( inst->Opcode ),
dst, num_dst,
- src, num_src );
+ src, num_src, 0 );
break;

case OPCODE_RSQ:
@@ -593,7 +593,7 @@ compile_instruction(
ureg_insn( ureg,
translate_opcode( inst->Opcode ),
dst, num_dst,
- src, num_src );
+ src, num_src, 0);
break;
}
}
--
2.13.1
Karol Herbst
2017-06-16 19:08:19 UTC
Reply
Permalink
Raw Message
Signed-off-by: Karol Herbst <***@gmail.com>
---
src/gallium/docs/source/tgsi.rst | 8 +++++++-
1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/src/gallium/docs/source/tgsi.rst b/src/gallium/docs/source/tgsi.rst
index c65d721dec..76c82b3e88 100644
--- a/src/gallium/docs/source/tgsi.rst
+++ b/src/gallium/docs/source/tgsi.rst
@@ -26,7 +26,13 @@ each of the components of *dst*. When this happens, the result is said to be
Modifiers
^^^^^^^^^^^^^^^

-TGSI supports modifiers on inputs (as well as saturate modifier on instructions).
+TGSI supports modifiers on inputs (as well as saturate and precise modifier
+on instructions).
+
+For arithmetic instruction having a precise modifier certain optimizations
+which may alter the result are disallowed. Example: *add(mul(a,b),c)* can't be
+optimized to TGSI_OPCODE_MAD, because some hardware only supports the fused
+MAD instruction.

For inputs which have a floating point type, both absolute value and
negation modifiers are supported (with absolute value being applied
--
2.13.1
Roland Scheidegger
2017-06-16 21:45:34 UTC
Reply
Permalink
Raw Message
While you're at it, you could also add something to TGSI_OPCODE_MAD
itself, that it can be either fused or unfused, whatever is fastest. But
either way,
Post by Karol Herbst
---
src/gallium/docs/source/tgsi.rst | 8 +++++++-
1 file changed, 7 insertions(+), 1 deletion(-)
diff --git a/src/gallium/docs/source/tgsi.rst b/src/gallium/docs/source/tgsi.rst
index c65d721dec..76c82b3e88 100644
--- a/src/gallium/docs/source/tgsi.rst
+++ b/src/gallium/docs/source/tgsi.rst
@@ -26,7 +26,13 @@ each of the components of *dst*. When this happens, the result is said to be
Modifiers
^^^^^^^^^^^^^^^
-TGSI supports modifiers on inputs (as well as saturate modifier on instructions).
+TGSI supports modifiers on inputs (as well as saturate and precise modifier
+on instructions).
+
+For arithmetic instruction having a precise modifier certain optimizations
+which may alter the result are disallowed. Example: *add(mul(a,b),c)* can't be
+optimized to TGSI_OPCODE_MAD, because some hardware only supports the fused
+MAD instruction.
For inputs which have a floating point type, both absolute value and
negation modifiers are supported (with absolute value being applied
Karol Herbst
2017-06-16 19:08:18 UTC
Reply
Permalink
Raw Message
v2: use str_match_no_case to fix _SAT_PRECISE detection

Signed-off-by: Karol Herbst <***@gmail.com>
---
src/gallium/auxiliary/tgsi/tgsi_text.c | 17 ++++++++++++++---
1 file changed, 14 insertions(+), 3 deletions(-)

diff --git a/src/gallium/auxiliary/tgsi/tgsi_text.c b/src/gallium/auxiliary/tgsi/tgsi_text.c
index 93a05568f4..70ec0e4bc9 100644
--- a/src/gallium/auxiliary/tgsi/tgsi_text.c
+++ b/src/gallium/auxiliary/tgsi/tgsi_text.c
@@ -999,6 +999,7 @@ parse_texoffset_operand(
static boolean
match_inst(const char **pcur,
unsigned *saturate,
+ unsigned *precise,
const struct tgsi_opcode_info *info)
{
const char *cur = *pcur;
@@ -1007,16 +1008,24 @@ match_inst(const char **pcur,
if (str_match_nocase_whole(&cur, info->mnemonic)) {
*pcur = cur;
*saturate = 0;
+ *precise = 0;
return TRUE;
}

if (str_match_no_case(&cur, info->mnemonic)) {
/* the instruction has a suffix, figure it out */
- if (str_match_nocase_whole(&cur, "_SAT")) {
+ if (str_match_no_case(&cur, "_SAT")) {
*pcur = cur;
*saturate = 1;
- return TRUE;
}
+
+ if (str_match_no_case(&cur, "_PRECISE")) {
+ *pcur = cur;
+ *precise = 1;
+ }
+
+ if (*precise || *saturate)
+ return TRUE;
}

return FALSE;
@@ -1029,6 +1038,7 @@ parse_instruction(
{
uint i;
uint saturate = 0;
+ uint precise = 0;
const struct tgsi_opcode_info *info;
struct tgsi_full_instruction inst;
const char *cur;
@@ -1043,7 +1053,7 @@ parse_instruction(
cur = ctx->cur;

info = tgsi_get_opcode_info( i );
- if (match_inst(&cur, &saturate, info)) {
+ if (match_inst(&cur, &saturate, &precise, info)) {
if (info->num_dst + info->num_src + info->is_tex == 0) {
ctx->cur = cur;
break;
@@ -1064,6 +1074,7 @@ parse_instruction(

inst.Instruction.Opcode = i;
inst.Instruction.Saturate = saturate;
+ inst.Instruction.Precise = precise;
inst.Instruction.NumDstRegs = info->num_dst;
inst.Instruction.NumSrcRegs = info->num_src;
--
2.13.1
Nicolai Hähnle
2017-06-20 10:22:18 UTC
Reply
Permalink
Raw Message
Post by Karol Herbst
v2: use str_match_no_case to fix _SAT_PRECISE detection
---
src/gallium/auxiliary/tgsi/tgsi_text.c | 17 ++++++++++++++---
1 file changed, 14 insertions(+), 3 deletions(-)
diff --git a/src/gallium/auxiliary/tgsi/tgsi_text.c b/src/gallium/auxiliary/tgsi/tgsi_text.c
index 93a05568f4..70ec0e4bc9 100644
--- a/src/gallium/auxiliary/tgsi/tgsi_text.c
+++ b/src/gallium/auxiliary/tgsi/tgsi_text.c
@@ -999,6 +999,7 @@ parse_texoffset_operand(
static boolean
match_inst(const char **pcur,
unsigned *saturate,
+ unsigned *precise,
const struct tgsi_opcode_info *info)
{
const char *cur = *pcur;
@@ -1007,16 +1008,24 @@ match_inst(const char **pcur,
if (str_match_nocase_whole(&cur, info->mnemonic)) {
*pcur = cur;
*saturate = 0;
+ *precise = 0;
return TRUE;
}
if (str_match_no_case(&cur, info->mnemonic)) {
/* the instruction has a suffix, figure it out */
- if (str_match_nocase_whole(&cur, "_SAT")) {
+ if (str_match_no_case(&cur, "_SAT")) {
*pcur = cur;
*saturate = 1;
- return TRUE;
}
+
+ if (str_match_no_case(&cur, "_PRECISE")) {
+ *pcur = cur;
+ *precise = 1;
+ }
+
+ if (*precise || *saturate)
+ return TRUE;
I think this should instead be

if (!is_digit_alpha_underscore(cur))
return TRUE;

to mirror what str_match_nocase_whole does. Otherwise, we wouldn't flag
garbage after the suffixes.
Post by Karol Herbst
}
return FALSE;
@@ -1029,6 +1038,7 @@ parse_instruction(
{
uint i;
uint saturate = 0;
+ uint precise = 0;
const struct tgsi_opcode_info *info;
struct tgsi_full_instruction inst;
const char *cur;
@@ -1043,7 +1053,7 @@ parse_instruction(
cur = ctx->cur;
info = tgsi_get_opcode_info( i );
- if (match_inst(&cur, &saturate, info)) {
+ if (match_inst(&cur, &saturate, &precise, info)) {
if (info->num_dst + info->num_src + info->is_tex == 0) {
ctx->cur = cur;
break;
@@ -1064,6 +1074,7 @@ parse_instruction(
inst.Instruction.Opcode = i;
inst.Instruction.Saturate = saturate;
+ inst.Instruction.Precise = precise;
inst.Instruction.NumDstRegs = info->num_dst;
inst.Instruction.NumSrcRegs = info->num_src;
--
Lerne, wie die Welt wirklich ist,
Aber vergiss niemals, wie sie sein sollte.
Karol Herbst
2017-06-20 11:00:25 UTC
Reply
Permalink
Raw Message
Post by Nicolai Hähnle
Post by Karol Herbst
v2: use str_match_no_case to fix _SAT_PRECISE detection
---
src/gallium/auxiliary/tgsi/tgsi_text.c | 17 ++++++++++++++---
1 file changed, 14 insertions(+), 3 deletions(-)
diff --git a/src/gallium/auxiliary/tgsi/tgsi_text.c
b/src/gallium/auxiliary/tgsi/tgsi_text.c
index 93a05568f4..70ec0e4bc9 100644
--- a/src/gallium/auxiliary/tgsi/tgsi_text.c
+++ b/src/gallium/auxiliary/tgsi/tgsi_text.c
@@ -999,6 +999,7 @@ parse_texoffset_operand(
static boolean
match_inst(const char **pcur,
unsigned *saturate,
+ unsigned *precise,
const struct tgsi_opcode_info *info)
{
const char *cur = *pcur;
@@ -1007,16 +1008,24 @@ match_inst(const char **pcur,
if (str_match_nocase_whole(&cur, info->mnemonic)) {
*pcur = cur;
*saturate = 0;
+ *precise = 0;
return TRUE;
}
if (str_match_no_case(&cur, info->mnemonic)) {
/* the instruction has a suffix, figure it out */
- if (str_match_nocase_whole(&cur, "_SAT")) {
+ if (str_match_no_case(&cur, "_SAT")) {
*pcur = cur;
*saturate = 1;
- return TRUE;
}
+
+ if (str_match_no_case(&cur, "_PRECISE")) {
+ *pcur = cur;
+ *precise = 1;
+ }
+
+ if (*precise || *saturate)
+ return TRUE;
I think this should instead be
if (!is_digit_alpha_underscore(cur))
return TRUE;
to mirror what str_match_nocase_whole does. Otherwise, we wouldn't flag
garbage after the suffixes.
actually there is a check somewhere that after the instructions there
has to be a whitespace, but I didn't investigate it. But I'll see if I
can change it in a way so that we don't depend on later checks.
thanks for reviewing!
Post by Nicolai Hähnle
Post by Karol Herbst
}
return FALSE;
@@ -1029,6 +1038,7 @@ parse_instruction(
{
uint i;
uint saturate = 0;
+ uint precise = 0;
const struct tgsi_opcode_info *info;
struct tgsi_full_instruction inst;
const char *cur;
@@ -1043,7 +1053,7 @@ parse_instruction(
cur = ctx->cur;
info = tgsi_get_opcode_info( i );
- if (match_inst(&cur, &saturate, info)) {
+ if (match_inst(&cur, &saturate, &precise, info)) {
if (info->num_dst + info->num_src + info->is_tex == 0) {
ctx->cur = cur;
break;
@@ -1064,6 +1074,7 @@ parse_instruction(
inst.Instruction.Opcode = i;
inst.Instruction.Saturate = saturate;
+ inst.Instruction.Precise = precise;
inst.Instruction.NumDstRegs = info->num_dst;
inst.Instruction.NumSrcRegs = info->num_src;
--
Lerne, wie die Welt wirklich ist,
Aber vergiss niemals, wie sie sein sollte.
Karol Herbst
2017-06-16 19:08:22 UTC
Reply
Permalink
Raw Message
Signed-off-by: Karol Herbst <***@gmail.com>
---
src/gallium/drivers/nouveau/codegen/nv50_ir_from_tgsi.cpp | 2 ++
1 file changed, 2 insertions(+)

diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir_from_tgsi.cpp b/src/gallium/drivers/nouveau/codegen/nv50_ir_from_tgsi.cpp
index 1264dd4834..c633185893 100644
--- a/src/gallium/drivers/nouveau/codegen/nv50_ir_from_tgsi.cpp
+++ b/src/gallium/drivers/nouveau/codegen/nv50_ir_from_tgsi.cpp
@@ -3179,6 +3179,7 @@ Converter::handleInstruction(const struct tgsi_full_instruction *insn)
geni->subOp = tgsi::opcodeToSubOp(tgsi.getOpcode());
if (op == OP_MUL && dstTy == TYPE_F32)
geni->dnz = info->io.mul_zero_wins;
+ geni->precise = insn->Instruction.Precise;
}
break;
case TGSI_OPCODE_MAD:
@@ -3192,6 +3193,7 @@ Converter::handleInstruction(const struct tgsi_full_instruction *insn)
geni = mkOp3(op, dstTy, dst0[c], src0, src1, src2);
if (dstTy == TYPE_F32)
geni->dnz = info->io.mul_zero_wins;
+ geni->precise = insn->Instruction.Precise;
}
break;
case TGSI_OPCODE_MOV:
--
2.13.1
Karol Herbst
2017-06-16 19:08:23 UTC
Reply
Permalink
Raw Message
fixes
missrendering in TombRaider
KHR-GL44.gpu_shader5.precise_qualifier
KHR-GL45.gpu_shader5.precise_qualifier

Signed-off-by: Karol Herbst <***@gmail.com>
---
src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp | 6 +++++-
1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp b/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp
index 4c92a1efb5..85f3f44832 100644
--- a/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp
+++ b/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp
@@ -1669,6 +1669,10 @@ AlgebraicOpt::handleABS(Instruction *abs)
bool
AlgebraicOpt::handleADD(Instruction *add)
{
+ // we can't optimize to SAD/MAD if the instruction is tagged as precise
+ if (add->precise)
+ return false;
+
Value *src0 = add->getSrc(0);
Value *src1 = add->getSrc(1);

@@ -1712,7 +1716,7 @@ AlgebraicOpt::tryADDToMADOrSAD(Instruction *add, operation toOp)
return false;

if (src->getInsn()->saturate || src->getInsn()->postFactor ||
- src->getInsn()->dnz)
+ src->getInsn()->dnz || src->getInsn()->precise)
return false;

if (toOp == OP_SAD) {
--
2.13.1
Karol Herbst
2017-06-16 19:08:20 UTC
Reply
Permalink
Raw Message
Signed-off-by: Karol Herbst <***@gmail.com>
---
src/mesa/state_tracker/st_glsl_to_tgsi.cpp | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/mesa/state_tracker/st_glsl_to_tgsi.cpp b/src/mesa/state_tracker/st_glsl_to_tgsi.cpp
index 6cc5a39510..6ac267be94 100644
--- a/src/mesa/state_tracker/st_glsl_to_tgsi.cpp
+++ b/src/mesa/state_tracker/st_glsl_to_tgsi.cpp
@@ -1557,7 +1557,7 @@ glsl_to_tgsi_visitor::visit(ir_expression *ir)

/* Quick peephole: Emit MAD(a, b, c) instead of ADD(MUL(a, b), c)
*/
- if (ir->operation == ir_binop_add) {
+ if (!this->precise && ir->operation == ir_binop_add) {
if (try_emit_mad(ir, 1))
return;
if (try_emit_mad(ir, 0))
--
2.13.1
Nicolai Hähnle
2017-06-20 10:23:09 UTC
Reply
Permalink
Raw Message
Post by Karol Herbst
---
src/mesa/state_tracker/st_glsl_to_tgsi.cpp | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/src/mesa/state_tracker/st_glsl_to_tgsi.cpp b/src/mesa/state_tracker/st_glsl_to_tgsi.cpp
index 6cc5a39510..6ac267be94 100644
--- a/src/mesa/state_tracker/st_glsl_to_tgsi.cpp
+++ b/src/mesa/state_tracker/st_glsl_to_tgsi.cpp
@@ -1557,7 +1557,7 @@ glsl_to_tgsi_visitor::visit(ir_expression *ir)
/* Quick peephole: Emit MAD(a, b, c) instead of ADD(MUL(a, b), c)
*/
- if (ir->operation == ir_binop_add) {
+ if (!this->precise && ir->operation == ir_binop_add) {
if (try_emit_mad(ir, 1))
return;
if (try_emit_mad(ir, 0))
--
Lerne, wie die Welt wirklich ist,
Aber vergiss niemals, wie sie sein sollte.
Karol Herbst
2017-06-16 19:08:21 UTC
Reply
Permalink
Raw Message
Signed-off-by: Karol Herbst <***@gmail.com>
---
src/gallium/drivers/nouveau/codegen/nv50_ir.h | 1 +
1 file changed, 1 insertion(+)

diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir.h b/src/gallium/drivers/nouveau/codegen/nv50_ir.h
index 5c09fed05c..6835c4fa8c 100644
--- a/src/gallium/drivers/nouveau/codegen/nv50_ir.h
+++ b/src/gallium/drivers/nouveau/codegen/nv50_ir.h
@@ -884,6 +884,7 @@ public:
unsigned perPatch : 1;
unsigned exit : 1; // terminate program after insn
unsigned mask : 4; // for vector ops
+ unsigned precise : 1; // prevent algebraic optimisations like mul+add to mad

int8_t postFactor; // MUL/DIV(if < 0) by 1 << postFactor
--
2.13.1
Karol Herbst
2017-06-16 19:08:15 UTC
Reply
Permalink
Raw Message
Signed-off-by: Karol Herbst <***@gmail.com>
Reviewed-by: Nicolai Hähnle <***@amd.com>
---
src/gallium/auxiliary/tgsi/tgsi_dump.c | 4 ++++
1 file changed, 4 insertions(+)

diff --git a/src/gallium/auxiliary/tgsi/tgsi_dump.c b/src/gallium/auxiliary/tgsi/tgsi_dump.c
index f6eba7424b..b58e64511c 100644
--- a/src/gallium/auxiliary/tgsi/tgsi_dump.c
+++ b/src/gallium/auxiliary/tgsi/tgsi_dump.c
@@ -584,6 +584,10 @@ iter_instruction(
TXT( "_SAT" );
}

+ if (inst->Instruction.Precise) {
+ TXT( "_PRECISE" );
+ }
+
for (i = 0; i < inst->Instruction.NumDstRegs; i++) {
const struct tgsi_full_dst_register *dst = &inst->Dst[i];
--
2.13.1
Loading...