[Mesa-dev] [PATCH 0/3] [RFC] mesa/st: glsl_to_tgsi: improved temp-reg lifetime estimation

Gert Wollny

2017-06-09 23:15:05 UTC

Dear all,

as I wrote before, I have been looking into temporary register renaming.

This series of patches implements a new approach that achieves a tighter
estimate of the lifetime of the temporaries. As a result, the Piano
and Volplosion benchmarks from GpuTest [1] now work. Before, they
failed with "r600_pipe_shader_create - translation from TGSI failed!"

Piglit shows 7 fixes and 6 regressions compared to git 8fac894f, but they don't
seem to be related to shaders. I've also tested other programs, such as the
unigine-* benchmarks, and they didn't show regressions.

I think the patch will need a few more iterations to remove code duplication
and to adhere to the Mesa coding style, but it is at the point where I could
use some feedback to get it into acceptable shape. I'd also like to
mention that, since I'm new to Mesa, I have no commit rights.

many thanks,
Gert

[1] http://www.geeks3d.com/gputest/
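
To illustrate why a tighter lifetime estimate helps the renaming stage, here
is a minimal sketch. It is not code from this series: assign_slots(),
slots_used() and the (first use, last use) interval representation are
assumptions made for the example. It maps temporaries to slots greedily so
that two temporaries share a slot only if their intervals do not overlap.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical helper, not part of the patch series. Each entry of
// `lifetimes` is a (first_use, last_use) instruction interval for one
// temporary. The result maps every temporary to a slot index.
std::vector<int>
assign_slots(const std::vector<std::pair<int, int>>& lifetimes)
{
   // Visit the temporaries in order of their first use.
   std::vector<std::size_t> order(lifetimes.size());
   for (std::size_t i = 0; i < order.size(); ++i)
      order[i] = i;
   std::stable_sort(order.begin(), order.end(),
                    [&lifetimes](std::size_t a, std::size_t b) {
                       return lifetimes[a].first < lifetimes[b].first;
                    });

   std::vector<int> slot(lifetimes.size(), -1);
   std::vector<int> busy_until;   // last use of the current owner of each slot

   for (std::size_t t : order) {
      assert(lifetimes[t].first <= lifetimes[t].second);

      // Take the first slot whose owner is dead before this temporary
      // starts. Intervals that merely touch count as overlapping, which
      // is the conservative choice.
      std::size_t s = 0;
      while (s < busy_until.size() && busy_until[s] >= lifetimes[t].first)
         ++s;

      if (s == busy_until.size())
         busy_until.push_back(lifetimes[t].second);   // open a new slot
      else
         busy_until[s] = lifetimes[t].second;         // reuse a free slot

      slot[t] = static_cast<int>(s);
   }
   return slot;
}

// Number of distinct slots referenced by a slot mapping.
int
slots_used(const std::vector<int>& slot)
{
   int n = 0;
   for (int s : slot)
      n = std::max(n, s + 1);
   return n;
}
```

With the four intervals (0,3), (1,2), (4,6) and (3,5) this mapping needs two
slots. If a looser estimate widened all four to (0,6), it would need four.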

Gert Wollny (3):
mesa/st: glsl_to_tgsi move some helper classes to extra files
mesa/st: glsl_to_tgsi Implement a new lifetime tracker for temporaries
mesa/st: glsl_to_tgsi: tie in the new register renaming approach

configure.ac | 1 +
src/mesa/Makefile.am | 4 +-
src/mesa/Makefile.sources | 4 +
src/mesa/state_tracker/st_glsl_to_tgsi.cpp | 302 +-------
src/mesa/state_tracker/st_glsl_to_tgsi_private.cpp | 241 +++++++
src/mesa/state_tracker/st_glsl_to_tgsi_private.h | 135 ++++
.../state_tracker/st_glsl_to_tgsi_temprename.cpp | 551 ++++++++++++++
.../state_tracker/st_glsl_to_tgsi_temprename.h | 114 +++
src/mesa/state_tracker/tests/Makefile.am | 40 ++
src/mesa/state_tracker/tests/st-renumerate-test | 210 ++++++
.../tests/test_glsl_to_tgsi_lifetime.cpp | 789 +++++++++++++++++++++
11 files changed, 2104 insertions(+), 287 deletions(-)
create mode 100644 src/mesa/state_tracker/st_glsl_to_tgsi_private.cpp
create mode 100644 src/mesa/state_tracker/st_glsl_to_tgsi_private.h
create mode 100644 src/mesa/state_tracker/st_glsl_to_tgsi_temprename.cpp
create mode 100644 src/mesa/state_tracker/st_glsl_to_tgsi_temprename.h
create mode 100644 src/mesa/state_tracker/tests/Makefile.am
create mode 100755 src/mesa/state_tracker/tests/st-renumerate-test
create mode 100644 src/mesa/state_tracker/tests/test_glsl_to_tgsi_lifetime.cpp

--
2.13.0

Gert Wollny

2017-06-09 23:15:06 UTC

To prepare the implementation of a temp register lifetime tracker,
some of the classes are moved into separate header/implementation
files to make them accessible from other files.
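
As an illustration of the split, here is a minimal single-file sketch. The
names src_reg and dst_reg and both constants are hypothetical stand-ins, not
the classes touched by this patch. Two register classes that convert into each
other need a forward declaration plus out-of-line constructor definitions,
because neither class is complete while the other one is being declared.

```cpp
// Part 1: what would go into the shared header (all names hypothetical).

class dst_reg;   // forward declaration, so src_reg can mention dst_reg

// Placeholder values chosen for this example only.
const unsigned example_identity_swizzle = 0x688;
const unsigned example_write_all = 0xF;

class src_reg {
public:
   src_reg();
   explicit src_reg(const dst_reg& reg);   // only declared here

   int index;
   unsigned swizzle;
};

class dst_reg {
public:
   dst_reg();
   explicit dst_reg(const src_reg& reg);

   int index;
   unsigned writemask;
};

// Part 2: what would go into the matching implementation file. Both
// classes are complete at this point, so the converting constructors
// can read each other's members.

src_reg::src_reg() : index(0), swizzle(0) {}

src_reg::src_reg(const dst_reg& reg)
   : index(reg.index), swizzle(example_identity_swizzle) {}

dst_reg::dst_reg() : index(0), writemask(0) {}

dst_reg::dst_reg(const src_reg& reg)
   : index(reg.index), writemask(example_write_all) {}
```

Any other translation unit that includes the header part can then construct
and convert these registers without seeing the constructor bodies.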
---
src/mesa/Makefile.sources | 2 +
src/mesa/state_tracker/st_glsl_to_tgsi.cpp | 287 +--------------------
src/mesa/state_tracker/st_glsl_to_tgsi_private.cpp | 241 +++++++++++++++++
src/mesa/state_tracker/st_glsl_to_tgsi_private.h | 135 ++++++++++
4 files changed, 381 insertions(+), 284 deletions(-)
create mode 100644 src/mesa/state_tracker/st_glsl_to_tgsi_private.cpp
create mode 100644 src/mesa/state_tracker/st_glsl_to_tgsi_private.h

diff --git a/src/mesa/Makefile.sources b/src/mesa/Makefile.sources
index 8a65fbe663..4450d80090 100644
--- a/src/mesa/Makefile.sources
+++ b/src/mesa/Makefile.sources
@@ -505,6 +505,8 @@ STATETRACKER_FILES = \
state_tracker/st_glsl_to_nir.cpp \
state_tracker/st_glsl_to_tgsi.cpp \
state_tracker/st_glsl_to_tgsi.h \
+ state_tracker/st_glsl_to_tgsi_private.cpp \
+ state_tracker/st_glsl_to_tgsi_private.h \
state_tracker/st_glsl_types.cpp \
state_tracker/st_glsl_types.h \
state_tracker/st_manager.c \
diff --git a/src/mesa/state_tracker/st_glsl_to_tgsi.cpp b/src/mesa/state_tracker/st_glsl_to_tgsi.cpp
index c5d2e0fcd2..0e7f4b646a 100644
--- a/src/mesa/state_tracker/st_glsl_to_tgsi.cpp
+++ b/src/mesa/state_tracker/st_glsl_to_tgsi.cpp
@@ -55,6 +55,7 @@
#include "st_glsl_types.h"
#include "st_nir.h"
#include "st_shader_cache.h"
+#include "st_glsl_to_tgsi_private.h"

#include "util/hash_table.h"
#include <algorithm>
@@ -65,251 +66,8 @@

#define MAX_GLSL_TEXTURE_OFFSET 4

-class st_src_reg;
-class st_dst_reg;
+extern int swizzle_for_size(int size);

-static int swizzle_for_size(int size);
-
-static int swizzle_for_type(const glsl_type *type, int component = 0)
-{
- unsigned num_elements = 4;
-
- if (type) {
- type = type->without_array();
- if (type->is_scalar() || type->is_vector() || type->is_matrix())
- num_elements = type->vector_elements;
- }
-
- int swizzle = swizzle_for_size(num_elements);
- assert(num_elements + component <= 4);
-
- swizzle += component * MAKE_SWIZZLE4(1, 1, 1, 1);
- return swizzle;
-}
-
-/**
- * This struct is a corresponding struct to TGSI ureg_src.
- */
-class st_src_reg {
-public:
- st_src_reg(gl_register_file file, int index, const glsl_type *type,
- int component = 0, unsigned array_id = 0)
- {
- assert(file != PROGRAM_ARRAY || array_id != 0);
- this->file = file;
- this->index = index;
- this->swizzle = swizzle_for_type(type, component);
- this->negate = 0;
- this->abs = 0;
- this->index2D = 0;
- this->type = type ? type->base_type : GLSL_TYPE_ERROR;
- this->reladdr = NULL;
- this->reladdr2 = NULL;
- this->has_index2 = false;
- this->double_reg2 = false;
- this->array_id = array_id;
- this->is_double_vertex_input = false;
- }
-
- st_src_reg(gl_register_file file, int index, enum glsl_base_type type)
- {
- assert(file != PROGRAM_ARRAY); /* need array_id > 0 */
- this->type = type;
- this->file = file;
- this->index = index;
- this->index2D = 0;
- this->swizzle = SWIZZLE_XYZW;
- this->negate = 0;
- this->abs = 0;
- this->reladdr = NULL;
- this->reladdr2 = NULL;
- this->has_index2 = false;
- this->double_reg2 = false;
- this->array_id = 0;
- this->is_double_vertex_input = false;
- }
-
- st_src_reg(gl_register_file file, int index, enum glsl_base_type type, int index2D)
- {
- assert(file != PROGRAM_ARRAY); /* need array_id > 0 */
- this->type = type;
- this->file = file;
- this->index = index;
- this->index2D = index2D;
- this->swizzle = SWIZZLE_XYZW;
- this->negate = 0;
- this->abs = 0;
- this->reladdr = NULL;
- this->reladdr2 = NULL;
- this->has_index2 = false;
- this->double_reg2 = false;
- this->array_id = 0;
- this->is_double_vertex_input = false;
- }
-
- st_src_reg()
- {
- this->type = GLSL_TYPE_ERROR;
- this->file = PROGRAM_UNDEFINED;
- this->index = 0;
- this->index2D = 0;
- this->swizzle = 0;
- this->negate = 0;
- this->abs = 0;
- this->reladdr = NULL;
- this->reladdr2 = NULL;
- this->has_index2 = false;
- this->double_reg2 = false;
- this->array_id = 0;
- this->is_double_vertex_input = false;
- }
-
- explicit st_src_reg(st_dst_reg reg);
-
- int32_t index; /**< temporary index, VERT_ATTRIB_*, VARYING_SLOT_*, etc. */
- int16_t index2D;
- uint16_t swizzle; /**< SWIZZLE_XYZWONEZERO swizzles from Mesa. */
- int negate:4; /**< NEGATE_XYZW mask from mesa */
- unsigned abs:1;
- enum glsl_base_type type:5; /** GLSL_TYPE_* from GLSL IR (enum glsl_base_type) */
- unsigned has_index2:1;
- gl_register_file file:5; /**< PROGRAM_* from Mesa */
- /*
- * Is this the second half of a double register pair?
- * currently used for input mapping only.
- */
- unsigned double_reg2:1;
- unsigned is_double_vertex_input:1;
- unsigned array_id:10;
-
- /** Register index should be offset by the integer in this reg. */
- st_src_reg *reladdr;
- st_src_reg *reladdr2;
-
- st_src_reg get_abs()
- {
- st_src_reg reg = *this;
- reg.negate = 0;
- reg.abs = 1;
- return reg;
- }
-};
-
-class st_dst_reg {
-public:
- st_dst_reg(gl_register_file file, int writemask, enum glsl_base_type type, int index)
- {
- assert(file != PROGRAM_ARRAY); /* need array_id > 0 */
- this->file = file;
- this->index = index;
- this->index2D = 0;
- this->writemask = writemask;
- this->reladdr = NULL;
- this->reladdr2 = NULL;
- this->has_index2 = false;
- this->type = type;
- this->array_id = 0;
- }
-
- st_dst_reg(gl_register_file file, int writemask, enum glsl_base_type type)
- {
- assert(file != PROGRAM_ARRAY); /* need array_id > 0 */
- this->file = file;
- this->index = 0;
- this->index2D = 0;
- this->writemask = writemask;
- this->reladdr = NULL;
- this->reladdr2 = NULL;
- this->has_index2 = false;
- this->type = type;
- this->array_id = 0;
- }
-
- st_dst_reg()
- {
- this->type = GLSL_TYPE_ERROR;
- this->file = PROGRAM_UNDEFINED;
- this->index = 0;
- this->index2D = 0;
- this->writemask = 0;
- this->reladdr = NULL;
- this->reladdr2 = NULL;
- this->has_index2 = false;
- this->array_id = 0;
- }
-
- explicit st_dst_reg(st_src_reg reg);
-
- int32_t index; /**< temporary index, VERT_ATTRIB_*, VARYING_SLOT_*, etc. */
- int16_t index2D;
- gl_register_file file:5; /**< PROGRAM_* from Mesa */
- unsigned writemask:4; /**< Bitfield of WRITEMASK_[XYZW] */
- enum glsl_base_type type:5; /** GLSL_TYPE_* from GLSL IR (enum glsl_base_type) */
- unsigned has_index2:1;
- unsigned array_id:10;
-
- /** Register index should be offset by the integer in this reg. */
- st_src_reg *reladdr;
- st_src_reg *reladdr2;
-};
-
-st_src_reg::st_src_reg(st_dst_reg reg)
-{
- this->type = reg.type;
- this->file = reg.file;
- this->index = reg.index;
- this->swizzle = SWIZZLE_XYZW;
- this->negate = 0;
- this->abs = 0;
- this->reladdr = reg.reladdr;
- this->index2D = reg.index2D;
- this->reladdr2 = reg.reladdr2;
- this->has_index2 = reg.has_index2;
- this->double_reg2 = false;
- this->array_id = reg.array_id;
- this->is_double_vertex_input = false;
-}
-
-st_dst_reg::st_dst_reg(st_src_reg reg)
-{
- this->type = reg.type;
- this->file = reg.file;
- this->index = reg.index;
- this->writemask = WRITEMASK_XYZW;
- this->reladdr = reg.reladdr;
- this->index2D = reg.index2D;
- this->reladdr2 = reg.reladdr2;
- this->has_index2 = reg.has_index2;
- this->array_id = reg.array_id;
-}
-
-class glsl_to_tgsi_instruction : public exec_node {
-public:
- DECLARE_RALLOC_CXX_OPERATORS(glsl_to_tgsi_instruction)
-
- st_dst_reg dst[2];
- st_src_reg src[4];
- st_src_reg resource; /**< sampler or buffer register */
- st_src_reg *tex_offsets;
-
- /** Pointer to the ir source this tree came from for debugging */
- ir_instruction *ir;
-
- unsigned op:8; /**< TGSI opcode */
- unsigned saturate:1;
- unsigned is_64bit_expanded:1;
- unsigned sampler_base:5;
- unsigned sampler_array_size:6; /**< 1-based size of sampler array, 1 if not array */
- unsigned tex_target:4; /**< One of TEXTURE_*_INDEX */
- glsl_base_type tex_type:5;
- unsigned tex_shadow:1;
- unsigned image_format:9;
- unsigned tex_offset_num_offset:3;
- unsigned dead_mask:4; /**< Used in dead code elimination */
- unsigned buffer_access:3; /**< buffer access type */
-
- const struct tgsi_opcode_info *info;
-};

class variable_storage {
DECLARE_RZALLOC_CXX_OPERATORS(variable_storage)
@@ -390,11 +148,6 @@ find_array_type(struct inout_decl *decls, unsigned count, unsigned array_id)
return GLSL_TYPE_ERROR;
}

-struct rename_reg_pair {
- bool valid;
- int new_reg;
-};
-
struct glsl_to_tgsi_visitor : public ir_visitor {
public:
glsl_to_tgsi_visitor();
@@ -597,7 +350,7 @@ fail_link(struct gl_shader_program *prog, const char *fmt, ...)
prog->data->LinkStatus = linking_failure;
}

-static int
+int
swizzle_for_size(int size)
{
static const int size_swizzles[4] = {
@@ -611,40 +364,6 @@ swizzle_for_size(int size)
return size_swizzles[size - 1];
}

-static bool
-is_resource_instruction(unsigned opcode)
-{
- switch (opcode) {
- case TGSI_OPCODE_RESQ:
- case TGSI_OPCODE_LOAD:
- case TGSI_OPCODE_ATOMUADD:
- case TGSI_OPCODE_ATOMXCHG:
- case TGSI_OPCODE_ATOMCAS:
- case TGSI_OPCODE_ATOMAND:
- case TGSI_OPCODE_ATOMOR:
- case TGSI_OPCODE_ATOMXOR:
- case TGSI_OPCODE_ATOMUMIN:
- case TGSI_OPCODE_ATOMUMAX:
- case TGSI_OPCODE_ATOMIMIN:
- case TGSI_OPCODE_ATOMIMAX:
- return true;
- default:
- return false;
- }
-}
-
-static unsigned
-num_inst_dst_regs(const glsl_to_tgsi_instruction *op)
-{
- return op->info->num_dst;
-}
-
-static unsigned
-num_inst_src_regs(const glsl_to_tgsi_instruction *op)
-{
- return op->info->is_tex || is_resource_instruction(op->op) ?
- op->info->num_src - 1 : op->info->num_src;
-}

glsl_to_tgsi_instruction *
glsl_to_tgsi_visitor::emit_asm(ir_instruction *ir, unsigned op,
diff --git a/src/mesa/state_tracker/st_glsl_to_tgsi_private.cpp b/src/mesa/state_tracker/st_glsl_to_tgsi_private.cpp
new file mode 100644
index 0000000000..337f21cf79
--- /dev/null
+++ b/src/mesa/state_tracker/st_glsl_to_tgsi_private.cpp
@@ -0,0 +1,241 @@
+/*
+ * Copyright © 2010 Intel Corporation
+ * Copyright © 2011 Bryan Cain
+ * Copyright © 2017 Gert Wollny
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice (including the next
+ * paragraph) shall be included in all copies or substantial portions of the
+ * Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
+ * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
+ * DEALINGS IN THE SOFTWARE.
+ */
+
+#include "st_glsl_to_tgsi_private.h"
+#include <tgsi/tgsi_info.h>
+#include <mesa/program/prog_instruction.h>
+
+using std::vector;
+
+extern int swizzle_for_size(int size);
+
+static int swizzle_for_type(const glsl_type *type, int component = 0)
+{
+ unsigned num_elements = 4;
+
+ if (type) {
+ type = type->without_array();
+ if (type->is_scalar() || type->is_vector() || type->is_matrix())
+ num_elements = type->vector_elements;
+ }
+
+ int swizzle = swizzle_for_size(num_elements);
+ assert(num_elements + component <= 4);
+
+ swizzle += component * MAKE_SWIZZLE4(1, 1, 1, 1);
+ return swizzle;
+}
+
+
+
+st_src_reg::st_src_reg(gl_register_file file, int index, const glsl_type *type,
+ int component, unsigned array_id)
+{
+ assert(file != PROGRAM_ARRAY || array_id != 0);
+ this->file = file;
+ this->index = index;
+ this->swizzle = swizzle_for_type(type, component);
+ this->negate = 0;
+ this->abs = 0;
+ this->index2D = 0;
+ this->type = type ? type->base_type : GLSL_TYPE_ERROR;
+ this->reladdr = NULL;
+ this->reladdr2 = NULL;
+ this->has_index2 = false;
+ this->double_reg2 = false;
+ this->array_id = array_id;
+ this->is_double_vertex_input = false;
+}
+
+st_src_reg::st_src_reg(gl_register_file file, int index, enum glsl_base_type type)
+{
+ assert(file != PROGRAM_ARRAY); /* need array_id > 0 */
+ this->type = type;
+ this->file = file;
+ this->index = index;
+ this->index2D = 0;
+ this->swizzle = SWIZZLE_XYZW;
+ this->negate = 0;
+ this->abs = 0;
+ this->reladdr = NULL;
+ this->reladdr2 = NULL;
+ this->has_index2 = false;
+ this->double_reg2 = false;
+ this->array_id = 0;
+ this->is_double_vertex_input = false;
+}
+
+st_src_reg::st_src_reg(gl_register_file file, int index, enum glsl_base_type type, int index2D)
+{
+ assert(file != PROGRAM_ARRAY); /* need array_id > 0 */
+ this->type = type;
+ this->file = file;
+ this->index = index;
+ this->index2D = index2D;
+ this->swizzle = SWIZZLE_XYZW;
+ this->negate = 0;
+ this->abs = 0;
+ this->reladdr = NULL;
+ this->reladdr2 = NULL;
+ this->has_index2 = false;
+ this->double_reg2 = false;
+ this->array_id = 0;
+ this->is_double_vertex_input = false;
+}
+
+st_src_reg::st_src_reg()
+{
+ this->type = GLSL_TYPE_ERROR;
+ this->file = PROGRAM_UNDEFINED;
+ this->index = 0;
+ this->index2D = 0;
+ this->swizzle = 0;
+ this->negate = 0;
+ this->abs = 0;
+ this->reladdr = NULL;
+ this->reladdr2 = NULL;
+ this->has_index2 = false;
+ this->double_reg2 = false;
+ this->array_id = 0;
+ this->is_double_vertex_input = false;
+}
+
+
+st_src_reg st_src_reg::get_abs()
+{
+ st_src_reg reg = *this;
+ reg.negate = 0;
+ reg.abs = 1;
+ return reg;
+}
+
+st_src_reg::st_src_reg(st_dst_reg reg)
+{
+ this->type = reg.type;
+ this->file = reg.file;
+ this->index = reg.index;
+ this->swizzle = SWIZZLE_XYZW;
+ this->negate = 0;
+ this->abs = 0;
+ this->reladdr = reg.reladdr;
+ this->index2D = reg.index2D;
+ this->reladdr2 = reg.reladdr2;
+ this->has_index2 = reg.has_index2;
+ this->double_reg2 = false;
+ this->array_id = reg.array_id;
+ this->is_double_vertex_input = false;
+}
+
+st_dst_reg::st_dst_reg(st_src_reg reg)
+{
+ this->type = reg.type;
+ this->file = reg.file;
+ this->index = reg.index;
+ this->writemask = WRITEMASK_XYZW;
+ this->reladdr = reg.reladdr;
+ this->index2D = reg.index2D;
+ this->reladdr2 = reg.reladdr2;
+ this->has_index2 = reg.has_index2;
+ this->array_id = reg.array_id;
+}
+
+
+st_dst_reg::st_dst_reg(gl_register_file file, int writemask, enum glsl_base_type type, int index)
+{
+ assert(file != PROGRAM_ARRAY); /* need array_id > 0 */
+ this->file = file;
+ this->index = index;
+ this->index2D = 0;
+ this->writemask = writemask;
+ this->reladdr = NULL;
+ this->reladdr2 = NULL;
+ this->has_index2 = false;
+ this->type = type;
+ this->array_id = 0;
+}
+
+
+st_dst_reg::st_dst_reg(gl_register_file file, int writemask, enum glsl_base_type type)
+{
+ assert(file != PROGRAM_ARRAY); /* need array_id > 0 */
+ this->file = file;
+ this->index = 0;
+ this->index2D = 0;
+ this->writemask = writemask;
+ this->reladdr = NULL;
+ this->reladdr2 = NULL;
+ this->has_index2 = false;
+ this->type = type;
+ this->array_id = 0;
+}
+
+st_dst_reg::st_dst_reg()
+{
+ this->type = GLSL_TYPE_ERROR;
+ this->file = PROGRAM_UNDEFINED;
+ this->index = 0;
+ this->index2D = 0;
+ this->writemask = 0;
+ this->reladdr = NULL;
+ this->reladdr2 = NULL;
+ this->has_index2 = false;
+ this->array_id = 0;
+}
+
+bool
+is_resource_instruction(unsigned opcode)
+{
+ switch (opcode) {
+ case TGSI_OPCODE_RESQ:
+ case TGSI_OPCODE_LOAD:
+ case TGSI_OPCODE_ATOMUADD:
+ case TGSI_OPCODE_ATOMXCHG:
+ case TGSI_OPCODE_ATOMCAS:
+ case TGSI_OPCODE_ATOMAND:
+ case TGSI_OPCODE_ATOMOR:
+ case TGSI_OPCODE_ATOMXOR:
+ case TGSI_OPCODE_ATOMUMIN:
+ case TGSI_OPCODE_ATOMUMAX:
+ case TGSI_OPCODE_ATOMIMIN:
+ case TGSI_OPCODE_ATOMIMAX:
+ return true;
+ default:
+ return false;
+ }
+}
+
+unsigned
+num_inst_dst_regs(const glsl_to_tgsi_instruction *op)
+{
+ return op->info->num_dst;
+}
+
+unsigned
+num_inst_src_regs(const glsl_to_tgsi_instruction *op)
+{
+ return op->info->is_tex || is_resource_instruction(op->op) ?
+ op->info->num_src - 1 : op->info->num_src;
+}
+
diff --git a/src/mesa/state_tracker/st_glsl_to_tgsi_private.h b/src/mesa/state_tracker/st_glsl_to_tgsi_private.h
new file mode 100644
index 0000000000..59697badff
--- /dev/null
+++ b/src/mesa/state_tracker/st_glsl_to_tgsi_private.h
@@ -0,0 +1,135 @@
+/*
+ * Copyright © 2010 Intel Corporation
+ * Copyright © 2011 Bryan Cain
+ * Copyright © 2017 Gert Wollny
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice (including the next
+ * paragraph) shall be included in all copies or substantial portions of the
+ * Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
+ * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
+ * DEALINGS IN THE SOFTWARE.
+ */
+
+#include <mesa/main/mtypes.h>
+#include <compiler/glsl_types.h>
+#include <compiler/glsl/ir.h>
+#include <tgsi/tgsi_info.h>
+#include <stack>
+#include <vector>
+
+class st_dst_reg;
+
+/**
+ * This struct is a corresponding struct to TGSI ureg_src.
+ */
+class st_src_reg {
+public:
+ st_src_reg(gl_register_file file, int index, const glsl_type *type,
+ int component = 0, unsigned array_id = 0);
+
+ st_src_reg(gl_register_file file, int index, enum glsl_base_type type);
+
+ st_src_reg(gl_register_file file, int index, enum glsl_base_type type, int index2D);
+
+ st_src_reg();
+
+ explicit st_src_reg(st_dst_reg reg);
+
+ st_src_reg get_abs();
+
+ int32_t index; /**< temporary index, VERT_ATTRIB_*, VARYING_SLOT_*, etc. */
+ int16_t index2D;
+
+ uint16_t swizzle; /**< SWIZZLE_XYZWONEZERO swizzles from Mesa. */
+ int negate:4; /**< NEGATE_XYZW mask from mesa */
+ unsigned abs:1;
+ enum glsl_base_type type:5; /** GLSL_TYPE_* from GLSL IR (enum glsl_base_type) */
+ unsigned has_index2:1;
+ gl_register_file file:5; /**< PROGRAM_* from Mesa */
+ /*
+ * Is this the second half of a double register pair?
+ * currently used for input mapping only.
+ */
+ unsigned double_reg2:1;
+ unsigned is_double_vertex_input:1;
+ unsigned array_id:10;
+ /** Register index should be offset by the integer in this reg. */
+ st_src_reg *reladdr;
+ st_src_reg *reladdr2;
+
+};
+
+class st_dst_reg {
+public:
+ st_dst_reg(gl_register_file file, int writemask, enum glsl_base_type type, int index);
+
+ st_dst_reg(gl_register_file file, int writemask, enum glsl_base_type type);
+
+ st_dst_reg();
+
+ explicit st_dst_reg(st_src_reg reg);
+
+ int32_t index; /**< temporary index, VERT_ATTRIB_*, VARYING_SLOT_*, etc. */
+ int16_t index2D;
+ gl_register_file file:5; /**< PROGRAM_* from Mesa */
+ unsigned writemask:4; /**< Bitfield of WRITEMASK_[XYZW] */
+ enum glsl_base_type type:5; /** GLSL_TYPE_* from GLSL IR (enum glsl_base_type) */
+ unsigned has_index2:1;
+ unsigned array_id:10;
+
+ /** Register index should be offset by the integer in this reg. */
+ st_src_reg *reladdr;
+ st_src_reg *reladdr2;
+};
+
+class glsl_to_tgsi_instruction : public exec_node {
+public:
+ DECLARE_RALLOC_CXX_OPERATORS(glsl_to_tgsi_instruction)
+
+ st_dst_reg dst[2];
+ st_src_reg src[4];
+ st_src_reg resource; /**< sampler or buffer register */
+ st_src_reg *tex_offsets;
+
+ /** Pointer to the ir source this tree came from for debugging */
+ ir_instruction *ir;
+
+ unsigned op:8; /**< TGSI opcode */
+ unsigned saturate:1;
+ unsigned is_64bit_expanded:1;
+ unsigned sampler_base:5;
+ unsigned sampler_array_size:6; /**< 1-based size of sampler array, 1 if not array */
+ unsigned tex_target:4; /**< One of TEXTURE_*_INDEX */
+ glsl_base_type tex_type:5;
+ unsigned tex_shadow:1;
+ unsigned image_format:9;
+ unsigned tex_offset_num_offset:3;
+ unsigned dead_mask:4; /**< Used in dead code elimination */
+ unsigned buffer_access:3; /**< buffer access type */
+
+ const struct tgsi_opcode_info *info;
+};
+
+struct rename_reg_pair {
+ bool valid;
+ int new_reg;
+};
+
+extern bool is_resource_instruction(unsigned opcode);
+extern unsigned num_inst_dst_regs(const glsl_to_tgsi_instruction *op);
+extern unsigned num_inst_src_regs(const glsl_to_tgsi_instruction *op);
+
+

--
2.13.0

Gert Wollny

2017-06-09 23:15:07 UTC

This patch adds new classes and tests to implement a tracker for the
lifetime of temporary registers for the register renaming stage of
glsl_to_tgsi. The tracker aims to estimate the shortest possible
lifetime for each register. The code requires C++11; the flag is
propagated from LLVM_CXXFLAGS.
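
To show why control flow matters for such an estimate, here is a deliberately
simplified sketch. It is not the algorithm implemented in this patch; the
instr type, op_kind and estimate_lifetimes() are assumptions made for the
example. A lifetime runs from the first write to the last read, and the end is
pushed to the end of a loop if the last read sits inside that loop while the
first write happened before it, since the value is needed again in the next
iteration. The sketch assumes every temporary is written before it is first
read in program order.

```cpp
#include <algorithm>
#include <cassert>
#include <utility>
#include <vector>

// Hypothetical, heavily reduced instruction model for this example.
enum class op_kind { write, read, begin_loop, end_loop };

struct instr {
   op_kind kind;
   int temp;   // temporary index; ignored for loop markers
};

// Returns one (first_write, last_use) pair per temporary, or (-1, -1)
// for temporaries that are never accessed.
std::vector<std::pair<int, int>>
estimate_lifetimes(const std::vector<instr>& prog, int ntemps)
{
   assert(ntemps >= 0);
   const int n = static_cast<int>(prog.size());

   // Pass 1: pair up loop begin and end markers.
   std::vector<std::pair<int, int>> loops;
   std::vector<int> open;
   for (int i = 0; i < n; ++i) {
      if (prog[i].kind == op_kind::begin_loop) {
         open.push_back(i);
      } else if (prog[i].kind == op_kind::end_loop && !open.empty()) {
         loops.emplace_back(open.back(), i);
         open.pop_back();
      }
   }

   // Pass 2: record the first write and the last read of each temporary.
   std::vector<std::pair<int, int>> life(ntemps, std::make_pair(-1, -1));
   for (int i = 0; i < n; ++i) {
      const instr& in = prog[i];
      if (in.kind != op_kind::write && in.kind != op_kind::read)
         continue;
      if (in.temp < 0 || in.temp >= ntemps)
         continue;
      std::pair<int, int>& l = life[in.temp];
      if (in.kind == op_kind::write) {
         if (l.first < 0)
            l.first = i;
      } else {
         l.second = std::max(l.second, i);
      }
   }

   // Pass 3: extend over loops that contain the last read but not the
   // first write.
   for (std::pair<int, int>& l : life) {
      if (l.first < 0)
         continue;   // never written: left untouched
      const int last_read = std::max(l.second, l.first);
      int end = last_read;
      for (const std::pair<int, int>& loop : loops) {
         const bool read_in_loop =
            loop.first < last_read && last_read < loop.second;
         if (read_in_loop && l.first < loop.first)
            end = std::max(end, loop.second);
      }
      l.second = end;
   }
   return life;
}
```

A temporary that is both written and read only inside the loop body keeps its
short interval, while one written before the loop and read inside it stays
alive until the loop ends.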
---
configure.ac | 1 +
src/mesa/Makefile.am | 4 +-
src/mesa/Makefile.sources | 2 +
.../state_tracker/st_glsl_to_tgsi_temprename.cpp | 551 ++++++++++++++
.../state_tracker/st_glsl_to_tgsi_temprename.h | 114 +++
src/mesa/state_tracker/tests/Makefile.am | 40 ++
src/mesa/state_tracker/tests/st-renumerate-test | 210 ++++++
.../tests/test_glsl_to_tgsi_lifetime.cpp | 789 +++++++++++++++++++++
8 files changed, 1709 insertions(+), 2 deletions(-)
create mode 100644 src/mesa/state_tracker/st_glsl_to_tgsi_temprename.cpp
create mode 100644 src/mesa/state_tracker/st_glsl_to_tgsi_temprename.h
create mode 100644 src/mesa/state_tracker/tests/Makefile.am
create mode 100755 src/mesa/state_tracker/tests/st-renumerate-test
create mode 100644 src/mesa/state_tracker/tests/test_glsl_to_tgsi_lifetime.cpp

diff --git a/configure.ac b/configure.ac
index f379ba8573..579e159420 100644
--- a/configure.ac
+++ b/configure.ac
@@ -2827,6 +2827,7 @@ AC_CONFIG_FILES([Makefile
src/mesa/drivers/osmesa/osmesa.pc
src/mesa/drivers/x11/Makefile
src/mesa/main/tests/Makefile
+ src/mesa/state_tracker/tests/Makefile
src/util/Makefile
src/util/tests/hash_table/Makefile
src/vulkan/Makefile])
diff --git a/src/mesa/Makefile.am b/src/mesa/Makefile.am
index 53f311d2a9..72ffd61212 100644
--- a/src/mesa/Makefile.am
+++ b/src/mesa/Makefile.am
@@ -19,7 +19,7 @@
# FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS
# IN THE SOFTWARE.

-SUBDIRS = . main/tests
+SUBDIRS = . main/tests state_tracker/tests

if HAVE_XLIB_GLX
SUBDIRS += drivers/x11
@@ -101,7 +101,7 @@ AM_CFLAGS = \
$(VISIBILITY_CFLAGS) \
$(MSVC2013_COMPAT_CFLAGS)
AM_CXXFLAGS = \
- $(LLVM_CFLAGS) \
+ $(LLVM_CXXFLAGS) \
$(VISIBILITY_CXXFLAGS) \
$(MSVC2013_COMPAT_CXXFLAGS)

diff --git a/src/mesa/Makefile.sources b/src/mesa/Makefile.sources
index 4450d80090..908d1acff6 100644
--- a/src/mesa/Makefile.sources
+++ b/src/mesa/Makefile.sources
@@ -507,6 +507,8 @@ STATETRACKER_FILES = \
state_tracker/st_glsl_to_tgsi.h \
state_tracker/st_glsl_to_tgsi_private.cpp \
state_tracker/st_glsl_to_tgsi_private.h \
+ state_tracker/st_glsl_to_tgsi_temprename.cpp \
+ state_tracker/st_glsl_to_tgsi_temprename.h \
state_tracker/st_glsl_types.cpp \
state_tracker/st_glsl_types.h \
state_tracker/st_manager.c \
diff --git a/src/mesa/state_tracker/st_glsl_to_tgsi_temprename.cpp b/src/mesa/state_tracker/st_glsl_to_tgsi_temprename.cpp
new file mode 100644
index 0000000000..389a4b6b5f
--- /dev/null
+++ b/src/mesa/state_tracker/st_glsl_to_tgsi_temprename.cpp
@@ -0,0 +1,551 @@
+/*
+ * Copyright © 2017 Gert Wollny
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice (including the next
+ * paragraph) shall be included in all copies or substantial portions of the
+ * Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
+ * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
+ * DEALINGS IN THE SOFTWARE.
+ */
+
+
+#include "st_glsl_to_tgsi_temprename.h"
+#include <tgsi/tgsi_info.h>
+#include <mesa/program/prog_instruction.h>
+#include <stack>
+#include <limits>
+#include <iostream>
+
+
+using std::vector;
+using std::stack;
+using std::shared_ptr;
+using std::weak_ptr;
+using std::pair;
+using std::make_pair;
+using std::make_shared;
+using std::numeric_limits;
+
+tgsi_temp_lifetime::tgsi_temp_lifetime(exec_list *instructions, int ntemps):
+ lifetimes(ntemps)
+{
+ evaluate(instructions);
+}
+
+const std::vector<std::pair<int, int> >& tgsi_temp_lifetime::get_lifetimes() const
+{
+ return lifetimes;
+}
+
+void tgsi_temp_lifetime::evaluate(exec_list *instructions)
+{
+ int i = 0;
+ int loop_id = 0;
+ int if_id = 0;
+ int switch_id = 0;
+ int scope_level = 0;
+ bool is_at_end = false;
+ shared_ptr<prog_scope> current = make_shared<prog_scope>(sct_outer, 0, scope_level++, i);
+ stack<shared_ptr<prog_scope>> scope_stack;
+
+ vector<temp_access> acc(lifetimes.size());
+
+ foreach_in_list(glsl_to_tgsi_instruction, inst, instructions) {
+ if (is_at_end) {
+ std::cerr << "GLSL_TO_TGSI: shader has instructions past end marker\n";
+ break;
+ }
+
+ switch (inst->op) {
+ case TGSI_OPCODE_BGNLOOP: {
+ shared_ptr<prog_scope> scope = make_shared<prog_scope>(current, sct_loop, loop_id, scope_level, i);
+ ++loop_id;
+ ++scope_level;
+ scope_stack.push(current);
+ current = scope;
+ break;
+ }
+ case TGSI_OPCODE_ENDLOOP: {
+ --scope_level;
+ current->set_end(i);
+ current = scope_stack.top();
+ scope_stack.pop();
+ break;
+ }
+ case TGSI_OPCODE_IF:
+ case TGSI_OPCODE_UIF:{
+ for (unsigned j = 0; j < num_inst_src_regs(inst); j++) {
+ if (inst->src[j].file == PROGRAM_TEMPORARY) {
+ acc[inst->src[j].index].append(i, acc_read, current);
+ }
+ }
+ shared_ptr<prog_scope> scope = make_shared<prog_scope>(current, sct_if, if_id, scope_level, i);
+ ++if_id;
+ ++scope_level;
+ scope_stack.push(current);
+ current = scope;
+ break;
+ }
+ case TGSI_OPCODE_ELSE: {
+ current->set_end(i-1);
+ current = make_shared<prog_scope>(current->parent(), sct_else, current->id(),
+ current->level(), i);
+ break;
+ }
+ case TGSI_OPCODE_END:{
+ current->set_end(i);
+ is_at_end = true;
+ break;
+ }
+ case TGSI_OPCODE_ENDIF:{
+ --scope_level;
+ current->set_end(i-1);
+ current = scope_stack.top();
+ scope_stack.pop();
+ break;
+ }
+ case TGSI_OPCODE_SWITCH: {
+ shared_ptr<prog_scope> scope = make_shared<prog_scope>(current, sct_switch, switch_id, scope_level, i);
+ ++scope_level;
+ ++switch_id;
+ scope_stack.push(current);
+ current = scope;
+ break;
+ }
+ case TGSI_OPCODE_ENDSWITCH: {
+ --scope_level;
+ current->set_end(i-1);
+
+ // remove the case level
+ if (current->type() != sct_switch ) {
+ current = scope_stack.top();
+ scope_stack.pop();
+ }
+ current = scope_stack.top();
+ scope_stack.pop();
+ break;
+ }
+
+ case TGSI_OPCODE_CASE:
+ case TGSI_OPCODE_DEFAULT: {
+ if ( current->type() == sct_switch ) {
+ scope_stack.push(current);
+ current = make_shared<prog_scope>(current, sct_case, current->id(), scope_level, i);
+ }else{
+ auto p = current->parent();
+ auto scope = make_shared<prog_scope>(p, sct_case, p->id(), p->level(), i);
+ if (current->end() == -1)
+ scope->set_previous(current);
+ current = scope;
+ }
+ for (unsigned j = 0; j < num_inst_src_regs(inst); j++) {
+ if (inst->src[j].file == PROGRAM_TEMPORARY) {
+ acc[inst->src[j].index].append(i, acc_read, current);
+ }
+ }
+ }
+ case TGSI_OPCODE_BRK: {
+ if ( current->type() == sct_case) {
+ current->set_end(i-1);
+ }
+ }
+ case TGSI_OPCODE_CONT: {
+ current->set_continue(current, i);
+ break;
+ }
+
+ default: {
+
+ for (unsigned j = 0; j < num_inst_dst_regs(inst); j++) {
+ if (inst->dst[j].file == PROGRAM_TEMPORARY) {
+ acc[inst->dst[j].index].append(i, acc_write, current);
+ }
+ }
+
+ for (unsigned j = 0; j < num_inst_src_regs(inst); j++) {
+ if (inst->src[j].file == PROGRAM_TEMPORARY) {
+ acc[inst->src[j].index].append(i, acc_read, current);
+ }
+ }
+
+ for (unsigned j = 0; j < inst->tex_offset_num_offset; j++) {
+ if (inst->tex_offsets[j].file == PROGRAM_TEMPORARY) {
+ acc[inst->tex_offsets[j].index].append(i, acc_read, current);
+ }
+ }
+
+ } // end default
+ } // end switch (op)
+
+ ++i;
+ }
+
+ // make sure last scope is closed, even though no
+ // TGSI_OPCODE_END was given
+ if (current->end() < 0) {
+ current->set_end(i-1);
+ }
+
+ for(unsigned i = 1; i < lifetimes.size(); ++i) {
+ lifetimes[i] = acc[i].get_required_lifetime();
+ }
+}
+
+
+tgsi_temp_lifetime::prog_scope::prog_scope(e_scope_type type, int id, int lvl, int s_begin):
+ prog_scope(std::shared_ptr<prog_scope>(), type, id, lvl, s_begin)
+{
+}
+
+tgsi_temp_lifetime::prog_scope::prog_scope(std::shared_ptr<prog_scope> p, e_scope_type type, int id, int lvl, int s_begin):
+ scope_type(type),
+ scope_id(id),
+ nested_level(lvl),
+ scope_begin(s_begin),
+ scope_end(-1),
+ loop_continue(numeric_limits<int>::max()),
+ parent_scope(p)
+{
+}
+
+tgsi_temp_lifetime::e_scope_type tgsi_temp_lifetime::prog_scope::type() const
+{
+ return scope_type;
+}
+
+
+std::shared_ptr<tgsi_temp_lifetime::prog_scope>
+tgsi_temp_lifetime::prog_scope::parent() const
+{
+ return parent_scope;
+}
+
+int tgsi_temp_lifetime::prog_scope::level() const
+{
+ return nested_level;
+}
+
+bool tgsi_temp_lifetime::prog_scope::in_loop() const
+{
+ if (scope_type == sct_loop)
+ return true;
+ if (parent_scope)
+ return parent_scope->in_loop();
+ return false;
+}
+
+const tgsi_temp_lifetime::prog_scope *
+tgsi_temp_lifetime::prog_scope::get_parent_loop() const
+{
+ if (scope_type == sct_loop)
+ return this;
+ if (parent_scope)
+ return parent_scope->get_parent_loop();
+ else
+ return nullptr;
+}
+
+bool tgsi_temp_lifetime::prog_scope::contains(const prog_scope& other) const
+{
+ return (begin() <= other.begin()) && (end() >= other.end());
+}
+
+bool tgsi_temp_lifetime::prog_scope::is_conditional() const
+{
+ return scope_type == sct_if || scope_type == sct_else ||
+ scope_type == sct_case;
+}
+
+const tgsi_temp_lifetime::prog_scope *
+tgsi_temp_lifetime::prog_scope::in_ifelse() const
+{
+ if (scope_type == sct_if ||
+ scope_type == sct_else)
+ return this;
+ else if (parent_scope)
+ return parent_scope->in_ifelse();
+ else
+ return nullptr;
+}
+
+const tgsi_temp_lifetime::prog_scope *
+tgsi_temp_lifetime::prog_scope::in_switchcase() const
+{
+ if (scope_type == sct_case)
+ return this;
+ else if (parent_scope)
+ return parent_scope->in_switchcase();
+ else
+ return nullptr;
+}
+
+int tgsi_temp_lifetime::prog_scope::id() const
+{
+ return scope_id;
+}
+
+int tgsi_temp_lifetime::prog_scope::begin() const
+{
+ return scope_begin;
+}
+
+int tgsi_temp_lifetime::prog_scope::end() const
+{
+ return scope_end;
+}
+
+void tgsi_temp_lifetime::prog_scope::set_previous(std::shared_ptr<prog_scope> prev)
+{
+ previous = prev;
+}
+
+void tgsi_temp_lifetime::prog_scope::set_end(int end)
+{
+ if (scope_end == -1) {
+ scope_end = end;
+ if (previous)
+ previous->set_end(end);
+ }
+}
+
+void tgsi_temp_lifetime::prog_scope::set_continue(weak_ptr<prog_scope> scope, int i)
+{
+ if (scope_type == sct_loop) {
+ loop_to_continue_scope = scope;
+ loop_continue = i;
+ } else if (parent_scope)
+ parent()->set_continue(scope, i);
+}
+
+int tgsi_temp_lifetime::prog_scope::loop_continue_idx() const
+{
+ return loop_continue;
+}
+
+void tgsi_temp_lifetime::temp_access::append(int index, e_acc_type rw, std::shared_ptr<prog_scope> pstate)
+{
+ temp_access_record r = {index, rw, pstate};
+ timeline.push_back(r);
+}
+
+pair<int, int> tgsi_temp_lifetime::temp_access::get_required_lifetime() const
+{
+ bool keep_for_full_loop = false;
+
+ std::shared_ptr<prog_scope> lr_scope;
+ std::shared_ptr<prog_scope> fr_scope;
+ std::shared_ptr<prog_scope> fw_scope;
+ const prog_scope *fw_ifthen_scope = nullptr;
+ const prog_scope *fw_switch_scope = nullptr;
+
+ int first_write = -1;
+ int last_read = -1;
+ int first_read = numeric_limits<int>::max();
+
+ for (temp_access_record i: timeline) {
+ if (i.acc == acc_read) {
+ last_read = i.index;
+ lr_scope = i.pstate;
+ if (first_read > i.index) {
+ first_read = i.index;
+ fr_scope = i.pstate;
+ }
+ }else{
+ if (first_write == -1) {
+ first_write = i.index;
+ fw_scope = i.pstate;
+
+ // we write in an if-branch
+ fw_ifthen_scope = i.pstate->in_ifelse();
+ if (fw_ifthen_scope && fw_ifthen_scope->in_loop()) {
+ // value not always written, in loops we must keep it
+ keep_for_full_loop = true;
+ } else {
+ // test for switch-case
+ fw_switch_scope = i.pstate->in_switchcase();
+
+ if (fw_switch_scope && fw_switch_scope->in_loop()) {
+ keep_for_full_loop = true;
+ }
+ }
+ } else if (keep_for_full_loop) {
+
+ // written in if branch?
+ // disable because read first in else branch
+ // makes this invalid and this is not (yet) tracked
+ if (0 && fw_ifthen_scope) {
+ auto s = i.pstate->in_ifelse();
+ // also written in the else branch?
+ if (s && (s->id() == fw_ifthen_scope->id())) {
+ keep_for_full_loop = false;
+ }
+ }
+ }
+ }
+ }
+
+ // this temp is only read and never written; that is undefined
+ // behaviour, so the register is free to be reused
+ if (!fw_scope) {
+ return make_pair(-1, -1);
+ }
+
+ // Only written to, just make sure it doesn't overlap
+ if (!lr_scope) {
+ return make_pair(first_write, first_write);
+ }
+
+ int target_level = -1;
+ // evaluate the shared scope
+ while (target_level < 0) {
+ if (lr_scope->contains(*fw_scope)) {
+ target_level = lr_scope->level();
+ } else if (fw_scope->contains(*lr_scope)) {
+ target_level = fw_scope->level();
+ } else {
+ // scopes (partially) disjoint, move up
+ if (lr_scope->type() == sct_loop) {
+ last_read = lr_scope->end();
+ }
+ lr_scope = lr_scope->parent();
+ }
+ }
+
+ // propagate the read scope to the target_level
+ while (lr_scope->level() > target_level) {
+ // if the read is in a loop we need to extend the
+ // variable's lifetime to the end of that loop
+ if (lr_scope->type() == sct_loop) {
+ last_read = lr_scope->end();
+ }
+ lr_scope = lr_scope->parent();
+ }
+
+
+ // propagate the first_write scope to the target_level
+ bool has_continue = false;
+ while (target_level < fw_scope->level()) {
+
+ // propagate lifetime also if there was a continue
+ // in a loop and the write was after the continue
+ if (has_continue || (fw_scope->loop_continue_idx() < first_write)) {
+ has_continue = true;
+ first_write = fw_scope->begin();
+ int lr = fw_scope->end();
+ if (last_read < lr)
+ last_read = lr;
+ }
+
+ if ((fw_scope->type() == sct_loop) && (first_read < first_write)) {
+ first_write = fw_scope->begin();
+ int lr = fw_scope->end();
+ if (last_read < lr)
+ last_read = lr;
+ }
+
+ fw_scope = fw_scope->parent();
+
+ // if the value is conditionally written in a loop
+ // then propagate its lifetime to the full loop
+ if (fw_scope->type() == sct_loop) {
+ if (keep_for_full_loop || (first_read < first_write)) {
+ first_write = fw_scope->begin();
+ int lr = fw_scope->end();
+ if (last_read < lr)
+ last_read = lr;
+ }
+ }
+
+ // if we currently don't propagate the lifetime but
+ // the enclosing scope is a conditional within a loop
+ // up to the last-read level we need to propagate,
+ // todo: to tighten the lifetime, check whether the value
+ // is written in all conditional code paths below the loop
+ if (!keep_for_full_loop &&
+ fw_scope->is_conditional() &&
+ fw_scope->in_loop()) {
+ keep_for_full_loop = true;
+ }
+ }
+
+
+
+
+ // same level and same range means it is first
+ // written and last read in the same scope
+ // ignore the case when first read is before
+ // first write, because it is undefined behaviour
+ if ((lr_scope->begin() == fw_scope->begin()) &&
+ (lr_scope->end() == fw_scope->end())) {
+ return make_pair(first_write, last_read);
+ }
+
+ // different scopes,
+ if (!keep_for_full_loop && first_read > first_write) {
+ return make_pair(first_write, last_read);
+ }else{
+ // 1. if the value is not always written in a loop
+ // it must be kept for the whole loop scope.
+ //
+ // 2. if the value is read (maybe conditionally)
+ // before it is written first it also must be
+ // kept valid for the whole loop
+ auto enclosing_loop = lr_scope->get_parent_loop();
+ assert(enclosing_loop);
+ return make_pair(enclosing_loop->begin(), enclosing_loop->end());
+ }
+}
+
+
+void evaluate_remapping(const vector<std::pair<int, int>>& lt, std::vector<rename_reg_pair>& result)
+{
+
+ struct out_access_record {
+ int end;
+ unsigned reg;
+ bool tested;
+ };
+
+ std::multimap<int, out_access_record> m;
+ for (unsigned i = 1; i < lt.size(); ++i) {
+ out_access_record oar = {lt[i].second, i, false};
+ m.insert(make_pair(lt[i].first, oar));
+ }
+
+ while (true) {
+ auto trgt = m.begin();
+
+ while ( trgt != m.end() && trgt->second.tested)
+ ++trgt;
+
+ if (trgt == m.end())
+ break;
+
+ while (trgt != m.end()) {
+ auto src = m.upper_bound(trgt->second.end - 1);
+ if (src == m.end()) {
+ trgt->second.tested = true;
+ } else {
+ result[src->second.reg].new_reg = trgt->second.reg;
+ result[src->second.reg].valid = true;
+ trgt->second.end = src->second.end;
+ m.erase(src);
+ break;
+ }
+ ++trgt;
+ }
+ }
+}
diff --git a/src/mesa/state_tracker/st_glsl_to_tgsi_temprename.h b/src/mesa/state_tracker/st_glsl_to_tgsi_temprename.h
new file mode 100644
index 0000000000..e764f0f0c2
--- /dev/null
+++ b/src/mesa/state_tracker/st_glsl_to_tgsi_temprename.h
@@ -0,0 +1,114 @@
+/*
+ * Copyright © 2017 Gert Wollny
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice (including the next
+ * paragraph) shall be included in all copies or substantial portions of the
+ * Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
+ * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
+ * DEALINGS IN THE SOFTWARE.
+ */
+
+#include "st_glsl_to_tgsi_private.h"
+#include <memory>
+#include <map>
+
+class tgsi_temp_lifetime {
+public:
+
+ enum e_scope_type {
+ sct_outer,
+ sct_loop,
+ sct_if,
+ sct_else,
+ sct_switch,
+ sct_case,
+ sct_unknown
+ };
+
+ enum e_acc_type {
+ acc_read,
+ acc_write
+ };
+
+ class prog_scope {
+
+ public:
+ prog_scope(e_scope_type type, int id, int lvl, int s_begin);
+ prog_scope(std::shared_ptr<prog_scope> p, e_scope_type type, int id, int lvl, int s_begin);
+
+ e_scope_type type() const;
+ std::shared_ptr<prog_scope> parent() const;
+ int level() const;
+ int id() const;
+ int end() const;
+ int begin() const;
+ const prog_scope *in_ifelse() const;
+ const prog_scope *in_switchcase() const;
+ int loop_continue_idx() const;
+ bool in_loop() const;
+ const prog_scope *get_parent_loop() const;
+ bool is_conditional() const;
+ bool contains(const prog_scope& other) const;
+ void set_end(int end);
+ void set_previous(std::shared_ptr<prog_scope> prev);
+ void set_continue(std::weak_ptr<prog_scope> scope, int i);
+ private:
+ e_scope_type scope_type;
+ int scope_id;
+ int nested_level;
+ int scope_begin;
+ int scope_end;
+ int loop_continue;
+ std::weak_ptr<prog_scope> loop_to_continue_scope;
+ std::shared_ptr<prog_scope> previous;
+ std::shared_ptr<prog_scope> parent_scope;
+
+ };
+
+ class temp_access {
+
+ public:
+
+ void append(int index, e_acc_type rw, std::shared_ptr<prog_scope> pstate);
+ std::pair<int, int> get_required_lifetime() const;
+
+ private:
+
+ struct temp_access_record {
+ int index;
+ e_acc_type acc;
+ std::shared_ptr<prog_scope> pstate;
+ };
+
+ std::vector< temp_access_record > timeline;
+
+ };
+
+ tgsi_temp_lifetime(exec_list *instructions, int ntemps);
+
+ const std::vector<std::pair<int, int> >& get_lifetimes() const;
+
+
+private:
+
+ void evaluate(exec_list *instructions);
+
+ std::vector<std::pair<int, int> > lifetimes;
+};
+
+void evaluate_remapping(const std::vector<std::pair<int, int>>& lt, std::vector<rename_reg_pair>& result);
+
+
diff --git a/src/mesa/state_tracker/tests/Makefile.am b/src/mesa/state_tracker/tests/Makefile.am
new file mode 100644
index 0000000000..a2bcad8dde
--- /dev/null
+++ b/src/mesa/state_tracker/tests/Makefile.am
@@ -0,0 +1,40 @@
+AM_CFLAGS = \
+ $(PTHREAD_CFLAGS)
+
+AM_CXXFLAGS = \
+ $(LLVM_CXXFLAGS)
+
+AM_CPPFLAGS = \
+ -I$(top_srcdir)/src/gtest/include \
+ -I$(top_srcdir)/src \
+ -I$(top_srcdir)/src/mapi \
+ -I$(top_builddir)/src/mesa \
+ -I$(top_srcdir)/src/mesa \
+ -I$(top_srcdir)/include \
+ -I$(top_srcdir)/src/gallium/include \
+ -I$(top_srcdir)/src/gallium/auxiliary \
+ $(DEFINES) $(INCLUDE_DIRS)
+
+TESTS = st-renumerate-test
+check_PROGRAMS = st-renumerate-test
+
+st_renumerate_test_SOURCES = \
+ test_glsl_to_tgsi_lifetime.cpp
+
+st_renumerate_test_LDFLAGS = \
+ $(LLVM_LDFLAGS)
+
+st_renumerate_test_LDADD = \
+ $(top_builddir)/src/mesa/libmesagallium.la \
+ $(top_builddir)/src/mapi/shared-glapi/libglapi.la \
+ $(top_builddir)/src/gallium/auxiliary/libgallium.la \
+ $(top_builddir)/src/util/libmesautil.la \
+ $(top_builddir)/src/gallium/drivers/trace/libtrace.la \
+ $(top_builddir)/src/gallium/winsys/sw/null/libws_null.la \
+ $(top_builddir)/src/gallium/drivers/softpipe/libsoftpipe.la \
+ $(top_builddir)/src/gtest/libgtest.la \
+ $(GALLIUM_COMMON_LIB_DEPS) \
+ $(LLVM_LIBS) \
+ $(PTHREAD_LIBS) \
+ $(DLOPEN_LIBS)
+
diff --git a/src/mesa/state_tracker/tests/st-renumerate-test b/src/mesa/state_tracker/tests/st-renumerate-test
new file mode 100755
index 0000000000..bc2f325899
--- /dev/null
+++ b/src/mesa/state_tracker/tests/st-renumerate-test
@@ -0,0 +1,210 @@
+#! /bin/sh
+
+# st-renumerate-test - temporary wrapper script for .libs/st-renumerate-test
+# Generated by libtool (GNU libtool) 2.4.6
+#
+# The st-renumerate-test program cannot be directly executed until all the libtool
+# libraries that it depends on are installed.
+#
+# This wrapper script should never be moved out of the build directory.
+# If it is, it will not operate correctly.
+
+# Sed substitution that helps us do robust quoting. It backslashifies
+# metacharacters that are still active within double-quoted strings.
+sed_quote_subst='s|\([`"$\\]\)|\\\1|g'
+
+# Be Bourne compatible
+if test -n "${ZSH_VERSION+set}" && (emulate sh) >/dev/null 2>&1; then
+ emulate sh
+ NULLCMD=:
+ # Zsh 3.x and 4.x performs word splitting on ${1+"$@"}, which
+ # is contrary to our usage. Disable this feature.
+ alias -g '${1+"$@"}'='"$@"'
+ setopt NO_GLOB_SUBST
+else
+ case `(set -o) 2>/dev/null` in *posix*) set -o posix;; esac
+fi
+BIN_SH=xpg4; export BIN_SH # for Tru64
+DUALCASE=1; export DUALCASE # for MKS sh
+
+# The HP-UX ksh and POSIX shell print the target directory to stdout
+# if CDPATH is set.
+(unset CDPATH) >/dev/null 2>&1 && unset CDPATH
+
+relink_command=""
+
+# This environment variable determines our operation mode.
+if test "$libtool_install_magic" = "%%%MAGIC variable%%%"; then
+ # install mode needs the following variables:
+ generated_by_libtool_version='2.4.6'
+ notinst_deplibs=' ../../../../src/mapi/shared-glapi/libglapi.la'
+else
+ # When we are sourced in execute mode, $file and $ECHO are already set.
+ if test "$libtool_execute_magic" != "%%%MAGIC variable%%%"; then
+ file="$0"
+
+# A function that is used when there is no print builtin or printf.
+func_fallback_echo ()
+{
+ eval 'cat <<_LTECHO_EOF
+$1
+_LTECHO_EOF'
+}
+ ECHO="printf %s\\n"
+ fi
+
+# Very basic option parsing. These options are (a) specific to
+# the libtool wrapper, (b) are identical between the wrapper
+# /script/ and the wrapper /executable/ that is used only on
+# windows platforms, and (c) all begin with the string --lt-
+# (application programs are unlikely to have options that match
+# this pattern).
+#
+# There are only two supported options: --lt-debug and
+# --lt-dump-script. There is, deliberately, no --lt-help.
+#
+# The first argument to this parsing function should be the
+# script's ../../../../libtool value, followed by no.
+lt_option_debug=
+func_parse_lt_options ()
+{
+ lt_script_arg0=$0
+ shift
+ for lt_opt
+ do
+ case "$lt_opt" in
+ --lt-debug) lt_option_debug=1 ;;
+ --lt-dump-script)
+ lt_dump_D=`$ECHO "X$lt_script_arg0" | /bin/sed -e 's/^X//' -e 's%/[^/]*$%%'`
+ test "X$lt_dump_D" = "X$lt_script_arg0" && lt_dump_D=.
+ lt_dump_F=`$ECHO "X$lt_script_arg0" | /bin/sed -e 's/^X//' -e 's%^.*/%%'`
+ cat "$lt_dump_D/$lt_dump_F"
+ exit 0
+ ;;
+ --lt-*)
+ $ECHO "Unrecognized --lt- option: '$lt_opt'" 1>&2
+ exit 1
+ ;;
+ esac
+ done
+
+ # Print the debug banner immediately:
+ if test -n "$lt_option_debug"; then
+ echo "st-renumerate-test:st-renumerate-test:$LINENO: libtool wrapper (GNU libtool) 2.4.6" 1>&2
+ fi
+}
+
+# Used when --lt-debug. Prints its arguments to stdout
+# (redirection is the responsibility of the caller)
+func_lt_dump_args ()
+{
+ lt_dump_args_N=1;
+ for lt_arg
+ do
+ $ECHO "st-renumerate-test:st-renumerate-test:$LINENO: newargv[$lt_dump_args_N]: $lt_arg"
+ lt_dump_args_N=`expr $lt_dump_args_N + 1`
+ done
+}
+
+# Core function for launching the target application
+func_exec_program_core ()
+{
+
+ if test -n "$lt_option_debug"; then
+ $ECHO "st-renumerate-test:st-renumerate-test:$LINENO: newargv[0]: $progdir/$program" 1>&2
+ func_lt_dump_args ${1+"$@"} 1>&2
+ fi
+ exec "$progdir/$program" ${1+"$@"}
+
+ $ECHO "$0: cannot exec $program $*" 1>&2
+ exit 1
+}
+
+# A function to encapsulate launching the target application
+# Strips options in the --lt-* namespace from $@ and
+# launches target application with the remaining arguments.
+func_exec_program ()
+{
+ case " $* " in
+ *\ --lt-*)
+ for lt_wr_arg
+ do
+ case $lt_wr_arg in
+ --lt-*) ;;
+ *) set x "$@" "$lt_wr_arg"; shift;;
+ esac
+ shift
+ done ;;
+ esac
+ func_exec_program_core ${1+"$@"}
+}
+
+ # Parse options
+ func_parse_lt_options "$0" ${1+"$@"}
+
+ # Find the directory that this script lives in.
+ thisdir=`$ECHO "$file" | /bin/sed 's%/[^/]*$%%'`
+ test "x$thisdir" = "x$file" && thisdir=.
+
+ # Follow symbolic links until we get to the real thisdir.
+ file=`ls -ld "$file" | /bin/sed -n 's/.*-> //p'`
+ while test -n "$file"; do
+ destdir=`$ECHO "$file" | /bin/sed 's%/[^/]*$%%'`
+
+ # If there was a directory component, then change thisdir.
+ if test "x$destdir" != "x$file"; then
+ case "$destdir" in
+ [\\/]* | [A-Za-z]:[\\/]*) thisdir="$destdir" ;;
+ *) thisdir="$thisdir/$destdir" ;;
+ esac
+ fi
+
+ file=`$ECHO "$file" | /bin/sed 's%^.*/%%'`
+ file=`ls -ld "$thisdir/$file" | /bin/sed -n 's/.*-> //p'`
+ done
+
+ # Usually 'no', except on cygwin/mingw when embedded into
+ # the cwrapper.
+ WRAPPER_SCRIPT_BELONGS_IN_OBJDIR=no
+ if test "$WRAPPER_SCRIPT_BELONGS_IN_OBJDIR" = "yes"; then
+ # special case for '.'
+ if test "$thisdir" = "."; then
+ thisdir=`pwd`
+ fi
+ # remove .libs from thisdir
+ case "$thisdir" in
+ *[\\/].libs ) thisdir=`$ECHO "$thisdir" | /bin/sed 's%[\\/][^\\/]*$%%'` ;;
+ .libs ) thisdir=. ;;
+ esac
+ fi
+
+ # Try to get the absolute directory name.
+ absdir=`cd "$thisdir" && pwd`
+ test -n "$absdir" && thisdir="$absdir"
+
+ program='st-renumerate-test'
+ progdir="$thisdir/.libs"
+
+
+ if test -f "$progdir/$program"; then
+ # Add our own library path to LD_LIBRARY_PATH
+ LD_LIBRARY_PATH="/home/gerddie/src/Freedesktop/mesa-orig/src/mapi/shared-glapi/.libs:$LD_LIBRARY_PATH"
+
+ # Some systems cannot cope with colon-terminated LD_LIBRARY_PATH
+ # The second colon is a workaround for a bug in BeOS R4 sed
+ LD_LIBRARY_PATH=`$ECHO "$LD_LIBRARY_PATH" | /bin/sed 's/::*$//'`
+
+ export LD_LIBRARY_PATH
+
+ if test "$libtool_execute_magic" != "%%%MAGIC variable%%%"; then
+ # Run the actual program with our arguments.
+ func_exec_program ${1+"$@"}
+ fi
+ else
+ # The program doesn't exist.
+ $ECHO "$0: error: '$progdir/$program' does not exist" 1>&2
+ $ECHO "This script is just a wrapper for $program." 1>&2
+ $ECHO "See the libtool documentation for more information." 1>&2
+ exit 1
+ fi
+fi
diff --git a/src/mesa/state_tracker/tests/test_glsl_to_tgsi_lifetime.cpp b/src/mesa/state_tracker/tests/test_glsl_to_tgsi_lifetime.cpp
new file mode 100644
index 0000000000..3e094f0dda
--- /dev/null
+++ b/src/mesa/state_tracker/tests/test_glsl_to_tgsi_lifetime.cpp
@@ -0,0 +1,789 @@
+/*
+ * Copyright © 2017 Gert Wollny
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice (including the next
+ * paragraph) shall be included in all copies or substantial portions of the
+ * Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
+ * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
+ * DEALINGS IN THE SOFTWARE.
+ */
+
+#include <state_tracker/st_glsl_to_tgsi_temprename.h>
+#include <tgsi/tgsi_ureg.h>
+#include <tgsi/tgsi_info.h>
+#include <compiler/glsl/list.h>
+#include <gtest/gtest.h>
+
+using std::vector;
+using std::pair;
+
+struct MockCodeline {
+ MockCodeline(unsigned _op): op(_op) {}
+ MockCodeline(unsigned _op, const vector<int>& _dst, const vector<int>& _src, const vector<int>&_to):
+ op(_op), dst(_dst), src(_src), tex_offsets(_to){}
+ unsigned op;
+ vector<int> dst;
+ vector<int> src;
+ vector<int> tex_offsets;
+};
+
+const int in0 = 0;
+const int in1 = -1;
+const int in2 = -2;
+
+const int out0 = 0;
+const int out1 = -1;
+
+
+
+/*
+ * supported opcodes for mock-shaders:
+ *
+ * Arithmetic
+ *
+ * TGSI_OPCODE_MOV, TGSI_OPCODE_UADD, TGSI_OPCODE_UMAD
+ *
+ * Flow control
+ * loops:
+ * TGSI_OPCODE_BGNLOOP, TGSI_OPCODE_ENDLOOP,
+ * TGSI_OPCODE_BRK, TGSI_OPCODE_CONT
+ *
+ * if/switch:
+ * TGSI_OPCODE_IF, TGSI_OPCODE_UIF, TGSI_OPCODE_ELSE, TGSI_OPCODE_ENDIF
+ * TGSI_OPCODE_SWITCH, TGSI_OPCODE_CASE, TGSI_OPCODE_DEFAULT
+ *
+ * TGSI_OPCODE_END
+ */
+
+
+class MockShader {
+public:
+ MockShader(const vector<MockCodeline>& source);
+ ~MockShader();
+
+ void free();
+
+ exec_list* get_program();
+ int get_num_temps();
+private:
+ st_src_reg create_src_register(int src_idx);
+ st_dst_reg create_dst_register(int dst_idx);
+ exec_list* program;
+ int num_temps;
+ void *mem_ctx;
+};
+
+using expectation = vector<vector<int>>;
+
+MockShader::~MockShader()
+{
+ free();
+ ralloc_free(mem_ctx);
+}
+
+int MockShader::get_num_temps()
+{
+ return num_temps;
+}
+
+
+exec_list* MockShader::get_program()
+{
+ return program;
+}
+
+MockShader::MockShader(const vector<MockCodeline>& source):
+ num_temps(0)
+{
+ mem_ctx = ralloc_context(NULL);
+
+ program = new(mem_ctx) exec_list();
+
+ for (MockCodeline i: source) {
+ glsl_to_tgsi_instruction *next_instr = new(mem_ctx) glsl_to_tgsi_instruction();
+ next_instr->op = i.op;
+ next_instr->info = tgsi_get_opcode_info(i.op);
+
+ assert(i.src.size() < 4);
+ assert(i.dst.size() < 3);
+ assert(i.tex_offsets.size() < 3);
+
+ for (unsigned k = 0; k < i.src.size(); ++k) {
+ next_instr->src[k] = create_src_register(i.src[k]);
+ }
+ for (unsigned k = 0; k < i.dst.size(); ++k) {
+ next_instr->dst[k] = create_dst_register(i.dst[k]);
+ }
+
+ // set texture registers
+ next_instr->tex_offset_num_offset = i.tex_offsets.size();
+ next_instr->tex_offsets = new st_src_reg[i.tex_offsets.size()];
+ for (unsigned k = 0; k < i.tex_offsets.size(); ++k) {
+ next_instr->tex_offsets[k] = create_src_register(i.tex_offsets[k]);
+ }
+
+ program->push_tail(next_instr);
+ }
+ ++num_temps;
+}
+
+void MockShader::free()
+{
+ // the list is not fully initialized, so
+ // tearing it down also must be done manually.
+ exec_node *p;
+ while ((p = program->pop_head())) {
+ glsl_to_tgsi_instruction * instr = static_cast<glsl_to_tgsi_instruction *>(p);
+ if (instr->tex_offset_num_offset > 0)
+ delete[] instr->tex_offsets;
+ delete p;
+ }
+ program = 0;
+ num_temps = 0;
+}
+
+st_src_reg MockShader::create_src_register(int src_idx)
+{
+ gl_register_file file;
+ int idx = 0;
+ if (src_idx > 0) {
+ file = PROGRAM_TEMPORARY;
+ idx = src_idx;
+ if (num_temps < idx)
+ num_temps = idx;
+ } else {
+ file = PROGRAM_INPUT;
+ idx = -src_idx;
+ }
+ return st_src_reg(file, idx, GLSL_TYPE_INT);
+
+}
+
+st_dst_reg MockShader::create_dst_register(int dst_idx)
+{
+ gl_register_file file;
+ int idx = 0;
+ if (dst_idx > 0) {
+ file = PROGRAM_TEMPORARY;
+ idx = dst_idx;
+ if (num_temps < idx)
+ num_temps = idx;
+ } else {
+ file = PROGRAM_OUTPUT;
+ idx = - dst_idx;
+ }
+ return st_dst_reg(file, 0xF, GLSL_TYPE_INT, idx);
+}
+
+/**
+ This is a test class to check the exact life times
+*/
+class LifetimeEvaluatorExactTest : public testing::Test {
+protected:
+ void run(const vector<MockCodeline>& code, const expectation& e);
+};
+
+/**
+ This is a test class to check that the life time is at least
+ in the expected range
+*/
+class LifetimeEvaluatorAtLeastTest : public testing::Test {
+protected:
+ void run(const vector<MockCodeline>& code, const expectation& e);
+};
+
+
+void LifetimeEvaluatorExactTest::run(const vector<MockCodeline>& code, const expectation& e)
+{
+ MockShader shader(code);
+
+ tgsi_temp_lifetime ana(shader.get_program(), shader.get_num_temps());
+ auto lifetimes = ana.get_lifetimes();
+
+ // lifetimes[0] not used, but created for simpler processing
+ ASSERT_EQ(lifetimes.size(), e.size());
+
+ for (unsigned i = 1; i < lifetimes.size(); ++i) {
+ EXPECT_EQ(lifetimes[i].first, e[i][0]);
+ EXPECT_EQ(lifetimes[i].second, e[i][1]);
+ }
+}
+
+void LifetimeEvaluatorAtLeastTest::run(const vector<MockCodeline>& code, const expectation& e)
+{
+ MockShader shader(code);
+
+ tgsi_temp_lifetime ana(shader.get_program(), shader.get_num_temps());
+ auto lifetimes = ana.get_lifetimes();
+
+ // lifetimes[0] not used, but created for simpler processing
+ ASSERT_EQ(lifetimes.size(), e.size());
+
+ for (unsigned i = 1; i < lifetimes.size(); ++i) {
+ EXPECT_LE(lifetimes[i].first, e[i][0]);
+ EXPECT_GE(lifetimes[i].second, e[i][1]);
+ }
+}
+
+TEST_F(LifetimeEvaluatorExactTest, SimpleMoveAdd)
+{
+ const vector<MockCodeline> code = {
+ { TGSI_OPCODE_MOV, {1}, {in0}, {}},
+ { TGSI_OPCODE_UADD, {out0}, {1, in0}, {}},
+ { TGSI_OPCODE_END}
+ };
+ run(code, expectation({{-1, -1}, {0,1}}));
+}
+
+
+TEST_F(LifetimeEvaluatorExactTest, SimpleMoveAddMove)
+{
+ const vector<MockCodeline> code = {
+ { TGSI_OPCODE_MOV, {1}, {in0}, {}},
+ { TGSI_OPCODE_UADD, {2}, {1, in0}, {}},
+ { TGSI_OPCODE_MOV, {out0}, {2}, {}},
+ { TGSI_OPCODE_END}
+ };
+ run(code, expectation({{-1, -1}, {0,1}, {1,2}}));
+}
+
+TEST_F(LifetimeEvaluatorExactTest, SimpleMoveAddMoveTexoffset)
+{
+ const vector<MockCodeline> code = {
+ { TGSI_OPCODE_MOV, {1}, {in0}, {}},
+ { TGSI_OPCODE_MOV, {2}, {in1}, {}},
+ { TGSI_OPCODE_UADD, {out0}, {}, {1,2}},
+ { TGSI_OPCODE_END}
+ };
+ run(code, expectation({{-1, -1}, {0,2}, {1,2}}));
+}
+
+
+TEST_F(LifetimeEvaluatorExactTest, SimpleMoveInLoop)
+{
+ const vector<MockCodeline> code = {
+ { TGSI_OPCODE_MOV, {1}, {in0}, {}}, // 0
+ { TGSI_OPCODE_BGNLOOP }, // 1
+ { TGSI_OPCODE_UADD, {2}, {1, in0}, {}}, // 2
+ { TGSI_OPCODE_UADD, {3}, {1, 2}, {}}, // 3
+ { TGSI_OPCODE_UADD, {3}, {3, in1}, {}}, // 4
+ { TGSI_OPCODE_ENDLOOP }, // 5
+ { TGSI_OPCODE_MOV, {out0}, {3}, {}}, // 6
+ { TGSI_OPCODE_END}
+ };
+ run (code, expectation({{-1,-1},{0, 5}, {2,3}, {3, 6}}));
+}
+
+
+// in loop if/else value written only in one path, and read later
+// - value must survive the whole loop
+TEST_F(LifetimeEvaluatorExactTest, MoveInIfInLoop)
+{
+ const vector<MockCodeline> code = {
+ { TGSI_OPCODE_MOV, {1}, {in0}, {}}, // 0
+ { TGSI_OPCODE_BGNLOOP }, // 1
+ { TGSI_OPCODE_IF}, // 2
+ { TGSI_OPCODE_UADD, {2}, {1, in0}, {}}, // 3
+ { TGSI_OPCODE_ENDIF}, // 4
+ { TGSI_OPCODE_UADD, {3}, {1, 2}, {}}, // 5
+ { TGSI_OPCODE_UADD, {3}, {3, in1}, {}}, // 6
+ { TGSI_OPCODE_ENDLOOP }, // 7
+ { TGSI_OPCODE_MOV, {out0}, {3}, {}}, // 8
+ { TGSI_OPCODE_END}
+ };
+ run (code, expectation({{-1,-1},{0, 7}, {1,7}, {5, 8}}));
+}
+
+
+// in loop if/else value written in both paths, and read later
+// - value must survive from first write to last read in loop
+// for now we only check that the minimum life time is correct
+TEST_F(LifetimeEvaluatorAtLeastTest, WriteInIfAndElseInLoop)
+{
+ const vector<MockCodeline> code = {
+ { TGSI_OPCODE_MOV, {1}, {in0}, {}}, // 0
+ { TGSI_OPCODE_BGNLOOP }, // 1
+ { TGSI_OPCODE_IF}, // 2
+ { TGSI_OPCODE_UADD, {2}, {1, in0}, {}}, // 3
+ { TGSI_OPCODE_ELSE }, // 4
+ { TGSI_OPCODE_MOV, {2}, {1}, {}}, // 5
+ { TGSI_OPCODE_ENDIF}, // 6
+ { TGSI_OPCODE_UADD, {3}, {1, 2}, {}}, // 7
+ { TGSI_OPCODE_UADD, {3}, {3, in1}, {}}, // 8
+ { TGSI_OPCODE_ENDLOOP }, // 9
+ { TGSI_OPCODE_MOV, {out0}, {3}, {}}, // 10
+ { TGSI_OPCODE_END}
+ };
+ run (code, expectation({{-1,-1},{0, 9}, {3,7}, {7, 10}}));
+}
+
+// in loop if/else value written in both paths, read in the else path
+// before it is written there and also read later
+// - value must survive from first write to last read in loop
+TEST_F(LifetimeEvaluatorExactTest, WriteInIfAndElseReadInElseInLoop)
+{
+ const vector<MockCodeline> code = {
+ { TGSI_OPCODE_MOV, {1}, {in0}, {}}, // 0
+ { TGSI_OPCODE_BGNLOOP }, // 1
+ { TGSI_OPCODE_IF}, // 2
+ { TGSI_OPCODE_UADD, {2}, {1, in0}, {}}, // 3
+ { TGSI_OPCODE_ELSE }, // 4
+ { TGSI_OPCODE_ADD, {2}, {1, 2}, {}}, // 5
+ { TGSI_OPCODE_ENDIF}, // 6
+ { TGSI_OPCODE_UADD, {3}, {1, 2}, {}}, // 7
+ { TGSI_OPCODE_UADD, {3}, {3, in1}, {}}, // 8
+ { TGSI_OPCODE_ENDLOOP }, // 9
+ { TGSI_OPCODE_MOV, {out0}, {3}, {}}, // 10
+ { TGSI_OPCODE_END}
+ };
+ run (code, expectation({{-1,-1},{0, 9}, {1,9}, {7, 10}}));
+}
+
+// in loop if/else read in one path before written in the same loop
+// - value must survive the whole loop
+TEST_F(LifetimeEvaluatorExactTest, ReadInIfInLoopBeforeWrite)
+{
+ const vector<MockCodeline> code = {
+ { TGSI_OPCODE_MOV, {1}, {in0}, {}}, // 0
+ { TGSI_OPCODE_BGNLOOP }, // 1
+ { TGSI_OPCODE_IF, {}, {in0}, {}}, // 2
+ { TGSI_OPCODE_UADD, {2}, {1, 3}, {}}, // 3
+ { TGSI_OPCODE_ENDIF}, // 4
+ { TGSI_OPCODE_UADD, {3}, {1, 2}, {}}, // 5
+ { TGSI_OPCODE_UADD, {3}, {3, in1}, {}}, // 6
+ { TGSI_OPCODE_ENDLOOP }, // 7
+ { TGSI_OPCODE_MOV, {out0}, {3}, {}}, // 8
+ { TGSI_OPCODE_END}
+ };
+ run (code, expectation({{-1,-1},{0, 7}, {1,7}, {1, 8}}));
+}
+
+/* Write in nested ifs in loop, for now we only test whether the
+ * life time is at least what is required, but we know that the
+ * implementation doesn't do a full check and sets larger boundaries
+ */
+TEST_F(LifetimeEvaluatorAtLeastTest, NestedIfInLoopAlwaysWriteButNotPropagated)
+{
+ const vector<MockCodeline> code = {
+ { TGSI_OPCODE_BGNLOOP }, // 0
+ { TGSI_OPCODE_IF, {}, {in0}, {}},
+ { TGSI_OPCODE_IF, {}, {in0}, {}},
+ { TGSI_OPCODE_MOV, {1}, {in0}, {}},
+ { TGSI_OPCODE_ELSE},
+ { TGSI_OPCODE_MOV, {1}, {in0}, {}}, // 5
+ { TGSI_OPCODE_ENDIF},
+ { TGSI_OPCODE_ELSE},
+ { TGSI_OPCODE_IF, {}, {in0}, {}},
+ { TGSI_OPCODE_MOV, {1}, {in0}, {}},
+ { TGSI_OPCODE_ELSE}, // 10
+ { TGSI_OPCODE_MOV, {1}, {in0}, {}},
+ { TGSI_OPCODE_ENDIF},
+ { TGSI_OPCODE_ENDIF},
+ { TGSI_OPCODE_MOV, {out0}, {1}, {}},
+ { TGSI_OPCODE_ENDLOOP }, // 15
+ { TGSI_OPCODE_END}
+ };
+ run (code, expectation({{-1,-1},{3, 14}}));
+}
+
+
+
+TEST_F(LifetimeEvaluatorExactTest, NestedIfInLoopWriteNotAlways)
+{
+ const vector<MockCodeline> code = {
+ { TGSI_OPCODE_BGNLOOP }, // 0
+ { TGSI_OPCODE_IF, {}, {in0}, {}},
+ { TGSI_OPCODE_IF, {}, {in0}, {}},
+ { TGSI_OPCODE_MOV, {1}, {in0}, {}},
+ { TGSI_OPCODE_ELSE},
+ { TGSI_OPCODE_MOV, {1}, {in0}, {}}, // 5
+ { TGSI_OPCODE_ENDIF},
+ { TGSI_OPCODE_ELSE},
+ { TGSI_OPCODE_IF, {}, {in0}, {}},
+ { TGSI_OPCODE_MOV, {1}, {in0}, {}},
+ { TGSI_OPCODE_ENDIF}, // 10
+ { TGSI_OPCODE_ENDIF},
+ { TGSI_OPCODE_MOV, {out0}, {1}, {}},
+ { TGSI_OPCODE_ENDLOOP }, // 13
+ { TGSI_OPCODE_END}
+ };
+ run (code, expectation({{-1,-1},{0, 13}}));
+}
+
+
+// if a continue is in the loop, all variables written after the
+// continue and used outside the loop must be maintained for the
+// whole loop
+TEST_F(LifetimeEvaluatorExactTest, LoopWithWriteAfterContinue)
+{
+ const vector<MockCodeline> code = {
+ { TGSI_OPCODE_BGNLOOP }, // 0
+ { TGSI_OPCODE_IF, {}, {in0}, {}},
+ { TGSI_OPCODE_CONT},
+ { TGSI_OPCODE_ENDIF},
+ { TGSI_OPCODE_MOV, {1}, {in0}, {}},
+ { TGSI_OPCODE_ENDLOOP }, // 5
+ { TGSI_OPCODE_MOV, {out0}, {1}, {}}, // 6
+ { TGSI_OPCODE_END}
+ };
+ run (code, expectation({{-1,-1},{0, 6}}));
+}
+
+// if a continue is in the loop, all variables written after the
+// continue and used outside the loop must be maintained for the
+// whole loop, but not further
+TEST_F(LifetimeEvaluatorExactTest, NestedLoopWithWriteAfterContinue)
+{
+ const vector<MockCodeline> code = {
+ { TGSI_OPCODE_BGNLOOP }, // 0
+ { TGSI_OPCODE_BGNLOOP }, // 1
+ { TGSI_OPCODE_IF, {}, {in0}, {}},
+ { TGSI_OPCODE_CONT},
+ { TGSI_OPCODE_ENDIF},
+ { TGSI_OPCODE_MOV, {1}, {in0}, {}},
+ { TGSI_OPCODE_ENDLOOP }, // 6
+ { TGSI_OPCODE_MOV, {out0}, {1}, {}}, // 7
+ { TGSI_OPCODE_ENDLOOP }, // 8
+ { TGSI_OPCODE_END}
+ };
+ run (code, expectation({{-1,-1},{1, 7}}));
+}
+
+// if a continue is in the loop, all variables written after the
+// continue and used outside the loop must be maintained for all
+// loops up to the read scope, but not further
+TEST_F(LifetimeEvaluatorExactTest, Nested2LoopWithWriteAfterContinue)
+{
+ const vector<MockCodeline> code = {
+ { TGSI_OPCODE_BGNLOOP }, // 0
+ { TGSI_OPCODE_BGNLOOP }, // 1
+ { TGSI_OPCODE_BGNLOOP }, // 2
+ { TGSI_OPCODE_IF, {}, {in0}, {}},
+ { TGSI_OPCODE_CONT},
+ { TGSI_OPCODE_ENDIF},
+ { TGSI_OPCODE_MOV, {1}, {in0}, {}},
+ { TGSI_OPCODE_ENDLOOP },
+ { TGSI_OPCODE_ENDLOOP },
+ { TGSI_OPCODE_MOV, {out0}, {1}, {}}, // 9
+ { TGSI_OPCODE_ENDLOOP }, // 10
+ { TGSI_OPCODE_END}
+ };
+ run (code, expectation({{-1,-1},{1, 9}}));
+}
+
+TEST_F(LifetimeEvaluatorExactTest, LoopWithWriteInSwitch)
+{
+ const vector<MockCodeline> code = {
+ { TGSI_OPCODE_BGNLOOP },
+ { TGSI_OPCODE_SWITCH, {}, {in0}, {} },
+ { TGSI_OPCODE_CASE, {}, {in0}, {} },
+ { TGSI_OPCODE_MOV, {1}, {in0}, {}},
+ { TGSI_OPCODE_BRK },
+ { TGSI_OPCODE_DEFAULT },
+ { TGSI_OPCODE_BRK },
+ { TGSI_OPCODE_ENDSWITCH },
+ { TGSI_OPCODE_MOV, {out0}, {1}, {}},
+ { TGSI_OPCODE_ENDLOOP },
+ { TGSI_OPCODE_END}
+ };
+ run (code, expectation({{-1,-1},{0, 9}}));
+}
+
+// value written in one case, and read in other, in loop
+// - must survive the loop
+TEST_F(LifetimeEvaluatorExactTest, LoopWithReadWriteInSwitchDifferentCase)
+{
+ const vector<MockCodeline> code = {
+ { TGSI_OPCODE_BGNLOOP }, // 0
+ { TGSI_OPCODE_SWITCH, {}, {in0}, {} }, // 0
+ { TGSI_OPCODE_CASE, {}, {in0}, {} }, // 0
+ { TGSI_OPCODE_MOV, {1}, {in0}, {}},
+ { TGSI_OPCODE_BRK }, // 0
+ { TGSI_OPCODE_DEFAULT }, // 0
+ { TGSI_OPCODE_MOV, {out0}, {1}, {}},
+ { TGSI_OPCODE_BRK }, // 0
+ { TGSI_OPCODE_ENDSWITCH }, // 0
+ { TGSI_OPCODE_ENDLOOP }, // 6
+ { TGSI_OPCODE_END}
+ };
+ run (code, expectation({{-1,-1},{0, 9}}));
+}
+
+
+TEST_F(LifetimeEvaluatorExactTest, LoopRWInSwitchCaseLastCaseWithoutBreak)
+{
+ const vector<MockCodeline> code = {
+ { TGSI_OPCODE_BGNLOOP }, // 0
+ { TGSI_OPCODE_SWITCH, {}, {in0}, {} }, // 0
+ { TGSI_OPCODE_CASE, {}, {in0}, {} }, // 0
+ { TGSI_OPCODE_MOV, {1}, {in0}, {}},
+ { TGSI_OPCODE_BRK }, // 0
+ { TGSI_OPCODE_DEFAULT }, // 0
+ { TGSI_OPCODE_MOV, {out0}, {1}, {}},
+ { TGSI_OPCODE_ENDSWITCH }, // 0
+ { TGSI_OPCODE_ENDLOOP }, // 6
+ { TGSI_OPCODE_END}
+ };
+ run (code, expectation({{-1,-1},{0, 8}}));
+}
+
+
+// value read/write in same case, stays there
+
+
+TEST_F(LifetimeEvaluatorExactTest, LoopWithReadWriteInSwitchSameCase)
+{
+ const vector<MockCodeline> code = {
+ { TGSI_OPCODE_BGNLOOP }, // 0
+ { TGSI_OPCODE_SWITCH, {}, {in0}, {} }, // 0
+ { TGSI_OPCODE_CASE, {}, {in0}, {} }, // 0
+ { TGSI_OPCODE_MOV, {1}, {in0}, {}},
+ { TGSI_OPCODE_MOV, {out0}, {1}, {}},
+ { TGSI_OPCODE_BRK }, // 0
+ { TGSI_OPCODE_DEFAULT }, // 0
+ { TGSI_OPCODE_BRK }, // 0
+ { TGSI_OPCODE_ENDSWITCH }, // 0
+ { TGSI_OPCODE_ENDLOOP }, // 6
+ { TGSI_OPCODE_END}
+ };
+ run (code, expectation({{-1,-1},{3,4}}));
+}
+
+// value read/write in all cases, should only live from first
+// write to last read, but currently the whole loop is used.
+TEST_F(LifetimeEvaluatorAtLeastTest, LoopWithReadWriteInSwitchSameCase)
+{
+ const vector<MockCodeline> code = {
+ { TGSI_OPCODE_BGNLOOP }, // 0
+ { TGSI_OPCODE_SWITCH, {}, {in0}, {}}, // 0
+ { TGSI_OPCODE_CASE, {}, {in0}, {} }, // 0
+ { TGSI_OPCODE_MOV, {1}, {in0}, {}},
+ { TGSI_OPCODE_BRK }, // 0
+ { TGSI_OPCODE_DEFAULT }, // 0
+ { TGSI_OPCODE_MOV, {1}, {in0}, {}},
+ { TGSI_OPCODE_BRK }, // 0
+ { TGSI_OPCODE_ENDSWITCH }, // 0
+ { TGSI_OPCODE_MOV, {out0}, {1}, {}},
+ { TGSI_OPCODE_ENDLOOP }, // 6
+ { TGSI_OPCODE_END}
+ };
+ run (code, expectation({{-1,-1},{3,9}}));
+}
+
+// value written in one case, and read in other, in loop, may fall through
+// - must survive the loop
+
+// value read/write in different loops
+TEST_F(LifetimeEvaluatorExactTest, LoopsWithDifferntScopes)
+{
+ const vector<MockCodeline> code = {
+ { TGSI_OPCODE_BGNLOOP }, // 0
+ { TGSI_OPCODE_MOV, {1}, {in0}, {}}, // 1
+ { TGSI_OPCODE_ENDLOOP }, // 2
+ { TGSI_OPCODE_BGNLOOP }, // 3
+ { TGSI_OPCODE_MOV, {out0}, {1}, {}}, // 4
+ { TGSI_OPCODE_ENDLOOP }, // 5
+ { TGSI_OPCODE_END}
+ };
+ run (code, expectation({{-1,-1},{1,5}}));
+}
+
+// value read/write in different loops, conditional write
+TEST_F(LifetimeEvaluatorExactTest, LoopsWithDifferntScopesConditionalWrite)
+{
+ const vector<MockCodeline> code = {
+ { TGSI_OPCODE_BGNLOOP }, // 0
+ { TGSI_OPCODE_IF, {}, {in0}, {}},
+ { TGSI_OPCODE_MOV, {1}, {in0}, {}}, // 3
+ { TGSI_OPCODE_ENDIF}, // 1
+ { TGSI_OPCODE_ENDLOOP }, // 5
+ { TGSI_OPCODE_BGNLOOP }, // 6
+ { TGSI_OPCODE_MOV, {out0}, {1}, {}}, // 7
+ { TGSI_OPCODE_ENDLOOP }, // 5
+ { TGSI_OPCODE_END}
+ };
+ run (code, expectation({{-1,-1},{0,7}}));
+}
+
+// first read before first write weirdness with nested loops
+TEST_F(LifetimeEvaluatorExactTest, LoopsWithDifferntScopesConditionalReadBeforeWrite)
+{
+ const vector<MockCodeline> code = {
+ { TGSI_OPCODE_BGNLOOP }, // 0
+ { TGSI_OPCODE_BGNLOOP }, // 1
+ { TGSI_OPCODE_IF, {}, {in0}, {}}, // 2
+ { TGSI_OPCODE_MOV, {out0}, {1}, {}}, // 3
+ { TGSI_OPCODE_ENDIF}, // 4
+ { TGSI_OPCODE_ENDLOOP }, // 5
+ { TGSI_OPCODE_BGNLOOP }, // 6
+ { TGSI_OPCODE_MOV, {1}, {in0}, {}}, // 7
+ { TGSI_OPCODE_ENDLOOP }, // 8
+ { TGSI_OPCODE_ENDLOOP }, // 9
+ { TGSI_OPCODE_END}
+ };
+ run (code, expectation({{-1,-1},{0,9}}));
+}
+
+// register is only written. This should not happen,
+// but to handle the case we want the register to live
+// at least one instruction
+TEST_F(LifetimeEvaluatorExactTest, WriteOnly)
+{
+ const vector<MockCodeline> code = {
+ { TGSI_OPCODE_MOV, {1}, {in0}, {}}, // 3
+ { TGSI_OPCODE_END}
+ };
+ run (code, expectation({{-1,-1},{0,0}}));
+}
+
+// register read in if
+TEST_F(LifetimeEvaluatorExactTest, SimpleReadForIf)
+{
+ const vector<MockCodeline> code = {
+ { TGSI_OPCODE_MOV, {1}, {in0}, {}}, // 3
+ { TGSI_OPCODE_ADD, {out0}, {in0, in1}, {}}, // 3
+ { TGSI_OPCODE_IF, {}, {1}, {}},
+ { TGSI_OPCODE_ENDIF}
+ };
+ run (code, expectation({{-1,-1},{0,2}}));
+}
+
+// register read in switch
+TEST_F(LifetimeEvaluatorExactTest, SimpleReadForSwitchAndCase)
+{
+ const vector<MockCodeline> code = {
+ { TGSI_OPCODE_MOV, {1}, {in0}, {}}, // 3
+ { TGSI_OPCODE_SWITCH, {}, {1}, {}},
+ { TGSI_OPCODE_CASE, {}, {1}, {}},
+ { TGSI_OPCODE_CASE, {}, {1}, {}},
+ { TGSI_OPCODE_END, {}, {1}, {}},
+ };
+ run (code, expectation({{-1,-1},{0,3}}));
+}
+
+TEST_F(LifetimeEvaluatorExactTest, DistinceScopesAndNoEndProgramId)
+{
+ const vector<MockCodeline> code = {
+ { TGSI_OPCODE_MOV, {1}, {in0}, {}}, // 3
+ { TGSI_OPCODE_IF, {}, {1}, {}},
+ { TGSI_OPCODE_MOV, {2}, {1}, {}},
+ { TGSI_OPCODE_ENDIF},
+ { TGSI_OPCODE_IF, {}, {1}, {}},
+ { TGSI_OPCODE_MOV, {out0}, {2}, {}},
+ { TGSI_OPCODE_ENDIF},
+
+ };
+ run (code, expectation({{-1,-1},{0,4}, {2,5}}));
+}
+
+/* Check that two destination registers are used
+*/
+TEST_F(LifetimeEvaluatorExactTest, TwoDestRegisters)
+{
+ const vector<MockCodeline> code = {
+ { TGSI_OPCODE_DFRACEXP , {1,2}, {in0}, {}}, // 3
+ { TGSI_OPCODE_ADD, {out0}, {1,2}, {}},
+ { TGSI_OPCODE_END}
+ };
+ run (code, expectation({{-1,-1},{0,1}, {0,1}}));
+}
+
+/* Check that three source registers are used
+*/
+TEST_F(LifetimeEvaluatorExactTest, ThreeSourceRegisters)
+{
+ const vector<MockCodeline> code = {
+ { TGSI_OPCODE_DFRACEXP , {1,2}, {in0}, {}}, // 3
+ { TGSI_OPCODE_ADD , {3}, {in0, in1}, {}}, // 3
+ { TGSI_OPCODE_MAD, {out0}, {1,2, 3}, {}},
+ { TGSI_OPCODE_END}
+ };
+ run (code, expectation({{-1,-1},{0,2}, {0,2}, {1,2}}));
+}
+
+/* Check that temporaries that are only written each get a
+ * lifetime of one instruction
+*/
+TEST_F(LifetimeEvaluatorExactTest, OverwriteWrittenOnlyTemps)
+{
+ const vector<MockCodeline> code = {
+ { TGSI_OPCODE_MOV , {1}, {in0}, {}}, // 3
+ { TGSI_OPCODE_MOV , {2}, {in1}, {}}, // 3
+ { TGSI_OPCODE_END}
+ };
+ run (code, expectation({{-1,-1},{0,0}, {1,1}}));
+}
+
+
+TEST(RegisterRemapping, RegisterRemapping)
+{
+ rename_reg_pair proto{false, 0};
+ vector<rename_reg_pair> result(7, proto);
+
+ vector<pair<int, int>> lt({{-1,-1},
+ {0, 1}, // 1
+ {0, 2}, // 2
+ {1, 2}, // 3
+ {2, 10}, // 4
+ {3, 5}, // 5
+ {5, 10} // 6
+ });
+
+
+
+ evaluate_remapping(lt, result);
+
+ vector<int> remap({0,1, 2, 3, 4, 5, 6});
+
+ std::transform(remap.begin(), remap.end(), result.begin(), remap.begin(),
+ [](int x, const rename_reg_pair& rn) {return rn.valid ? rn.new_reg : x;});
+
+ vector<int> expect({0, 1, 2, 1, 1, 2, 2});
+
+ for(unsigned i = 1; i < remap.size(); ++i) {
+ EXPECT_EQ(remap[i], expect[i]);
+ }
+
+}
+
+
+TEST(RegisterRemapping, RegisterRemapping2)
+{
+ rename_reg_pair proto{false, 0};
+ vector<rename_reg_pair> result(7, proto);
+
+ vector<pair<int, int>> lt({{-1,-1},
+ {0, 1}, // 1
+ {0, 2}, // 2
+ {3, 3}, // 3
+ {4, 4}, // 4
+ });
+
+
+
+ evaluate_remapping(lt, result);
+
+ vector<int> remap({0, 1, 2, 3, 4});
+
+ std::transform(remap.begin(), remap.end(), result.begin(), remap.begin(),
+ [](int x, const rename_reg_pair& rn) {return rn.valid ? rn.new_reg : x;});
+
+ vector<int> expect({0, 1, 2, 1, 1});
+
+ for(unsigned i = 1; i < remap.size(); ++i) {
+ EXPECT_EQ(remap[i], expect[i]);
+ }
+
+}

--
2.13.0

Gert Wollny

2017-06-09 23:15:08 UTC

Permalink

This patch replaces the old register lifetime estimation with the
new approach.
---
src/mesa/state_tracker/st_glsl_to_tgsi.cpp | 17 +++++++++++++++--
1 file changed, 15 insertions(+), 2 deletions(-)

diff --git a/src/mesa/state_tracker/st_glsl_to_tgsi.cpp b/src/mesa/state_tracker/st_glsl_to_tgsi.cpp
index 0e7f4b646a..b76ad42536 100644
--- a/src/mesa/state_tracker/st_glsl_to_tgsi.cpp
+++ b/src/mesa/state_tracker/st_glsl_to_tgsi.cpp
@@ -55,10 +55,11 @@
#include "st_glsl_types.h"
#include "st_nir.h"
#include "st_shader_cache.h"
-#include "st_glsl_to_tgsi_private.h"
+#include "st_glsl_to_tgsi_temprename.h"

#include "util/hash_table.h"
#include <algorithm>
+#include <iostream>

#define PROGRAM_ANY_CONST ((1 << PROGRAM_STATE_VAR) | \
(1 << PROGRAM_CONSTANT) | \
@@ -323,6 +324,7 @@ public:

void merge_two_dsts(void);
void merge_registers(void);
+ void merge_registers_alternative(void);
void renumber_registers(void);

void emit_block_mov(ir_assignment *ir, const struct glsl_type *type,
@@ -5042,6 +5044,17 @@ glsl_to_tgsi_visitor::merge_two_dsts(void)
}
}

+void
+glsl_to_tgsi_visitor::merge_registers_alternative(void)
+{
+ rename_reg_pair proto ={false, 0};
+ std::vector<rename_reg_pair> renames(this->next_temp, proto);
+ tgsi_temp_lifetime analysis(&this->instructions, this->next_temp);
+ auto lt = analysis.get_lifetimes();
+ evaluate_remapping(lt, renames);
+ rename_temp_registers(&renames[0]);
+}
+
/* Merges temporary registers together where possible to reduce the number of
* registers needed to run a program.
*
@@ -6492,7 +6505,7 @@ get_mesa_program_tgsi(struct gl_context *ctx,

v->merge_two_dsts();
if (!skip_merge_registers)
- v->merge_registers();
+ v->merge_registers_alternative();
v->renumber_registers();

/* Write the END instruction. */

--
2.13.0

Marek Olšák

2017-06-11 14:12:05 UTC

Permalink

Hi Gert,

Have you measured the CPU overhead of the new code?

Marek

_______________________________________________
mesa-dev mailing list
https://lists.freedesktop.org/mailman/listinfo/mesa-dev

Marek Olšák

2017-06-11 14:15:00 UTC

Permalink

Also, I don't know if people will like that it uses STL. I personally
have no issue with that as long as it doesn't break apps (e.g. the STL
shipped with apps should be the same as the STL shipped with the
distribution).

Marek

Post by Marek Olšák
Hi Gert,
Have you measured the CPU overhead of the new code?
Marek

Gert Wollny

2017-06-11 17:21:04 UTC

Permalink

Hello Marek,

thanks for chiming in.

Post by Marek Olšák
Also, I don't know if people will like that it uses STL. I personally
have no issue with that as long as it doesn't break apps (e.g. the
STL shipped with apps should be the same as the STL shipped with the
distribution).

Well, on Linux I would take it for granted that the STL used to run the
code is the same as the one the code was compiled with, and there are
already quite a few places in the mesa code where STL constructs are
used (if that hadn't been the case, I would have tried to avoid
the STL). I am actually more concerned that propagating the C++11
requirement to the whole of src/mesa might not be welcomed (although
everything compiles and runs fine).

Post by Marek Olšák

Post by Marek Olšák
Hi Gert,
Have you measured the CPU overhead of the new code?

Not so far. I guess one would do that with shader-db to get
reasonably complex shaders, but I only have an r600-based card, so I'm
not sure whether I can run this. In any case, I will look into
it tomorrow.

Best,
Gert


Gert Wollny

2017-06-12 08:09:03 UTC

Permalink

Post by Gert Wollny

Post by Marek Olšák
Have you measured the CPU overhead of the new code?

I did runs of the shader-db/run program with valgrind/callgrind.

Here the original merge_registers accounts for 0.21% and my code for
0.50%.

If it is important to cut down on this added 0.3%, I think I can
eliminate 0.1% by changing the implementation to not use so many
dynamically allocated objects, but beyond that it will be difficult.

Best,
Gert

Michel Dänzer

2017-06-12 06:44:42 UTC

Permalink

Post by Gert Wollny
Piglit shows 7 fixes and 6 regressions compared to git 8fac894f, but they don't
seem to be related to shaders.

Which tests regressed (maybe you can put up a piglit HTML summary
somewhere generated from a run with and without your patches)? Do they
consistently pass without your patches and fail with them?

--
Earthling Michel Dänzer | http://www.amd.com
Libre software enthusiast | Mesa and X developer

Gert Wollny

2017-06-12 09:32:56 UTC

Permalink

Post by Michel Dänzer

Post by Gert Wollny
Piglit shows 7 fixes and 6 regressions compared to git 8fac894f, but they don't
seem to be related to shaders.

Which tests regressed (maybe you can put up a piglit HTML summary
somewhere generated from a run with and without your patches)? Do
they consistently pass without your patches and fail with them?

I had to redo the results, because I realized that I had compared the
system mesa version (with EGL support) against the test version (without
EGL support).

Now both tested versions were configured with the same options, and I ran
each version twice. The result diff of the quick test is:

piglit summary console -d results/o1 results/o2 results/n1 results/n2

glx/glx-multithread-texture: pass pass fail fail
glx/glx-visuals-stencil: fail fail pass pass
glx/glx_arb_sync_control/timing -fullscreen -divisor 1: pass fail pass fail
glx/glx_arb_sync_control/timing -fullscreen -divisor 2: pass fail fail warn
glx/glx_arb_sync_control/timing -fullscreen -msc-delta 1: fail fail warn fail
glx/glx_arb_sync_control/timing -fullscreen -msc-delta 2: fail fail fail pass
glx/glx_arb_sync_control/timing -msc-delta 2: warn fail pass fail
glx/glx_arb_sync_control/timing -waitformsc -msc-delta 2: fail pass pass fail
spec/arb_shader_bit_encoding/execution/and-clamp: fail fail pass fail
spec/arb_shading_language_420pack/active sampler conflict: fail fail pass fail
spec/glsl-1.50/execution/variable-indexing/gs-input-array-vec2-index-rd: fail fail pass pass
spec/nv_conditional_render/drawpixels: fail pass fail pass
spec/nv_conditional_render/vertex_array: fail pass pass pass

summary:
       name:     o1     o2     n1     n2
       ----  ------ ------ ------ ------
       pass:  31583  31584  31588  31585
       fail:   1454   1454   1449   1452
      crash:      5      5      5      5
       skip:  17356  17356  17356  17356
    timeout:      0      0      0      0
       warn:     14     13     14     14
incomplete:      0      0      0      0
dmesg-warn:      0      0      0      0
dmesg-fail:      0      0      0      0
    changes:      0      6      9      9
      fixes:      0      3      7      3
regressions:      0      3      2      6
      total:  50412  50412  50412  50412

Best,
Gert

Michel Dänzer

2017-06-12 09:58:56 UTC

Permalink

Post by Gert Wollny

Post by Michel Dänzer

Post by Gert Wollny
Piglit shows 7 fixes and 6 regressions compared to git 8fac894f, but they don't
seem to be related to shaders.

Which tests regressed (maybe you can put up a piglit HTML summary
somewhere generated from a run with and without your patches)? Do
they consistently pass without your patches and fail with them?

I had to redo the results, because I realized that I had compared the
system mesa version (with EGL support) versus the test version (without
EGL support).
Now both tested versions were configure with the same options and I run
piglit summary console -d results/o1 results/o2 results/n1 results/n2
glx/glx-multithread-texture: pass pass fail fail

Might want to make sure this isn't a regression caused by your patches,
but FWIW this test seems to fail for me with radeonsi with "random" values.

Post by Gert Wollny
glx/glx_arb_sync_control/timing -fullscreen -divisor 1: pass fail pass
fail
glx/glx_arb_sync_control/timing -fullscreen -divisor 2: pass fail fail
warn
glx/glx_arb_sync_control/timing -fullscreen -msc-delta 1: fail fail
warn fail
glx/glx_arb_sync_control/timing -fullscreen -msc-delta 2: fail fail
fail pass
glx/glx_arb_sync_control/timing -msc-delta 2: warn fail pass fail
glx/glx_arb_sync_control/timing -waitformsc -msc-delta 2: fail pass
pass fail

You can ignore these, their results are somewhat random. The piglit
patches below help a little, but the results are still not 100% reliable
when piglit runs multiple tests concurrently.

https://patchwork.freedesktop.org/patch/150486/
https://patchwork.freedesktop.org/patch/150484/
https://patchwork.freedesktop.org/patch/150485/

In summary, it looks like your patches don't cause any piglit regressions.

--
Earthling Michel Dänzer | http://www.amd.com
Libre software enthusiast | Mesa and X developer

Nicolai Hähnle

2017-06-12 10:17:23 UTC

Permalink

Post by Gert Wollny

Post by Michel Dänzer

Post by Gert Wollny
Piglit shows 7 fixes and 6 regressions compared to git 8fac894f, but they don't
seem to be related to shaders.

Which tests regressed (maybe you can put up a piglit HTML summary
somewhere generated from a run with and without your patches)? Do
they consistently pass without your patches and fail with them?

I had to redo the results, because I realized that I had compared the
system mesa version (with EGL support) versus the test version (without
EGL support).
Now both tested versions were configure with the same options and I run
piglit summary console -d results/o1 results/o2 results/n1 results/n2
glx/glx-multithread-texture: pass pass fail fail
glx/glx-visuals-stencil: fail fail pass pass
glx/glx_arb_sync_control/timing -fullscreen -divisor 1: pass fail pass
fail
glx/glx_arb_sync_control/timing -fullscreen -divisor 2: pass fail fail
warn
glx/glx_arb_sync_control/timing -fullscreen -msc-delta 1: fail fail
warn fail
glx/glx_arb_sync_control/timing -fullscreen -msc-delta 2: fail fail
fail pass
glx/glx_arb_sync_control/timing -msc-delta 2: warn fail pass fail
glx/glx_arb_sync_control/timing -waitformsc -msc-delta 2: fail pass
pass fail

The above are probably noise.

Post by Gert Wollny
spec/arb_shader_bit_encoding/execution/and-clamp: fail fail pass fail
spec/arb_shading_language_420pack/active sampler conflict: fail fail
pass fail
spec/glsl-1.50/execution/variable-indexing/gs-input-array-vec2-index-
rd: fail fail pass pass
spec/nv_conditional_render/drawpixels: fail pass fail pass
spec/nv_conditional_render/vertex_array: fail pass pass pass

It's disconcerting that you have tests here whose pass status is
unstable. Those tests really should be deterministic.

You should definitely investigate
spec/arb_shader_bit_encoding/execution/and-clamp to see if it's related
to your patches.

Cheers,
Nicolai


--
Learn how the world really is,
but never forget how it should be.

Gert Wollny

2017-06-12 13:35:42 UTC

Permalink

Hello Nicolai,

Post by Nicolai Hähnle

Post by Gert Wollny
spec/arb_shader_bit_encoding/execution/and-clamp: fail fail pass fail

It's disconcerting that you have tests here whose pass status is
unstable. Those tests really should be deterministic.

When I run only these tests (profile "all", selected with -t), they
always pass (three times in a row).

Post by Gert Wollny
spec/glsl-1.50/execution/variable-indexing/gs-input-array-vec2-index-rd: fail fail pass pass

This one actually passes now because of the patch (before, it failed
because it needed 125 registers and only 124 are free to be used).

Post by Gert Wollny
spec/arb_shading_language_420pack/active sampler conflict: fail fail pass fail
spec/nv_conditional_render/drawpixels: fail pass fail pass
spec/nv_conditional_render/vertex_array: fail pass pass pass

These, however, are unstable, regardless of whether my patches are
applied or not.

Best,
Gert

Nicolai Hähnle

2017-06-12 10:28:07 UTC

Permalink

Plenty of style issues aside, can you explain where and why you get
tighter lifetimes?

Cheers,
Nicolai


--
Learn how the world really is,
but never forget how it should be.

Gert Wollny

2017-06-12 12:34:21 UTC

Permalink

Post by Nicolai Hähnle
Plenty of style issues aside, can you explain where and why you get
tighter lifetimes?

In the original code, if a temporary is used within a loop, it is
assigned the lifetime of the whole loop.

With this patch I track in more detail where a temporary is accessed
and base the lifetime on this: for instance, if a variable is first
unconditionally written and later read for the last time in the same
scope (loop, if, or switch branch), then the lifetime can be restricted
to the range from that first write to that last read.

The code gets more complex because it tries to resolve this also for
nested scopes, and one also has to take care of whether a variable
is written only conditionally within a loop, or conditionally read
before it is written (in the source code sense, but not in the program
flow sense).
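
The difference between the two estimates can be shown with a small standalone sketch. This is not the code from the patch: the Instr model and the functions loop_wide_lifetime and tight_lifetime are hypothetical names chosen for illustration, and the sketch assumes well-formed loops with no IF/ELSE, SWITCH, BRK or CONT, so every write is unconditional. Whenever the simple "first access is a write, last access is in the same loop" condition does not hold, it falls back to the loop-wide estimate.

```cpp
#include <algorithm>
#include <cassert>
#include <utility>
#include <vector>

// Hypothetical, simplified instruction model (not Mesa's instruction type).
enum class Op { Other, BeginLoop, EndLoop };

struct Instr {
   Op op;
   int dst;                // temporary written, -1 if none
   std::vector<int> srcs;  // temporaries read
};

using Lifetime = std::pair<int, int>;  // (first line, last line)

static bool reads(const Instr& i, int temp) {
   return std::find(i.srcs.begin(), i.srcs.end(), temp) != i.srcs.end();
}

static bool accesses(const Instr& i, int temp) {
   return i.dst == temp || reads(i, temp);
}

// Coarse estimate: any access inside a loop stretches the lifetime to
// cover the whole outermost enclosing loop.
Lifetime loop_wide_lifetime(const std::vector<Instr>& code, int temp)
{
   int first = -1, last = -1;
   int depth = 0;         // current loop nesting depth
   int outer_begin = -1;  // line of the outermost active BeginLoop
   bool used_in_loop = false;

   for (int line = 0; line < static_cast<int>(code.size()); ++line) {
      const Instr& instr = code[line];
      if (instr.op == Op::BeginLoop && depth++ == 0) {
         outer_begin = line;
         used_in_loop = false;
      }
      if (accesses(instr, temp)) {
         if (first < 0)
            first = depth > 0 ? outer_begin : line;
         last = line;
         used_in_loop = used_in_loop || depth > 0;
      }
      if (instr.op == Op::EndLoop && --depth == 0 && used_in_loop)
         last = line;
   }
   return {first, last};
}

// Tighter estimate for the simplest case only: the first access is a
// write and the last access sits in the same (innermost) loop scope.
Lifetime tight_lifetime(const std::vector<Instr>& code, int temp)
{
   std::vector<int> scope_stack{-1};  // -1 marks the top-level scope
   int first = -1, last = -1, first_scope = -1, last_scope = -1;
   bool first_is_write = false;

   for (int line = 0; line < static_cast<int>(code.size()); ++line) {
      const Instr& instr = code[line];
      if (instr.op == Op::BeginLoop)
         scope_stack.push_back(line);  // a scope is named by its start line
      if (accesses(instr, temp)) {
         if (first < 0) {
            first = line;
            first_scope = scope_stack.back();
            first_is_write = !reads(instr, temp);
         }
         last = line;
         last_scope = scope_stack.back();
      }
      if (instr.op == Op::EndLoop)
         scope_stack.pop_back();
   }
   if (first >= 0 && first_is_write && first_scope == last_scope)
      return {first, last};
   return loop_wide_lifetime(code, temp);  // conservative fallback
}
```

For a loop body that writes a temporary and reads it on the next line, tight_lifetime returns just those two lines, whereas loop_wide_lifetime returns the whole loop. A temporary that is read before it is written inside the loop carries a value across iterations, so the sketch keeps it alive for the whole loop.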

Shaders that profit from this better lifetime estimation are the ones
that have many short-lived values within long loops, think

for () {
float4 t[1] = f(in2);
float4 t[2] = g(in1);
float4 t[3] = op(t[1], t[2]);
...
sum += t[200];
}

Here the old code would keep all of the t[i] alive for the whole loop,
and in fact with the GpuTest Piano benchmark I have seen a shader with
2000 temporaries where many were used in a loop but only required for
two or three lines, so that my code could merge them to fewer than 100
temporaries, and this made it possible for the TGSI-to-bytecode layer
in r600g to actually translate the shader.
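
Once lifetimes are known, merging is an interval problem: two temporaries can share a register if one lifetime ends no later than the other begins. The following is a minimal greedy sketch of that step, not the evaluate_remapping implementation from the patch. The function name merge_by_lifetime is hypothetical, unused registers are assumed to be marked with a lifetime of (-1, -1), and a register is allowed to be reused on the very line where its previous value is read for the last time.

```cpp
#include <algorithm>
#include <cassert>
#include <utility>
#include <vector>

using Lifetime = std::pair<int, int>;  // (first line, last line)

// Returns, for every register index, the register it is merged into.
// Registers that keep their own slot map to themselves.
std::vector<int> merge_by_lifetime(const std::vector<Lifetime>& lifetimes)
{
   const int n = static_cast<int>(lifetimes.size());

   // Visit used registers in order of increasing lifetime start.
   std::vector<int> order;
   for (int r = 0; r < n; ++r)
      if (lifetimes[r].first >= 0)
         order.push_back(r);
   std::stable_sort(order.begin(), order.end(), [&](int a, int b) {
      return lifetimes[a].first < lifetimes[b].first;
   });

   std::vector<int> remap(n);
   for (int r = 0; r < n; ++r)
      remap[r] = r;

   // Each slot is (register that owns it, end of the merged lifetime).
   std::vector<std::pair<int, int>> slots;
   for (int r : order) {
      // Take the first slot whose merged lifetime is already over.
      auto slot = std::find_if(slots.begin(), slots.end(),
         [&](const std::pair<int, int>& s) {
            return s.second <= lifetimes[r].first;
         });
      if (slot != slots.end()) {
         remap[r] = slot->first;
         slot->second = std::max(slot->second, lifetimes[r].second);
      } else {
         slots.push_back({r, lifetimes[r].second});
      }
   }
   return remap;
}
```

Fed with the six lifetimes used in the RegisterRemapping unit test of the patch, this sketch yields the mapping that test expects (registers 3 and 4 merged into 1, registers 5 and 6 merged into 2). That only shows the two agree on this input, not that the patch uses the same strategy.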

Best,
Gert

PS: Regarding style, I am fully aware that I have to iterate over this
code a few more times. I tried to adhere to the way the existing code
presents itself to me, but I'm happy to listen to any advice I can
get.

In any case, I thought it might be best to send this patch out now for
discussion. Now, with the unit tests in place, I will rework it and
also focus more on style questions. One thing that comes up immediately
is that I will try to reduce the use of dynamically allocated
memory, since 60% of the run time of my code is spent on memory
allocation and de-allocation.

Nicolai Hähnle

2017-06-12 18:52:51 UTC

Permalink

Post by Gert Wollny

Post by Nicolai HÃ¤hnle
Plenty of style issues aside, can you explain where and why you get
tighter lifetimes?

In the original code if a temporary is used within a loop it gets the
whole life time of the loop assigned.
With this patch I track in more detail where a temporary is accesses
and base the lifetime on this: For instance, if a variable is first
unconditionally written and later read for the last time in the same
scope (loop, if, or switch branch), then the lifetime can be restricted
to that first-written - last-read range.
The code gets more complex because it tries to resolve this also for
nested scopes, and one also has to take care about whether a variable
is written only conditionally within a loop, or conditionally read
before it is written (in the source code sense, but not in the program
flow sense).
Shaders that profit from this better lifetime estimation are the ones
that have many short-lived values within long loops, think
for () {
float4 t[1] = f(in2);
float4 t[2] = g(in1);
float4 t[3] = op(t[1], t[2]);
...
sum += t[200];
}
Here the old code would keep all of the t[i] alive for the whole loop,
and in fact with the GpuTest Piano benchmark I have seen a shader with
2000 temporaries where many were used in a loop but only required for
two or three lines, so that my code could merge them to fewer than 100
temporaries, and this made it possible for the TGSI to bytecode layer
in r600g to actually translate the shader.

Okay. I think you should seriously re-think your algorithm in a way that
makes it a more natural evolution from the algorithm that's already there.

Basically, the current algorithm tracks (first_write, last_read), so
think about what you need to track in order to obtain a single-pass
algorithm that computes lifetime (first, last) for every temporary. I
think the following should do it for a first cut:

struct st_calculate_lifetime_state {
   /* First and last instruction that has this temporary */
   unsigned first;
   unsigned last;

   /* First instruction of the outer-most inactive loop that
    * contains this temporary. (A loop is active if we're
    * currently processing instructions in it.)
    */
   unsigned loop_first;

   /* Position of a read without preceding dominating write,
    * per channel.
    */
   unsigned undef_read[4];

   /* First write in the program that is dominating our
    * current position, per channel.
    */
   unsigned first_dominating_write[4];
};

In addition, you need to keep a stack of active scopes (loops and ifs),
but you really only need to remember the start of the scope (and for
loops, probably the position of the first BREAK).

Here's a sketch of the "state machine" that you need to run while
traversing the program, assuming no BRK and CONT:

Init: last = 0, everything else = ~0

These are updates on individual variables on use:

On any use (source or dest):
- if first > cur_pos, set first = loop_first = cur_pos
- if loop_first < first, set first = loop_first
- update last

On use as source:
- if first_dominating_write > cur_pos and undef_read > cur_pos, set
undef_read = cur_pos

On use as dest:
- if first_dominating_write > cur_pos, set first_dominating_write = cur_pos

These are updates of all temporaries on scope change:

On ENDLOOP, for all temps:
- if loop_first > start of loop, set loop_first = start of loop
- if first < start of loop, update last to end of loop
- if undef_read between start and end of loop: set first = MIN(first,
start of loop) and last = end of loop
- if first_dominating_write < end of loop, set undef_read = ~0

On ELSE and ENDIF, for all temps:
- if first_dominating_write > start of scope, set first_dominating_write
= ~0
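
The rules so far (no BRK and CONT) can be transcribed fairly directly. The
following C++ sketch does that under a few assumptions that are not part of
the text above: it tracks a single channel instead of four, the names
`TempState`, `LifetimeTracker` and `kNone` are made up, and ELSE and ENDIF
both compare against the position of the opening IF. It is a literal
transcription of the sketch, not a verified lifetime analysis.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

constexpr unsigned kNone = ~0u;  // the "~0" of the rules above

// Per-temporary state, reduced to one channel for brevity.
struct TempState {
    unsigned first = kNone;
    unsigned last = 0;
    unsigned loop_first = kNone;
    unsigned undef_read = kNone;
    unsigned first_dominating_write = kNone;
};

struct LifetimeTracker {
    std::vector<TempState> temps;
    std::vector<unsigned> scope_start;  // stack of BGNLOOP / IF positions

    explicit LifetimeTracker(std::size_t n) : temps(n) {}

    void begin_scope(unsigned pos) { scope_start.push_back(pos); }

    // "On any use (source or dest)"
    static void use(TempState& t, unsigned pos) {
        if (t.first > pos) t.first = t.loop_first = pos;
        if (t.loop_first < t.first) t.first = t.loop_first;
        t.last = std::max(t.last, pos);
    }

    // "On use as source"
    void read(std::size_t temp, unsigned pos) {
        TempState& t = temps[temp];
        use(t, pos);
        if (t.first_dominating_write > pos && t.undef_read > pos)
            t.undef_read = pos;
    }

    // "On use as dest"
    void write(std::size_t temp, unsigned pos) {
        TempState& t = temps[temp];
        use(t, pos);
        if (t.first_dominating_write > pos) t.first_dominating_write = pos;
    }

    // "On ENDLOOP, for all temps", rules applied in the order listed.
    void end_loop(unsigned pos) {
        const unsigned start = scope_start.back();
        scope_start.pop_back();
        for (TempState& t : temps) {
            if (t.loop_first > start) t.loop_first = start;
            if (t.first < start) t.last = pos;
            if (t.undef_read >= start && t.undef_read <= pos) {
                t.first = std::min(t.first, start);
                t.last = pos;
            }
            if (t.first_dominating_write < pos) t.undef_read = kNone;
        }
    }

    // "On ELSE and ENDIF, for all temps"; only ENDIF closes the scope.
    void else_or_endif(bool is_endif) {
        const unsigned start = scope_start.back();
        for (TempState& t : temps)
            if (t.first_dominating_write > start)
                t.first_dominating_write = kNone;
        if (is_endif) scope_start.pop_back();
    }
};
```

For the program "0: BGNLOOP, 1: MOV T0, in, 2: ADD T1, T0, T1, 3: ENDLOOP",
feeding the sources of an instruction before its destination, T0 ends up
with the range (1, 2) while T1, which is read before it is written, is
extended to the whole loop (0, 3).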

I'm not sure right now whether BREAK / CONT need special treatment at
all. I think what you need is:

On ENDLOOP:
- if first_dominating_write is between the first BREAK in the loop and
the end of the loop, set first_dominating_write = ~0

And for CONT, you probably don't really need anything, because CONTs
cannot make you skip code forever.

What this state machine doesn't yet cover is

IF ..
MOV TEMP[0], ...
ELSE
MOV TEMP[0], ...
ENDIF

Still, I'd start with it and see whether you need to cover that case.

And even that case can probably be dealt with in a fairly efficient and
pragmatic way. The idea is to keep track of the nesting level of IFs,
plus an

uint32_t dominating_write_in_true_block[4];

per temp. Then:

On ELSE:
- if first_dominating_write < cur_pos, set the bit corresponding to the
current nesting level in dominating_write_in_true_block

On ENDIF:
- don't reset first_dominating_write if the bit corresponding to the
current nesting level in dominating_write_in_true_block is set
- unconditionally clear that bit

It won't cover cases with nesting level > 32, but do we really care?
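
A small C++ sketch of this bit-per-nesting-level idea, again for a single
channel. The names `WriteState`, `on_write`, `on_else` and `on_endif` are
invented for the example, and two details are my reading rather than
something stated above: on ELSE the bit is recorded before
first_dominating_write is reset, and nesting levels of 32 or more are simply
not tracked (which falls back to the conservative reset).

```cpp
#include <cstdint>

constexpr unsigned kNone = ~0u;

struct WriteState {
    unsigned first_dominating_write = kNone;
    // Bit n set: the true block of the IF at nesting level n has a
    // dominating write to this temporary.
    std::uint32_t dominating_write_in_true_block = 0;
};

inline void on_write(WriteState& s, unsigned cur_pos)
{
    if (s.first_dominating_write > cur_pos)
        s.first_dominating_write = cur_pos;
}

// if_pos is the position of the matching IF, depth its nesting level.
inline void on_else(WriteState& s, unsigned if_pos, unsigned cur_pos,
                    unsigned depth)
{
    // Remember that the true block wrote the temporary ...
    if (depth < 32 && s.first_dominating_write < cur_pos)
        s.dominating_write_in_true_block |= 1u << depth;
    // ... then forget a write made inside the true block, so that a
    // write in the else block can be recorded.
    if (s.first_dominating_write > if_pos)
        s.first_dominating_write = kNone;
}

inline void on_endif(WriteState& s, unsigned if_pos, unsigned depth)
{
    const std::uint32_t bit = depth < 32 ? (1u << depth) : 0u;
    const bool true_block_wrote =
        (s.dominating_write_in_true_block & bit) != 0;
    // Keep the write only if the true block wrote as well.
    if (!true_block_wrote && s.first_dominating_write > if_pos)
        s.first_dominating_write = kNone;
    s.dominating_write_in_true_block &= ~bit;  // unconditionally clear
}
```

With this, a temporary written in both branches keeps a valid
first_dominating_write after the ENDIF, while a write in only one of the
two branches is dropped again.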

I hope I didn't miss anything, because after all this is admittedly
subtle stuff. Still, I think this kind of state-machine approach should
work and allow you to avoid *lots* of allocations and pointer-chasing.

Cheers,
Nicolai

--
Lerne, wie die Welt wirklich ist,
Aber vergiss niemals, wie sie sein sollte.

Nicolai Hähnle

2017-06-12 19:00:38 UTC