Discussion:
[PATCH] nvc0/ir: propagate immediates to CALL input MOVs
(too old to reply)
Tobias Klausmann
2017-08-12 19:33:23 UTC
Permalink
Raw Message
On using builtin functions we have to move the input to registers $0 and $1, if
one of the input value is an immediate, we fail to propagate the immediate:

...
mov u32 $r477 0x00000003 (0)
...
mov u32 $r0 %r473 (0)
mov u32 $r1 $r477 (0)
call abs BUILTIN:0 (0)
mov u32 %r495 $r1 (0)
...

With this patch the immediate is propagated, potentially causing the first MOV
to be superfluous, which we'd remove in that case:

...

mov u32 $r0 %r473 (0)
mov u32 $r1 0x00000003 (0)
call abs BUILTIN:0 (0)
mov u32 %r495 $r1 (0)
...

Shaderdb stats:
total instructions in shared programs : 4893460 -> 4893324 (-0.00%)
total gprs used in shared programs : 582972 -> 582881 (-0.02%)
total local used in shared programs : 17960 -> 17960 (0.00%)

local gpr inst bytes
helped 0 91 112 112
hurt 0 0 0 0

Signed-off-by: Tobias Klausmann <***@mni.thm.de>
---
.../nouveau/codegen/nv50_ir_lowering_nvc0.cpp | 21 +++++++++++++++++++--
1 file changed, 19 insertions(+), 2 deletions(-)

diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir_lowering_nvc0.cpp b/src/gallium/drivers/nouveau/codegen/nv50_ir_lowering_nvc0.cpp
index c8f0701572..861d08af24 100644
--- a/src/gallium/drivers/nouveau/codegen/nv50_ir_lowering_nvc0.cpp
+++ b/src/gallium/drivers/nouveau/codegen/nv50_ir_lowering_nvc0.cpp
@@ -47,8 +47,25 @@ NVC0LegalizeSSA::handleDIV(Instruction *i)
int builtin;

bld.setPosition(i, false);
- bld.mkMovToReg(0, i->getSrc(0));
- bld.mkMovToReg(1, i->getSrc(1));
+
+ // Generate movs to the input regs for the call we want to generate
+ for (int s = 0; i->srcExists(s); ++s) {
+ Instruction *ld = i->getSrc(s)->getInsn();
+ ImmediateValue imm;
+ // check if we are moving an immediate, propagate it in that case
+ if (!ld || ld->fixed || (ld->op != OP_LOAD && ld->op != OP_MOV) ||
+ !ld->src(0).getImmediate(imm))
+ bld.mkMovToReg(s, i->getSrc(s));
+ else {
+ bld.mkMovToReg(s, ld->getSrc(0));
+ // Clear the src, to make code elimination possible here before we
+ // delete the instruction i later
+ i->setSrc(s, NULL);
+ if (ld->getDef(0)->refCount() == 0)
+ delete_Instruction(prog, ld);
+ }
+ }
+
switch (i->dType) {
case TYPE_U32: builtin = NVC0_BUILTIN_DIV_U32; break;
case TYPE_S32: builtin = NVC0_BUILTIN_DIV_S32; break;
--
2.14.0
Ilia Mirkin
2017-08-12 20:20:29 UTC
Permalink
Raw Message
On Sat, Aug 12, 2017 at 3:33 PM, Tobias Klausmann
Post by Tobias Klausmann
On using builtin functions we have to move the input to registers $0 and $1, if
...
mov u32 $r477 0x00000003 (0)
...
mov u32 $r0 %r473 (0)
mov u32 $r1 $r477 (0)
call abs BUILTIN:0 (0)
mov u32 %r495 $r1 (0)
...
With this patch the immediate is propagated, potentially causing the first MOV
...
mov u32 $r0 %r473 (0)
mov u32 $r1 0x00000003 (0)
call abs BUILTIN:0 (0)
mov u32 %r495 $r1 (0)
...
total instructions in shared programs : 4893460 -> 4893324 (-0.00%)
total gprs used in shared programs : 582972 -> 582881 (-0.02%)
total local used in shared programs : 17960 -> 17960 (0.00%)
local gpr inst bytes
helped 0 91 112 112
hurt 0 0 0 0
---
.../nouveau/codegen/nv50_ir_lowering_nvc0.cpp | 21 +++++++++++++++++++--
1 file changed, 19 insertions(+), 2 deletions(-)
diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir_lowering_nvc0.cpp b/src/gallium/drivers/nouveau/codegen/nv50_ir_lowering_nvc0.cpp
index c8f0701572..861d08af24 100644
--- a/src/gallium/drivers/nouveau/codegen/nv50_ir_lowering_nvc0.cpp
+++ b/src/gallium/drivers/nouveau/codegen/nv50_ir_lowering_nvc0.cpp
@@ -47,8 +47,25 @@ NVC0LegalizeSSA::handleDIV(Instruction *i)
int builtin;
bld.setPosition(i, false);
- bld.mkMovToReg(0, i->getSrc(0));
- bld.mkMovToReg(1, i->getSrc(1));
+
+ // Generate movs to the input regs for the call we want to generate
+ for (int s = 0; i->srcExists(s); ++s) {
+ Instruction *ld = i->getSrc(s)->getInsn();
+ ImmediateValue imm;
+ // check if we are moving an immediate, propagate it in that case
+ if (!ld || ld->fixed || (ld->op != OP_LOAD && ld->op != OP_MOV) ||
+ !ld->src(0).getImmediate(imm))
At this point you don't even have to use getImmediate - you can just
look at ld->src(0).getFile() == FILE_IMMEDIATE.

Normally you'd just do i->src(s).getImmediate(imm) and moved on with
life. But you kinda need the ld here, which is annoying. Perhaps you
can just drop the manual deletion of the op here... which would let
you do the much simpler thing. Can you see if there's any effect from
that?
Post by Tobias Klausmann
+ bld.mkMovToReg(s, i->getSrc(s));
+ else {
+ bld.mkMovToReg(s, ld->getSrc(0));
+ // Clear the src, to make code elimination possible here before we
+ // delete the instruction i later
+ i->setSrc(s, NULL);
i gets deleted later on. move the deletion of ld after that happens?
Post by Tobias Klausmann
+ if (ld->getDef(0)->refCount() == 0)
ld->isDead()
Post by Tobias Klausmann
+ delete_Instruction(prog, ld);
+ }
+ }
+
switch (i->dType) {
case TYPE_U32: builtin = NVC0_BUILTIN_DIV_U32; break;
case TYPE_S32: builtin = NVC0_BUILTIN_DIV_S32; break;
--
2.14.0
_______________________________________________
Nouveau mailing list
https://lists.freedesktop.org/mailman/listinfo/nouveau
Tobias Klausmann
2017-08-12 21:05:56 UTC
Permalink
Raw Message
Post by Ilia Mirkin
On Sat, Aug 12, 2017 at 3:33 PM, Tobias Klausmann
Post by Tobias Klausmann
On using builtin functions we have to move the input to registers $0 and $1, if
...
mov u32 $r477 0x00000003 (0)
...
mov u32 $r0 %r473 (0)
mov u32 $r1 $r477 (0)
call abs BUILTIN:0 (0)
mov u32 %r495 $r1 (0)
...
With this patch the immediate is propagated, potentially causing the first MOV
...
mov u32 $r0 %r473 (0)
mov u32 $r1 0x00000003 (0)
call abs BUILTIN:0 (0)
mov u32 %r495 $r1 (0)
...
total instructions in shared programs : 4893460 -> 4893324 (-0.00%)
total gprs used in shared programs : 582972 -> 582881 (-0.02%)
total local used in shared programs : 17960 -> 17960 (0.00%)
local gpr inst bytes
helped 0 91 112 112
hurt 0 0 0 0
---
.../nouveau/codegen/nv50_ir_lowering_nvc0.cpp | 21 +++++++++++++++++++--
1 file changed, 19 insertions(+), 2 deletions(-)
diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir_lowering_nvc0.cpp b/src/gallium/drivers/nouveau/codegen/nv50_ir_lowering_nvc0.cpp
index c8f0701572..861d08af24 100644
--- a/src/gallium/drivers/nouveau/codegen/nv50_ir_lowering_nvc0.cpp
+++ b/src/gallium/drivers/nouveau/codegen/nv50_ir_lowering_nvc0.cpp
@@ -47,8 +47,25 @@ NVC0LegalizeSSA::handleDIV(Instruction *i)
int builtin;
bld.setPosition(i, false);
- bld.mkMovToReg(0, i->getSrc(0));
- bld.mkMovToReg(1, i->getSrc(1));
+
+ // Generate movs to the input regs for the call we want to generate
+ for (int s = 0; i->srcExists(s); ++s) {
+ Instruction *ld = i->getSrc(s)->getInsn();
+ ImmediateValue imm;
+ // check if we are moving an immediate, propagate it in that case
+ if (!ld || ld->fixed || (ld->op != OP_LOAD && ld->op != OP_MOV) ||
+ !ld->src(0).getImmediate(imm))
At this point you don't even have to use getImmediate - you can just
look at ld->src(0).getFile() == FILE_IMMEDIATE.
That was actually the fallback to the before non-working getImmediate()
and is easily doable if favored.
Post by Ilia Mirkin
Normally you'd just do i->src(s).getImmediate(imm) and moved on with
life. But you kinda need the ld here, which is annoying. Perhaps you
can just drop the manual deletion of the op here... which would let
you do the much simpler thing. Can you see if there's any effect from
that?
Will do!
Post by Ilia Mirkin
Post by Tobias Klausmann
+ bld.mkMovToReg(s, i->getSrc(s));
+ else {
+ bld.mkMovToReg(s, ld->getSrc(0));
+ // Clear the src, to make code elimination possible here before we
+ // delete the instruction i later
+ i->setSrc(s, NULL);
i gets deleted later on. move the deletion of ld after that happens?
this would cause more indirection (saving of the insn for later), not
sure if that makes the code more readable or if this clear is more
straight forward. As you see i'd go with the clear, but if you really
want i can add the extra save and delete.
Post by Ilia Mirkin
Post by Tobias Klausmann
+ if (ld->getDef(0)->refCount() == 0)
ld->isDead()
ok
Post by Ilia Mirkin
Post by Tobias Klausmann
+ delete_Instruction(prog, ld);
+ }
+ }
+
switch (i->dType) {
case TYPE_U32: builtin = NVC0_BUILTIN_DIV_U32; break;
case TYPE_S32: builtin = NVC0_BUILTIN_DIV_S32; break;
--
2.14.0
_______________________________________________
Nouveau mailing list
https://lists.freedesktop.org/mailman/listinfo/nouveau
Loading...