Discussion:
[Bug 104214] Dota crashes when switching from game to desktop
b***@freedesktop.org
2017-12-11 16:35:06 UTC
https://bugs.freedesktop.org/show_bug.cgi?id=104214

Bug ID: 104214
Summary: Dota crashes when switching from game to desktop
Product: Mesa
Version: 17.3
Hardware: Other
OS: All
Status: NEW
Severity: normal
Priority: medium
Component: Other
Assignee: mesa-***@lists.freedesktop.org
Reporter: ***@gmail.com
QA Contact: mesa-***@lists.freedesktop.org

I upgraded from mesa 17.2.6 to mesa 17.3. Since then, dota2 crashes when
switching from the game to the desktop with alt+tab. It may crash on the first
attempt, but usually I do not need more than 3 attempts to get the crash.

My laptop uses Intel integrated graphics (i7-7700HQ), and a downgrade to mesa
17.2.x fixes the issue completely.

--
You are receiving this mail because:
You are the QA Contact for the bug.
You are the assignee for the bug.
b***@freedesktop.org
2017-12-11 16:38:41 UTC

--- Comment #1 from Sven <***@gmail.com> ---
Steps to reproduce:
Start Dota, stay on the main menu screen, switch back and forth between the
desktop and Dota. I'm using the mate desktop environment.

Result: (from dmesg)
[ 8804.741329] traps: GLRenderThread[18545] general protection ip:7f5136de2e81 sp:7f5135089960 error:0 in i965_dri.so[7f51369f8000+795000]

b***@freedesktop.org
2017-12-11 17:02:53 UTC

--- Comment #2 from Eero Tamminen <***@intel.com> ---
I cannot reproduce this with KBL (i7-7567U) running Ubuntu 16.04 LTS with
latest Mesa git.

Can you reproduce the issue if you disable/change the "Enable Steam Overlay
while in-game" option from the DOTA2 Steam properties?


Could you provide backtrace of the issue?

0. Install Gdb (in Debian/Ubuntu: "sudo apt install gdb")
1. Start DOTA2
2. Connect Gdb to it (from a terminal) with following command:
sudo gdb /proc/$(pidof dota2)/self $(pidof dota2)
3. Enter "c" to Gdb, to continue
4. Reproduce the DOTA2 crash
5. Attach here the *full* output of "thread apply all bt" Gdb command

(Preferably you should do that after installing Mesa debug symbols package from
the same repository where you installed Mesa from. What package name / what
repository you need for that is distribution specific.)

b***@freedesktop.org
2017-12-11 17:08:52 UTC

--- Comment #3 from Eero Tamminen <***@intel.com> ---
(In reply to Eero Tamminen from comment #2)
> sudo gdb /proc/$(pidof dota2)/self $(pidof dota2)

Sorry, should be:
sudo gdb /proc/$(pidof dota2)/exe $(pidof dota2)

b***@freedesktop.org
2017-12-11 17:43:14 UTC

--- Comment #4 from Sven <***@gmail.com> ---
Yes, I can reproduce the issue with steam overlay disabled.

I will have to try gdb some other day when I have some more time.

b***@freedesktop.org
2017-12-24 13:39:18 UTC

--- Comment #5 from Sven <***@gmail.com> ---
The issue didn't occur at work, where I have an i5-4690 running Arch Linux
with mesa 17.3.0.

On my private laptop, I switched from Gentoo to Arch Linux yesterday (not
because of this issue). Arch also comes with mesa 17.3.0 and the issue occurs
too. In fact, it got worse in the sense that dota doesn't even start anymore
(crashed during black screen with dota logo, main menu doesn't show). Again,
like on Gentoo, downgrading from 17.3.0 to 17.2.6 fixed the issue and dota
starts (yes, I only downgraded the mesa package, didn't even restart X11). My
laptop has an i7-7700HQ.

I saw that Arch also has a 17.3.1 package. I will try that next. Then I will
try to provide a gdb backtrace.

b***@freedesktop.org
2017-12-29 23:03:42 UTC

--- Comment #6 from Sven <***@gmail.com> ---
When using mesa 17.3.1 on Arch Linux, Dota still crashed on startup. The error
is

[ 3047.872388] GLRenderThread[12897]: segfault at 44 ip 00007fc93f36fce0 sp 00007fc93d1ec910 error 4 in i965_dri.so[7fc93ef5a000+7f8000]

The Arch Linux package version of mesa is 17.3.1-2. I'm investigating how to
get the "thread apply all bt" output. I cannot attach gdb to the PID of a
running dota by hand, because dota currently crashes on start.

b***@freedesktop.org
2018-01-01 11:18:09 UTC

***@gmail.com changed:

What |Removed |Added
----------------------------------------------------------------------------
CC| |***@gmail.com

b***@freedesktop.org
2018-01-01 18:23:09 UTC

--- Comment #7 from ***@gmail.com ---
Same here, I have been unable to start Dota since mesa 17.3.1. The game crashes
just after the Dota2 launch screen.

GLRenderThread[887]: segfault at 44 ip 00007ff219676ce0 sp 00007ff2174f3910 error 4 in i965_dri.so[7ff219261000+7f8000]
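
The kernel segfault lines in this thread give both the faulting instruction
pointer (ip) and the mapping of i965_dri.so as [base+size], so the position of
the crash inside the mapped library can be worked out by subtraction. The
sketch below only illustrates that arithmetic: the struct and function names
are made up for this example, and whether the resulting offset can be passed
directly to a symbolizer depends on how the library is mapped, which the log
lines do not show.

```c
#include <stdbool.h>
#include <stdint.h>

/* One "segfault at ... in lib.so[base+size]" record, as printed by the kernel. */
struct fault_record {
    uint64_t ip;   /* faulting instruction pointer */
    uint64_t base; /* start address of the library mapping */
    uint64_t size; /* length of the mapping in bytes */
};

/* True if ip lies inside the half-open range [base, base + size). */
static bool fault_in_mapping(const struct fault_record *r)
{
    return r->ip >= r->base && r->ip - r->base < r->size;
}

/* Offset of the faulting instruction from the start of the mapping. */
static uint64_t fault_offset(const struct fault_record *r)
{
    return r->ip - r->base;
}
```

For the two Mesa 17.3.1 reports above (ip 00007fc93f36fce0 with base
7fc93ef5a000, and ip 00007ff219676ce0 with base 7ff219261000), fault_offset()
gives 0x415ce0 in both cases, which suggests that both crashes hit the same
instruction in the same build of the driver.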

Reverting mesa to 17.2.6 "fixes" the issue.

I am also on Arch Linux and I have an i7-7500U.

Built mesa with debug symbols and attached gdb with the following command :
GAME_DEBUGGER=gdb steam

Here's the result :

Thread 4 "GLRenderThread" received signal SIGSEGV, Segmentation fault.
intel_miptree_choose_aux_usage (brw=***@entry=0x557d90f43030, mt=0x0) at
intel_mipmap_tree.c:366
366 if (intel_miptree_supports_mcs(brw, mt)) {
(gdb) bt
#0 intel_miptree_choose_aux_usage (brw=***@entry=0x557d90f43030, mt=0x0) at
intel_mipmap_tree.c:366
#1 0x00007faee101f98e in miptree_create (flags=MIPTREE_CREATE_BUSY,
num_samples=1, depth0=1, height0=21885, width0=2463255520, last_level=0,
first_level=0, format=<optimized out>, target=3553, brw=0x557d90f43030) at
intel_mipmap_tree.c:725
#2 intel_miptree_create (brw=***@entry=0x557d90f43030, target=<optimized out>,
format=<optimized out>, first_level=***@entry=0,
last_level=***@entry=0, width0=***@entry=2463255520, height0=21885,
depth0=1, num_samples=1, flags=MIPTREE_CREATE_BUSY)
at intel_mipmap_tree.c:781
#3 0x00007faee101fb6e in intel_miptree_create_for_renderbuffer
(brw=***@entry=0x557d90f43030, format=<optimized out>,
width=***@entry=2463255520, height=***@entry=21885, num_samples=<optimized
out>) at intel_mipmap_tree.c:1172
#4 0x00007faee101b018 in intel_alloc_private_renderbuffer_storage
(ctx=0x557d90f43030, rb=0x557d8f7c1c00, internalFormat=6402, width=2463255520,
height=21885) at intel_fbo.c:305
#5 0x00007faee0d2bf64 in _mesa_resize_framebuffer
(ctx=***@entry=0x557d90f43030, fb=0x557d8f895900, width=2463255520,
height=21885) at main/framebuffer.c:298
#6 0x00007faee0fbb1a4 in driUpdateFramebufferSize
(ctx=***@entry=0x557d90f43030, dPriv=***@entry=0x557d90f11800) at
dri_util.c:842
#7 0x00007faee0ff873c in intel_update_renderbuffers
(context=***@entry=0x557d90f2e450, drawable=***@entry=0x557d90f11800)
at brw_context.c:1362
#8 0x00007faee0ff8e41 in intel_prepare_render (brw=***@entry=0x557d90f43030)
at brw_context.c:1379
#9 0x00007faee0ff470e in brw_clear (ctx=0x557d90f43030, mask=256) at
brw_clear.c:278
#10 0x00007faee0c910a5 in clear_bufferfv (no_error=false, value=0x557d915d28b4,
drawbuffer=0, buffer=6144, ctx=0x557d90f43030) at main/clear.c:599
#11 _mesa_ClearBufferfv (buffer=6144, drawbuffer=0, value=0x557d915d28b4) at
main/clear.c:634
#12 0x00007faedf641644 in ?? () from
/home/cduez/.local/share/Steam/steamapps/common/dota 2
beta/game/bin/linuxsteamrt64/librendersystemgl.so
#13 0x00007faedf62285f in ?? () from
/home/cduez/.local/share/Steam/steamapps/common/dota 2
beta/game/bin/linuxsteamrt64/librendersystemgl.so
#14 0x00007faedf5eda3f in ?? () from
/home/cduez/.local/share/Steam/steamapps/common/dota 2
beta/game/bin/linuxsteamrt64/librendersystemgl.so
#15 0x00007faedf5ee23c in ?? () from
/home/cduez/.local/share/Steam/steamapps/common/dota 2
beta/game/bin/linuxsteamrt64/librendersystemgl.so
#16 0x00007faedf60e5b1 in ?? () from
/home/cduez/.local/share/Steam/steamapps/common/dota 2
beta/game/bin/linuxsteamrt64/librendersystemgl.so
#17 0x00007faee7efbe26 in ?? () from
/home/cduez/.local/share/Steam/steamapps/common/dota 2
beta/game/bin/linuxsteamrt64/libtier0.so
#18 0x00007faeeb82408a in start_thread () from /usr/lib/libpthread.so.0
#19 0x00007faeebb3142f in clone () from /usr/lib/libc.so.6

I am not a gdb expert, so if you need further information, let me know.

b***@freedesktop.org
2018-01-01 18:26:25 UTC

--- Comment #8 from ***@gmail.com ---
Created attachment 136472
--> https://bugs.freedesktop.org/attachment.cgi?id=136472&action=edit
GDB output

b***@freedesktop.org
2018-01-01 18:29:03 UTC

--- Comment #9 from ***@gmail.com ---
I have put the gdb output in attachment for more clarity, sorry for the spam.

Also we might want to edit the subject because since mesa 17.3.1 it's not only
a problem about the overlay but the game doesn't start at all.

b***@freedesktop.org
2018-01-02 01:44:01 UTC

--- Comment #10 from Sven <***@gmail.com> ---
Thank you Cyril for providing the requested gdb trace.

b***@freedesktop.org
2018-01-02 14:35:23 UTC

--- Comment #11 from Cyril <***@gmail.com> ---
I used git bisect (between tags mesa-17.3.1 and mesa-17.2.6) and tested whether
I could launch the game at each iteration. I got this commit at the end:


15e208c4ccdd94582a459d0066b587f91caf270c is the first bad commit
commit 15e208c4ccdd94582a459d0066b587f91caf270c
Author: Thomas Hellstrom <***@vmware.com>
Date: Thu Sep 14 13:09:05 2017 +0200

loader/dri3: Don't accidently free buffer holding new back content

Avoid freeing buffers holding new back content
(with GLX_SWAP_COPY_OML and GLX_SWAP_EXCHANGE_OML)
Prevously that would have resulted in back buffer content becoming
incorrect after a swap, although I haven't managed to trigger such a
situation yet.

Signed-off-by: Thomas Hellstrom <***@vmware.com>
Reviewed-by: Sinclair Yeh <***@vmware.com>



I was able to launch dota by reverting this one, but I guess it's not the proper
fix :D
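
The bisection described here is essentially a binary search for the first
commit at which the game stops launching. The sketch below shows that idea
under simplifying assumptions: history is treated as a linear, indexed list
(real git history can branch), every commit after the first bad one is assumed
to be bad as well, and bad_from() is a hypothetical stand-in for "build Mesa at
this commit and try to start the game". None of these names come from git.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical test callback: true if the build at commit `index` is bad. */
typedef bool (*is_bad_fn)(size_t index, void *ctx);

/* Stand-in predicate for the sketch: commits at or after *ctx are bad. */
static bool bad_from(size_t index, void *ctx)
{
    return index >= *(const size_t *)ctx;
}

/*
 * Return the index of the first bad commit, given that commit `good` is
 * known good and commit `bad` is known bad (good < bad).
 */
static size_t first_bad_commit(size_t good, size_t bad,
                               is_bad_fn is_bad, void *ctx)
{
    while (bad - good > 1) {
        size_t mid = good + (bad - good) / 2;
        if (is_bad(mid, ctx))
            bad = mid;   /* culprit is at mid or earlier */
        else
            good = mid;  /* culprit is after mid */
    }
    return bad;
}
```

Each iteration halves the range, so the number of builds to test grows only
with the logarithm of the number of commits between the two tags.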

b***@freedesktop.org
2018-01-02 23:23:09 UTC

Ian Romanick <***@freedesktop.org> changed:

What |Removed |Added
----------------------------------------------------------------------------
CC| |***@freedesktop.org

--- Comment #12 from Ian Romanick <***@freedesktop.org> ---
Created attachment 136512
--> https://bugs.freedesktop.org/attachment.cgi?id=136512&action=edit
Fail gracefully when make_surface returns NULL

Does this patch help? If there is any difference in behavior with this patch,
can you describe it?
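
The attachment itself is not reproduced in this thread, so the following is not
the actual patch. It is only a sketch of the general "fail gracefully" pattern
the title describes, based on the backtrace in comment #7, where a NULL tree
(mt=0x0) was dereferenced after creation had been requested with
width0=2463255520. All names here (struct tree, make_tree, choose_aux,
create_tree) and the 16384 size limit are assumptions made for illustration.

```c
#include <stdbool.h>
#include <stdlib.h>

/* Simplified stand-in for a miptree; not the real i965 structure. */
struct tree {
    unsigned width, height;
    bool aux_enabled;
};

/* Hypothetical allocator: returns NULL for sizes it cannot support. */
static struct tree *make_tree(unsigned width, unsigned height)
{
    const unsigned max_dim = 16384; /* assumed limit for this sketch */
    if (width == 0 || height == 0 || width > max_dim || height > max_dim)
        return NULL;

    struct tree *t = calloc(1, sizeof(*t));
    if (t) {
        t->width = width;
        t->height = height;
    }
    return t;
}

/* Dereferences its argument, so it must never be handed NULL. */
static void choose_aux(struct tree *t)
{
    t->aux_enabled = true;
}

/* Creation wrapper: propagate the failure instead of using a NULL tree. */
static struct tree *create_tree(unsigned width, unsigned height)
{
    struct tree *t = make_tree(width, height);
    if (!t)
        return NULL;
    choose_aux(t);
    return t;
}
```

A check like this turns the crash into an allocation failure that the caller
can handle, but it does not explain where the implausible size came from.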

b***@freedesktop.org
2018-01-03 09:22:59 UTC

--- Comment #13 from Cyril <***@gmail.com> ---
The game starts with the patch on top of mesa-17.3.1.

Also tried the Steam overlay (Shift+tab) and switching desktop while in game,
had no crash so it seems good.

b***@freedesktop.org
2018-01-03 15:22:31 UTC

Ian Romanick <***@freedesktop.org> changed:

What     |Removed                        |Added
----------------------------------------------------------------------------
Status   |NEW                            |NEEDINFO
Assignee |mesa-***@lists.freedesktop.org |***@freedesktop.org

--- Comment #14 from Ian Romanick <***@freedesktop.org> ---
(In reply to Cyril from comment #13)
> The game start with the patch on top of mesa-17.3.1.
>
> Also tried the Steam overlay (Shift+tab) and switching desktop while in
> game, had no crash so it seems good.

And does everything look correct too?

b***@freedesktop.org
2018-01-03 16:26:43 UTC

--- Comment #15 from Cyril <***@gmail.com> ---
(In reply to Ian Romanick from comment #14)
> (In reply to Cyril from comment #13)
> > The game start with the patch on top of mesa-17.3.1.
> >
> > Also tried the Steam overlay (Shift+tab) and switching desktop while in
> > game, had no crash so it seems good.
>
> And does everything look correct too?

I don't see any artifacts or glitches, it does look good to me. On the other
hand, I only tested for 3/4 min as I don't have enough time for an entire game
right now.

b***@freedesktop.org
2018-01-03 20:55:24 UTC

Ian Romanick <***@freedesktop.org> changed:

What |Removed |Added
----------------------------------------------------------------------------
Status|NEEDINFO |ASSIGNED

--- Comment #16 from Ian Romanick <***@freedesktop.org> ---
(In reply to Cyril from comment #15)
> (In reply to Ian Romanick from comment #14)
> > (In reply to Cyril from comment #13)
> > > The game start with the patch on top of mesa-17.3.1.
> > >
> > > Also tried the Steam overlay (Shift+tab) and switching desktop while in
> > > game, had no crash so it seems good.
> >
> > And does everything look correct too?
>
> I don't see any artifacts or glitches, it does look good to me. In the other
> hand, i only tested for 3/4 min as i don't have enough time for an entire
> game right now.

Excellent. I've sent a patch to the list with your Tested-by added.

https://patchwork.freedesktop.org/patch/195626/

b***@freedesktop.org
2018-01-03 22:36:09 UTC

--- Comment #17 from Sven <***@gmail.com> ---
The bug I originally reported (dota crashed when switching between game and
desktop) is not yet fixed. I managed to rebuild mesa with the patch applied. I
will try to enable debug symbols next and then I'll try to provide the gdb
trace like Cyril did.

b***@freedesktop.org
2018-01-03 22:45:15 UTC

--- Comment #18 from Evangelos Foutras <***@foutrelis.com> ---
Could this be related to bug 104342 (and the most likely duplicate bugs I've
added to its "See Also" bugs)? The backtrace here looks a bit different but the
brw_clear() call is common (and a few other calls further up).

b***@freedesktop.org
2018-01-03 23:00:13 UTC

--- Comment #19 from Sven <***@gmail.com> ---
It sounds like bug 104392.

b***@freedesktop.org
2018-01-04 09:38:14 UTC

--- Comment #20 from Cyril <***@gmail.com> ---
Sven, to enable debug symbols you have to add

options=(debug !strip)

to the PKGBUILD.


Then run:

GAME_DEBUGGER=gdb steam

to attach gdb to the game.


I wasn't able to reproduce the crash with the alt-tab, even this morning. I am
using i3 as window manager.

b***@freedesktop.org
2018-01-05 22:14:42 UTC

--- Comment #21 from Sven <***@gmail.com> ---
Created attachment 136577
--> https://bugs.freedesktop.org/attachment.cgi?id=136577&action=edit
gdb output from crash after alt+tab switching

Here is the output from when dota crashed after I switched applications with
alt+tab.

b***@freedesktop.org
2018-01-07 08:09:53 UTC

--- Comment #22 from Evangelos Foutras <***@foutrelis.com> ---
(In reply to Cyril from comment #11)
> Used git bisect (between tag mesa-17.3.1 and mesa-17.2.6) and tested if i
> could launch the game or not for each iteration. Got this commit at the end :
>
> 15e208c4ccdd94582a459d0066b587f91caf270c is the first bad commit

I reached the same commit as the first commit that triggers segfaults with mpv
(see bug 104376 and its "see also" bugs).

The patch from comment 12 does *not* fix my mpv issue (testing on top of
master), so perhaps the actual issue is with commit 15e208c4cc? [1]

[1] https://cgit.freedesktop.org/mesa/mesa/commit/?id=15e208c4cc

b***@freedesktop.org
2018-01-08 21:58:24 UTC

Mark Janes <***@intel.com> changed:

What |Removed |Added
----------------------------------------------------------------------------
CC| |***@vmware.com

--- Comment #23 from Mark Janes <***@intel.com> ---
Adding Thomas to the CC, since it bisects to his commit.

b***@freedesktop.org
2018-01-09 02:32:43 UTC

--- Comment #24 from Sven <***@gmail.com> ---
I'm sorry, but multiple issues seem to be discussed here:
- dota crashing on startup (seems to be fixed)
- dota crashing on alt+tab (not yet fixed)
- mpv crashing (not yet fixed)


The alt+tab crash I'm seeing seems to stem from a bug or problem in
intel_miptree_create_for_dri_image. I posted the stack trace above.

(In reply to Sven from comment #19)
> It sound like bug 104392 .

I must retract that comment. The backtrace posted there is significantly
different.

(In reply to Evangelos Foutras from comment #18)
> Could this be related to bug 104342 (and the most likely duplicate bugs I've
> added to its "See Also" bugs)? The backtrace here looks a bit different but
> the brw_clear() call is common (and a few other calls further up).

I'm not sure whether this is a duplicate of bug 104342. But yes, my backtrace
also contains brw_clear(). Is there some theory on what's going on or a patch I
could test?

b***@freedesktop.org
2018-01-10 09:45:18 UTC

--- Comment #25 from Thomas Hellström <***@vmware.com> ---
Created attachment 136645
--> https://bugs.freedesktop.org/attachment.cgi?id=136645&action=edit
Patch to fix a potential problem with renderbuffer freeing

There is a potential problem with the bisected commit.
I've attached a patch that should remedy it, if that really was the cause of
the crashes.

Having said that, it appears like the backtraces are taken with mesa_glthread
on.

DRI3 is not really thread-safe, so basically any change however small can break
that mode of operation, so please also retest with

export mesa_glthread=0

then relaunch the app.

b***@freedesktop.org
2018-01-10 09:48:27 UTC

--- Comment #26 from Evangelos Foutras <***@foutrelis.com> ---
Any Mesa devs trying to repro this, please test with display compositing
disabled.

I posted some possibly useful printf output in [1]. I'm not sure if
dri3_handle_present_event() is wrong to free buffers that don't match
"buf->pixmap == ie->pixmap" but the commit message [2] doesn't seem to
sufficiently explain this.

[1] https://bugs.freedesktop.org/show_bug.cgi?id=104376#c11

[2] https://cgit.freedesktop.org/mesa/mesa/commit/?id=15e208c4cc

b***@freedesktop.org
2018-01-10 10:18:52 UTC

--- Comment #27 from Evangelos Foutras <***@foutrelis.com> ---
(In reply to Thomas Hellström from comment #25)
> Created attachment 136645 [details] [review]
> Patch to fix a potential problem with renderbuffer freeing

Doesn't fix it for me (testing with mpv; probably same for dota).

With the printf()s from bug 104376, and printing !!buf->busy just before the
dri3_free_render_buffer() call in dri3_handle_present_event(), I get this just
before it crashes:

==================
buf->busy = 0
dri3_handle_present_event() freed 0x7fffbc8d6380
dri3_get_buffer() freeing buffer = 0x7fffbc8d6380; draw->buffers[buf_id] =
(nil)
==================

I'm totally unfamiliar with gpu driver code but I'm wondering why the
"buf->pixmap == ie->pixmap" check isn't used anymore; why didn't commit
15e208c4cc just stick "draw->cur_blit_source != b" into the second if
conditional and leave the rest as it was?

b***@freedesktop.org
2018-01-10 10:26:00 UTC

--- Comment #28 from Thomas Hellström <***@vmware.com> ---
(In reply to Evangelos Foutras from comment #27)

> I'm totally unfamiliar with gpu driver code but I'm wondering why the
> "buf->pixmap == ie->pixmap" check isn't used anymore; why didn't commit
> 15e208c4cc just stick "draw->cur_blit_source != b" into the second if
> conditional and leave the rest as it was?

Because then we would potentially leak buffers since we'd have nothing that
freed that buffer if freeing had been skipped because draw->cur_blit_source !=
b.

Anyway, I think I know what might be going on now. Updated patch soon.

/Thomas

b***@freedesktop.org
2018-01-10 10:33:10 UTC

Thomas Hellström <***@vmware.com> changed:

What                           |Removed |Added
----------------------------------------------------------------------------
Attachment #136645 is obsolete |0       |1

--- Comment #29 from Thomas Hellström <***@vmware.com> ---
Created attachment 136647
--> https://bugs.freedesktop.org/attachment.cgi?id=136647&action=edit
Updated patch to avoid freeing renderbuffers currently in use

So what might be happening is that dri3_get_buffer() decides to reuse a render
buffer, but then we switch from page-flipping to non-page flipping and receive
an event that lowers the back buffer count and thus also frees that render
buffer from under dri3_get_buffer().

So only allow updating the back buffer count in dri3_find_back().
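
The hypothesis and the proposed remedy can be modelled in a few lines. The
sketch below is not the attached patch and does not use the real loader
structures: struct drawable, the event handlers, and find_back() are simplified
stand-ins, and boolean flags replace real buffer allocations so that the "use
after free" is observable without undefined behaviour. It contrasts an event
handler that shrinks the back buffer count immediately with one that only
records the new count and lets the lookup path apply it.

```c
#include <stdbool.h>

#define MAX_BACK 4 /* assumed upper bound for this sketch */

/* Simplified stand-in for a drawable; not the real loader structure. */
struct drawable {
    bool allocated[MAX_BACK]; /* true while slot i holds a live buffer */
    int num_back;             /* count the lookup path currently relies on */
    int pending_num_back;     /* count requested by the most recent event */
};

/* Event path: an idle buffer outside the current count gets freed. */
static void on_idle_event(struct drawable *d, int id)
{
    if (id >= d->num_back)
        d->allocated[id] = false;
}

/* Racy variant: the event handler shrinks the count right away. */
static void on_complete_event_racy(struct drawable *d, int new_count)
{
    d->pending_num_back = new_count;
    d->num_back = new_count;
}

/* Deferred variant: the event handler only records the new count. */
static void on_complete_event_deferred(struct drawable *d, int new_count)
{
    d->pending_num_back = new_count;
}

/* Lookup path: the only place where the recorded count takes effect. */
static int find_back(struct drawable *d)
{
    d->num_back = d->pending_num_back;
    for (int i = d->num_back; i < MAX_BACK; i++)
        d->allocated[i] = false; /* release surplus slots here, not in events */
    return d->num_back - 1;      /* simplification: always pick the last slot */
}
```

In the racy variant, a buffer returned by find_back() can be freed by events
that arrive while the caller is still using it. In the deferred variant the
same events leave that buffer alone, and the surplus slot is released on the
next call to find_back().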

b***@freedesktop.org
2018-01-10 11:49:52 UTC

--- Comment #30 from Evangelos Foutras <***@foutrelis.com> ---
(In reply to Thomas Hellström from comment #29)
> Created attachment 136647 [details] [review]
> Updated patch to avoid freeing renderbuffers currently in use

Seems to fix my mpv crashes, thanks! (Perhaps Cyril or Sven might be able to
confirm the same about dota.)

FWIW the "buf->busy == 0" condition seems to always evaluate to true in my
tests; if I omit it and print !!buf->busy before the dri3_free_render_buffer()
call, it always outputs "0".

b***@freedesktop.org
2018-01-10 12:33:37 UTC

--- Comment #31 from Thomas Hellström <***@vmware.com> ---
(In reply to Evangelos Foutras from comment #30)

> FWIW the "buf->busy == 0" condition seems to always evaluate to true in my
> tests; if I omit it and print !!buf->busy before the
> dri3_free_render_buffer() call, it always outputs "0".

Yes, I think with the current hardcoded number of back buffers in the flip
chain, it should probably always evaluate to true. However if someone in the
future were to change that number, it might in some situations evaluate to
false, so I'd rather keep that test around to make the code less fragile.

/Thomas

b***@freedesktop.org
2018-01-10 17:44:45 UTC

Thomas Hellström <***@vmware.com> changed:

What |Removed |Added
----------------------------------------------------------------------------
CC| |***@outlook.com

--- Comment #32 from Thomas Hellström <***@vmware.com> ---
*** Bug 104376 has been marked as a duplicate of this bug. ***

b***@freedesktop.org
2018-01-10 17:46:56 UTC

Thomas Hellström <***@vmware.com> changed:

What |Removed |Added
----------------------------------------------------------------------------
Assignee|***@freedesktop.org |***@vmware.com

--- Comment #33 from Thomas Hellström <***@vmware.com> ---
Taking bug as it appears to be my fault.

Ian, please assign back if you feel differently.

b***@freedesktop.org
2018-01-11 01:01:18 UTC

--- Comment #34 from Evangelos Foutras <***@foutrelis.com> ---
I built mesa/lib32-mesa packages for Arch Linux which include the patch from
comment 29 (not sure if lib32-mesa is needed but it can't hurt either):

https://pkgbuild.com/~foutrelis/test-builds/mesa/bug-56944-patch-136647/

@Sven, @Cyril: Please check if dota still crashes (either at startup or
alt-tab).

(The packages are signed with my packager key so you can 'pacman -U <URL>'
them.)

b***@freedesktop.org
2018-01-11 03:51:57 UTC

--- Comment #35 from ***@fratti.ch ---
That patch fixes https://bugs.freedesktop.org/show_bug.cgi?id=104301 too, so
I'm guessing this was the same issue.

b***@freedesktop.org
2018-01-11 05:35:35 UTC

--- Comment #36 from Evangelos Foutras <***@foutrelis.com> ---
*** Bug 104342 has been marked as a duplicate of this bug. ***

b***@freedesktop.org
2018-01-11 06:03:10 UTC

Evangelos Foutras <***@foutrelis.com> changed:

What |Removed |Added
----------------------------------------------------------------------------
CC| |***@fratti.ch
CC| |***@xmw.de

--- Comment #37 from Evangelos Foutras <***@foutrelis.com> ---
*** Bug 104301 has been marked as a duplicate of this bug. ***

--- Comment #38 from Evangelos Foutras <***@foutrelis.com> ---
*** Bug 104443 has been marked as a duplicate of this bug. ***

b***@freedesktop.org
2018-01-11 08:07:10 UTC

--- Comment #39 from Florian Bruhin <***@the-compiler.org> ---
FWIW the crashes I've seen in mpv and on YouTube when resizing the window also
seem to be gone with the package from comment 34.

b***@freedesktop.org
2018-01-11 09:11:51 UTC

--- Comment #40 from Andriy Khulap <***@globallogic.com> ---
I was able to reproduce the Bug 104301 on the following system:
Intel(R) HD Graphics 530 (Skylake GT2) (0x191b)
Ubuntu 16.04 LTS (Kernel 4.4.0)
Mesa 17.3.0 and latest from git.
mpv 0.14.0-git-e1993d5

Debugging showed the following sequence:
1. dri3_get_buffer() is called with draw->num_back=3, so dri3_find_back()
returns buf_id=2.
2. dri3_fence_await() is then called and starts to process events:
XCB_PRESENT_COMPLETE_NOTIFY, which triggers dri3_update_num_back() and lowers
num_back to 2;
XCB_PRESENT_EVENT_IDLE_NOTIFY, which frees the currently used buffer with id=2.
3. The buffer becomes corrupted (e.g. buffer->image->bo=0x55); dri3_get_buffer()
continues to use that buffer and then tries to call dri3_free_render_buffer()
on it. buffer->image->bo is not NULL, so it passes the check in
brw_bo_unreference() and causes a segmentation fault.

Patch from Comment 29 solved that issue for me, I can't reproduce it any more.

b***@freedesktop.org
2018-01-11 09:17:34 UTC

--- Comment #41 from Andriy Khulap <***@globallogic.com> ---
BTW my quick fix was removing the event processing from dri3_fence_await().
(added by a727c804a2c17db306c68e259ae845aa6382d3b1 loader/dri3: Process event
after each fence wait)

b***@freedesktop.org
2018-01-11 09:21:12 UTC

--- Comment #42 from Cyril <***@gmail.com> ---
(In reply to Evangelos Foutras from comment #34)
> I built mesa/lib32-mesa packages for Arch Linux which include the patch from
> comment 29 (not sure if lib32-mesa is needed but it can't hurt either):
>
> https://pkgbuild.com/~foutrelis/test-builds/mesa/bug-56944-patch-136647/
>
> @Sven, @Cyril: Please check if dota still crashes (either at startup or
> alt-tab).
>
> (The packages are signed with my packager key so you can 'pacman -U <URL>'
> them.)

Tested and approved. I can launch dota. I wasn't affected by the alt-tab bug, so
I can't say whether it is better or not.

b***@freedesktop.org
2018-01-11 09:39:39 UTC

Evangelos Foutras <***@foutrelis.com> changed:

What |Removed |Added
----------------------------------------------------------------------------
CC| |***@gmail.com

--- Comment #43 from Evangelos Foutras <***@foutrelis.com> ---
*** Bug 104392 has been marked as a duplicate of this bug. ***

b***@freedesktop.org
2018-01-11 09:56:38 UTC
https://bugs.freedesktop.org/show_bug.cgi?id=104214

--- Comment #44 from Thomas Hellström <***@vmware.com> ---
I sent out the patch for review on the mesa-dev list now.

If anybody wants a Reported-by: or Tested-by: tag, please reply to that email.

Thanks,
Thomas

b***@freedesktop.org
2018-01-11 22:41:03 UTC
https://bugs.freedesktop.org/show_bug.cgi?id=104214

--- Comment #45 from Mark Janes <***@intel.com> ---
Thomas: do you have any ideas on how we could catch this category of bug in
automated testing? We have comprehensive automated tests for GL/Vulkan apis,
but not much for dri3.

b***@freedesktop.org
2018-01-12 00:42:31 UTC
https://bugs.freedesktop.org/show_bug.cgi?id=104214

--- Comment #46 from Sven <***@gmail.com> ---
(In reply to Evangelos Foutras from comment #30)
> (In reply to Thomas Hellström from comment #29)
> > Created attachment 136647
> > Updated patch to avoid freeing renderbuffers currently in use
>
> Seems to fix my mpv crashes, thanks! (Perhaps Cyril or Sven might be able to
> confirm the same about dota.)
>
> FWIW the "buf->busy == 0" condition seems to always evaluate to true in my
> tests; if I omit it and print !!buf->busy before the
> dri3_free_render_buffer() call, it always outputs "0".

I applied the patch from comment #29 (and that patch only) and recompiled mesa
17.3.1. So far, dota hasn't crashed on alt+tab.

b***@freedesktop.org
2018-01-12 06:38:26 UTC
https://bugs.freedesktop.org/show_bug.cgi?id=104214

--- Comment #47 from Thomas Hellström <***@vmware.com> ---
(In reply to Mark Janes from comment #45)
> Thomas: do you have any ideas on how we could catch this category of bug in
> automated testing? We have comprehensive automated tests for GL/Vulkan
> apis, but not much for dri3.

When enabling dri3 in our Xorg driver we caught a number of viewport bugs in
mesa core dri3 using glretrace with various game traces. We currently do not
support page-flipping in our xorg driver, which might be why this wasn't
caught, but that would otherwise be a good candidate for automated testing:

Generate apitraces with frequent window resizing, and automate glretraces with
image capture and image comparisons, and in addition find a way to trigger
transition to- and from page-flipping.

/Thomas
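
The image-comparison step of such an automated glretrace run could be as simple
as counting the pixels that differ from a reference capture. A minimal sketch,
assuming both images are already decoded into tightly packed 8-bit RGBA arrays
of the same size; sketch_count_mismatched_pixels() and the per-channel tolerance
are made up for illustration and are not part of apitrace or ezbench:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/*
 * Count pixels whose value differs from the reference by more than
 * `tolerance` in at least one of the four channels.
 * Both arrays must hold width * height * 4 bytes (assumed RGBA8 layout).
 */
size_t sketch_count_mismatched_pixels(const uint8_t *ref, const uint8_t *got,
                                      size_t width, size_t height,
                                      int tolerance)
{
    assert(ref != NULL && got != NULL && tolerance >= 0);

    size_t mismatched = 0;
    for (size_t i = 0; i < width * height; i++) {
        for (size_t c = 0; c < 4; c++) {
            int diff = abs((int)ref[4 * i + c] - (int)got[4 * i + c]);
            if (diff > tolerance) {
                mismatched++;
                break;  /* count each pixel at most once */
            }
        }
    }
    return mismatched;
}
```

A test run would then fail when the count exceeds some threshold. Both the
threshold and the tolerance are choices left to the test author.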

b***@freedesktop.org
2018-01-12 08:51:23 UTC
https://bugs.freedesktop.org/show_bug.cgi?id=104214

ubitux <***@pkh.me> changed:

What |Removed |Added
----------------------------------------------------------------------------
CC| |***@pkh.me

--- Comment #48 from ubitux <***@pkh.me> ---
*** Bug 104579 has been marked as a duplicate of this bug. ***

b***@freedesktop.org
2018-01-12 11:17:26 UTC
https://bugs.freedesktop.org/show_bug.cgi?id=104214

Tapani Pälli <***@gmail.com> changed:

What |Removed |Added
----------------------------------------------------------------------------
CC| |***@free.fr

--- Comment #49 from Tapani Pälli <***@gmail.com> ---
(In reply to Thomas Hellström from comment #47)
> (In reply to Mark Janes from comment #45)
> > Thomas: do you have any ideas on how we could catch this category of bug in
> > automated testing? We have comprehensive automated tests for GL/Vulkan
> > apis, but not much for dri3.
>
> When enabling dri3 in our Xorg driver we caught a number of viewport bugs in
> mesa core dri3 using glretrace with various game traces. We currently do not
> support page-flipping in our xorg driver, which might be why this wasn't
> caught, but that would otherwise be a good candidate for automated testing:
>
> Generate apitraces with frequent window resizing, and automate glretraces
> with image capture and image comparisons, and in addition find a way to
> trigger transition to- and from page-flipping.
>
> /Thomas

I think Martin has/had some plans to add these kinds of tests to his ezbench
system, FYI Martin.

b***@freedesktop.org
2018-01-12 13:53:48 UTC
https://bugs.freedesktop.org/show_bug.cgi?id=104214

--- Comment #50 from Martin Peres <***@free.fr> ---
(In reply to Tapani Pälli from comment #49)
> (In reply to Thomas Hellström from comment #47)
> > (In reply to Mark Janes from comment #45)
> > > Thomas: do you have any ideas on how we could catch this category of bug in
> > > automated testing? We have comprehensive automated tests for GL/Vulkan
> > > apis, but not much for dri3.
> >
> > When enabling dri3 in our Xorg driver we caught a number of viewport bugs in
> > mesa core dri3 using glretrace with various game traces. We currently do not
> > support page-flipping in our xorg driver, which might be why this wasn't
> > caught, but that would otherwise be a good candidate for automated testing:
> >
> > Generate apitraces with frequent window resizing, and automate glretraces
> > with image capture and image comparisons, and in addition find a way to
> > trigger transition to- and from page-flipping.
> >
> > /Thomas
>
> I think Martin has/had some plans to add these kinds of tests to his ezbench
> system, FYI Martin.

This is already implemented; however, the error reporting when the image does
not get generated is not correct (I just need to clean up the trainee's code
that landed in ezbench to make it more robust against that).

And by the way, Eero also added support for doing the same with vulkan traces
(through vkreplay).

b***@freedesktop.org
2018-01-13 00:40:21 UTC
https://bugs.freedesktop.org/show_bug.cgi?id=104214

--- Comment #51 from Jason Ekstrand <***@jlekstrand.net> ---
Can we write a piglit test or two that reproduces this bug? It would be very
good if we had a nice self-contained test that we can run in CI to avoid these
types of issues in the future.

b***@freedesktop.org
2018-01-13 06:11:48 UTC
https://bugs.freedesktop.org/show_bug.cgi?id=104214

***@yandex.com changed:

What |Removed |Added
----------------------------------------------------------------------------
CC| |***@yandex.com

--- Comment #52 from ***@yandex.com ---
*** Bug 104583 has been marked as a duplicate of this bug. ***

b***@freedesktop.org
2018-01-13 09:46:57 UTC
https://bugs.freedesktop.org/show_bug.cgi?id=104214

--- Comment #53 from Thomas Hellström <***@vmware.com> ---
(In reply to Jason Ekstrand from comment #51)
> Can we write a piglit test or two that reproduces this bug? It would be
> very good if we had a nice self-contained test that we can run in CI and avoid
> these types of issues in the future.

As a comment to this, the origin of this bug is in itself an interesting
sequence of events that might in one way answer your question:

1) Implemented and enabled dri3 support in the vmware Xorg driver.
2) A lot of glretrace automated tests started failing.
3) Fixed / worked around a number of viewport issues in the mesa dri3 core.
3a) Fixed a number of synchronization issues in the mesa dri3 core.
4) Some glretrace automated tests still failing.
5) Implemented dri3 SWAP_COPY_OML to have glretrace mimic the capturing
platform behaviour (which was in some cases WGL)
6) SWAP_COPY_OML didn't work due to a long standing bug in mesa core dri.
7) Fixed that. glretrace started to render transparently with SWAP_COPY_OML
fbconfigs.
8) This was caused by a long standing issue in the Xorg GLX layer. Worked
around that.
9) Kwin and other applications started having transparent rendering issues.
10) Implemented a proper fix for 8)
11) Item 3) was reported to cause a multi-threaded game to fail to start on
some platforms with page-flip only. Turned out mesa core dri3 is not
thread-safe. Implemented a game-specific workaround.
12) This bug surfaced (was caused by 5)) and was fixed in a way that will
decrease the chances of it ever happening again. (Perhaps an extra comment in
the source is needed).

So why am I listing this? The reason is that I agree we need more piglit tests
to verify GLX and dri3 functionality, but while this bug probably has had the
strongest user impact, that doesn't necessarily mean it's likely to happen
again. I think the biggest problem with the mesa dri3 implementation currently
is that it was written without multi-threading in mind, and adding
thread-safety as an afterthought might prove difficult and error-prone. It's also
currently lacking a well-documented and well defined strategy to handle
drawable size-changes and viewport changes.

So IMHO we should try to write piglit tests for the areas where we know there
are remaining issues.

/Thomas

b***@freedesktop.org
2018-01-13 19:27:26 UTC
https://bugs.freedesktop.org/show_bug.cgi?id=104214

--- Comment #54 from Jason Ekstrand <***@jlekstrand.net> ---
(In reply to Thomas Hellström from comment #53)
> So IMHO we should try to write piglit tests for the areas where we know
> there are remaining issues.

That's a reasonable thing to say. However, our DRI testing in piglit is rather
pitiful across the board IMHO. Part of the reason why you had to fix 6
different bugs in order to turn on DRI3 is that we've been lazy about testing
when working on the DRI code. Window system stuff is annoyingly full of
edge-cases and the piglit tests we have tend to only touch-test things. This
is only one of the 3 or 4 DRI bugs we've shipped in the last few months that
piglit has been perfectly happy with and then it messes up users badly.

While I would love to give someone the general task of improving piglit testing
of DRI, a good place to start is to write tests for known bugs. If we have a
really nasty threading bug, let's make a test that hammers on threading.

Ok, I've said my piece. Maybe I'm being unreasonable but it sounds good in my
head. :-)

b***@freedesktop.org
2018-01-18 20:02:09 UTC
https://bugs.freedesktop.org/show_bug.cgi?id=104214

--- Comment #55 from Juan A. Suarez <***@igalia.com> ---
Is this bug already fixed? With commit:

commit 897c54d522ab960a879b763a15e489f630c491ee
Author: Thomas Hellstrom <***@vmware.com>
Date: Thu Jan 11 10:19:23 2018 +0100

loader/dri3: Avoid freeing renderbuffers in use

b***@freedesktop.org
2018-01-19 11:00:48 UTC
https://bugs.freedesktop.org/show_bug.cgi?id=104214

--- Comment #56 from Sven <***@gmail.com> ---
(In reply to Juan A. Suarez from comment #55)
> Is this bug already fixed? With commit:
>
> commit 897c54d522ab960a879b763a15e489f630c491ee
> Author: Thomas Hellstrom <***@vmware.com>
> Date: Thu Jan 11 10:19:23 2018 +0100
>
> loader/dri3: Avoid freeing renderbuffers in use

The patch from comment #29 (which is identical to the commit, right?) has been
included in recent versions of Mesa on Arch Linux. So far, I have had zero
crashes. So yes, I believe this is fixed.

b***@freedesktop.org
2018-01-19 12:39:11 UTC
https://bugs.freedesktop.org/show_bug.cgi?id=104214

Thomas Hellström <***@vmware.com> changed:

What |Removed |Added
----------------------------------------------------------------------------
Status|ASSIGNED |RESOLVED
Resolution|--- |FIXED

--- Comment #57 from Thomas Hellström <***@vmware.com> ---
Should be fixed now.

b***@freedesktop.org
2018-01-20 20:20:43 UTC
https://bugs.freedesktop.org/show_bug.cgi?id=104214

--- Comment #58 from ***@outlook.com ---
I can confirm this has been fixed for me. I was experiencing the browser
crashes described in bug 104376.
