Discussion:
[Mesa-dev] [Bug 108933] Unreal Tournament (UT99) segfault on opengl init
b***@freedesktop.org
2018-12-03 18:59:58 UTC
https://bugs.freedesktop.org/show_bug.cgi?id=108933

Bug ID: 108933
Summary: Unreal Tournament (UT99) segfault on opengl init
Product: Mesa
Version: git
Hardware: Other
OS: All
Status: NEW
Severity: normal
Priority: medium
Component: Mesa core
Assignee: mesa-***@lists.freedesktop.org
Reporter: ***@rkmail.ru
QA Contact: mesa-***@lists.freedesktop.org

Unreal Tournament crashes upon OpenGL context creation. I tried both the stock
OpenGLDrv.so and the UTPG one.

Mesa-git 89b4798c0619a2ba99046d5ad36f0e6851625f7a, tried with both radeonsi and
llvmpipe; same result.


The game used to work with Mesa 18.0.

Program received signal SIGSEGV, Segmentation fault.
0xeee7b5e2 in ?? () from /usr/lib/libstdc++.so.6
(gdb) bt
#0 0xeee7b5e2 in ?? () from /usr/lib/libstdc++.so.6
#1 0xeedeaa4a in bool std::has_facet<std::ctype<char> >(std::locale const&) ()
from /usr/lib/libstdc++.so.6
#2 0xeeddca1f in std::basic_ios<char, std::char_traits<char> >::_M_cache_locale(std::locale const&) ()
from /usr/lib/libstdc++.so.6
#3 0xeeddce8b in std::basic_ios<char, std::char_traits<char> >::init(std::basic_streambuf<char, std::char_traits<char> >*) ()
from /usr/lib/libstdc++.so.6
#4 0xeed82018 in std::ios_base::Init::Init() () from /usr/lib/libstdc++.so.6
#5 0xf464dbcc in _GLOBAL__sub_I_st_glsl_to_tgsi_array_merge.cpp () from
/usr/lib/dri/swrast_dri.so
#6 0xf7fe5d3b in call_init.part () from /lib/ld-linux.so.2
#7 0xf7fe5e47 in _dl_init () from /lib/ld-linux.so.2
#8 0xf7fea3f2 in dl_open_worker () from /lib/ld-linux.so.2
#9 0xf7985c9b in _dl_catch_error () from /lib/libc.so.6
#10 0xf7fe9a69 in _dl_open () from /lib/ld-linux.so.2
#11 0xf7f7dc65 in dlopen_doit () from /lib/libdl.so.2
#12 0xf7985c9b in _dl_catch_error () from /lib/libc.so.6
#13 0xf7f7e36e in _dlerror_run () from /lib/libdl.so.2
#14 0xf7f7dcee in dlopen@@GLIBC_2.1 () from /lib/libdl.so.2
#15 0xf55a532c in loader_open_driver () from /usr/lib/libGLX_mesa.so.0
#16 0xf559a95b in driOpenDriver () from /usr/lib/libGLX_mesa.so.0
#17 0xf5599eac in driswCreateScreen () from /usr/lib/libGLX_mesa.so.0
#18 0xf55758de in __glXInitialize () from /usr/lib/libGLX_mesa.so.0
#19 0xf55709d5 in GetGLXPrivScreenConfig () from /usr/lib/libGLX_mesa.so.0
#20 0xf5571428 in glXChooseVisual () from /usr/lib/libGLX_mesa.so.0
#21 0xf7b5dc10 in X11_GL_GetVisual () from ./libSDL-1.1.so.0
#22 0xf7b62e85 in X11_CreateWindow () from ./libSDL-1.1.so.0
#23 0xf7b636cd in X11_SetVideoMode () from ./libSDL-1.1.so.0
#24 0xf7b53c1c in SDL_SetVideoMode () from ./libSDL-1.1.so.0
#25 0xf6b6467a in USDLViewport::ResizeViewport(unsigned int, int, int, int) ()
from ./SDLDrv.so
#26 0xf6157721 in UOpenGLRenderDevice::SetRes(int, int, int, int) () from
./OpenGLDrv.so
#27 0xf61573a7 in UOpenGLRenderDevice::Init(UViewport *, int, int, int, int) ()
from ./OpenGLDrv.so
#28 0xf6b641ed in USDLViewport::TryRenderDevice(char const *, int, int, int,
int) ()
from ./SDLDrv.so
#29 0xf6b64f97 in USDLViewport::OpenWindow(unsigned int, int, int, int, int,
int) ()
from ./SDLDrv.so
#30 0xf7da63a2 in UGameEngine::Init(void) () from ./Engine.so
#31 0x0804d7a6 in _start ()
--
You are receiving this mail because:
You are the QA Contact for the bug.
You are the assignee for the bug.
b***@freedesktop.org
2018-12-06 01:50:42 UTC

--- Comment #1 from ***@yahoo.com ---
Mesa developers need to be able to reproduce the problem and they are not going
to install UT to debug this. So we'll have to do a bit more investigating on
this one.


There are a few things for you to try:

1. See if changing your locale has an effect on the crash, that is, the "LANG"
and "LC_*" environment variables. If that makes a difference, make sure that you
have the locale files and that they are not corrupted. (Keep the suspect files;
you may need to report a bug to glibc or gcc.)

2. The problem might be random memory corruption. Since both games are native
Linux binaries, try running them under valgrind. It might produce a lot of noise
(uses of uninitialized values ...), but if there is an out-of-bounds write or a
use-after-free it should catch it.

3. Try git bisect. First see if you can use existing releases to narrow down
when things broke, then use them as the good and bad points to find which
commit broke it. Try 18.0 and 18.2, then maybe 18.1. There is no need to try
different patch versions (aka 18.0.1/2/3/4/...), as they are in separate branches
that get changes backported. Do check 18.0, just to be sure that it still works.


I don't have the Linux version installed atm, and it would take some manual
tweaking to trick the Loki installer into running on x64 and with Unreal Anthology.
Running the Windows UT99 under Wine crashes at startup; Unreal.log shows that it
crashed during glGetString(GL_EXTENSIONS). That might be totally unrelated...
but it reminds me of a bug where the game does not have a big enough buffer to
hold the whole GL_EXTENSIONS string.
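A minimal C++ sketch of the kind of bug described above, assuming the game copies the extension string into a fixed-size char array. fake_extension_string(), copy_extensions() and has_extension() are hypothetical names, and the extension list is a short made-up stand-in for what glGetString(GL_EXTENSIONS) returns; this is not UT99's actual code. The bounded copy truncates instead of overflowing, and the token search shows that keeping the whole string in a growable std::string avoids the fixed-size limit altogether.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>

// Hypothetical stand-in for the string glGetString(GL_EXTENSIONS) returns;
// real drivers return a much longer, driver-specific list.
const char* fake_extension_string() {
    return "GL_ARB_multitexture GL_EXT_texture_env_combine "
           "GL_EXT_texture_compression_s3tc GL_ARB_vertex_buffer_object";
}

// Bounded copy: writes at most size - 1 characters plus a terminator and
// reports whether the whole string fit. An unbounded strcpy() into a fixed
// char array is the overflow pattern once the list outgrows the buffer.
bool copy_extensions(const char* src, char* dst, std::size_t size) {
    if (size == 0) return false;
    std::size_t len = std::strlen(src);
    std::size_t n = len < size - 1 ? len : size - 1;
    std::memcpy(dst, src, n);
    dst[n] = '\0';
    return n == len;
}

// Matches whole space-separated tokens, so "GL_ARB_multi" does not
// falsely match "GL_ARB_multitexture".
bool has_extension(const std::string& list, const std::string& name) {
    std::size_t pos = 0;
    while ((pos = list.find(name, pos)) != std::string::npos) {
        bool starts = pos == 0 || list[pos - 1] == ' ';
        std::size_t end = pos + name.size();
        bool ends = end == list.size() || list[end] == ' ';
        if (starts && ends) return true;
        pos = end;
    }
    return false;
}
```

With a 16-byte buffer, copy_extensions() reports that the list did not fit and keeps only the first 15 characters, whereas an unbounded strcpy() into the same array would write past its end.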

The game writes log files to ~/.loki/ut/System/*.log; see if there is anything
relevant there.
--
b***@freedesktop.org
2018-12-06 07:44:04 UTC

--- Comment #2 from ***@rkmail.ru ---
Created attachment 142739
--> https://bugs.freedesktop.org/attachment.cgi?id=142739&action=edit
valgrind log

(In reply to iive from comment #1)
> 1. See if changing your locale has effect on the crash.
I've tried the C locale and got the same result. I also tried different kernel
versions, in case it had something to do with the recent CPU vulnerability
fixes, but that's not it either.
> 2. The problem might be random memory corruption, since both games are
> native for linux, try running them under valgrind.
I'm not competent enough to use valgrind correctly, but it's interesting that
running 'valgrind ./ut-bin' gives a SIGILL rather than a SIGSEGV. Also, the fault
is in Core.so (a game component), but the game works fine with its built-in
software renderer (SDLSoftDrv.so).
> 3. Try git bisect.
Bisecting this involves lots of cross-compiling; unfortunately I cannot afford
it right now :(
> Unreal.log shows that it crashed while glGetString(GL_EXTENSIONS).
A too-long GL extension list was definitely a problem at some point, but there
are plenty of fixed OpenGLDrv.so libraries all over the Internet (they explicitly
mention it in their changelogs). I tried all the ones I could find; no change.
--
b***@freedesktop.org
2018-12-06 14:46:31 UTC

--- Comment #3 from ***@yahoo.com ---
I got the UT99 working.

valgrind doesn't show anything more; the first error is the one from the
report.

When trying to narrow down which Mesa releases work, I got mesa-18.1.7 working
and mesa-18.2.0 not working. However, the bisect failed: even compiling
mesa-18.0.0 now produces a broken build.

Since I had done a major system update before compiling my mesa-18.2.0, it makes
sense that the bug is gcc/glibc related.

I had libstdc++.so.6.25 in use, so I got an older libstdc++.so.6.24 instead.
The problem remained. (I'm sure it got used, because I forgot to fix the link
and got another error.)

I tried compiling with -O0, but the bug remains. It is not a miscompilation per
se; it is more likely something related to the ABI/API.

Here is the backtrace of current Mesa 19.0.0-devel (git-3b2ad8b290)
---
#1 0xf1be6ba8 in bool std::has_facet<std::ctype<char> >(std::locale const&) ()
from /usr/lib/libstdc++.so.6
#2 0xf1bd6f1a in std::basic_ios<char, std::char_traits<char> >::_M_cache_locale(std::locale const&) ()
from /usr/lib/libstdc++.so.6
#3 0xf1bd7399 in std::basic_ios<char, std::char_traits<char> >::init(std::basic_streambuf<char, std::char_traits<char> >*) ()
from /usr/lib/libstdc++.so.6
#4 0xf1b75563 in std::ios_base::Init::Init() () from /usr/lib/libstdc++.so.6
#5 0xf5485d4b in __static_initialization_and_destruction_0 (__initialize_p=1,
__priority=65535) at /usr/include/c++/8.2.0/iostream:74
#6 0xf5485d93 in _GLOBAL__sub_I_st_glsl_to_tgsi_temprename.cpp(void) () at
state_tracker/st_glsl_to_tgsi_temprename.cpp:1426
#7 0xf595cc82 in __do_global_ctors_aux () from
/usr/lib/xorg/modules/dri/r600_dri.so
#8 0xf2230dc0 in ?? () from /usr/lib/libLLVM-6.0.so
#9 0xf514c025 in _init () from /usr/lib/xorg/modules/dri/r600_dri.so
#10 0xf4df26bc in ?? () from /usr/lib/libLLVM-6.0.so
---

The disassembly of the function in frame #1 looks like this:
---
[...]
0xf1be6b99 <+73>: mov -0x2a8(%ebx),%eax
0xf1be6b9f <+79>: mov %eax,0x4(%esp)
0xf1be6ba3 <+83>: call 0xf1b5a570 <***@plt>
=> 0xf1be6ba8 <+88>: test %eax,%eax
0xf1be6baa <+90>: setne %al
0xf1be6bad <+93>: add $0x18,%esp
0xf1be6bb0 <+96>: pop %ebx
0xf1be6bb1 <+97>: ret
---

Frame #6 is the closing bracket of the dump_instruction() debug function.
Hollowing out the whole function (#if/#endif) just moved the line number.

I'm not that familiar with C++ and debugging it. Maybe somebody could weigh
in.
It seems that on init something calls global constructors, and one of them needs
something that is not yet initialized, or something like that.
--
b***@freedesktop.org
2018-12-06 15:44:27 UTC

--- Comment #4 from Michel Dänzer <***@daenzer.net> ---
Which version of g++/gcc are you using? Some 8.2 snapshots have a bug which
causes mis-compilation of Mesa code:
https://bugzilla.redhat.com/show_bug.cgi?id=1645400
--
b***@freedesktop.org
2018-12-06 18:07:34 UTC

--- Comment #5 from ***@yahoo.com ---
The upgrade was from gcc-7.3.0 to gcc-8.2.0. You can see 8.2.0 include in the
backtrace.

I don't think that we can blame gcc-8.2.0 for the Red Hat bug report, as they
use a development snapshot. (Nowadays stable GCC releases are always x.y.0.)

I'll repeat that I compiled with -O0 in both CFLAGS and CXXFLAGS.
--
b***@freedesktop.org
2018-12-08 14:51:41 UTC

--- Comment #6 from ***@yahoo.com ---
Created attachment 142752
--> https://bugs.freedesktop.org/attachment.cgi?id=142752&action=edit
Workaround for mesa crashing on UT99 because of static global constructor from
C++ iostream

Just a few observations so far.

1.
I said that using libstdc++.so.6.24 gave a different error. That's because I
replaced it after compiling Mesa.
If Mesa is compiled with g++-8.2 and libstdc++.so.6.24, it works with UT99.

2.
Another workaround is to completely remove every "#include <iostream>".
There are 3 places where it is used.
For "st_glsl_to_tgsi.cpp" and "st_glsl_to_tgsi_temprename.cpp", just put
"#define NDEBUG 1" at the top of the files and all output is disabled.
Things are more complicated for "st_glsl_to_tgsi_array_merge.cpp/h". The
debugging there is disabled by default; however, not all printing functions are
cut out, so you need to add a bunch of extra #if/#endif to disable them. Note
that the header file also contains inline functions.
(Also, I'm using an older LLVM, so if LLVM ever uses iostream, it may cause the
same problem.)

3.
As to why "#include <iostream>" causes/triggers the problem:
a bit of googling turned up this issue:
https://isocpp.org/wiki/faq/ctors#static-init-order

Static global constructors are called before main(), but their order is random.
If one depends on another, they could be called in the wrong order.

The <iostream> header has this line: "static ios_base::Init __ioinit;". If it
looks familiar, that's because you've seen it in the backtrace.

For now the main question is why it fails only with UT99 and not with other
programs. I suspect that the problem may be linked to dlopen() usage and to the
application not loading libstdc++ itself. Unfortunately I'm having trouble
finding simple sample demo programs that do that. (And even then, there might be
something more to it.)
--
b***@freedesktop.org
2018-12-08 15:55:38 UTC

--- Comment #7 from Gustaw Smolarczyk <***@gmail.com> ---
Static initialization order is undefined between translation units (i.e. source
files) but it is defined within one translation unit - it is the global
variable definition order. Since #include <iostream> defines (not declares) a
static global variable with initializer, you can safely use std::cout and
friends from other static initializers that are defined after the <iostream>
include.
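To illustrate the ordering rule above, a minimal single-file sketch; the Tracer type, init_log() and the global names are hypothetical. It assumes a libstdc++ of the GCC 8 era, where <iostream> defines a static std::ios_base::Init object in every translation unit that includes it; newer libstdc++ releases may initialize the standard streams inside the library instead. The log itself is a function-local static, the construct-on-first-use idiom from the isocpp FAQ entry linked in comment #6, so it never depends on cross-file ordering.

```cpp
#include <cassert>
#include <iostream>  // GCC 8 era: defines a static std::ios_base::Init in this file
#include <string>
#include <vector>

// Function-local static: constructed on first call, whatever the order of
// global constructors across translation units.
std::vector<std::string>& init_log() {
    static std::vector<std::string> log;
    return log;
}

struct Tracer {
    explicit Tracer(const char* name) {
        init_log().push_back(name);
        // Safe: the ios_base::Init object from <iostream> is defined earlier
        // in this translation unit, so it has already been initialized.
        std::clog << "constructing " << name << '\n';
    }
};

// Within one translation unit, dynamic initialization follows definition order.
Tracer g_first_tracer("first");
Tracer g_second_tracer("second");
```

By the time main() runs, init_log() holds "first" followed by "second". Across separate source files no such order is guaranteed, which is exactly the situation the FAQ warns about.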

The segfault looks like a mismatch in the standard library. Does UT99 use the
/usr/lib/libstdc++.so.6 or does it use a local version? If it's the latter,
what happens when you force it to use the distro one?
--
b***@freedesktop.org
2018-12-08 19:33:50 UTC

--- Comment #8 from ***@yahoo.com ---
(In reply to Gustaw Smolarczyk from comment #7)
> Static initialization order is undefined between translation units (i.e.
> source files) but it is defined within one translation unit - it is the
> global variable definition order. Since #include <iostream> defines (not
> declares) a static global variable with initializer, you can safely use
> std::cout and friends from other static initializers that are defined after
> the <iostream> include.
I've already tried placing "#include <locale>" before "iostream", but it has no
effect. I retested it to be sure.
> The segfault looks like a mismatch in the standard library. Does UT99 use
> the /usr/lib/libstdc++.so.6 or does it use a local version? If it's the
> latter, what happens when you force it to use the distro one?
The game does not link to a dynamic libstdc++; most likely it was statically
linked. The binaries date from 2006.

The binaries themselves can be obtained from the freely available Linux
installer(s), but they do not contain enough to reproduce the crash (you need
some of the game files).
--
b***@freedesktop.org
2018-12-08 20:32:16 UTC

--- Comment #9 from Gustaw Smolarczyk <***@gmail.com> ---
(In reply to iive from comment #8)
> I've already tried placing "#include <locale>" before "iostream", but it has
> no effect. I retested it to be sure.
That won't do anything. The locale stuff is handled by libstdc++.so itself, the
include order in the mesa source file doesn't matter.
> The game does not link to dynamic libstdc++ , most likely it had been
> statically linked. The binaries are dated from 2006.
> The binaries themselves could be obtained from the freely available linux
> installer(s), but they do not contain enough to reproduce the crash. (You
> need some of the game files).
The Core.so binary seems to export the __dynamic_cast symbol. It suggests that
it has been statically linked with some old libstdc++ library that is
incompatible with the most recent one.
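One way to test this theory from inside a process is to ask the dynamic linker which loaded object provides a given symbol. The sketch below uses glibc's dlsym(RTLD_DEFAULT, ...) and dladdr(); provider_of() is a hypothetical helper name. In an ordinary C++ program it should name the system libstdc++.so.6. If the theory holds, the same query inside the UT99 process would be expected to name Core.so, since a global lookup follows roughly the search order the dynamic linker uses when it binds libstdc++'s call to __dynamic_cast. That expectation is an assumption, not something verified in this thread.

```cpp
#include <cassert>
#include <dlfcn.h>  // dlsym, dladdr; RTLD_DEFAULT and dladdr need _GNU_SOURCE (g++ defines it by default)
#include <string>

// Hypothetical diagnostic: returns the file name of the loaded object whose
// definition of `symbol` a global (RTLD_DEFAULT) lookup finds first,
// or an empty string if no loaded object defines it.
std::string provider_of(const char* symbol) {
    void* addr = dlsym(RTLD_DEFAULT, symbol);
    if (addr == nullptr) return std::string();
    Dl_info info{};
    if (dladdr(addr, &info) == 0 || info.dli_fname == nullptr) return std::string();
    return std::string(info.dli_fname);
}
```

On glibc older than 2.34 the program has to be linked with -ldl.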

It might be impossible to run it correctly with any library written in C++.
Mesa and LLVM usually avoid using RTTI, so the <iostream> might be the only
thing that struggles. However, that is still a work-around. Some other driver
might still not work correctly.

I am not sure if removing the iostream sub-library usage from mesa is
acceptable in general.
--
b***@freedesktop.org
2018-12-08 22:41:05 UTC

--- Comment #10 from ***@yahoo.com ---
(In reply to Gustaw Smolarczyk from comment #9)
> The Core.so binary seems to export the __dynamic_cast symbol. It suggests
> that it has been statically linked with some old libstdc++ library that is
> incompatible with the most recent one.
You solved the mystery!

It makes sense, since this is the last function called in the disassembly.

I can confirm that the issue does go away after changing the string
"__dynamic_cast" to "__dynamicZcast" in Core.so.

Can you recommend a cleaner way to remove it?
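For readers less familiar with the C++ side, a simplified sketch of why a has_facet() check ends up calling __dynamic_cast. It is modeled on libstdc++'s has_facet of that era, which, with RTTI enabled, bounds-checks the facet's index in the locale's facet table and returns whether a dynamic_cast of that slot succeeds; the Facet types, the vector-based table and has_facet_like() are hypothetical simplifications, not libstdc++'s real data structures. Returning "is the cast result non-null" would also fit the test/setne sequence right after the call in comment #3's disassembly.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-ins for locale facets; they only need to be polymorphic
// so that dynamic_cast can inspect their runtime type.
struct Facet { virtual ~Facet() = default; };
struct CtypeFacet : Facet {};
struct NumpunctFacet : Facet {};

// Shape of has_facet<F>() with RTTI enabled: bounds-check the facet id, then
// dynamic_cast the stored pointer. A downcast to a non-final class like this
// is normally compiled into a call to the ABI runtime function __dynamic_cast,
// so an incompatible copy of that function breaks even this small check.
template <typename F>
bool has_facet_like(const std::vector<const Facet*>& table, std::size_t id) {
    return id < table.size() && dynamic_cast<const F*>(table[id]) != nullptr;
}
```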
> It might be impossible to run it correctly with any library written in C++.
> Mesa and LLVM usually avoid using RTTI, so the <iostream> might be the only
> thing that struggles. However, that is still a work-around. Some other
> driver might still not work correctly.
How about versioning __dynamic_cast?
If it behaves differently in different versions...

Another solution might be linking the Mesa plugins statically against g++'s
libstdc++. I'm not sure if this is supported atm. It would have been very useful
back when Steam used an older version.
> I am not sure if removing the iostream sub-library usage from mesa is
> acceptable in general.
Mesa3D is mostly written in C. There are a few parts in C++, and it seems that
the files I've patched are the only ones using "iostream". Since iostream is used
only for debugging, it is feasible to disable it in release builds.
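A minimal sketch of keeping debug-only stream output out of release builds, along the lines of the NDEBUG approach from comment #6: <iostream> is included only when NDEBUG is not defined. debug_dump() and kDebugOutput are hypothetical names standing in for helpers such as dump_instruction(); this is not Mesa's actual code.

```cpp
#include <cassert>
#include <string>

#ifdef NDEBUG
// Release build: no stream output, so <iostream> is never included here.
constexpr bool kDebugOutput = false;
#else
#include <iostream>
constexpr bool kDebugOutput = true;
#endif

// Hypothetical debug hook; returns whether anything was written so callers
// (and tests) can tell which variant was compiled.
inline bool debug_dump(const std::string& what) {
#ifdef NDEBUG
    (void)what;
    return false;
#else
    std::cerr << what << '\n';
    return true;
#endif
}
```

Defining NDEBUG also turns assert() into a no-op, which is the usual release-build behavior anyway.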

But as I've said before, I don't know what LLVM compiled with the latest
libstdc++ would do.
--
You are receiving this mail because:
You are the assignee for the bug.
You are the QA Contact for the bug.
b***@freedesktop.org
2018-12-08 23:07:20 UTC

--- Comment #11 from ***@yahoo.com ---
(In reply to iive from comment #10)
> I can confirm that the issue does go away after changing the string
> "__dynamic_cast" to "__dynamicZcast" in Core.so .
That doesn't work. It is enough for the game to load, render the intro and let
you in the menu. When you start an actual game however, the game exits with
"undefined symbol" error.
--
b***@freedesktop.org
2018-12-09 01:51:55 UTC

--- Comment #12 from Gustaw Smolarczyk <***@gmail.com> ---
(In reply to iive from comment #10)
> You solved the mystery!
> It makes sense since this is the last called function in the disassembly.
> I can confirm that the issue does go away after changing the string
> "__dynamic_cast" to "__dynamicZcast" in Core.so .
> Can you recommend a more clean way to remove that?
As you have already found, that doesn't work unless the symbol is unused.

You could try patching all of the binaries that reference __dynamic_cast, but I
can't promise it would work correctly in the end.
> How about versioning __dynamic_cast?
> If it behaves differently in different version...
That's a question for libstdc++ developers. It's possible that UT was
statically linked against libstdc++.so.5 which is completely incompatible with
libstdc++.so.6 that is used today. If it wasn't, that might imply there is some
kind of a bug in compatibility between different libstdc++.so.6 versions. Also,
I recall there being strange stuff going on with RTTI and static libstdc++
(like dynamic casts not working correctly across libraries), though I don't
think they would end up with a crash...
> Another solution might be linking Mesa plugins statically to the g++
> libstdc++. I'm not sure if this is supported atm. It would have been very
> useful when Steam used older version.
Right, but I don't think it's currently supported. It would increase the disk
and memory usage for everything that uses mesa. You would also need to do the
same for LLVM, unless you use a driver that doesn't need it (like i965).

Maybe just adding -static-libstdc++ to the linker options would suffice. It is
currently used for the SCons build on Windows.
> Mesa3D is mostly written in C. There are few parts in C++ and it seems that
> the files I've patched are the only ones using "iostream". Since iostream is
> used only for debugging, it is feasible to disable it on release builds.
<iostream> might not be the only include that you need to be wary about. It
might be that any ios thing is dangerous, like fstream.

Making mesa compatible with applications that link statically against any
libstdc++ might be desirable, but that needs to be discussed. You might want to
send your patch to the mailing list [1] if you want to trigger the discussion.
> But as I've said before, I don't know what LLVM compiled with latest
> libstdc++ would do.
I believe recent libstdc++ versions are compatible with each other. Moreover, I
think LLVM is 99% of the time linked against /usr/lib/libstdc++.so.6, so it
will use the most recent version all the time, even if it was compiled with an
older version.

On a slightly unrelated topic, UT99 seems to work fine (at least for me) while
run on wine (or Steam's proton). You might want to try this path as a
work-around.

[1] https://www.mesa3d.org/submittingpatches.html
--
b***@freedesktop.org
2018-12-09 09:03:01 UTC

--- Comment #13 from ***@rkmail.ru ---
(In reply to Gustaw Smolarczyk from comment #12)
> On a slightly unrelated topic, UT99 seems to work fine (at least for me)
> while run on wine (or Steam's proton). You might want to try this path as a
> work-around.
On my system, UT stops working in Wine after you run it a few times, until the
next reboot. This seems to be unrelated to Mesa, as it behaves like this with any
renderer, including the software one and the third-party D3D9 one paired with
Nine. (Just a side note: I don't really care about playing it, but I think that
checking whether older native programs still work is generally a good idea.)
--