Discussion:
Allocator Nouveau driver, Mesa EXT_external_objects, and DRM metadata import interfaces
Miguel Angel Vico
2017-12-20 16:51:51 UTC
Hi all,

As many of you already know, I've been working with James Jones on the
Generic Device Allocator project lately. He started a discussion thread
some weeks ago seeking feedback on the current prototype of the library
and advice on how to move all this forward, from a prototype stage to
production. For further reference, see:

https://lists.freedesktop.org/archives/mesa-dev/2017-November/177632.html

From the thread above, we came up with very interesting high-level
design ideas for one of the currently missing parts of the library:
usage transitions. That's something I'll personally work on over the
coming weeks.


In the meantime, I've been working on putting together an open source
implementation of the allocator mechanisms on top of the Nouveau
driver, for everyone to play with.

Below I'm seeking feedback on a bunch of changes I had to make to
different components of the graphics stack:

** Allocator **

An allocator driver implementation on top of Nouveau. The current
implementation only handles pitch linear layouts, but that's enough
to have the kmscube port working using the allocator and Nouveau
drivers.

You can pull these changes from

https://github.com/mvicomoya/allocator/tree/wip/mvicomoya/nouveau-driver

** Mesa **

James's kmscube port to use the allocator relies on the
EXT_external_objects extension to import allocator allocations to
OpenGL as a texture object. However, the Nouveau implementation of
these mechanisms is missing in Mesa, so I went ahead and added them.

You can pull these changes from

https://github.com/mvicomoya/mesa/tree/wip/mvicomoya/EXT_external_objects-nouveau

Also, James's kmscube port uses the NVX_unix_allocator_import
extension to attach allocator metadata to texture objects so the
driver knows how to deal with the imported memory.

Note that there isn't a formal spec for this extension yet. For now,
it just serves as an experimental mechanism to import allocator
memory in OpenGL, and attach metadata to texture objects.

You can pull these changes (written on top of the above) from:

https://github.com/mvicomoya/mesa/tree/wip/mvicomoya/NVX_unix_allocator_import

** kmscube **

Mostly minor fixes and improvements on top of James's port to use the
allocator. The main change is that the allocator initialization path
falls back to EGL_MESA_platform_surfaceless if the EGLDevice platform
isn't supported by the underlying EGL implementation.

You can pull these changes from:

https://github.com/mvicomoya/kmscube/tree/wip/mvicomoya/allocator-nouveau


With all the above you should be able to get kmscube working using the
allocator on top of the Nouveau driver.


Another of the missing pieces before we can move this to production is
importing allocations to DRM FB objects. This is probably one of the
most sensitive parts of the project as it requires modification/addition
of kernel driver interfaces.

At XDC2017, James had hallway conversations about this with several
people, all holding different opinions. I'd like to take this
opportunity to also start a discussion about the best way to create a
path to get allocator allocations added as DRM FB objects.

These are the few options we've considered to start with:

A) Have vendor-private ioctls to set properties on GEM objects that
are inherited by the FB objects. This is how our (NVIDIA) desktop
DRM driver currently works. This would require every vendor to add
their own ioctl to process allocator metadata, but the metadata is
actually a vendor-agnostic object, much like DRM modifiers. We'd
like to come up with a vendor-agnostic solution that can be
integrated into core DRM.

B) Add a new drmModeAddFBWithMetadata() command that takes allocator
metadata blobs for each plane of the FB. Some people in the
community have mentioned this is their preferred design. This,
however, means we'd have to go through the exercise of adding
another metadata mechanism to the whole graphics stack.

C) Shove allocator metadata into DRM by defining it to be a separate
plane in the image, and using the existing DRM modifiers mechanism
to indicate there is another plane for each "real" plane added. It
isn't clear how this scales to surfaces that already need several
planes, but some people see this as the only way forward. Also, we
would have to create a separate GEM buffer for the metadata itself,
which seems excessive.

We personally like option (B) best, and have already started to
prototype the new path (which is actually very similar to the
drmModeAddFB2() one). You can take a look at the new interfaces here:

https://github.com/mvicomoya/linux/tree/wip/mvicomoya/drm_addfb_with_metadata__4.14-rc8

There may be other options that haven't been explored yet that could be
a better choice than the above, so any suggestion will be greatly
appreciated.


Thanks,
--
Miguel
Daniel Vetter
2017-12-20 19:51:15 UTC
Since this also involves the kernel let's add dri-devel ...
Post by Miguel Angel Vico
[full quote of the original message snipped]
What kind of metadata are we talking about here? Addfb has tons of
stuff already that's "metadata". The only thing I've spotted is
PITCH_ALIGNMENT, which is maybe something we want drm drivers to tell
userspace, but definitely not something addfb ever needs. addfb only
needs the resulting pitch that we actually allocated (and might decide
it doesn't like that, but that's a different issue).

And since there's no patches for nouveau itself I can't really say
anything beyond that.
-Daniel
--
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch
Kristian Høgsberg
2017-12-20 19:54:10 UTC
Post by Daniel Vetter
Since this also involves the kernel let's add dri-devel ...
Post by Miguel Angel Vico
[full quote of the original message snipped]
What kind of metadata are we talking about here? Addfb has tons of
stuff already that's "metadata". The only thing I've spotted is
PITCH_ALIGNMENT, which is maybe something we want drm drivers to tell
userspace, but definitely not something addfb ever needs. addfb only
needs the resulting pitch that we actually allocated (and might decide
it doesn't like that, but that's a different issue).
And since there's no patches for nouveau itself I can't really say
anything beyond that.
I'd like to see concrete examples of actual display controllers
supporting more format layouts than what can be specified with a 64
bit modifier.

Kristian
Miguel Angel Vico
2017-12-20 20:41:34 UTC
Inline.

On Wed, 20 Dec 2017 11:54:10 -0800
Post by Kristian Høgsberg
Post by Daniel Vetter
Since this also involves the kernel let's add dri-devel ...
Yeah, I forgot. Thanks Daniel!
Post by Kristian Høgsberg
Post by Daniel Vetter
Post by Miguel Angel Vico
[full quote of the original message snipped]
What kind of metadata are we talking about here? Addfb has tons of
stuff already that's "metadata". The only thing I've spotted is
PITCH_ALIGNMENT, which is maybe something we want drm drivers to tell
userspace, but definitely not something addfb ever needs. addfb only
needs the resulting pitch that we actually allocated (and might decide
it doesn't like that, but that's a different issue).
Sorry I failed to make it clearer. Metadata here refers to all
allocation parameters the generic allocator was given to allocate
memory. That currently means the final capability set used for
the allocation, including all constraints (such as memory alignment,
pitch alignment, and others) and capabilities, describing allocation
properties like tiling formats, compression, and such.
Post by Kristian Høgsberg
Post by Daniel Vetter
And since there's no patches for nouveau itself I can't really say
anything beyond that.
I can work on implementing these interfaces for nouveau, maybe
partially, if that's going to help. I just thought it'd be better to
first start a discussion on what would be the right way to pass
allocator metadata to display drivers before starting to seriously
implement any of the proposed options.
Post by Kristian Høgsberg
I'd like to see concrete examples of actual display controllers
supporting more format layouts than what can be specified with a 64
bit modifier.
The main problem is our tiling and other metadata parameters can't
generally fit in a modifier, so we find passing a blob of metadata a
more suitable mechanism.

Thanks.
--
Miguel
Kristian Kristensen
2017-12-20 23:22:06 UTC
Post by Miguel Angel Vico
[full quote of earlier messages snipped]
Sorry I failed to make it clearer. Metadata here refers to all
allocation parameters the generic allocator was given to allocate
memory. That currently means the final capability set used for
the allocation, including all constraints (such as memory alignment,
pitch alignment, and others) and capabilities, describing allocation
properties like tiling formats, compression, and such.
Post by Kristian Høgsberg
Post by Daniel Vetter
And since there's no patches for nouveau itself I can't really say
anything beyond that.
I can work on implementing these interfaces for nouveau, maybe
partially, if that's going to help. I just thought it'd be better to
first start a discussion on what would be the right way to pass
allocator metadata to display drivers before starting to seriously
implement any of the proposed options.
Post by Kristian Høgsberg
I'd like to see concrete examples of actual display controllers
supporting more format layouts than what can be specified with a 64
bit modifier.
The main problem is our tiling and other metadata parameters can't
generally fit in a modifier, so we find passing a blob of metadata a
more suitable mechanism.
I understand that you may have n knobs that configure your
tiling/swizzling for color buffers, totaling more than the available
56 bits. What I don't buy is that you need all those combinations when
passing buffers around between codecs, cameras, and display
controllers. Even if you're sharing between the same 3D drivers in
different processes, I expect just locking down, say, 64 different
combinations (you can add more over time) and assigning each a
modifier would be sufficient. I doubt you'd extract meaningful
performance gains from going all the way to a blob.

If you want us to redesign KMS and the rest of the ecosystem around
blobs instead of the modifiers that are now moderately pervasive, you
have to justify it a little better than just "we didn't find it
suitable".

Kristian
Ilia Mirkin
2017-12-21 01:05:34 UTC
On Wed, Dec 20, 2017 at 6:22 PM, Kristian Kristensen
Post by Kristian Kristensen
Post by Miguel Angel Vico
On Wed, 20 Dec 2017 11:54:10 -0800
Post by Kristian Høgsberg
I'd like to see concrete examples of actual display controllers
supporting more format layouts than what can be specified with a 64
bit modifier.
The main problem is our tiling and other metadata parameters can't
generally fit in a modifier, so we find passing a blob of metadata a
more suitable mechanism.
I understand that you may have n knobs that configure your
tiling/swizzling for color buffers, totaling more than the available
56 bits. What I don't
buy is that you need all those combinations when passing buffers around
between codecs, cameras and display controllers. Even if you're sharing
between the same 3D drivers in different processes, I expect just locking
down, say, 64 different combinations (you can add more over time) and
assigning each a modifier would be sufficient. I doubt you'd extract
meaningful performance gains from going all the way to a blob.
There's probably a world of stuff that we don't know about in nouveau,
but I have a hard time coming up with more than 64-bits worth of
tiling info for dGPU surfaces...

There's 8 bits (sorta, not fully populated, but might as well use
them) of "micro" tiling which is done at the PTE level by the memory
controller and includes compression settings, and then there's 4 bits
of tiling per dimension for macro blocks (which configures different
sizes for each dimension for tile sizes) -- that's only 20 bits. MSAA
level (which is part of the micro tiling setting usually, but may not
necessarily have to be) - another couple of bits, maybe something else
weird for another few bits. Anyways, this is *nowhere* close to 64
bits.

What am I missing?

-ilia
Daniel Vetter
2017-12-21 08:05:32 UTC
On Thu, Dec 21, 2017 at 12:22 AM, Kristian Kristensen
Post by Kristian Kristensen
Post by Miguel Angel Vico
Inline.
On Wed, 20 Dec 2017 11:54:10 -0800
Post by Kristian Høgsberg
Post by Daniel Vetter
Since this also involves the kernel let's add dri-devel ...
Yeah, I forgot. Thanks Daniel!
Post by Kristian Høgsberg
Post by Daniel Vetter
On Wed, Dec 20, 2017 at 5:51 PM, Miguel Angel Vico
Post by Miguel Angel Vico
Hi all,
As many of you already know, I've been working with James Jones on the
Generic Device Allocator project lately. He started a discussion thread
some weeks ago seeking feedback on the current prototype of the library
and advice on how to move all this forward, from a prototype stage to
https://lists.freedesktop.org/archives/mesa-dev/2017-November/177632.html
From the thread above, we came up with very interesting high level
design ideas for one of the currently missing parts in the library:
Usage transitions. That's something I'll personally work on during the
following weeks.
In the meantime, I've been working on putting together an open source
implementation of the allocator mechanisms using the Nouveau driver for
all to be able to play with.
Below I'm seeking feedback on a bunch of changes I had to make to
different components of the graphics stack:
** Allocator **
An allocator driver implementation on top of Nouveau. The current
implementation only handles pitch linear layouts, but that's enough
to have the kmscube port working using the allocator and Nouveau
drivers.
You can pull these changes from
https://github.com/mvicomoya/allocator/tree/wip/mvicomoya/nouveau-driver
** Mesa **
James's kmscube port to use the allocator relies on the
EXT_external_objects extension to import allocator allocations to
OpenGL as a texture object. However, the Nouveau implementation of
these mechanisms is missing in Mesa, so I went ahead and added them.
You can pull these changes from
https://github.com/mvicomoya/mesa/tree/wip/mvicomoya/EXT_external_objects-nouveau
Also, James's kmscube port uses the NVX_unix_allocator_import
extension to attach allocator metadata to texture objects so the
driver knows how to deal with the imported memory.
Note that there isn't a formal spec for this extension yet. For now,
it just serves as an experimental mechanism to import allocator
memory in OpenGL, and attach metadata to texture objects.
https://github.com/mvicomoya/mesa/tree/wip/mvicomoya/NVX_unix_allocator_import
** kmscube **
Mostly minor fixes and improvements on top of James's port to use the
allocator. Main thing is the allocator initialization path will use
EGL_MESA_platform_surfaceless if EGLDevice platform isn't supported
by the underlying EGL implementation.
https://github.com/mvicomoya/kmscube/tree/wip/mvicomoya/allocator-nouveau
With all the above you should be able to get kmscube working using the
allocator on top of the Nouveau driver.
Another of the missing pieces before we can move this to production is
importing allocations to DRM FB objects. This is probably one of the
most sensitive parts of the project as it requires
modification/addition
of kernel driver interfaces.
At XDC2017, James had several hallway conversations with several people
about this, all having different opinions. I'd like to take this
opportunity to also start a discussion about what's the best option to
create a path to get allocator allocations added as DRM FB objects.
A) Have vendor-private ioctls to set properties on GEM objects that
are inherited by the FB objects. This is how our (NVIDIA) desktop
DRM driver currently works. This would require every vendor to add
their own ioctl to process allocator metadata, but the metadata is
actually a vendor-agnostic object more like DRM modifiers. We'd
like to come up with a vendor-agnostic solution that can be
integrated to core DRM.
B) Add a new drmModeAddFBWithMetadata() command that takes allocator
metadata blobs for each plane of the FB. Some people in the
community have mentioned this is their preferred design. This,
however, means we'd have to go through the exercise of adding
another metadata mechanism to the whole graphics stack.
C) Shove allocator metadata into DRM by defining it to be a separate
plane in the image, and using the existing DRM modifiers mechanism
to indicate there is another plane for each "real" plane added. It
isn't clear how this scales to surfaces that already need several
planes, but there are some people that see this as the only way
forward. Also, we would have to create a separate GEM buffer for
the metadata itself, which seems excessive.
We personally like option (B) better, and have already started to
prototype the new path (which is actually very similar to the
https://github.com/mvicomoya/linux/tree/wip/mvicomoya/drm_addfb_with_metadata__4.14-rc8
There may be other options that haven't been explored yet that could be
a better choice than the above, so any suggestion will be greatly
appreciated.
What kind of metadata are we talking about here? Addfb has tons of
stuff already that's "metadata". The only thing I've spotted is
PITCH_ALIGNMENT, which is maybe something we want drm drivers to tell
userspace, but definitely not something addfb ever needs. addfb only
needs the resulting pitch that we actually allocated (and might decide
it doesn't like that, but that's a different issue).
Sorry I failed to make it clearer. Metadata here refers to all
allocation parameters the generic allocator was given to allocate
memory. That currently means the final capability set used for
the allocation, including all constraints (such as memory alignment,
pitch alignment, and others) and capabilities, describing allocation
properties like tiling formats, compression, and such.
Yeah, that part was all clear. I'd want more details of what exact
kind of metadata. fast-clear colors? tiling layouts? aux data for the
compressor? hiz (or whatever you folks call it) tree?

As you say, we've discussed massive amounts of different variants on
this, and there's different answers for different questions. Consensus
seems to be that bigger stuff (compression data, hiz, clear colors,
...) should be stored in aux planes, while the exact layout and what
kind of aux planes you have are encoded in the modifier.
Post by Kristian Kristensen
Post by Miguel Angel Vico
Post by Kristian Høgsberg
Post by Daniel Vetter
And since there's no patches for nouveau itself I can't really say
anything beyond that.
I can work on implementing these interfaces for nouveau, maybe
partially, if that's going to help. I just thought it'd be better to
first start a discussion on what would be the right way to pass
allocator metadata to display drivers before starting to seriously
implement any of the proposed options.
It's not so much wiring down the interfaces, but actually implementing
the features. "We need more than the 56bits of modifier" is a lot more
plausible when you have the full stack showing that you do actually
need it. Or well, not a full stack but at least a demo that shows what
you want to pull off but can't do right now.
Post by Kristian Kristensen
Post by Miguel Angel Vico
Post by Kristian Høgsberg
I'd like to see concrete examples of actual display controllers
supporting more format layouts than what can be specified with a 64
bit modifier.
The main problem is our tiling and other metadata parameters can't
generally fit in a modifier, so we find passing a blob of metadata a
more suitable mechanism.
I understand that you may have n knobs with a total of more than
56 bits that configure your tiling/swizzling for color buffers. What I don't
buy is that you need all those combinations when passing buffers around
between codecs, cameras and display controllers. Even if you're sharing
between the same 3D drivers in different processes, I expect just locking
down, say, 64 different combinations (you can add more over time) and
assigning each a modifier would be sufficient. I doubt you'd extract
meaningful performance gains from going all the way to a blob.
Tegra just redesigned its modifier space from an ungodly number of
bits to just a few layouts. Not even just the ones in use, but simply
limiting to the ones that make sense (there are dependencies, apparently).
Also note that the modifier alone doesn't need to describe the layout
precisely, it only makes sense together with a specific pixel format
and size. E.g. a bunch of the i915 layouts change layout depending
upon bpp.
Post by Kristian Kristensen
If you want us to redesign KMS and the rest of the ecosystem around blobs
instead of the modifiers that are now moderately pervasive, you have to
justify it a little better than just "we didn't find it suitable".
Given that this involves the kernel and hence the kernel's userspace
requirements for merging stuff (assuming of course you want to
establish this as an upstream interface), then I'd say a sufficient
demonstration would be actually running out of bits in nouveau
(kernel+mesa).
-Daniel
--
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch
Daniel Stone
2017-12-20 21:58:51 UTC
Hi Miguel,
Post by Miguel Angel Vico
In the meantime, I've been working on putting together an open source
implementation of the allocator mechanisms using the Nouveau driver for
all to be able to play with.
Thanks for taking a look at this! I'm still winding down my to-do list
for the year, but hoping to get to this more seriously soon.

As a general comment, now that modifiers are a first-class concept in
many places (KMS FBs, KMS plane format advertisement, V4L2 buffers,
EGL/Vulkan image import/export, Wayland buffer import, etc), I'd like
to see them included as a first-class concept in the allocator. I
understand one of the primary reservations against using them was that
QNX didn't have such a concept, but just specifying them to be ignored
on non-Linux platforms would probably work fine.
Post by Miguel Angel Vico
Another of the missing pieces before we can move this to production is
importing allocations to DRM FB objects. This is probably one of the
most sensitive parts of the project as it requires modification/addition
of kernel driver interfaces.
At XDC2017, James had several hallway conversations with several people
about this, all having different opinions. I'd like to take this
opportunity to also start a discussion about what's the best option to
create a path to get allocator allocations added as DRM FB objects.
A) Have vendor-private ioctls to set properties on GEM objects that
are inherited by the FB objects. This is how our (NVIDIA) desktop
DRM driver currently works. This would require every vendor to add
their own ioctl to process allocator metadata, but the metadata is
actually a vendor-agnostic object more like DRM modifiers. We'd
like to come up with a vendor-agnostic solution that can be
integrated to core DRM.
This worries me. If the data is static for the lifetime of the buffer
- describing the tiling layout, for instance - then it would form
effective ABI for all the consumers/producers using that buffer type.
If it is dynamic, you also have a world of synchronisation problems
when multiple users race each other with different uses of that buffer
(and presumably you would need to reload the metadata on every use?).
Either way, anyone using this would need to have a very well-developed
compatibility story, given that you can mix and match kernel and
userspace versions.
Post by Miguel Angel Vico
B) Add a new drmModeAddFBWithMetadata() command that takes allocator
metadata blobs for each plane of the FB. Some people in the
community have mentioned this is their preferred design. This,
however, means we'd have to go through the exercise of adding
another metadata mechanism to the whole graphics stack.
Similarly, this seems to be missing either a 'mandatory' flag so
userspace can inform the kernel it must fail if it does not understand
certain capabilities, or a way for the kernel to inform userspace
which capabilities it does/doesn't understand.

The capabilities in the example are also very oddly chosen. Address
alignment, pitch alignment, and maximum pitch are superfluous: the KMS
driver is the single source of truth for these values for FBs, so it
isn't useful for userspace to provide it. Specifically for pitch
alignment and maximum pitch, the pitch values are already given in the
same ioctl, so all you can check with these values (before the driver
does its own check again) is that userspace is self-consistent. These
three capabilities all relate to BO allocation rather than FB
creation: if a BO is rendered into with the wrong pitch, or allocated
at the wrong base address, we've already lost because the allocation
was incorrect.

Did you have some other capabilities in mind which would be more
relevant to FBs?
Post by Miguel Angel Vico
C) Shove allocator metadata into DRM by defining it to be a separate
plane in the image, and using the existing DRM modifiers mechanism
to indicate there is another plane for each "real" plane added. It
isn't clear how this scales to surfaces that already need several
planes, but there are some people that see this as the only way
forward. Also, we would have to create a separate GEM buffer for
the metadata itself, which seems excessive.
I also have my reservations about this one. The general idea behind
FBs is that, if the buffers are identical but for memory addresses and
pixel content, the parameters should be equal but the per-plane buffer
contents different. Conversely, if the buffers differ in any way but
the above, the parameters should be different. For instance, if
buffers have identical layouts (tiling/swizzling/compression),
identical pixel content once interpreted, but the only thing which
differs is the compression status (fully resolved / not resolved), I
would expect to see identical parameters and differing data in the
auxiliary compression plane. We had quite a long discussion for
framebuffer compression on Intel where we shot down the concept of
expressing compression status as a plane property for basically this
reason.

Drivers which were created in the DRI2 era, where the only metadata
transited was handle/pitch/format, worked around that by adding
auxiliary data hanging off the buffer to describe their actual layout,
but this is a mistake we're trying to move away from. So I reflexively
get a bit itchy when I see the kernel being used to transit magic
blobs of data which are supplied by userspace, and only interpreted by
different userspace. Having tiling formats hidden away means that
we've had real-world bugs in AMD hardware, where we end up displaying
garbage because we cannot generically reason about the buffer
attributes.

As with Kristian, I'd also like to hear any examples of metadata which
wouldn't fit inside 56 bits. A quick finger count says that if you
have 128 different possibilities for all of: tiling layout, micro-tile
size, macro-tile size, supermacro-tile size, swizzling/addressing
mode, and compression, this only uses 42 of the 56 bits available to
you, still leaving two free 128-value axes. Is your concern about the
lack of space along these axes I've identified, or that you need more
axes, or ... ?

Cheers,
Daniel
James Jones
2017-12-21 08:06:36 UTC
Post by Daniel Stone
Hi Miguel,
Post by Miguel Angel Vico
In the meantime, I've been working on putting together an open source
implementation of the allocator mechanisms using the Nouveau driver for
all to be able to play with.
Thanks for taking a look at this! I'm still winding out my to-do list
for the year, but hoping to get to this more seriously soon.
As a general comment, now that modifiers are a first-class concept in
many places (KMS FBs, KMS plane format advertisement, V4L2 buffers,
EGL/Vulkan image import/export, Wayland buffer import, etc), I'd like
to see them included as a first-class concept in the allocator. I
understand one of the primary reservations against using them was that
QNX didn't have such a concept, but just specifying them to be ignored
on non-Linux platforms would probably work fine.
The allocator mechanisms and format modifiers are orthogonal though.
Either capability sets can be represented using format modifiers (the
direction one part of this thread is suggesting, which I think is a bad
idea), or format modifiers could easily be included as a vendor-agnostic
capability, similar to pitch layout. There are no "first-class
citizens" in the allocator mechanism itself. That's the whole idea:
apps don't need to care about things like how the OS represents its
surface metadata beyond some truly universal things like width and
height (assertions). The rest is abstracted away such that the apps are
portable, even if the drivers/backends aren't. Even if the solution
within Linux is "just use format modifiers", there's still some benefit
to making the kernel ABI use something slightly higher level that
translates to DRM format modifiers inside the kernel, just to keep the
apps OS-agnostic.
Post by Daniel Stone
Post by Miguel Angel Vico
Another of the missing pieces before we can move this to production is
importing allocations to DRM FB objects. This is probably one of the
most sensitive parts of the project as it requires modification/addition
of kernel driver interfaces.
At XDC2017, James had several hallway conversations with several people
about this, all having different opinions. I'd like to take this
opportunity to also start a discussion about what's the best option to
create a path to get allocator allocations added as DRM FB objects.
A) Have vendor-private ioctls to set properties on GEM objects that
are inherited by the FB objects. This is how our (NVIDIA) desktop
DRM driver currently works. This would require every vendor to add
their own ioctl to process allocator metadata, but the metadata is
actually a vendor-agnostic object more like DRM modifiers. We'd
like to come up with a vendor-agnostic solution that can be
integrated to core DRM.
This worries me. If the data is static for the lifetime of the buffer
- describing the tiling layout, for instance - then it would form
effective ABI for all the consumers/producers using that buffer type.
If it is dynamic, you also have a world of synchronisation problems
when multiple users race each other with different uses of that buffer
(and presumably you would need to reload the metadata on every use?).
Either way, anyone using this would need to have a very well-developed
compatibility story, given that you can mix and match kernel and
userspace versions.
I think the metadata is static. The surface meta-state is not, but that
would be a commit time thing if anything, not a GEM or FB object thing.
Still, attaching metadata to GEM objects, which seem to be opaque blobs
of memory in the general case, rather than attaching it to FBs mapped
onto the GEM objects, always felt architecturally wrong to me. You can
have multiple FBs in one GEM object, for example. There's no reason to
assume they would share the same format let alone tiling layout.
Post by Daniel Stone
Post by Miguel Angel Vico
B) Add a new drmModeAddFBWithMetadata() command that takes allocator
metadata blobs for each plane of the FB. Some people in the
community have mentioned this is their preferred design. This,
however, means we'd have to go through the exercise of adding
another metadata mechanism to the whole graphics stack.
Similarly, this seems to be missing either a 'mandatory' flag so
userspace can inform the kernel it must fail if it does not understand
certain capabilities, or a way for the kernel to inform userspace
which capabilities it does/doesn't understand.
I think that will fall out of the discussion over exactly what
capability sets look like. Regardless, yes, the kernel must fail if it
can't support a given capability set, just as it would fail if it
couldn't support a given DRM Format modifier. Like the format
modifiers, the userspace allocator driver would have queried the DRM
kernel driver when reporting supported capability sets for a usage that
required creating FBs, so it would always be user error to reach such a
state. There would be no ambiguity as to whether a given set or
individual capability was supported by the kernel driver at FB creation
time.
Post by Daniel Stone
The capabilities in the example are also very oddly chosen. Address
alignment, pitch alignment, and maximum pitch are superfluous: the KMS
driver is the single source of truth for these values for FBs, so it
isn't useful for userspace to provide it. Specifically for pitch
alignment and maximum pitch, the pitch values are already given in the
same ioctl, so all you can check with these values (before the driver
does its own check again) is that userspace is self-consistent. These
three capabilities all relate to BO allocation rather than FB
creation: if a BO is rendered into with the wrong pitch, or allocated
at the wrong base address, we've already lost because the allocation
was incorrect.
Did you have some other capabilities in mind which would be more
relevant to FBs?
We should probably create an example using some vendor-specific
capabilities. Tiling parameters are the quintessential vendor-specific
example.
Post by Daniel Stone
Post by Miguel Angel Vico
C) Shove allocator metadata into DRM by defining it to be a separate
plane in the image, and using the existing DRM modifiers mechanism
to indicate there is another plane for each "real" plane added. It
isn't clear how this scales to surfaces that already need several
planes, but there are some people that see this as the only way
forward. Also, we would have to create a separate GEM buffer for
the metadata itself, which seems excessive.
I also have my reservations about this one. The general idea behind
FBs is that, if the buffers are identical but for memory addresses and
pixel content, the parameters should be equal but the per-plane buffer
contents different. Conversely, if the buffers differ in any way but
the above, the parameters should be different. For instance, if
buffers have identical layouts (tiling/swizzling/compression),
identical pixel content once interpreted, but the only thing which
differs is the compression status (fully resolved / not resolved), I
would expect to see identical parameters and differing data in the
auxiliary compression plane. We had quite a long discussion for
framebuffer compression on Intel where we shot down the concept of
expressing compression status as a plane property for basically this
reason.
Actual compression data is read by the GPU though. Specifying metadata
(i.e., a few bits saying whether there is a compression plane and, if
so, what its layout is) only accessed by kernel drivers and userspace
drivers as an FB plane seems to be stretching the abstraction a bit.
That means you probably have to create a CPU-only GEM buffer to put it
in. Not impossible, just a lot of busy work, and potentially bug-prone:
If previously your kernel driver only supported creating GEM buffers
that were HW accessible, you have to sprinkle a bunch of "But not this
type of GEM buffer" in all your validation code for HW operations on GEM
buffers.
Post by Daniel Stone
Drivers which were created in the DRI2 era where the only metadata
transited was handle/pitch/format, worked around that by adding
auxiliary data hanging off the buffer to describe their actual layout,
but this is a mistake we're trying to move away from. So I reflexively
get a bit itchy when I see the kernel being used to transit magic
blobs of data which are supplied by userspace, and only interpreted by
different userspace. Having tiling formats hidden away means that
we've had real-world bugs in AMD hardware, where we end up displaying
garbage because we cannot generically reason about the buffer
attributes.
Agreed. For those not clear, and to verify my own understanding, the
above paragraph is basically an argument similar to my own above against
using Miguel's option (A), correct?
Post by Daniel Stone
As with Kristian, I'd also like to hear any examples of metadata which
wouldn't fit inside 56 bits. A quick finger count says that if you
have 128 different possibilities for all of: tiling layout, micro-tile
size, macro-tile size, supermacro-tile size, swizzling/addressing
mode, and compression, this only uses 42 of the 56 bits available to
you, still leaving two free 128-value axes. Is your concern about the
lack of space along these axes I've identified, or that you need more
axes, or ... ?
Your worst-case analysis above isn't far off from our HW, give or take
some bits and axes here and there. We've started an internal discussion
about how to lay out all the bits we need. It's hard to even enumerate
them all without having a complete understanding of what capability sets
are going to include, a fully-optimized implementation of the mechanism
on our HW, and lots of test scenarios, though.

However, making some assumptions, I suspect it's probably going to come
down to yes we can fit what we need in some number of bits marginally
less than 56 now, with the current use cases and hardware, but we're
very concerned about extensibility given the number has only ever grown
in our HW, is uncomfortably close to the limit if it isn't over it
already, and it's been demonstrated it takes a monumental effort to
change the mechanism if it isn't extensible. While it's hard to change
the mechanism one more time now, better to change it to something truly
extensible now because it will be much, much harder to make such a
change ~5 years from now in a world where it's baked into pervasively
deployed Wayland and X protocol, the EGL and Vulkan extensions have been
defined for a few years and in use by apps besides Wayland, and the
allocator stuff is deployed on ~5 operating systems that have some
derivative version of DRM modifiers to support it and a bunch of funky
embedded apps using it. Further, we're volunteering to handle the bulk
of the effort needed to make the change now, so I hope architectural
correctness and maintainability can be the primary points of debate.

Thanks,
-James
Post by Daniel Stone
Cheers,
Daniel
Daniel Vetter
2017-12-21 08:36:21 UTC
However, making some assumptions, I suspect it's probably going to come down
to yes we can fit what we need in some number of bits marginally less than
56 now, with the current use cases and hardware, but we're very concerned
about extensibility given the number has only ever grown in our HW, is
uncomfortably close to the limit if it isn't over it already, and it's been
demonstrated it takes a monumental effort to change the mechanism if it
isn't extensible. While it's hard to change the mechanism one more time
now, better to change it to something truly extensible now because it will
be much, much harder to make such a change ~5 years from now in a world
where it's baked into pervasively deployed Wayland and X protocol, the EGL
and Vulkan extensions have been defined for a few years and in use by apps
besides Wayland, and the allocator stuff is deployed on ~5 operating systems
that have some derivative version of DRM modifiers to support it and a bunch
of funky embedded apps using it. Further, we're volunteering to handle the
bulk of the effort needed to make the change now, so I hope architectural
correctness and maintainability can be the primary points of debate.
I think that's already happened. So no matter what we do, we're going
to live with an ecosystem that uses modifiers all over the place in 5
years. Even if it's not fully pervasive we will have to keep the
support around for 10 years (at least on the kernel side).

So the option is between revving the entire ecosystem now, or revving it
in a few years when the current scheme has run out of steam for good.
And I much prefer the 2nd option for the simple reason that by then
the magic 8ball has gained another 5 years of clarity for looking into
the future.

I think in the interim figuring out how to expose kms capabilities
better (and necessarily standardizing at least some of them which
matter at the compositor level, like size limits of framebuffers)
feels like the place to push the ecosystem forward. In some way
Miguel's proposal looks a bit backwards, since it adds the pitch
capabilities to addfb, but at addfb time you've allocated everything
already, so way too late to fix things up. With modifiers we've added
a very simple per-plane property to list which modifiers can be
combined with which pixel formats. Tiny start, but obviously very far
from all that we'll need.
-Daniel
--
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch
Kristian Kristensen
2017-12-21 17:47:39 UTC
Post by James Jones
Post by James Jones
However, making some assumptions, I suspect it's probably going to come
down to yes we can fit what we need in some number of bits marginally
less than 56 now, with the current use cases and hardware, but we're
very concerned about extensibility given the number has only ever grown
in our HW, is uncomfortably close to the limit if it isn't over it
already, and it's been demonstrated it takes a monumental effort to
change the mechanism if it isn't extensible. While it's hard to change
the mechanism one more time now, better to change it to something truly
extensible now because it will be much, much harder to make such a
change ~5 years from now in a world where it's baked into pervasively
deployed Wayland and X protocol, the EGL and Vulkan extensions have been
defined for a few years and in use by apps besides Wayland, and the
allocator stuff is deployed on ~5 operating systems that have some
derivative version of DRM modifiers to support it and a bunch of funky
embedded apps using it. Further, we're volunteering to handle the bulk
of the effort needed to make the change now, so I hope architectural
correctness and maintainability can be the primary points of debate.
I think that's already happened. So no matter what we do, we're going
to live with an ecosystem that uses modifiers all over the place in 5
years. Even if it's not fully pervasive we will have to keep the
support around for 10 years (at least on the kernel side).
So the option is between reving the entire ecosystem now, or reving it
in a few years when the current scheme has run out of steam for good.
And I much prefer the 2nd option for the simple reason that by then
the magic 8ball has gained another 5 years of clarity for looking into
the future.
I think in the interim figuring out how to expose kms capabilities
better (and necessarily standardizing at least some of them which
matter at the compositor level, like size limits of framebuffers)
feels like the place to push the ecosystem forward.
I agree and let me elaborate a bit. The problem we're seeing isn't that we
need more than 2^56 modifiers for a future GPU. The problem is that flags
like USE_SCANOUT (which your allocator proposal essentially keeps) are
inadequate. The available tiling and compression formats vary with which
(in KMS terms) CRTC you want to use, which plane you're on, whether you
want rotation or not, how much you want to scale, etc. It's not realistic
to think that we could model this in a centralized allocator library
that's detached from the display driver. To be fair, this is not a point
about blobs vs modifiers; it's saying that the use flags don't belong in
the allocator, they belong in the APIs that will be using the buffer - and
not as literal use flags, but as a way to discover supported modifiers for
a given use case.

Kristian
Post by Daniel Vetter
In some way
Miguel's proposal looks a bit backwards, since it adds the pitch
capabilities to addfb, but at addfb time you've allocated everything
already, so way too late to fix things up. With modifiers we've added
a very simple per-plane property to list which modifiers can be
combined with which pixel formats. Tiny start, but obviously very far
from all that we'll need.
-Daniel
--
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch
Rob Clark
2017-12-21 22:34:44 UTC
Post by James Jones
However, making some assumptions, I suspect it's probably going to come down
to yes we can fit what we need in some number of bits marginally less than
56 now, with the current use cases and hardware, but we're very concerned
about extensibility given the number has only ever grown in our HW, is
uncomfortably close to the limit if it isn't over it already, and it's been
demonstrated it takes a monumental effort to change the mechanism if it
isn't extensible. While it's hard to change the mechanism one more time
now, better to change it to something truly extensible now because it will
be much, much harder to make such a change ~5 years from now in a world
where it's baked in to pervasively deployed Wayland and X protocol, the EGL
and Vulkan extensions have been defined for a few years and in use by apps
besides Wayland, and the allocator stuff is deployed on ~5 operating systems
that have some derivative version of DRM modifiers to support it and a bunch
of funky embedded apps using it. Further, we're volunteering to handle the
bulk of the effort needed to make the change now, so I hope architectural
correctness and maintainability can be the primary points of debate.
Post by Daniel Vetter
I think that's already happened. So no matter what we do, we're going
to live with an ecosystem that uses modifiers all over the place in 5
years. Even if it's not fully pervasive we will have to keep the
support around for 10 years (at least on the kernel side).
So the option is between revving the entire ecosystem now, or revving it
in a few years when the current scheme has run out of steam for good.
And I much prefer the 2nd option for the simple reason that by then
the magic 8ball has gained another 5 years of clarity for looking into
the future.
Drive by comment (and disclaimer, haven't had chance to read rest of
thread yet), but I think there is a reasonable path to increase the
modifier space to something like 2^(56*4) (minus the cases where
modifiers[0]==modifiers[1]==modifiers[2]==modifiers[3]).. (Yeah, yeah,
I'm sure there is a 640k should be enough joke here somewhere)

Fortunately, the modifiers array is currently at the end of 'struct
drm_mode_fb_cmd2', so there may be some other options to extend it as
well. Possibly reserving the modifier value ~0 now might be a good
idea.

It does seem like, if possible, starting out with modifiers for now at
the kernel interface would make life easier, vs trying to reinvent
both kernel and userspace APIs at the same time. Userspace APIs are
easier to change or throw away. Presumably by the time we get to the
point of changing kernel uabi, we are already using, and pretty happy
with, serialized liballoc data over the wire in userspace so it is
only a matter of changing the kernel interface.

The downside of this is needing a per-driver userspace bit to map
liballoc to modifiers. We kinda have this already in mesa, even for
the modesetting-only drivers that can be paired with a render-only
driver.

BR,
-R
Post by Daniel Vetter
I think in the interim figuring out how to expose kms capabilities
better (and necessarily standardizing at least some of them which
matter at the compositor level, like size limits of framebuffers)
feels like the place to push the ecosystem forward. In some way
Miguel's proposal looks a bit backwards, since it adds the pitch
capabilities to addfb, but at addfb time you've allocated everything
already, so way too late to fix things up. With modifiers we've added
a very simple per-plane property to list which modifiers can be
combined with which pixel formats. Tiny start, but obviously very far
from all that we'll need.
-Daniel
--
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch
_______________________________________________
mesa-dev mailing list
https://lists.freedesktop.org/mailman/listinfo/mesa-dev
Miguel Angel Vico
2017-12-28 18:24:38 UTC
(Adding dri-devel back, and trying to respond to some comments from
the different forks)
Post by James Jones
Your worst case analysis above isn't far off from our HW, give or take
some bits and axes here and there. We've started an internal discussion
about how to lay out all the bits we need. It's hard to even enumerate
them all without having a complete understanding of what capability sets
are going to include, a fully-optimized implementation of the mechanism
on our HW, and lots of test scenarios, though.
(thanks James for most of the info below)

To elaborate a bit, if we want to share an allocation across GPUs for 3D
rendering, it seems we would need 12 bits to express our
swizzling/tiling memory layouts for Fermi+. In addition to that,
Maxwell uses 3 more bits for this, and we need an extra bit to identify
pre-Fermi representations.

We also need one bit to differentiate between Tegra and desktop, and
another one to indicate whether the layout is otherwise linear.

Then things like whether compression is used (one more bit), and we can
probably get by with 3 bits for the type of compression if we are
creative. However, it'd be way easier to just track arch + page kind,
which would be like 32 bits on its own.

Whether Z-culling and/or zero-bandwidth-clears are used may be another 3
bits.

If device-local properties are included, we might need a couple more
bits for caching.

We may also need to express locality information, which may take at
least another 2 or 3 bits.

If we want to share array textures too, we also need to pass the array
pitch. Is it supposed to be encoded in a modifier too? That's 64 bits on
its own.

So yes, as James mentioned, with some effort, we could technically fit
our current allocation parameters in a modifier, but I'm still not
convinced this is as future proof as it could be as our hardware grows
in capabilities.
Post by James Jones
So I reflexively
get a bit itchy when I see the kernel being used to transit magic
blobs of data which are supplied by userspace, and only interpreted by
different userspace. Having tiling formats hidden away means that
we've had real-world bugs in AMD hardware, where we end up displaying
garbage because we cannot generically reason about the buffer
attributes.
I'm a bit confused. Can't modifiers be specified by vendors and only
interpreted by drivers? My understanding was that modifiers could
actually be treated as opaque 64-bit data, in which case they would
qualify as "magic blobs of data". Otherwise, it seems this wouldn't be
scalable. What am I missing?
Post by Daniel Vetter
I think in the interim figuring out how to expose kms capabilities
better (and necessarily standardizing at least some of them which
matter at the compositor level, like size limits of framebuffers)
feels like the place to push the ecosystem forward. In some way
Miguel's proposal looks a bit backwards, since it adds the pitch
capabilities to addfb, but at addfb time you've allocated everything
already, so way too late to fix things up. With modifiers we've added
a very simple per-plane property to list which modifiers can be
combined with which pixel formats. Tiny start, but obviously very far
from all that we'll need.
Not sure whether I might be misunderstanding your statement, but one of
the allocator's main features is negotiation of nearly optimal allocation
parameters, given a set of uses on different devices/engines, via the
capability merge operation. A client should have queried what every
device/engine is capable of for the given uses, found the optimal set of
capabilities, and used it for allocating a buffer. By the time these
parameters are given to KMS, they are expected to be good. If they
aren't, the client didn't do things right.
Post by Rob Clark
It does seem like, if possible, starting out with modifiers for now at
the kernel interface would make life easier, vs trying to reinvent
both kernel and userspace APIs at the same time. Userspace APIs are
easier to change or throw away. Presumably by the time we get to the
point of changing kernel uabi, we are already using, and pretty happy
with, serialized liballoc data over the wire in userspace so it is
only a matter of changing the kernel interface.
I guess we can indeed start with modifiers for now, if that's what it
takes to get the allocator mechanisms rolling. However, it seems to me
that modifiers won't always be able to encode the same type of
information included in capability sets. For instance, if we end up
encoding usage transition information in capability sets, how would
that translate to modifiers?

I assume display doesn't really care about a lot of the data capability
sets may encode, but is it correct to think of modifiers as things only
display needs? If we are to treat modifiers as a first-class citizen, I
would expect to use them beyond that.
Post by Kristian
I agree and let me elaborate a bit. The problem we're seeing isn't that we
need more than 2^56 modifiers for a future GPU. The problem is that flags
like USE_SCANOUT (which your allocator proposal essentially keeps) are
inadequate. The available tiling and compression formats vary with which
(in KMS terms) CRTC you want to use, which plane you're on, whether you
want rotation or not, how much you want to scale, etc. It's not realistic
to think that we could model this in a centralized allocator library
that's detached from the display driver. To be fair, this is not a point
about blobs vs modifiers; it's saying that the use flags don't belong in
the allocator, they belong in the APIs that will be using the buffer - and
not as literal use flags, but as a way to discover supported modifiers for
a given use case.
Why detached from the display driver? I don't see why there couldn't be
an allocator driver with access to display capabilities that can be
used in the negotiation step to find the optimal set of allocation
parameters.
Post by James Jones
I understand that you may have n knobs with more than a total of
56 bits that configure your tiling/swizzling for color buffers. What I don't
buy is that you need all those combinations when passing buffers around
between codecs, cameras and display controllers. Even if you're sharing
between the same 3D drivers in different processes, I expect just locking
down, say, 64 different combinations (you can add more over time) and
assigning each a modifier would be sufficient. I doubt you'd extract
meaningful performance gains from going all the way to a blob.
If someone has N knobs available, I don't understand why there
shouldn't be a mechanism that allows making use of them all, regardless
of performance numbers.
Post by James Jones
Yeah, that part was all clear. I'd want more details of what exact
kind of metadata. fast-clear colors? tiling layouts? aux data for the
compressor? hiz (or whatever you folks call it) tree?
As you say, we've discussed massive amounts of different variants on
this, and there's different answers for different questions. Consensus
seems to be that bigger stuff (compression data, hiz, clear colors,
...) should be stored in aux planes, while the exact layout and what
kind of aux planes you have are encoded in the modifier.
My understanding is that capability sets may include all metadata you
mentioned. Besides tiling/swizzling layout and compression parameters,
things like zero-bandwidth-clears (I guess the same or similar to
fast-clear colors?), hiz-like data, device-local properties such as
caches, or locality information could/will be also included in a
capability set. We are even considering encoding some sort of usage
transition information in the capability set itself.


Thanks,
Miguel.
