Discussion:
Allocator Nouveau driver, Mesa EXT_external_objects, and DRM metadata import interfaces
Miguel Angel Vico
2017-12-20 16:51:51 UTC
Hi all,

As many of you already know, I've been working with James Jones on the
Generic Device Allocator project lately. He started a discussion thread
some weeks ago seeking feedback on the current prototype of the library
and advice on how to move all this forward, from a prototype stage to
production. For further reference, see:

https://lists.freedesktop.org/archives/mesa-dev/2017-November/177632.html

From the thread above, we came up with very interesting high level
design ideas for one of the currently missing parts in the library:
Usage transitions. That's something I'll personally work on during the
following weeks.


In the meantime, I've been working on putting together an open source
implementation of the allocator mechanisms using the Nouveau driver for
all to be able to play with.

Below I'm seeking feedback on a bunch of changes I had to make to
different components of the graphics stack:

** Allocator **

An allocator driver implementation on top of Nouveau. The current
implementation only handles pitch linear layouts, but that's enough
to have the kmscube port working using the allocator and Nouveau
drivers.

You can pull these changes from

https://github.com/mvicomoya/allocator/tree/wip/mvicomoya/nouveau-driver

** Mesa **

James's kmscube port to use the allocator relies on the
EXT_external_objects extension to import allocator allocations to
OpenGL as a texture object. However, the Nouveau implementation of
these mechanisms is missing in Mesa, so I went ahead and added them.

You can pull these changes from

https://github.com/mvicomoya/mesa/tree/wip/mvicomoya/EXT_external_objects-nouveau

Also, James's kmscube port uses the NVX_unix_allocator_import
extension to attach allocator metadata to texture objects so the
driver knows how to deal with the imported memory.

Note that there isn't a formal spec for this extension yet. For now,
it just serves as an experimental mechanism to import allocator
memory in OpenGL, and attach metadata to texture objects.

You can pull these changes (written on top of the above) from:

https://github.com/mvicomoya/mesa/tree/wip/mvicomoya/NVX_unix_allocator_import

** kmscube **

Mostly minor fixes and improvements on top of James's port to use the
allocator. Main thing is the allocator initialization path will use
EGL_MESA_platform_surfaceless if EGLDevice platform isn't supported
by the underlying EGL implementation.

You can pull these changes from:

https://github.com/mvicomoya/kmscube/tree/wip/mvicomoya/allocator-nouveau


With all the above you should be able to get kmscube working using the
allocator on top of the Nouveau driver.


Another of the missing pieces before we can move this to production is
importing allocations to DRM FB objects. This is probably one of the
most sensitive parts of the project as it requires modification/addition
of kernel driver interfaces.

At XDC2017, James had hallway conversations about this with several
people, all of whom had different opinions. I'd like to take this
opportunity to also start a discussion about what's the best option to
create a path to get allocator allocations added as DRM FB objects.

These are the few options we've considered to start with:

A) Have vendor-private ioctls to set properties on GEM objects that
are inherited by the FB objects. This is how our (NVIDIA) desktop
DRM driver currently works. This would require every vendor to add
their own ioctl to process allocator metadata, but the metadata is
actually a vendor-agnostic object more like DRM modifiers. We'd
like to come up with a vendor-agnostic solution that can be
integrated into core DRM.

B) Add a new drmModeAddFBWithMetadata() command that takes allocator
metadata blobs for each plane of the FB. Some people in the
community have mentioned this is their preferred design. This,
however, means we'd have to go through the exercise of adding
another metadata mechanism to the whole graphics stack.

C) Shove allocator metadata into DRM by defining it to be a separate
plane in the image, and using the existing DRM modifiers mechanism
to indicate there is another plane for each "real" plane added. It
isn't clear how this scales to surfaces that already need several
planes, but there are some people that see this as the only way
forward. Also, we would have to create a separate GEM buffer for
the metadata itself, which seems excessive.

We personally like option (B) better, and have already started to
prototype the new path (which is actually very similar to the
drmModeAddFB2() one). You can take a look at the new interfaces here:

https://github.com/mvicomoya/linux/tree/wip/mvicomoya/drm_addfb_with_metadata__4.14-rc8
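
To make option (B) concrete, here is a minimal sketch of what a
per-plane metadata request could look like next to the existing
drmModeAddFB2() arrays. It is a Python model, not the prototype's
interface: the class name, its fields, and the validation rule are all
assumptions made for illustration. Only the four-plane limit and the
XRGB8888 fourcc value come from the existing DRM headers.

```python
from dataclasses import dataclass
from typing import List, Optional

MAX_PLANES = 4  # struct drm_mode_fb_cmd2 has four per-plane slots

# DRM_FORMAT_XRGB8888 is fourcc_code('X', 'R', '2', '4') in drm_fourcc.h
DRM_FORMAT_XRGB8888 = int.from_bytes(b"XR24", "little")


@dataclass
class AddFBWithMetadataRequest:
    """HYPOTHETICAL request layout; not taken from the prototype branch."""
    width: int
    height: int
    pixel_format: int
    handles: List[int]               # GEM handle per plane, 0 = unused
    pitches: List[int]               # bytes per row, per plane
    offsets: List[int]               # byte offset into the buffer, per plane
    metadata: List[Optional[bytes]]  # opaque allocator blob per plane


def validate(req: AddFBWithMetadataRequest) -> bool:
    """Assumed rule: used planes need a non-empty blob, unused planes none."""
    arrays = (req.handles, req.pitches, req.offsets, req.metadata)
    if any(len(a) != MAX_PLANES for a in arrays):
        return False
    for handle, blob in zip(req.handles, req.metadata):
        if handle != 0 and not blob:
            return False
        if handle == 0 and blob is not None:
            return False
    return True
```

In this model the only difference from an AddFB2-style request is that
the per-plane 64-bit modifier slot is replaced by a variable-length
blob, which is the part the rest of the graphics stack would have to
learn to carry.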

There may be other options that haven't been explored yet that could be
a better choice than the above, so any suggestion will be greatly
appreciated.


Thanks,
--
Miguel
Daniel Vetter
2017-12-20 19:51:15 UTC
Since this also involves the kernel let's add dri-devel ...
What kind of metadata are we talking about here? Addfb has tons of
stuff already that's "metadata". The only thing I've spotted is
PITCH_ALIGNMENT, which is maybe something we want drm drivers to tell
userspace, but definitely not something addfb ever needs. addfb only
needs the resulting pitch that we actually allocated (and might decide
it doesn't like that, but that's a different issue).

And since there's no patches for nouveau itself I can't really say
anything beyond that.
-Daniel
--
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch
Kristian Høgsberg
2017-12-20 19:54:10 UTC
Post by Daniel Vetter
Since this also involves the kernel let's add dri-devel ...
What kind of metadata are we talking about here? Addfb has tons of
stuff already that's "metadata". The only thing I've spotted is
PITCH_ALIGNMENT, which is maybe something we want drm drivers to tell
userspace, but definitely not something addfb ever needs. addfb only
needs the resulting pitch that we actually allocated (and might decide
it doesn't like that, but that's a different issue).
And since there's no patches for nouveau itself I can't really say
anything beyond that.
I'd like to see concrete examples of actual display controllers
supporting more format layouts than what can be specified with a 64
bit modifier.

Kristian
_______________________________________________
dri-devel mailing list
https://lists.freedesktop.org/mailman/listinfo/dri-devel
Miguel Angel Vico
2017-12-20 20:41:34 UTC
Inline.

On Wed, 20 Dec 2017 11:54:10 -0800
Post by Kristian Høgsberg
Post by Daniel Vetter
Since this also involves the kernel let's add dri-devel ...
Yeah, I forgot. Thanks Daniel!
Post by Kristian Høgsberg
Post by Daniel Vetter
What kind of metadata are we talking about here? Addfb has tons of
stuff already that's "metadata". The only thing I've spotted is
PITCH_ALIGNMENT, which is maybe something we want drm drivers to tell
userspace, but definitely not something addfb ever needs. addfb only
needs the resulting pitch that we actually allocated (and might decide
it doesn't like that, but that's a different issue).
Sorry I failed to make it clearer. Metadata here refers to all
allocation parameters the generic allocator was given to allocate
memory. That currently means the final capability set used for
the allocation, including all constraints (such as memory alignment,
pitch alignment, and others) and capabilities, describing allocation
properties like tiling formats, compression, and such.
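
As a rough illustration of the kind of object being described, the
sketch below models a capability set as a list of constraints plus a
list of capabilities and flattens it into a byte blob. Every name, ID,
and the wire layout here is hypothetical; none of it is taken from the
allocator library.

```python
import struct
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Constraint:
    name_id: int  # hypothetical ID, e.g. 1 = address alignment, 2 = pitch alignment
    value: int


@dataclass(frozen=True)
class Capability:
    vendor: int    # illustrative vendor ID
    name_id: int   # hypothetical ID, e.g. 1 = tiling, 2 = compression
    params: bytes  # vendor-defined payload


@dataclass
class CapabilitySet:
    constraints: List[Constraint]
    capabilities: List[Capability]

    def to_blob(self) -> bytes:
        """Flatten to little-endian bytes: counts, constraints, capabilities."""
        out = struct.pack("<II", len(self.constraints), len(self.capabilities))
        for c in self.constraints:
            out += struct.pack("<IQ", c.name_id, c.value)
        for cap in self.capabilities:
            header = struct.pack("<III", cap.vendor, cap.name_id, len(cap.params))
            out += header + cap.params
        return out
```

With this made-up layout the blob grows with the number of entries, so
its size depends on how many constraints and capabilities an allocation
ends up with rather than being fixed in advance.
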
Post by Kristian Høgsberg
Post by Daniel Vetter
And since there's no patches for nouveau itself I can't really say
anything beyond that.
I can work on implementing these interfaces for nouveau, maybe
partially, if that's going to help. I just thought it'd be better to
first start a discussion on what would be the right way to pass
allocator metadata to display drivers before starting to seriously
implement any of the proposed options.
Post by Kristian Høgsberg
I'd like to see concrete examples of actual display controllers
supporting more format layouts than what can be specified with a 64
bit modifier.
The main problem is our tiling and other metadata parameters can't
generally fit in a modifier, so we find passing a blob of metadata a
more suitable mechanism.

Thanks.
_______________________________________________
mesa-dev mailing list
https://lists.freedesktop.org/mailman/listinfo/mesa-dev
--
Miguel
Kristian Kristensen
2017-12-20 23:22:06 UTC
Post by Miguel Angel Vico
Inline.
On Wed, 20 Dec 2017 11:54:10 -0800
Post by Kristian Høgsberg
Post by Daniel Vetter
Since this also involves the kernel let's add dri-devel ...
Yeah, I forgot. Thanks Daniel!
Post by Kristian Høgsberg
Post by Daniel Vetter
What kind of metadata are we talking about here? Addfb has tons of
stuff already that's "metadata". The only thing I've spotted is
PITCH_ALIGNMENT, which is maybe something we want drm drivers to tell
userspace, but definitely not something addfb ever needs. addfb only
needs the resulting pitch that we actually allocated (and might decide
it doesn't like that, but that's a different issue).
Sorry I failed to make it clearer. Metadata here refers to all
allocation parameters the generic allocator was given to allocate
memory. That currently means the final capability set used for
the allocation, including all constraints (such as memory alignment,
pitch alignment, and others) and capabilities, describing allocation
properties like tiling formats, compression, and such.
Post by Kristian Høgsberg
Post by Daniel Vetter
And since there's no patches for nouveau itself I can't really say
anything beyond that.
I can work on implementing these interfaces for nouveau, maybe
partially, if that's going to help. I just thought it'd be better to
first start a discussion on what would be the right way to pass
allocator metadata to display drivers before starting to seriously
implement any of the proposed options.
Post by Kristian Høgsberg
I'd like to see concrete examples of actual display controllers
supporting more format layouts than what can be specified with a 64
bit modifier.
The main problem is our tiling and other metadata parameters can't
generally fit in a modifier, so we find passing a blob of metadata a
more suitable mechanism.
I understand that you may have n knobs with a total of more than
56 bits that configure your tiling/swizzling for color buffers. What I
don't buy is that you need all those combinations when passing buffers
around between codecs, cameras and display controllers. Even if you're
sharing between the same 3D drivers in different processes, I expect just
locking down, say, 64 different combinations (you can add more over time)
and assigning each a modifier would be sufficient. I doubt you'd extract
meaningful performance gains from going all the way to a blob.

If you want us to redesign KMS and the rest of the ecosystem around blobs
instead of the modifiers that are now moderately pervasive, you have to
justify it a little better than just "we didn't find it suitable".
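
The "lock down N combinations" approach can be sketched as a fixed
table whose index becomes the vendor-defined part of the modifier. The
modifier arithmetic and the NVIDIA vendor ID follow drm_fourcc.h; the
Layout fields and the table contents are hypothetical and stand in for
whatever knobs a driver actually has.

```python
from typing import NamedTuple

DRM_FORMAT_MOD_VENDOR_NVIDIA = 0x03   # vendor ID from drm_fourcc.h
VENDOR_SHIFT = 56                     # top 8 bits hold the vendor
VALUE_MASK = (1 << VENDOR_SHIFT) - 1  # low 56 bits are vendor-defined


def fourcc_mod_code(vendor: int, value: int) -> int:
    """Same arithmetic as the fourcc_mod_code() macro in drm_fourcc.h."""
    return (vendor << VENDOR_SHIFT) | (value & VALUE_MASK)


class Layout(NamedTuple):
    """HYPOTHETICAL knobs; real hardware parameters will differ."""
    block_height_log2: int
    compressed: bool


# A fixed table: the index is the vendor-defined value. New layouts are
# only ever appended, so existing modifier values never change.
LAYOUTS = [Layout(h, c) for c in (False, True) for h in range(6)]


def modifier_for(layout: Layout) -> int:
    """Look up the modifier assigned to a known layout."""
    return fourcc_mod_code(DRM_FORMAT_MOD_VENDOR_NVIDIA, LAYOUTS.index(layout))


def layout_for(modifier: int) -> Layout:
    """Recover the layout from a modifier produced by modifier_for()."""
    if modifier >> VENDOR_SHIFT != DRM_FORMAT_MOD_VENDOR_NVIDIA:
        raise ValueError("not a modifier from this vendor")
    return LAYOUTS[modifier & VALUE_MASK]
```

The trade-off in this scheme is that a layout missing from the table
cannot be shared at all until a new entry is added.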

Kristian
Ilia Mirkin
2017-12-21 01:05:34 UTC
On Wed, Dec 20, 2017 at 6:22 PM, Kristian Kristensen
Post by Kristian Kristensen
Post by Miguel Angel Vico
On Wed, 20 Dec 2017 11:54:10 -0800
Post by Kristian Høgsberg
I'd like to see concrete examples of actual display controllers
supporting more format layouts than what can be specified with a 64
bit modifier.
The main problem is our tiling and other metadata parameters can't
generally fit in a modifier, so we find passing a blob of metadata a
more suitable mechanism.
I understand that you may have n knobs with a total of more than
56 bits that configure your tiling/swizzling for color buffers. What I don't
buy is that you need all those combinations when passing buffers around
between codecs, cameras and display controllers. Even if you're sharing
between the same 3D drivers in different processes, I expect just locking
down, say, 64 different combinations (you can add more over time) and
assigning each a modifier would be sufficient. I doubt you'd extract
meaningful performance gains from going all the way to a blob.
There's probably a world of stuff that we don't know about in nouveau,
but I have a hard time coming up with more than 64 bits' worth of
tiling info for dGPU surfaces...

There's 8 bits (sorta, not fully populated, but might as well use
them) of "micro" tiling which is done at the PTE level by the memory
controller and includes compression settings, and then there's 4 bits
of tiling per dimension for macro blocks (which configures different
sizes for each dimension for tile sizes) -- that's only 20 bits. MSAA
level (which is part of the micro tiling setting usually, but may not
necessarily have to be) - another couple of bits, maybe something else
weird for another few bits. Anyways, this is *nowhere* close to 64
bits.
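To make the arithmetic above concrete, here is a rough sketch of how those fields could be packed below the vendor byte of a 64-bit modifier. The field names, the bit positions, the assumption of three macro-block dimensions, and the placeholder vendor value are all made up for illustration; none of this is taken from nouveau or from the kernel's uapi header.

```c
#include <assert.h>
#include <stdint.h>

/* Field widths from the estimate above; the names are hypothetical. */
enum {
    MICRO_BITS   = 8,  /* PTE-level "micro" tiling kind, incl. compression */
    MACRO_BITS   = 4,  /* macro block size, one field per dimension */
    MACRO_DIMS   = 3,  /* assumption: x, y and z */
    MSAA_BITS    = 2,  /* "another couple of bits" */
    VENDOR_SHIFT = 56  /* DRM modifiers keep the vendor ID in the top 8 bits */
};

/* Total payload bits consumed by the fields above. */
static unsigned layout_bits_used(void)
{
    return MICRO_BITS + MACRO_DIMS * MACRO_BITS + MSAA_BITS;
}

/* Pack the fields below a vendor byte; `vendor` is a placeholder value. */
static uint64_t pack_layout(uint8_t vendor, unsigned micro,
                            unsigned mx, unsigned my, unsigned mz,
                            unsigned msaa)
{
    uint64_t payload;

    assert(micro < (1u << MICRO_BITS));
    assert(mx < (1u << MACRO_BITS));
    assert(my < (1u << MACRO_BITS));
    assert(mz < (1u << MACRO_BITS));
    assert(msaa < (1u << MSAA_BITS));

    payload  = (uint64_t)micro;        /* bits 0..7   */
    payload |= (uint64_t)mx   << 8;    /* bits 8..11  */
    payload |= (uint64_t)my   << 12;   /* bits 12..15 */
    payload |= (uint64_t)mz   << 16;   /* bits 16..19 */
    payload |= (uint64_t)msaa << 20;   /* bits 20..21 */

    return ((uint64_t)vendor << VENDOR_SHIFT) | payload;
}
```

Under these assumptions the payload uses 22 of the 56 available bits, which leaves plenty of headroom for "something else weird".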

What am I missing?

-ilia
Daniel Vetter
2017-12-21 08:05:32 UTC
Permalink
On Thu, Dec 21, 2017 at 12:22 AM, Kristian Kristensen
Post by Kristian Kristensen
Post by Miguel Angel Vico
Inline.
On Wed, 20 Dec 2017 11:54:10 -0800
Post by Kristian Høgsberg
Post by Daniel Vetter
Since this also involves the kernel let's add dri-devel ...
Yeah, I forgot. Thanks Daniel!
Post by Kristian Høgsberg
Post by Daniel Vetter
On Wed, Dec 20, 2017 at 5:51 PM, Miguel Angel Vico
Post by Miguel Angel Vico
Hi all,
As many of you already know, I've been working with James Jones on the
Generic Device Allocator project lately. He started a discussion thread
some weeks ago seeking feedback on the current prototype of the library
and advice on how to move all this forward, from a prototype stage to
https://lists.freedesktop.org/archives/mesa-dev/2017-November/177632.html
From the thread above, we came up with very interesting high level
Usage transitions. That's something I'll personally work on during the
following weeks.
In the meantime, I've been working on putting together an open source
implementation of the allocator mechanisms using the Nouveau driver for
all to be able to play with.
Below I'm seeking feedback on a bunch of changes I had to make to
** Allocator **
An allocator driver implementation on top of Nouveau. The current
implementation only handles pitch linear layouts, but that's enough
to have the kmscube port working using the allocator and Nouveau
drivers.
You can pull these changes from
https://github.com/mvicomoya/allocator/tree/wip/mvicomoya/nouveau-driver
** Mesa **
James's kmscube port to use the allocator relies on the
EXT_external_objects extension to import allocator allocations to
OpenGL as a texture object. However, the Nouveau implementation of
these mechanisms is missing in Mesa, so I went ahead and added them.
You can pull these changes from
https://github.com/mvicomoya/mesa/tree/wip/mvicomoya/EXT_external_objects-nouveau
Also, James's kmscube port uses the NVX_unix_allocator_import
extension to attach allocator metadata to texture objects so the
driver knows how to deal with the imported memory.
Note that there isn't a formal spec for this extension yet. For now,
it just serves as an experimental mechanism to import allocator
memory in OpenGL, and attach metadata to texture objects.
https://github.com/mvicomoya/mesa/tree/wip/mvicomoya/NVX_unix_allocator_import
** kmscube **
Mostly minor fixes and improvements on top of James's port to use the
allocator. Main thing is the allocator initialization path will use
EGL_MESA_platform_surfaceless if EGLDevice platform isn't supported
by the underlying EGL implementation.
https://github.com/mvicomoya/kmscube/tree/wip/mvicomoya/allocator-nouveau
With all the above you should be able to get kmscube working using the
allocator on top of the Nouveau driver.
Another of the missing pieces before we can move this to production is
importing allocations to DRM FB objects. This is probably one of the
most sensitive parts of the project as it requires modification/addition
of kernel driver interfaces.
At XDC2017, James had several hallway conversations with several people
about this, all having different opinions. I'd like to take this
opportunity to also start a discussion about what's the best option to
create a path to get allocator allocations added as DRM FB objects.
A) Have vendor-private ioctls to set properties on GEM objects that
are inherited by the FB objects. This is how our (NVIDIA) desktop
DRM driver currently works. This would require every vendor to add
their own ioctl to process allocator metadata, but the metadata is
actually a vendor-agnostic object more like DRM modifiers. We'd
like to come up with a vendor-agnostic solution that can be
integrated to core DRM.
B) Add a new drmModeAddFBWithMetadata() command that takes allocator
metadata blobs for each plane of the FB. Some people in the
community have mentioned this is their preferred design. This,
however, means we'd have to go through the exercise of adding
another metadata mechanism to the whole graphics stack.
C) Shove allocator metadata into DRM by defining it to be a separate
plane in the image, and using the existing DRM modifiers mechanism
to indicate there is another plane for each "real" plane added. It
isn't clear how this scales to surfaces that already need several
planes, but there are some people that see this as the only way
forward. Also, we would have to create a separate GEM buffer for
the metadata itself, which seems excessive.
We personally like option (B) better, and have already started to
prototype the new path (which is actually very similar to the
https://github.com/mvicomoya/linux/tree/wip/mvicomoya/drm_addfb_with_metadata__4.14-rc8
There may be other options that haven't been explored yet that could be
a better choice than the above, so any suggestion will be greatly
appreciated.
What kind of metadata are we talking about here? Addfb has tons of
stuff already that's "metadata". The only thing I've spotted is
PITCH_ALIGNMENT, which is maybe something we want drm drivers to tell
userspace, but definitely not something addfb ever needs. addfb only
needs the resulting pitch that we actually allocated (and might decide
it doesn't like that, but that's a different issue).
Sorry I failed to make it clearer. Metadata here refers to all
allocation parameters the generic allocator was given to allocate
memory. That currently means the final capability set used for
the allocation, including all constraints (such as memory alignment,
pitch alignment, and others) and capabilities, describing allocation
properties like tiling formats, compression, and such.
Yeah, that part was all clear. I'd want more details of what exact
kind of metadata. fast-clear colors? tiling layouts? aux data for the
compressor? hiz (or whatever you folks call it) tree?

As you say, we've discussed massive amounts of different variants on
this, and there's different answers for different questions. Consensus
seems to be that bigger stuff (compression data, hiz, clear colors,
...) should be stored in aux planes, while the exact layout and what
kind of aux planes you have are encoded in the modifier.
Post by Kristian Kristensen
Post by Miguel Angel Vico
Post by Kristian Høgsberg
Post by Daniel Vetter
And since there's no patches for nouveau itself I can't really say
anything beyond that.
I can work on implementing these interfaces for nouveau, maybe
partially, if that's going to help. I just thought it'd be better to
first start a discussion on what would be the right way to pass
allocator metadata to display drivers before starting to seriously
implement any of the proposed options.
It's not so much wiring down the interfaces, but actually implementing
the features. "We need more than the 56bits of modifier" is a lot more
plausible when you have the full stack showing that you do actually
need it. Or well, not a full stack but at least a demo that shows what
you want to pull off but can't do right now.
Post by Kristian Kristensen
Post by Miguel Angel Vico
Post by Kristian Høgsberg
I'd like to see concrete examples of actual display controllers
supporting more format layouts than what can be specified with a 64
bit modifier.
The main problem is our tiling and other metadata parameters can't
generally fit in a modifier, so we find passing a blob of metadata a
more suitable mechanism.
I understand that you may have n knobs with a total of more than
56 bits that configure your tiling/swizzling for color buffers. What I don't
buy is that you need all those combinations when passing buffers around
between codecs, cameras and display controllers. Even if you're sharing
between the same 3D drivers in different processes, I expect just locking
down, say, 64 different combinations (you can add more over time) and
assigning each a modifier would be sufficient. I doubt you'd extract
meaningful performance gains from going all the way to a blob.
Tegra just redesigned its modifier space from an ungodly amount of
bits to just a few layouts. Not even just the ones in use, but simply
limiting to the ones that make sense (there's dependencies apparently).
Also note that the modifier alone doesn't need to describe the layout
precisely, it only makes sense together with a specific pixel format
and size. E.g. a bunch of the i915 layouts change layout depending
upon bpp.
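One simple way this dependency shows up, as a sketch: if a tiling modifier fixes the tile geometry in bytes, the tile's footprint in pixels still depends on the bytes per pixel of the format. The enum, the struct, and the byte geometries below are illustrative assumptions rather than a description of any particular i915 modifier.

```c
#include <assert.h>

/* Hypothetical tiling kinds; each fixes a tile geometry in bytes. */
enum example_tiling { EXAMPLE_TILING_X, EXAMPLE_TILING_Y };

struct tile_dims {
    unsigned width_px;     /* tile width in pixels */
    unsigned height_rows;  /* tile height in rows */
};

/* Derive the pixel footprint of one tile from the tiling kind and cpp. */
static struct tile_dims tile_dims_for(enum example_tiling tiling, unsigned cpp)
{
    struct tile_dims d;
    unsigned width_bytes;

    assert(cpp == 1 || cpp == 2 || cpp == 4 || cpp == 8);

    if (tiling == EXAMPLE_TILING_X) {
        width_bytes = 512;   /* assumed: 512 bytes x 8 rows */
        d.height_rows = 8;
    } else {
        width_bytes = 128;   /* assumed: 128 bytes x 32 rows */
        d.height_rows = 32;
    }

    /* Same modifier, different pixel layout for different formats. */
    d.width_px = width_bytes / cpp;
    return d;
}
```

With these numbers, the same hypothetical X tiling covers 128 pixels per tile row at 4 bytes per pixel and 256 pixels at 2 bytes per pixel, so the modifier is only meaningful together with the format.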
Post by Kristian Kristensen
If you want us to redesign KMS and the rest of the ecosystem around blobs
instead of the modifiers that are now moderately pervasive, you have to
justify it a little better than just "we didn't find it suitable".
Given that this involves the kernel and hence the kernel's userspace
requirements for merging stuff (assuming of course you want to
establish this as an upstream interface), then I'd say a sufficient
demonstration would be actually running out of bits in nouveau
(kernel+mesa).
-Daniel
--
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch
Chad Versace
2018-02-21 06:14:47 UTC
Permalink
Post by Daniel Vetter
Post by Kristian Kristensen
Post by Miguel Angel Vico
Post by Kristian Høgsberg
I'd like to see concrete examples of actual display controllers
supporting more format layouts than what can be specified with a 64
bit modifier.
The main problem is our tiling and other metadata parameters can't
generally fit in a modifier, so we find passing a blob of metadata a
more suitable mechanism.
I understand that you may have n knobs with a total of more than
56 bits that configure your tiling/swizzling for color buffers. What I don't
buy is that you need all those combinations when passing buffers around
between codecs, cameras and display controllers. Even if you're sharing
between the same 3D drivers in different processes, I expect just locking
down, say, 64 different combinations (you can add more over time) and
assigning each a modifier would be sufficient. I doubt you'd extract
meaningful performance gains from going all the way to a blob.
I agree with Kristian above. In my opinion, choosing to encode in
modifiers a precise description of every possible tiling/compression
layout is not technically incorrect, but I believe it misses the point.
The intention behind modifiers is not to exhaustively describe all
possibilities.

I summarized this opinion in VK_EXT_image_drm_format_modifier,
where I wrote an "introduction to modifiers" section. Here's an excerpt:

One goal of modifiers in the Linux ecosystem is to enumerate for each
vendor a reasonably sized set of tiling formats that are appropriate for
images shared across processes, APIs, and/or devices, where each
participating component may possibly be from different vendors.
A non-goal is to enumerate all tiling formats supported by all vendors.
Some tiling formats used internally by vendors are inappropriate for
sharing; no modifiers should be assigned to such tiling formats.
Post by Daniel Vetter
Tegra just redesigned its modifier space from an ungodly amount of
bits to just a few layouts. Not even just the ones in use, but simply
limiting to the ones that make sense (there's dependencies apparently)
Also note that the modifier alone doesn't need to describe the layout
precisely, it only makes sense together with a specific pixel format
and size. E.g. a bunch of the i915 layouts change layout depending
upon bpp.
Daniel Vetter
2018-02-21 18:26:55 UTC
Permalink
Post by Chad Versace
Post by Kristian Kristensen
Post by Miguel Angel Vico
Post by Kristian Høgsberg
I'd like to see concrete examples of actual display controllers
supporting more format layouts than what can be specified with a 64
bit modifier.
The main problem is our tiling and other metadata parameters can't
generally fit in a modifier, so we find passing a blob of metadata a
more suitable mechanism.
I understand that you may have n knobs with a total of more than
56 bits that configure your tiling/swizzling for color buffers. What I don't
buy is that you need all those combinations when passing buffers around
between codecs, cameras and display controllers. Even if you're sharing
between the same 3D drivers in different processes, I expect just locking
down, say, 64 different combinations (you can add more over time) and
assigning each a modifier would be sufficient. I doubt you'd extract
meaningful performance gains from going all the way to a blob.
I agree with Kristian above. In my opinion, choosing to encode in
modifiers a precise description of every possible tiling/compression
layout is not technically incorrect, but I believe it misses the point.
The intention behind modifiers is not to exhaustively describe all
possibilities.
I summarized this opinion in VK_EXT_image_drm_format_modifier,
One goal of modifiers in the Linux ecosystem is to enumerate for each
vendor a reasonably sized set of tiling formats that are appropriate for
images shared across processes, APIs, and/or devices, where each
participating component may possibly be from different vendors.
A non-goal is to enumerate all tiling formats supported by all vendors.
Some tiling formats used internally by vendors are inappropriate for
sharing; no modifiers should be assigned to such tiling formats.
fwiw (since the source of truth wrt modifiers is the kernel's uapi
header):

Acked-by: Daniel Vetter <***@ffwll.ch>

I'm happy to merge modifier #define additions for pretty much anything
where there's a need for sharing across devices/drivers/apis, explicitly
including stuff that's only relevant for userspace and which the kernel
never sees (in e.g. a kms addfb2 call). Trying to preemptively enumerate
everything that's possible doesn't seem like a wise idea. But even then we
can probably spare the oddball vendor prefix if a driver team really
insists that this is what they want, best using some code that makes the
case for them.
-Daniel
--
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch
Chad Versace
2018-02-21 23:23:45 UTC
Permalink
Post by Daniel Vetter
Post by Chad Versace
Post by Kristian Kristensen
Post by Miguel Angel Vico
Post by Kristian Høgsberg
I'd like to see concrete examples of actual display controllers
supporting more format layouts than what can be specified with a 64
bit modifier.
The main problem is our tiling and other metadata parameters can't
generally fit in a modifier, so we find passing a blob of metadata a
more suitable mechanism.
I understand that you may have n knobs with a total of more than
56 bits that configure your tiling/swizzling for color buffers. What I don't
buy is that you need all those combinations when passing buffers around
between codecs, cameras and display controllers. Even if you're sharing
between the same 3D drivers in different processes, I expect just locking
down, say, 64 different combinations (you can add more over time) and
assigning each a modifier would be sufficient. I doubt you'd extract
meaningful performance gains from going all the way to a blob.
I agree with Kristian above. In my opinion, choosing to encode in
modifiers a precise description of every possible tiling/compression
layout is not technically incorrect, but I believe it misses the point.
The intention behind modifiers is not to exhaustively describe all
possibilities.
I summarized this opinion in VK_EXT_image_drm_format_modifier,
One goal of modifiers in the Linux ecosystem is to enumerate for each
vendor a reasonably sized set of tiling formats that are appropriate for
images shared across processes, APIs, and/or devices, where each
participating component may possibly be from different vendors.
A non-goal is to enumerate all tiling formats supported by all vendors.
Some tiling formats used internally by vendors are inappropriate for
sharing; no modifiers should be assigned to such tiling formats.
fwiw (since the source of truth wrt modifiers is the kernel's uapi
Linux would eventually encounter big problems if the kernel and Vulkan
disagreed on the fundamental, unspoken Theory of Modifiers. So your
acked-by is definitely worth something here. Thanks for confirming.
Post by Daniel Vetter
I'm happy to merge modifier #define additions for pretty much anything
where there's a need for sharing across devices/drivers/apis, explicitly
including stuff that's only relevant for userspace and which the kernel
never sees (in e.g. a kms addfb2 call). Trying to preemptively enumerate
everything that's possible doesn't seem like a wise idea. But even then we
can probably spare the oddball vendor prefix if a driver team really
insists that this is what they want, best using some code that makes the
case for them.
Yep. I believe Jason Ekstrand has tentative plans for such a modifier
that improves performance for interop in GL and Vulkan but the kernel
and Intel display hw wouldn't understand: a modifier for CCS_E images
that are fully compressed.
Alex Deucher
2018-02-22 00:00:16 UTC
Permalink
Post by Chad Versace
Post by Kristian Kristensen
Post by Miguel Angel Vico
Post by Kristian Høgsberg
I'd like to see concrete examples of actual display controllers
supporting more format layouts than what can be specified with a 64
bit modifier.
The main problem is our tiling and other metadata parameters can't
generally fit in a modifier, so we find passing a blob of metadata a
more suitable mechanism.
I understand that you may have n knobs with a total of more than
56 bits that configure your tiling/swizzling for color buffers. What I don't
buy is that you need all those combinations when passing buffers around
between codecs, cameras and display controllers. Even if you're sharing
between the same 3D drivers in different processes, I expect just locking
down, say, 64 different combinations (you can add more over time) and
assigning each a modifier would be sufficient. I doubt you'd extract
meaningful performance gains from going all the way to a blob.
I agree with Kristian above. In my opinion, choosing to encode in
modifiers a precise description of every possible tiling/compression
layout is not technically incorrect, but I believe it misses the point.
The intention behind modifiers is not to exhaustively describe all
possibilities.
I summarized this opinion in VK_EXT_image_drm_format_modifier,
One goal of modifiers in the Linux ecosystem is to enumerate for each
vendor a reasonably sized set of tiling formats that are appropriate for
images shared across processes, APIs, and/or devices, where each
participating component may possibly be from different vendors.
A non-goal is to enumerate all tiling formats supported by all vendors.
Some tiling formats used internally by vendors are inappropriate for
sharing; no modifiers should be assigned to such tiling formats.
Where it gets tricky is how to select that subset. Our tiling modes
are defined more by the asic specific constraints than the tiling mode
itself. At a high level we have basically 3 tiling modes (out of 16
possible) that would be the minimum we'd want to expose for gfx6-8.
gfx9 uses a completely new scheme.
1. Linear (per asic stride requirements, not usable by many hw blocks)
2. 1D Thin (5 layouts, displayable, depth, thin, rotated, thick)
3. 2D Thin (1D tiling constraints, plus pipe config (18 possible),
tile split (7 possible), sample split (4 possible), num banks (4
possible), bank width (4 possible), bank height (4 possible), macro
tile aspect (4 possible) all of which are asic config specific)

I guess we could do something like:
AMD_GFX6_LINEAR_ALIGNED_64B
AMD_GFX6_LINEAR_ALIGNED_256B
AMD_GFX6_LINEAR_ALIGNED_512B
AMD_GFX6_1D_THIN_DISPLAY
AMD_GFX6_1D_THIN_DEPTH
AMD_GFX6_1D_THIN_ROTATED
AMD_GFX6_1D_THIN_THIN
AMD_GFX6_1D_THIN_THICK
AMD_GFX6_2D_1D_THIN_DISPLAY_PIPE_CONFIG_P2_TILE_SPLIT_64B_SAMPLE_SPLIT_1_NUM_BANKS_2_BANK_WIDTH_1_BANK_HEIGHT_1_MACRO_TILE_ASPECT_1
AMD_GFX6_2D_1D_THIN_DEPTH_PIPE_CONFIG_P2_TILE_SPLIT_64B_SAMPLE_SPLIT_1_NUM_BANKS_2_BANK_WIDTH_1_BANK_HEIGHT_1_MACRO_TILE_ASPECT_1
AMD_GFX6_2D_1D_THIN_ROTATED_PIPE_CONFIG_P2_TILE_SPLIT_64B_SAMPLE_SPLIT_1_NUM_BANKS_2_BANK_WIDTH_1_BANK_HEIGHT_1_MACRO_TILE_ASPECT_1
AMD_GFX6_2D_1D_THIN_THIN_PIPE_CONFIG_P2_TILE_SPLIT_64B_SAMPLE_SPLIT_1_NUM_BANKS_2_BANK_WIDTH_1_BANK_HEIGHT_1_MACRO_TILE_ASPECT_1
AMD_GFX6_2D_1D_THIN_THICK_PIPE_CONFIG_P2_TILE_SPLIT_64B_SAMPLE_SPLIT_1_NUM_BANKS_2_BANK_WIDTH_1_BANK_HEIGHT_1_MACRO_TILE_ASPECT_1
AMD_GFX6_2D_1D_THIN_DISPLAY_PIPE_CONFIG_P4_8x16_TILE_SPLIT_64B_SAMPLE_SPLIT_1_NUM_BANKS_2_BANK_WIDTH_1_BANK_HEIGHT_1_MACRO_TILE_ASPECT_1
AMD_GFX6_2D_1D_THIN_DEPTH_PIPE_CONFIG_P4_8x16_TILE_SPLIT_64B_SAMPLE_SPLIT_1_NUM_BANKS_2_BANK_WIDTH_1_BANK_HEIGHT_1_MACRO_TILE_ASPECT_1
AMD_GFX6_2D_1D_THIN_ROTATED_PIPE_CONFIG_P4_8x16_TILE_SPLIT_64B_SAMPLE_SPLIT_1_NUM_BANKS_2_BANK_WIDTH_1_BANK_HEIGHT_1_MACRO_TILE_ASPECT_1
AMD_GFX6_2D_1D_THIN_THIN_PIPE_CONFIG_P4_8x16_TILE_SPLIT_64B_SAMPLE_SPLIT_1_NUM_BANKS_2_BANK_WIDTH_1_BANK_HEIGHT_1_MACRO_TILE_ASPECT_1
AMD_GFX6_2D_1D_THIN_THICK_PIPE_CONFIG_P4_8x16_TILE_SPLIT_64B_SAMPLE_SPLIT_1_NUM_BANKS_2_BANK_WIDTH_1_BANK_HEIGHT_1_MACRO_TILE_ASPECT_1
etc.

We probably only need 40 bits to encode all of the tiling parameters,
so we could do family plus tiling encoding, but that still seems unwieldy
to deal with from an application perspective. All of the parameters
affect the alignment requirements.
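As a back-of-the-envelope check on the field counts listed above, the following sketch adds up the bits each knob would need if it were encoded as its own field. Treating the knobs as independent fields, and the array and function names, are assumptions made for illustration; the sketch only covers the parameters enumerated in this message, so it comes out below the 40-bit estimate, which presumably includes more than is listed here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Smallest number of bits b such that 2^b >= n. */
static unsigned bits_for(unsigned n)
{
    unsigned bits = 0;

    assert(n >= 1);
    while ((1u << bits) < n)
        bits++;
    return bits;
}

/* Number of possible values per knob, as listed above. */
static const unsigned gfx6_field_counts[] = {
    16, /* tiling mode */
    5,  /* 1D thin layout: displayable, depth, thin, rotated, thick */
    18, /* pipe config */
    7,  /* tile split */
    4,  /* sample split */
    4,  /* num banks */
    4,  /* bank width */
    4,  /* bank height */
    4,  /* macro tile aspect */
};
#define GFX6_FIELD_COUNT \
    (sizeof(gfx6_field_counts) / sizeof(gfx6_field_counts[0]))

/* Bits needed when every knob gets its own bit field. */
static unsigned total_bits(const unsigned *counts, size_t n)
{
    unsigned sum = 0;
    size_t i;

    for (i = 0; i < n; i++)
        sum += bits_for(counts[i]);
    return sum;
}

/* Number of distinct combinations of the given knobs. */
static uint64_t total_combinations(const unsigned *counts, size_t n)
{
    uint64_t product = 1;
    size_t i;

    for (i = 0; i < n; i++)
        product *= counts[i];
    return product;
}
```

Calling `total_bits(gfx6_field_counts, GFX6_FIELD_COUNT)` gives 25 bits for the listed knobs, and the seven 2D-only knobs alone (`gfx6_field_counts + 2`) span 129024 combinations, which is why enumerating them as individual named modifiers gets unwieldy quickly.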

Alex
Kristian Høgsberg
2018-02-22 18:04:51 UTC
Permalink
Post by Alex Deucher
Post by Chad Versace
On Thu, Dec 21, 2017 at 12:22 AM, Kristian Kristensen <
On Wed, Dec 20, 2017 at 12:41 PM, Miguel Angel Vico <
On Wed, 20 Dec 2017 11:54:10 -0800 Kristian Høgsberg <
Post by Kristian Høgsberg
I'd like to see concrete examples of actual display controllers
supporting more format layouts than what can be specified with a 64
bit modifier.
The main problem is our tiling and other metadata parameters can't
generally fit in a modifier, so we find passing a blob of metadata a
more suitable mechanism.
I understand that you may have n knobs with a total of more than
56 bits that configure your tiling/swizzling for color buffers. What I don't
buy is that you need all those combinations when passing buffers around
between codecs, cameras and display controllers. Even if you're sharing
between the same 3D drivers in different processes, I expect just locking
down, say, 64 different combinations (you can add more over time) and
assigning each a modifier would be sufficient. I doubt you'd extract
meaningful performance gains from going all the way to a blob.
I agree with Kristian above. In my opinion, choosing to encode in
modifiers a precise description of every possible tiling/compression
layout is not technically incorrect, but I believe it misses the point.
The intention behind modifiers is not to exhaustively describe all
possibilities.
I summarized this opinion in VK_EXT_image_drm_format_modifier,
One goal of modifiers in the Linux ecosystem is to enumerate for each
vendor a reasonably sized set of tiling formats that are appropriate for
images shared across processes, APIs, and/or devices, where each
participating component may possibly be from different vendors.
A non-goal is to enumerate all tiling formats supported by all vendors.
Some tiling formats used internally by vendors are inappropriate for
sharing; no modifiers should be assigned to such tiling formats.
Where it gets tricky is how to select that subset. Our tiling modes
are defined more by the asic specific constraints than the tiling mode
itself. At a high level we have basically 3 tiling modes (out of 16
possible) that would be the minimum we'd want to expose for gfx6-8.
gfx9 uses a completely new scheme.
1. Linear (per asic stride requirements, not usable by many hw blocks)
2. 1D Thin (5 layouts, displayable, depth, thin, rotated, thick)
3. 2D Thin (1D tiling constraints, plus pipe config (18 possible),
tile split (7 possible), sample split (4 possible), num banks (4
possible), bank width (4 possible), bank height (4 possible), macro
tile aspect (4 possible) all of which are asic config specific)
AMD_GFX6_LINEAR_ALIGNED_64B
AMD_GFX6_LINEAR_ALIGNED_256B
AMD_GFX6_LINEAR_ALIGNED_512B
AMD_GFX6_1D_THIN_DISPLAY
AMD_GFX6_1D_THIN_DEPTH
AMD_GFX6_1D_THIN_ROTATED
AMD_GFX6_1D_THIN_THIN
AMD_GFX6_1D_THIN_THICK
AMD_GFX6_2D_1D_THIN_DISPLAY_PIPE_CONFIG_P2_TILE_SPLIT_64B_SAMPLE_SPLIT_1_NUM_BANKS_2_BANK_WIDTH_1_BANK_HEIGHT_1_MACRO_TILE_ASPECT_1
AMD_GFX6_2D_1D_THIN_DEPTH_PIPE_CONFIG_P2_TILE_SPLIT_64B_SAMPLE_SPLIT_1_NUM_BANKS_2_BANK_WIDTH_1_BANK_HEIGHT_1_MACRO_TILE_ASPECT_1
AMD_GFX6_2D_1D_THIN_ROTATED_PIPE_CONFIG_P2_TILE_SPLIT_64B_SAMPLE_SPLIT_1_NUM_BANKS_2_BANK_WIDTH_1_BANK_HEIGHT_1_MACRO_TILE_ASPECT_1
AMD_GFX6_2D_1D_THIN_THIN_PIPE_CONFIG_P2_TILE_SPLIT_64B_SAMPLE_SPLIT_1_NUM_BANKS_2_BANK_WIDTH_1_BANK_HEIGHT_1_MACRO_TILE_ASPECT_1
AMD_GFX6_2D_1D_THIN_THICK_PIPE_CONFIG_P2_TILE_SPLIT_64B_SAMPLE_SPLIT_1_NUM_BANKS_2_BANK_WIDTH_1_BANK_HEIGHT_1_MACRO_TILE_ASPECT_1
AMD_GFX6_2D_1D_THIN_DISPLAY_PIPE_CONFIG_P4_8x16_TILE_SPLIT_64B_SAMPLE_SPLIT_1_NUM_BANKS_2_BANK_WIDTH_1_BANK_HEIGHT_1_MACRO_TILE_ASPECT_1
AMD_GFX6_2D_1D_THIN_DEPTH_PIPE_CONFIG_P4_8x16_TILE_SPLIT_64B_SAMPLE_SPLIT_1_NUM_BANKS_2_BANK_WIDTH_1_BANK_HEIGHT_1_MACRO_TILE_ASPECT_1
AMD_GFX6_2D_1D_THIN_ROTATED_PIPE_CONFIG_P4_8x16_TILE_SPLIT_64B_SAMPLE_SPLIT_1_NUM_BANKS_2_BANK_WIDTH_1_BANK_HEIGHT_1_MACRO_TILE_ASPECT_1
AMD_GFX6_2D_1D_THIN_THIN_PIPE_CONFIG_P4_8x16_TILE_SPLIT_64B_SAMPLE_SPLIT_1_NUM_BANKS_2_BANK_WIDTH_1_BANK_HEIGHT_1_MACRO_TILE_ASPECT_1
AMD_GFX6_2D_1D_THIN_THICK_PIPE_CONFIG_P4_8x16_TILE_SPLIT_64B_SAMPLE_SPLIT_1_NUM_BANKS_2_BANK_WIDTH_1_BANK_HEIGHT_1_MACRO_TILE_ASPECT_1
Post by Alex Deucher
etc.
We probably only need 40 bits to encode all of the tiling parameters,
so we could do family plus tiling encoding, but that still seems unwieldy
to deal with from an application perspective. All of the parameters
affect the alignment requirements.
We discussed this earlier in the thread, here's what I said:

Another point here is that the modifier doesn't need to encode all the
things you have to communicate to the HW. For a given width, height, format,
compression type and maybe a few other high-level parameters, I'm skeptical
that the remaining tile parameters aren't just mechanically derivable using
a fixed table or formula. So instead of thinking of the modifiers as
something you can just memcpy into a state packet, it identifies a family
of configurations - enough information to deterministically derive the full
exact configuration. The formula may change, for example for different
hardware or if it's determined to not be optimal, and in that case, we can
use a new modifier to represent the new formula.
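A minimal sketch of the "family of configurations" idea, assuming a fixed lookup table: the modifier only names the family, and the driver derives the exact tile parameters from the modifier plus the format. The modifier values, struct fields, and table entries are invented for illustration, and the lookup keys on bits per pixel only, where a real derivation would presumably also consider width, height and compression type.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical full hardware configuration derived from a modifier. */
struct tile_config {
    unsigned tile_split_bytes;
    unsigned num_banks;
    unsigned bank_width;
    unsigned bank_height;
};

/* Hypothetical modifier values; V2 stands for a revised formula. */
#define EXAMPLE_MOD_2D_DISPLAY    0x1ULL
#define EXAMPLE_MOD_2D_DISPLAY_V2 0x2ULL

struct derive_rule {
    uint64_t modifier;
    unsigned bpp;
    struct tile_config cfg;
};

/* Fixed table: (modifier, bpp) determines the exact configuration. */
static const struct derive_rule derive_rules[] = {
    { EXAMPLE_MOD_2D_DISPLAY,    16, {  64, 2, 1, 1 } },
    { EXAMPLE_MOD_2D_DISPLAY,    32, { 128, 4, 1, 1 } },
    { EXAMPLE_MOD_2D_DISPLAY_V2, 32, { 256, 4, 1, 2 } },
};

/* Returns the derived configuration, or NULL if the pair is unsupported. */
static const struct tile_config *derive_config(uint64_t modifier, unsigned bpp)
{
    size_t i;

    assert(bpp > 0);
    for (i = 0; i < sizeof(derive_rules) / sizeof(derive_rules[0]); i++) {
        if (derive_rules[i].modifier == modifier &&
            derive_rules[i].bpp == bpp)
            return &derive_rules[i].cfg;
    }
    return NULL;
}
```

Because the table is fixed, both sides of a buffer share arrive at the same configuration from the same modifier and format, and a changed formula gets a new modifier value instead of silently changing the meaning of the old one.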

Kristian
Post by Alex Deucher
Alex
Bas Nieuwenhuizen
2018-02-22 18:49:08 UTC
Permalink
Post by Kristian Kristensen
Post by Alex Deucher
Post by Chad Versace
On Thu, Dec 21, 2017 at 12:22 AM, Kristian Kristensen <
On Wed, Dec 20, 2017 at 12:41 PM, Miguel Angel Vico <
On Wed, 20 Dec 2017 11:54:10 -0800 Kristian Høgsberg <
Post by Kristian Høgsberg
I'd like to see concrete examples of actual display controllers
supporting more format layouts than what can be specified with a 64
bit modifier.
The main problem is our tiling and other metadata parameters can't
generally fit in a modifier, so we find passing a blob of metadata a
more suitable mechanism.
I understand that you may have n knobs with a total of more than
56 bits that configure your tiling/swizzling for color buffers. What I don't
buy is that you need all those combinations when passing buffers around
between codecs, cameras and display controllers. Even if you're sharing
between the same 3D drivers in different processes, I expect just locking
down, say, 64 different combinations (you can add more over time) and
assigning each a modifier would be sufficient. I doubt you'd extract
meaningful performance gains from going all the way to a blob.
I agree with Kristian above. In my opinion, choosing to encode in
modifiers a precise description of every possible tiling/compression
layout is not technically incorrect, but I believe it misses the point.
The intention behind modifiers is not to exhaustively describe all
possibilities.
I summarized this opinion in VK_EXT_image_drm_format_modifier,
One goal of modifiers in the Linux ecosystem is to enumerate for each
vendor a reasonably sized set of tiling formats that are appropriate for
images shared across processes, APIs, and/or devices, where each
participating component may possibly be from different vendors.
A non-goal is to enumerate all tiling formats supported by all vendors.
Some tiling formats used internally by vendors are inappropriate for
sharing; no modifiers should be assigned to such tiling formats.
Where it gets tricky is how to select that subset. Our tiling modes
are defined more by the asic specific constraints than the tiling mode
itself. At a high level we have basically 3 tiling modes (out of 16
possible) that would be the minimum we'd want to expose for gfx6-8.
gfx9 uses a completely new scheme.
1. Linear (per asic stride requirements, not usable by many hw blocks)
2. 1D Thin (5 layouts, displayable, depth, thin, rotated, thick)
3. 2D Thin (1D tiling constraints, plus pipe config (18 possible),
tile split (7 possible), sample split (4 possible), num banks (4
possible), bank width (4 possible), bank height (4 possible), macro
tile aspect (4 possible) all of which are asic config specific)
AMD_GFX6_LINEAR_ALIGNED_64B
AMD_GFX6_LINEAR_ALIGNED_256B
AMD_GFX6_LINEAR_ALIGNED_512B
AMD_GFX6_1D_THIN_DISPLAY
AMD_GFX6_1D_THIN_DEPTH
AMD_GFX6_1D_THIN_ROTATED
AMD_GFX6_1D_THIN_THIN
AMD_GFX6_1D_THIN_THICK
AMD_GFX6_2D_1D_THIN_DISPLAY_PIPE_CONFIG_P2_TILE_SPLIT_64B_SAMPLE_SPLIT_1_NUM_BANKS_2_BANK_WIDTH_1_BANK_HEIGHT_1_MACRO_TILE_ASPECT_1
AMD_GFX6_2D_1D_THIN_DEPTH_PIPE_CONFIG_P2_TILE_SPLIT_64B_SAMPLE_SPLIT_1_NUM_BANKS_2_BANK_WIDTH_1_BANK_HEIGHT_1_MACRO_TILE_ASPECT_1
AMD_GFX6_2D_1D_THIN_ROTATED_PIPE_CONFIG_P2_TILE_SPLIT_64B_SAMPLE_SPLIT_1_NUM_BANKS_2_BANK_WIDTH_1_BANK_HEIGHT_1_MACRO_TILE_ASPECT_1
AMD_GFX6_2D_1D_THIN_THIN_PIPE_CONFIG_P2_TILE_SPLIT_64B_SAMPLE_SPLIT_1_NUM_BANKS_2_BANK_WIDTH_1_BANK_HEIGHT_1_MACRO_TILE_ASPECT_1
AMD_GFX6_2D_1D_THIN_THICK_PIPE_CONFIG_P2_TILE_SPLIT_64B_SAMPLE_SPLIT_1_NUM_BANKS_2_BANK_WIDTH_1_BANK_HEIGHT_1_MACRO_TILE_ASPECT_1
AMD_GFX6_2D_1D_THIN_DISPLAY_PIPE_CONFIG_P4_8x16_TILE_SPLIT_64B_SAMPLE_SPLIT_1_NUM_BANKS_2_BANK_WIDTH_1_BANK_HEIGHT_1_MACRO_TILE_ASPECT_1
AMD_GFX6_2D_1D_THIN_DEPTH_PIPE_CONFIG_P4_8x16_TILE_SPLIT_64B_SAMPLE_SPLIT_1_NUM_BANKS_2_BANK_WIDTH_1_BANK_HEIGHT_1_MACRO_TILE_ASPECT_1
AMD_GFX6_2D_1D_THIN_ROTATED_PIPE_CONFIG_P4_8x16_TILE_SPLIT_64B_SAMPLE_SPLIT_1_NUM_BANKS_2_BANK_WIDTH_1_BANK_HEIGHT_1_MACRO_TILE_ASPECT_1
AMD_GFX6_2D_1D_THIN_THIN_PIPE_CONFIG_P4_8x16_TILE_SPLIT_64B_SAMPLE_SPLIT_1_NUM_BANKS_2_BANK_WIDTH_1_BANK_HEIGHT_1_MACRO_TILE_ASPECT_1
AMD_GFX6_2D_1D_THIN_THICK_PIPE_CONFIG_P4_8x16_TILE_SPLIT_64B_SAMPLE_SPLIT_1_NUM_BANKS_2_BANK_WIDTH_1_BANK_HEIGHT_1_MACRO_TILE_ASPECT_1
etc.
We probably only need 40 bits to encode all of the tiling parameters,
so we could do family plus tiling encoding, but that still seems unwieldy
to deal with from an application perspective. All of the parameters
affect the alignment requirements.
Another point here is that the modifier doesn't need to encode all the
things you have to communicate to the HW. For a given width, height, format,
compression type and maybe a few other high-level parameters, I'm skeptical
that the remaining tile parameters aren't just mechanically derivable using
a fixed table or formula. So instead of thinking of the modifier as
something you can just memcpy into a state packet, think of it as
identifying a family of configurations: enough information to
deterministically derive the full exact configuration. The formula may
change, for example for different hardware or if it's determined to not be
optimal, and in that case, we can use a new modifier to represent the new
formula.
I think this is not so much about being able to dump it in a state
packet, but about sharing between different GPUs of AMD. We have
basically only a few interesting tiling modes if you look at a single
GPU, but checking if those are equal depends on the other bits which
may or may not be different per chip for the same conceptual tiling
mode. We could just put a chip identifier in, but that would preclude
any sharing while I think we can do some.

- Bas
_______________________________________________
dri-devel mailing list
https://lists.freedesktop.org/mailman/listinfo/dri-devel
Alex Deucher
2018-02-22 21:16:52 UTC
Permalink
On Thu, Feb 22, 2018 at 1:49 PM, Bas Nieuwenhuizen
Post by Bas Nieuwenhuizen
Post by Kristian Kristensen
Post by Alex Deucher
Post by Chad Versace
On Thu, Dec 21, 2017 at 12:22 AM, Kristian Kristensen <
On Wed, Dec 20, 2017 at 12:41 PM, Miguel Angel Vico <
On Wed, 20 Dec 2017 11:54:10 -0800 Kristian Høgsberg <
Post by Kristian Høgsberg
I'd like to see concrete examples of actual display controllers
supporting more format layouts than what can be specified with a 64
bit modifier.
The main problem is our tiling and other metadata parameters can't
generally fit in a modifier, so we find passing a blob of metadata a
more suitable mechanism.
I understand that you may have n knobs with a total of more than 56
bits that configure your tiling/swizzling for color buffers. What I
don't buy is that you need all those combinations when passing buffers
around between codecs, cameras and display controllers. Even if you're
sharing between the same 3D drivers in different processes, I expect
just locking down, say, 64 different combinations (you can add more
over time) and assigning each a modifier would be sufficient. I doubt
you'd extract meaningful performance gains from going all the way to a
blob.
I agree with Kristian above. In my opinion, choosing to encode in
modifiers a precise description of every possible tiling/compression
layout is not technically incorrect, but I believe it misses the point.
The intention behind modifiers is not to exhaustively describe all
possibilities.
I summarized this opinion in VK_EXT_image_drm_format_modifier:
One goal of modifiers in the Linux ecosystem is to enumerate for each
vendor a reasonably sized set of tiling formats that are appropriate
for images shared across processes, APIs, and/or devices, where each
participating component may possibly be from different vendors.
A non-goal is to enumerate all tiling formats supported by all vendors.
Some tiling formats used internally by vendors are inappropriate for
sharing; no modifiers should be assigned to such tiling formats.
Where it gets tricky is how to select that subset. Our tiling modes
are defined more by the asic specific constraints than the tiling mode
itself. At a high level we have basically 3 tiling modes (out of 16
possible) that would be the minimum we'd want to expose for gfx6-8.
gfx9 uses a completely new scheme.
1. Linear (per asic stride requirements, not usable by many hw blocks)
2. 1D Thin (5 layouts: displayable, depth, thin, rotated, thick)
3. 2D Thin (1D tiling constraints, plus pipe config (18 possible),
tile split (7 possible), sample split (4 possible), num banks (4
possible), bank width (4 possible), bank height (4 possible), macro
tile aspect (4 possible), all of which are asic config specific)
AMD_GFX6_LINEAR_ALIGNED_64B
AMD_GFX6_LINEAR_ALIGNED_256B
AMD_GFX6_LINEAR_ALIGNED_512B
AMD_GFX6_1D_THIN_DISPLAY
AMD_GFX6_1D_THIN_DEPTH
AMD_GFX6_1D_THIN_ROTATED
AMD_GFX6_1D_THIN_THIN
AMD_GFX6_1D_THIN_THICK
AMD_GFX6_2D_1D_THIN_DISPLAY_PIPE_CONFIG_P2_TILE_SPLIT_64B_SAMPLE_SPLIT_1_NUM_BANKS_2_BANK_WIDTH_1_BANK_HEIGHT_1_MACRO_TILE_ASPECT_1
AMD_GFX6_2D_1D_THIN_DEPTH_PIPE_CONFIG_P2_TILE_SPLIT_64B_SAMPLE_SPLIT_1_NUM_BANKS_2_BANK_WIDTH_1_BANK_HEIGHT_1_MACRO_TILE_ASPECT_1
AMD_GFX6_2D_1D_THIN_ROTATED_PIPE_CONFIG_P2_TILE_SPLIT_64B_SAMPLE_SPLIT_1_NUM_BANKS_2_BANK_WIDTH_1_BANK_HEIGHT_1_MACRO_TILE_ASPECT_1
AMD_GFX6_2D_1D_THIN_THIN_PIPE_CONFIG_P2_TILE_SPLIT_64B_SAMPLE_SPLIT_1_NUM_BANKS_2_BANK_WIDTH_1_BANK_HEIGHT_1_MACRO_TILE_ASPECT_1
AMD_GFX6_2D_1D_THIN_THICK_PIPE_CONFIG_P2_TILE_SPLIT_64B_SAMPLE_SPLIT_1_NUM_BANKS_2_BANK_WIDTH_1_BANK_HEIGHT_1_MACRO_TILE_ASPECT_1
AMD_GFX6_2D_1D_THIN_DISPLAY_PIPE_CONFIG_P4_8x16_TILE_SPLIT_64B_SAMPLE_SPLIT_1_NUM_BANKS_2_BANK_WIDTH_1_BANK_HEIGHT_1_MACRO_TILE_ASPECT_1
AMD_GFX6_2D_1D_THIN_DEPTH_PIPE_CONFIG_P4_8x16_TILE_SPLIT_64B_SAMPLE_SPLIT_1_NUM_BANKS_2_BANK_WIDTH_1_BANK_HEIGHT_1_MACRO_TILE_ASPECT_1
AMD_GFX6_2D_1D_THIN_ROTATED_PIPE_CONFIG_P4_8x16_TILE_SPLIT_64B_SAMPLE_SPLIT_1_NUM_BANKS_2_BANK_WIDTH_1_BANK_HEIGHT_1_MACRO_TILE_ASPECT_1
AMD_GFX6_2D_1D_THIN_THIN_PIPE_CONFIG_P4_8x16_TILE_SPLIT_64B_SAMPLE_SPLIT_1_NUM_BANKS_2_BANK_WIDTH_1_BANK_HEIGHT_1_MACRO_TILE_ASPECT_1
AMD_GFX6_2D_1D_THIN_THICK_PIPE_CONFIG_P4_8x16_TILE_SPLIT_64B_SAMPLE_SPLIT_1_NUM_BANKS_2_BANK_WIDTH_1_BANK_HEIGHT_1_MACRO_TILE_ASPECT_1
etc.
We probably only need 40 bits to encode all of the tiling parameters,
so we could do family plus tiling encoding, but that still seems unwieldy
to deal with from an application perspective. All of the parameters
affect the alignment requirements.
Another point here is that the modifier doesn't need to encode all the
things you have to communicate to the HW. For a given width, height, format,
compression type and maybe a few other high-level parameters, I'm skeptical
that the remaining tile parameters aren't just mechanically derivable using
a fixed table or formula. So instead of thinking of the modifier as
something you can just memcpy into a state packet, think of it as
identifying a family of configurations: enough information to
deterministically derive the full exact configuration. The formula may
change, for example for different hardware or if it's determined to not be
optimal, and in that case, we can use a new modifier to represent the new
formula.
I think this is not so much about being able to dump it in a state
packet, but about sharing between different GPUs of AMD. We have
basically only a few interesting tiling modes if you look at a single
GPU, but checking if those are equal depends on the other bits which
may or may not be different per chip for the same conceptual tiling
mode. We could just put a chip identifier in, but that would preclude
any sharing while I think we can do some.
Right. And the 2D ones, while they are the most complicated, are also
the most interesting from a performance perspective so ideally you'd
find a match on one of those. If you don't expose the 2D modes,
there's not much point in supporting modifiers at all.
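
To put rough numbers on the bit budget discussed above, here is a minimal
sketch that packs the parameters listed in the quoted text into the 56
vendor-defined bits of a 64-bit modifier. The struct, the field order and
the function names are hypothetical and do not come from any driver; only
the vendor/value split (8 bits / 56 bits) and the AMD vendor id 0x02 follow
drm_fourcc.h. Each field holds an index into that parameter's set of
options, and the widths are sized from the counts quoted in the thread.

```c
#include <stdint.h>

/* Same split as fourcc_mod_code() in drm_fourcc.h: vendor id in the top
 * 8 bits, 56 vendor-defined bits below. 0x02 is the AMD vendor id. */
#define MOD_VENDOR_AMD  0x02ULL
#define MOD_VALUE_MASK  0x00ffffffffffffffULL

/* Hypothetical parameter set; widths come from the counts quoted in the
 * thread, not from any real register layout. */
struct gfx6_tiling {
    uint32_t array_mode;        /* 16 possible -> 4 bits */
    uint32_t micro_tile_mode;   /*  5 possible -> 3 bits */
    uint32_t pipe_config;       /* 18 possible -> 5 bits */
    uint32_t tile_split;        /*  7 possible -> 3 bits */
    uint32_t sample_split;      /*  4 possible -> 2 bits */
    uint32_t num_banks;         /*  4 possible -> 2 bits */
    uint32_t bank_width;        /*  4 possible -> 2 bits */
    uint32_t bank_height;       /*  4 possible -> 2 bits */
    uint32_t macro_tile_aspect; /*  4 possible -> 2 bits */
};

/* Append the low `bits` bits of `value` to `acc` at position *shift. */
static uint64_t put_field(uint64_t acc, unsigned *shift,
                          uint32_t value, unsigned bits)
{
    acc |= ((uint64_t)value & ((1ULL << bits) - 1)) << *shift;
    *shift += bits;
    return acc;
}

/* Pack all fields (25 bits in total) and add the vendor id on top. */
uint64_t gfx6_pack_modifier(const struct gfx6_tiling *t)
{
    unsigned shift = 0;
    uint64_t v = 0;

    v = put_field(v, &shift, t->array_mode, 4);
    v = put_field(v, &shift, t->micro_tile_mode, 3);
    v = put_field(v, &shift, t->pipe_config, 5);
    v = put_field(v, &shift, t->tile_split, 3);
    v = put_field(v, &shift, t->sample_split, 2);
    v = put_field(v, &shift, t->num_banks, 2);
    v = put_field(v, &shift, t->bank_width, 2);
    v = put_field(v, &shift, t->bank_height, 2);
    v = put_field(v, &shift, t->macro_tile_aspect, 2);

    return (MOD_VENDOR_AMD << 56) | (v & MOD_VALUE_MASK);
}
```

Under this packing the listed parameters take 25 bits, so the raw values
fit. The difficulty raised in the thread is a different one: whether two
chips that produce different values can still share a buffer.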

Alex
James Jones
2018-02-27 06:10:36 UTC
Permalink
Post by Alex Deucher
Right. And the 2D ones, while they are the most complicated, are also
the most interesting from a performance perspective so ideally you'd
find a match on one of those. If you don't expose the 2D modes,
there's not much point in supporting modifiers at all.
This is essentially the problem I keep running into when trying to work
up something based on the suggestions here as well. Yes, for a given
build of our driver on a single device, we can re-derive exactly the
same tiling parameters given a few manageable constraints. That was the
essence of the design of the Vulkan external objects framework, and it
comes with all the limitations I'm trying to avoid by introducing the
more complex allocator framework:

- We want to share across GPUs.

- We potentially want to share across non-version-locked driver
components, even potentially between Nouveau-driven/Tegra-DRM-driven
GPUs and GPUs driven by the NVIDIA proprietary driver. There's no way we
can ensure the drivers use the same algorithm there.

Taking it further than even I would like to, in a discussion over DRM
format modifier usage in Vulkan, it was recently proposed that DRM
format modifiers be used to serialize data in a pre-tiled format. I
personally don't think DRM format modifiers should be used for this at
all, but something like extended allocator meta-data might be appropriate.

At this point I've heard engineers from Intel, AMD, and of course myself
at NVIDIA saying that while DRM format modifiers solve many more cases
than assuming pitch-linear or doing magic to pass around metadata, they
don't solve all the cases necessary to make optimal use of any of our HW
in at least some interesting cases. Hence it seems reasonable to
continue to improve the design of these mechanisms.

Responding to some earlier points that fell off my mail retention limit:
Post by Alex Deucher
I understand that it's an incomplete example, but even so I don't think
this duplication is feasible. It's not a matter of how many use cases we
have to duplicate at this point in time, it's that all these APIs are live,
evolving APIs and keeping the allocator up to date as various APIs grow new
corner cases doesn't seem practical. Further, it's not orthogonal or
composable - the allocator has to know about all producers and consumers
and if I add a new piece of hardware I have to extend the allocator to
understand its new use cases. With the modifier model, I just ask the new
driver which modifiers it supports for the use case I'm interested in and
feed those modifiers to the allocator.
There are currently 3 complete modern low-level 3D graphics APIs along
with some slightly longer in the tooth higher-level alternatives being
actively maintained at more or less the same feature level, countless
video decode/encode APIs with more or less equivalent functionality, and
more mode setting APIs than anyone wants. If that much total duplicated
effort is possible, it seems feasible to maintain a list of layouts and
related properties, most of which will see some re-use between all these
APIs.

Further, the central library doesn't need to be burdened by all of these
use cases unless they become cross-vendor. The usage itself is
vendor-extensible, so if AMD had wanted to add a bunch of Mantle-only
usage bits, they could have done so without cluttering the shared
library code or namespace.
Post by Alex Deucher
Vulkan isn't expected to know about video encode usage. You ask the video
codec about supported modifiers for encode and you ask Vulkan for supported
modifiers for, say optimal render usage. The allocator determines the
optimal lowest common denominator and allocates the buffer. Maybe that's
linear, or if you've designed both parts, maybe there's a simple shared
tiled format that the encoder can source from.
It was determined early on in attempts to design this mechanism that
such LCD intersection doesn't produce the optimal result. Only
considering the usage holistically can produce optimal layouts.
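
For reference, the intersection model described in the quoted text can be
sketched as below. This is only an illustration under stated assumptions:
the function name is hypothetical, each driver is assumed to report a flat
list of 64-bit modifiers, and the first list is treated as the preference
order. Because the only inputs are two flat lists, the result cannot
reflect how the usages interact, which is the limitation raised in the
reply above.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: copy into `out` every modifier that appears in
 * both `a` and `b`, keeping the order of `a`. At most `out_cap` entries
 * are written. Returns the number of entries written.
 */
size_t intersect_modifiers(const uint64_t *a, size_t na,
                           const uint64_t *b, size_t nb,
                           uint64_t *out, size_t out_cap)
{
    size_t n = 0;

    for (size_t i = 0; i < na && n < out_cap; i++) {
        for (size_t j = 0; j < nb; j++) {
            if (a[i] == b[j]) {
                out[n++] = a[i];
                break; /* do not emit the same entry of `a` twice */
            }
        }
    }
    return n;
}
```

A caller would take the first entry of the result as the layout to
allocate, falling back to linear (modifier value 0 in drm_fourcc.h) if
that is the only common entry.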
Post by Alex Deucher
For modifiers and liballocator as well, the metadata is copied by value
(and passed through IPC) and as such can't model shared mutable
information. That means fast clear colors, compression aux buffers and
such have to be in a shared BO plane.
Again, this is making large design assumptions. Fast clear color data,
for example, would be a very reasonable thing to include in static
metadata given our driver+HW architecture.

Thanks,
-James
_______________________________________________
mesa-dev mailing list
https://lists.freedesktop.org/mailman/listinfo/mesa-dev
Daniel Vetter
2018-03-07 17:23:39 UTC
Permalink
Post by Alex Deucher
Right. And the 2D ones, while they are the most complicated, are also
the most interesting from a performance perspective so ideally you'd
find a match on one of those. If you don't expose the 2D modes,
there's not much point in supporting modifiers at all.
1. Make sure you have a test farm covering all your use cases and hw.

2. Create a struct that encodes everything. Make it a few kb big if it has
to be, whatever it takes.

3. Do a little library that contains a huge table mapping modifiers to
these structs, and one function that returns you the unique modifier for
the given tiling layout description struct. We can have that in the kernel
sources, or just delegate the entire AMD modifier block to some userspace
library you're managing (with just the few modifiers the kernel needs in
the uapi/drm_fourcc.h header). If the lib doesn't find the modifier, make
it crash with a nice loud backtrace.

4. Add modifiers to that lib until you stop failing on the test farm.

5. Optional: Make the lib faster with hashing/compressing/whatever if it
turns out to be a bottleneck somewhere. Since you'll only ever need it on
import/export, add a small cache with the relevant few entries for the
device instance at hand and I don't expect this will be a problem, ever.
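
Steps 2 and 3 above could look roughly like the following sketch. All of
it is hypothetical: the struct fields, the table entries and the modifier
values are made up for illustration and are not taken from drm_fourcc.h or
any AMD code. The lookup is a linear scan; step 5 would replace it with
something faster if needed.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical "struct that encodes everything" (step 2), cut down to
 * three fields to keep the sketch short. */
struct tiling_desc {
    uint32_t array_mode;
    uint32_t pipe_config;
    uint32_t num_banks;
};

struct mod_entry {
    uint64_t modifier;
    struct tiling_desc desc;
};

/* Hypothetical table (step 3); grown until the test farm passes (step 4). */
static const struct mod_entry mod_table[] = {
    { 0x0200000000000001ULL, { 1, 0, 0 } },
    { 0x0200000000000002ULL, { 2, 0, 0 } },
    { 0x0200000000000003ULL, { 4, 2, 1 } },
};
#define MOD_TABLE_LEN (sizeof(mod_table) / sizeof(mod_table[0]))

static int desc_equal(const struct tiling_desc *a, const struct tiling_desc *b)
{
    return a->array_mode == b->array_mode &&
           a->pipe_config == b->pipe_config &&
           a->num_banks == b->num_banks;
}

/* Returns 1 and stores the modifier in *out if the layout is known. */
int tiling_find_modifier(const struct tiling_desc *d, uint64_t *out)
{
    for (size_t i = 0; i < MOD_TABLE_LEN; i++) {
        if (desc_equal(d, &mod_table[i].desc)) {
            *out = mod_table[i].modifier;
            return 1;
        }
    }
    return 0;
}

/* Export path: fail loudly on an unknown layout, as step 3 suggests. */
uint64_t tiling_to_modifier(const struct tiling_desc *d)
{
    uint64_t mod;

    if (!tiling_find_modifier(d, &mod)) {
        fprintf(stderr, "no modifier for array_mode=%u pipe_config=%u num_banks=%u\n",
                (unsigned)d->array_mode, (unsigned)d->pipe_config,
                (unsigned)d->num_banks);
        abort();
    }
    return mod;
}

/* Import path: returns NULL if the modifier is not in the table. */
const struct tiling_desc *modifier_to_tiling(uint64_t modifier)
{
    for (size_t i = 0; i < MOD_TABLE_LEN; i++) {
        if (mod_table[i].modifier == modifier)
            return &mod_table[i].desc;
    }
    return NULL;
}
```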

I'm pretty sure you'll finish step 4 before you run out of modifiers. If
you don't, then we suck it up, admit sheepishly that modifiers turned out
to be a stupid idea and rev the kernel's uapi. We know how to do that, but
I also don't want to rev uapi just for fun.

Cheers, Daniel
--
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch
Eric Anholt
2018-02-22 19:21:34 UTC
Permalink
Post by Kristian Kristensen
Another point here is that the modifier doesn't need to encode all the
things you have to communicate to the HW. For a given width, height, format,
compression type and maybe a few other high-level parameters, I'm skeptical
that the remaining tile parameters aren't just mechanically derivable using
a fixed table or formula. So instead of thinking of the modifier as
something you can just memcpy into a state packet, think of it as
identifying a family of configurations: enough information to
deterministically derive the full exact configuration. The formula may
change, for example for different hardware or if it's determined to not be
optimal, and in that case, we can use a new modifier to represent the new
formula.
Agreed. For Broadcom's VC5+ stuff, our tiling layout depends on the
number of SDRAM banks and bank size, but all users of buffers will know
what those are, so I'm not planning on including those in the modifier.
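To make the "family of configurations" idea concrete, here is a minimal sketch in C. Everything in it is an assumption made for illustration: the family names, the struct, and the derivation rules are invented and do not correspond to any real hardware or to existing DRM modifier values. It only shows the shape of the argument: the shared value names a family, and each driver derives the exact parameters from the family plus the surface's high-level properties.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical layout families; these are not real DRM modifier values. */
enum example_family { EX_LINEAR = 0, EX_TILED_V1 = 1 };

/* The "full exact configuration" a driver would program into the HW. */
struct example_tile_config {
    uint32_t tile_width;   /* in pixels */
    uint32_t tile_height;  /* in rows */
    uint32_t pitch_align;  /* in bytes */
};

/*
 * Derive the exact configuration from the family plus high-level surface
 * parameters. The rules below are made up; a real driver would use its own
 * fixed table or formula. Returns 0 on success, -1 for an unknown family.
 */
int example_derive_config(enum example_family family, uint32_t width,
                          uint32_t bytes_per_pixel,
                          struct example_tile_config *out)
{
    assert(out != NULL);

    switch (family) {
    case EX_LINEAR:
        out->tile_width = width;
        out->tile_height = 1;
        out->pitch_align = 64;
        return 0;
    case EX_TILED_V1:
        /* Invented rule: wide surfaces get wider tiles. */
        out->tile_width = (width >= 1024) ? 64 : 32;
        out->tile_height = (bytes_per_pixel >= 8) ? 4 : 8;
        out->pitch_align = out->tile_width * bytes_per_pixel;
        return 0;
    }
    return -1;
}
```

If the formula ever needs to change, the sketch would gain a new family value (say, a hypothetical EX_TILED_V2) rather than changing what EX_TILED_V1 means, which mirrors the "use a new modifier for the new formula" suggestion above.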
Daniel Stone
2017-12-20 21:58:51 UTC
Permalink
Hi Miguel,
Post by Miguel Angel Vico
In the meantime, I've been working on putting together an open source
implementation of the allocator mechanisms using the Nouveau driver for
all to be able to play with.
Thanks for taking a look at this! I'm still winding out my to-do list
for the year, but hoping to get to this more seriously soon.

As a general comment, now that modifiers are a first-class concept in
many places (KMS FBs, KMS plane format advertisement, V4L2 buffers,
EGL/Vulkan image import/export, Wayland buffer import, etc), I'd like
to see them included as a first-class concept in the allocator. I
understand one of the primary reservations against using them was that
QNX didn't have such a concept, but just specifying them to be ignored
on non-Linux platforms would probably work fine.
Post by Miguel Angel Vico
Another of the missing pieces before we can move this to production is
importing allocations to DRM FB objects. This is probably one of the
most sensitive parts of the project as it requires modification/addition
of kernel driver interfaces.
At XDC2017, James had several hallway conversations with several people
about this, all having different opinions. I'd like to take this
opportunity to also start a discussion about what's the best option to
create a path to get allocator allocations added as DRM FB objects.
A) Have vendor-private ioctls to set properties on GEM objects that
are inherited by the FB objects. This is how our (NVIDIA) desktop
DRM driver currently works. This would require every vendor to add
their own ioctl to process allocator metadata, but the metadata is
actually a vendor-agnostic object more like DRM modifiers. We'd
like to come up with a vendor-agnostic solution that can be
integrated to core DRM.
This worries me. If the data is static for the lifetime of the buffer
- describing the tiling layout, for instance - then it would form
effective ABI for all the consumers/producers using that buffer type.
If it is dynamic, you also have a world of synchronisation problems
when multiple users race each other with different uses of that buffer
(and presumably you would need to reload the metadata on every use?).
Either way, anyone using this would need to have a very well-developed
compatibility story, given that you can mix and match kernel and
userspace versions.
Post by Miguel Angel Vico
B) Add a new drmModeAddFBWithMetadata() command that takes allocator
metadata blobs for each plane of the FB. Some people in the
community have mentioned this is their preferred design. This,
however, means we'd have to go through the exercise of adding
another metadata mechanism to the whole graphics stack.
Similarly, this seems to be missing either a 'mandatory' flag so
userspace can inform the kernel it must fail if it does not understand
certain capabilities, or a way for the kernel to inform userspace
which capabilities it does/doesn't understand.

The capabilities in the example are also very oddly chosen. Address
alignment, pitch alignment, and maximum pitch are superfluous: the KMS
driver is the single source of truth for these values for FBs, so it
isn't useful for userspace to provide it. Specifically for pitch
alignment and maximum pitch, the pitch values are already given in the
same ioctl, so all you can check with these values (before the driver
does its own check again) is that userspace is self-consistent. These
three capabilities all relate to BO allocation rather than FB
creation: if a BO is rendered into with the wrong pitch, or allocated
at the wrong base address, we've already lost because the allocation
was incorrect.

Did you have some other capabilities in mind which would be more
relevant to FBs?
Post by Miguel Angel Vico
C) Shove allocator metadata into DRM by defining it to be a separate
plane in the image, and using the existing DRM modifiers mechanism
to indicate there is another plane for each "real" plane added. It
isn't clear how this scales to surfaces that already need several
planes, but there are some people that see this as the only way
forward. Also, we would have to create a separate GEM buffer for
the metadata itself, which seems excessive.
I also have my reservations about this one. The general idea behind
FBs is that, if the buffers are identical but for memory addresses and
pixel content, the parameters should be equal but the per-plane buffer
contents different. Conversely, if the buffers differ in any way but
the above, the parameters should be different. For instance, if
buffers have identical layouts (tiling/swizzling/compression),
identical pixel content once interpreted, but the only thing which
differs is the compression status (fully resolved / not resolved), I
would expect to see identical parameters and differing data in the
auxiliary compression plane. We had quite a long discussion for
framebuffer compression on Intel where we shot down the concept of
expressing compression status as a plane property for basically this
reason.

Drivers which were created in the DRI2 era where the only metadata
transited was handle/pitch/format, worked around that by adding
auxiliary data hanging off the buffer to describe their actual layout,
but this is a mistake we're trying to move away from. So I reflexively
get a bit itchy when I see the kernel being used to transit magic
blobs of data which are supplied by userspace, and only interpreted by
different userspace. Having tiling formats hidden away means that
we've had real-world bugs in AMD hardware, where we end up displaying
garbage because we cannot generically reason about the buffer
attributes.

As with Kristian, I'd also like to hear any examples of metadata which
wouldn't fit inside 56 bits. A quick finger count says that if you
have 128 different possibilities for all of: tiling layout, micro-tile
size, macro-tile size, supermacro-tile size, swizzling/addressing
mode, and compression, this only uses 42 of the 56 bits available to
you, still leaving two free 128-value axes. Is your concern about the
lack of space along these axes I've identified, or that you need more
axes, or ... ?

Cheers,
Daniel
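The finger count in the message above can be checked with a short sketch. A DRM format modifier is a 64-bit value whose top 8 bits carry the vendor, leaving a 56-bit payload. The field layout below (six 7-bit axes packed from bit 0 upwards) is purely an assumption for illustration; no vendor is known to use it, and all names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define EX_AXIS_BITS    7u   /* 128 possible values per axis */
#define EX_NUM_AXES     6u   /* tiling, micro, macro, supermacro, swizzle, compression */
#define EX_PAYLOAD_BITS 56u  /* 64 bits minus the 8-bit vendor field */
#define EX_AXIS_MASK    ((UINT64_C(1) << EX_AXIS_BITS) - 1)
#define EX_PAYLOAD_MASK ((UINT64_C(1) << EX_PAYLOAD_BITS) - 1)

/* Number of payload bits consumed by the six axes. */
unsigned example_bits_used(void)
{
    return EX_NUM_AXES * EX_AXIS_BITS;
}

/* Pack a vendor id and six 7-bit axis values into one 64-bit value. */
uint64_t example_pack_modifier(uint8_t vendor, const uint8_t axes[EX_NUM_AXES])
{
    uint64_t payload = 0;

    for (unsigned i = 0; i < EX_NUM_AXES; i++) {
        assert(axes[i] <= EX_AXIS_MASK);
        payload |= (uint64_t)axes[i] << (i * EX_AXIS_BITS);
    }
    return ((uint64_t)vendor << EX_PAYLOAD_BITS) | (payload & EX_PAYLOAD_MASK);
}

/* Recover one axis value from a packed modifier. */
uint8_t example_unpack_axis(uint64_t modifier, unsigned axis)
{
    assert(axis < EX_NUM_AXES);
    return (uint8_t)((modifier >> (axis * EX_AXIS_BITS)) & EX_AXIS_MASK);
}
```

Six axes at 7 bits each use 42 bits, so 14 payload bits remain, which is room for two more 128-value axes.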
James Jones
2017-12-21 08:06:36 UTC
Permalink
Post by Daniel Stone
Hi Miguel,
Post by Miguel Angel Vico
In the meantime, I've been working on putting together an open source
implementation of the allocator mechanisms using the Nouveau driver for
all to be able to play with.
Thanks for taking a look at this! I'm still winding out my to-do list
for the year, but hoping to get to this more seriously soon.
As a general comment, now that modifiers are a first-class concept in
many places (KMS FBs, KMS plane format advertisement, V4L2 buffers,
EGL/Vulkan image import/export, Wayland buffer import, etc), I'd like
to see them included as a first-class concept in the allocator. I
understand one of the primary reservations against using them was that
QNX didn't have such a concept, but just specifying them to be ignored
on non-Linux platforms would probably work fine.
The allocator mechanisms and format modifiers are orthogonal though.
Either capability sets can be represented using format modifiers (the
direction one part of this thread is suggesting, which I think is a bad
idea), or format modifiers could easily be included as a vendor-agnostic
capability, similar to pitch layout. There are no "First class
citizens" in the allocator mechanism itself. That's the whole idea:
Apps don't need to care about things like how the OS represents its
surface metadata beyond some truly universal things like width and
height (assertions). The rest is abstracted away such that the apps are
portable, even if the drivers/backends aren't. Even if the solution
within Linux is "just use format modifiers", there's still some benefit
to making the kernel ABI use something slightly higher level that
translates to DRM format modifiers inside the kernel, just to keep the
apps OS-agnostic.
Post by Daniel Stone
Post by Miguel Angel Vico
Another of the missing pieces before we can move this to production is
importing allocations to DRM FB objects. This is probably one of the
most sensitive parts of the project as it requires modification/addition
of kernel driver interfaces.
At XDC2017, James had several hallway conversations with several people
about this, all having different opinions. I'd like to take this
opportunity to also start a discussion about what's the best option to
create a path to get allocator allocations added as DRM FB objects.
A) Have vendor-private ioctls to set properties on GEM objects that
are inherited by the FB objects. This is how our (NVIDIA) desktop
DRM driver currently works. This would require every vendor to add
their own ioctl to process allocator metadata, but the metadata is
actually a vendor-agnostic object more like DRM modifiers. We'd
like to come up with a vendor-agnostic solution that can be
integrated to core DRM.
This worries me. If the data is static for the lifetime of the buffer
- describing the tiling layout, for instance - then it would form
effective ABI for all the consumers/producers using that buffer type.
If it is dynamic, you also have a world of synchronisation problems
when multiple users race each other with different uses of that buffer
(and presumably you would need to reload the metadata on every use?).
Either way, anyone using this would need to have a very well-developed
compatibility story, given that you can mix and match kernel and
userspace versions.
I think the metadata is static. The surface meta-state is not, but that
would be a commit time thing if anything, not a GEM or FB object thing.
Still, attaching metadata to GEM objects, which seem to be opaque blobs
of memory in the general case, rather than attaching it to FBs mapped
onto the GEM objects, always felt architecturally wrong to me. You can
have multiple FBs in one GEM object, for example. There's no reason to
assume they would share the same format, let alone tiling layout.
Post by Daniel Stone
Post by Miguel Angel Vico
B) Add a new drmModeAddFBWithMetadata() command that takes allocator
metadata blobs for each plane of the FB. Some people in the
community have mentioned this is their preferred design. This,
however, means we'd have to go through the exercise of adding
another metadata mechanism to the whole graphics stack.
Similarly, this seems to be missing either a 'mandatory' flag so
userspace can inform the kernel it must fail if it does not understand
certain capabilities, or a way for the kernel to inform userspace
which capabilities it does/doesn't understand.
I think that will fall out of the discussion over exactly what
capability sets look like. Regardless, yes, the kernel must fail if it
can't support a given capability set, just as it would fail if it
couldn't support a given DRM Format modifier. Like the format
modifiers, the userspace allocator driver would have queried the DRM
kernel driver when reporting supported capability sets for a usage that
required creating FBs, so it would always be user error to reach such a
state. There would be no ambiguity as to whether a given set or
individual capability was supported by the kernel driver at FB creation
time.
Post by Daniel Stone
The capabilities in the example are also very oddly chosen. Address
alignment, pitch alignment, and maximum pitch are superfluous: the KMS
driver is the single source of truth for these values for FBs, so it
isn't useful for userspace to provide it. Specifically for pitch
alignment and maximum pitch, the pitch values are already given in the
same ioctl, so all you can check with these values (before the driver
does its own check again) is that userspace is self-consistent. These
three capabilities all relate to BO allocation rather than FB
creation: if a BO is rendered into with the wrong pitch, or allocated
at the wrong base address, we've already lost because the allocation
was incorrect.
Did you have some other capabilities in mind which would be more
relevant to FBs?
We should probably create an example using some vendor-specific
capabilities. Tiling parameters are the quintessential vendor-specific
example.
Post by Daniel Stone
Post by Miguel Angel Vico
C) Shove allocator metadata into DRM by defining it to be a separate
plane in the image, and using the existing DRM modifiers mechanism
to indicate there is another plane for each "real" plane added. It
isn't clear how this scales to surfaces that already need several
planes, but there are some people that see this as the only way
forward. Also, we would have to create a separate GEM buffer for
the metadata itself, which seems excessive.
I also have my reservations about this one. The general idea behind
FBs is that, if the buffers are identical but for memory addresses and
pixel content, the parameters should be equal but the per-plane buffer
contents different. Conversely, if the buffers differ in any way but
the above, the parameters should be different. For instance, if
buffers have identical layouts (tiling/swizzling/compression),
identical pixel content once interpreted, but the only thing which
differs is the compression status (fully resolved / not resolved), I
would expect to see identical parameters and differing data in the
auxiliary compression plane. We had quite a long discussion for
framebuffer compression on Intel where we shot down the concept of
expressing compression status as a plane property for basically this
reason.
Actual compression data is read by the GPU though. Specifying metadata
(i.e., a few bits saying whether there is a compression plane and if so,
what its layout is) that is only accessed by kernel drivers and userspace
drivers as an FB plane seems to be stretching the abstraction a bit.
That means you probably have to create a CPU-only GEM buffer to put it
in. Not impossible, just a lot of busy work, and potentially bug-prone:
If previously your kernel driver only supported creating GEM buffers
that were HW accessible, you have to sprinkle a bunch of "But not this
type of GEM buffer" in all your validation code for HW operations on GEM
buffers.
Post by Daniel Stone
Drivers which were created in the DRI2 era where the only metadata
transited was handle/pitch/format, worked around that by adding
auxiliary data hanging off the buffer to describe their actual layout,
but this is a mistake we're trying to move away from. So I reflexively
get a bit itchy when I see the kernel being used to transit magic
blobs of data which are supplied by userspace, and only interpreted by
different userspace. Having tiling formats hidden away means that
we've had real-world bugs in AMD hardware, where we end up displaying
garbage because we cannot generically reason about the buffer
attributes.
Agreed. For those not clear, and to verify my own understanding, the
above paragraph is basically an argument similar to my own above against
using Miguel's option (A), correct?
Post by Daniel Stone
As with Kristian, I'd also like to hear any examples of metadata which
wouldn't fit inside 56 bits. A quick finger count says that if you
have 128 different possibilities for all of: tiling layout, micro-tile
size, macro-tile size, supermacro-tile size, swizzling/addressing
mode, and compression, this only uses 42 of the 56 bits available to
you, still leaving two free 128-value axes. Is your concern about the
lack of space along these axes I've identified, or that you need more
axes, or ... ?
Your worst case analysis above isn't far off from our HW, give or take
some bits and axes here and there. We've started an internal discussion
about how to lay out all the bits we need. It's hard to even enumerate
them all without having a complete understanding of what capability sets
are going to include, a fully-optimized implementation of the mechanism
on our HW, and lots of test scenarios though.

However, making some assumptions, I suspect it's probably going to come
down to: yes, we can fit what we need in some number of bits marginally
less than 56 now, with the current use cases and hardware. But we're
very concerned about extensibility, given that the number has only ever
grown in our HW, is uncomfortably close to the limit if it isn't over it
already, and it's been demonstrated that it takes a monumental effort to
change the mechanism if it isn't extensible. While it's hard to change
the mechanism one more time now, it's better to change it to something
truly extensible now, because it will be much, much harder to make such a
change ~5 years from now in a world where it's baked in to pervasively
deployed Wayland and X protocol, the EGL and Vulkan extensions have been
defined for a few years and are in use by apps besides Wayland, and the
allocator stuff is deployed on ~5 operating systems that have some
derivative version of DRM modifiers to support it, with a bunch of funky
embedded apps using it. Further, we're volunteering to handle the bulk
of the effort needed to make the change now, so I hope architectural
correctness and maintainability can be the primary points of debate.

Thanks,
-James
Post by Daniel Stone
Cheers,
Daniel
_______________________________________________
mesa-dev mailing list
https://lists.freedesktop.org/mailman/listinfo/mesa-dev
Daniel Vetter
2017-12-21 08:36:21 UTC
Permalink
However, making some assumptions, I suspect it's probably going to come down
to yes we can fit what we need in some number of bits marginally less than
56 now, with the current use cases and hardware, but we're very concerned
about extensibility given the number has only ever grown in our HW, is
uncomfortably close to the limit if it isn't over it already, and it's been
demonstrated it takes a monumental effort to change the mechanism if it
isn't extensible. While it's hard to change the mechanism one more time
now, better to change it to something truly extensible now because it will
be much, much harder to make such a change ~5 years from now in a world
where it's baked in to pervasively deployed Wayland and X protocol, the EGL
and Vulkan extensions have been defined for a few years and in use by apps
besides Wayland, and the allocator stuff is deployed on ~5 operating systems
that have some derivative version of DRM modifiers to support it and a bunch
of funky embedded apps using it. Further, we're volunteering to handle the
bulk of the effort needed to make the change now, so I hope architectural
correctness and maintainability can be the primary points of debate.
I think that's already happened. So no matter what we do, we're going
to live with an ecosystem that uses modifiers all over the place in 5
years. Even if it's not fully pervasive we will have to keep the
support around for 10 years (at least on the kernel side).

So the option is between revving the entire ecosystem now, or revving it
in a few years when the current scheme has run out of steam for good.
And I much prefer the 2nd option for the simple reason that by then
the magic 8ball has gained another 5 years of clarity for looking into
the future.

I think in the interim figuring out how to expose kms capabilities
better (and necessarily standardizing at least some of them which
matter at the compositor level, like size limits of framebuffers)
feels like the place to push the ecosystem forward. In some way
Miguel's proposal looks a bit backwards, since it adds the pitch
capabilities to addfb, but at addfb time you've allocated everything
already, so way too late to fix things up. With modifiers we've added
a very simple per-plane property to list which modifiers can be
combined with which pixel formats. Tiny start, but obviously very far
from all that we'll need.
-Daniel
--
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch
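For readers unfamiliar with the per-plane property mentioned in the message above, the sketch below shows how a client could check whether a (format, modifier) pair is advertised. The struct is a local stand-in loosely modelled on the kernel's IN_FORMATS blob entries as I understand them (a modifier plus a 64-bit mask of format indices starting at an offset); the names and the helper function are hypothetical, and a real client would parse the blob obtained from the plane's property instead of using hand-built arrays.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Local stand-in for one modifier entry of a plane's format list. */
struct example_format_modifier {
    uint64_t formats;   /* bit n set: format index (offset + n) is supported */
    uint32_t offset;    /* index of the first format covered by the mask */
    uint32_t pad;
    uint64_t modifier;  /* the layout this entry describes */
};

/*
 * Return true if the plane advertises `format` combined with `modifier`.
 * `formats` is the plane's list of pixel format codes; each entry in `mods`
 * refers to positions in that list through its bitmask.
 */
bool example_plane_supports(const uint32_t *formats, size_t n_formats,
                            const struct example_format_modifier *mods,
                            size_t n_mods, uint32_t format, uint64_t modifier)
{
    assert(formats != NULL && mods != NULL);

    for (size_t i = 0; i < n_formats; i++) {
        if (formats[i] != format)
            continue;
        for (size_t j = 0; j < n_mods; j++) {
            if (mods[j].modifier != modifier)
                continue;
            /* Skip entries whose 64-wide window does not cover index i. */
            if (i < mods[j].offset || i - mods[j].offset >= 64)
                continue;
            if (mods[j].formats & (UINT64_C(1) << (i - mods[j].offset)))
                return true;
        }
    }
    return false;
}
```

A compositor would run a check like this before allocating, which is the point being made about addfb: by the time the framebuffer is created it is too late to pick a different layout.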
Kristian Kristensen
2017-12-21 17:47:39 UTC
Permalink
Post by James Jones
However, making some assumptions, I suspect it's probably going to come down
to yes we can fit what we need in some number of bits marginally less than
56 now, with the current use cases and hardware, but we're very concerned
about extensibility given the number has only ever grown in our HW, is
uncomfortably close to the limit if it isn't over it already, and it's been
demonstrated it takes a monumental effort to change the mechanism if it
isn't extensible. While it's hard to change the mechanism one more time
now, better to change it to something truly extensible now because it will
be much, much harder to make such a change ~5 years from now in a world
where it's baked in to pervasively deployed Wayland and X protocol, the EGL
and Vulkan extensions have been defined for a few years and in use by apps
besides Wayland, and the allocator stuff is deployed on ~5 operating systems
that have some derivative version of DRM modifiers to support it and a bunch
of funky embedded apps using it. Further, we're volunteering to handle the
bulk of the effort needed to make the change now, so I hope architectural
correctness and maintainability can be the primary points of debate.
I think that's already happened. So no matter what we do, we're going
to live with an ecosystem that uses modifiers all over the place in 5
years. Even if it's not fully pervasive we will have to keep the
support around for 10 years (at least on the kernel side).
So the option is between reving the entire ecosystem now, or reving it
in a few years when the current scheme has run out of steam for good.
And I much prefer the 2nd option for the simple reason that by then
the magic 8ball has gained another 5 years of clarity for looking into
the future.
I think in the interim figuring out how to expose kms capabilities
better (and necessarily standardizing at least some of them which
matter at the compositor level, like size limits of framebuffers)
feels like the place to push the ecosystem forward.
I agree and let me elaborate a bit. The problem we're seeing isn't that we
need more than 2^56 modifiers for a future GPU. The problem is that flags
like USE_SCANOUT (which your allocator proposal essentially keeps) are
inadequate. The available tiling and compression formats vary with which
(in KMS terms) CRTC you want to use, which plane you're on, whether you want
rotation or not, how much you want to scale, etc. It's not realistic to
think that we could model this in a centralized allocator library that's
detached from the display driver. To be fair, this is not a point about
blobs vs modifiers, it's saying that the use flags don't belong in the
allocator, they belong in the APIs that will be using the buffer - and not
as literal use flags, but as a way to discover supported modifiers for a
given use case.

Kristian
Post by James Jones
In some way
Miguel's proposal looks a bit backwards, since it adds the pitch
capabilities to addfb, but at addfb time you've allocated everything
already, so way too late to fix things up. With modifiers we've added
a very simple per-plane property to list which modifiers can be
combined with which pixel formats. Tiny start, but obviously very far
from all that we'll need.
-Daniel
Rob Clark
2017-12-21 22:34:44 UTC
Permalink
Post by Daniel Vetter
However, making some assumptions, I suspect it's probably going to come down
to yes we can fit what we need in some number of bits marginally less than
56 now, with the current use cases and hardware, but we're very concerned
about extensibility given the number has only ever grown in our HW, is
uncomfortably close to the limit if it isn't over it already, and it's been
demonstrated it takes a monumental effort to change the mechanism if it
isn't extensible. While it's hard to change the mechanism one more time
now, better to change it to something truly extensible now because it will
be much, much harder to make such a change ~5 years from now in a world
where it's baked in to pervasively deployed Wayland and X protocol, the EGL
and Vulkan extensions have been defined for a few years and in use by apps
besides Wayland, and the allocator stuff is deployed on ~5 operating systems
that have some derivative version of DRM modifiers to support it and a bunch
of funky embedded apps using it. Further, we're volunteering to handle the
bulk of the effort needed to make the change now, so I hope architectural
correctness and maintainability can be the primary points of debate.
I think that's already happened. So no matter what we do, we're going
to live with an ecosystem that uses modifiers all over the place in 5
years. Even if it's not fully pervasive we will have to keep the
support around for 10 years (at least on the kernel side).
So the option is between reving the entire ecosystem now, or reving it
in a few years when the current scheme has run out of steam for good.
And I much prefer the 2nd option for the simple reason that by then
the magic 8ball has gained another 5 years of clarity for looking into
the future.
Drive by comment (and disclaimer, haven't had chance to read rest of
thread yet), but I think there is a reasonable path to increase the
modifier space to something like 2^^568 (minus the cases where
modifiers[0]==modifiers[1]==modifiers[2]==modifiers[3]).. (Yeah, yeah,
I'm sure there is a 640k should be enough joke here somewhere)

Fortunately the modifiers array is currently at the end of 'struct
drm_mode_fb_cmd2', so there may be some other options to extend it as
well. Possibly reserving the modifier value ~0 now might be a good
idea.
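As a rough illustration of why the position of the array matters, the sketch below mirrors the field order of struct drm_mode_fb_cmd2 in a local type and checks at compile time that the modifier array is the last member, so a future revision could in principle append data after it. The reserved value and the helper are assumptions: the message only suggests reserving ~0, and what the kernel would do on seeing it is not specified anywhere in this thread.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Local mirror of the field order of the kernel's struct drm_mode_fb_cmd2. */
struct example_fb_cmd2 {
    uint32_t fb_id;
    uint32_t width;
    uint32_t height;
    uint32_t pixel_format;
    uint32_t flags;
    uint32_t handles[4];
    uint32_t pitches[4];
    uint32_t offsets[4];
    uint64_t modifier[4];
};

/* The modifier array ends exactly where the struct ends. */
_Static_assert(offsetof(struct example_fb_cmd2, modifier) +
               sizeof(((struct example_fb_cmd2 *)0)->modifier) ==
               sizeof(struct example_fb_cmd2),
               "modifier[] is expected to be the last member");

/* Hypothetical reserved value meaning "extended description follows". */
#define EXAMPLE_MOD_EXTENDED (~UINT64_C(0))

/* Return true if any plane uses the hypothetical reserved modifier. */
bool example_uses_extension(const struct example_fb_cmd2 *cmd)
{
    assert(cmd != NULL);

    for (size_t i = 0; i < 4; i++)
        if (cmd->modifier[i] == EXAMPLE_MOD_EXTENDED)
            return true;
    return false;
}
```

Reserving the value early would let old kernels reject it cleanly, while a later kernel could give it a meaning without colliding with a modifier already in use.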

It does seem like, if possible, starting out with modifiers for now at
the kernel interface would make life easier, vs trying to reinvent
both kernel and userspace APIs at the same time. Userspace APIs are
easier to change or throw away. Presumably by the time we get to the
point of changing kernel uabi, we are already using, and pretty happy
with, serialized liballoc data over the wire in userspace so it is
only a matter of changing the kernel interface.

The downside of this is needing a per-driver userspace bit to map
liballoc to modifiers. We kinda have this already in mesa, even for
the modesetting-only drivers that can be paired with a render-only
driver.

BR,
-R
Post by Daniel Vetter
I think in the interim figuring out how to expose kms capabilities
better (and necessarily standardizing at least some of them which
matter at the compositor level, like size limits of framebuffers)
feels like the place to push the ecosystem forward. In some way
Miguel's proposal looks a bit backwards, since it adds the pitch
capabilities to addfb, but at addfb time you've allocated everything
already, so way too late to fix things up. With modifiers we've added
a very simple per-plane property to list which modifiers can be
combined with which pixel formats. Tiny start, but obviously very far
from all that we'll need.
-Daniel
Miguel Angel Vico
2017-12-28 18:24:38 UTC
Permalink
(Adding dri-devel back, and trying to respond to some comments from
the different forks)
Post by James Jones
Your worst case analysis above isn't far off from our HW, give or take
some bits and axes here and there. We've started an internal discussion
about how to lay out all the bits we need. It's hard to even enumerate
them all without having a complete understanding of what capability sets
are going to include, a fully-optimized implementation of the mechanism
on our HW, and lots of test scenarios though.
(thanks James for most of the info below)

To elaborate a bit, if we want to share an allocation across GPUs for 3D
rendering, it seems we would need 12 bits to express our
swizzling/tiling memory layouts for fermi+. In addition to that,
maxwell uses 3 more bits for this, and we need an extra bit to identify
pre-fermi representations.

We also need one bit to differentiate between Tegra and desktop, and
another one to indicate whether the layout is otherwise linear.

Then things like whether compression is used (one more bit), and we can
probably get by with 3 bits for the type of compression if we are
creative. However, it'd be way easier to just track arch + page kind,
which would be like 32 bits on its own.

Whether Z-culling and/or zero-bandwidth-clears are used may be another 3
bits.

If device-local properties are included, we might need a couple more
bits for caching.

We may also need to express locality information, which may take at
least another 2 or 3 bits.

If we want to share array textures too, you also need to pass the array
pitch. Is it supposed to be encoded in a modifier too? That's 64 bits on
its own.

So yes, as James mentioned, with some effort, we could technically fit
our current allocation parameters in a modifier, but I'm still not
convinced this is as future proof as it could be as our hardware grows
in capabilities.
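The bit budget listed above can be tallied with a short sketch. The per-field counts are taken from the message, with two assumptions of mine: "a couple more bits for caching" is counted as 2, and locality uses the upper estimate of 3. The alternative of tracking arch + page kind (about 32 bits) is not included, and all identifiers are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct example_field {
    const char *name;
    unsigned bits;
};

/* Field sizes as quoted in the message; see the assumptions noted above. */
static const struct example_field example_fields[] = {
    { "fermi+ swizzling/tiling layout",    12 },
    { "additional maxwell layout bits",     3 },
    { "pre-fermi representation flag",      1 },
    { "Tegra vs. desktop",                  1 },
    { "otherwise-linear flag",              1 },
    { "compression enabled",                1 },
    { "compression type",                   3 },
    { "Z-culling / zero-bandwidth clears",  3 },
    { "caching (assumed 2 bits)",           2 },
    { "locality (upper estimate)",          3 },
};

/* Sum of all field widths, excluding the array pitch. */
unsigned example_total_bits(void)
{
    unsigned total = 0;

    for (size_t i = 0; i < sizeof(example_fields) / sizeof(example_fields[0]); i++)
        total += example_fields[i].bits;
    return total;
}

/* A DRM modifier leaves 56 bits once the 8-bit vendor field is taken. */
bool example_fits_in_modifier(unsigned bits)
{
    return bits <= 56;
}
```

Under these assumptions the fields add up to 30 bits, which fits. Adding a 64-bit array pitch brings the total to 94 bits, which does not, and that is the case the question about array textures is getting at.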
Post by Daniel Stone
So I reflexively
get a bit itchy when I see the kernel being used to transit magic
blobs of data which are supplied by userspace, and only interpreted by
different userspace. Having tiling formats hidden away means that
we've had real-world bugs in AMD hardware, where we end up displaying
garbage because we cannot generically reason about the buffer
attributes.
I'm a bit confused. Can't modifiers be specified by vendors and only
interpreted by drivers? My understanding was that modifiers could
actually be treated as opaque 64-bit data, in which case they would
qualify as "magic blobs of data". Otherwise, it seems this wouldn't be
scalable. What am I missing?
Post by Daniel Vetter
I think in the interim figuring out how to expose kms capabilities
better (and necessarily standardizing at least some of them which
matter at the compositor level, like size limits of framebuffers)
feels like the place to push the ecosystem forward. In some way
Miguel's proposal looks a bit backwards, since it adds the pitch
capabilities to addfb, but at addfb time you've allocated everything
already, so way too late to fix things up. With modifiers we've added
a very simple per-plane property to list which modifiers can be
combined with which pixel formats. Tiny start, but obviously very far
from all that we'll need.
Not sure whether I might be misunderstanding your statement, but one of
the allocator's main features is negotiation of nearly optimal allocation
parameters given a set of uses on different devices/engines by the
capability merge operation. A client should have queried what every
device/engine is capable of for the given uses, found the optimal set of
capabilities, and used it for allocating a buffer. At the moment these
parameters are given to KMS, they are expected to be good. If they
aren't, the client didn't do things right.
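A heavily simplified sketch of the negotiation step described above: each device reports what it supports for the requested usage, and the client keeps only what every device can handle. Real capability sets in the allocator prototype are richer than a flat list of 64-bit values, so treating them as such is an assumption made to keep the example short; the function name is hypothetical as well.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Intersect two lists of supported layouts, preserving the order of `a`
 * (assumed here to be sorted from most to least preferred, with no
 * duplicates). Up to `out_cap` results are written to `out`; the return
 * value is the total number of common entries, which may exceed `out_cap`.
 */
size_t example_merge(const uint64_t *a, size_t na,
                     const uint64_t *b, size_t nb,
                     uint64_t *out, size_t out_cap)
{
    size_t n = 0;

    assert(a != NULL && b != NULL && (out != NULL || out_cap == 0));

    for (size_t i = 0; i < na; i++) {
        for (size_t j = 0; j < nb; j++) {
            if (a[i] == b[j]) {
                if (n < out_cap)
                    out[n] = a[i];
                n++;
                break;
            }
        }
    }
    return n;
}
```

If the merged list is empty, the devices have no layout in common for that usage and the client has to fall back (for example to a linear layout, if both sides report it) before anything is allocated or handed to KMS.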
Post by Rob Clark
It does seem like, if possible, starting out with modifiers for now at
the kernel interface would make life easier, vs trying to reinvent
both kernel and userspace APIs at the same time. Userspace APIs are
easier to change or throw away. Presumably by the time we get to the
point of changing kernel uabi, we are already using, and pretty happy
with, serialized liballoc data over the wire in userspace so it is
only a matter of changing the kernel interface.
I guess we can indeed start with modifiers for now, if that's what it
takes to get the allocator mechanisms rolling. However, it seems to me
that we won't be able to encode the same type of information included
in capability sets with modifiers in all cases. For instance, if we end
up encoding usage transition information in capability sets, how would
that translate to modifiers?

I assume display doesn't really care about a lot of the data capability
sets may encode, but is it correct to think of modifiers as things only
display needs? If we are to treat modifiers as a first-class citizen, I
would expect to use them beyond that.
Post by James Jones
I agree and let me elaborate a bit. The problem we're seeing isn't that we
need more than 2^56 modifiers for a future GPU. The problem is that flags
like USE_SCANOUT (which your allocator proposal essentially keeps) are
inadequate. The available tiling and compression formats vary with which
(in KMS terms) CRTC you want to use, which plane you're on, whether you want
rotation or not, how much you want to scale, etc. It's not realistic to
think that we could model this in a centralized allocator library that's
detached from the display driver. To be fair, this is not a point about
blobs vs modifiers, it's saying that the use flags don't belong in the
allocator, they belong in the APIs that will be using the buffer - and not
as literal use flags, but as a way to discover supported modifiers for a
given use case.
Why detached from the display driver? I don't see why there couldn't be
an allocator driver with access to display capabilities that can be
used in the negotiation step to find the optimal set of allocation
parameters.
Post by James Jones
I understand that you may have n knobs with a total of more than a total of
56 bits that configure your tiling/swizzling for color buffers. What I don't
buy is that you need all those combinations when passing buffers around
between codecs, cameras and display controllers. Even if you're sharing
between the same 3D drivers in different processes, I expect just locking
down, say, 64 different combinations (you can add more over time) and
assigning each a modifier would be sufficient. I doubt you'd extract
meaningful performance gains from going all the way to a blob.
If someone has N knobs available, I don't understand why there
shouldn't be a mechanism that allows making use of them all, regardless
of performance numbers.
Post by James Jones
Yeah, that part was all clear. I'd want more details of what exact
kind of metadata. fast-clear colors? tiling layouts? aux data for the
compressor? hiz (or whatever you folks call it) tree?
As you say, we've discussed massive amounts of different variants on
this, and there's different answers for different questions. Consensus
seems to be that bigger stuff (compression data, hiz, clear colors,
...) should be stored in aux planes, while the exact layout and what
kind of aux planes you have are encoded in the modifier.
My understanding is that capability sets may include all metadata you
mentioned. Besides tiling/swizzling layout and compression parameters,
things like zero-bandwidth-clears (I guess the same or similar to
fast-clear colors?), hiz-like data, device-local properties such as
caches, or locality information could/will also be included in a
capability set. We are even considering encoding some sort of usage
transition information in the capability set itself.


Thanks,
Miguel.
Rob Clark
2018-01-03 14:53:06 UTC
Post by Miguel Angel Vico
(Adding dri-devel back, and trying to respond to some comments from
the different forks)
Post by James Jones
Your worst case analysis above isn't far off from our HW, give or take
some bits and axes here and there. We've started an internal discussion
about how to lay out all the bits we need. It's hard to even enumerate
them all without having a complete understanding of what capability sets
are going to include, a fully-optimized implementation of the mechanism
on our HW, and lots of test scenarios, though.
(thanks James for most of the info below)
To elaborate a bit, if we want to share an allocation across GPUs for 3D
rendering, it seems we would need 12 bits to express our
swizzling/tiling memory layouts for fermi+. In addition to that,
maxwell uses 3 more bits for this, and we need an extra bit to identify
pre-fermi representations.
We also need one bit to differentiate between Tegra and desktop, and
another one to indicate whether the layout is otherwise linear.
Then things like whether compression is used (one more bit), and we can
probably get by with 3 bits for the type of compression if we are
creative. However, it'd be way easier to just track arch + page kind,
which would be like 32 bits on its own.
Whether Z-culling and/or zero-bandwidth-clears are used may be another 3
bits.
If device-local properties are included, we might need a couple more
bits for caching.
We may also need to express locality information, which may take at
least another 2 or 3 bits.
If we want to share array textures too, you also need to pass the array
pitch. Is it supposed to be encoded in a modifier too? That's 64 bits on
its own.
So yes, as James mentioned, with some effort, we could technically fit
our current allocation parameters in a modifier, but I'm still not
convinced this is as future proof as it could be as our hardware grows
in capabilities.
Post by James Jones
So I reflexively
get a bit itchy when I see the kernel being used to transit magic
blobs of data which are supplied by userspace, and only interpreted by
different userspace. Having tiling formats hidden away means that
we've had real-world bugs in AMD hardware, where we end up displaying
garbage because we cannot generically reason about the buffer
attributes.
I'm a bit confused. Can't modifiers be specified by vendors and only
interpreted by drivers? My understanding was that modifiers could
actually be treated as opaque 64-bit data, in which case they would
qualify as "magic blobs of data". Otherwise, it seems this wouldn't be
scalable. What am I missing?
Post by James Jones
I think in the interim figuring out how to expose kms capabilities
better (and necessarily standardizing at least some of them which
matter at the compositor level, like size limits of framebuffers)
feels like the place to push the ecosystem forward. In some way
Miguel's proposal looks a bit backwards, since it adds the pitch
capabilities to addfb, but at addfb time you've allocated everything
already, so way too late to fix things up. With modifiers we've added
a very simple per-plane property to list which modifiers can be
combined with which pixel formats. Tiny start, but obviously very far
from all that we'll need.
Not sure whether I might be misunderstanding your statement, but one of
the allocator main features is negotiation of nearly optimal allocation
parameters given a set of uses on different devices/engines by the
capability merge operation. A client should have queried what every
device/engine is capable of for the given uses, find the optimal set of
capabilities, and use it for allocating a buffer. At the moment these
parameters are given to KMS, they are expected to be good. If they
aren't, the client didn't do things right.
Post by James Jones
It does seem like, if possible, starting out with modifiers for now at
the kernel interface would make life easier, vs trying to reinvent
both kernel and userspace APIs at the same time. Userspace APIs are
easier to change or throw away. Presumably by the time we get to the
point of changing kernel uabi, we are already using, and pretty happy
with, serialized liballoc data over the wire in userspace so it is
only a matter of changing the kernel interface.
I guess we can indeed start with modifiers for now, if that's what it
takes to get the allocator mechanisms rolling. However, it seems to me
that we won't be able to encode the same type of information included
in capability sets with modifiers in all cases. For instance, if we end
up encoding usage transition information in capability sets, how that
would translate to modifiers?
I assume display doesn't really care about a lot of the data capability
sets may encode, but is it correct to think of modifiers as things only
display needs? If we are to treat modifiers as a first-class citizen, I
would expect to use them beyond that.
btw, the places where modifiers are currently used are limited to 2d
textures without mipmap levels: basically scanout buffers, winsys
buffers, decoded frames of video, and that sort of thing. I think we
can keep it that way, which avoids needing to encode additional info
(layer pitch, z tiling info for 3d textures, or whatever else).

So we just need to have something in userspace that translates the
relevant subset of capability set info to modifiers.
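As a sketch of that translation step, assuming a capability set is just a dictionary and that only its layout field matters to the kernel: the field names, the layout names, and the vendor-tiled modifier value below are hypothetical. Only DRM_FORMAT_MOD_LINEAR and DRM_FORMAT_MOD_INVALID are real values from drm_fourcc.h.

```python
DRM_FORMAT_MOD_LINEAR = 0x0
DRM_FORMAT_MOD_INVALID = 0x00FFFFFFFFFFFFFF

# Made-up modifier (vendor ID and payload are placeholders).
HYPOTHETICAL_VENDOR_TILED = (0xFE << 56) | 0x1

# Hypothetical mapping from a capability set's layout to a modifier.
LAYOUT_TO_MODIFIER = {
    "pitch-linear": DRM_FORMAT_MOD_LINEAR,
    "vendor-tiled": HYPOTHETICAL_VENDOR_TILED,
}


def capability_set_to_modifier(caps):
    """Project a capability set onto the subset a modifier can express."""
    # Modifiers only cover single-level 2D images, so anything with
    # mip levels or array layers has no modifier equivalent.
    if caps.get("array_layers", 1) != 1 or caps.get("mip_levels", 1) != 1:
        return DRM_FORMAT_MOD_INVALID
    # Everything except the layout (caching, locality, ...) is dropped.
    return LAYOUT_TO_MODIFIER.get(caps.get("layout"), DRM_FORMAT_MOD_INVALID)


scanout = {"layout": "pitch-linear", "cache_policy": "write-combined"}
cubemap = {"layout": "vendor-tiled", "array_layers": 6}
```

Here capability_set_to_modifier(scanout) gives the linear modifier, while the cubemap case is reported as not expressible instead of being silently truncated.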

Maybe down the road, if capability sets are ubiquitous, we can
"promote" that mechanism to kernel uabi, although tbh I am not
entirely sure I can envision a use-case where the kernel needs to know
about a cubemap array texture.

BR,
-R
Post by Miguel Angel Vico
Post by James Jones
I agree and let me elaborate a bit. The problem we're seeing isn't that we
need more than 2^56 modifiers for a future GPU. The problem is that flags
like USE_SCANOUT (which your allocator proposal essentially keeps) is
inadequate. The available tiling and compression formats vary with which
(in KMS terms) CRTC you want to use, which plane you're on whether you want
rotation or no and how much you want to scale etc. It's not realistic to
think that we could model this in a centralized allocator library that's
detached from the display driver. To be fair, this is not a point about
blobs vs modifiers, it's saying that the use flags don't belong in the
allocator, they belong in the APIs that will be using the buffer - and not
as literal use flags, but as a way to discover supported modifiers for a
given use case.
Why detached from the display driver? I don't see why there couldn't be
an allocator driver with access to display capabilities that can be
used in the negotiation step to find the optimal set of allocation
parameters.
Post by James Jones
I understand that you may have n knobs with a total of more than a total of
56 bits that configure your tiling/swizzling for color buffers. What I don't
buy is that you need all those combinations when passing buffers around
between codecs, cameras and display controllers. Even if you're sharing
between the same 3D drivers in different processes, I expect just locking
down, say, 64 different combinations (you can add more over time) and
assigning each a modifier would be sufficient. I doubt you'd extract
meaningful performance gains from going all the way to a blob.
If someone has N knobs available, I don't understand why there
shouldn't be a mechanism that allows making use of them all, regardless
of performance numbers.
Post by James Jones
Yeah, that part was all clear. I'd want more details of what exact
kind of metadata. fast-clear colors? tiling layouts? aux data for the
compressor? hiz (or whatever you folks call it) tree?
As you say, we've discussed massive amounts of different variants on
this, and there's different answers for different questions. Consensus
seems to be that bigger stuff (compression data, hiz, clear colors,
...) should be stored in aux planes, while the exact layout and what
kind of aux planes you have are encoded in the modifier.
My understanding is that capability sets may include all metadata you
mentioned. Besides tiling/swizzling layout and compression parameters,
things like zero-bandwidth-clears (I guess the same or similar to
fast-clear colors?), hiz-like data, device-local properties such as
caches, or locality information could/will be also included in a
capability set. We are even considering encoding some sort of usage
transition information in the capability set itself.
Thanks,
Miguel.
James Jones
2018-01-03 19:26:33 UTC
Post by Miguel Angel Vico
(Adding dri-devel back, and trying to respond to some comments from
the different forks)
Post by James Jones
Your worst case analysis above isn't far off from our HW, give or take
some bits and axes here and there. We've started an internal discussion
about how to lay out all the bits we need. It's hard to even enumerate
them all without having a complete understanding of what capability sets
are going to include, a fully-optimized implementation of the mechanism
on our HW, and lots of test scenarios, though.
(thanks James for most of the info below)
To elaborate a bit, if we want to share an allocation across GPUs for 3D
rendering, it seems we would need 12 bits to express our
swizzling/tiling memory layouts for fermi+. In addition to that,
maxwell uses 3 more bits for this, and we need an extra bit to identify
pre-fermi representations.
We also need one bit to differentiate between Tegra and desktop, and
another one to indicate whether the layout is otherwise linear.
Then things like whether compression is used (one more bit), and we can
probably get by with 3 bits for the type of compression if we are
creative. However, it'd be way easier to just track arch + page kind,
which would be like 32 bits on its own.
Not clear if this is an NV-only term, so for those not familiar, page
kind is very loosely the equivalent of a format modifier our HW uses
internally in its memory management subsystem. The value mappings vary
a bit for each HW generation.
Post by Miguel Angel Vico
Whether Z-culling and/or zero-bandwidth-clears are used may be another 3
bits.
If device-local properties are included, we might need a couple more
bits for caching.
We may also need to express locality information, which may take at
least another 2 or 3 bits.
If we want to share array textures too, you also need to pass the array
pitch. Is it supposed to be encoded in a modifier too? That's 64 bits on
its own.
So yes, as James mentioned, with some effort, we could technically fit
our current allocation parameters in a modifier, but I'm still not
convinced this is as future proof as it could be as our hardware grows
in capabilities.
Post by James Jones
So I reflexively
get a bit itchy when I see the kernel being used to transit magic
blobs of data which are supplied by userspace, and only interpreted by
different userspace. Having tiling formats hidden away means that
we've had real-world bugs in AMD hardware, where we end up displaying
garbage because we cannot generically reason about the buffer
attributes.
I'm a bit confused. Can't modifiers be specified by vendors and only
interpreted by drivers? My understanding was that modifiers could
actually be treated as opaque 64-bit data, in which case they would
qualify as "magic blobs of data". Otherwise, it seems this wouldn't be
scalable. What am I missing?
Post by James Jones
I think in the interim figuring out how to expose kms capabilities
better (and necessarily standardizing at least some of them which
matter at the compositor level, like size limits of framebuffers)
feels like the place to push the ecosystem forward. In some way
Miguel's proposal looks a bit backwards, since it adds the pitch
capabilities to addfb, but at addfb time you've allocated everything
already, so way too late to fix things up. With modifiers we've added
a very simple per-plane property to list which modifiers can be
combined with which pixel formats. Tiny start, but obviously very far
from all that we'll need.
Not sure whether I might be misunderstanding your statement, but one of
the allocator main features is negotiation of nearly optimal allocation
parameters given a set of uses on different devices/engines by the
capability merge operation. A client should have queried what every
device/engine is capable of for the given uses, find the optimal set of
capabilities, and use it for allocating a buffer. At the moment these
parameters are given to KMS, they are expected to be good. If they
aren't, the client didn't do things right.
Post by James Jones
It does seem like, if possible, starting out with modifiers for now at
the kernel interface would make life easier, vs trying to reinvent
both kernel and userspace APIs at the same time. Userspace APIs are
easier to change or throw away. Presumably by the time we get to the
point of changing kernel uabi, we are already using, and pretty happy
with, serialized liballoc data over the wire in userspace so it is
only a matter of changing the kernel interface.
I guess we can indeed start with modifiers for now, if that's what it
takes to get the allocator mechanisms rolling. However, it seems to me
that we won't be able to encode the same type of information included
in capability sets with modifiers in all cases. For instance, if we end
up encoding usage transition information in capability sets, how that
would translate to modifiers?
I assume display doesn't really care about a lot of the data capability
sets may encode, but is it correct to think of modifiers as things only
display needs? If we are to treat modifiers as a first-class citizen, I
would expect to use them beyond that.
Right, this becomes a lot more interesting when modifiers or capability
sets start getting used to share things from Vulkan<->Vulkan, for
example. Of course, we don't need to change kernel ABIs for that, but
wayland protocols, Vulkan extensions, etc. might need modification.
Regardless, I agree with Miguel's sentiment. Let's at least defer this
debate a bit until we know more about what capability sets look like.
If modifiers alone still seem sufficient, so be it.
Post by Miguel Angel Vico
Post by James Jones
I agree and let me elaborate a bit. The problem we're seeing isn't that we
need more than 2^56 modifiers for a future GPU. The problem is that flags
like USE_SCANOUT (which your allocator proposal essentially keeps) is
inadequate. The available tiling and compression formats vary with which
(in KMS terms) CRTC you want to use, which plane you're on whether you want
rotation or no and how much you want to scale etc. It's not realistic to
think that we could model this in a centralized allocator library that's
detached from the display driver. To be fair, this is not a point about
blobs vs modifiers, it's saying that the use flags don't belong in the
allocator, they belong in the APIs that will be using the buffer - and not
as literal use flags, but as a way to discover supported modifiers for a
given use case.
Why detached from the display driver? I don't see why there couldn't be
an allocator driver with access to display capabilities that can be
used in the negotiation step to find the optimal set of allocation
parameters.
In addition, speaking to some other portions of your response, most of
the usage in the prototype is placeholder stuff for testing.
USE_SCANOUT is partially expanded to include orientation as well, which
helps in some cases on our hardware. If there's more complex stuff for
other display hardware, it needs to be expanded further, or that HW is
free to expose a vendor-specific usage, since usage is extensible. It's
easy to mirror in all the relevant usage flags from other APIs or
engines too. That's a rather small amount of duplication.

The important part is the logic that selects optimal usage. I don't
think it's possible to select optimal usage with the queries spread
around all the APIs. Vulkan isn't going to know about video encode
usage. In many situations it won't know about display usage. It just
knows optimal texture/render usage. Therefore it can't optimize
parameters for usage it doesn't know about. A centralized allocator
can, especially when all the usage ends up delegated to a single
device/GPU. It will have all the same information available to it on
the back end because it can access DRM devices, v4l devices, etc. to
query their capabilities via allocator backends, but it can have more
information available on the front end from the app, and a more complete
solution returned from a driver that is able to parse and consider that
additional information.

Additionally, I again offer the goal of an optimal gralloc
implementation built on top of the allocator mechanism. I find it
difficult to imagine building gralloc on top of Vulkan or EGL and DRM.
Does such a solution seem feasible to you? I've not researched this
significantly myself, but Google Android engineers shared that concern
when we had the initial discussions at XDC 2016.
Post by Miguel Angel Vico
Post by James Jones
I understand that you may have n knobs with a total of more than a total of
56 bits that configure your tiling/swizzling for color buffers. What I don't
buy is that you need all those combinations when passing buffers around
between codecs, cameras and display controllers. Even if you're sharing
between the same 3D drivers in different processes, I expect just locking
down, say, 64 different combinations (you can add more over time) and
assigning each a modifier would be sufficient. I doubt you'd extract
meaningful performance gains from going all the way to a blob.
If someone has N knobs available, I don't understand why there
shouldn't be a mechanism that allows making use of them all, regardless
of performance numbers.
Post by James Jones
Yeah, that part was all clear. I'd want more details of what exact
kind of metadata. fast-clear colors? tiling layouts? aux data for the
compressor? hiz (or whatever you folks call it) tree?
As you say, we've discussed massive amounts of different variants on
this, and there's different answers for different questions. Consensus
seems to be that bigger stuff (compression data, hiz, clear colors,
...) should be stored in aux planes, while the exact layout and what
kind of aux planes you have are encoded in the modifier.
My understanding is that capability sets may include all metadata you
mentioned. Besides tiling/swizzling layout and compression parameters,
things like zero-bandwidth-clears (I guess the same or similar to
fast-clear colors?), hiz-like data, device-local properties such as
caches, or locality information could/will be also included in a
capability set. We are even considering encoding some sort of usage
transition information in the capability set itself.
I think there's some nuance here. The format of compression metadata
would clearly be a capability set thing. The compression data itself
would indeed be in some auxiliary surface on most/all hardware. Things
like fast clears are harder to nail down because implementations seem
more varied there. It might be very awkward on some hardware to put the
necessary meta-data in a DRM FB plane, while that might be the only
reasonable way to accomplish it on other hardware. I think we'll have
to work through some corner cases across lots of hardware before this
bottoms out.

Thanks,
-James
Post by Miguel Angel Vico
Thanks,
Miguel.
_______________________________________________
mesa-dev mailing list
https://lists.freedesktop.org/mailman/listinfo/mesa-dev
Kristian Kristensen
2018-01-03 20:36:51 UTC
Post by Miguel Angel Vico
(Adding dri-devel back, and trying to respond to some comments from
the different forks)
Post by James Jones
Your worst case analysis above isn't far off from our HW, give or take
some bits and axes here and there. We've started an internal discussion
about how to lay out all the bits we need. It's hard to even enumerate
them all without having a complete understanding of what capability sets
are going to include, a fully-optimized implementation of the mechanism
on our HW, and lots of test scenarios, though.
(thanks James for most of the info below)
To elaborate a bit, if we want to share an allocation across GPUs for 3D
rendering, it seems we would need 12 bits to express our
swizzling/tiling memory layouts for fermi+. In addition to that,
maxwell uses 3 more bits for this, and we need an extra bit to identify
pre-fermi representations.
We also need one bit to differentiate between Tegra and desktop, and
another one to indicate whether the layout is otherwise linear.
Then things like whether compression is used (one more bit), and we can
probably get by with 3 bits for the type of compression if we are
creative. However, it'd be way easier to just track arch + page kind,
which would be like 32 bits on its own.
Not clear if this is an NV-only term, so for those not familiar, page kind
is very loosely the equivalent of a format modifier our HW uses internally
in its memory management subsystem. The value mappings vary a bit for each
HW generation.
Post by Miguel Angel Vico
Whether Z-culling and/or zero-bandwidth-clears are used may be another 3
bits.
If device-local properties are included, we might need a couple more
bits for caching.
We may also need to express locality information, which may take at
least another 2 or 3 bits.
If we want to share array textures too, you also need to pass the array
pitch. Is it supposed to be encoded in a modifier too? That's 64 bits on
its own.
So yes, as James mentioned, with some effort, we could technically fit
our current allocation parameters in a modifier, but I'm still not
convinced this is as future proof as it could be as our hardware grows
in capabilities.
Post by James Jones
So I reflexively
get a bit itchy when I see the kernel being used to transit magic
blobs of data which are supplied by userspace, and only interpreted by
different userspace. Having tiling formats hidden away means that
we've had real-world bugs in AMD hardware, where we end up displaying
garbage because we cannot generically reason about the buffer
attributes.
I'm a bit confused. Can't modifiers be specified by vendors and only
interpreted by drivers? My understanding was that modifiers could
actually be treated as opaque 64-bit data, in which case they would
qualify as "magic blobs of data". Otherwise, it seems this wouldn't be
scalable. What am I missing?
Post by James Jones
I think in the interim figuring out how to expose kms capabilities
better (and necessarily standardizing at least some of them which
matter at the compositor level, like size limits of framebuffers)
feels like the place to push the ecosystem forward. In some way
Miguel's proposal looks a bit backwards, since it adds the pitch
capabilities to addfb, but at addfb time you've allocated everything
already, so way too late to fix things up. With modifiers we've added
a very simple per-plane property to list which modifiers can be
combined with which pixel formats. Tiny start, but obviously very far
from all that we'll need.
Not sure whether I might be misunderstanding your statement, but one of
the allocator main features is negotiation of nearly optimal allocation
parameters given a set of uses on different devices/engines by the
capability merge operation. A client should have queried what every
device/engine is capable of for the given uses, find the optimal set of
capabilities, and use it for allocating a buffer. At the moment these
parameters are given to KMS, they are expected to be good. If they
aren't, the client didn't do things right.
Post by James Jones
It does seem like, if possible, starting out with modifiers for now at
the kernel interface would make life easier, vs trying to reinvent
both kernel and userspace APIs at the same time. Userspace APIs are
easier to change or throw away. Presumably by the time we get to the
point of changing kernel uabi, we are already using, and pretty happy
with, serialized liballoc data over the wire in userspace so it is
only a matter of changing the kernel interface.
I guess we can indeed start with modifiers for now, if that's what it
takes to get the allocator mechanisms rolling. However, it seems to me
that we won't be able to encode the same type of information included
in capability sets with modifiers in all cases. For instance, if we end
up encoding usage transition information in capability sets, how that
would translate to modifiers?
I assume display doesn't really care about a lot of the data capability
sets may encode, but is it correct to think of modifiers as things only
display needs? If we are to treat modifiers as a first-class citizen, I
would expect to use them beyond that.
Right, this becomes a lot more interesting when modifiers or capability
sets start getting used to share things from Vulkan<->Vulkan, for example.
Of course, we don't need to change kernel ABIs for that, but wayland
protocols, Vulkan extensions, etc. might need modification. Regardless, I
agree with Miguel's sentiment. Let's at least defer this debate a bit
until we know more about what capability sets look like. If modifiers alone
still seem sufficient, so be it.
Modifiers aren't display only, but I suppose they are 2D color buffer only -
no mip maps, texture arrays, cube maps, etc. But within that scope, they
should provide a mechanism for negotiating the optimal layout for a given
use case.

Another point here is that the modifier doesn't need to encode all the
things you have to communicate to the HW. For a given width, height, format,
compression type and maybe a few other high-level parameters, I'm skeptical
that the remaining tile parameters aren't just mechanically derivable using
a fixed table or formula. So instead of thinking of the modifier as
something you can just memcpy into a state packet, think of it as identifying
a family of configurations - enough information to deterministically derive
the full exact configuration. The formula may change, for example for
different hardware or if it's determined to not be optimal, and in that case
we can use a new modifier to represent the new formula.
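To make the "family of configurations" idea concrete, here is a toy sketch. The two modifier values and both derivation formulas are invented for illustration and do not describe any real hardware. The point is only that the modifier selects a formula, and the formula plus width, height and pixel size yields the full configuration.

```python
# Made-up modifier values naming two revisions of a tiling formula.
MOD_FAMILY_A = (0xFE << 56) | 0x1
MOD_FAMILY_A_V2 = (0xFE << 56) | 0x2


def _derive(width, height, bytes_per_pixel, max_tile_rows):
    tile_width_bytes = 64
    # Tile height: smallest power of two covering the image height,
    # capped at max_tile_rows.
    tile_rows = 1
    while tile_rows < max_tile_rows and tile_rows < height:
        tile_rows *= 2
    # Pitch: row size rounded up to a whole number of tiles.
    tiles_across = (width * bytes_per_pixel + tile_width_bytes - 1) // tile_width_bytes
    return {
        "tile_width_bytes": tile_width_bytes,
        "tile_rows": tile_rows,
        "pitch": tiles_across * tile_width_bytes,
    }


# Each modifier maps to one fixed, deterministic formula.
DERIVATIONS = {
    MOD_FAMILY_A: lambda w, h, bpp: _derive(w, h, bpp, max_tile_rows=16),
    # A revised formula gets a new modifier; the old one stays unchanged.
    MOD_FAMILY_A_V2: lambda w, h, bpp: _derive(w, h, bpp, max_tile_rows=32),
}


def derive_config(modifier, width, height, bytes_per_pixel):
    return DERIVATIONS[modifier](width, height, bytes_per_pixel)
```

Both sides of a buffer share only need to agree on the modifier and the image dimensions. Neither has to transmit tile_rows or pitch, since each can recompute them.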
Post by Miguel Angel Vico
Post by James Jones
I agree and let me elaborate a bit. The problem we're seeing isn't that we
need more than 2^56 modifiers for a future GPU. The problem is that flags
like USE_SCANOUT (which your allocator proposal essentially keeps) is
inadequate. The available tiling and compression formats vary with which
(in KMS terms) CRTC you want to use, which plane you're on whether you want
rotation or no and how much you want to scale etc. It's not realistic to
think that we could model this in a centralized allocator library that's
detached from the display driver. To be fair, this is not a point about
blobs vs modifiers, it's saying that the use flags don't belong in the
allocator, they belong in the APIs that will be using the buffer - and not
as literal use flags, but as a way to discover supported modifiers for a
given use case.
Why detached from the display driver? I don't see why there couldn't be
an allocator driver with access to display capabilities that can be
used in the negotiation step to find the optimal set of allocation
parameters.
In addition, speaking to some other portions of your response, most of the
usage in the prototype is placeholder stuff for testing. USE_SCANOUT is
partially expanded to include orientation as well, which helps in some
cases on our hardware. If there's more complex stuff for other display
hardware, it needs to be expanded further, or that HW is free to expose a
vendor-specific usage, since usage is extensible. It's easy to mirror in
all the relevant usage flags from other APIs or engines too. That's a
rather small amount of duplication.
I understand that it's an incomplete example, but even so I don't think
this duplication is feasible. It's not a matter of how many use cases we
have to duplicate at this point in time, it's that all these APIs are live,
evolving APIs, and keeping the allocator up to date as various APIs grow new
corner cases doesn't seem practical. Further, it's not orthogonal or
composable - the allocator has to know about all producers and consumers,
and if I add a new piece of hardware I have to extend the allocator to
understand its new use cases. With the modifier model, I just ask the new
driver which modifiers it supports for the use case I'm interested in and
feed those modifiers to the allocator.
The important part is the logic that selects optimal usage. I don't think
it's possible to select optimal usage with the queries spread around all
the APIs. Vulkan isn't going to know about video encode usage. In many
situations it won't know about display usage. It just knows optimal
texture/render usage. Therefore it can't optimize parameters for usage it
doesn't know about. A centralized allocator can, especially when all
the usage ends up delegated to a single device/GPU. It will have all the
same information available to it on the back end because it can access DRM
devices, v4l devices, etc. to query their capabilities via allocator
backends, but it can have more information available on the front end from
the app, and a more complete solution returned from a driver that is able
to parse and consider that additional information.
Vulkan isn't expected to know about video encode usage. You ask the video
codec about supported modifiers for encode and you ask Vulkan for supported
modifiers for, say, optimal render usage. The allocator determines the
optimal lowest common denominator and allocates the buffer. Maybe that's
linear, or if you've designed both parts, maybe there's a simple shared
tiled format that the encoder can source from.
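
The flow described above (ask each producer and consumer which modifiers
it supports for a use case, then let the allocator settle on a common
one) can be sketched as follows. This is only an illustration, not
liballocator or any driver API: the function name, the tiled modifier
values and the "first device's order wins" preference rule are
assumptions. Only DRM_FORMAT_MOD_LINEAR (0) is taken from drm_fourcc.h.

```python
DRM_FORMAT_MOD_LINEAR = 0  # value from drm_fourcc.h

# Hypothetical tiled modifier values, for illustration only.
MOD_TILED_A = 0x0100000000000001
MOD_TILED_B = 0x0100000000000002


def pick_common_modifier(supported_lists):
    """Return the most preferred modifier supported by every device.

    Each entry of supported_lists is one device's modifier list for the
    requested use, ordered from most to least preferred. The first
    device's ordering is used as the tie-breaker (an assumption).
    Returns None if no modifier is common to all devices.
    """
    if not supported_lists:
        return None
    common = set(supported_lists[0])
    for mods in supported_lists[1:]:
        common &= set(mods)
    for mod in supported_lists[0]:
        if mod in common:
            return mod
    return None


# The render API prefers tiling; the encoder handles one tiled layout.
render_mods = [MOD_TILED_B, MOD_TILED_A, DRM_FORMAT_MOD_LINEAR]
encoder_mods = [MOD_TILED_A, DRM_FORMAT_MOD_LINEAR]
chosen = pick_common_modifier([render_mods, encoder_mods])
```

Adding a new device here only means passing one more list; the selection
function itself does not need to know what the device is or what the
use case means.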
Additionally, I again offer the goal of an optimal gralloc implementation
built on top of the allocator mechanism. I find it difficult to imagine
building gralloc on top of Vulkan or EGL and DRM. Does such a solution seem
feasible to you? I've not researched this significantly myself, but Google
Android engineers shared that concern when we had the initial discussions
at XDC 2016.
Post by Miguel Angel Vico
Post by James Jones
I understand that you may have n knobs with a total of more than 56 bits
that configure your tiling/swizzling for color buffers. What I don't
buy is that you need all those combinations when passing buffers around
between codecs, cameras and display controllers. Even if you're sharing
between the same 3D drivers in different processes, I expect just locking
down, say, 64 different combinations (you can add more over time) and
assigning each a modifier would be sufficient. I doubt you'd extract
meaningful performance gains from going all the way to a blob.
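
One way to read the "lock down a fixed number of combinations"
suggestion is a table of the knob combinations worth sharing, with the
table index used as the vendor-specific part of the modifier. The sketch
below assumes the drm_fourcc.h convention that the top 8 bits of a
64-bit modifier identify the vendor and the low 56 bits are
vendor-defined. The vendor ID, knob names and table contents are made up
for illustration.

```python
VENDOR_SHIFT = 56
VALUE_MASK = (1 << VENDOR_SHIFT) - 1  # low 56 bits are vendor-defined

EXAMPLE_VENDOR = 0x7F  # placeholder vendor ID, not a real assignment

# Hypothetical (tile_height_log2, compression_kind) combinations that a
# driver has decided are worth exposing for cross-device sharing.
LAYOUT_TABLE = [
    (0, "none"),
    (4, "none"),
    (4, "lossless"),
]


def mod_code(vendor, value):
    """Pack a vendor ID and a 56-bit value, in the style of fourcc_mod_code()."""
    return (vendor << VENDOR_SHIFT) | (value & VALUE_MASK)


def layout_to_modifier(layout):
    """Map a knob combination to its modifier; ValueError if not listed."""
    return mod_code(EXAMPLE_VENDOR, LAYOUT_TABLE.index(layout))


def modifier_to_layout(modifier):
    """Recover the knob combination from a modifier built from this table."""
    if modifier >> VENDOR_SHIFT != EXAMPLE_VENDOR:
        raise ValueError("modifier belongs to another vendor")
    return LAYOUT_TABLE[modifier & VALUE_MASK]
```

New combinations can be appended to the table over time without
changing the meaning of modifiers that were already handed out.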
If someone has N knobs available, I don't understand why there
shouldn't be a mechanism that allows making use of them all, regardless
of performance numbers.
Post by James Jones
Yeah, that part was all clear. I'd want more details of what exact
kind of metadata. fast-clear colors? tiling layouts? aux data for the
compressor? hiz (or whatever you folks call it) tree?
As you say, we've discussed massive amounts of different variants on
this, and there's different answers for different questions. Consensus
seems to be that bigger stuff (compression data, hiz, clear colors,
...) should be stored in aux planes, while the exact layout and what
kind of aux planes you have are encoded in the modifier.
My understanding is that capability sets may include all metadata you
mentioned. Besides tiling/swizzling layout and compression parameters,
things like zero-bandwidth-clears (I guess the same or similar to
fast-clear colors?), hiz-like data, device-local properties such as
caches, or locality information could/will be also included in a
capability set. We are even considering encoding some sort of usage
transition information in the capability set itself.
I think there's some nuance here. The format of compression metadata
would clearly be a capability set thing. The compression data itself would
indeed be in some auxiliary surface on most/all hardware. Things like fast
clears are harder to nail down because implementations seem more varied
there. It might be very awkward on some hardware to put the necessary
meta-data in a DRM FB plane, while that might be the only reasonable way to
accomplish it on other hardware. I think we'll have to work through some
corner cases across lots of hardware before this bottoms out.
For modifiers and liballocator alike, the metadata is copied by value
(and passed through IPC) and as such can't model shared mutable
information. That means fast-clear colors, compression aux buffers and
such have to be in a shared BO plane.
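
The difference between by-value metadata and shared mutable state can be
shown with a toy model. Here pickle stands in for passing a metadata
blob over IPC and a bytearray stands in for an aux plane in a shared BO.
Both stand-ins and all names are assumptions for illustration, not how
any driver stores this data.

```python
import pickle

# Toy "buffer": layout metadata plus an aux plane holding a clear color.
metadata = {"modifier": 0x0100000000000001, "pitch": 4096}
aux_plane = bytearray(4)  # stands in for a plane in a shared BO

# Producer hands the metadata to the consumer by value (as over IPC).
consumer_metadata = pickle.loads(pickle.dumps(metadata))
# The aux plane is shared: both sides reference the same memory.
consumer_aux = aux_plane

# Producer later performs a fast clear and records the new clear color.
metadata["clear_color"] = 0xFF0000FF
aux_plane[:] = (0xFF0000FF).to_bytes(4, "little")

# The by-value copy never sees the update; the shared plane does.
stale = "clear_color" not in consumer_metadata
seen = int.from_bytes(consumer_aux, "little")
```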

Kristian
Thanks,
-James
Thanks,
Miguel.
Daniel Vetter
2018-01-08 09:35:37 UTC
Just wanted to clarify this one thing here, otherwise I think Rob/krh
covered it all.
Post by Miguel Angel Vico
Post by Daniel Vetter
I think in the interim figuring out how to expose kms capabilities
better (and necessarily standardizing at least some of them which
matter at the compositor level, like size limits of framebuffers)
feels like the place to push the ecosystem forward. In some way
Miguel's proposal looks a bit backwards, since it adds the pitch
capabilities to addfb, but at addfb time you've allocated everything
already, so way too late to fix things up. With modifiers we've added
a very simple per-plane property to list which modifiers can be
combined with which pixel formats. Tiny start, but obviously very far
from all that we'll need.
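
The per-plane property mentioned here (exposed by KMS as IN_FORMATS)
pairs a list of formats with modifier entries that each carry a bitmask
over that list. A rough sketch of expanding such data into
(format, modifier) pairs follows. It assumes the entry layout of a
64-bit format mask, an offset into the format list and a modifier, and
it works on already-parsed Python values rather than on the kernel blob.
The tiled modifier value is hypothetical.

```python
def fourcc(a, b, c, d):
    """Pack four characters into a DRM fourcc code (little-endian)."""
    return ord(a) | (ord(b) << 8) | (ord(c) << 16) | (ord(d) << 24)


DRM_FORMAT_XRGB8888 = fourcc("X", "R", "2", "4")
DRM_FORMAT_NV12 = fourcc("N", "V", "1", "2")
DRM_FORMAT_MOD_LINEAR = 0
MOD_TILED = 0x0100000000000001  # hypothetical tiled modifier


def supported_pairs(formats, entries):
    """Expand (mask, offset, modifier) entries into (format, modifier) pairs.

    Bit i of mask refers to formats[offset + i].
    """
    pairs = set()
    for mask, offset, modifier in entries:
        for bit in range(64):
            index = offset + bit
            if mask & (1 << bit) and index < len(formats):
                pairs.add((formats[index], modifier))
    return pairs


formats = [DRM_FORMAT_XRGB8888, DRM_FORMAT_NV12]
entries = [
    (0b11, 0, DRM_FORMAT_MOD_LINEAR),  # linear works with both formats
    (0b01, 0, MOD_TILED),              # tiling only with XRGB8888
]
pairs = supported_pairs(formats, entries)
```

A compositor can then test a candidate (format, modifier) pair against
this set before allocating, rather than finding out at addfb time.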
Not sure whether I might be misunderstanding your statement, but one of
the allocator's main features is negotiation of nearly optimal allocation
parameters given a set of uses on different devices/engines by the
capability merge operation. A client should have queried what every
device/engine is capable of for the given uses, found the optimal set of
capabilities, and used it for allocating a buffer. By the time these
parameters are given to KMS, they are expected to be good. If they
aren't, the client didn't do things right.
Your example code has a new capability for PITCH_ALIGNMENT. That looks
wrong for addfb (which should only receive the computed intersection
of all requirements, not the requirements themselves). And since that was the
only thing in your example code besides the bare boilerplate to wire it
all up, it looks a bit confused.

Maybe we need to distinguish capabilities into constraints on properties
(like pitch alignment, or power-of-two pitch) and properties (like pitch)
themselves.
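
The split between constraints and properties can be made concrete:
constraints from each device are merged first, and only the resulting
property value is what an interface like addfb would receive. A minimal
sketch, assuming each device reports a pitch alignment in bytes and
optionally requires a power-of-two pitch. The class, field and function
names are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class PitchConstraints:
    alignment: int = 1          # pitch must be a multiple of this (bytes)
    power_of_two: bool = False  # pitch must be a power of two


def merge(a, b):
    """Combine two devices' constraints into one that satisfies both."""
    lcm = a.alignment * b.alignment // math.gcd(a.alignment, b.alignment)
    return PitchConstraints(
        alignment=lcm,
        power_of_two=a.power_of_two or b.power_of_two,
    )


def compute_pitch(width, bytes_per_pixel, c):
    """Derive the pitch property: the smallest pitch meeting constraints."""
    min_pitch = width * bytes_per_pixel
    if not c.power_of_two:
        # Round up to the next multiple of the alignment.
        return -(-min_pitch // c.alignment) * c.alignment
    if c.alignment & (c.alignment - 1):
        raise ValueError("no power-of-two pitch is a multiple of the alignment")
    # Round up to the next power of two, then honor the alignment.
    pitch = 1 << (min_pitch - 1).bit_length()
    return max(pitch, c.alignment)


gpu = PitchConstraints(alignment=64)
display = PitchConstraints(alignment=256)
merged = merge(gpu, display)
pitch = compute_pitch(1366, 4, merged)
```

In this model only `pitch` would be handed to addfb, while
`PitchConstraints` stays within the negotiation step.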
-Daniel
--
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch
Miguel Angel Vico
2018-01-16 18:41:34 UTC
Hi,

Besides the DRM modifiers discussion in the other forks of this thread
(I should've probably started separate threads), has anyone gotten the
chance to look at least at the mesa changes and allocator changes I
shared below?

With respect to Mesa changes, I think it might be worth merging the
EXT_external_objects nouveau implementation upstream. Should I just
send the list of patches as a formal RFR?

With respect to Allocator changes, it'd be nice to get feedback from
someone outside NVIDIA.

Thanks.

On Wed, 20 Dec 2017 08:51:51 -0800
Post by Miguel Angel Vico
Hi all,
As many of you already know, I've been working with James Jones on the
Generic Device Allocator project lately. He started a discussion thread
some weeks ago seeking feedback on the current prototype of the library
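and advice on how to move all this forward, from a prototype stage to
production. For further reference, see:
https://lists.freedesktop.org/archives/mesa-dev/2017-November/177632.html
From the thread above, we came up with very interesting high level
design ideas for one of the currently missing parts in the library: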
Usage transitions. That's something I'll personally work on during the
following weeks.
In the meantime, I've been working on putting together an open source
implementation of the allocator mechanisms using the Nouveau driver for
all to be able to play with.
Below I'm seeking feedback on a bunch of changes I had to make to
different components of the graphics stack:
** Allocator **
An allocator driver implementation on top of Nouveau. The current
implementation only handles pitch linear layouts, but that's enough
to have the kmscube port working using the allocator and Nouveau
drivers.
You can pull these changes from
https://github.com/mvicomoya/allocator/tree/wip/mvicomoya/nouveau-driver
** Mesa **
James's kmscube port to use the allocator relies on the
EXT_external_objects extension to import allocator allocations to
OpenGL as a texture object. However, the Nouveau implementation of
these mechanisms is missing in Mesa, so I went ahead and added them.
You can pull these changes from
https://github.com/mvicomoya/mesa/tree/wip/mvicomoya/EXT_external_objects-nouveau
Also, James's kmscube port uses the NVX_unix_allocator_import
extension to attach allocator metadata to texture objects so the
driver knows how to deal with the imported memory.
Note that there isn't a formal spec for this extension yet. For now,
it just serves as an experimental mechanism to import allocator
memory in OpenGL, and attach metadata to texture objects.
https://github.com/mvicomoya/mesa/tree/wip/mvicomoya/NVX_unix_allocator_import
** kmscube **
Mostly minor fixes and improvements on top of James's port to use the
allocator. Main thing is the allocator initialization path will use
EGL_MESA_platform_surfaceless if EGLDevice platform isn't supported
by the underlying EGL implementation.
https://github.com/mvicomoya/kmscube/tree/wip/mvicomoya/allocator-nouveau
With all the above you should be able to get kmscube working using the
allocator on top of the Nouveau driver.
Another of the missing pieces before we can move this to production is
importing allocations to DRM FB objects. This is probably one of the
most sensitive parts of the project as it requires modification/addition
of kernel driver interfaces.
At XDC2017, James had several hallway conversations with several people
about this, all having different opinions. I'd like to take this
opportunity to also start a discussion about what's the best option to
create a path to get allocator allocations added as DRM FB objects.
A) Have vendor-private ioctls to set properties on GEM objects that
are inherited by the FB objects. This is how our (NVIDIA) desktop
DRM driver currently works. This would require every vendor to add
their own ioctl to process allocator metadata, but the metadata is
actually a vendor-agnostic object more like DRM modifiers. We'd
like to come up with a vendor-agnostic solution that can be
integrated into core DRM.
B) Add a new drmModeAddFBWithMetadata() command that takes allocator
metadata blobs for each plane of the FB. Some people in the
community have mentioned this is their preferred design. This,
however, means we'd have to go through the exercise of adding
another metadata mechanism to the whole graphics stack.
C) Shove allocator metadata into DRM by defining it to be a separate
plane in the image, and using the existing DRM modifiers mechanism
to indicate there is another plane for each "real" plane added. It
isn't clear how this scales to surfaces that already need several
planes, but there are some people that see this as the only way
forward. Also, we would have to create a separate GEM buffer for
the metadata itself, which seems excessive.
We personally like option (B) better, and have already started to
prototype the new path (which is actually very similar to the
https://github.com/mvicomoya/linux/tree/wip/mvicomoya/drm_addfb_with_metadata__4.14-rc8
There may be other options that haven't been explored yet that could be
a better choice than the above, so any suggestion will be greatly
appreciated.
Thanks,
--
Miguel