Discussion:
[Mesa-dev] [RFC PATCH 00/14] Do not use userptr in anv if softpin is available.
Rafael Antognolli
2018-12-08 00:05:39 UTC
This series changes anv_block_pool to use a list of BO's instead of a
single BO that gets reallocated and set with userptr.

The main changes are:
- The introduction of anv_state_table to track anv_states, and
recycle them;
- Addition of a list of BOs in anv_block_pool, instead of a single
BO;
- Forcing allocations to not cross boundaries between the previous
size of a block_pool and the new one;
- And pushing back the "padding" required to avoid such boundaries
cross to the pool of free anv_states.

I'm still working on increasing the test coverage (adding new tests for
some of the new cases we have now), but the series seems reasonable
enough, IMHO, to start getting some review already.

Cc: Jason Ekstrand <***@jlekstrand.net>

Rafael Antognolli (14):
anv/tests: Fix block_pool_no_free test.
anv/allocator: Add anv_state_table.
anv/allocator: Use anv_state_table on anv_state_pool_alloc.
anv/allocator: Use anv_state_table on back_alloc too.
anv/allocator: Remove usage of anv_free_list.
anv/allocator: Add getters for anv_block_pool.
anv: Update usage of block_pool->bo.
anv/allocator: Add support for a list of BOs in block pool.
anv: Validate the list of BOs from the block pool.
anv: Add clflush to states.
anv: Remove some asserts.
anv/allocator: Rework chunk return to the state pool.
anv/allocator: Add padding information.
anv/allocator: Add support for non-userptr.

src/intel/vulkan/anv_allocator.c | 741 ++++++++++++++++----
src/intel/vulkan/anv_batch_chain.c | 62 +-
src/intel/vulkan/anv_blorp.c | 2 +-
src/intel/vulkan/anv_private.h | 73 +-
src/intel/vulkan/gen8_cmd_buffer.c | 6 +-
src/intel/vulkan/genX_blorp_exec.c | 9 +-
src/intel/vulkan/genX_cmd_buffer.c | 20 +-
src/intel/vulkan/genX_gpu_memcpy.c | 3 -
src/intel/vulkan/tests/block_pool_no_free.c | 19 +-
9 files changed, 740 insertions(+), 195 deletions(-)
--
2.17.1
Rafael Antognolli
2018-12-08 00:05:48 UTC
We now have multiple BOs in the block pool, but sometimes we still
reference only the first one in some instructions, and use relative
offsets in others. So we must be sure to add all the BOs from the block
pool to the validation list when submitting commands.
---
src/intel/vulkan/anv_batch_chain.c | 47 ++++++++++++++++++++++++++----
1 file changed, 42 insertions(+), 5 deletions(-)

diff --git a/src/intel/vulkan/anv_batch_chain.c b/src/intel/vulkan/anv_batch_chain.c
index bec4d647b7e..65df28ccb91 100644
--- a/src/intel/vulkan/anv_batch_chain.c
+++ b/src/intel/vulkan/anv_batch_chain.c
@@ -1356,6 +1356,36 @@ relocate_cmd_buffer(struct anv_cmd_buffer *cmd_buffer,
return true;
}

+static void
+anv_reloc_list_add_dep(struct anv_cmd_buffer *cmd_buffer,
+ struct anv_bo_list *bo_list)
+{
+ struct anv_bo_list *iter;
+ struct anv_bo *bo;
+ struct anv_reloc_list *relocs = cmd_buffer->batch.relocs;
+
+ anv_block_pool_foreach_bo(bo_list, iter, bo) {
+ _mesa_set_add(relocs->deps, bo);
+ }
+}
+
+static void
+anv_batch_bos_add(struct anv_cmd_buffer *cmd_buffer)
+{
+ struct anv_bo_list *bo_list;
+
+ bo_list = cmd_buffer->device->dynamic_state_pool.block_pool.bos;
+ anv_reloc_list_add_dep(cmd_buffer, bo_list);
+
+ bo_list = cmd_buffer->device->instruction_state_pool.block_pool.bos;
+ anv_reloc_list_add_dep(cmd_buffer, bo_list);
+
+ if (cmd_buffer->device->instance->physicalDevice.use_softpin) {
+ bo_list = cmd_buffer->device->binding_table_pool.block_pool.bos;
+ anv_reloc_list_add_dep(cmd_buffer, bo_list);
+ }
+}
+
static VkResult
setup_execbuf_for_cmd_buffer(struct anv_execbuf *execbuf,
struct anv_cmd_buffer *cmd_buffer)
@@ -1364,13 +1394,20 @@ setup_execbuf_for_cmd_buffer(struct anv_execbuf *execbuf,
struct anv_state_pool *ss_pool =
&cmd_buffer->device->surface_state_pool;

+ anv_batch_bos_add(cmd_buffer);
+
adjust_relocations_from_state_pool(ss_pool, &cmd_buffer->surface_relocs,
cmd_buffer->last_ss_pool_center);
- VkResult result = anv_execbuf_add_bo(execbuf, ss_pool->block_pool.bo,
- &cmd_buffer->surface_relocs, 0,
- &cmd_buffer->device->alloc);
- if (result != VK_SUCCESS)
- return result;
+ VkResult result;
+ struct anv_bo *bo;
+ struct anv_bo_list *iter;
+ anv_block_pool_foreach_bo(ss_pool->block_pool.bos, iter, bo) {
+ result = anv_execbuf_add_bo(execbuf, bo,
+ &cmd_buffer->surface_relocs, 0,
+ &cmd_buffer->device->alloc);
+ if (result != VK_SUCCESS)
+ return result;
+ }

/* First, we walk over all of the bos we've seen and add them and their
* relocations to the validate list.
--
2.17.1
Jason Ekstrand
2018-12-10 19:49:43 UTC
On Fri, Dec 7, 2018 at 6:06 PM Rafael Antognolli <
Post by Rafael Antognolli
We now have multiple BOs in the block pool, but sometimes we still
reference only the first one in some instructions, and use relative
offsets in others. So we must be sure to add all the BOs from the block
pool to the validation list when submitting commands.
---
src/intel/vulkan/anv_batch_chain.c | 47 ++++++++++++++++++++++++++----
1 file changed, 42 insertions(+), 5 deletions(-)
diff --git a/src/intel/vulkan/anv_batch_chain.c
b/src/intel/vulkan/anv_batch_chain.c
index bec4d647b7e..65df28ccb91 100644
--- a/src/intel/vulkan/anv_batch_chain.c
+++ b/src/intel/vulkan/anv_batch_chain.c
@@ -1356,6 +1356,36 @@ relocate_cmd_buffer(struct anv_cmd_buffer *cmd_buffer,
return true;
}
+static void
+anv_reloc_list_add_dep(struct anv_cmd_buffer *cmd_buffer,
+ struct anv_bo_list *bo_list)
+{
+ struct anv_bo_list *iter;
+ struct anv_bo *bo;
+ struct anv_reloc_list *relocs = cmd_buffer->batch.relocs;
+
+ anv_block_pool_foreach_bo(bo_list, iter, bo) {
+ _mesa_set_add(relocs->deps, bo);
+ }
+}
+
+static void
+anv_batch_bos_add(struct anv_cmd_buffer *cmd_buffer)
+{
+ struct anv_bo_list *bo_list;
+
+ bo_list = cmd_buffer->device->dynamic_state_pool.block_pool.bos;
+ anv_reloc_list_add_dep(cmd_buffer, bo_list);
+
+ bo_list = cmd_buffer->device->instruction_state_pool.block_pool.bos;
+ anv_reloc_list_add_dep(cmd_buffer, bo_list);
+
+ if (cmd_buffer->device->instance->physicalDevice.use_softpin) {
+ bo_list = cmd_buffer->device->binding_table_pool.block_pool.bos;
+ anv_reloc_list_add_dep(cmd_buffer, bo_list);
I don't think we want to add things to the command buffer's dependency set
this late (at submit time). Instead, I think we want to just do
anv_execbuf_add_bo for each of them directly at the top of
setup_execbuf_for_cmd_buffer.
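
Something like this minimal sketch is what I have in mind (assuming
anv_execbuf_add_bo accepts a NULL relocation list, and reusing the
anv_block_pool_foreach_bo helper from this series):

static VkResult
add_block_pool_bos(struct anv_execbuf *execbuf,
                   struct anv_cmd_buffer *cmd_buffer,
                   struct anv_block_pool *pool)
{
   struct anv_bo_list *iter;
   struct anv_bo *bo;

   /* Add every BO backing the pool straight to the execbuf; no relocations
    * are attached, the BOs just need to end up in the validation list.
    */
   anv_block_pool_foreach_bo(pool->bos, iter, bo) {
      VkResult result = anv_execbuf_add_bo(execbuf, bo, NULL, 0,
                                           &cmd_buffer->device->alloc);
      if (result != VK_SUCCESS)
         return result;
   }

   return VK_SUCCESS;
}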
Post by Rafael Antognolli
+ }
+}
+
static VkResult
setup_execbuf_for_cmd_buffer(struct anv_execbuf *execbuf,
struct anv_cmd_buffer *cmd_buffer)
@@ -1364,13 +1394,20 @@ setup_execbuf_for_cmd_buffer(struct anv_execbuf *execbuf,
struct anv_state_pool *ss_pool =
&cmd_buffer->device->surface_state_pool;
+ anv_batch_bos_add(cmd_buffer);
+
adjust_relocations_from_state_pool(ss_pool,
&cmd_buffer->surface_relocs,
cmd_buffer->last_ss_pool_center);
- VkResult result = anv_execbuf_add_bo(execbuf, ss_pool->block_pool.bo,
- &cmd_buffer->surface_relocs, 0,
- &cmd_buffer->device->alloc);
- if (result != VK_SUCCESS)
- return result;
+ VkResult result;
+ struct anv_bo *bo;
+ struct anv_bo_list *iter;
+ anv_block_pool_foreach_bo(ss_pool->block_pool.bos, iter, bo) {
+ result = anv_execbuf_add_bo(execbuf, bo,
+ &cmd_buffer->surface_relocs, 0,
+ &cmd_buffer->device->alloc);
I don't think we want to pass the relocation list on every BO. Instead, we
should have a softpin case where we walk the list and don't provide any
relocations and a non-softpin case where we assert that there is only one
BO and do provide the relocations.
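
Roughly something like this (just a sketch, assuming anv_bo_list is a
singly linked list with an embedded anv_bo, as introduced earlier in the
series):

   if (cmd_buffer->device->instance->physicalDevice.use_softpin) {
      /* Softpin: addresses are fixed, so no relocations are needed; just
       * make sure every BO backing the pool is in the validation list.
       */
      anv_block_pool_foreach_bo(ss_pool->block_pool.bos, iter, bo) {
         result = anv_execbuf_add_bo(execbuf, bo, NULL, 0,
                                     &cmd_buffer->device->alloc);
         if (result != VK_SUCCESS)
            return result;
      }
   } else {
      /* Relocation path: the pool is still a single userptr-backed BO, so
       * attach the surface relocations to that one BO.
       */
      assert(ss_pool->block_pool.bos->next == NULL);
      result = anv_execbuf_add_bo(execbuf, &ss_pool->block_pool.bos->bo,
                                  &cmd_buffer->surface_relocs, 0,
                                  &cmd_buffer->device->alloc);
      if (result != VK_SUCCESS)
         return result;
   }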
Post by Rafael Antognolli
+ if (result != VK_SUCCESS)
+ return result;
+ }
/* First, we walk over all of the bos we've seen and add them and their
* relocations to the validate list.
--
2.17.1
Rafael Antognolli
2018-12-08 00:05:45 UTC
We will especially need anv_block_pool_map(), to find the map relative to
a BO that is not at the start of the block pool.
---
src/intel/vulkan/anv_allocator.c | 23 ++++++++++++++++++++---
src/intel/vulkan/anv_batch_chain.c | 5 +++--
src/intel/vulkan/anv_private.h | 7 +++++++
src/intel/vulkan/genX_blorp_exec.c | 5 +++--
4 files changed, 33 insertions(+), 7 deletions(-)

diff --git a/src/intel/vulkan/anv_allocator.c b/src/intel/vulkan/anv_allocator.c
index cda6a1a9d25..acf3c80fbac 100644
--- a/src/intel/vulkan/anv_allocator.c
+++ b/src/intel/vulkan/anv_allocator.c
@@ -601,6 +601,15 @@ anv_block_pool_expand_range(struct anv_block_pool *pool,
return VK_SUCCESS;
}

+struct anv_pool_map
+anv_block_pool_map(struct anv_block_pool *pool, int32_t offset)
+{
+ return (struct anv_pool_map) {
+ .map = pool->map,
+ .offset = offset,
+ };
+}
+
/** Grows and re-centers the block pool.
*
* We grow the block pool in one or both directions in such a way that the
@@ -967,7 +976,9 @@ anv_state_pool_alloc_no_vg(struct anv_state_pool *pool,
st_idx + i);
state_i->alloc_size = pool->block_size;
state_i->offset = chunk_offset + pool->block_size * (i + 1);
- state_i->map = pool->block_pool.map + state_i->offset;
+ struct anv_pool_map pool_map = anv_block_pool_map(&pool->block_pool,
+ state_i->offset);
+ state_i->map = pool_map.map + pool_map.offset;
}
anv_state_table_push(&pool->buckets[block_bucket].free_list,
&pool->table, st_idx, push_back);
@@ -983,7 +994,9 @@ anv_state_pool_alloc_no_vg(struct anv_state_pool *pool,
st_idx + i);
state_i->alloc_size = alloc_size;
state_i->offset = chunk_offset + alloc_size * (i + 1);
- state_i->map = pool->block_pool.map + state_i->offset;
+ struct anv_pool_map pool_map = anv_block_pool_map(&pool->block_pool,
+ state_i->offset);
+ state_i->map = pool_map.map + pool_map.offset;
}
anv_state_table_push(&pool->buckets[bucket].free_list,
&pool->table, st_idx, push_back);
@@ -1002,7 +1015,11 @@ anv_state_pool_alloc_no_vg(struct anv_state_pool *pool,
state = anv_state_table_get(&pool->table, idx);
state->offset = offset;
state->alloc_size = alloc_size;
- state->map = pool->block_pool.map + offset;
+
+ struct anv_pool_map pool_map = anv_block_pool_map(&pool->block_pool,
+ state->offset);
+ state->map = pool_map.map + pool_map.offset;
+

done:
return *state;
diff --git a/src/intel/vulkan/anv_batch_chain.c b/src/intel/vulkan/anv_batch_chain.c
index a9f8c5b79b1..6c06858efe1 100644
--- a/src/intel/vulkan/anv_batch_chain.c
+++ b/src/intel/vulkan/anv_batch_chain.c
@@ -679,8 +679,9 @@ anv_cmd_buffer_alloc_binding_table(struct anv_cmd_buffer *cmd_buffer,
return (struct anv_state) { 0 };

state.offset = cmd_buffer->bt_next;
- state.map = anv_binding_table_pool(device)->block_pool.map +
- bt_block->offset + state.offset;
+ struct anv_pool_map pool_map =
+ anv_block_pool_map(&anv_binding_table_pool(device)->block_pool, bt_block->offset + state.offset);
+ state.map = pool_map.map + pool_map.offset;

cmd_buffer->bt_next += state.alloc_size;

diff --git a/src/intel/vulkan/anv_private.h b/src/intel/vulkan/anv_private.h
index 539523450ef..a364be8dad5 100644
--- a/src/intel/vulkan/anv_private.h
+++ b/src/intel/vulkan/anv_private.h
@@ -749,6 +749,11 @@ struct anv_state_stream {
struct anv_state_stream_block *block_list;
};

+struct anv_pool_map {
+ void *map;
+ int32_t offset;
+};
+
/* The block_pool functions exported for testing only. The block pool should
* only be used via a state pool (see below).
*/
@@ -762,6 +767,8 @@ int32_t anv_block_pool_alloc(struct anv_block_pool *pool,
uint32_t block_size);
int32_t anv_block_pool_alloc_back(struct anv_block_pool *pool,
uint32_t block_size);
+struct anv_pool_map anv_block_pool_map(struct anv_block_pool *pool,
+ int32_t offset);

VkResult anv_state_pool_init(struct anv_state_pool *pool,
struct anv_device *device,
diff --git a/src/intel/vulkan/genX_blorp_exec.c b/src/intel/vulkan/genX_blorp_exec.c
index c573e890946..5af6abb0894 100644
--- a/src/intel/vulkan/genX_blorp_exec.c
+++ b/src/intel/vulkan/genX_blorp_exec.c
@@ -63,8 +63,9 @@ blorp_surface_reloc(struct blorp_batch *batch, uint32_t ss_offset,
if (result != VK_SUCCESS)
anv_batch_set_error(&cmd_buffer->batch, result);

- void *dest = cmd_buffer->device->surface_state_pool.block_pool.map +
- ss_offset;
+ struct anv_pool_map pool_map = anv_block_pool_map(
+ &cmd_buffer->device->surface_state_pool.block_pool, ss_offset);
+ void *dest = pool_map.map + pool_map.offset;
uint64_t val = ((struct anv_bo*)address.buffer)->offset + address.offset +
delta;
write_reloc(cmd_buffer->device, dest, val, false);
--
2.17.1
Jason Ekstrand
2018-12-10 18:45:00 UTC
On Fri, Dec 7, 2018 at 6:06 PM Rafael Antognolli <
Post by Rafael Antognolli
We will especially need anv_block_pool_map(), to find the map relative to
a BO that is not at the start of the block pool.
---
src/intel/vulkan/anv_allocator.c | 23 ++++++++++++++++++++---
src/intel/vulkan/anv_batch_chain.c | 5 +++--
src/intel/vulkan/anv_private.h | 7 +++++++
src/intel/vulkan/genX_blorp_exec.c | 5 +++--
4 files changed, 33 insertions(+), 7 deletions(-)
diff --git a/src/intel/vulkan/anv_allocator.c
b/src/intel/vulkan/anv_allocator.c
index cda6a1a9d25..acf3c80fbac 100644
--- a/src/intel/vulkan/anv_allocator.c
+++ b/src/intel/vulkan/anv_allocator.c
@@ -601,6 +601,15 @@ anv_block_pool_expand_range(struct anv_block_pool *pool,
return VK_SUCCESS;
}
+struct anv_pool_map
+anv_block_pool_map(struct anv_block_pool *pool, int32_t offset)
+{
+ return (struct anv_pool_map) {
+ .map = pool->map,
+ .offset = offset,
+ };
Every caller of this function adds the two together. Why not just return
the offsetted pointer?
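
i.e. something along the lines of (sketch only):

void *
anv_block_pool_map(struct anv_block_pool *pool, int32_t offset)
{
   /* With a single mapping this is just pointer arithmetic; once the pool
    * is split into multiple BOs, this is where the BO containing 'offset'
    * would be looked up.
    */
   return pool->map + offset;
}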
Post by Rafael Antognolli
+}
+
/** Grows and re-centers the block pool.
*
* We grow the block pool in one or both directions in such a way that the
@@ -967,7 +976,9 @@ anv_state_pool_alloc_no_vg(struct anv_state_pool *pool,
st_idx + i);
state_i->alloc_size = pool->block_size;
state_i->offset = chunk_offset + pool->block_size * (i + 1);
- state_i->map = pool->block_pool.map + state_i->offset;
+ struct anv_pool_map pool_map =
anv_block_pool_map(&pool->block_pool,
+
state_i->offset);
+ state_i->map = pool_map.map + pool_map.offset;
}
anv_state_table_push(&pool->buckets[block_bucket].free_list,
&pool->table, st_idx, push_back);
@@ -983,7 +994,9 @@ anv_state_pool_alloc_no_vg(struct anv_state_pool *pool,
st_idx + i);
state_i->alloc_size = alloc_size;
state_i->offset = chunk_offset + alloc_size * (i + 1);
- state_i->map = pool->block_pool.map + state_i->offset;
+ struct anv_pool_map pool_map =
anv_block_pool_map(&pool->block_pool,
+
state_i->offset);
+ state_i->map = pool_map.map + pool_map.offset;
}
anv_state_table_push(&pool->buckets[bucket].free_list,
&pool->table, st_idx, push_back);
@@ -1002,7 +1015,11 @@ anv_state_pool_alloc_no_vg(struct anv_state_pool *pool,
state = anv_state_table_get(&pool->table, idx);
state->offset = offset;
state->alloc_size = alloc_size;
- state->map = pool->block_pool.map + offset;
+
+ struct anv_pool_map pool_map = anv_block_pool_map(&pool->block_pool,
+ state->offset);
+ state->map = pool_map.map + pool_map.offset;
+
return *state;
diff --git a/src/intel/vulkan/anv_batch_chain.c
b/src/intel/vulkan/anv_batch_chain.c
index a9f8c5b79b1..6c06858efe1 100644
--- a/src/intel/vulkan/anv_batch_chain.c
+++ b/src/intel/vulkan/anv_batch_chain.c
@@ -679,8 +679,9 @@ anv_cmd_buffer_alloc_binding_table(struct
anv_cmd_buffer *cmd_buffer,
return (struct anv_state) { 0 };
state.offset = cmd_buffer->bt_next;
- state.map = anv_binding_table_pool(device)->block_pool.map +
- bt_block->offset + state.offset;
+ struct anv_pool_map pool_map =
+ anv_block_pool_map(&anv_binding_table_pool(device)->block_pool,
bt_block->offset + state.offset);
+ state.map = pool_map.map + pool_map.offset;
cmd_buffer->bt_next += state.alloc_size;
diff --git a/src/intel/vulkan/anv_private.h
b/src/intel/vulkan/anv_private.h
index 539523450ef..a364be8dad5 100644
--- a/src/intel/vulkan/anv_private.h
+++ b/src/intel/vulkan/anv_private.h
@@ -749,6 +749,11 @@ struct anv_state_stream {
struct anv_state_stream_block *block_list;
};
+struct anv_pool_map {
+ void *map;
+ int32_t offset;
+};
+
/* The block_pool functions exported for testing only. The block pool should
* only be used via a state pool (see below).
*/
@@ -762,6 +767,8 @@ int32_t anv_block_pool_alloc(struct anv_block_pool *pool,
uint32_t block_size);
int32_t anv_block_pool_alloc_back(struct anv_block_pool *pool,
uint32_t block_size);
+struct anv_pool_map anv_block_pool_map(struct anv_block_pool *pool,
+ int32_t offset);
VkResult anv_state_pool_init(struct anv_state_pool *pool,
struct anv_device *device,
diff --git a/src/intel/vulkan/genX_blorp_exec.c
b/src/intel/vulkan/genX_blorp_exec.c
index c573e890946..5af6abb0894 100644
--- a/src/intel/vulkan/genX_blorp_exec.c
+++ b/src/intel/vulkan/genX_blorp_exec.c
@@ -63,8 +63,9 @@ blorp_surface_reloc(struct blorp_batch *batch, uint32_t ss_offset,
if (result != VK_SUCCESS)
anv_batch_set_error(&cmd_buffer->batch, result);
- void *dest = cmd_buffer->device->surface_state_pool.block_pool.map +
- ss_offset;
+ struct anv_pool_map pool_map = anv_block_pool_map(
+ &cmd_buffer->device->surface_state_pool.block_pool, ss_offset);
+ void *dest = pool_map.map + pool_map.offset;
uint64_t val = ((struct anv_bo*)address.buffer)->offset +
address.offset +
delta;
write_reloc(cmd_buffer->device, dest, val, false);
--
2.17.1
Rafael Antognolli
2018-12-10 19:20:00 UTC
Post by Rafael Antognolli
We will especially need anv_block_pool_map(), to find the map relative to
a BO that is not at the start of the block pool.
---
src/intel/vulkan/anv_allocator.c | 23 ++++++++++++++++++++---
src/intel/vulkan/anv_batch_chain.c | 5 +++--
src/intel/vulkan/anv_private.h | 7 +++++++
src/intel/vulkan/genX_blorp_exec.c | 5 +++--
4 files changed, 33 insertions(+), 7 deletions(-)
diff --git a/src/intel/vulkan/anv_allocator.c b/src/intel/vulkan/
anv_allocator.c
index cda6a1a9d25..acf3c80fbac 100644
--- a/src/intel/vulkan/anv_allocator.c
+++ b/src/intel/vulkan/anv_allocator.c
@@ -601,6 +601,15 @@ anv_block_pool_expand_range(struct anv_block_pool *pool,
return VK_SUCCESS;
}
+struct anv_pool_map
+anv_block_pool_map(struct anv_block_pool *pool, int32_t offset)
+{
+ return (struct anv_pool_map) {
+ .map = pool->map,
+ .offset = offset,
+ };
Every caller of this function adds the two together. Why not just return the
offsetted pointer?
Ugh, I guess so. I thought I would have a use case for having them
separated, but so far there isn't. Will fix that for the next version.
Post by Rafael Antognolli
+}
+
/** Grows and re-centers the block pool.
*
* We grow the block pool in one or both directions in such a way that the
@@ -967,7 +976,9 @@ anv_state_pool_alloc_no_vg(struct anv_state_pool *pool,
st_idx + i);
state_i->alloc_size = pool->block_size;
state_i->offset = chunk_offset + pool->block_size * (i + 1);
- state_i->map = pool->block_pool.map + state_i->offset;
+ struct anv_pool_map pool_map = anv_block_pool_map(&pool->
block_pool,
+ state_i->
offset);
+ state_i->map = pool_map.map + pool_map.offset;
}
anv_state_table_push(&pool->buckets[block_bucket].free_list,
&pool->table, st_idx, push_back);
@@ -983,7 +994,9 @@ anv_state_pool_alloc_no_vg(struct anv_state_pool *pool,
st_idx + i);
state_i->alloc_size = alloc_size;
state_i->offset = chunk_offset + alloc_size * (i + 1);
- state_i->map = pool->block_pool.map + state_i->offset;
+ struct anv_pool_map pool_map = anv_block_pool_map(&pool->
block_pool,
+ state_i->
offset);
+ state_i->map = pool_map.map + pool_map.offset;
}
anv_state_table_push(&pool->buckets[bucket].free_list,
&pool->table, st_idx, push_back);
@@ -1002,7 +1015,11 @@ anv_state_pool_alloc_no_vg(struct anv_state_pool *pool,
state = anv_state_table_get(&pool->table, idx);
state->offset = offset;
state->alloc_size = alloc_size;
- state->map = pool->block_pool.map + offset;
+
+ struct anv_pool_map pool_map = anv_block_pool_map(&pool->block_pool,
+ state->offset);
+ state->map = pool_map.map + pool_map.offset;
+
return *state;
diff --git a/src/intel/vulkan/anv_batch_chain.c b/src/intel/vulkan/
anv_batch_chain.c
index a9f8c5b79b1..6c06858efe1 100644
--- a/src/intel/vulkan/anv_batch_chain.c
+++ b/src/intel/vulkan/anv_batch_chain.c
@@ -679,8 +679,9 @@ anv_cmd_buffer_alloc_binding_table(struct
anv_cmd_buffer *cmd_buffer,
return (struct anv_state) { 0 };
state.offset = cmd_buffer->bt_next;
- state.map = anv_binding_table_pool(device)->block_pool.map +
- bt_block->offset + state.offset;
+ struct anv_pool_map pool_map =
+ anv_block_pool_map(&anv_binding_table_pool(device)->block_pool,
bt_block->offset + state.offset);
+ state.map = pool_map.map + pool_map.offset;
cmd_buffer->bt_next += state.alloc_size;
diff --git a/src/intel/vulkan/anv_private.h b/src/intel/vulkan/
anv_private.h
index 539523450ef..a364be8dad5 100644
--- a/src/intel/vulkan/anv_private.h
+++ b/src/intel/vulkan/anv_private.h
@@ -749,6 +749,11 @@ struct anv_state_stream {
struct anv_state_stream_block *block_list;
};
+struct anv_pool_map {
+ void *map;
+ int32_t offset;
+};
+
/* The block_pool functions exported for testing only. The block pool should
* only be used via a state pool (see below).
*/
@@ -762,6 +767,8 @@ int32_t anv_block_pool_alloc(struct anv_block_pool *pool,
uint32_t block_size);
int32_t anv_block_pool_alloc_back(struct anv_block_pool *pool,
uint32_t block_size);
+struct anv_pool_map anv_block_pool_map(struct anv_block_pool *pool,
+ int32_t offset);
VkResult anv_state_pool_init(struct anv_state_pool *pool,
struct anv_device *device,
diff --git a/src/intel/vulkan/genX_blorp_exec.c b/src/intel/vulkan/
genX_blorp_exec.c
index c573e890946..5af6abb0894 100644
--- a/src/intel/vulkan/genX_blorp_exec.c
+++ b/src/intel/vulkan/genX_blorp_exec.c
@@ -63,8 +63,9 @@ blorp_surface_reloc(struct blorp_batch *batch, uint32_t
ss_offset,
if (result != VK_SUCCESS)
anv_batch_set_error(&cmd_buffer->batch, result);
- void *dest = cmd_buffer->device->surface_state_pool.block_pool.map +
- ss_offset;
+ struct anv_pool_map pool_map = anv_block_pool_map(
+ &cmd_buffer->device->surface_state_pool.block_pool, ss_offset);
+ void *dest = pool_map.map + pool_map.offset;
uint64_t val = ((struct anv_bo*)address.buffer)->offset + address.offset +
delta;
write_reloc(cmd_buffer->device, dest, val, false);
--
2.17.1
Rafael Antognolli
2018-12-08 00:05:50 UTC
These asserts won't be true anymore once we add support for multiple BOs
in the non-userptr path.
---
src/intel/vulkan/genX_gpu_memcpy.c | 3 ---
1 file changed, 3 deletions(-)

diff --git a/src/intel/vulkan/genX_gpu_memcpy.c b/src/intel/vulkan/genX_gpu_memcpy.c
index 1bee1c6dc17..e20179fa675 100644
--- a/src/intel/vulkan/genX_gpu_memcpy.c
+++ b/src/intel/vulkan/genX_gpu_memcpy.c
@@ -133,9 +133,6 @@ genX(cmd_buffer_so_memcpy)(struct anv_cmd_buffer *cmd_buffer,
if (size == 0)
return;

- assert(dst.offset + size <= dst.bo->size);
- assert(src.offset + size <= src.bo->size);
-
/* The maximum copy block size is 4 32-bit components at a time. */
assert(size % 4 == 0);
unsigned bs = gcd_pow2_u64(16, size);
--
2.17.1
Jason Ekstrand
2018-12-10 19:55:03 UTC
On Fri, Dec 7, 2018 at 6:06 PM Rafael Antognolli <
Post by Rafael Antognolli
They won't be true anymore once we add support for multiple BOs with
non-userptr.
Why not? I guess in the highly restricted case of state streams, you can
safely copy out-of-range of one BO because you know exactly what the next
BO is. Sounds dangerous but totally legit.
Post by Rafael Antognolli
---
src/intel/vulkan/genX_gpu_memcpy.c | 3 ---
1 file changed, 3 deletions(-)
diff --git a/src/intel/vulkan/genX_gpu_memcpy.c
b/src/intel/vulkan/genX_gpu_memcpy.c
index 1bee1c6dc17..e20179fa675 100644
--- a/src/intel/vulkan/genX_gpu_memcpy.c
+++ b/src/intel/vulkan/genX_gpu_memcpy.c
@@ -133,9 +133,6 @@ genX(cmd_buffer_so_memcpy)(struct anv_cmd_buffer *cmd_buffer,
if (size == 0)
return;
- assert(dst.offset + size <= dst.bo->size);
- assert(src.offset + size <= src.bo->size);
-
/* The maximum copy block size is 4 32-bit components at a time. */
assert(size % 4 == 0);
unsigned bs = gcd_pow2_u64(16, size);
--
2.17.1
Rafael Antognolli
2018-12-08 00:05:52 UTC
It's possible that we still have some space left in the block pool, but
we try to allocate a state larger than that remaining space. This means
such a state would start somewhere within the range of the old
block_pool, and end after that range, within the range of the new size.

That's fine when we use userptr, since the memory in the block pool is
CPU mapped continuously. However, by the end of this series, we will
have the block_pool split into different BOs, with different CPU
mapping ranges that are not necessarily continuous. So we must avoid
such case of a given state being part of two different BOs in the block
pool.

This commit solves the issue by detecting that we are growing the
block_pool even though we are not at the end of the range. If that
happens, we don't use the space left at the end of the old size, and
consider it as "padding" that can't be used in the allocation. We update
the size requested from the block pool to take the padding into account,
and return the offset after the padding, which happens to be at the
start of the new address range.

Additionally, we return the amount of padding we used, so the caller
knows that this happened and can return that padding to a list of free
states that can be reused later. This way we hopefully don't waste any
space, while also avoiding having a state split between two different
BOs.
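
To make the numbers concrete, a made-up example of the padding path (see
anv_block_pool_alloc_new() below):

   /* Suppose the pool currently ends at 1 MiB, state.next is at
    * 1 MiB - 4 KiB, and a 16 KiB block is requested:
    *
    *   leftover    = state.end - state.next  = 4 KiB
    *   block_size += leftover                = 20 KiB (the pool grows by this)
    *   *padding    = leftover                = 4 KiB
    *   offset      = state.next + *padding   = 1 MiB  (start of the new range)
    *
    * The caller then returns the 4 KiB of padding to the state pool's free
    * lists, so that space can still back smaller allocations later.
    */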
---
src/intel/vulkan/anv_allocator.c | 57 ++++++++++++++++++---
src/intel/vulkan/anv_private.h | 2 +-
src/intel/vulkan/tests/block_pool_no_free.c | 2 +-
3 files changed, 51 insertions(+), 10 deletions(-)

diff --git a/src/intel/vulkan/anv_allocator.c b/src/intel/vulkan/anv_allocator.c
index bddeb4a0fbd..0d426edfb57 100644
--- a/src/intel/vulkan/anv_allocator.c
+++ b/src/intel/vulkan/anv_allocator.c
@@ -839,16 +839,35 @@ done:
static uint32_t
anv_block_pool_alloc_new(struct anv_block_pool *pool,
struct anv_block_state *pool_state,
- uint32_t block_size)
+ uint32_t block_size, uint32_t *padding)
{
struct anv_block_state state, old, new;

+ /* Most allocations won't generate any padding */
+ if (padding)
+ *padding = 0;
+
while (1) {
state.u64 = __sync_fetch_and_add(&pool_state->u64, block_size);
if (state.next + block_size <= state.end) {
assert(pool->map);
return state.next;
} else if (state.next <= state.end) {
+ if (pool->bo_flags & EXEC_OBJECT_PINNED && state.next < state.end) {
+ /* We need to grow the block pool, but still have some leftover
+ * space that can't be used by that particular allocation. So we
+ * add that as a "padding", and return it.
+ */
+ uint32_t leftover = state.end - state.next;
+ block_size += leftover;
+
+ /* If there is some leftover space in the pool, the caller must
+ * deal with it.
+ */
+ assert(leftover == 0 || padding);
+ *padding = leftover;
+ }
+
/* We allocated the first block outside the pool so we have to grow
* the pool. pool_state->next acts a mutex: threads who try to
* allocate now will get block indexes above the current limit and
@@ -872,9 +891,16 @@ anv_block_pool_alloc_new(struct anv_block_pool *pool,

int32_t
anv_block_pool_alloc(struct anv_block_pool *pool,
- uint32_t block_size)
+ uint32_t block_size, uint32_t *padding)
{
- return anv_block_pool_alloc_new(pool, &pool->state, block_size);
+ uint32_t offset;
+
+ offset = anv_block_pool_alloc_new(pool, &pool->state, block_size, padding);
+
+ if (padding && *padding > 0)
+ offset += *padding;
+
+ return offset;
}

/* Allocates a block out of the back of the block pool.
@@ -891,7 +917,7 @@ anv_block_pool_alloc_back(struct anv_block_pool *pool,
uint32_t block_size)
{
int32_t offset = anv_block_pool_alloc_new(pool, &pool->back_state,
- block_size);
+ block_size, NULL);

/* The offset we get out of anv_block_pool_alloc_new() is actually the
* number of bytes downwards from the middle to the end of the block.
@@ -947,16 +973,24 @@ static uint32_t
anv_fixed_size_state_pool_alloc_new(struct anv_fixed_size_state_pool *pool,
struct anv_block_pool *block_pool,
uint32_t state_size,
- uint32_t block_size)
+ uint32_t block_size,
+ uint32_t *padding)
{
struct anv_block_state block, old, new;
uint32_t offset;

+ /* We don't always use anv_block_pool_alloc(), which would set *padding to
+ * zero for us. So if we have a pointer to padding, we must zero it out
+ * ourselves here, to make sure we always return some sensible value.
+ */
+ if (padding)
+ *padding = 0;
+
/* If our state is large, we don't need any sub-allocation from a block.
* Instead, we just grab whole (potentially large) blocks.
*/
if (state_size >= block_size)
- return anv_block_pool_alloc(block_pool, state_size);
+ return anv_block_pool_alloc(block_pool, state_size, padding);

restart:
block.u64 = __sync_fetch_and_add(&pool->block.u64, state_size);
@@ -964,7 +998,7 @@ anv_fixed_size_state_pool_alloc_new(struct anv_fixed_size_state_pool *pool,
if (block.next < block.end) {
return block.next;
} else if (block.next == block.end) {
- offset = anv_block_pool_alloc(block_pool, block_size);
+ offset = anv_block_pool_alloc(block_pool, block_size, padding);
new.next = offset + state_size;
new.end = offset + block_size;
old.u64 = __sync_lock_test_and_set(&pool->block.u64, new.u64);
@@ -1033,6 +1067,7 @@ calculate_divisor(uint32_t size)
uint32_t bucket_size = anv_state_pool_get_bucket_size(bucket);
if (size % bucket_size == 0)
return bucket_size;
+ bucket--;
}

return 0;
@@ -1156,10 +1191,12 @@ anv_state_pool_alloc_no_vg(struct anv_state_pool *pool,
}
}

+ uint32_t padding;
offset = anv_fixed_size_state_pool_alloc_new(&pool->buckets[bucket],
&pool->block_pool,
alloc_size,
- pool->block_size);
+ pool->block_size,
+ &padding);
/* Everytime we allocate a new state, add it to the state pool */
uint32_t idx = anv_state_table_add(&pool->table, 1);
state = anv_state_table_get(&pool->table, idx);
@@ -1169,6 +1206,10 @@ anv_state_pool_alloc_no_vg(struct anv_state_pool *pool,
struct anv_pool_map pool_map = anv_block_pool_map(&pool->block_pool,
state->offset);
state->map = pool_map.map + pool_map.offset;
+ if (padding > 0) {
+ uint32_t return_offset = offset - padding;
+ anv_state_pool_return_chunk(pool, return_offset, padding, 0);
+ }


done:
diff --git a/src/intel/vulkan/anv_private.h b/src/intel/vulkan/anv_private.h
index b01b392daee..93fdc1e56a2 100644
--- a/src/intel/vulkan/anv_private.h
+++ b/src/intel/vulkan/anv_private.h
@@ -781,7 +781,7 @@ VkResult anv_block_pool_init(struct anv_block_pool *pool,
uint64_t bo_flags);
void anv_block_pool_finish(struct anv_block_pool *pool);
int32_t anv_block_pool_alloc(struct anv_block_pool *pool,
- uint32_t block_size);
+ uint32_t block_size, uint32_t *padding);
int32_t anv_block_pool_alloc_back(struct anv_block_pool *pool,
uint32_t block_size);
struct anv_pool_map anv_block_pool_map(struct anv_block_pool *pool,
diff --git a/src/intel/vulkan/tests/block_pool_no_free.c b/src/intel/vulkan/tests/block_pool_no_free.c
index 730297d4e36..a588455436a 100644
--- a/src/intel/vulkan/tests/block_pool_no_free.c
+++ b/src/intel/vulkan/tests/block_pool_no_free.c
@@ -46,7 +46,7 @@ static void *alloc_blocks(void *_job)
int32_t block, *data;

for (unsigned i = 0; i < BLOCKS_PER_THREAD; i++) {
- block = anv_block_pool_alloc(job->pool, block_size);
+ block = anv_block_pool_alloc(job->pool, block_size, NULL);
data = job->pool->map + block;
*data = block;
assert(block >= 0);
--
2.17.1
Rafael Antognolli
2018-12-08 00:05:41 UTC
Add a structure to hold anv_states. This table will initially be used to
recycle anv_states, instead of relying on a linked list implemented in
GPU memory. Later it could be used so that all anv_states just point to
the content of this struct, instead of making copies of anv_states
everywhere.
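
For reference, the intended usage ends up looking roughly like this (a
sketch based on the later patches in this series):

   /* Allocating: reserve an entry, then fill the anv_state in place. */
   uint32_t idx = anv_state_table_add(&pool->table, 1);
   struct anv_state *state = anv_state_table_get(&pool->table, idx);
   state->offset = offset;
   state->alloc_size = alloc_size;
   state->map = pool->block_pool.map + offset;

   /* Freeing: push the entry's index onto a per-bucket free list so the
    * same anv_state can be handed out again later...
    */
   anv_state_table_push(&pool->buckets[bucket].free_list,
                        &pool->table, state->idx, 1);

   /* ...and reuse pops it back without touching the block pool. */
   state = anv_state_table_pop(&pool->buckets[bucket].free_list,
                               &pool->table);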

TODO:
1) I need to refine the API, especially anv_state_table_add(). So far
we have to add an item, get the pointer to the anv_state, and then
fill the content. I tried some different things so far but need to
come back to this one.

2) There's a lot of common code between this table's backing store
memory and the anv_block_pool buffer, due to how we grow it. I think
it's possible to refactor this and reuse code in both places.

3) Add unit tests.
---
src/intel/vulkan/anv_allocator.c | 246 ++++++++++++++++++++++++++++++-
src/intel/vulkan/anv_private.h | 44 ++++++
2 files changed, 288 insertions(+), 2 deletions(-)

diff --git a/src/intel/vulkan/anv_allocator.c b/src/intel/vulkan/anv_allocator.c
index 67f2f73aa11..3590ede6050 100644
--- a/src/intel/vulkan/anv_allocator.c
+++ b/src/intel/vulkan/anv_allocator.c
@@ -100,6 +100,9 @@
/* Allocations are always at least 64 byte aligned, so 1 is an invalid value.
* We use it to indicate the free list is empty. */
#define EMPTY 1
+#define EMPTY2 UINT32_MAX
+
+#define PAGE_SIZE 4096

struct anv_mmap_cleanup {
void *map;
@@ -130,6 +133,246 @@ round_to_power_of_two(uint32_t value)
return 1 << ilog2_round_up(value);
}

+struct anv_state_table_cleanup {
+ void *map;
+ size_t size;
+};
+
+#define ANV_STATE_TABLE_CLEANUP_INIT ((struct anv_state_table_cleanup){0})
+#define ANV_STATE_ENTRY_SIZE (sizeof(struct anv_free_entry))
+
+static VkResult
+anv_state_table_expand_range(struct anv_state_table *table, uint32_t size);
+
+VkResult
+anv_state_table_init(struct anv_state_table *table,
+ struct anv_device *device,
+ uint32_t initial_entries)
+{
+ VkResult result;
+
+ table->device = device;
+
+ table->fd = memfd_create("free table", MFD_CLOEXEC);
+ if (table->fd == -1)
+ return vk_error(VK_ERROR_INITIALIZATION_FAILED);
+
+ /* Just make it 2GB up-front. The Linux kernel won't actually back it
+ * with pages until we either map and fault on one of them or we use
+ * userptr and send a chunk of it off to the GPU.
+ */
+ if (ftruncate(table->fd, BLOCK_POOL_MEMFD_SIZE) == -1) {
+ result = vk_error(VK_ERROR_INITIALIZATION_FAILED);
+ goto fail_fd;
+ }
+
+ if (!u_vector_init(&table->mmap_cleanups,
+ round_to_power_of_two(sizeof(struct anv_state_table_cleanup)),
+ 128)) {
+ result = vk_error(VK_ERROR_INITIALIZATION_FAILED);
+ goto fail_fd;
+ }
+
+ table->state.next = 0;
+ table->state.end = 0;
+ table->size = 0;
+
+ uint32_t initial_size = initial_entries * ANV_STATE_ENTRY_SIZE;
+ result = anv_state_table_expand_range(table, initial_size);
+ if (result != VK_SUCCESS)
+ goto fail_mmap_cleanups;
+
+ return VK_SUCCESS;
+
+ fail_mmap_cleanups:
+ u_vector_finish(&table->mmap_cleanups);
+ fail_fd:
+ close(table->fd);
+
+ return result;
+}
+
+static VkResult
+anv_state_table_expand_range(struct anv_state_table *table, uint32_t size)
+{
+ void *map;
+ struct anv_mmap_cleanup *cleanup;
+
+ /* Assert that we only ever grow the pool */
+ assert(size >= table->state.end);
+
+ /* Assert that we don't go outside the bounds of the memfd */
+ assert(size <= BLOCK_POOL_MEMFD_SIZE);
+
+ cleanup = u_vector_add(&table->mmap_cleanups);
+ if (!cleanup)
+ return vk_error(VK_ERROR_OUT_OF_HOST_MEMORY);
+
+ *cleanup = ANV_MMAP_CLEANUP_INIT;
+
+ /* Just leak the old map until we destroy the pool. We can't munmap it
+ * without races or imposing locking on the block allocate fast path. On
+ * the whole the leaked maps adds up to less than the size of the
+ * current map. MAP_POPULATE seems like the right thing to do, but we
+ * should try to get some numbers.
+ */
+ map = mmap(NULL, size, PROT_READ | PROT_WRITE,
+ MAP_SHARED | MAP_POPULATE, table->fd, 0);
+ if (map == MAP_FAILED) {
+ exit(-1);
+ return vk_errorf(table->device->instance, table->device,
+ VK_ERROR_MEMORY_MAP_FAILED, "mmap failed: %m");
+ }
+
+ cleanup->map = map;
+ cleanup->size = size;
+
+ table->map = map;
+ table->size = size;
+
+ return VK_SUCCESS;
+}
+
+static uint32_t
+anv_state_table_grow(struct anv_state_table *table)
+{
+ VkResult result = VK_SUCCESS;
+
+ pthread_mutex_lock(&table->device->mutex);
+
+ uint32_t used = align_u32(table->state.next * ANV_STATE_ENTRY_SIZE,
+ PAGE_SIZE);
+ uint32_t old_size = table->size;
+
+ /* The state table is always initialized to a nonzero size and this function
+ * is always called after initialization.
+ */
+ assert(old_size > 0);
+
+ uint32_t required = MAX2(used, old_size);
+ if (used * 2 <= required) {
+ /* If we're in this case then this isn't the first allocation and we
+ * already have enough space on both sides to hold double what we
+ * have allocated. There's nothing for us to do.
+ */
+ goto done;
+ }
+
+ uint32_t size = old_size * 2;
+ while (size < required)
+ size *= 2;
+
+ assert(size > table->size);
+
+ result = anv_state_table_expand_range(table, size);
+
+ done:
+ pthread_mutex_unlock(&table->device->mutex);
+
+ if (result == VK_SUCCESS) {
+ /* Return the appropriate new size. This function never actually
+ * updates state->next. Instead, we let the caller do that because it
+ * needs to do so in order to maintain its concurrency model.
+ */
+ return table->size / ANV_STATE_ENTRY_SIZE;
+ } else {
+ return 0;
+ }
+}
+
+void
+anv_state_table_finish(struct anv_state_table *table)
+{
+ struct anv_state_table_cleanup *cleanup;
+
+ u_vector_foreach(cleanup, &table->mmap_cleanups) {
+ if (cleanup->map)
+ munmap(cleanup->map, cleanup->size);
+ }
+
+ u_vector_finish(&table->mmap_cleanups);
+
+ close(table->fd);
+}
+
+uint32_t
+anv_state_table_add(struct anv_state_table *table, uint32_t count)
+{
+ struct anv_block_state state, old, new;
+
+ while(1) {
+ state.u64 = __sync_fetch_and_add(&table->state.u64, count);
+ if (state.next + count <= state.end) {
+ assert(table->map);
+ struct anv_free_entry *entry = &table->map[state.next];
+ for (int i = 0; i < count; i++) {
+ entry[i].state.idx = state.next + i;
+ }
+ return state.next;
+ } else if (state.next <= state.end) {
+ /* We allocated the first block outside the pool so we have to grow
+ * the pool. pool_state->next acts a mutex: threads who try to
+ * allocate now will get block indexes above the current limit and
+ * hit futex_wait below.
+ */
+ new.next = state.next + count;
+ do {
+ new.end = anv_state_table_grow(table);
+ } while (new.end < new.next);
+
+ old.u64 = __sync_lock_test_and_set(&table->state.u64, new.u64);
+ if (old.next != state.next)
+ futex_wake(&table->state.end, INT_MAX);
+ } else {
+ futex_wait(&table->state.end, state.end, NULL);
+ continue;
+ }
+ }
+}
+
+void
+anv_state_table_push(union anv_free_list2 *list,
+ struct anv_state_table *table,
+ uint32_t idx, uint32_t count)
+{
+ union anv_free_list2 current, old, new;
+ uint32_t next = idx;
+
+ for (uint32_t i = 1; i < count; i++, next++)
+ table->map[next].list.offset = next + 1;
+
+ old = *list;
+ do {
+ current = old;
+ table->map[next].list.offset = current.offset;
+ new.offset = idx;
+ new.count = current.count + 1;
+ old.u64 = __sync_val_compare_and_swap(&list->u64, current.u64, new.u64);
+ } while (old.u64 != current.u64);
+}
+
+struct anv_state *
+anv_state_table_pop(union anv_free_list2 *list,
+ struct anv_state_table *table)
+{
+ union anv_free_list2 current, new, old;
+
+ current.u64 = list->u64;
+ while (current.offset != EMPTY2) {
+ __sync_synchronize();
+ new.offset = table->map[current.offset].list.offset;
+ new.count = current.count;
+ old.u64 = __sync_val_compare_and_swap(&list->u64, current.u64, new.u64);
+ if (old.u64 == current.u64) {
+ struct anv_free_entry *entry = &table->map[current.offset];
+ return &entry->state;
+ }
+ current = old;
+ }
+
+ return NULL;
+}
+
static bool
anv_free_list_pop(union anv_free_list *list, void **map, int32_t *offset)
{
@@ -311,8 +554,6 @@ anv_block_pool_finish(struct anv_block_pool *pool)
close(pool->fd);
}

-#define PAGE_SIZE 4096
-
static VkResult
anv_block_pool_expand_range(struct anv_block_pool *pool,
uint32_t center_bo_offset, uint32_t size)
@@ -782,6 +1023,7 @@ anv_state_pool_alloc_no_vg(struct anv_state_pool *pool,
&pool->block_pool,
state.alloc_size,
pool->block_size);
+ /* state.idx = anv_state_table_add(pool->table, state); */

done:
state.map = pool->block_pool.map + state.offset;
diff --git a/src/intel/vulkan/anv_private.h b/src/intel/vulkan/anv_private.h
index aff076a55d9..3fe299d55f9 100644
--- a/src/intel/vulkan/anv_private.h
+++ b/src/intel/vulkan/anv_private.h
@@ -615,7 +615,18 @@ union anv_free_list {
uint64_t u64;
};

+union anv_free_list2 {
+ struct {
+ uint32_t offset;
+
+ /* A simple count that is incremented every time the head changes. */
+ uint32_t count;
+ };
+ uint64_t u64;
+};
+
#define ANV_FREE_LIST_EMPTY ((union anv_free_list) { { 1, 0 } })
+#define ANV_FREE_LIST2_EMPTY ((union anv_free_list2) { { UINT32_MAX, 0 } })

struct anv_block_state {
union {
@@ -687,6 +698,7 @@ struct anv_state {
int32_t offset;
uint32_t alloc_size;
void *map;
+ uint32_t idx;
};

#define ANV_STATE_NULL ((struct anv_state) { .alloc_size = 0 })
@@ -701,6 +713,21 @@ struct anv_fixed_size_state_pool {

#define ANV_STATE_BUCKETS (ANV_MAX_STATE_SIZE_LOG2 - ANV_MIN_STATE_SIZE_LOG2 + 1)

+struct anv_free_entry {
+ union anv_free_list2 list;
+ struct anv_state state;
+};
+
+struct anv_state_table {
+ struct anv_device *device;
+ int fd;
+ /* void *map; */
+ struct anv_free_entry *map;
+ uint32_t size;
+ struct anv_block_state state;
+ struct u_vector mmap_cleanups;
+};
+
struct anv_state_pool {
struct anv_block_pool block_pool;

@@ -762,6 +789,23 @@ void anv_state_stream_finish(struct anv_state_stream *stream);
struct anv_state anv_state_stream_alloc(struct anv_state_stream *stream,
uint32_t size, uint32_t alignment);

+VkResult anv_state_table_init(struct anv_state_table *table,
+ struct anv_device *device,
+ uint32_t initial_entries);
+void anv_state_table_finish(struct anv_state_table *table);
+uint32_t anv_state_table_add(struct anv_state_table *table, uint32_t count);
+void anv_state_table_push(union anv_free_list2 *list,
+ struct anv_state_table *table,
+ uint32_t idx, uint32_t count);
+struct anv_state* anv_state_table_pop(union anv_free_list2 *list,
+ struct anv_state_table *table);
+
+
+static inline struct anv_state *
+anv_state_table_get(struct anv_state_table *table, uint32_t idx)
+{
+ return &table->map[idx].state;
+}
/**
* Implements a pool of re-usable BOs. The interface is identical to that
* of block_pool except that each block is its own BO.
--
2.17.1
Rafael Antognolli
2018-12-08 00:05:40 UTC
The test was checking whether -1 was smaller than an unsigned int, which
is always false (the -1 gets converted to UINT32_MAX by the usual
arithmetic conversions). So the test was exiting early and never running
to the end, since it would immediately hit the condition (thread_max == -1).
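
(Minimal illustration of the integer conversion involved:)

   int      thread_max = -1;
   uint32_t block      = blocks[i][next[i]];

   /* thread_max is converted to unsigned for the comparison and becomes
    * UINT32_MAX, so this is false for every possible block value and
    * thread_max is never updated.
    */
   if (thread_max < block)
      thread_max = block;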

However, just fixing that is not enough. The test currently grabs the
highest block on each iteration, and again on the next one, until it
reaches the end. But by that point, we wouldn't have looked at all
blocks of all threads. For instance, for 3 threads and 4 blocks per
thread, a situation like this (unlikely to happen):

[Thread]: [Blocks]
[0]: [0, 32, 64, 96]
[1]: [128, 160, 192, 224]
[2]: [256, 288, 320, 352]

Would cause the test to iterate only over the thread number 2.

The fix is to always grab the lowest block on each iteration, and assert
that it is higher than the one from the previous iteration.
---
src/intel/vulkan/tests/block_pool_no_free.c | 17 +++++++++--------
1 file changed, 9 insertions(+), 8 deletions(-)

diff --git a/src/intel/vulkan/tests/block_pool_no_free.c b/src/intel/vulkan/tests/block_pool_no_free.c
index 17006dd3bc7..730297d4e36 100644
--- a/src/intel/vulkan/tests/block_pool_no_free.c
+++ b/src/intel/vulkan/tests/block_pool_no_free.c
@@ -78,16 +78,16 @@ static void validate_monotonic(uint32_t **blocks)
unsigned next[NUM_THREADS];
memset(next, 0, sizeof(next));

- int highest = -1;
+ uint32_t lowest = UINT32_MAX;
while (true) {
- /* First, we find which thread has the highest next element */
- int thread_max = -1;
+ /* First, we find which thread has the lowest next element */
+ uint32_t thread_max = UINT32_MAX;
int max_thread_idx = -1;
for (unsigned i = 0; i < NUM_THREADS; i++) {
if (next[i] >= BLOCKS_PER_THREAD)
continue;

- if (thread_max < blocks[i][next[i]]) {
+ if (thread_max > blocks[i][next[i]]) {
thread_max = blocks[i][next[i]];
max_thread_idx = i;
}
@@ -96,13 +96,14 @@ static void validate_monotonic(uint32_t **blocks)
/* The only way this can happen is if all of the next[] values are at
* BLOCKS_PER_THREAD, in which case, we're done.
*/
- if (thread_max == -1)
+ if (thread_max == UINT32_MAX)
break;

- /* That next element had better be higher than the previous highest */
- assert(blocks[max_thread_idx][next[max_thread_idx]] > highest);
+ /* That next element had better be higher than the previous lowest */
+ assert(lowest == UINT32_MAX ||
+ blocks[max_thread_idx][next[max_thread_idx]] > lowest);

- highest = blocks[max_thread_idx][next[max_thread_idx]];
+ lowest = blocks[max_thread_idx][next[max_thread_idx]];
next[max_thread_idx]++;
}
}
--
2.17.1
Jason Ekstrand
2018-12-10 18:00:40 UTC
On Fri, Dec 7, 2018 at 6:06 PM Rafael Antognolli <
Post by Rafael Antognolli
The test was checking whether -1 was smaller than an unsigned int, which
is always false. So it was exiting early and never running until the
end, since it would reach the condition (thread_max == -1).
However, just fixing that is not enough. The test is currently getting
the highest block on each iteration, and then on the next one, until it
reaches the end. But by that point, we wouldn't have looked at all
blocks of all threads. For instance, for 3 threads and 4 blocks per
[Thread]: [Blocks]
[0]: [0, 32, 64, 96]
[1]: [128, 160, 192, 224]
[2]: [256, 288, 320, 352]
Would cause the test to iterate only over the thread number 2.
The fix is to always grab the lowest block on each iteration, and assert
that it is higher than the one from the previous iteration.
---
src/intel/vulkan/tests/block_pool_no_free.c | 17 +++++++++--------
1 file changed, 9 insertions(+), 8 deletions(-)
diff --git a/src/intel/vulkan/tests/block_pool_no_free.c
b/src/intel/vulkan/tests/block_pool_no_free.c
index 17006dd3bc7..730297d4e36 100644
--- a/src/intel/vulkan/tests/block_pool_no_free.c
+++ b/src/intel/vulkan/tests/block_pool_no_free.c
@@ -78,16 +78,16 @@ static void validate_monotonic(uint32_t **blocks)
unsigned next[NUM_THREADS];
memset(next, 0, sizeof(next));
- int highest = -1;
+ uint32_t lowest = UINT32_MAX;
I think this should still be named "highest" since it is the highest thing
we've seen. Also, I kind-of think -1 still makes sense as a starting
value. Maybe we just need to make the comparison below explicitly signed
with a cast.
Post by Rafael Antognolli
while (true) {
- /* First, we find which thread has the highest next element */
- int thread_max = -1;
+ /* First, we find which thread has the lowest next element */
+ uint32_t thread_max = UINT32_MAX;
Yes, this loop needs to change. However, I think we should rename the
variables to thread_min and min_thread_idx.
Post by Rafael Antognolli
int max_thread_idx = -1;
for (unsigned i = 0; i < NUM_THREADS; i++) {
if (next[i] >= BLOCKS_PER_THREAD)
continue;
- if (thread_max < blocks[i][next[i]]) {
+ if (thread_max > blocks[i][next[i]]) {
thread_max = blocks[i][next[i]];
max_thread_idx = i;
}
@@ -96,13 +96,14 @@ static void validate_monotonic(uint32_t **blocks)
/* The only way this can happen is if all of the next[] values are at
* BLOCKS_PER_THREAD, in which case, we're done.
*/
- if (thread_max == -1)
+ if (thread_max == UINT32_MAX)
break;
- /* That next element had better be higher than the previous highest */
- assert(blocks[max_thread_idx][next[max_thread_idx]] > highest);
+ /* That next element had better be higher than the previous lowest */
+ assert(lowest == UINT32_MAX ||
+ blocks[max_thread_idx][next[max_thread_idx]] > lowest);
- highest = blocks[max_thread_idx][next[max_thread_idx]];
+ lowest = blocks[max_thread_idx][next[max_thread_idx]];
next[max_thread_idx]++;
}
}
--
2.17.1
Rafael Antognolli
2018-12-08 00:05:53 UTC
If softpin is supported, create new BOs for the required size and add the
respective BO maps. The other main change of this commit is that
anv_block_pool_map() now returns the map for the BO that the given
offset is part of. So there's no block_pool->map access anymore (when
softpin is used.
---
src/intel/vulkan/anv_allocator.c | 92 ++++++++++++++++++--------------
1 file changed, 53 insertions(+), 39 deletions(-)

diff --git a/src/intel/vulkan/anv_allocator.c b/src/intel/vulkan/anv_allocator.c
index 0d426edfb57..46f2278a56c 100644
--- a/src/intel/vulkan/anv_allocator.c
+++ b/src/intel/vulkan/anv_allocator.c
@@ -473,17 +473,19 @@ anv_block_pool_init(struct anv_block_pool *pool,
pool->size = 0;
pool->start_address = gen_canonical_address(start_address);

- pool->fd = memfd_create("block pool", MFD_CLOEXEC);
- if (pool->fd == -1)
- return vk_error(VK_ERROR_INITIALIZATION_FAILED);
-
- /* Just make it 2GB up-front. The Linux kernel won't actually back it
- * with pages until we either map and fault on one of them or we use
- * userptr and send a chunk of it off to the GPU.
- */
- if (ftruncate(pool->fd, BLOCK_POOL_MEMFD_SIZE) == -1) {
- result = vk_error(VK_ERROR_INITIALIZATION_FAILED);
- goto fail_fd;
+ if (!(pool->bo_flags & EXEC_OBJECT_PINNED)) {
+ pool->fd = memfd_create("block pool", MFD_CLOEXEC);
+ if (pool->fd == -1)
+ return vk_error(VK_ERROR_INITIALIZATION_FAILED);
+
+ /* Just make it 2GB up-front. The Linux kernel won't actually back it
+ * with pages until we either map and fault on one of them or we use
+ * userptr and send a chunk of it off to the GPU.
+ */
+ if (ftruncate(pool->fd, BLOCK_POOL_MEMFD_SIZE) == -1) {
+ result = vk_error(VK_ERROR_INITIALIZATION_FAILED);
+ goto fail_fd;
+ }
}

if (!u_vector_init(&pool->mmap_cleanups,
@@ -507,7 +509,8 @@ anv_block_pool_init(struct anv_block_pool *pool,
fail_mmap_cleanups:
u_vector_finish(&pool->mmap_cleanups);
fail_fd:
- close(pool->fd);
+ if (!(pool->bo_flags & EXEC_OBJECT_PINNED))
+ close(pool->fd);

return result;
}
@@ -525,8 +528,9 @@ anv_block_pool_finish(struct anv_block_pool *pool)
}

u_vector_finish(&pool->mmap_cleanups);
+ if (!(pool->bo_flags & EXEC_OBJECT_PINNED))
+ close(pool->fd);

- close(pool->fd);
anv_block_pool_bo_finish(pool);
}

@@ -537,6 +541,7 @@ anv_block_pool_expand_range(struct anv_block_pool *pool,
void *map;
uint32_t gem_handle;
struct anv_mmap_cleanup *cleanup;
+ const bool use_softpin = !!(pool->bo_flags & EXEC_OBJECT_PINNED);

/* Assert that we only ever grow the pool */
assert(center_bo_offset >= pool->back_state.end);
@@ -544,7 +549,8 @@ anv_block_pool_expand_range(struct anv_block_pool *pool,

/* Assert that we don't go outside the bounds of the memfd */
assert(center_bo_offset <= BLOCK_POOL_MEMFD_CENTER);
- assert(size - center_bo_offset <=
+ assert(use_softpin ||
+ size - center_bo_offset <=
BLOCK_POOL_MEMFD_SIZE - BLOCK_POOL_MEMFD_CENTER);

cleanup = u_vector_add(&pool->mmap_cleanups);
@@ -553,28 +559,36 @@ anv_block_pool_expand_range(struct anv_block_pool *pool,

*cleanup = ANV_MMAP_CLEANUP_INIT;

- /* Just leak the old map until we destroy the pool. We can't munmap it
- * without races or imposing locking on the block allocate fast path. On
- * the whole the leaked maps adds up to less than the size of the
- * current map. MAP_POPULATE seems like the right thing to do, but we
- * should try to get some numbers.
- */
- map = mmap(NULL, size, PROT_READ | PROT_WRITE,
- MAP_SHARED | MAP_POPULATE, pool->fd,
- BLOCK_POOL_MEMFD_CENTER - center_bo_offset);
- if (map == MAP_FAILED)
- return vk_errorf(pool->device->instance, pool->device,
- VK_ERROR_MEMORY_MAP_FAILED, "mmap failed: %m");
-
- gem_handle = anv_gem_userptr(pool->device, map, size);
- if (gem_handle == 0) {
- munmap(map, size);
- return vk_errorf(pool->device->instance, pool->device,
- VK_ERROR_TOO_MANY_OBJECTS, "userptr failed: %m");
+ uint32_t newbo_size = size - pool->size;
+ if (use_softpin) {
+ gem_handle = anv_gem_create(pool->device, newbo_size);
+ map = anv_gem_mmap(pool->device, gem_handle, 0, newbo_size, 0);
+ if (map == MAP_FAILED)
+ return vk_errorf(pool->device->instance, pool->device,
+ VK_ERROR_MEMORY_MAP_FAILED, "gem mmap failed: %m");
+ } else {
+ /* Just leak the old map until we destroy the pool. We can't munmap it
+ * without races or imposing locking on the block allocate fast path. On
+ * the whole the leaked maps adds up to less than the size of the
+ * current map. MAP_POPULATE seems like the right thing to do, but we
+ * should try to get some numbers.
+ */
+ map = mmap(NULL, size, PROT_READ | PROT_WRITE,
+ MAP_SHARED | MAP_POPULATE, pool->fd,
+ BLOCK_POOL_MEMFD_CENTER - center_bo_offset);
+ if (map == MAP_FAILED)
+ return vk_errorf(pool->device->instance, pool->device,
+ VK_ERROR_MEMORY_MAP_FAILED, "mmap failed: %m");
+ gem_handle = anv_gem_userptr(pool->device, map, size);
+ if (gem_handle == 0) {
+ munmap(map, size);
+ return vk_errorf(pool->device->instance, pool->device,
+ VK_ERROR_TOO_MANY_OBJECTS, "userptr failed: %m");
+ }
}

cleanup->map = map;
- cleanup->size = size;
+ cleanup->size = use_softpin ? newbo_size : size;
cleanup->gem_handle = gem_handle;

#if 0
@@ -593,7 +607,9 @@ anv_block_pool_expand_range(struct anv_block_pool *pool,

/* Now that we successfull allocated everything, we can write the new
* values back into pool. */
- pool->map = map + center_bo_offset;
+ if (!use_softpin) {
+ pool->map = map + center_bo_offset;
+ }
pool->center_bo_offset = center_bo_offset;

/* For block pool BOs we have to be a bit careful about where we place them
@@ -628,7 +644,7 @@ anv_block_pool_expand_range(struct anv_block_pool *pool,
* range. On the other hand, if not using softpin, we need to add a BO if we
* don't have one yet.
*/
- if (!pool->bo) {
+ if (use_softpin || !pool->bo) {
bo_elem = malloc(sizeof(*bo_elem));
bo = &bo_elem->bo;
} else {
@@ -639,10 +655,9 @@ anv_block_pool_expand_range(struct anv_block_pool *pool,
if (!pool->bo)
pool->bo = bo;

- anv_bo_init(bo, gem_handle, size);
+ anv_bo_init(bo, gem_handle, use_softpin ? newbo_size : size);
if (pool->bo_flags & EXEC_OBJECT_PINNED) {
- bo->offset = pool->start_address + BLOCK_POOL_MEMFD_CENTER -
- center_bo_offset;
+ bo->offset = pool->start_address + pool->size;
}
bo->flags = pool->bo_flags;
bo->map = map;
@@ -850,7 +865,6 @@ anv_block_pool_alloc_new(struct anv_block_pool *pool,
while (1) {
state.u64 = __sync_fetch_and_add(&pool_state->u64, block_size);
if (state.next + block_size <= state.end) {
- assert(pool->map);
return state.next;
} else if (state.next <= state.end) {
if (pool->bo_flags & EXEC_OBJECT_PINNED && state.next < state.end) {
--
2.17.1
Rafael Antognolli
2018-12-08 00:05:42 UTC
Usage of anv_state_table_add() is really annoying; see the comment on the
previous commit.
---
src/intel/vulkan/anv_allocator.c | 96 +++++++++++++++++++++-----------
src/intel/vulkan/anv_private.h | 4 +-
2 files changed, 67 insertions(+), 33 deletions(-)

diff --git a/src/intel/vulkan/anv_allocator.c b/src/intel/vulkan/anv_allocator.c
index 3590ede6050..5f0458afd77 100644
--- a/src/intel/vulkan/anv_allocator.c
+++ b/src/intel/vulkan/anv_allocator.c
@@ -869,11 +869,17 @@ anv_state_pool_init(struct anv_state_pool *pool,
if (result != VK_SUCCESS)
return result;

+ result = anv_state_table_init(&pool->table, device, 64);
+ if (result != VK_SUCCESS) {
+ anv_block_pool_finish(&pool->block_pool);
+ return result;
+ }
+
assert(util_is_power_of_two_or_zero(block_size));
pool->block_size = block_size;
pool->back_alloc_free_list = ANV_FREE_LIST_EMPTY;
for (unsigned i = 0; i < ANV_STATE_BUCKETS; i++) {
- pool->buckets[i].free_list = ANV_FREE_LIST_EMPTY;
+ pool->buckets[i].free_list = ANV_FREE_LIST2_EMPTY;
pool->buckets[i].block.next = 0;
pool->buckets[i].block.end = 0;
}
@@ -886,6 +892,7 @@ void
anv_state_pool_finish(struct anv_state_pool *pool)
{
VG(VALGRIND_DESTROY_MEMPOOL(pool));
+ anv_state_table_finish(&pool->table);
anv_block_pool_finish(&pool->block_pool);
}

@@ -946,22 +953,30 @@ anv_state_pool_alloc_no_vg(struct anv_state_pool *pool,
{
uint32_t bucket = anv_state_pool_get_bucket(MAX2(size, align));

- struct anv_state state;
- state.alloc_size = anv_state_pool_get_bucket_size(bucket);
+ struct anv_state *state;
+ uint32_t alloc_size = anv_state_pool_get_bucket_size(bucket);
+ int32_t offset;

/* Try free list first. */
- if (anv_free_list_pop(&pool->buckets[bucket].free_list,
- &pool->block_pool.map, &state.offset)) {
- assert(state.offset >= 0);
+ state = anv_state_table_pop(&pool->buckets[bucket].free_list,
+ &pool->table);
+ if (state) {
+ assert(state->offset >= 0);
goto done;
}

+
/* Try to grab a chunk from some larger bucket and split it up */
for (unsigned b = bucket + 1; b < ANV_STATE_BUCKETS; b++) {
- int32_t chunk_offset;
- if (anv_free_list_pop(&pool->buckets[b].free_list,
- &pool->block_pool.map, &chunk_offset)) {
+ state = anv_state_table_pop(&pool->buckets[b].free_list, &pool->table);
+ if (state) {
unsigned chunk_size = anv_state_pool_get_bucket_size(b);
+ int32_t chunk_offset = state->offset;
+
+ /* First lets update the state we got to its new size. offset and map
+ * remain the same.
+ */
+ state->alloc_size = alloc_size;

/* We've found a chunk that's larger than the requested state size.
* There are a couple of options as to what we do with it:
@@ -990,44 +1005,62 @@ anv_state_pool_alloc_no_vg(struct anv_state_pool *pool,
* We choose option (3).
*/
if (chunk_size > pool->block_size &&
- state.alloc_size < pool->block_size) {
+ alloc_size < pool->block_size) {
assert(chunk_size % pool->block_size == 0);
/* We don't want to split giant chunks into tiny chunks. Instead,
* break anything bigger than a block into block-sized chunks and
* then break it down into bucket-sized chunks from there. Return
* all but the first block of the chunk to the block bucket.
*/
+ uint32_t push_back = (chunk_size / pool->block_size) - 1;
const uint32_t block_bucket =
anv_state_pool_get_bucket(pool->block_size);
- anv_free_list_push(&pool->buckets[block_bucket].free_list,
- pool->block_pool.map,
- chunk_offset + pool->block_size,
- pool->block_size,
- (chunk_size / pool->block_size) - 1);
+ uint32_t st_idx = anv_state_table_add(&pool->table, push_back);
+ for (int i = 0; i < push_back; i++) {
+ /* update states that were added back to the state table */
+ struct anv_state *state_i = anv_state_table_get(&pool->table,
+ st_idx + i);
+ state_i->alloc_size = pool->block_size;
+ state_i->offset = chunk_offset + pool->block_size * (i + 1);
+ state_i->map = pool->block_pool.map + state_i->offset;
+ }
+ anv_state_table_push(&pool->buckets[block_bucket].free_list,
+ &pool->table, st_idx, push_back);
chunk_size = pool->block_size;
}

- assert(chunk_size % state.alloc_size == 0);
- anv_free_list_push(&pool->buckets[bucket].free_list,
- pool->block_pool.map,
- chunk_offset + state.alloc_size,
- state.alloc_size,
- (chunk_size / state.alloc_size) - 1);
+ assert(chunk_size % alloc_size == 0);
+ uint32_t push_back = (chunk_size / alloc_size) - 1;
+ uint32_t st_idx = anv_state_table_add(&pool->table, push_back);
+ for (int i = 0; i < push_back; i++) {
+ /* update states that were added back to the state table */
+ struct anv_state *state_i = anv_state_table_get(&pool->table,
+ st_idx + i);
+ state_i->alloc_size = alloc_size;
+ state_i->offset = chunk_offset + alloc_size * (i + 1);
+ state_i->map = pool->block_pool.map + state_i->offset;
+ }
+ anv_state_table_push(&pool->buckets[bucket].free_list,
+ &pool->table, st_idx, push_back);

- state.offset = chunk_offset;
+ offset = chunk_offset;
goto done;
}
}

- state.offset = anv_fixed_size_state_pool_alloc_new(&pool->buckets[bucket],
- &pool->block_pool,
- state.alloc_size,
- pool->block_size);
- /* state.idx = anv_state_table_add(pool->table, state); */
+ offset = anv_fixed_size_state_pool_alloc_new(&pool->buckets[bucket],
+ &pool->block_pool,
+ alloc_size,
+ pool->block_size);
+ /* Everytime we allocate a new state, add it to the state pool */
+ uint32_t idx = anv_state_table_add(&pool->table, 1);
+ state = anv_state_table_get(&pool->table, idx);
+ state->offset = offset;
+ state->alloc_size = alloc_size;
+ state->map = pool->block_pool.map + offset;

done:
- state.map = pool->block_pool.map + state.offset;
- return state;
+ return *state;
}

struct anv_state
@@ -1074,9 +1107,8 @@ anv_state_pool_free_no_vg(struct anv_state_pool *pool, struct anv_state state)
pool->block_pool.map, state.offset,
state.alloc_size, 1);
} else {
- anv_free_list_push(&pool->buckets[bucket].free_list,
- pool->block_pool.map, state.offset,
- state.alloc_size, 1);
+ anv_state_table_push(&pool->buckets[bucket].free_list,
+ &pool->table, state.idx, 1);
}
}

diff --git a/src/intel/vulkan/anv_private.h b/src/intel/vulkan/anv_private.h
index 3fe299d55f9..f7b3ec5f6a4 100644
--- a/src/intel/vulkan/anv_private.h
+++ b/src/intel/vulkan/anv_private.h
@@ -704,7 +704,7 @@ struct anv_state {
#define ANV_STATE_NULL ((struct anv_state) { .alloc_size = 0 })

struct anv_fixed_size_state_pool {
- union anv_free_list free_list;
+ union anv_free_list2 free_list;
struct anv_block_state block;
};

@@ -731,6 +731,8 @@ struct anv_state_table {
struct anv_state_pool {
struct anv_block_pool block_pool;

+ struct anv_state_table table;
+
/* The size of blocks which will be allocated from the block pool */
uint32_t block_size;
--
2.17.1
Rafael Antognolli
2018-12-08 00:05:44 UTC
Permalink
Maybe we should already rename anv_free_list2 -> anv_free_list since the
old one is gone.
---
src/intel/vulkan/anv_allocator.c | 55 --------------------------------
src/intel/vulkan/anv_private.h | 11 -------
2 files changed, 66 deletions(-)

diff --git a/src/intel/vulkan/anv_allocator.c b/src/intel/vulkan/anv_allocator.c
index 2171a97970b..cda6a1a9d25 100644
--- a/src/intel/vulkan/anv_allocator.c
+++ b/src/intel/vulkan/anv_allocator.c
@@ -373,61 +373,6 @@ anv_state_table_pop(union anv_free_list2 *list,
return NULL;
}

-static bool
-anv_free_list_pop(union anv_free_list *list, void **map, int32_t *offset)
-{
- union anv_free_list current, new, old;
-
- current.u64 = list->u64;
- while (current.offset != EMPTY) {
- /* We have to add a memory barrier here so that the list head (and
- * offset) gets read before we read the map pointer. This way we
- * know that the map pointer is valid for the given offset at the
- * point where we read it.
- */
- __sync_synchronize();
-
- int32_t *next_ptr = *map + current.offset;
- new.offset = VG_NOACCESS_READ(next_ptr);
- new.count = current.count + 1;
- old.u64 = __sync_val_compare_and_swap(&list->u64, current.u64, new.u64);
- if (old.u64 == current.u64) {
- *offset = current.offset;
- return true;
- }
- current = old;
- }
-
- return false;
-}
-
-static void
-anv_free_list_push(union anv_free_list *list, void *map, int32_t offset,
- uint32_t size, uint32_t count)
-{
- union anv_free_list current, old, new;
- int32_t *next_ptr = map + offset;
-
- /* If we're returning more than one chunk, we need to build a chain to add
- * to the list. Fortunately, we can do this without any atomics since we
- * own everything in the chain right now. `offset` is left pointing to the
- * head of our chain list while `next_ptr` points to the tail.
- */
- for (uint32_t i = 1; i < count; i++) {
- VG_NOACCESS_WRITE(next_ptr, offset + i * size);
- next_ptr = map + offset + i * size;
- }
-
- old = *list;
- do {
- current = old;
- VG_NOACCESS_WRITE(next_ptr, current.offset);
- new.offset = offset;
- new.count = current.count + 1;
- old.u64 = __sync_val_compare_and_swap(&list->u64, current.u64, new.u64);
- } while (old.u64 != current.u64);
-}
-
/* All pointers in the ptr_free_list are assumed to be page-aligned. This
* means that the bottom 12 bits should all be zero.
*/
diff --git a/src/intel/vulkan/anv_private.h b/src/intel/vulkan/anv_private.h
index d068a4be5d8..539523450ef 100644
--- a/src/intel/vulkan/anv_private.h
+++ b/src/intel/vulkan/anv_private.h
@@ -605,16 +605,6 @@ anv_bo_init(struct anv_bo *bo, uint32_t gem_handle, uint64_t size)
* both the block pool and the state pools. Unfortunately, in order to
* solve the ABA problem, we can't use a single uint32_t head.
*/
-union anv_free_list {
- struct {
- int32_t offset;
-
- /* A simple count that is incremented every time the head changes. */
- uint32_t count;
- };
- uint64_t u64;
-};
-
union anv_free_list2 {
struct {
uint32_t offset;
@@ -625,7 +615,6 @@ union anv_free_list2 {
uint64_t u64;
};

-#define ANV_FREE_LIST_EMPTY ((union anv_free_list) { { 1, 0 } })
#define ANV_FREE_LIST2_EMPTY ((union anv_free_list2) { { UINT32_MAX, 0 } })

struct anv_block_state {
--
2.17.1
Rafael Antognolli
2018-12-08 00:05:49 UTC
Permalink
TODO: This is just flushing the entire dynamic state on every execbuf,
which may be too much. In theory we should already be flushing the states
as needed, but I think we just never hit a bug because of the coherence
implied by userptr.
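
For what it's worth, a less heavy-handed option would be to flush only the
ranges we actually wrote, rounded out to cache lines. A minimal sketch of
that idea (clflush_range() is a made-up name, and CACHELINE_SIZE is just
assumed to be 64 here):

#include <stddef.h>
#include <stdint.h>

#define CACHELINE_SIZE 64

/* Flush [start, start + size), one cache line at a time, rounding the
 * start down to a cache line boundary first.
 */
static void
clflush_range(void *start, size_t size)
{
   char *p = (char *)((uintptr_t)start & ~((uintptr_t)CACHELINE_SIZE - 1));
   char *end = (char *)start + size;

   while (p < end) {
      __builtin_ia32_clflush(p);
      p += CACHELINE_SIZE;
   }
}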
---
src/intel/vulkan/anv_batch_chain.c | 4 ++++
1 file changed, 4 insertions(+)

diff --git a/src/intel/vulkan/anv_batch_chain.c b/src/intel/vulkan/anv_batch_chain.c
index 65df28ccb91..99009679435 100644
--- a/src/intel/vulkan/anv_batch_chain.c
+++ b/src/intel/vulkan/anv_batch_chain.c
@@ -1366,6 +1366,10 @@ anv_reloc_list_add_dep(struct anv_cmd_buffer *cmd_buffer,

anv_block_pool_foreach_bo(bo_list, iter, bo) {
_mesa_set_add(relocs->deps, bo);
+ if (!cmd_buffer->device->info.has_llc) {
+ for (uint32_t i = 0; i < bo->size; i += CACHELINE_SIZE)
+ __builtin_ia32_clflush(bo->map + i);
+ }
}
}
--
2.17.1
Jason Ekstrand
2018-12-10 19:52:15 UTC
Permalink
This seems very much over-the-top. It would be better to either find the
specific bug or else just allocate the BOs we use for states as snooped.
See also the anv_gem_set_caching call in genX_query.c.
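
Roughly something like this, I think (untested sketch from memory of the
query pool code; "bo" would be the freshly created block pool BO and error
handling is omitted):

   /* Make the block pool BO snooped right after creating it, so CPU
    * writes are coherent without explicit clflushes on !LLC platforms.
    * I915_CACHING_CACHED comes from i915_drm.h.
    */
   if (!pool->device->info.has_llc)
      anv_gem_set_caching(pool->device, bo->gem_handle, I915_CACHING_CACHED);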

On Fri, Dec 7, 2018 at 6:06 PM Rafael Antognolli <
Post by Rafael Antognolli
TODO: This is just flushing the entire dynamic states on every execbuf.
Maybe it's too much. However, in theory we should be already flushing
the states as needed, but I think we didn't hit any bug due to the
coherence implied by userptr.
---
src/intel/vulkan/anv_batch_chain.c | 4 ++++
1 file changed, 4 insertions(+)
diff --git a/src/intel/vulkan/anv_batch_chain.c b/src/intel/vulkan/anv_batch_chain.c
index 65df28ccb91..99009679435 100644
--- a/src/intel/vulkan/anv_batch_chain.c
+++ b/src/intel/vulkan/anv_batch_chain.c
@@ -1366,6 +1366,10 @@ anv_reloc_list_add_dep(struct anv_cmd_buffer *cmd_buffer,
anv_block_pool_foreach_bo(bo_list, iter, bo) {
_mesa_set_add(relocs->deps, bo);
+ if (!cmd_buffer->device->info.has_llc) {
+ for (uint32_t i = 0; i < bo->size; i += CACHELINE_SIZE)
+ __builtin_ia32_clflush(bo->map + i);
+ }
}
}
--
2.17.1
_______________________________________________
mesa-dev mailing list
https://lists.freedesktop.org/mailman/listinfo/mesa-dev
Rafael Antognolli
2018-12-08 00:05:43 UTC
Permalink
---
src/intel/vulkan/anv_allocator.c | 32 ++++++++++++++++++--------------
src/intel/vulkan/anv_private.h | 2 +-
2 files changed, 19 insertions(+), 15 deletions(-)

diff --git a/src/intel/vulkan/anv_allocator.c b/src/intel/vulkan/anv_allocator.c
index 5f0458afd77..2171a97970b 100644
--- a/src/intel/vulkan/anv_allocator.c
+++ b/src/intel/vulkan/anv_allocator.c
@@ -877,7 +877,7 @@ anv_state_pool_init(struct anv_state_pool *pool,

assert(util_is_power_of_two_or_zero(block_size));
pool->block_size = block_size;
- pool->back_alloc_free_list = ANV_FREE_LIST_EMPTY;
+ pool->back_alloc_free_list = ANV_FREE_LIST2_EMPTY;
for (unsigned i = 0; i < ANV_STATE_BUCKETS; i++) {
pool->buckets[i].free_list = ANV_FREE_LIST2_EMPTY;
pool->buckets[i].block.next = 0;
@@ -1077,22 +1077,27 @@ anv_state_pool_alloc(struct anv_state_pool *pool, uint32_t size, uint32_t align)
struct anv_state
anv_state_pool_alloc_back(struct anv_state_pool *pool)
{
- struct anv_state state;
- state.alloc_size = pool->block_size;
+ struct anv_state *state;
+ uint32_t alloc_size = pool->block_size;

- if (anv_free_list_pop(&pool->back_alloc_free_list,
- &pool->block_pool.map, &state.offset)) {
- assert(state.offset < 0);
+ state = anv_state_table_pop(&pool->back_alloc_free_list, &pool->table);
+ if (state) {
+ assert(state->offset < 0);
goto done;
}

- state.offset = anv_block_pool_alloc_back(&pool->block_pool,
- pool->block_size);
+ int32_t offset;
+ offset = anv_block_pool_alloc_back(&pool->block_pool,
+ pool->block_size);
+ uint32_t idx = anv_state_table_add(&pool->table, 1);
+ state = anv_state_table_get(&pool->table, idx);
+ state->offset = offset;
+ state->alloc_size = alloc_size;
+ state->map = pool->block_pool.map + state->offset;

done:
- state.map = pool->block_pool.map + state.offset;
- VG(VALGRIND_MEMPOOL_ALLOC(pool, state.map, state.alloc_size));
- return state;
+ VG(VALGRIND_MEMPOOL_ALLOC(pool, state->map, state->alloc_size));
+ return *state;
}

static void
@@ -1103,9 +1108,8 @@ anv_state_pool_free_no_vg(struct anv_state_pool *pool, struct anv_state state)

if (state.offset < 0) {
assert(state.alloc_size == pool->block_size);
- anv_free_list_push(&pool->back_alloc_free_list,
- pool->block_pool.map, state.offset,
- state.alloc_size, 1);
+ anv_state_table_push(&pool->back_alloc_free_list,
+ &pool->table, state.idx, 1);
} else {
anv_state_table_push(&pool->buckets[bucket].free_list,
&pool->table, state.idx, 1);
diff --git a/src/intel/vulkan/anv_private.h b/src/intel/vulkan/anv_private.h
index f7b3ec5f6a4..d068a4be5d8 100644
--- a/src/intel/vulkan/anv_private.h
+++ b/src/intel/vulkan/anv_private.h
@@ -737,7 +737,7 @@ struct anv_state_pool {
uint32_t block_size;

/** Free list for "back" allocations */
- union anv_free_list back_alloc_free_list;
+ union anv_free_list2 back_alloc_free_list;

struct anv_fixed_size_state_pool buckets[ANV_STATE_BUCKETS];
};
--
2.17.1
Rafael Antognolli
2018-12-08 00:05:47 UTC
Permalink
So far we use only one BO (the last one created) in the block pool. When
we switch away from the userptr API, we will need multiple BOs. So add
code now to store multiple BOs in the block pool.

This has several implications, the main one being that we can't use
pool->map as before. For that reason we update the getter to find which
BO a given offset belongs to, and return the respective map.
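
Just to illustrate what the new getter has to do, the offset translation
boils down to something like the standalone sketch below (made-up types,
not the actual anv code):

#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Walk the BO list, subtracting each BO's size, until we find the BO that
 * backs the given pool-relative offset; rewrite the offset so it becomes
 * relative to that BO's map.
 */
struct fake_bo {
   uint64_t size;
   struct fake_bo *next;
};

static struct fake_bo *
find_bo_for_offset(struct fake_bo *bos, int32_t *offset)
{
   int32_t cur_offset = 0;

   for (struct fake_bo *bo = bos; bo != NULL; bo = bo->next) {
      if (*offset < cur_offset + (int32_t)bo->size) {
         *offset -= cur_offset;
         return bo;
      }
      cur_offset += bo->size;
   }

   assert(!"offset not backed by any BO in the pool");
   return NULL;
}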
---
src/intel/vulkan/anv_allocator.c | 132 +++++++++++++++++++++++++------
src/intel/vulkan/anv_private.h | 17 ++++
2 files changed, 125 insertions(+), 24 deletions(-)

diff --git a/src/intel/vulkan/anv_allocator.c b/src/intel/vulkan/anv_allocator.c
index 2eb191e98dc..31258e38635 100644
--- a/src/intel/vulkan/anv_allocator.c
+++ b/src/intel/vulkan/anv_allocator.c
@@ -428,6 +428,34 @@ static VkResult
anv_block_pool_expand_range(struct anv_block_pool *pool,
uint32_t center_bo_offset, uint32_t size);

+static struct anv_bo *
+anv_block_pool_bo_append(struct anv_block_pool *pool, struct anv_bo_list *elem)
+{
+ /* struct anv_bo_list *elem = malloc(sizeof(*elem)); */
+ elem->next = NULL;
+
+ if (pool->last)
+ pool->last->next = elem;
+ pool->last = elem;
+
+ /* if it's the first BO added, set the pointer to BOs too */
+ if (pool->bos == NULL)
+ pool->bos = elem;
+
+ return &elem->bo;
+}
+
+static void
+anv_block_pool_bo_finish(struct anv_block_pool *pool)
+{
+ struct anv_bo_list *iter, *next;
+
+ for (iter = pool->bos; iter != NULL; iter = next) {
+ next = iter ? iter->next : NULL;
+ free(iter);
+ }
+}
+
VkResult
anv_block_pool_init(struct anv_block_pool *pool,
struct anv_device *device,
@@ -439,19 +467,15 @@ anv_block_pool_init(struct anv_block_pool *pool,

pool->device = device;
pool->bo_flags = bo_flags;
+ pool->bo = NULL;
+ pool->bos = NULL;
+ pool->last = NULL;
+ pool->size = 0;
pool->start_address = gen_canonical_address(start_address);

- pool->bo = malloc(sizeof(*pool->bo));
- if (!pool->bo)
- return vk_error(VK_ERROR_OUT_OF_HOST_MEMORY);
-
- anv_bo_init(pool->bo, 0, 0);
-
pool->fd = memfd_create("block pool", MFD_CLOEXEC);
- if (pool->fd == -1) {
- result = vk_error(VK_ERROR_INITIALIZATION_FAILED);
- goto fail_bo;
- }
+ if (pool->fd == -1)
+ return vk_error(VK_ERROR_INITIALIZATION_FAILED);

/* Just make it 2GB up-front. The Linux kernel won't actually back it
* with pages until we either map and fault on one of them or we use
@@ -484,8 +508,6 @@ anv_block_pool_init(struct anv_block_pool *pool,
u_vector_finish(&pool->mmap_cleanups);
fail_fd:
close(pool->fd);
- fail_bo:
- free(pool->bo);

return result;
}
@@ -495,7 +517,6 @@ anv_block_pool_finish(struct anv_block_pool *pool)
{
struct anv_mmap_cleanup *cleanup;

- free(pool->bo);
u_vector_foreach(cleanup, &pool->mmap_cleanups) {
if (cleanup->map)
munmap(cleanup->map, cleanup->size);
@@ -506,6 +527,7 @@ anv_block_pool_finish(struct anv_block_pool *pool)
u_vector_finish(&pool->mmap_cleanups);

close(pool->fd);
+ anv_block_pool_bo_finish(pool);
}

static VkResult
@@ -599,24 +621,86 @@ anv_block_pool_expand_range(struct anv_block_pool *pool,
* the EXEC_OBJECT_SUPPORTS_48B_ADDRESS flag and the kernel does all of the
* hard work for us.
*/
- anv_bo_init(pool->bo, gem_handle, size);
+ struct anv_bo *bo;
+ struct anv_bo_list *bo_elem = NULL;
+
+ /* If using softpin, we will keep adding new BOs every time we expand the
+ * range. On the other hand, if not using softpin, we need to add a BO if we
+ * don't have one yet.
+ */
+ if (!pool->bo) {
+ bo_elem = malloc(sizeof(*bo_elem));
+ bo = &bo_elem->bo;
+ } else {
+ bo = pool->bo;
+ }
+
+ /* pool->bo will always point to the first BO added on this block pool. */
+ if (!pool->bo)
+ pool->bo = bo;
+
+ anv_bo_init(bo, gem_handle, size);
if (pool->bo_flags & EXEC_OBJECT_PINNED) {
- pool->bo->offset = pool->start_address + BLOCK_POOL_MEMFD_CENTER -
+ bo->offset = pool->start_address + BLOCK_POOL_MEMFD_CENTER -
center_bo_offset;
}
- pool->bo->flags = pool->bo_flags;
- pool->bo->map = map;
+ bo->flags = pool->bo_flags;
+ bo->map = map;
+
+ if (bo_elem)
+ anv_block_pool_bo_append(pool, bo_elem);
+ pool->size = size;

return VK_SUCCESS;
}

+static struct anv_bo *
+anv_block_pool_get_bo(struct anv_block_pool *pool, int32_t *offset)
+{
+ struct anv_bo *bo, *bo_found = NULL;
+ int32_t cur_offset = 0;
+
+ assert(offset);
+
+ if (!(pool->bo_flags & EXEC_OBJECT_PINNED))
+ return pool->bo;
+
+ struct anv_bo_list *iter;
+ anv_block_pool_foreach_bo(pool->bos, iter, bo) {
+ if (*offset < cur_offset + bo->size) {
+ bo_found = bo;
+ break;
+ }
+ cur_offset += bo->size;
+ }
+
+ assert(bo_found != NULL);
+ *offset -= cur_offset;
+
+ return bo_found;
+}
+
struct anv_pool_map
anv_block_pool_map(struct anv_block_pool *pool, int32_t offset)
{
- return (struct anv_pool_map) {
- .map = pool->map,
- .offset = offset,
- };
+ if (pool->bo_flags & EXEC_OBJECT_PINNED) {
+ /* If softpin is used, we have multiple BOs, and we need the map from the
+ * BO that contains this offset.
+ */
+ struct anv_bo *bo = anv_block_pool_get_bo(pool, &offset);
+ return (struct anv_pool_map) {
+ .map = bo->map,
+ .offset = offset,
+ };
+ } else {
+ /* Otherwise we want the pool map, which takes into account the center
+ * offset too.
+ */
+ return (struct anv_pool_map) {
+ .map = pool->map,
+ .offset = offset,
+ };
+ }
}

/** Grows and re-centers the block pool.
@@ -668,7 +752,7 @@ anv_block_pool_grow(struct anv_block_pool *pool, struct anv_block_state *state)

assert(state == &pool->state || back_used > 0);

- uint32_t old_size = pool->bo->size;
+ uint32_t old_size = pool->size;

/* The block pool is always initialized to a nonzero size and this function
* is always called after initialization.
@@ -694,7 +778,7 @@ anv_block_pool_grow(struct anv_block_pool *pool, struct anv_block_state *state)
while (size < back_required + front_required)
size *= 2;

- assert(size > pool->bo->size);
+ assert(size > pool->size);

/* We compute a new center_bo_offset such that, when we double the size
* of the pool, we maintain the ratio of how much is used by each side.
@@ -742,7 +826,7 @@ done:
* needs to do so in order to maintain its concurrency model.
*/
if (state == &pool->state) {
- return pool->bo->size - pool->center_bo_offset;
+ return pool->size - pool->center_bo_offset;
} else {
assert(pool->center_bo_offset > 0);
return pool->center_bo_offset;
diff --git a/src/intel/vulkan/anv_private.h b/src/intel/vulkan/anv_private.h
index bf98c700873..b01b392daee 100644
--- a/src/intel/vulkan/anv_private.h
+++ b/src/intel/vulkan/anv_private.h
@@ -627,13 +627,30 @@ struct anv_block_state {
};
};

+struct anv_bo_list;
+
+struct anv_bo_list {
+ struct anv_bo bo;
+ struct anv_bo_list *next;
+};
+
+#define anv_block_pool_foreach_bo(list, iter, bo) \
+ for (iter = list, bo = &list->bo; \
+ iter != NULL; \
+ iter = iter->next, bo = &iter->bo)
+
+
struct anv_block_pool {
struct anv_device *device;

uint64_t bo_flags;

+ struct anv_bo_list *bos;
+ struct anv_bo_list *last;
struct anv_bo *bo;

+ uint64_t size;
+
/* The address where the start of the pool is pinned. The various bos that
* are created as the pool grows will have addresses in the range
* [start_address, start_address + BLOCK_POOL_MEMFD_SIZE).
--
2.17.1
Rafael Antognolli
2018-12-08 00:05:46 UTC
Permalink
Change block_pool->bo to be a pointer, and update its usage everywhere.
This makes it simpler to switch it later to a list of BOs.
---
src/intel/vulkan/anv_allocator.c | 31 +++++++++++++++++++-----------
src/intel/vulkan/anv_batch_chain.c | 8 ++++----
src/intel/vulkan/anv_blorp.c | 2 +-
src/intel/vulkan/anv_private.h | 2 +-
src/intel/vulkan/gen8_cmd_buffer.c | 6 +++---
src/intel/vulkan/genX_blorp_exec.c | 4 ++--
src/intel/vulkan/genX_cmd_buffer.c | 20 +++++++++----------
7 files changed, 41 insertions(+), 32 deletions(-)

diff --git a/src/intel/vulkan/anv_allocator.c b/src/intel/vulkan/anv_allocator.c
index acf3c80fbac..2eb191e98dc 100644
--- a/src/intel/vulkan/anv_allocator.c
+++ b/src/intel/vulkan/anv_allocator.c
@@ -441,11 +441,17 @@ anv_block_pool_init(struct anv_block_pool *pool,
pool->bo_flags = bo_flags;
pool->start_address = gen_canonical_address(start_address);

- anv_bo_init(&pool->bo, 0, 0);
+ pool->bo = malloc(sizeof(*pool->bo));
+ if (!pool->bo)
+ return vk_error(VK_ERROR_OUT_OF_HOST_MEMORY);
+
+ anv_bo_init(pool->bo, 0, 0);

pool->fd = memfd_create("block pool", MFD_CLOEXEC);
- if (pool->fd == -1)
- return vk_error(VK_ERROR_INITIALIZATION_FAILED);
+ if (pool->fd == -1) {
+ result = vk_error(VK_ERROR_INITIALIZATION_FAILED);
+ goto fail_bo;
+ }

/* Just make it 2GB up-front. The Linux kernel won't actually back it
* with pages until we either map and fault on one of them or we use
@@ -478,6 +484,8 @@ anv_block_pool_init(struct anv_block_pool *pool,
u_vector_finish(&pool->mmap_cleanups);
fail_fd:
close(pool->fd);
+ fail_bo:
+ free(pool->bo);

return result;
}
@@ -487,6 +495,7 @@ anv_block_pool_finish(struct anv_block_pool *pool)
{
struct anv_mmap_cleanup *cleanup;

+ free(pool->bo);
u_vector_foreach(cleanup, &pool->mmap_cleanups) {
if (cleanup->map)
munmap(cleanup->map, cleanup->size);
@@ -590,13 +599,13 @@ anv_block_pool_expand_range(struct anv_block_pool *pool,
* the EXEC_OBJECT_SUPPORTS_48B_ADDRESS flag and the kernel does all of the
* hard work for us.
*/
- anv_bo_init(&pool->bo, gem_handle, size);
+ anv_bo_init(pool->bo, gem_handle, size);
if (pool->bo_flags & EXEC_OBJECT_PINNED) {
- pool->bo.offset = pool->start_address + BLOCK_POOL_MEMFD_CENTER -
+ pool->bo->offset = pool->start_address + BLOCK_POOL_MEMFD_CENTER -
center_bo_offset;
}
- pool->bo.flags = pool->bo_flags;
- pool->bo.map = map;
+ pool->bo->flags = pool->bo_flags;
+ pool->bo->map = map;

return VK_SUCCESS;
}
@@ -659,7 +668,7 @@ anv_block_pool_grow(struct anv_block_pool *pool, struct anv_block_state *state)

assert(state == &pool->state || back_used > 0);

- uint32_t old_size = pool->bo.size;
+ uint32_t old_size = pool->bo->size;

/* The block pool is always initialized to a nonzero size and this function
* is always called after initialization.
@@ -685,7 +694,7 @@ anv_block_pool_grow(struct anv_block_pool *pool, struct anv_block_state *state)
while (size < back_required + front_required)
size *= 2;

- assert(size > pool->bo.size);
+ assert(size > pool->bo->size);

/* We compute a new center_bo_offset such that, when we double the size
* of the pool, we maintain the ratio of how much is used by each side.
@@ -722,7 +731,7 @@ anv_block_pool_grow(struct anv_block_pool *pool, struct anv_block_state *state)

result = anv_block_pool_expand_range(pool, center_bo_offset, size);

- pool->bo.flags = pool->bo_flags;
+ pool->bo->flags = pool->bo_flags;

done:
pthread_mutex_unlock(&pool->device->mutex);
@@ -733,7 +742,7 @@ done:
* needs to do so in order to maintain its concurrency model.
*/
if (state == &pool->state) {
- return pool->bo.size - pool->center_bo_offset;
+ return pool->bo->size - pool->center_bo_offset;
} else {
assert(pool->center_bo_offset > 0);
return pool->center_bo_offset;
diff --git a/src/intel/vulkan/anv_batch_chain.c b/src/intel/vulkan/anv_batch_chain.c
index 6c06858efe1..bec4d647b7e 100644
--- a/src/intel/vulkan/anv_batch_chain.c
+++ b/src/intel/vulkan/anv_batch_chain.c
@@ -501,7 +501,7 @@ anv_cmd_buffer_surface_base_address(struct anv_cmd_buffer *cmd_buffer)
{
struct anv_state *bt_block = u_vector_head(&cmd_buffer->bt_block_states);
return (struct anv_address) {
- .bo = &anv_binding_table_pool(cmd_buffer->device)->block_pool.bo,
+ .bo = anv_binding_table_pool(cmd_buffer->device)->block_pool.bo,
.offset = bt_block->offset,
};
}
@@ -1229,7 +1229,7 @@ adjust_relocations_to_state_pool(struct anv_state_pool *pool,
* relocations that point to the pool bo with the correct offset.
*/
for (size_t i = 0; i < relocs->num_relocs; i++) {
- if (relocs->reloc_bos[i] == &pool->block_pool.bo) {
+ if (relocs->reloc_bos[i] == pool->block_pool.bo) {
/* Adjust the delta value in the relocation to correctly
* correspond to the new delta. Initially, this value may have
* been negative (if treated as unsigned), but we trust in
@@ -1337,7 +1337,7 @@ relocate_cmd_buffer(struct anv_cmd_buffer *cmd_buffer,
* given time. The only option is to always relocate them.
*/
anv_reloc_list_apply(cmd_buffer->device, &cmd_buffer->surface_relocs,
- &cmd_buffer->device->surface_state_pool.block_pool.bo,
+ cmd_buffer->device->surface_state_pool.block_pool.bo,
true /* always relocate surface states */);

/* Since we own all of the batch buffers, we know what values are stored
@@ -1366,7 +1366,7 @@ setup_execbuf_for_cmd_buffer(struct anv_execbuf *execbuf,

adjust_relocations_from_state_pool(ss_pool, &cmd_buffer->surface_relocs,
cmd_buffer->last_ss_pool_center);
- VkResult result = anv_execbuf_add_bo(execbuf, &ss_pool->block_pool.bo,
+ VkResult result = anv_execbuf_add_bo(execbuf, ss_pool->block_pool.bo,
&cmd_buffer->surface_relocs, 0,
&cmd_buffer->device->alloc);
if (result != VK_SUCCESS)
diff --git a/src/intel/vulkan/anv_blorp.c b/src/intel/vulkan/anv_blorp.c
index 478b8e7a3db..1de8977dd0b 100644
--- a/src/intel/vulkan/anv_blorp.c
+++ b/src/intel/vulkan/anv_blorp.c
@@ -719,7 +719,7 @@ void anv_CmdUpdateBuffer(
anv_state_flush(cmd_buffer->device, tmp_data);

struct blorp_address src = {
- .buffer = &cmd_buffer->device->dynamic_state_pool.block_pool.bo,
+ .buffer = cmd_buffer->device->dynamic_state_pool.block_pool.bo,
.offset = tmp_data.offset,
.mocs = cmd_buffer->device->default_mocs,
};
diff --git a/src/intel/vulkan/anv_private.h b/src/intel/vulkan/anv_private.h
index a364be8dad5..bf98c700873 100644
--- a/src/intel/vulkan/anv_private.h
+++ b/src/intel/vulkan/anv_private.h
@@ -632,7 +632,7 @@ struct anv_block_pool {

uint64_t bo_flags;

- struct anv_bo bo;
+ struct anv_bo *bo;

/* The address where the start of the pool is pinned. The various bos that
* are created as the pool grows will have addresses in the range
diff --git a/src/intel/vulkan/gen8_cmd_buffer.c b/src/intel/vulkan/gen8_cmd_buffer.c
index 752d04f3013..5d368ee6146 100644
--- a/src/intel/vulkan/gen8_cmd_buffer.c
+++ b/src/intel/vulkan/gen8_cmd_buffer.c
@@ -610,7 +610,7 @@ void genX(CmdSetEvent)(
pc.DestinationAddressType = DAT_PPGTT,
pc.PostSyncOperation = WriteImmediateData,
pc.Address = (struct anv_address) {
- &cmd_buffer->device->dynamic_state_pool.block_pool.bo,
+ cmd_buffer->device->dynamic_state_pool.block_pool.bo,
event->state.offset
};
pc.ImmediateData = VK_EVENT_SET;
@@ -634,7 +634,7 @@ void genX(CmdResetEvent)(
pc.DestinationAddressType = DAT_PPGTT;
pc.PostSyncOperation = WriteImmediateData;
pc.Address = (struct anv_address) {
- &cmd_buffer->device->dynamic_state_pool.block_pool.bo,
+ cmd_buffer->device->dynamic_state_pool.block_pool.bo,
event->state.offset
};
pc.ImmediateData = VK_EVENT_RESET;
@@ -663,7 +663,7 @@ void genX(CmdWaitEvents)(
sem.CompareOperation = COMPARE_SAD_EQUAL_SDD,
sem.SemaphoreDataDword = VK_EVENT_SET,
sem.SemaphoreAddress = (struct anv_address) {
- &cmd_buffer->device->dynamic_state_pool.block_pool.bo,
+ cmd_buffer->device->dynamic_state_pool.block_pool.bo,
event->state.offset
};
}
diff --git a/src/intel/vulkan/genX_blorp_exec.c b/src/intel/vulkan/genX_blorp_exec.c
index 5af6abb0894..9d878cb867f 100644
--- a/src/intel/vulkan/genX_blorp_exec.c
+++ b/src/intel/vulkan/genX_blorp_exec.c
@@ -77,7 +77,7 @@ blorp_get_surface_base_address(struct blorp_batch *batch)
{
struct anv_cmd_buffer *cmd_buffer = batch->driver_batch;
return (struct blorp_address) {
- .buffer = &cmd_buffer->device->surface_state_pool.block_pool.bo,
+ .buffer = cmd_buffer->device->surface_state_pool.block_pool.bo,
.offset = 0,
};
}
@@ -151,7 +151,7 @@ blorp_alloc_vertex_buffer(struct blorp_batch *batch, uint32_t size,
anv_cmd_buffer_alloc_dynamic_state(cmd_buffer, size, 64);

*addr = (struct blorp_address) {
- .buffer = &cmd_buffer->device->dynamic_state_pool.block_pool.bo,
+ .buffer = cmd_buffer->device->dynamic_state_pool.block_pool.bo,
.offset = vb_state.offset,
.mocs = cmd_buffer->device->default_mocs,
};
diff --git a/src/intel/vulkan/genX_cmd_buffer.c b/src/intel/vulkan/genX_cmd_buffer.c
index fb70cd2e386..75846bdef6e 100644
--- a/src/intel/vulkan/genX_cmd_buffer.c
+++ b/src/intel/vulkan/genX_cmd_buffer.c
@@ -95,7 +95,7 @@ genX(cmd_buffer_emit_state_base_address)(struct anv_cmd_buffer *cmd_buffer)
sba.SurfaceStateBaseAddressModifyEnable = true;

sba.DynamicStateBaseAddress =
- (struct anv_address) { &device->dynamic_state_pool.block_pool.bo, 0 };
+ (struct anv_address) { device->dynamic_state_pool.block_pool.bo, 0 };
sba.DynamicStateMemoryObjectControlState = GENX(MOCS);
sba.DynamicStateBaseAddressModifyEnable = true;

@@ -104,7 +104,7 @@ genX(cmd_buffer_emit_state_base_address)(struct anv_cmd_buffer *cmd_buffer)
sba.IndirectObjectBaseAddressModifyEnable = true;

sba.InstructionBaseAddress =
- (struct anv_address) { &device->instruction_state_pool.block_pool.bo, 0 };
+ (struct anv_address) { device->instruction_state_pool.block_pool.bo, 0 };
sba.InstructionMemoryObjectControlState = GENX(MOCS);
sba.InstructionBaseAddressModifyEnable = true;

@@ -884,7 +884,7 @@ genX(copy_fast_clear_dwords)(struct anv_cmd_buffer *cmd_buffer,
assert(image->aspects & VK_IMAGE_ASPECT_ANY_COLOR_BIT_ANV);

struct anv_address ss_clear_addr = {
- .bo = &cmd_buffer->device->surface_state_pool.block_pool.bo,
+ .bo = cmd_buffer->device->surface_state_pool.block_pool.bo,
.offset = surface_state.offset +
cmd_buffer->device->isl_dev.ss.clear_value_offset,
};
@@ -1508,7 +1508,7 @@ genX(CmdExecuteCommands)(
* we allocated for them in BeginCommandBuffer.
*/
struct anv_bo *ss_bo =
- &primary->device->surface_state_pool.block_pool.bo;
+ primary->device->surface_state_pool.block_pool.bo;
struct anv_state src_state = primary->state.render_pass_states;
struct anv_state dst_state = secondary->state.render_pass_states;
assert(src_state.alloc_size == dst_state.alloc_size);
@@ -2094,7 +2094,7 @@ emit_binding_table(struct anv_cmd_buffer *cmd_buffer,
anv_cmd_buffer_alloc_surface_state(cmd_buffer);

struct anv_address constant_data = {
- .bo = &pipeline->device->dynamic_state_pool.block_pool.bo,
+ .bo = pipeline->device->dynamic_state_pool.block_pool.bo,
.offset = pipeline->shaders[stage]->constant_data.offset,
};
unsigned constant_data_size =
@@ -2464,7 +2464,7 @@ cmd_buffer_flush_push_constants(struct anv_cmd_buffer *cmd_buffer,
uint32_t read_len;
if (binding->set == ANV_DESCRIPTOR_SET_SHADER_CONSTANTS) {
struct anv_address constant_data = {
- .bo = &pipeline->device->dynamic_state_pool.block_pool.bo,
+ .bo = pipeline->device->dynamic_state_pool.block_pool.bo,
.offset = pipeline->shaders[stage]->constant_data.offset,
};
unsigned constant_data_size =
@@ -2512,7 +2512,7 @@ cmd_buffer_flush_push_constants(struct anv_cmd_buffer *cmd_buffer,

if (state.alloc_size > 0) {
c.ConstantBody.Buffer[n] = (struct anv_address) {
- .bo = &cmd_buffer->device->dynamic_state_pool.block_pool.bo,
+ .bo = cmd_buffer->device->dynamic_state_pool.block_pool.bo,
.offset = state.offset,
};
c.ConstantBody.ReadLength[n] =
@@ -2722,7 +2722,7 @@ emit_base_vertex_instance(struct anv_cmd_buffer *cmd_buffer,
anv_state_flush(cmd_buffer->device, id_state);

struct anv_address addr = {
- .bo = &cmd_buffer->device->dynamic_state_pool.block_pool.bo,
+ .bo = cmd_buffer->device->dynamic_state_pool.block_pool.bo,
.offset = id_state.offset,
};

@@ -2740,7 +2740,7 @@ emit_draw_index(struct anv_cmd_buffer *cmd_buffer, uint32_t draw_index)
anv_state_flush(cmd_buffer->device, state);

struct anv_address addr = {
- .bo = &cmd_buffer->device->dynamic_state_pool.block_pool.bo,
+ .bo = cmd_buffer->device->dynamic_state_pool.block_pool.bo,
.offset = state.offset,
};

@@ -3202,7 +3202,7 @@ void genX(CmdDispatchBase)(
sizes[2] = groupCountZ;
anv_state_flush(cmd_buffer->device, state);
cmd_buffer->state.compute.num_workgroups = (struct anv_address) {
- .bo = &cmd_buffer->device->dynamic_state_pool.block_pool.bo,
+ .bo = cmd_buffer->device->dynamic_state_pool.block_pool.bo,
.offset = state.offset,
};
}
--
2.17.1
Rafael Antognolli
2018-12-08 00:05:51 UTC
Permalink
This commit reworks the code that splits chunks and returns them back to
the state pool, while still keeping the same logic.

The original code would get a chunk larger than we need and split it into
pool->block_size'd chunks. Then it would return all but the first one, and
would split that first one into alloc_size'd chunks. Then it would keep
the first of those (for the allocation), and return the others back to the
pool.

The new anv_state_pool_return_chunk() function takes a chunk (with the
alloc_size part already removed) and a small_size hint. It then splits
that chunk into pool->block_size'd chunks, and if there's some space still
left, splits that into small_size'd chunks. small_size in this case is the
same size as alloc_size.

The idea is to keep the same logic, but write it in a way that we can
reuse it to return other chunks to the pool when we are growing the
buffer.
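
To make the intended split concrete, here is a tiny worked example of the
arithmetic described above (hypothetical numbers: block_size = 8192 and
small_size = alloc_size = 64; this only computes the counts and never
touches the pool):

#include <stdint.h>
#include <stdio.h>

int
main(void)
{
   uint32_t block_size = 8192;
   uint32_t small_size = 64;                   /* == alloc_size            */
   uint32_t chunk_size = 3 * block_size + 64;  /* 24640 bytes to give back */

   uint32_t nblocks = chunk_size / block_size; /* 3 block-sized states     */
   uint32_t rest    = chunk_size % block_size; /* 64 bytes left over       */
   uint32_t nsmall  = rest / small_size;       /* 1 small_size'd state     */

   printf("%u x %u-byte states + %u x %u-byte states\n",
          nblocks, block_size, nsmall, small_size);
   return 0;
}

So that chunk goes back to the pool as three block-sized states plus one
64-byte state, which is what I expect anv_state_pool_return_chunk() to
produce for that input.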
---
src/intel/vulkan/anv_allocator.c | 147 +++++++++++++++++++++----------
1 file changed, 102 insertions(+), 45 deletions(-)

diff --git a/src/intel/vulkan/anv_allocator.c b/src/intel/vulkan/anv_allocator.c
index 31258e38635..bddeb4a0fbd 100644
--- a/src/intel/vulkan/anv_allocator.c
+++ b/src/intel/vulkan/anv_allocator.c
@@ -994,6 +994,97 @@ anv_state_pool_get_bucket_size(uint32_t bucket)
return 1 << size_log2;
}

+/** Helper to create a chunk into the state table.
+ *
+ * It just creates 'count' entries into the state table and update their sizes,
+ * offsets and maps, also pushing them as "free" states.
+ */
+static void
+anv_state_pool_return_blocks(struct anv_state_pool *pool,
+ uint32_t chunk_offset, uint32_t count,
+ uint32_t block_size)
+{
+ if (count == 0)
+ return;
+
+ uint32_t st_idx = anv_state_table_add(&pool->table, count);
+ for (int i = 0; i < count; i++) {
+ /* update states that were added back to the state table */
+ struct anv_state *state_i = anv_state_table_get(&pool->table,
+ st_idx + i);
+ state_i->alloc_size = block_size;
+ state_i->offset = chunk_offset + block_size * i;
+ struct anv_pool_map pool_map = anv_block_pool_map(&pool->block_pool,
+ state_i->offset);
+ state_i->map = pool_map.map + pool_map.offset;
+ }
+
+ uint32_t block_bucket = anv_state_pool_get_bucket(block_size);
+ anv_state_table_push(&pool->buckets[block_bucket].free_list,
+ &pool->table, st_idx, count);
+}
+
+static uint32_t
+calculate_divisor(uint32_t size)
+{
+ uint32_t bucket = anv_state_pool_get_bucket(size);
+
+ while (bucket >= 0) {
+ uint32_t bucket_size = anv_state_pool_get_bucket_size(bucket);
+ if (size % bucket_size == 0)
+ return bucket_size;
+ }
+
+ return 0;
+}
+
+/** Returns a chunk of memory back to the state pool.
+ *
+ * If small_size is zero, we split chunk_size into pool->block_size'd pieces,
+ * and return those. If there's some remaining 'rest' space (chunk_size is not
+ * divisible by pool->block_size), then we find a bucket size that is a divisor
+ * of that rest, and split the 'rest' into that size, returning it to the pool.
+ *
+ * If small_size is non-zero, we use it in two different ways:
+ * * if it is larger than pool->block_size, we split the chunk into
+ * small_size'd pieces, instead of pool->block_size'd ones.
+ * * we also use it as the desired size to split the 'rest' after we split
+ * the bigger size of the chunk into pool->block_size;
+ */
+static void
+anv_state_pool_return_chunk(struct anv_state_pool *pool,
+ uint32_t chunk_offset, uint32_t chunk_size,
+ uint32_t small_size)
+{
+ uint32_t divisor = MAX2(pool->block_size, small_size);
+ uint32_t nblocks = chunk_size / divisor;
+ uint32_t rest = chunk_size % pool->block_size;
+
+ /* First return pool->block_size'd chunks.*/
+ uint32_t offset = chunk_offset + rest;
+ anv_state_pool_return_blocks(pool, offset, nblocks, pool->block_size);
+
+ if (rest == 0)
+ return;
+
+ chunk_size = rest;
+
+ if (small_size > 0) {
+ divisor = small_size;
+ } else {
+ /* Find the maximum divisor of the remaining chunk, and return smaller
+ * chunks of that size to the list.
+ */
+ divisor = calculate_divisor(chunk_size);
+ assert(divisor > 0);
+ }
+
+ /* Now return the smaller chunks of 'divisor' size */
+ assert(chunk_size % divisor == 0);
+ nblocks = (chunk_size / divisor);
+ anv_state_pool_return_blocks(pool, chunk_offset, nblocks, divisor);
+}
+
static struct anv_state
anv_state_pool_alloc_no_vg(struct anv_state_pool *pool,
uint32_t size, uint32_t align)
@@ -1025,6 +1116,10 @@ anv_state_pool_alloc_no_vg(struct anv_state_pool *pool,
*/
state->alloc_size = alloc_size;

+ /* Now return the unused part of the chunk back to the pool as free
+ * blocks
+ */
+
/* We've found a chunk that's larger than the requested state size.
* There are a couple of options as to what we do with it:
*
@@ -1049,52 +1144,14 @@ anv_state_pool_alloc_no_vg(struct anv_state_pool *pool,
* one of them. Then we split what remains into
* state.alloc_size sized chunks and return all but one.
*
- * We choose option (3).
+ * We choose option (3). That is done by returning the remaining of
+ * the chunk with anv_state_pool_return_chunk(), with alloc_size as a
+ * hint of the size that we want the smaller chunk split into.
*/
- if (chunk_size > pool->block_size &&
- alloc_size < pool->block_size) {
- assert(chunk_size % pool->block_size == 0);
- /* We don't want to split giant chunks into tiny chunks. Instead,
- * break anything bigger than a block into block-sized chunks and
- * then break it down into bucket-sized chunks from there. Return
- * all but the first block of the chunk to the block bucket.
- */
- uint32_t push_back = (chunk_size / pool->block_size) - 1;
- const uint32_t block_bucket =
- anv_state_pool_get_bucket(pool->block_size);
- uint32_t st_idx = anv_state_table_add(&pool->table, push_back);
- for (int i = 0; i < push_back; i++) {
- /* update states that were added back to the state table */
- struct anv_state *state_i = anv_state_table_get(&pool->table,
- st_idx + i);
- state_i->alloc_size = pool->block_size;
- state_i->offset = chunk_offset + pool->block_size * (i + 1);
- struct anv_pool_map pool_map = anv_block_pool_map(&pool->block_pool,
- state_i->offset);
- state_i->map = pool_map.map + pool_map.offset;
- }
- anv_state_table_push(&pool->buckets[block_bucket].free_list,
- &pool->table, st_idx, push_back);
- chunk_size = pool->block_size;
- }
-
- assert(chunk_size % alloc_size == 0);
- uint32_t push_back = (chunk_size / alloc_size) - 1;
- uint32_t st_idx = anv_state_table_add(&pool->table, push_back);
- for (int i = 0; i < push_back; i++) {
- /* update states that were added back to the state table */
- struct anv_state *state_i = anv_state_table_get(&pool->table,
- st_idx + i);
- state_i->alloc_size = alloc_size;
- state_i->offset = chunk_offset + alloc_size * (i + 1);
- struct anv_pool_map pool_map = anv_block_pool_map(&pool->block_pool,
- state_i->offset);
- state_i->map = pool_map.map + pool_map.offset;
- }
- anv_state_table_push(&pool->buckets[bucket].free_list,
- &pool->table, st_idx, push_back);
-
- offset = chunk_offset;
+ uint32_t return_offset = chunk_offset + alloc_size;
+ uint32_t return_size = chunk_size - alloc_size;
+ anv_state_pool_return_chunk(pool, return_offset,
+ return_size, alloc_size);
goto done;
}
}
--
2.17.1
Jason Ekstrand
2018-12-10 22:56:40 UTC
Permalink
On Fri, Dec 7, 2018 at 6:06 PM Rafael Antognolli <
Post by Rafael Antognolli
This commit tries to rework the code that split and returns chunks back
to the state pool, while still keeping the same logic.
The original code would get a chunk larger than we need and split it
into pool->block_size. Then it would return all but the first one, and
would split that first one into alloc_size chunks. Then it would keep
the first one (for the allocation), and return the others back to the
pool.
The new anv_state_pool_return_chunk() function will take a chunk (with
the alloc_size part removed), and a small_size hint. It then splits that
chunk into pool->block_size'd chunks, and if there's some space still
left, split that into small_size chunks. small_size in this case is the
same size as alloc_size.
The idea is to keep the same logic, but make it in a way we can reuse it
to return other chunks to the pool when we are growing the buffer.
---
src/intel/vulkan/anv_allocator.c | 147 +++++++++++++++++++++----------
1 file changed, 102 insertions(+), 45 deletions(-)
diff --git a/src/intel/vulkan/anv_allocator.c b/src/intel/vulkan/anv_allocator.c
index 31258e38635..bddeb4a0fbd 100644
--- a/src/intel/vulkan/anv_allocator.c
+++ b/src/intel/vulkan/anv_allocator.c
@@ -994,6 +994,97 @@ anv_state_pool_get_bucket_size(uint32_t bucket)
return 1 << size_log2;
}
+/** Helper to create a chunk into the state table.
+ *
+ * It just creates 'count' entries into the state table and update their sizes,
+ * offsets and maps, also pushing them as "free" states.
+ */
+static void
+anv_state_pool_return_blocks(struct anv_state_pool *pool,
+ uint32_t chunk_offset, uint32_t count,
+ uint32_t block_size)
+{
+ if (count == 0)
+ return;
+
+ uint32_t st_idx = anv_state_table_add(&pool->table, count);
+ for (int i = 0; i < count; i++) {
+ /* update states that were added back to the state table */
+ struct anv_state *state_i = anv_state_table_get(&pool->table,
+ st_idx + i);
+ state_i->alloc_size = block_size;
+ state_i->offset = chunk_offset + block_size * i;
+ struct anv_pool_map pool_map = anv_block_pool_map(&pool->block_pool,
+ state_i->offset);
+ state_i->map = pool_map.map + pool_map.offset;
+ }
+
+ uint32_t block_bucket = anv_state_pool_get_bucket(block_size);
+ anv_state_table_push(&pool->buckets[block_bucket].free_list,
+ &pool->table, st_idx, count);
+}
+
+static uint32_t
+calculate_divisor(uint32_t size)
+{
+ uint32_t bucket = anv_state_pool_get_bucket(size);
+
+ while (bucket >= 0) {
+ uint32_t bucket_size = anv_state_pool_get_bucket_size(bucket);
+ if (size % bucket_size == 0)
+ return bucket_size;
+ }
+
+ return 0;
+}
+
+/** Returns a chunk of memory back to the state pool.
+ *
+ * If small_size is zero, we split chunk_size into pool->block_size'd pieces,
+ * and return those. If there's some remaining 'rest' space (chunk_size is not
+ * divisble by pool->block_size), then we find a bucket size that is a divisor
+ * of that rest, and split the 'rest' into that size, returning it to the pool.
+ *
+ * * if it is larger than pool->block_size, we split the chunk into
+ * small_size'd pieces, instead of pool->block_size'd ones.
+ * * we also use it as the desired size to split the 'rest' after we split
+ * the bigger size of the chunk into pool->block_size;
This seems both overly complicated and not really what we want. If I have
a block size of 8k and allocate a single 64-byte state and then an 8k
state, it will break my almost 8k of padding into 511 64-byte states and
return those, which may be very wasteful if the next thing I do is
allocate a 1K state.

It also doesn't provide the current alignment guarantees that all states
are aligned to their size. While the alignment guarantee doesn't matter
for most large states, it does matter for some of the smaller sizes. Now
that I look at it in detail, it appears that the highest alignment we ever
require is 64B and the smallest size we allow is 64B so maybe it just
doesn't matter?

Assuming we don't care about alignments, we could do something like this?

if (small_size > 0) {
   assert(chunk_size % small_size == 0);
   anv_state_pool_return_blocks(pool, chunk_offset,
                                chunk_size / small_size, small_size);
} else {
   uint32_t divisor = MAX_STATE_SIZE;
   while (divisor >= MIN_STATE_SIZE) {
      uint32_t nblocks = chunk_size / divisor;
      if (nblocks > 0) {
         anv_state_pool_return_blocks(pool, chunk_offset, nblocks, divisor);
         chunk_offset += nblocks * divisor;
         chunk_size -= nblocks * divisor;
      }
      divisor /= 2;
   }
}

If we do care about alignments, the above clearly isn't sufficient. We'd
have to do something where we chunk it separately from both ends or
something. It'd get kind-of gross.
Post by Rafael Antognolli
+ */
+static void
+anv_state_pool_return_chunk(struct anv_state_pool *pool,
+ uint32_t chunk_offset, uint32_t chunk_size,
+ uint32_t small_size)
+{
+ uint32_t divisor = MAX2(pool->block_size, small_size);
+ uint32_t nblocks = chunk_size / divisor;
+ uint32_t rest = chunk_size % pool->block_size;
+
+ /* First return pool->block_size'd chunks.*/
+ uint32_t offset = chunk_offset + rest;
+ anv_state_pool_return_blocks(pool, offset, nblocks, pool->block_size);
+
+ if (rest == 0)
+ return;
+
+ chunk_size = rest;
+
+ if (small_size > 0) {
+ divisor = small_size;
+ } else {
+ /* Find the maximum divisor of the remaining chunk, and return smaller
+ * chunks of that size to the list.
+ */
+ divisor = calculate_divisor(chunk_size);
+ assert(divisor > 0);
+ }
+
+ /* Now return the smaller chunks of 'divisor' size */
+ assert(chunk_size % divisor == 0);
+ nblocks = (chunk_size / divisor);
+ anv_state_pool_return_blocks(pool, chunk_offset, nblocks, divisor);
+}
+
static struct anv_state
anv_state_pool_alloc_no_vg(struct anv_state_pool *pool,
uint32_t size, uint32_t align)
@@ -1025,6 +1116,10 @@ anv_state_pool_alloc_no_vg(struct anv_state_pool *pool,
*/
state->alloc_size = alloc_size;
+ /* Now return the unused part of the chunk back to the pool as free
+ * blocks
+ */
+
/* We've found a chunk that's larger than the requested state size.
*
@@ -1049,52 +1144,14 @@ anv_state_pool_alloc_no_vg(struct anv_state_pool *pool,
* one of them. Then we split what remains into
* state.alloc_size sized chunks and return all but one.
*
- * We choose option (3).
+ * We choose option (3). That is done by returning the remaining of
+ * the chunk with anv_state_pool_return_chunk(), with alloc_size as a
+ * hint of the size that we want the smaller chunk split into.
*/
- if (chunk_size > pool->block_size &&
- alloc_size < pool->block_size) {
- assert(chunk_size % pool->block_size == 0);
- /* We don't want to split giant chunks into tiny chunks. Instead,
- * break anything bigger than a block into block-sized chunks and
- * then break it down into bucket-sized chunks from there. Return
- * all but the first block of the chunk to the block bucket.
- */
- uint32_t push_back = (chunk_size / pool->block_size) - 1;
- const uint32_t block_bucket =
- anv_state_pool_get_bucket(pool->block_size);
- uint32_t st_idx = anv_state_table_add(&pool->table, push_back);
- for (int i = 0; i < push_back; i++) {
- /* update states that were added back to the state table */
- struct anv_state *state_i = anv_state_table_get(&pool->table,
- st_idx + i);
- state_i->alloc_size = pool->block_size;
- state_i->offset = chunk_offset + pool->block_size * (i + 1);
- struct anv_pool_map pool_map = anv_block_pool_map(&pool->block_pool,
- state_i->offset);
- state_i->map = pool_map.map + pool_map.offset;
- }
- anv_state_table_push(&pool->buckets[block_bucket].free_list,
- &pool->table, st_idx, push_back);
In your mind, does adding this helper remove at least some of the API
ugliness you were complaining about in patch 2? I think it does.
Post by Rafael Antognolli
- chunk_size = pool->block_size;
- }
-
- assert(chunk_size % alloc_size == 0);
- uint32_t push_back = (chunk_size / alloc_size) - 1;
- uint32_t st_idx = anv_state_table_add(&pool->table, push_back);
- for (int i = 0; i < push_back; i++) {
- /* update states that were added back to the state table */
- struct anv_state *state_i = anv_state_table_get(&pool->table,
- st_idx + i);
- state_i->alloc_size = alloc_size;
- state_i->offset = chunk_offset + alloc_size * (i + 1);
- struct anv_pool_map pool_map = anv_block_pool_map(&pool->block_pool,
- state_i->offset);
- state_i->map = pool_map.map + pool_map.offset;
- }
- anv_state_table_push(&pool->buckets[bucket].free_list,
- &pool->table, st_idx, push_back);
-
- offset = chunk_offset;
+ uint32_t return_offset = chunk_offset + alloc_size;
+ uint32_t return_size = chunk_size - alloc_size;
+ anv_state_pool_return_chunk(pool, return_offset,
+ return_size, alloc_size);
goto done;
}
}
--
2.17.1
_______________________________________________
mesa-dev mailing list
https://lists.freedesktop.org/mailman/listinfo/mesa-dev
Rafael Antognolli
2018-12-10 23:48:05 UTC
Permalink
Post by Rafael Antognolli
This commit tries to rework the code that split and returns chunks back
to the state pool, while still keeping the same logic.
The original code would get a chunk larger than we need and split it
into pool->block_size. Then it would return all but the first one, and
would split that first one into alloc_size chunks. Then it would keep
the first one (for the allocation), and return the others back to the
pool.
The new anv_state_pool_return_chunk() function will take a chunk (with
the alloc_size part removed), and a small_size hint. It then splits that
chunk into pool->block_size'd chunks, and if there's some space still
left, split that into small_size chunks. small_size in this case is the
same size as alloc_size.
The idea is to keep the same logic, but make it in a way we can reuse it
to return other chunks to the pool when we are growing the buffer.
---
src/intel/vulkan/anv_allocator.c | 147 +++++++++++++++++++++----------
1 file changed, 102 insertions(+), 45 deletions(-)
diff --git a/src/intel/vulkan/anv_allocator.c b/src/intel/vulkan/anv_allocator.c
index 31258e38635..bddeb4a0fbd 100644
--- a/src/intel/vulkan/anv_allocator.c
+++ b/src/intel/vulkan/anv_allocator.c
@@ -994,6 +994,97 @@ anv_state_pool_get_bucket_size(uint32_t bucket)
return 1 << size_log2;
}
+/** Helper to create a chunk into the state table.
+ *
+ * It just creates 'count' entries into the state table and update their sizes,
+ * offsets and maps, also pushing them as "free" states.
+ */
+static void
+anv_state_pool_return_blocks(struct anv_state_pool *pool,
+ uint32_t chunk_offset, uint32_t count,
+ uint32_t block_size)
+{
+ if (count == 0)
+ return;
+
+ uint32_t st_idx = anv_state_table_add(&pool->table, count);
+ for (int i = 0; i < count; i++) {
+ /* update states that were added back to the state table */
+ struct anv_state *state_i = anv_state_table_get(&pool->table,
+ st_idx + i);
+ state_i->alloc_size = block_size;
+ state_i->offset = chunk_offset + block_size * i;
+ struct anv_pool_map pool_map = anv_block_pool_map(&pool->block_pool,
+ state_i->offset);
+ state_i->map = pool_map.map + pool_map.offset;
+ }
+
+ uint32_t block_bucket = anv_state_pool_get_bucket(block_size);
+ anv_state_table_push(&pool->buckets[block_bucket].free_list,
+ &pool->table, st_idx, count);
+}
+
+static uint32_t
+calculate_divisor(uint32_t size)
+{
+ uint32_t bucket = anv_state_pool_get_bucket(size);
+
+ while (bucket >= 0) {
+ uint32_t bucket_size = anv_state_pool_get_bucket_size(bucket);
+ if (size % bucket_size == 0)
+ return bucket_size;
+ }
+
+ return 0;
+}
+
+/** Returns a chunk of memory back to the state pool.
+ *
+ * If small_size is zero, we split chunk_size into pool->block_size'd pieces,
+ * and return those. If there's some remaining 'rest' space (chunk_size is not
+ * divisble by pool->block_size), then we find a bucket size that is a divisor
+ * of that rest, and split the 'rest' into that size, returning it to the pool.
+ *
+ * * if it is larger than pool->block_size, we split the chunk into
+ * small_size'd pieces, instead of pool->block_size'd ones.
+ * * we also use it as the desired size to split the 'rest' after we split
+ * the bigger size of the chunk into pool->block_size;
This seems both overly complicated and not really what we want. If I have a
block size of 8k and allocate a single 64-byte state and then a 8k state, it
will break my almost 8k of padding into 511 64-byte states and return those
which may be very wasteful if the next thing I do is allocate a 1K state.
Good, this would definitely be a waste.
Post by Rafael Antognolli
It also doesn't provide the current alignment guarantees that all states are
aligned to their size. While the alignment guarantee doesn't matter for most
large states, it does matter for some of the smaller sizes. Now that I look at
it in detail, it appears that the highest alignment we ever require is 64B and
the smallest size we allow is 64B so maybe it just doesn't matter?
Assuming we don't care about alignments, we could do something like this?
if (small_size > 0) {
   assert(chunk_size % small_size == 0);
   anv_state_pool_return_blocks(pool, chunk_offset,
                                chunk_size / small_size, small_size);
} else {
   uint32_t divisor = MAX_STATE_SIZE;
   while (divisor >= MIN_STATE_SIZE) {
      uint32_t nblocks = chunk_size / divisor;
      if (nblocks > 0) {
         anv_state_pool_return_blocks(pool, chunk_offset, nblocks, divisor);
         chunk_offset += nblocks * divisor;
         chunk_size -= nblocks * divisor;
      }
      divisor /= 2;
   }
}
The code above is indeed way simpler and it does seem to achieve
what we want for the "return padding" case.

However, we also use this helper for returning blocks that we got from
the freelist but only partially used. For instance, if we need to allocate
a state of size 64 bytes and we have an 8KB block in the freelist, due to
the snippet above (small_size == 64) we will end up splitting it into 511
64-byte states too.

Maybe (if we want to keep the old semantics), in the case where
small_size > 0, we have to do something like:

if (small_size > 0) {
   assert(chunk_size % small_size == 0);
   if (chunk_size > pool->block_size) {
      assert(chunk_size % pool->block_size == 0);
      uint32_t nblocks = chunk_size / pool->block_size - 1;
      anv_state_pool_return_blocks(pool, chunk_offset, nblocks, pool->block_size);
      chunk_size -= nblocks * pool->block_size;
      chunk_offset += nblocks * pool->block_size;
   }
   anv_state_pool_return_blocks(pool, chunk_offset, chunk_size / small_size, small_size);
} else {
   ... your else clause here
}

Maybe it's over-complicating things again... what do you think?
Post by Rafael Antognolli
If we do care about alignments, the above clearly isn't sufficient. We'd have
to do something where we chunk it separately from both ends or something. It'd
get kind-of gross.
+ */
+static void
+anv_state_pool_return_chunk(struct anv_state_pool *pool,
+ uint32_t chunk_offset, uint32_t chunk_size,
+ uint32_t small_size)
+{
+ uint32_t divisor = MAX2(pool->block_size, small_size);
+ uint32_t nblocks = chunk_size / divisor;
+ uint32_t rest = chunk_size % pool->block_size;
+
+ /* First return pool->block_size'd chunks.*/
+ uint32_t offset = chunk_offset + rest;
+ anv_state_pool_return_blocks(pool, offset, nblocks, pool->block_size);
+
+ if (rest == 0)
+ return;
+
+ chunk_size = rest;
+
+ if (small_size > 0) {
+ divisor = small_size;
+ } else {
+ /* Find the maximum divisor of the remaining chunk, and return smaller
+ * chunks of that size to the list.
+ */
+ divisor = calculate_divisor(chunk_size);
+ assert(divisor > 0);
+ }
+
+ /* Now return the smaller chunks of 'divisor' size */
+ assert(chunk_size % divisor == 0);
+ nblocks = (chunk_size / divisor);
+ anv_state_pool_return_blocks(pool, chunk_offset, nblocks, divisor);
+}
+
static struct anv_state
anv_state_pool_alloc_no_vg(struct anv_state_pool *pool,
uint32_t size, uint32_t align)
@@ -1025,6 +1116,10 @@ anv_state_pool_alloc_no_vg(struct anv_state_pool *pool,
*/
state->alloc_size = alloc_size;
+ /* Now return the unused part of the chunk back to the pool as free
+ * blocks
+ */
+
/* We've found a chunk that's larger than the requested state size.
*
@@ -1049,52 +1144,14 @@ anv_state_pool_alloc_no_vg(struct anv_state_pool *pool,
* one of them. Then we split what remains into
* state.alloc_size sized chunks and return all but one.
*
- * We choose option (3).
+ * We choose option (3). That is done by returning the remaining of
+ * the chunk with anv_state_pool_return_chunk(), with alloc_size as a
+ * hint of the size that we want the smaller chunk split into.
*/
- if (chunk_size > pool->block_size &&
- alloc_size < pool->block_size) {
- assert(chunk_size % pool->block_size == 0);
- /* We don't want to split giant chunks into tiny chunks. Instead,
- * break anything bigger than a block into block-sized chunks and
- * then break it down into bucket-sized chunks from there. Return
- * all but the first block of the chunk to the block bucket.
- */
- uint32_t push_back = (chunk_size / pool->block_size) - 1;
- const uint32_t block_bucket =
- anv_state_pool_get_bucket(pool->block_size);
- uint32_t st_idx = anv_state_table_add(&pool->table, push_back);
- for (int i = 0; i < push_back; i++) {
- /* update states that were added back to the state table */
- struct anv_state *state_i = anv_state_table_get(&pool->table,
- st_idx + i);
- state_i->alloc_size = pool->block_size;
- state_i->offset = chunk_offset + pool->block_size * (i + 1);
- struct anv_pool_map pool_map = anv_block_pool_map(&pool->block_pool,
- state_i->offset);
- state_i->map = pool_map.map + pool_map.offset;
- }
- anv_state_table_push(&pool->buckets[block_bucket].free_list,
- &pool->table, st_idx, push_back);
In your mind, does adding this helper remove at least some of the API ugliness
you were complaining about in patch 2? I think it does.
Yes, I think it does too. At first I really didn't want to send the
series until I figured out a way to improve the API, but after I added
these helpers I was less bothered by it.
Post by Rafael Antognolli
- chunk_size = pool->block_size;
- }
-
- assert(chunk_size % alloc_size == 0);
- uint32_t push_back = (chunk_size / alloc_size) - 1;
- uint32_t st_idx = anv_state_table_add(&pool->table, push_back);
- for (int i = 0; i < push_back; i++) {
- /* update states that were added back to the state table */
- struct anv_state *state_i = anv_state_table_get(&pool->table,
- st_idx + i);
- state_i->alloc_size = alloc_size;
- state_i->offset = chunk_offset + alloc_size * (i + 1);
- struct anv_pool_map pool_map = anv_block_pool_map(&pool->block_pool,
- state_i->offset);
- state_i->map = pool_map.map + pool_map.offset;
- }
- anv_state_table_push(&pool->buckets[bucket].free_list,
- &pool->table, st_idx, push_back);
-
- offset = chunk_offset;
+ uint32_t return_offset = chunk_offset + alloc_size;
+ uint32_t return_size = chunk_size - alloc_size;
+ anv_state_pool_return_chunk(pool, return_offset,
+ return_size, alloc_size);
goto done;
}
}
--
2.17.1
_______________________________________________
mesa-dev mailing list
https://lists.freedesktop.org/mailman/listinfo/mesa-dev