-
Notifications
You must be signed in to change notification settings - Fork 16
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Separate object on host and device, no more global objects #116
Separate object on host and device, no more global objects #116
Conversation
@@ -122,10 +122,10 @@ void run() | |||
std::vector<int> array_sums(block*grid,0); | |||
|
|||
// create arrays of arrays on the device | |||
createArrays<<<1,1>>>(grid,block); | |||
createArrays<<<1,1>>>(grid,block, mMC.devAllocator); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it possible to use a method the get the device object instead to access a member direct?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The device allocator should not be pointer, an object (handle) which can contain a pointer should be better.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So, would you prefer a getter method (on the host) that returns an object like the following? (should be passed to the kernel as a pass-by-value object)
template <typename T_DevAllocator>
struct AllocatorHandle
{
T_DevAllocator* devAllocator;
MAMC_ACCELERATOR
void*
malloc(
size_t size
)
{
return devAllocator -> malloc( size );
}
MAMC_ACCELERATOR
void
free(
void* p
)
{
devAllocator -> free( p );
}
};
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes this looks more save than use a pointer. Than the handle is something like an interface for the allocator.
@Flamefire you can also have a look if you want :) |
@slizzered thank you for the fix and refactoring! when this is reviewed,we should rather release a |
static void* initHeap(const T_Obj& obj, void* pool, size_t memsize){ | ||
T_Obj* heap; | ||
MALLOCMC_CUDA_CHECKED_CALL(cudaGetSymbolAddress((void**)&heap,obj)); | ||
static void* initHeap(T_Obj* heap, void* pool, size_t memsize){ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why not make this functions non-static? The instance could save the heap
in initHeap
and reset it in finalizeHeap
Then further calls do not need to pass the heap pointer
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
And actually T_Obj&
should be Scatter_Impl&
. It shouldn't be possible to pass another type in here that just happens to have a similar interface...
It seems that the policies methods, that are used from the hostclass_host are all static. This makes it necessary that the device object is passed to all this static functions which would be a point against them. |
And last: I'd also provide an implicit conversion from the |
- Objects living on the accelerator are now created inside a host object - This simplifies handling of multiple allocators - as a downside, the allocator has to be used explicitly
09dd40e
to
b9fe440
Compare
I will test soon |
After long long time I will merge it in and add the new changes to PIConGPU asap |
@psychocoderHPC can you please apply the latest mallocMC, including this feature to PIConGPU? It stops parallel/separable compilation. |
There is no longer a global
__device__
object, which simplifies reasoning about the behavior of mallocMC. Instead, there is an object explicitly created on the host, which internally creates the device allocator and holds a pointer to it. There are several side-effects:mallocMC_overwrites.hpp
was removedmallocMC::initHeap()
function (actually, it was a macro in client code) is no longer needed. Instead, it uses a constructor that takes the size directly.mallocMC::finalizeHeap()
macro is no longer used. Instead, it is encouraged to use thefinalizeHeap()
member function directly.There remain several small problems, that need to be addressed (will create issues):
finalizeHeap()
functionality could be done through the destructor of the host object. This will probably be postponed until C++11, where the host object is forbidden to be copied (only moving will be allowed).finalizeHeap()
, the hostclass currently uses CUDA functions directly to create/destroy the device object. This could either be integrated into the ReservePool-Policy or a new policy could be established to encapsulate this behaviour. Other suggestions welcome!