Separate object on host and device, no more global objects #116

slizzered · 2015-12-07T11:01:59Z

There is no longer a global __device__ object, which simplifies reasoning about the behavior of mallocMC. Instead, there is an object explicitly created on the host, which internally creates the device allocator and holds a pointer to it. There are several side-effects:

- fixes "calling a device object from host?" #113
- using multiple allocators will be straightforward and is no longer hidden behind macro magic
- the whole macro-magic file mallocMC_overwrites.hpp was removed
- it is no longer possible to directly replace the standard malloc with mallocMC. Instead, you must pass the device allocator handle into the kernel and use it:
previously:

using namespace mallocMC;
using MallocMC_Type = ... // alias definition with all the policies
MAMC_SET_ALLOCATOR_TYPE(MallocMC_Type) // ugly MACRO-magic

__global__
void
exampleOLD( )
{
    int* x = (int*) malloc( sizeof( int ) * 100 ); // explicit cast: C++ does not convert void* implicitly
}

int
main( )
{
    initHeap( 1024 );
    exampleOLD<<< ... >>>( );
    finalizeHeap();
}

now:

using namespace mallocMC;
using MallocMC_Type = ... // alias definition with all the policies

__global__
void
exampleNEW(
    MallocMC_Type::AllocatorHandle mMC
)
{
    int* x = (int*) mMC.malloc( sizeof( int ) * 100 ); // explicit cast: C++ does not convert void* implicitly
}

int
main( )
{
    MallocMC_Type mMC( 1024 );
    exampleNEW<<< ... >>>( mMC );
    mMC.finalizeHeap();
}

- as you can see, the mallocMC::initHeap() function (actually a macro in client code) is no longer needed. Instead, the constructor of the host object takes the heap size directly.
- likewise, the mallocMC::finalizeHeap() macro is no longer used. Instead, call the finalizeHeap() member function of the host object directly.

Several small problems remain that need to be addressed (I will create issues for them):

- the finalizeHeap() functionality could be done through the destructor of the host object. This will probably be postponed until C++11 is available, where copying the host object can be forbidden (only moving will be allowed).
- inside the constructor and finalizeHeap(), the host class currently uses CUDA functions directly to create/destroy the device object. This could either be integrated into the ReservePool policy, or a new policy could be established to encapsulate this behaviour. Other suggestions welcome!
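The destructor idea from the first point can be sketched as follows. This is only an illustration under stated assumptions, not the mallocMC implementation: `HostAllocatorSketch`, `ownsHeap()` and `heapSize()` are hypothetical names, and `std::malloc`/`std::free` stand in for the CUDA calls so that the sketch runs on the host.

```cpp
#include <cstddef>
#include <cstdlib>
#include <type_traits>
#include <utility>

// Hypothetical stand-in for the host object (not the real mallocMC class).
// std::malloc/std::free replace the CUDA allocation calls (assumption).
class HostAllocatorSketch
{
public:
    explicit HostAllocatorSketch( std::size_t heapSize )
        : pool( std::malloc( heapSize ) ), size( heapSize )
    {}

    // a copy would release the same pool twice, so copying is forbidden
    HostAllocatorSketch( HostAllocatorSketch const& ) = delete;
    HostAllocatorSketch& operator=( HostAllocatorSketch const& ) = delete;

    // moving transfers ownership and leaves the source empty
    HostAllocatorSketch( HostAllocatorSketch&& other ) noexcept
        : pool( other.pool ), size( other.size )
    {
        other.pool = nullptr;
        other.size = 0;
    }

    HostAllocatorSketch& operator=( HostAllocatorSketch&& other ) noexcept
    {
        if( this != &other )
        {
            finalizeHeap( );
            pool = other.pool;
            size = other.size;
            other.pool = nullptr;
            other.size = 0;
        }
        return *this;
    }

    // the destructor takes over the job of an explicit finalizeHeap() call
    ~HostAllocatorSketch( ) { finalizeHeap( ); }

    // safe to call repeatedly: std::free( nullptr ) is a no-op
    void finalizeHeap( )
    {
        std::free( pool );
        pool = nullptr;
        size = 0;
    }

    bool ownsHeap( ) const { return pool != nullptr; }
    std::size_t heapSize( ) const { return size; }

private:
    void* pool;
    std::size_t size;
};

static_assert(
    !std::is_copy_constructible< HostAllocatorSketch >::value,
    "the host object must not be copyable"
);
```

With this shape, an explicit finalizeHeap() call stays possible but becomes optional, because leaving the scope releases the heap exactly once.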

psychocoderHPC · 2015-12-07T12:05:44Z

examples/mallocMC_example01.cu

@@ -122,10 +122,10 @@ void run()
  std::vector<int> array_sums(block*grid,0);

  // create arrays of arrays on the device
-  createArrays<<<1,1>>>(grid,block);
+  createArrays<<<1,1>>>(grid,block, mMC.devAllocator);


Is it possible to use a method to get the device object instead of accessing a member directly?

The device allocator should not be a pointer; an object (handle) that can contain a pointer would be better.

So, would you prefer a getter method (on the host) that returns an object like the following? (should be passed to the kernel as a pass-by-value object)

template <typename T_DevAllocator>
struct AllocatorHandle
{
    T_DevAllocator* devAllocator;

    MAMC_ACCELERATOR
    void* malloc( size_t size )
    {
        return devAllocator->malloc( size );
    }

    MAMC_ACCELERATOR
    void free( void* p )
    {
        devAllocator->free( p );
    }
};

Yes, this looks safer than using a pointer. Then the handle is something like an interface for the allocator.
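To illustrate how such a handle and a host-side getter could fit together, here is a host-only sketch. Everything except the shape of `AllocatorHandle` is an assumption: `DummyDevAllocator`, `HostObject`, `liveAllocations()` and `fakeKernel` are hypothetical, `MAMC_ACCELERATOR` is defined as empty so the code compiles without CUDA, and an ordinary function call stands in for the kernel launch.

```cpp
#include <cstddef>
#include <cstdlib>

// Empty here so the sketch builds on the host; in CUDA code this macro
// would carry the device qualifiers (assumption).
#define MAMC_ACCELERATOR

// Hypothetical device allocator: forwards to the C heap and counts
// outstanding allocations so the effect of the handle is observable.
struct DummyDevAllocator
{
    int live = 0;

    MAMC_ACCELERATOR void* malloc( std::size_t size )
    {
        ++live;
        return std::malloc( size );
    }

    MAMC_ACCELERATOR void free( void* p )
    {
        --live;
        std::free( p );
    }
};

// The handle from the discussion: a small object that is cheap to copy
// and forwards every call to the allocator it points to.
template< typename T_DevAllocator >
struct AllocatorHandle
{
    T_DevAllocator* devAllocator;

    MAMC_ACCELERATOR void* malloc( std::size_t size )
    {
        return devAllocator->malloc( size );
    }

    MAMC_ACCELERATOR void free( void* p )
    {
        devAllocator->free( p );
    }
};

// Hypothetical host object: the allocator member stays private and is
// only reachable through the getter.
class HostObject
{
public:
    using Handle = AllocatorHandle< DummyDevAllocator >;

    Handle getAllocatorHandle( ) { return Handle{ &dev }; }
    int liveAllocations( ) const { return dev.live; }

private:
    DummyDevAllocator dev;
};

// Stand-in for a kernel: receives the handle by value, like a kernel argument.
inline int fakeKernel( HostObject::Handle mMC )
{
    int* x = static_cast< int* >( mMC.malloc( sizeof( int ) * 100 ) );
    if( x == nullptr )
    {
        mMC.free( x ); // keeps the counter balanced; freeing nullptr is a no-op
        return -1;
    }
    x[ 99 ] = 42;
    int const result = x[ 99 ];
    mMC.free( x );
    return result;
}
```

Passing the handle by value copies only the wrapped pointer, so all copies of the handle still refer to the same allocator.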

ax3l · 2015-12-10T16:24:27Z

@Flamefire you can also have a look if you want :)

ax3l · 2015-12-10T16:26:57Z

@slizzered thank you for the fix and refactoring!

When this is reviewed, we should rather release a 2.3.0 and skip the patch level 2.2.1 because of the changed API, I guess.

Flamefire · 2015-12-11T09:31:54Z

src/include/mallocMC/creationPolicies/Scatter_impl.hpp

-      static void* initHeap(const T_Obj& obj, void* pool, size_t memsize){
-        T_Obj* heap;
-        MALLOCMC_CUDA_CHECKED_CALL(cudaGetSymbolAddress((void**)&heap,obj));
+      static void* initHeap(T_Obj* heap, void* pool, size_t memsize){


Why not make these functions non-static? The instance could save the heap in initHeap and reset it in finalizeHeap.
Then further calls would not need to pass the heap pointer.

And actually, T_Obj& should be Scatter_Impl&. It shouldn't be possible to pass in another type that just happens to have a similar interface...
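The non-static variant suggested here could look roughly like the sketch below. It is not the Scatter policy: `HeapSketch`, `ScatterSketch`, `isInitialized()` and `heapSize()` are hypothetical names chosen for illustration, and the heap is reduced to a pool pointer plus a size.

```cpp
#include <cstddef>

// Hypothetical, heavily reduced heap object.
struct HeapSketch
{
    void* pool;
    std::size_t memsize;
};

// Hypothetical policy with non-static members: the instance remembers
// the heap, so later calls no longer take a heap pointer.
class ScatterSketch
{
public:
    ScatterSketch( ) : heap( nullptr ) {}

    // the parameter is the concrete heap type, not a template parameter,
    // so an unrelated type with a similar interface is rejected
    void* initHeap( HeapSketch* h, void* pool, std::size_t memsize )
    {
        heap = h;
        heap->pool = pool;
        heap->memsize = memsize;
        return heap;
    }

    // forgets the heap again; the caller still owns the pool memory
    void finalizeHeap( )
    {
        if( heap != nullptr )
        {
            heap->pool = nullptr;
            heap->memsize = 0;
            heap = nullptr;
        }
    }

    bool isInitialized( ) const { return heap != nullptr; }

    // example of a later call that relies on the stored pointer
    std::size_t heapSize( ) const
    {
        return heap != nullptr ? heap->memsize : 0;
    }

private:
    HeapSketch* heap;
};
```

The trade-off is that the policy instance now carries state, which only makes sense if the host class holds or inherits from an actual policy object instead of calling static functions.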

Flamefire · 2015-12-11T09:59:22Z

It seems that the policy methods used from the hostclass_host are all static. This makes it necessary to pass the device object to all these static functions, which would be a point against them.
But if all used functions are static, the class should NOT inherit from the policies! Inheritance adds false properties (invalid, never set, and never to be used) to the instance!

Flamefire · 2015-12-11T10:13:10Z

And last: I'd also provide an implicit conversion from the Allocator to the handle, so one could pass the Allocator directly into a kernel (this triggers the conversion and saves the allocator.getAllocatorHandle call).
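A minimal host-only sketch of such an implicit conversion is shown below. All names (`SketchDevAllocator`, `SketchHandle`, `SketchAllocator`, `launchLikeKernel`) are hypothetical, `std::malloc`/`std::free` stand in for the device heap, and a plain function taking the handle by value stands in for the kernel.

```cpp
#include <cstddef>
#include <cstdlib>

// Hypothetical device-side allocator, backed by the C heap for this sketch.
struct SketchDevAllocator
{
    void* malloc( std::size_t size ) { return std::malloc( size ); }
    void free( void* p ) { std::free( p ); }
};

// Hypothetical handle that forwards to the allocator it points to.
struct SketchHandle
{
    SketchDevAllocator* devAllocator;

    void* malloc( std::size_t size ) { return devAllocator->malloc( size ); }
    void free( void* p ) { devAllocator->free( p ); }
};

// Hypothetical host-side allocator with an explicit getter and an
// implicit conversion operator that simply calls the getter.
class SketchAllocator
{
public:
    SketchHandle getAllocatorHandle( ) { return SketchHandle{ &dev }; }

    operator SketchHandle( ) { return getAllocatorHandle( ); }

private:
    SketchDevAllocator dev;
};

// Stand-in for a kernel whose parameter is the handle, taken by value.
inline bool launchLikeKernel( SketchHandle mMC )
{
    void* p = mMC.malloc( 16 );
    bool const ok = ( p != nullptr );
    mMC.free( p );
    return ok;
}
```

Both `launchLikeKernel( allocator )` and `launchLikeKernel( allocator.getAllocatorHandle( ) )` compile; in the first form the compiler inserts the conversion at the call site. The expectation that a real kernel launch converts its arguments the same way is an assumption of this sketch.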


psychocoderHPC · 2016-01-26T12:31:02Z

I will test soon

psychocoderHPC · 2016-04-21T16:14:20Z

After a long, long time I will merge it in and add the new changes to PIConGPU ASAP.

ax3l · 2017-01-22T23:09:12Z

@psychocoderHPC can you please apply the latest mallocMC, including this feature to PIConGPU? It stops parallel/separable compilation.

slizzered added the refactoring label Dec 7, 2015

slizzered assigned psychocoderHPC Dec 7, 2015

psychocoderHPC reviewed Dec 7, 2015

slizzered mentioned this pull request Dec 7, 2015

cudaMalloc and cudaFree in host class #118

Open

ax3l added this to the 2.2.1crp: Bug Fixes milestone Dec 8, 2015

ax3l modified the milestone: 2.3.0crp: Bug Fixes & Refactoring Globals Dec 10, 2015

Flamefire reviewed Dec 11, 2015

Carlchristian Eckert added 7 commits January 25, 2016 14:51

Replaced the global __device__ object

74ba8aa

- Objects living on the accelerator are now created inside a host object
- This simplifies handling of multiple allocators
- as a downside, the allocator has to be used explicitly

replaced initHeap function with constructor

d281cbe

changed pointer to handle

503f0bd

fixed a return type to comply with defined interface

28c52c2

removed unnecessary inheritance

0de5416

fixed OldMalloc policy to comply with new interface

7f742b8

added implicit conversion to device handle

b9fe440

slizzered force-pushed the issue113-separate_object_host_device branch from 09dd40e to b9fe440 Compare January 25, 2016 14:04

some renamig and freeing at end of examples

4070f40

psychocoderHPC merged commit c0c6450 into alpaka-group:dev Apr 21, 2016
