Separate object on host and device, no more global objects #116

Merged

slizzered
Contributor

There is no longer a global __device__ object, which simplifies reasoning about the behavior of mallocMC. Instead, there is an object explicitly created on the host, which internally creates the device allocator and holds a pointer to it. There are several side-effects:

  • fixes calling a device object from host? #113
  • using multiple allocators will be straightforward and no longer hidden behind MACRO-magic
  • the whole macro-magic header mallocMC_overwrites.hpp was removed
  • it is no longer possible to directly replace standard malloc with mallocMC. Instead, you must pass the pointer to the device allocator into the kernel and use it:
  • previously:
using namespace mallocMC;
using MallocMC_Type = ... // alias definition with all the policies
MAMC_SET_ALLOCATOR_TYPE(MallocMC_Type) // ugly MACRO-magic

__global__
void
exampleOLD( )
{
    int* x = (int*) malloc( sizeof( int ) * 100 ); // malloc returns void*, so C++ needs a cast
}

int
main( )
{
    initHeap( 1024 );
    exampleOLD<<< ... >>>( );
    finalizeHeap();
}
  • now:
using namespace mallocMC;
using MallocMC_Type = ... // alias definition with all the policies

__global__
void
exampleNEW(
    MallocMC_Type::AllocatorHandle mMC
)
{
    int* x = (int*) mMC.malloc( sizeof( int ) * 100 ); // malloc returns void*, so C++ needs a cast
}

int
main( )
{
    MallocMC_Type mMC( 1024 );
    exampleNEW<<< ... >>>( mMC );
    mMC.finalizeHeap();
}
  • as you can see, the mallocMC::initHeap() function (actually a macro in client code) is no longer needed. Instead, the host object's constructor takes the heap size directly.
  • also, the mallocMC::finalizeHeap() macro is no longer used. Instead, it is encouraged to use the finalizeHeap() member function directly.
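
The new usage pattern above can be sketched in plain, host-only C++ so that it compiles without a CUDA toolchain. Everything in it is an assumption made for illustration: `MockDevAllocator`, `MockAllocatorHandle`, `MockHostAllocator`, and `exampleNewBody` are hypothetical stand-ins, the bump allocation is not mallocMC's algorithm, and an ordinary function plays the role of the kernel.

```cpp
#include <cstddef>
#include <cstdlib>

// Hypothetical stand-in for the device-side allocator (not mallocMC's real one).
struct MockDevAllocator
{
    char*       pool;
    std::size_t capacity;
    std::size_t used;

    // Trivial bump allocation; returns nullptr when the pool is exhausted.
    void* malloc( std::size_t size )
    {
        if( size > capacity - used )
            return nullptr;
        void* p = pool + used;
        used += size;
        return p;
    }
};

// Lightweight handle that is cheap to copy and is passed by value.
struct MockAllocatorHandle
{
    MockDevAllocator* devAllocator;

    void* malloc( std::size_t size )
    {
        return devAllocator->malloc( size );
    }
};

// Host object: creates the allocator state in its constructor and owns the pool.
class MockHostAllocator
{
public:
    explicit MockHostAllocator( std::size_t size )
    {
        dev_.pool = static_cast< char* >( std::malloc( size ) );
        dev_.capacity = dev_.pool ? size : 0;
        dev_.used = 0;
    }

    MockAllocatorHandle getAllocatorHandle( )
    {
        return MockAllocatorHandle{ &dev_ };
    }

    // Explicit teardown, mirroring the finalizeHeap() member function.
    void finalizeHeap( )
    {
        std::free( dev_.pool );
        dev_.pool = nullptr;
        dev_.capacity = 0;
        dev_.used = 0;
    }

private:
    MockDevAllocator dev_;
};

// Plays the role of the kernel body: it only ever sees the handle.
bool exampleNewBody( MockAllocatorHandle mMC )
{
    int* x = static_cast< int* >( mMC.malloc( sizeof( int ) * 100 ) );
    return x != nullptr;
}
```

In this sketch the caller would construct `MockHostAllocator mMC( 1024 )`, pass `mMC.getAllocatorHandle( )` to `exampleNewBody`, and call `mMC.finalizeHeap( )` at the end. No global state is involved at any point.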

Several small problems remain that need to be addressed (I will create issues for them):

  • the finalizeHeap() functionality could be handled by the destructor of the host object. This will probably be postponed until C++11 is available, so that copying the host object can be forbidden (only moving will be allowed).
  • inside the constructor and finalizeHeap(), the hostclass currently uses CUDA functions directly to create/destroy the device object. This could either be integrated into the ReservePool-Policy or a new policy could be established to encapsulate this behaviour. Other suggestions welcome!
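
The first point could look roughly like the following once C++11 is available. This is a host-only sketch under stated assumptions: `MoveOnlyHeap` is a hypothetical name, `std::malloc` and `std::free` stand in for the CUDA calls that create and destroy the device object, and the live-heap counter exists only to make the behaviour observable.

```cpp
#include <cstddef>
#include <cstdlib>
#include <type_traits>
#include <utility>

// Hypothetical host object whose destructor does the work of finalizeHeap().
class MoveOnlyHeap
{
public:
    explicit MoveOnlyHeap( std::size_t size )
        : pool_( std::malloc( size ) )
    {
        if( pool_ )
            ++liveHeaps( );
    }

    // Copying would leave two owners releasing the same pool, so forbid it.
    MoveOnlyHeap( MoveOnlyHeap const& ) = delete;
    MoveOnlyHeap& operator=( MoveOnlyHeap const& ) = delete;

    // Moving transfers ownership and leaves the source empty.
    MoveOnlyHeap( MoveOnlyHeap&& other ) noexcept
        : pool_( other.pool_ )
    {
        other.pool_ = nullptr;
    }

    MoveOnlyHeap& operator=( MoveOnlyHeap&& other ) noexcept
    {
        if( this != &other )
        {
            release( );
            pool_ = other.pool_;
            other.pool_ = nullptr;
        }
        return *this;
    }

    ~MoveOnlyHeap( ) { release( ); }

    bool valid( ) const { return pool_ != nullptr; }

    // Number of heaps currently alive; for illustration only.
    static int& liveHeaps( )
    {
        static int count = 0;
        return count;
    }

private:
    void release( )
    {
        if( pool_ )
        {
            std::free( pool_ );
            pool_ = nullptr;
            --liveHeaps( );
        }
    }

    void* pool_;
};

static_assert( !std::is_copy_constructible< MoveOnlyHeap >::value,
               "the host object must not be copyable" );
static_assert( std::is_move_constructible< MoveOnlyHeap >::value,
               "the host object must be movable" );
```

Deleting the copy operations is what makes a destructor-based cleanup safe here: exactly one object owns the pool at any time, so it is released exactly once.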

@@ -122,10 +122,10 @@ void run()
std::vector<int> array_sums(block*grid,0);

// create arrays of arrays on the device
-createArrays<<<1,1>>>(grid,block);
+createArrays<<<1,1>>>(grid,block, mMC.devAllocator);
Member

Is it possible to use a method to get the device object instead of accessing a member directly?

Member

The device allocator should not be a raw pointer; an object (handle) that contains the pointer would be better.

Contributor Author

So, would you prefer a getter method (on the host) that returns an object like the following? It would be passed to the kernel by value.

template <typename T_DevAllocator>
struct AllocatorHandle
{
    T_DevAllocator* devAllocator;

    MAMC_ACCELERATOR
    void*
    malloc(
        size_t size
    )
    {
        return devAllocator -> malloc( size );
    }

    MAMC_ACCELERATOR
    void
    free(
        void* p
    )
    {
        devAllocator -> free( p );
    }

};

Member

Yes, this looks safer than using a pointer. Then the handle is something like an interface for the allocator.

ax3l added this to the 2.2.1crp: Bug Fixes milestone on Dec 8, 2015
@ax3l
Member

ax3l commented Dec 10, 2015

@Flamefire you can also have a look if you want :)

@ax3l
Member

ax3l commented Dec 10, 2015

@slizzered thank you for the fix and refactoring!

When this is reviewed, we should rather release a 2.3.0 and skip the patch level 2.2.1 because of the changed API, I guess.

ax3l modified the milestone: 2.3.0crp: Bug Fixes & Refactoring Globals, Dec 10, 2015
-static void* initHeap(const T_Obj& obj, void* pool, size_t memsize){
-T_Obj* heap;
-MALLOCMC_CUDA_CHECKED_CALL(cudaGetSymbolAddress((void**)&heap,obj));
+static void* initHeap(T_Obj* heap, void* pool, size_t memsize){
Contributor

Why not make these functions non-static? The instance could save the heap in initHeap and reset it in finalizeHeap.
Then further calls would not need to pass the heap pointer.
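
A minimal host-only sketch of that suggestion, assuming a hypothetical `StatefulPolicy` and a mock heap type; `getMemSize` is invented here purely to show a later call that needs no heap argument, and none of these names come from mallocMC:

```cpp
#include <cstddef>

// Hypothetical stand-in for the device-side heap object.
struct MockHeap
{
    void*       pool;
    std::size_t memsize;
};

// Hypothetical policy with non-static members: it remembers the heap it
// initialised, so later calls do not need the pointer passed in again.
class StatefulPolicy
{
public:
    StatefulPolicy( ) : heap_( nullptr ) { }

    void* initHeap( MockHeap* heap, void* pool, std::size_t memsize )
    {
        heap_ = heap;
        heap_->pool = pool;
        heap_->memsize = memsize;
        return pool;
    }

    // No heap argument needed: the instance already knows it.
    std::size_t getMemSize( ) const
    {
        return heap_ ? heap_->memsize : 0;
    }

    // Clears the heap description and forgets the stored pointer.
    void finalizeHeap( )
    {
        if( heap_ )
        {
            heap_->pool = nullptr;
            heap_->memsize = 0;
        }
        heap_ = nullptr;
    }

private:
    MockHeap* heap_;
};
```

The trade-off is that the policy instance now has a lifetime tied to the heap, whereas the static variant keeps the policy stateless at the cost of a longer parameter list.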

Contributor

And actually T_Obj& should be Scatter_Impl&. It shouldn't be possible to pass another type in here that just happens to have a similar interface...
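
To illustrate the concern, here is a small host-only sketch with hypothetical types (`ScatterImplMock`, `LookAlike`) and hypothetical function names. It shows a templated parameter accepting an unrelated type with the same member, while a concrete parameter type rejects it at compile time:

```cpp
#include <cstddef>
#include <type_traits>
#include <utility>

// Hypothetical concrete implementation and an unrelated look-alike type.
struct ScatterImplMock { std::size_t memsize; };
struct LookAlike       { std::size_t memsize; };

// Templated parameter: accepts anything that has a `memsize` member.
template< typename T_Obj >
std::size_t initHeapGeneric( T_Obj& obj, std::size_t memsize )
{
    obj.memsize = memsize;
    return obj.memsize;
}

// Concrete parameter: only the intended type is accepted.
inline std::size_t initHeapStrict( ScatterImplMock& obj, std::size_t memsize )
{
    obj.memsize = memsize;
    return obj.memsize;
}

// Detects at compile time whether initHeapStrict can be called with a T&.
template< typename T, typename = void >
struct AcceptedByStrict : std::false_type { };

template< typename T >
struct AcceptedByStrict<
    T,
    decltype( void( initHeapStrict( std::declval< T& >( ), std::size_t( 0 ) ) ) )
> : std::true_type { };

static_assert( AcceptedByStrict< ScatterImplMock >::value,
               "the intended type is accepted" );
static_assert( !AcceptedByStrict< LookAlike >::value,
               "a look-alike type is rejected" );
```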

@Flamefire
Contributor

It seems that the policy methods used from the hostclass_host are all static. This makes it necessary to pass the device object to all of these static functions, which is a point against them.
But if all used functions are static, the class should NOT inherit from the policies! It adds false properties (invalid, never set and must never be used) to the instance!
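
The point about inherited state can be made concrete with a host-only sketch. `MockPolicy`, `HostInheriting`, `HostComposing`, and `alignUp` are hypothetical names chosen for illustration and are not taken from mallocMC:

```cpp
#include <cstddef>
#include <type_traits>

// Hypothetical policy: only its static function is used from the host side,
// yet the type also carries a data member meant for the device object.
struct MockPolicy
{
    void* heapPointer; // never set on the host side in this sketch

    static std::size_t alignUp( std::size_t size, std::size_t alignment )
    {
        return ( size + alignment - 1 ) / alignment * alignment;
    }
};

// Variant A: inheriting drags the unused data member into every instance.
struct HostInheriting : MockPolicy
{
    std::size_t rounded( std::size_t size ) const
    {
        return alignUp( size, 16 );
    }
};

// Variant B: naming the policy as a type gives access to the same static
// function without adding any state to the host object.
struct HostComposing
{
    using Policy = MockPolicy;

    std::size_t rounded( std::size_t size ) const
    {
        return Policy::alignUp( size, 16 );
    }
};

static_assert( sizeof( HostInheriting ) >= sizeof( void* ),
               "the inherited, unused member costs space" );
static_assert( !std::is_base_of< MockPolicy, HostComposing >::value,
               "no inherited state in the composing variant" );
static_assert( std::is_empty< HostComposing >::value,
               "the composing variant carries no data" );
```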

@Flamefire
Contributor

And last: I'd also provide an implicit conversion from the Allocator to the handle, so one could pass the Allocator directly into a kernel (which triggers the conversion and saves the allocator.getAllocatorHandle call).
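
Such a conversion might be sketched as follows. This is host-only C++ with hypothetical mock types, and an ordinary function stands in for the kernel; it is an illustration of the idea, not the merged implementation:

```cpp
// Hypothetical stand-in for the device-side allocator.
struct MockDevAllocator
{
    int id;
};

// Handle that wraps the pointer and is passed by value.
struct MockAllocatorHandle
{
    MockDevAllocator* devAllocator;
};

class MockAllocator
{
public:
    explicit MockAllocator( int id ) : dev_{ id } { }

    MockAllocatorHandle getAllocatorHandle( )
    {
        return MockAllocatorHandle{ &dev_ };
    }

    // Implicit conversion: lets the host object be passed wherever a handle
    // is expected, without spelling out getAllocatorHandle().
    operator MockAllocatorHandle( )
    {
        return getAllocatorHandle( );
    }

private:
    MockDevAllocator dev_;
};

// Takes the handle by value, as a kernel would.
int kernelStandIn( MockAllocatorHandle handle )
{
    return handle.devAllocator->id;
}
```

With this in place, `kernelStandIn( allocator )` and `kernelStandIn( allocator.getAllocatorHandle( ) )` are equivalent. One thing to weigh is that implicit conversions make the hand-off less visible at the call site.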

Carlchristian Eckert added 7 commits January 25, 2016 14:51
slizzered force-pushed the issue113-separate_object_host_device branch from 09dd40e to b9fe440 on January 25, 2016 14:04
@psychocoderHPC
Member

I will test soon

@psychocoderHPC
Member

After a long, long time, I will merge this and add the new changes to PIConGPU ASAP.

psychocoderHPC merged commit c0c6450 into alpaka-group:dev on Apr 21, 2016
@ax3l
Member

ax3l commented Jan 22, 2017

@psychocoderHPC can you please apply the latest mallocMC, including this feature, to PIConGPU? It stops parallel/separable compilation.

Successfully merging this pull request may close these issues.

calling a device object from host?
4 participants