Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SYCL][RTC] Cache frontend invocation #16823

Merged
merged 20 commits into from
Feb 25, 2025
Merged

Conversation

jopperm
Copy link
Contributor

@jopperm jopperm commented Jan 29, 2025

Adds sycl_jit-RTC-specific persistent caching to the runtime. The basic idea is to cache only the LLVM module resulting from the device compilation in bitcode format. Device linking and post-link would be run always (even for a cache hit), as invoking the frontend is the most expensive step in the pipeline right now.

The cache key is the concatenation of:

  • the Base64*-encoding of a BLAKE3 hash of the preprocessed source string (i.e. containing all headers included as virtual files per the kernel_compiler extension as well as from the local file system), and
  • the Base64-encoding of a BLAKE3 hash of the user-supplied build options.

*) Replacing / by - to make the string filesystem-friendly.

Signed-off-by: Julian Oppermann <[email protected]>
@jopperm
Copy link
Contributor Author

jopperm commented Jan 30, 2025

@sommerlukas @cperkinsintel I'll leave this PR in draft mode until discussions settle, but this is how I'd do the basic mechanism, to be extended with a more sophisticated cache lookup.

Signed-off-by: Julian Oppermann <[email protected]>
Signed-off-by: Julian Oppermann <[email protected]>
Signed-off-by: Julian Oppermann <[email protected]>
Signed-off-by: Julian Oppermann <[email protected]>
@jopperm jopperm marked this pull request as ready for review February 14, 2025 05:41
@jopperm jopperm requested review from a team as code owners February 14, 2025 05:41
Copy link
Contributor

@sommerlukas sommerlukas left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mostly nits and questions.

public:
explicit RTCHashResult(const char *PreprocLog)
: Failed{true}, Hash{}, PreprocLog{PreprocLog} {}
RTCHashResult(const char *Hash, const char *PreprocLog)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why does this constructor take PreprocLog? The only use of this constructor I found in the code only passes "" in that case, might just as well set it to an empty string here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To distinguish the success/failure result variants, and possibly to always take the log when we reuse the preprocessed source in the other action. I pushed an alternate design in ed093ce, but don't love it. WDYT?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't notice the issue with the same parameter type before.

I think ed093ce isn't bad, but we could further simplify it by only storing a single string.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice, done!

Comment on lines +602 to +604
// Update the cache size file and trigger cache eviction if needed.
if (TotalSize)
updateCacheFileSizeAndTriggerEviction(getRootDir(), TotalSize);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it correct that this happens outside of the lock?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is not my design (CC @cperkinsintel @uditagarwal97), but I'd say yes. The lock we held before is only for a specific entry (directory), and the eviction mechanism employs its own lock for the file containing the total cache size.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1. Eviction uses its own lock.

if (isEvictionEnabled())
saveCurrentTimeInAFile(FileName + CacheEntryAccessTimeSuffix);
} catch (...) {
// If read was unsuccessfull try the next item
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this comment correct? Looks more like we're just giving up and will compile fresh.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch, thanks!

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@cperkinsintel Just realised there's a potential bug in getCompiledKernelFromDisc (which I based getDeviceCodeIRFromDisc on): The iteration over the different .bin files doesn't make sense because for kernel_compiler as we don't write a source item to distinguish the binaries stored for the same cache key. putCompiledKernelToDisc does look for an unused filename to store the binary, but getCompiledKernelFromDisc will always attempt to read the 0.bin file and use that unless an FS error occurs.

Signed-off-by: Julian Oppermann <[email protected]>
Signed-off-by: Julian Oppermann <[email protected]>
Signed-off-by: Julian Oppermann <[email protected]>
Signed-off-by: Julian Oppermann <[email protected]>
std::string SYCLFileName = CompilationID + ".cpp";
// The filename must be stable, because it is part of the preprocessed output
// and in consequence, the cache key.
std::string SYCLFileName = "rtc.cpp";
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is unfortunate. Some day in the future when debugging support is required, this "rt.cpp" filename for everything might become a problem. That day is likely no day soon, so I'm fine with it.
Is there no other (simple) way for a file name that is unique for the kernel, but consistent between invocations (cache read and write )?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point, maybe a short hash of the source string only? Or we could come back to the idea of letting the user set the filename (or a generic tag) via a property, which might be more useful for debugging.

In any case, let me check again how much effort it would be to keep the filename out of the cache key calculation.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Disabling the line markers in the preprocessor output does the trick for now, so I reverted back to the original behaviour, using rtc_<n>.cpp as file name.

@sommerlukas sommerlukas merged commit 6dc419f into intel:sycl Feb 25, 2025
20 checks passed
kurapov-peter pushed a commit to kurapov-peter/llvm that referenced this pull request Mar 5, 2025
Adds `sycl_jit`-RTC-specific persistent caching to the runtime. The
basic idea is to cache only the LLVM module resulting from the device
compilation in bitcode format. Device linking and post-link would be run
always (even for a cache hit), as invoking the frontend is the most
expensive step in the pipeline right now.

The cache key is the concatenation of:
- the Base64*-encoding of a BLAKE3 hash of the preprocessed source
string (i.e. containing all headers included as virtual files per the
`kernel_compiler` extension as well as from the local file system), and
- the Base64-encoding of a BLAKE3 hash of the user-supplied build
options.


*) Replacing `/` by `-` to make the string filesystem-friendly.

---------

Signed-off-by: Julian Oppermann <[email protected]>
Co-authored-by: Lukas Sommer <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants