forked from llvm/llvm-project
[AutoBump] Merge with e7244d86 (Jan 08) (29) #499
Open
jorickert wants to merge 480 commits into bump_to_d622b66a from bump_to_e7244d86
Currently hfinkel is listed as the AliasAnalysis maintainer, but I believe he hasn't been actively working on LLVM in the last couple of years, so I'd like to update this information. I'd like to nominate fhahn and myself as the new maintainers for AA. While here, I'd also like to nominate alinas as the maintainer for MemorySSA.
If directive is put inside `#if __cplusplus`, it should reflect the condition, instead of being generic `expected`.
…-rt (llvm#121625) This compile time test uses inline asm with `.arch` directives to set the target feature. It is however broken and always fails, since each `asm()` construct in LLVM sets up a new AsmParser, and therefore the `.arch` directive has no effect on later `asm()` contents. To fix this we need to use a single inline `asm()` call with the entire code chunk to emit contained inside.
…ion (llvm#121559) As part of llvm#112171, support for FEAT_PAuthLR's CFI instructions was added. However, the CFI instructions are emitted in the incorrect location. This leads to incorrect code being generated and possible issues when running a program. According to the ABI, the CFI instructions should be emitted before the signing instruction. This is now done properly. ABI information can be found here: https://github.com/ARM-software/abi-aa/blob/bf0e2c8047c70987165f3e05e571d7836370ade9/aadwarf64/aadwarf64.rst#44call-frame-instructions
…vm#121681) The new line types help to annotate */&/&& in simple requirements as binary operators. Fixes llvm#121675.
Print operations are often used for debugging, immediately before the compiler aborts. In such cases, it is sometimes possible that the output isn't fully produced yet. Make sure it is by explicitly flushing the output.
…rt for single reductions in ComplexDeinterleavingPass (llvm#112875)" (llvm#120441) This reverts commit 76714be, fixing the build failure that caused the revert. The failure stemmed from the complex deinterleaving pass identifying a series of add operations as a "complex to single reduction", so when it tried to transform this erroneously identified pattern, it faulted. The fix applied is to ensure that complex numbers (or patterns that match them) are used throughout, by checking if there is a deinterleave node amidst the graph.
This PR is in reference to porting LLDB to AIX. Links to discussions on LLVM Discourse and GitHub: 1. https://discourse.llvm.org/t/port-lldb-to-ibm-aix/80640 2. llvm#101657 The complete changes for porting are present in this draft PR: llvm#102601 Added a HostInfoAIX file for the AIX platform. Most of the common functionality is now handled by the parent HostInfoPosix, so only some basic functions are implemented here.
Move the debug output that prints out the selected VF from selectVectorizationFactor -> computeBestVF. This means that the output will still be written even after removing the assert for the legacy and vplan cost models matching.
… binary operation on input (llvm#120207) Add codegen for when the input type has 4 times as many elements as the output type and the input to the partial reduction does not have a binary operation performed on it.
…1550) An instantiated templated function definition may not have a body due to parsing errors inside the templated function. When serializing, an assert is triggered inside `ASTRecordWriter::AddFunctionDefinition`. The instantiation may happen on an intermediate module. The test case was reduced from `mp-units`.
This commit adds an NVIDIA-specific lowering of `cf.assert` to `__assertfail`. Note: `getUniqueFormatGlobalName`, `getOrCreateFormatStringConstant` and `getOrDefineFunction` are moved to `GPUOpsLowering.h`, so that they can be reused.
) This is trivially additional support for the existing ALLOCATE directive, which allows an ALIGN clause. The ALLOCATE directive is currently not implemented, so this just adds the necessary parser parts so that the compiler does not say "Huh? I don't get this" [or "Expected OpenMP construct"] when it encounters the ALIGN clause. Some parser testing is updated and a new todo test is added, just in case the ALIGN clause is not supported by the initial support for ALLOCATE.
This registers `sincos[f|l]` as a clang builtin and updates CGBuiltin to emit the `llvm.sincos.*` intrinsic when `-fno-math-errno` is set. Note: `llvm.sincos.*` is only emitted by `__builtin_sincos[f|l]` functions in this initial patch.
…rn (llvm#119527) This fixes a regression from llvm#101294 by checking if we might be clobbering a sh{1,2,3}add pattern. Only do this if the underlying add isn't going to be folded away into an address offset.
…21747) Replace `bzero` with the standard `memset` so that it is common to all platforms.
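The replacement described above is mechanical, since `bzero(p, n)` has the same effect as `memset(p, 0, n)`. A minimal sketch follows; the helper name `clearBuffer` is hypothetical and does not appear in the patch.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical helper (not from the patch): zero-fill a buffer portably.
// bzero(Buf, Size) is a legacy BSD call; std::memset(Buf, 0, Size) is the
// standard C/C++ equivalent and is available on every platform.
inline void clearBuffer(void *Buf, std::size_t Size) {
  assert((Buf != nullptr || Size == 0) && "null buffer with nonzero size");
  if (Size != 0)
    std::memset(Buf, 0, Size);
}
```

Calling `clearBuffer(Buf, sizeof(Buf))` on a local array leaves every byte zero, which is what the original `bzero` call did.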
Fix a grammar mistake in Polly docs. Co-authored-by: hstk30-hw <[email protected]>
…s" (llvm#121749) Generalize the SymbolIDs used for SymbolData to all SymExprs and use these IDs for comparing SymbolRef keys in various containers, such as ConstraintMap. These IDs are superior to raw pointer values because they are more controllable and are not randomized across executions (unlike [pointers](https://en.wikipedia.org/wiki/Address_space_layout_randomization)). The order of these IDs is stable across runs because SymExprs are allocated in the same order. Stability of the constraint order is important for the stability of the analyzer results. I evaluated this change on a set of 200+ open-source C and C++ projects with a total of ~78 000 symbolic-execution issues passing Z3 refutation. This patch reduced the run-to-run churn (flakiness) in SE issues from 80-90 to 30-40 (out of 78K) in our CSA deployment (in our setting, flaky issues are mostly due to Z3 refutation instability). Note that most of the issue churn (flakiness) is caused by the mentioned Z3 refutation. With Z3 refutation disabled, issue churn goes down to ~10 issues out of 83K, and this patch has no effect on appearing/disappearing issues between runs. It does, however, seem to reduce the volatility of the execution flow: before, we had 40-80 issues with changed execution flow; after, 10-30. Importantly, this change is necessary for the next step in stabilizing analysis results by caching Z3 query outcomes between analysis runs (work in progress). Across our admittedly noisy CI runs, I detected no significant effect on memory footprint or analysis time. This PR reapplies llvm#121551 with a fix for a g++ compiler error reported on some build bots. CPP-5919
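The idea of keying containers on allocation-order IDs rather than on pointer values can be sketched as follows. This is only an illustration: `SymSketch`, `CompareByID` and `ConstraintMapSketch` are hypothetical stand-ins, and the sketch uses `std::map` rather than the analyzer's own container types.

```cpp
#include <cassert>
#include <map>
#include <vector>

// Hypothetical stand-in for a symbolic expression carrying a stable ID.
struct SymSketch {
  unsigned ID;
};

// Order keys by ID, not by address, so iteration order does not depend on
// where the allocator (or ASLR) happened to place each object.
struct CompareByID {
  bool operator()(const SymSketch *L, const SymSketch *R) const {
    assert(L && R && "null symbols are not valid keys");
    return L->ID < R->ID;
  }
};

using ConstraintMapSketch = std::map<const SymSketch *, int, CompareByID>;

// Collects the IDs in the order the map iterates over its keys.
inline std::vector<unsigned> iterationOrder(const ConstraintMapSketch &M) {
  std::vector<unsigned> IDs;
  for (const auto &Entry : M)
    IDs.push_back(Entry.first->ID);
  return IDs;
}
```

With this comparator, two runs that allocate the symbols in the same order visit the map entries in the same order, even if the addresses differ between runs.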
…21338) Inlining must be disabled for new-ZT0 callees as the callee is required to save ZT0 and toggle PSTATE.ZA on entry.
llvm#120104) This combine pattern performs the transformations below: `fmul x, select(y, A, B) -> fldexp (x, select i32 (y, a, b))` and `fmul x, select(y, -A, -B) -> fldexp ((fneg x), select i32 (y, a, b))`, where A=2^a and B=2^b, and a and b are integers. It is a follow-up PR to implement the above combine for GlobalISel, as the corresponding DAG combine has already been done for SelectionDAG ISel (llvm#111109)
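The arithmetic identity behind this combine can be checked on scalars. The sketch below is a host-side model only, not the GlobalISel implementation; the function names are hypothetical, and `std::frexp`/`std::ldexp` stand in for the power-of-two test and the `fldexp` operation.

```cpp
#include <cassert>
#include <cmath>

// Returns a such that V == 2^a. Asserts that V is a positive power of two,
// mirroring the precondition A = 2^a, B = 2^b of the combine.
inline int exactLog2(double V) {
  int Exp = 0;
  double Mant = std::frexp(V, &Exp); // V == Mant * 2^Exp, Mant in [0.5, 1)
  assert(Mant == 0.5 && "combine only applies to power-of-two constants");
  return Exp - 1;
}

// Model of: fmul x, select(y, A, B) -> fldexp(x, select(y, a, b)).
inline double fmulSelectAsLdexp(double X, bool Y, double A, double B) {
  return std::ldexp(X, Y ? exactLog2(A) : exactLog2(B));
}

// Model of: fmul x, select(y, -A, -B) -> fldexp(fneg x, select(y, a, b)).
// A and B are passed as their positive magnitudes.
inline double fmulSelectNegAsLdexp(double X, bool Y, double A, double B) {
  return std::ldexp(-X, Y ? exactLog2(A) : exactLog2(B));
}
```

For example, with A = 8 and B = 0.5 the exponents are 3 and -1, so the model scales `x` by 2^3 or 2^-1 depending on the condition, which matches multiplying by the selected constant.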
Matches the existing horizontal-add tests, with the additional non-commutable constraint
llvm#118636) Fixes llvm#117975, a regression introduced by llvm#112521 due to forgetting to check for `nullptr` before dereferencing in `CallExpr::getUnusedResultAttr`.
Also move the -fno-wrapv option definition next to the -fwrapv one while here.
…g model (llvm#122007) According to llvm-exegesis, they should have around 2 cycles of latency on P400 cores.
…sted access (llvm#119102) Now that we are accepting commit access requests via GitHub issues, we can keep track of who has recently requested access.
…122003) Case analysis:
* EEW=SEW*2, getEMULEqualsEEWDivSEWTimesLMUL(EEW) returns 2 x VLMUL
* EEW=SEW, getEMULEqualsEEWDivSEWTimesLMUL(EEW) returns VLMUL
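The two cases follow from the relation the helper's name spells out, EMUL = (EEW / SEW) * LMUL. A small model of that arithmetic is sketched below; `computeEMUL` and its fraction-based signature are assumptions made for illustration and are not the LLVM helper itself.

```cpp
#include <cassert>
#include <numeric>
#include <utility>

// Hypothetical model: returns EMUL = (EEW / SEW) * LMUL as a reduced
// fraction {numerator, denominator}. LMUL is passed as LmulNum / LmulDen so
// that fractional LMUL values (such as 1/2) can be expressed.
inline std::pair<unsigned, unsigned> computeEMUL(unsigned EEW, unsigned SEW,
                                                 unsigned LmulNum,
                                                 unsigned LmulDen) {
  assert(SEW != 0 && LmulDen != 0 && "SEW and LMUL denominator must be nonzero");
  unsigned Num = EEW * LmulNum;
  unsigned Den = SEW * LmulDen;
  unsigned G = std::gcd(Num, Den); // C++17
  return {Num / G, Den / G};
}
```

With EEW = 2 * SEW the result is twice LMUL, and with EEW = SEW it is LMUL unchanged, matching the case analysis above.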
This adds a new main command-line entry point for hdrgen, in the new main.py. This new interface is used for generating a header. The old ways of invoking yaml_to_classes.py for other purposes are left there for now, but `--e` is renamed to `--entry-point` for consistency with the new CLI. The YAML schema is expanded with the `header_template` key where the corresponding `.h.def` file's path is given relative to where the YAML file is found. The build integration no longer gives the `.h.def` path on the command line. Instead, the script now emits a depfile that's used by the cmake rules to track that. The output file is always explicit in the script command line rather than sometimes being derived from a directory path.
We are missing MSVC C++ functions since the name is quoted in the LLVM IR, so we don't find them in the generated IR and therefore don't add the test checks. Additionally, there is an issue with finding functions using NEON types (see llvm#121800). Pull Request: llvm#121976
Since the files have been reorganized, the readme is out of date. This patch updates it to be more accurate.
Optionally (by default) no longer mark callsite nodes as Recursive, which means they would be automatically skipped during cloning. This was too conservative as it prevents cloning of any callsite that showed up in any recursive cycle, even for non-recursive contexts. While this will enable partial cloning of recursive contexts, the recursive calls themselves will not be updated to call the correct clone, possibly leading to some unnecessary but benign cloning and affecting bytes hinted reporting. To prevent this, optional support looks for recursive cycles in contexts during cloning and removes those contexts from cloning. This requires some additional runtime overhead, so is disabled by default for now. Support for correct cloning of recursive cycles is WIP.
…dBundle. (llvm#121846) Explicitly disable the copy constructor and copy assignment for SchedBundle to avoid accidental use of the default versions, which do not handle copies of Nodes properly. A developer will need to implement them once they are required.
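The pattern of explicitly deleting the copy operations looks roughly like the sketch below. `BundleSketch` is a hypothetical class used only for illustration; it is not the real SchedBundle definition.

```cpp
#include <cstddef>
#include <type_traits>
#include <vector>

// Hypothetical stand-in for a class that owns per-node state.
class BundleSketch {
  std::vector<int> Nodes;

public:
  BundleSketch() = default;

  // Copying is disabled until a correct copy of Nodes is implemented, so an
  // accidental copy becomes a compile error instead of a silent bug.
  BundleSketch(const BundleSketch &) = delete;
  BundleSketch &operator=(const BundleSketch &) = delete;

  std::size_t size() const { return Nodes.size(); }
};

static_assert(!std::is_copy_constructible<BundleSketch>::value,
              "copy construction must be rejected at compile time");
static_assert(!std::is_copy_assignable<BundleSketch>::value,
              "copy assignment must be rejected at compile time");
```

Note that declaring the copy operations, even as deleted, also suppresses the implicitly generated move operations, so such a class is neither copyable nor movable until someone adds those members.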
…#121861) This patch introduces a new method: `void Vectorizer::mergeEquivalenceClasses(EquivalenceClassMap &EQClasses) const;` The method is called at the end of `Vectorizer::collectEquivalenceClasses()` and is needed to merge equivalence classes that differ only by their underlying objects (UO1 and UO2), where UO1 is the 1-level-indirection underlying base for UO2. This situation arises due to the limited lookup depth used during the search of underlying bases with `llvm::getUnderlyingObject(ptr)`. Using any fixed lookup depth can result in the creation of multiple equivalence classes that differ only by 1-level-indirection bases. The new approach merges equivalence classes if they have adjacent bases (1-level indirection). If a series of equivalence classes forms a ladder of 1-step/level indirections, they are all merged into a single equivalence class. This provides more opportunities for the load-store vectorizer to generate better vectors. --------- Signed-off-by: Klochkov, Vyacheslav N <[email protected]>
Add a new file to the module map and remove 2 missing files (migrated from .def to .td).
The `ptx_kernel` calling convention is a more idiomatic and standard way of specifying an NVPTX kernel than using the metadata, which is not supposed to change the meaning of the program. Further, checking the calling convention is significantly faster than traversing the metadata, improving compile time. This change updates the clang and mlir frontends as well as the NVPTXCtorDtorLowering pass to emit kernels using the calling convention. In addition, this updates all NVPTX unit tests to use the calling convention as well.
Custom lowering for s32 G_ADD/SUB to better match SelectionDAG. Specifically, for RV64 an s32 operation is produced as an add followed by a sext of the output; this allows fewer instructions to sign-extend in a couple of patterns. It also allows the generation of addiw, subw, and negw, reducing the number of instructions required to load values. Log2_ceil_i32 in rvzbb.ll shows a more obvious improvement case.
…ug output. ORC and JITLink debugging output write to the dbgs() raw_ostream, which isn't thread-safe. Use -num-threads=0 to force single-threaded linking for tests that produce debugging output. The llvm-jitlink tool is updated to suggest -num-threads=0 when debugging output is enabled.
…122030) Add mask store to getOperandInfo since it has the same behavior.
…lvm#120667) As mentioned in llvm#118989, all sanitizers but tsan are converted to just a module pass for easier maintenance. This patch removes the TySan function pass, converting TySan from a function+module pass to just a module pass.
…e. NFC Test was added while llvm#121587 was in review.
The implementation of ParentMap assumes that the key is absent if it is mapped to nullptr. This breaks when trying to store a tuple as the value type. Remove this assumption by explicit uses of `try_emplace()`.
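The difference between a sentinel-based absence check and `try_emplace()` can be sketched with standard containers. This is an illustration only: `ParentInfo`, `ParentMapSketch`, `recordParent` and `getParent` are hypothetical names, and `std::map` stands in for the map type the real ParentMap uses.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <tuple>

// Hypothetical value type: a tuple has no single "null" state that could
// double as an "absent" marker.
using ParentInfo = std::tuple<const char *, int>;
using ParentMapSketch = std::map<std::string, ParentInfo>;

// Returns true if the entry was newly inserted. try_emplace reports presence
// through its bool result, so no sentinel value is needed to mean "absent",
// and an existing entry is never overwritten.
inline bool recordParent(ParentMapSketch &Parents, const std::string &Child,
                         const char *Parent, int Depth) {
  return Parents.try_emplace(Child, Parent, Depth).second;
}

// Looks up an entry that is expected to exist.
inline const ParentInfo &getParent(const ParentMapSketch &Parents,
                                   const std::string &Child) {
  auto It = Parents.find(Child);
  assert(It != Parents.end() && "child has no recorded parent");
  return It->second;
}
```

An entry whose first tuple element is a null pointer is still a present entry here, which is the case that a "mapped to nullptr means absent" assumption gets wrong.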
This patch extends the MachO linker's map file generation to include branch extension thunk symbols. Previously, thunks were omitted from the map file, making it difficult to understand the final layout of the binary, especially when debugging issues related to long branch thunks. This change ensures thunks are included and correctly interleaved with other symbols based on their address, providing an accurate representation of the linked output.
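Interleaving thunks with other symbols by address amounts to merging two address-sorted sequences. The sketch below shows that idea in isolation; `MapEntrySketch`, `interleaveByAddress` and the example symbol names are hypothetical and are not taken from the linker's map-file writer.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <iterator>
#include <string>
#include <vector>

// Hypothetical map-file entry: an address plus a symbol name.
struct MapEntrySketch {
  std::uint64_t Addr;
  std::string Name;
};

inline bool byAddress(const MapEntrySketch &L, const MapEntrySketch &R) {
  return L.Addr < R.Addr;
}

// Merges two address-sorted lists so that thunks appear between the regular
// symbols at the position their address dictates.
inline std::vector<MapEntrySketch>
interleaveByAddress(const std::vector<MapEntrySketch> &Symbols,
                    const std::vector<MapEntrySketch> &Thunks) {
  assert(std::is_sorted(Symbols.begin(), Symbols.end(), byAddress));
  assert(std::is_sorted(Thunks.begin(), Thunks.end(), byAddress));
  std::vector<MapEntrySketch> Out;
  Out.reserve(Symbols.size() + Thunks.size());
  std::merge(Symbols.begin(), Symbols.end(), Thunks.begin(), Thunks.end(),
             std::back_inserter(Out), byAddress);
  return Out;
}
```

A thunk placed at an address between two functions then shows up between them in the merged list, which is the layout a reader of the map file expects.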
…y after internal-externc-isystem when nostdlibinc is used (llvm#122035) Embedded development often needs to use a different C standard library, replacing the existing one normally passed as -internal-externc-isystem. This works fine for an apple-macos target, but apple-none-macho doesn't work because the MachO driver doesn't implement AddClangSystemIncludeArgs to add the resource directory as -internal-isystem like most other drivers do. Move most of the search path logic from Darwin and DarwinClang down into an AppleMachO toolchain between the MachO and Darwin toolchains. Also define __MACH__ for apple-none-macho, as Swift expects all MachO targets to have that defined.
This fixes a bug that slipped into llvm#121736.
…#120058) This just copies the same conservative definition from mayWriteToMemory, and enables more VPInstructions to be hoisted out in LICM. I think this should give more accurate costs, and I was able to build llvm-test-suite without the legacy-vplan cost model assertion going off.
Following llvm#120380, `err_pack_expansion_length_conflict` has one close paren too many. Remove the extra parenthesis.
Following on from llvm#115200, disallow the null sgpr as a resource operand in some instructions that were missed.
These can generally be emitted using an ext instruction or a mov from the high half. The high-half extracts can be free depending on the users, but that is not handled here, just the basic costs. It originally included all subvector extracts, but that was toned down to just half-vector extracts to try to help the mid end avoid breaking up high/low extracts without having the SLP vectorizer create a mess using other shuffles.
The patch llvm#102460 already implements separate DT/LI/SE for the parallel subfunction. Crashes have been reported when the region generator tries to use the original function's DT while creating a new parallel subfunction, due to checks in llvm#101198. This patch aims to fix those cases by switching the DT/LI while generating the parallel function using the Region Generator. Fixes llvm#117877
…121936) Bolt makes use of add_llvm_library and as such ends up exporting its libraries from LLVMExports.cmake, which is not correct. Bolt doesn't have its own exports file, and I assume that there is no desire to have one either -- Bolt libraries are not intended to be consumed as a cmake module, right? As such, this PR adds a NO_EXPORT option to simply exclude these libraries from the exports file.
No description provided.