forked from llvm/llvm-project
[AutoBump] Merge with 11819214 (Jan 17) (48) #519
Open
jorickert wants to merge 140 commits into bump_to_f9a80062 from bump_to_11819214
…122785) This commit improves the memory efficiency of the lld-macho linker by optimizing how thunks are printed in the map file. Previously, merging vectors of input sections required creating a temporary vector, which increased memory usage and in some cases caused the linker to run out of memory as reported in comments on llvm#120496. The new approach interleaves the printing of two arrays of ConcatInputSection in sorted order without allocating additional memory for a merged array.
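Below is a minimal sketch of the interleaving idea. It assumes both inputs are already sorted by the same key. Two indices walk the arrays, and whichever head comes first is printed, so no merged copy is ever allocated. `Section`, `outSecOff`, and `printInterleaved` are hypothetical names, not lld's actual types or functions.

```cpp
#include <cassert>
#include <cstddef>
#include <ostream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical stand-in for ConcatInputSection, reduced to a sort key and a
// label for printing.
struct Section {
  unsigned long long outSecOff; // assumed sort key: offset in the output
  std::string name;
};

// Prints the sections of `a` and `b`, each already sorted by outSecOff, in
// one globally sorted order without allocating a merged vector.
void printInterleaved(const std::vector<Section> &a,
                      const std::vector<Section> &b, std::ostream &os) {
  std::size_t i = 0, j = 0;
  while (i < a.size() || j < b.size()) {
    // Take from `a` when `b` is exhausted or when `a`'s head is not later.
    bool takeA =
        j == b.size() || (i < a.size() && a[i].outSecOff <= b[j].outSecOff);
    const Section &s = takeA ? a[i++] : b[j++];
    os << s.outSecOff << ' ' << s.name << '\n';
  }
}
```

Ties go to the first array, so the output order stays deterministic if both arrays contain the same key.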
…121028) Problems:
- Cray pointee cannot be used in the DSA list (using one results in a segmentation fault).
- Cray pointer has to be in the DSA list when Cray pointee is used in the default (none) region.

Fix: Added the required semantic checks along with tests.

Reference from the documentation (OpenMP 5.0: 2.19.1):
- Cray pointees have the same data-sharing attribute as the storage with which their Cray pointers are associated.
This commit removes convenience methods from `FloatType` to make it independent of concrete interface implementations. See discussion here: https://discourse.llvm.org/t/rethink-on-approach-to-low-precision-fp-types/82361 Note for LLVM integration: Replace `FloatType::getF32(` with `Float32Type::get(` etc.
When generating a constant vector, if `UseSplat` is false, the indices different from the index of the extract can be filled with `poison` instead of `undef`.
…es (llvm#123022) We missed a case of type constraints referencing deduced template parameters when constructing a deduction guide for the type alias. This patch fixes the issue by swapping the order of constructing 'template arguments not appearing in the type alias parameters' and 'template arguments that are not yet deduced'. Fixes llvm#122134
…entries. x86_64::GOTTableManager and x86_64::PLTTableManager will now look for existing GOT and PLT sections and re-use existing entries if they're present. This will be used for an upcoming MachO patch to enable compact unwind support. This patch is the x86-64 counterpart of 42595bd, which added the same functionality to the GOT and PLT managers for aarch64.
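The reuse behaviour can be modelled as a get-or-create table. The sketch below is a simplified model, not the JITLink API. `GOTTable`, `registerExisting`, and `getOrCreate` are hypothetical, and keying entries by symbol name is for illustration only.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical, simplified GOT table manager: one entry per target symbol.
class GOTTable {
public:
  // Pre-populates the table from an entry found in an existing GOT section,
  // so later requests for the same symbol reuse it.
  void registerExisting(const std::string &sym, std::size_t index) {
    entryIndex.emplace(sym, index);
    if (index >= entries.size())
      entries.resize(index + 1);
    entries[index] = sym;
  }

  // Returns the index of the entry for `sym`, creating one only if needed.
  std::size_t getOrCreate(const std::string &sym) {
    auto it = entryIndex.find(sym);
    if (it != entryIndex.end())
      return it->second; // reuse the existing entry
    std::size_t index = entries.size();
    entries.push_back(sym);
    entryIndex.emplace(sym, index);
    return index;
  }

  std::size_t size() const { return entries.size(); }

private:
  std::unordered_map<std::string, std::size_t> entryIndex;
  std::vector<std::string> entries;
};
```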
Return `poison` for zero-sized types in `isBitwiseValue`.
This patch removes 11 `check_include_file` invocations from the configuration phase of the LLVM subproject on most platforms and hardcodes the results instead. A fallback is kept for platforms that we don't document as supported or that are not detectable via `CMAKE_SYSTEM_NAME`, e.g. z/OS.

This patch reduces configuration time on Linux by about 10%, from 44.7 seconds down to 40.6 seconds on my Debian machine (ramdisk, `cmake -DLLVM_ENABLE_PROJECTS="clang;lldb;clang-tools-extra" -DLLVM_ENABLE_RUNTIMES="libunwind;libcxx;libcxxabi" -DCMAKE_BUILD_TYPE=RelWithDebInfo -DLLVM_OPTIMIZED_TABLEGEN=ON -DLLVM_TARGETS_TO_BUILD="X86" -DLLVM_ENABLE_DOXYGEN=ON -DLLVM_ENABLE_LIBCXX=ON -DBUILD_SHARED_LIBS=ON -DLLDB_ENABLE_PYTHON=ON ~/endill/llvm-project/llvm`).

To determine the values to hardcode, I prepared the following header:

```cpp
#include <dlfcn.h>
#include <errno.h>
#include <fcntl.h>
#include <fenv.h>
#include <mach/mach.h>
#include <malloc/malloc.h>
#include <pthread.h>
#include <signal.h>
#include <sys/ioctl.h>
#include <sys/mman.h>
#include <sys/param.h>
#include <sys/resource.h>
#include <sys/stat.h>
#include <sys/time.h>
#include <sys/types.h>
#include <sysexits.h>
#include <termios.h>
#include <unistd.h>
int main() {}
```

and tried to compile it on the oldest versions of platforms that are still supported (which was sometimes hard to determine): macOS 12, Cygwin, DragonFly BSD 6.4.0, FreeBSD 13.3, Haiku R1 beta 4, RHEL 8.10 as a glibc-based Linux, Alpine 3.17 as musl-based Linux, NetBSD 9, OpenBSD 7.4, Solaris 11.4, and Windows SDK 10.0.17763.0, which corresponds to Windows 10 1809 and is the oldest Windows 10 SDK in Visual Studio Installer.

For platforms I don't have access to, namely AIX 7.2 TL5 and z/OS 2.4.0, I had to rely on the official documentation. I suspect that AIX offers a better set of headers than what this PR claims, so I'm open to input from people who have access to a live system to test it.

Similarly to AIX, I have values for z/OS compiled from the official documentation that are not included in this patch, because apparently upstream CMake doesn't even support z/OS, so I don't know where to put those values. I see `if (ZOS)` in several places across our CMake files, but it's a mystery to me where this variable comes from. Input from people who have access to a live z/OS instance is welcome.
CMAKE_CXX_SIMULATE_ID indicates that the MSVC ABI is usable.
Add a prefix to avoid conflicts; otherwise, the test becomes invalid on regeneration.
The extra tests are simpler for GISel to detect.
This patch is the third step to extend the current multilib system to support the selection of library variants which do not correspond to existing command-line options. The proposal can be found in https://discourse.llvm.org/t/rfc-multilib-custom-flags/81058

The multilib mechanism supports libraries that target code generation or language options such as --target, -mcpu, -mfpu, -mbranch-protection. However, some library variants are particular to features that do not correspond to any command-line options. Examples include variants for multithreading and semihosting. This work introduces a way to instruct the multilib system to consider these features in library selection.

This particular patch comprises the core processing of these flags:
- Custom flags in the command line are read and forwarded to the multilib system. If multiple flag values are present for the same flag declaration, the last one wins. Default flag values are inserted for flag declarations for which no value was given.
- `MacroDefines` are fed back into the driver. Each item `<string>` in the `MacroDefines` list is formatted as `-D<string>`.

Library variants should list their requirement on one or more custom flags like they do for any other flag. The new command-line option is passed as-is to the multilib system; therefore, it should be listed in the format `-fmultilib-flag=<str>`. Moreover, a variant that does not specify a requirement on any particular flag can be matched against any value of that flag.

If the user specifies `-fmultilib-flag=<name>` with a name that is invalid, but close enough to a valid flag value name in terms of edit distance, an error with a suggestion is shown:

```
error: unsupported option '-fmultilib-flag=invalidname'; did you mean '-fmultilib-flag=validname'?
```

The candidate with the smallest edit distance is chosen for the suggestion, up to a certain maximum value (implementation detail), after which an error without a suggestion is shown instead:

```
error: unsupported option '-fmultilib-flag=invalidname'
```
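The suggestion behaviour described above can be approximated with a Levenshtein distance and a cutoff. This is a sketch under assumptions. `editDistance`, `suggestFlagValue`, and the `maxDistance` parameter are hypothetical, and the driver's real threshold is an implementation detail that the patch does not spell out.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <optional>
#include <string>
#include <utility>
#include <vector>

// Classic Levenshtein distance (insert, delete, substitute all cost 1),
// computed with two rolling rows.
std::size_t editDistance(const std::string &a, const std::string &b) {
  std::vector<std::size_t> prev(b.size() + 1), cur(b.size() + 1);
  for (std::size_t j = 0; j <= b.size(); ++j)
    prev[j] = j;
  for (std::size_t i = 1; i <= a.size(); ++i) {
    cur[0] = i;
    for (std::size_t j = 1; j <= b.size(); ++j) {
      std::size_t subst = prev[j - 1] + (a[i - 1] == b[j - 1] ? 0 : 1);
      cur[j] = std::min({prev[j] + 1, cur[j - 1] + 1, subst});
    }
    std::swap(prev, cur);
  }
  return prev[b.size()];
}

// Returns the closest valid value if it is within maxDistance, else nothing
// (the caller then emits the error without a suggestion).
std::optional<std::string>
suggestFlagValue(const std::string &given,
                 const std::vector<std::string> &validValues,
                 std::size_t maxDistance) {
  std::optional<std::string> best;
  std::size_t bestDist = maxDistance + 1;
  for (const std::string &v : validValues) {
    std::size_t d = editDistance(given, v);
    if (d < bestDist) { // strictly smaller: the first of equal candidates wins
      bestDist = d;
      best = v;
    }
  }
  return best;
}
```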
…22734) When passing an instruction with a register mask, the machine copy propagation pass was dropping the information about some copy instructions which define a register which is preserved by the mask, because that register overlaps a register which is partially clobbered by it. This resulted in a miscompilation for AArch64, because this caused a live copy to be considered dead. The fix is to clobber register masks by finding the set of reg units which is preserved by the mask, and clobbering all units not in that set. This is based on llvm#122472, and fixes the compile time performance regressions which were caused by that.
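Below is a simplified model of the fix, loosely patterned on AArch64, where a call may preserve only the low half of a register (D8) while the full register (Q8) is clobbered. The register and unit numbering is hypothetical, as are `RegUnits` and `clobberedUnits`. The point is that clobbered units are computed as the complement of the preserved units, not by walking the registers that the mask does not preserve.

```cpp
#include <bitset>
#include <cassert>
#include <cstddef>
#include <vector>

constexpr std::size_t kNumUnits = 8; // hypothetical register-unit count

// regUnits[r] lists the units that make up register r (hypothetical layout).
using RegUnits = std::vector<std::vector<std::size_t>>;

// Returns the units clobbered by a call whose register mask preserves the
// registers flagged in `preserved`: a unit is clobbered only if no preserved
// register contains it.
std::bitset<kNumUnits> clobberedUnits(const RegUnits &regUnits,
                                      const std::vector<bool> &preserved) {
  std::bitset<kNumUnits> preservedUnits;
  for (std::size_t r = 0; r < regUnits.size(); ++r)
    if (preserved[r])
      for (std::size_t u : regUnits[r])
        preservedUnits.set(u);
  return ~preservedUnits;
}
```

With register 0 as a "D" register made of unit 0 and register 1 as the overlapping "Q" register made of units 0 and 1, preserving only register 0 leaves unit 0 intact. A copy into register 0 therefore stays live, even though register 1 is not preserved.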
…late-names (llvm#123054) Anonymous namespaces are supposed to be optional when looking up types. This was not working in combination with -gsimple-template-names, because the way the code constructed the complete (with template args) name scope (i.e., by generating the scope as a string and then reparsing it) did not preserve the information about the scope kinds.

Essentially, what the code wants here is to call `GetTypeLookupContext` (that's the function used to get the context in the "regular" code path), but to embellish each name with the template arguments (if they don't have them already). This PR implements exactly that by adding an argument to control which kind of names we are interested in.

This should also make the lookup faster, as it avoids parsing the long string, but I haven't attempted to benchmark that. I believe this function can also be used in some other places where we're manually appending template names, but I'm leaving that for another patch.
This patch is the fourth step to extend the current multilib system to support the selection of library variants which do not correspond to existing command-line options. Proposal can be found in https://discourse.llvm.org/t/rfc-multilib-custom-flags/81058 The multilib mechanism supports libraries that target code generation or language options such as --target, -mcpu, -mfpu, -mbranch-protection. However, some library variants are particular to features that do not correspond to any command-line options. Examples include variants for multithreading and semihosting. This work introduces a way to instruct the multilib system to consider these features in library selection. This particular patch updates the documentation.
This commit allows the container to report 3 additional metrics at every sampling event:
- a heartbeat
- the size of the workflow queue (filtered)
- the number of running workflows (filtered)

The heartbeat is a simple metric that lets us monitor the health of the metrics pipeline. Before this commit, new metrics were pushed only when a workflow was completed. This meant we had to wait a few hours before noticing if the metrics container was unable to push metrics.

In addition, this commit adds sampling of the workflow queue size and running count. This should allow us to better understand the load and improve the autoscale values we pick for the cluster.

---------

Signed-off-by: Nathan Gauër <[email protected]>
VTypeAnalysis contains some assertions which can be useful for reasoning that the types of various operands match. This patch teaches VPlanVerifier to invoke VTypeAnalysis to check them, and catches some issues with VPInstruction types that are also fixed here:
* Handles the missing cases for CalculateTripCountMinusVF, CanonicalIVIncrementForPart and AnyOf
* Fixes ICmp and ActiveLaneMask to return i1 (to align with `icmp` and `@llvm.get.active.lane.mask` in the LangRef)

The VPlanVerifier unit tests also need to be fleshed out a bit more to satisfy the stricter assertions.
Also clarify the FIXME: only non-UB metadata should be preserved. Extra tests for llvm#115605.
…3187) Reverts llvm#121996 because it broke an emscripten build with `--target=wasm32-unknown-emscripten`:

```
llvm/llvm-project/llvm/include/llvm/ADT/PointerIntPair.h:172:17: error: static assertion failed due to requirement '3U <= PointerUnionUIntTraits<const mlir::Value *, const mlir::Type *, mlir::OpOperand *, mlir::detail::OpResultImpl *, mlir::Type>::NumLowBitsAvailable': PointerIntPair with integer size too large for pointer
172 | static_assert(IntBits <= PtrTraits::NumLowBitsAvailable,
    | ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
llvm/llvm-project/llvm/include/llvm/ADT/PointerIntPair.h:111:13: note: in instantiation of template class 'llvm::PointerIntPairInfo<void *, 3, llvm::pointer_union_detail::PointerUnionUIntTraits<const mlir::Value *, const mlir::Type *, mlir::OpOperand *, mlir::detail::OpResultImpl *, mlir::Type>>' requested here
111 | Value = Info::updateInt(Info::updatePointer(0, PtrVal),
    | ^
llvm/llvm-project/llvm/include/llvm/ADT/PointerIntPair.h:89:5: note: in instantiation of member function 'llvm::PointerIntPair<void *, 3, int, llvm::pointer_union_detail::PointerUnionUIntTraits<const mlir::Value *, const mlir::Type *, mlir::OpOperand *, mlir::detail::OpResultImpl *, mlir::Type>>::setPointerAndInt' requested here
 89 | setPointerAndInt(PtrVal, IntVal);
    | ^
llvm/llvm-project/llvm/include/llvm/ADT/PointerUnion.h:77:16: note: in instantiation of member function 'llvm::PointerIntPair<void *, 3, int, llvm::pointer_union_detail::PointerUnionUIntTraits<const mlir::Value *, const mlir::Type *, mlir::OpOperand *, mlir::detail::OpResultImpl *, mlir::Type>>::PointerIntPair' requested here
 77 | : Base(ValTy(const_cast<void *>(
    | ^
llvm/llvm-project/mlir/include/mlir/IR/TypeRange.h:49:36: note: in instantiation of member function 'llvm::pointer_union_detail::PointerUnionMembers<llvm::PointerUnion<const mlir::Value *, const mlir::Type *, mlir::OpOperand *, mlir::detail::OpResultImpl *, mlir::Type>, llvm::PointerIntPair<void *, 3, int, llvm::pointer_union_detail::PointerUnionUIntTraits<const mlir::Value *, const mlir::Type *, mlir::OpOperand *, mlir::detail::OpResultImpl *, mlir::Type>>, 4, mlir::Type>::PointerUnionMembers' requested here
 49 | TypeRange(Type type) : TypeRange(type, /*count=*/1) {}
    | ^
llvm/llvm-project/llvm/include/llvm/ADT/PointerIntPair.h:172:25: note: expression evaluates to '3 <= 2'
172 | static_assert(IntBits <= PtrTraits::NumLowBitsAvailable,
    | ~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
1 error generated.
```
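The log boils down to two numbers. A five-member `PointerUnion` needs 3 tag bits, but only 2 low pointer bits are available on this target. The sketch below reproduces that arithmetic with hypothetical helpers (`lowBitsFromAlignment`, `tagBitsNeeded`). It assumes, as the `'3 <= 2'` note suggests, that the least-aligned member type only guarantees 4-byte alignment on wasm32. It does not use LLVM's actual `PointerLikeTypeTraits`.

```cpp
#include <cassert>
#include <cstddef>

// Number of low bits that are always zero in a pointer to an object with
// the given alignment (alignment must be a power of two).
constexpr unsigned lowBitsFromAlignment(std::size_t align) {
  unsigned bits = 0;
  while (align > 1) {
    align >>= 1;
    ++bits;
  }
  return bits;
}

// Bits needed to record which of `numMembers` alternatives a union holds.
constexpr unsigned tagBitsNeeded(unsigned numMembers) {
  unsigned bits = 0;
  while ((1u << bits) < numMembers)
    ++bits;
  return bits;
}

// The five-member union from the log needs 3 tag bits.
static_assert(tagBitsNeeded(5) == 3, "5 alternatives need 3 bits");
// With 8-byte-aligned members the tag fits ...
static_assert(tagBitsNeeded(5) <= lowBitsFromAlignment(8), "3 <= 3");
// ... but with 4-byte-aligned members it does not, matching '3 <= 2'.
static_assert(!(tagBitsNeeded(5) <= lowBitsFromAlignment(4)), "3 <= 2");
```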
This commit moves the implementation of the smoothstep function to the CLC library, whilst optimizing the codegen.

This commit also adds support for 'half' versions of smoothstep, which were previously missing.

The CLC smoothstep implementation now keeps everything in vectors, rather than recursively splitting vectors in half down to the scalar base form. This should result in more optimal codegen across the board.

This commit also removes some non-standard overloads of smoothstep with mixed types, such as 'double smoothstep(float, float, float)'. There aren't any mixed-(element )type versions of smoothstep as far as I can see:

gentype smoothstep(gentype edge0, gentype edge1, gentype x)
gentypef smoothstep(float edge0, float edge1, gentypef x)
gentyped smoothstep(double edge0, double edge1, gentyped x)
gentypeh smoothstep(half edge0, half edge1, gentypeh x)

The CLC library only defines the first type, for simplicity; the OpenCL layer is responsible for handling the scalar/scalar/vector forms. Note that the scalar/scalar/vector forms now splat the scalars to the vector type, rather than recursively splitting vectors as before.

The macro that used to 'vectorize' smoothstep in this way has been moved out of the shared clcmacro.h header, as it was only used for the smoothstep builtin.

Note that the CLC clamp function is now built for both SPIR-V targets. This is to help build the CLC smoothstep function for the Mesa SPIR-V target.
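OpenCL defines smoothstep as t = clamp((x - edge0) / (edge1 - edge0), 0, 1), followed by t * t * (3 - 2 * t). The sketch below uses `std::array` as a hypothetical stand-in for gentype vectors to show the splat strategy. The scalar/scalar/vector overload fills vector edges and forwards to the all-vector form. This is not libclc code.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cstddef>

// Element-wise smoothstep on a fixed-size vector (stand-in for gentype).
template <typename T, std::size_t N>
std::array<T, N> smoothstep(const std::array<T, N> &edge0,
                            const std::array<T, N> &edge1,
                            const std::array<T, N> &x) {
  std::array<T, N> r{};
  for (std::size_t i = 0; i < N; ++i) {
    T t = std::clamp((x[i] - edge0[i]) / (edge1[i] - edge0[i]), T(0), T(1));
    r[i] = t * t * (T(3) - T(2) * t);
  }
  return r;
}

// Scalar/scalar/vector form: splat the scalar edges to the vector type and
// reuse the all-vector version, instead of recursively splitting `x`.
template <typename T, std::size_t N>
std::array<T, N> smoothstep(T edge0, T edge1, const std::array<T, N> &x) {
  std::array<T, N> e0, e1;
  e0.fill(edge0);
  e1.fill(edge1);
  return smoothstep(e0, e1, x);
}
```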
Store the entry symbol in SymbolTable instead of Configuration, as it differs between symbol tables.
The `getChunk` function returns all chunks, not just those specific to a symbol table. Move it out of the `SymbolTable` class to clarify its scope.
Update the test to use use-dereferenceable-at-point-semantics=1. Existing tests are updated with the nofree attribute and a new one has been added showing that the dereferenceable assumption is used after the pointer may be freed.
Also use brace initialization and emplace to avoid explicitly constructing std::pair, and the same for std::tuple.
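Below is a generic illustration of the pattern, not the code touched by the commit. `emplace_back` and `emplace` forward their arguments to the element constructor, and a braced return lets the declared return type do the work. The function names are hypothetical.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <tuple>
#include <utility>
#include <vector>

// Brace initialization: the declared return type already names std::pair,
// so there is no need to spell std::pair<int, std::string>(...) again.
std::pair<int, std::string> makeEntry(int id) { return {id, "entry"}; }

void fill(std::vector<std::pair<int, std::string>> &pairs,
          std::vector<std::tuple<int, char, bool>> &tuples,
          std::map<int, std::string> &byId) {
  // Instead of pairs.push_back(std::pair<int, std::string>(1, "one")):
  pairs.emplace_back(1, "one");
  // Instead of tuples.push_back(std::make_tuple(2, 'b', true)):
  tuples.emplace_back(2, 'b', true);
  // Instead of byId.insert(std::make_pair(3, "three")):
  byId.emplace(3, "three");
}
```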
On Darwin, the --isysroot flag must also be specified. This happens when either %flang or %flang_fc1 is expanded. As -fc1 must be the first argument, %flang_fc1 must be used in tests, instead of %flang -fc1.
InstrMaps is a helper data structure that maps scalars to vectors and the reverse. This is used by the vectorizer to figure out which vectors it can extract scalar values from.
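Below is a minimal sketch of such a two-way map. It uses integer IDs in place of IR values and records a lane index alongside each vector. The description does not say whether the real InstrMaps tracks lanes, so that part is an assumption. `InstrMapsSketch` and its methods are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical stand-ins for IR values; real code would use Value pointers.
using ScalarId = int;
using VectorId = int;

class InstrMapsSketch {
public:
  // Records that `vec` was built from `scalars`, lane by lane.
  void registerVector(const std::vector<ScalarId> &scalars, VectorId vec) {
    for (std::size_t lane = 0; lane < scalars.size(); ++lane)
      scalarToVector[scalars[lane]] = {vec, lane};
    vectorToScalars[vec] = scalars;
  }

  // Forward lookup: which vector (and lane) can this scalar be extracted from?
  std::optional<std::pair<VectorId, std::size_t>>
  getVectorForScalar(ScalarId s) const {
    auto it = scalarToVector.find(s);
    if (it == scalarToVector.end())
      return std::nullopt;
    return it->second;
  }

  // Reverse lookup: the scalars that a vector replaced, or nullptr.
  const std::vector<ScalarId> *getScalarsForVector(VectorId v) const {
    auto it = vectorToScalars.find(v);
    return it == vectorToScalars.end() ? nullptr : &it->second;
  }

private:
  std::unordered_map<ScalarId, std::pair<VectorId, std::size_t>> scalarToVector;
  std::unordered_map<VectorId, std::vector<ScalarId>> vectorToScalars;
};
```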
This patch fixes:

third-party/unittest/googletest/include/gtest/gtest.h:1379:11: error: comparison of integers of different signs: 'const unsigned int' and 'const int' [-Werror,-Wsign-compare]
llvm/include/llvm/Transforms/Vectorize/SandboxVectorizer/InstrMaps.h:57:12: error: unused variable 'Pair' [-Werror,-Wunused-variable]
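Both warnings have common fixes, sketched below with hypothetical helpers. The commit does not say which fix it applied. Inside a template such as gtest's comparison helpers, an `unsigned` compared with an `int` literal triggers -Wsign-compare. A variable read only by `assert` becomes unused when `NDEBUG` is defined.

```cpp
#include <cassert>
#include <map>

// Mimics a gtest-style comparison: inside a template, comparing
// `const unsigned` with `const int` triggers -Wsign-compare.
template <typename T1, typename T2>
bool expectEq(const T1 &lhs, const T2 &rhs) { return lhs == rhs; }

bool checkCount(unsigned count) {
  // expectEq(count, 2) would warn; an unsigned literal keeps both sides
  // unsigned.
  return expectEq(count, 2u);
}

// A variable read only inside assert() is unused when NDEBUG is defined;
// [[maybe_unused]] silences -Wunused-variable in such builds.
void insertUnique(std::map<int, int> &m, int key, int value) {
  [[maybe_unused]] auto Pair = m.insert({key, value});
  assert(Pair.second && "key already present");
}
```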
This patch rewords some of the user expression diagnostics.

- Differentiate between being interrupted and hitting a breakpoint.
- Use "expression execution" to make it more obvious that the diagnostic is associated with the user expression.
- Consistently use a colon instead of semicolons and commas.

rdar://143059974
We're currently excluding Wasm.cpp, because it requires emscripten. When using header modules, Wasm.h gets compiled on its own and it also requires emscripten, so we need to exclude both.
…lvm#122244) The motivation for this is to allow us to match strided accesses that are emitted from the loop vectorizer with EVL tail folding (see llvm#122232) In these loops the step isn't loop invariant and is based off of @llvm.experimental.get.vector.length. We can relax this as long as we make sure to construct the updates after the definition inside the loop, instead of the preheader. I presume the restriction was previously added so that the step would dominate the insertion point in the preheader. I can't think of why it wouldn't be safe to calculate it in the loop otherwise.
…lvm#122459) This avoids some of the pending regressions after AMDGPU implements isExtractVecEltCheap. In a case like shl <value, undef>, splat k, because the second operand was fully defined, we would fall through and use the splat value for the first operand, losing the undef high bits. This would result in an additional instruction to handle the high bits. Add some reduced testcases for different opcodes for one of the regressions.
Once again we have excessive TLI hooks with bad defaults. Permit this for 32-bit element vectors, which are just use-different-register. We should permit 16-bit vectors as cheap with legal packed instructions, but I see some mixed improvements and regressions that need investigation.
This reverts commit 9a6433f. ninja check-flang on x86 host fails to compile.
…vm#122672) This avoids regressions in a future AMDGPU commit. Previously, a build_vector (extract_vector_elt x), undef with free access to the elements would be bloated into a shuffle of one element + undef, which has much worse combine support than the extract. Alternatively, we could check aggressivelyPreferBuildVectorSources, but I'm not sure it's really different from isExtractVecEltCheap.
This showcases a miscompile involving a widened reduction-phi.
AArch64 instructions have a fixed size of 4 bytes, so there is no need to compute it.
The hdrgen output is C, not C++.
C++11 introduced `noexcept`, but `throw()` can be used in older versions of the language.
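One common way to handle both language modes is a macro that expands to `noexcept` in C++11 and later, `throw()` in older C++, and nothing in C. `SKETCH_NOEXCEPT` and `sketch_abs` are hypothetical names. This sketch does not claim to be the macro that the generated libc headers actually use.

```cpp
#include <cassert>

// Hypothetical macro: pick the exception specification the current
// language mode supports.
#if defined(__cplusplus) && __cplusplus >= 201103L
#define SKETCH_NOEXCEPT noexcept
#elif defined(__cplusplus)
#define SKETCH_NOEXCEPT throw()
#else
#define SKETCH_NOEXCEPT /* C has no exception specifications */
#endif

int sketch_abs(int x) SKETCH_NOEXCEPT { return x < 0 ? -x : x; }
```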
…leDeclsByName (llvm#123152) Part of relanding llvm#122887. I split this out to test where the performance regression comes from if modules are not used.
…lvm#87474) The proposed patch, in general, tries to transform the below code sequence:

x = 1.0 / sqrt (a);
r1 = x * x; // same as 1.0 / a
r2 = a / sqrt(a); // same as sqrt (a)

TO (If x, r1 and r2 are all used further in the code)

r1 = 1.0 / a
r2 = sqrt (a)
x = r1 * r2

The transform tries to make high latency sqrt and div operations independent and also saves on one multiplication.

The patch was tested with SPEC17 suite with cpu=neoverse-v2. The performance uplift achieved was:
544.nab_r ~4%

No other regressions were observed. Also, no compile time differences were observed with the patch.

Closes llvm#54652
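The two sequences can be compared directly. The sketch below uses hypothetical `before`, `after`, and `nearlyEqual` helpers to compute both forms and check that they agree within a tight relative tolerance. They are not bit-identical in general, which is why a compiler would normally only perform this rewrite under relaxed floating-point rules.

```cpp
#include <cassert>
#include <cmath>

struct Results { double x, r1, r2; };

// Original sequence: sqrt, a div feeding a mul, and a second div.
Results before(double a) {
  double x = 1.0 / std::sqrt(a);
  double r1 = x * x;            // same as 1.0 / a
  double r2 = a / std::sqrt(a); // same as sqrt(a)
  return {x, r1, r2};
}

// Transformed sequence: the div and sqrt are independent of each other,
// and x is recovered with a single multiply.
Results after(double a) {
  double r1 = 1.0 / a;
  double r2 = std::sqrt(a);
  double x = r1 * r2;
  return {x, r1, r2};
}

// Relative comparison: the two forms may differ by a few ulps of rounding.
bool nearlyEqual(double p, double q) {
  return std::fabs(p - q) <= 1e-12 * std::fabs(p);
}
```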
Pull Request: llvm#123282
…9218) The intention is to use a "copy" instead of a "sub" to handle the high parts of 64-bit multiply for this specific case. This unlocks copy prop use cases where the copy can be reused by later multiply+add sequences if possible. Fixes: SWDEV-487672, SWDEV-487669
) Close llvm#90154

This patch is also an optimization to the lookup process to utilize the information provided by the `export` keyword.

Previously, the `export` keyword only took part in the check phase of lookup; it was not involved in the lookup process itself. That is, in a name lookup for 'name', we would load all declarations with the name 'name' and then check whether these declarations are valid. This works, but it is inefficient, since it may load declarations that are not wanted.

Note that this patch actually uses a trick in the lookup process instead of bringing module information to DeclarationName or considering module information when deciding if two declarations are the same. So it would not surprise me if there are missing cases. But this is not a regression; it should already be the case. Issue reports are welcome.

In this patch, I tried to split the big lookup table into a lookup table as before and a module local lookup table, which takes a combination of the ID of the DeclContext and hash value of the primary module name as the key. I also refactored the `DeclContext::lookup()` method to take the module information, so that a lookup in a DeclContext won't load declarations that are local to **other** modules. I also think it is already beneficial to split the big lookup table, since it may reduce the conflicts during lookups in the hash table.

BTW, this patch introduced a **regression** for a reachability rule in C++20, but it was a false negative. See 'clang/test/CXX/module/module.interface/p7.cpp' for details.

This patch is not expected to introduce any other regressions for non-C++20-modules users, since the module local lookup table should be empty for them.
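Below is a toy model of the split tables. It uses string stand-ins for declarations and `std::hash` for the module-name hash; both are assumptions, since the patch does not specify the hash. `LookupTables` and `lookup` are hypothetical. The key point is that a lookup from module A only probes A's module-local bucket, so declarations local to other modules are never loaded.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <map>
#include <string>
#include <utility>
#include <vector>

using DeclContextID = std::uint32_t;

struct LookupTables {
  // General table keyed by (DeclContext, name), as before.
  std::map<std::pair<DeclContextID, std::string>, std::vector<std::string>>
      general;
  // Module-local table keyed by (DeclContext, hash of primary module name).
  std::map<std::pair<DeclContextID, std::size_t>,
           std::map<std::string, std::vector<std::string>>>
      moduleLocal;

  static std::size_t moduleHash(const std::string &primaryModule) {
    return std::hash<std::string>{}(primaryModule);
  }

  // Lookup from inside `currentModule`: buckets of other modules are never
  // touched, so their module-local declarations are never loaded.
  std::vector<std::string> lookup(DeclContextID dc, const std::string &name,
                                  const std::string &currentModule) const {
    std::vector<std::string> result;
    if (auto it = general.find({dc, name}); it != general.end())
      result = it->second;
    auto local = moduleLocal.find({dc, moduleHash(currentModule)});
    if (local != moduleLocal.end()) {
      if (auto it = local->second.find(name); it != local->second.end())
        result.insert(result.end(), it->second.begin(), it->second.end());
    }
    return result;
  }
};
```

A real implementation keyed on a hash would also have to tolerate collisions between module names; the sketch ignores that.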
The code path has been dead since 2019. See a3eb3d3
The libc headers are C, not C++.
This patch fixes:

llvm/lib/Target/AMDGPU/SIISelLowering.cpp:13908:46: error: comparison of integers of different signs: 'uint32_t' (aka 'unsigned int') and 'int' [-Werror,-Wsign-compare]
Only used by Unix/Program.inc and seems to be always available. Pull Request: llvm#123288
When iterating over function records filtered by file name, the iteration currently goes over all the function records, repeatedly for each source file, which gives essentially quadratic behavior.

413647d sped up some cases by keeping track of the indices of the function records corresponding to each file name. This change expands the use of that map to FunctionRecordIterator.

On a test case with Firefox's libxul.so and a 2.5MB profile, this brings down the runtime of `llvm-cov export $lib --instr-profile $prof -t lcov` from 12 minutes, with 90% spent in skipOtherFiles, to 19 seconds, with no samples in skipOtherFiles at all under a sampling profiler (with a sampling interval of 1ms).

Fixes llvm#62079
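Below is a sketch of the indexing idea using hypothetical types. It builds a file-to-record-indices map once, then answers each per-file query from the map instead of scanning every record. This is not llvm-cov's actual `FunctionRecordIterator`.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical, simplified function record: a name plus the files it covers.
struct FunctionRecord {
  std::string name;
  std::vector<std::string> files;
};

using FileIndex = std::unordered_map<std::string, std::vector<std::size_t>>;

// One pass over all records: map each file name to the indices of the
// records that mention it.
FileIndex buildFileIndex(const std::vector<FunctionRecord> &records) {
  FileIndex index;
  for (std::size_t i = 0; i < records.size(); ++i)
    for (const std::string &f : records[i].files) {
      auto &ids = index[f];
      if (ids.empty() || ids.back() != i) // record each function once per file
        ids.push_back(i);
    }
  return index;
}

// Per-file query: visits only the matching records instead of scanning
// (and skipping) every record for every file.
std::vector<std::string> functionsInFile(const std::vector<FunctionRecord> &records,
                                         const FileIndex &index,
                                         const std::string &file) {
  std::vector<std::string> names;
  auto it = index.find(file);
  if (it == index.end())
    return names;
  for (std::size_t i : it->second)
    names.push_back(records[i].name);
  return names;
}
```

Building the index costs one pass over the records. Each per-file query then costs time proportional to the number of matching records, which removes the quadratic rescan.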
We still have GetDescription and DumpStopContext which serve a similar purpose. (The main reason this is bothering me is because I'm working through the uses of (deprecated) Function::GetAddressRange.)
No description provided.