Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[AutoBump] Merge with 11819214 (Jan 17) (48) #519

Open
wants to merge 140 commits into
base: bump_to_f9a80062
Choose a base branch
from

Conversation

jorickert
Copy link

No description provided.

alx32 and others added 30 commits January 15, 2025 22:58
…122785)

This commit improves the memory efficiency of the lld-macho linker by
optimizing how thunks are printed in the map file. Previously, merging
vectors of input sections required creating a temporary vector, which
increased memory usage and in some cases caused the linker to run out of
memory as reported in comments on
llvm#120496. The new approach
interleaves the printing of two arrays of ConcatInputSection in sorted
order without allocating additional memory for a merged array.
…121028)

Problems:
- Cray pointee cannot be used in the DSA list (If used results in
segmentation fault)
- Cray pointer has to be in the DSA list when Cray pointee is used in
the default (none) region

Fix: Added required semantic checks along the tests

Reference from the documentation (OpenMP 5.0: 2.19.1):
- Cray pointees have the same data-sharing attribute as the storage with
   which their Cray pointers are associated.
This commit removes convenience methods from `FloatType` to make it
independent of concrete interface implementations.

See discussion here:
https://discourse.llvm.org/t/rethink-on-approach-to-low-precision-fp-types/82361

Note for LLVM integration: Replace `FloatType::getF32(` with
`Float32Type::get(` etc.
When generating a constant vector, if `UseSplat` is false, the indices
different from the index of the extract can be filled with `poison`
instead of `undef`.
…es (llvm#123022)

We missed a case of type constraints referencing deduced template
parameters when constructing a deduction guide for the type alias. This
patch fixes the issue by swapping the order of constructing 'template
arguments not appearing in the type alias parameters' and 'template
arguments that are not yet deduced'.

Fixes llvm#122134
…entries.

x86_64::GOTTableManager and x86_64::PLTTableManager will now look for existing
GOT and PLT sections and re-use existing entries if they're present.

This will be used for an upcoming MachO patch to enable compact unwind support.

This patch is the x86-64 counterpart 42595bd, which added the same
functionality to the GOT and PLT managers for aarch64.
Return `poison` for zero-sized types in `isBitwiseValue`.
This patch removes 11 `check_include_file` invocations from
configuration phase of LLVM subproject on most of the platforms,
hardcoding the results. Fallback is left for platforms that we don't
document as supported or that are not detectable via
`CMAKE_SYSTEM_NAME`, e.g. z/OS.

This patch reduces configuration time on Linux by 10%, going from 44.7
seconds down to 40.6 seconds on my Debian machine (ramdisk, `cmake
-DLLVM_ENABLE_PROJECTS="clang;lldb;clang-tools-extra"
-DLLVM_ENABLE_RUNTIMES="libunwind;libcxx;libcxxabi"
-DCMAKE_BUILD_TYPE=RelWithDebInfo -DLLVM_OPTIMIZED_TABLEGEN=ON
-DLLVM_TARGETS_TO_BUILD="X86" -DLLVM_ENABLE_DOXYGEN=ON
-DLLVM_ENABLE_LIBCXX=ON -DBUILD_SHARED_LIBS=ON -DLLDB_ENABLE_PYTHON=ON
~/endill/llvm-project/llvm`).

In order to determine the values to hardcode, I prepared the following
header:
```cpp
#include <dlfcn.h>
#include <errno.h>
#include <fcntl.h>
#include <fenv.h>
#include <mach/mach.h>
#include <malloc/malloc.h>
#include <pthread.h>
#include <signal.h>
#include <sys/ioctl.h>
#include <sys/mman.h>
#include <sys/param.h>
#include <sys/resource.h>
#include <sys/stat.h>
#include <sys/time.h>
#include <sys/types.h>
#include <sysexits.h>
#include <termios.h>
#include <unistd.h>

int main() {}
```
and tried to compile it on the oldest versions of platforms that are
still supported (which was problematic to determine sometimes): macOS
12, Cygwin, DragonFly BSD 6.4.0, FreeBSD 13.3, Haiku R1 beta 4, RHEL
8.10 as a glibc-based Linux, Alpine 3.17 as musl-based Linux, NetBSD 9,
OpenBSD 7.4, Solaris 11.4, Windows SDK 10.0.17763.0, which corresponds
to Windows 10 1809 and is the oldest Windows 10 SDK in Visual Studio
Installer.

For platforms I don't have access to, which are AIX 7.2 TL5 and z/OS
2.4.0, I had to rely on the official documentation. I suspect that AIX
offers a better set of headers than what this PR claims, so I'm open to
input from people who have access to a live system to test it.

Similarly to AIX, I have values for z/OS compiled from the official
documentation that are not included in this patch, because apparently
upstream CMake doesn't even support z/OS, so I don't even know how to
make a place to hold those values. I see `if (ZOS)` in several places
across our CMake files, but it's a mystery to me where this variable
comes from. Input from people who have access to live z/OS instance is
welcome.
CMAKE_CXX_SIMULATE_ID indicates the MSVC abi is usable.
Add a prefix to avoid conflicts, otherwise the test becomes
invalid on regeneration.
The extra tests are simpler for GISel to detect.
This patch is the third step to extend the current multilib system to
support the selection of library variants which do not correspond to
existing command-line options.

Proposal can be found in
https://discourse.llvm.org/t/rfc-multilib-custom-flags/81058

The multilib mechanism supports libraries that target code generation or
language options such as --target, -mcpu, -mfpu, -mbranch-protection.
However, some library variants are particular to features that do not
correspond to any command-line options. Examples include variants for
multithreading and semihosting.

This work introduces a way to instruct the multilib system to consider
these features in library selection. This particular patch is comprised
of the core processing of these flags.

- Custom flags in the command-line are read and forwarded to the
multilib system. If multiple flag values are present for the same flag
declaration, the last one wins. Default flag values are inserted for
flag declarations for which no value was given.
- Feed `MacroDefines` back into the driver. Each item `<string>` in the
`MacroDefines` list is formatted as `-D<string>`.

Library variants should list their requirement on one or more custom
flags like they do for any other flag. The new command-line option is
passed as-is to the multilib system, therefore it should be listed in
the format `-fmultilib-flag=<str>`.

Moreover, a variant that does not specify a requirement on any
particular flag can be matched against any value of that flag.

If the user specifies `-fmultilib-flag=<name>` with a name that is
invalid, but close enough to any valid flag value name in terms of edit
distance, a suggesting error is shown:
```
error: unsupported option '-fmultilib-flag=invalidname'; did you mean '-fmultilib-flag=validname'?
```
The candidate with the smallest edit distance is chosen for the
suggestion, up to a certain maximum value (implementation detail), after
which a non-suggesting error is shown instead:
```
error: unsupported option '-fmultilib-flag=invalidname'
```
…22734)

When passing an instruction with a register mask, the machine copy
propagation pass was dropping the information about some copy
instructions which define a register which is preserved by the mask,
because that register overlaps a register which is partially clobbered
by it. This resulted in a miscompilation for AArch64, because this
caused a live copy to be considered dead.

The fix is to clobber register masks by finding the set of reg units
which is preserved by the mask, and clobbering all units not in that
set.

This is based on llvm#122472, and fixes the compile time performance
regressions which were caused by that.
…late-names (llvm#123054)

Anonymous namespaces are supposed to be optional when looking up types.
This was not working in combination with -gsimple-template-names,
because the way it was constructing the complete (with template args)
name scope (i.e., by generating thescope as a string and then reparsing
it) did not preserve the information about the scope kinds.

Essentially what the code wants here is to call `GetTypeLookupContext`
(that's the function used to get the context in the "regular" code
path), but to embelish each name with the template arguments (if they
don't have them already). This PR implements exactly that by adding an
argument to control which kind of names are we interested in. This
should also make the lookup faster as it avoids parsing of the long
string, but I haven't attempted to benchmark that.

I believe this function can also be used in some other places where
we're manually appending template names, but I'm leaving that for
another patch.
This patch is the fourth step to extend the current multilib system to
support the selection of library variants which do not correspond to
existing command-line options.

Proposal can be found in
https://discourse.llvm.org/t/rfc-multilib-custom-flags/81058

The multilib mechanism supports libraries that target code generation or
language options such as --target, -mcpu, -mfpu, -mbranch-protection.
However, some library variants are particular to features that do not
correspond to any command-line options. Examples include variants for
multithreading and semihosting.

This work introduces a way to instruct the multilib system to consider
these features in library selection. This particular patch updates the
documentation.
This commits allows the container to report 3 additional metrics at
every sampling event:
- a heartbeat
- the size of the workflow queue (filtered)
- the number of running workflows (filtered)

The heartbeat is a simple metric allowing us to monitor the metrics
health. Before this commit, a new metrics was pushed only when a
workflow was completed. This meant we had to wait a few hours
before noticing if the metrics container was unable to push metrics.

In addition to this, this commits adds a sampling of the workflow
queue size and running count. This should allow us to better understand
the load, and improve the autoscale values we pick for the cluster.

---------

Signed-off-by: Nathan Gauër <[email protected]>
VTypeAnalysis contains some assertions which can be useful for reasoning
that the types of various operands match.

This patch teaches VPlanVerifier to invoke VTypeAnalysis to check them,
and catches some issues with VPInstruction types that are also fixed
here:

* Handles the missing cases for CalculateTripCountMinusVF,
CanonicalIVIncrementForPart and AnyOf
* Fixes ICmp and ActiveLaneMask to return i1 (to align with `icmp` and
`@llvm.get.active.lane.mask` in the LangRef)

The VPlanVerifier unit tests also need to be fleshed out a bit more to
satisfy the stricter assertions
Also clarify the FIXME, only none-UB metadata should be preserved.

Extra tests for llvm#115605.
…3187)

Reverts llvm#121996 because it broke an emscripten build
with `--target=wasm32-unknown-emscripten`:

```
llvm/llvm-project/llvm/include/llvm/ADT/PointerIntPair.h:172:17: error: static assertion failed due to requirement '3U <= PointerUnionUIntTraits<const mlir::Value *, const mlir::Type *, mlir::OpOperand *, mlir::detail::OpResultImpl *, mlir::Type>::NumLowBitsAvailable': PointerIntPair with integer size too large for pointer
  172 |   static_assert(IntBits <= PtrTraits::NumLowBitsAvailable,
      |                 ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
llvm/llvm-project/llvm/include/llvm/ADT/PointerIntPair.h:111:13: note: in instantiation of template class 'llvm::PointerIntPairInfo<void *, 3, llvm::pointer_union_detail::PointerUnionUIntTraits<const mlir::Value *, const mlir::Type *, mlir::OpOperand *, mlir::detail::OpResultImpl *, mlir::Type>>' requested here
  111 |     Value = Info::updateInt(Info::updatePointer(0, PtrVal),
      |             ^
llvm/llvm-project/llvm/include/llvm/ADT/PointerIntPair.h:89:5: note: in instantiation of member function 'llvm::PointerIntPair<void *, 3, int, llvm::pointer_union_detail::PointerUnionUIntTraits<const mlir::Value *, const mlir::Type *, mlir::OpOperand *, mlir::detail::OpResultImpl *, mlir::Type>>::setPointerAndInt' requested here
   89 |     setPointerAndInt(PtrVal, IntVal);
      |     ^
llvm/llvm-project/llvm/include/llvm/ADT/PointerUnion.h:77:16: note: in instantiation of member function 'llvm::PointerIntPair<void *, 3, int, llvm::pointer_union_detail::PointerUnionUIntTraits<const mlir::Value *, const mlir::Type *, mlir::OpOperand *, mlir::detail::OpResultImpl *, mlir::Type>>::PointerIntPair' requested here
   77 |         : Base(ValTy(const_cast<void *>(
      |                ^
llvm/llvm-project/mlir/include/mlir/IR/TypeRange.h:49:36: note: in instantiation of member function 'llvm::pointer_union_detail::PointerUnionMembers<llvm::PointerUnion<const mlir::Value *, const mlir::Type *, mlir::OpOperand *, mlir::detail::OpResultImpl *, mlir::Type>, llvm::PointerIntPair<void *, 3, int, llvm::pointer_union_detail::PointerUnionUIntTraits<const mlir::Value *, const mlir::Type *, mlir::OpOperand *, mlir::detail::OpResultImpl *, mlir::Type>>, 4, mlir::Type>::PointerUnionMembers' requested here
   49 |   TypeRange(Type type) : TypeRange(type, /*count=*/1) {}
      |                                    ^
llvm/llvm-project/llvm/include/llvm/ADT/PointerIntPair.h:172:25: note: expression evaluates to '3 <= 2'
  172 |   static_assert(IntBits <= PtrTraits::NumLowBitsAvailable,
      |                 ~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
1 error generated.
```
This commit moves the implementation of the smoothstep function to the
CLC library, whilst optimizing the codegen.

This commit also adds support for 'half' versions of smoothstep, which
were previously missing.

The CLC smoothstep implementation now keeps everything in vectors,
rather than recursively splitting vectors by half down to the scalar
base form. This should result in more optimal codegen across the board.

This commit also removes some non-standard overloads of smoothstep with
mixed types, such as 'double smoothstep(float, float, float)'. There
aren't any mixed-(element )type versions of smoothstep as far as I can
see:

    gentype smoothstep(gentype edge0, gentype edge1, gentype x)
    gentypef smoothstep(float edge0, float edge1, gentypef x)
    gentyped smoothstep(double edge0, double edge1, gentyped x)
    gentypeh smoothstep(half edge0, half edge1, gentypeh x)

The CLC library only defines the first type, for simplicity; the OpenCL
layer is responsible for handling the scalar/scalar/vector forms. Note
that the scalar/scalar/vector forms now splat the scalars to the vector
type, rather than recursively split vectors as before. The macro that
used to 'vectorize' smoothstep in this way has been moved out of the
shared clcmacro.h header as it was only used for the smoothstep builtin.

Note that the CLC clamp function is now built for both SPIR-V targets.
This is to help build the CLC smoothstep function for the Mesa SPIR-V
target.
Store the entry symbol in SymbolTable instead of Configuration, as it
differs between symbol tables.
The `getChunk` function returns all chunks, not just those specific to a
symbol table. Move it out of the `SymbolTable` class to clarify its
scope.
Update the test to use use-dereferenceable-at-point-semantics=1.
Existing tests are updated with the nofree attribute and a new one has
been added showing that the dereferenceable assumption is used after the
pointer may be freed.
Also use brace initialization and emplace to avoid explicitly 
constructing std::pair, and the same for std::tuple.
On Darwin, the --isysroot flag must also be specified. This
happens when either %flang or %flang_fc1 is expanded. As -fc1 must
be the first argument, %flang_fc1 must be used in tests, instead of
%flang -fc1.
slackito and others added 30 commits January 16, 2025 15:24
InstrMaps is a helper data structure that maps scalars to vectors and
the reverse. This is used by the vectorizer to figure out which vectors
it can extract scalar values from.
This patch fixes:

  third-party/unittest/googletest/include/gtest/gtest.h:1379:11:
  error: comparison of integers of different signs: 'const unsigned
  int' and 'const int' [-Werror,-Wsign-compare]

  llvm/include/llvm/Transforms/Vectorize/SandboxVectorizer/InstrMaps.h:57:12:
  error: unused variable 'Pair' [-Werror,-Wunused-variable]
This patch rewords some of the user expression diagnostics.

 - Differentiate between being interrupted and hitting a breakpoint.
- Use "expression execution" to make it more obvious that the diagnostic
is associated with the user expression.
 - Consistently use a colon instead of semicolons and commas.

rdar://143059974
We're currently excluding Wasm.cpp, because it requires emscripten. When
using header modules, Wasm.h gets compiled on its own and it also
requires emscripten, so we need to exclude both.
…lvm#122244)

The motivation for this is to allow us to match strided accesses that
are emitted from the loop vectorizer with EVL tail folding (see llvm#122232)

In these loops the step isn't loop invariant and is based off of
@llvm.experimental.get.vector.length.

We can relax this as long as we make sure to construct the updates after
the definition inside the loop, instead of the preheader.

I presume the restriction was previously added so that the step would
dominate the insertion point in the preheader. I can't think of why it
wouldn't be safe to calculate it in the loop otherwise.
…lvm#122459)

This avoids some of the pending regressions after AMDGPU implements
isExtractVecEltCheap.

In a case like shl <value, undef>, splat k, because the second operand
was fully defined, we would fall through and use the splat value for the
first operand, losing the undef high bits. This would result in an additional
instruction to handle the high bits. Add some reduced testcases for different
opcodes for one of the regressions.
Once again we have excessive TLI hooks with bad defaults. Permit this
for 32-bit element vectors, which are just use-different-register.
We should permit 16-bit vectors as cheap with legal packed instructions,
but I see some mixed improvements and regressions that need investigation.
This reverts commit 9a6433f.  ninja check-flang on x86 host fails to compile.
…vm#122672)

This avoids regressions in a future AMDGPU commit. Previously we
would have a build_vector (extract_vector_elt x), undef with free
access to the elements bloated into a shuffle of one element + undef,
which has much worse combine support than the extract.

Alternatively could check aggressivelyPreferBuildVectorSources, but
I'm not sure it's really different than isExtractVecEltCheap.
This showcases a miscompile involving a widened reduction-phi.
AArch64 instructions have a fixed size 4 bytes, no need to compute.
C++11 introduced `noexcept`, but `throw()` can be used in older
versions of the language.
…leDeclsByName (llvm#123152)

Part for relanding llvm#122887.

I split this to test where the performance regession comes from if
modules are not used.
…lvm#87474)

The proposed patch, in general, tries to transform the below code
sequence:
x = 1.0 / sqrt (a);
r1 = x * x;  // same as 1.0 / a
r2 = a / sqrt(a); // same as sqrt (a)

TO

(If x, r1 and r2 are all used further in the code) 
r1 = 1.0 / a
r2 = sqrt (a)
x = r1 * r2

The transform tries to make high latency sqrt and div operations
independent and also saves on one multiplication.

The patch was tested with SPEC17 suite with cpu=neoverse-v2. The
performance uplift achieved was:
544.nab_r   ~4%

No other regressions were observed. Also, no compile time differences
were observed with the patch.

Closes llvm#54652
…9218)

The intention is to use a "copy" instead of a "sub" to handle the high
parts of 64-bit multiply for this specific case.

This unlocks copy prop use cases where the copy can be reused by later
multiply+add sequences if possible.

Fixes: SWDEV-487672, SWDEV-487669
)

Close llvm#90154

This patch is also an optimization to the lookup process to utilize the
information provided by `export` keyword.

Previously, in the lookup process, the `export` keyword only takes part
in the check part, it doesn't get involved in the lookup process. That
said, previously, in a name lookup for 'name', we would load all of
declarations with the name 'name' and check if these declarations are
valid or not. It works well. But it is inefficient since it may load
declarations that may not be wanted.

Note that this patch actually did a trick in the lookup process instead
of bring module information to DeclarationName or considering module
information when deciding if two declarations are the same. So it may
not be a surprise to me if there are missing cases. But it is not a
regression. It should be already the case. Issue reports are welcomed.

In this patch, I tried to split the big lookup table into a lookup table
as before and a module local lookup table, which takes a combination of
the ID of the DeclContext and hash value of the primary module name as
the key. And refactored `DeclContext::lookup()` method to take the
module information. So that a lookup in a DeclContext won't load
declarations that are local to **other** modules.

And also I think it is already beneficial to split the big lookup table
since it may reduce the conflicts during lookups in the hash table.

BTW, this patch introduced a **regression** for a reachability rule in
C++20 but it was false-negative. See
'clang/test/CXX/module/module.interface/p7.cpp' for details.

This patch is not expected to introduce any other
regressions for non-c++20-modules users since the module local lookup
table should be empty for them.
The code path has been dead since 2019.
See a3eb3d3
The libc headers are C, not C++.
This patch fixes:

  llvm/lib/Target/AMDGPU/SIISelLowering.cpp:13908:46: error:
  comparison of integers of different signs: 'uint32_t' (aka 'unsigned
  int') and 'int' [-Werror,-Wsign-compare]
Only used by Unix/Program.inc and seem always available.

Pull Request: llvm#123288
When iterating over function records, filtered by file name, currently,
the iteration goes over all the function records, repeatedly for each
source file, essentially giving quadratic behavior.

413647d sped up some cases by keeping
track of the indices of the function records corresponding to each file
name. This change expands the use of that map to FunctionRecordIterator.

On a test case with Firefox's libxul.so and a 2.5MB profile, this brings
down the runtime of `llvm-cov export $lib --instr-profile $prof -t lcov`
from 12 minutes with 90% spent in skipOtherFiles to 19 seconds with no
samples in skipOtherFiles at all under a sampling profiler (with a
sampling interval of 1ms).

Fixes llvm#62079
We still have GetDescription and DumpStopContext which serve a similar
purpose.

(The main reason this is bothering me is because I'm working through the
uses of (deprecated) Function::GetAddressRange.)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.