499138 Commits

Author SHA1 Message Date
Nicolas Marie
c51af81745 Fix DeviceRTL TeamAllocator.
Make the allocator use dynamic sizes for team/workgroup/thread block and warp/wavefront.
Needed for proper AMD GPU support.
2024-08-30 10:49:42 -07:00
Nicolas Marie
45da2df28b Update python wrapper.
Remove unused tag at build time and remove variable globalization.
2024-08-30 10:48:31 -07:00
Nicolas Marie
5b3c557612 Add "MPI mode" to OpenMP Device Runtime Library.
This changes the number of available hardware threads reported to the user to the size of the OpenMP team.
It is needed for OpenMP applications to work on the GPU with DeviceRTL MPI.
Also add getMaxTeamWarps, which returns the number of active warps/wavefronts available in the current team/workgroup/thread block (taking SPMD mode into account).
2024-08-30 10:46:12 -07:00
Nicolas Marie
daac07a682 Try to add support for RPC vprintf.
Try to add support for vprintf using RPC without implementing variadic arguments.
Use a builtin to transform the fprintf call into a vfprintf call, then replace the vfprintf call with a call to __omp_vprintf.
This is the method currently used to make printf work using RPCs.
fprintf is not recognized as a builtin for now, so it does not work.
2024-08-30 10:44:16 -07:00
Nicolas Marie
e9d0f287b4 Update MPI implementation
- Implement MPI Reduce (for specific types).
- Add MPI reduce to location Data type.
- Implement MPI Broadcast.
- Implement MPI Reduce ALL.
- Move constant definition out of header.
- Implement MPI SendRecv and MPI SendRecv Replace.
- Implement GetCount & Get Version.
- Implement MPI_Alloc & MPI_Free.
- Try to implement a faster parallel memory copy to improve performance to an acceptable level.
2024-08-30 10:39:39 -07:00
Nicolas Marie
f005107d46 Fix HostRPC constant address spaces.
Cast constants declared in address space 1 to address space 0.
This is needed for RPC calls, because a pointer in address space 1
cannot be passed as an argument to a function.
2024-08-30 10:34:28 -07:00
Nicolas Marie
7104affc8e Use HSA_QUEUE_COOPERATIVE for Offload.
Change the Offload HSA queue to use HSA_QUEUE_COOPERATIVE instead of HSA_QUEUE_MULTI.
This is needed to get fair scheduling between teams/workgroups/thread blocks.
(This does not appear to work as specified in HSA-SysArch-1.2)
2024-08-30 10:31:09 -07:00
Nicolas Marie
7a714f7795 Split runtime libs install into multiple directories 2024-08-09 10:12:57 -07:00
Nicolas Marie
6ee95b22e5 Fix segmentation fault from Attributor attributes 2024-08-09 10:09:56 -07:00
Nicolas Marie
762b333c3a Fix issue where, when emitting an OMP dynamic allocation instead of pre-allocated memory, we try to add debug information to an invalid, uninitialized address 2024-08-09 10:01:37 -07:00
Nicolas Marie
13beaff03f Add MPI header include directory to clang-mpi-gpu and allow setting GPU MPI ranks and threads from the environment variables MPI_RANKS and MPI_THREADS 2024-08-09 09:58:59 -07:00
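A minimal sketch of how a launcher might read the MPI_RANKS and MPI_THREADS environment variables named in the commit above. Only the two variable names come from the commit message; the helper names (`envPositiveInt`, `GpuMpiConfig`, `readGpuMpiConfig`) and the fallback value of 1 are assumptions made for illustration and are not taken from clang-mpi-gpu.

```cpp
#include <climits>
#include <cstdlib>

// Hypothetical helper: parse a positive integer from an environment variable.
// Returns `fallback` when the variable is unset, empty, malformed, or out of range.
static int envPositiveInt(const char *name, int fallback) {
  const char *raw = std::getenv(name);
  if (raw == nullptr || *raw == '\0')
    return fallback;
  char *end = nullptr;
  long value = std::strtol(raw, &end, 10);
  if (*end != '\0' || value <= 0 || value > INT_MAX)
    return fallback;
  return static_cast<int>(value);
}

// Hypothetical configuration record for a GPU MPI launch.
struct GpuMpiConfig {
  int ranks;   // number of MPI ranks to emulate on the GPU
  int threads; // number of threads per rank
};

// Assumed default of 1 for both values; the real defaults are not stated
// in the commit message.
static GpuMpiConfig readGpuMpiConfig() {
  return {envPositiveInt("MPI_RANKS", 1), envPositiveInt("MPI_THREADS", 1)};
}
```

With `MPI_RANKS=4 MPI_THREADS=64` set in the environment, `readGpuMpiConfig()` yields 4 ranks and 64 threads; unset or malformed values fall back to the assumed default.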
Nicolas Marie
c7399b2039 Modify RTL to be able to use multi-grid syncs 2024-08-09 09:51:58 -07:00
Nicolas Marie
1ce8887a1a Temporarily remove new operator that conflicts with the standard C++ headers when using host headers with host RPCs 2024-08-09 09:49:59 -07:00
Nicolas Marie
de0a400cdb Clean up HostRPC debug information 2024-08-09 09:47:22 -07:00
Nicolas Marie
dc0bac14f1 continue MPI implementation to run LULESH 2024-08-09 09:45:59 -07:00
Nicolas Marie
c8b441d7c4 Fix allocations and namespace 2024-07-03 15:25:03 -07:00
Nicolas Marie
6b947a0cb3 remove leftover debug 2024-07-03 08:58:32 -07:00
Nicolas Marie
bb9875d4b4 continue implementation of mpi p2p 2024-07-02 17:29:14 -07:00
Nicolas Marie
0ae02a120c mpi p2p communications first try 2024-07-02 09:40:45 -07:00
Nicolas Marie
ad1e11e0d9 Fix GPUFirst RPCs to work when given a pointer
- Fix GPUFirst memory allocator to work with the new offload plugin.
- Fix TeamAllocator to not ignore the first allocation.
2024-06-18 16:42:25 -07:00
Nicolas Marie
3b1aae9380 Use GPUFirst with libc rpc
- Add missing headers in rpc.h.def
- Add an opcode in libc RPC to handle GPUFirst host function calls
- Fix pointer casting
- Fix generated function to account for AMD address space
- Remove duplicate libc FILE declarations
- Remove global variable to allow asynchronous RPC calls
2024-06-18 08:42:58 -07:00
Nicolas Marie
c59cbdebfd Fix wrapper to use -O1 (-O0 is not supported) 2024-06-17 17:46:29 -07:00
Nicolas Marie
87cc6ecc5f Fix libc RPC test 2024-06-17 17:40:54 -07:00
Nicolas Marie
8aa3a4431a Fix Attributor nullptr 2024-06-17 17:39:25 -07:00
Nicolas Marie
1582f964a4 Fix DeviceRTL atomics:
- remove unsupported and unused operations
- add scope awareness to NVIDIA atomicInc
2024-06-17 17:17:40 -07:00
Nicolas Marie
ef5d9941c3 Temporary fix for main function canonicalization error. 2024-06-17 17:13:43 -07:00
Nicolas Marie
b4d5dec977 Fix: remove use of the removed LIBC_HAS_BUILTIN macro 2024-06-17 17:10:41 -07:00
Nicolas Marie
eae9159c1d Stop using outdated LTO pipeline for HostRPC 2024-06-17 17:09:15 -07:00
Nicolas Marie
b553ad7fd7 Replace getRawPointer by emitRawPointer 2024-06-17 17:03:07 -07:00
Nicolas Marie
3b6c5536c4 Remove unused CMakeLists.txt 2024-06-17 16:03:17 -07:00
Nicolas Marie
8980abe311 Remove outdated AutoRPC Folder 2024-06-17 14:57:29 -07:00
Joseph Huber
bf407f0829 [libc] Export the RPC interface from libc
Summary:
This patch adds new extensions that allow us to export the RPC interface
from `libc` to other programs. This should allow external users of the
GPU `libc` to interface with the RPC client (which more or less behaves
like syscalls in this context). This is done by wrapping the interface
into a C-style function call.

Obviously, this approach is far less safe than the carefully crafted C++
interface. For example, we now expose the internal packet buffer, and it
is theoretically possible to open a single port with conflicting opcodes
and break the whole interface. So, extra care will be needed when
interacting with this. However, the usage is very similar as shown by
the new test.

This somewhat stretches the concept of `libc` just doing `libc` things,
but I think this is important enough to justify it. It is difficult to
split this out, as we do not want to have the situation where we have
multiple RPC clients running at one time, so for now it makes sense to
just leave it in `libc`.
2024-05-23 09:36:28 -07:00
Nicolas Marie
2a4a8ef91e Fix compilation issues after rebase of llvm-test-suite-gpu 2024-05-20 14:52:00 -07:00
Nicolas Marie
f1001ba68d Revert "[LTO] Remove Config.UseDefaultPipeline (#82587)"
This reverts commit ec24094b56.
We do need Config.UseDefaultPipeline.
2024-05-20 14:44:46 -07:00
Shilei Tian
1ef92ff738 [OpenMP] Add the initial support for direct gpu compilation
Rebase from llvm-test-suite-gpu, fix rebase conflicts in:

clang/include/clang/Driver/Options.td
clang/lib/CodeGen/CGDecl.cpp
clang/lib/CodeGen/CGOpenMPRuntimeGPU.cpp
clang/lib/Driver/ToolChains/Clang.cpp
clang/tools/clang-linker-wrapper/ClangLinkerWrapper.cpp
llvm/lib/Passes/PassRegistry.def
llvm/lib/Transforms/IPO/AttributorAttributes.cpp
llvm/lib/Transforms/IPO/OpenMPOpt.cpp
openmp/libomptarget/DeviceRTL/src/Mapping.cpp
openmp/libomptarget/DeviceRTL/src/State.cpp
openmp/libomptarget/DeviceRTL/src/Synchronization.cpp
openmp/libomptarget/DeviceRTL/src/Utils.cpp
openmp/libomptarget/DeviceRTL/src/exports
openmp/libomptarget/include/omptarget.h
openmp/libomptarget/src/exports
openmp/libomptarget/src/interface.cpp
2024-05-20 14:44:29 -07:00
Valentin Clement
e90126e0dd [flang][cuf] Add attr gen dependency to fix #92635 2024-05-18 06:36:58 -07:00
Krzysztof Parzyszek
33550b43f4 [mlir] Add operator<< for printing Block (#92550)
Turns out it was already in Analysis/CFGLoopInfo, so just move it
to IR/AsmPrinter.
2024-05-18 08:03:19 -05:00
Shengchen Kan
4b62afca64 [X86][CodeGen] Support flags copy lowering for CCMP/CTEST (#91849)
```
%1:gr64 = COPY $eflags
OP1 may update eflags
$eflags = COPY %1
OP2 may use eflags
```

To use eflags as input at 4th instruction, we need to use SETcc to
preserve the eflags before 2, and update the source condition of OP2
according to value in GPR %1.

In this patch, we support CCMP/CTEST as OP2.
2024-05-18 19:50:16 +08:00
Thorsten Schütt
778826f0b8 [GlobalIsel] Combine select to integer min max more (#92570) 2024-05-18 13:43:10 +02:00
Vlad Serebrennikov
f7b0b99c52 [clang][NFC] Further improvements to const-correctness 2024-05-18 12:10:39 +03:00
Antonio Frighetto
2c2e0507e9 [clang][ThreadSafety] Skip past implicit cast in translateAttrExpr
Ignore `ImplicitCastExpr` when building `AttrExp` for capability
attribute diagnostics.

Fixes: https://github.com/llvm/llvm-project/issues/92118.
2024-05-18 09:49:10 +02:00
Fangrui Song
7b4dfec893 [MCAsmParser] Improve .rept/.irp tests 2024-05-17 23:49:01 -07:00
Kareem Ergawy
2a97b507dc [flang][OpenMP] Try to unify induction var privatization for OMP regions. (#91116) 2024-05-18 08:39:58 +02:00
Michael Klemm
bfeebda3b1 [flang][OpenMP] Re-enable tests when building OpenMP as a runtime (#89046) 2024-05-18 08:25:43 +02:00
Fangrui Song
195ba45721 [MCAsmParser] .macro/.rept/.irp/.irpc: remove excess \n after expansion
```
.irp foo,1
nop
.endr
nop
```

expands to an excess EOL between two nop lines. Other loop directives
and .macro have the same issue.

`Lex()` at "Jump to the macro instantiation and prime the lexer"
requires that there is one single \n token in CurTok. Therefore, we
cannot consume the trailing \n when parsing the macro(-like) body.
(commit c6e787f771 (reverted by
1e5f29af81))

Instead, skip the potential \n after jumpToLoc at handleMacroExit.
2024-05-17 23:02:54 -07:00
jiajie zhang
219476d20f Fix: remove wrongly pushed etime-function.mlir at toplevel (#92634)
The purpose of this PR is to remove the 'etime-function.mlir' file that
I mistakenly committed in
https://github.com/llvm/llvm-project/pull/92571. This file is not
necessary in source code control, and its presence may cause confusion
or misunderstanding.
2024-05-18 13:33:00 +08:00
Aiden Grossman
9e98815ef0 [Github] Revert accidental changes to dependabot config
f3524e9aeb accidentally touched the
dependabot config. This patch reverts that change.
2024-05-18 05:04:59 +00:00
Mircea Trofin
cfe9deb135 Reapply "[ctx_profile] Integration test (#92456)"
This reverts commit 881f20e958.

Passing -ldl -lpthread explicitly
2024-05-17 21:55:39 -07:00
Valentin Clement (バレンタイン クレメン)
702198fc9a [flang][cuda] Add data attribute to program globals (#92610) 2024-05-17 20:56:10 -07:00
Fangrui Song
faf39f45e3 Revert "[LoongArch] Use R_LARCH_ALIGN with section symbol (#84741)"
This reverts commit 01f79899ba.

This unusual special case has been discussed on the binutils mailing
list. The approach will be revisited:
https://sourceware.org/pipermail/binutils/2024-May/134092.html

Pull Request: https://github.com/llvm/llvm-project/pull/92584
2024-05-17 19:58:10 -07:00