[SYCL][Driver] Enable time tracing capability for SYCL applications.#21207
srividya-sundaram wants to merge 17 commits into intel:sycl
Hi @srividya-sundaram, this is an area I've explored previously. Have you checked the comments at https://reviews.llvm.org/D150282 and https://reviews.llvm.org/D133662, and the GitHub issue llvm/llvm-project#55455? It would be ideal to resolve this in upstream clang, and to do so for all offloading models, not just SYCL.
Hi @Maetveis
Could you please share the usability problems you encountered? Some questions I have are:
Sure :). This was a while ago, and at the time for a different toolchain (AMD's HIP) but I think they mostly still apply.
To frame these a bit more, I think it's useful to think about the following use-cases:
Use-case A: As a developer of the library libFoo, which uses an offloading API for (some of) its sources, I want to analyze the overall build time and look for "hot-spots" where I can reduce it the most. In order to do this, I use tools like ninjatracing and pass
Use-case B: I have identified that the file
The second case is already reasonably well served by what clang can do for
The first case basically breaks down: the level of detail is reduced to the object-file level instead of the fine-grained view we would have without offloading. We don't get any information about which step of the combined offload "compilation" took longest.
In an ideal world, in my opinion, there should be just one trace that includes traces for every step: host and device compilation, and linking too, assuming the linker is capable of producing compatible traces.
I don't think that was an intentional design choice for
There are already separate high-level categories in the traces like "Frontend" and "Backend", I don't see why an additional level of "Offload Host", "Offload Device (nvptx)" etc couldn't be added.
Perfetto is the successor of the chrome-tracing visualizer; it supports binary traces (much smaller sizes) and is designed with multi-process traces in mind.
I think your suggestion improves the status quo for at least the simpler use case, so SGTM. I understand that implementing a single trace is significantly more work, and there might not be a big enough motivation to do that.
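To make the "one trace with an extra offload level" idea concrete, here is a minimal sketch that serializes stages as "complete" events (`"ph":"X"`) in the Chrome trace event JSON format, using the `cat` field for labels such as "Offload Host" or "Offload Device (nvptx)". The `StageEvent` struct and `toTraceJson` function are hypothetical names for illustration only; this is not how clang's time-trace profiler is implemented, and the strings are assumed to need no JSON escaping.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical record for one timed stage; not an LLVM type.
struct StageEvent {
  std::string Name;     // e.g. "Frontend" or "Backend"
  std::string Category; // e.g. "Offload Host", "Offload Device (nvptx)"
  long long StartUs;    // start timestamp in microseconds
  long long DurUs;      // duration in microseconds
};

// Serializes all stages into a single trace object. Every event shares one
// pid/tid here; a real merged trace would likely separate host and device.
// Assumption: Name and Category contain no characters that need escaping.
std::string toTraceJson(const std::vector<StageEvent> &Events) {
  std::string Out = "{\"traceEvents\":[";
  for (std::size_t I = 0; I < Events.size(); ++I) {
    const StageEvent &E = Events[I];
    assert(E.DurUs >= 0 && "durations must be non-negative");
    if (I != 0)
      Out += ",";
    Out += "{\"name\":\"" + E.Name + "\",\"cat\":\"" + E.Category +
           "\",\"ph\":\"X\",\"ts\":" + std::to_string(E.StartUs) +
           ",\"dur\":" + std::to_string(E.DurUs) + ",\"pid\":1,\"tid\":0}";
  }
  return Out + "]}";
}
```

Host and device stages passed in one vector end up in one file, distinguished only by their category label.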
For short-term usability, having separate traces for each compilation (host/targetA/targetB) with unique file names sounds reasonable to me. Having a single time-trace file with all targets embedded when offloading is enabled does make sense, since from a general user's perspective there is one binary generated, at least when generating an object. This of course goes beyond the scope of just modifying the driver. Documentation should be updated in the SYCL space to show which files are expected to be generated.
* Update device trace file's name to add -sycl.
    Args.hasArg(options::OPT_offload_new_driver) &&
    Args.hasArg(options::OPT_ftime_trace, options::OPT_ftime_trace_EQ);

const bool CreatePrefixForHost =
SYCL Offloading Actions: AtTopLevel Behavior
| Invocation Type | Example | Action | AtTopLevel |
|---|---|---|---|
| Compile-only (-c) | `clang++ -fsycl -c sycl-code.cpp -o sycl-code.o` | SYCL host offloading | ✅ true |
| | | SYCL device offloading | ❌ false |
| Compile + Link | `clang++ -fsycl sycl-code.cpp` | Linking action | ✅ true |
| | | SYCL host offloading | ❌ false |
| | | SYCL device offloading | ❌ false |
With -c, the SYCL host offloading action is top-level. Without -c, the linking action is top-level and both SYCL offloading actions are nested.
The current requirement is to generate trace files for both SYCL host compilation and SYCL device compilation, with corresponding offloading filename prefixes:
- Host compilation: input-file-name-host-x86_64-unknown-linux-gnu.json
- Device compilation (SPIR-V targets): input-file-name-sycl-spir64-unknown-unknown.json
In the previous implementation, offloading filename prefixes were only generated for offload actions (host or device) that were not at the top level.
With the new requirements, we need to generate offloading filename prefixes for SYCL host offloading actions even when they are at the top level (i.e., in compile-only mode with -c).
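The rule change described above can be sketched as a small standalone model. `OffloadRole`, `needsPrefixOld`, `needsPrefixNew` and `traceFileName` are hypothetical names that do not exist in clang; the sketch only mirrors the table and the two example filenames, and it deliberately leaves a top-level device action (for example under -fsycl-device-only) unprefixed because the thread does not settle that case.

```cpp
#include <cassert>
#include <string>

// Hypothetical model of an action's offload role; not clang::driver::Action.
enum class OffloadRole { None, Host, Device };

// Previous behavior: prefix only offload actions that are not at the top level.
bool needsPrefixOld(OffloadRole Role, bool AtTopLevel) {
  return Role != OffloadRole::None && !AtTopLevel;
}

// New requirement: also prefix the SYCL host offloading action when it is
// at the top level (the compile-only `-c` case).
bool needsPrefixNew(OffloadRole Role, bool AtTopLevel) {
  return needsPrefixOld(Role, AtTopLevel) ||
         (Role == OffloadRole::Host && AtTopLevel);
}

// Builds "<stem>-<kind>-<triple>.json", where kind is "host" or "sycl".
std::string traceFileName(const std::string &Stem, OffloadRole Role,
                          const std::string &Triple) {
  assert(Role != OffloadRole::None && "only offload actions get a prefix");
  const std::string Kind = (Role == OffloadRole::Host) ? "host" : "sycl";
  return Stem + "-" + Kind + "-" + Triple + ".json";
}
```

For instance, `traceFileName("a", OffloadRole::Device, "spir64-unknown-unknown")` yields `a-sycl-spir64-unknown-unknown.json`, matching the device name used in the driver test.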
What is the impact when using -fsycl-device-only? Do we care about the output file name in that case?
Pull request overview
This PR enables Clang’s driver-side time tracing for SYCL (new offload driver), ensuring -ftime-trace* options are propagated to SYCL host/device compilation jobs and producing distinct JSON outputs to avoid host/device filename collisions.
Changes:
- Add SYCL driver test coverage for -ftime-trace behavior (compile-only, with -dumpdir, and link flow).
- Extend driver logic to generate per-(host/device) SYCL time-trace filenames by incorporating offloading prefixes.
- Adjust offloading prefix creation conditions and time-trace handling to cover SYCL offloading actions.
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 5 comments.
| File | Description |
|---|---|
| clang/test/Driver/sycl-time-trace.cpp | New driver test validating SYCL host/device time-trace propagation and JSON output naming. |
| clang/lib/Driver/Driver.cpp | Driver changes to compute SYCL-specific time-trace filenames and propagate time-trace options through SYCL offload jobs. |
// SYCL-HOST-COMPILE: -cc1{{.*}} "-fsycl-is-host"{{.*}} "-ftime-trace=e/a-host-x86_64-unknown-linux-gnu.json" "-ftime-trace-granularity=0" "-ftime-trace-verbose"

// Verify that the Clang driver generates JSON time-trace output for compile-only
// invocation and propagates the time-trace options, respecting the specified dump directory.
// RUN: %clang -### -fsycl --offload-new-driver -c -ftime-trace -ftime-trace-granularity=0 -ftime-trace-verbose d/a.cpp -dumpdir f/ 2>&1 | FileCheck %s --check-prefixes=SYCL-DEVICE-DUMPDIR,SYCL-HOST-DUMPDIR
// SYCL-DEVICE-DUMPDIR: -cc1{{.*}} "-fsycl-is-device"{{.*}} "-ftime-trace=f/a-sycl-spir64-unknown-unknown.json" "-ftime-trace-granularity=0" "-ftime-trace-verbose"
// SYCL-HOST-DUMPDIR: -cc1{{.*}} "-fsycl-is-host"{{.*}} "-ftime-trace=f/a-host-x86_64-unknown-linux-gnu.json" "-ftime-trace-granularity=0" "-ftime-trace-verbose"

// This test verifies that Clang driver correctly propagates time-trace related options
// during a compile-and-link invocation and enables JSON time-trace output.
// RUN: %clang -### -fsycl --offload-new-driver -ftime-trace=e -ftime-trace-granularity=0 -ftime-trace-verbose d/a.cpp -o f/x -dumpdir f/ 2>&1 | FileCheck %s --check-prefixes=LINK-DEVICE,LINK-CLW
// LINK-HOST: -cc1{{.*}} "-fsycl-is-host"{{.*}} "-ftime-trace=e{{/|\\\\}}a-host-x86_64-unknown-linux-gnu.json" "-ftime-trace-granularity=0" "-ftime-trace-verbose"
This test hardcodes the host triple in expected time-trace filenames (e.g. x86_64-unknown-linux-gnu) but the RUN lines don’t fix the target triple or restrict the test to a specific host. This will fail on non-x86_64 and/or non-Linux bots. Consider either (a) adding --target=x86_64-unknown-linux-gnu and // REQUIRES: system-linux (matching other SYCL new-driver tests), or (b) relaxing the checks to match any host triple with a regex.
Suggested change:
// SYCL-HOST-COMPILE: -cc1{{.*}} "-fsycl-is-host"{{.*}} "-ftime-trace=e/a-host-{{[^"]*}}.json" "-ftime-trace-granularity=0" "-ftime-trace-verbose"
// Verify that the Clang driver generates JSON time-trace output for compile-only
// invocation and propagates the time-trace options, respecting the specified dump directory.
// RUN: %clang -### -fsycl --offload-new-driver -c -ftime-trace -ftime-trace-granularity=0 -ftime-trace-verbose d/a.cpp -dumpdir f/ 2>&1 | FileCheck %s --check-prefixes=SYCL-DEVICE-DUMPDIR,SYCL-HOST-DUMPDIR
// SYCL-DEVICE-DUMPDIR: -cc1{{.*}} "-fsycl-is-device"{{.*}} "-ftime-trace=f/a-sycl-spir64-unknown-unknown.json" "-ftime-trace-granularity=0" "-ftime-trace-verbose"
// SYCL-HOST-DUMPDIR: -cc1{{.*}} "-fsycl-is-host"{{.*}} "-ftime-trace=f/a-host-{{[^"]*}}.json" "-ftime-trace-granularity=0" "-ftime-trace-verbose"
// This test verifies that Clang driver correctly propagates time-trace related options
// during a compile-and-link invocation and enables JSON time-trace output.
// RUN: %clang -### -fsycl --offload-new-driver -ftime-trace=e -ftime-trace-granularity=0 -ftime-trace-verbose d/a.cpp -o f/x -dumpdir f/ 2>&1 | FileCheck %s --check-prefixes=LINK-DEVICE,LINK-CLW
// LINK-HOST: -cc1{{.*}} "-fsycl-is-host"{{.*}} "-ftime-trace=e{{/|\\\\}}a-host-{{[^"]*}}.json" "-ftime-trace-granularity=0" "-ftime-trace-verbose"
Adding --target=x86_64-unknown-linux-gnu makes sense here.
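For comparison, option (b) from the review comment (matching any host triple) can be illustrated outside FileCheck with a plain `std::regex`. This is only an analogy for what a relaxed check accepts and rejects; `isHostTraceArg` is a hypothetical helper, and `Dir` and `Stem` are assumed to contain no regex metacharacters.

```cpp
#include <cassert>
#include <regex>
#include <string>

// Returns true if Arg has the shape
//   -ftime-trace=<Dir>/<Stem>-host-<any triple>.json
// accepting either '/' or '\' as the path separator.
// Assumption: Dir and Stem contain no regex metacharacters.
bool isHostTraceArg(const std::string &Arg, const std::string &Dir,
                    const std::string &Stem) {
  assert(!Stem.empty() && "a trace file needs a base name");
  const std::regex Pattern("-ftime-trace=" + Dir + "[/\\\\]" + Stem +
                           "-host-[^\"]+\\.json");
  return std::regex_match(Arg, Pattern);
}
```

Pinning the triple with --target keeps the expected strings exact, which is the trade-off accepted in the reply above; the relaxed pattern accepts any host triple but would also accept an unexpected one.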
clang/lib/Driver/Driver.cpp
Outdated
// For SYCL device compilation with -c -o and c + l -o
const bool IsSYCLDeviceTimeTrace =
    JA.isDeviceOffloading(Action::OFK_SYCL) && isa<BackendJobAction>(JA) &&
    !AtTopLevel && C.getArgs().hasArg(options::OPT_offload_new_driver) &&
    C.getArgs().hasArg(options::OPT_ftime_trace, options::OPT_ftime_trace_EQ);

if (IsSYCLDeviceTimeTrace) {
  if (Arg *FinalOutput = C.getArgs().getLastArg(options::OPT_o))
    return C.addResultFile(FinalOutput->getValue(), &JA);
  else {
    StringRef BaseName = llvm::sys::path::filename(BaseInput);
    return C.addResultFile(C.getArgs().MakeArgString(BaseName), &JA);
  }
}
GetNamedOutputPath is returning the user-provided -o path for nested SYCL device backend jobs when -ftime-trace is enabled. This changes the actual output file for that backend job (not just the trace filename) and can overwrite the real host object / break the offload pipeline. Instead, keep normal temp/named outputs for the job and derive the time-trace path in handleTimeTrace from -o (or another dedicated “trace basename”) without affecting the job output path.
clang/lib/Driver/Driver.cpp
Outdated
// For SYCL Host compilation with c + l -o
const bool IsSYCLHostTimeTraceNotTopLevel =
    JA.isHostOffloading(Action::OFK_SYCL) && isa<AssembleJobAction>(JA) &&
    !AtTopLevel && C.getArgs().hasArg(options::OPT_offload_new_driver) &&
    C.getArgs().hasArg(options::OPT_ftime_trace, options::OPT_ftime_trace_EQ);

if (IsSYCLHostTimeTraceNotTopLevel) {
  if (Arg *FinalOutput = C.getArgs().getLastArg(options::OPT_o))
    return C.addResultFile(FinalOutput->getValue(), &JA);
  else {
    StringRef BaseName = llvm::sys::path::filename(BaseInput);
    return C.addResultFile(C.getArgs().MakeArgString(BaseName), &JA);
  }
}
Similarly, GetNamedOutputPath returns the user -o value for nested SYCL host assemble jobs when -ftime-trace is enabled. For a compile+link invocation, -o typically names the final executable, so emitting an intermediate object/assembly to that path is incorrect and can clobber the final output. Please avoid special-casing job output paths for time-trace; compute the trace filename separately while leaving intermediate outputs unchanged.
if (llvm::sys::fs::is_directory(Path)) {
  SmallString<128> Tmp(Result.getFilename());
  if (!OffloadingPrefix.empty() &&
      Args.hasArg(options::OPT_offload_new_driver) &&
Why the --offload-new-driver requirement? Adding the prefix should be valid for both the old and the new offloading model when -ftime-trace is used. Using -ftime-trace with -fsycl should work with the old model too.
clang/lib/Driver/Driver.cpp
Outdated
    if (Arg *FinalOutput = C.getArgs().getLastArg(options::OPT__SLASH_o))
      return C.addResultFile(FinalOutput->getValue(), &JA);
  }
  // For SYCL device compilation with -c -o and c + l -o
Maybe spell out "compile and link" here; "c + l" isn't very clear :)
clang/lib/Driver/Driver.cpp
Outdated
const bool IsSYCLHostTimeTraceNotTopLevel =
    JA.isHostOffloading(Action::OFK_SYCL) && isa<AssembleJobAction>(JA) &&
    !AtTopLevel && C.getArgs().hasArg(options::OPT_offload_new_driver) &&
    C.getArgs().hasArg(options::OPT_ftime_trace, options::OPT_ftime_trace_EQ);
What is the expected output name for this JobAction when -ftime-trace is not enabled? It is my understanding that the only thing we should be modifying is the output file for -ftime-trace=file and not modify any of the intermediate files that are generated during the toolchain execution.
- For SYCL non-top-level jobs: derives trace path from -o (with full directory path) or BaseInput
- For all other jobs: uses Result.getFilename() (existing behavior)
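A rough standalone sketch of that derivation, using `std::filesystem` (C++17) rather than the LLVM path utilities, might look like the following. `deriveTracePath` is a hypothetical helper, not the code in Driver.cpp; stripping the extension and appending the offloading prefix before `.json` are assumptions made to match the filenames in the driver test. The point it illustrates is the one raised in review: only the trace path is computed, and the job's own output path is never touched.

```cpp
#include <cassert>
#include <filesystem>
#include <string>

namespace fs = std::filesystem;

// Hypothetical helper: computes the time-trace path only.
// FinalOutput is the value of -o (empty if absent), BaseInput is the source
// file, OffloadingPrefix is e.g. "-sycl-spir64-unknown-unknown" (may be empty).
std::string deriveTracePath(const std::string &FinalOutput,
                            const std::string &BaseInput,
                            const std::string &OffloadingPrefix) {
  assert(!BaseInput.empty() && "an input file is required");
  // Prefer -o with its full directory; otherwise fall back to the input's
  // file name without its directory.
  fs::path Base = FinalOutput.empty() ? fs::path(BaseInput).filename()
                                      : fs::path(FinalOutput);
  Base.replace_extension(); // assumption: drop ".o" / ".cpp" before suffixing
  return Base.generic_string() + OffloadingPrefix + ".json";
}
```

With `-o out/a.o` and a device prefix this yields `out/a-sycl-spir64-unknown-unknown.json`, while the object file itself is still written wherever the driver would normally put it.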

Enable Clang driver to generate JSON time-trace output and propagate time-trace options to the compilation commands.