AMD uProf Usage Guide

AMD uProf is a performance analysis tool for AMD processors and GPUs. It provides insights into application performance, including CPU/GPU usage, memory behavior, and threading efficiency. This guide outlines how to use AMD uProf on RCAC clusters.

Loading on RCAC Clusters

AMD uProf is available on the Gautschi cluster and can be loaded with:

ml amduprof

You can verify installation and version with:

AMDuProfCLI --version

Collecting Performance Data

Similar to other profiling tools, AMD uProf operates in two stages: data collection and analysis. Profiling can be performed via the CLI and later visualized using the GUI or text-based reports.

Basic Syntax

To collect data on an application your_app, use:

AMDuProfCLI collect --config <profile-type> --output-dir </path/to/output-dir> ./your_app [args]

This command collects data using the specified profile type and saves results in the output-dir directory.

Common Profile Types

  • tbp (Time-Based Profiling):
    • Collects periodic samples of function usage.
    • Useful for identifying hotspots and frequently executing functions.
  • hotspots
    • Use this configuration to identify where a program spends its time, along with the associated call stacks.
  • overview
    • Use this configuration to get an overall analysis and to find potential issues for further investigation.
  • threading (Threading Analysis)
    • Use this configuration to get an overall threading analysis and to find potential issues for further investigation.
  • memory (Cache Analysis)
    • Use this configuration to identify false cache-line sharing issues. The profile data is collected using IBS OP.
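
The profile types listed above are passed to the --config option of the collect command. As a minimal sketch, the helper below checks a name against the types named in this guide and prints the matching command line without running it. The function name uprof_collect_cmd is hypothetical and not part of uProf, and other uProf versions may accept additional types, so treat the uProf documentation as the authoritative list.

```shell
#!/bin/sh
# Hypothetical helper (not part of uProf): print the collect command
# for one of the profile types listed in this guide.
uprof_collect_cmd() {
    config="$1"
    outdir="$2"
    shift 2
    case "$config" in
        tbp|hotspots|overview|threading|memory) ;;
        *)
            echo "unknown profile type: $config" >&2
            return 1
            ;;
    esac
    # Remaining arguments are the application and its arguments.
    printf '%s\n' "AMDuProfCLI collect --config $config --output-dir $outdir $*"
}

# Dry run: show the command for a hotspots profile of ./your_app.
uprof_collect_cmd hotspots ./uprof_results ./your_app
```

Printing the command first makes it easy to review before pasting it into a job script or an interactive session.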

Analyzing Performance Data

From the output directory you specified during profiling, you can create a profiling summary directly on the command line with the report option of AMDuProfCLI:

AMDuProfCLI report --input-dir /path/to/output-dir/AMDuProf...MMM-DD-YYYY_hh-mm-ss

This produces a summary CSV file detailing your run.
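
Because each run creates a new timestamped session directory, it can help to pick the newest one automatically. The following is a rough sketch, assuming session directories start with AMDuProf as in the paths shown in this guide. The function name latest_session and the ./uprof_results location are illustrative choices, not uProf features.

```shell
#!/bin/sh
# Hypothetical helper: print the most recently modified session
# directory (name starting with "AMDuProf") under the given directory.
latest_session() {
    newest=""
    for d in "$1"/AMDuProf*/; do
        [ -d "$d" ] || continue
        # -nt compares modification times (supported by bash and dash).
        if [ -z "$newest" ] || [ "$d" -nt "$newest" ]; then
            newest="$d"
        fi
    done
    printf '%s\n' "$newest"
}

session=$(latest_session ./uprof_results)
if [ -n "$session" ] && command -v AMDuProfCLI >/dev/null 2>&1; then
    AMDuProfCLI report --input-dir "$session"
else
    echo "no session directory found, or AMDuProfCLI is not loaded" >&2
fi
```

The report step only runs when a session directory exists and the amduprof module has put AMDuProfCLI on the PATH.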

uProf also offers a graphical user interface (GUI) for analyzing profiling runs directly. Launch it with the AMDuProf command:

AMDuProf

In the GUI, open your results by selecting “Import Session” and choosing the /path/to/output-dir/AMDuProf...MMM-DD-YYYY_hh-mm-ss directory that was generated during your run.

Example

In this simple program, we have introduced several inefficiencies that we may be able to detect through uProf profiling.

#include <iostream>
#include <vector>
#include <chrono>
#include <thread>
#include <cmath>

// Deliberately slow: 10,000 x 10,000 calls to std::sin, recomputing the
// same inner sum for every element of data.
void inefficient_function() {
    std::vector<double> data(10000, 0.0);
    for (int i = 0; i < 10000; ++i) {
        for (int j = 0; j < 10000; ++j) {
            data[i] += std::sin(j * 0.001);
        }
    }
}

int main() {
    auto start = std::chrono::high_resolution_clock::now();
    // Two threads perform identical, independent work.
    std::thread t1(inefficient_function);
    std::thread t2(inefficient_function);
    t1.join();
    t2.join();
    // Idle time: the process sleeps without doing any useful work.
    std::this_thread::sleep_for(std::chrono::seconds(2));
    auto end = std::chrono::high_resolution_clock::now();
    std::chrono::duration<double> diff = end - start;
    std::cout << "Program completed in " << diff.count() << " seconds.\n";
    return 0;
}
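
One possible way to build the example is sketched below. It assumes the source is saved as test.cpp and that g++ is the available compiler; neither is stated in this guide. The -g flag keeps debug symbols so samples can be attributed to functions and source lines, and -pthread enables the threading support that std::thread relies on. Optimization is left at the compiler default here so the deliberately inefficient loops are not optimized away.

```shell
#!/bin/sh
# Assumption: the example above was saved as test.cpp.
SRC=test.cpp

if [ -f "$SRC" ]; then
    # Produces ./test, the binary name used in the collect command.
    g++ -g -pthread -o test "$SRC"
else
    echo "save the example as $SRC first" >&2
fi
```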

After we compile, we can collect a time-based profile of this code with the following command:

AMDuProfCLI collect --config tbp --output-dir ./uprof_results ./test

This saves results to a ./uprof_results/AMDuProf-test-TBP_MMM-DD-YYYY_hh-mm-ss/ directory, which can be analyzed directly with the AMDuProfCLI report option:

AMDuProfCLI report --input-dir ./uprof_results/AMDuProf-test-TBP_Jun-26-2025_11-46-26/

Alternatively, launch the GUI with the AMDuProf command and import the same directory.

After you load the results, the GUI presents an overview summary that breaks down where time is spent.

Navigating to the Analysis tab shows a timeline breakdown of the profiling run.
