Updated 2023-01-24

Open XDMoD Guide for PACE

Introduction

Open XDMoD was created by the University at Buffalo Center for Computational Research as part of the XSEDE project. The project's main site and documentation are available at Open XDMoD (https://open.xdmod.org).

About

Open XDMoD is an open source tool to facilitate the management of high performance computing resources. It is widely deployed at academic, industrial and governmental HPC centers. Open XDMoD’s management capabilities include monitoring standard metrics such as utilization, providing quality of service metrics designed to proactively identify underperforming system hardware and software, and reporting job level performance data for every job running on the HPC system without the need to recompile applications. Open XDMoD is designed to meet the following objectives: (1) provide the user community with a tool to more effectively and efficiently use their allocations and optimize their use of HPC resources, (2) provide operational staff with the ability to monitor, diagnose, and tune system performance as well as measure the performance of all applications running on their system, (3) provide software developers with the ability to easily obtain detailed analysis of application performance to aid in optimizing code performance, (4) provide stakeholders with a diagnostic tool to facilitate HPC planning and analysis, and (5) provide metrics to help measure scientific impact. In addition, analyses of the operational characteristics of the HPC environment can be carried out at different levels of granularity, including job, user, or on a system-wide basis.

The Open XDMoD portal provides a rich set of features accessible through an intuitive graphical interface, which is tailored to the role of the user. Metrics provided include: number of jobs, CPU hours consumed, wait time, and wall time, with minimum, maximum and the average of these metrics, in addition to many others. Metrics are organized by a customizable hierarchy appropriate for your organization.

-- Open XDMoD (https://open.xdmod.org)

The PACE XDMoD Site is available to all GT users. You must be connected to the VPN or on the GT Campus Network to access it.

Getting Help

Detailed usage information is available in the PACE XDMoD User Manual. You may also contact us directly by email at pace-support@oit.gatech.edu.

Using XDMoD

The Open XDMoD site allows you to view statistics for PACE resources from 2015 to the present.

Picture of XDMoD main site

The main page displays a summary of statistics for a particular time period, which can be changed via the calendar drop-down.

Calendar Dropdown highlight

Warning

When comparing XDMoD CPU time to PACE Statements, note that Total CPU Time in XDMoD is computed as Walltime * nCPUs, so it counts allocated CPUs for the full duration of a job, while the "Total CPU-Time" in Statements does not include CPU idle time. Minor differences can also arise from where the boundaries of the aggregation unit fall, and from the fact that XDMoD ignores our local timezone settings.
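
The difference described in the warning can be illustrated with a small worked example. This is only a sketch: the function names, the job size, and the busy fraction are hypothetical values chosen for illustration, and the exact way PACE Statements measure CPU time is not specified here.

```python
def xdmod_cpu_hours(walltime_hours: float, n_cpus: int) -> float:
    """CPU hours in the XDMoD sense: walltime multiplied by allocated CPUs."""
    return walltime_hours * n_cpus


def busy_cpu_hours(walltime_hours: float, n_cpus: int, busy_fraction: float) -> float:
    """CPU hours excluding idle time, for an assumed busy fraction (0.0 to 1.0)."""
    if not 0.0 <= busy_fraction <= 1.0:
        raise ValueError("busy_fraction must be between 0 and 1")
    return walltime_hours * n_cpus * busy_fraction


# Hypothetical job: 4 CPUs held for 2 hours, with the CPUs busy 75% of the time.
xdmod_total = xdmod_cpu_hours(2.0, 4)
statement_like = busy_cpu_hours(2.0, 4, 0.75)

print(f"XDMoD-style CPU hours: {xdmod_total}")
print(f"CPU hours excluding idle time: {statement_like}")
```

Under these assumed numbers the two figures differ by the idle portion of the allocation, which is why a job that leaves CPUs idle shows more CPU time in XDMoD than in a figure that excludes idle time.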

More detailed statistics can be explored under the Usage tab, allowing you to view fine-grained details by resource, department, and time period.

Usage tab location

Personalizing XDMoD

You can log in with your GT credentials using the Sign In button at the top left.

XDMoD login button location

Select "Sign in with Partnership for an Advanced Computing Environment" to continue with your GT credentials.

XDMoD 2nd login button location

Signing in unlocks the Metric Explorer, Data Export, Report Generator, and Job Viewer tabs. The Metric Explorer and Report Generator allow you to create scheduled reports, which can be emailed to you automatically at regular intervals.

Additional tabs available after sign-on

These tabs are described in detail in the XDMoD User Manual.