Gobbert group notes for chip

2025-05-22 Matthias Gobbert

Questions that we need to understand:

– Job preemption:

First of all, which nodes exactly are in the match partition as opposed to the PI partitions? The aim of this question is to read the output of sinfo correctly, so that I know whether I can start a multiple-node job on match nodes only. (A possible way to check this from the command line is sketched after this list of questions.)

Second, how does preemption work? Will I get a message when it happens? What happens after the node becomes available again? What would happen if I rename the directory while the job is preempted? This could happen if I discover that the job did not finish normally and mess around with it, or if I restart the job myself while the system is trying to restart it.

Third, a more general question: Is preemption really specific to a node? In other words, do I own a particular node or just some node of that type? So, if I request --partition=pi_gobbert and there is an idle node (of the same type) in the match partition, will I get that idle node (instead of actually preempting my own node)?

– Compilers and fundamental software: What compiler suites do we have? GNU C/C++; Intel oneAPI (icx, icpx, and the Fortran compiler ifx), Intel MPI, and MKL.

– If I am using a 2018 compute node for compiling, will the executable be optimized only for 2018 nodes? Or for all? Or do I have to specify the target architecture?

– And we need to understand how to compile and run OpenMP code, both without and with MPI.
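
Regarding the first and third preemption questions above, here is a tentative sketch of slurm commands that should show which nodes belong to which partition and what the preemption settings are; these are standard slurm options, but I have not verified the exact output on chip:

# List each partition of chip-cpu together with its node list (-M is the same as --clusters):
sinfo -M chip-cpu -o "%12P %N"

# Show the full settings of a partition, including PreemptMode and the exact Nodes= list:
scontrol -M chip-cpu show partition match
scontrol -M chip-cpu show partition pi_gobbert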

2025-05-21 Matthias Gobbert

Here is my experience compiling and running C code with MPI on chip-cpu. The code also contains OpenMP multi-threading, but I am not using it here at first; we need to extend this to C with MPI and OpenMP as soon as possible (see the tentative sketch at the end of the Compiling notes below).

Compiling:

– After logging in to chip’s usernode, you need to be on a compute node to compile with the Intel compiler, Intel MPI, or Intel MKL, since the module for the Intel compiler suite needs to be loaded, and loading modules cannot be done on the usernode. I am not aware of any MPI other than the Intel one. I do not know whether the Intel MPI could be used together with the GNU C/C++ compiler.

To find out the exact name of the module to load, do "module spider Intel"; this command can be issued on the usernode or a compute node, but the actual "module load ..." can only be done on a compute node. I do not know whether the type of node that you are on during the compile matters. But here, I show the srun command for an interactive session on a 2024 node:

srun --cluster=chip-cpu --account=pi_gobbert --partition=2024 --qos=shared --time=07:00:00 --mem=16G --pty $SHELL

The output of “module spider Intel” shows several compiler versions. The latest as of today is 2024a, so the load command is

module load intel/2024a

As you can confirm with "which mpiicc", this script (think of it as the Intel MPI compiler wrapper) is now found on the path. It turns out that you still need to tell the wrapper which Intel compiler to use, so also say

export I_MPI_CC=icx

If the C code is in the file nodesused.c, you can then compile by

mpiicc nodesused.c -o nodesused

In my experience so far, an executable compiled in an interactive shell on a 2018 or 2024 node can be run on either a 2018 or a 2024 node.
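
Since we still need to extend this to OpenMP, here is a minimal sketch of what I expect the compile commands to look like, assuming the Intel compilers' -qopenmp flag; the file openmp_only.c is just a hypothetical example and none of this is tested on chip yet:

module load intel/2024a
export I_MPI_CC=icx

# OpenMP only (no MPI), calling the Intel C compiler directly; -qopenmp enables OpenMP:
icx -qopenmp openmp_only.c -o openmp_only

# Hybrid MPI + OpenMP, using the same mpiicc wrapper as above plus -qopenmp:
mpiicc -qopenmp nodesused.c -o nodesused

# At run time, the number of OpenMP threads per process is set by the standard variable
# OMP_NUM_THREADS (to be matched with --cpus-per-task in a slurm script):
export OMP_NUM_THREADS=4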

Running:

– We want to use batch submission with slurm, so the goal is to write a slurm script. Notice first the issue of the MPI run command: in my experience so far, srun gave errors, so I am using mpirun to launch parallel MPI code. You need to have mpirun on the path, that is, "which mpirun" should show you a location; it comes from "module load intel/2024a". So you either have to be on a compute node with that module loaded before submitting the slurm job, or you need to put the module load command into the slurm script itself (right before the mpirun line, for instance); with that line in the script, you can sbatch the job from the usernode. The following is an example of a slurm script that ran on two 2024 nodes via the match partition; its filename run-n2ppn4mpi.slurm indicates that it uses 2 nodes, 4 processes per node, and MPI only. You submit the job by saying

sbatch run-n2ppn4mpi.slurm

at the Linux command-line.

[gobbert@chip nodesused11]$ pwd -P
/umbc/rs/gobbert/common/hpcf/tests_chip-cpu/250515compile/nodesused11
[gobbert@chip nodesused11]$ more run-n2ppn4mpi.slurm
#!/bin/bash

#SBATCH --job-name=nodesused            # Job name
#SBATCH --output=slurm.out              # Output file name
#SBATCH --error=slurm.err               # Error file name
#SBATCH --cluster=chip-cpu              # Cluster
#SBATCH --account=pi_gobbert            # Account
#SBATCH --partition=match               # Partition
#SBATCH --qos=shared                    # Queue
#SBATCH --time=00:05:00                 # Time limit
#SBATCH --nodes=2                       # Number of nodes
#SBATCH --ntasks-per-node=4             # MPI processes per node
#SBATCH --mem=4G                        # Memory for job

unset I_MPI_PMI_LIBRARY                        # do not point Intel MPI at slurm's PMI library when launching with mpirun
export I_MPI_JOB_RESPECT_PROCESS_PLACEMENT=0   # have mpirun do its own process placement instead of slurm's
module load intel/2024a                        # makes mpirun available inside the batch job
mpirun -print-rank-map ./nodesused

This nodesused.c C+MPI+OpenMP program is supposed to report the hostnames of the nodes used, the MPI ranks, the cpu id used by each rank, and the OpenMP threads and their cpu ids. Sample output of a run with the above slurm script reads

[gobbert@chip nodesused11]$ pwd -P
/umbc/rs/gobbert/common/hpcf/tests_chip-cpu/250515compile/nodesused11
[gobbert@chip nodesused11]$ ll
total 240
-rwxrwx--- 1 gobbert pi_gobbert 17504 May 19 15:19 nodesused*
-rw-rw---- 1 gobbert pi_gobbert  3809 Oct 22  2018 nodesused.c
-rw-rw---- 1 gobbert pi_gobbert   440 May 21 14:04 nodesused_cpuid.log
-rw-rw---- 1 gobbert pi_gobbert   320 May 21 14:04 nodesused.log
-rw-rw---- 1 gobbert pi_gobbert   592 May 21 14:03 run-n2ppn4mpi.slurm
-rw-rw---- 1 gobbert pi_gobbert     0 May 21 14:04 slurm.err
-rw-rw---- 1 gobbert pi_gobbert   555 May 21 14:04 slurm.out
[gobbert@chip nodesused11]$ more slurm.out
(c24-36:0,1,2,3)
(c24-37:4,5,6,7)

Hello world from process 0003 out of 0008, processor name c24-36
Hello world from process 0001 out of 0008, processor name c24-36
Hello world from process 0000 out of 0008, processor name c24-36
Hello world from process 0002 out of 0008, processor name c24-36
Hello world from process 0004 out of 0008, processor name c24-37
Hello world from process 0005 out of 0008, processor name c24-37
Hello world from process 0007 out of 0008, processor name c24-37
Hello world from process 0006 out of 0008, processor name c24-37
[gobbert@chip nodesused11]$ more nodesused.log
MPI process 0000 of 0008 on node c24-36
MPI process 0001 of 0008 on node c24-36
MPI process 0002 of 0008 on node c24-36
MPI process 0003 of 0008 on node c24-36
MPI process 0004 of 0008 on node c24-37
MPI process 0005 of 0008 on node c24-37
MPI process 0006 of 0008 on node c24-37
MPI process 0007 of 0008 on node c24-37
[gobbert@chip nodesused11]$ more nodesused_cpuid.log
MPI process 0000 of 0008 on cpu_id 0000 of node c24-36
MPI process 0001 of 0008 on cpu_id 0002 of node c24-36
MPI process 0002 of 0008 on cpu_id 0004 of node c24-36
MPI process 0003 of 0008 on cpu_id 0006 of node c24-36
MPI process 0004 of 0008 on cpu_id 0000 of node c24-37
MPI process 0005 of 0008 on cpu_id 0002 of node c24-37
MPI process 0006 of 0008 on cpu_id 0004 of node c24-37
MPI process 0007 of 0008 on cpu_id 0006 of node c24-37

Some notes about the above: In the slurm script, notice the "mpirun -print-rank-map"; this produces the first two lines (one for each of the two nodes) in the stdout file slurm.out. Notice that the order of the lines in slurm.out is random, while the lines in the log files are ordered. We can confirm that the 2024 nodes c24-36 and c24-37 were used, each with 4 MPI processes running on it. The cpu ids are reported as 0, 2, 4, 6; it is not clear if this is good, since it would appear that these are all on CPU0 of the two CPUs, CPU0 and CPU1, on the node.
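
To follow up on the cpu id question, here is a tentative sketch of how the actual placement could be inspected; lscpu is standard Linux and I_MPI_DEBUG is a documented Intel MPI variable, but I have not run this on chip yet:

# On a compute node: show which cpu ids belong to which socket/NUMA node,
# to check whether 0, 2, 4, 6 really all sit on the first CPU:
lscpu | grep "NUMA node"

# In the slurm script: have Intel MPI print its pinning map at startup
# (I_MPI_DEBUG=4 or higher adds this information to slurm.out):
export I_MPI_DEBUG=4
mpirun -print-rank-map ./nodesused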

Monitoring running jobs:

sinfo, squeue, scancel, scontrol
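
These are the standard slurm tools; a few hedged usage examples (73057 is just a placeholder job id, and I have not verified the exact output on chip):

squeue -M chip-cpu -u $USER                      # list my jobs on the chip-cpu cluster (-M is the same as --clusters)
squeue -M chip-cpu -u $USER -l                   # long format, including the time limit
scontrol -M chip-cpu show job 73057              # full details of one job
scancel -M chip-cpu 73057                        # cancel that job
sinfo -M chip-cpu -o "%12P %.5a %.10l %.6D %N"   # partitions with availability, time limit, node count, nodes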

2025-05-20 Matthias Gobbert

Acknowledgment for using chip in HPCF:

Please do not use the old Acknowledgment text any more (the text that mentioned several MRI grants and one SCREMS grant). Instead, please use the following. Here is a full sentence that would be good for the Acknowledgment section of a paper

We acknowledge the UMBC High Performance Computing Facility (hpcf.umbc.edu) and the financial contributions from NIH, NSF, CIRC, and UMBC for this work.

and a one-line phrase that is appropriate for the title page of a presentation or a poster

Acknowledgments: HPCF, NIH, NSF, CIRC, UMBC

2025-05-15 Matthias Gobbert:

Best sequence of steps to connect to chip:

– Run the GlobalProtect VPN

– Use PuTTY to connect to chip.rs.umbc.edu, or use Windows PowerShell, or use the Mac Terminal or a Linux terminal and ssh to chip.rs.umbc.edu; if a graphics connection is intended, use ssh -Y from these terminals.

– You can edit files, etc., on the user node, and you can submit jobs with a slurm script using sbatch on the user node.

– If you need to load module(s) to work, you need to use a compute node (instead of the user node) by requesting an interactive session and loading the module(s) there. For example, I use the following srun command, which runs a bash shell for up to 7 hours on the match partition of the 2024 CPU nodes (I use 7 hours to cover a typical workday; you can use a shorter time, but you may not want to go much longer for an interactive session):

srun --cluster=chip-cpu --account=pi_gobbert --partition=match --qos=shared --time=7:00:00 --mem=16G --pty $SHELL

– The above does not make a graphics connection. If a graphics connection is intended, get an allocation of a (portion of a) node with X11 tunneling by using salloc first and then ssh-ing to the node. The salloc command is the same as the srun above, but with the --pty $SHELL removed; you need to wait to see which node you get, then use that hostname to ssh to it. Here is a sample session:

[gobbert@chip ~]$ salloc --cluster=chip-cpu --account=pi_gobbert --partition=match --qos=shared --time=7:00:00 --mem=16G
salloc: Granted job allocation 73057
salloc: Nodes c24-29 are ready for job
[gobbert@chip ~]$ ssh c24-29

– On that compute node, to use LaTeX for instance, load the modules for pdflatex and xpdf by

module load texLive/2025
module load xpdf/4.04-GCCcore-12.3.0

Check that pdflatex and xpdf are found:

[gobbert@c24-29 ~]$ which pdflatex
/cm/shared/apps/texLive/2025/bin/x86_64-linux/pdflatex
[gobbert@c24-29 ~]$ which xpdf
/usr/ebuild/installs/software/xpdf/4.04-GCCcore-12.3.0/bin/xpdf

Now you can compile LaTeX .tex files with pdflatex and display the resulting PDF files with xpdf.

– For such a graphics connection, make sure to prepare your Windows side by running XLaunch or XMing or similar.
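
A quick way to check that the X11 forwarding actually works once you are on the compute node (assuming the xpdf module from above is loaded; a sketch, not verified in every setup):

echo $DISPLAY    # should print a forwarded display such as localhost:10.0
xpdf &           # should open an xpdf window on your local screen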

– To see details of available (idle) equipment, use

sinfo -o "%10N %4c %10m %40f %10G"

A sample output of this is

[gobbert@chip ~]$ sinfo -o "%10N %4c %10m %40f %10G"
CLUSTER: chip-cpu
NODELIST   CPUS MEMORY     AVAIL_FEATURES                           GRES
c24-[14-51 64   476837     location=local,low_mem                   (null)
c24-[01-13 64   953674     location=local,high_mem                  (null)
c18-[01,05 36+  182524+    location=local                           (null)

CLUSTER: chip-gpu
NODELIST   CPUS MEMORY     AVAIL_FEATURES                           GRES
g20-[01,03 96   385581     RTX_2080TI,RTX_2080ti,rtx_2080TI,2080,20 gpu:8
g20-[12-13 96   238418     RTX_8000,rtx_8000,8000                   gpu:8
g24-[01-08 32   257443     L40S,l40s,L40s,l40S                      gpu:4
g24-[09-10 32   257443     h100,H100                                gpu:2
g20-[02,04 96   385581     RTX_2080TI,RTX_2080ti,rtx_2080TI,2080,20 gpu:6
g20-[05-11 96   385581     RTX_6000,rtx_6000,6000                   gpu:8

2025-04-17 Matthias Gobbert:
I attended (online) the Getting Started with Chip training.

– They will have standard software for Python like matplotlib, numpy, etc.
–> Which other ones? Can someone find a list of the other ‘standard’ packages that are available?
– You are encouraged to use virtual environments for Python.
–> Is this the correct phrase? Someone with experience with this, can you provide an example? (A tentative sketch follows after this list.)
– They can help via “office hours” [which I understand are best arranged by filing a ticket first and agreeing on when/where to meet, instead of just walking in].
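
As a starting point for the virtual-environment question above, here is a minimal sketch of the standard Python venv workflow; the module name and the paths are only assumptions (check "module spider Python" for what actually exists on chip), and I have not tested this there yet:

module spider Python                   # find the exact name of a Python module (assuming one exists)
module load Python/3.11.3              # hypothetical module name; use whatever module spider reports
python -m venv ~/venvs/myproject       # create the virtual environment (example path)
source ~/venvs/myproject/bin/activate  # activate it; the prompt should change
pip install numpy matplotlib           # packages now install into the environment only
deactivate                             # leave the environment when done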

2025-04-16: I am trying to use this page under CIRC to collect our more detailed and practical information on how to use chip.

(0) Please study the DoIT documentation yourself! Three places:
(a) hpcf.umbc.edu -> Compute tab -> Overview (has a table that describes all portions of chip)
(b) hpcf.umbc.edu -> Compute tab -> slurm:chip-cpu (has a table with the exact list of which nodes c?? are in which partition)
(c) hpcf.umbc.edu -> Compute tab -> User Documentation -> slurm (about in the middle of the table-of-contents in the main part of the screen).

(1) Please confirm that you can log in to chip.rs.umbc.edu, for instance using PuTTY from Windows or a terminal/shell from Mac or Linux.
The new chip cluster lives behind the UMBC firewall, so you must either (i) run the VPN first or (ii) approve a Duo push to log in. I recommend using the VPN, since that takes care of all connection issues, such as connecting from multiple shells, WinSCP, and more.

Note: Instructions and download links for the UMBC GlobalProtect VPN for Windows and macOS can be found here: https://umbc.atlassian.net/wiki/spaces/faq/pages/30754220/Getting+Connected+with+the+UMBC+GlobalProtect+VPN

For Linux users, please refer to the official instructions provided by Palo Alto Networks:
https://docs.paloaltonetworks.com/globalprotect/6-2/globalprotect-app-user-guide/globalprotect-app-for-linux/use-the-globalprotect-app-for-linux

You can download the zipped .tar file for the Linux client here:
https://drive.google.com/file/d/1cKFRjv8bt0JQ0h_eS2kXLQhtQbbDfZs7/view?usp=sharing

 

(2) The home directory may be very bare. In particular, DoIT is not creating symbolic links any more. Enter the command “alias” to see all aliases and notice how something like “gobbert_user” is now an alias for “cd gobbert_user”, as if gobbert_user were a symbolic link. To get our old behavior with links back, just create the links yourself:

ln -s /umbc/rs/gobbert/common/ gobbert_common
ln -s /umbc/rs/gobbert/users/$USER gobbert_user
ln -s /umbc/rs/gobbert/group_saved/ gobbert_saved

The first and the third symbolic links are identical for all of us, but the second one is obviously not, since it points to your personal user area; the shell variable $USER picks up your username. Notice that the name of the link should still be gobbert_user, since it refers to your user area in the pi_gobbert group. If you are a member of another Unix group, this link would have a different name, such as cybertrn_user.

(3) The startup file is .bashrc in the home directory. I copied some material over from taki’s .bashrc; please also add the line “umask 077”. It seems that this umask setting is not provided any more. We need to research this more, as the behavior is not clear to me. Anyway, my .bashrc on chip looks as follows; keep reading for more discussion about it:

# .bashrc

# Source global definitions
if [ -f /etc/bashrc ]; then
. /etc/bashrc
fi
PATH=$PATH:/home/$USER

export PATH

module load gcc
module load slurm

# User specific aliases and functions

umask 007

set -o noclobber
# is there a bash equivalent of tcsh's "set rmstar" (confirm before rm *)?
set -o notify

alias ll='ls -lF'
alias lt='ls -ltF'
alias lr='ls -ltrF'
alias mv='mv -i'
alias cp='cp -i'

alias h='history 20'

alias interactive-srun-match='srun --cluster=chip-cpu --account=pi_gobbert --partition=match --qos=shared --time=7:00:00 --mem=16G --pty $SHELL'
alias interactive-srun-2018='srun --cluster=chip-cpu --account=pi_gobbert --partition=2018 --qos=medium --time=7:00:00 --mem=16G --pty $SHELL'
alias interactive-salloc-match='salloc --cluster=chip-cpu --account=pi_gobbert --partition=match --qos=shared --time=7:00:00 --mem=16G'
alias interactive-salloc-2018='salloc --cluster=chip-cpu --account=pi_gobbert --partition=2018 --qos=medium --time=7:00:00 --mem=16G'

alias load-latex='module load texLive/2025'
export TEXINPUTS=/umbc/rs/gobbert/group_saved/soft/tex/inputs:.:
export BSTINPUTS=/umbc/rs/gobbert/group_saved/soft/tex/inputs:.:
export BIBINPUTS=/umbc/rs/gobbert/group_saved/soft/tex/biblio/curr:

(4) With the user/login/edge node on chip being virtual, we should use an interactive shell on a compute node even for simple tasks, one example being LaTeX, which cannot be run on the user node any more. Notice that I defined two interactive srun aliases (and two salloc counterparts) above; these will be available after logging in to chip. “interactive-srun-2018” should be sufficient for light-weight tasks, including compiling.
To load the module for LaTeX’s pdflatex command, note my alias “load-latex” above in my .bashrc; I use it after I have an interactive shell on a compute node.

(5) How to handle .cache needs to be explained; that is, .cache for VSCode. Which other tools use it? Python? (A tentative sketch follows below.)
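
One possible approach, under the assumption that the tools in question honor the standard XDG_CACHE_HOME variable (pip does, for example; whether VSCode's remote server does needs to be checked), is to point the cache at the research storage instead of the home directory; the target path here is just an example:

# In .bashrc: move ~/.cache to the group's research storage (example path):
export XDG_CACHE_HOME=/umbc/rs/gobbert/users/$USER/.cache
mkdir -p $XDG_CACHE_HOME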

(6) Geant4 version 10.7.3 has been installed at the following path:

/umbc/rs/gobbert/common/research/geant4/Geant4.10.7/geant4.10.07

To enable Geant4 in your environment, please add the following lines to your .bashrc file:

source /umbc/rs/gobbert/common/research/geant4/Geant4.10.7/geant4.10.07/bin/geant4.sh
source /umbc/rs/gobbert/common/research/geant4/Geant4.10.7/geant4.10.07/share/Geant4-10.7.3/geant4make/geant4make.sh

To verify that Geant4 has been successfully loaded, run the following command:

(base) [ehsans1@c21-16 ~]$ geant4-config --version
10.7.3
(base) [ehsans1@c21-16 ~]$

If you see the version number 10.7.3 as shown above, it confirms that Geant4 is correctly loaded in your environment on the cluster.