Quick help

Each request to the RIMAR-BAC service is managed via the ticketing web system, which can be reached at https://ticketing.bioinfo.szn.it. You can log in using the same credentials you use for https://amministrazione.szn.it. Please note that, as far as our support system is concerned, your account will exist only after your first successful login.

HPC cluster access

You can request access to our HPC cluster systems by filing a request with the RIMAR-BAC service. A few easy steps:
  • open a ticket on this ticketing platform asking for access to HPC computational resources; you can log in to the ticketing platform using the same credentials you use for amministrazione.szn.it
  • we will take charge of your request and possibly ask you to come to our offices
  • we will provide you with a form to fill in with some information and to sign: read it carefully!
  • you will receive your access credentials

The ticketing platform is the main way to ask for support, software installation, bug reporting and troubleshooting.

BAC Next Generation HPC.v2 (internal name: Joshua)

Cluster access

There are several ways to access the new HPC cluster command line interface.

Direct ssh access

If you are planning to use graphical tools (such as sftp clients for file upload, or tools integrating graphical file browsers such as MobaXterm under Windows), direct ssh access is the easiest way to go. Use the following connection parameters.

server hostname: sshfe.bac.szn.it
port: 2222

In practice, you can use the following command to connect:

ssh -p 2222 myusername@sshfe.bac.szn.it

Please, consider that these parameters may change once the migration to the new infrastructure is completed.
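
If you prefer the command line for file transfers as well, the same parameters should work for sftp and scp (assuming the frontend accepts those sessions; note that both tools use a capital -P for the port):

sftp -P 2222 myusername@sshfe.bac.szn.it
scp -P 2222 myfile myusername@sshfe.bac.szn.it:./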

Load balanced access

Load-balanced access should be the preferred option if you only use ssh and scp from the command line.
To connect in load-balanced mode, some configuration needs to be in place: add the following lines to your ssh client config.

Host *.bioext.szn
    ProxyCommand openssl s_client -quiet -connect sshfe.bac.szn.it:22 -servername %h
    ForwardAgent yes
    Compression yes
    ServerAliveInterval 25
    ServerAliveCountMax 2

The following command should be enough to append those lines to your user ssh client config file (this works for Linux bash/zsh shells and for MobaXterm on Windows).

cat << EOF >> ~/.ssh/config

Host *.bioext.szn
    ProxyCommand openssl s_client -quiet -connect sshfe.bac.szn.it:22 -servername %h
    ForwardAgent yes
    Compression yes
    ServerAliveInterval 25
    ServerAliveCountMax 2
EOF

After this configuration is in place you can connect to the HPC cluster with the following command:

ssh myusername@joshua.bioext.szn

The given configuration works for scp transfers too. For instance, the following command will transfer a file called myfile to your home directory on the cluster:

scp myfile myusername@joshua.bioext.szn:./
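
Since rsync also runs over ssh and honors your ssh client configuration, the same proxied hostname should work for rsync transfers as well, for example:

rsync -avz myfile myusername@joshua.bioext.szn:./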

Password policy

You are welcome to change your password. Our password policy is the following:
  • the password must be at least 10 characters long
  • it must contain at least 2 uppercase letters
  • it must contain at least 1 lowercase letter
  • it must contain at least 2 digits
  • it must contain at least 1 special symbol

To change your password you can use the

passwd
command from the command line.

Available storage and user quota

Three main storage areas are available for each user:
  • home directory (/home/username): your home directory provides 10GB of space by default. You can use it for your code, scripts, and whatever else you need for your work. This storage is backed up periodically.
  • scratch space (/bee/userdata/username): dedicated scratch space, about 2TB for each user. This must be used for the data you are going to analyze and for the output of your analyses. Once you complete your analysis, this space must be cleaned. No backup of the data stored here is provided.
  • archive (/archive/username): archival space, 3TB per user by default. This storage is backed up periodically.
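
To check how much space you are using in each of these areas, the standard du utility should be enough (replace myusername with your own username; du can take a while on large directory trees):

du -sh /home/myusername /bee/userdata/myusername /archive/myusername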

Storage areas usage

  • Transfer the data to your scratch storage directly:
    scp -r myfiles myusername@joshua.bioext.szn:/bee/userdata/myusername/
    
  • Prepare your batch job file and store it in your home folder. Paths for input and output must point to the locations in /bee/userdata/myusername/myfiles (see the minimal sketch below)
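
A minimal sketch of such a batch file (mytool and its options are placeholders for your analysis program; complete examples are given in the "Example job scripts" section below):

#!/bin/bash
#
#SBATCH --partition=all
#SBATCH --job-name=example-job
#SBATCH --cpus-per-task=4

# mytool is a placeholder: note that both input and output point to the scratch area, not to your home
srun mytool --input /bee/userdata/myusername/myfiles/input.dat --output /bee/userdata/myusername/myfiles/output.dat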

Data transfer from BAC-HPC.v1 (Falkor) to BAC-HPC.v2 (Joshua)

You will be responsible for the transfer of your data from the old HPC to the new one. Transfer can be accomplished via rsync or scp.
To transfer a full directory:
  • login to BAC-HPC.v1 (Falkor)
  • use scp or rsync to transfer data to BAC-HPC.v2 (Joshua)

Please, if you don't feel confident doing it by yourself, open a ticket asking for support.

Example using rsync (this will copy your_directory to your home on Joshua):

rsync -avp your_directory myusername@10.234.0.150:./

Using scp:

scp -r your_directory myusername@10.234.0.150:./

Another example, transferring your data directly to the scratch area, using rsync:

rsync -avp your_directory myusername@10.234.0.150:/bee/userdata/myusername/

Using scp:

scp -r your_directory myusername@10.234.0.150:/bee/userdata/myusername/

Example job scripts

The following example uses CPUs for a common analysis. The task will be scheduled on the `all` partition, which includes CPU computation nodes. The job will use 12 cores at once.

#!/bin/bash
#
#SBATCH --partition=all
#SBATCH --job-name=bwa-job
#SBATCH --cpus-per-task=12
#SBATCH --mail-user=myaddress@szn.it
#SBATCH --mail-type=ALL

# create index for tomato_genome.fasta. Index will be created in /bee/userdata/myusername/testing/tomato_idx
srun bwa index -p /bee/userdata/myusername/testing/tomato_idx -a bwtsw /bee/userdata/myusername/testing/tomato_genome.fasta
# create index for athaliana_genome.fasta. Index will be created in /bee/userdata/myusername/testing/athaliana_idx
srun bwa index -p /bee/userdata/myusername/testing/athaliana_idx -a bwtsw /bee/userdata/myusername/testing/athaliana_genome.fasta

# start your bwa mem processes sequentially, each using 12 cores. Outputs will be created in /bee/userdata/myusername/testing/
# use the -t switch to specify the number of threads/cores to use for the process
srun bwa mem -t 12 /bee/userdata/myusername/testing/athaliana_idx /bee/userdata/myusername/testing/my.fastq  > /bee/userdata/myusername/testing/athaliana.sam
srun bwa mem -t 12 /bee/userdata/myusername/testing/tomato_idx /bee/userdata/myusername/testing/my.fastq  > /bee/userdata/myusername/testing/tomato.sam

Save this script as myjob.sh

  • Schedule your job:
    sbatch myjob.sh
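
While your job is queued or running, you can monitor it (and cancel it if needed) with the standard SLURM commands, for example:

squeue -u myusername
scancel <jobid>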
    
Once your job completes:
  • move your output files away from the scratch area (archive them or transfer them out of the HPC cluster)
  • delete all temporary files from the scratch area and all the files you no longer need. Keep your area clean (see the example below)
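
For example, assuming the outputs of the job above were written to /bee/userdata/myusername/testing/, you could archive them and then clean the scratch area like this:

# copy the results to the backed-up archive area
cp -r /bee/userdata/myusername/testing/*.sam /archive/myusername/
# then remove the temporary data from the scratch area
rm -rf /bee/userdata/myusername/testing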

GPU jobs must be scheduled on a dedicated partition. The following example job shows a basic usage.

#!/bin/bash
#
#SBATCH --partition=gpu
#SBATCH --job-name=dorado-job
#SBATCH --gres=gpu:ada:1
#SBATCH --cpus-per-task=12
#SBATCH --mail-user=myaddress@szn.it
#SBATCH --mail-type=ALL

module load dorado/0.9.6

srun dorado basecaller --emit-fastq --device cuda:0 --output-dir /bee/userdata/myusername/testout rna002_70bps_hac@v3 /bee/userdata/myusername/pod5/
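
Save this script and submit it in the same way as the CPU example (the file name below is arbitrary):

sbatch mygpujob.sh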

Old BAC-HPC.v1 cluster access (internal name: Falkor)

  • You can access Falkor via SSH at host 10.18.16.35 from the internal network or using 90.147.76.150 from the external network
    ssh <your_username>@90.147.76.150
    

If you receive an error similar to the following when connecting from external networks:

@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
@    WARNING: REMOTE HOST IDENTIFICATION HAS CHANGED!     @
@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
IT IS POSSIBLE THAT SOMEONE IS DOING SOMETHING NASTY!
Someone could be eavesdropping on you right now (man-in-the-middle attack)!
It is also possible that a host key has just been changed.
The fingerprint for the ECDSA key sent by the remote host is
SHA256:5Dfoz4xuiPNosKOrqfIXlKRuVQJNQPI6VwxX93/o6J0.
Please contact your system administrator.
Offending ECDSA key in /Users/xxyyyxx/.ssh/known_hosts:117
ECDSA host key for 193.205.231.59 has changed and you have requested strict checking.
Host key verification failed.

Please, modify your local .ssh/known_hosts file by removing the offending key. The error is due to the fact that the same external IP now points to a different machine.
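
On most systems you can remove the offending key with ssh-keygen instead of editing the file by hand; for example, assuming you connected to the external address:

ssh-keygen -R 90.147.76.150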

  • You can transfer files from and to the HPC cluster via scp
  • We are not enforcing a user quota on your home, but we are monitoring usage. From time to time we may ask you to clean up your files to make room on the storage. We rely on a policy of fair usage of resources at the moment

Main rules

  • All your jobs must be submitted to the scheduler queue. Our cluster uses the SLURM workload manager for job queuing. Please, have a look at the documentation below
  • As a consequence, please do NOT start jobs on the HPC cluster frontend

HPC user environment

  • Our HPC cluster uses
    module
    to set up software environments according to your needs. Type
    module avail
    to get a list of available environments
  • To use a module you can use the "module load" syntax. E.g., to load the bowtie 2.3.1 environment:
    module load bowtie/2.3.1
    If you want to unload a module:
    module unload bowtie/2.3.1
  • You are very welcome to suggest new module files. You can even prepare your own modulefiles; you can find a quick guide at this URL
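
Besides load and unload, a couple of other module subcommands are typically available and often useful: module list shows the modules currently loaded in your session, while module purge unloads all of them at once.

module list
module purge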

Software

  • There is a multitude of software installed at the moment. Have a look at this page for more details
  • You can request the installation of new software. Depending on the programming language and/or the software dependencies, we will find the optimal way to install it. This process is usually cooperative: please expect to be asked for feedback

Workload manager: SLURM

  • You are asked to launch your jobs using the default workload manager, SLURM. SLURM is very flexible and customizable software with a moderate learning curve. Please, have a look at the SLURM documentation. For a quick guide, have a look at this page as well
  • To get information about the current status of the nodes and the available queues, you can type the following command in a bash shell on the cluster:
    sinfo
  • We provide some sample SLURM batch files on this page
  • SLURM and MPICH2 can talk to each other in different ways. An early knowledge base is here (a minimal MPI batch sketch is shown below)
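
A minimal sketch of an MPI batch job (the module name and the program are placeholders; check module avail for the MPI module actually installed on the cluster):

#!/bin/bash
#
#SBATCH --partition=all
#SBATCH --job-name=mpi-job
#SBATCH --ntasks=24

# placeholder module name: check "module avail" for the real one
module load mpich2
# srun launches one copy of the program per allocated task;
# depending on how the MPI library was built you may need srun --mpi=pmi2
srun ./my_mpi_program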

Data backup

  • The HPC cluster is now running quite stably after having suffered serious faults in the past. At the moment, we have no backup facilities. Please, keep a copy of your data.
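
For example, to pull a directory from your Falkor home to a local backup folder on your own machine, you could run rsync over ssh from your workstation:

rsync -avz myusername@90.147.76.150:your_directory ./local_backup/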

Programming languages knowledge base

This section will collect documentation about language-specific tasks, parallelization, and running tasks under SLURM.

Quick links

Some quick links you can reach from the internal SZN network segment (at the moment):