Quick help

Each request to the RIMAR-BAC service is managed via the ticketing web system, which can be reached at https://ticketing.bioinfo.szn.it. You can log in using the same credentials you use for https://amministrazione.szn.it. Please note that, as far as our support system is concerned, your account will exist only after your first successful login.

HPC cluster access

You can request access to our HPC cluster systems by filing a request with the RIMAR-BAC service. A few easy steps:
  • open a ticket on this ticketing platform asking for access to HPC computational resources; you can log in to the ticketing platform using the same credentials you use for amministrazione.szn.it
  • we will take charge of your request and possibly ask you to come to our offices
  • we will provide you with a form to fill in with some information and to sign: read it carefully!
  • you will receive your access credentials

The ticketing platform is the main way to ask for support, software installation, bug reporting and troubleshooting.

BAC Next Generation HPC.v2 (internal name: Joshua)

Cluster access

There are several ways to access the new HPC cluster command line interface.

Direct ssh access

If you are planning to use graphical tools (such as sftp clients for file upload, or tools integrating graphical file browsers such as MobaXterm under Windows), direct ssh access is the easiest way to go. Use the following connection parameters.

server hostname: sshfe.bac.szn.it
port: 2222

In practice, you can use the following command to connect:

ssh -p 2222 myusername@sshfe.bac.szn.it

Please, consider that these parameters may change once the migration to the new infrastructure is completed.
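
If you prefer the command line for file transfers as well, the same parameters should work for sftp and scp (assuming the frontend accepts those sessions; note that both tools use a capital -P for the port):

sftp -P 2222 myusername@sshfe.bac.szn.it
scp -P 2222 myfile myusername@sshfe.bac.szn.it:./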

Load balanced access

Load-balanced access should be the preferred option if you only use ssh and scp from the command line.
To connect in load-balanced mode, some configuration needs to be in place: add the following lines to your ssh client config.

Host *.bioext.szn
    ProxyCommand openssl s_client -quiet -connect sshfe.bac.szn.it:22 -servername %h
    ForwardAgent yes
    Compression yes
    ServerAliveInterval 25
    ServerAliveCountMax 2

The following command should be enough to append those lines to your user ssh client config file (this works for Linux bash/zsh shells and for MobaXterm on Windows).

cat << EOF >> ~/.ssh/config

Host *.bioext.szn
    ProxyCommand openssl s_client -quiet -connect sshfe.bac.szn.it:22 -servername %h
    ForwardAgent yes
    Compression yes
    ServerAliveInterval 25
    ServerAliveCountMax 2
EOF

After this configuration is in place you can connect to the HPC cluster with the following command:

ssh myusername@joshua.bioext.szn

The given configuration works for scp transfers too. For instance, the following command will transfer a file called myfile to your home directory on the cluster:

scp myfile myusername@joshua.bioext.szn:./
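
Since rsync also runs over ssh and honors your ssh client configuration, the same proxied hostname should work for rsync transfers as well, for example:

rsync -avz myfile myusername@joshua.bioext.szn:./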

Password policy

You are welcome to change your password. Our password policy is the following:
  • the password must be at least 10 characters long
  • it must contain at least 2 uppercase letters
  • it must contain at least 1 lowercase letter
  • it must contain at least 2 digits
  • it must contain at least 1 special symbol

To change your password you can use the

passwd
command from the command line.

Available storage and user quota

Three main storage areas are available for each user:
  • home directory (/home/username): your home directory provides 10GB of space by default. You can use it for your code, scripts, and whatever else you need for your work. This storage is backed up periodically.
  • scratch space (/bee/userdata/username): dedicated scratch space, about 2TB for each user. This must be used for the data you are going to analyze and for the output of your analyses. Once you complete your analysis, this space must be cleaned. No backup of the data stored here is provided.
  • archive (/archive/username): archival space, 3TB per user by default. This storage is backed up periodically.
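
To check how much space you are using in each of these areas, the standard du utility should be enough (replace myusername with your own username; du can take a while on large directory trees):

du -sh /home/myusername /bee/userdata/myusername /archive/myusername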

Storage areas usage

  • Transfer the data to your scratch storage directly:
    scp -r myfiles myusername@joshua.bioext.szn:/bee/userdata/myusername/
    
  • Prepare your batch job file and store it in your home folder. Paths for input and output must point to the locations in /bee/userdata/myusername/myfiles (see the minimal sketch below)
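
A minimal sketch of such a batch file (mytool and its options are placeholders for your analysis program; complete examples are given in the "Example job scripts" section below):

#!/bin/bash
#
#SBATCH --partition=all
#SBATCH --job-name=example-job
#SBATCH --cpus-per-task=4

# mytool is a placeholder: note that both input and output point to the scratch area, not to your home
srun mytool --input /bee/userdata/myusername/myfiles/input.dat --output /bee/userdata/myusername/myfiles/output.dat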

Data transfer from BAC-HPC.v1 (Falkor) to BAC-HPC.v2 (Joshua)

You will be responsible for the transfer of your data from the old HPC to the new one. Transfer can be accomplished via rsync or scp.
To transfer a full directory:
  • login to BAC-HPC.v1 (Falkor)
  • use scp or rsync to transfer data to BAC-HPC.v2 (Joshua)

Please, if you don't feel confident doing it by yourself, open a ticket asking for support.

Example using rsync (this will copy your_directory to your home on Joshua):

rsync -avp your_directory myusername@10.234.0.150:./

Using scp:

scp -r your_directory myusername@10.234.0.150:./

Another example, transferring your data directly to the scratch area, using rsync:

rsync -avp your_directory myusername@10.234.0.150:/bee/userdata/myusername/

Using scp:

scp -r your_directory myusername@10.234.0.150:/bee/userdata/myusername/

Example job scripts

The following example uses CPUs for a common analysis. The task will be scheduled on the `all` partition, which includes CPU computation nodes. The job will use 12 cores at once.

#!/bin/bash
#
#SBATCH --partition=all
#SBATCH --job-name=bwa-job
#SBATCH --cpus-per-task=12
#SBATCH --mail-user=myaddress@szn.it
#SBATCH --mail-type=ALL

# create index for tomato_genome.fasta. Index will be created in /bee/userdata/myusername/testing/tomato_idx
srun bwa index -p /bee/userdata/myusername/testing/tomato_idx -a bwtsw /bee/userdata/myusername/testing/tomato_genome.fasta
# create index for athaliana_genome.fasta. Index will be created in /bee/userdata/myusername/testing/athaliana_idx
srun bwa index -p /bee/userdata/myusername/testing/athaliana_idx -a bwtsw /bee/userdata/myusername/testing/athaliana_genome.fasta

# start your bwa mem processes sequentially, each using 12 cores. Outputs will be created in /bee/userdata/myusername/testing/
# use the -t switch to specify the number of threads/cores to use for the process
srun bwa mem -t 12 /bee/userdata/myusername/testing/athaliana_idx /bee/userdata/myusername/testing/my.fastq  > /bee/userdata/myusername/testing/athaliana.sam
srun bwa mem -t 12 /bee/userdata/myusername/testing/tomato_idx /bee/userdata/myusername/testing/my.fastq  > /bee/userdata/myusername/testing/tomato.sam

Save this script as myjob.sh

  • Schedule your job:
    sbatch myjob.sh
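
While your job is queued or running, you can monitor it (and cancel it if needed) with the standard SLURM commands, for example:

squeue -u myusername
scancel <jobid>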
    
Once your job completes:
  • move your output files away from the scratch area (archive them or transfer them out of the HPC cluster)
  • delete all temporary files from the scratch area and all the files you no longer need. Keep your area clean (see the example below)
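
For example, assuming the outputs of the job above were written to /bee/userdata/myusername/testing/, you could archive them and then clean the scratch area like this:

# copy the results to the backed-up archive area
cp -r /bee/userdata/myusername/testing/*.sam /archive/myusername/
# then remove the temporary data from the scratch area
rm -rf /bee/userdata/myusername/testing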

GPU jobs must be scheduled on a dedicated partition. The following example job shows a basic usage.

#!/bin/bash
#
#SBATCH --partition=gpu
#SBATCH --job-name=dorado-job
#SBATCH --gres=gpu:ada:1
#SBATCH --cpus-per-task=12
#SBATCH --mail-user=myaddress@szn.it
#SBATCH --mail-type=ALL

module load dorado/0.9.6

srun dorado basecaller --emit-fastq --device cuda:0 --output-dir /bee/userdata/myusername/testout rna002_70bps_hac@v3 /bee/userdata/myusername/pod5/
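
Save this script and submit it in the same way as the CPU example (the file name below is arbitrary):

sbatch mygpujob.sh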

Old BAC-HPC.v1 cluster access (internal name: Falkor)

  • You can access Falkor via SSH at host 10.18.16.35 from the internal network or using 90.147.76.150 from the external network
    ssh <your_username>@90.147.76.150
    

If you receive an error similar to the following when connecting from external networks:

@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
@    WARNING: REMOTE HOST IDENTIFICATION HAS CHANGED!     @
@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
IT IS POSSIBLE THAT SOMEONE IS DOING SOMETHING NASTY!
Someone could be eavesdropping on you right now (man-in-the-middle attack)!
It is also possible that a host key has just been changed.
The fingerprint for the ECDSA key sent by the remote host is
SHA256:5Dfoz4xuiPNosKOrqfIXlKRuVQJNQPI6VwxX93/o6J0.
Please contact your system administrator.
Offending ECDSA key in /Users/xxyyyxx/.ssh/known_hosts:117
ECDSA host key for 193.205.231.59 has changed and you have requested strict checking.
Host key verification failed.

Please, modify your local .ssh/known_hosts file by removing the offending key. The error is due to the fact that the same external IP now points to a different machine.
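
On most systems you can remove the offending key with ssh-keygen instead of editing the file by hand; for example, assuming you connected to the external address:

ssh-keygen -R 90.147.76.150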

  • You can transfer files from and to the HPC cluster via scp
  • We are not enforcing a user quota on your home, but we are monitoring usage. From time to time we may ask you to clean up your files to make room on the storage. We rely on a policy of fair usage of resources at the moment

Main rules

  • All your jobs must be submitted to the scheduler queue. Our cluster uses the SLURM workload manager for job queuing. Please, have a look at the documentation below
  • As a consequence, please do NOT start jobs on the HPC cluster frontend

HPC user environment

  • Our HPC cluster uses
    module
    to set up software environments according to your needs. Type
    module avail
    to get a list of available environments
  • To use a module you can use the "module load" syntax. E.g., to load the bowtie 2.3.1 environment:
    module load bowtie/2.3.1
    If you want to unload a module:
    module unload bowtie/2.3.1
  • You are very welcome to suggest new module files. You can even prepare your own modulefiles; you can find a quick guide at this URL
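
Besides load and unload, a couple of other module subcommands are typically available and often useful: module list shows the modules currently loaded in your session, while module purge unloads all of them at once.

module list
module purge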

Software

  • There is a multitude of software installed at the moment. Have a look at this page for more details
  • You can request the installation of new software. Depending on the programming language and/or the software dependencies, we will find the optimal way to install it. This process is usually cooperative: please expect to be asked for feedback

Workload manager: SLURM

  • You are asked to launch your jobs using the default workload manager, SLURM. SLURM is very flexible and customizable software with a moderate learning curve. Please, have a look at the SLURM documentation. For a quick guide, have a look at this page as well
  • To get information about the current status of the nodes and the available queues, you can type the following command in a bash shell on the cluster:
    sinfo
  • We provide some sample SLURM batch files on this page
  • SLURM and MPICH2 can talk to each other in different ways. An early knowledge base is here (a minimal MPI batch sketch is shown below)
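
A minimal sketch of an MPI batch job (the module name and the program are placeholders; check module avail for the MPI module actually installed on the cluster):

#!/bin/bash
#
#SBATCH --partition=all
#SBATCH --job-name=mpi-job
#SBATCH --ntasks=24

# placeholder module name: check "module avail" for the real one
module load mpich2
# srun launches one copy of the program per allocated task;
# depending on how the MPI library was built you may need srun --mpi=pmi2
srun ./my_mpi_program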

Data backup

  • The HPC cluster is now running quite stably after having suffered serious faults in the past. At the moment, we have no backup facilities. Please, keep a copy of your data.
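
For example, to pull a directory from your Falkor home to a local backup folder on your own machine, you could run rsync over ssh from your workstation:

rsync -avz myusername@90.147.76.150:your_directory ./local_backup/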

Programming languages knowledge base

This section will collect documentation about language-specific tasks, parallelization, and running tasks under SLURM.

Quick links

Some quick links you can reach from the internal SZN network segment (at the moment):