Wiki¶
Quick help¶
Each request to the RIMAR-BAC service is managed via the ticketing web system that can be reached at https://ticketing.bioinfo.szn.it. You can login to it using the same credentials you use for https://amministrazione.szn.it. It is important to note that for our support system your account will exist only after your first successful login.
HPC cluster access¶
You can request access to our HPC cluster systems by filing a request to the RIMAR-BAC service. A few easy steps:
- open a ticket on this ticketing platform asking for access to HPC computational resources: you can log in to the ticketing platform using the same credentials you use for amministrazione.szn.it
- we will take charge of your request and may ask you to come to our offices
- we will provide you a form to fill in with some information and sign; read it carefully!
- you will receive your access credentials. The ticketing platform is the main way to ask for support, software installation, bug reporting and troubleshooting
BAC Next Generation HPC.v2 (internal name: Joshua)¶
Cluster access¶
There are several ways to access the new HPC cluster command line interface.
Direct ssh access¶
If you are planning to use graphical tools (such as sftp clients for file upload, or tools integrating graphical file browsers such as MobaXterm under Windows), direct ssh access is the easiest way to go. Use the following connection parameters:
server hostname: sshfe.bac.szn.it
port: 2222
In practice, you can use the following command to connect:
ssh -p 2222 myusername@sshfe.bac.szn.it
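For command line file transfers over this direct connection, note that scp uses the uppercase -P flag to specify the port. For instance, to copy a file to your home directory:
scp -P 2222 myfile myusername@sshfe.bac.szn.it:./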
Please, consider that these parameters may change once the migration to the new infrastructure is completed.
Load balanced access¶
Load balanced access should be the preferred option if you only use ssh and scp from the command line.
To connect using load balanced mode, you need some configuration in place: add the following lines to your ssh client config.
Host *.bioext.szn
    ProxyCommand openssl s_client -quiet -connect sshfe.bac.szn.it:22 -servername %h
    ForwardAgent yes
    Compression yes
    ServerAliveInterval 25
    ServerAliveCountMax 2
The following code should be enough to add those lines to your user ssh client config file (this is valid for Linux bash/zsh shells and for MobaXterm under Windows).
cat << EOF >> ~/.ssh/config
Host *.bioext.szn
    ProxyCommand openssl s_client -quiet -connect sshfe.bac.szn.it:22 -servername %h
    ForwardAgent yes
    Compression yes
    ServerAliveInterval 25
    ServerAliveCountMax 2
EOF
After this configuration is in place you can connect to the HPC cluster with the following command:
ssh myusername@joshua.bioext.szn
The given configuration works for scp transfers too. For instance, the following command will transfer a file called myfile to your home directory on the cluster:
scp myfile myusername@joshua.bioext.szn:./
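The same configuration should also work for rsync, which runs over ssh. For instance, to copy a directory straight to your scratch area (the destination path assumes the scratch layout described in the storage section below):
rsync -avp myfiles myusername@joshua.bioext.szn:/bee/userdata/myusername/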
Password policy¶
You are welcome to change your password. Our password policy is the following:
- the password must be at least 10 characters long
- it must contain at least 2 uppercase letters
- it must contain at least 1 lowercase letter
- it must contain at least 2 digits
- it must contain at least 1 special symbol
To change your password you can use the passwd command from the command line.
Available storage and user quota¶
Three main storage areas are available for each user:
- home directory (/home/username): your home directory provides 10GB of space by default. You can use it for your code, scripts, and whatever else you need for your work. This storage is backed up periodically.
- scratch space (/bee/userdata/username): dedicated scratch space, about 2TB for each user. This must be used for the data you are going to analyze and for the output of your analysis. Once you complete your analysis, this space must be cleaned. No backup of the data stored here is provided.
- archive (/archive/username): archival space, by default 3TB per user. This storage is backed up periodically.
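If you want to check how much space you are currently using in each area, a quick sketch with the standard du command (replace myusername with your actual username):
du -sh /home/myusername /bee/userdata/myusername /archive/myusername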
Storage areas usage¶
- Transfer the data to your scratch storage directly:
scp -r myfiles myusername@joshua.bioext.szn:/bee/userdata/myusername/
- Prepare your batch job file and store it in your home folder. Paths for input and output must point to the locations under /bee/userdata/myusername/myfiles (see the example job scripts below)
Data transfer from BAC-HPC.v1 (Falkor) to BAC-HPC.v2 (Joshua)¶
You are responsible for transferring your data from the old HPC cluster to the new one. The transfer can be accomplished via rsync or scp. To transfer a full directory:
- login to BAC-HPC.v1 (Falkor)
- use scp or rsync to transfer data to BAC-HPC.v2 (Joshua)
Please, if you don't feel comfortable doing it yourself, open a ticket asking for support.
Example using rsync (this will copy your_directory to your home on Joshua):
rsync -avp your_directory myusername@10.234.0.150:./
Using scp:
scp -r your_directory myusername@10.234.0.150:./
Another example, transferring your data directly to the scratch area, using rsync:
rsync -avp your_directory myusername@10.234.0.150:/bee/userdata/myusername/
Using scp:
scp -r your_directory myusername@10.234.0.150:/bee/userdata/myusername/
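For large transfers it may also be convenient to add rsync's standard -P flag (shorthand for --partial --progress), so the copy shows its progress and keeps partially transferred files; if the transfer is interrupted, you can simply re-run the same command to resume:
rsync -avP your_directory myusername@10.234.0.150:/bee/userdata/myusername/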
Example job scripts¶
The following example uses CPUs for a common analysis. The task will be scheduled on the `all` partition, which includes CPU computation nodes. The job will use 12 cores at once.
#!/bin/bash
#
#SBATCH --partition=all
#SBATCH --job-name=bwa-job
#SBATCH --cpus-per-task=12
#SBATCH --mail-user=myaddress@szn.it
#SBATCH --mail-type=ALL

# create index for tomato_genome.fasta. Index will be created in /bee/userdata/myusername/testing/tomato_idx
srun bwa index -p /bee/userdata/myusername/testing/tomato_idx -a bwtsw /bee/userdata/myusername/testing/tomato_genome.fasta

# create index for athaliana_genome.fasta. Index will be created in /bee/userdata/myusername/testing/athaliana_idx
srun bwa index -p /bee/userdata/myusername/testing/athaliana_idx -a bwtsw /bee/userdata/myusername/testing/athaliana_genome.fasta

# start your bwa processes sequentially using 12 cores. Outputs will be created in /bee/userdata/myusername/testing/
# use the -t switch to specify the number of threads/cores to use for the process
srun bwa mem -t 12 /bee/userdata/myusername/testing/athaliana_idx /bee/userdata/myusername/testing/my.fastq > /bee/userdata/myusername/testing/athaliana.sam
srun bwa mem -t 12 /bee/userdata/myusername/testing/tomato_idx /bee/userdata/myusername/testing/my.fastq > /bee/userdata/myusername/testing/tomato.sam
Save this script as myjob.sh
- Schedule your job (you can monitor it afterwards with the commands shown after this list):
sbatch myjob.sh
- move your output files away from the scratch area (archive them or transfer them out of the HPC cluster)
- delete all temporary files and any files you no longer need from the scratch area. Keep your area clean.
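To monitor a job after submitting it, you can use the standard SLURM commands below; sbatch prints the job ID when the job is accepted, and <jobid> is a placeholder for that number:
squeue -u myusername    # list your pending and running jobs
scancel <jobid>         # cancel a job if needed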
GPU jobs must be scheduled on a dedicated partition. The following example job shows basic usage.
#!/bin/bash
#
#SBATCH --partition=gpu
#SBATCH --job-name=dorado-job
#SBATCH --gres=gpu:ada:1
#SBATCH --cpus-per-task=12
#SBATCH --mail-user=myaddress@szn.it
#SBATCH --mail-type=ALL

module load dorado/0.9.6

srun dorado basecaller --emit-fastq --device cuda:0 --output-dir /bee/userdata/myusername/testout rna002_70bps_hac@v3 /bee/userdata/myusername/pod5/
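The --gres=gpu:ada:1 directive requests one GPU of the type labelled ada on the GPU nodes. If you want to double check which GPU your job was assigned, a minimal sketch (assuming nvidia-smi is available on the GPU nodes) is to add a step like the following to the script, right after the module load line:
srun nvidia-smi    # print the GPU(s) visible to this job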
Old BAC-HPC.v1 cluster access (internal name: Falkor)¶
- You can access Falkor via SSH at host 10.18.16.35 from the internal network or using 90.147.76.150 from the external network
ssh <your_username>@90.147.76.150
If you receive an error similar to the following when connecting from external networks:
@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
@ WARNING: REMOTE HOST IDENTIFICATION HAS CHANGED! @
@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
IT IS POSSIBLE THAT SOMEONE IS DOING SOMETHING NASTY!
Someone could be eavesdropping on you right now (man-in-the-middle attack)!
It is also possible that a host key has just been changed.
The fingerprint for the ECDSA key sent by the remote host is
SHA256:5Dfoz4xuiPNosKOrqfIXlKRuVQJNQPI6VwxX93/o6J0.
Please contact your system administrator.
Offending ECDSA key in /Users/xxyyyxx/.ssh/known_hosts:117
ECDSA host key for 193.205.231.59 has changed and you have requested strict checking.
Host key verification failed.
Please, modify your local .ssh/known_hosts file by removing the offending key. The error is due to the fact that the same external IP now points to a different machine.
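Instead of editing the file by hand, you can remove the stale entry with ssh-keygen's standard -R option, passing the hostname or IP you actually connect with, e.g.:
ssh-keygen -R 90.147.76.150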
- You can transfer files from and to the HPC cluster via scp
- We are not enforcing a user quota on your home, but we are monitoring usage. Once in a while we may ask you to perform a cleanup to make room on the storage. We rely on a policy of fair usage of resources at the moment
Main rules¶
- All your jobs must be submitted to the scheduler queue. Our cluster uses the SLURM workload manager for job queuing. Please, have a look at the documentation below
- As a consequence, please do __not__ start jobs on the HPC cluster frontend
HPC user environment¶
- Our HPC cluster uses module to set up software environments according to your needs. Type module avail to get a list of available environments
- To use a module you can use the "module load" syntax. E.g., to load the bowtie 2.3.1 environment (a short example session is also shown after this list):
module load bowtie/2.3.1
If you want to unload a module:
module unload bowtie/2.3.1
- You are very welcome to suggest and propose new module files. You can even prepare your own modulefiles. You can find a quick guide at this URL
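A short example session, assuming the bowtie/2.3.1 module mentioned above is installed:
module avail              # list available environments
module load bowtie/2.3.1
module list               # show the modules currently loaded
module unload bowtie/2.3.1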
Software¶
- There is a multitude of software installed at the moment. Have a look at this page for more details
- You can request the installation of new software. Depending on the software's programming language and/or dependencies, we will find the optimal way to install it. This process is usually cooperative: please expect to be asked for feedback
Workload manager: SLURM¶
- You are asked to launch your jobs using the default workload manager, SLURM. SLURM is very flexible and customizable software with a moderate learning curve. Please, have a look at the SLURM documentation. For a quick guide, have a look at this page as well
- To get information about the current status of the nodes and the available queues, you can type the following command in a bash shell on the cluster:
sinfo
- We provide some sample SLURM batch files on this page
- SLURM and MPICH2 can talk to each other in different ways. An early knowledge base is here
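As a rough sketch of what an MPI batch job can look like under SLURM (the module name mpich/3.4 and the binary my_mpi_program are placeholders, and the exact srun/mpiexec integration depends on how MPICH was built on the cluster):
#!/bin/bash
#
#SBATCH --partition=all
#SBATCH --job-name=mpi-job
#SBATCH --ntasks=24

# placeholder module name: check "module avail" for the MPI stack actually installed
module load mpich/3.4

# launch one MPI rank per allocated task; my_mpi_program is a placeholder for your MPI binary
srun ./my_mpi_program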
Data backup¶
- The HPC cluster is now running quite stably after having suffered serious faults in the past. At the moment we have no backup facilities. Please, keep a copy of your data.
Programming languages knowledge base¶
This section will collect documentation about language-specific tasks, parallelization, and running tasks under SLURM.
Quick links¶
Some quick links you can reach from the internal SZN network segment (at the moment):
- This wiki page can be found at: https://ticketing.bioinfo.szn.it/projects/user-requests/wiki/Wiki/
- Redmine ticketing system: https://ticketing.bioinfo.szn.it
- Gitlab (experimental system): https://gitlab.bioinfo.szn.it
- Shell Access to other resources