Joshua HPC cluster access

To connect to the Joshua HPC cluster you need some configuration in place: add the following lines to your SSH client config.

Host *.bioext.szn
    ProxyCommand openssl s_client -quiet -connect sshfe.bac.szn.it:22 -servername %h
    ForwardAgent yes
    Compression yes
    ServerAliveInterval 25
    ServerAliveCountMax 2

The following command should be enough to append those lines to your user SSH client config file (this works for Linux bash/zsh shells and for Windows MobaXterm).

cat << EOF >> ~/.ssh/config

Host *.bioext.szn
    ProxyCommand openssl s_client -quiet -connect sshfe.bac.szn.it:22 -servername %h
    ForwardAgent yes
    Compression yes
    ServerAliveInterval 25
    ServerAliveCountMax 2
EOF
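
To check that the block was actually appended, you can print the end of the config file:

tail -n 7 ~/.ssh/config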

After this configuration is in place you can connect to the HPC cluster with the following command:

ssh myusername@joshua.bioext.szn

The given configuration works for scp transfers too. For instance, the following command will transfer a file called myfile to your home directory on the cluster:

scp myfile myusername@joshua.bioext.szn:./
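
Since the proxy settings live in your SSH client config, other SSH-based tools pick them up too. For larger or resumable transfers an rsync sketch along these lines should work (the mydata directory name is just an example):

rsync -avP mydata/ myusername@joshua.bioext.szn:mydata/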

Password policy

You are welcome to change your password. Our password policy is the following:
  • password must be at least 10 characters long
  • it must contain at least 2 uppercase letters
  • it must contain at least 1 lowercase letter
  • it must contain at least 2 digits
  • it must contain at least 1 special symbol

To change your password you can use the

passwd

command from the command line.
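
If you want to check a candidate password against this policy before changing it, a small shell sketch like the following mirrors the rules above (bash/zsh; run it locally and avoid putting your real password in a shared shell history):

check_pw() {
    local pw="$1"
    # at least 10 characters
    [ "${#pw}" -ge 10 ] || { echo "too short"; return 1; }
    # at least 2 uppercase letters
    [ "$(grep -o '[A-Z]' <<< "$pw" | wc -l)" -ge 2 ] || { echo "need 2 uppercase letters"; return 1; }
    # at least 1 lowercase letter
    [ "$(grep -o '[a-z]' <<< "$pw" | wc -l)" -ge 1 ] || { echo "need 1 lowercase letter"; return 1; }
    # at least 2 digits
    [ "$(grep -o '[0-9]' <<< "$pw" | wc -l)" -ge 2 ] || { echo "need 2 digits"; return 1; }
    # at least 1 special symbol
    [ "$(grep -o '[^A-Za-z0-9]' <<< "$pw" | wc -l)" -ge 1 ] || { echo "need 1 special symbol"; return 1; }
    echo "password satisfies the policy"
}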

Available storage and user quota

Three main storage areas are available for each user:
  • home directory (/home/username): your home directory by default provides 10GB of space. You can use it for your code, scripts, whatever you need for your work. This storage is backed up periodically.
  • scratch space (/bee/userdata/username): dedicated scratch space, about 2TB for each user. This must be used for the data you are going to analyze and for the output of your analysis. Once you complete your analysis, this space must be cleaned. No backup of the data stored here is provided.
  • archive (/archive/username): archival space, by default 3TB per user. This storage is backed up periodically.
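
To get a quick idea of how much space you are using in each area, du works from any login shell (a sketch; a dedicated quota command, if available on the cluster, will give more precise numbers):

du -sh /home/myusername /bee/userdata/myusername /archive/myusername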

Using the storage areas

  • Transfer the data to your scratch storage directly (if the target directory does not exist yet, see the sketch after this list):
    scp -r myfiles myusername@joshua.bioext.szn:/bee/userdata/myusername/
    
  • Prepare your batch job file and store it in your home folder. Paths for input and output must point to the locations in /bee/userdata/myusername/myfiles.
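
If the target directory on the scratch space does not exist yet, you can create it over SSH before the transfer (myfiles is just the example name used above):

ssh myusername@joshua.bioext.szn 'mkdir -p /bee/userdata/myusername/myfiles'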

Example job scripts

The following example uses CPUs for a common analysis. The task will be scheduled on the `all` partition, which includes the CPU computation nodes. The job will use 12 cores at once. First, transfer the input genomes to your scratch space:
scp tomato_genome.fasta athaliana_genome.fasta myusername@joshua.bioext.szn:/bee/userdata/myusername/testing/

#!/bin/bash
#
#SBATCH --partition=all
#SBATCH --job-name=bwa-job
#SBATCH --cpus-per-task=12

# create index for tomato_genome.fasta. Index will be created in /bee/userdata/myusername/testing/tomato_idx
srun bwa index -p /bee/userdata/myusername/testing/tomato_idx -a bwtsw /bee/userdata/myusername/testing/tomato_genome.fasta
# create index for athaliana_genome.fasta. Index will be created in /bee/userdata/myusername/testing/athaliana_idx
srun bwa index -p /bee/userdata/myusername/testing/athaliana_idx -a bwtsw /bee/userdata/myusername/testing/athaliana_genome.fasta

# start your bwa processes sequentially using 12 cores. Outputs will be created in /bee/userdata/myusername/testing/
# use the -t switch to specify the number of threads/cores to use for the process
srun bwa mem -t 12 /bee/userdata/myusername/testing/athaliana_idx /bee/userdata/myusername/testing/my.fastq  > /bee/userdata/myusername/testing/athaliana.sam
srun bwa mem -t 12 /bee/userdata/myusername/testing/tomato_idx /bee/userdata/myusername/testing/my.fastq  > /bee/userdata/myusername/testing/tomato.sam

Save this script as myjob.sh.

  • Schedule your job:
    sbatch myjob.sh
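
While the job is queued or running, the standard SLURM commands let you keep an eye on it (the job id below is a placeholder; sbatch prints the real one when you submit):

# list your queued and running jobs
squeue -u myusername
# cancel a job if needed
scancel 12345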
    
Once your job completes:
  • move your output files away from the scratch area (archive them or transfer them out of the HPC cluster)
  • delete all temporary files and any files you no longer need from the scratch area; keep your area clean.
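
A minimal sketch of that cleanup, assuming the example paths used in the job above (adapt them to your own layout):

# keep the alignments you need in the archive area
cp /bee/userdata/myusername/testing/*.sam /archive/myusername/
# then remove the working directory from scratch
rm -rf /bee/userdata/myusername/testing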

GPU jobs must be scheduled on a dedicated partition. The following example job shows basic usage.

#!/bin/bash
#
#SBATCH --partition=gpu
#SBATCH --job-name=dorado-job
#SBATCH --gres=gpu:ada:1
#SBATCH --cpus-per-task=12

module load dorado/0.9.6

srun dorado basecaller --emit-fastq --device cuda:0 --output-dir /bee/userdata/myusername/testout rna002_70bps_hac@v3 /bee/userdata/myusername/pod5/
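
Save it, for example as gpujob.sh, and submit it with sbatch just like the CPU example. sinfo shows the state of the GPU partition before you submit (the script name is only an example):

sbatch gpujob.sh
sinfo -p gpu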