Joshua HPC cluster access

To connect to the Joshua HPC cluster you need some configuration in place: add the following lines to your SSH client config.

Host *.bioext.szn
    ProxyCommand openssl s_client -quiet -connect sshfe.bac.szn.it:22 -servername %h
    ForwardAgent yes
    Compression yes
    ServerAliveInterval 25
    ServerAliveCountMax 2

The following command should be enough to append those lines to your user SSH client config file (this works for Linux bash/zsh shells and for Windows MobaXterm).

cat << EOF >> ~/.ssh/config

Host *.bioext.szn
    ProxyCommand openssl s_client -quiet -connect sshfe.bac.szn.it:22 -servername %h
    ForwardAgent yes
    Compression yes
    ServerAliveInterval 25
    ServerAliveCountMax 2
EOF
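
To check that the block was actually appended, you can print the end of the config file:

tail -n 7 ~/.ssh/config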

After this configuration is in place you can connect to the HPC cluster with the following command:

ssh myusername@joshua.bioext.szn

The given configuration works for scp transfers too. For instance, the following command will transfer a file called myfile to your home directory on the cluster:

scp myfile myusername@joshua.bioext.szn:./
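
Since the proxy settings live in your SSH client config, other SSH-based tools pick them up too. For larger or resumable transfers an rsync sketch along these lines should work (the mydata directory name is just an example):

rsync -avP mydata/ myusername@joshua.bioext.szn:mydata/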

Password policy

You are welcome to change your password. Our password policy is the following:
  • password must be at least 10 characters long
  • it must contain at least 2 uppercase letters
  • it must contain at least 1 lowercase letter
  • it must contain at least 2 digits
  • it must contain at least 1 special symbol

To change your password you can use the

passwd

command from the command line.
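
If you want to check a candidate password against this policy before changing it, a small shell sketch like the following mirrors the rules above (bash/zsh; run it locally and avoid putting your real password in a shared shell history):

check_pw() {
    local pw="$1"
    # at least 10 characters
    [ "${#pw}" -ge 10 ] || { echo "too short"; return 1; }
    # at least 2 uppercase letters
    [ "$(grep -o '[A-Z]' <<< "$pw" | wc -l)" -ge 2 ] || { echo "need 2 uppercase letters"; return 1; }
    # at least 1 lowercase letter
    [ "$(grep -o '[a-z]' <<< "$pw" | wc -l)" -ge 1 ] || { echo "need 1 lowercase letter"; return 1; }
    # at least 2 digits
    [ "$(grep -o '[0-9]' <<< "$pw" | wc -l)" -ge 2 ] || { echo "need 2 digits"; return 1; }
    # at least 1 special symbol
    [ "$(grep -o '[^A-Za-z0-9]' <<< "$pw" | wc -l)" -ge 1 ] || { echo "need 1 special symbol"; return 1; }
    echo "password satisfies the policy"
}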

Available storage and user quota

Three main storage areas are available for each user:
  • home directory (/home/username): your home directory by default provides 10GB of space. You can use it for your code, scripts, whatever you need for your work. This storage is backed up periodically.
  • scratch space (/bee/userdata/username): dedicated scratch space, about 2TB for each user. This must be used for the data you are going to analyze and for the output of your analysis. Once you complete your analysis, this space must be cleaned. No backup of the data stored here is provided.
  • archive (/archive/username): archival space, by default 3TB per user. This storage is backed up periodically.
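
To get a quick idea of how much space you are using in each area, du works from any login shell (a sketch; a dedicated quota command, if available on the cluster, will give more precise numbers):

du -sh /home/myusername /bee/userdata/myusername /archive/myusername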

Using the storage areas

  • Transfer the data to your scratch storage directly (if the target directory does not exist yet, see the sketch after this list):
    scp -r myfiles myusername@joshua.bioext.szn:/bee/userdata/myusername/
    
  • Prepare your batch job file and store it in your home folder. Paths for input and output must point to the locations in /bee/userdata/myusername/myfiles.
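
If the target directory on the scratch space does not exist yet, you can create it over SSH before the transfer (myfiles is just the example name used above):

ssh myusername@joshua.bioext.szn 'mkdir -p /bee/userdata/myusername/myfiles'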

Example job scripts

The following example uses CPUs for a common analysis. The task will be scheduled on the `all` partition, which includes the CPU computation nodes. The job will use 12 cores at once. First, transfer the input genomes to your scratch space:
scp tomato_genome.fasta athaliana_genome.fasta myusername@joshua.bioext.szn:/bee/userdata/myusername/testing/

#!/bin/bash
#
#SBATCH --partition=all
#SBATCH --job-name=bwa-job
#SBATCH --cpus-per-task=12

# create index for tomato_genome.fasta. Index will be created in /bee/userdata/myusername/testing/tomato_idx
srun bwa index -p /bee/userdata/myusername/testing/tomato_idx -a bwtsw /bee/userdata/myusername/testing/tomato_genome.fasta
# create index for athaliana_genome.fasta. Index will be created in /bee/userdata/myusername/testing/athaliana_idx
srun bwa index -p /bee/userdata/myusername/testing/athaliana_idx -a bwtsw /bee/userdata/myusername/testing/athaliana_genome.fasta

# start your bwa processes sequentially using 12 cores. Outputs will be created in /bee/userdata/myusername/testing/
# use the -t switch to specify the number of threads/cores to use for the process
srun bwa mem -t 12 /bee/userdata/myusername/testing/athaliana_idx /bee/userdata/myusername/testing/my.fastq  > /bee/userdata/myusername/testing/athaliana.sam
srun bwa mem -t 12 /bee/userdata/myusername/testing/tomato_idx /bee/userdata/myusername/testing/my.fastq  > /bee/userdata/myusername/testing/tomato.sam

Save this script as myjob.sh.

  • Schedule your job:
    sbatch myjob.sh
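
While the job is queued or running, the standard SLURM commands let you keep an eye on it (the job id below is a placeholder; sbatch prints the real one when you submit):

# list your queued and running jobs
squeue -u myusername
# cancel a job if needed
scancel 12345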
    
Once your job completes:
  • move your output files away from the scratch area (archive them or transfer them out of the HPC cluster)
  • delete all temporary files and any files you no longer need from the scratch area; keep your area clean.
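
A minimal sketch of that cleanup, assuming the example paths used in the job above (adapt them to your own layout):

# keep the alignments you need in the archive area
cp /bee/userdata/myusername/testing/*.sam /archive/myusername/
# then remove the working directory from scratch
rm -rf /bee/userdata/myusername/testing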

GPU jobs must be scheduled on a dedicated partition. The following example job shows basic usage.

#!/bin/bash
#
#SBATCH --partition=gpu
#SBATCH --job-name=dorado-job
#SBATCH --gres=gpu:ada:1
#SBATCH --cpus-per-task=12

module load dorado/0.9.6

srun dorado basecaller --emit-fastq --device cuda:0 --output-dir /bee/userdata/myusername/testout rna002_70bps_hac@v3 /bee/userdata/myusername/pod5/
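
Save it, for example as gpujob.sh, and submit it with sbatch just like the CPU example. sinfo shows the state of the GPU partition before you submit (the script name is only an example):

sbatch gpujob.sh
sinfo -p gpu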