Scientific Computing resources and High Performance Computing
Getting Started
Getting an account
- Description of the different account types and how to obtain them
Tutorials
- https://scitas-doc.epfl.ch/courses/training-courses/
Connecting to the cluster
- Using X11 forwarding: this does not give you a remote desktop, but it lets you launch graphical applications (e.g., rstudio) on the cluster and forward their windows to your local workstation.
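A minimal connection sketch (this assumes an X server runs on your local machine, e.g. XQuartz on macOS):
# connect with X11 forwarding enabled
ssh -X <username>@jed.epfl.ch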
Running jobs
- Submitting jobs, checking their status, etc.
- Slurm QOS and partitions (set job resources, priorities, etc.); see the sketch below
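A minimal batch-script sketch; the partition and QOS names are placeholders, check the SCITAS documentation for the ones available to you:
#!/bin/bash
#SBATCH --job-name=my_job
#SBATCH --partition=standard   # placeholder: the partition for your workload
#SBATCH --qos=serial           # placeholder: the QOS you are entitled to
#SBATCH --mem=16G
#SBATCH --time=02:00:00
#SBATCH --output=%x_%j.log

srun my_command
Submit with sbatch job.sh, check the queue with squeue -u $USER, and cancel with scancel <jobid>.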
File System
- Introduction to the default file systems:
- Adding $WORK and $SCRATCH to your .bashrc
# Set up some useful environment variables
export WORK='/work/upzenk'
export SCRATCH='/scratch/<username>'
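Reload your shell configuration afterwards so the variables take effect:
# apply the changes to the current shell
source ~/.bashrc
echo $WORK $SCRATCH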
- Additional location: our group's own server. To mount the group server on the cluster, see the instructions. In our case:
Notes:
- $HOME and $WORK are snapshotted daily. To recover data you can use /.snapshots; see Recovering data.
- $SCRATCH: a staging area for large input and output data. Data in scratch should be backed up to our own group server.
- To transfer data between our own group server and the cluster, it is much faster to mount our share first and then rsync (see the sketch below).
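A transfer sketch; the mount point and directory names are placeholders for wherever the share is mounted:
# copy data from the mounted group share into scratch (paths are illustrative)
rsync -avh --progress /path/to/mounted/share/project_data/ $SCRATCH/project_data/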
Using the cluster
Set up software
- Add the directories where we installed cellranger, cellranger-atac, cellranger-arc, rstudio, etc. to $PATH:
cd $WORK
source source_softwares_setup
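To check that the tools now resolve on your $PATH:
# verify the setup
which cellranger rstudio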
Python
Running a Jupyter notebook
Step 1: Activate one of the venvs on /work/upzenk: Python virtual environments with common packages installed (e.g., scanpy).
cd $WORK
source softwares/python_venvs/venv-single-cell/bin/activate
Step 2: Run the Jupyter notebook (a sketch follows below). The notebook session can use all the packages in the activated virtual environment.
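One way to run the notebook on a compute node and reach it from your workstation; the port number and the tunnelling step are assumptions, adjust them to your setup:
# request an interactive node, then start Jupyter without a local browser
Sinteract -m 16G
jupyter notebook --no-browser --ip=$(hostname -i) --port=8888
# on your local machine, tunnel the port and open the printed URL:
# ssh -L 8888:<node-ip>:8888 <username>@jed.epfl.ch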
Install new packages
- pip install into a virtual environment
- conda install: activate the miniconda environment using
source miniconda3/bin/activate
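A sketch of both routes; the package and environment names are illustrative:
# pip: with a venv active (Step 1 above), packages install into that venv
pip install scanpy
# conda: after activating miniconda, create and use an independent environment
conda create -n my-env python=3.11
conda activate my-env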
Notes
- Creating a new conda environment: it is independent of the venvs on $WORK/softwares/python_venvs (i.e., any additional packages we installed there are not synced).
- A Jupyter session is billed for as long as Jupyter stays open, whether you are actually computing or out at lunch with your colleagues. Close Jupyter when you are done.
R
Using RStudio
# log in with X11 forwarding
ssh -X <username>@jed.epfl.ch
# prepare rstudio
module load gcc r
cd $WORK
source source_softwares_setup
# log in to a compute node; specify the memory, cores, and time you want
Sinteract -m 16G
# open Rstudio
rstudio
# check library path
.libPaths()
Install R packages
module load r
# start R, then check the current library paths
R
.libPaths()
# we install packages in '/work/upzenk/softwares/r/r_packages'
.libPaths(c(.libPaths(), "/path/to/the/dir"))
# install the packages into that directory
install.packages('dummy', lib = '/path/to/the/dir')
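To avoid re-adding the path in every session, you can point R_LIBS_USER at the group library, e.g. in your .bashrc (a sketch; where you set it is up to you):
# make the group R library visible in every R session
export R_LIBS_USER='/work/upzenk/softwares/r/r_packages'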
Tips for bioinformatics tools
Snakemake
- Set up the environment: first load the gcc and snakemake modules, then load the tools your pipeline needs (e.g., bwa, samtools); see the sketch below.
- After setting up the environment, you can run the minimal example from https://snakemake.readthedocs.io/en/stable/tutorial/setup.html (you do not need to follow the installation part).
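A setup sketch; the tool modules here are examples, load whichever your pipeline needs:
# load snakemake and the pipeline tools
module load gcc snakemake
module load bwa samtools
# dry-run first, then run the workflow in the current directory
snakemake -n
snakemake --cores 1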
snakePipes
- A tool tailored for epigenetics data analysis, built on snakemake and Python (some R scripts are used as well). See the documentation.
- createIndices
- Follow the guidelines for indexing hybrid genomes (reference + spike-in). To submit the job to the cluster, you can wrap the command in a .sh file (see the sketch after the tips below):
- tip 1: when submitting this job, allocate enough memory (~200G)
- tip 2: check the logs when you hit errors
- tip 3: if a particular aligner fails, delete its output folder and redo that step
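A submission-script sketch; the createIndices arguments are left as placeholders since they depend on your genomes (see the snakePipes docs), and the time limit is an assumption:
#!/bin/bash
#SBATCH --job-name=createIndices
#SBATCH --mem=200G               # tip 1: enough memory for indexing
#SBATCH --time=24:00:00          # assumption: adjust to your cluster limits
#SBATCH --output=createIndices_%j.log

# fill in the createIndices options per the snakePipes documentation
createIndices <options for your hybrid genome>
Submit it with sbatch createIndices.sh and watch the log file (tip 2) if anything fails.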