cryoDRGN

Deep Reconstructing Generative Networks for cryo-EM heterogeneous reconstruction

For general information on job submission, please see here.

General information

  • cryoDRGN runs on extracted particle stacks that have undergone a 3D refinement. This ‘consensus refinement’ information is then used as input for cryoDRGN.
  • We have combined Step 1 to Step 6 as outlined on the cryoDRGN repo into a single submission step. This includes:
    1. Preprocess image stack
    2. Parse image poses
    3. Parse CTF parameters
    4. [We skip Step 4]
    5. Running cryoDRGN heterogeneous reconstructions
    6. Analysis of results
      • We perform a default analysis at this point using the last epoch as the input.
  • Recommended usage from cryoDRGN repo:
    • “It is recommended to first train on lower-resolution images (e.g. D=128) with --zdim 8 using the default architecture (fast). After validation, pose optimization, and any necessary particle filtering, then train on the full resolution image stack (up to D=256) with a large architecture (slow).”
    • “Note: While these settings worked well for the datasets we’ve tested, they are highly experimental for the general case as different datasets have diverse sources of heterogeneity. Please reach out to the authors with questions/consult — we’d love to learn more.”
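The combined steps above correspond roughly to the following cryoDRGN command-line calls (filenames and flag values here are illustrative examples, not the exact invocation COSMIC2 uses):

```shell
# 1. Preprocess (downsample) the extracted particle stack, e.g. to D=128
cryodrgn downsample particles.mrcs -D 128 -o particles.128.mrcs

# 2. Parse image poses from the consensus refinement STAR file
cryodrgn parse_pose_star run_data.star -o poses.pkl -D 256

# 3. Parse CTF parameters from the STAR file
cryodrgn parse_ctf_star run_data.star -o ctf.pkl -D 256 --Apix 1.0

# 5. Run the cryoDRGN heterogeneous reconstruction (train the VAE)
cryodrgn train_vae particles.128.mrcs --poses poses.pkl --ctf ctf.pkl \
    --zdim 8 -n 50 -o output_dir

# 6. Analyze results from the last epoch (epoch numbering starts at 0,
#    so 50 epochs end at epoch 49)
cryodrgn analyze output_dir 49
```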

Input particle stack format:

RELION-extracted particle stacks. Click here to learn what this means.

*We only support RELION stacks at this moment*

Required input parameters:

  1. Consensus refinement STAR file
    • STAR file from RELION refinement uploaded using ‘Browser upload’ not Globus
  2. Box size for refined structure
    • Box size of structure determined from refinement
  3. Scaled-down box size
    • cryoDRGN will scale down the box size to save memory (and time!). The default is 128 pixels, but you can also use 64 pixels.
  4. Pixel size of original data
    • Provide pixel size of input data
  5. Check box if STAR file is from RELION v 3.1
    • Indicate if STAR file is from RELION version 3.1
  6. Accelerating voltage, Spherical aberration, and Amplitude contrast ratio: 
    • Provide these values for your dataset
  7. Number of epochs to use during training (-n)
    • Number of iterations used for training the VAE
  8. Number of nodes in hidden layers for encoder (--enc-dim)
    • The encoder is the network that maps each particle image into the latent space.
    • Default = 256; you can make the network wider by increasing this to 1024
  9. Number of hidden layers for encoder (--enc-layers)
    • Adding layers increases model complexity
  10. Number of nodes in hidden layers for decoder (--dec-dim)
    • The decoder is the network that reconstructs density from latent-space coordinates.
    • Default = 256; you can make the network wider by increasing this to 1024
  11. Number of hidden layers for decoder (--dec-layers)
    • Adding layers increases model complexity
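The scaled-down box size works by Fourier cropping: the image's Fourier transform is clipped around the center, discarding high-frequency components while keeping the low-resolution signal. A minimal numpy sketch of the idea (not cryoDRGN's actual implementation):

```python
import numpy as np

def fourier_crop(img, D_new):
    """Downsample a square image by cropping its centered Fourier transform.
    High-frequency components are discarded, shrinking the box while the
    low-resolution signal is kept intact."""
    D = img.shape[0]
    ft = np.fft.fftshift(np.fft.fft2(img))  # move DC component to the center
    start = (D - D_new) // 2
    ft_cropped = ft[start:start + D_new, start:start + D_new]
    # Scale so intensities stay comparable after the inverse transform
    return np.real(np.fft.ifft2(np.fft.ifftshift(ft_cropped))) * (D_new / D) ** 2

# A 256-pixel box downsampled to 128 pixels
img = np.random.default_rng(0).normal(size=(256, 256))
small = fourier_crop(img, 128)
print(small.shape)  # (128, 128)
```

Halving the box size this way quarters the number of pixels per particle, which is why training on D=128 is so much faster than on D=256.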

Optional or Advanced input parameters:

  1. Advanced: Minibatch size (-b)
    • The number of particles per batch during training. By default this is determined automatically, but it can be specified by the user.
  2. Advanced: Dimension of latent variable (--zdim) (fast=1; slow=10)
    • More dimensions = longer training per epoch.
    • A --zdim of 8 works well in most cases
  3. Advanced: Checkpoint file to initialize training from a checkpoint (--load)
    • If you want to continue training from a certain point, include the .pkl file here (uploaded via browser upload)
  4. Advanced: Index file to filter the particle stack by these indices (--ind)
    • Keep only the particles listed in this index file
  5. Advanced: Check box to refine poses with gradient descent (--do-pose-sgd)
    • Locally refine poses with gradient descent (BETA!)
  6. Advanced: Turn off real space windowing of dataset (--no-window)
  7. Optional: Invert contrast (needed if particles are black on a white background)
    • RELION particles are white on a black background by default
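Contrast inversion itself is just a sign flip: pixel values are negated so black-on-white particles become white-on-black. A minimal numpy sketch of the idea (a toy stand-in, not cryoDRGN's code):

```python
import numpy as np

def invert_contrast(stack):
    """Flip particle contrast by negating pixel values.
    Use this when particles appear black on a white background,
    since white-on-black is the RELION default convention."""
    return -stack

# Toy 'stack' of two 4x4 particle images
stack = np.arange(32, dtype=np.float32).reshape(2, 4, 4)
inverted = invert_contrast(stack)
print(np.allclose(stack + inverted, 0))  # True
```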

Suggested workflow

  1. Initial run: Run cryoDRGN using a fast training architecture to assess data distribution in latent space.
    • Parameters: (These suggested parameters are all the defaults on COSMIC2)
      • Scaled-down box size: 64 or 128
      • -n: 50
      • --zdim: 8
      • --enc-dim: 256
      • --enc-layers: 3
      • --dec-dim: 256
      • --dec-layers: 3
    • Investigate the output representations of the data in latent space (umap.png, z_pca.png)
      • Are there outlier groups? If so, investigate whether they should be removed or included.
  2. Preliminary test-run: Once you have identified a cleaned particle set, train cryoDRGN with a larger network by expanding the encoder and decoder dimensions.
    • Parameters:
      • Scaled-down box size: 128
      • -n: 25
      • --zdim: 8
      • --enc-dim: 1024
      • --enc-layers: 3
      • --dec-dim: 1024
      • --dec-layers: 3
    • Investigate output representation of the data.
      • Do you see separation into groups? Do you need higher-resolution details for these reconstructions? If so, you can run cryoDRGN on less-downsampled data (step 3)
  3. Full run: After confirming that your particle stack can be sorted into discrete states, train cryoDRGN on higher-resolution (i.e. less down-sampled) particles.
    • Parameters:
      • Scaled-down box size: 256
      • -n: 25
      • --zdim: 8
      • --enc-dim: 1024
      • --enc-layers: 3
      • --dec-dim: 1024
      • --dec-layers: 3
    • Investigate output representation of the data.
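If you want to inspect the latent space yourself, the latent coordinates from training are stored as a pickled N x zdim array (one vector per particle), which is what the PCA and UMAP plots are computed from. A minimal sketch of loading such an array and projecting it onto its top principal components with plain numpy (synthetic data stands in for the real file here, and the path in the comment is illustrative):

```python
import pickle
import numpy as np

def pca_project(z, n_components=2):
    """Project latent coordinates onto their top principal components."""
    z_centered = z - z.mean(axis=0)
    # SVD of the centered data: rows of vt are the principal axes
    _, _, vt = np.linalg.svd(z_centered, full_matrices=False)
    return z_centered @ vt[:n_components].T

# Real use would look something like (illustrative path):
#   with open("output_dir/z.pkl", "rb") as f:
#       z = pickle.load(f)
z = np.random.default_rng(0).normal(size=(1000, 8))  # 1000 particles, zdim=8

proj = pca_project(z)
print(proj.shape)  # (1000, 2)
```

Plotting the two columns of `proj` against each other reproduces the kind of scatter shown in z_pca.png, which is useful when hunting for outlier groups to filter out.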

Monitoring job progress

While the job is running, you can watch the output file ‘stdout.txt’ to follow job progress. This can be found in the “Intermediate Results” file listing for your task.

Downloading output

To visualize the output from cryoDRGN, first download the .zip file “cosmic2-cryodrgn.zip” displayed in the output file page. After downloading and unzipping, you will find all outputs from cryoDRGN, including the analysis run for the last epoch. You’ll see .png files corresponding to outputs generated using PCA and UMAP, as well as associated 3D reconstructions using the clustering.

Visualizing results in Jupyter notebooks on your local machine

In the analysis folder (e.g. “analyze.49”) there will be a Jupyter notebook generated by cryoDRGN. Jupyter notebooks are interactive Python scripting tools displayed as a web page. Ellen Zhong (cryoDRGN author) provides a very nice interactive data analysis Jupyter notebook. When you open this notebook successfully on your local machine, you will see the following page in your local web browser:

Jupyter notebook landing page:

cryoDRGN output notebook:

Important: If you do not have CUDA on your local machine, you cannot generate 3D reconstructions. You must re-run cryoDRGN on COSMIC2 for this.

Installing cryoDRGN and Jupyter notebooks on your local machine

To run cryoDRGN Jupyter notebooks locally, you will need to install both cryoDRGN and Jupyter.

Install cryoDRGN (following instructions on cryoDRGN Github, omitting pytorch and cuda install for machines that don’t have NVIDIA GPU cards):

conda create --name cryodrgn python=3.7
conda activate cryodrgn
conda install pandas
conda install seaborn scikit-learn 
conda install -c conda-forge umap-learn
conda install -c conda-forge jupyterlab
pip install ipywidgets
pip install cufflinks
git clone https://github.com/zhonge/cryodrgn.git
cd cryodrgn
git checkout 0.3.0
python setup.py install

Next, install Jupyter:

pip install jupyterlab

or

pip install jupyterlab --user

if you don’t have permission to install.

Using Jupyter notebooks to visualize cryoDRGN output

To visualize data in the Jupyter notebook:

$ conda activate cryodrgn
$ jupyter notebook

[I 15:17:03.418 NotebookApp] Serving notebooks from local directory: /Users/michael

[I 15:17:03.418 NotebookApp] Jupyter Notebook 6.1.1 is running at:

[I 15:17:03.418 NotebookApp] http://localhost:8888/?token=774f5eb9c9c9e3f6e302996dfde12c82984e5acfb7f5ab8b

Then navigate to the web URL listed in the command line. In this example I would copy & paste this into my web browser:

http://localhost:8888/?token=774f5eb9c9c9e3f6e302996dfde12c82984e5acfb7f5ab8b

Video tutorial showing Jupyter notebook launch, navigating to cryoDRGN, and running visualization:

  • In this example, I do not have CUDA GPUs on my Mac laptop so I did not generate new 3D reconstructions or visualize particles (since the particles remained on COSMIC2; not my laptop).

Citation:

CryoDRGN: Reconstruction of heterogeneous structures from cryo-electron micrographs using neural networks. Ellen D. Zhong, Tristan Bepler, Bonnie Berger*, Joseph H. Davis*. bioRxiv

Software:

Github link

Questions:

Check out the Github repo and reach out to Ellen Zhong for more details on how to run cryoDRGN.

If you have specific requests for COSMIC² please email us at cosmic2support@umich.edu.