Usage

This page details how to use lpath. Follow the Getting Started page to learn how to install lpath.

Introduction

lpath consists of four steps: discretize, extract, match, and plot. Each step takes a number of parameters, all of which are listed on the API page.

  1. The discretize step assigns each frame of your trajectories to a state so that successful transitions can be found.

  2. The extract step takes what’s assigned in discretize and identifies all instances where there is a successful transition.

  3. The match step takes the output of extract and pattern matches the pathways against each other to identify pathway classes. It is possible to reassign states in this step.

  4. The plot step takes the output of match and lets you easily make different plots. LPATHPlot class objects contain many pre-calculated datasets for custom plotting.

There are two ways of running these steps. Because of the large number of parameter options, it is recommended that users start with the Jupyter notebook.

  1. Import each step’s main() function and run everything in an interactive python session (e.g., Jupyter notebook). [RECOMMENDED]

  2. Run through the command line (e.g., lpath discretize -I west.h5 --assign-arguments '--config-from-file --scheme TEST').

Molecular Dynamics

lpath is written to cluster pathways from molecular dynamics simulations. The input file is a NumPy or text file containing the features used to assign states. For example, an alanine dipeptide system would use the phi and psi angles.

Discretize

In this step, we will assign each frame of an MD trajectory to a state based on the phi/psi dihedral angles saved in dihedral.npy. Here is an example assign function (in a file called module.py) used for assigning states:

def assign_dih(input_array):
    """
    This is an example function for mapping a list of features to state IDs. Replace it with a function suited to your own system.

    Parameters
    ----------
    input_array : numpy.ndarray
        An array generated from load_file.

    Returns
    -------
    state_list : list
        A list containing the state ID assigned to each frame.
    """
    state_list = []
    for val in input_array:
        if -180 <= val[0] <= -45 and -55 <= val[1] <= 30:  # Phi/Psi for Alpha Helix
            state_list.append(0)
        elif 165 <= val[0] <= 180 and -55 <= val[1] <= 30:  # Alpha Helix, phi wrapped across the periodic boundary
            state_list.append(0)
        elif -170 <= val[0] <= -55 and 40 <= val[1] <= 100:  # Phi/Psi for C7eq
            state_list.append(1)
        elif 25 <= val[0] <= 90 and -55 <= val[1] <= 0:  # Phi/Psi for C7ax
            state_list.append(2)
        else:  # Everything outside the states defined above
            state_list.append(3)

    return state_list
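
As a quick sanity check of the state definitions, the same mapping can be sketched in a table-driven form and run on a few hand-picked phi/psi pairs. This is only an illustrative variant: the names STATE_BOXES, OTHER_STATE, and assign_dih_table are hypothetical and not part of lpath, and plain tuples stand in for the rows of the NumPy array.

```python
# Hypothetical table-driven variant of the example assign function.
# Each box is (phi_min, phi_max, psi_min, psi_max, state_id).
STATE_BOXES = [
    (-180, -45, -55, 30, 0),   # alpha helix
    (165, 180, -55, 30, 0),    # alpha helix, phi wrapped across the periodic boundary
    (-170, -55, 40, 100, 1),   # C7eq
    (25, 90, -55, 0, 2),       # C7ax
]
OTHER_STATE = 3  # anything outside the boxes above


def assign_dih_table(input_array):
    """Map each (phi, psi) row to a state ID; the first matching box wins."""
    state_list = []
    for row in input_array:
        phi, psi = row[0], row[1]
        for phi_min, phi_max, psi_min, psi_max, state in STATE_BOXES:
            if phi_min <= phi <= phi_max and psi_min <= psi <= psi_max:
                state_list.append(state)
                break
        else:
            # No box matched this frame.
            state_list.append(OTHER_STATE)
    return state_list


# Hand-picked test frames, one per branch of the mapping.
sample = [(-60.0, -40.0), (170.0, 0.0), (-80.0, 70.0), (60.0, -30.0), (0.0, 150.0)]
print(assign_dih_table(sample))  # [0, 0, 1, 2, 3]
```

Checking a handful of known conformations this way before running discretize makes it easier to catch a mistyped boundary.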

We will monkey-patch this function into lpath.

  1. From the command line, run the following:

    lpath discretize -I dihedral.npy -O states.npy -af module.assign_dih --stride 100
    
  2. This reads in dihedral.npy with a stride of 100, so only every 100th frame is kept. The smaller dataset is then discretized with module.assign_dih, producing a states.npy file to be used in the extract step.
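
To get a feel for how much data --stride 100 leaves behind, the effect can be sketched with ordinary slicing. This assumes the stride keeps every 100th frame starting from the first one, which is the usual slicing convention; the exact offset lpath uses is not stated here, and apply_stride is a hypothetical helper.

```python
def apply_stride(frames, stride):
    """Keep every `stride`-th frame, starting with the first (assumed convention)."""
    return frames[::stride]


# Hypothetical trajectory of 1000 frames, each a (phi, psi) pair.
frames = [(float(i), float(-i)) for i in range(1000)]
strided = apply_stride(frames, 100)

print(len(frames), "->", len(strided))  # 1000 -> 10
print(strided[1])                       # (100.0, -100.0)
```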

Extract

In this step, we will identify any successful transitions in the trajectory. We will be looking at the C7eq to C7ax transition. Since the data was already read in every 100 frames with --stride 100 in discretize, we do not need to use --stride again. If you want to include extra features for reassignment later on (i.e., in match), use the --feature-stride 100 option.

  1. From the command line, run the following:

    lpath extract --extract-input states.npy --extract-output pathways.pickle --source-state 1 --target-state 2
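
Conceptually, a successful transition is a stretch of the discretized trajectory that leaves the source state and reaches the target state without returning to the source in between. The following is a minimal sketch of that idea on a list of state IDs. It is not lpath's implementation, and find_transitions is a hypothetical name.

```python
def find_transitions(states, source, target):
    """Return (start, end) frame indices of source -> target transitions.

    `start` is the last frame spent in the source state before the
    transition and `end` is the first frame in the target state.
    """
    transitions = []
    last_source = None  # most recent frame seen in the source state
    for i, state in enumerate(states):
        if state == source:
            last_source = i
        elif state == target and last_source is not None:
            transitions.append((last_source, i))
            last_source = None  # must revisit the source before counting again
    return transitions


# Hypothetical discretized trajectory using the state IDs from the example above
# (1 = C7eq, 2 = C7ax, 0 and 3 = everything else).
states = [1, 1, 3, 3, 2, 2, 3, 1, 3, 1, 3, 2, 0, 2]
print(find_transitions(states, source=1, target=2))  # [(1, 4), (9, 11)]
```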
    

Match

In this step, we will pattern match any successful transitions we’ve identified in extract. We will, again, be looking at the C7eq to C7ax transition.

  1. From the command line, run the following:

    lpath match --input-pickle succ_traj/pathways.pickle --cluster-labels-output succ_traj/cluster_labels.npy
    
  2. After the comparison process is completed, it should show you the dendrogram. Closing the figure should trigger prompts to guide you further.

  3. Enter y if you think the threshold (the horizontal line that determines how many clusters there are) should be a different value. Otherwise, enter n and tell the program how many clusters you want at the end.
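
The relationship between the threshold line and the number of clusters can be sketched as follows. In a dendrogram, every merge happens at some height, and cutting at a threshold undoes all merges above it, each of which adds one cluster. The merge heights below are made up for illustration, the sketch assumes a linkage whose merge heights never decrease, and clusters_at_threshold is a hypothetical helper rather than an lpath function.

```python
def clusters_at_threshold(merge_heights, threshold):
    """Count the clusters left after cutting a dendrogram at `threshold`.

    With every merge applied there is one cluster; each merge whose height
    lies above the cut is undone and contributes one extra cluster.
    """
    return 1 + sum(1 for height in merge_heights if height > threshold)


# Hypothetical merge heights for six pathways (five merges).
merge_heights = [0.10, 0.15, 0.40, 0.75, 0.90]

for threshold in (0.2, 0.5, 0.8):
    print(threshold, clusters_at_threshold(merge_heights, threshold))
# 0.2 4
# 0.5 3
# 0.8 2
```

Lowering the threshold therefore splits the pathways into more, smaller classes, while raising it merges them into fewer.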

Plot

[UNDER CONSTRUCTION]

Weighted Ensemble Simulations

lpath is written to cluster pathways generated by the WESTPA software suite. Make sure WESTPA is installed. See the Getting Started page for more information.

Discretize

We will use WESTPA’s w_assign tool to assign to states. See the tool’s wiki page and Sphinx documentation for more information about the tool.

We will discretize a multi.h5 file (generated with w_multi_west --ibstates) with w_assign, using the TEST scheme defined in west.cfg.

  1. Run the following in the command line to run w_assign:

    lpath discretize -we -W multi.h5 -A ANALYSIS/TEST/assign.h5 \
        --assign-args "-W multi.h5 -r west.cfg --config-from-file --scheme TEST"
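
The quotes around the --assign-args value matter: they make the shell hand the whole w_assign argument string over as a single token. The sketch below uses Python's shlex module to show how a POSIX shell would tokenize the command. Splitting the inner string a second time is only an assumption about how such a string could be turned into individual w_assign arguments, not a description of lpath's internals.

```python
import shlex

command = (
    'lpath discretize -we -W multi.h5 -A ANALYSIS/TEST/assign.h5 '
    '--assign-args "-W multi.h5 -r west.cfg --config-from-file --scheme TEST"'
)

# Outer split: the quoted string stays together as one token.
outer = shlex.split(command)
print(outer[-2:])
# ['--assign-args', '-W multi.h5 -r west.cfg --config-from-file --scheme TEST']

# Inner split (assumed): the individual arguments meant for w_assign.
inner = shlex.split(outer[-1])
print(inner)
# ['-W', 'multi.h5', '-r', 'west.cfg', '--config-from-file', '--scheme', 'TEST']
```

Without the quotes, -W and --scheme would be parsed as options of lpath discretize itself rather than being passed through.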
    

Extract

In this step, we will identify any successful transitions in the trajectory. We will be looking at the C7eq to C7ax transition. If you plan to compare using segment IDs in the next step (not recommended for simulations combined with w_multi_west), or want to include the waiting time (time spent in the source state) in the pattern matching, make sure you turn on --trace-basis to trace all the way back to the basis state. Note that this significantly increases the time required to extract all successful trajectories.

  1. From the command line, run the following:

    lpath extract -we -W multi.h5 -A ANALYSIS/TEST/assign.h5 --source-state 1 \
        --target-state 2 --extract-output output.pickle --out-dir succ_traj
    

Match

In this step, we will pattern match any successful transitions we’ve identified in extract. We will, again, be looking at the C7eq to C7ax transition. This will do the pattern matching and output individual h5 files for each cluster.

  1. From the command line, run the following:

    lpath match -we --input-pickle succ_traj/output.pickle --cluster-labels-output succ_traj/cluster_labels.npy \
        --export-h5 --file-pattern "west_succ_c{}.h5"
    
  2. After the comparison process is completed, it should show you the dendrogram. Closing the figure should trigger prompts to guide you further.

  3. Enter y if you think the threshold (the horizontal line that determines how many clusters there are) should be a different value. Otherwise, enter n and tell the program how many clusters you want at the end.

To run the pattern matching comparison between segment IDs, you will have to use the longest common substring algorithm via the --substring option (by default, the longest common subsequence algorithm is used):

    lpath match -we --input-pickle succ_traj/output.pickle --cluster-labels-output succ_traj/cluster_labels.npy \
        --export-h5 --file-pattern "west_succ_c{}.h5" --reassign-function "reassign_segid" --substring
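
The difference between the two algorithms can be illustrated on a pair of short sequences. A common subsequence may skip elements as long as the order is preserved, whereas a common substring must be contiguous in both sequences. The functions below are a textbook dynamic-programming sketch with hypothetical names, not lpath's own code, and the two pathways are made up.

```python
def longest_common_subsequence_length(a, b):
    """Length of the longest common subsequence (gaps allowed)."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            if x == y:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def longest_common_substring_length(a, b):
    """Length of the longest common substring (contiguous run)."""
    best = 0
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            # A run only continues if the previous elements also matched.
            run = prev[j - 1] + 1 if x == y else 0
            curr.append(run)
            best = max(best, run)
        prev = curr
    return best


# Two hypothetical pathways written as sequences of IDs.
path_a = [1, 3, 0, 3, 2]
path_b = [1, 0, 3, 3, 2]

print(longest_common_subsequence_length(path_a, path_b))  # 4
print(longest_common_substring_length(path_a, path_b))    # 2
```

The substring measure is the stricter of the two, since a single mismatch breaks the run instead of merely being skipped.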

Plot

[UNDER CONSTRUCTION]