MountainSort Custom "ms4_geoff_pipeline" Processor Draft

Posted by Geoff, Published: 4 years, 11 months ago (Updated: 4 years, 7 months ago)

When using Python code to execute WSL terminal commands, as I've shown throughout this project, a workflow that chains a long sequence of MountainSort processors can be problematic: you have to wait for each step's outputs to be produced before they can be used as inputs to the next step. For example, sorting might require some pre-processing steps. The first step might be to band-pass filter the raw data. We do not need the low-frequency signal, as we just want to isolate the spikes themselves (they generally ride on top of lower-frequency signals). So you might begin by running the ephys.bandpass_filter processor with MountainSort. However, you'll have to wait for the filtered output file before you can proceed with the next step.

You could have all the required steps within a script, and you'd simply have to code a wait function. This shouldn't be too difficult, as you'd pre-define the output filenames for the processor that you are using. However, what happens if an error causes the processor not to finish, so no file is produced? Your script would hang, waiting indefinitely for a file that will never appear.

You could add a stopping threshold Tstop that will exit the script (or proceed with analyzing the next session, as the session file could be corrupt) after a pre-determined amount of time. The problem here is that you'd have to define Tstop, which is arbitrary.
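
As a minimal sketch of the wait-with-timeout approach described above (not what MGP does), the function below polls for an output file and gives up after Tstop seconds. The name wait_for_file and the poll_interval argument are hypothetical, chosen only for illustration.

```python
import os
import time


def wait_for_file(path, t_stop, poll_interval=1.0):
    """Return True once `path` exists, or False if `t_stop` seconds pass first."""
    deadline = time.monotonic() + t_stop
    while time.monotonic() < deadline:
        if os.path.exists(path):
            return True
        time.sleep(poll_interval)
    # One last check so a file that appeared during the final sleep is not missed.
    return os.path.exists(path)
```

A caller could skip to the next session whenever this returns False, but the choice of t_stop remains arbitrary, which is the weakness noted above.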

My solution was to create ms4_geoff_pipeline (MGP). MGP is a single processor that performs all the steps for proper analysis (discussed further down this page). You can therefore simply run this one processor, and it will handle all the aforementioned problems.

MGP Workflow

The above image is my artistic rendition of the workflow that occurs in MGP. It begins by determining the input type. You have three options: raw (--raw), filtered (--filt), or pre-processed (--pre).

Band-Pass Filtering:

When the input is raw, the data still needs to be band-pass filtered. MGP will band-pass the data using the filtering parameters provided by the user (freq_min and freq_max). The band-pass filtering is done using the ephys.bandpass_filter processor. Note: If you provide a filtered or pre-processed input, this step will be skipped.

  • Inputs:
    • Timeseries raw data.
  • Outputs:
    • Filtered timeseries data.
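
For readers running this step on its own, here is a rough sketch of assembling the command from Python. It assumes the MountainLab ml-run-process command line with key:value arguments, and that the processor's input, output, and parameter names are timeseries, timeseries_out, samplerate, freq_min, and freq_max. The helper build_ml_command, the file names, and the numeric values are hypothetical.

```python
def build_ml_command(processor, inputs, outputs, parameters):
    """Assemble an ml-run-process argument list from key/value dicts."""
    def pairs(mapping):
        return [f"{key}:{value}" for key, value in mapping.items()]

    return (
        ["ml-run-process", processor]
        + ["--inputs"] + pairs(inputs)
        + ["--outputs"] + pairs(outputs)
        + ["--parameters"] + pairs(parameters)
    )


# Hypothetical file names and filter settings, for illustration only.
command = build_ml_command(
    "ephys.bandpass_filter",
    inputs={"timeseries": "raw.mda"},
    outputs={"timeseries_out": "filt.mda"},
    parameters={"samplerate": 30000, "freq_min": 300, "freq_max": 6000},
)
print(" ".join(command))
```

The resulting list can be handed to subprocess.run in whatever environment has MountainLab on its PATH.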

Post-Filtering

After the step above, the raw input has reached a point in the MGP where it can be considered filtered. Thus, if you want to re-analyze using MGP, you can pass the output timeseries file from the step above as an input to the processor using the --filt parameter.

Artifact Masking (Optional)

The artifact masking step is optional and occurs after the filtering (it affects raw and filtered inputs; pre-processed inputs will skip this step). If you want to eliminate high-amplitude noise artifacts, likely caused by the rodent's motion, you can do so in this step. It uses the ephys.mask_out_artifacts processor, which I created for MountainLab based on the old pipeline used in MountainSort (version 3). An example of this processor's output is pictured below.

  • Inputs:
    • Timeseries (filtered) data.
  • Outputs:
    • Masked timeseries data.

Spatial Whitening (Optional)

This step is optional and occurs after band-pass filtering and artifact masking. It is applied to both raw and filtered inputs (pre-processed inputs will skip it). It applies a spatial whitening filter to the data, which the MountainLab team suggests is crucial for separating nearby clusters [1]. This filter removes correlations among the channels, so the output will be void of common-mode signal. This portion of the pipeline uses the existing ephys.whiten processor. Note: this will also normalize the data, so your threshold for the sorting should be in units of standard deviations.

Fully Pre-Processed Data

The data has now been fully pre-processed. If you want to re-analyze using the MGP, you can provide the timeseries data produced by the last of the steps listed above as input to the MGP using the --pre parameter.
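
The skipping rules described in the sections above can be summarized in a small sketch. This is only an illustration of the logic as described here, not MGP's actual code. The function name planned_preprocessing and its arguments are hypothetical, while the processor names are the ones mentioned above.

```python
def planned_preprocessing(input_type, mask_artifacts=False, whiten=False):
    """List the pre-processing processors that would run for a given input type."""
    if input_type not in ("raw", "filt", "pre"):
        raise ValueError("input_type must be 'raw', 'filt', or 'pre'")

    steps = []
    # Only raw input still needs band-pass filtering.
    if input_type == "raw":
        steps.append("ephys.bandpass_filter")
    # The optional steps apply to raw and filtered input; pre-processed input skips them.
    if input_type in ("raw", "filt"):
        if mask_artifacts:
            steps.append("ephys.mask_out_artifacts")
        if whiten:
            steps.append("ephys.whiten")
    return steps


print(planned_preprocessing("raw", mask_artifacts=True, whiten=True))
print(planned_preprocessing("filt", whiten=True))
print(planned_preprocessing("pre", mask_artifacts=True, whiten=True))
```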

Sorting

Now that the data has been filtered, masked (optional), and whitened (also optional), it can finally be sorted. If you want to learn about the specifics of the sorting method, feel free to read the Chung et al. paper referenced below [1]. The sorting uses the new MountainSort method (MountainSort-Js / V4), and thus uses the ms4alg.sort processor.

  • Inputs:
    • Timeseries (pre-processed) data.
    • Geometry .json file (optional, determines the geometric arrangement of the electrodes so the algorithm can eliminate duplicate spikes found by neighboring electrodes).
    • Detect sign (whether to detect spikes with positive, negative, or both polarities).
    • Adjacency radius (distance between electrodes).
    • Threshold (value that the data must reach to be considered a spike).
    • Detect interval (minimum number of samples between spikes).
    • Clip size (number of samples to consider as a spike waveform).
    • Number of features (# of features/dimensions to use when sorting).
    • Max number of PCA clips (not all of the spike clips will be considered for PCA; the algorithm will sub-sample based on this number).
    • Number of workers (workers is jargon for multiprocessing: how many processes you want working on the sort simultaneously).
  • Outputs:
    • Cell firing data (file containing the spike times and cell IDs produced by the sorting step).
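
To make the parameter list above concrete, here is a hedged sketch that collects the settings in a dictionary and renders them as key:value tokens. The parameter names are my assumption of how ms4alg.sort spells them and should be checked against the processor's spec. Every value is hypothetical, and as_parameter_tokens is an illustrative helper.

```python
# Assumed ms4alg.sort parameter names; all values are placeholders, not recommendations.
sort_parameters = {
    "detect_sign": 1,               # polarity of spikes to detect
    "adjacency_radius": 50,         # distance between electrodes
    "detect_threshold": 3,          # in standard deviations if the data was whitened
    "detect_interval": 10,          # minimum samples between spikes
    "clip_size": 50,                # samples per spike waveform
    "num_features": 10,             # dimensions used when sorting
    "max_num_clips_for_pca": 1000,  # sub-sampling cap for PCA
    "num_workers": 2,               # simultaneous processes
}


def as_parameter_tokens(parameters):
    """Render a parameter dict as key:value tokens for the command line."""
    return [f"{name}:{value}" for name, value in parameters.items()]


tokens = as_parameter_tokens(sort_parameters)
print(" ".join(["--parameters"] + tokens))
```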

Compute Cluster/Isolation Metrics

After the sorting has finished, we calculate some metrics to determine the quality of the cells discovered within the sorted data. First, the cluster metrics (# spikes, firing rate, first spike time, last spike time, etc.) are computed using the ms3.cluster_metrics processor. Afterwards, the isolation metrics (noise overlap, peak noise, peak amplitude, peak signal-to-noise ratio, and isolation [1]) are calculated using the ms3.isolation_metrics processor. Finally, these two sets of metrics are combined into a single metrics output using the ms3.combine_cluster_metrics processor.

  • Inputs:
    • Timeseries (pre-processed) data.
    • Cell firing data (file containing the spike times and cell IDs produced by the sorting step).
    • Sample rate (samplerate, in Hz).
  • Outputs:
    • A single .json file (metrics_out) containing all the isolation and cluster metrics.

Curate

Now that the metrics have been computed, we can use them to curate the cells. The traditional method of curation (MountainSort V3) simply added a "rejected" tag to the cell, and when saving the firings_out file that contains the cell spike times, any cell with the "rejected" tag would not be saved. However, there may be cases where MountainSort identifies two cells as separate when they should really be merged. In such a case, a cell might have an isolation score that does not meet our requirements because a nearly identical cell exists under a different label. Therefore, the Frank Lab created the franklab_mstaggedcuration pipeline, which tags these cells as "mua" instead. We can then save these cells for further manual sorting (merging).
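
The idea of tagging instead of discarding can be sketched as follows. This is not the Frank Lab implementation. The layout of the metrics dictionary (a "clusters" list whose entries carry "label" and "metrics"), the metric key names, and all threshold values are assumptions made for illustration.

```python
import copy


def tag_clusters(metrics, firing_rate_thresh=0.05, isolation_thresh=0.95,
                 noise_overlap_thresh=0.03, peak_snr_thresh=1.5):
    """Return a copy of `metrics` where clusters failing any threshold are tagged "mua"."""
    tagged = copy.deepcopy(metrics)
    for cluster in tagged["clusters"]:
        values = cluster["metrics"]
        fails = (
            values["firing_rate"] < firing_rate_thresh
            or values["isolation"] < isolation_thresh
            or values["noise_overlap"] > noise_overlap_thresh
            or values["peak_snr"] < peak_snr_thresh
        )
        tags = cluster.setdefault("tags", [])
        if fails and "mua" not in tags:
            tags.append("mua")  # kept for later manual merging, not rejected
    return tagged


# Made-up metrics for two clusters; the second has a low isolation score.
example = {"clusters": [
    {"label": 1, "metrics": {"firing_rate": 2.0, "isolation": 0.99,
                             "noise_overlap": 0.01, "peak_snr": 5.0}},
    {"label": 2, "metrics": {"firing_rate": 2.0, "isolation": 0.80,
                             "noise_overlap": 0.01, "peak_snr": 5.0}},
]}
result = tag_clusters(example)
print([(cluster["label"], cluster["tags"]) for cluster in result["clusters"]])
```

Because tagged cells stay in the output, two nearly identical clusters that each fail the isolation threshold remain available for merging.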

Installing MGP

  1. Identify the location where your MountainLab packages exist (I'll call this $Packages). If you installed MountainLab using the method I described in the Installing MountainSort post, this should be the following file-path: ~/conda/envs/[environment_name]/etc/mountainlab/packages, where [environment_name] is replaced with the name of the environment you created. In my example, the environment name was mlab.
  2. Navigate to $Packages using the following command: 
    cd ~/conda/envs/[environment_name]/etc/mountainlab/packages
  3. This location (from step 2) is where you will install any custom packages, usually using git clone.
    1. The MGP requires that you have the franklab_mstaggedcuration pipeline installed.
      • Use the following command to install the franklab_mstaggedcuration pipeline in your $Packages path.
        git clone https://bitbucket.org/franklab/franklab_mstaggedcuration.git
      • You should now have pyms.add_curation_tags and pyms.merge_burst_parents within the processor list. You can use the following command to double-check:
        ml-list-processors | grep pyms

        You should see a result similar to what is below:

    2. Now we can clone the MGP into the $Packages path using the following command:
      git clone https://github.com/GeoffBarrett/ms4_geoff_pipeline.git
    3. When properly installed, you should have the ms4_geoff.sort processor. Double-check that this processor exists by using the following command:
      ml-list-processors | grep geoff

      • If you do not see the ms4_geoff.sort processor listed, you need to ensure that some of the files have executable permissions by using the following command (from the $Packages file-path):
        chmod a+x ./ms4_geoff_pipeline/ms4_geoff_pipeline/ms4_geoff_spec.py.mp
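
If you drive MountainLab from Python, the double-checks above can also be scripted. The sketch below assumes ml-list-processors prints one processor name per line and is on the PATH of the environment where the script runs. The helper names and the hand-written sample listing are hypothetical.

```python
import subprocess


def matching_processors(listing, keyword):
    """Return the lines of `listing` that contain `keyword` (like grep)."""
    return [line.strip() for line in listing.splitlines() if keyword in line]


def list_processors():
    """Run ml-list-processors and return its text output."""
    result = subprocess.run(
        ["ml-list-processors"], capture_output=True, text=True, check=True
    )
    return result.stdout


# Offline illustration with a hand-written listing; in practice pass list_processors().
sample_listing = "ephys.bandpass_filter\nms4_geoff.sort\npyms.add_curation_tags\n"
print(matching_processors(sample_listing, "geoff"))
```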

References

  1. Chung, J. E., Magland, J. F., Barnett, A. H., Tolosa, V. M., Tooker, A. C., Lee, K. Y., ... Greengard, L. (2017). A Fully Automated Approach to Spike Sorting. Neuron, 95(6), 1381-1394.e6. https://doi.org/10.1016/j.neuron.2017.08.030 https://www.cell.com/neuron/fulltext/S0896-6273(17)30745-6

