|
| 1 | +Automatic labeling units after spike sorting |
| 2 | +============================================ |
| 3 | + |
| 4 | +This example shows how to automatically label units after spike sorting, |
| 5 | +using three different approaches: |
| 6 | + |
| 7 | +1. Simple filter based on quality metrics |
| 8 | +2. Bombcell: heuristic approach to label units based on quality and |
| 9 | + template metrics [Fabre]_ |
| 10 | +3. UnitRefine: pre-trained classifiers to label units as noise or |
| 11 | + SUA/MUA [Jain]_ |
| 12 | + |
| 13 | +.. code:: ipython3 |
| 14 | +
|
| 15 | + import numpy as np |
| 16 | +
|
| 17 | + import spikeinterface as si |
| 18 | + import spikeinterface.curation as sc |
| 19 | + import spikeinterface.widgets as sw |
| 20 | +
|
| 21 | + from pprint import pprint |
| 22 | +
|
| 23 | +.. code:: ipython3 |
| 24 | +
|
| 25 | + %matplotlib inline |
| 26 | +
|
| 27 | +.. code:: ipython3 |
| 28 | +
|
| 29 | + analyzer_path = "/ssd980/working/analyzer_np2_single_shank.zarr" |
| 30 | +
|
| 31 | +.. code:: ipython3 |
| 32 | +
|
| 33 | + sorting_analyzer = si.load(analyzer_path) |
| 34 | +
|
| 35 | +
|
| 36 | +.. code:: ipython3 |
| 37 | +
|
| 38 | + sorting_analyzer |
| 39 | +
|
| 40 | +
|
| 41 | +.. parsed-literal:: |
| 42 | +
|
| 43 | + SortingAnalyzer: 96 channels - 142 units - 1 segments - zarr - sparse - has recording |
| 44 | + Loaded 14 extensions: amplitude_scalings, correlograms, isi_histograms, noise_levels, principal_components, quality_metrics, random_spikes, spike_amplitudes, spike_locations, template_metrics, template_similarity, templates, unit_locations, waveforms |
| 45 | +
|
| 46 | +
|
| 47 | +
|
| 48 | +The ``SortingAnalyzer`` includes several metrics that we can use for |
| 49 | +curation: |
| 50 | + |
| 51 | +.. code:: ipython3 |
| 52 | +
|
| 53 | + sorting_analyzer.get_metrics_extension_data().columns |
| 54 | +
|
| 55 | +
|
| 56 | +
|
| 57 | +
|
| 58 | +.. parsed-literal:: |
| 59 | +
|
| 60 | + Index(['amplitude_cutoff', 'amplitude_cv_median', 'amplitude_cv_range', |
| 61 | + 'amplitude_median', 'd_prime', 'drift_mad', 'drift_ptp', 'drift_std', |
| 62 | + 'firing_range', 'firing_rate', 'isi_violations_count', |
| 63 | + 'isi_violations_ratio', 'isolation_distance', 'l_ratio', 'nn_hit_rate', |
| 64 | + 'nn_miss_rate', 'noise_cutoff', 'noise_ratio', 'num_spikes', |
| 65 | + 'presence_ratio', 'rp_contamination', 'rp_violations', 'sd_ratio', |
| 66 | + 'silhouette', 'sliding_rp_violation', 'snr', 'sync_spike_2', |
| 67 | + 'sync_spike_4', 'sync_spike_8', 'exp_decay', 'half_width', |
| 68 | + 'main_peak_to_trough_ratio', 'main_to_next_extremum_duration', |
| 69 | + 'num_negative_peaks', 'num_positive_peaks', |
| 70 | + 'peak_after_to_trough_ratio', 'peak_after_width', |
| 71 | + 'peak_before_to_peak_after_ratio', 'peak_before_to_trough_ratio', |
| 72 | + 'peak_before_width', 'peak_to_trough_duration', 'recovery_slope', |
| 73 | + 'repolarization_slope', 'spread', 'trough_width', 'velocity_above', |
| 74 | + 'velocity_below', 'waveform_baseline_flatness'], |
| 75 | + dtype='object') |
| 76 | +
|
| 77 | +
|
| 78 | +
|
| 79 | +1. Quality-metrics based curation |
| 80 | +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ |
| 81 | + |
| 82 | +A simple solution is to use a filter based on quality metrics. To do so, |
| 83 | +we can use the ``spikeinterface.curation.qualitymetrics_label_units`` |
| 84 | +function and provide a set of thresholds. |
| 85 | + |
| 86 | +.. code:: ipython3 |
| 87 | +
|
| 88 | + qm_thresholds = { |
| 89 | + "snr": {"min": 5}, |
| 90 | + "firing_rate": {"min": 0.1, "max": 200}, |
| 91 | + "rp_contamination": {"max": 0.5} |
| 92 | + } |
| 93 | +
|
| 94 | +.. code:: ipython3 |
| 95 | +
|
| 96 | + qm_labels = sc.qualitymetrics_label_units(sorting_analyzer, thresholds=qm_thresholds) |
| 97 | +
|
| 98 | +.. code:: ipython3 |
| 99 | +
|
| 100 | + qm_labels["label"].value_counts() |
| 101 | +
|
| 102 | +
|
| 103 | +
|
| 104 | +
|
| 105 | +.. parsed-literal:: |
| 106 | +
|
| 107 | + label |
| 108 | + noise 115 |
| 109 | + good 27 |
| 110 | + Name: count, dtype: int64 |
| 111 | +
|
| 112 | +
|
| 113 | +
|
| 114 | +.. code:: ipython3 |
| 115 | +
|
| 116 | + w = sw.plot_unit_labels(sorting_analyzer, qm_labels["label"], ylims=(-300, 100)) |
| 117 | + w.figure.suptitle("Quality-metrics labeling") |
| 118 | +
|
| 119 | +
|
| 120 | +
|
| 121 | +.. image:: auto_label_units_files/auto_label_units_12_1.png |
| 122 | + |
| 123 | + |
| 124 | +Only 27 units are labeled as *good*, and we can see from the plots that |
| 125 | +some “noisy” waveforms are not properly flagged and some visually good |
| 126 | +waveforms are labeled as noise. Let’s take a look at more powerful |
| 127 | +methods. |
| 128 | + |
| 129 | +1. Bombcell |
| 130 | +----------- |
| 131 | + |
| 132 | +**Bombcell** ([Fabre]_) is another threshold-based method that also uses |
| 133 | +quality metrics and template metrics, but in a much more refined way! It |
| 134 | +can label units as ``noise``, ``mua``, and ``good`` and further detect |
| 135 | +``non-soma`` units. It comes with some default thresholds, but |
| 136 | +user-defined thresholds can be provided from a dictionary or a JSON |
| 137 | +file. |
| 138 | + |
| 139 | +.. code:: ipython3 |
| 140 | +
|
| 141 | + bombcell_default_thresholds = sc.bombcell_get_default_thresholds() |
| 142 | + pprint(bombcell_default_thresholds) |
| 143 | +
|
| 144 | +
|
| 145 | +.. parsed-literal:: |
| 146 | +
|
| 147 | + {'amplitude_cutoff': {'max': 0.2, 'min': None}, |
| 148 | + 'amplitude_median': {'max': None, 'min': 40}, |
| 149 | + 'drift_ptp': {'max': 100, 'min': None}, |
| 150 | + 'exp_decay': {'max': 0.1, 'min': 0.01}, |
| 151 | + 'main_peak_to_trough_ratio': {'max': 0.8, 'min': None}, |
| 152 | + 'num_negative_peaks': {'max': 1, 'min': None}, |
| 153 | + 'num_positive_peaks': {'max': 2, 'min': None}, |
| 154 | + 'num_spikes': {'max': None, 'min': 300}, |
| 155 | + 'peak_after_to_trough_ratio': {'max': 0.8, 'min': None}, |
| 156 | + 'peak_before_to_peak_after_ratio': {'max': 3, 'min': None}, |
| 157 | + 'peak_before_to_trough_ratio': {'max': 3, 'min': None}, |
| 158 | + 'peak_before_width': {'max': None, 'min': 0.00015}, |
| 159 | + 'peak_to_trough_duration': {'max': 0.00115, 'min': 0.0001}, |
| 160 | + 'presence_ratio': {'max': None, 'min': 0.7}, |
| 161 | + 'rp_contamination': {'max': 0.1, 'min': None}, |
| 162 | + 'snr_baseline': {'max': None, 'min': 5}, |
| 163 | + 'trough_width': {'max': None, 'min': 0.0002}, |
| 164 | + 'waveform_baseline_flatness': {'max': 0.5, 'min': None}} |
| 165 | +
|
| 166 | +
|
| 167 | +.. code:: ipython3 |
| 168 | +
|
| 169 | + bombcell_labels = sc.bombcell_label_units(sorting_analyzer, thresholds=bombcell_default_thresholds) |
| 170 | +
|
| 171 | +.. code:: ipython3 |
| 172 | +
|
| 173 | + bombcell_labels["label"].value_counts() |
| 174 | +
|
| 175 | +
|
| 176 | +
|
| 177 | +
|
| 178 | +.. parsed-literal:: |
| 179 | +
|
| 180 | + label |
| 181 | + good 58 |
| 182 | + noise 50 |
| 183 | + mua 33 |
| 184 | + non_soma 1 |
| 185 | + Name: count, dtype: int64 |
| 186 | +
|
| 187 | +
|
| 188 | +
|
| 189 | +.. code:: ipython3 |
| 190 | +
|
| 191 | + w = sw.plot_unit_labels(sorting_analyzer, bombcell_labels["label"], ylims=(-300, 100)) |
| 192 | + w.figure.suptitle("Bombcell labeling") |
| 193 | +
|
| 194 | +
|
| 195 | +
|
| 196 | +.. image:: auto_label_units_files/auto_label_units_18_1.png |
| 197 | + |
| 198 | + |
| 199 | +UnitRefine |
| 200 | +---------- |
| 201 | + |
| 202 | +**UnitRefine** ([Jain]_) also uses quality and template metrics, but in |
| 203 | +a different way. It uses pre-trained classifiers to trained on |
| 204 | +hand-curated data. By default, the classification is performed in two |
| 205 | +steps: first a *noise*/*neural* classifier is applied, followed by a |
| 206 | +*sua*/*mua* classifier. Several models are available on the |
| 207 | +`SpikeInterface HuggingFace |
| 208 | +page <https://huggingface.co/SpikeInterface>`__. |
| 209 | + |
| 210 | +.. code:: ipython3 |
| 211 | +
|
| 212 | + unitrefine_labels = sc.unitrefine_label_units( |
| 213 | + sorting_analyzer, |
| 214 | + noise_neural_classifier="SpikeInterface/UnitRefine_noise_neural_classifier", |
| 215 | + sua_mua_classifier="SpikeInterface/UnitRefine_sua_mua_classifier", |
| 216 | + ) |
| 217 | +
|
| 218 | +.. code:: ipython3 |
| 219 | +
|
| 220 | + unitrefine_labels["label"].value_counts() |
| 221 | +
|
| 222 | +
|
| 223 | +
|
| 224 | +
|
| 225 | +.. parsed-literal:: |
| 226 | +
|
| 227 | + label |
| 228 | + sua 62 |
| 229 | + noise 47 |
| 230 | + mua 33 |
| 231 | + Name: count, dtype: int64 |
| 232 | +
|
| 233 | +
|
| 234 | +
|
| 235 | +.. code:: ipython3 |
| 236 | +
|
| 237 | + w = sw.plot_unit_labels(sorting_analyzer, unitrefine_labels["label"], ylims=(-300, 100)) |
| 238 | + w.figure.suptitle("UnitRefine labeling") |
| 239 | +
|
| 240 | +
|
| 241 | +
|
| 242 | +.. image:: auto_label_units_files/auto_label_units_22_1.png |
| 243 | + |
| 244 | + |
| 245 | + **NOTE:** If you want to train your own models, see the `UnitRefine |
| 246 | + repo <%60https://github.com/anoushkajain/UnitRefine%60>`__ for |
| 247 | + instructions! |
| 248 | + |
| 249 | +This “How To” demonstrated how to automatically label units after spike |
| 250 | +sorting with different strategies. We recommend running **Bombcell** and |
| 251 | +**UnitRefine** as part of your pipeline. These methods will facilitate |
| 252 | +further curation and make downstream analysis cleaner. |
| 253 | + |
| 254 | +To remove units from your ``SortingAnalyzer``, you can simply use the |
| 255 | +``select_units`` function: |
| 256 | + |
| 257 | +.. code:: ipython3 |
| 258 | +
|
| 259 | + non_noisy_units = bombcell_labels["label"] != "noise" |
| 260 | + sorting_analyzer_clean = sorting_analyzer.select_units(sorting_analyzer.unit_ids[non_noisy_units]) |
| 261 | +
|
| 262 | +.. code:: ipython3 |
| 263 | +
|
| 264 | + sorting_analyzer_clean |
| 265 | +
|
| 266 | +
|
| 267 | +
|
| 268 | +
|
| 269 | +.. parsed-literal:: |
| 270 | +
|
| 271 | + SortingAnalyzer: 96 channels - 92 units - 1 segments - memory - sparse - has recording |
| 272 | + Loaded 14 extensions: random_spikes, waveforms, templates, amplitude_scalings, correlograms, isi_histograms, noise_levels, principal_components, spike_locations, spike_amplitudes, quality_metrics, template_metrics, template_similarity, unit_locations |
0 commit comments