Commit 4c9d32d: bombcell returns dataframe and docs
1 parent 42d9945 commit 4c9d32d

13 files changed: +420 -57 lines changed

.gitignore

Lines changed: 1 addition & 0 deletions

@@ -141,6 +141,7 @@ examples/modules_gallery/**/*.zarr

 # Files and folders generated during tests
 test_folder/
+*trained_pipeline/

 # Mac OS
 .DS_Store

doc/how_to/auto_label_units.rst

Lines changed: 272 additions & 0 deletions

@@ -0,0 +1,272 @@

Automatic labeling of units after spike sorting
===============================================

This example shows how to automatically label units after spike sorting,
using three different approaches:

1. Simple filtering based on quality metrics
2. Bombcell: a heuristic approach that labels units based on quality and
   template metrics [Fabre]_
3. UnitRefine: pre-trained classifiers that label units as noise or
   SUA/MUA [Jain]_

.. code:: ipython3

    import numpy as np

    import spikeinterface as si
    import spikeinterface.curation as sc
    import spikeinterface.widgets as sw

    from pprint import pprint

.. code:: ipython3

    %matplotlib inline

.. code:: ipython3

    analyzer_path = "/ssd980/working/analyzer_np2_single_shank.zarr"

.. code:: ipython3

    sorting_analyzer = si.load(analyzer_path)

.. code:: ipython3

    sorting_analyzer

.. parsed-literal::

    SortingAnalyzer: 96 channels - 142 units - 1 segments - zarr - sparse - has recording
    Loaded 14 extensions: amplitude_scalings, correlograms, isi_histograms, noise_levels, principal_components, quality_metrics, random_spikes, spike_amplitudes, spike_locations, template_metrics, template_similarity, templates, unit_locations, waveforms

The ``SortingAnalyzer`` includes several metrics that we can use for
curation:

.. code:: ipython3

    sorting_analyzer.get_metrics_extension_data().columns

.. parsed-literal::

    Index(['amplitude_cutoff', 'amplitude_cv_median', 'amplitude_cv_range',
           'amplitude_median', 'd_prime', 'drift_mad', 'drift_ptp', 'drift_std',
           'firing_range', 'firing_rate', 'isi_violations_count',
           'isi_violations_ratio', 'isolation_distance', 'l_ratio', 'nn_hit_rate',
           'nn_miss_rate', 'noise_cutoff', 'noise_ratio', 'num_spikes',
           'presence_ratio', 'rp_contamination', 'rp_violations', 'sd_ratio',
           'silhouette', 'sliding_rp_violation', 'snr', 'sync_spike_2',
           'sync_spike_4', 'sync_spike_8', 'exp_decay', 'half_width',
           'main_peak_to_trough_ratio', 'main_to_next_extremum_duration',
           'num_negative_peaks', 'num_positive_peaks',
           'peak_after_to_trough_ratio', 'peak_after_width',
           'peak_before_to_peak_after_ratio', 'peak_before_to_trough_ratio',
           'peak_before_width', 'peak_to_trough_duration', 'recovery_slope',
           'repolarization_slope', 'spread', 'trough_width', 'velocity_above',
           'velocity_below', 'waveform_baseline_flatness'],
          dtype='object')

1. Quality-metrics-based curation
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

A simple solution is to filter units based on quality metrics. To do so,
we can use the ``spikeinterface.curation.qualitymetrics_label_units``
function and provide a set of thresholds.

.. code:: ipython3

    qm_thresholds = {
        "snr": {"min": 5},
        "firing_rate": {"min": 0.1, "max": 200},
        "rp_contamination": {"max": 0.5}
    }
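Conceptually, each entry in ``qm_thresholds`` defines a min/max window for one metric, and a unit is kept only if every metric falls inside its window. Here is a minimal, self-contained pandas sketch of that logic with toy values; it illustrates the idea, and is not the actual ``qualitymetrics_label_units`` implementation:

```python
import pandas as pd

# Toy metrics table: three units with made-up values
metrics = pd.DataFrame(
    {"snr": [8.0, 2.0, 12.0], "firing_rate": [1.5, 0.05, 300.0]},
    index=["unit0", "unit1", "unit2"],
)

thresholds = {"snr": {"min": 5}, "firing_rate": {"min": 0.1, "max": 200}}

# A unit passes only if every metric respects its min/max bounds
passes = pd.Series(True, index=metrics.index)
for metric, bounds in thresholds.items():
    if bounds.get("min") is not None:
        passes &= metrics[metric] >= bounds["min"]
    if bounds.get("max") is not None:
        passes &= metrics[metric] <= bounds["max"]

labels = passes.map({True: "good", False: "noise"})
print(labels.to_dict())  # {'unit0': 'good', 'unit1': 'noise', 'unit2': 'noise'}
```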
.. code:: ipython3

    qm_labels = sc.qualitymetrics_label_units(sorting_analyzer, thresholds=qm_thresholds)

.. code:: ipython3

    qm_labels["label"].value_counts()

.. parsed-literal::

    label
    noise    115
    good      27
    Name: count, dtype: int64

.. code:: ipython3

    w = sw.plot_unit_labels(sorting_analyzer, qm_labels["label"], ylims=(-300, 100))
    w.figure.suptitle("Quality-metrics labeling")

.. image:: auto_label_units_files/auto_label_units_12_1.png

Only 27 units are labeled as *good*, and we can see from the plots that
some "noisy" waveforms are not properly flagged and some visually good
waveforms are labeled as noise. Let's take a look at more powerful
methods.
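Before moving on, it can help to know which criterion did most of the rejecting. A self-contained sketch with toy metric values (with real data you would start from ``sorting_analyzer.get_metrics_extension_data()``):

```python
import pandas as pd

# Toy metrics for four units (illustrative values only)
metrics = pd.DataFrame({
    "snr": [8.0, 3.0, 2.0, 10.0],
    "firing_rate": [1.0, 0.05, 2.0, 500.0],
    "rp_contamination": [0.01, 0.02, 0.7, 0.03],
})

qm_thresholds = {
    "snr": {"min": 5},
    "firing_rate": {"min": 0.1, "max": 200},
    "rp_contamination": {"max": 0.5},
}

# Count, per metric, how many units fall outside the allowed window
fail_counts = {}
for metric, bounds in qm_thresholds.items():
    fail = pd.Series(False, index=metrics.index)
    if bounds.get("min") is not None:
        fail |= metrics[metric] < bounds["min"]
    if bounds.get("max") is not None:
        fail |= metrics[metric] > bounds["max"]
    fail_counts[metric] = int(fail.sum())

print(fail_counts)  # {'snr': 2, 'firing_rate': 2, 'rp_contamination': 1}
```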
2. Bombcell
~~~~~~~~~~~

**Bombcell** ([Fabre]_) is another threshold-based method that also uses
quality metrics and template metrics, but in a much more refined way! It
can label units as ``noise``, ``mua``, and ``good``, and it can further
detect ``non-soma`` units. It comes with default thresholds, but
user-defined thresholds can be provided as a dictionary or a JSON file.

.. code:: ipython3

    bombcell_default_thresholds = sc.bombcell_get_default_thresholds()
    pprint(bombcell_default_thresholds)

.. parsed-literal::

    {'amplitude_cutoff': {'max': 0.2, 'min': None},
     'amplitude_median': {'max': None, 'min': 40},
     'drift_ptp': {'max': 100, 'min': None},
     'exp_decay': {'max': 0.1, 'min': 0.01},
     'main_peak_to_trough_ratio': {'max': 0.8, 'min': None},
     'num_negative_peaks': {'max': 1, 'min': None},
     'num_positive_peaks': {'max': 2, 'min': None},
     'num_spikes': {'max': None, 'min': 300},
     'peak_after_to_trough_ratio': {'max': 0.8, 'min': None},
     'peak_before_to_peak_after_ratio': {'max': 3, 'min': None},
     'peak_before_to_trough_ratio': {'max': 3, 'min': None},
     'peak_before_width': {'max': None, 'min': 0.00015},
     'peak_to_trough_duration': {'max': 0.00115, 'min': 0.0001},
     'presence_ratio': {'max': None, 'min': 0.7},
     'rp_contamination': {'max': 0.1, 'min': None},
     'snr_baseline': {'max': None, 'min': 5},
     'trough_width': {'max': None, 'min': 0.0002},
     'waveform_baseline_flatness': {'max': 0.5, 'min': None}}
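Because the thresholds are a plain dictionary, you can tweak a few entries and save them to a JSON file for reuse; ``None`` bounds survive the round-trip as JSON ``null``. The file name below is just an example:

```python
import json

# Tweak a couple of entries (values are illustrative, not recommendations)
my_thresholds = {
    "num_spikes": {"max": None, "min": 500},         # stricter than the default 300
    "rp_contamination": {"max": 0.05, "min": None},  # stricter than the default 0.1
}

# Save to a JSON file ...
with open("my_bombcell_thresholds.json", "w") as f:
    json.dump(my_thresholds, f, indent=2)

# ... and load it back later
with open("my_bombcell_thresholds.json") as f:
    loaded = json.load(f)

print(loaded == my_thresholds)  # True: None round-trips as JSON null
```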
.. code:: ipython3

    bombcell_labels = sc.bombcell_label_units(sorting_analyzer, thresholds=bombcell_default_thresholds)

.. code:: ipython3

    bombcell_labels["label"].value_counts()

.. parsed-literal::

    label
    good        58
    noise       50
    mua         33
    non_soma     1
    Name: count, dtype: int64

.. code:: ipython3

    w = sw.plot_unit_labels(sorting_analyzer, bombcell_labels["label"], ylims=(-300, 100))
    w.figure.suptitle("Bombcell labeling")

.. image:: auto_label_units_files/auto_label_units_18_1.png

3. UnitRefine
~~~~~~~~~~~~~

**UnitRefine** ([Jain]_) also uses quality and template metrics, but in
a different way: it uses pre-trained classifiers trained on
hand-curated data. By default, the classification is performed in two
steps: first a *noise*/*neural* classifier is applied, followed by a
*sua*/*mua* classifier. Several models are available on the
`SpikeInterface HuggingFace page <https://huggingface.co/SpikeInterface>`__.

.. code:: ipython3

    unitrefine_labels = sc.unitrefine_label_units(
        sorting_analyzer,
        noise_neural_classifier="SpikeInterface/UnitRefine_noise_neural_classifier",
        sua_mua_classifier="SpikeInterface/UnitRefine_sua_mua_classifier",
    )

.. code:: ipython3

    unitrefine_labels["label"].value_counts()

.. parsed-literal::

    label
    sua      62
    noise    47
    mua      33
    Name: count, dtype: int64

.. code:: ipython3

    w = sw.plot_unit_labels(sorting_analyzer, unitrefine_labels["label"], ylims=(-300, 100))
    w.figure.suptitle("UnitRefine labeling")

.. image:: auto_label_units_files/auto_label_units_22_1.png
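Because each labeling function returns a DataFrame with a ``label`` column, the outputs of two methods are easy to cross-tabulate to see where they disagree. A toy sketch with made-up labels (with real data you would pass e.g. ``bombcell_labels["label"]`` and ``unitrefine_labels["label"]``):

```python
import pandas as pd

# Made-up labels from two methods for the same five units
labels_a = pd.Series(["good", "noise", "mua", "good", "noise"], name="method_a")
labels_b = pd.Series(["sua", "noise", "mua", "noise", "noise"], name="method_b")

# Contingency table of method_a labels vs method_b labels
agreement = pd.crosstab(labels_a, labels_b)
print(agreement)

# Units on which the two methods disagree about the noise call
disagree = labels_a[(labels_a == "noise") != (labels_b == "noise")]
print(disagree.index.tolist())  # [3]
```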
**NOTE:** If you want to train your own models, see the `UnitRefine
repo <https://github.com/anoushkajain/UnitRefine>`__ for instructions!

This "How To" demonstrated how to automatically label units after spike
sorting with different strategies. We recommend running **Bombcell** and
**UnitRefine** as part of your pipeline. These methods will facilitate
further curation and make downstream analysis cleaner.
To remove units from your ``SortingAnalyzer``, you can simply use the
``select_units`` function:

.. code:: ipython3

    non_noisy_units = bombcell_labels["label"] != "noise"
    sorting_analyzer_clean = sorting_analyzer.select_units(sorting_analyzer.unit_ids[non_noisy_units])

.. code:: ipython3

    sorting_analyzer_clean

.. parsed-literal::

    SortingAnalyzer: 96 channels - 92 units - 1 segments - memory - sparse - has recording
    Loaded 14 extensions: random_spikes, waveforms, templates, amplitude_scalings, correlograms, isi_histograms, noise_levels, principal_components, spike_locations, spike_amplitudes, quality_metrics, template_metrics, template_similarity, unit_locations

doc/how_to/index.rst

Lines changed: 1 addition & 0 deletions

@@ -21,6 +21,7 @@ Guides on how to solve specific, short problems in SpikeInterface. Learn how to.
 load_matlab_data
 load_your_data_into_sorting
 benchmark_with_hybrid_recordings
+auto_label_units
 auto_curation_training
 auto_curation_prediction
 import_kilosort_data
