<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
<title>NeuroImaging with the Scikit-learn: fMRI inverse inference tutorial — NeuroImaging with the scikit-learn</title>
<link rel="stylesheet" href="_static/nature.css" type="text/css" />
<link rel="stylesheet" href="_static/pygments.css" type="text/css" />
<script type="text/javascript">
var DOCUMENTATION_OPTIONS = {
URL_ROOT: '',
VERSION: '2010',
COLLAPSE_INDEX: false,
FILE_SUFFIX: '.html',
HAS_SOURCE: true
};
</script>
<script type="text/javascript" src="_static/jquery.js"></script>
<script type="text/javascript" src="_static/underscore.js"></script>
<script type="text/javascript" src="_static/doctools.js"></script>
<link rel="top" title="NeuroImaging with the scikit-learn" href="#" />
</head>
<body>
<div class="header-wrapper">
<div class="header">
<p class="logo"><a href="#">
<img src="_static/scikitlearn.png" alt="Logo"/>
</a>
</p><div class="navbar">
</div>
</div> <!-- end navbar --></div>
</div>
<div class="content-wrapper">
<div class="sphinxsidebar">
<div class="rel">
</div>
<h3>Contents</h3>
<ul>
<li><a class="reference internal" href="#">NeuroImaging with the Scikit-learn: fMRI inverse inference tutorial</a><ul>
<li><a class="reference internal" href="#introduction">Introduction</a><ul>
<li><a class="reference internal" href="#what-is-the-scikit-learn">What is the scikit-learn?</a></li>
<li><a class="reference internal" href="#installation-of-the-required-materials">Installation of the required materials</a><ul>
<li><a class="reference internal" href="#the-data">The data</a></li>
<li><a class="reference internal" href="#nibabel">Nibabel</a></li>
<li><a class="reference internal" href="#scikits-learn">Scikits-learn</a></li>
</ul>
</li>
</ul>
</li>
<li><a class="reference internal" href="#first-step-looking-at-the-data">First step: looking at the data</a></li>
<li><a class="reference internal" href="#second-step-decoding-analysis">Second step: decoding analysis</a><ul>
<li><a class="reference internal" href="#prediction-function">Prediction function</a></li>
<li><a class="reference internal" href="#dimension-reduction">Dimension reduction</a></li>
</ul>
</li>
<li><a class="reference internal" href="#third-step-launch-it-on-real-data">Third step: launch it on real data</a><ul>
<li><a class="reference internal" href="#fit-train-and-predict-test">Fit (train) and predict (test)</a></li>
<li><a class="reference internal" href="#cross-validation">Cross-validation</a></li>
<li><a class="reference internal" href="#prediction-accuracy">Prediction accuracy</a></li>
</ul>
</li>
<li><a class="reference internal" href="#final-script">Final script</a></li>
<li><a class="reference internal" href="#going-further-with-scikit-learn">Going further with scikit-learn</a><ul>
<li><a class="reference internal" href="#example-of-the-simplicity-of-scikit-learn">Example of the simplicity of scikit-learn</a></li>
<li><a class="reference internal" href="#changing-the-prediction-function">Changing the prediction function</a></li>
<li><a class="reference internal" href="#changing-the-feature-selection">Changing the feature selection</a></li>
</ul>
</li>
<li><a class="reference internal" href="#any-questions">Any questions ?</a></li>
</ul>
</li>
</ul>
</div>
<div class="content">
<div class="documentwrapper">
<div class="bodywrapper">
<div class="body">
<div class="section" id="neuroimaging-with-the-scikit-learn-fmri-inverse-inference-tutorial">
<h1>NeuroImaging with the <a class="reference external" href="http://scikit-learn.sourceforge.net/">Scikit-learn</a>: fMRI inverse inference tutorial<a class="headerlink" href="#neuroimaging-with-the-scikit-learn-fmri-inverse-inference-tutorial" title="Permalink to this headline">¶</a></h1>
<p><strong>Authors:</strong> <a class="reference external" href="https://parietal.saclay.inria.fr/">INRIA Parietal Project Team</a> and <a class="reference external" href="http://scikit-learn.sourceforge.net/">scikit-learn</a> folks, among whom <strong>A.
Gramfort, V. Michel, G. Varoquaux, F. Pedregosa and B. Thirion</strong></p>
<p>Thanks to M. Hanke and Y. Halchenko for data and packaging.</p>
<div class="topic">
<p class="topic-title first">Objectives</p>
<p>At the end of this tutorial you will be able to:</p>
<blockquote>
<ol class="arabic simple">
<li>Install and use the required tools (Nibabel and scikits-learn).</li>
<li>Load fMRI volumes in Python.</li>
<li>Perform a state-of-the-art decoding analysis of fMRI data.</li>
<li>Perform even more sophisticated analyses of fMRI data.</li>
</ol>
</blockquote>
</div>
<div class="section" id="introduction">
<h2>Introduction<a class="headerlink" href="#introduction" title="Permalink to this headline">¶</a></h2>
<div class="section" id="what-is-the-scikit-learn">
<h3>What is the scikit-learn?<a class="headerlink" href="#what-is-the-scikit-learn" title="Permalink to this headline">¶</a></h3>
<p>Scikit-learn is a Python library for machine learning.</p>
<p>Principal features:</p>
<ul class="simple">
<li>Easy to use.</li>
<li>Easy to install.</li>
<li>Well documented.</li>
<li>Provides standard machine learning methods for non-experts.</li>
</ul>
<p>Technical choices:</p>
<ul class="simple">
<li>Python: general-purpose, high-level language.</li>
<li>Simple data structures (numpy arrays).</li>
<li>BSD license: reuse is possible even in commercial settings.</li>
</ul>
</div>
<div class="section" id="installation-of-the-required-materials">
<h3>Installation of the required materials<a class="headerlink" href="#installation-of-the-required-materials" title="Permalink to this headline">¶</a></h3>
<div class="section" id="the-data">
<h4>The data<a class="headerlink" href="#the-data" title="Permalink to this headline">¶</a></h4>
<p>We use here the <em>Haxby 2001</em> dataset [Haxby et al. (2001)], that has been
reanalyzed in [Hanson et al (2004), O’Toole et al. (2005)].</p>
<p>In short, we have:</p>
<blockquote>
<ul class="simple">
<li>8 objects presented once in 12 sessions</li>
<li>864 volumes containing 39912 voxels</li>
</ul>
</blockquote>
<p>Additional information : <a class="reference external" href="http://www.sciencemag.org/content/293/5539/2425">http://www.sciencemag.org/content/293/5539/2425</a></p>
<p>Download the data:</p>
<div class="highlight-python"><pre>$ wget http://www.pymvpa.org/files/pymvpa_exampledata.tar.bz2</pre>
</div>
<p>decompress it:</p>
<div class="highlight-python"><pre>$ tar xjvf pymvpa_exampledata.tar.bz2</pre>
</div>
<p>and go to the data directory:</p>
<div class="highlight-python"><pre>$ cd pymvpa-exampledata</pre>
</div>
</div>
<div class="section" id="nibabel">
<h4>Nibabel<a class="headerlink" href="#nibabel" title="Permalink to this headline">¶</a></h4>
<p>An easy-to-use reader for the ANALYZE (plain, SPM99, SPM2), GIFTI, NIfTI1 and MINC
formats (formerly PyNIfTI):</p>
<div class="highlight-python"><pre>$ easy_install nibabel</pre>
</div>
<p>and if you cannot be root:</p>
<div class="highlight-python"><pre>$ easy_install --prefix=~/usr nibabel</pre>
</div>
</div>
<div class="section" id="scikits-learn">
<h4>Scikits-learn<a class="headerlink" href="#scikits-learn" title="Permalink to this headline">¶</a></h4>
<p>(Quick) installation:</p>
<div class="highlight-python"><pre>$ easy_install scikits.learn</pre>
</div>
</div>
</div>
</div>
<div class="section" id="first-step-looking-at-the-data">
<h2>First step: looking at the data<a class="headerlink" href="#first-step-looking-at-the-data" title="Permalink to this headline">¶</a></h2>
<p>Now, launch ipython:</p>
<div class="highlight-python"><pre>$ ipython</pre>
</div>
<p>First, we load the data. We have to import the nibabel module and the numpy
module (basic array manipulations):</p>
<div class="highlight-python"><div class="highlight"><pre><span class="gp">>>> </span><span class="kn">import</span> <span class="nn">nibabel</span> <span class="kn">as</span> <span class="nn">ni</span>
<span class="gp">>>> </span><span class="kn">import</span> <span class="nn">numpy</span> <span class="kn">as</span> <span class="nn">np</span>
</pre></div>
</div>
<p>... load the fMRI volumes (which we will call X):</p>
<div class="highlight-python"><div class="highlight"><pre><span class="gp">>>> </span><span class="n">X</span> <span class="o">=</span> <span class="n">ni</span><span class="o">.</span><span class="n">load</span><span class="p">(</span><span class="s">"bold.nii.gz"</span><span class="p">)</span><span class="o">.</span><span class="n">get_data</span><span class="p">()</span>
</pre></div>
</div>
<p>... the mask</p>
<div class="highlight-python"><div class="highlight"><pre><span class="gp">>>> </span><span class="n">mask</span> <span class="o">=</span> <span class="n">ni</span><span class="o">.</span><span class="n">load</span><span class="p">(</span><span class="s">"mask.nii.gz"</span><span class="p">)</span><span class="o">.</span><span class="n">get_data</span><span class="p">()</span>
</pre></div>
</div>
<p>... and the target (which we will call y), together with the session index:</p>
<div class="highlight-python"><div class="highlight"><pre><span class="gp">>>> </span><span class="n">y</span><span class="p">,</span> <span class="n">session</span> <span class="o">=</span> <span class="n">np</span><span class="o">.</span><span class="n">loadtxt</span><span class="p">(</span><span class="s">"attributes.txt"</span><span class="p">)</span><span class="o">.</span><span class="n">astype</span><span class="p">(</span><span class="s">"int"</span><span class="p">)</span><span class="o">.</span><span class="n">T</span>
</pre></div>
</div>
<p>Check the dimensions of the data:</p>
<div class="highlight-python"><div class="highlight"><pre><span class="gp">>>> </span><span class="n">X</span><span class="o">.</span><span class="n">shape</span>
<span class="go">(40, 64, 64, 1452)</span>
<span class="gp">>>> </span><span class="n">mask</span><span class="o">.</span><span class="n">shape</span>
<span class="go">(40, 64, 64)</span>
</pre></div>
</div>
<p>Mask the data X and transpose the matrix, so that its shape becomes (n_samples,
n_features):</p>
<div class="highlight-python"><div class="highlight"><pre><span class="gp">>>> </span><span class="n">X</span> <span class="o">=</span> <span class="n">X</span><span class="p">[</span><span class="n">mask</span><span class="o">!=</span><span class="mi">0</span><span class="p">]</span><span class="o">.</span><span class="n">T</span>
<span class="gp">>>> </span><span class="n">X</span><span class="o">.</span><span class="n">shape</span>
<span class="go">(1452, 39912)</span>
</pre></div>
</div>
<p>and we (hopefully) retrieve the correct number of voxels (39912).</p>
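<p>The masking step can be sketched on toy arrays with plain numpy. The shapes below are made up for illustration; the real data are (40, 64, 64, 1452):</p>

```python
import numpy as np

# Toy stand-ins for the real volumes (illustrative shapes only).
rng = np.random.RandomState(0)
X = rng.randn(4, 4, 4, 10)          # 4 x 4 x 4 voxels, 10 scans
mask = rng.rand(4, 4, 4) > 0.5      # toy boolean brain mask

# Indexing a 4D array with a 3D boolean mask keeps only the in-mask voxels
# and flattens the spatial axes, giving (n_voxels, n_scans); transposing
# yields the (n_samples, n_features) layout the classifiers expect.
X_masked = X[mask].T
assert X_masked.shape == (10, int(mask.sum()))
```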
<p>Finally, we can detrend the data (for each session separately):</p>
<div class="highlight-python"><div class="highlight"><pre><span class="gp">>>> </span><span class="kn">from</span> <span class="nn">scipy</span> <span class="kn">import</span> <span class="n">signal</span>
<span class="gp">>>> </span><span class="k">for</span> <span class="n">s</span> <span class="ow">in</span> <span class="n">np</span><span class="o">.</span><span class="n">unique</span><span class="p">(</span><span class="n">session</span><span class="p">):</span>
<span class="gp">... </span> <span class="n">X</span><span class="p">[</span><span class="n">session</span><span class="o">==</span><span class="n">s</span><span class="p">]</span> <span class="o">=</span> <span class="n">signal</span><span class="o">.</span><span class="n">detrend</span><span class="p">(</span><span class="n">X</span><span class="p">[</span><span class="n">session</span><span class="o">==</span><span class="n">s</span><span class="p">],</span> <span class="n">axis</span><span class="o">=</span><span class="mi">0</span><span class="p">)</span>
</pre></div>
</div>
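<p>For readers curious about what the detrending does, here is a numpy-only sketch of per-session linear detrending, on made-up toy data (an illustration of the idea, not the scipy implementation):</p>

```python
import numpy as np

def detrend_linear(data):
    """Remove, per column, the least-squares linear trend over time."""
    n = data.shape[0]
    design = np.vstack([np.arange(n), np.ones(n)]).T   # slope + intercept
    coefs, _, _, _ = np.linalg.lstsq(design, data, rcond=None)
    return data - design.dot(coefs)

rng = np.random.RandomState(0)
session = np.repeat([0, 1], 20)                          # two toy sessions
X = rng.randn(40, 5) + np.linspace(0., 3., 40)[:, None]  # drifting signals
for s in np.unique(session):
    X[session == s] = detrend_linear(X[session == s])
# Each session's mean and linear drift are now removed.
```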
<p>Now, let us take a look at the target y:</p>
<div class="highlight-python"><div class="highlight"><pre><span class="gp">>>> </span><span class="n">y</span><span class="o">.</span><span class="n">shape</span>
<span class="go">(1452,)</span>
<span class="gp">>>> </span><span class="n">np</span><span class="o">.</span><span class="n">unique</span><span class="p">(</span><span class="n">y</span><span class="p">)</span>
<span class="go">array([0, 1, 2, 3, 4, 5, 6, 7, 8])</span>
</pre></div>
</div>
<p>where 0 is the rest period, and [1..8] are the labels of the eight objects.</p>
<div class="topic">
<p class="topic-title first">Exercise</p>
<ol class="arabic simple">
<li>Extract the periods of activity from the data (i.e., remove the rest periods).</li>
</ol>
</div>
<div class="topic">
<p class="topic-title first">Solution</p>
<div class="highlight-python"><div class="highlight"><pre><span class="gp">>>> </span><span class="n">X</span><span class="p">,</span> <span class="n">y</span><span class="p">,</span> <span class="n">session</span> <span class="o">=</span> <span class="n">X</span><span class="p">[</span><span class="n">y</span><span class="o">!=</span><span class="mi">0</span><span class="p">],</span> <span class="n">y</span><span class="p">[</span><span class="n">y</span><span class="o">!=</span><span class="mi">0</span><span class="p">],</span> <span class="n">session</span><span class="p">[</span><span class="n">y</span><span class="o">!=</span><span class="mi">0</span><span class="p">]</span>
</pre></div>
</div>
</div>
<p>We can check that:</p>
<div class="highlight-python"><div class="highlight"><pre><span class="gp">>>> </span><span class="n">n_samples</span><span class="p">,</span> <span class="n">n_features</span> <span class="o">=</span> <span class="n">X</span><span class="o">.</span><span class="n">shape</span>
<span class="gp">>>> </span><span class="n">n_samples</span>
<span class="go">864</span>
<span class="gp">>>> </span><span class="n">n_features</span>
<span class="go">39912</span>
</pre></div>
</div>
<p>and we have the 8 conditions:</p>
<div class="highlight-python"><div class="highlight"><pre><span class="gp">>>> </span><span class="n">n_conditions</span> <span class="o">=</span> <span class="n">np</span><span class="o">.</span><span class="n">size</span><span class="p">(</span><span class="n">np</span><span class="o">.</span><span class="n">unique</span><span class="p">(</span><span class="n">y</span><span class="p">))</span>
<span class="gp">>>> </span><span class="n">n_conditions</span>
<span class="go">8</span>
</pre></div>
</div>
</div>
<div class="section" id="second-step-decoding-analysis">
<h2>Second step: decoding analysis<a class="headerlink" href="#second-step-decoding-analysis" title="Permalink to this headline">¶</a></h2>
<p>In a decoding analysis, we construct a model that predicts
a value of y from a set X of images.</p>
<div class="section" id="prediction-function">
<h3>Prediction function<a class="headerlink" href="#prediction-function" title="Permalink to this headline">¶</a></h3>
<p>We define here a simple Support Vector Classification (or SVC) with C=1,
and a linear kernel. We first import the correct module from
scikits-learn:</p>
<div class="highlight-python"><div class="highlight"><pre><span class="gp">>>> </span><span class="kn">from</span> <span class="nn">scikits.learn.svm</span> <span class="kn">import</span> <span class="n">SVC</span>
</pre></div>
</div>
<p>and we define the classifier:</p>
<div class="highlight-python"><div class="highlight"><pre><span class="gp">>>> </span><span class="n">clf</span> <span class="o">=</span> <span class="n">SVC</span><span class="p">(</span><span class="n">kernel</span><span class="o">=</span><span class="s">'linear'</span><span class="p">,</span> <span class="n">C</span><span class="o">=</span><span class="mf">1.</span><span class="p">)</span>
<span class="gp">>>> </span><span class="n">clf</span>
<span class="go">SVC(kernel='linear', C=1.0, probability=False, degree=3, coef0=0.0,</span>
<span class="go">eps=0.001, cache_size=100.0, shrinking=True, gamma=0.0)</span>
</pre></div>
</div>
<p>Need some documentation?</p>
<div class="highlight-python"><pre>>>> clf ?
Type: SVC
Base Class: <class 'scikits.learn.svm.libsvm.SVC'>
String Form:
SVC(kernel=linear, C=1.0, probability=False, degree=3, coef0=0.0, eps=0.001,
cache_size=100.0, shrinking=True, gamma=0.0)
Namespace: Interactive
Docstring:
C-Support Vector Classification.
Parameters
----------
C : float, optional (default=1.0)
penalty parameter C of the error term.
...</pre>
</div>
<p>Or go to the <a class="reference external" href="http://scikit-learn.sourceforge.net/modules/svm.html">scikits-learn
documentation</a></p>
<p>We use an SVC here, but we can use
<a class="reference external" href="http://scikit-learn.sourceforge.net/supervised_learning.html">many other
classifiers</a>.</p>
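<p>All these classifiers share the same fit/predict interface, which is illustrated below with a toy nearest-centroid classifier. This is hypothetical example code, not part of scikit-learn:</p>

```python
import numpy as np

# A toy nearest-centroid classifier exposing the same fit/predict
# interface as SVC.  Purely illustrative, not library code.
class NearestCentroid(object):
    def fit(self, X, y):
        self.classes_ = np.unique(y)
        self.centroids_ = np.array([X[y == c].mean(axis=0)
                                    for c in self.classes_])
        return self     # returning self lets fit(...).predict(...) chain

    def predict(self, X):
        # squared distance of every sample to every class centroid
        d = ((X[:, None, :] - self.centroids_[None]) ** 2).sum(axis=2)
        return self.classes_[d.argmin(axis=1)]

X_toy = np.array([[0., 0.], [0., 1.], [10., 10.], [10., 11.]])
y_toy = np.array([1, 1, 2, 2])
y_pred = NearestCentroid().fit(X_toy, y_toy).predict(X_toy)
```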
</div>
<div class="section" id="dimension-reduction">
<h3>Dimension reduction<a class="headerlink" href="#dimension-reduction" title="Permalink to this headline">¶</a></h3>
<p>But a classification task with few samples and many features is plagued by the
<em>curse of dimensionality</em>, so let us add a feature selection procedure.</p>
<p>For this, we need to import the correct module:</p>
<div class="highlight-python"><div class="highlight"><pre><span class="gp">>>> </span><span class="kn">from</span> <span class="nn">scikits.learn.feature_selection</span> <span class="kn">import</span> <span class="n">SelectKBest</span><span class="p">,</span> <span class="n">f_classif</span>
</pre></div>
</div>
<p>and define a simple F-score based feature selection (a.k.a. Anova):</p>
<div class="highlight-python"><div class="highlight"><pre><span class="gp">>>> </span><span class="n">feature_selection</span> <span class="o">=</span> <span class="n">SelectKBest</span><span class="p">(</span><span class="n">f_classif</span><span class="p">,</span> <span class="n">k</span><span class="o">=</span><span class="mi">500</span><span class="p">)</span>
<span class="gp">>>> </span><span class="n">feature_selection</span>
<span class="go">SelectKBest(k=500, score_func=<function f_classif at 0x8c93684>)</span>
</pre></div>
</div>
<p>We have our classifier (SVC), our feature selection (SelectKBest), and now, we
can plug them together in a <em>pipeline</em> that performs the two operations
successively:</p>
<div class="highlight-python"><div class="highlight"><pre><span class="gp">>>> </span><span class="kn">from</span> <span class="nn">scikits.learn.pipeline</span> <span class="kn">import</span> <span class="n">Pipeline</span>
<span class="gp">>>> </span><span class="n">anova_svc</span> <span class="o">=</span> <span class="n">Pipeline</span><span class="p">([(</span><span class="s">'anova'</span><span class="p">,</span> <span class="n">feature_selection</span><span class="p">),</span> <span class="p">(</span><span class="s">'svc'</span><span class="p">,</span> <span class="n">clf</span><span class="p">)])</span>
<span class="gp">>>> </span><span class="n">anova_svc</span>
<span class="go">Pipeline(steps=[('anova', SelectKBest(k=500, score_func=<function</span>
<span class="go">f_classif at 0x8c93684>)), ('svc', SVC(kernel='linear', C=1.0,</span>
<span class="go">probability=False, degree=3, coef0=0.0, eps=0.001,</span>
<span class="go">cache_size=100.0, shrinking=True, gamma=0.0))])</span>
</pre></div>
</div>
<p>We use a univariate feature selection here, but we can use other dimension
reduction methods, such as <a class="reference external" href="http://scikit-learn.sourceforge.net/modules/generated/scikits.learn.feature_selection.rfe.RFE.html">RFE</a>.</p>
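<p>The score behind <em>f_classif</em> is a one-way ANOVA F statistic, computed independently for each feature. Here is a numpy-only sketch on made-up data (illustrative, not the scikit-learn implementation):</p>

```python
import numpy as np

# Per-feature one-way ANOVA F statistic: ratio of between-class to
# within-class variance.  Illustrative, not the scikit-learn code.
def f_scores(X, y):
    classes = np.unique(y)
    n, k = X.shape[0], len(classes)
    grand_mean = X.mean(axis=0)
    ss_between = sum(np.sum(y == c) * (X[y == c].mean(axis=0) - grand_mean) ** 2
                     for c in classes)
    ss_within = sum(((X[y == c] - X[y == c].mean(axis=0)) ** 2).sum(axis=0)
                    for c in classes)
    return (ss_between / (k - 1)) / (ss_within / (n - k))

rng = np.random.RandomState(0)
y_toy = np.repeat([0, 1], 10)
X_toy = rng.randn(20, 3)
X_toy[:, 0] += 3 * y_toy        # make feature 0 strongly class-dependent
# "k best" selection keeps the top-k features by this score:
ranking = np.argsort(f_scores(X_toy, y_toy))[::-1]
```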
</div>
</div>
<div class="section" id="third-step-launch-it-on-real-data">
<h2>Third step: launch it on real data<a class="headerlink" href="#third-step-launch-it-on-real-data" title="Permalink to this headline">¶</a></h2>
<div class="section" id="fit-train-and-predict-test">
<h3>Fit (train) and predict (test)<a class="headerlink" href="#fit-train-and-predict-test" title="Permalink to this headline">¶</a></h3>
<p>In scikits-learn, prediction functions have a very simple API:</p>
<blockquote>
<ul>
<li><p class="first">a <em>fit</em> function that &#8220;learns&#8221; the parameters of the model from the data.
Thus, we need to give some training data to <em>fit</em>:</p>
<div class="highlight-python"><div class="highlight"><pre><span class="gp">>>> </span><span class="n">anova_svc</span><span class="o">.</span><span class="n">fit</span><span class="p">(</span><span class="n">X</span><span class="p">,</span> <span class="n">y</span><span class="p">)</span>
<span class="go">Pipeline(steps=[('anova', SelectKBest(k=500, score_func=<function f_classif</span>
<span class="go">at 0x8c93684>)), ('svc', SVC(kernel='linear', C=1.0, probability=False,</span>
<span class="go">degree=3, coef0=0.0, eps=0.001,</span>
<span class="go">cache_size=100.0, shrinking=True, gamma=0.0))])</span>
</pre></div>
</div>
</li>
<li><p class="first">a <em>predict</em> function that &#8220;predicts&#8221; a target from new data.
Here, we just have to give the new set of images (as the target should be
unknown):</p>
<div class="highlight-python"><div class="highlight"><pre><span class="gp">>>> </span><span class="n">y_pred</span> <span class="o">=</span> <span class="n">anova_svc</span><span class="o">.</span><span class="n">predict</span><span class="p">(</span><span class="n">X</span><span class="p">)</span>
<span class="gp">>>> </span><span class="n">y_pred</span><span class="o">.</span><span class="n">shape</span>
<span class="go">(864,)</span>
<span class="gp">>>> </span><span class="n">X</span><span class="o">.</span><span class="n">shape</span>
<span class="go">(864, 39912)</span>
</pre></div>
</div>
<p><strong>Warning! Do not do this at home!</strong> The score obtained here is
heavily biased (see the next paragraph). We use it only to check that
we get one predicted value per image.</p>
</li>
</ul>
</blockquote>
<p>Note that you could have done this in a single line:</p>
<div class="highlight-python"><div class="highlight"><pre><span class="gp">>>> </span><span class="n">y_pred</span> <span class="o">=</span> <span class="n">anova_svc</span><span class="o">.</span><span class="n">fit</span><span class="p">(</span><span class="n">X</span><span class="p">,</span> <span class="n">y</span><span class="p">)</span><span class="o">.</span><span class="n">predict</span><span class="p">(</span><span class="n">X</span><span class="p">)</span>
</pre></div>
</div>
</div>
<div class="section" id="cross-validation">
<h3>Cross-validation<a class="headerlink" href="#cross-validation" title="Permalink to this headline">¶</a></h3>
<p>However, the last analysis is <em>wrong</em>, as we have trained and tested on
the same set of data. We need cross-validation to split the data
into separate training and test sets.</p>
<p>Let us define a Leave-one-session-out cross-validation:</p>
<div class="highlight-python"><div class="highlight"><pre><span class="gp">>>> </span><span class="kn">from</span> <span class="nn">scikits.learn.cross_val</span> <span class="kn">import</span> <span class="n">LeaveOneLabelOut</span>
<span class="gp">>>> </span><span class="n">cv</span> <span class="o">=</span> <span class="n">LeaveOneLabelOut</span><span class="p">(</span><span class="n">session</span><span class="p">)</span>
</pre></div>
</div>
<p>In scikit-learn, a cross-validation is simply a function that generates
the indices of the folds within a loop.
We can now apply the previously defined <em>pipeline</em> with the
cross-validation:</p>
<div class="highlight-python"><div class="highlight"><pre><span class="gp">>>> </span><span class="n">cv_scores</span> <span class="o">=</span> <span class="p">[]</span> <span class="c"># will store the number of correct predictions in each fold</span>
<span class="gp">>>> </span><span class="k">for</span> <span class="n">train</span><span class="p">,</span> <span class="n">test</span> <span class="ow">in</span> <span class="n">cv</span><span class="p">:</span>
<span class="gp">... </span>    <span class="n">y_pred</span> <span class="o">=</span> <span class="n">anova_svc</span><span class="o">.</span><span class="n">fit</span><span class="p">(</span><span class="n">X</span><span class="p">[</span><span class="n">train</span><span class="p">],</span> <span class="n">y</span><span class="p">[</span><span class="n">train</span><span class="p">])</span><span class="o">.</span><span class="n">predict</span><span class="p">(</span><span class="n">X</span><span class="p">[</span><span class="n">test</span><span class="p">])</span>
<span class="gp">... </span>    <span class="n">cv_scores</span><span class="o">.</span><span class="n">append</span><span class="p">(</span><span class="n">np</span><span class="o">.</span><span class="n">sum</span><span class="p">(</span><span class="n">y_pred</span> <span class="o">==</span> <span class="n">y</span><span class="p">[</span><span class="n">test</span><span class="p">]))</span>
</pre></div>
</div>
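<p>The fold generation performed by <em>LeaveOneLabelOut</em> can be sketched in plain numpy on a toy session vector: each distinct session label in turn becomes the test set (an illustration of the idea, not the library implementation):</p>

```python
import numpy as np

# For each distinct label, the test set is that label's samples and the
# training set is everything else.
def leave_one_label_out(labels):
    labels = np.asarray(labels)
    for value in np.unique(labels):
        test = labels == value
        yield np.where(~test)[0], np.where(test)[0]

session_toy = np.array([1, 1, 2, 2, 3, 3])
folds = list(leave_one_label_out(session_toy))  # one (train, test) per session
```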
<p>But we are lazy people, so there is a dedicated
function, <em>cross_val_score</em>, that computes the results of the
different cross-validation folds for you:</p>
<div class="highlight-python"><div class="highlight"><pre><span class="gp">>>> </span><span class="kn">from</span> <span class="nn">scikits.learn.cross_val</span> <span class="kn">import</span> <span class="n">cross_val_score</span>
<span class="gp">>>> </span><span class="n">cv_scores</span> <span class="o">=</span> <span class="n">cross_val_score</span><span class="p">(</span><span class="n">anova_svc</span><span class="p">,</span> <span class="n">X</span><span class="p">,</span> <span class="n">y</span><span class="p">,</span> <span class="n">cv</span><span class="o">=</span><span class="n">cv</span><span class="p">,</span> <span class="n">n_jobs</span><span class="o">=</span><span class="mi">1</span><span class="p">,</span>
<span class="gp">... </span>    <span class="n">verbose</span><span class="o">=</span><span class="mi">1</span><span class="p">,</span> <span class="n">iid</span><span class="o">=</span><span class="bp">True</span><span class="p">)</span>
</pre></div>
</div>
<p><tt class="docutils literal"><span class="pre">n_jobs=1</span></tt> means that the computation is not parallel. But if you are the happy
owner of a multi-processor computer, you can speed up the computation:</p>
<div class="highlight-python"><div class="highlight"><pre><span class="gp">>>> </span><span class="n">cv_scores</span> <span class="o">=</span> <span class="n">cross_val_score</span><span class="p">(</span><span class="n">anova_svc</span><span class="p">,</span> <span class="n">X</span><span class="p">,</span> <span class="n">y</span><span class="p">,</span> <span class="n">cv</span><span class="o">=</span><span class="n">cv</span><span class="p">,</span> <span class="n">n_jobs</span><span class="o">=</span><span class="mi">4</span><span class="p">,</span>
<span class="gp">... </span>    <span class="n">verbose</span><span class="o">=</span><span class="mi">1</span><span class="p">,</span> <span class="n">iid</span><span class="o">=</span><span class="bp">True</span><span class="p">)</span>
</pre></div>
</div>
</div>
<div class="section" id="prediction-accuracy">
<h3>Prediction accuracy<a class="headerlink" href="#prediction-accuracy" title="Permalink to this headline">¶</a></h3>
<blockquote>
<p>We can take a look at the results of the <em>cross_val_score</em> function:</p>
<div class="highlight-python"><div class="highlight"><pre><span class="gp">>>> </span><span class="n">cv_scores</span>
<span class="go">array([ 60., 59., 65., 49., 57., 56., 52., 44., 54., 47., 49.,</span>
<span class="go"> 51.])</span>
</pre></div>
</div>
<p>This is simply the number of correct predictions for each fold.</p>
</blockquote>
<div class="topic">
<p class="topic-title first">Exercise</p>
<ol class="arabic simple">
<li>Compute the mean prediction accuracy using <em>cv_scores</em></li>
</ol>
</div>
<div class="topic">
<p class="topic-title first">Solution</p>
<div class="highlight-python"><div class="highlight"><pre><span class="gp">>>> </span><span class="n">classification_accuracy</span> <span class="o">=</span> <span class="n">np</span><span class="o">.</span><span class="n">sum</span><span class="p">(</span><span class="n">cv_scores</span><span class="p">)</span> <span class="o">/</span> <span class="nb">float</span><span class="p">(</span><span class="n">n_samples</span><span class="p">)</span>
<span class="gp">>>> </span><span class="n">classification_accuracy</span>
<span class="go">0.74421296296296291</span>
</pre></div>
</div>
</div>
<p>We have a total prediction accuracy of 74% across the different folds.</p>
<p>We can add a line to print the results:</p>
<div class="highlight-python"><div class="highlight"><pre><span class="gp">>>> </span><span class="k">print</span> <span class="s">"Classification accuracy: </span><span class="si">%f</span><span class="s">"</span> <span class="o">%</span> <span class="n">classification_accuracy</span><span class="p">,</span> \
<span class="go"> " / Chance level: %f" % (1. / n_conditions)</span>
<span class="go">Classification accuracy: 0.744213 / Chance level: 0.125000</span>
</pre></div>
</div>
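<p>The bookkeeping can be checked with a few lines of plain Python, using the first four fold scores shown above (with 864 images and 12 sessions, a leave-one-session-out fold tests on 72 images):</p>

```python
# Overall accuracy from per-fold correct-prediction counts (only the first
# four folds are used here, for illustration).
cv_scores = [60., 59., 65., 49.]   # correct predictions per fold
n_samples = 4 * 72                 # total test images across these folds
n_conditions = 8

accuracy = sum(cv_scores) / n_samples
chance = 1. / n_conditions         # 0.125 for 8 balanced classes
```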
</div>
</div>
<div class="section" id="final-script">
<h2>Final script<a class="headerlink" href="#final-script" title="Permalink to this headline">¶</a></h2>
<p>And thus, the complete script is (<a class="reference download internal" href="_downloads/decoding_example.py"><tt class="xref download docutils literal"><span class="pre">decoding_example.py</span></tt></a>):</p>
<div class="highlight-python"><div class="highlight"><pre><span class="c">### All the imports</span>
<span class="kn">import</span> <span class="nn">numpy</span> <span class="kn">as</span> <span class="nn">np</span>
<span class="kn">from</span> <span class="nn">scipy</span> <span class="kn">import</span> <span class="n">signal</span>
<span class="kn">import</span> <span class="nn">nibabel</span> <span class="kn">as</span> <span class="nn">ni</span>
<span class="kn">from</span> <span class="nn">scikits.learn.svm</span> <span class="kn">import</span> <span class="n">SVC</span>
<span class="kn">from</span> <span class="nn">scikits.learn.feature_selection</span> <span class="kn">import</span> <span class="n">SelectKBest</span><span class="p">,</span> <span class="n">f_classif</span>
<span class="kn">from</span> <span class="nn">scikits.learn.pipeline</span> <span class="kn">import</span> <span class="n">Pipeline</span>
<span class="kn">from</span> <span class="nn">scikits.learn.cross_val</span> <span class="kn">import</span> <span class="n">LeaveOneLabelOut</span><span class="p">,</span> <span class="n">cross_val_score</span>
<span class="c">### Load data</span>
<span class="n">y</span><span class="p">,</span> <span class="n">session</span> <span class="o">=</span> <span class="n">np</span><span class="o">.</span><span class="n">loadtxt</span><span class="p">(</span><span class="s">"attributes.txt"</span><span class="p">)</span><span class="o">.</span><span class="n">astype</span><span class="p">(</span><span class="s">"int"</span><span class="p">)</span><span class="o">.</span><span class="n">T</span>
<span class="n">X</span> <span class="o">=</span> <span class="n">ni</span><span class="o">.</span><span class="n">load</span><span class="p">(</span><span class="s">"bold.nii.gz"</span><span class="p">)</span><span class="o">.</span><span class="n">get_data</span><span class="p">()</span>
<span class="n">mask</span> <span class="o">=</span> <span class="n">ni</span><span class="o">.</span><span class="n">load</span><span class="p">(</span><span class="s">"mask.nii.gz"</span><span class="p">)</span><span class="o">.</span><span class="n">get_data</span><span class="p">()</span>
<span class="c"># Process the data in order to have a two-dimensional design matrix X of</span>
<span class="c"># shape (nb_samples, nb_features).</span>
<span class="n">X</span> <span class="o">=</span> <span class="n">X</span><span class="p">[</span><span class="n">mask</span><span class="o">!=</span><span class="mi">0</span><span class="p">]</span><span class="o">.</span><span class="n">T</span>
<span class="c"># Detrend data on each session independently</span>
<span class="k">for</span> <span class="n">s</span> <span class="ow">in</span> <span class="n">np</span><span class="o">.</span><span class="n">unique</span><span class="p">(</span><span class="n">session</span><span class="p">):</span>
<span class="n">X</span><span class="p">[</span><span class="n">session</span><span class="o">==</span><span class="n">s</span><span class="p">]</span> <span class="o">=</span> <span class="n">signal</span><span class="o">.</span><span class="n">detrend</span><span class="p">(</span><span class="n">X</span><span class="p">[</span><span class="n">session</span><span class="o">==</span><span class="n">s</span><span class="p">],</span> <span class="n">axis</span><span class="o">=</span><span class="mi">0</span><span class="p">)</span>
<span class="c"># Remove volumes corresponding to rest</span>
<span class="n">X</span><span class="p">,</span> <span class="n">y</span><span class="p">,</span> <span class="n">session</span> <span class="o">=</span> <span class="n">X</span><span class="p">[</span><span class="n">y</span><span class="o">!=</span><span class="mi">0</span><span class="p">],</span> <span class="n">y</span><span class="p">[</span><span class="n">y</span><span class="o">!=</span><span class="mi">0</span><span class="p">],</span> <span class="n">session</span><span class="p">[</span><span class="n">y</span><span class="o">!=</span><span class="mi">0</span><span class="p">]</span>
<span class="n">n_samples</span><span class="p">,</span> <span class="n">n_features</span> <span class="o">=</span> <span class="n">X</span><span class="o">.</span><span class="n">shape</span>
<span class="n">n_conditions</span> <span class="o">=</span> <span class="n">np</span><span class="o">.</span><span class="n">size</span><span class="p">(</span><span class="n">np</span><span class="o">.</span><span class="n">unique</span><span class="p">(</span><span class="n">y</span><span class="p">))</span>
<span class="c">### Define the prediction function to be used.</span>
<span class="c"># Here we use a Support Vector Classification, with a linear kernel and C=1</span>
<span class="n">clf</span> <span class="o">=</span> <span class="n">SVC</span><span class="p">(</span><span class="n">kernel</span><span class="o">=</span><span class="s">'linear'</span><span class="p">,</span> <span class="n">C</span><span class="o">=</span><span class="mf">1.</span><span class="p">)</span>
<span class="c">### Define the dimension reduction to be used.</span>
<span class="c"># Here we use a classical univariate feature selection based on F-test,</span>
<span class="c"># namely Anova. We set the number of features to be selected to 500</span>
<span class="n">feature_selection</span> <span class="o">=</span> <span class="n">SelectKBest</span><span class="p">(</span><span class="n">f_classif</span><span class="p">,</span> <span class="n">k</span><span class="o">=</span><span class="mi">500</span><span class="p">)</span>
<span class="c">### We combine the dimension reduction and the prediction function</span>
<span class="n">anova_svc</span> <span class="o">=</span> <span class="n">Pipeline</span><span class="p">([(</span><span class="s">'anova'</span><span class="p">,</span> <span class="n">feature_selection</span><span class="p">),</span> <span class="p">(</span><span class="s">'svc'</span><span class="p">,</span> <span class="n">clf</span><span class="p">)])</span>
<span class="c">### Define the cross-validation scheme used for validation.</span>
<span class="c"># Here we use a LeaveOneLabelOut cross-validation on the session, which</span>
<span class="c"># corresponds to a leave-one-session-out</span>
<span class="n">cv</span> <span class="o">=</span> <span class="n">LeaveOneLabelOut</span><span class="p">(</span><span class="n">session</span><span class="p">)</span>
<span class="c">### Compute the prediction accuracy for the different folds (i.e. session)</span>
<span class="n">cv_scores</span> <span class="o">=</span> <span class="n">cross_val_score</span><span class="p">(</span><span class="n">anova_svc</span><span class="p">,</span> <span class="n">X</span><span class="p">,</span> <span class="n">y</span><span class="p">,</span> <span class="n">cv</span><span class="o">=</span><span class="n">cv</span><span class="p">,</span> <span class="n">n_jobs</span><span class="o">=-</span><span class="mi">1</span><span class="p">,</span>
<span class="n">verbose</span><span class="o">=</span><span class="mi">1</span><span class="p">,</span> <span class="n">iid</span><span class="o">=</span><span class="bp">True</span><span class="p">)</span>
<span class="c">### Return the corresponding mean prediction accuracy</span>
<span class="n">classification_accuracy</span> <span class="o">=</span> <span class="n">np</span><span class="o">.</span><span class="n">sum</span><span class="p">(</span><span class="n">cv_scores</span><span class="p">)</span> <span class="o">/</span> <span class="nb">float</span><span class="p">(</span><span class="n">n_samples</span><span class="p">)</span>
<span class="k">print</span> <span class="s">"Classification accuracy: </span><span class="si">%f</span><span class="s">"</span> <span class="o">%</span> <span class="n">classification_accuracy</span><span class="p">,</span> \
<span class="s">" / Chance level: </span><span class="si">%f</span><span class="s">"</span> <span class="o">%</span> <span class="p">(</span><span class="mf">1.</span> <span class="o">/</span> <span class="n">n_conditions</span><span class="p">)</span>
</pre></div>
</div>
<p>Now, you just have to publish the results :)</p>
</div>
<div class="section" id="going-further-with-scikit-learn">
<h2>Going further with scikit-learn<a class="headerlink" href="#going-further-with-scikit-learn" title="Permalink to this headline">¶</a></h2>
<p>So far we have seen a very simple analysis with scikit-learn.</p>
<p><a class="reference external" href="http://scikit-learn.sourceforge.net/modules/glm.html">Other prediction functions with
scikit-learn</a></p>
<p><a class="reference external" href="http://scikit-learn.sourceforge.net/modules/clustering.html">Unsupervised learning (e.g. clustering, PCA, ICA) with
scikit-learn</a></p>
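<p>For instance, clustering uses the same estimator interface as the classifiers above; here is a minimal sketch (assuming the modern <tt>sklearn</tt> package name rather than the historical <tt>scikits.learn</tt>) grouping toy points with k-means:</p>

```python
import numpy as np
from sklearn.cluster import KMeans

# Two well-separated toy blobs of three points each
X = np.array([[0., 0.], [0.1, 0.], [0., 0.1],
              [5., 5.], [5.1, 5.], [5., 5.1]])

# Fit k-means with two clusters; labels_ holds one cluster id per point
km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
labels = km.labels_
print(labels)
```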
<div class="section" id="example-of-the-simplicity-of-scikit-learn">
<h3>Example of the simplicity of scikit-learn<a class="headerlink" href="#example-of-the-simplicity-of-scikit-learn" title="Permalink to this headline">¶</a></h3>
<p>One of the major assets of scikit-learn is its simplicity of use.</p>
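<p>To illustrate, every estimator exposes the same <tt>fit</tt>/<tt>predict</tt> pair; a minimal sketch on a toy problem (using the modern <tt>sklearn</tt> package name, an assumption relative to the historical <tt>scikits.learn</tt> imports above):</p>

```python
from sklearn.svm import SVC

# A tiny toy problem: two samples per class
X = [[0.0, 0.0], [0.1, 0.1], [1.0, 1.0], [1.1, 0.9]]
y = [0, 0, 1, 1]

# fit learns the model, predict applies it -- the same two calls
# work for any scikit-learn estimator
clf = SVC(kernel='linear', C=1.)
clf.fit(X, y)
print(clf.predict([[0.05, 0.0], [1.0, 0.95]]))
```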
</div>
<div class="section" id="changing-the-prediction-function">
<h3>Changing the prediction function<a class="headerlink" href="#changing-the-prediction-function" title="Permalink to this headline">¶</a></h3>
<p>We now show how one can easily change the prediction function, if needed.
Let's try Linear Discriminant Analysis
(LDA): <a class="reference external" href="http://scikit-learn.sourceforge.net/auto_examples/plot_lda_qda.html">http://scikit-learn.sourceforge.net/auto_examples/plot_lda_qda.html</a></p>
<p>Import the module:</p>
<div class="highlight-python"><div class="highlight"><pre><span class="gp">>>> </span><span class="kn">from</span> <span class="nn">scikits.learn.lda</span> <span class="kn">import</span> <span class="n">LDA</span>
</pre></div>
</div>
<p>Construct the new prediction function and use it in a pipeline:</p>
<div class="highlight-python"><div class="highlight"><pre><span class="gp">>>> </span><span class="n">lda</span> <span class="o">=</span> <span class="n">LDA</span><span class="p">()</span>
<span class="gp">>>> </span><span class="n">anova_lda</span> <span class="o">=</span> <span class="n">Pipeline</span><span class="p">([(</span><span class="s">'anova'</span><span class="p">,</span> <span class="n">feature_selection</span><span class="p">),</span> <span class="p">(</span><span class="s">'LDA'</span><span class="p">,</span> <span class="n">lda</span><span class="p">)])</span>
</pre></div>
</div>
<p>and recompute the cross-validation score:</p>
<div class="highlight-python"><div class="highlight"><pre><span class="gp">>>> </span><span class="n">cv_scores</span> <span class="o">=</span> <span class="n">cross_val_score</span><span class="p">(</span><span class="n">anova_lda</span><span class="p">,</span> <span class="n">X</span><span class="p">,</span> <span class="n">y</span><span class="p">,</span> <span class="n">cv</span><span class="o">=</span><span class="n">cv</span><span class="p">,</span> <span class="n">n_jobs</span><span class="o">=</span><span class="mi">4</span><span class="p">,</span>
<span class="go"> verbose=1, iid=True)</span>
<span class="gp">>>> </span><span class="n">classification_accuracy</span> <span class="o">=</span> <span class="n">np</span><span class="o">.</span><span class="n">sum</span><span class="p">(</span><span class="n">cv_scores</span><span class="p">)</span> <span class="o">/</span> <span class="nb">float</span><span class="p">(</span><span class="n">n_samples</span><span class="p">)</span>
<span class="gp">>>> </span><span class="k">print</span> <span class="s">"Classification accuracy: </span><span class="si">%f</span><span class="s">"</span> <span class="o">%</span> <span class="n">classification_accuracy</span><span class="p">,</span> \
<span class="go"> " / Chance level: %f" % (1. / n_conditions)</span>
<span class="go">Classification accuracy: 0.728009 / Chance level: 0.125000</span>
</pre></div>
</div>
</div>
<div class="section" id="changing-the-feature-selection">
<h3>Changing the feature selection<a class="headerlink" href="#changing-the-feature-selection" title="Permalink to this headline">¶</a></h3>
<p>Let’s say that you want a more sophisticated feature selection, for example
<a class="reference external" href="http://scikit-learn.sourceforge.net/modules/generated/scikits.learn.feature_selection.rfe.RFE.html">Recursive Feature Elimination
(RFE)</a>.</p>
<p>Import the module:</p>
<div class="highlight-python"><div class="highlight"><pre><span class="gp">>>> </span><span class="kn">from</span> <span class="nn">scikits.learn.feature_selection</span> <span class="kn">import</span> <span class="n">RFE</span>
</pre></div>
</div>
<p>Construct your new fancy selection:</p>
<div class="highlight-python"><div class="highlight"><pre><span class="gp">>>> </span><span class="n">rfe</span> <span class="o">=</span> <span class="n">RFE</span><span class="p">(</span><span class="n">SVC</span><span class="p">(</span><span class="n">kernel</span><span class="o">=</span><span class="s">'linear'</span><span class="p">,</span> <span class="n">C</span><span class="o">=</span><span class="mf">1.</span><span class="p">),</span> <span class="n">n_features</span><span class="o">=</span><span class="mi">50</span><span class="p">,</span> <span class="n">percentage</span><span class="o">=</span><span class="mf">0.25</span><span class="p">)</span>
</pre></div>
</div>
<p>and create a new pipeline:</p>
<div class="highlight-python"><div class="highlight"><pre><span class="gp">>>> </span><span class="n">rfe_svc</span> <span class="o">=</span> <span class="n">Pipeline</span><span class="p">([(</span><span class="s">'rfe'</span><span class="p">,</span> <span class="n">rfe</span><span class="p">),</span> <span class="p">(</span><span class="s">'svc'</span><span class="p">,</span> <span class="n">clf</span><span class="p">)])</span>
</pre></div>
</div>
<p>and recompute the cross-validation score:</p>
<div class="highlight-python"><div class="highlight"><pre><span class="gp">>>> </span><span class="n">cv_scores</span> <span class="o">=</span> <span class="n">cross_val_score</span><span class="p">(</span><span class="n">rfe_svc</span><span class="p">,</span> <span class="n">X</span><span class="p">,</span> <span class="n">y</span><span class="p">,</span> <span class="n">cv</span><span class="o">=</span><span class="n">cv</span><span class="p">,</span> <span class="n">n_jobs</span><span class="o">=</span><span class="mi">4</span><span class="p">,</span>
<span class="go"> verbose=1, iid=True)</span>
</pre></div>
</div>
<p>But be aware that this can take A WHILE...</p>
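<p>Note that the <tt>RFE</tt> signature above is the historical one; in the modern <tt>sklearn</tt> API (an assumption relative to this tutorial's <tt>scikits.learn</tt>) the parameters are <tt>n_features_to_select</tt> and <tt>step</tt>. A minimal sketch on synthetic data:</p>

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import RFE
from sklearn.svm import SVC

# Synthetic data: 100 samples, 20 features, only a few informative
X, y = make_classification(n_samples=100, n_features=20,
                           n_informative=5, random_state=0)

# Keep 5 features, eliminating 25% of the remaining features per step
rfe = RFE(SVC(kernel='linear', C=1.), n_features_to_select=5, step=0.25)
rfe.fit(X, y)
print(rfe.support_.sum())  # number of selected features
```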
</div>
</div>
<div class="section" id="any-questions">
<h2>Any questions?<a class="headerlink" href="#any-questions" title="Permalink to this headline">¶</a></h2>
<blockquote>
<a class="reference external" href="http://scikit-learn.sourceforge.net/">http://scikit-learn.sourceforge.net/</a></blockquote>
</div>
</div>
</div>
</div>
</div>
<div class="clearer"></div>
</div>
</div>
<div class="footer">
© Copyright INRIA Parietal 2010.
Created using <a href="http://sphinx.pocoo.org/">Sphinx</a> 1.1pre/1f40a2bc5294 after a design by <a href="http://webylimonada.com">Web y Limonada</a>.
</div>
</body>
</html>