Hi, I tried to rerun this code to test the model's performance using 'python3.6 main.py --cfg cfgs/cifar10/aognet_cifar10_ps_4_bottleneck_1M.yaml --gpus 1,2'. At first everything seemed to run smoothly. However, at epoch 280 training stopped with the following error:
Traceback (most recent call last):
  File "main.py", line 133, in <module>
    main()
  File "main.py", line 120, in main
    epoch_end_callback = checkpoint)
  File "/home/amax/anaconda3/lib/python3.6/site-packages/mxnet/module/base_module.py", line 575, in fit
    callback(epoch, self.symbol, arg_params, aux_params)
  File "/home/amax/anaconda3/lib/python3.6/site-packages/mxnet/callback.py", line 89, in _callback
    save_checkpoint(prefix, iter_no + 1, sym, arg, aux)
  File "/home/amax/anaconda3/lib/python3.6/site-packages/mxnet/model.py", line 409, in save_checkpoint
    nd.save(param_name, save_dict)
  File "/home/amax/anaconda3/lib/python3.6/site-packages/mxnet/ndarray/utils.py", line 273, in save
    keys))
  File "/home/amax/anaconda3/lib/python3.6/site-packages/mxnet/base.py", line 252, in check_call
    raise MXNetError(py_str(LIB.MXGetLastError()))
mxnet.base.MXNetError: [09:52:25] src/io/local_filesys.cc:39: Check failed: std::fwrite(ptr, 1, size, fp) == size FileStream.Write incomplete
I couldn't find a suitable solution for this problem. Could you please tell me how to solve it?
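One thing that may be worth ruling out: the failed check is an `fwrite` that wrote fewer bytes than requested while saving the checkpoint, and a common (though not the only) cause of a short write is a full disk or an exceeded quota. The following is a minimal sketch for checking free space on the filesystem that holds the checkpoints. The helper names `free_space_report` and `has_room_for` are hypothetical, and the checkpoint directory is assumed to be the current directory, since the actual checkpoint prefix is not shown in the traceback.

```python
import shutil


def free_space_report(path="."):
    """Return (free_bytes, free_fraction) for the filesystem containing `path`."""
    usage = shutil.disk_usage(path)
    # Guard against filesystems that report a total size of zero.
    fraction = usage.free / usage.total if usage.total else 0.0
    return usage.free, fraction


def has_room_for(path, needed_bytes):
    """Return True if at least `needed_bytes` are free on the filesystem of `path`."""
    free, _ = free_space_report(path)
    return free >= needed_bytes


if __name__ == "__main__":
    # Assumption: checkpoints are written under the current directory.
    ckpt_dir = "."
    free, fraction = free_space_report(ckpt_dir)
    print(f"free: {free / 2**30:.2f} GiB ({fraction:.1%} of the disk)")
```

If the reported free space is close to zero, deleting older checkpoint files or pointing the checkpoint prefix at a larger disk would be the first thing to try. If there is plenty of space, the cause is more likely elsewhere (for example a quota or a network filesystem problem).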