Error when training on CIFAR-10 #4

@Super-1123

Hi, I tried to rerun this code to test the model's performance using 'python3.6 main.py --cfg cfgs/cifar10/aognet_cifar10_ps_4_bottleneck_1M.yaml --gpus 1,2'. At first everything seemed to go smoothly; however, at epoch 280 training stopped with an error:
Traceback (most recent call last):
File "main.py", line 133, in <module>
main()
File "main.py", line 120, in main
epoch_end_callback = checkpoint)
File "/home/amax/anaconda3/lib/python3.6/site-packages/mxnet/module/base_module.py", line 575, in fit
callback(epoch, self.symbol, arg_params, aux_params)
File "/home/amax/anaconda3/lib/python3.6/site-packages/mxnet/callback.py", line 89, in _callback
save_checkpoint(prefix, iter_no + 1, sym, arg, aux)
File "/home/amax/anaconda3/lib/python3.6/site-packages/mxnet/model.py", line 409, in save_checkpoint
nd.save(param_name, save_dict)
File "/home/amax/anaconda3/lib/python3.6/site-packages/mxnet/ndarray/utils.py", line 273, in save
keys))
File "/home/amax/anaconda3/lib/python3.6/site-packages/mxnet/base.py", line 252, in check_call
raise MXNetError(py_str(LIB.MXGetLastError()))
mxnet.base.MXNetError: [09:52:25] src/io/local_filesys.cc:39: Check failed: std::fwrite(ptr, 1, size, fp) == size FileStream.Write incomplete
I can't find a suitable solution to this problem. Could you please tell me how to solve it?
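
A note for anyone reading this trace: the failed check is std::fwrite returning fewer bytes than requested while the checkpoint is being saved. That commonly points to a full disk or an exceeded quota on the filesystem holding the checkpoint files, though this issue does not confirm the cause. Below is a minimal sketch, using only the Python standard library, of how free space at the checkpoint location could be checked. The helper names free_bytes and has_room and the 1 GiB threshold are hypothetical and are not part of this repository or of MXNet.

```python
import os
import shutil


def free_bytes(path):
    """Return the free bytes on the filesystem that holds `path`.

    `path` may be a checkpoint prefix or file that does not exist yet;
    only its parent directory has to exist.
    """
    directory = os.path.dirname(os.path.abspath(path))
    return shutil.disk_usage(directory).free


def has_room(path, needed_bytes):
    """True if at least `needed_bytes` are free where `path` would be written."""
    return free_bytes(path) >= needed_bytes


if __name__ == "__main__":
    # Hypothetical threshold: warn when under 1 GiB is left in the current directory.
    one_gib = 1024 ** 3
    target = os.path.join(os.getcwd(), "checkpoint")  # hypothetical prefix
    if not has_room(target, one_gib):
        print("Low disk space: %d bytes free" % free_bytes(target))
```

Running this in the directory where the checkpoints are written, before restarting training, would show whether the disk is the likely culprit. If space is fine, the cause lies elsewhere, for example in write permissions or a network filesystem error.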
