Monday 21 October 2019

Model checkpointed using torch.save() unable to be loaded using torch.load() #12042

deserialized_objects[key]._set_from_file(f, offset, f_should_read_directly)

RuntimeError: storage has wrong size: expected -5099839699493302364 got 589824

This error usually occurs when multiple processes write to the same checkpoint file at the same time.
However, wrapping the torch.save() call in the condition if rank == 0: should prevent this, because only one process then writes the file.
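
A minimal sketch of that guard, under some assumptions: the helper name save_checkpoint, its rank argument, and the save_fn parameter are hypothetical and do not come from the linked threads. The default save_fn is pickle.dump so the snippet runs without PyTorch; in real training you would presumably pass torch.save, which also accepts an open binary file. The sketch also writes to a temporary file and renames it afterwards, an extra precaution not mentioned in the threads, so that a reader should never see a half-written checkpoint.

```python
import os
import pickle
import tempfile


def save_checkpoint(state, path, rank, save_fn=pickle.dump):
    """Write `state` to `path` from rank 0 only; other ranks do nothing.

    `save_fn` is a hypothetical hook called as save_fn(state, file_object).
    Returns True if this process wrote the file, False otherwise.
    """
    if rank != 0:
        # Non-zero ranks skip the write so only one process touches the file.
        return False

    directory = os.path.dirname(os.path.abspath(path))
    # Write to a temporary file in the same directory first, so the final
    # rename stays on one filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "wb") as f:
            save_fn(state, f)
        # os.replace swaps the finished file into place in one step.
        os.replace(tmp_path, path)
    except BaseException:
        # Clean up the partial temporary file before re-raising.
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise
    return True
```

Every process would call save_checkpoint(state, "checkpoint.pt", rank) with its own rank. Note that if rank is a per-node local rank, one process on each node would still pass the check, so on a shared filesystem the global rank is probably the safer value to test.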


https://discuss.pytorch.org/t/unable-to-load-waveglow-checkpoint-after-training-with-multiple-gpus/47959/2

https://github.com/pytorch/examples/blob/master/imagenet/main.py#L252
