Parallelize restore functionality 

This might be beneficial for applications that use thread-parallelism within each MPI rank.

The interface already has a flag for this, but nothing is parallelized yet.
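
To make the request concrete, here is a minimal sketch of what honoring such a flag inside one rank could look like. It is not taken from the project: `restore_buffer`, its `parallel` and `num_threads` parameters, and the assumption that restore boils down to copying a contiguous byte range are all hypothetical, and `std::thread` merely stands in for whatever threading model the application already uses (OpenMP, for example).

```cpp
#include <algorithm>
#include <cstddef>
#include <cstring>
#include <thread>
#include <vector>

// Hypothetical sketch: none of these names come from the project.
// Copies `size` bytes from a checkpoint buffer `src` back into `dst`.
// If `parallel` is set, the range is split into contiguous chunks and each
// chunk is copied by its own thread; otherwise a single memcpy is used.
void restore_buffer(void* dst, const void* src, std::size_t size,
                    bool parallel,
                    unsigned num_threads = std::thread::hardware_concurrency())
{
    if (size == 0) return;

    // Serial fallback; also taken when hardware_concurrency() reports 0.
    if (!parallel || num_threads <= 1) {
        std::memcpy(dst, src, size);
        return;
    }

    // Never use more threads than there are bytes to copy.
    const std::size_t n = std::min<std::size_t>(num_threads, size);
    const std::size_t chunk = (size + n - 1) / n;  // ceiling division

    std::vector<std::thread> workers;
    workers.reserve(n);
    for (std::size_t i = 0; i < n; ++i) {
        const std::size_t begin = i * chunk;
        if (begin >= size) break;
        const std::size_t len = std::min(chunk, size - begin);
        // Chunks do not overlap, so no synchronization is needed beyond join().
        workers.emplace_back([=] {
            std::memcpy(static_cast<char*>(dst) + begin,
                        static_cast<const char*>(src) + begin, len);
        });
    }
    for (auto& w : workers) w.join();
}
```

Whether this pays off depends on memory bandwidth and on how many threads each rank has to spare, so it would need to be measured before the flag is enabled by default.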