This might benefit applications that use thread parallelism within each MPI rank. The interface already has a flag for this, but nothing is parallelized yet.