You're confusing what MPI exposes as part of its interface (an API call to allocate shared memory) with how an MPI implementation uses the hardware to achieve its goals.
MPI is subtle, complex, and surprising, and its implementations are even more so. It's designed to eke the last bit of performance out of systems, so an implementation may exploit whatever system details it likes as long as it doesn't violate the contracts in the MPI design. I don't think the original MPI designers considered intra-node communication particularly important; at the time most machines were single-core and the parallelism was between machines. Then massively multicore machines with coherent memory models showed up, and the MPI implementers (both OpenMPI and MPICH; I don't know about any commercial implementations) added explicit shared memory to the API (the call you reference) and adopted shared memory as an internal transport.
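
For concreteness, here's a minimal sketch of the explicit shared-memory side of the API, assuming the call you mean is MPI-3's `MPI_Win_allocate_shared`: ranks split into a node-local communicator, allocate a window backed by shared memory, and then read each other's slots through plain pointers rather than message passing.

```c
/* Sketch: a shared-memory window among ranks on the same node (MPI-3).
 * Compile with mpicc, run with mpirun -n <N>. */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);

    /* Restrict the communicator to ranks that can actually share memory,
     * i.e. ranks on the same node. */
    MPI_Comm node_comm;
    MPI_Comm_split_type(MPI_COMM_WORLD, MPI_COMM_TYPE_SHARED, 0,
                        MPI_INFO_NULL, &node_comm);

    int node_rank, node_size;
    MPI_Comm_rank(node_comm, &node_rank);
    MPI_Comm_size(node_comm, &node_size);

    /* Each rank contributes one int to a window backed by shared memory. */
    int *my_slot;
    MPI_Win win;
    MPI_Win_allocate_shared(sizeof(int), sizeof(int), MPI_INFO_NULL,
                            node_comm, &my_slot, &win);

    *my_slot = node_rank * 100;   /* plain store into the shared region   */
    MPI_Win_fence(0, win);        /* synchronize before reading peers     */

    if (node_rank == 0 && node_size > 1) {
        /* Rank 0 reads rank 1's slot directly through a queried pointer. */
        MPI_Aint size;
        int disp_unit;
        int *peer;
        MPI_Win_shared_query(win, 1, &size, &disp_unit, (void **)&peer);
        printf("rank 0 sees rank 1's value: %d\n", *peer);
    }

    MPI_Win_fence(0, win);
    MPI_Win_free(&win);
    MPI_Comm_free(&node_comm);
    MPI_Finalize();
    return 0;
}
```

That's the interface side. The internal-transport side is invisible at this level: an ordinary `MPI_Send`/`MPI_Recv` between two ranks on the same node may well go through a shared-memory buffer under the hood, but nothing in your code says so.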