There is a widespread opinion that in C++ it is better to pass an object to a function by reference than by value, since this supposedly reduces the number of copies. But what if the object is located in RAM (not in the processor cache) and the function accesses its fields several times? As is well known, access to RAM is expensive (200-600 cycles), but a whole cache line of 64-128 bytes is fetched at once. Wouldn't it be more efficient to pass a small object (up to 128 bytes) by value, allowing the processor to copy it into the cache in one go and then access its fields through the fast cache memory?
Or does the compiler, in such a situation, implicitly copy the object into the cache on its own?
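
To make the question concrete, the two alternatives could be sketched roughly as follows. The `Particle` struct and all function names are hypothetical and exist only for illustration; the struct is assumed to hold eight `double` fields (64 bytes on common platforms), so it fits the "small object" case described above. This is not a benchmark, only the shape of the code being asked about.

```cpp
#include <cassert>

// Hypothetical small object: 8 doubles, 64 bytes on common platforms.
struct Particle {
    double x, y, z;
    double vx, vy, vz;
    double mass;
    double charge;
};

static_assert(sizeof(Particle) <= 128, "intended to be a 'small' object");

// Variant 1: pass by const reference. No copy is requested in the source;
// the fields are read through the reference to the caller's object.
double kinetic_energy_by_ref(const Particle& p) {
    return 0.5 * p.mass * (p.vx * p.vx + p.vy * p.vy + p.vz * p.vz);
}

// Variant 2: pass by value. The function works on its own copy of the
// object; how that copy is physically passed depends on the ABI.
double kinetic_energy_by_value(Particle p) {
    return 0.5 * p.mass * (p.vx * p.vx + p.vy * p.vy + p.vz * p.vz);
}

// Sanity check: both variants compute the same result for the same input.
void check_variants_agree(const Particle& p) {
    assert(kinetic_energy_by_ref(p) == kinetic_energy_by_value(p));
}
```

Both functions read `mass`, `vx`, `vy` and `vz` several times, which is exactly the access pattern in question: the only difference between them is whether the parameter is a reference or a copy.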