There is nothing unique about the sub-pixels in that regard except that they're small, right?
But with those you're far enough away that their apparent size is close to that of the sub-pixels in your monitor, so your visual system combines them.
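The "apparent size" argument is just about the angle a light source subtends at your eye. A quick sketch, using made-up but plausible numbers (a ~0.1 mm monitor sub-pixel at 60 cm versus a 1 mm element at 6 m), shows the two cases subtend the same angle:

```python
import math

def visual_angle_arcmin(size_m: float, distance_m: float) -> float:
    """Angle subtended by an object of the given size at the given distance."""
    return math.degrees(2 * math.atan(size_m / (2 * distance_m))) * 60

# ~0.1 mm sub-pixel viewed from 60 cm: about 0.57 arcminutes.
monitor_subpixel = visual_angle_arcmin(0.0001, 0.6)
# A 1 mm element viewed from 6 m subtends exactly the same angle,
# because size/distance is the same ratio.
big_display = visual_angle_arcmin(0.001, 6.0)
print(monitor_subpixel, big_display)
```

Same ratio, same angle, so the retina receives a comparably small patch of light in both cases.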
For example, if a display is emitting red and green light then the light reaching a viewer's retina will be red and green light, not yellow light.
RGB light does actually ‘combine’, physically, before we ‘perceive’ the color. It’s because we (most of us) have three cone types, each with its own wavelength response function. The physical output of those sensors is the same for a red+green combination as it is for the right yellow, and therefore the color has already been combined as part of measuring it.
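You can make the metamerism point concrete with a toy model. This is a sketch only: the Gaussian cone sensitivities and peak wavelengths below are my own rough stand-ins, not real cone fundamentals. It solves for red (630 nm) and green (532 nm) powers whose combined cone responses match those of a monochromatic 580 nm yellow:

```python
import math

# Toy Gaussian cone sensitivities (assumed peaks and width; real cone
# fundamentals are not Gaussian). Wavelengths in nanometres.
CONES = {"L": 565.0, "M": 540.0, "S": 445.0}
WIDTH = 40.0  # assumed standard deviation, nm

def sensitivity(cone: str, wavelength: float) -> float:
    peak = CONES[cone]
    return math.exp(-((wavelength - peak) ** 2) / (2 * WIDTH ** 2))

def cone_response(spectrum):
    """Integrate a line spectrum {wavelength: power} against each cone."""
    return {
        cone: sum(p * sensitivity(cone, wl) for wl, p in spectrum.items())
        for cone in CONES
    }

# Monochromatic "yellow" at 580 nm.
yellow = cone_response({580.0: 1.0})

# Solve a 2x2 system for red/green powers that reproduce the yellow's
# L and M responses exactly (two unknowns, so two cone signals can be
# matched; S is tiny for all of these long-wavelength lights anyway).
rl, rm = sensitivity("L", 630.0), sensitivity("M", 630.0)
gl, gm = sensitivity("L", 532.0), sensitivity("M", 532.0)
det = rl * gm - gl * rm
red_power = (yellow["L"] * gm - gl * yellow["M"]) / det
green_power = (rl * yellow["M"] - yellow["L"] * rm) / det

mixture = cone_response({630.0: red_power, 532.0: green_power})
# mixture's L and M responses equal yellow's: physically different
# spectra, identical sensor measurements — a metameric pair.
```

The point of the toy: the eye only ever sees the three integrated responses, so once those agree the distinction between "red+green light" and "yellow light" is gone at the measurement stage.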
We might be arguing semantics but I'm going to say no, they do not physically combine before we perceive the color. This is supported by the link you provided on metamerism in a related comment.
red + green = yellow
green + blue = cyan
(also based on subpixel aliasing demonstrated in the video, just not mentioned)
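The combinations above are just per-channel addition in an additive RGB model. A minimal sketch in 8-bit RGB (ignoring gamma, which real displays don't):

```python
# Additive mixing of 8-bit RGB triples, clamped to 255 per channel.
def mix(a, b):
    return tuple(min(x + y, 255) for x, y in zip(a, b))

RED, GREEN, BLUE = (255, 0, 0), (0, 255, 0), (0, 0, 255)
print(mix(RED, GREEN))   # (255, 255, 0) = yellow
print(mix(GREEN, BLUE))  # (0, 255, 255) = cyan
```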
It's important in rendering to take your colorspaces seriously, from the engine driving the display to the artist authoring the content. There are some clever optimizations that exploit your perception of colour, too; texture compression relies on this to some extent (in RGB565, for example, green gets six bits while red and blue get five, since our eyes are most sensitive to green).
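The uneven bit allocation is easy to show with the RGB565 layout itself (the same 16-bit format used for the endpoint colours in BC1/DXT1 texture compression). A sketch of packing and unpacking it:

```python
# RGB565: 5 bits red, 6 bits green, 5 bits blue in one 16-bit word.
def pack_rgb565(r: int, g: int, b: int) -> int:
    """Pack 8-bit channels into 16 bits, discarding low-order bits."""
    return ((r >> 3) << 11) | ((g >> 2) << 5) | (b >> 3)

def unpack_rgb565(p: int):
    # Replicate the high bits into the low bits when expanding back
    # to 8-bit, so full white round-trips to full white.
    r = (p >> 11) & 0x1F
    g = (p >> 5) & 0x3F
    b = p & 0x1F
    return (r << 3 | r >> 2, g << 2 | g >> 4, b << 3 | b >> 2)

print(hex(pack_rgb565(255, 255, 255)))  # 0xffff
print(unpack_rgb565(0xFFFF))            # (255, 255, 255)
```

Green's extra bit halves its quantization step relative to red and blue, which is exactly the perceptual trade the comment describes.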
And the first sub-pixel font, Millitext, from 2008, as mentioned in another comment.