Pixels per degree (PPD) alone is not a reliable predictor of the high-resolution experience in VR and AR, because that experience depends not only on PPD but also on display fill factor, pixel arrangement, graphics rendering, and other factors. This complicates architecture decisions and design comparisons. Is there a simple way to capture all the contributors and match user experience? In this paper, we present a system-level model, system MTF, that predicts perceptual quality by accounting for all the key VR/AR dimensions: pixel shape (display), pixels per degree (display), fill factor (display), optical blur (optics), and image processing (graphics pipeline). The metric is defined in much the same way as the traditional modulation transfer function (MTF) for imaging systems: by examining the image formation of a point source and then taking the Fourier transform of the response function, with special mathematical treatment. One application, perceived text quality, is presented, in which two weight functions that depend on text orientation and frequency are incorporated into the model. A perceptual study of text quality was performed to validate the system MTF model.
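To make the idea of combining display and optics contributions concrete, the following is a minimal one-dimensional sketch and not the paper's actual model. It assumes a rectangular pixel aperture whose emitting width is the fill factor times the pixel pitch, a Gaussian optical blur, and a simple cascade in which component MTFs multiply. The function names and parameters (`aperture_mtf`, `gaussian_blur_mtf`, `system_mtf`, `sigma_deg`) are hypothetical, and the sketch omits pixel arrangement, image processing, the text-dependent weight functions, and the special mathematical treatment mentioned above.

```python
import math


def aperture_mtf(freq_cpd, ppd, fill_factor):
    """MTF of a 1D rectangular pixel aperture (assumed pixel shape).

    freq_cpd is spatial frequency in cycles per degree; the emitting
    width of one pixel is fill_factor / ppd degrees.
    """
    width_deg = fill_factor / ppd
    x = math.pi * width_deg * freq_cpd
    # |sinc| is the Fourier transform magnitude of a rectangular aperture.
    return 1.0 if x == 0 else abs(math.sin(x) / x)


def gaussian_blur_mtf(freq_cpd, sigma_deg):
    """MTF of a Gaussian optical blur (assumed blur shape).

    sigma_deg is the standard deviation of the blur in degrees.
    """
    return math.exp(-2.0 * (math.pi * sigma_deg * freq_cpd) ** 2)


def system_mtf(freq_cpd, ppd, fill_factor, sigma_deg):
    """Cascade the component MTFs by multiplication (an assumption)."""
    return aperture_mtf(freq_cpd, ppd, fill_factor) * gaussian_blur_mtf(
        freq_cpd, sigma_deg
    )


if __name__ == "__main__":
    ppd = 20.0            # hypothetical display resolution
    nyquist = ppd / 2.0   # highest frequency the pixel grid can represent
    for fill_factor in (1.0, 0.5):
        value = system_mtf(nyquist, ppd, fill_factor, sigma_deg=0.01)
        print(f"fill factor {fill_factor}: MTF at Nyquist = {value:.3f}")
```

Under these assumptions, two displays with the same PPD yield different contrast at the Nyquist frequency when their fill factor or optical blur differs, which illustrates why PPD alone does not determine the result.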