Since MediaWiki renders images that are clearly set apart from the surrounding text, we should see whether we can use <figure> and <figcaption> for them. These tags are intended for "some flow content, optionally with a caption, that is self-contained and is typically referenced as a single unit from the main flow of the document."
This would particularly make sense for thumb and frame where a caption is already shown. However, figure can also be used without figcaption, which should simplify implementation.
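To make the idea concrete, here is a minimal sketch of the markup this could produce, with the caption left out when there is none. The helper names (escapeHtml, renderFigure) and the exact attributes are assumptions for illustration only; this is not MediaWiki's actual image output.

```javascript
// Hypothetical helpers for illustration; not part of MediaWiki.

// Escape text for safe use in HTML content and double-quoted attributes.
function escapeHtml(text) {
  return String(text)
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;');
}

// Build a <figure> for an image; <figcaption> is only emitted
// when a non-empty caption is given (e.g. for thumb and frame).
function renderFigure(src, alt, caption) {
  var html = '<figure><img src="' + escapeHtml(src) +
    '" alt="' + escapeHtml(alt) + '">';
  if (caption) {
    html += '<figcaption>' + escapeHtml(caption) + '</figcaption>';
  }
  return html + '</figure>';
}
```

Calling renderFigure('Example.png', 'An example', 'A caption') would yield a figure with a figcaption, while omitting the third argument yields a bare figure wrapping only the image.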
The main concern is backwards compatibility with browsers that don't know about these tags. They're new in HTML5 (which is now always on in MediaWiki), but:
- Older IE (IE 8 and earlier) will not be able to style these elements from CSS.
- Older IE will not be able to select these elements from JavaScript. This can be worked around by calling createElement(tagName) once for each of the relevant tag names.
For the second issue (selecting from JavaScript), I think jQuery's internal createSafeFragment already takes care of old IE. So only the styling issue remains.
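For reference, the createElement workaround mentioned above can be sketched roughly as follows. The function name registerHtml5Elements is hypothetical, and the document object is passed in as a parameter (an assumption made here so the snippet can be exercised outside a browser).

```javascript
// Hypothetical shim for illustration; not an existing MediaWiki function.
// `doc` is expected to behave like the browser's `document`.
function registerHtml5Elements(doc, tagNames) {
  var registered = [];
  for (var i = 0; i < tagNames.length; i++) {
    // Creating the element once is the whole workaround;
    // the created node itself is simply discarded.
    doc.createElement(tagNames[i]);
    registered.push(tagNames[i]);
  }
  return registered;
}

// In a browser this would be called early, e.g. from a script in <head>:
// registerHtml5Elements(document, ['figure', 'figcaption']);
```

As far as I know, html5shiv-style scripts rely on the same call to make unknown elements styleable in old IE, so this might help with the styling issue as well, but that would need testing in the affected IE versions.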
See also: T25932: Allow use of semantic HTML5 elements in wikitext