The parser function setHook and its various derivatives only reject tag names that contain one of the following characters: "<>\r\n". This is excessively liberal and does not match any established specification. It means that one could register tag names like the following:
- <!-->, which would allow a "valid" tag of <!--This is not an HTML comment...really...or is it?-->
- <my tag>, which becomes confusing, since the HTML/XML specs do not allow spaces in tag names; a space normally separates the tag name from its attributes.
- <mytag NotAnAttribute="hello">, which anybody would take to be a tag with an attribute, but could, in fact, be an entire tag unto itself, with attributes following after. This again might cause parser issues, depending on the implementation details.
- <NotSelfClosed />, which anyone would assume is a self-closed tag...until they see the closing tag </NotSelfClosed />.
In all of these cases, how the tag actually performs compared to expectations would depend on the details of the parser implementation, which is obviously not ideal.
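To make the mismatch concrete, here is a rough Python sketch of how a conventional HTML/XML-style reader would extract a tag name, compared against the names registered above. The function conventional_name is hypothetical and is not how the MW parser works; it only illustrates what a reader would expect.

```python
def conventional_name(inner: str) -> str:
    """Name a typical HTML/XML reader would extract from the text between < and >.

    Hypothetical helper for illustration only; this is not MediaWiki code.
    """
    inner = inner.strip()
    # A trailing slash is conventionally read as a self-closing marker.
    if inner.endswith("/"):
        inner = inner[:-1].rstrip()
    # Everything after the first whitespace is conventionally an attribute list.
    return inner.split(None, 1)[0] if inner else ""


# Tag names from the examples above, as they could be registered today.
registered = ["my tag", 'mytag NotAnAttribute="hello"', "NotSelfClosed /"]

for name in registered:
    print(f"registered {name!r}, conventionally read as {conventional_name(name)!r}")
```

In each case the name a reader would infer differs from the name that was actually registered, which is the ambiguity described above.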
The HTML specification limits tag names to 0-9, a-z, A-Z, which has the benefit of being simple, but is probably not appropriate given that extensions may be language-specific. XML provides a much more liberal spec which would probably be more appropriate for MW to use, though it's fairly complex (see https://www.w3.org/TR/2008/REC-xml-20081126/#NT-Name). On the bright side, the cost of the more complex check is not really an issue, since it's not like you're going to be using setHook thousands of times. Even if neither of those specs is appropriate, I think we should at least disallow characters that carry syntactic meaning inside a tag, like space, single- and double-quotes, equals, and slash.
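As a rough illustration of the two options, here is a Python sketch of both checks. The function names are made up for this example, and this is not MW code; the character ranges are transcribed from the Name production in the XML spec linked above and should be verified against it before being relied on.

```python
import re

# Characters the existing check already rejects, per the description above.
CURRENTLY_REJECTED = set("<>\r\n")
# Minimum additional set suggested above: space, quotes, equals, slash.
PROPOSED_EXTRA = set(" '\"=/")


def passes_proposed_minimum(name: str) -> bool:
    """Hypothetical check: non-empty and free of all rejected characters."""
    return bool(name) and not (set(name) & (CURRENTLY_REJECTED | PROPOSED_EXTRA))


# NameStartChar and NameChar ranges from the XML 1.0 (Fifth Edition) Name production.
_NAME_START = (
    ":A-Z_a-z"
    "\u00C0-\u00D6\u00D8-\u00F6\u00F8-\u02FF"
    "\u0370-\u037D\u037F-\u1FFF\u200C-\u200D"
    "\u2070-\u218F\u2C00-\u2FEF\u3001-\uD7FF"
    "\uF900-\uFDCF\uFDF0-\uFFFD"
    "\U00010000-\U000EFFFF"
)
_NAME_CHAR = _NAME_START + "\\-.0-9\u00B7\u0300-\u036F\u203F-\u2040"
_XML_NAME = re.compile("[" + _NAME_START + "][" + _NAME_CHAR + "]*")


def is_xml_name(name: str) -> bool:
    """Hypothetical check: the whole string matches the XML Name production."""
    return _XML_NAME.fullmatch(name) is not None


for candidate in ["mytag", "my tag", "!--", "NotSelfClosed /", "référence"]:
    print(candidate, passes_proposed_minimum(candidate), is_xml_name(candidate))
```

One thing the sketch brings out: the minimum character blacklist alone would still accept "!--", whereas the XML Name check rejects it because "!" is not a valid name start character.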