Substitute a replacement character for invalid UTF-16 text in a TextSpan #84887

jason-simmons · 2021-06-18T22:06:46Z

Hixie · 2021-06-18T22:34:41Z

packages/flutter/test/painting/text_painter_test.dart

+    String text = 'Hello\uD83DWorld';
+    painter.text = TextSpan(text: text);
+    painter.layout();
+    expect(painter.width, greaterThan(0.0));


It should be exactly 10.0, right? Assuming the default font size is 10 and the font is Ahem.

Hixie

...I really should update this image.
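The test above only checks that the width is positive. A minimal Dart sketch of whole-span substitution, as sm2017 reads the change, shows why an exact value is possible: if a span's text is not well-formed UTF-16, one U+FFFD is laid out in its place. `hasUnpairedSurrogate` and `textForLayout` are hypothetical names, not Flutter APIs, and this is not the PR's actual code.

```dart
/// Hypothetical helper (not a Flutter API): returns true if [text] contains
/// a UTF-16 surrogate code unit that is not part of a well-formed pair.
bool hasUnpairedSurrogate(String text) {
  final units = text.codeUnits;
  var i = 0;
  while (i < units.length) {
    final unit = units[i];
    final isHigh = unit >= 0xD800 && unit <= 0xDBFF;
    final isLow = unit >= 0xDC00 && unit <= 0xDFFF;
    if (isHigh &&
        i + 1 < units.length &&
        units[i + 1] >= 0xDC00 &&
        units[i + 1] <= 0xDFFF) {
      i += 2; // A high surrogate followed by a low surrogate is valid.
      continue;
    }
    if (isHigh || isLow) {
      return true; // A lone high or low surrogate is ill-formed.
    }
    i++;
  }
  return false;
}

/// Hypothetical sketch of whole-span substitution: ill-formed text is
/// replaced in its entirety by a single U+FFFD before layout.
String textForLayout(String text) =>
    hasUnpairedSurrogate(text) ? '\uFFFD' : text;
```

Under Hixie's stated assumptions (the Ahem test font at size 10), and assuming U+FFFD renders as a single 1em-wide glyph, the replaced span would measure exactly 10.0 wide rather than merely more than zero.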

Hixie · 2021-06-18T22:35:43Z

Bonus points if you mention this somewhere in the API docs, maybe for Text, TextSpan, and TextPainter in particular.

sm2017 · 2021-06-19T04:40:35Z

@jason-simmons In this line, as I understand it, you replace the whole text with the REPLACEMENT CHARACTER. Why the whole text? We should replace only the invalid characters.

You convert 'Hello\uD83DWorld' to '�', but I think it should be 'Hello�World'; that's the desired output. See Specials_(Unicode_block):
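For comparison, sm2017's proposal (replace only the offending code units) can be sketched in Dart as follows. `replaceUnpairedSurrogates` is a hypothetical helper, not part of Flutter; under Hixie's position in this thread, something like it would live in application code rather than in the framework.

```dart
/// Hypothetical helper (not part of Flutter): replaces each unpaired UTF-16
/// surrogate in [text] with U+FFFD and leaves everything else untouched.
String replaceUnpairedSurrogates(String text) {
  const replacement = 0xFFFD;
  final units = text.codeUnits;
  final out = <int>[];
  var i = 0;
  while (i < units.length) {
    final unit = units[i];
    final isHigh = unit >= 0xD800 && unit <= 0xDBFF;
    final isLow = unit >= 0xDC00 && unit <= 0xDFFF;
    if (isHigh &&
        i + 1 < units.length &&
        units[i + 1] >= 0xDC00 &&
        units[i + 1] <= 0xDFFF) {
      // Well-formed surrogate pair: copy both halves unchanged.
      out
        ..add(unit)
        ..add(units[i + 1]);
      i += 2;
    } else if (isHigh || isLow) {
      // Lone surrogate: substitute the replacement character.
      out.add(replacement);
      i++;
    } else {
      // Ordinary BMP code unit.
      out.add(unit);
      i++;
    }
  }
  return String.fromCharCodes(out);
}
```

Because each lone surrogate (one UTF-16 code unit) is swapped for U+FFFD (also one code unit), the output has the same `length` as the input, so text offsets such as selection ranges stay valid. `replaceUnpairedSurrogates('Hello\uD83DWorld')` yields `'Hello\uFFFDWorld'`, which displays as Hello�World.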

U+FFFD � REPLACEMENT CHARACTER used to replace an unknown, unrecognized, or unrepresentable character

The replacement character � (often displayed as a black rhombus with a white question mark) is a symbol found in the Unicode standard at code point U+FFFD in the Specials table. It is used to indicate problems when a system is unable to render a stream of data to a correct symbol.[4] It is usually seen when the data is invalid and does not match any character:

Consider a text file containing the German word für (meaning 'for') in the ISO-8859-1 encoding (0x66 0xFC 0x72). This file is now opened with a text editor that assumes the input is UTF-8. The first and last byte are valid UTF-8 encodings of ASCII, but the middle byte (0xFC) is not a valid byte in UTF-8. Therefore, a text editor could replace this byte with the replacement character symbol to produce a valid string of Unicode code points. The whole string now displays like this: "f�r".

A poorly implemented text editor might save the replacement in UTF-8 form; the text file data will then look like this: 0x66 0xEF 0xBF 0xBD 0x72, which will be displayed in ISO-8859-1 as "fï¿½r" (this is called mojibake). Since the replacement is the same for all errors this makes it impossible to recover the original character. A better (but harder to implement) design is to preserve the original bytes, including the error, and only convert to the replacement when displaying the text. This will allow the text editor to save the original byte sequence, while still showing the error indicator to the user.

At one time the replacement character was often used when there was no glyph available in a font for that character. However most modern text rendering systems instead use a font's .notdef character, which in most cases is an empty box (or "?" or "X" in a box[5]), sometimes called a "tofu" (this browser displays �). There is no Unicode code point for this symbol.

Thus the replacement character is now only seen for encoding errors, such as invalid UTF-8. Some software attempts to hide this by translating the bytes of invalid UTF-8 to matching characters in Windows-1252 (since that is the most likely source of these errors), so that the replacement character is never seen.
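The encoding scenarios in the quoted passage can be reproduced with Dart's standard `dart:convert` library (`utf8`, `latin1`). This is an illustrative sketch only: `decodeUtf8Lenient`, `saveThenReadAsLatin1`, `ByteBackedDocument`, and `decodeWithSingleByteFallback` are hypothetical names, and because `dart:convert` does not, to my knowledge, ship a Windows-1252 codec, Latin-1 stands in for it in the fallback (the two agree on 0xFC but differ for bytes 0x80 to 0x9F).

```dart
import 'dart:convert';

/// Decodes [bytes] as UTF-8; with `allowMalformed: true`, malformed
/// sequences become U+FFFD instead of causing a FormatException.
String decodeUtf8Lenient(List<int> bytes) =>
    utf8.decode(bytes, allowMalformed: true);

/// The lossy path from the quote: U+FFFD is saved as UTF-8 (EF BF BD),
/// and reading that file back as ISO-8859-1 produces mojibake.
String saveThenReadAsLatin1(String displayed) =>
    latin1.decode(utf8.encode(displayed));

/// Hypothetical "better design": keep the original bytes, including any
/// errors, and substitute U+FFFD only when producing text for display.
class ByteBackedDocument {
  ByteBackedDocument(List<int> bytes) : bytes = List.unmodifiable(bytes);

  /// Saved back verbatim, so the original byte sequence is never lost.
  final List<int> bytes;

  /// Display-only text; malformed UTF-8 is shown as U+FFFD.
  String get displayText => utf8.decode(bytes, allowMalformed: true);
}

/// Fallback heuristic from the quote: try strict UTF-8 first and, if that
/// fails, decode as a single-byte encoding so no U+FFFD ever appears.
/// Assumption: Latin-1 stands in for Windows-1252 here.
String decodeWithSingleByteFallback(List<int> bytes) {
  try {
    return utf8.decode(bytes); // Strict: throws FormatException on bad input.
  } on FormatException {
    return latin1.decode(bytes); // Every byte 0x00-0xFF maps to a character.
  }
}
```

With the quote's bytes `[0x66, 0xFC, 0x72]`, `decodeUtf8Lenient` returns `'f\uFFFDr'` ("f�r"), `saveThenReadAsLatin1('f\uFFFDr')` returns "fï¿½r", and `decodeWithSingleByteFallback` returns "für".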

Hixie · 2021-06-19T06:00:25Z

Replacing individual characters is a lot of work. I really don't think we should do that, given that the whole point is to fail. If you want to replace invalid characters, that's something to do in your app.

sm2017 · 2021-06-19T07:59:50Z

@Hixie I understand. Suppose I want to do it: there are many, many strings and Text widgets in the application, and some of those Text widgets live in the dependencies of dependencies.
It's very cumbersome to do this in the application layer. At the very least there should be a global option to override text sanitization.

Hixie · 2021-06-19T22:58:05Z

You would have to provide a central place in your application where strings were sanitized, yes.

But fundamentally, strings should be sanitized long before they reach Text. If they're not, that indicates a more fundamental problem. I don't think it's Flutter's job to provide APIs to make it easier to work around such fundamental problems.

Fixes flutter#84693

jason-simmons requested a review from Hixie June 18, 2021 22:06

flutter-dashboard bot added the framework label Jun 18, 2021

google-cla bot added the cla: yes label Jun 18, 2021

jason-simmons force-pushed the bug_84693 branch from c4f840e to 0ed403a June 18, 2021 22:08

jason-simmons mentioned this pull request Jun 18, 2021

Sanitize invalid UTF-16 paragraph text instead of throwing flutter/engine#26808

Closed

Hixie reviewed Jun 18, 2021


Hixie approved these changes Jun 18, 2021


jason-simmons force-pushed the bug_84693 branch from 0ed403a to 6fbae3f June 19, 2021 00:17

jason-simmons force-pushed the bug_84693 branch from 6fbae3f to 0e2ff4d June 21, 2021 20:00

Piinks added the a: typography label Jun 23, 2021

jason-simmons force-pushed the bug_84693 branch from 0e2ff4d to e77ed48 June 23, 2021 22:23

Substitute a replacement character for invalid UTF-16 text in a TextSpan

4f0b502

Fixes flutter#84693

jason-simmons force-pushed the bug_84693 branch from e77ed48 to 4f0b502 June 24, 2021 00:17

jason-simmons added the waiting for tree to go green label Jun 24, 2021

fluttergithubbot merged commit 3a51eb1 into flutter:master Jun 24, 2021

jason-simmons mentioned this pull request Jan 20, 2022

Remove UTF-8 conversion/UTF-16 validation from Paragraph::addText flutter/engine#30956

Closed

dnfield mentioned this pull request Jan 25, 2022

Avoid validating or decoding UTF-16 in ParagraphBuilder::addText #97183

Open

4 tasks
