This bug is probably related to MC-221987.
Although the server.properties file only supports ASCII characters, non-ASCII characters can be added to a server's MOTD with the escape sequence \uXXXX (where XXXX is four hexadecimal digits, i.e. two bytes). This allows using characters up to two bytes (16 bits) wide.
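As a rough illustration of the escape described above, the following Python sketch expands \uXXXX sequences into the characters they stand for. The helper name expand_unicode_escapes is made up for this example, and it only handles \uXXXX; the other escapes a properties file supports (such as \:) are deliberately left untouched.

```python
import re

# Matches a backslash, the letter u, and exactly four hexadecimal digits.
_UNICODE_ESCAPE = re.compile(r"\\u([0-9a-fA-F]{4})")


def expand_unicode_escapes(value: str) -> str:
    """Replace each \\uXXXX escape with the 16-bit character it encodes.

    Hypothetical helper for illustration only; not the server's parser.
    """
    return _UNICODE_ESCAPE.sub(lambda m: chr(int(m.group(1), 16)), value)


motd = expand_unicode_escapes(r"protocol\u1D00")
# The last character is now U+1D00, which needs more than 8 bits.
print(hex(ord(motd[-1])))
```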
Expected behavior
I could not find any official information on which encoding strings in query responses should use. Dinnerbone's mcstatus Python program refers to the encoding as ASCII (https://github.com/Dinnerbone/mcstatus/blob/master/mcstatus/querier.py#L136) but actually uses ISO-8859-1 (https://github.com/Dinnerbone/mcstatus/blob/master/mcstatus/protocol/connection.py#L77).
The expected behavior depends on what the actual character encoding is supposed to be, but generally, characters in query responses should be converted to the correct encoding or filtered if they can't be represented in the correct encoding.
Bug
Only the last byte of 2-byte characters is sent in query responses (e.g. \uABCD => CD). As a result, characters wider than 8 bits are unreadable/corrupted in query responses. Additionally, 2-byte characters whose low byte is 00 (like ᴀ = \u1D00) are sent as 00, which, because query responses use null-terminated strings, makes the whole query response unreadable.
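The truncation can be modelled in a few lines of Python. This is only a sketch of the observed behavior, not the server's actual code: the function name truncate_to_low_byte is invented here, and the model only covers characters that fit in 16 bits.

```python
def truncate_to_low_byte(text: str) -> bytes:
    """Keep only the low 8 bits of each character.

    Assumed model of the reported behavior for 16-bit characters;
    the real server implementation may differ in detail.
    """
    return bytes(ord(ch) & 0xFF for ch in text)


print(truncate_to_low_byte("\uABCD").hex())  # low byte of U+ABCD
print(truncate_to_low_byte("\u1D00").hex())  # low byte of U+1D00 is a null byte
```

Under this model, characters in the ISO-8859-1 range (for example £ = \u00A3) pass through unchanged, which would explain why they worked before the behavior changed.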
Reproduce
Create a server
Change the MOTD to
This character will break the query protocol\:\u1D00
Try using the mcstatus CLI
mcstatus 0.0.0.0:25565 query
Since the query response is unreadable, a Python error is displayed
A hex dump of the query response data shows the additional null-byte:
00000000 00 00 00 00 01 54 68 69 73 20 63 68 61 72 61 63 |.....This charac|
00000010 74 65 72 20 77 69 6c 6c 20 62 72 65 61 6b 20 74 |ter will break t|
00000020 68 65 20 71 75 65 72 79 20 70 72 6f 74 6f 63 6f |he query protoco|
00000030 6c 3a 00 00 53 4d 50 00 77 6f 72 6c 64 00 30 00 |l:..SMP.world.0.|
00000040 32 30 00 dd 63 31 32 37 2e 30 2e 31 2e 31 00 |20..c127.0.1.1.|
0000004f
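To show why the extra null byte is fatal, here is a small Python sketch that parses the bytes from the dump above. It assumes the basic-stat layout that the dump appears to follow (type byte, 4-byte session ID, five null-terminated strings, a little-endian port, then the host IP); the function names are made up for this example and are not taken from mcstatus.

```python
import struct

# Bytes reconstructed from the hex dump above.
RESPONSE = (
    b"\x00"                # packet type
    b"\x00\x00\x00\x01"    # session ID
    b"This character will break the query protocol:"
    b"\x00"                # low byte of U+1D00, NOT a terminator
    b"\x00"                # the real MOTD terminator
    b"SMP\x00"
    b"world\x00"
    b"0\x00"
    b"20\x00"
    b"\xdd\x63"            # host port 25565, little-endian
    b"127.0.1.1\x00"
)


def read_cstring(data: bytes, offset: int):
    """Read one null-terminated ISO-8859-1 string; return (text, next offset)."""
    end = data.index(b"\x00", offset)
    return data[offset:end].decode("iso-8859-1"), end + 1


def parse_basic_stat(data: bytes):
    """Parse MOTD, game type, map, player count, max players, port, host IP."""
    offset = 5  # skip type byte and session ID
    fields = []
    for _ in range(5):
        value, offset = read_cstring(data, offset)
        fields.append(value)
    (port,) = struct.unpack_from("<H", data, offset)
    host_ip, _ = read_cstring(data, offset + 2)
    return fields, port, host_ip


fields, port, host_ip = parse_basic_stat(RESPONSE)
print(fields)         # every field after the MOTD is shifted by one position
print(port, host_ip)  # the port is read from the ASCII bytes "20"
```

Because the stray null ends the MOTD early, the game type comes out empty and the player count field contains "world", so a client that converts it to an integer fails.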
Comments 2
I was not able to test this myself yet, but I’d consider the new behavior a fix for this issue. Breaking compatibility with old implementations could be a bug, but there honestly is no good way to fix the problem without breaking old implementations. It might be possible to just strip all Unicode characters, but I think that would arguably be a worse solution. ISO-8859-1 characters outside the ASCII range no longer working is unfortunate, but I think that should be reported as a separate issue and probably can’t be fixed reasonably.
Can confirm this is still an issue in 1.21.11, although the behavior has changed. In 1.21.11, the server no longer truncates 2-byte characters to a single byte (which previously caused the fatal \x00 null byte injection and broke the packet structure). Instead, the server now encodes MOTD strings as UTF-8.
However, because the Query Protocol historically uses ISO-8859-1 (Latin-1) strings, standard query clients read these UTF-8 bytes as Latin-1. This means characters are still visually corrupted/unreadable as described in the original report (\u1D00 appears as á´€). It also introduces a visual regression where standard Latin-1 characters like £ that used to work fine are now corrupted into Â£.
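The corruption described in this comment can be reproduced in Python by encoding a string as UTF-8 and decoding it with a single-byte encoding. The second function is a hypothetical client-side workaround, not something mcstatus or the server is known to do: it tries UTF-8 first and falls back to ISO-8859-1, which is a heuristic and can misjudge Latin-1 text that happens to be valid UTF-8.

```python
def misread_as(encoding: str, text: str) -> str:
    """Show what a client sees when it decodes UTF-8 bytes with `encoding`."""
    return text.encode("utf-8").decode(encoding)


def decode_query_string(raw: bytes) -> str:
    """Hypothetical tolerant decoder: prefer UTF-8, fall back to ISO-8859-1."""
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        # Every byte sequence is valid ISO-8859-1, so this cannot fail.
        return raw.decode("iso-8859-1")


print(misread_as("iso-8859-1", "£"))   # two characters instead of one
print(misread_as("cp1252", "\u1D00"))  # three characters instead of one
print(decode_query_string("£".encode("utf-8")))
print(decode_query_string("£".encode("iso-8859-1")))
```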