mojira.dev
MC-231035

2-byte-characters in MOTD can break query responses

This bug is probably related to MC-221987.

Although the server.properties file only supports ASCII characters, it is possible to add other characters to a server's MOTD by using the escape sequence \uXXXX (where XXXX is four hexadecimal digits, i.e. two bytes). This allows using characters that are up to two bytes long.
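To illustrate the escape described above, here is a minimal Python sketch that expands \uXXXX sequences in a property value. The helper name decode_unicode_escapes and the sample value are made up for illustration. This is only a rough model, not the server's actual properties parser, and it ignores every other escape sequence.

```python
import re

# Matches a backslash, the letter u, and exactly four hex digits.
_UNICODE_ESCAPE = re.compile(r"\\u([0-9a-fA-F]{4})")

def decode_unicode_escapes(value: str) -> str:
    # Hypothetical helper: replace each escape with the character it names.
    return _UNICODE_ESCAPE.sub(lambda m: chr(int(m.group(1), 16)), value)

raw_value = r"ends with \u1D00"          # sample value, not from a real config
motd = decode_unicode_escapes(raw_value)
last = motd[-1]
print(hex(ord(last)))                    # 0x1d00
print(last.encode("utf-16-be"))          # b'\x1d\x00' (two bytes, low byte is 00)
```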

Expected behavior

I could not find any official information on which encoding strings in query responses should use. Dinnerbone's mcstatus Python program refers to the encoding as ASCII (https://github.com/Dinnerbone/mcstatus/blob/master/mcstatus/querier.py#L136) but actually uses ISO-8859-1 (https://github.com/Dinnerbone/mcstatus/blob/master/mcstatus/protocol/connection.py#L77).

The expected behavior depends on what the actual character encoding is supposed to be, but generally, characters in query responses should be converted to the correct encoding or filtered if they can't be represented in the correct encoding.
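As a sketch of what "convert or filter" could look like, assuming the intended encoding is ISO-8859-1 (which is not officially confirmed anywhere), the following Python function replaces unrepresentable characters and removes embedded null bytes before terminating the field. The name to_query_bytes is hypothetical and this is not how the server is implemented.

```python
def to_query_bytes(text: str, encoding: str = "iso-8859-1") -> bytes:
    # Assumption: the target encoding is ISO-8859-1; pass another one if needed.
    # errors="replace" turns every unencodable character into b"?".
    encoded = text.encode(encoding, errors="replace")
    # Drop embedded NUL bytes so they cannot end the field early,
    # then append the single terminating NUL.
    return encoded.replace(b"\x00", b"") + b"\x00"

print(to_query_bytes("protocol:\u1d00"))  # b'protocol:?\x00'
```

Replacing with "?" keeps the length of the visible text stable; dropping the character entirely would be an equally valid filter.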

Bug

Only the last byte of 2-byte characters is sent in query responses (e.g. \uABCD => CD). As a result, characters wider than 8 bits are generally unreadable/corrupted in query responses. Additionally, 2-byte characters that end in 00 (like ᴀ = \u1D00) are sent as 00, which, because query responses use null-terminated strings, makes the whole query response unreadable.
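The observed truncation can be modeled in a few lines of Python. This is only a model of the behavior seen on the wire, not the server's code, and truncate_to_low_byte is a made-up name. It assumes characters in the Basic Multilingual Plane.

```python
def truncate_to_low_byte(text: str) -> bytes:
    # Keep only the lowest 8 bits of each character, as observed in the responses.
    return bytes(ord(ch) & 0xFF for ch in text)

print(truncate_to_low_byte("\uabcd"))      # b'\xcd'
print(truncate_to_low_byte("l:\u1d00"))    # b'l:\x00' (an unintended NUL byte)
```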

Reproduce

  1. Create a server

  2. Change the MOTD to

    This character will break the query protocol\:\u1D00
  3. Try using the mcstatus CLI

    mcstatus 0.0.0.0:25565 query
  4. Since the query response is unreadable, a Python error is displayed

 

A hex dump of the query response data shows the additional null-byte:

 

00000000  00 00 00 00 01 54 68 69  73 20 63 68 61 72 61 63  |.....This charac|
00000010  74 65 72 20 77 69 6c 6c  20 62 72 65 61 6b 20 74  |ter will break t|
00000020  68 65 20 71 75 65 72 79  20 70 72 6f 74 6f 63 6f  |he query protoco|
00000030  6c 3a 00 00 53 4d 50 00  77 6f 72 6c 64 00 30 00  |l:..SMP.world.0.|
00000040  32 30 00 dd 63 31 32 37  2e 30 2e 31 2e 31 00     |20..c127.0.1.1.|
0000004f
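To show why the extra null byte breaks clients, here is a small Python sketch that reads the leading null-terminated fields from the dump above. The field order (MOTD, game type, map, player count, max players) is an assumption inferred from the dump, and the function names are hypothetical. This is not mcstatus's parser.

```python
# Bytes copied from the hex dump above.
RESPONSE = bytes.fromhex("""
    00 00 00 00 01 54 68 69 73 20 63 68 61 72 61 63
    74 65 72 20 77 69 6c 6c 20 62 72 65 61 6b 20 74
    68 65 20 71 75 65 72 79 20 70 72 6f 74 6f 63 6f
    6c 3a 00 00 53 4d 50 00 77 6f 72 6c 64 00 30 00
    32 30 00 dd 63 31 32 37 2e 30 2e 31 2e 31 00
""")

def read_cstring(buf, pos):
    # Return the bytes up to the next NUL and the position just after it.
    end = buf.index(b"\x00", pos)
    return buf[pos:end], end + 1

def parse_leading_fields(buf):
    # Assumed layout: 1 type byte, 4-byte session id, then NUL-terminated strings.
    names = ("motd", "gametype", "map", "numplayers", "maxplayers")
    pos = 5
    fields = {}
    for name in names:
        fields[name], pos = read_cstring(buf, pos)
    return fields

fields = parse_leading_fields(RESPONSE)
print(fields["gametype"])    # b'' (the stray NUL ended the MOTD one byte early)
print(fields["numplayers"])  # b'world' (every later field is shifted by one)
```

A client that then converts the player count to an integer fails on "world", which would explain an error like the one in step 4.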

 

Comments 2

Can confirm this is still an issue in 1.21.11, although the behavior has changed. In 1.21.11, the server no longer truncates 2-byte characters to a single byte (which previously caused the fatal \x00 null byte injection and broke the packet structure). Instead, the server now encodes MOTD strings as UTF-8.

However, because the Query Protocol historically uses ISO-8859-1 (Latin-1) strings, standard query clients read these UTF-8 bytes as Latin-1. This means characters are still visually corrupted/unreadable as described in the original report (\u1D00 appears as á´€). It also introduces a visual regression where standard Latin-1 characters like £ that used to work fine are now corrupted into £.
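The misreading described here can be reproduced in Python by encoding as UTF-8 and decoding with a single-byte encoding. The helper misread_utf8 is a made-up name for illustration. Note that strict ISO-8859-1 maps byte 0x80 to a control character; the € in the rendering quoted above is what a Windows-1252 decoder shows for the same byte.

```python
def misread_utf8(text: str, client_encoding: str = "iso-8859-1") -> str:
    # Encode as the server is described to do, decode as an older client would.
    return text.encode("utf-8").decode(client_encoding)

print(misread_utf8("\u00a3"))            # £ is displayed as two characters
print(misread_utf8("\u1d00", "cp1252"))  # three characters instead of one
print("\u1d00".encode("utf-8"))          # b'\xe1\xb4\x80'
```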

I was not able to test this myself yet, but I’d consider the new behavior a fix for this issue. Breaking compatibility with old implementations could be a bug, but there honestly is no good way to fix the problem without breaking old implementations. It might be possible to just strip all Unicode characters, but I think that would arguably be a worse solution. ISO-8859-1 characters outside the ASCII range no longer working is unfortunate, but I think that should be reported as a separate issue and probably can’t be fixed reasonably.

KurtThiemann

(Unassigned)

Community Consensus

Platform

Low

Networking

1.17, 1.17.1 Release Candidate 1, 1.17.1, 1.21.11
