Bedrock Dedicated Server / BDS-17567

BDS Memory Leak in Linux 1.19.21.01


    • Type: Bug
    • Resolution: Duplicate
    • None
    • 1.19.21 Hotfix
    • None
    • Ubuntu 20.04 LTS on an Azure B2 server (2 CPUs, 4GB RAM). Also confirmed under openSUSE 15.3, on larger virtual servers, and on a different hypervisor (XCP-NG).
    • Unconfirmed

      There appears to be a significant memory leak in BDS for Linux.  The bedrock_server process's memory use grows whenever any activity occurs, and that memory never appears to be released.  Utilization continues to increase until all available RAM on the host is exhausted, at which point either swap kicks in (resulting in host paging/thrashing), or the process is killed by the kernel's OOM killer, which of course releases all the memory but terminates the BDS service. Either outcome forcibly disconnects all players and could result in database corruption.

      The problem appears to be related to loading chunks. It is exacerbated when teleporting.  Teleporting to a distant location can trigger significant memory allocations (on the order of 10MB per second) making this problem easier to duplicate.  It also exposes an additional aspect of this bug, which I will describe below.

      Steps to duplicate:
      1. Perform a fresh install of a current Ubuntu LTS version on a bare metal or virtual server.
      2. Download BDS 1.19.21.01. No mods, blank world. Start the process running.
      3. Observe utilization of about 300MB initially.
      4. Connect to game. Note that RAM increases slightly (that's expected of course.)
      5. Teleport to a distant location (e.g. 2000, 80, 2000).
      6. Note that RAM increases by 10-20MB within seconds.
      7. Teleport back to origin. Wait.
      8. Note that RAM is not released.
      9. Teleport to the same distant location.
      10. Observe that RAM again increases by 10-20 MB within seconds.
      11. Repeat steps 7-10. Observe that RAM continues to increase.
      12. Disconnect from game. Wait for hours. Observe that allocated RAM is never released.
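
To make the RAM observations in the steps above repeatable, the process's memory can be sampled on a timer while teleporting. The following is a minimal sketch, not part of the original report: it assumes that resident set size (the VmRSS field of /proc/<pid>/status on Linux) is the figure of interest, since the report does not say which tool produced its numbers. The function names, the 5-second interval, and the sample count are illustrative choices.

```python
import re
import time
from pathlib import Path


def parse_vmrss_kb(status_text):
    """Return the VmRSS value in kB from /proc/<pid>/status text, or None."""
    match = re.search(r"^VmRSS:\s+(\d+)\s+kB", status_text, re.MULTILINE)
    return int(match.group(1)) if match else None


def find_pid(process_name="bedrock_server"):
    """Return the first PID whose /proc/<pid>/comm equals process_name, or None."""
    for comm in Path("/proc").glob("[0-9]*/comm"):
        try:
            if comm.read_text().strip() == process_name:
                return int(comm.parent.name)
        except OSError:
            continue  # the process exited while we were scanning
    return None


def monitor(pid, interval_s=5.0, samples=12):
    """Print the resident memory of pid in MB every interval_s seconds."""
    for _ in range(samples):
        rss_kb = parse_vmrss_kb(Path(f"/proc/{pid}/status").read_text())
        if rss_kb is None:
            break  # no VmRSS line (for example, a kernel thread)
        print(f"{time.strftime('%H:%M:%S')}  RSS = {rss_kb / 1024:.1f} MB")
        time.sleep(interval_s)


if __name__ == "__main__":
    pid = find_pid()
    if pid is None:
        print("bedrock_server is not running")
    else:
        monitor(pid)
```

Running this in a second terminal while performing steps 5-12 gives a timestamped log that can be attached to the report in place of ad hoc readings.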

      I used an Azure server with 2 CPUs, 4GB of RAM, and 32GB of disk, but this has also been tested on larger server configurations with the same result. I used Ubuntu 20.04 LTS as directed, but I also tested under openSUSE 15.3, with the same result.

      Following the above steps, I was able, within 10 minutes, to more than double the process's RAM allocation, from about 300MB to above 700MB. I could have continued teleporting back and forth until the server went down from memory exhaustion.
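
As a rough, back-of-the-envelope sketch of how quickly this could exhaust the test host, the figures reported here (about 300MB at startup, 10-20MB per teleport, 4GB of RAM) can be combined under two simplifying assumptions that are mine, not measurements: growth per teleport stays constant, and the entire 4GB is available to bedrock_server. The function name is illustrative.

```python
def teleports_until_exhaustion(total_mb, baseline_mb, growth_per_teleport_mb):
    """Whole teleports that fit before the process reaches total_mb.

    Assumes constant growth per teleport and that all of total_mb is
    available to the process (both are simplifications).
    """
    return int((total_mb - baseline_mb) // growth_per_teleport_mb)


# 4 GB host, ~300 MB baseline, 10-20 MB per teleport (figures from the report).
for growth_mb in (10, 20):
    count = teleports_until_exhaustion(4096, 300, growth_mb)
    print(f"{growth_mb} MB per teleport: about {count} teleports to fill 4096 MB")
```

In practice the limit would be reached sooner, because the operating system and other processes also need memory.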

      This highlights a number of points:

      1. C++ has no automatic garbage collection; beyond the cleanup that runs when objects go out of scope, the program must track and release its own memory. That does not appear to be happening. An idle server with no players on it should detect that condition and release unused memory. A chunk that has not been used for some period of time should be written back to disk and likewise released from memory. Neither of these things is happening.

      2. If you're building an in-memory copy of loaded portions of the database (as is clearly the case here), an in-core index or similar data structure should track which chunks are already loaded and point callers back to them. This also does not appear to be happening. Only two areas were visited in my test: 0,0,0 (the spawn point, or near it) and 2000,80,2000. Yet each visit to those same areas, in either direction, caused the process to allocate more RAM: an additional 10-20MB per teleport in my case. This suggests that multiple copies of the same chunk(s) are being kept in RAM, apparently without the process realizing it, which amplifies the leak: not only is RAM not being released, but data is being duplicated, so usage grows with every repeat visit instead of leveling off once an area has been loaded. This may be related to why memory is not being freed: if the process doesn't track the memory it has allocated and loses the pointer to an allocated block, it CANNOT release that block. Something of that kind seems to be happening here.

      3. This exposes a potential denial-of-service-style attack against a BDS. If a world has a malicious operator, or has (for example) command blocks that let regular users teleport, then repeated teleporting by players, whether maliciously in quick succession or simply over time as the server keeps running, will speed up memory consumption and hasten the crash of the server process. Again, this applies to any in-server activity: the more activity, the faster RAM exhaustion appears to occur.
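
The behavior asked for in points 1 and 2 can be illustrated with a small sketch: an index keyed by chunk coordinates, so a revisited chunk is reused instead of loaded again, plus eviction of chunks that have been idle too long. This is not BDS's actual implementation, which is not public; it is written in Python for brevity, and the class name, the callbacks, the 60-second idle limit, and the chunk coordinate (125, 125) (block 2000 divided by an assumed 16-block chunk width) are all hypothetical.

```python
import time


class ChunkCache:
    """Hypothetical in-memory chunk index with idle-time eviction."""

    def __init__(self, load_chunk, save_chunk, max_idle_s=300.0,
                 clock=time.monotonic):
        self._load = load_chunk        # callable: (x, z) -> chunk data
        self._save = save_chunk        # callable: ((x, z), chunk data) -> None
        self._max_idle_s = max_idle_s
        self._clock = clock            # injectable so tests can fake time
        self._chunks = {}              # (x, z) -> [chunk data, last access]

    def get(self, x, z):
        """Return the chunk at (x, z), loading it only if it is not cached."""
        key = (x, z)
        entry = self._chunks.get(key)
        if entry is None:
            entry = [self._load(x, z), 0.0]
            self._chunks[key] = entry
        entry[1] = self._clock()       # record this access
        return entry[0]

    def evict_idle(self):
        """Write back and drop chunks idle longer than max_idle_s."""
        now = self._clock()
        stale = [key for key, (_, last) in self._chunks.items()
                 if now - last > self._max_idle_s]
        for key in stale:
            data, _ = self._chunks.pop(key)
            self._save(key, data)
        return len(stale)

    def __len__(self):
        return len(self._chunks)


# Usage: teleport back and forth between two areas five times.
loads, saved, fake_now = [], [], [0.0]
cache = ChunkCache(
    load_chunk=lambda x, z: loads.append((x, z)) or bytearray(16),
    save_chunk=lambda key, data: saved.append(key),
    max_idle_s=60.0,
    clock=lambda: fake_now[0],
)
for _ in range(5):
    cache.get(0, 0)
    cache.get(125, 125)
print("loads performed:", len(loads), "chunks cached:", len(cache))

fake_now[0] = 61.0                     # simulate 61 idle seconds
print("evicted:", cache.evict_idle(), "chunks cached:", len(cache))
```

With a structure like this, repeat visits cost no additional memory, and an idle server shrinks back toward its baseline once the idle limit passes.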

      Note that this bug is NOT about teleporting itself: memory usage increases whenever new chunks are loaded via ANY method, and that memory appears never to be released. Teleporting simply speeds up the process and makes the problem more visible. Even with teleporting disabled, BDS on Linux slowly grows in RAM size and never releases any RAM, eventually exhausting available resources on the server and ending in the thrashing or OOM kill described above.

      I've been off the test instance for an hour as I post this bug, and I firewalled it off so nobody could get in. It's still at 705.64MB - after being unused by anyone for an hour - and will continue that way until it's reset.

            Assignee: Unassigned
            Reporter: glenb711