Fix a Memory Leak in the Metal Backend of Leela Chess Zero
I fixed a memory leak in the Metal backend of Leela Chess Zero (lc0). Here is how I did it.
Get lc0 Metal backend source code
The lc0 Metal backend is developed on almaudoh’s new-metal-backend-mpsgraph branch. I cloned the source code with the following commands.
git clone -b new-metal-backend-mpsgraph --recurse-submodules https://github.com/almaudoh/lc0.git lc0-almaudoh
cd lc0-almaudoh
git checkout 055e19e4081a32fdb763569375eaa05c7a2be37f
These commands create a subdirectory lc0-almaudoh in the current directory, download the source code into it, and check out revision 055e19e4081a32fdb763569375eaa05c7a2be37f, which has the memory leak.
Generate an Xcode project from lc0
I detect memory leaks with Xcode Instruments, so I need to generate an Xcode project from lc0. The commands are as follows.
cd lc0-almaudoh
meson setup build --backend xcode
These commands create a subdirectory build in the lc0-almaudoh directory and generate the Xcode project files in build/lc0.xcodeproj.
Configure lc0 Xcode project
The generated Xcode project has not been configured with:
- The scheme of the lc0 executable
- An argument passed on launch
Therefore, I manually selected the scheme of the lc0 executable and specified a launch argument. The steps are as follows.
In Terminal, open the generated Xcode project by the following command:
open build/lc0.xcodeproj
This opens the project in the Xcode graphical user interface (GUI).
Scheme of lc0 executable
In the Xcode GUI, choose the following menu items.
- Product -> Scheme -> lc0@exe
This selects the scheme of the lc0 executable.
Argument passed on launch
In the Xcode GUI, choose the following menu items.
- Product -> Scheme -> Edit Scheme…
- Run -> Arguments -> +
- Enter “-b metal” -> Close
This sets lc0’s backend to Metal.
Hack universal chess interface
The memory leak is easy to detect when lc0 runs indefinitely. Unfortunately, lc0 cannot simply be left running under Xcode Instruments, for the following reasons:
- lc0 waits for commands from the standard input stream.
- Xcode Instruments does not seem to let the user type commands into the standard input stream.
- Redirecting a command file to the standard input stream makes lc0 terminate as soon as it has processed every command in the file.
Therefore, I hacked the universal chess interface (UCI) loop so that lc0 receives a command on launch that makes it search indefinitely.
I modified the source code as follows.
src/chess/uciloop.cc:132:
void UciLoop::RunLoop() {
  std::cout.setf(std::ios::unitbuf);
  std::string line;
  // Hack: dispatch "go infinite" once before reading standard input.
  line = "go infinite";
  LOGFILE << ">> " << line;
  try {
    auto command = ParseCommand(line);
    DispatchCommand(command.first, command.second);
  } catch (Exception& ex) {
    SendResponse(std::string("error ") + ex.what());
  }
  // Original loop: read and dispatch commands until end of input.
  while (std::getline(std::cin, line)) {
    LOGFILE << ">> " << line;
    try {
      auto command = ParseCommand(line);
      // Ignore empty line.
      if (command.first.empty()) continue;
      if (!DispatchCommand(command.first, command.second)) break;
    } catch (Exception& ex) {
      SendResponse(std::string("error ") + ex.what());
    }
  }
}
This makes lc0 process an additional go infinite command before it reads anything from the standard input stream.
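The same idea can be tried outside lc0. The following is a minimal, self-contained sketch, not lc0's code: RunLoopWithStartupCommands and Dispatch are hypothetical names, and the dispatcher is a plain callback standing in for lc0's command handling. It shows two things: injected commands are handled before any input is read, and the loop ends as soon as the input stream reaches end-of-file, which is why feeding lc0 a command file makes it exit. Unlike the hack above, this sketch also stops if an injected command asks to quit.

```cpp
#include <cassert>
#include <functional>
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical stand-in for a UCI command dispatcher: returns false to
// stop the loop (as a "quit" command would) and true to keep going.
using Dispatch = std::function<bool(const std::string&)>;

// Passes every line in `startup` to `dispatch` first, then every
// non-empty line read from `in`. Returns when `in` reaches end-of-file
// or when `dispatch` returns false.
void RunLoopWithStartupCommands(const std::vector<std::string>& startup,
                                std::istream& in, const Dispatch& dispatch) {
  assert(dispatch && "a dispatch callback is required");
  // Injected commands run before any input is read.
  for (const std::string& line : startup) {
    if (!dispatch(line)) return;
  }
  std::string line;
  while (std::getline(in, line)) {
    if (line.empty()) continue;  // Ignore empty lines.
    if (!dispatch(line)) return;
  }
  // Falling out of the loop means the stream hit end-of-file.
}
```

Calling it with {"go infinite"}, std::cin, and a real dispatcher would mirror the behavior of the modified RunLoop; passing a std::istringstream instead makes the end-of-file case easy to observe.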
Build lc0 executable
The lc0 executable is now ready to be built from the source code. I built it with the following step.
- Product -> Build
This builds the lc0 executable at build/debug/lc0.
Download a Leela Chess Zero network
lc0 evaluates a chess move with a neural network, so I need to download a network for lc0 to use. The steps are as follows.
- In Safari, download this file: http://training.lczero.org/get_network?sha=82d14d7d8a4f00826f269901d5e31df1a7b2112c20604dc8bee4008271db4d88
- Move the downloaded file to the directory build/debug/.
This places the last T60 320x24 network, 606511, in the directory that contains the lc0 executable, so lc0 can find it.
Run Xcode Instruments
The Xcode project is now configured, and lc0 can run the Metal backend with a neural network indefinitely. Now I can run Xcode Instruments to find memory leaks in the Metal backend of lc0. The steps are as follows.
- Product -> Profile
- All -> Leaks -> Choose
- Record
Instruments runs lc0 and collects information about memory allocations and leaks. I saw the following result: memory usage keeps growing without bound, and the function runInferenceWithBatchSize has allocated 1366.22 MB. I double-clicked runInferenceWithBatchSize to see more detail.
The detail view shows that 100% of that memory is allocated by the following source code.
_resultDataDictionary = [_graph runWithFeeds:@{_inputTensor : _inputTensorData}
                               targetTensors:_targetTensors
                            targetOperations:nil];
Fix memory leak
The previous section identified the source code that causes the memory leak. In this section, I fix it.
The memory leak is fixed by my pull request, in which I made the following changes.
- Release _inputTensorData explicitly.
- Surround the function that leaks memory with an autorelease pool.
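The second change relies on how Objective-C autorelease pools behave: an autoreleased object stays alive until the pool that holds it is drained. The sketch below is a toy C++ model of that behavior. It is not Apple's implementation and not the code in the pull request, and ToyPool, FakeInference, PeakWithOuterPoolOnly, and PeakWithPoolPerCall are hypothetical names. It assumes each inference call leaves one autoreleased temporary behind, which is presumably what happens in runInferenceWithBatchSize, and compares a single long-lived pool with a pool scoped to each call.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <functional>
#include <utility>
#include <vector>

// Toy model of an autorelease pool (NOT Apple's implementation):
// deferred cleanups pile up until the pool is drained or destroyed.
class ToyPool {
 public:
  ~ToyPool() { Drain(); }
  void Defer(std::function<void()> cleanup) {
    pending_.push_back(std::move(cleanup));
  }
  void Drain() {
    for (auto& cleanup : pending_) cleanup();
    pending_.clear();
  }
  std::size_t pending() const { return pending_.size(); }

 private:
  std::vector<std::function<void()>> pending_;
};

// Hypothetical stand-in for one inference call that leaves one
// "autoreleased" temporary behind in the given pool.
void FakeInference(ToyPool& pool, int& live_objects) {
  ++live_objects;
  pool.Defer([&live_objects] { --live_objects; });
}

// One long-lived pool: temporaries accumulate until the loop ends.
int PeakWithOuterPoolOnly(int iterations) {
  int live = 0, peak = 0;
  {
    ToyPool outer;
    for (int i = 0; i < iterations; ++i) {
      FakeInference(outer, live);
      peak = std::max(peak, live);
    }
  }  // Everything is released only here.
  assert(live == 0);
  return peak;
}

// A pool scoped to each call: temporaries are released every iteration.
int PeakWithPoolPerCall(int iterations) {
  int live = 0, peak = 0;
  for (int i = 0; i < iterations; ++i) {
    ToyPool scoped;
    FakeInference(scoped, live);
    peak = std::max(peak, live);
  }
  assert(live == 0);
  return peak;
}
```

In this model the peak number of live temporaries grows with the iteration count when only the outer pool exists, and stays at one with a pool per call. With go infinite the search loop never ends, so under these assumptions an outer pool alone would never get the chance to release anything.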
To check out my pull request locally, follow the steps below.
In Terminal, enter the following commands:
git stash
git fetch origin pull/2/head:fix-memory-leak
git checkout fix-memory-leak
git stash pop
These commands stash the changes in the working directory, fetch the pull request into a new branch fix-memory-leak, check out that branch, and pop the stashed changes.
I reran Xcode Instruments to see whether the memory leak was fixed. The result shows that memory usage no longer grows without bound.
Remarks
Xcode Instruments reports another memory leak that my pull request does not fix. Can you find it and fix it?