Generating upcalls fills up java CodeHeap and can lead to OutOfMemoryException #223

Closed
opened 2025-05-08 20:14:48 +02:00 by BwackNinja · 6 comments
BwackNinja commented 2025-05-08 20:14:48 +02:00 (Migrated from github.com)

This one is a bit more invasive, and I'm a bit out of my depth for fixing it myself.

The problem is easier to reproduce with a smaller code heap, using options like -XX:NonProfiledCodeHeapSize=10M -XX:ProfiledCodeHeapSize=10M -XX:NonNMethodCodeHeapSize=8M

The individual values as they change can be viewed in the Memory tab of a JMX session like JDK Mission Control (jmc). When all 3 fill up (CodeHeap 'profiled nmethods', CodeHeap 'non-nmethods', and CodeHeap 'non-profiled nmethods'), an OutOfMemoryException will be thrown.
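For readers without a JMX client at hand, the same pools can probably be read from inside the process. The following is a minimal sketch, assuming a HotSpot JVM where the pools are exposed through `MemoryPoolMXBean` under the names shown in the log (or as a single `CodeCache` pool when the code cache is not segmented). The class and method names are made up for illustration.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryType;
import java.lang.management.MemoryUsage;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Java-GI.
public class CodeHeapReport {
    /** Returns one formatted line per code cache memory pool. */
    static List<String> codeCachePools() {
        List<String> lines = new ArrayList<>();
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            String name = pool.getName();
            // Assumption: HotSpot pool names. A segmented code cache reports
            // "CodeHeap '...'" pools, an unsegmented one reports "CodeCache".
            boolean isCodePool = name.startsWith("CodeHeap") || name.equals("CodeCache");
            if (pool.getType() == MemoryType.NON_HEAP && isCodePool) {
                MemoryUsage usage = pool.getUsage();
                if (usage == null) {
                    continue; // the pool is no longer valid
                }
                lines.add(String.format("%s: used=%dKb max=%dKb",
                        name, usage.getUsed() / 1024, usage.getMax() / 1024));
            }
        }
        return lines;
    }

    public static void main(String[] args) {
        codeCachePools().forEach(System.out::println);
    }
}
```

Calling `codeCachePools()` periodically (for example from a timer thread) and logging the result should show which of the three heaps grows first.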

[5323.954s][warning][codecache] CodeHeap 'non-nmethods' is full. Compiler has been disabled.
[5323.954s][warning][codecache] Try increasing the code heap size using -XX:NonNMethodCodeHeapSize=
OpenJDK 64-Bit Server VM warning: CodeHeap 'non-nmethods' is full. Compiler has been disabled.
OpenJDK 64-Bit Server VM warning: Try increasing the code heap size using -XX:NonNMethodCodeHeapSize=
CodeHeap 'non-profiled nmethods': size=119172Kb used=119171Kb max_used=119171Kb free=0Kb
 bounds [0x00007cfcf7f9f000, 0x00007cfcff400000, 0x00007cfcff400000]
CodeHeap 'profiled nmethods': size=119164Kb used=119163Kb max_used=119163Kb free=0Kb
 bounds [0x00007cfcf0400000, 0x00007cfcf785f000, 0x00007cfcf785f000]
CodeHeap 'non-nmethods': size=7424Kb used=7423Kb max_used=7423Kb free=0Kb
 bounds [0x00007cfcf785f000, 0x00007cfcf7f9f000, 0x00007cfcf7f9f000]
CodeCache: size=245760Kb, used=245757Kb, max_used=245757Kb, free=0Kb
 total_blobs=322192, nmethods=2008, adapters=1272, full_count=1
Compilation: disabled (not enough contiguous free space left), stopped_count=1, restarted_count=0
[5324.272s][warning][codecache] CodeHeap 'non-profiled nmethods' is full. Compiler has been disabled.
[5324.272s][warning][codecache] Try increasing the code heap size using -XX:NonProfiledCodeHeapSize=
OpenJDK 64-Bit Server VM warning: CodeHeap 'non-profiled nmethods' is full. Compiler has been disabled.
OpenJDK 64-Bit Server VM warning: Try increasing the code heap size using -XX:NonProfiledCodeHeapSize=
CodeHeap 'non-profiled nmethods': size=119172Kb used=119171Kb max_used=119171Kb free=0Kb
 bounds [0x00007cfcf7f9f000, 0x00007cfcff400000, 0x00007cfcff400000]
CodeHeap 'profiled nmethods': size=119164Kb used=119163Kb max_used=119163Kb free=0Kb
 bounds [0x00007cfcf0400000, 0x00007cfcf785f000, 0x00007cfcf785f000]
CodeHeap 'non-nmethods': size=7424Kb used=7423Kb max_used=7423Kb free=0Kb
 bounds [0x00007cfcf785f000, 0x00007cfcf7f9f000, 0x00007cfcf7f9f000]
CodeCache: size=245760Kb, used=245757Kb, max_used=245757Kb, free=0Kb
 total_blobs=322193, nmethods=2009, adapters=1272, full_count=570
Compilation: disabled (not enough contiguous free space left), stopped_count=1, restarted_count=0
java.lang.AssertionError: java.lang.InternalError: java.lang.NoSuchMethodException: no such method: java.lang.invoke.MethodHandle.linkToSpecial(Object,long,long,int,long,long,MemberName)void/invokeStatic

Classes extending io.github.jwharm.javagi.base.FunctionPointer like org.gnome.glib.SourceFunc require memory in the CodeHeap whenever they create a new upcall stub. Instead of creating a new upcall stub for each call to functions like GLib.idleAdd, a generic static upcall stub can be created once, and the MemorySegment passed to the upcall can be used to look up the actual SourceFunc, similar to how Arenas.close_cb works.

I've tested manually replacing calls to GLib.idleAdd with the following, and it avoids the OutOfMemoryException:

	private static final Map<Integer, SourceFunc> FUNCS = new HashMap<>();

	/**
	 * The upcall stub for the timeoutAddSeconds callback method
	 */
	public static final MemorySegment NEXT_SOURCE_FUNC;

	// Allocate the upcall stub for the SourceFunc callback method
	static {
		try {
			FunctionDescriptor _fdesc = FunctionDescriptor.of(ValueLayout.JAVA_INT, ValueLayout.ADDRESS);
			MethodHandle _handle = MethodHandles.lookup().findStatic(SlideShowPlaylistSprite.class, "upcall",
					_fdesc.toMethodType());
			NEXT_SOURCE_FUNC = Linker.nativeLinker().upcallStub(_handle, _fdesc, Arena.global());
		} catch (NoSuchMethodException | IllegalAccessException e) {
			throw new RuntimeException(e);
		}
	}

	/**
	 * The {@code upcall} method is called from native code. The parameters
	 * are marshaled and {@link #run} is executed.
	 */
	public static int upcall(MemorySegment userData) {
		int hashCode = userData.reinterpret(ValueLayout.JAVA_INT.byteSize()).get(ValueLayout.JAVA_INT, 0);

		SourceFunc func = FUNCS.get(hashCode);
		if (func != null) {
			var _result = func.run();
			if (!_result) {
				FUNCS.remove(hashCode);
			}
			return _result ? 1 : 0;
		}
		return 0;
	}

	@FunctionalInterface
	@Generated("io.github.jwharm.JavaGI")
	public static interface SourceFunc {
		/**
		 * Specifies the type of function passed to {@link GLib#timeoutAdd},
		 * {@code GLib#timeoutAddFull}, {@link GLib#idleAdd}, and
		 * {@code GLib#idleAddFull}.
		 * <p>
		 * When calling {@link Source#setCallback}, you may need to cast a
		 * function of a different type to this type. Use {@code GLib#SOURCEFUNC} to
		 * avoid warnings about incompatible function types.
		 */
		boolean run();
	}

	static final MethodHandle g_timeout_add_seconds_full = Interop
			.downcallHandle(
					"g_timeout_add_seconds_full", FunctionDescriptor.of(ValueLayout.JAVA_INT, ValueLayout.JAVA_INT,
							ValueLayout.JAVA_INT, ValueLayout.ADDRESS, ValueLayout.ADDRESS, ValueLayout.ADDRESS),
					false);

	public static int timeoutAddSeconds(int priority, int interval, SourceFunc function) {
		try (var _arena = Arena.ofConfined()) {
			final Arena _functionScope = Arena.ofShared();
			int _result;
			try {
				int hashCode = _functionScope.hashCode();
				FUNCS.put(hashCode, function);
				_result = (int) g_timeout_add_seconds_full.invokeExact(priority, interval,
						(MemorySegment) (function == null ? MemorySegment.NULL : NEXT_SOURCE_FUNC),
						Arenas.cacheArena(_functionScope), Arenas.CLOSE_CB_SYM);
			} catch (Throwable _err) {
				throw new AssertionError(_err);
			}
			int _returnValue = _result;
			return _returnValue;
		}
	}
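The lookup-by-key part of this idea can be tried without any native code. Below is a minimal sketch under stated assumptions: `BooleanSupplier` stands in for `SourceFunc`, `CallbackRegistry` and its methods are hypothetical names, and the key is a counter instead of `Arena.hashCode()`, since identity hash codes are not guaranteed to be unique. A `ConcurrentHashMap` is used because upcalls may arrive on a different thread than the one that registered the callback.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.BooleanSupplier;

// Hypothetical registry; BooleanSupplier stands in for SourceFunc.
public class CallbackRegistry {
    private static final Map<Integer, BooleanSupplier> CALLBACKS = new ConcurrentHashMap<>();
    private static final AtomicInteger NEXT_ID = new AtomicInteger(1);

    /** Stores the callback and returns the key that would be passed to native code as user data. */
    static int register(BooleanSupplier callback) {
        int id = NEXT_ID.getAndIncrement();
        CALLBACKS.put(id, callback);
        return id;
    }

    /**
     * What a single shared upcall would do: look up the callback by key,
     * run it, and drop the entry once it returns false (the source is removed).
     */
    static int dispatch(int id) {
        BooleanSupplier callback = CALLBACKS.get(id);
        if (callback == null) {
            return 0; // unknown or already removed key
        }
        boolean keepGoing = callback.getAsBoolean();
        if (!keepGoing) {
            CALLBACKS.remove(id);
        }
        return keepGoing ? 1 : 0;
    }

    /** Number of callbacks currently registered. */
    static int size() {
        return CALLBACKS.size();
    }
}
```

In this shape only one upcall stub would ever be allocated, and the per-call cost is a map entry on the regular Java heap rather than a blob in the CodeHeap.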
jwharm commented 2025-05-08 21:13:24 +02:00 (Migrated from github.com)

Your proposed solution should work, but it will require large changes in a very complex part of the code generator. I'd really prefer not to go there... And I don't see why the current implementation wouldn't work:

  • GLib.idleAdd calls g_idle_add_full. Looking at the GIR file, the SourceFunc parameter has "notified" scope. This means the "notify" callback parameter is called immediately after the SourceFunc has finished.
  • In Java-GI, the "notify" callback is Arenas.CLOSE_CB_SYM, and it closes the arena of SourceFunc's upcall stub.
  • The JVM will deallocate the upcall stub when the arena is closed.

I think the logic is sound, so I can't really explain your OOM exception. Can you investigate if the CLOSE_CB_SYM is run for your SourceFuncs?
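The third point can be checked in isolation. This is a minimal sketch, assuming Java 22 or later (where the foreign function API is final); `UpcallStubLifetime` and `callback` are hypothetical names and no GLib code is involved. It only shows that the stub's scope ends when the arena is closed. It does not show whether the CodeHeap space is reclaimed.

```java
import java.lang.foreign.Arena;
import java.lang.foreign.FunctionDescriptor;
import java.lang.foreign.Linker;
import java.lang.foreign.MemorySegment;
import java.lang.foreign.ValueLayout;
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;

// Hypothetical demo class, not part of Java-GI.
public class UpcallStubLifetime {
    // Same shape as a GSourceFunc: gboolean (*)(gpointer user_data)
    static int callback(MemorySegment userData) {
        return 0;
    }

    /** Creates an upcall stub in a fresh arena, closes the arena, and reports whether the stub is still alive. */
    static boolean stubAliveAfterClose() throws ReflectiveOperationException {
        FunctionDescriptor fdesc = FunctionDescriptor.of(ValueLayout.JAVA_INT, ValueLayout.ADDRESS);
        MethodHandle handle = MethodHandles.lookup()
                .findStatic(UpcallStubLifetime.class, "callback", fdesc.toMethodType());
        Arena arena = Arena.ofShared();
        MemorySegment stub = Linker.nativeLinker().upcallStub(handle, fdesc, arena);
        boolean aliveBefore = stub.scope().isAlive();
        arena.close(); // the stub's lifetime is tied to this arena
        return aliveBefore && stub.scope().isAlive();
    }

    public static void main(String[] args) throws ReflectiveOperationException {
        System.out.println("stub alive after close: " + stubAliveAfterClose());
    }
}
```

Running it with `--enable-native-access=ALL-UNNAMED` avoids the restricted-method warning that `upcallStub` otherwise triggers.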

BwackNinja commented 2025-05-08 22:05:54 +02:00 (Migrated from github.com)

I can take another look to reconfirm. I believe that CLOSE_CB_SYM is run properly because I had regular memory issues when dealing with lambdas rather than passing in static classes as SourceFunc. Regular heap usage isn't increasing significantly -- it's the CodeHeap specifically, which by default on my machine has maximums of 116MiB 'profiled nmethods', 7.25MiB 'non-nmethods', and 116MiB 'non-profiled nmethods'. It normally takes much longer to fill those, which is why I shared the options for limiting them. I also ran with -XX:+ClassUnloading -XX:+UseCodeCacheFlushing in hopes that those caches would clear up, but they didn't.

jwharm commented 2025-05-10 11:48:29 +02:00 (Migrated from github.com)

Can you provide a minimal reproducible testcase so I can investigate?

BwackNinja commented 2025-05-10 19:10:34 +02:00 (Migrated from github.com)

I'm working on a testcase now. I have one that doesn't exhibit this problem, so it looks like this may be the symptom of a different issue and not leaking generally. I'm comparing to my failing case to see where it diverges.

I should be able to isolate it in the next couple of days.

BwackNinja commented 2025-05-12 23:08:46 +02:00 (Migrated from github.com)
package com.bwackninja;

import java.lang.foreign.MemorySegment;

import org.gnome.gdk.Paintable;
import org.gnome.gdk.Snapshot;
import org.gnome.gio.ApplicationFlags;
import org.gnome.glib.GLib;
import org.gnome.glib.Type;
import org.gnome.gobject.GObject;
import org.gnome.gtk.Application;
import org.gnome.gtk.ApplicationWindow;
import org.gnome.gtk.Fixed;
import org.gnome.gtk.Picture;
import org.gnome.gtk.Window;
import org.gnome.pango.Context;

import io.github.jwharm.javagi.gobject.types.Types;

public class Test {
	public static class Canvas extends GObject implements Paintable {
		public Canvas(MemorySegment address) {
			super(address);
		}
		
		public static Type gtype = Types.register(Canvas.class);
		
		@Override
		public void snapshot(Snapshot snapshot, double width, double height) {
			try {
				if (snapshot instanceof org.gnome.gtk.Snapshot gsnapshot) {
					gsnapshot.save();
					gsnapshot.restore();
				}
			} catch (Exception e) {
				e.printStackTrace();
			}
		}
		
		public static Canvas create() {
			Canvas ret = GObject.newInstance(Canvas.gtype);
			return ret;
		}
		
		@Override
		public int getIntrinsicWidth() {
			return 1920;
		}
		
		@Override
		public int getIntrinsicHeight() {
			return 1080;
		}
	}

	public static void runApp(Application application) {
		var window = (Window) ApplicationWindow.builder()
				.setApplication(application)
				.setDefaultWidth(1920)
				.setDefaultHeight(1080)
				.build();
		var draw = Canvas.create();
		var img = new Picture();
		img.setPaintable(draw);
		img.addTickCallback(( _, _) -> {
			draw.invalidateContents();
			return GLib.SOURCE_CONTINUE;
		});
		Fixed fixed = new Fixed();
		fixed.put(img, 0, 0);
		img.setSizeRequest(1920, 1080);
		window.setChild(fixed);
		window.present();
	}

	public static void main(String[] args) {
		var app = new Application("com.bwackninja.Test",
				ApplicationFlags.DEFAULT_FLAGS);
		app.onActivate(() -> runApp(app));
		app.run(args);
	}
}

Running this with -XX:NonProfiledCodeHeapSize=10M -XX:ProfiledCodeHeapSize=10M -XX:NonNMethodCodeHeapSize=8M will complain about a full CodeHeap within 10 minutes. With draw.invalidateContents() commented out, the CodeHeap stays stable. A GLib.idleAdd doesn't permanently add to the CodeHeap, but it will fail once the CodeHeap is full because the upcall stub can no longer be allocated. I didn't hit this issue when I forgot to add the Fixed to the Window.

jwharm commented 2025-05-14 20:35:36 +02:00 (Migrated from github.com)

I'm able to reproduce the issue, but haven't found what is clogging the CodeHeap yet. I used jcmd to repeatedly dump the size and contents of the CodeHeap into a text file until the OOM occurred, but I didn't see anything out of the ordinary. The amount of free space fluctuates a bit, but doesn't show a clear trend, until after 8-10 minutes it suddenly goes to zero.

Reference
java-gi/java-gi#223