Intercepting behavior with Java agents

A previous post showed how to use JVMTI to log method calls non-intrusively, without modifying upstream libraries. JVMTI is much more powerful than that post showed. For example, it can replace and modify code in a running JVM, which is useful for logging or performance measurement, but also for intercepting or changing behavior at runtime.

It is, however, quite cumbersome to write code for that sort of thing in C or C++ using the JNI interfaces. It turns out Java provides a higher-level interface, the java.lang.instrument package, for instrumenting or redefining classes in Java itself. This post will demonstrate a ridiculously simple example of such an agent. You can find the example code on GitHub.

A simple program

Let's start with the really simple program that we want to instrument. The Greeter class does the time-honored thing of saying Hello World. For no good reason, we've moved the World part into a helper method. It's totally artificial, but it keeps this example straightforward. In addition to Greeter, there's a simple Main class (not shown) that just calls Greeter.sayHello().

public class Greeter {
  private static String getName() {
    return "World";
  }

  static void sayHello() {
    System.out.printf("Hello %s\n", getName());
  }
}

It's easy and does what you'd expect. Using Bazel, here's how I build and run the program:

$ bazel run src/main/java/org/dubh/examples/agent/target:Target
Hello World

From this point on, let's assume we can't (for whatever reason) touch the code of Greeter. Being a bit selfish, I want this program to say hello to me, not the whole world. A Java agent can change the behavior without changing or recompiling Target.java or Greeter.java. I'll use it to change the implementation of getName() at runtime.

Agent basic structure

The main() method is the entry point to a Java application. Java agents have special powers to do things before main() is called, so the entry point for an agent is premain(). You're passed arguments for the agent, and an object implementing Instrumentation, which is how you access the APIs you'll need to transform classes. Our simple agent checks whether it's ok to redefine classes, and registers a ClassFileTransformer.

public static void premain(String agentArgs, Instrumentation inst) {
  if (!inst.isRedefineClassesSupported()) {
    System.err.println("ExampleAgent: not allowed to redefine classes!");
    return;
  }

  inst.addTransformer(new ClassFileTransformer() {
    @Override
    public byte[] transform(ClassLoader loader, String className, 
        Class<?> oldClazz, ProtectionDomain domain, byte[] classfileBuffer) {

      if ("org/dubh/examples/agent/target/Greeter".equals(className)) {
        return transformClass(classfileBuffer);
      }

      return null;
    }
  });
}

transform() is called when the JVM is loading a class, and provides a hook to rewrite its implementation. The className passed here is in JVM internal form (which in this simple case, just means replacing each . with a /). If we return null from this method, the class will be unaltered, which is what we want in all cases unless we're loading Greeter.
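Two details of this hook are easy to miss. In practice className can be null for some JVM-generated classes, and an exception thrown from transform() has the same effect as returning null, so a buggy transformer fails silently. Here's a minimal sketch of a named transformer that handles both. The GreeterTransformer class, its toInternalName() helper and the rewriter parameter are names I made up for illustration; they aren't part of the example project.

```java
import java.lang.instrument.ClassFileTransformer;
import java.security.ProtectionDomain;
import java.util.function.UnaryOperator;

/** Hypothetical named transformer; the agent in this post uses an anonymous class. */
final class GreeterTransformer implements ClassFileTransformer {
  private final String targetInternalName;
  private final UnaryOperator<byte[]> rewriter;

  GreeterTransformer(String targetClassName, UnaryOperator<byte[]> rewriter) {
    this.targetInternalName = toInternalName(targetClassName);
    this.rewriter = rewriter;
  }

  /** Converts a name like "a.b.C" to the JVM internal form "a/b/C". */
  static String toInternalName(String binaryName) {
    return binaryName.replace('.', '/');
  }

  @Override
  public byte[] transform(ClassLoader loader, String className,
      Class<?> classBeingRedefined, ProtectionDomain domain, byte[] classfileBuffer) {
    // Calling equals() on our own string also copes with a null className.
    if (!targetInternalName.equals(className)) {
      return null; // leave every other class untouched
    }
    try {
      return rewriter.apply(classfileBuffer);
    } catch (RuntimeException e) {
      // The JVM would swallow this exception, so log it before giving up.
      System.err.println("GreeterTransformer: failed to rewrite " + className + ": " + e);
      return null;
    }
  }
}
```

Inside the agent class, this could be registered with something like inst.addTransformer(new GreeterTransformer("org.dubh.examples.agent.target.Greeter", ExampleAgent::transformClass)), assuming transformClass() is the method shown later in this post.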

Byte code swizzling

All that remains is just to do the actual transformation. The array of bytes we were given in classfileBuffer is the original compiled code for the class in class file binary format. If you were feeling super adventurous, you could swizzle around with the bytes of this array yourself. However, it's much easier to use a library that already understands this format and lets you manipulate it. ASM is a popular library for doing just this kind of thing.
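To give a feel for what doing it by hand involves, here's a minimal sketch that reads only the fixed 8-byte header at the start of the buffer: the 0xCAFEBABE magic number followed by the minor and major version. The ClassFileHeader class is a hypothetical helper, not something from the example project, and everything after the header (constant pool, methods, attributes) is where the real work would start.

```java
import java.nio.ByteBuffer;

/** Hypothetical helper that reads the fixed header of a class file. */
final class ClassFileHeader {
  static final int MAGIC = 0xCAFEBABE;

  final int minorVersion;
  final int majorVersion;

  private ClassFileHeader(int minorVersion, int majorVersion) {
    this.minorVersion = minorVersion;
    this.majorVersion = majorVersion;
  }

  static ClassFileHeader parse(byte[] classfileBuffer) {
    if (classfileBuffer.length < 8) {
      throw new IllegalArgumentException("too short to be a class file");
    }
    // ByteBuffer is big-endian by default, which matches the class file format.
    ByteBuffer in = ByteBuffer.wrap(classfileBuffer);
    int magic = in.getInt(); // u4 magic
    if (magic != MAGIC) {
      throw new IllegalArgumentException("bad magic: 0x" + Integer.toHexString(magic));
    }
    int minor = in.getShort() & 0xFFFF; // u2 minor_version
    int major = in.getShort() & 0xFFFF; // u2 major_version, for example 52 for Java 8
    return new ClassFileHeader(minor, major);
  }
}
```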

ASM makes it really easy to manipulate bytecode, but you'll still need a basic understanding of how JVM instructions work. Explaining this is beyond the scope of this post, but you can use the javap tool to look at .class files and see the instructions they contain. The body of the current getName() method looks like this:

$ javap -p -c -cp Target_deploy.jar \
    org.dubh.examples.agent.target.Greeter
  private static java.lang.String getName();
    Code:
       0: ldc           #2                  // String World
       2: areturn

It contains two instructions: the ldc instruction pushes the constant value "World" onto the stack, and the areturn instruction pops the top of the stack and returns it. We want to replace this with a set of instructions that calls a static method instead:

  private static java.lang.String getName();
    Code:
       0: invokestatic  #2                  // Method org/dubh/examples/agent/NewGreeter.getNewName:()Ljava/lang/String;
       3: areturn

These new instructions consist of an invokestatic that calls the static getNewName() method and pushes its return value onto the stack, and an areturn, as before, that pops the stack and returns it. Alongside the agent, we need to include the new method we want to be called. We do that in a simple NewGreeter class that's compiled along with the agent:

public class NewGreeter {
  public static String getNewName() {
    return "Brian";
  }
}
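If bytecode isn't your thing, it can help to picture the result in source form. The sketch below shows what the rewritten Greeter is equivalent to, with both classes nested in one file so it runs on its own. The RewrittenGreeterSketch wrapper is my own invention for this illustration; the real classes live in separate packages and nothing like this source ever exists on disk.

```java
/** Source-level picture of Greeter after the agent has rewritten getName(). */
final class RewrittenGreeterSketch {

  // Stand-in for org.dubh.examples.agent.NewGreeter.
  static final class NewGreeter {
    public static String getNewName() {
      return "Brian";
    }
  }

  // Equivalent of the rewritten Greeter; only the body of getName() differs.
  static final class Greeter {
    private static String getName() {
      return NewGreeter.getNewName(); // invokestatic followed by areturn
    }

    static void sayHello() {
      System.out.printf("Hello %s\n", getName());
    }
  }

  public static void main(String[] args) {
    Greeter.sayHello(); // prints "Hello Brian"
  }
}
```

Note that getNewName() and its class have to be public, because the call is made from Greeter, which lives in a different package.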

Here's what the transformClass() method looks like with comments that hopefully explain what's going on:

private static byte[] transformClass(byte[] classfileBuffer) {
  // ClassReader knows how to grok the buffer of bytes as a Java class.
  ClassReader reader = new ClassReader(classfileBuffer);

  // ClassNode is a visitor over the things in the classfile that collects
  // them into an in-memory data structure that we can easily traverse. You
  // can also avoid creating a separate in-memory representation by just
  // implementing a simple ClassVisitor, but it often requires more code.
  ClassNode classNode = new ClassNode();
  // The second argument is a bit set of parsing options; 0 applies none.
  reader.accept(classNode, 0);

  // Now ClassNode contains a data structure with all the things in the
  // class, and we can look through the methods for the one we care about.
  for (MethodNode method : classNode.methods) {
    // You'd maybe want to check the signature also in a real program.
    if ("getName".equals(method.name)) {
      // Method bodies contain instruction lists. Here, we create a simple
      // instruction list with two instructions - one to call a static 
      // method, and another to return whatever that static method returned.
      InsnList instructions = new InsnList();
      instructions.add(new MethodInsnNode(Opcodes.INVOKESTATIC, 
          "org/dubh/examples/agent/NewGreeter", "getNewName",
          "()Ljava/lang/String;"));
      instructions.add(new InsnNode(Opcodes.ARETURN));

      // This replaces the existing instruction list of the method with our
      // new instruction list.
      method.instructions = instructions;
    }
  }

  // ClassWriter is a visitor that knows how to traverse the data structure,
  // and write back out the bytes of a class.
  ClassWriter writer = new ClassWriter(ClassWriter.COMPUTE_FRAMES | ClassWriter.COMPUTE_MAXS);
  classNode.accept(writer);

  return writer.toByteArray();
}

Deploying and using the agent

There's one last thing we need to do in order to make our agent work. Agents must be compiled into a jar file that contains instructions about where to find the premain class and which capabilities our agent has. For this example, the MANIFEST.MF looks like the one below.

Manifest-Version: 1.0
Premain-Class: org.dubh.examples.agent.ExampleAgent
Agent-Class: org.dubh.examples.agent.ExampleAgent
Can-Redefine-Classes: true
Can-Retransform-Classes: true

If you're using Bazel, you can accomplish this using the deploy_manifest_lines attribute on java_binary, like so:

java_binary(
    name = "agent",
    runtime_deps = [ ":agent_lib" ],
    main_class = "org.dubh.examples.agent.ExampleAgent",
    deploy_manifest_lines = [
        "Premain-Class: org.dubh.examples.agent.ExampleAgent",
        "Agent-Class: org.dubh.examples.agent.ExampleAgent",
        "Can-Redefine-Classes: true",
        "Can-Retransform-Classes: true",
    ]
)
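A missing or misspelled manifest attribute is an easy mistake to make, so it can be worth checking what actually ended up in the jar. Here's a minimal sketch using the JDK's java.util.jar.Manifest class. The AgentManifestCheck class and its method names are hypothetical, and it parses manifest text from a string only to keep the example self-contained.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.jar.Manifest;

/** Hypothetical helper for inspecting the agent-related manifest attributes. */
final class AgentManifestCheck {

  /** Parses manifest text; the text should end with a newline. */
  static Manifest parse(String manifestText) {
    byte[] bytes = manifestText.getBytes(StandardCharsets.UTF_8);
    try {
      return new Manifest(new ByteArrayInputStream(bytes));
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  /** Returns the Premain-Class value, or null if the attribute is absent. */
  static String premainClass(Manifest manifest) {
    return manifest.getMainAttributes().getValue("Premain-Class");
  }

  /** True only if Can-Retransform-Classes is present and set to "true". */
  static boolean canRetransform(Manifest manifest) {
    String value = manifest.getMainAttributes().getValue("Can-Retransform-Classes");
    return Boolean.parseBoolean(value); // parseBoolean(null) is false
  }
}
```

For a real jar, the Manifest object can come from new JarFile(path).getManifest() instead of parse().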

With this in place, let's try running our program with and without the agent. We use the -javaagent argument to java to tell it where our agent jar is.

$ cd bazel-bin/src/main/java/org/dubh/examples/agent
$ java -jar target/Target_deploy.jar
Hello World
$ java -javaagent:agent_deploy.jar -jar target/Target_deploy.jar
Hello Brian

It works!
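The agentArgs parameter of premain() went unused here, but it's how you'd make the agent configurable. Anything after an = following the jar path, as in -javaagent:agent_deploy.jar=name=Brian, is passed to premain() as a single string, and agentArgs is null when no options are given. The JVM doesn't impose a format on that string, so the comma-separated key=value convention and the AgentOptions class below are just assumptions for this sketch.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical parser for an agent options string like "name=Brian,verbose". */
final class AgentOptions {

  static Map<String, String> parse(String agentArgs) {
    Map<String, String> options = new LinkedHashMap<>();
    if (agentArgs == null || agentArgs.isEmpty()) {
      return options; // no options were passed on the command line
    }
    for (String pair : agentArgs.split(",")) {
      int eq = pair.indexOf('=');
      if (eq < 0) {
        options.put(pair.trim(), ""); // bare flag with no value
      } else {
        options.put(pair.substring(0, eq).trim(), pair.substring(eq + 1).trim());
      }
    }
    return options;
  }
}
```

With something like this, the agent could read AgentOptions.parse(agentArgs).get("name") in premain() rather than hard-coding the name in NewGreeter.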

Summing up

This is a fairly trivial example of how to write a Java agent, and there's lots more to dive into for complex things. At the core, though, this setup of using ASM to rewrite bytecode is a template for much more complicated things. I left out a few details around Bazel to keep the post as simple as possible, but you can play around with the full example in the javaagent GitHub project. Hope this has been useful. I'd love to hear about the kinds of problems you're solving with Java agents in the comments :)
