Duffblog

Showstopper!

Brian Duff — Wed, 15 Nov 2023 08:16:45 GMT

I've changed companies again! I seem to be doing this a lot lately, but there are (as always) good reasons for it. This time around, the nice folks at LinkedIn reached out about a Distinguished Engineer role they had in the developer productivity space. It seemed like such a perfect fit for me, that I couldn't say no. I had a tremendously rewarding experience going through the interview process and got to meet many of LinkedIn's most senior technical leaders. They were direct and candid, and I liked talking with them a lot.

One thing that's especially interesting about this particular job is that I finally (in an indirect kind of way) also now work for Microsoft. I didn't contemplate this much during the interview process, but since I accepted and started sharing the news with folks, several people reacted, "Ah... that means you're working at Microsoft now?".

Back in 1994, when I was just starting my first year of University, I read "Showstopper!" by G. Pascal Zachary. It painted an honestly slightly scary picture of the experience of several engineers with very little in the way of what we'd call work / life balance today at Microsoft during the time when Windows NT was coming into existence. At that time, and in that very different world, I was a huge fan of Microsoft. This book, despite depicting such harrowing things as a person who got divorced because they were working too hard, had a weird effect - it ignited me on a journey that would lead to 25 years of working at various technology companies, and propel me thousands of miles away from home to Silicon Valley.

There was something compelling, not about the workaholism, but about the sheer passion with which those individuals and teams took on their work. I wanted nothing more than to be a part of something monumental like that. In some strange way, hard work doesn't feel like that when you're doing things that are truly meaningful and when you're enfranchised to make a difference. Maybe not coincidentally, I'm feeling that way about my new job, early as it is.

I'm in a very, very different place now, and Microsoft is in a very, very different place, and I don't actually work in Microsoft, even though Microsoft is the parent company of the place I do work. I get to visit Redmond in a few weeks as an extended employee for a summit, and it's something I'm awfully excited about. Even though I've never been there before, it's oddly like coming home to a formative idea of what it was like to work in the software industry.

RTO at the Googleplex

Brian Duff — Sat, 08 Jul 2023 09:12:00 GMT

I've been back at Google for a year already - it's hard to believe the time has passed so quickly. It will soon also be the 15th anniversary since the day I first joined Google in 2008.

A lot has changed. It's a bit of an eerie experience at times - the company itself is very different, but there's also this uncanny feeling when I walk around the hallways, especially at the Mountain View campus.

I found myself the other day wandering around in building 900, which is the building that housed the DevTools team when I joined in 2008 (they'd just moved there from the beating heart of Google in Building 43 on the Googleplex, and I remember some were a bit unhappy about that shift and what it meant). This was during a golden period for DevTools - Blaze was quickly becoming the main build tool for google3, and the years immediately following it would see quite significant improvements with Forge and Objfs making google3 into quite the impressive (and scalable) monorepo. It was also the year that Chrome launched, and learning about it and playing with it before its release was quite exciting.

Like most Google buildings, 900 has changed enormously since 2008 on the inside. It's pleasant (REWS, the real estate folks at Google, do a tremendous job of making the buildings very stylish - things felt much more cheap & cheerful and weirdly less well-lit when I joined), but entire office configurations have disappeared. The area where I sat when I first joined is now a bit of a hallway. The nearby micro kitchen is still in the same place, and it feels quite familiar, albeit with far fewer snacks than it had 15 years ago.

There's a window on the second floor of 900. It's designed in a way that looking out of it feels a bit like peering out of the laser turret of the Millenium Falcon. I remember back in 2008 the window used to be always covered in dusty cobwebs. The atrium it was in was kind of dark, had a broken down arcade machine, and a bright red sofa with scratchy material of the kind that you could find all over the Googleplex in those days. I used to go there to eat MK snacks (again, they had quite yummy ones at the time) and maybe read a bit of the late Bob Lee's book on Guice. Because Guice was the hotness then. Now it's sparkly clean, bright, and full of modern expensive office furniture. When I sat there the other day I sort of felt the ghost of my old self and that old place and company and the magic it had. I overheard a conversation between two Googlers bemoaning the recent RTO announcement and how all that time spent commuting would be such a waste of time they could spend being productive.

I have mixed feelings about the enforced RTO thing - I'm a parent so I enjoy the flexibility of hybrid (while simultaneously realizing that we always had a fair amount of flexibility, but now it's sort of enforced). I'm also a dinosaur, so I'm just used to being in an office and collaborating with people by, like, talking with them.

When I heard that conversation though, I felt a mild sense of loss about a time when we were just excited beyond belief to be here at Google. It'll never again I think have quite the same sense of magic and specialness as it did in those days, and that's ok. While knowing that it's different now, I'm glad to have lived in that time, and it has certainly been nice coming back and getting the chance to see it all once again.

Here be dragons

Brian Duff — Tue, 24 May 2022 17:00:23 GMT

When you first join a new org or company, draw a map.

Months into some roles, I've made faulty assumptions due to my understanding of organizational relationships. I'd wonder why a function was inefficiently split across two orgs, only to find that it gained a different kind of efficiency with that split. I'd ponder why there was no team to solve some problem, when in fact there was, but they weren't solving it, just "licking the cookie".

I'm working on getting better at drawing organizational maps. It's just a bunch of diagrams, notes, and annotations about who does what at an individual and team level, and how things are related.

Figure out where the unexplored territories are. Identify who to talk to in those teams. Mark up parts of it with "here be dragons" based on the conversations you have. Form questions to ask, or ideas about taming the dragons. Your first versions of the map will look like a crude cave picture six months from now. The orgs and people will change. It's ok - keep revising the map as you learn more.

There's usually more than one way to visualize the teams and individuals that make up an organization. Don't be satisfied with org charts alone as a primary structure. The real value is in the "shadow graph" through which influence and support flow.

Instinct

Brian Duff — Fri, 20 May 2022 18:21:04 GMT

It just doesn't feel right.

Your instinct is telling you it's wrong.

But you don't trust it. You're rational. You believe in the scientific method. Cause and effect. Engineers don't operate on instinct. We're data driven. We look for root causes. We analyze and determine outcomes.

It took me a while to learn the importance of instinct in technical work. An instinctive sense is often the result of accumulated experience. I can't always immediately explain the link between something I've experienced before and an instinctive insight. When I changed my perspective of instinct and began seeing it as an incredible tool, I became a much better engineer. Instinct can hugely help to reduce the investigation space when trying to root cause. It can help you to sense smells and identify opportunities for improvement.

It's only part of the story. Convincing others on technical matters with instinct alone is a difficult path. Technical influence rightly prioritizes evidence with data. As a technical contributor, evidence should be key to your own decision making. Instinct can result from bias. Bias is also a product of accumulated experience, but perhaps more from your internal voice. Evidence helps you to check those biases.

Use instinct as a shortcut to trim the solution space, or find different ways to think about a problem. After that, go do the (often) grungy work of actually finding and presenting the data that proves or disproves your instinct before trying to convince others.

Feedback

Brian Duff — Tue, 04 Jan 2022 20:13:34 GMT

You feel like you did OK, but the outcome isn't what you were anticipating.

It often means you didn't do as well as you thought. There may be a slight, but correctable gap between your situational perception and reality.

Sometimes external factors mean that no matter what you did, the outcome was going to be the same. However, when success is truly related to how you interact, you have a lot of control. I think it's quite healthy to believe that you have control in most such situations. It leads to a reflection and improvement instead of a feeling of fatality due to external circumstances.

From time to time, others will give you direct feedback about where you fell short. It can be hard to hear, but it's almost always a gift. I've had this happen in interview situations, or in fit calls with candidates where they ultimately decided to choose another company or team. When you get such feedback, the extent to which you have to guess is reduced. It's enormously helpful emotionally, because guessing can sometimes lead to quite a lot of self doubt.

There have been times in the past where I did a bad job of accepting such feedback even though it was spot on. Ego is a terrible trap, but the good news is that if you're open to hearing feedback and genuinely want to get better, it gets easier to hear what you need to hear.

Giving feedback is a great service you can provide to others. The best mentors and leaders consistently provide such feedback in a constructive way. Nowadays, I feel blessed when people directly give me feedback with candor. I appreciate it much more than invoking The System that's preventing a favorable outcome, whatever that may be: the promotion quotas, or the bell curve, or the hiring constraints. It's almost always easy to deconstruct "it's the system" rationales and prove they're an excuse with counter examples.

I don't like making resolutions that much - the new year is such an arbitrary excuse. That said, one thing I'd like to do more of this year is to look for opportunities to give better feedback to others, and to avoid the urge to blame systems.

Data Studio Custom Visualizations using React

Brian Duff — Wed, 24 Nov 2021 00:53:01 GMT

Google Data Studio is pretty easy to use for simple analytics dashboarding. Its standard visualization components are pretty comprehensive for most of the simple problems I try to solve. When you want to do something more complex, you can build custom visualizations with HTML or SVG using Community Visualizations .

The documentation for writing visualizations is great, but the examples involve direct DOM manipulation:

var chartElement = document.createElement('div');chartElement.id = 'myViz';document.body.appendChild(chartElement);

I recently worked on a complex visualization for a test flakiness dashboard, and found it convenient to use React and JSX instead of hand crafting DOM nodes. In this post, I'll develop a simple community visualization that uses React. In a later blog, I'll expand this example to show how it can populate with a datasource from Data Studio.

You can find the code for this example over on brianduff/datastudio-react on GitHub.

Setting up

I'm going to use TypeScript, but you can use plain old JavaScript if you want.

npm init -y# Install the DSCC library and reactnpm install @google/dscc react react-domnpm install --save-dev @types/react @types/react-dom# Install and initialize TypeScript (optional)npm install --save-dev typescriptnpx tsc init

Among the things this generates is the tsconfig.json file. We'll make a few minor changes to it. Here's the complete file:

// TODO: more stuff to add here.{  "compilerOptions": {    "target": "es2016",    "module": "commonjs",    "esModuleInterop": true,    "forceConsistentCasingInFileNames": true,    "strict": true,    "skipLibCheck": true,    // Enable jsx    "jsx": "react-jsx",    // Send output to a different folder    "outDir": "built-tsc"  },  // Tell tsc to find our code in a folder called src.  "include": ["src"]}

Create the visualization

Let's create a really simple visualization component to get started in src/Hello.tsx:

export function Hello() {  return <div>Hello!div>}

Now, we'll create the entry point that integrates with Data Studio's DSCC library to embed our component in src/index.tsc. There's a bit of unavoidable DOM nastiness here, but it's confined this one place.

import { ObjectFormat, objectTransform, subscribeToData } from '@google/dscc'import { Hello } from './Hello'import ReactDOM from 'react-dom'export function drawViz(data: ObjectFormat) {  // Insert or replace the visualization element  let element = document.getElementById('viz')  if (element) {    element.parentNode?.removeChild(element)  }  element = document.createElement('div')  element.setAttribute("id", "viz")  document.body.appendChild(element)  // Actually render our component  ReactDOM.render(<Hello />, element)}// Connect our drawViz function to Data StudiosubscribeToData(drawViz, { transform: objectTransform })

At this point, we can check that the code compiles using npx tsc. It should emit a built-tsc folder containing JavaScript versions of the above code.

Bundling with webpack

In order to actually deploy the visualization, Data Studio needs it to be bundled up into a self-contained single JavaScript file along with all of its dependencies. For us, this includes the react and jsx runtimes as well as the DSCC library itself. Webpack is designed exactly for this kind of usecase, so we'll use it here.

Let's install webpack:

npm install --save-dev webpack webpack-cli

Then, we'll create a simple webpack config file in webpack.config.js:

const path = require('path')const isProduction = process.env.NODE_ENV == 'production'const config = {  entry: './built-tsc/index.js',  output: {    path: path.resolve(__dirname, 'dist'),    filename: 'viz.js',  }}module.exports = () => {  config.mode = isProduction ? "production" : "development"  return config}

We can now generate a single dist/viz.js containing the visualization and its dependencies:

npx webpack

Writing the manifest and config

When we deploy custom components for Data Studio, we can actually deploy a set of components in a library. We describe the whole set of components in a manifest file, and each individual component includes a config file. Let's go ahead and create these config files for our component.

manifest.json describes our whole "library" of components. We'll come back to BUCKETNAME and what it means later.

{  "name": "Example Visualizations",  "organization": "Brian Duff",  "description": "Example visualizations using React and JSX",  "logoUrl": "https://img.icons8.com/dotty/80/000000/view-file.png",  "packageUrl": "https://duff.blog",  "supportUrl": "https://duff.blog",  "components": [    {      "id": "hello",      "name": "Hello",      "description": "Just says hello",      "iconUrl": "https://img.icons8.com/material/24/000000/hello.png",      "resource": {        "js": "gs://BUCKETNAME/hello/viz.js",        "config": "gs://BUCKETNAME/hello/hello.config.json"      }    }  ]}

And hello.config.json describes this simple component we've created. For now, we'll create a config with a simple dimension. We'll customize this in later blog posts.

{  "data": [    {      "id": "data",      "label": "Data",      "elements": [        {          "id": "someDimension",          "label": "A Dimension",          "type": "DIMENSION"        }      ]    }  ]}

Deploying to Google Cloud Storage

Data Studio loads custom components from Google Cloud Storage. To deploy our custom component, we must upload the manifest.json, vis.js, and hello.config.json files to a Cloud Storage bucket. Create a new bucket following the instructions in Creating a storage bucket, and make a note of your bucket name. You'll want to modify manifest.json to replace BUCKETNAME with the actual name of the bucket you created.

After that, you can upload using the gsutil command, which you should have installed as part of the bucket creation instructions.

First, you'll want to change the ACLs of the bucket so that they allow public access. New buckets in GCS are created with uniform bucket-level access, which means you can't set permissions for individual files in the bucket (or the bucket itself), and are not visible to public by default. The instructions for Data Studio haven't been updated yet to account for this; the -a public-read option to gsutil cp won't work. Let's go ahead and make our new bucket visible to the public (replace BUCKETNAME with the name of your bucket):

gsutil iam ch allUsers:objectViewer gs://BUCKETNAME

Now, let's organize our files the way we want to upload them:

mkdir -p deploy/hello && \  cp manifest.json deploy/ && \  cp dist/viz.js deploy/hello/ && \  cp hello.config.json deploy/hello/

After that, we can copy the files up to GCS like so:

gsutil cp -r deploy/* gs://BUCKETNAME

Trying it out

We should be able to try the component out in Data Studio now. Go over to https://datastudio.google.com and create a new blank report. Add a data source to your report (it can just be a Google Sheet). Click on the Community Visualizations and Components toolbar button:

Click "+ Explore More", then click the "Build your own visualization" button in the Community Gallery. In the Manifest path, type the bucket URL of your manifest:

Note here, that you're entering the bucket path (i.e. gs://BUCKETNAME), not the path of the actual manifest.json file within it (e.g. gs://BUCKETNAME/manifest.json). This tripped me up when I first tried this out, and the error message is quite opaque.

Click on the Hello component and grant it permission, and you should see it render in your report:

Scripts to make life easier

Let's make things a bit easier for future development by adding some scripts to package.json. In the scripts section of this file, we'll add the following:

  "scripts": {    "build": "npx tsc",    "prepare": "mkdir -p deploy/hello && cp manifest.json deploy/ && cp dist/viz.js deploy/hello/ && cp hello.config.json deploy/hello/",    "deploy": "npm run build && npm run prepare && gsutil cp -r deploy/* gs://BUCKETNAME"  },

Now we can build and deploy in one step with npm run deploy.

Wrapping up

Phew, that was a lot for a glorified Hello World :P However, now it's pretty easy to write quite complex custom visualization components using react that work well with Data Studio. In future blogs, I'll explore more of what you can do.

The code for this simple project might serve as a good starting point for your own visualizations. You can grab the code at brianduff/datastudio-react, and let me know in the comments if you run into any issues or have questions!

Cloud Functions with TypeScript

Brian Duff — Sun, 24 Oct 2021 09:07:17 GMT

Here's a small sample that uses TypeScript + Cloud Functions

I have a complicated relationship with JavaScript. I'm old enough to remember when it first appeared in Netscape Navigator back when the web was truly tiny and everything had a kind of boring gray background, and the days when "Dynamic HTML" started to become a thing. It was awful back then. Truly, terribly awful.

At some point in the last few years, I rediscovered JavaScript through the world of Node.js and React, and I can't really imagine using anything other than React to build web apps these days (I tried Vue.js for at least one project, and found it quite lacking).

But the lack of strong typing is bothersome. I successfully strolled along for a while trying to get away with writing React and Node apps without configuring TypeScript, but now I'm all in on TypeScript.

When it comes to the backend, I oscillate a bit between using Rust+Rocket and Node.js with Express. But recently, I've been poking around increasingly with Google's Cloud Functions. The out of the box instructions though don't do a very good job of explaining how to get things working with TypeScript (there are some instructions, but they're for Firebase, and it's kind of similar, but different enough that it doesn't quite work if you just use Google Cloud directly.

It turns out to be relatively easy to set up, so I wrote it down for posterity.

Setting up the project

First off do the usual npm initialization dance with npm init -y.

We're gonna want to install some deps. You'll find the functions-framework from Google useful, since it'll let you do local development with Cloud Functions, and also provides types. Additionally as always, we'll want typescript itself as a dev dependency:

npm install --save-dev @google-cloud/functions-framework typescript

Next, let's set up TypeScript. This'll create a tsconfig.json file.

npx tsc --init

I like to make put my code in a src folder, and have it output to a separate ts-built folder. Here's a minimal tsconfig.json file that will achieve that:

{  "compilerOptions": {    "target": "es6",    "module": "commonjs",    "moduleResolution": "node",    "esModuleInterop": true,    "rootDir": "src",    "outDir": "ts-built",  }}

Next, let's write a simple cloud function in src/index.ts. Sssh, we're not using any types yet, but as usual with TypeScript, that's ok:

// src/index.tsexport const hello = (req, res) => res.send("Hello!")

Let's check if it compiles:

npx tsc

If all is well, you should see a ts-built/index.js file that contains the compiled JavaScript code.

It'd be cool to see if our Cloud Function is working with the framework. Before we can do that, we need to tell Cloud Functions where to find our index.js file. Update package.json to set the main property to its path:

{...   "main": "ts-built/index.js"...}

Now, we can run the framework:

$ npx functions-framework --target=helloServing function...Function: helloSignature type: httpURL: http://localhost:8080/

If you visit the localhost URL, you should be greeted with the expected friendly message.

Deploying to Google Cloud

When we deploy this to Google Cloud, it will run npm ci to build the project. That's just a more efficient way to invoke npm install. This isn't enough - we need it to also run the TypeScript compiler, otherwise index.js won't exist when the code is up in the Cloud. Luckily, Google Cloud provides a hook to perform extra steps during the build. In package.json, we need to add a gcp-build script. We'll add a build script and a start script to make local development easier too:

{  "scripts": {    "gcp-build": "npm run build",    "build": "tsc",    "start": "npm run build && npx @google-cloud/functions-framework --target=hello"  }}

Before deploying, it's a good idea to check that it works with npm run gcp-build.

To deploy, you'll need to create a Google Cloud project and enable the APIs - the stuff in the "Before you begin" section in this document. You'll also need to install the Cloud SDK locally so you can use the gcloud command. You'll also need to authorize and select your project using (replace PROJECTNAME):

gcloud auth logingcloud config set project PROJECTNAME

Now you can deploy with:

gcloud functions deploy simple-function --entry-point hello \    --allow-unauthenticated --trigger-http --runtime nodejs16

It'll churn away for a wee while doing its thing, but after it's done you should be able to go to the console to find out the URL of your new function. In my case, this is https://us-central1-dubh-cloud.cloudfunctions.net/simple-function

Actually using types!

Ok, so far we've got it working end to end with TypeScript, but we didn't actually use types. You don't have to, of course, but if you want to, you can be more specific. For example, the functions-framework defines a type called HttpFunction for cloud functions themselves, and we could write:

import { HttpFunction } from '@google-cloud/functions-framework';export const hello: HttpFunction = (req, res) => res.send("Hello!")

The actual request and response are express types. I'll write an example in a future blog that takes advantage of TypeScript to do much more interesting things.

Lessons from a toy project: Heimdall

Brian Duff — Thu, 29 Apr 2021 05:19:08 GMT

https://soundcloud.com/brian-duff-467458622/duffblog-heimdall

After the kids are fast asleep, I often tinker around with toy projects. I have a long history going back to when I first started programming at 8 or so of starting and generally never finishing such projects, but they're inevitably useful for learning new things.

My kids, like many others, have been zooming into school from home for most of the last year. At the start of the lockdown, they were 6 and 7. We'd rarely allowed them to use a computer unattended before. All of a sudden, they were sitting at a desk for 6 hours a day.

They quickly discovered things like YouTube, and online web based games. There were good things too (Michael became inordinately good at Chess), but we sometimes worried. We especially wanted to limit their access to the computers outside of school hours, because even after 6 hours, we'd often find them continuing to use their computers for a couple more hours after school.

We set up an elaborate system involving Blocksite, and using Google WiFi to block internet access at certain times. However, this wasn't enough: Google WiFi's scheduling controls aren't fine-grained enough, we sometimes had to poke holes in Blocksite to let them use YouTube legitimately for school, and we found that they got around the Internet being unavailable by using Screen Recorder on the mac to save local copies of YouTube videos (which despite the terrible audio quality, I thought was quite irritatingly ingenious given their age).

What I really wanted was a way to lock them out of their computer entirely on a schedule. Then a remote control to temporarily unlock them from my cell phone. Finally, to be really cool, we could ideally hook it up to Google Calendar so that it would track when they were and weren't supposed to be using their computer for either school or after school classes.

So ridiculously late in the lockdown (as it turned out mere weeks before they went back to school in person), I hacked together a thing called Heimdall that was going to do some of this.

Heimdall is an unfinished lump, but through it I learned some interesting and perhaps useful things, continued my journey getting more experienced with Rust and modern web frameworks, and learned some interesting things about Mac OS that I never knew before. The next sequence of blog posts will be about the things I learned along the way. Anyway as a taste, here's what it looks like:

https://www.youtube.com/watch?v=2JCbYFstPG4

Noogler became n00b became Tweep!

Brian Duff — Tue, 02 Mar 2021 16:54:00 GMT

Friday will be the 14th anniversary of my first tweet. Umm... it hasn't aged well. The Internet was stunned by the revelation that I was reading my email.

Reading email and messing around with Eclipse 3.3M5.
Brian Duff (@brianduff) March 5, 2007

In more contemporary news, this week I joined Twitter as a Principal Engineer in Engineering Effectiveness.

I'll try to build things that help engineers inside Twitter have a lovely, productive time creating the cool things that they make every day. Delighting developers is something that I've continued to be passionate about across Oracle, Google, and Facebook. I'm really excited to make the jump from being a long time user of Twitter to being part of the Twitter team. It's pretty cool that I get to continue to work on stuff that I enjoy so much. Onboarding remotely is a... weird... experience, but so far I'm having a whale of a time (geddit? gurgle).

I do leave Facebook with a great amount of sadness and fond remembrance. Whatever you may think about it (or any of the tech companies, frankly) from a societal and moral perspective, the experience of being an engineer at these companies is truly awe inspiring, humbling, and transformative (and frankly, fun). I was lucky to learn and grow with an exceptional set of talented and passionate people. I got to work on some challenging and fascinating projects. Most of all, I experienced a tremendous amount of support and care from people I worked with at every level as we went through the tummult and disconnection of working from home and adjusting to how that impacted just about everything in strange and unexpected ways.

This is only the fourth company I've worked for in (a quite shocking and hard to believe) 23 years in the software industry, 16 of those years living in Silicon Valley. Seeing huge change in the perception of the tech industry, watching cozy little startups transform into big hulking tech, and noticing the general perception of the Bay Area as a whole shift significantly, I still feel a sense of delighted disbelief about where I am. This wee man from a working class family in Leith somehow made it to be the first person in the family to finish six years of secondary school, make it to university, and then crazily make it all the way to this lucky life in a distant country doing what I'm passionate about for a living for most of my adult life. I've worked hard, but I think I've also been very very lucky, and it's part of my fiber that I will always do whatever I can to help others who need a bit of that luck too, whatever background they're from.

So pumped and ready to get started on chapter 4 :)

HashFile: A disk-based hash structure

Brian Duff — Wed, 14 Oct 2020 18:07:00 GMT

Previously, I introduced a problem I was trying to solve where a large datastructure was being pinned into memory for occasional lookups. This post delves into the implemented solution which pushes it onto disk but retains (relatively) fast lookups. I think using a database or a B-Tree is a good solution to this kind of problem in general, but it was fun and inexpensive to implement this utility, and it turned out to be generally useful. Bear with me if you already understand HashMaps pretty well, because I'm basically describing a HashMap here, but it's a disk-based HashMap.

Logically, the data consists of a series of key-value pairs. The keys and values are of variable size, because they contain strings. If we were to write only the values to disk in a binary format, we might have something like this for the JSON example in the previous post:

There are two records, at offsets 0x00000000 and 0x000000A4. If we had some way to map a key to one of these offsets, we could seek() to that offset on disk and read a single record. We'll need some kind of index for that. A simple thing we could do is to store a table of the hashCode() of each key to the offset of the value. The hashcodes are as follows:

//src/com/foo/bar/baz:baz -691376290 (0xD6CA6F5E)
//foo/far/fun:fun -1488203677 (0xA74BD063)

So, our index is a simple table that looks like the diagram below. Notice that we've added 0x10 to the offsets because the index is at the start of the file and is 0x10 bytes long, pushing the values down by that much (also, in the real implementation, the offsets are longs, but I made them ints to keep this diagram and example more readable :)).

We store the index sorted by the hashcode. Given a key to look up, we can then compute its hashcode and use binary search in the index portion of the file to easily find an offset. This requires O(log n) seeks over the index, followed by a single seek to the position of the record.

We could also have stored this in a more conventional hashmap style by calculating the modulus of the hashcode with some known index size, or computing a perfect hash. However, I'm always looking for a random excuse to write binary search again :)

With this scheme, there's still the possibility that a hashcode will match for two keys, so we actually store a list of records at each offset, along with their keys so we can disambiguate. In practice, in our real dataset, there happen to be zero collisions at present, so this is a bit wasteful. Again, should really use a perfect hash.

This datastructure is completely impractical if we want to support insertion, because we'd have to push the entire value set down in the file. In practice, we always just write the entire file each time (it takes about 250ms for the data we have), which neatly side steps this problem. If insertion were desirable, separating the index and data into two separate files would probably be a better approach, so the data could just be appended to the values file. You'd probably rewrite the index each time because it's sorted, but the index is much smaller than the values data anyway (in our case, it's a little over 1MB with 100,000 entries - 12 bytes per entry, and it could probably be 8 bytes per entry if we used int rather than long offsets).

Performance

The performance of a single key lookup with this is approximately _disk_seek_time + (log n * disk_seek_time) + record_read_time_. For a SSD with seek time of 0.10ms and 100,000 entries, we'd expect a lookup to take around 2-3ms. This is an upper bound of the performance I see experimentally from the implementation. It'd be far worse on a spinning disk, but our developers don't have those. The memory cost is O(1), a fancy way of just saying that we don't need to load the whole blimmin' map into memory like we were before.

Implementation

As it happens, this whole thing was implemented as part of Buck, which is opensource. So if you're interested, you can find the source code in GitHub. The main implementation is in HashFile.java - it's pretty simple (less than 200 lines of code). You can find some unit tests that show off usage of it in HashFileTest.java. Enjoy!

The case of the unwieldy HashMap

Brian Duff — Tue, 13 Oct 2020 15:58:00 GMT

Some data structure was pinning over 70MB of heap space in Android Studio. Our developers have limited memory on laptops, and are often upset about memory consumption in general. This behemoth (retained by our internal plugins) was the second largest allocated and pinned single object in the whole of AS's heap.

buck project generates IDE project definitions from buck targets. It can be configured it to emit a target-info.json file, which contains simple mappings that look something like this:

{  "//src/com/foo/bar/baz:baz" : {    "buck.type": "java_library",    "intellij.file_path" : ".idea/modules/src_com_foo_bar_baz_baz.iml",    "intellij.name" : "src_com_foo_bar_baz_baz",    "intellij.type" : "module"    "module.lang" : "KOTLIN",    "generated_sources" : [       "buck-out/fe3a3a3/src/com/foo/bar/baz/__gen__",       "buck-out/fe3a3a3/src/com/foo/bar/baz/__gen_more__"     ],  },  "//foo/far/fun:fun" : {    "buck.type": "cxx_library",    "intellij.file_path" : ".idea/modules/foo_far_fun_fun.iml",    "intellij.name" : "foo_far_fun_fun",    "intellij.type" : "module"  }}

We have some large number of these targets (ballpark 100k or so), so the existing datastructure representation of this (a hashmap correlating to the structure above) could become quite large. The datastructure was intentionally pinned in memory in case we needed it.

This was a fun small optimization problem. The map is accessed infrequently in bursts, and the number of keys we typically have to look up are a tiny proportion of the total set of keys in the map. Our ideal structure has relatively fast lookups with low memory overhead. Here are some of the things we might try:

Load the file lazily when we need to do a lookup

Instead of pinning this datastructure in memory permanently, try to arrange the code so that we load the file once into a HashMap, and use it locally where it's needed, allowing the HashMap to be garbage collected when we're done.

Loading the file is relatively slow: it takes about 600ms to read and parse when the system's filesystem cache is cold, and about 200ms otherwise. But assuming we can arrange the code in a way where this happens only once, and it happens in a way that doesn't block anything else, that might be acceptable.

This option turned out to be impractical, because of the architecture of the part of the plugin API in IntelliJ it was invoked from. The component which renders file inspections is recreated multiple times while a file is visible on screen, and there's no convenient way to attach the loaded HashMap to the context of an open editor. Given that, most lookups would take in the 200ms range, and this would generate large amounts of garbage to be collected on the heap, leading to increased GC times.

Optimize the in-memory representation of the data

There's a fair amount of repetitive pattern based naming in the original structure. For example, a target called //foo/bar maps to a module called foo_bar. There are different configurable schemes for how targets map to module names, so the above mapping isn't necessarily canonical. We could consider the options used to generate the project information, and map back the names dynamically at runtime.

I didn't extensively investigate this option. Logically, it'd reduce memory usage by about a third, and the extra processing required would likely be quite cheap.

Load the file lazily and hold it in a time-based cache

This is similar to the first option, except we alleviate the problem of not having a convenient place to keep hold of the HashMap by retaining it in a WeakReference cache for a fixed period of time after it's last used. I ended up implementing this first as a stopgap, because it's relatively easy.

It does have the downside of still using a large amount of heap for some period of time after the last access, and potentially can also generate a lot of garbage to be collected depending on usage patterns.

Change the file to a binary format to speed up reads

The 600ms initial disk read time seems high. This file clocks in at 37MB, under ideal conditions on a 500 MB/s SSD, we'd expect to be able to read it in under 100ms. Indeed that matches what we see if we use dd to measure read performance:

# Purge disk buffers$ sudo purge# First read is about 200ms$ time dd if=target-info.json of=/dev/null bs=8k4740+1 records in4740+1 records out38837604 bytes transferred in 0.019201 secs (2022682285 bytes/sec)dd if=target-info.json of=/dev/null bs=8k  0.00s user 0.01s system 67% cpu 0.023 total# Second read, buffered in the disk cache, is about 100ms$ time dd if=target-info.json of=/dev/null bs=8k4740+1 records in4740+1 records out38837604 bytes transferred in 0.010184 secs (3813571762 bytes/sec)dd if=target-info.json of=/dev/null bs=8k  0.00s user 0.01s system 92% cpu 0.013 total

We should profile to see why there's such a large discrepancy between ideal and observed read time, but some theories about this:

Read contention
Overhead of parsing JSON
Cost of allocating object on the heap while building the map

Writing some code to use a raw binary format for the file clearly demonstrated that the JSON parsing wasn't the issue. I didn't profile it to my shame, but I think it's likely allocation is the primary contributor. Overall, it's very wasteful that we're allocating so many objects that we don't need.

Don't load the file into memory at all

Ideally, we'd just avoid reading this whole file into memory altogether. Since we're unlikely to use most of it, it's always going to be pretty wasteful.

We could use some kind of mechanism to stream the JSON and find just the key we're looking for. But since we control the generation of the file, it seems like it might be better just to write out a file format that's makes it easy to look up a specific key and read data directly from the file for that key. A B-Tree is a good datastructure for this kind of problem, but I wound up doing something much simpler to implement. The next post talks about a disk-based hash structure I used to solve this problem.

In the lunch line with Larry Page

Brian Duff — Mon, 12 Oct 2020 22:03:00 GMT

The Whisper project (which became Nearby) got started as part of the Google+ org. The Google+ team sat in the same building as Larry Page, and we'd often see him in his office or walking around the building.

One day, a few of my teammates and I were standing in line at Cloud Cafe, in the restricted part of building 1900 where the Google+ team sat. Right in front of us in the line was none other than Larry Page himself. One of the engineers on the team struck up a conversation with him, and Larry asked us how Whisper was going. We hadn't launched anything yet, and were deeply in the midst of doing crazy cool things with ultrasound.

My teammate answered quite honestly that it was hard, and it was taking much longer than we expected to get it to a dogfoodable and eventually launchable state. Larry asked him why - what made it hard? Probably a fairly innocuous, polite question, but coming from the CEO and founder of Google, it sort of takes on an extra weight. We tried our best within 1.5 minutes to explain how a variety of unexpected complications contributed to delay and difficulty.

Throughout my career, I've been in a bunch of review meetings with super senior executive people. Usually a big amount of preparation and forethought (not to mention rehearsing and preparing answers to difficult questions you expect to be asked) goes in to these things. But that totally unprepared conversation in the lunch line with one of the founders of perhaps the most rapidly successful company in history made me think that sometimes it'd be nice if there was a bit less formality and you could just have an open and honest chat about where things are at without all the prep.

After we ate our lunch together in a chittery, excited huddle on a sofa (there were never any tables free at lunchtime in Cloud - I wonder where Larry sat), I wandered back to my desk thinking, "is it really all that difficult?" and plotting how we could try to cut out some of the complexity. So I've no idea what impression Larry had from it (I'd be surprised if it's something he even remembers), but it had quite an effect on me and the others who were there at the time.

Shoes and secret projects with Vic

Brian Duff — Fri, 09 Oct 2020 15:36:00 GMT

Google used to have a tradition called TGIF. It still seems sad that I'm talking about this in past tense, but hey ho. At TGIF, the founders and other executives got up on stage, welcomed nooglers (new Google people), introduced a bunch of prepared speakers who talked about interesting things that were going on, then opened themselves and other executives up to pre-voted and audience questions.

There are many infamous things that happened at TGIF during my time there that I can't talk about, but I was physically present in Charlies for these:

the time they let off fireworks inside Charlies to celebrate a Nexus device launch and gave everyone in the company the new device
the time they announced that everyone at Google was getting a significant pay raise and bonus. People went wild. There was screaming and yipping. It was like a music concert in the 80s.
the several times Patrick Pichette came by with a huge backpack of cash and everyone got an envelope with 10x$100 bills as a Christmas gift.
the time someone's mic had to be cut during the live q&a because they were ranting quite wildly at Larry Page for a significant period of time, fairly incoherently.
the infamous Handbook Guy incident, which I am mildly surprised to find no references to on the Internet. But I was there in the audience that day in Charlies, and wow.
the TGIF where Google Glass was revealed internally for the first time. People gave it a standing ovation.

Caitlin enjoying the food in Charlies. Not a picture of TGIF, since those aren't allowed (ok, not everyone got the message apparently, which is probably part of why they're not around any more)

Anyway. At one point, I'd recently started working on mobile infrastructure for Google+, and at TGIF, Andy Rubin (of all people - this was before the terribleness) was up on stage talking about how a lot of the developer pain with Android was going to be solved soon by some upcoming project. This was intriguing, and so I figured I'd pop him a quick email to ask about it. To my surprise, Andy replied almost immediately, and this resulted in a meeting between myself, my manager, and Vic Gundotra, the charismatic overall lead of Google's social efforts.

.jpg/440px-Google_VP_Engineering_Vic_Gundotra(cropped).jpg).jpg/440px-Google_VP_Engineering_Vic_Gundotra(cropped).jpg)

I never did learn what this mysterious solution to all developer pain was - the intrigue around it only deepened in the conversation with Vic. He alluded to some shadowy organization, hidden from the org chart, and working on a project so secret we didn't even want those people to show up in internal systems for fear of opposition. They were working on something that might never pan out, and it was being given a bubble of space in which to grow without criticism. I think I can sort of guess with the benefit of hindsight what it became eventually, but it was super intriguing at the time.

Vic did enthusiastically share his love of shoes with us though, at great length. I really enjoyed the passion with which he talked about this, and how he connected it back to the overall product direction of Google+ at that time. Interest based channels eventually became a key component of Google+. I think a lot of it had to do with shoes.

The very scientific microkitchen testing event

Brian Duff — Thu, 08 Oct 2020 16:05:00 GMT

One of the things I miss most about the office are the microkitchens. Facebook and Google both have fantastic selections of yummy snacks to fuel folks through the day. It's actually quite great for adhoc conversations and just getting away from your desk for a bit. I've tried to replicate this by purchasing a box of Funyuns from Amazon to keep at home, but it's just not the same. As I'm... you know... pathetically eating my Funyuns at home on my own with my shorts on.

At Google, the microkitchens were legendary when I started in 2008. But by the time I left in 2019, they had changed a lot. The stock was intentionally kept low, and the snacks were healthier. This wasn't always a popular thing. For a period of about 3 years or so, the snack selection, which used to rotate fairly regularly, was frozen in time with the same set of things. I think this was due to Google trying to plan the future of the microkitchen program, and it just took a while. Or something like that. Anyway, this fallow period was great if you were the world's biggest fan of French Onion SunChips.

At some point in 2016, I was asked, perhaps because of my infamous love of snacky goodness, or probably for some other random reason, to organize a snack tasting party for people on my floor in building 43. Facilities sent me a gigantic box of snacks, and we had to very scientifically. Like, very scientifically, try out the snacks and give feedback about which things we liked. Based on the feedback from similar parties like this all over the campus, they rotated the microkitchen snacks after a long period of way too many French Onion SunChips.

So in this, incredibly serious business meeting, we're eating as much as we can. For science.

(Sorry for the blurry image, I was probably eating something)

It was pretty obvious which were the unpopular snacks, since they remained in the big box from facilities outside my desk for several weeks despite my urgent pleas to the mailing list for the floor of my building to please come and eat them.

It's a like a startup inside a startup, Topaz

Brian Duff — Wed, 07 Oct 2020 16:30:00 GMT

At Google, I knew a great engineer (let's call them Topaz) who, after working on a bunch of different teams and being pretty successful at what they did, decided to join the hot new team that was hiring like crazy. It was pretty exciting for them, they told me many times. For me, I've been in some of those situations where you're anticipating something new in your life so much you have the most vivid dreams about it actually happening. As if it were actually happening now. It was like that for this person.

Now this was part of Google was hot at the time; it was in the realm of all things social when Google was trying to do that. It was breathtaking how all-in Google went on the social stuff so quickly, and there was a buzzy aura around the teams who worked on it. The building they were in had restricted access (because reasons), still a relatively rare thing at that time, and its own not-so-secret restaurant.

So, the big day came, and Topaz joined the new team (let's call it the Flingwheel team), and had their first 1:1 with their manager, Borantz. All good fictional names should end with a z. Apparently. Something was just off about that from the start. The manager was charming, and nice - a very sociable and amenable guy, but there was a weird disconnect. It felt like the conversation was one-way. Topaz would talk about what they were looking to learn about and develop on the team, and Borantz would nod and talk about how cool this team was because it was a startup within a startup or something like that.

Topaz wasn't sure if it was just their imagination. A single 1:1 isn't enough, so Topaz paid close attention the next few times they met with Borantz. After a few weeks on the team, Topaz had realized that there was a huge opportunity here - they wanted to hone their frontend web development skills, and this project's frontend was moving slowly and jeopardizing the whole project. Topaz thought this would be an excellent thing to talk to Borantz about, and they did. And Borantz talked about how cool the project was and how cool this team was because it was a startup with a startup, and hey, Topaz, could you do some backend stuff because that's what you're good at right?

Wait, what?

It turns out that Topaz's good friend Potaz happened to have worked with Topaz before about 87 years ago (er, slightly exaggerating here) knew of Topaz's legendary backend skills. Skills which in actuality amounted to a couple of years spent fiddling around with servlets on a shaky and not too reliable server somewhere in the back of an office. Potaz also worked on Flingwheel, and had recommended Topaz to Borantz, and had emphasized Topaz's aforementioned legendary backend skills.

The cycle of dysfunctional 1:1s went on, and Topaz noticed that many other things were dysfunctional about the team. All of a sudden, out of nowhere, the team ruptured into two parts, the Flingwheel team were now doing something that was decidedly less interesting than the original idea, and a new offshoot team went on to build.. well... something that you probably use now. Topaz was stuck on the Flingwheel team, and pretty unhappy at this point, since their job consisted chiefly of converting protocol buffer messages to other protocol buffer messages and writing unit tests for said rote conversion code.

The kicker came when one day, out of the blue, Topaz had a random meeting with Stanz, the director of all backend projects in his part of Google. Topaz's legendary backend skills were needed on the core backend team, and a transfer and reorg were imminent, and wasn't that exciting, Topaz? What's that? You want to be a full stack engineer? You don't actually like backend? Your manager hasn't actually mentioned any of this to you? Oh yea, the Flingwheel team was cool, and it was like a startup within a startup or something like that, and now you're going to be converting protocol buffers back and forth until the end of time, dear Topaz.

Topaz was sad for a bit, and quit, and then found something much more fun to work on, and a manager who listened to them.

Intercepting behavior with java agents

Brian Duff — Tue, 06 Oct 2020 16:30:00 GMT

A previous post showed using JVMTI to log method calls in a non-intrusive way, and without having to make modifications to upstream libraries. JVMTI is much more powerful than that post showed - for example it can replace and modify code in a running JVM altogether, which can be useful for things like logging or performance measurements, but also intercepting or changing behavior at runtime.

It is, however, quite cumbersome to write code for that sort of thing in C or C++ using the JNI interfaces. It turns out Java provides a higher level interface to instrument or redefine classes using the Java programming language itself. This post will demonstrate a ridiculously simple example of such an agent. You can find the example code in GitHub.

A simple program

Let's start off with the really simple program that we want to instrument. The Greeter class does the time honored thing of saying Hello World. We've for some reason awkwardly and weirdly moved the World part of that into a helper method. It's totally artificial, but it helps keep this example straightforward. In addition to Greeter, there's a simple Main class (not shown) that just calls Greeter.sayHello().

public class Greeter {  private static String getName() {    return "World";  }  static void sayHello() {    System.out.printf("Hello %s\n", getName());  }}

It's easy and does what you'd expect. Using Bazel, here's how I build and run the program:

$ bazel run src/main/java/org/dubh/examples/agent/target:TargetHello World

From this point on, let's assume we can't (for whatever reason) touch the code of Greeter. Being a bit selfish, I want this program to say hello to me, not the whole world. A Java agent can change the behavior without changing or recompiling Target.java or Greeter.java. I'll use it to change the implementation of getName() at runtime.

Agent basic structure

The main() method is the entry point to a Java application. Java agents have special powers to do things before main() is called, so the entry point for an agent is premain(). You're passed arguments for the agent, and an object implementing Instrumentation, which is how you access the APIs you'll need to transform classes. Our simple agent checks whether it's ok to redefine classes, and registers a ClassFileTransformer.

public static void premain(String agentArgs, Instrumentation inst) {  if (!inst.isRedefineClassesSupported()) {    System.err.println("ExampleAgent: not allowed to redefine classes!");    return;  }  inst.addTransformer(new ClassFileTransformer() {    @Override    public byte[] transform(ClassLoader loader, String className,         Class oldClazz, ProtectionDomain domain, byte[] classfileBuffer) {      if ("org/dubh/examples/agent/target/Greeter".equals(className)) {        return transformClass(classfileBuffer);      }      return null;    }  });}

transform() is called when the JVM is loading a class, and provides a hook to rewrite its implementation. The className passed here is in JVM internal form (which in this simple case, just means replacing each . with a /). If we return null from this method, the class will be unaltered, which is what we want in all cases unless we're loading Greeter.

Byte code swizzling

All that remains is just to do the actual transformation. The array of bytes we were given in classfileBuffer is the original compiled code for the class in class file binary format. If you were feeling super adventurous, you could swizzle around with the bytes of this array yourself. However, it's much easier to use a library that already understands this format and lets you manipulate it. ASM is a popular library for doing just this kind of thing.

ASM makes it really easy to manipulate bytecode, but you'll still need a basic understanding of how JVM instructions work. Explaining this is beyond the scope of this post, but you can use the javap tool to look at .class files and see the instructions they contain. The body of the current getName() method looks like this:

$ javap -p -c -cp Target_deploy.jar \    org.dubh.examples.agent.target.Greeter  private static java.lang.String getName();    Code:       0: ldc           #2                  // String World       2: areturn

It contains two instructions: The ldc operation pushes the constant value "World" on to the stack, and then the areturn instruction pops the top of the stack and returns it. We want to replace this with set of instructions that call a static method instead:

  private static java.lang.String getName();    Code:       0: invokestatic  #2                  // Method getNewName:()Ljava/lang/String;       3: areturn

These new instructions consist of an invokestatic to call a getNewName() static method pushing its returned value on the stack, and an areturn like before to pop the stack and return it. Alongside the agent, we need to include the new method we want to be called, and we do that in a simple NewGreeter class that's compiled along with the agent:

public class NewGreeter {  public static String getNewName() {    return "Brian";  }}

Here's what the transformClass() method looks like with comments that hopefully explain what's going on:

private static byte[] transformClass(byte[] classfileBuffer) {  // ClassReader knows how to grok the buffer of bytes as a Java class.  ClassReader reader = new ClassReader(classfileBuffer);  // ClassNode is a visitor over the things in the classfile that collects  // them into an in-memory data structure that we can easily traverse. You  // can also avoid creating a separate in-memory representation by just  // implementing a simple ClassVisitor, but it often requires more code.  ClassNode classNode = new ClassNode();  reader.accept(classNode, Opcodes.ASM8);  // Now ClassNode contains a data strcuture with all the things in the  // class, and we can look through the methods for the one we care about.  for (MethodNode method : classNode.methods) {    // You'd maybe want to check the signature also in a real program.    if ("getName".equals(method.name)) {      // Method bodies contain instruction lists. Here, we create a simple      // instruction list with two instructions - one to call a static       // method, and another to return whatever that static method returned.      InsnList instructions = new InsnList();      instructions.add(new MethodInsnNode(Opcodes.INVOKESTATIC,           "org/dubh/examples/agent/NewGreeter", "getNewName",          "()Ljava/lang/String;"));      instructions.add(new InsnNode(Opcodes.ARETURN));      // This replaces the existing instruction list of the method with our      // new instruction list.      method.instructions = instructions;    }  }  // ClassWriter is a visitor that knows how to traverse the data structure,  // and write back out the bytes of a class.  ClassWriter writer = new ClassWriter(ClassWriter.COMPUTE_FRAMES | ClassWriter.COMPUTE_MAXS);  classNode.accept(writer);  return writer.toByteArray();}

Deploying and using the agent

There's one last thing we need to do in order to make our agent work. Agents must be compiled into a jar file that contains instructions about where to find the premain class and which capabilities our agent has. For this example, the MANIFEST.MF looks like the one below.

Manifest-Version: 1.0Premain-Class: org.dubh.examples.agent.ExampleAgentAgent-Class: org.dubh.examples.agent.ExampleAgentCan-Redefine-Classes: trueCan-Retransform-Classes: true

If you're using Bazel, you can accomplish this using the deploy_manifest_lines attribute on java_binary, like so:

java_binary(    name ="agent",    runtime_deps = [ ":agent_lib" ],    main_class = "org.dubh.examples.agent.ExampleAgent",    deploy_manifest_lines = [        "Premain-Class: org.dubh.examples.agent.ExampleAgent",        "Agent-Class: org.dubh.examples.agent.ExampleAgent",        "Can-Redefine-Classes: true",        "Can-Retransform-Classes: true",    ])

With this in place, let's try running our program with and without the agent. We use the -javaagent argument to java to tell it where our agent jar is.

$ cd bazel-bin/src/main/java/org/dubh/examples/agent$ java -jar target/Target_deploy.jarHello world$ java -javaagent:agent_deploy.jar -jar target/Target_deploy.jarHello Brian

It works!

Summing up

This is a fairly trivial example of how to write a Java agent, and there's lots more to dive into for complex things. At the core though, this setup of using ASM to rewrite bytes is a template for much more complicated things. I missed out a few details around Bazel in the interest of making the post as simple as possible, but you can play around with the full example in the javaagent github project. Hope this has been useful. I'd love to hear about the kinds of problems you're solving with Java agents in the comments :)

Aapt2: Tower of Babel

Brian Duff — Mon, 05 Oct 2020 16:30:00 GMT

OK, we'd fixed a bug that brought us back to our baseline build speed with aapt2. But could it be made even faster?

This is the final part of a three part series about adventures with aapt2, Android's resource compiler / optimizer. You can read the intro bit and get more context here. The second post explained a small fix that resulted in a nice performance win.

Big Android apps like the hypothetical ones from that hypothetical company that I hypothetically work for tend to contain a lot of strings that are translated into hypothetical languages (er, maybe I didn't need that last hypothetical). However, during the compile-run cycle, most developers are usually working with a single language. This is not to understate the importance of testing with a variety of languages, but it's reasonable and normal to restrict things somewhat in dev builds for developer efficiency reasons.

There are really a lot of strings in a lot of languages in some of these hypothetical apps. Like really a lot. I wish I could say how much, but think about what you'd consider a lot, then add some more to it.

My profiling from finding the previous issue had shown me that aapt2 was spending an awful lot of time dealing with the huge number of strings we were throwing at it. But most of the time, developers really only cared about a tiny fraction of these strings in developer builds. I spotted a friendly looking option in that looked like it might be quite useful:

So, I was thinking if maybe I could just do something like this in our dev builds, things would be waaaay faster:

aapt2 link -c en ...

Whelp, it didn't work! Erm.

So it turned out that the way aapt2 link processes the -c option is as a post filter - it still processes all the input resource files containing strings in every configuration. One way we could get around that is to filter these resources out at compile time so we never pass them into the link phase. Because of some particular complexities of our source / build system, that'd mean copying or creating symlink farms of resources.

I ended up instead just patching aapt2 to make it respect -c as a pre-filter instead. With this change, it completely ignores inputs that don't match the specified configurations. Doing that eliminated about a 60s build time penalty when resources are changed on every developer build.

Aapt2: Please don't delete me!

Brian Duff — Fri, 02 Oct 2020 16:30:00 GMT

By making one weird change in aapt2, we sped up our build by 45 seconds. Developers love that stuff.

This is the second in a three part series about adventures with aapt2, Android's resource compiler / optimizer. You can read the first bit and get more context here.

Proguard is an optimizer that many Android apps use. It can do nifty things like removing unused code and resources, inlining things that have no real reason to be in separate methods, and even obfuscating symbols so you can pretend like nobody will ever be able to figure out what your clever code is doing. In modern Android, the r8 shrinker has a similar function, and is driven by proguard configuration files.

However, r8 / Proguard can't always figure out if something is used, or sometimes optimizes more aggressively than you'd like. Configuration directives can be used to tell it to keep things that would otherwise be removed. aapt2 has options that let it emit configuration files for resource related code.

I used a profiler to look at the performance of aapt2 on our codebase, and it turned out that a significant chunk of the increased time was being spent in one function, aapt::proguard::CollectLocations(), which is part of the machinery that generates these rules. In particular, it was spending a lot of time generating rules for the --proguard-conditional-keep-rules option, which removes resource ids that don't match certain usage patterns known to be used in layouts.

It turned out that we didn't have that option turned on in our codebase (we use other tools for optimizations like this), so the extra work that was being done here was being thrown away anyway. I wrote up my findings and sent a patch upstream, which I think is always quite a polite thing to do when you discover an easily fixable issue. This immediately sped up our round trip build time by about 45 seconds. Many thanks to the folks at Google for quickly accepting this upstream!

But I was still not happy with how long developers had to wait for aapt2 to do its thing... Stay tuned for another optimization in the next post.

The next blog post in this series talks about another big performance optimization that came from reducing the number of languages we use in dev builds..

How I learned to love aapt2

Brian Duff — Thu, 01 Oct 2020 16:30:00 GMT

The Android Asset Packaging Tool (aapt2) takes all those lovely resources (images, strings, cat pictures, and whatnot) in your Android app, and compiles them into a binary format that the runtime understands. It's also the thing that generates numerical identifiers and constants for them in R.java, which is the class you use to refer to resources in code.

You should rarely have reasons to interact with aapt2 directly, since for most Android developers, it's something that happens automatically during a build with Gradle, or your build system of choice (e.g. Bazel, or in our case Buck). Suffice to say, you're either doing seriously hardcore interesting things, or you're maybe working on a build / developer infra or something like that (we're hiring!) if you're interacting directly with this tool.

aapt2 operates in two phases; in the compile phase, it converts individual resources into a binary representation (either a .flat file or a .flata, which is just a zip of .flat files). The usually more expensive link phase merges all of these individual resources together into a final .ap_ file, which is just like an APK except with no code. For a while, I super enjoyed dropping "flata" and "ap underscore" into random conversations around the dinner table and relished the confusion on the face of those around me. My family love me really. Anyway.

For some reason, maybe following the, "if it ain't broke" principal, we were using quite an old version of aapt2 for quite some time, then tried to upgrade at some point. There were some new goodies we wanted in the latest version, but when we upgraded we ran into a slew of incompatibilities (caused by our own infamous creativity for the most part) that had to be fixed, and then a significant increase in the aapt2 link phase of our builds. Slow builds make everyone sad.

The next couple of blog entries will talk about two specific optimizations I made to aapt2 a while back to speed it up for a (cough) well known large Android app that may or may not be associated with my employer.

The next blog post in this series talks about a small bug fix that saved 45s of build time.

Dynamic Method Tracing in Java: The Implementation

Brian Duff — Wed, 30 Sep 2020 16:56:00 GMT

In the last blog entry, I talked about the need for a tool in Java that can be configured easily to log method calls without redeploying a binary, attaching a debugger, or obtaining root in order to trace system calls. In this blog, I'll dive into some of how the tool was implemented.

A caveat: I knocked this tool together in my spare time one afternoon while working on a bunch of other things, so it's a bit rough around the edges. I'm also not a particularly experienced C programmer, so apologies if my code is a bit rough.

However, it's already proven useful to me, and hopefully either the tool itself or the approach will be useful to others. I found a general lack of detailed information about using JVMTI when doing research.

OnLoad: Config, capabilities, events, and callbacks

The main entrypoint to a native JVMTI agent is the Agent_OnLoad function. I want to do three main things when my agent is loaded:

Load the configuration file so we know which breakpoints to set.
Let JVMTI know which capabilities the agent needs, and ensure the JVM supports them.
Let JVMTI know I'm interested in receiving events when classes are about to be loaded and when breakpoints occur, and register callback functions for these.

The configuration file's implementation is a bit out of scope for this blog. I wrote a quick, dirty, and exceptionally rough yaml parser in C (yes, I know... I must have been bored). You can look at config.c if you're truly interested in the sordid details of this. But ignoring that, here are the more interesting parts of OnLoad:

`JNIEXPORT jint JNICALL Agent_OnLoad(JavaVM *vm, char *options, void *reserved) { char *option = strtok(options, ","); // ... // Load the config file using options fromoption`. Also parse out an option for // a log file to write to. // ..

fprintf(stderr, "mlogagent: Loaded agent\n");

// Get a JVMTI env object, which is used to make function calls into JVMTI jvmtiEnv env; assert(JVMTI_ERROR_NONE == (vm)->GetEnv(vm, (void **)&env, JVMTI_VERSION));

// Let JVMTI know we are going to be registering breakpoints and accessing local // variables. If these capabilities are not supported by the JVM, then the AddCapabilities // function will return an error code, and our agent will terminate the VM. jvmtiCapabilities capabilities = { 0 }; capabilities.can_generate_breakpoint_events = 1; capabilities.can_access_local_variables = 1; assert(JVMTI_ERROR_NONE == (*env)->AddCapabilities(env, &capabilities));

// Register callbacks for ClassPrepare and Breakpoint events. We'll dig into these callback // functions soon. jvmtiEventCallbacks callbacks = { 0 }; callbacks.ClassPrepare = &ClassPrepareCallback; callbacks.Breakpoint = &BreakpointCallback; assert(JVMTI_ERROR_NONE == (*env)->SetEventCallbacks(env, &callbacks, sizeof(callbacks)));

// Tell JVMTI to enable breakpoint and class prepare events so our callbacks will be invoked. assert(JVMTI_ERROR_NONE == (env)->SetEventNotificationMode(env, JVMTI_ENABLE, JVMTI_EVENT_BREAKPOINT, NULL)); assert(JVMTI_ERROR_NONE == (env)->SetEventNotificationMode(env, JVMTI_ENABLE, JVMTI_EVENT_CLASS_PREPARE, NULL));

return JNI_OK;}``

Registering method breakpoints

The next step is to attach breakpoints to methods we're interested in. We do this when the ClassPrepare event is fired, which happens just before the JVM loads a class. In our case, the implementation of this is a bit complicated by the fact that we have a config file that drives setting the breakpoints, but at the core, we just need to find the class we're interested in, find the method we care about, then attach a breakpoint to it. Here's a simplified version of the code:

`void JNICALL ClassPrepareCallback(jvmtiEnv jvmti_env, JNIEnv jni, jthread thread, jclass klass) { // Avoid re-entrancy, and return early if we already attached. if (in_prepare || attached) { return; }

in_prepare = true;

jclass clazz = (jni)->FindClass(jni, classConfig->name); // In case we don't find the class, don't throw an exception (jni)->ExceptionClear(jni); if (clazz != NULL) { attachMethodBreakpoint(jvmti_env, jni, clazz); (*jni)->DeleteLocalRef(jni, clazz); attached = true; }

in_prepare = false;}

void attachMethodBreakpoint(jvmtiEnv jvmti_env, JNIEnv jni, jclass clazz) { // Look for the method. Here, // methodName is something like "someMethod" // methodSignature is something like "(Lfrodo/Test$RealFile;)Ljava/lang/String;" jmethodID mid = (jni)->GetMethodID(jni, clazz, methodName, methodSignature); if (mid != NULL) { // Actually set the method breakpoint assert(JVMTI_ERROR_NONE == (jvmti_env)->SetBreakpoint(jvmti_env, mid, 0)); } else { fprintf(stderr, "mlogagent: Can't find the method\n"); }}`

The last part is to handle the method breakpoint callback when it happens. In the real program, we keep track of methodIDs we've registered breakpoints for, so that when we get this event we can quickly determine which method triggered the breakpoint, and look up information about how to display it in the config. In the real code, this is one of the more complex parts of the implementation, since it uses a bunch of JNI in order to generate a string to represent the method parameter we're interested in. For this simplified example, I'm just going to show what the callback looks like, how to extract a parameter, and how to print the first few frames of the stack trace.

`// Called when a breakpoint is hit.void JNICALL BreakpointCallback(jvmtiEnv jvmti_env, JNIEnv jni, jthread thread, jmethodID method, jlocation location) { // Get hold of the parameter. In this example, we just get the first parameter. // Which parameter is first depends on whether this is a static or instance method. // For instance methods, the first parameter is a synthetic parameter representing // the current instance of the class. So here, we get the first "real" parameter. int parameterPos = 1;

jobject the_parameter; assert(JVMTI_ERROR_NONE == (*jvmti_env)->GetLocalObject(jvmti_env, thread, 0, parameterPos, &the_parameter));

// TODO: something that displays the parameter... See the full code for details.

// Show a simplified stack trace. jvmtiFrameInfo frames[8]; jint count; jvmtiError err;

err = (jvmti_env)->GetStackTrace(jvmti_env, thread, 0, 8, frames, &count); if (err == JVMTI_ERROR_NONE && count >= 1) { char methodName; // fout is the output log file fprintf(fout, " trace: "); for (int i = 0; i < count; i++) { err = (*jvmti_env)->GetMethodName(jvmti_env, frames[i].method, &methodName, NULL, NULL); if (err == JVMTI_ERROR_NONE) { fprintf(fout, "<- %s", methodName); } } fprintf(fout, "\n"); }

fflush(fout);}`

Compiling and running the agent

To compile the agent, you need to add to your compiler's include path header files from the java runtime you're compiling for. The location of these vary from system to system. On my system, I use a command something like this:

gcc -shared \  -I/usr/local/java-runtime/impl/8/include \  -I/usr/local/java-runtime/impl/8/include/darwin \  mlogagent.c \  -o /tmp/libmlogagent.dylib

To use the agent in a running java program, we pass the -agentpath argument to java with the path of the agent, and any options we want to pass in:

java -agentpath:/tmp/libmlogagent.dylib=config=test.conf,file=/tmp/out.txt \   -cp classes frodo.Test

Summing up

So we've seen how to write a fairly basic JVMTI agent that can log method calls. It's pretty efficient - in an example application which makes thousands of method calls a second, there's no perceptible performance impact when the agent is enabled. You can find full source for the tool on github.

I'd like to experiment more with this tool, perhaps making it more flexible and customizable. Some of the things I want to poke around with are:

Rewriting it in Rust. This has advantages like me not having to reinvent a configuration file parser from scratch because I'm too lazy to pull in third party C libraries, as well as likely making it easier to maintain and extend the code in future.
Looking into whether it'd be easier to do the same thing with a java-based agent or JDI, and seeing whether the performance characteristics of these are different.
Adding much more flexibility around logging various bits of context when a breakpoint is hit. For example, it'd be useful to have the option to log fields, and all of the parameters instead of just one.

You can find the code for this tool in github.

Dynamic method tracing in in Java

Brian Duff — Tue, 29 Sep 2020 18:29:00 GMT

loadLanguages(['java']);

Are you a printf debugger? I certainly am. IDE based Debuggers (or tools like jdb / gdb) have their place, and I use them a lot too. But I often want to know when and why some function is called in contexts where attaching a debugger isn't an option.

As one example, a lot of people at my current company use IntelliJ / Android Studio. Like other large companies, we have a gigantic monorepo. Understanding how and why the IDE accesses the filesystem can help us understand performance problems. Everyone experiences slightly different behavior with the filesystem depending on what has been changed, and what's being cached. I'd like to turn on verbose logging of filesystem operations remotely for particular users so we can diagnose issues they're having, and I want to do this in a way that doesn't degrade performance.

Just add logging (or println!)

This is often a simple and good choice, although it often leads to this kind of thing:

`public void doSomethingInteresting(Thingy t) { if (Config.shoudLogInterestingThings()) { LOG.info("doSomethingInteresting(%s)", t); }

// Actually do something interesting}`

That's ok, it's not too onerous. You need to know where you want to add logging upfront before you deploy your app. If you want to change the places where you log dynamically, you're going to have to redeploy your app. Also, if the calls you want to log are not in code owned by you (e.g. in an upstream library, or the JDK itself), this approach may be difficult and require you to build a third party library from source with modifications.

Just use dtruss / strace?

Tools like strace or dtruss monitor system calls. Brendan Gregg details many useful dtrace recipes he wrote in dtrace scripts that ship with Mac OS X for monitoring this kind of thing. You can use dtruss to monitor system I/O calls like stat64().

One problematic thing with this approach on Mac OS X is that these trace scripts need to run as root. In addition, you must turn off system integrity protection for them to yield useful results. This is fine if you're just debugging locally on your own machine. But you can't really do that remotely on people's machines.

Another problem is that often the output of dtruss is voluminous, and there's little context about where system calls were invoked from up at the Java level.

Enter jvmti!

Java has a number of tools that can be used to instrument or inspect code execution in a running VM. You can write a Java-based instrumentation agent to intercept class loading and rewrite the instructions in a class before it is used. I'll cover some interesting examples of using this in a future blog post.

The Java Platform Debugger Architecture (JPDA) includes tools that also make inspecting and manipulating a running VM possible. The high level Java Debug Interface (JDI) is the main way to integrate a debugger into the VM. It communicates with a remote VM using the Java Debug Wire Protocol (JDWP).

That's an awful lot of acronyms. I'm going to mention one more: the Java Virtual Machine Tool Interface (JVMTI) which is the low level foundation of the JPDA, and which is used by the standard implementation of JDWP in the JVM. JVMTI has a number of functions, but useful to our problem is the ability register and be notified when breakpoints occur, and inspect the state of the running VM.

Configurable debugging

My goal was to be able to control method logging with a small configuration file that could be updated on developers machines independent of binary rollouts. Imagine I have some super simple code that looks like this:

`package frodo;

public class Example { public static void main(String[] args) { new Test().run(); }

public void run() { System.out.println(someMethod(new RealFile("/foo"))); System.out.println(someMethod(new RealFile("/bar"))); }

public String someMethod(RealFile file) { return "Hello " + file.getPath(); }

private static class RealFile { private final String path;

RealFile(String path) { this.path = path; }public String getPath() { return path; }public String toString() {  return "A file called " + getPath();}

}}`

I'd like to turn on logging for someMethod with a configuration like this:

`# This is the class I care about

frodo/Test
And this is a method I want to log with its super ugly JVM signature
- someMethod(Lfrodo/Test$RealFile;)Ljava/lang/String;
  Please show me an abbreviated stack trace for each call
  - showTrace: true
    And use the getPath() method to convert the parameter to a String
  - displayMethod: getPath`

This would log the following output for the above example:

someMethod: /foo trace: <- someMethod <- run <- mainsomeMethod: /bar trace: <- someMethod <- run <- main

There are lots of ways this can be extended to be more flexible / configurable, but this is enough for the basic requirement.

Part 2 of this blog post, dives into the implementation of this jvmti agent. You can find the code for this tool in github.

Hello again :)

Brian Duff — Tue, 29 Sep 2020 08:37:00 GMT

After a long absence I'm planning to blog a bit more actively. This is partly motivated by the fact that I've been working on some quite fun, gnarly stuff recently, and I think it'd be useful to me to get back into the habit of writing more about the things I'm futzing around with.

Quite a lot has changed since I last wrote in this blog eleven and a half years ago (oh, wow), in the world, and in my own life. The world seems exceptionally messed up in a variety of ways, and I wish it'd be better, but I sort of try to be eternally optimistic about things and hope that we're going through a tough patch and we'll all pull together and make things better (and I'm not just talking about Coronavirus, although that's pretty terrible in its own right).

In my own life, since I last wrote, I got married to the most amazing person in the world, grew two cheeky and rambunctious kids who are lovely and keep me terribly busy, and manage to surprise and fascinate me on a regular basis. It's really the best thing in the world to be surrounded by a family you love and who love you.

At the time I wrote my last blog post, I was a little under a year into my new job at Google, hacking away on developer infrastructure. The Google years were a fantastic adventure, and I enjoyed over ten of them. I did so many things there, made so many friends, and grew so significantly as an engineer and a person. I was so busy both with work and with my own life, which was full of temporary relationship drama (ah, to be in your late 20s-30s!) followed by a long period of stability and family building, that I basically never had time to do things like blogging. I still probably don't, but I'll try to do it nonetheless. I'll also try to get around to writing about some of the Google experiences if I can figure out how to unpack them into words :)

I joined Facebook last April, and got to enjoy the really quite splendid food they provide there as well as many other fun things around the office for about a year before things locked into the work from home reality that we find ourselves in now. Interestingly, despite having done many different things at Google, including an extended stint in management, I find myself back doing pretty hardcore engineering productivity related development. I guess I must find this kind of stuff fun, and my first few blog posts will be about adventures related to this.

Ok, well. I don't imagine there are many folks left who subscribe to this from over eleven years ago, and if there are, it might be surprising that this suddenly popped up in Google Reader (errrrrr, wait a minute...). So I might just be writing in a small echo chamber for myself. But it's nice to be writing again anyway :)

Cookie handling with Apache XML-RPC

Brian Duff — Fri, 10 Jul 2009 02:14:00 GMT

The example on the Apache website that explains how to pass cookies when calling an XML-RPC service is unfortunately... a bit b0rked. It doesn't compile, for starters. It's also quite a bit more complicated (and handwavy) than it needs to be. I don't know if this is because the API has changed over time. I tend to think it has suffered the copy-paste equivalent of chinese whispers from an origin in some mailing list.

Anyway, I needed to send a single sign on (SSO) cookie to an XML-RPC service recently. I'm using Apache XML-RPC 3.1 (I notice in the Javadoc that this code might be slightly different if you're using a later version). Here's roughly what I did:

First, the standard, boring stuff:

XmlRpcClientConfigImpl config = new XmlRpcClientConfigImpl();config.setServerURL(new URL("https://myservice.com/api"));XmlRpcClient client = new XmlRpcClient();client.setConfig(config);

Now, the juicy part. It's really, really easy. Look, ma, so much less code than the example on the ws.apache.org website, and it's even syntactically correct!:

XmlRpcTransportFactory factory = new XmlRpcSunHttpTransportFactory(client) { public XmlRpcTransport getTransport() { return new XmlRpcSunHttpTransport(client) { @Override protected void initHttpHeaders(XmlRpcRequest request) throws XmlRpcClientException { super.initHttpHeaders(request); setRequestHeader("Cookie", myLovelyCookieData); } } }};client.setTransportFactory(factory);

Update: Fixed a bug in the code. Oh, the irony. That'll teach me to be so arrogant :P

Find a file inside a zip

Brian Duff — Fri, 05 Jun 2009 02:15:00 GMT

Brutally simple shell script I often use to find a class file that I know exists somewhere in a directory tree full of jar files (but generally useful for finding files somewhere in a directory tree of zips):

#!/bin/bash

for zip in $*
do
echo $zip
for file in $(unzip -Z -1 $zip)
do
echo "$zip:$file"
done
done

I put this in a file called zipdump, then do things like this:

find -name "*.jar" | xargs zipdump | grep SomeClass

And get:

./some/random/path/foo.jar:com/google/whatever/SomeClass.class

Listing Chinese Words by Frequency of Use

Brian Duff — Thu, 14 May 2009 00:38:00 GMT

A commenter on my previous blog entry, Yong Huang, pointed me to some wonderful research he did using search engine results to compile a list of Chinese characters in frequency order. I managed to miss his comment when he posted it back in March (d'oh. I really should have paid more attention to my blogger settings), but it's an interesting read and technique:

http://yong321.freeshell.org/misc/ChineseCharFrequency.html

Getting serious about Mandarin

Brian Duff — Sun, 11 Jan 2009 22:02:00 GMT

I'm finally making a big effort to learn mandarin properly. In addition to taking formal lessons at UC Berkley starting in February, and learning about usage from Cindy, I'm using flash cards to increase my knowledge of Chinese characters and their mandarin pronunciation (including the tones, which I've had a hard job remembering up till now while picking up mandarin informally through random conversation).

I've been using some tricks to memorize characters, and this technique seems to work well for me.

Here's the first set of characters. The first ten are numbers, and mostly pretty easy.

one (yi1), two (er4), three (san1). These are quite possibly the simplest Chinese characters ever. :)

four (si4). This one is kind of easy for me to remember, since I visited sichuan last year, and this character was everywhere :) It's also a box with four sides.

five (wu3). I convinced myself that this character has five distinct lines in it, which made it easy to remember (although it actually has only four strokes - the middle horizontal line and the smaller vertical line are combined together as a single stroke).

six (liu4). I found this one a bit tough. I sort of convinced myself that it looks like a lion (the sound liu4 is a bit like the English "leo", as in "Leo the Lion").

seven (qi1). This was tricky too. Until I turned my flash card upside down and realized it looks like an upside down 7 with a stroke through it. Nice.

eight (ba1). No tricks here, I just remember this character because it's pretty simple.

nine (jiu3). With the right amount of squinting, this character looks like a roman J and lower case i without the dots and with a weird line connecting the J to the top of the i. So if you can remember that jiu3 == nine, it's pretty easy.

ten (shi2). If you have Chinese friends, you might notice that sometimes when they say "ten", they make a hand gesture where they cross both index fingers over each other. If you ever wondered why, this character is the answer :)

And here's a random sampling of two more non numeric characters:

ask (wen4). This combines the character for door ( men2) with the character for mouth ( kou3). Can think of it as opening the door of your mouth to ask a question. The sound (wen2) is also similar to the sound for door (men2).

good (hao3). This combines the characters for female ( n3) and offspring / son / child ( zi3). I guess that's good and harmonic.

Anyway, lots more where those came from. Kind of fun to learn.... :)

Still a bit proud

Brian Duff — Fri, 12 Dec 2008 04:10:00 GMT

Looking at Steve Muench's blog with its screenshot of JDeveloper 11g production makes me feel a little bit proud of what we built. Somehow I left Oracle with the feeling that I hadn't been very productive for a while, but seeing what Steve posted reminded me that the team really poured a lot of stuff into 11g.

Just visually from the screenshot, you can see the new look and feel I implemented based on a design from our talented visual design team. It got a lot of flak (experience is teaching me that fear or dislike of change is a very common trait), but apart from being unbelievably blue, I think it's quite attractive.

Also shown in the screenshot is quick search (which, honestly, we always referred to internally as "Google-like search". Hehe). This was something I wanted and so hacked together on a lazy afternoon without any kind of design or project plan while we were supposed to be in bug fixing mode. Despite its birth, it somehow made it into the final product in a very visible way. A very talented member of the team (Neil) did some fantastic work improving the visual design of the component while I was buried under classloading related tasks. One of the things I loved about the JDev team in the early days was the freedom to do this kind of innovation. Although that flexibility to innovate had been almost entirely crushed by the time I left, you could sometimes still get away with it and succeed.

Finally, Steve blogs elsewhere about log window search, which people begged and begged for until we finally relented and I was assigned the task late in the day.

I'm still very proud of these things, and even prouder of all the other innovations my old team implemented.

The Consequences of Job Hopping

Brian Duff — Sun, 07 Dec 2008 21:57:00 GMT

I was pretty comfortable, and definitely not bored in my old job. Had a lot of stuff to do. The trouble though with saying in the same role for a very long time is that it's really hard to find time, reason, or energy to step outside your main job responsibilities a little bit and learn something new. I must have encountered a gazillion technologies and products that I really wanted to get a chance to play with but never did.

I wasn't sure when I quit my job at Oracle that working at Google would be any different necessarily. I felt like I'd probably swap my old batch of responsibilities for a new batch of responsibilities, and it would kind of be similar (but hopefully more fun). One thing I somehow failed to realize was that the really great thing about changing jobs is that it gives you that rare chance to really stretch a bit and learn something new.

So in my three months at Google so far, I've been able to get to know some fun new stuff. And instead of just reading about this stuff, I actually get the chance to play with it all the time. For me, this experience has reignited my passion for learning. I'm hoping that I can keep up the momentum.

Here are a small subset of the things I've learned about and used so far (not necessarily all Google technologies):

GXP is a templating engine that's really useful for putting together dynamic web pages. I never got much of a chance to play with JSP, JavaServer Faces etc. before. I like that GXP is straightforward and easy to use. Haven't built anything spectacularly complex with it yet, but it works fantastically well for simple dynamic web content.
IntelliJ Plugin API. I'm still getting my head around this. It's familiar to me in many respects (there seem to be only so many ways of designing IDE plugin APIs). The API is a tad on the underdocumented side, and there sometimes aren't enough samples. I'm really learning "from the other side" what it's like to have a plugin API without source code. I really wished we'd been able to do better about that at Oracle with JDeveloper, and I hope one day JetBrains might see the light too, and make it easier for its plugin developers by giving them access to source code. But it's still a nice API, and you can get a lot done with it.
SWT. I've been using Swing for 10 years, and somehow got into the "SWT is evil" mentality without really giving it a chance. It's actually a pretty fine API for most UI problems. Slightly depressing to see that layout seems to be an intractable problem that all widget toolkits seem to have trouble making straightforward.
BigTable is cool :)
Google Chart API makes it really easy to draw pretty graphs 'n charts 'n stuff
Debian package management
GNU Fortran don't ask. hehe.

Well... that's not a comprehensive list, but I think I've been able to learn quite a bit more over the last few months than I did over about a year or two. Changing jobs is good for your brain :)

Brings back memories

Brian Duff — Fri, 21 Nov 2008 18:38:00 GMT

Ah... this brings back memories: Feel motivated yet? (via: http://thebode.blogspot.com/2006/07/demotivation.html)

Shirley that's not Shirley

Brian Duff — Tue, 09 Sep 2008 22:03:00 GMT

Warning: herein lie spoilers!

Somewhat surprised to see Shirley Manson, Scottish lead singer of rock band Garbage) show up in the second season premier of Fox's Terminator: The Sarah Connor Chronicles last night. Bit of a "holy crap, that's Shirley Manson" double take when the scene opened on her as an evil CEO who actually turns out to be a T-1000 model terminator (T-1000 is the gloopy, liquid metal model that Robert Patrick played so well in Terminator 2) hell bent on bringing about the Rise of the Machines.

Haven't quite figured out yet whether she's a good actor or not... She certainly came across as kind of cold, dominating, fearsome, and a little stiff. At first, I thought her acting was a bit iffy, but the reveal that she's actually a T-1000 sort of changed my perception a bit.

Was really nice hearing a genuine Scottish accent on US network television too. Scottish actors don't get to play the Bad Guy (or Lass) in TV sci fi nearly often enough. Witness the saccharinely well behaved Carson Beckett from Stargate Atlantis (Paul McGillion's accent is also faaar from genuine, probably on account of the fact that he moved to Canada when he was 2)

Nooglerization done (but is it ever, I wonder... ;) )

Brian Duff — Tue, 09 Sep 2008 04:12:00 GMT

My first two weeks are over, now I'm taking a quick bout of unpaid leave to take Cindy to Paris and Barcelona. She's never been to europe - and even though I'm from Europe, it's a big place and I've never really spent much time in those two cities. We're also going to spend some time with my parents in the south east of Spain.

Cindy swears that I've put on weight in the mere two weeks since I joined google on account of the free food. This is an assertion which I stringently deny, alas to no avail. I try to convince myself that all of the cycling between buildings (or walking in the fairly regular no-available-bikes scenario) is burning off enough calories to make up for the additional input. I may or may not be kidding myself.

I've been using the convenient excuse that, since it takes some period of time after initial employment before registering for the gym is possible, working out and getting rid of the extra pounds is something I can put off until after our vacation.

Anyhow, looking forward to taking a break, and then getting down to some serious geekyness (er.. geekery?) when I get back. I haven't written code for a while, and my fingers are getting twitchy.

On shortcuts

Brian Duff — Thu, 07 Aug 2008 22:07:00 GMT

I made the Big Switch from Windows to Linux at work several years ago, and never really looked back. For whatever reason, our version control system and builds are always about twice as fast on Linux compared to Windows. I'm also much more comfortable in a UNIX-y shell / scripting environment than I am in Windows Command Prompt, probably because of hours spent in various computer labs at university when I really should have been doing something more fun (like drinking beer).

As well as the raw productivity gain from the faster environment, I also customized my environment a great deal to minimize keypresses. For example, where most other people in my team might type something like this:

cd $ADE_VIEW_ROOT/jdevadf/modules/ide/src/oracle/ideade co -nc Ide.java MainWindow.javacd `pwd`emacs Ide.java MainWindow.javacd $ADE_VIEW_ROOT/jdevadfmake release

I type:

m ide/src/oracle/ideco Ide.java MainWindow.javaemacs Ide.java MainWindow.javajdevadfmrel

Doesn't seem like much, but over many many months and years of coding, all those extra keystrokes add up... Better still, I had some immensely powerful tools in my box that would invoke compound commands to provide interesting information.

So, if I wanted to quickly find a file in our source control system (an operation that can be sped up by grepping some version control metadata files, chopping the file up to make it readable), I'd type wi SomeFile.java (where is?). If I wanted to know which module (out of about 600 in our system) was responsible for delivering a jar file (again which involved a lot of grepping through multiple files and cleaning up the output), I could ask wm foo.java (which module?).

These utilities ran ridiculously quickly, because I figured out tricks that were somewhat specific to our system. For example, grepping the version control files to locate a file is way faster than using "find", since it's a grep on a single file instead of a directory tree traversal. Their real power was in compound commands. I could check out all files with the extension xml that contained the string "brian.duff" with a command like this:

wi '\.xml$' | xargs grep -l 'brian\.duff' | xargs co

This is all fine and great, but there's a huge downside. Every time I sat at someone else's terminal to help them with some problem and attempted to run some commands, I had to remember to type the longwinded versions of all these commands again. Like being thrown back into the dark ages. Worse than that, it actually slowed me down even compared to someone without the shortcuts, since my fingers would frequently forget and type my shortcuts instinctively even though my higher brain was fully aware that they wouldn't work. Finger memory is a strange thing.

AWT Freudian Slips...

Brian Duff — Mon, 12 May 2008 23:18:00 GMT

BufferedImage image = new BuggeredImage(...

Duffblog

Showstopper!

RTO at the Googleplex

Here be dragons

Instinct

Feedback

Data Studio Custom Visualizations using React

Setting up

Create the visualization

Bundling with webpack

Writing the manifest and config

Deploying to Google Cloud Storage

Trying it out

Scripts to make life easier

Wrapping up

Cloud Functions with TypeScript

Setting up the project

Deploying to Google Cloud

Actually using types!

Lessons from a toy project: Heimdall

Noogler became n00b became Tweep!

HashFile: A disk-based hash structure

Performance

Implementation

The case of the unwieldy HashMap

Load the file lazily when we need to do a lookup

Optimize the in-memory representation of the data

Load the file lazily and hold it in a time-based cache

Change the file to a binary format to speed up reads

Don't load the file into memory at all

In the lunch line with Larry Page

Shoes and secret projects with Vic

The very scientific microkitchen testing event

It's a like a startup inside a startup, Topaz

Intercepting behavior with java agents

A simple program

Agent basic structure

Byte code swizzling

Deploying and using the agent

Summing up

Aapt2: Tower of Babel

Aapt2: Please don't delete me!

How I learned to love aapt2

Dynamic Method Tracing in Java: The Implementation

OnLoad: Config, capabilities, events, and callbacks

Registering method breakpoints

Compiling and running the agent

Summing up

Dynamic method tracing in in Java

Just add logging (or println!)

Just use dtruss / strace?

Enter jvmti!

Configurable debugging

And this is a method I want to log with its super ugly JVM signature

Please show me an abbreviated stack trace for each call

And use the getPath() method to convert the parameter to a String

Hello again :)

Cookie handling with Apache XML-RPC

Find a file inside a zip

Listing Chinese Words by Frequency of Use

Getting serious about Mandarin

Still a bit proud

The Consequences of Job Hopping

Brings back memories

Shirley that's not Shirley

Nooglerization done (but is it ever, I wonder... ;) )

On shortcuts

AWT Freudian Slips...