<?xml version="1.0" encoding="UTF-8"?><rss xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:atom="http://www.w3.org/2005/Atom" version="2.0"><channel><title><![CDATA[Duffblog]]></title><description><![CDATA[I'm Brian Duff, a Scot 🏴󠁧󠁢󠁳󠁣󠁴󠁿 living and working in the California Bay Area. I've worked, written, and presented about technology since the 90s, but my ]]></description><link>https://duff.blog</link><generator>RSS for Node</generator><lastBuildDate>Thu, 23 Apr 2026 03:14:37 GMT</lastBuildDate><atom:link href="https://duff.blog/rss.xml" rel="self" type="application/rss+xml"/><language><![CDATA[en]]></language><ttl>60</ttl><item><title><![CDATA[Showstopper!]]></title><description><![CDATA[I've changed companies again! I seem to be doing this a lot lately, but there are (as always) good reasons for it. This time around, the nice folks at LinkedIn reached out about a Distinguished Engineer role they had in the developer productivity spa...]]></description><link>https://duff.blog/showstopper</link><guid isPermaLink="true">https://duff.blog/showstopper</guid><category><![CDATA[Microsoft]]></category><category><![CDATA[LinkedIn]]></category><dc:creator><![CDATA[Brian Duff]]></dc:creator><pubDate>Wed, 15 Nov 2023 08:16:45 GMT</pubDate><content:encoded><![CDATA[<p>I've changed companies again! I seem to be doing this a lot lately, but there are (as always) good reasons for it. This time around, the nice folks at LinkedIn reached out about a Distinguished Engineer role they had in the developer productivity space. It seemed like such a perfect fit for me, that I couldn't say no. I had a tremendously rewarding experience going through the interview process and got to meet many of LinkedIn's most senior technical leaders. They were direct and candid, and I liked talking with them a lot.</p>
<p>One thing that's especially interesting about this particular job is that I finally (in an indirect kind of way) also now work for Microsoft. I didn't contemplate this much during the interview process, but since I accepted and started sharing the news with folks, several people reacted, "Ah... that means you're working at Microsoft now?".</p>
<p>Back in 1994, when I was just starting my first year of University, I read "Showstopper!" by G. Pascal Zachary. It painted an honestly slightly scary picture of engineers at Microsoft with very little of what we'd now call work/life balance, during the period when Windows NT was coming into existence. At that time, and in that very different world, I was a huge fan of Microsoft. Despite depicting such harrowing things as a person whose marriage ended because they were working too hard, the book had a weird effect - it ignited a journey that would lead to 25 years of working at various technology companies, and propel me thousands of miles from home to Silicon Valley.</p>
<p>There was something compelling - not about the workaholism, but about the sheer passion with which those individuals and teams took on their work. I wanted nothing more than to be part of something monumental like that. In some strange way, hard work doesn't feel like a grind when you're doing things that are truly meaningful and you're empowered to make a difference. Maybe not coincidentally, I'm feeling that way about my new job, early as it is.</p>
<p>I'm in a very, very different place now, and Microsoft is in a very, very different place, and I don't actually work <em>in</em> Microsoft, even though Microsoft is the parent company of the place I do work. I get to visit Redmond in a few weeks as an extended employee for a summit, and it's something I'm <em>awfully</em> excited about. Even though I've never been there before, it's oddly like coming home to a formative idea of what it was like to work in the software industry.</p>
]]></content:encoded></item><item><title><![CDATA[RTO at the Googleplex]]></title><description><![CDATA[I've been back at Google for a year already - it's hard to believe the time has passed so quickly. It will soon also be the 15th anniversary since the day I first joined Google in 2008.
A lot has changed. It's a bit of an eerie experience at times - ...]]></description><link>https://duff.blog/rto-at-the-googleplex</link><guid isPermaLink="true">https://duff.blog/rto-at-the-googleplex</guid><category><![CDATA[Google]]></category><category><![CDATA[Office furniture ]]></category><category><![CDATA[tech industry]]></category><dc:creator><![CDATA[Brian Duff]]></dc:creator><pubDate>Sat, 08 Jul 2023 09:12:00 GMT</pubDate><content:encoded><![CDATA[<p>I've been back at Google for a year already - it's hard to believe the time has passed so quickly. It will soon also be the 15th anniversary since the day I first joined Google in 2008.</p>
<p>A lot has changed. It's a bit of an eerie experience at times - the company itself is very different, but there's also this uncanny feeling when I walk around the hallways, especially at the Mountain View campus.</p>
<p>I found myself wandering around Building 900 the other day - the building that housed the DevTools team when I joined in 2008 (they'd just moved there from the beating heart of Google in Building 43 on the Googleplex, and I remember some were a bit unhappy about that shift and what it meant). This was during a golden period for DevTools - Blaze was quickly becoming the main build tool for google3, and the years immediately following would see significant improvements, with Forge and Objfs turning google3 into quite the impressive (and scalable) monorepo. It was also the year Chrome launched, and learning about it and playing with it before its release was quite exciting.</p>
<p>Like most Google buildings, 900 has changed enormously since 2008 on the inside. It's pleasant (REWS, the real estate folks at Google, do a tremendous job of making the buildings very stylish - things felt much more cheap &amp; cheerful and weirdly less well-lit when I joined), but entire office configurations have disappeared. The area where I sat when I first joined is now a bit of a hallway. The nearby micro kitchen is still in the same place, and it feels quite familiar, albeit with far fewer snacks than it had 15 years ago.</p>
<p>There's a window on the second floor of 900, designed in a way that looking out of it feels a bit like peering out of the laser turret of the Millennium Falcon. Back in 2008, the window was always covered in dusty cobwebs. The atrium it sat in was kind of dark, with a broken-down arcade machine and a bright red sofa made of the scratchy material you could find all over the Googleplex in those days. I used to go there to eat MK snacks (again, they had quite yummy ones at the time) and maybe read a bit of the late Bob Lee's book on Guice. Because Guice was the hotness then. Now it's sparkly clean, bright, and full of modern, expensive office furniture. When I sat there the other day, I sort of felt the ghost of my old self, that old place and company, and the magic it had. I overheard two Googlers bemoaning the recent RTO announcement and how all that time spent commuting would be such a waste of time they could spend being productive.</p>
<p>I have mixed feelings about the enforced RTO thing - I'm a parent, so I enjoy the flexibility of hybrid (while simultaneously realizing that we always had a fair amount of flexibility before; it's just formalized now). I'm also a dinosaur, so I'm just used to being in an office and collaborating with people by, like, talking with them.</p>
<p>When I heard that conversation, though, I felt a mild sense of loss for a time when we were just excited beyond belief to be <em>here at Google</em>. I don't think it'll ever again have quite the same sense of magic and specialness as it did in those days, and that's ok. Even knowing it's different now, I'm glad to have lived in that time, and it has certainly been nice coming back and getting the chance to see it all once again.</p>
]]></content:encoded></item><item><title><![CDATA[Here be dragons]]></title><description><![CDATA[When you first join a new org or company, draw a map.
Months into some roles, I've made faulty assumptions due to my understanding of organizational relationships. I'd wonder why a function was inefficiently split across two orgs, only to find that i...]]></description><link>https://duff.blog/here-be-dragons</link><guid isPermaLink="true">https://duff.blog/here-be-dragons</guid><category><![CDATA[leadership]]></category><category><![CDATA[Collaboration]]></category><dc:creator><![CDATA[Brian Duff]]></dc:creator><pubDate>Tue, 24 May 2022 17:00:23 GMT</pubDate><content:encoded><![CDATA[<p>When you first join a new org or company, draw a map.</p>
<p>Months into some roles, I've made faulty assumptions because of gaps in my understanding of organizational relationships. I'd wonder why a function was inefficiently split across two orgs, only to find that it gained a different kind of efficiency from that split. I'd ponder why there was no team to solve some problem, when in fact there was - they just weren't solving it, merely "licking the cookie".</p>
<p>I'm working on getting better at drawing organizational maps. It's just a bunch of diagrams, notes, and annotations about who does what at an individual and team level, and how things are related. </p>
<p>Figure out where the unexplored territories are. Identify who to talk to in those teams. Mark up parts of the map with "here be dragons" based on the conversations you have. Form questions to ask, or ideas about taming the dragons. Six months from now, your first versions of the map will look like crude cave paintings. The orgs and people will change. That's ok - keep revising the map as you learn more.</p>
<p>There's usually more than one way to visualize the teams and individuals that make up an organization. Don't be satisfied with org charts alone as a primary structure. The real value is in the "shadow graph" through which influence and support flow.</p>
]]></content:encoded></item><item><title><![CDATA[Instinct]]></title><description><![CDATA[It just doesn't feel right.
Your instinct is telling you it's wrong.
But you don't trust it. You're rational. You believe in the scientific method. Cause and effect. Engineers don't operate on instinct. We're data driven. We look for root causes. We ...]]></description><link>https://duff.blog/instinct</link><guid isPermaLink="true">https://duff.blog/instinct</guid><category><![CDATA[leadership]]></category><category><![CDATA[problem solving skills]]></category><dc:creator><![CDATA[Brian Duff]]></dc:creator><pubDate>Fri, 20 May 2022 18:21:04 GMT</pubDate><content:encoded><![CDATA[<p>It just doesn't feel right.</p>
<p>Your instinct is telling you it's wrong.</p>
<p>But you don't trust it. You're rational. You believe in the scientific method. Cause and effect. Engineers don't operate on instinct. We're data driven. We look for root causes. We analyze and determine outcomes.</p>
<p>It took me a while to learn the importance of instinct in technical work. An instinctive sense is often the result of accumulated experience, and I can't always immediately explain the link between something I've experienced before and an instinctive insight. When I changed my perspective and began seeing instinct as an incredible tool, I became a much better engineer. Instinct can hugely reduce the investigation space when you're trying to root-cause a problem. It can help you sniff out smells and identify opportunities for improvement.</p>
<p>It's only part of the story, though. Convincing others on technical matters with instinct alone is a difficult path; technical influence rightly prioritizes evidence backed by data. As a technical contributor, evidence should be key to your own decision making too. Instinct can result from bias - bias is also a product of accumulated experience, but filtered through your internal voice. Evidence helps you check those biases.</p>
<p>Use instinct as a shortcut to trim the solution space, or find different ways to think about a problem. After that, go do the (often) grungy work of actually finding and presenting the data that proves or disproves your instinct before trying to convince others.</p>
]]></content:encoded></item><item><title><![CDATA[Feedback]]></title><description><![CDATA[You feel like you did OK, but the outcome isn't what you were anticipating. 
It often means you didn't do as well as you thought. There may be a slight, but correctable gap between your situational perception and reality.
Sometimes external factors m...]]></description><link>https://duff.blog/feedback</link><guid isPermaLink="true">https://duff.blog/feedback</guid><dc:creator><![CDATA[Brian Duff]]></dc:creator><pubDate>Tue, 04 Jan 2022 20:13:34 GMT</pubDate><content:encoded><![CDATA[<p>You feel like you did OK, but the outcome isn't what you were anticipating. </p>
<p>It often means you didn't do as well as you thought. There may be a slight but correctable gap between your perception of the situation and reality.</p>
<p>Sometimes external factors mean that no matter what you did, the outcome was going to be the same. But when success truly depends on how you interact, you have a lot of control. I think it's quite healthy to believe you have control in most such situations: it leads to reflection and improvement instead of a sense of fatalism about external circumstances.</p>
<p>From time to time, others will give you direct feedback about where you fell short. It can be hard to hear, but it's almost always a gift. I've had this happen in interview situations, or in fit calls with candidates where they ultimately decided to choose another company or team. When you get such feedback, the extent to which you have to guess is reduced. It's enormously helpful emotionally, because guessing can sometimes lead to quite a lot of self doubt. </p>
<p>There have been times in the past where I did a bad job of accepting such feedback even though it was spot on. Ego is a terrible trap, but the good news is that if you're open to hearing feedback and genuinely want to get better, it gets easier to hear what you need to hear.</p>
<p>Giving feedback is a great service you can provide to others. The best mentors and leaders consistently provide such feedback in a constructive way. Nowadays, I feel blessed when people give me feedback directly and with candor. I appreciate it much more than hearing someone invoke The System that's preventing a favorable outcome, whatever that may be: the promotion quotas, or the bell curve, or the hiring constraints. It's almost always easy to deconstruct "it's the system" rationales and show with counterexamples that they're an excuse.</p>
<p>I don't like making resolutions that much - the new year is such an arbitrary excuse. That said, one thing I'd like to do more of this year is to look for opportunities to give better feedback to others, and to avoid the urge to blame systems.</p>
]]></content:encoded></item><item><title><![CDATA[Data Studio Custom Visualizations using React]]></title><description><![CDATA[Google Data Studio is pretty easy to use for simple analytics dashboarding. Its standard visualization components are pretty comprehensive for most of the simple problems I try to solve. When you want to do something more complex, you can build custo...]]></description><link>https://duff.blog/data-studio-visualizations-react</link><guid isPermaLink="true">https://duff.blog/data-studio-visualizations-react</guid><category><![CDATA[React]]></category><category><![CDATA[JSX]]></category><category><![CDATA[#data visualisation]]></category><category><![CDATA[google cloud]]></category><dc:creator><![CDATA[Brian Duff]]></dc:creator><pubDate>Wed, 24 Nov 2021 00:53:01 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1637715283938/Qxr4VGTQ1.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p><a target="_blank" href="https://developers.google.com/datastudio">Google Data Studio</a> is pretty easy to use for simple analytics dashboarding. Its standard visualization components are pretty comprehensive for most of the simple problems I try to solve. When you want to do something more complex, you can build custom visualizations with HTML or SVG using  <a target="_blank" href="https://developers.google.com/datastudio/visualization">Community Visualizations</a> .</p>
<p>The  <a target="_blank" href="https://developers.google.com/datastudio/visualization/write-viz">documentation</a> for writing visualizations is great, but the examples involve direct DOM manipulation:</p>
<pre><code class="lang-javascript"><span class="hljs-keyword">var</span> chartElement = <span class="hljs-built_in">document</span>.createElement(<span class="hljs-string">'div'</span>);
chartElement.id = <span class="hljs-string">'myViz'</span>;
<span class="hljs-built_in">document</span>.body.appendChild(chartElement);
</code></pre>
<p>I recently worked on a complex visualization for a test flakiness dashboard, and found it convenient to use  <a target="_blank" href="https://reactjs.org">React</a> and  <a target="_blank" href="https://reactjs.org/docs/introducing-jsx.html">JSX</a>  instead of hand crafting DOM nodes. In this post, I'll develop a simple community visualization that uses React. In a later blog, I'll expand this example to show how to populate it with a data source from Data Studio.</p>
<p>You can find the code for this example over on <a target="_blank" href="https://github.com/brianduff/datastudio-react">brianduff/datastudio-react</a> on GitHub.</p>
<h2 id="heading-setting-up">Setting up</h2>
<p>I'm going to use TypeScript, but you can use plain old JavaScript if you want.</p>
<pre><code class="lang-bash">npm init -y

<span class="hljs-comment"># Install the DSCC library and react</span>
npm install @google/dscc react react-dom
npm install --save-dev @types/react @types/react-dom

<span class="hljs-comment"># Install and initialize TypeScript (optional)</span>
npm install --save-dev typescript
npx tsc --init
</code></pre>
<p>Among the things this generates is the <code>tsconfig.json</code> file. We'll make a few minor changes to it. Here's the complete file:</p>
<pre><code class="lang-json">{
  <span class="hljs-attr">"compilerOptions"</span>: {
    <span class="hljs-attr">"target"</span>: <span class="hljs-string">"es2016"</span>,
    <span class="hljs-attr">"module"</span>: <span class="hljs-string">"commonjs"</span>,
    <span class="hljs-attr">"esModuleInterop"</span>: <span class="hljs-literal">true</span>,
    <span class="hljs-attr">"forceConsistentCasingInFileNames"</span>: <span class="hljs-literal">true</span>,
    <span class="hljs-attr">"strict"</span>: <span class="hljs-literal">true</span>,
    <span class="hljs-attr">"skipLibCheck"</span>: <span class="hljs-literal">true</span>,
    <span class="hljs-comment">// Enable jsx</span>
    <span class="hljs-attr">"jsx"</span>: <span class="hljs-string">"react-jsx"</span>,
    <span class="hljs-comment">// Send output to a different folder</span>
    <span class="hljs-attr">"outDir"</span>: <span class="hljs-string">"built-tsc"</span>
  },
  <span class="hljs-comment">// Tell tsc to find our code in a folder called src.</span>
  <span class="hljs-attr">"include"</span>: [<span class="hljs-string">"src"</span>]
}
</code></pre>
<h2 id="heading-create-the-visualization">Create the visualization</h2>
<p>Let's create a really simple visualization component to get started in <code>src/Hello.tsx</code>:</p>
<pre><code class="lang-jsx"><span class="hljs-keyword">export</span> <span class="hljs-function"><span class="hljs-keyword">function</span> <span class="hljs-title">Hello</span>(<span class="hljs-params"></span>) </span>{
  <span class="hljs-keyword">return</span> <span class="xml"><span class="hljs-tag">&lt;<span class="hljs-name">div</span>&gt;</span>Hello!<span class="hljs-tag">&lt;/<span class="hljs-name">div</span>&gt;</span></span>
}
</code></pre>
<p>Now, we'll create the entry point that integrates with Data Studio's DSCC library to embed our component in <code>src/index.tsx</code>. There's a bit of unavoidable DOM nastiness here, but it's confined to this one place.</p>
<pre><code class="lang-jsx"><span class="hljs-keyword">import</span> { ObjectFormat, objectTransform, subscribeToData } <span class="hljs-keyword">from</span> <span class="hljs-string">'@google/dscc'</span>
<span class="hljs-keyword">import</span> { Hello } <span class="hljs-keyword">from</span> <span class="hljs-string">'./Hello'</span>
<span class="hljs-keyword">import</span> ReactDOM <span class="hljs-keyword">from</span> <span class="hljs-string">'react-dom'</span>

<span class="hljs-keyword">export</span> <span class="hljs-function"><span class="hljs-keyword">function</span> <span class="hljs-title">drawViz</span>(<span class="hljs-params">data: ObjectFormat</span>) </span>{
  <span class="hljs-comment">// Insert or replace the visualization element</span>
  <span class="hljs-keyword">let</span> element = <span class="hljs-built_in">document</span>.getElementById(<span class="hljs-string">'viz'</span>)
  <span class="hljs-keyword">if</span> (element) {
    element.parentNode?.removeChild(element)
  }
  element = <span class="hljs-built_in">document</span>.createElement(<span class="hljs-string">'div'</span>)
  element.setAttribute(<span class="hljs-string">"id"</span>, <span class="hljs-string">"viz"</span>)
  <span class="hljs-built_in">document</span>.body.appendChild(element)

  <span class="hljs-comment">// Actually render our component</span>
  ReactDOM.render(<span class="xml"><span class="hljs-tag">&lt;<span class="hljs-name">Hello</span> /&gt;</span></span>, element)
}

<span class="hljs-comment">// Connect our drawViz function to Data Studio</span>
subscribeToData(drawViz, { <span class="hljs-attr">transform</span>: objectTransform })
</code></pre>
<p>At this point, we can check that the code compiles using <code>npx tsc</code>. It should emit a <code>built-tsc</code> folder containing JavaScript versions of the above code.</p>
<h2 id="heading-bundling-with-webpack">Bundling with webpack</h2>
<p>In order to actually deploy the visualization, Data Studio needs it bundled into a single self-contained JavaScript file along with all of its dependencies. For us, this includes the React and JSX runtimes as well as the DSCC library itself.  <a target="_blank" href="https://webpack.js.org/">Webpack</a>  is designed exactly for this kind of use case, so we'll use it here. </p>
<p>Let's install webpack:</p>
<pre><code class="lang-bash">npm install --save-dev webpack webpack-cli
</code></pre>
<p>Then, we'll create a simple webpack config file in <code>webpack.config.js</code>:</p>
<pre><code class="lang-javascript"><span class="hljs-keyword">const</span> path = <span class="hljs-built_in">require</span>(<span class="hljs-string">'path'</span>)
<span class="hljs-keyword">const</span> isProduction = process.env.NODE_ENV == <span class="hljs-string">'production'</span>

<span class="hljs-keyword">const</span> config = {
  <span class="hljs-attr">entry</span>: <span class="hljs-string">'./built-tsc/index.js'</span>,
  <span class="hljs-attr">output</span>: {
    <span class="hljs-attr">path</span>: path.resolve(__dirname, <span class="hljs-string">'dist'</span>),
    <span class="hljs-attr">filename</span>: <span class="hljs-string">'viz.js'</span>,
  }
}

<span class="hljs-built_in">module</span>.exports = <span class="hljs-function">() =&gt;</span> {
  config.mode = isProduction ? <span class="hljs-string">"production"</span> : <span class="hljs-string">"development"</span>
  <span class="hljs-keyword">return</span> config
}
</code></pre>
<p>We can now generate a single <code>dist/viz.js</code> containing the visualization and its dependencies:</p>
<pre><code class="lang-bash">npx webpack
</code></pre>
<h2 id="heading-writing-the-manifest-and-config">Writing the manifest and config</h2>
<p>When we deploy custom components for Data Studio, we can actually deploy a set of components in a library. We describe the whole set of components in a  <a target="_blank" href="https://developers.google.com/datastudio/visualization/manifest-reference">manifest</a> file, and each individual component includes a  <a target="_blank" href="https://developers.google.com/datastudio/visualization/config-reference">config</a> file. Let's go ahead and create these config files for our component.</p>
<p><code>manifest.json</code> describes our whole "library" of components. We'll come back to <code>BUCKETNAME</code> and what it means later.</p>
<pre><code class="lang-json">{
  <span class="hljs-attr">"name"</span>: <span class="hljs-string">"Example Visualizations"</span>,
  <span class="hljs-attr">"organization"</span>: <span class="hljs-string">"Brian Duff"</span>,
  <span class="hljs-attr">"description"</span>: <span class="hljs-string">"Example visualizations using React and JSX"</span>,
  <span class="hljs-attr">"logoUrl"</span>: <span class="hljs-string">"https://img.icons8.com/dotty/80/000000/view-file.png"</span>,
  <span class="hljs-attr">"packageUrl"</span>: <span class="hljs-string">"https://duff.blog"</span>,
  <span class="hljs-attr">"supportUrl"</span>: <span class="hljs-string">"https://duff.blog"</span>,
  <span class="hljs-attr">"components"</span>: [
    {
      <span class="hljs-attr">"id"</span>: <span class="hljs-string">"hello"</span>,
      <span class="hljs-attr">"name"</span>: <span class="hljs-string">"Hello"</span>,
      <span class="hljs-attr">"description"</span>: <span class="hljs-string">"Just says hello"</span>,
      <span class="hljs-attr">"iconUrl"</span>: <span class="hljs-string">"https://img.icons8.com/material/24/000000/hello.png"</span>,
      <span class="hljs-attr">"resource"</span>: {
        <span class="hljs-attr">"js"</span>: <span class="hljs-string">"gs://BUCKETNAME/hello/viz.js"</span>,
        <span class="hljs-attr">"config"</span>: <span class="hljs-string">"gs://BUCKETNAME/hello/hello.config.json"</span>
      }
    }
  ]
}
</code></pre>
<p>And <code>hello.config.json</code> describes this simple component we've created. For now, we'll create a config with a simple dimension. We'll customize this in later blog posts.</p>
<pre><code class="lang-json">{
  <span class="hljs-attr">"data"</span>: [
    {
      <span class="hljs-attr">"id"</span>: <span class="hljs-string">"data"</span>,
      <span class="hljs-attr">"label"</span>: <span class="hljs-string">"Data"</span>,
      <span class="hljs-attr">"elements"</span>: [
        {
          <span class="hljs-attr">"id"</span>: <span class="hljs-string">"someDimension"</span>,
          <span class="hljs-attr">"label"</span>: <span class="hljs-string">"A Dimension"</span>,
          <span class="hljs-attr">"type"</span>: <span class="hljs-string">"DIMENSION"</span>
        }
      ]
    }
  ]
}
</code></pre>
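<p>As a tiny preview of where this is going (the follow-up post covers wiring up data properly), with <code>objectTransform</code> each row in <code>data.tables.DEFAULT</code> arrives as an object keyed by the element ids from this config, with the raw values in arrays. A small helper - purely a sketch based on my reading of the DSCC docs, so double check the shape against your own data - can flatten that into something easy to render:</p>
<pre><code class="lang-typescript">// Row shape produced by dscc's objectTransform (as I understand it):
// each row maps config element ids to arrays of raw values.
type Row = { [id: string]: string[] }

// Pull the first value of a given element id out of each row,
// falling back to an empty string when a row has no value.
export function extractDimension(rows: Row[], id: string): string[] {
  return rows.map(row => (row[id] ?? [""])[0] ?? "")
}
</code></pre>
<p>The <code>Hello</code> component could then map over the returned strings instead of rendering a static greeting.</p>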
<h2 id="heading-deploying-to-google-cloud-storage">Deploying to Google Cloud Storage</h2>
<p>Data Studio loads custom components from  <a target="_blank" href="https://cloud.google.com/storage">Google Cloud Storage</a>. To deploy our custom component, we must upload the <code>manifest.json</code>, <code>viz.js</code>, and <code>hello.config.json</code> files to a Cloud Storage bucket. Create a new bucket following the instructions in  <a target="_blank" href="https://cloud.google.com/storage/docs/creating-buckets">Creating a storage bucket</a>, and make a note of your bucket name. You'll want to modify <code>manifest.json</code> to replace <code>BUCKETNAME</code> with the actual name of the bucket you created.</p>
<p>After that, you can upload using the <code>gsutil</code> command, which you should have installed as part of the bucket creation instructions. </p>
<p>First, you'll want to change the bucket's access controls to allow public reads. New buckets in GCS are created with uniform bucket-level access, which means you can't set permissions on individual files in the bucket, and objects aren't publicly visible by default. The Data Studio instructions haven't been updated to account for this yet; the <code>-a public-read</code> option to <code>gsutil cp</code> won't work. Let's go ahead and make our new bucket readable by the public (replace BUCKETNAME with the name of your bucket):</p>
<pre><code class="lang-bash">gsutil iam ch allUsers:objectViewer gs://BUCKETNAME
</code></pre>
<p>Now, let's organize our files the way we want to upload them:</p>
<pre><code class="lang-bash">mkdir -p deploy/hello &amp;&amp; \
  cp manifest.json deploy/ &amp;&amp; \
  cp dist/viz.js deploy/hello/ &amp;&amp; \
  cp hello.config.json deploy/hello/
</code></pre>
<p>After that, we can copy the files up to GCS like so:</p>
<pre><code class="lang-bash">gsutil cp -r deploy/* gs://BUCKETNAME
</code></pre>
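<p>If you want to sanity check the upload before heading to Data Studio, listing the bucket recursively should show <code>manifest.json</code> at the top level and the component files under <code>hello/</code>:</p>
<pre><code class="lang-bash">gsutil ls -r gs://BUCKETNAME
</code></pre>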
<h2 id="heading-trying-it-out">Trying it out</h2>
<p>We should be able to try the component out in Data Studio now. Go over to https://datastudio.google.com and create a new blank report. Add a data source to your report (it can just be a Google Sheet). Click on the Community Visualizations and Components toolbar button:</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1637713221996/G4dmK-Lam.png" alt="Screen Shot 2021-11-23 at 4.20.07 PM.png" /></p>
<p>Click "+ Explore More", then click the "Build your own visualization" button in the Community Gallery. In the Manifest path, type the bucket URL of your manifest:</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1637714012088/sD_KROSwv.png" alt="Screen Shot 2021-11-23 at 4.33.23 PM.png" /></p>
<p>Note that you're entering the bucket path (i.e. <code>gs://BUCKETNAME</code>), <strong>not</strong> the path of the actual manifest.json file within it (e.g. <code>gs://BUCKETNAME/manifest.json</code>). This tripped me up when I first tried it, and the error message is quite opaque.</p>
<p>Click on the Hello component and grant it permission, and you should see it render in your report:</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1637714147504/f_L1UYw9fI.png" alt="Screen Shot 2021-11-23 at 4.35.39 PM.png" /></p>
<h2 id="heading-scripts-to-make-life-easier">Scripts to make life easier</h2>
<p>Let's make things a bit easier for future development by adding some scripts to <code>package.json</code>. In the scripts section of this file, we'll add the following. (Note that the copy step is named <code>stage</code> rather than <code>prepare</code>: npm treats <code>prepare</code> as a lifecycle script and runs it automatically during <code>npm install</code>, which would fail before the first build. The build script also runs webpack, so that <code>dist/viz.js</code> actually exists when we copy it.)</p>
<pre><code class="lang-json">  <span class="hljs-string">"scripts"</span>: {
    <span class="hljs-attr">"build"</span>: <span class="hljs-string">"npx tsc &amp;&amp; npx webpack"</span>,
    <span class="hljs-attr">"stage"</span>: <span class="hljs-string">"mkdir -p deploy/hello &amp;&amp; cp manifest.json deploy/ &amp;&amp; cp dist/viz.js deploy/hello/ &amp;&amp; cp hello.config.json deploy/hello/"</span>,
    <span class="hljs-attr">"deploy"</span>: <span class="hljs-string">"npm run build &amp;&amp; npm run stage &amp;&amp; gsutil cp -r deploy/* gs://BUCKETNAME"</span>
  },
</code></pre>
<p>Now we can build and deploy in one step with <code>npm run deploy</code>.</p>
<h2 id="heading-wrapping-up">Wrapping up</h2>
<p>Phew - that was a lot of setup for a glorified Hello World :P However, it's now pretty easy to write quite complex custom visualization components using React that work well with Data Studio. In future blogs, I'll explore more of what you can do.</p>
<p>The code for this simple project might serve as a good starting point for your own visualizations. You can grab the code at <a target="_blank" href="https://github.com/brianduff/datastudio-react">brianduff/datastudio-react</a>, and let me know in the comments if you run into any issues or have questions!</p>
]]></content:encoded></item><item><title><![CDATA[Cloud Functions with TypeScript]]></title><description><![CDATA[Here's a small  sample that uses TypeScript + Cloud Functions 
I have a complicated relationship with JavaScript. I'm old enough to remember when it first appeared in Netscape Navigator back when the web was truly tiny and everything had a kind of bo...]]></description><link>https://duff.blog/cloud-functions-with-typescript</link><guid isPermaLink="true">https://duff.blog/cloud-functions-with-typescript</guid><category><![CDATA[TypeScript]]></category><category><![CDATA[serverless]]></category><category><![CDATA[google cloud]]></category><category><![CDATA[Node.js]]></category><dc:creator><![CDATA[Brian Duff]]></dc:creator><pubDate>Sun, 24 Oct 2021 09:07:17 GMT</pubDate><content:encoded><![CDATA[<p>Here's a small  <a target="_blank" href="https://github.com/brianduff/cloudfunctions">sample that uses TypeScript + Cloud Functions</a> </p>
<p>I have a complicated relationship with JavaScript. I'm old enough to remember when it first appeared in Netscape Navigator back when the web was truly tiny and everything had a kind of boring gray background, and the days when "Dynamic HTML" started to become a thing. It was awful back then. Truly, terribly awful.</p>
<p>At some point in the last few years, I rediscovered JavaScript through the world of Node.js and React, and I can't really imagine using anything other than React to build web apps these days (I tried Vue.js for at least one project, and found it quite lacking).</p>
<p>But the lack of strong typing is bothersome. I successfully strolled along for a while trying to get away with writing React and Node apps without configuring TypeScript, but now I'm all in on TypeScript.</p>
<p>When it comes to the backend, I oscillate a bit between using Rust+Rocket and Node.js with Express. But recently, I've been poking around increasingly with <a target="_blank" href="https://developers.google.com/learn/topics/functions">Google's Cloud Functions</a>. The out-of-the-box instructions, though, don't do a very good job of explaining how to get things working with TypeScript (there are some <a target="_blank" href="https://firebase.google.com/docs/functions/typescript">instructions</a>, but they're for Firebase, which is similar, but different enough that it doesn't quite work if you just use Google Cloud directly). </p>
<p>It turns out to be relatively easy to set up, so I wrote it down for posterity.</p>
<h3 id="setting-up-the-project">Setting up the project</h3>
<p>First off, do the usual npm initialization dance with <code>npm init -y</code>.</p>
<p>We're gonna want to install some deps. You'll find the <code>functions-framework</code> from Google useful, since it'll let you do local development with Cloud Functions, and it also provides types. As always, we'll want <code>typescript</code> itself as a dev dependency:</p>
<pre><code><span class="hljs-built_in">npm</span> install --save-dev @google-cloud/functions-framework typescript
</code></pre><p>Next, let's set up TypeScript. This'll create a <code>tsconfig.json</code> file.</p>
<pre><code>npx tsc <span class="hljs-comment">--init</span>
</code></pre><p>I like to put my code in a <code>src</code> folder, and have it output to a separate <code>ts-built</code> folder. Here's a minimal <code>tsconfig.json</code> file that will achieve that:</p>
<pre><code class="lang-json">{
  <span class="hljs-attr">"compilerOptions"</span>: {
    <span class="hljs-attr">"target"</span>: <span class="hljs-string">"es6"</span>,
    <span class="hljs-attr">"module"</span>: <span class="hljs-string">"commonjs"</span>,
    <span class="hljs-attr">"moduleResolution"</span>: <span class="hljs-string">"node"</span>,
    <span class="hljs-attr">"esModuleInterop"</span>: <span class="hljs-literal">true</span>,
    <span class="hljs-attr">"rootDir"</span>: <span class="hljs-string">"src"</span>,
    <span class="hljs-attr">"outDir"</span>: <span class="hljs-string">"ts-built"</span>
  }
}
</code></pre>
<p>Next, let's write a simple cloud function in <code>src/index.ts</code>. Sssh, we're not using any types yet, but as usual with TypeScript, that's ok:</p>
<pre><code class="lang-typescript"><span class="hljs-comment">// src/index.ts</span>
<span class="hljs-keyword">export</span> <span class="hljs-keyword">const</span> hello = <span class="hljs-function">(<span class="hljs-params">req, res</span>) =&gt;</span> res.send(<span class="hljs-string">"Hello!"</span>)
</code></pre>
<p>Let's check if it compiles:</p>
<pre><code class="lang-bash">npx tsc
</code></pre>
<p>If all is well, you should see a <code>ts-built/index.js</code> file that contains the compiled JavaScript code.</p>
<p>It'd be cool to see if our Cloud Function is working with the framework. Before we can do that, we need to tell Cloud Functions where to find our <code>index.js</code> file. Update <code>package.json</code> to set the <code>main</code> property to its path:</p>
<pre><code class="lang-json">{
...
   <span class="hljs-attr">"main"</span>: <span class="hljs-string">"ts-built/index.js"</span>
...
}
</code></pre>
<p>Now, we can run the framework:</p>
<pre><code>$ npx functions-framework --target=hello
Serving <span class="hljs-keyword">function</span>...
<span class="hljs-built_in">Function</span>: hello
Signature <span class="hljs-keyword">type</span>: http
URL: http:<span class="hljs-comment">//localhost:8080/</span>
</code></pre><p>If you visit the localhost URL, you should be greeted with the expected friendly message.</p>
<h3 id="deploying-to-google-cloud">Deploying to Google Cloud</h3>
<p>When we deploy this to Google Cloud, it will run <code>npm ci</code> to build the project. That's a stricter cousin of <code>npm install</code> that does a clean install of exactly what the lockfile specifies. This isn't enough - we need it to also run the TypeScript compiler, otherwise <code>index.js</code> won't exist when the code is up in the Cloud. Luckily, Google Cloud provides a hook to perform extra steps during the build. In <code>package.json</code>, we need to add a <code>gcp-build</code> script. We'll add a <code>build</code> script and a <code>start</code> script to make local development easier too:</p>
<pre><code class="lang-json">{
  <span class="hljs-attr">"scripts"</span>: {
    <span class="hljs-attr">"gcp-build"</span>: <span class="hljs-string">"npm run build"</span>,
    <span class="hljs-attr">"build"</span>: <span class="hljs-string">"tsc"</span>,
    <span class="hljs-attr">"start"</span>: <span class="hljs-string">"npm run build &amp;&amp; npx @google-cloud/functions-framework --target=hello"</span>
  }
}
</code></pre>
<p>Before deploying, it's a good idea to check that it works with <code>npm run gcp-build</code>.</p>
<p>To deploy, you'll need to create a Google Cloud project and enable the APIs - the stuff in the "Before you begin" section in <a target="_blank" href="https://cloud.google.com/functions/docs/quickstart-nodejs#before-you-begin">this document</a>. You'll also need to install the <a target="_blank" href="https://cloud.google.com/sdk/docs/install">Cloud SDK</a> locally so you can use the <code>gcloud</code> command. Then authorize and select your project using (replace PROJECTNAME):</p>
<pre><code class="lang-bash">gcloud auth login
gcloud config <span class="hljs-built_in">set</span> project PROJECTNAME
</code></pre>
<p>Now you can deploy with:</p>
<pre><code class="lang-bash">gcloud <span class="hljs-built_in">functions</span> deploy simple-function --entry-point hello \
    --allow-unauthenticated --trigger-http --runtime nodejs16
</code></pre>
<p>It'll churn away for a wee while doing its thing, but after it's done you should be able to go to the console to find out the URL of your new function. In my case, this is <code>https://us-central1-dubh-cloud.cloudfunctions.net/simple-function</code>.</p>
<h3 id="actually-using-types">Actually using types!</h3>
<p>Ok, so far we've got it working end to end with TypeScript, but we didn't actually use types. You don't have to, of course, but if you want to, you can be more specific. For example, the <code>functions-framework</code> defines a type called <code>HttpFunction</code> for cloud functions themselves, and we could write:</p>
<pre><code class="lang-typescript"><span class="hljs-keyword">import</span> { HttpFunction } <span class="hljs-keyword">from</span> <span class="hljs-string">'@google-cloud/functions-framework'</span>;

<span class="hljs-keyword">export</span> <span class="hljs-keyword">const</span> hello: HttpFunction = <span class="hljs-function">(<span class="hljs-params">req, res</span>) =&gt;</span> res.send(<span class="hljs-string">"Hello!"</span>)
</code></pre>
<p>The actual request and response are express types. I'll write an example in a future blog that takes advantage of TypeScript to do much more interesting things. </p>
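<p>As a hedged illustration of what working with those typed request and response objects looks like, here's a small sketch that reads a query parameter. To keep it self-contained it uses minimal structural stand-in types rather than the real framework imports (in a real function you'd use <code>HttpFunction</code> from <code>@google-cloud/functions-framework</code>, as above):</p>

```typescript
// Minimal structural stand-ins for the Express-style Request/Response.
// These are illustrative shapes, not the real framework types.
type Req = { query: Record<string, unknown> };
type Res = { send: (body: string) => void };

export const hello = (req: Req, res: Res) => {
  // Express query values can be string | string[] | undefined, so narrow first.
  const name = typeof req.query.name === "string" ? req.query.name : "world";
  res.send(`Hello, ${name}!`);
};

// Quick local smoke test with a mock response object:
let out = "";
hello({ query: { name: "Brian" } }, { send: (body) => { out = body; } });
console.log(out); // Hello, Brian!
```

Narrowing the query value is the kind of small correctness win that typed handlers buy you for free.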
]]></content:encoded></item><item><title><![CDATA[Lessons from a toy project: Heimdall]]></title><description><![CDATA[https://soundcloud.com/brian-duff-467458622/duffblog-heimdall
After the kids are fast asleep, I often tinker around with toy projects. I have a long history going back to when I first started programming at 8 or so of starting and generally never fin...]]></description><link>https://duff.blog/lessons-from-a-toy-project-heimdall</link><guid isPermaLink="true">https://duff.blog/lessons-from-a-toy-project-heimdall</guid><category><![CDATA[Rust]]></category><category><![CDATA[side project]]></category><category><![CDATA[macOS]]></category><dc:creator><![CDATA[Brian Duff]]></dc:creator><pubDate>Thu, 29 Apr 2021 05:19:08 GMT</pubDate><content:encoded><![CDATA[<div class="embed-wrapper"><div class="embed-loading"><div class="loadingRow"></div><div class="loadingRow"></div></div><a class="embed-card" href="https://soundcloud.com/brian-duff-467458622/duffblog-heimdall">https://soundcloud.com/brian-duff-467458622/duffblog-heimdall</a></div>
<p>After the kids are fast asleep, I often tinker around with toy projects. I have a long history going back to when I first started programming at 8 or so of starting and generally never finishing such projects, but they're inevitably useful for learning new things. </p>
<p>My kids, like many others, have been zooming into school from home for most of the last year. At the start of the lockdown, they were 6 and 7. We'd rarely allowed them to use a computer unattended before. All of a sudden, they were sitting at a desk for 6½ hours a day. </p>
<p>They quickly discovered things like YouTube, and online web-based games. There were good things too (Michael became inordinately good at chess), but we sometimes worried. We especially wanted to limit their access to the computers outside of school hours, because even after 6½ hours, we'd often find them continuing to use their computers for a couple more hours after school. </p>
<p>We set up an elaborate system involving <a target="_blank" href="https://blocksite.co/">Blocksite</a>, and using Google WiFi to block internet access at certain times. However, this wasn't enough: Google WiFi's scheduling controls aren't fine-grained enough, we sometimes had to poke holes in Blocksite to let them use YouTube legitimately for school, and we found that they got around the Internet being unavailable by using Screen Recorder on the Mac to save local copies of YouTube videos (which, despite the terrible audio quality, I thought was quite irritatingly ingenious given their age).</p>
<p>What I really wanted was a way to lock them out of their computer entirely on a schedule. Then a remote control to temporarily unlock them from my cell phone. Finally, to be really cool, we could ideally hook it up to Google Calendar so that it would track when they were and weren't supposed to be using their computer for either school or after school classes.</p>
<p>So ridiculously late in the lockdown (as it turned out mere weeks before they went back to school in person), I hacked together a thing called <a target="_blank" href="https://github.com/brianduff/heimdall">Heimdall</a> that was going to do some of this. </p>
<p>Heimdall is an unfinished lump, but through it I learned some interesting and perhaps useful things, continued my journey getting more experienced with Rust and modern web frameworks, and learned some interesting things about Mac OS that I never knew before. The next sequence of blog posts will be about the things I learned along the way. Anyway as a taste, here's what it looks like:</p>
<div class="embed-wrapper"><div class="embed-loading"><div class="loadingRow"></div><div class="loadingRow"></div></div><a class="embed-card" href="https://www.youtube.com/watch?v=2JCbYFstPG4">https://www.youtube.com/watch?v=2JCbYFstPG4</a></div>
]]></content:encoded></item><item><title><![CDATA[Noogler became n00b became Tweep!]]></title><description><![CDATA[Friday will be the 14th anniversary of my first tweet. Umm... it hasn't aged well. The Internet was stunned by the revelation that I was reading my email.

Reading email and messing around with Eclipse 3.3M5.
— Brian Duff (@brianduff) March 5, 2007

...]]></description><link>https://duff.blog/noogler-became-n00b-became-tweep-1</link><guid isPermaLink="true">https://duff.blog/noogler-became-n00b-became-tweep-1</guid><dc:creator><![CDATA[Brian Duff]]></dc:creator><pubDate>Tue, 02 Mar 2021 16:54:00 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1619327082525/DfAhs0VVA.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Friday will be the 14th anniversary of my first tweet. Umm... it hasn't aged well. The Internet was <em>stunned</em> by the revelation that I was reading my email.</p>
<blockquote>
<p>Reading email and messing around with Eclipse 3.3M5.</p>
<p>— Brian Duff (@brianduff) <a target="_blank" href="https://twitter.com/brianduff/status/5841962?ref_src=twsrc%5Etfw">March 5, 2007</a></p>
</blockquote>
<p>In more contemporary news, this week I joined Twitter as a Principal Engineer in Engineering Effectiveness.</p>
<p><a target="_blank" href="https://1.bp.blogspot.com/-5QIsBYd7QWU/YCeJLbE1mGI/AAAAAAAC2Fg/Zna8fGUFEIQ0eytiT1LxzSpfjJHjQInNQCLcBGAsYHQ/s2048/Twitter-Logo.png"><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1619326921433/cnQoxBnTi.png" alt /></a></p>
<p>I'll try to build things that help engineers inside Twitter have a lovely, productive time creating the cool things that they make every day. Delighting developers is something that I've continued to be passionate about across Oracle, Google, and Facebook. I'm really excited to make the jump from being a long time user of Twitter to being part of the Twitter team. It's pretty cool that I get to continue to work on stuff that I enjoy so much. Onboarding remotely is a... weird... experience, but so far I'm having a whale of a time (geddit? gurgle).</p>
<p>I do leave Facebook with a great amount of sadness and fond remembrance. Whatever you may think about it (or any of the tech companies, frankly) from a societal and moral perspective, the experience of being an engineer at these companies is truly awe-inspiring, humbling, and transformative (and frankly, fun). I was lucky to learn and grow with an exceptional set of talented and passionate people. I got to work on some challenging and fascinating projects. Most of all, I experienced a tremendous amount of support and care from people I worked with at every level as we went through the tumult and disconnection of working from home and adjusting to how that impacted just about everything in strange and unexpected ways.</p>
<p>This is only the fourth company I've worked for in (a quite shocking and hard to believe) 23 years in the software industry, 16 of those years living in Silicon Valley. Seeing huge change in the perception of the tech industry, watching cozy little startups transform into big hulking tech, and noticing the general perception of the Bay Area as a whole shift significantly, I still feel a sense of delighted disbelief about where I am. This wee man from a working class family in Leith somehow made it to be the first person in the family to finish six years of secondary school, make it to university, and then crazily make it all the way to this lucky life in a distant country doing what I'm passionate about for a living for most of my adult life. I've worked hard, but I think I've also been very very lucky, and it's part of my fiber that I will always do whatever I can to help others who need a bit of that luck too, whatever background they're from.</p>
<p>So pumped and ready to get started on chapter 4 :)</p>
]]></content:encoded></item><item><title><![CDATA[HashFile: A disk-based hash structure]]></title><description><![CDATA[Previously, I introduced a problem I was trying to solve where a large datastructure was being pinned into memory for occasional lookups. This post delves into the implemented solution which pushes it onto disk but retains (relatively) fast lookups. ...]]></description><link>https://duff.blog/hashfile-a-disk-based-hash-structure-1</link><guid isPermaLink="true">https://duff.blog/hashfile-a-disk-based-hash-structure-1</guid><dc:creator><![CDATA[Brian Duff]]></dc:creator><pubDate>Wed, 14 Oct 2020 18:07:00 GMT</pubDate><content:encoded><![CDATA[<p><a target="_blank" href="https://blog.dubh.org/2020/10/the-case-of-unwieldy-hashmap.html">Previously</a>, I introduced a problem I was trying to solve where a large datastructure was being pinned into memory for occasional lookups. This post delves into the implemented solution which pushes it onto disk but retains (relatively) fast lookups. I think using a database or a B-Tree is a good solution to this kind of problem in general, but it was fun and inexpensive to implement this utility, and it turned out to be generally useful. Bear with me if you already understand HashMaps pretty well, because I'm basically describing a HashMap here, but it's a disk-based HashMap.</p>
<p>Logically, the data consists of a series of key-value pairs. The keys and values are of variable size, because they contain strings. If we were to write only the values to disk in a binary format, we might have something like this for the JSON example in the <a target="_blank" href="https://blog.dubh.org/2020/10/the-case-of-unwieldy-hashmap.html">previous post</a>:</p>
<p><a target="_blank" href="https://storage.googleapis.com/discobubble-quiz/binary_file1.png"><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1619326925339/pP453KUD2.png" alt /></a></p>
<p>There are two records, at offsets 0x00000000 and 0x000000A4. If we had some way to map a key to one of these offsets, we could <code>seek()</code> to that offset on disk and read a single record. We'll need some kind of index for that. A simple thing we could do is to store a table mapping the <code>hashCode()</code> of each key to the offset of its value. The hashcodes are as follows:</p>
<ul>
<li>//src/com/foo/bar/baz:baz ⟶ -691376290 (0xD6CA6F5E)</li>
<li>//foo/far/fun:fun ⟶ -1488203677 (0xA74BD063)</li>
</ul>
<p>So, our index is a simple table that looks like the diagram below. Notice that we've added 0x10 to the offsets because the index is at the start of the file and is 0x10 bytes long, pushing the values down by that much (also, in the real implementation, the offsets are longs, but I made them ints to keep this diagram and example more readable :)).</p>
<p><a target="_blank" href="https://storage.googleapis.com/discobubble-quiz/hashfile_index2.png"><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1619326926829/DMbfAu9NX.png" alt /></a></p>
<p>We store the index sorted by the hashcode. Given a key to look up, we can then compute its hashcode and use binary search in the index portion of the file to easily find an offset. This requires <em>O(log n)</em> seeks over the index, followed by a single seek to the position of the record.</p>
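<p>An in-memory sketch of that lookup, using the two example records above (illustrative only: the real implementation binary-searches by seeking within the file rather than over an array):</p>

```typescript
// The index: (hashCode, offset) pairs sorted by hashcode.
type IndexEntry = { hash: number; offset: number };

// Classic binary search over the sorted index; returns the record's
// offset, or null if the hashcode isn't present.
function findOffset(index: IndexEntry[], hash: number): number | null {
  let lo = 0;
  let hi = index.length - 1;
  while (lo <= hi) {
    const mid = (lo + hi) >> 1;
    if (index[mid].hash === hash) return index[mid].offset;
    if (index[mid].hash < hash) lo = mid + 1;
    else hi = mid - 1;
  }
  return null;
}

// The two example records, sorted by hashcode, with the 0x10 index
// header applied to their offsets:
const index: IndexEntry[] = [
  { hash: -1488203677, offset: 0xb4 }, // //foo/far/fun:fun
  { hash: -691376290, offset: 0x10 },  // //src/com/foo/bar/baz:baz
];
console.log(findOffset(index, -691376290)); // 16 (0x10)
```

On disk, each probe of the search becomes a seek plus a small read, which is where the <em>O(log n)</em> seeks come from.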
<p>We could also have stored this in a more conventional hashmap style by calculating the modulus of the hashcode with some known index size, or computing a perfect hash. However, I'm always looking for a random excuse to write binary search again :)</p>
<p>With this scheme, there's still the possibility that a hashcode will match for two keys, so we actually store a list of records at each offset, along with their keys so we can disambiguate. In practice, in our real dataset, there happen to be zero collisions at present, so this is a bit wasteful. Again, should really use a <a target="_blank" href="https://en.wikipedia.org/wiki/Perfect_hash_function">perfect hash</a>.</p>
<p>This datastructure is completely impractical if we want to support insertion, because we'd have to push the entire value set down in the file. In practice, we always just write the entire file each time (it takes about 250ms for the data we have), which neatly side steps this problem. If insertion were desirable, separating the index and data into two separate files would probably be a better approach, so the data could just be appended to the values file. You'd probably rewrite the index each time because it's sorted, but the index is much smaller than the values data anyway (in our case, it's a little over 1MB with 100,000 entries - 12 bytes per entry, and it could probably be 8 bytes per entry if we used int rather than long offsets). </p>
<h4 id="performance">Performance</h4>
<p>The performance of a single key lookup with this is approximately <em>disk_seek_time + (log<sub>2</sub> n × disk_seek_time) + record_read_time</em>. For an SSD with a seek time of 0.10ms and 100,000 entries, we'd expect a lookup to take around 2-3ms. That's an upper bound on the performance I see experimentally from the implementation. It'd be far worse on a spinning disk, but our developers don't have those. The memory cost is O(1), a fancy way of just saying that we don't need to load the whole blimmin' map into memory like we were before.</p>
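<p>Plugging the numbers in (a back-of-envelope sketch that ignores the record read time):</p>

```typescript
// One seek to land on the record, plus ~log2(n) seeks for the binary
// search over the sorted index. Numbers from the post above.
const seekMs = 0.1;      // SSD seek time
const entries = 100_000; // ballpark number of targets
const lookupMs = seekMs + Math.log2(entries) * seekMs;
console.log(lookupMs.toFixed(2)); // 1.76 - comfortably inside the 2-3ms bound
```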
<h4 id="implementation">Implementation</h4>
<p>As it happens, this whole thing was implemented as part of <a target="_blank" href="https://buck.build">Buck</a>, which is opensource. So if you're interested, you can find the <a target="_blank" href="https://github.com/facebook/buck/commit/abd9e22d65250d7d1626f300c241b34d6796cc15">source code in GitHub</a>. The main implementation is in <a target="_blank" href="https://github.com/facebook/buck/blob/dev/src/com/facebook/buck/features/project/intellij/targetinfo/HashFile.java">HashFile.java</a> - it's pretty simple (less than 200 lines of code). You can find some unit tests that show off usage of it in <a target="_blank" href="https://github.com/facebook/buck/blob/dev/test/com/facebook/buck/features/project/intellij/targetinfo/HashFileTest.java">HashFileTest.java</a>. Enjoy!</p>
]]></content:encoded></item><item><title><![CDATA[The case of the unwieldy HashMap]]></title><description><![CDATA[Some data structure was pinning over 70MB of heap space in Android Studio. Our developers have limited memory on laptops, and are often upset about memory consumption in general. This behemoth (retained by our internal plugins) was the second largest...]]></description><link>https://duff.blog/the-case-of-the-unwieldy-hashmap-1</link><guid isPermaLink="true">https://duff.blog/the-case-of-the-unwieldy-hashmap-1</guid><dc:creator><![CDATA[Brian Duff]]></dc:creator><pubDate>Tue, 13 Oct 2020 15:58:00 GMT</pubDate><content:encoded><![CDATA[<p>Some data structure was pinning over 70MB of heap space in Android Studio. Our developers have limited memory on laptops, and are often upset about memory consumption in general. This behemoth (retained by our internal plugins) was the second largest allocated and pinned single object in the whole of AS's heap.</p>
<p><a target="_blank" href="https://buck.build/command/project.html">buck project</a> generates IDE project definitions from buck targets. It can be configured to emit a <code>target-info.json</code> file, which contains simple mappings that look something like this:</p>
<pre><code class="lang-json">{
  "//src/com/foo/bar/baz:baz" : {
    "buck.type": "java_library",
    "intellij.file_path" : ".idea/modules/src_com_foo_bar_baz_baz.iml",
    "intellij.name" : "src_com_foo_bar_baz_baz",
    "intellij.type" : "module",
    "module.lang" : "KOTLIN",
    "generated_sources" : [ 
      "buck-out/fe3a3a3/src/com/foo/bar/baz/__gen__", 
      "buck-out/fe3a3a3/src/com/foo/bar/baz/__gen_more__" 
    ]
  },
  "//foo/far/fun:fun" : {
    "buck.type": "cxx_library",
    "intellij.file_path" : ".idea/modules/foo_far_fun_fun.iml",
    "intellij.name" : "foo_far_fun_fun",
    "intellij.type" : "module"
  }
}
</code></pre>
<p>We have a large number of these targets (ballpark 100k or so), so the existing datastructure representing this (a hashmap corresponding to the structure above) could become quite large. The datastructure was intentionally pinned in memory in case we needed it.</p>
<p>This was a fun small optimization problem. The map is accessed infrequently in bursts, and the number of keys we typically have to look up is a tiny proportion of the total set of keys in the map. Our ideal structure has relatively fast lookups with low memory overhead. Here are some of the things we might try:</p>
<h4 id="load-the-file-lazily-when-we-need-to-do-a-lookup">Load the file lazily when we need to do a lookup</h4>
<p>Instead of pinning this datastructure in memory permanently, try to arrange the code so that we load the file once into a HashMap, and use it locally where it's needed, allowing the HashMap to be garbage collected when we're done. </p>
<p>Loading the file is relatively slow: it takes about 600ms to read and parse when the system's filesystem cache is cold, and about 200ms otherwise. But assuming we can arrange the code in a way where this happens only once, and it happens in a way that doesn't block anything else, that might be acceptable. </p>
<p>This option turned out to be impractical, because of the architecture of the part of the plugin API in IntelliJ it was invoked from. The component which renders file inspections is recreated multiple times while a file is visible on screen, and there's no convenient way to attach the loaded HashMap to the context of an open editor. Given that, most lookups would take in the 200ms range, and this would generate large amounts of garbage to be collected on the heap, leading to increased GC times.</p>
<h4 id="optimize-the-in-memory-representation-of-the-data">Optimize the in-memory representation of the data</h4>
<p>There's a fair amount of repetitive, pattern-based naming in the original structure. For example, a target called <code>//foo/bar</code> maps to a module called <code>foo_bar</code>. There are different configurable schemes for how targets map to module names, so the above mapping isn't necessarily canonical. We could consider the options used to generate the project information, and map back the names dynamically at runtime. </p>
<p>I didn't extensively investigate this option. Logically, it'd reduce memory usage by about a third, and the extra processing required would likely be quite cheap.</p>
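<p>As an illustration of how cheap that mapping could be, here's a hypothetical sketch of one such scheme (this is not the plugin's actual code, just the mechanical idea):</p>

```typescript
// Derive a module name from a buck target by stripping the leading "//"
// and replacing path separators and the target colon with underscores.
// (One plausible scheme; the real mapping is configurable.)
function moduleName(target: string): string {
  return target.replace(/^\/\//, "").replace(/[/:]/g, "_");
}

console.log(moduleName("//src/com/foo/bar/baz:baz")); // src_com_foo_bar_baz_baz
console.log(moduleName("//foo/far/fun:fun"));         // foo_far_fun_fun
```

Both outputs match the <code>intellij.name</code> values in the JSON above, which is why storing them at all is mostly redundant.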
<h4 id="load-the-file-lazily-and-hold-it-in-a-time-based-cache">Load the file lazily and hold it in a time-based cache</h4>
<p>This is similar to the first option, except we alleviate the problem of not having a convenient place to keep hold of the HashMap by retaining it in a WeakReference cache for a fixed period of time after it's last used. I ended up implementing this first as a stopgap, because it's relatively easy. </p>
<p>It does have the downside of still using a large amount of heap for some period of time after the last access, and potentially can also generate a lot of garbage to be collected depending on usage patterns.</p>
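<p>A rough TypeScript sketch of that stopgap idea (the real implementation lives in the Java plugin and uses weak references; the names here are illustrative):</p>

```typescript
// Cache an expensively-loaded value and drop it after a fixed idle
// period, so bursts of lookups pay the load cost only once.
class TimedCache<T> {
  private value: T | null = null;
  private timer: ReturnType<typeof setTimeout> | null = null;
  constructor(private load: () => T, private ttlMs: number) {}

  get(): T {
    const v = this.value ?? this.load(); // load only on a cold cache
    this.value = v;
    // Reset the expiry timer on every access.
    if (this.timer !== null) clearTimeout(this.timer);
    this.timer = setTimeout(() => { this.value = null; }, this.ttlMs);
    return v;
  }
}

// The map is loaded once per burst of lookups, then released after 50ms idle:
let loads = 0;
const cache = new TimedCache(() => { loads++; return new Map([["k", "v"]]); }, 50);
cache.get();
cache.get();
console.log(loads); // 1
```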
<h4 id="change-the-file-to-a-binary-format-to-speed-up-reads">Change the file to a binary format to speed up reads</h4>
<p>The 600ms initial disk read time seems high. This file clocks in at 37MB; under ideal conditions on a 500 MB/s SSD, we'd expect to be able to read it in under 100ms. Indeed, that matches what we see if we use <code>dd</code> to measure read performance:</p>
<pre><code># Purge disk buffers
$ sudo purge

# First read is about 200ms
$ time dd if=target-info.json of=/dev/null bs=8k
4740+1 records in
4740+1 records out
38837604 bytes transferred in 0.019201 secs (2022682285 bytes/sec)
dd if=target-info.json of=/dev/null bs=8k  0.00s user 0.01s system 67% cpu 0.023 total

# Second read, buffered in the disk cache, is about 100ms
$ time dd if=target-info.json of=/dev/null bs=8k
4740+1 records in
4740+1 records out
38837604 bytes transferred in 0.010184 secs (3813571762 bytes/sec)
dd if=target-info.json of=/dev/null bs=8k  0.00s user 0.01s system 92% cpu 0.013 total
</code></pre><p>We should profile to see why there's such a large discrepancy between ideal and observed read time, but some theories about this:</p>
<ul>
<li>Read contention</li>
<li>Overhead of parsing JSON</li>
<li>Cost of allocating object on the heap while building the map</li>
</ul>
<p>Writing some code to use a raw binary format for the file clearly demonstrated that the JSON parsing wasn't the issue. I didn't profile it, to my shame, but I think it's likely that allocation is the primary contributor. Overall, it's very wasteful that we're allocating so many objects that we don't need.</p>
<h4 id="dont-load-the-file-into-memory-at-all">Don't load the file into memory at all</h4>
<p>Ideally, we'd just avoid reading this whole file into memory altogether. Since we're unlikely to use most of it, it's always going to be pretty wasteful. </p>
<p>We could use some kind of mechanism to <a target="_blank" href="https://en.wikipedia.org/wiki/JSON_streaming">stream the JSON</a> and find just the key we're looking for. But since we control the generation of the file, it seems like it might be better just to write out a file format that makes it easy to look up a specific key and read data directly from the file for that key. A <a target="_blank" href="https://en.wikipedia.org/wiki/B-tree">B-Tree</a> is a good datastructure for this kind of problem, but I wound up doing something much simpler to implement. The <a target="_blank" href="https://blog.dubh.org/2020/10/hashfile-disk-based-hash-structure.html">next post</a> talks about a <a target="_blank" href="https://blog.dubh.org/2020/10/hashfile-disk-based-hash-structure.html">disk-based hash structure</a> I used to solve this problem.</p>
]]></content:encoded></item><item><title><![CDATA[In the lunch line with Larry Page]]></title><description><![CDATA[The Whisper project (which became Nearby) got started as part of the Google+ org. The Google+ team sat in the same building as Larry Page, and we'd often see him in his office or walking around the building.
One day, a few of my teammates and I were ...]]></description><link>https://duff.blog/in-the-lunch-line-with-larry-page-1</link><guid isPermaLink="true">https://duff.blog/in-the-lunch-line-with-larry-page-1</guid><dc:creator><![CDATA[Brian Duff]]></dc:creator><pubDate>Mon, 12 Oct 2020 22:03:00 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1619327184348/cz7nnUvzJ3.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>The Whisper project (which became <a target="_blank" href="https://developers.google.com/nearby">Nearby</a>) got started as part of the Google+ org. The Google+ team sat in the same building as Larry Page, and we'd often see him in his office or walking around the building.</p>
<p>One day, a few of my teammates and I were standing in line at Cloud Cafe, in the restricted part of building 1900 where the Google+ team sat. Right in front of us in the line was none other than Larry Page himself. One of the engineers on the team struck up a conversation with him, and Larry asked us how Whisper was going. We hadn't launched anything yet, and were deeply in the midst of doing crazy cool things with ultrasound.</p>
<p>My teammate answered quite honestly that it was hard, and it was taking much longer than we expected to get it to a dogfoodable and eventually launchable state. Larry asked him why - what made it hard? Probably a fairly innocuous, polite question, but coming from the CEO and founder of Google, it sort of takes on an extra weight. We tried our best within 1.5 minutes to explain how a variety of unexpected complications contributed to delay and difficulty.</p>
<p>Throughout my career, I've been in a bunch of review meetings with super senior executive people. Usually a great deal of preparation and forethought (not to mention rehearsing and preparing answers to difficult questions you expect to be asked) goes into these things. But that totally unprepared conversation in the lunch line with one of the founders of perhaps the most rapidly successful company in history made me think that sometimes it'd be nice if there was a bit less formality and you could just have an open and honest chat about where things are at without all the prep.</p>
<p>After we ate our lunch together in a chittery, excited huddle on a sofa (there were <em>never</em> any tables free at lunchtime in Cloud - I wonder where Larry sat), I wandered back to my desk thinking, "is it really all that difficult?" and plotting how we could try to cut out some of the complexity. So I've no idea what impression Larry had from it (I'd be surprised if it's something he even remembers), but it had quite an effect on me and the others who were there at the time.</p>
]]></content:encoded></item><item><title><![CDATA[Shoes and secret projects with Vic]]></title><description><![CDATA[Google used to have a tradition called TGIF. It still seems sad that I'm talking about this in past tense, but hey ho. At TGIF, the founders and other executives got up on stage, welcomed nooglers (new Google people), introduced a bunch of prepared s...]]></description><link>https://duff.blog/shoes-and-secret-projects-with-vic-1</link><guid isPermaLink="true">https://duff.blog/shoes-and-secret-projects-with-vic-1</guid><dc:creator><![CDATA[Brian Duff]]></dc:creator><pubDate>Fri, 09 Oct 2020 15:36:00 GMT</pubDate><content:encoded><![CDATA[<p>Google used to have a tradition called TGIF. It still seems sad that I'm talking about this in <a target="_blank" href="https://www.wired.com/story/google-shakes-up-its-tgif-and-ends-its-culture-of-openness/">past tense</a>, but hey ho. At TGIF, the founders and other executives got up on stage, welcomed nooglers (new Google people), introduced a bunch of prepared speakers who talked about interesting things that were going on, then opened themselves and other executives up to pre-voted and audience questions. </p>
<p>There are many infamous things that happened at TGIF during my time there that I can't talk about, but I was physically present in Charlies for these:</p>
<ul>
<li>the time they let off fireworks <em>inside</em> Charlies to celebrate a Nexus device launch and gave everyone in the company the new device</li>
<li>the time they announced that everyone at Google was getting a significant pay raise and bonus. People went wild. There was screaming and yipping. It was like a music concert in the 80s.</li>
<li>the several times Patrick Pichette came by with a huge backpack of cash and everyone got an envelope with 10x$100 bills as a Christmas gift.</li>
<li>the time someone's mic had to be cut during the live q&amp;a because they were ranting quite wildly at Larry Page for a significant period of time, fairly incoherently.</li>
<li>the infamous Handbook Guy incident, which I am mildly surprised to find no references to on the Internet. But I was there in the audience that day in Charlies, and wow.</li>
<li>the TGIF where Google Glass was revealed internally for the first time. People gave it a standing ovation.</li>
</ul>
<p><a target="_blank" href="https://storage.googleapis.com/discobubble-quiz/20180116_122807.jpg"><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1619326937217/vkm-fThYo.jpeg" alt /></a></p>
<p><em>Caitlin enjoying the food in Charlies.</em> <em>Not a picture of TGIF, since those aren't allowed (ok, <a target="_blank" href="https://gcatrip.wordpress.com/2015/07/22/first-live-google-tgif-in-mtv/">not everyone got the message</a> apparently, which is probably part of why they're not around any more)</em> </p>
<p>Anyway. At one point, I'd recently started working on mobile infrastructure for Google+, and at TGIF, <a target="_blank" href="https://en.wikipedia.org/wiki/Andy_Rubin">Andy Rubin</a> (of all people - this was before the terribleness) was up on stage talking about how a lot of the developer pain with Android was going to be solved soon by some upcoming project. This was intriguing, and so I figured I'd pop him a quick email to ask about it. To my surprise, Andy replied almost immediately, and this resulted in a meeting between myself, my manager, and <a target="_blank" href="https://en.wikipedia.org/wiki/Vic_Gundotra">Vic Gundotra</a>, the charismatic overall lead of Google's social efforts.</p>
<p><a target="_blank" href="https://upload.wikimedia.org/wikipedia/commons/thumb/d/d0/Google_VP_Engineering_Vic_Gundotra(cropped).jpg/440px-Google_VP_Engineering_Vic_Gundotra(cropped).jpg"><img src="https://upload.wikimedia.org/wikipedia/commons/thumb/d/d0/Google_VP_Engineering_Vic_Gundotra(cropped).jpg/440px-Google_VP_Engineering_Vic_Gundotra(cropped).jpg" alt="Vic Gundotra" /></a></p>
<p>I never did learn what this mysterious solution to all developer pain was - the intrigue around it only deepened in the conversation with Vic. He alluded to some shadowy organization, hidden from the org chart, and working on a project so secret we didn't even want those people to show up in internal systems for fear of opposition. They were working on something that might never pan out, and it was being given a bubble of space in which to grow without criticism. I think I can sort of guess with the benefit of hindsight what it became eventually, but it was super intriguing at the time.</p>
<p>Vic did enthusiastically share his love of shoes with us though, at great length. I really enjoyed the passion with which he talked about this, and how he connected it back to the overall product direction of Google+ at that time. Interest-based channels eventually became a key component of Google+. I think a lot of it had to do with shoes.</p>
]]></content:encoded></item><item><title><![CDATA[The very scientific microkitchen testing event]]></title><description><![CDATA[One of the things I miss most about the office is the microkitchens. Facebook and Google both have fantastic selections of yummy snacks to fuel folks through the day. It's actually quite great for ad hoc conversations and just getting away from your ...]]></description><link>https://duff.blog/the-very-scientific-microkitchen-testing-event-1</link><guid isPermaLink="true">https://duff.blog/the-very-scientific-microkitchen-testing-event-1</guid><dc:creator><![CDATA[Brian Duff]]></dc:creator><pubDate>Thu, 08 Oct 2020 16:05:00 GMT</pubDate><content:encoded><![CDATA[<p>One of the things I miss most about the office is the microkitchens. Facebook and Google both have fantastic selections of yummy snacks to fuel folks through the day. It's actually quite great for ad hoc conversations and just getting away from your desk for a bit. I've tried to replicate this by purchasing a box of Funyuns from Amazon to keep at home, but it's just not the same. As I'm... you know... pathetically eating my Funyuns at home on my own with my shorts on.</p>
<p>At Google, the microkitchens were legendary when I started in 2008. But by the time I left in 2019, they had changed a lot. The stock was intentionally kept low, and the snacks were healthier. This wasn't always a popular thing. For about three years, the snack selection, which used to rotate fairly regularly, was frozen in time with the same set of things. I think this was due to Google trying to plan the future of the microkitchen program, and it just took a while. Or something like that. Anyway, this fallow period was great if you were the world's biggest fan of French Onion SunChips. </p>
<p><a target="_blank" href="https://storage.googleapis.com/discobubble-quiz/frenchonion.png"><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1619326942344/XeUmrHrTf.png" alt /></a></p>
<p>At some point in 2016, I was asked, perhaps because of my infamous love of snacky goodness, or probably for some other random reason, to organize a snack tasting party for people on my floor in <a target="_blank" href="https://www.glassdoor.com/Photos/Google-Office-Photos-IMG457.htm">building 43</a>. Facilities sent me a gigantic box of snacks, and we had to very scientifically (like, <strong>very scientifically</strong>) try out the snacks and give feedback about which things we liked. Based on the feedback from similar parties like this all over the campus, they rotated the microkitchen snacks after a long period of way too many French Onion SunChips.</p>
<p>So in this, <em>incredibly serious business meeting</em>, we're eating as much as we can. For science.</p>
<p><a target="_blank" href="https://storage.googleapis.com/discobubble-quiz/IMG_20161117_151907.jpg"><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1619326943858/E4ABDuJL9.jpeg" alt /></a></p>
<p>(Sorry for the blurry image, I was probably eating something)</p>
<p>It was pretty obvious which were the unpopular snacks, since they remained in the big box from facilities outside my desk for several weeks despite my urgent pleas to the mailing list for the floor of my building to please come and eat them.</p>
]]></content:encoded></item><item><title><![CDATA[It's like a startup inside a startup, Topaz]]></title><description><![CDATA[At Google, I knew a great engineer (let's call them Topaz) who, after working on a bunch of different teams and being pretty successful at what they did, decided to join the hot new team that was hiring like crazy. It was pretty exciting for them, th...]]></description><link>https://duff.blog/its-a-like-a-startup-inside-a-startup-topaz-1</link><guid isPermaLink="true">https://duff.blog/its-a-like-a-startup-inside-a-startup-topaz-1</guid><dc:creator><![CDATA[Brian Duff]]></dc:creator><pubDate>Wed, 07 Oct 2020 16:30:00 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1619328056621/n_1Jjb6OX.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>At Google, I knew a great engineer (let's call them Topaz) who, after working on a bunch of different teams and being pretty successful at what they did, decided to join the hot new team that was hiring like crazy. It was pretty exciting for them, they told me many times. For me, I've been in some of those situations where you're anticipating something new in your life so much you have the most vivid dreams about it actually happening. As if it were actually happening now. It was like that for this person. </p>
<p>Now, this part of Google was <em>hot</em> at the time; it was in the realm of all things social when Google was trying to do that. It was breathtaking how all-in Google went on the social stuff so quickly, and there was a buzzy aura around the teams who worked on it. The building they were in had restricted access (because reasons), still a relatively rare thing at that time, and its own not-so-secret restaurant. </p>
<p><a target="_blank" href="https://storage.googleapis.com/discobubble-quiz/IMG_20190407_095918.jpg"><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1619326947150/lJOVXnzBk.jpeg" alt /></a></p>
<p>So, the big day came, and Topaz joined the new team (let's call it the Flingwheel team), and had their first 1:1 with their manager, Borantz. All good fictional names should end with a z. Apparently. Something was just <em>off</em> about that from the start. The manager was charming and nice - a very sociable and amenable guy - but there was a weird disconnect. It felt like the conversation was one-way. Topaz would talk about what they were looking to learn about and develop on the team, and Borantz would nod and talk about how cool this team was because it was a startup within a startup or something like that. </p>
<p>Topaz wasn't sure if it was just their imagination. A single 1:1 isn't enough, so Topaz paid close attention the next few times they met with Borantz. After a few weeks on the team, Topaz had realized that there was a huge opportunity here - they wanted to hone their frontend web development skills, and this project's frontend was moving slowly and jeopardizing the whole project. Topaz thought this would be an excellent thing to talk to Borantz about, and they did. And Borantz talked about how cool the project was and how cool this team was because it was a startup within a startup, and hey, Topaz, could you do some backend stuff because that's what you're good at, right?</p>
<p>Wait, what?</p>
<p>It turns out that Topaz's good friend Potaz, who had worked with Topaz about 87 years ago (er, slightly exaggerating here), knew of Topaz's legendary backend skills. Skills which in actuality amounted to a couple of years spent fiddling around with servlets on a shaky and not too reliable server somewhere in the back of an office. Potaz also worked on Flingwheel, had recommended Topaz to Borantz, and had emphasized Topaz's aforementioned legendary backend skills.</p>
<p>The cycle of dysfunctional 1:1s went on, and Topaz noticed that many other things were dysfunctional about the team. All of a sudden, out of nowhere, the team ruptured into two parts: the Flingwheel team was now doing something decidedly less interesting than the original idea, and a new offshoot team went on to build... well... something that you probably use now. Topaz was stuck on the Flingwheel team, and pretty unhappy at this point, since their job consisted chiefly of converting protocol buffer messages to other protocol buffer messages and writing unit tests for said rote conversion code.</p>
<p>The kicker came when one day, out of the blue, Topaz had a random meeting with Stanz, the director of all backend projects in his part of Google. Topaz's legendary backend skills were needed on the core backend team, and a transfer and reorg were imminent, and wasn't that exciting, Topaz? What's that? You want to be a full stack engineer? You don't actually like backend? Your manager hasn't actually mentioned any of this to you? Oh yea, the Flingwheel team was cool, and it was like a startup within a startup or something like that, and now you're going to be converting protocol buffers back and forth until the end of time, dear Topaz.</p>
<p>Topaz was sad for a bit, and quit, and then found something much more fun to work on, and a manager who listened to them.</p>
]]></content:encoded></item><item><title><![CDATA[Intercepting behavior with java agents]]></title><description><![CDATA[A previous post showed using JVMTI to log method calls in a non-intrusive way, and without having to make modifications to upstream libraries. JVMTI is much more powerful than that post showed - for example it can replace and modify code in a running...]]></description><link>https://duff.blog/intercepting-behavior-with-java-agents-1</link><guid isPermaLink="true">https://duff.blog/intercepting-behavior-with-java-agents-1</guid><dc:creator><![CDATA[Brian Duff]]></dc:creator><pubDate>Tue, 06 Oct 2020 16:30:00 GMT</pubDate><content:encoded><![CDATA[<p>A previous post showed <a target="_blank" href="https://blog.dubh.org/2020/09/dynamic-method-tracing-in-in-java.html">using JVMTI to log method calls</a> in a non-intrusive way, and without having to make modifications to upstream libraries. JVMTI is much more powerful than that post showed - for example it can replace and modify code in a running JVM altogether, which can be useful for things like logging or performance measurements, but also intercepting or changing behavior at runtime.</p>
<p>It is, however, quite cumbersome to write code for that sort of thing in C or C++ using the JNI interfaces. It turns out Java provides <a target="_blank" href="https://docs.oracle.com/javase/6/docs/api/java/lang/instrument/package-summary.html">a higher level interface</a> to instrument or redefine classes using the Java programming language itself. This post will demonstrate a ridiculously simple example of such an agent. You can find the <a target="_blank" href="https://github.com/brianduff/javaagent">example code in GitHub</a>.</p>
<p><a target="_blank" href="https://1.bp.blogspot.com/-YzjNS87aprg/X3pIGmrFRbI/AAAAAAACuqA/v18pQj_rFjsR0uXe7EJY9TeEr_4ar13hgCLcBGAsYHQ/s1280/agent-1294795_1280.png"><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1619326950325/LFNGEnrGn.png" alt="A gratuitous image of a secret agent" /></a></p>
<h3 id="a-simple-program">A simple program</h3>
<p>Let's start off with the really simple program that we want to instrument. The <code>Greeter</code> class does the time honored thing of saying <code>Hello World</code>. We've for some reason awkwardly and weirdly moved the <code>World</code> part of that into a helper method. It's totally artificial, but it helps keep this example straightforward. In addition to <code>Greeter</code>, there's a simple <code>Main</code> class (not shown) that just calls <code>Greeter.sayHello()</code>.</p>
<pre><code><span class="hljs-keyword">public</span> <span class="hljs-class"><span class="hljs-keyword">class</span> <span class="hljs-title">Greeter</span> {</span>
  <span class="hljs-function"><span class="hljs-keyword">private</span> <span class="hljs-keyword">static</span> String <span class="hljs-title">getName</span><span class="hljs-params">()</span> </span>{
    <span class="hljs-keyword">return</span> <span class="hljs-string">"World"</span>;
  }

  <span class="hljs-function"><span class="hljs-keyword">static</span> <span class="hljs-keyword">void</span> <span class="hljs-title">sayHello</span><span class="hljs-params">()</span> </span>{
    System.out.<span class="hljs-built_in">printf</span>(<span class="hljs-string">"Hello %s\n"</span>, getName());
  }
}
</code></pre><p>It's easy and does what you'd expect. Using <a target="_blank" href="https://bazel.build/">Bazel</a>, here's how I build and run the program:</p>
<pre><code>$ bazel run src/main/java/org/dubh/examples/agent/<span class="hljs-symbol">target:</span>Target
Hello World
</code></pre><p>From this point on, let's assume we can't (for whatever reason) touch the code of <code>Greeter</code>. Being a bit selfish, I want this program to say hello to <em>me</em>, not the whole world. A Java agent can change the behavior without changing or recompiling <code>Target.java</code> or <code>Greeter.java</code>. I'll use it to change the implementation of <code>getName()</code> at runtime.</p>
<h3 id="agent-basic-structure">Agent basic structure</h3>
<p>The <code>main()</code> method is the entry point to a Java application. Java agents have special powers to do things before <code>main()</code> is called, so the entry point for an agent is <code>premain()</code>. You're passed arguments for the agent, and an object implementing <a target="_blank" href="https://docs.oracle.com/javase/6/docs/api/java/lang/instrument/Instrumentation.html"><code>Instrumentation</code></a>, which is how you access the APIs you'll need to transform classes. Our simple agent checks whether it's ok to redefine classes, and registers a <code>ClassFileTransformer</code>.</p>
<pre><code><span class="hljs-function"><span class="hljs-keyword">public</span> <span class="hljs-keyword">static</span> <span class="hljs-keyword">void</span> <span class="hljs-title">premain</span>(<span class="hljs-params">String agentArgs, Instrumentation inst</span>)</span> {
  <span class="hljs-keyword">if</span> (!inst.isRedefineClassesSupported()) {
    System.err.println(<span class="hljs-string">"ExampleAgent: not allowed to redefine classes!"</span>);
    <span class="hljs-keyword">return</span>;
  }

  inst.addTransformer(<span class="hljs-keyword">new</span> ClassFileTransformer() {
    @Override
    <span class="hljs-function"><span class="hljs-keyword">public</span> <span class="hljs-keyword">byte</span>[] <span class="hljs-title">transform</span>(<span class="hljs-params">ClassLoader loader, String className, 
        Class&lt;?&gt; oldClazz, ProtectionDomain domain, <span class="hljs-keyword">byte</span>[] classfileBuffer</span>)</span> {

      <span class="hljs-keyword">if</span> (<span class="hljs-string">"org/dubh/examples/agent/target/Greeter"</span>.<span class="hljs-keyword">equals</span>(className)) {
        <span class="hljs-keyword">return</span> transformClass(classfileBuffer);
      }

      <span class="hljs-keyword">return</span> <span class="hljs-literal">null</span>;
    }
  });
}
</code></pre><p><code>transform()</code> is called when the JVM is loading a class, and provides a hook to rewrite its implementation. The <code>className</code> passed here is in <a target="_blank" href="https://docs.oracle.com/javase/specs/jvms/se7/html/jvms-4.html#jvms-4.2.1">JVM internal form</a> (which in this simple case, just means replacing each <code>.</code> with a <code>/</code>). If we return <code>null</code> from this method, the class will be unaltered, which is what we want in all cases unless we're loading <code>Greeter</code>.</p>
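<p>As a tiny illustration of that conversion (a throwaway helper, not part of the example project), going between the two forms really is just a character swap:</p>

```java
public class InternalNames {
  /** Binary name (e.g. java.lang.String) to JVM internal form (java/lang/String). */
  static String toInternal(String binaryName) {
    return binaryName.replace('.', '/');
  }

  /** JVM internal form back to the binary name. */
  static String toBinary(String internalName) {
    return internalName.replace('/', '.');
  }
}
```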
<h3 id="byte-code-swizzling">Byte code swizzling</h3>
<p>All that remains is just to do the actual transformation. The array of bytes we were given in <code>classfileBuffer</code> is the original compiled code for the class in <a target="_blank" href="https://docs.oracle.com/javase/specs/jvms/se7/html/jvms-4">class file binary format</a>. If you were feeling super adventurous, you could swizzle around with the bytes of this array yourself. However, it's much easier to use a library that already understands this format and lets you manipulate it. <a target="_blank" href="https://asm.ow2.io/">ASM</a> is a popular library for doing just this kind of thing.</p>
<p>ASM makes it really easy to manipulate bytecode, but you'll still need a basic understanding of how JVM instructions work. Explaining this is beyond the scope of this post, but you can use the <code>javap</code> tool to look at <code>.class</code> files and see the instructions they contain. The body of the current <code>getName()</code> method looks like this:</p>
<pre><code>$ javap -p -c -cp Target_deploy.jar \
    org.dubh.examples.agent.target.Greeter
  <span class="hljs-keyword">private</span> <span class="hljs-built_in">static</span> java.lang.<span class="hljs-keyword">String</span> getName();
    Code:
       <span class="hljs-number">0</span>: ldc           <span class="hljs-comment">#2                  // String World</span>
       <span class="hljs-number">2</span>: areturn
</code></pre><p>It contains two instructions: The <code>ldc</code> operation pushes the constant value <code>"World"</code> on to the stack, and then the <code>areturn</code> instruction pops the top of the stack and returns it. We want to replace this with a set of instructions that call a static method instead:</p>
<pre><code>  <span class="hljs-keyword">private</span> <span class="hljs-built_in">static</span> java.lang.<span class="hljs-keyword">String</span> getName();
    Code:
       <span class="hljs-number">0</span>: invokestatic  <span class="hljs-comment">#2                  // Method getNewName:()Ljava/lang/String;</span>
       <span class="hljs-number">3</span>: areturn
</code></pre><p>These new instructions consist of an <code>invokestatic</code> to call a <code>getNewName()</code> static method pushing its returned value on the stack, and an <code>areturn</code> like before to pop the stack and return it. Alongside the agent, we need to include the new method we want to be called, and we do that in a simple <code>NewGreeter</code> class that's compiled along with the agent:</p>
<pre><code><span class="hljs-keyword">public</span> <span class="hljs-class"><span class="hljs-keyword">class</span> <span class="hljs-title">NewGreeter</span> </span>{
  <span class="hljs-keyword">public</span> <span class="hljs-built_in">static</span> <span class="hljs-keyword">String</span> getNewName() {
    <span class="hljs-keyword">return</span> <span class="hljs-string">"Brian"</span>;
  }
}
</code></pre><p>Here's what the <code>transformClass()</code> method looks like with comments that hopefully explain what's going on:</p>
<pre><code><span class="hljs-function"><span class="hljs-keyword">private</span> <span class="hljs-keyword">static</span> <span class="hljs-keyword">byte</span>[] <span class="hljs-title">transformClass</span>(<span class="hljs-params"><span class="hljs-keyword">byte</span>[] classfileBuffer</span>)</span> {
  <span class="hljs-comment">// ClassReader knows how to grok the buffer of bytes as a Java class.</span>
  ClassReader reader = <span class="hljs-keyword">new</span> ClassReader(classfileBuffer);

  <span class="hljs-comment">// ClassNode is a visitor over the things in the classfile that collects</span>
  <span class="hljs-comment">// them into an in-memory data structure that we can easily traverse. You</span>
  <span class="hljs-comment">// can also avoid creating a separate in-memory representation by just</span>
  <span class="hljs-comment">// implementing a simple ClassVisitor, but it often requires more code.</span>
  ClassNode classNode = <span class="hljs-keyword">new</span> ClassNode(Opcodes.ASM8);
  reader.accept(classNode, <span class="hljs-number">0</span>);

  <span class="hljs-comment">// Now ClassNode contains a data structure with all the things in the</span>
  <span class="hljs-comment">// class, and we can look through the methods for the one we care about.</span>
  <span class="hljs-keyword">for</span> (MethodNode method : classNode.methods) {
    <span class="hljs-comment">// You'd maybe want to check the signature also in a real program.</span>
    <span class="hljs-keyword">if</span> (<span class="hljs-string">"getName"</span>.<span class="hljs-keyword">equals</span>(method.name)) {
      <span class="hljs-comment">// Method bodies contain instruction lists. Here, we create a simple</span>
      <span class="hljs-comment">// instruction list with two instructions - one to call a static </span>
      <span class="hljs-comment">// method, and another to return whatever that static method returned.</span>
      InsnList instructions = <span class="hljs-keyword">new</span> InsnList();
      instructions.<span class="hljs-keyword">add</span>(<span class="hljs-keyword">new</span> MethodInsnNode(Opcodes.INVOKESTATIC, 
          <span class="hljs-string">"org/dubh/examples/agent/NewGreeter"</span>, <span class="hljs-string">"getNewName"</span>,
          <span class="hljs-string">"()Ljava/lang/String;"</span>));
      instructions.<span class="hljs-keyword">add</span>(<span class="hljs-keyword">new</span> InsnNode(Opcodes.ARETURN));

      <span class="hljs-comment">// This replaces the existing instruction list of the method with our</span>
      <span class="hljs-comment">// new instruction list.</span>
      method.instructions = instructions;
    }
  }

  <span class="hljs-comment">// ClassWriter is a visitor that knows how to traverse the data structure,</span>
  <span class="hljs-comment">// and write back out the bytes of a class.</span>
  ClassWriter writer = <span class="hljs-keyword">new</span> ClassWriter(ClassWriter.COMPUTE_FRAMES | ClassWriter.COMPUTE_MAXS);
  classNode.accept(writer);

  <span class="hljs-keyword">return</span> writer.toByteArray();
}
</code></pre><h3 id="deploying-and-using-the-agent">Deploying and using the agent</h3>
<p>There's one last thing we need to do in order to make our agent work. Agents must be compiled into a <code>jar</code> file that contains instructions about where to find the premain class and which capabilities our agent has. For this example, the <code>MANIFEST.MF</code> looks like the one below.</p>
<pre><code><span class="hljs-attr">Manifest-Version:</span> <span class="hljs-number">1.0</span>
<span class="hljs-attr">Premain-Class:</span> <span class="hljs-string">org.dubh.examples.agent.ExampleAgent</span>
<span class="hljs-attr">Agent-Class:</span> <span class="hljs-string">org.dubh.examples.agent.ExampleAgent</span>
<span class="hljs-attr">Can-Redefine-Classes:</span> <span class="hljs-literal">true</span>
<span class="hljs-attr">Can-Retransform-Classes:</span> <span class="hljs-literal">true</span>
</code></pre><p>If you're using Bazel, you can accomplish this using the <code>deploy_manifest_lines</code> attribute on <code>java_binary</code>, like so:</p>
<pre><code>java_binary(
    <span class="hljs-type">name</span> = "agent",
    runtime_deps = [ ":agent_lib" ],
    main_class = "org.dubh.examples.agent.ExampleAgent",
    deploy_manifest_lines = [
        "Premain-Class: org.dubh.examples.agent.ExampleAgent",
        "Agent-Class: org.dubh.examples.agent.ExampleAgent",
        "Can-Redefine-Classes: true",
        "Can-Retransform-Classes: true",
    ]
)
</code></pre><p>With this in place, let's try running our program with and without the agent. We use the <code>-javaagent</code> argument to <code>java</code> to tell it where our agent jar is.</p>
<pre><code>$ cd bazel-bin/src/main/java/org/dubh/examples/agent
$ java -jar target/Target_deploy.jar
Hello World
$ java -<span class="hljs-symbol">javaagent:</span>agent_deploy.jar -jar target/Target_deploy.jar
Hello Brian
</code></pre><p>It works!</p>
<h3 id="summing-up">Summing up</h3>
<p>This is a fairly trivial example of how to write a Java agent, and there's lots more to dive into for complex use cases. At its core, though, this setup of using ASM to rewrite bytecode is a template for much more sophisticated transformations. I missed out a few details around Bazel in the interest of making the post as simple as possible, but you can play around with the full example in the <a target="_blank" href="https://github.com/brianduff/javaagent">javaagent github project</a>. Hope this has been useful. I'd love to hear about the kinds of problems you're solving with Java agents in the comments :)</p>
]]></content:encoded></item><item><title><![CDATA[Aapt2: Tower of Babel]]></title><description><![CDATA[OK, we'd fixed a bug that brought us back to our baseline build speed with aapt2. But could it be made even faster?
This is the final part of a three part series about adventures with aapt2, Android's resource compiler / optimizer. You can read the i...]]></description><link>https://duff.blog/aapt2-tower-of-babel-1</link><guid isPermaLink="true">https://duff.blog/aapt2-tower-of-babel-1</guid><dc:creator><![CDATA[Brian Duff]]></dc:creator><pubDate>Mon, 05 Oct 2020 16:30:00 GMT</pubDate><content:encoded><![CDATA[<p>OK, we'd fixed a bug that brought us back to our baseline build speed with aapt2. But could it be made even faster?</p>
<p>This is the final part of a three part series about adventures with aapt2, Android's resource compiler / optimizer. You can read the intro bit and get more context <a target="_blank" href="https://blog.dubh.org/2020/09/how-i-learned-to-love-aapt2.html">here</a>. The <a target="_blank" href="https://blog.dubh.org/2020/10/aapt2-please-dont-delete-me.html">second post</a> explained a small fix that resulted in a nice performance win.</p>
<p>Big Android apps like the hypothetical ones from that hypothetical company that I hypothetically work for tend to contain a lot of strings that are translated into hypothetical languages (er, maybe I didn't need that last hypothetical). However, during the compile-run cycle, most developers are usually working with a single language. This is not to understate the importance of testing with a variety of languages, but it's reasonable and normal to restrict things somewhat in dev builds for developer efficiency reasons.</p>
<p>There are really a lot of strings in a lot of languages in some of these hypothetical apps. Like really a lot. I wish I could say how much, but think about what you'd consider a lot, then add some more to it. </p>
<p>My profiling from the <a target="_blank" href="https://blog.dubh.org/2020/09/aapt2-please-dont-delete-me.html">previous issue</a> had shown me that aapt2 was spending an <em>awful</em> lot of time dealing with the huge number of strings we were throwing at it. But most of the time, developers really only cared about a tiny fraction of these strings in developer builds. I spotted a friendly looking option that seemed like it might be quite useful:</p>
<p><a target="_blank" href="https://1.bp.blogspot.com/-hG1xc_pDG00/X3Uiv5kP94I/AAAAAAACuf0/eXv1gn4j5Po1lfiydYCvtAjDt-Ozbgw8ACLcBGAsYHQ/s1692/Screen%2BShot%2B2020-09-30%2Bat%2B5.28.37%2BPM.png"><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1619326953308/WSWK9G5Qp.png" alt /></a></p>
<p>So, I was thinking if maybe I could just do something like this in our dev builds, things would be waaaay faster:</p>
<p>  <code>aapt2 link -c en ...</code></p>
<p>Whelp, it didn't work! Erm.</p>
<p>It turned out that aapt2 link processes the -c option as a post-filter: it still processes all the input resource files containing strings in every configuration. One way to get around that would be to filter these resources out at compile time, so we never pass them into the link phase. Because of some particular complexities of our source / build system, that'd mean copying or creating symlink farms of resources. </p>
<p>Instead, I ended up patching aapt2 to respect -c as a pre-filter. With this change, it completely ignores inputs that don't match the specified configurations. Doing that eliminated a roughly 60 second build time penalty on every developer build in which resources changed.</p>
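<p>The pre-filter idea is easy to sketch in isolation. Here's a toy Java version (the path matching below is a made-up stand-in for what aapt2 actually does, which is far more involved): inputs that don't match the requested configuration are dropped before any expensive processing happens, rather than being parsed and filtered out afterwards.</p>

```java
import java.util.List;
import java.util.stream.Collectors;

public class ConfigPreFilter {
  // Keep only resource files in the default "values" directory or in the
  // directory for the requested locale. Everything else never reaches the
  // (expensive) link phase at all.
  public static List<String> preFilter(List<String> resourcePaths, String locale) {
    return resourcePaths.stream()
        .filter(p -> p.contains("/values/") || p.contains("/values-" + locale + "/"))
        .collect(Collectors.toList());
  }

  public static void main(String[] args) {
    List<String> inputs = List.of(
        "res/values/strings.xml",
        "res/values-en/strings.xml",
        "res/values-fr/strings.xml",
        "res/values-de/strings.xml");
    // Only the default and English resources survive the pre-filter.
    System.out.println(preFilter(inputs, "en"));
  }
}
```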
]]></content:encoded></item><item><title><![CDATA[Aapt2: Please don't delete me!]]></title><description><![CDATA[By making one weird change in aapt2, we sped up our build by 45 seconds. Developers love that stuff.
This is the second in a three part series about adventures with aapt2, Android's resource compiler / optimizer. You can read the first bit and get mo...]]></description><link>https://duff.blog/aapt2-please-dont-delete-me-1</link><guid isPermaLink="true">https://duff.blog/aapt2-please-dont-delete-me-1</guid><dc:creator><![CDATA[Brian Duff]]></dc:creator><pubDate>Fri, 02 Oct 2020 16:30:00 GMT</pubDate><content:encoded><![CDATA[<p>By making one weird change in aapt2, we sped up our build by 45 seconds. Developers love that stuff.</p>
<p>This is the second in a three part series about adventures with aapt2, Android's resource compiler / optimizer. You can read the first bit and get more context <a target="_blank" href="https://blog.dubh.org/2020/09/how-i-learned-to-love-aapt2.html">here</a>.</p>
<p>Proguard is an optimizer that many Android apps use. It can do nifty things like removing unused code and resources, inlining things that have no real reason to be in separate methods, and even obfuscating symbols so you can pretend like nobody will ever be able to figure out what your clever code is doing. In modern Android, the <a target="_blank" href="https://developer.android.com/studio/build/shrink-code#keep-code">r8 shrinker</a> has a similar function, and is driven by proguard configuration files.</p>
<p>However, r8 / Proguard can't always figure out if something is used, or sometimes optimizes more aggressively than you'd like. Configuration directives can be used to tell it to keep things that would otherwise be removed. aapt2 has options that let it emit configuration files for resource related code. </p>
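<p>For a flavor of what these generated rules look like, here's the kind of directive aapt2 can emit for a custom view that's only referenced from a layout file. This is an illustrative fragment I've made up for this post (the class name is hypothetical), not actual aapt2 output:</p>

```
# Keep the constructor that LayoutInflater calls reflectively, so the shrinker
# doesn't remove a view class that's only ever referenced from layout XML.
-keep class com.example.app.CustomView {
    <init>(android.content.Context, android.util.AttributeSet);
}
```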
<p>I used a profiler to look at the performance of aapt2 on our codebase, and it turned out that a significant chunk of the increased time was being spent in one function, aapt::proguard::<a target="_blank" href="https://cs.android.com/search?q=function:CollectLocations">CollectLocations</a>(), which is part of the machinery that generates these rules. In particular, it was spending a lot of time generating rules for the --proguard-conditional-keep-rules option, which removes resource ids that don't match certain usage patterns known to be used in layouts.</p>
<p>It turned out that we didn't have that option turned on in our codebase (we use <a target="_blank" href="https://fbredex.com/">other tools</a> for optimizations like this), so the extra work that was being done here was being thrown away anyway. I <a target="_blank" href="https://issuetracker.google.com/issues/144236322">wrote up my findings</a> and sent a <a target="_blank" href="https://android.googlesource.com/platform/frameworks/base/+/dc21dea9b8b1">patch</a> upstream, which I think is always quite a polite thing to do when you discover an easily fixable issue. This immediately sped up our round trip build time by about 45 seconds. Many thanks to the folks at Google for quickly accepting this upstream!</p>
<p>But I was still not happy with how long developers had to wait for aapt2 to do its thing... Stay tuned for another optimization in the next post.</p>
<p>The next blog post in this series talks about another <a target="_blank" href="https://blog.dubh.org/2020/10/aapt2-tower-of-babel.html">big performance optimization</a> that came from reducing the number of languages we use in dev builds.</p>
]]></content:encoded></item><item><title><![CDATA[How I learned to love aapt2]]></title><description><![CDATA[The Android Asset Packaging Tool (aapt2) takes all those lovely resources (images, strings, cat pictures, and whatnot) in your Android app, and compiles them into a binary format that the runtime understands. It's also the thing that generates numeri...]]></description><link>https://duff.blog/how-i-learned-to-love-aapt2-1</link><guid isPermaLink="true">https://duff.blog/how-i-learned-to-love-aapt2-1</guid><dc:creator><![CDATA[Brian Duff]]></dc:creator><pubDate>Thu, 01 Oct 2020 16:30:00 GMT</pubDate><content:encoded><![CDATA[<p>The Android Asset Packaging Tool (<a target="_blank" href="https://developer.android.com/studio/command-line/aapt2">aapt2</a>) takes all those lovely resources (images, strings, cat pictures, and whatnot) in your Android app, and compiles them into a binary format that the runtime understands. It's also the thing that generates numerical identifiers and constants for them in R.java, which is the class you use to refer to resources in code.</p>
<p>You should rarely have reasons to interact with aapt2 directly, since for most Android developers, it's something that happens automatically during a build with Gradle, or your build system of choice (e.g. <a target="_blank" href="https://bazel.build/">Bazel</a>, or in our case <a target="_blank" href="https://buck.build/">Buck</a>). Suffice to say, you're either doing seriously hardcore interesting things, or maybe working on build / developer infra or something like that (we're hiring!) if you're interacting directly with this tool.</p>
<p>aapt2 operates in two phases; in the compile phase, it converts individual resources into a binary representation (either a .flat file or a .flata, which is just a zip of .flat files). The usually more expensive link phase merges all of these individual resources together into a final .ap_ file, which is just like an APK except with no code. For a while, I super enjoyed dropping "flata" and "ap underscore" into random conversations around the dinner table and relished the confusion on the face of those around me. My family love me really. Anyway.</p>
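<p>If you ever do find yourself running aapt2 by hand, the two phases look roughly like this. Treat it as a sketch: the file names are invented, and flags vary a bit between aapt2 versions.</p>

```
# Compile phase: turn each resource under res/ into a .flat entry,
# bundled together in a .flata (a zip of .flat files).
aapt2 compile --dir res -o compiled.flata

# Link phase: merge everything into a code-less .ap_ and generate R.java.
aapt2 link -o app.ap_ -I android.jar \
    --manifest AndroidManifest.xml --java gen compiled.flata
```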
<p>For some reason, maybe following the "if it ain't broke" principle, we had been using quite an old version of aapt2 for quite some time, then tried to upgrade at some point. There were some new goodies we wanted in the latest version, but when we upgraded we ran into a slew of incompatibilities (caused by our own infamous creativity for the most part) that had to be fixed, and then a significant increase in the duration of the aapt2 link phase of our builds. Slow builds make everyone sad.</p>
<p>The next couple of blog entries will talk about two specific optimizations I made to aapt2 a while back to speed it up for a (cough) well known large Android app that may or may not be associated with my employer.</p>
<p>The next blog post in this series talks about a <a target="_blank" href="https://blog.dubh.org/2020/10/aapt2-please-dont-delete-me.html">small bug fix that saved 45s of build time</a>.</p>
]]></content:encoded></item><item><title><![CDATA[Dynamic Method Tracing in Java: The Implementation]]></title><description><![CDATA[In the last blog entry, I talked about the need for a tool in Java that can be configured easily to log method calls without redeploying a binary, attaching a debugger, or obtaining root in order to trace system calls. In this blog, I'll dive into so...]]></description><link>https://duff.blog/dynamic-method-tracing-in-java-the-implementation-1</link><guid isPermaLink="true">https://duff.blog/dynamic-method-tracing-in-java-the-implementation-1</guid><dc:creator><![CDATA[Brian Duff]]></dc:creator><pubDate>Wed, 30 Sep 2020 16:56:00 GMT</pubDate><content:encoded><![CDATA[<p>In the <a target="_blank" href="https://blog.dubh.org/2020/09/dynamic-method-tracing-in-in-java.html">last blog entry</a>, I talked about the need for a tool in Java that can be configured easily to log method calls without redeploying a binary, attaching a debugger, or obtaining root in order to trace system calls. In this blog, I'll dive into some of how the tool was implemented.</p>
<p>A caveat: I knocked this tool together in my spare time one afternoon while working on a bunch of other things, so it's a bit rough around the edges. I'm also not a particularly experienced C programmer, so apologies if the code shows it. </p>
<p>However, it's already proven useful to me, and hopefully either the tool itself or the approach will be useful to others. I found a general lack of detailed information about using JVMTI when doing research.</p>
<h2 id="onload-config-capabilities-events-and-callbacks">OnLoad: Config, capabilities, events, and callbacks</h2>
<p>The main entrypoint to a native JVMTI agent is the Agent_OnLoad function. I want to do three main things when my agent is loaded:</p>
<ol>
<li>Load the configuration file so we know which breakpoints to set.</li>
<li>Let JVMTI know which capabilities the agent needs, and ensure the JVM supports them.</li>
<li>Let JVMTI know I'm interested in receiving events when classes are about to be loaded and when breakpoints occur, and register callback functions for these.</li>
</ol>
<p>The configuration file handling is a bit out of scope for this blog. I wrote a quick, dirty, and exceptionally rough yaml parser in C (yes, I know... I must have been bored). You can look at <a target="_blank" href="https://github.com/brianduff/mlogagent/blob/master/config.c">config.c</a> if you're truly interested in the sordid details. But ignoring that, here are the more interesting parts of OnLoad:</p>
<pre><code>JNIEXPORT jint JNICALL
Agent_OnLoad(JavaVM *vm, char *options, void *reserved) {
  char *option = strtok(options, ",");
  // ...
  // Load the config file using options from `option`. Also parse out an option for
  // a log file to write to.
  // ...

  fprintf(stderr, "mlogagent: Loaded agent\n");

  // Get a JVMTI env object, which is used to make function calls into JVMTI
  jvmtiEnv *env;
  assert(JVMTI_ERROR_NONE == (*vm)-&gt;GetEnv(vm, (void **)&amp;env, JVMTI_VERSION));

  // Let JVMTI know we are going to be registering breakpoints and accessing local
  // variables. If these capabilities are not supported by the JVM, then the AddCapabilities
  // function will return an error code, and our agent will terminate the VM.
  jvmtiCapabilities capabilities = { 0 };
  capabilities.can_generate_breakpoint_events = 1;
  capabilities.can_access_local_variables = 1;
  assert(JVMTI_ERROR_NONE == (*env)-&gt;AddCapabilities(env, &amp;capabilities));

  // Register callbacks for ClassPrepare and Breakpoint events. We'll dig into these
  // callback functions soon.
  jvmtiEventCallbacks callbacks = { 0 };
  callbacks.ClassPrepare = &amp;ClassPrepareCallback;
  callbacks.Breakpoint = &amp;BreakpointCallback;
  assert(JVMTI_ERROR_NONE == (*env)-&gt;SetEventCallbacks(env, &amp;callbacks, sizeof(callbacks)));

  // Tell JVMTI to enable breakpoint and class prepare events so our callbacks will be invoked.
  assert(JVMTI_ERROR_NONE == (*env)-&gt;SetEventNotificationMode(env,
    JVMTI_ENABLE, JVMTI_EVENT_BREAKPOINT, NULL));
  assert(JVMTI_ERROR_NONE == (*env)-&gt;SetEventNotificationMode(env,
    JVMTI_ENABLE, JVMTI_EVENT_CLASS_PREPARE, NULL));

  return JNI_OK;
}
</code></pre>
<h2 id="registering-method-breakpoints">Registering method breakpoints</h2>
<p>The next step is to attach breakpoints to methods we're interested in. We do this when the ClassPrepare event is fired, which happens just before the JVM loads a class. In our case, the implementation of this is a bit complicated by the fact that we have a config file that drives setting the breakpoints, but at the core, we just need to find the class we're interested in, find the method we care about, then attach a breakpoint to it. Here's a simplified version of the code:</p>
<pre><code>void JNICALL ClassPrepareCallback(jvmtiEnv *jvmti_env, JNIEnv *jni, jthread thread, jclass klass) {
  // Avoid re-entrancy, and return early if we already attached.
  if (in_prepare || attached) {
    return;
  }

  in_prepare = true;

  jclass clazz = (*jni)-&gt;FindClass(jni, classConfig-&gt;name);
  // In case we don't find the class, don't throw an exception
  (*jni)-&gt;ExceptionClear(jni);
  if (clazz != NULL) {
    attachMethodBreakpoint(jvmti_env, jni, clazz);
    (*jni)-&gt;DeleteLocalRef(jni, clazz);
    attached = true;
  }

  in_prepare = false;
}

void attachMethodBreakpoint(jvmtiEnv *jvmti_env, JNIEnv *jni, jclass clazz) {
  // Look for the method. Here,
  //    methodName is something like "someMethod"
  //    methodSignature is something like "(Lfrodo/Test$RealFile;)Ljava/lang/String;"
  jmethodID mid = (*jni)-&gt;GetMethodID(jni, clazz, methodName, methodSignature);
  if (mid != NULL) {
    // Actually set the method breakpoint
    assert(JVMTI_ERROR_NONE == (*jvmti_env)-&gt;SetBreakpoint(jvmti_env, mid, 0));
  } else {
    fprintf(stderr, "mlogagent: Can't find the method\n");
  }
}
</code></pre>
<p>The last part is to handle the method breakpoint callback when it happens. In the real program, we keep track of the methodIDs we've registered breakpoints for, so that when we get this event we can quickly determine which method triggered the breakpoint, and look up information about how to display it in the config. This is one of the more complex parts of the implementation, since it uses a bunch of JNI to generate a string representing the method parameter we're interested in. For this simplified example, I'm just going to show what the callback looks like, how to extract a parameter, and how to print the first few frames of the stack trace.</p>
<pre><code>// Called when a breakpoint is hit.
void JNICALL BreakpointCallback(jvmtiEnv *jvmti_env, JNIEnv *jni, jthread thread, jmethodID method, jlocation location) {
  // Get hold of the parameter. In this example, we just get the first parameter.
  // Which parameter is first depends on whether this is a static or instance method.
  // For instance methods, the first parameter is a synthetic parameter representing
  // the current instance of the class. So here, we get the first "real" parameter.
  int parameterPos = 1;

  jobject the_parameter;
  assert(JVMTI_ERROR_NONE == (*jvmti_env)-&gt;GetLocalObject(jvmti_env, thread, 0, parameterPos, &amp;the_parameter));

  // TODO: something that displays the parameter... See the full code for details.

  // Show a simplified stack trace.
  jvmtiFrameInfo frames[8];
  jint count;
  jvmtiError err;

  err = (*jvmti_env)-&gt;GetStackTrace(jvmti_env, thread, 0, 8, frames, &amp;count);
  if (err == JVMTI_ERROR_NONE &amp;&amp; count &gt;= 1) {
    char *methodName;
    // fout is the output log file
    fprintf(fout, "  trace: ");
    for (int i = 0; i &lt; count; i++) {
      err = (*jvmti_env)-&gt;GetMethodName(jvmti_env, frames[i].method, &amp;methodName, NULL, NULL);
      if (err == JVMTI_ERROR_NONE) {
        fprintf(fout, "&lt;- %s", methodName);
      }
    }
    fprintf(fout, "\n");
  }

  fflush(fout);
}
</code></pre>
<h2 id="compiling-and-running-the-agent">Compiling and running the agent</h2>
<p>To compile the agent, you need to add the header files from the Java runtime you're compiling for to your compiler's include path. Their location varies from system to system. On my system, I use a command something like this:</p>
<pre><code>gcc -shared \
  -I/usr/<span class="hljs-keyword">local</span>/java-runtime/impl/<span class="hljs-number">8</span>/<span class="hljs-keyword">include</span> \
  -I/usr/<span class="hljs-keyword">local</span>/java-runtime/impl/<span class="hljs-number">8</span>/<span class="hljs-keyword">include</span>/darwin \
  mlogagent.c \
  -o /tmp/libmlogagent.dylib
</code></pre><p>To use the agent in a running java program, we pass the -agentpath argument to java with the path of the agent, and any options we want to pass in:</p>
<pre><code>java -agentpath:<span class="hljs-regexp">/tmp/</span>libmlogagent.dylib=config=test.conf,file=<span class="hljs-regexp">/tmp/</span>out.txt \
   -cp classes frodo.Test
</code></pre><h2 id="summing-up">Summing up</h2>
<p>So we've seen how to write a fairly basic JVMTI agent that can log method calls. It's pretty efficient - in an example application which makes thousands of method calls a second, there's no perceptible performance impact when the agent is enabled. You can find full source for the tool on <a target="_blank" href="https://github.com/brianduff/mlogagent">github</a>.</p>
<p>I'd like to experiment more with this tool, perhaps making it more flexible and customizable. Some of the things I want to poke around with are:</p>
<ul>
<li>Rewriting it in Rust. This has advantages like me not having to reinvent a configuration file parser from scratch because I'm too lazy to pull in third party C libraries, as well as likely making it easier to maintain and extend the code in future.</li>
<li>Looking into whether it'd be easier to do the same thing with a java-based agent or JDI, and seeing whether the performance characteristics of these are different.</li>
<li>Adding much more flexibility around logging various bits of context when a breakpoint is hit. For example, it'd be useful to have the option to log fields, and all of the parameters instead of just one.</li>
</ul>
<p><strong>You can find the code for this tool in <a target="_blank" href="https://github.com/brianduff/mlogagent">github</a></strong>.</p>
]]></content:encoded></item></channel></rss>