Google's Developer Infrastructure is Unmatched

As I've left Google, here's something to commemorate my time as a Googler.

Everything I write about here has already been discussed extensively on the Internet; this post is just my personal experience with these tools.

Piper and CL (Change List)

While Piper is not the only code repository at Google, it’s the biggest and most widely used one. It stores all kinds of internal code, from random scripts and Colab kernel definitions under experimental/users/<my ldap> to the SoTA transformer architecture under learning/gemini. It also has a GitHub-like web code browser, aptly named Code Search.

For major open-source projects like Android and Chrome, Google also hosts internal Git repositories. For smaller-scale open-source projects, Google developed a service called Copybara to sync between Piper and external repos like GitHub.

Piper has its own p4 interface for version control, but I still prefer the closer-to-Git, Mercurial-like fig interface. Similar to a Git branch, I need to create a new workspace in order to commit code; p4head is the CL number of the latest commit at the head of the repository. After making changes, you upload them as a Change List (CL). Internally, fig creates another new workspace with all the changes and creates a CL through the native Piper interface.

Fun fact: the command to submit a CL in fig is hg submit <cl number>, which is easily mistyped as hg sumbit <cl number>. When you do, hg actually converts the CL number into binary, sums the bits, and prints the result!
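I never dug into the implementation, but "convert to binary and sum the bits" is just a population count; here's a guess at what the Easter egg computes (the function name is mine):

```python
def sumbit(cl_number: int) -> int:
    """Convert the CL number to binary and sum its bits (a popcount)."""
    return bin(cl_number).count("1")

print(sumbit(12345))  # 12345 is 0b11000000111001, which has six 1-bits
```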

Each CL is assigned a unique, monotonically increasing number (the OCL number) at upload time, so it can be sent for review, and another number when it gets merged. This means submitted CL numbers are always monotonically increasing, which has some convenient properties:

  1. If a fix is submitted as cl/12345, I just need to sync my workspace past cl/12345 to get the fix. No more reading commit history, which is infeasible anyway given the number of CLs humans and bots submit every minute.

  2. If a high-impact bug is introduced at cl/123 and fixed at cl/456, you just need to mark [123, 456] as a bad CL range. Internal tools recognize it and prevent you from running binaries built within that range on production machines.
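The range check in point 2 is simple precisely because committed CL numbers are monotonic; a minimal sketch (the helper name and data shape are my own, not the internal tools'):

```python
# Each bad range is (first_bad_cl, last_bad_cl), inclusive on both ends.
BAD_CL_RANGES = [(123, 456), (1000, 1010)]

def is_safe_to_deploy(built_at_cl: int) -> bool:
    """Reject binaries built at any CL inside a known-bad range."""
    return not any(lo <= built_at_cl <= hi for lo, hi in BAD_CL_RANGES)
```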

Unlike Git, any further changes you make to the workspace after the initial commit are amended to the same commit (with a different hash), effectively doing Git’s squash-and-merge on the fly.

CitC (Code in the Cloud) and Cider-V

In my mind, CitC is a paradigm-shifting concept, and probably also inspired collaborative Google Docs. I can code, test, and submit code changes all from a web browser. Similar to a network file system, CitC virtually mounts the whole Piper codebase in a directory with a copy-on-write scheme, so only my changes to files are stored somewhere outside Piper. It’s easy to see that this doesn’t require a lot of dedicated resources, and the entire codebase can be hosted together with a web-based editor. It also allows collaborative editing of the same file at the same time, with some delay! I personally never tried it, as remote pair coding is very rare and nowadays people prefer bandwidth-hogging screenshare instead.
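Conceptually, the copy-on-write mount behaves like an overlay: reads fall through to the shared snapshot unless I've edited the file, and only my edits consume per-workspace storage. A toy sketch of the idea (not CitC's actual implementation):

```python
class OverlayWorkspace:
    """Toy copy-on-write view: the base snapshot is shared and read-only;
    only locally modified files take up per-workspace storage."""

    def __init__(self, base):
        self.base = base      # the whole (virtual) Piper snapshot, shared
        self.overlay = {}     # just my edits, stored outside the snapshot

    def read(self, path):
        # Prefer my local edit; otherwise fall through to the snapshot.
        return self.overlay.get(path, self.base[path])

    def write(self, path, contents):
        # Writes never touch the shared snapshot.
        self.overlay[path] = contents
```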

The hosted editor used to be Cider, an internally built editor. Luckily for me, when I joined Google in 2022, Cider-V, a modified VS Code editor, was already well into beta. So I took the familiar option and never looked back. It offers a seamless user experience: the entire codebase shows up in a fig/Piper workspace as soon as I open the Cider-V web app, unit tests run on shared servers at the click of a button, Gemini for Google is available in the Chat panel, the integrated terminal automatically connects to my Cloudtop, and creating a CL is just as easy.

Blaze BUILD and Hermetic Python

Blaze, open sourced as Bazel, is our internal build system that supports every language Google uses, from C++ and Java to Python. It’s sometimes annoying to write the build rules, but it brings a big benefit: hermetic builds. Blaze traverses the dependency graph, builds files from every language in their intended ways, and collects all artifacts into a single file (probably in a zip or archive format). Of course, it detects cyclic dependencies. Conceptually, it’s very similar to Java’s uber JAR, but extended to all languages. Even Python is built this way, so no more virtual environments or Docker images.

Want to use open-source code? Since builds are hermetic to Piper, there are teams dedicated to importing popular open-source projects (e.g. numpy) into Piper under the /third_party/ directory and making sure nothing breaks after each update. The build rules recognize the third_party prefix and build the hermetic Python executable so that importing those libraries works the same as it does for non-Googlers.
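In open-source Bazel, the build rule for a Python binary with a third-party dependency looks roughly like this; the target and path names below are illustrative, not Google's actual ones:

```python
# BUILD file (Starlark). blaze/bazel builds my_tool plus everything it
# depends on, then packs the artifacts into one self-contained executable.
py_binary(
    name = "my_tool",
    srcs = ["my_tool.py"],
    deps = [
        "//third_party/py/numpy",   # the imported copy of numpy
        "//myproject/lib:helpers",  # a first-party library target
    ],
)
```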

With this kind of stringent dependency management, Google can index every function in supported languages and link it to its call sites, so I can quickly and precisely navigate between function definitions and calls in both Cider-V and Code Search. It just works, frictionlessly!

Borg and XManager

Since all machines at Google run the same gLinux distribution and contain the same dependencies, any executable can run on any machine, bare metal. Borg is like Kubernetes, but it manages programs running directly on bare-metal machines, no hypervisor needed.

XManager was originally developed by DeepMind to manage training runs and TPU allocations. Its graphical interface is much better than Borg’s ancient look. Now it’s a platform that manages dynamic TPU allocations (think AWS Spot Instances) and hosts and manages training runs. Each training run is uniquely identified by an XID and can have multiple worker units, some using TPUs and others using only CPUs.

Logs can be viewed and searched in an interface very similar to Google Cloud’s Logs Explorer. It’s fast and powerful, supporting filters by log level, time range, worker ID, attempt ID, and fuzzy content match.

Debugging

While most of my debugging is still done the traditional way, with logs, sometimes it’s more convenient to use pdb to inspect runtime state, but that only works if my code runs on the Cloudtop. Some code, particularly code intended to run on TPU-backed servers, has to run remotely.

Luckily, Google has g3pdb, a remote Python debugger with a web interface. I just set a breakpoint in code and launch the program. When the breakpoint is reached, I get an email notification with a link to the debugger’s web interface. Just be careful with multi-threaded programs: since each breakpoint hit triggers an email, my carelessness once set off a bombardment of over 2,000 such notifications.
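A cheap way to avoid that bombardment is to guard the breakpoint so only one thread can ever hit it; a sketch with plain pdb (g3pdb's actual API may differ, and the helper name is mine):

```python
import threading

def should_break() -> bool:
    """Gate a breakpoint to the main thread, so a code path executed by
    many worker threads doesn't fire one notification per thread."""
    return threading.current_thread() is threading.main_thread()

# At the point of interest:
# if should_break():
#     import pdb; pdb.set_trace()  # with g3pdb, this would email me a link
```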

One other interesting note: when a Google C++ program crashes, it not only prints a stack trace but also a link to a tool called Coroner, preloaded with the crash dump. I don’t know how they did it, as the C programs I write only crash with “Segmentation fault (core dumped)”.
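I still don't know Coroner's mechanics, but Python's stdlib faulthandler module gives a taste of the same idea: it installs handlers for fatal signals so a crash dumps a traceback for every thread instead of a bare "Segmentation fault":

```python
import faulthandler

# Install handlers for SIGSEGV, SIGFPE, SIGABRT, SIGBUS, and SIGILL;
# on a fatal signal, Python tracebacks for all threads go to stderr.
faulthandler.enable()

print(faulthandler.is_enabled())  # True once the handlers are installed
```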

Testing

Testing is in Google’s core engineering culture, so we have multiple testing infrastructure available:

  • Presubmit runs the directly affected unit tests.
  • TAP (Test Automation Platform) presubmit discovers dependent targets in other directories and runs their unit tests, too. If a test becomes flaky, Presubmit Advisor can automatically ignore it. You can also manually put a CL on a TAP train, which runs in an infinite loop, grouping multiple CLs together and running every unit test in the codebase to make sure nothing breaks.
  • Guitar runs longer-running integration tests, configured in each directory’s blueprint files.

It seems that Google’s test cases not only outputs regularly to stdout, but also outputs a parsable XML file that a web service can read. This is true even for tests running locally on the developer instance (I used a virtual Cloudtop). The web service can present test results nicely, breaking the test assertions outputs into individual test cases and present all test run logs in a separate tab. It would also aggregate test statistics for the presubmit advisor.

Profiling

Code performance is at the heart of Google as well, so profiling is as easy as the click of a button. Somehow Google managed to enable both CPU profiling (pprof) and TPU profiling (Xprof) transparently for all binaries, presumably with minimal performance hit. At a high level, I just need to identify the Borg job I want to profile, click a button, specify the duration to profile, and wait. Profiling results are displayed as a flame graph for pprof and a trace timeline for Xprof, all with complete symbols.
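Always-on, fleet-wide profiling is hard to replicate outside, but the closest stock-Python analogue to pprof's CPU view is the stdlib cProfile module, which also reports results with full symbols:

```python
import cProfile
import io
import pstats

def busy() -> int:
    """A stand-in workload to profile."""
    return sum(i * i for i in range(100_000))

profiler = cProfile.Profile()
profiler.enable()   # start recording, roughly "click the button"
busy()
profiler.disable()  # stop after the duration of interest

out = io.StringIO()
pstats.Stats(profiler, stream=out).sort_stats("cumulative").print_stats(5)
print(out.getvalue())  # top functions by cumulative time, with symbols
```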

Licensed under CC BY-NC-SA 4.0