Windows is live on Git
May 25, 2017 3:42 PM

Over the past 3 months, we have largely completed the rollout of Git/GVFS to the Windows team at Microsoft.

As a refresher, the Windows code base is approximately 3.5M files and, when checked in to a Git repo, results in a repo of about 300GB. Further, the Windows team is about 4,000 engineers and the engineering system produces 1,760 daily “lab builds” across 440 branches in addition to thousands of pull request validation builds. All 3 of the dimensions (file count, repo size and activity), independently, provide daunting scaling challenges and taken together they make it unbelievably challenging to create a great experience. Before the move to Git, in Source Depot, it was spread across 40+ depots and we had a tool to manage operations that spanned them.
posted by cgc373 (47 comments total) 8 users marked this as a favorite
 
Ars Technica with further details.

If you told me twenty years ago that Microsoft would be doing this I would have laughed in your face.
posted by Talez at 3:47 PM on May 25, 2017 [7 favorites]


Why Google Stores Billions of Lines of Code in a Single Repository

"The Google codebase includes approximately one billion files and has a history of approximately 35 million commits spanning Google's entire 18-year existence. The repository contains 86TB of data, including approximately two billion lines of code in nine million unique source files."

So, meh, yeah, the MSFT repo is pretty cool I guess.
posted by GuyZero at 3:56 PM on May 25, 2017 [5 favorites]


What I have heard of Google's build tools is insane.

But good for MSFT
posted by PMdixon at 4:05 PM on May 25, 2017 [2 favorites]


GuyZero: "So, meh, yeah, the MSFT repo is pretty cool I guess."

The amazing thing about Google's repo is that some of it is for projects they haven't even killed off yet. Truly amazing.
posted by boo_radley at 4:06 PM on May 25, 2017 [47 favorites]


So, meh, yeah, the MSFT repo is pretty cool I guess.

Google can brag about their code repository when they restore Google Reader.
posted by srboisvert at 4:09 PM on May 25, 2017 [34 favorites]


As a former TFS admin at a medium sized software firm,

440 branches

Gave me a literal stomach cramp just thinking about it.
posted by RolandOfEld at 4:24 PM on May 25, 2017 [2 favorites]


RolandOfEld: "As a former TFS admin at a medium sized software firm,

440 branches

Gave me a literal stomach cramp just thinking about it."

hello, I made the switch from TFS to git. We had ~950 projects in TFS 2012.

TFS branches are heavy-duty, server-only constructs. They are garbage for idiots in comparison to Git branches. Make the transition now.
posted by boo_radley at 4:35 PM on May 25, 2017 [2 favorites]


Oh, I'm long gone from that joint. Now I get to decipher Azure, SQL, and (more recently) Power BI odds and ends. I don't miss debugging builds and dealing with dev branch rights and releases.
posted by RolandOfEld at 4:39 PM on May 25, 2017


Is there a word for something that seems like a brag, and which is intended as a brag, but is actually kind of damning? An own-goal brag? A Pyrrhic brag? An anti-brag?

Because "oh boy our codebase is so GIANT and UNWIELDY that regular build tools can't cope and we've had to write our OWN PROPRIETARY BUILD SYSTEM" is one of those.
posted by Pyry at 4:43 PM on May 25, 2017 [12 favorites]


AFAIK Microsoft has never internally used its own version control products -- probably because they all sucked. First they used an internally-developed version of RCS, then a fork of Perforce, and now they use a virtual filesystem layer on top of git.

Last time I had to deal with Perforce at a remote site, it took a full week to sync up with the main repo at HQ, so I can see how they may have had some pain points. Let's chalk up a win for flat files and C (and some bash scripts, maybe)
posted by RobotVoodooPower at 4:52 PM on May 25, 2017 [4 favorites]


Yeah, MSFT has primarily used externally developed version control (well, Source Depot which was basically a tweaked version of Perforce), this is not really a big deal in that regard. And they've also used plenty of open source tools, also not new.

(I'm glad I got out before they switched to git, though, as I really don't like it.)
posted by thefoxgod at 4:59 PM on May 25, 2017 [1 favorite]


Bumblebrag?

I think any of bumble-, stumble- or fumble-brags would be good, depending on the exact flavour of foot the brag has placed in the bragger's mouth.
posted by Jon Mitchell at 5:03 PM on May 25, 2017 [7 favorites]


"regular build tools can't cope and we've had to write our OWN PROPRIETARY BUILD SYSTEM"
I mean, how do you think "regular build tools" get written? You used to have to PAY MONEY for compilers, and people certainly still pay a lot of money for build management systems like CircleCI or a Jenkins consultant. People even pay people to administer "proprietary" systems like Perforce that they're already paying money for! People still pay Microsoft for compilers!

Developing a file system adapter that you open-source like this (instead of a custom, git-compatible virtualized file system like Google has) seems very savvy to me.
posted by JoeBlubaugh at 5:04 PM on May 25, 2017 [2 favorites]


Several years ago Google used to run Perforce and their central server was a single machine:

"Google’s main Perforce server supports over twelve thousand users, has more than a terabyte of metadata, and performs 11-12 million commands on an average day. The server runs on a 16-core machine with 256 GB of memory, running Linux. The metadata is on solid state disk, the depot files are on network-attached storage, and the logs and journal are on local RAID."

which I think was by far the largest single server in the company.

Is there a word for something that seems like a brag, and which is intended as a brag, but is actually kind of damning? An own-goal brag? A Pyrrhic brag? An anti-brag?

Because "oh boy our codebase is so GIANT and UNWIELDY that regular build tools can't cope and we've had to write our OWN PROPRIETARY BUILD SYSTEM" is one of those.


uh, well, size is a thing and I don't think it's that crazy. AOSP developed its own tool for changelist reviews, Gerrit. Which is itself a fork of other tools, but I don't think it's damning that AOSP has merge problems that most other open-source projects don't have.
posted by GuyZero at 5:07 PM on May 25, 2017 [3 favorites]


Git branches are nothing; you can spin off and merge a bunch of them before lunchtime and it's just a normal day. It's astonishing if you've come from Perforce or TFS or whatever.
posted by Artw at 5:15 PM on May 25, 2017 [4 favorites]
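To make the "branches are nothing" point concrete, here is a toy Python model (an illustration under simplifying assumptions, not Git's actual code). It assumes a commit is an immutable record keyed by a hash and a branch is only a name pointing at one commit, which is roughly how Git treats branch refs. The names `ToyRepo` and `merge_fast_forward` are made up for this sketch.

```python
import hashlib


class ToyRepo:
    """Toy model, not Git itself: commits are immutable records keyed by a
    hash, and a branch is nothing more than a name -> commit-id pointer."""

    def __init__(self):
        self.commits = {}   # commit id -> (parent id, message)
        self.branches = {}  # branch name -> commit id
        self.head = "main"
        self.branches["main"] = self._store(None, "initial commit")

    def _store(self, parent, message):
        cid = hashlib.sha1(f"{parent}:{message}".encode()).hexdigest()
        self.commits[cid] = (parent, message)
        return cid

    def branch(self, name):
        # Creating a branch copies a single pointer; no files are duplicated.
        self.branches[name] = self.branches[self.head]

    def checkout(self, name):
        self.head = name

    def commit(self, message):
        # A new commit records its parent, then the current branch pointer moves.
        parent = self.branches[self.head]
        self.branches[self.head] = self._store(parent, message)

    def is_ancestor(self, ancestor, descendant):
        # Walk parent links back from `descendant` looking for `ancestor`.
        cid = descendant
        while cid is not None:
            if cid == ancestor:
                return True
            cid = self.commits[cid][0]
        return False

    def merge_fast_forward(self, source):
        # If the current branch has not moved since `source` split off,
        # "merging" is just moving one pointer forward.
        target = self.branches[self.head]
        incoming = self.branches[source]
        if not self.is_ancestor(target, incoming):
            raise ValueError("not a fast-forward; a real merge commit would be needed")
        self.branches[self.head] = incoming


repo = ToyRepo()
repo.branch("feature")
repo.checkout("feature")
repo.commit("add widget")
repo.checkout("main")
repo.merge_fast_forward("feature")
```

In real Git, `git branch feature` likewise just writes a new ref pointing at the current commit, which is why spinning off a branch costs almost nothing compared with server-side branch constructs.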


Pyry: "Because "oh boy our codebase is so GIANT and UNWIELDY that regular build tools can't cope and we've had to write our OWN PROPRIETARY BUILD SYSTEM" is one of those."

Eh. If you're a software engineer and your codebase is your company's value -- you make it the best possible thing. 85% of the time, standard COTS tools are good enough for anything. If your codebase literally runs everything your company does everywhere for everything all the time all day, you might think about the complexities you're forced to deal with.
posted by boo_radley at 5:23 PM on May 25, 2017 [6 favorites]


oh boy our codebase is so GIANT and UNWIELDY that regular build tools can't cope

None of the problems they describe running into really have anything to do with the structure of the code, just the quantity. And given how much stuff Windows does and supports, the fact that they have a large codebase seems unavoidable, and not at all an indictment of their developers.

I think probably the more common solution for big projects is what they used to do: split things up into a bunch of different repos, and then make a bunch of hacky infrastructure to manage cross-repo dependencies and integration and testing et cetera. It all mostly works, but it's a pain to maintain and a pain to use. And you're still using a bunch of proprietary nonsense.
posted by aubilenon at 5:24 PM on May 25, 2017 [4 favorites]


looking through related threads:

I give it about 3 years before everyone decides that some new version control system is better, and that Git just won't do anymore.
posted by thelonius at 5:38 PM on February 11, 2014 [8 favorites]


my orb malfunctioned
posted by thelonius at 5:26 PM on May 25, 2017 [9 favorites]


"If you told me twenty years ago that Microsoft would be doing this I would have laughed in your face."

Well, git didn't exist then, so you might have just said "huh?".
posted by floppyroofing at 5:30 PM on May 25, 2017


None of the problems they describe running into really have anything to do with the structure of the code, just the quantity.

You can solve almost any quantity problem with proper separation of concerns and well defined interfaces haha oh god help me
posted by The Gaffer at 5:46 PM on May 25, 2017 [10 favorites]


"If you told me twenty years ago that Microsoft would be doing this I would have laughed in your face."

Even as little as one or two years ago, I would have...

...E pur si muove commit.
posted by mystyk at 5:58 PM on May 25, 2017


None of the problems they describe running into really have anything to do with the structure of the code, just the quantity.

I really cannot agree with this statement. Having a ton of code is one thing, but having all that code in a single repository that has to be versioned / managed together is altogether a different thing. The former does not require the latter, and there's plenty of ways to split up code into reasonable modules such that while you may have gigabytes of code extant, nobody is having to deal with a single view into all that code at once (vs. looking at their small slice, plus pulling in built dependencies from an external source).
posted by tocts at 6:00 PM on May 25, 2017 [2 favorites]


The former does not require the latter, and there's plenty of ways to split up code into reasonable modules such that while you may have gigabytes of code extant, nobody is having to deal with a single view into all that code at once

Did you read the Why Google Stores Billions of Lines of Code in a Single Repository paper linked above? It's written by some people with a lot of practical experience from managing a single shared repository, and goes over the advantages and costs in detail.
posted by effbot at 6:11 PM on May 25, 2017 [6 favorites]


The whole point of Microsoft's new GVFS is that it allows for a single repo without requiring everyone to download and store everything.
posted by Harvey Kilobit at 6:11 PM on May 25, 2017 [2 favorites]
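A rough sketch of the idea behind GVFS as described above: every path in the repo is visible up front, but a file's contents are only downloaded the first time something reads it. This is a hypothetical in-memory model; `LazyWorkingTree` and the `fetch_blob` callback are invented names standing in for the real file-system virtualization layer and network download, which work very differently under the hood.

```python
from typing import Callable, Dict, List


class LazyWorkingTree:
    """Hypothetical sketch: all paths are listed immediately, but contents
    are fetched from the remote only on first read, then cached locally."""

    def __init__(self, manifest: List[str], fetch_blob: Callable[[str], bytes]):
        self.manifest = set(manifest)      # every path "appears" present
        self.fetch_blob = fetch_blob       # stand-in for a network download
        self.local: Dict[str, bytes] = {}  # only files actually read so far

    def list_files(self) -> List[str]:
        # Listing is cheap: it needs the manifest, not the file contents.
        return sorted(self.manifest)

    def read(self, path: str) -> bytes:
        if path not in self.manifest:
            raise FileNotFoundError(path)
        if path not in self.local:
            # First access: download once and keep the copy.
            self.local[path] = self.fetch_blob(path)
        return self.local[path]

    def hydrated_count(self) -> int:
        return len(self.local)


downloads: List[str] = []


def fake_fetch(path: str) -> bytes:
    # Records each simulated download so we can see what was transferred.
    downloads.append(path)
    return f"contents of {path}".encode()


tree = LazyWorkingTree(["kernel/a.c", "shell/b.c", "net/c.c"], fake_fetch)
tree.read("kernel/a.c")
tree.read("kernel/a.c")  # second read is served from the local cache
```

With millions of files, most engineers only ever touch a small slice, so the download cost tracks what they actually use rather than the full repo size.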


there's plenty of ways to split up code into reasonable modules

While I agree in principle, my experience has been that keeping dependencies isolated is truly a problem of political will, not a technical one, as such.
posted by smidgen at 6:17 PM on May 25, 2017 [3 favorites]


Oh, and for those wanting to use MSFT's fancy new "open" tech: "GVFS requires Windows 10 Creators Update or later".
posted by smidgen at 6:19 PM on May 25, 2017


Note: do not use git submodules.
posted by Artw at 6:20 PM on May 25, 2017 [8 favorites]


Did you read the Why Google Stores Billions of Lines of Code in a Single Repository paper linked above?

Yes, I did. It mostly sounds like a case study in how if you're stubborn enough and willing to dump enough money into custom tooling, you too can pretend that having a giant monolithic codebase isn't a problem.

The disadvantages section of that paper is both understating the problems and yet also still giving me hives.
posted by tocts at 6:37 PM on May 25, 2017


srboisvert: off-topic but I find NewsBlur is now better than Reader was, with the added perks of having a business model and being open source if the owner ever gets tired of it.
posted by adamsc at 6:38 PM on May 25, 2017 [2 favorites]


git /Status
posted by rhizome at 7:25 PM on May 25, 2017


Oh, and for those wanting to use MSFT's fancy new "open" tech: "GVFS requires Windows 10 Creators Update or later".

Pretty sure Git was Linux-only at first...
posted by save alive nothing that breatheth at 7:38 PM on May 25, 2017


Yeah, the point of being open source is not that you provide every possible implementation. The code is there, right? I haven't checked the license, but presumably someone could attempt to port it to another platform (although I suspect it would be a lot of work!).
posted by thefoxgod at 7:46 PM on May 25, 2017


Is it reliant on the new BASH support?
posted by Artw at 7:51 PM on May 25, 2017


AFAIK Microsoft has never internally used its own version control products

Every project I've worked on at Microsoft in the last nine years has been on TFS, including the component of Power BI I'm working on now. (In short, Microsoft is a land of contrasts?)
posted by Slothrup at 8:33 PM on May 25, 2017 [2 favorites]


In short, Microsoft is a land of contrasts

Hah, that's fair. It's a big place with lots of teams and does not have a single system to rule them all. Source Depot was pretty widely used at one point I believe, but never 100%.
posted by thefoxgod at 8:54 PM on May 25, 2017 [1 favorite]


Did You Know?

Linus Torvalds, creator of the Git version control system, also once wrote an operating system kernel.
posted by ckape at 8:57 PM on May 25, 2017 [9 favorites]


Right, it's part of systemd/GNU/Linux, if I might interject.
posted by save alive nothing that breatheth at 8:59 PM on May 25, 2017 [10 favorites]


Having a ton of code is one thing, but having all that code in a single repository that has to be versioned / managed together is altogether a different thing. The former does not require the latter, and there's plenty of ways to split up code into reasonable modules such that while you may have gigabytes of code extant, nobody is having to deal with a single view into all that code at once

Well of course you want to do that as much as possible anyway. But no matter how careful you are, you're sometimes going to have changes that cross module boundaries, and that sucks when they're in different repos. And it sucks for your integration tests. And it makes it hard to change module boundaries if your code changes over time. But, yes, it is the "standard" way to work around the limitations of version control software.
posted by aubilenon at 10:01 PM on May 25, 2017 [2 favorites]


The current vogue of micro services has inspired a dev team I know to spread their application over 20 repos. For an 8 person team.

Personally I've had my fill of projects that version submodules individually. For instance, there was this system that had 35 individually versioned Maven modules. The result was that it was hard to keep track of which combination of module versions made for a stable system.
Strangely git submodules are not the nice middle ground that you think they'd be.
So I try to keep things in one git repo as long as that's feasible given the team size.
posted by jouke at 10:18 PM on May 25, 2017 [4 favorites]
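The version-tracking headache with 35 independently versioned modules grows multiplicatively. As a back-of-the-envelope sketch, assume (hypothetically; the real release counts aren't given) that each module had only k candidate versions. The number of possible combinations is then k^35, so even k = 2 gives over 34 billion.

```python
# Assumption for illustration only: each of the 35 independently
# versioned modules has k releases that could, in principle, be mixed.
def module_combinations(modules: int, versions_per_module: int) -> int:
    """Number of distinct version combinations across all modules."""
    return versions_per_module ** modules


for k in (2, 3, 5):
    print(f"{k} versions each -> {module_combinations(35, k):,} combinations")
```

Keeping everything in one repo collapses that space to a single line of history, which is the trade-off described above.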


Git is a weird duck if you're used to other source management paradigms, but at least the poor folks at Microsoft are not still having to use Source Safe. (That is, if they ever used it to begin with. One of the worst tools I've ever had the misfortune to use and easily in the top three of my personal Worst Tools Ever list.)
posted by fifteen schnitzengruben is my limit at 10:49 PM on May 25, 2017 [3 favorites]


Did You Know?

Linus Torvalds, creator of the Git version control system, also once wrote an operating system kernel.


And... way back in 2002 he previously used a closed-source, proprietary version control system called BitKeeper...

Unfortunately, eventually the owner/developer of BitKeeper (Larry McVoy) and the Linux open-source community (Andrew Tridgell) got into a bit of a disagreement... Linus developed a replacement, and gave it a name that reflected his opinion of either... himself, Andrew or some other person involved in the flamewars...

What I find absolutely hilarious is that Git is/was supposed to be a completely distributed version control tracking system - and now everyone has made themselves dependent on a single, centralized source... GitHub.com...
posted by jkaczor at 6:56 AM on May 26, 2017 [3 favorites]


"...but at least the poor folks at Microsoft are not still having to use Source Safe."

You're giving me flashbacks. SourceSafe was horrendous. TFS is not without its flaws (and is actually pretty good for smaller-scope deployments), but next to SourceSafe damn near anything would automatically be better.

In the last decade, TFS and Git are pretty much the only source control programs I've been using. Before that, I had an employer that required SourceSafe, but I also used Subversion and Mercurial occasionally.
posted by mystyk at 7:17 AM on May 26, 2017


Some say that the most valuable thing Microsoft got from buying WebTV was WebTV's uniquely liberal license to Perforce which included full source access and let them do the SD fork and tweak it as needed.

Google could never persuade Perforce to sell them a source license so were stuck with the off the shelf version for years. P4 was not universal at Google. I know I also used SVN and Git while there. Big companies always have exceptions.
posted by w0mbat at 8:18 AM on May 26, 2017 [5 favorites]


What I find absolutely hilarious is that Git is/was supposed to be a completely distributed version control tracking system - and now everyone has made themselves dependent on a single, centralized source... GitHub.com...

That's only for the minority of Git repositories that are pushed to a server. It's highly probable that the majority, the dark matter of the Git universe, is local disk repositories created in project directories by IDEs (Xcode, for one, can do this), which never get pushed to a server.
posted by acb at 2:50 PM on May 26, 2017 [1 favorite]


GitHub is popular, but as a critique that seems pretty lazy. As acb noted, it ignores tons of private repos (and don't forget Bitbucket, GitLab, etc. for self-hosting), but there's an even more fundamental flaw: every Git repo has the full history. If your central Subversion, etc. repo goes down, you can't work at all. If GitHub goes down, you can still work locally and push to any server or other user as you like, and nothing will break when GitHub comes back online.
posted by adamsc at 3:05 PM on May 26, 2017 [3 favorites]
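The point that every clone carries the full history can be sketched with a toy model in Python. Assumptions: commits are reduced to opaque IDs, and "pushing" is modelled as copying whatever commits the other side lacks; `Clone` and `push_to` are hypothetical names, not Git commands. The sketch only shows why losing the central host doesn't lose history.

```python
from typing import Iterable, Set


class Clone:
    """Toy model: each clone holds the complete set of commit ids it knows."""

    def __init__(self, name: str, commits: Iterable[str] = ()):
        self.name = name
        self.commits: Set[str] = set(commits)

    def clone(self, name: str) -> "Clone":
        # Cloning copies the whole history, not just the latest snapshot.
        return Clone(name, self.commits)

    def commit(self, commit_id: str) -> None:
        self.commits.add(commit_id)

    def push_to(self, other: "Clone") -> Set[str]:
        # Send only what the other side is missing; return what was sent.
        missing = self.commits - other.commits
        other.commits |= missing
        return missing


hub = Clone("hub", ["c0", "c1"])
alice = hub.clone("alice")
bob = hub.clone("bob")
hub = None  # the hosted server disappears for good

alice.commit("c2")
sent = alice.push_to(bob)  # peers can still exchange work directly
```

In real Git, the equivalent is adding another clone or server as a remote with `git remote add` and pushing or fetching there.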


If GitHub goes down, you can still work locally and push to any server or other user as you like and nothing will break when GitHub comes back online.

And if GitHub never comes back online you're still okay.
posted by aubilenon at 10:39 PM on May 27, 2017 [1 favorite]


> Because "oh boy our codebase is so GIANT and UNWIELDY that regular build tools can't cope and we've had to write our OWN PROPRIETARY BUILD SYSTEM" is one of those.

I recently worked on a software product about 17 years old, and the SVN commits for every line ever written for it were still accessible. Transitioning the project to Git took a couple years of coordination. There was the non-technical problem of overcoming inertia among some managerial stakeholders. But the new-product developers, sustaining-product developers, and support staff all had working processes that couldn't afford SLA-penalized downtime just because somebody couldn't find something fast enough after the cutover.

Microsoft's core product has code that's over 35 years old in it, and every public release has to be backwards-compatible with every public release prior to it. That's an incredible volume of code and legacy to maintain; even if Windows was the sleekest software package on earth it'd still be billions of lines of code in millions of commits. If they were able to make a corporate-wide repo change in less than two years, they did pretty well.
posted by at by at 4:48 PM on May 28, 2017 [1 favorite]



