Sourcegraph plans to index the entire open source web

Sourcegraph is expanding its universal code search platform to the cloud and, in the process, is indexing millions of public repositories from GitHub and GitLab so anyone can search them. The launch comes hot on the heels of a $125 million series D funding round that valued the company at a hefty $2.6 billion.

“We’re launching Sourcegraph.com as a full-fledged product for searching the open source universe,” Sourcegraph cofounder and CTO Beyang Liu told VentureBeat.

Big code problem

Founded in 2013, Sourcegraph set out to "tackle the big code problem" with a platform that addresses the growing volume and variety of source code most businesses must manage across their projects. With every company now essentially a software company, all of them work with code to some degree. But as these codebases grow and more repositories and developer tools are thrown into the giant coding cauldron, it becomes trickier to manage everything and harder for developers to meet sprint deadlines.

To address this challenge, Sourcegraph combines the various strands that make up a modern developer operations (DevOps) stack, spanning repositories, programming languages, file formats, editors, and more. Through Sourcegraph, developers can find and fix things more quickly, figure out how to use a particular function, establish what impact changing a piece of code will have on dependencies, automate large-scale refactors, and more.

Above: Sourcegraph: Large-scale refactor with automated “batch change”

So far, Sourcegraph customers such as Amazon, Cloudflare, Uber, and PayPal have had to run self-hosted Sourcegraph instances. But as part of its mission to index the entire open source web and make it searchable, the San Francisco-based company is also ushering the business side of its operations into the hosted cloud era.

While this will no doubt appeal to startups and individual coders, given that the cloud makes it easier to collaborate and search for repositories, it will also open Sourcegraph’s target market to a broader range of enterprise customers who prefer a cloud product.

The company hasn’t given a specific date for this shift, but it said today’s announcement sets the wheels in motion for a “bigger launch” this fall that will bring Sourcegraph “to a new batch of companies.”

SaaS-y

Sourcegraph's new portal is a search engine for code that lets anyone find and pore over millions of open source projects, as well as their own private code, for free. The ability to add private repositories to Sourcegraph's cloud wasn't previously available to the public. Sourcegraph will also charge companies to upload their private repositories so internal developers can search them from their browser.

“This is a significant move for us as a company because it signals our shift to a SaaS business model,” Liu said.

Until now, Sourcegraph.com was "basically a great big demo of Sourcegraph Enterprise," according to Liu, meaning there was no way for users to add their own public or private repositories. "The search index was big by internal codebase standards but small compared to the overall volume of interesting open source [projects]," he said.

Though the public code search interface has been live for some time already as a proof of concept, for today’s official launch Sourcegraph has indexed the top 1 million repositories on GitHub and roughly 12,000 from GitLab. By the end of the year, it plans to push the total figure to more than 5 million — every GitHub and GitLab repository with more than one star.

“We’re prioritizing by quality because when you’re searching over code, you care about finding the best function or best usage example, not just some random code snippet that might contain bugs,” Liu explained.

Sourcegraph will also include prominent open source projects that aren’t on either GitHub or GitLab, and developers will be able to manually add any repository themselves, regardless of its star rating.
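
To make the selection rule concrete, here is a minimal, hypothetical sketch of how a crawler might order repositories for indexing under the criteria Liu describes: keep repositories with more than one star, always include manually added ones regardless of stars, and index the most-starred first. The Repo type and the indexing_queue function are illustrative assumptions, not Sourcegraph's actual pipeline.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional


@dataclass
class Repo:
    """Hypothetical record for a repository discovered on a code host."""
    host: str                     # e.g. "github" or "gitlab"
    name: str                     # e.g. "owner/project"
    stars: int
    manually_added: bool = False  # a user explicitly asked for it to be indexed


def indexing_queue(repos: Iterable[Repo], limit: Optional[int] = None) -> List[Repo]:
    """Order repositories for indexing, most-starred first (hypothetical policy).

    A repo is eligible if it has more than one star, or if a user added it
    manually, in which case its star count is ignored for eligibility.
    """
    eligible = [r for r in repos if r.stars > 1 or r.manually_added]
    # Highest star count first; host and name break ties deterministically.
    ranked = sorted(eligible, key=lambda r: (-r.stars, r.host, r.name))
    # An optional cap approximates a "top N repositories" cutoff.
    return ranked if limit is None else ranked[:limit]
```

Under this sketch, a manually added repository with zero stars is kept but sorts behind every repository with a higher star count, while an unstarred or single-star repository that nobody added is skipped.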

“Google for code”

While code is already searchable through the code hosts themselves, Liu likens the status quo to web search in the days of AltaVista.

“What we’re building is more like a Google for code,” Liu explained. “Sourcegraph is obviously quite different from Google Search, because code is a very different form of data. But it’s similar in that we’re solving the search problem as a first-class citizen — we’ve invested in deep technology that enables us to build a much better user experience. And as a consequence, developers who use Sourcegraph find themselves searching over code an order of magnitude more than when they were just using their code host’s search functionality.”

Pooling GitHub and GitLab will likely cover the lion’s share of “worthwhile” open source projects and make them searchable through a single interface, saving developers from having to visit different channels and interfaces to find what they’re looking for.

“We see this all the time with our customers that have multiple code hosts — one of the big draws of Sourcegraph is it’s intuitive and everything is accessible in one place,” Liu explained. “Now we can have all the open source discoverable in one place too.”
