Developer experience newsletter
Welcome to the developer experience newsletter! This is a newsletter prepared by the DevX team to highlight contributions and updates to Sourcegraph's developer experience, which is an area the DevX team focuses on but is owned by everyone.
To have your updates highlighted here, please tag your PR or issue with the dx-announce label! If you have questions or feedback, feel free to reach out in #dev-experience or in our discussions as well.
To learn more about components of Sourcegraph's developer experience, check out the developer documentation.
Sept 19th, 2022
Welcome to another iteration of the Developer Experience newsletter! As a reminder, you can check out previous iterations of the newsletter in the newsletter archive.
Tracing with OpenTelemetry
OpenTelemetry is now the default tracing export mechanism in Sourcegraph.
This means that traces now export to an OpenTelemetry Collector instance, which can then easily be configured to export data to a variety of backends - for example, in s2 we are currently exploring exporting to Honeycomb, Google Cloud Traces, and Jaeger to assess the available options.
OpenTracing API calls are bridged to OpenTelemetry automatically, though we strongly recommend that everyone migrate to OpenTelemetry APIs (either internal/trace or go.opentelemetry.io/otel), which bring better API ergonomics and improved usage of context.Context. We've added a linter that forbids new imports of OpenTracing APIs, and added deprecation notices on internal APIs that should no longer be used.
Additionally, the web app can now be instrumented with OpenTelemetry as well, with traces being sent to the frontend service, which proxies them to the deployed OpenTelemetry Collector.
To get started with testing out OpenTelemetry locally, refer to our OpenTelemetry development docs and our tracing for site admins guidance.
If you are interested in learning more about OpenTelemetry in general and the specifics of the engineering work required to make this happen, check out this recording of a DevX team Q&A about OpenTelemetry!
Frontend news
esbuild for faster frontend builds: Shoutout to Nick Snyder, who took it upon himself to get esbuild into a usable state for Sourcegraph. It doesn't work for everything yet, but a few people have reported a marked improvement in sg start startup time! You can enable it by running DEV_WEB_BUILDER=esbuild sg start. For more information on esbuild, please see the following docs.
Documentation for analyzing the bundlesize check failure: With more contributions focused on the core workflow improvements, the Frontend Platform team noticed an increased number of bundlesize check failures, and often the reason for it is not apparent. Valery added a step-by-step guide for debugging root causes which should help teams to triage this failure. The Frontend Platform team will be looking into automating these steps entirely next quarter!
sg goodies
Finding builds or logs by commit: Ever wanted to look at the build of a particular commit? Not in the mood to go through all the pages of Buildkite? You don't have to anymore! You can now pass a --commit flag to sg ci status and it will do all the detective work for you. sg ci logs also accepts the --commit flag, so you can now easily look at the logs of a build for a particular commit too!
Build annotations, in your terminal!: Ever wanted to check the status of your build in your terminal, but couldn't see those nice little dialogs (annotations) at the top of your build showing the test that failed and other helpful links? Well, those days are gone! When you check the status of your build with sg ci status, it will now also print any annotations present on your build!
Inspect main branch tags with sg ops inspect-tag: We've added a new subcommand named inspect-tag which allows you to inspect main branch tags. For example, you can now inspect the image with sg ops inspect-tag index.docker.io/sourcegraph/cadvisor:159625_2022-07-11_225c8ae162cc@sha256:foobar or get the build number with sg ops inspect-tag -p build 159625_2022-07-11_225c8ae162cc. For more examples and other options, see sg ops inspect-tag --help.
Update images in Docker Compose manifests with sg ops update-images: update-images has been updated and can now update Docker Compose manifests with sg ops update-images -k compose. With compose entering the fold, sg ops update-images is now able to update images in three different formats, namely k8s, helm and compose.
Commands in sg.config.overwrite.yml should no longer cause as much headache: Thorsten has fixed an issue that many have felt the pain of! We've all added a custom command in sg.config.overwrite.yml only for go run generate to come along and ruin our dreams. Thorsten, having been bitten by this one too many times, landed a fix by adding a -disable-overwrite flag to sg, which is passed to sg when go run generate runs to generate the reference.md file. Due to the nature of the fix, there is nothing for you to do!
Multi-user auth testing just got a whole lot easier: Keegan recently added an http-header auth-proxy that creates a few users and exposes each user on a different port locally. By accessing Sourcegraph through these ports you are authenticated as that user. To use it, run go run dev/internal/cmd/auth-proxy-http-header.go. Words don't do it justice, so below is some output of it in action! For more information on how authentication happens, please see the docs.
go run auth-proxy-http-header.go
https://docs.sourcegraph.com/admin/auth#http-authentication-proxies
"auth.providers": [
{
"type": "http-header",
"usernameHeader": "X-Forwarded-User",
"emailHeader": "X-Forwarded-Email"
}
]
Visit http://127.0.0.1:10810 for william william@sourcegraph.com
Visit http://127.0.0.1:10811 for user1 william+user1@sourcegraph.com
Visit http://127.0.0.1:10812 for user2 william+user2@sourcegraph.com
Visit http://127.0.0.1:10813 for user3 william+user3@sourcegraph.com
Visit http://127.0.0.1:10814 for user4 william+user4@sourcegraph.com
Visit http://127.0.0.1:10815 for user5 william+user5@sourcegraph.com
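The idea behind such a proxy can be sketched in a few lines of Go using only the standard library. This is not the code of auth-proxy-http-header.go; it is a minimal illustration, assuming the header names from the config shown above, of a reverse proxy that stamps every request with a fixed user. `newUserProxy` and `fetchAs` are hypothetical helper names, and the backend is a stand-in server rather than Sourcegraph.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
	"net/http/httputil"
	"net/url"
)

// newUserProxy returns a reverse proxy that forwards every request to target
// and marks it as coming from the given user.
func newUserProxy(target *url.URL, username, email string) *httputil.ReverseProxy {
	proxy := httputil.NewSingleHostReverseProxy(target)
	director := proxy.Director
	proxy.Director = func(r *http.Request) {
		director(r)
		// Set (not Add) so that a client-supplied header cannot win.
		r.Header.Set("X-Forwarded-User", username)
		r.Header.Set("X-Forwarded-Email", email)
	}
	return proxy
}

// fetchAs starts a stand-in backend plus a proxy for one user, sends a
// request through the proxy, and returns what the backend saw.
func fetchAs(username, email string) (string, error) {
	backend := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		fmt.Fprintf(w, "%s <%s>", r.Header.Get("X-Forwarded-User"), r.Header.Get("X-Forwarded-Email"))
	}))
	defer backend.Close()

	target, err := url.Parse(backend.URL)
	if err != nil {
		return "", err
	}
	front := httptest.NewServer(newUserProxy(target, username, email))
	defer front.Close()

	resp, err := http.Get(front.URL)
	if err != nil {
		return "", err
	}
	defer resp.Body.Close()
	body, err := io.ReadAll(resp.Body)
	return string(body), err
}

func main() {
	seen, err := fetchAs("user1", "william+user1@sourcegraph.com")
	if err != nil {
		panic(err)
	}
	fmt.Println(seen)
}
```

Running one such proxy per port, each with a different user, gives the one-port-per-user behavior shown in the output above.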
A complete set of commands to run web application Puppeteer tests: Have you ever wondered how to debug a CI client flake locally? Always unsure of what environment variables to set to get things right? sg has got your back! Valery updated the respective commands, which now link directly to the up-to-date documentation:
sg test web-e2e
sg test web-regression
sg test web-integration
sg test web-integration:debug PATH_TO_THE_TEST_FILE_TO_DEBUG
CI improvements
Go to the Grafana logs of your build straight from your build: Previously, if you wanted to see the logs of your build you had to navigate to http://sourcegraph.grafana.net and wield the dark arts of writing a LogQL query yourself. We've updated annotations on builds to have an additional link named "View Grafana logs" which will take you directly to Grafana with a prefilled LogQL query for your particular build. One small step towards helping you diagnose your build failures faster!
Flaky tests happen, which is why the docs have a specific section about how to deal with them. In a nutshell, disable them on sight and notify the owning team so they can fix them. We want to say thank you to those who took a few minutes out of their day to improve the CI experience for everyone else. Special thanks to Thorsten Ball, who disabled most of them on his own! And thanks to Camden Cheek and Alex Ostrikov, who are following closely.
All builds on the main branch are now faster by about 6 to 8 minutes. This is achieved by caching the client bundle build in the job that builds the server container that is later used to run e2e tests.
The caching mechanism is disabled on releases, to be 100% sure we are shipping the right client bundle. When the cache is used, an annotation is displayed on the build, making it explicit when the client bundle was cached, so we can easily see if it should have been invalidated. Those two extra precautions are taken because the caching is done externally and does not rely on any client tooling, so invalidation depends on how careful we are not to miss files that could change the build result.
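The invalidation risk described above is easy to see in a content-hash cache key. The Go sketch below is an assumption about how such a key could be derived, not a description of the actual CI implementation: `cacheKey` is a hypothetical function that hashes a chosen set of input files, and any file left out of that set can change without changing the key.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"sort"
)

// cacheKey derives a deterministic key from the inputs that are believed to
// affect the build. files maps a path to its contents.
func cacheKey(files map[string][]byte) string {
	paths := make([]string, 0, len(files))
	for p := range files {
		paths = append(paths, p)
	}
	sort.Strings(paths) // map iteration order is random, so sort first

	h := sha256.New()
	for _, p := range paths {
		// Length-prefix each field so ("ab","c") and ("a","bc") differ.
		fmt.Fprintf(h, "%d:%s%d:", len(p), p, len(files[p]))
		h.Write(files[p])
	}
	return hex.EncodeToString(h.Sum(nil))
}

func main() {
	tracked := map[string][]byte{
		"package.json": []byte(`{"name":"web"}`),
		"src/index.ts": []byte(`console.log("hi")`),
	}
	before := cacheKey(tracked)

	// A tracked file changes: the key changes and the cache is invalidated.
	tracked["src/index.ts"] = []byte(`console.log("bye")`)
	fmt.Println("tracked change invalidates:", before != cacheKey(tracked))

	// A file that was never added to the tracked set can change freely
	// without affecting the key, which is the risk described above.
}
```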
Stability improvement for frontend steps: npm has become increasingly unstable over the past weeks, which caused an increase in client flakes when fetching packages that our code depends on. The CI now wraps yarn install in a retrying loop to mitigate those. Oldest trick in the book!
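For readers who have not seen the trick, a retry loop can be sketched in Go as follows. The actual CI step is most likely a shell loop around yarn install; this sketch, with the hypothetical helper `retry`, only illustrates the pattern, and the failing function stands in for a flaky package fetch.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

var errFlaky = errors.New("registry timed out")

// retry runs fn up to attempts times, sleeping delay between failed tries.
// It returns nil on the first success, or the last error wrapped.
func retry(attempts int, delay time.Duration, fn func() error) error {
	if attempts < 1 {
		attempts = 1
	}
	var err error
	for i := 1; i <= attempts; i++ {
		if err = fn(); err == nil {
			return nil
		}
		if i < attempts {
			time.Sleep(delay)
		}
	}
	return fmt.Errorf("giving up after %d attempts: %w", attempts, err)
}

func main() {
	calls := 0
	// Stand-in for running `yarn install`: fails twice, then succeeds.
	install := func() error {
		calls++
		if calls < 3 {
			return errFlaky
		}
		return nil
	}
	err := retry(5, 10*time.Millisecond, install)
	fmt.Println("calls:", calls, "err:", err)
}
```

In a real step, fn would run the command (for example with os/exec) and return its error.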
Build notifications: We've rolled out new build notifications! What was wrong with the old ones, you might ask? Well, they just took you to the build and you were on your own from then on. With the new build notifications, we show which job failed on your build and provide a link that takes you straight to the job's output! If that wasn't enough, we've also added a way for you to see all the logs of your build in Grafana. All of this from the comfort of Slack. We've aimed to make the notifications more actionable, and since we're now in control of the notifications we aim to make more improvements.
More annotations!: We've added a custom Mocha reporter which, upon any E2E or QA test failure, generates an annotation on Buildkite with the relevant failures. Now you don't have to doom scroll through a bunch of lines just to find the failures! The default Buildkite failure Slack messages are not the most actionable ones, so we have decided to replace them with our in-house notifications, which we have total control over.
S2 news
Sentry: Frontend and backend errors are now available in Sentry on the S2 project. It has a much lower volume than dotcom and reflects more accurately what we can expect to see in our customers' deployments. Therefore, it's a prime candidate to monitor your own domain and to create alerts for issues relevant to your team, thanks to the scope attribute. You can view the Loom recording to refresh your memory about logging scopes.
Continuous deployments are live: every hour, S2 is updated with the latest known green commit from the main branch.
Insights
Now that S2 is our default instance, all DevX insights have been ported over there. While a few of those relate mostly to what the DevX team is doing (for example the logging migration), other insights may be quite interesting to take a look at, such as the insight tracking the evolution of tests in the codebase.
June 24, 2022
Welcome to another iteration of the Developer Experience newsletter! As a reminder, you can check out previous iterations of the newsletter in the newsletter archive.
New guides
Logging: We've made some updates to our logging guidance around the new logging library to include more concrete suggestions and examples around:
- How to use scopes
- Creating sub-loggers (e.g. those with traces and fields)
- Writing log messages
You can find the new docs in How to add logging!
Investigating flakes in CI: Have you ever merged a PR, got pinged in #buildkite-main, and thought "gosh this test failure has nothing to do with my changes"? Well, with a few quick steps you can easily determine if you've been hit with a frequently flaking test that should be disabled ASAP, and contribute to keeping our pipelines healthy! Check out our new Loom demo to learn more:
More details are available in the handbook: Grafana Cloud - CI logs, and if you have any questions please reach out in #dev-experience!
Observability features
Sentry errors: Errors are now automatically reported to Sentry from log entries at warning level and above from sourcegraph/log loggers that include an error field, which can be added using the log.Error or log.NamedError field constructors, for example:
logger.Error("something terrible happened", log.Error(err))
The reported Sentry event includes helpful information like the logger scope, log message, log fields, and more! Check out this demo to see it in action:
Because this raises the number of errors being reported, we're experimenting with sampling the errors, which both keeps us from filling our quota too quickly and ensures that we're not spending too many resources on the reporting itself.
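As a rough illustration of what sampling means here, the Go sketch below keeps one event out of every N. The newsletter does not say which sampling strategy is used, so the fixed-rate approach and the `sampler` type are assumptions made purely for illustration.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// sampler keeps one event out of every `every` events. It is safe for
// concurrent use because the counter is updated atomically.
type sampler struct {
	every uint64
	seen  uint64
}

func newSampler(every uint64) *sampler {
	if every == 0 {
		every = 1 // keep everything rather than divide by zero
	}
	return &sampler{every: every}
}

// keep reports whether the current event should be reported.
func (s *sampler) keep() bool {
	n := atomic.AddUint64(&s.seen, 1)
	return (n-1)%s.every == 0
}

func main() {
	s := newSampler(10)
	reported := 0
	for i := 0; i < 100; i++ {
		if s.keep() {
			reported++
		}
	}
	fmt.Println("reported", reported, "of 100 errors")
}
```

A fixed rate trades completeness for a predictable reporting cost; rarer errors may need a different strategy to avoid being dropped.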
Under the hood, it uses a new "log sinks" mechanism that can easily be extended to accommodate new backends in the future - you can learn more about it in the package docs! As a byproduct, the codebase no longer has any explicit reference to Sentry, apart from the optional Sentry sink itself.
DataDog sunset: Note that the DataDog trial is being ended, and the plan is to sunset DataDog integrations within the codebase.
CLA bot automation
The process of ensuring an external contributor has signed Sourcegraph's Contributor License Agreement (CLA) is now fully automated. Once a contributor has signed the CLA form, their provided GitHub handle will now automatically be synchronized to the list of approved users.
You can learn more in the clabot-config repository and the accepting external contributions guide.
sg goodies
sg [cmd...] --feedback: Love (or hate) sg? Want to make a suggestion or found that a command was acting strange? In addition to the --feedback flag on all commands, we've also added a feedback subcommand enabling you to give us feedback right from the comfort of your terminal! When you opt to provide feedback, a new discussion will be created in the dev experience category on the Sourcegraph repository.
Generated sg documentation: Because maintaining documentation is always hard, what's better than automation to make sure it stays up to date? The sg reference is now automatically generated.
sg setup and sg lint rewrite: We have rewritten sg lint and sg setup to use a shared internal framework that can be used to easily and flexibly build powerful "check-and-fix" tasks. This enables features like:
- sg setup -check to quickly generate a report of what you have set up
- sg setup -fix to fix all issues with your dev setup in one go
- sg lint -fix to automatically try and fix your lint issues (only supported by a few linters at the moment, but more can easily be added!)
- Continuous integration testing for sg setup
dev/schemadoc has been removed and is replaced by sg migration ... commands. Example: sg migration describe -db codeintel --format=psql -force -out internal/database/schema.codeintel.md to generate the schema for the codeintel db thanks to #35905.
./dev/generate.sh has been deprecated in favour of sg generate go (and its alias sg gen go for teammates in a hurry).
sg generate go now displays a progress bar to indicate its status and is also noticeably faster thanks to #35807, #36742 and #36681.
Named migration files: sg migration now supports migrations with migration names embedded in the migration's directory, and all newly created migrations from sg migration add will now have migration names included, which will make the migrations directory a bit easier to browse. #37244
Readability improvements: sg start logs are now easier to read, as the command name text is now justified. To ensure it's still readable in small terminals, a few of them have been shortened: the enterprise- prefix is now implicit, whereas an oss- prefix has been introduced.
sg analytics: The DevX team is experimenting with collecting analytics on sg usage and issues! For now this is a manual process, so if you'd like to contribute your data to our explorations, you can submit it with sg analytics submit [username], or check out what data has been collected with sg analytics view!
ADRs ❤️ sg: You will soon be able to list, search, view, and create Architecture Decision Records directly from sg! #37718
CI improvements
- The linter job that runs on every build now infers which linter tasks need to run depending on the changes (except on main, where it runs everything), saving some time on pull requests thanks to #35331.
- When you retry the sg lint step, the verbose flag will be added, allowing you to see more of what is going on.
- You can now force the run of tests that are executed when your PR is ready for review by specifying [ready-for-review] in your commit message. Gone are the days of flipping your PR between draft and ready for review!
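The first item above, inferring which linters to run from the changed files, can be sketched as a simple lookup. The mapping from file extensions to lint targets below is invented for illustration, as is the `lintTargets` function; the real logic introduced in #35331 may look quite different.

```go
package main

import (
	"fmt"
	"path/filepath"
	"sort"
)

// targetsByExt is a made-up mapping from file extension to lint target.
var targetsByExt = map[string]string{
	".go":  "go",
	".ts":  "client",
	".tsx": "client",
	".md":  "docs",
	".sh":  "shell",
}

// lintTargets returns the sorted lint targets to run. On main every target
// runs; elsewhere only targets matching the changed files are selected.
func lintTargets(changedFiles []string, onMain bool) []string {
	selected := map[string]bool{}
	if onMain {
		for _, t := range targetsByExt {
			selected[t] = true
		}
	} else {
		for _, f := range changedFiles {
			if t, ok := targetsByExt[filepath.Ext(f)]; ok {
				selected[t] = true
			}
		}
	}
	targets := make([]string, 0, len(selected))
	for t := range selected {
		targets = append(targets, t)
	}
	sort.Strings(targets)
	return targets
}

func main() {
	changed := []string{"cmd/frontend/main.go", "README.md"}
	fmt.Println("pull request:", lintTargets(changed, false))
	fmt.Println("main:", lintTargets(changed, true))
}
```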
Tech Radar
You can browse this month's tech radar to get a bird's-eye view of the Sourcegraph tech stack! Check this guide to learn how to update it with your team's initiatives.
DevOps and DevX team merge
As you may have heard, the Cloud DevOps team is splitting up into a new Cloud team, with the remaining teammates merging into the DevX team - so we're happy to welcome 3 new teammates to the DevX team!
A consequence of this change will be a shift in ownership of various domains - this is still a work in progress, but you can see an overview in the Cloud DevOps handoff plan. Broadly speaking, the main changes are that the DevX team will soon own and lead initiatives on the following fronts:
- observability (including internal tooling and external services like Grafana Cloud, Sentry, etc)
- the operation of sourcegraph.com
- misc. ops-related support
We will have more to share soon as the dust settles!
May 20, 2022
Welcome to another iteration of the Developer Experience newsletter! It has been quite a while since the last newsletter, so this edition will focus on more recent changes and highlights. As a reminder, you can check out previous iterations of the newsletter in the newsletter archive.
Logs, logs, logs
A brand new logging package is now available in github.com/sourcegraph/log. This library outputs an OpenTelemetry-compliant log format in JSON mode, paving the way towards enabling customers to more easily ingest and leverage logs, and also offers a performant, strongly-typed interface for providing log fields. The library also encodes a number of best practices:
- No global loggers - it is no longer possible to instantiate a logger at compile time, and users should hold their own references to loggers, so that fields can be attached and log output can be more useful with additional context, and pass Logger instances to components as required.
- Loggers are attached to scopes - this helps orient log entries within a larger codebase. For example, when creating a GitHub V3 client for an auth provider to make requests, one might use log.Scoped("provider.github.v3", "provider github client").
This library will also be the target of many observability improvements going forward, such as automated error reporting, improved test output, and more. To learn more, check out the new How to add logging guide.
We've also extended the existing internal/observation package, which aims to provide all-in-one logging, tracing, and monitoring primitives, to integrate logging throughout all levels of an observation.
This enables you to easily write logs that include helpful context like traces, metadata, observation context, and more.
To learn more, check out How to add observability.
The DevX team has been collaborating with teams to help migrations to the new logging library - we encourage everyone to start incrementally migrating their existing logging, and to reach out to #dev-experience for feedback and questions!
sg experience ✨
The sg experience has been a major focus for the DevX team and we have been working towards a variety of improvements for both users and contributors of sg!
First up, usage improvements:
- sg now supports autocompletions that you can trigger using the Tab key to help you type out long command names and flags faster, and to help you learn sg commands faster! To get started, make sure you've run sg setup. (#33817)
- Many sg flags now have short aliases (such as sg run -d enterprise-frontend), and commands can declare shorter aliases too (such as sg ci st) to save you some extra keystrokes.
- Misspelled a command? sg will now prompt you with some suggestions! (#33943)
- Help text is much improved, with sg help now rendering commands by category.
- sg lint has seen a variety of improvements, and now powers all linters that we run in CI, which means you can easily replicate linter runs locally for debugging, and developers can customize linter output with much more granularity.
- sg's autoupdate mechanism has gone through a number of iterations and should now be reliably auto-updating your sg installation seamlessly whenever you run sg.
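Suggestions for misspelled commands are commonly built on a string-similarity measure. The Go sketch below uses Levenshtein edit distance as one such measure; it is an assumption for illustration only, since the newsletter does not say which algorithm sg uses, and `levenshtein` and `suggest` are hypothetical helpers.

```go
package main

import (
	"fmt"
	"sort"
)

func min3(a, b, c int) int {
	if b < a {
		a = b
	}
	if c < a {
		a = c
	}
	return a
}

// levenshtein returns the number of single-byte insertions, deletions, and
// substitutions needed to turn a into b (adequate for ASCII command names).
func levenshtein(a, b string) int {
	prev := make([]int, len(b)+1)
	for j := range prev {
		prev[j] = j
	}
	for i := 1; i <= len(a); i++ {
		cur := make([]int, len(b)+1)
		cur[0] = i
		for j := 1; j <= len(b); j++ {
			cost := 1
			if a[i-1] == b[j-1] {
				cost = 0
			}
			cur[j] = min3(prev[j]+1, cur[j-1]+1, prev[j-1]+cost)
		}
		prev = cur
	}
	return prev[len(b)]
}

// suggest returns the commands within maxDistance edits of input,
// closest first.
func suggest(input string, commands []string, maxDistance int) []string {
	type candidate struct {
		name string
		dist int
	}
	var matches []candidate
	for _, c := range commands {
		if d := levenshtein(input, c); d <= maxDistance {
			matches = append(matches, candidate{c, d})
		}
	}
	sort.SliceStable(matches, func(i, j int) bool { return matches[i].dist < matches[j].dist })
	names := make([]string, len(matches))
	for i, m := range matches {
		names[i] = m.name
	}
	return names
}

func main() {
	commands := []string{"start", "setup", "lint", "ci"}
	fmt.Println(suggest("strat", commands, 2))
}
```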
For developers wanting to streamline their developer experience with sg functionality, we've also made a lot of internal improvements:
- Linters are easier than ever to build with the updated lint.Runner interface, which now also provides you an easy way to get changed files and iterate over added lines to perform incremental linting. To get started, just head on over to the dev/sg/linters package!
- The migration to a new CLI library, urfave/cli, includes features like:
  - A much nicer API for defining flags and fetching them without declaring global variables, and convenience functions for safely getting arguments: example.
  - Developers can implement custom completions for their commands with the BashComplete: completeOptions(...) API.
  - Flags and commands can now have short aliases!
- sg.config.yaml can now leverage external secrets (we currently support gcloud only) with the new secrets: field, and sg commands can use the secretsStore.GetExternal(...) API.
  - This is used by sg test frontend-e2e, which you can use to run Sourcegraph's E2E tests with ease! (#34627)
- The output API has been overhauled to be centralized in std.Out, which now centralizes the exports of a variety of sg-specific utilities for incorporating ✨ fancy ✨ output for some added bling. (#35269)
- Writing scripts? We strongly recommend everyone to start writing scripts in Go within sg, which gives us more code-sharing opportunities, better cross-platform compatibility, and more advanced features such as better output management. To enable this, the DevX team has started developing a new command execution library, github.com/sourcegraph/run, aimed at providing a seamless way to execute commands and manipulate their output in Go. (#35417)
Following your code from PR to production
Deployments are now announced over Slack, in #alerts-preprod-cloud for preprod and in #deployments-cloud for Cloud deployments. If you want to receive a mention in those announcements when your PR is getting deployed, you can use the notify-on-deploy label. If the label is present when the PR is deployed, you'll receive the notification.
Deployment schedules can be observed in the Honeycomb dashboard, which tracks how much time elapsed between a PR being merged and it being deployed.
Smoke testing
Every ten minutes, smoke tests are run against Sourcegraph Cloud, first running basic infrastructure tests, then performing a quick search to ensure that the application is up. And it's for real this time: #16589 turns failures into an automatically created incident.
Learning resources
Check out this selection of some of our recently published learning resources!
- The Sourcegraph Codebase Tour video gives an overview of our main repository and what's in it.
- The Tour of Secondary Repositories goes over some of the secondary repositories at Sourcegraph.
- Local development with sg setup is a video demonstrating how to use sg setup to set up your local development environment.
Architecture decision records
The idea behind architecture decision records, or ADRs, is to have small documents that are part of the codebase, not an external artifact that you have to be aware of like an RFC.
We encourage the use of Architecture Decision Records (ADRs) for logging decisions that have notable architectural impact on our codebase. Since we're a high-agency company, we encourage any contributor to commit an ADR if they've made an architecturally significant decision.
Note that ADRs are not meant to replace our current RFC process but to complement it by capturing decisions made in RFCs. However, ADRs do not need to come out of RFCs only. GitHub issues or pull requests, PoCs, team-wide discussions, and similar processes may result in an ADR as well, which helps keep everyone in the loop.
To learn more, check out the ADR index page and ADR 1: Record architecture decisions.
Database migration tooling
There are two new goodies for database tooling available via sg migration locally and via the migrator binary shipped with every Sourcegraph instance.
- New describe command provides a formatted version of your database's current schema
- New drift command provides a diff of the expected schema and your database's current schema
And for some added bling, both of the new commands have been beautified! (#35722, #35735)
Preproduction environment
Before going out into production on Cloud, all changes go through the preprod environment. The preprod environment is running in DotCom mode with a smaller dataset but with similar resources. Notably, it's running some services which are sharded in production, but not within CI, at the minimum size that allows exercising all code paths. This opened the path to increasing our confidence in changes through automated testing that wasn't previously possible.
Because tests on preprod require putting the application in a specific state, state is restored from a snapshot, which makes test runs deterministic (#16249).
As a result, the code intel QA test suite is now running in preprod (#16301), and others will follow shortly.
Buildkite foundations
Since the end of March, 100% of our CI builds run on stateless agents. This means that, by design, a given build is guaranteed not to impact the following ones. This enabled us to roll out our own autoscaler backed by an in-house buildkite-job-dispatcher, which resulted in a whopping 30% decrease of our CI spending on GCP in April!
To try and maintain parity with the stateful agents of old, we have implemented a variety of measures to keep CI times down:
- Cross-node git repository mirrors mean that repository cloning is consistently just as fast - if not faster! - than stateful builds.
- asdf caching has been used to speed up the installation process of all languages and tools needed to run our CI builds, as we have now been running all builds on stateless agents. It has been extracted into a plugin, making it available for other pipelines as well.
- We have enabled image streaming for our CI cluster, which has reduced the time to pull an image and start it from 1m50s to ~2s, which means lower wait times for your CI builds. (infrastructure#3296)
Some other CI goodies include:
- Many linting steps have been rolled into a new CI step powered by sg lint, which now generates output that is actually readable compared to the "Misc linters" step of old! If you run into issues, (hopefully) helpful annotations will also be added to your build summary.
- Sending out a Slack mention when a specific step failed is also available as a plugin, which is also useful to make sure to notice a failure on a single step independently of the build result. For example, this is being used to alert the Code Intel team if their test suite fails on a preprod build.
- Changes to the client app now render app previews running against k8s.sgdev.org! Learn more in the Sourcegraph docs: Exploring client changes with PR previews
Tech Radar
Thoughtworks Technology Radar is a well-known source for getting a sense of where the technology landscape is going. What really makes it stand out is how it surfaces Thoughtworks' opinions in a format that is really easy to process. What if we used the same medium to keep everyone in the loop within Sourcegraph on our various initiatives on the engineering front?
Thanks to #35538, it's an ongoing exploration that could potentially help all of us to stay in touch with all the ongoing initiatives at Sourcegraph in the blink of an eye. Feedback and ideas are welcomed on the PR!
Feb 24, 2022
Welcome to another iteration of the Developer Experience newsletter of notable changes since the Jan 10th issue! As a reminder, you can check out previous iterations of the newsletter in the newsletter archive.
To have your updates highlighted here, please tag your PR or issue with the dx-announce label! If you have questions or feedback, feel free to reach out in #dev-experience or in our discussions as well.
SOC2 compliance processes
A new bot, pr-auditor, is now live in sourcegraph/sourcegraph and is rolling out to a number of other repositories that house code that reaches customers. pr-auditor will add status checks on your pull requests when you edit descriptions to indicate whether or not it has detected a "test plan" within your pull request description. If a "test plan" is not provided by the time a PR is merged, an issue will be created in the sec-pr-audit-trail repository requesting that the PR author document a test plan, or provide a reason for the exception. This serves as an audit log to help us achieve these two SOC2 control points:
GN-104 Code changes are systematically required to be peer-reviewed and approved prior to merging code into the main branch.
GN-105 Application and infrastructure changes are required to undergo functional, security, unit, integration, smoke, regression, and SAST testing prior to release to production.
What is a test plan? A test plan is denoted by content following # Test plan, Test plan:, ### Test Plan:, etc. within a pull request description. All pull requests must provide test plans that indicate what has been done to test the changes being introduced. Testing methodologies could include:
- Automated testing, such as unit tests or integration tests
- Other testing strategies, such as manual testing, providing observability measures, or implementing a feature flag that can easily be toggled to limit impact
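To give a feel for what "content following a test plan marker" means, here is a minimal Go sketch of such a check. It is not pr-auditor's implementation; the regular expression and the `extractTestPlan` function are assumptions that cover only the marker variants listed above.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// testPlanMarker matches lines such as "# Test plan", "Test plan:" or
// "### Test Plan:", case-insensitively, at the start of any line.
var testPlanMarker = regexp.MustCompile(`(?im)^[ \t]*#*[ \t]*test plan\b:?`)

// extractTestPlan returns the text following the first test plan marker and
// whether a non-empty plan was found.
func extractTestPlan(description string) (string, bool) {
	loc := testPlanMarker.FindStringIndex(description)
	if loc == nil {
		return "", false
	}
	plan := strings.TrimSpace(description[loc[1]:])
	return plan, plan != ""
}

func main() {
	description := "Fixes a nil pointer in the resolver.\n\n## Test plan\n\nAdded a unit test."
	plan, ok := extractTestPlan(description)
	fmt.Printf("found=%v plan=%q\n", ok, plan)
}
```

A marker with nothing after it is treated as a missing plan in this sketch, which matches the intent that the plan should say what was done to test the change.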
Pull request reviews are now also required by default. Branch protections have been enabled in sourcegraph/sourcegraph. In other repositories with pr-auditor, review checks must be opted out of by including No review required: ... within a pull request's test plan.
To learn more, refer to our updated testing guidance. You can find DevX SOC compliance documentation by control point in this search notebook. If you have any questions or feedback, please do not hesitate to reach out in #dev-experience or in our GitHub discussions!
Internal tools and libraries
Database migrations update
We have now eradicated two classes of errors related to database migrations:
- On the site-administrator and ops side, we no longer spuriously mark the database as dirty and give up any attempt at migrations at the first sign of trouble. We no longer immediately fail an upgrade because of the mere presence of an empty table or a concurrently created index. Now we only fail for actual reasons.
- On the development side, we no longer have to worry about two independently created migrations clashing only after both are merged into main. That was very annoying to me and now it will never, ever happen again. Check out the help page for the new sg migration to check out the new tooling.
See the migrator docs for additional info.
New lib/errors package and MultiError type
All errors in Sourcegraph backend services should now use the new github.com/sourcegraph/sourcegraph/lib/errors package. This consolidation helps us restrict and control the ways that we can create, consume, and compare errors, and will allow us to control library behavior clashes more easily in the future. #30558
Additionally, all usages of the old MultiError type have been replaced with a new, custom multi-error implementation (#31466, #698). This new error type is an interface that behaves much more closely to regular errors, prevents errors from disappearing due to library conflicts as was previously the case, and supports introspection with errors.Is, errors.As, and friends much more consistently.
var err errors.MultiError
for _, fn := range thingsToDo {
err = errors.Append(err, fn())
}
return err
Check out the source code in lib/errors
.
Actor propagation reminder
Unified actor propagation was introduced a few months ago as part of an effort to enable the implementation of sub-repository permissions across all Sourcegraph features.
There have been gradual efforts to roll out this actor propagation to more services, which may cause behavioural changes that impact how permissions are handled if, for example, internal
actors are not set explicitly.
When implementing new features please ensure that actors are correctly set and read from contexts.
To learn more, check out the intro to actor propagation search notebook.
New teams
package
There is now a unified library for interacting with Sourcegraph teammates for whatever fun integrations you want to build! It leverages team.yaml
data as well as additional GitHub and Slack metadata:
import "github.com/sourcegraph/sourcegraph/dev/internal/team"
func main() {
// Neither a GitHub client nor a Slack client is required, but each enables more ways
// to query for users and/or get additional metadata about a user.
teammates := team.NewTeammateResolver(githubClient, slackClient)
tm, _ := teammates.ResolveByName(ctx, "Robert")
println(tm.SlackID)
println(tm.HandbookLink)
println(tm.Role)
// etc.
}
sg teammate
, branch lock notifications, and Buildkite failure mentions are all powered by this API.
Continuous integration
Slack mention notifications
We now generate notifications for failed builds based on the author of each commit (using the new teams package).
Make sure to set up your teams.yaml
entry with your GitHub handle to get notified when your changes fail in main
!
Pipeline readability improvements
Pipeline operations can now be configured into groups with operations.NewNamedSet
(#30381). The result looks like this:
sg ci preview
also leverages this grouping to improve readability of pipeline steps, as well as now leveraging a terminal Markdown renderer to generate nicer output! (#30724)
Build traces are now uploaded to Honeycomb
Build traces are now uploaded to Honeycomb to dive into the performance of each command that gets run in a pipeline! A link to the uploaded build trace is added as an annotation on the results of each Buildkite build.
To learn more, check out the Pipeline command tracing docs.
Test analytics preview
We have started rolling out Buildkite test analytics support for Go tests and a subset of frontend tests that get run in continuous integration. This is still an experimental Buildkite feature, but you can learn more about it in our Test analytics docs.
Pipeline documentation
A new command, sg ci docs
, can now render a full, up-to-date reference of various run types that our pipeline can generate as well as example pipelines of each, such as what gets run with various diff types.
You can also see a web version of this in the Pipeline types reference.
Our pipeline development guide has also been refreshed with updated content, featuring a series of embedded search notebooks! This includes new guidance on:
- Creating pipeline annotations (using a new API introduced in #30951)
- Caching build artefacts (only available on stateless agents)
- Pipeline observability features
Generate builds using run types
sg ci build
now supports an additional argument to automatically generate a Buildkite build using a specified run type (#30932). For example, to create a main
dry run build:
sg ci build main-dry-run
This now also supports run types that require arguments, such as docker-images-patch
- learn more in #31193.
Coming soon: stateless Buildkite agents
We will soon be rolling out stateless Buildkite agents to all pipeline builds. These should improve the stability and reliability of all pipelines by removing any issues that might be caused by lingering state from other builds. Learn more in this Loom demo! (#31003)
Optimizations
- Improvements to the server and gitserver Docker image builds: after the addition of p4-fusion artifacts, the gitserver Docker image build time increased to 4 minutes, which also impacted the server image. This has been fixed by caching the resulting binary, which brought the build time for gitserver down to about 40 seconds, thanks to #31317.
- go-mockgen is now much faster: a misconfiguration was causing go-mockgen to be downloaded multiple times throughout a go generate run. This has been fixed, and go generate runs are now much faster (#31597).
Local development
Log entries now link to source in VS Code
Each log entry now prints an iTerm link that links to each log statement's source file:line in VS Code (#30439).
Workaround for macOS firewalls
A new -add-to-macos-firewall
flag, enabled by default on macOS, is now available on sg start
and sg run
to avoid all those pop-up prompts you get in macOS when firewalls are enabled. #30747
If this causes issues for you, the behaviour can be disabled with -add-to-macos-firewall=false
.
sg
highlights
You can now see what has changed as part of your fresh sg
installation with the sg version changelog
command! You can also use it to see what's coming up next with sg version changelog -next
. #30697
sg start
now waits for all commands to install before starting them (#29760).
M1 Macs no longer require any additional workarounds (#29815).
sg checks docker
now features a custom Dockerfile parser to enable more powerful checks, such as validating apk add
arguments, as well as running more of the existing checks.
It now powers the Docker check in CI as well! (#31217)
sg setup
now features an overhauled checks system to make sure your dev environment is ready to go (#29849).
sg setup
now supports Ubuntu as a first class citizen and provides automated installation (#31312).
Jan 10, 2022
Happy new year, and welcome to another iteration of the Developer Experience newsletter! It's been a little while since the last issue, so this is going to be a long one. As a reminder, you can check out previous iterations of the newsletter in the newsletter archive.
To have your updates highlighted here, please tag your PR or issue with the dx-announce
label! If you have questions or feedback, feel free to reach out in #dev-experience or in our discussions as well.
Internal tools and libraries
Backward-compatible database migrations are now enforced
Backward-compatible database migrations are now enforced in the CI pipeline for sourcegraph/sourcegraph
- see the PR to re-enable the check at #28872. This PR contains some initial documentation on writing backwards-compatible migrations, but it is still a work in progress.
What is a backwards-compatible migration?: A migration is backwards-compatible with a particular Sourcegraph version if its changes can be applied to an instance running that version without ill effect.
What has already changed? (TL;DR): We've removed our use of golang-migrate that ran database migrations on startup of the frontend service and added a migrator
service that runs database migrations separately from and prior to instance upgrades. This puts us well on our way to removing the entire class of frequent "dirty database" bugs that plagues many site-administrators on every upgrade.
What else is changing?: We will soon be enforcing that the unit tests of the previous minor release continue to pass with the newest database schema. This gives high confidence that any changes to the database will not negatively affect a running instance (behind at most one minor version). This allows site-administrators to upgrade an instance without requiring downtime to run the migrations.
Of course, this check will come with escape hatches in the event of flakes or test failures that are locked in the past. We're currently fleshing out the documentation on the subject, so keep an eye out for updates!
For the full announcement or to leave comments, check out the Slack discussion!
Actor propagation
Actors (used to identify a request in the context of a user or internal actor) are now propagated across all internal requests when using the httpcli
library, and the various approaches for propagating actors across services have been standardized with the new actor.HTTPMiddleware
. This makes it easier to enforce permissions across services. For more details, see #28117.
Database connections
dbconn.Global
has been removed! This is a huge step towards bringing better database mocking to the entire codebase (check out the code insights dashboard tracking relevant migrations!)
Tracking issues
Tracking issues now support a new marker, <!-- OPTIONAL LABEL: my-label -->
, that allows you to add labels on a tracking issue that do not need to be present on child issues for them to be considered part of this tracking issue. This is useful for making tracking issues easier to find without adding labels to every single issue within the tracking issue. For more details, see #28665.
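As a rough sketch of how such a marker could be interpreted (this is not the tracking issue tool's actual code, and the matching rule below is an assumption based on the description above), the optional labels can be parsed out of the tracking issue body and skipped when checking whether a child issue belongs to it:

```go
package main

import (
	"fmt"
	"regexp"
)

// optionalLabelMarker matches markers of the form <!-- OPTIONAL LABEL: my-label -->.
var optionalLabelMarker = regexp.MustCompile(`<!--\s*OPTIONAL LABEL:\s*(.+?)\s*-->`)

// optionalLabels returns the set of labels marked as optional in a tracking issue body.
func optionalLabels(body string) map[string]bool {
	optional := map[string]bool{}
	for _, m := range optionalLabelMarker.FindAllStringSubmatch(body, -1) {
		optional[m[1]] = true
	}
	return optional
}

// belongsTo reports whether a child issue carries every non-optional label
// of the tracking issue (hypothetical matching rule for this sketch).
func belongsTo(trackingLabels, childLabels []string, trackingBody string) bool {
	optional := optionalLabels(trackingBody)
	child := map[string]bool{}
	for _, l := range childLabels {
		child[l] = true
	}
	for _, l := range trackingLabels {
		if !optional[l] && !child[l] {
			return false
		}
	}
	return true
}

func main() {
	body := "Plan for the iteration.\n<!-- OPTIONAL LABEL: roadmap -->"
	tracking := []string{"team/devx", "roadmap"}

	fmt.Println(belongsTo(tracking, []string{"team/devx"}, body)) // true
	fmt.Println(belongsTo(tracking, []string{"bug"}, body))       // false
}
```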
Continuous integration
Subsequent main
pipeline failures will now result in a branch lock
In response to a variety of CI incidents (including INC-21 at the end of September) we have introduced automated branch locks via a tool called buildchecker
. When buildchecker
detects a series of CI failures, it will now automatically restrict push access to main
to authors of recent failed builds and the DevX team until it sees a passed build, at which point it will unlock the branch. A notification will also be posted in Slack to #buildkite-main, mentioning the relevant teammates.
It is the responsibility of authors of recently failed builds to investigate what might have gone wrong, seek help if needed, and help get the pipeline back to green. We hope this will prevent long periods of time where many commits to main
go untested due to failing jobs. To learn more, check out the branch lock playbook.
We've also made significant investments towards improving and streamlining the pipeline for better stability and observability - most recently, a large number of E2E/QA tests were dropped - which will hopefully help with minimizing locks triggered by test and infrastructure flakes.
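The branch lock decision described above can be sketched roughly like this. It is a simplified illustration rather than buildchecker's actual logic: the Build type, the lockDecision function, and the failure threshold of 3 are all assumptions.

```go
package main

import "fmt"

// Build is a simplified, hypothetical view of a CI build on main.
type Build struct {
	Author string
	Passed bool
}

// lockDecision inspects builds ordered newest first. If the current streak of
// consecutive failures is at least threshold long, it reports that the branch
// should be locked and returns the de-duplicated authors of those failed builds.
func lockDecision(builds []Build, threshold int) (lock bool, authors []string) {
	seen := map[string]bool{}
	failures := 0
	for _, b := range builds {
		if b.Passed {
			break // a passed build ends the failure streak
		}
		failures++
		if !seen[b.Author] {
			seen[b.Author] = true
			authors = append(authors, b.Author)
		}
	}
	if threshold <= 0 || failures < threshold {
		return false, nil
	}
	return true, authors
}

func main() {
	recent := []Build{
		{Author: "alice", Passed: false},
		{Author: "bob", Passed: false},
		{Author: "alice", Passed: false},
		{Author: "carol", Passed: true},
	}
	lock, authors := lockDecision(recent, 3)
	fmt.Println(lock, authors) // true [alice bob]
}
```

In this sketch the returned authors are the people who would keep push access (alongside the DevX team) and be mentioned in the Slack notification; as soon as the newest build passes, the function reports that no lock is needed.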
Specifying tools and language versions run by any continuous pipeline
In response to INC-59 we have reworked how the tool and language versions used in a given CI job are selected. Previously, the agents were running a mix of asdf
and natively installed versions, which created trouble when diagnosing build failures that weren't caused by the tests themselves.
It is now the responsibility of each repository to provide an adequate .tool-versions
file that defines the versions it needs. For example, there is no longer a pre-installed go
version.
Presently, this approach is limited by the need to have the plugin for each tool installed beforehand on the agent images (we are working on removing this limitation). The overarching goal is to make the agents reasonably independent from what they are actually building.
E2E and QA Tests survey results
RFC 544 explored the results of the e2e and qa tests survey. Thanks to the efforts of every team that took part in that survey, a large number of irrelevant tests have been removed. As a result, those tests are about seven minutes faster than before and the average build time on the main
branch is hovering around the 20-minute mark instead of 25 minutes.
There is more to come on that topic: the Frontend Platform team plans to rework those tests and provide guidance on how to write them in a reliable fashion.
Buildkite agent selection
Buildkite pipeline steps should now explicitly declare queue: standard
to avoid experimental or temporary agents. For more details, see infrastructure#2939.
Terraform vulnerability scanning
The security team has introduced Checkov checks to the infrastructure
repository and performed a cleanup to fix or suppress all high and critical issues!
Going forward, the Checkov step of the infrastructure pipeline will be set to fail in the event it finds a Terraform security issue. If the pipeline fails, a warning block will be displayed in the pipeline output - a link will take you to the handbook with guidance on how to continue, and additional output will help point you towards how to correct the issue. For more details, see Checkov Terraform vulnerability scanning.
If anyone has any questions or issues, please post in the #security channel!
Sentry integration to monitor internal pipeline scripts and hooks
There are scripts and components of the CI pipeline that should never fail, independently of the test results. These have proved to be hard to monitor, especially when the scripts are called from build hooks. Being notified when these failures happen enables faster reaction time. A command can now be monitored so that a Sentry issue is created in the Buildkite project on a non-zero exit code.
Observability
The previous raw Grafana configuration used to add template variables to dashboards has been replaced with Container::Variables
that abstracts away a lot of the behind-the-scenes dashboard config and potential gotchas to make it easier to define template variables on dashboards! Dashboard template variables are used to filter individual panels down by substituting variables in panel queries. Learn more in the ContainerVariable
API docs.
Local development
Revamped introductory documentation
The local development docs homepage has been revamped! Check it out at docs.sourcegraph.com/dev. The quickstart docs have also been overhauled with a streamlined setup experience featuring sg setup
, which has been greatly improved!
sg
improvements
sg
now ships a command that can reset databases as well as create a site admin: sg db
(early adopters may have seen it under the name of sg reset
). You can read more about sg db [reset-pg|reset-redis|add-user]
in the documentation.
If you have ideas for other features that would be great, don't hesitate to join the sg
hack hour on Fridays at 4PM UTC!
Nov 23, 2021
Hello everyone, and welcome to another iteration of the Developer Experience newsletter!
To have your updates highlighted here, please tag your PR or issue with the dx-announce
label! If you have questions or feedback, feel free to reach out in #dev-experience or in our discussions as well.
Onboarding
Significant progress has been made with sg setup, a new command that is slated to replace all the manual finagling that must be done today to set up a Sourcegraph development environment. See a sneak peek of the upcoming iteration of the tool here!
Continuous integration
The Dev Experience team is proposing a "build sheriff" rotation in RFC 515, with the goal of distributing knowledge and responsibilities around our CI infrastructure to all of engineering through regular rotations of "build sheriffs".
You may have noticed a daily update in #dev-experience providing an overview of how CI has behaved that day. This will help us track our progress towards a flake-free pipeline! If you need more details, a dashboard is now available in Grafana Cloud that features an overview of recently failed builds, steps, and potentially relevant logs. You can use this to see if lots of builds are failing on similar steps, which steps are the most problematic, and whether the issues are potentially related. A link can also be found in the Slack summaries. Let us know what you think on #26118!
This dashboard is powered by build logs that are now parsed from Buildkite output and uploaded to Loki, a log database available for query in Grafana Cloud using LogQL. Try it out here! This can be especially useful when seeing if a build issue is a common recurrence.
We are also trialing a number of additional annotations for build failures that should serve to help surface actionable errors more easily, and are working towards exporting an API for it that will enable more checks to easily add digestible output to builds. Let us know in #dev-experience if you have any ideas for how this could be improved!
Observability
A revamp of how Honey events are created has been proposed in #27964, furthering work on turning internal/observation into the go-to package for all application observability needs.
Distributed tracing is now available on worker jobs, enabling Jaeger traces to be collected for worker job processing. This is currently only enabled for precise-code-intel-worker in Cloud, and enabling this for other workers is in the works.
RFC 501 REVIEW: Runtime error monitoring implementation is also progressing, which will allow errors to be more easily surfaced in Sentry to complement alerting.
Code health
Work on reducing usages of globals has continued with improvements to how site configuration is accessed that allows site configuration clients to be injected into places that require it. This makes site configuration easier to mock out and test without replacing a global variable in mocks.
On a similar note, tests have been undergoing incremental updates to leverage the more ergonomic and self-contained database mocks. A brief guide is available if you know an area of the codebase that could use a similar update!
Nov 2, 2021
Hello everyone! Welcome back to the Developer Experience newsletter. It is a compilation of announcements related to development experience at Sourcegraph. DevX is a global effort.
To be mentioned here in the next iteration, please tag your PR or issue with dx-announce!
DevX team mission statement
Published Developer Experience team mission and strategy: handbook.sourcegraph.com/company/strategy/dev-experience
Buildkite incident post-mortem(s)
On Sep 19th, for about two hours, it wasn't possible to interact with any container registries from Google Cloud platform, which interrupted the process for release 3.33. You can find the detailed report here: Postmortem Review: INC-25 Buildkite pipelines are not able to interact with container registry.
On October 26th, for another two hours, the pipeline agents were down. You can find the detailed report here: REVIEW: INC-30 Buildkite pipelines are failing due to pipeline generator failing to run.
CI Pipeline highlights!
- All-in-one pipeline - check all your build jobs in one place! #26051
- Cross-build search for Buildkite failures: #26259
- We are now measuring how long the pipeline stays red per day. It captures both how reliable the pipeline is and how fast it gets back to green.
- 22nd: red for 1h8m
- 21st: red for 1h34m
- 20th: red for 21m
- 19th: red for 2h4m
- 18th: red for 54m
- Contractors are now able to access CI builds, as long as they prefix their PR with contractors/ and they have been manually added to the Buildkite contractors team.
- SQL queries are now displayed on failure in Go tests, both locally and in the CI. #26020
- One less papercut: remember the warning sign at the beginning of every step's logs? It's not there anymore. #26233
- RFC 497 WIP: Restructuring CI Experience is now open for feedback!
SG Highlights
sg
is a CLI tool that wraps commands to run the local environment and interact with various Sourcegraph resources such as CI builds or RFCs.
A new home for sg documentation: https://docs.sourcegraph.com/dev/background-information/sg
- sg ci logs - browse, grep, or save Buildkite output
- Try the Loki integration locally for advanced search! #25835
- sg ci status --wait - get notified as soon as your Buildkite build completes
- sg version: displays what version of sg you're currently running. Adding it to your message when requesting support for sg will really help!
- Our first bug report that came from the community has been fixed! sg: include original err in install err
From the wider Sourcegraph community
- Runtime errors monitoring: RFC 501 REVIEW: Runtime error monitoring
- Database mocking proposal: #26129
Oct 8, 2021
Hello everyone! This is the first iteration of the Developer Experience newsletter. It is a compilation of announcements related to development experience at Sourcegraph.
To be mentioned here in the next iteration, please tag your PR or issue with dx-announce!
A team has been created
The Developer Experience team was created in mid-September! Our first goal is to improve the CI experience.
Buildkite incident post-mortem
Between Sep 21st and Sep 24th, our main branch builds were failing. Due to the difficulties we were having in reliably making them pass, we escalated it to an incident. To prevent new failures from piling up on the already broken branch, we made the decision to lock the main branch, which is a pretty unusual event.
You can find the detailed report in Postmortem REVIEW: INC-21 Builds failing on main, which is now in a reviewable state and open to feedback and input.
We'd like to dearly thank all of those who helped to fix this: Patrick Dubroy, Robert Lin, Eric Fritz, Valery Bugakov, Tomás Senart, Thorsten Ball, Dax McDonald, Geoffrey Gilmore, Erik Seliger, Dave Try and JH. With the actions we've proposed in the postmortem, we don't expect such an event to happen in the future.
Pipeline improvements
The CI is what enables us to feel confident when delivering our changes to our users, and is one of the key components enabling Sourcegraph to deliver quality software.
Previously, it was really hard to find time to improve the CI because it was competing with infrastructure work in terms of prioritization, making it a frustrating but rational choice. With the recent team reorganization, making that hard choice is not a problem anymore as this component is now owned by the DX team.
Following up on the above incidents, it became absolutely clear that the CI is a big contender in the list of pains faced by everyone. The good news is that it's a pretty actionable one!
Let's start with some numbers:
- August average build time on the main branch: 19m57s, on PRs 20m24s
- September average build time on the main branch: 27m47s, on PRs 22m32s (1)
- October average build time on the main branch: 17m48s, on PRs 9m34s
- Pull requests now run a smaller set of checks on average, and it is easier to add additional PR checks of your own that run over subsets of code that you care about within the pipeline generator. See the Introductory documentation to help you get started with hacking on the pipeline generator.
- Puppeteer tests are now run in parallel across multiple smaller steps, netting almost a 50% improvement :fire:
- (1) spiked because that's when the executor pipeline was introduced.
What's next?
Observability is crucial to being able to know when and on what to act. This led to the creation of RFC 496 REVIEW: Continuous integration observability which is now in a reviewable state for everyone.
More speed improvements on the builds are being worked on, stay tuned!
sg is officially entering our daily workflows
What if we had a tool that would be the entry point to interact with our development environment? That's the idea behind sg! Thorsten Ball has been driving this, with contributions from many other engineers. After a few months in a beta state, it's now becoming an integral part of our workflow.
sg is now the default way to run the Sourcegraph development environment locally.
- After half a year of working on sg, the PR to remove what we once knew as dev/start.sh and enterprise/dev/start.sh has been merged. Adios, 993 lines of shell script!
- The docs have also been updated: the Getting Started guide now uses sg.
But wait, there's more! A new group of commands has been added, the "ci" commands.
- sg ci preview: You can now preview which steps your branch is going to run on the CI with the sg ci preview command. See something that shouldn't be running in there? Open a PR on the pipeline generator!
- sg ci status: No more clicking around to find the current build in Buildkite!
- sg ci build: will trigger a manual build, useful if working from a sourcegraph fork.
And additional goodies: the sg teammate time and handbook commands, which tell you the current time for a teammate who lives very far from you, without having to leave your terminal.
What's next?
This is just the beginning. Work on sg setup has begun. The idea is that we can reduce the Getting Started guide from 8 pages down to "install sg and run sg setup".
Grafana cloud is now available to all!
Just sign up via GSuite SSO on https://sourcegraph.grafana.net. This Grafana instance currently has logs for Sourcegraph Cloud, available for search with LogQL via Loki. It has support for querying inferred fields from log messages, filtering for substring matches, and more. Try it out!
Metrics and parity with /-/debug/grafana are on the roadmap. Follow #25407 for updates on that!
Shoutouts to teammates that improved our dev experience in September: Robert Lin, Valery Bugakov, Thorsten Ball, JH, Camden Cheek, Erik Seliger, Coury Clark and Quinn Slack.