Changelog History
v6.2.0 Changes
June 02, 2020

feature

- Operators can now limit the number of concurrent API requests that your web node will serve by passing a flag like `--concurrent-request-limit action:limit`, where `action` is the API action name as it appears in the action matrix in our docs.
  If the web node is already concurrently serving the maximum number of requests allowed by the specified limit, any additional concurrent requests will be rejected with a `503 Service Unavailable` status. If the limit is set to `0`, the endpoint is effectively disabled, and all requests will be rejected with a `501 Not Implemented` status.
  Currently the only API action that can be limited this way is `ListAllJobs` -- we considered allowing this limit on arbitrary endpoints, but didn't want to let operators shoot themselves in the foot by limiting important internal endpoints like worker registration. If the `ListAllJobs` endpoint is disabled completely (with a concurrent request limit of 0), the dashboard reflects this by showing empty pipeline cards labeled 'no data'.
  It is important to note that, with this configuration, it is possible for super-admins to effectively deny service to non-super-admins. When super-admins look at the dashboard, the API returns a huge amount of data (much more than for the average user), and serving the request can take a long time (over 30s on some clusters). If you have multiple super-admin dashboards open, they are pretty much constantly consuming some portion of the concurrent requests your web node will allow, so any other requests, even ones that are cheaper for the API to service, are much more likely to be rejected because the server is overloaded by super-admins. Still, the web node will no longer crash in these scenarios, and non-super-admins will still see their dashboards, albeit without nice previews. To avoid this scenario, be mindful of the number of super-admin users with open dashboards. #5429, #5529
breaking

- The above-mentioned `--concurrent-request-limit` flag replaces the `--disable-list-all-jobs` flag introduced in v5.2.8 and v5.5.9. For equivalent functionality, change `--disable-list-all-jobs` to `--concurrent-request-limit ListAllJobs:0` in your configuration. #5429
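  As a sketch of the migration (flag syntax is from the notes above; the `CONCOURSE_*` env-var spelling follows Concourse's usual convention and is an assumption):

  ```sh
  # Before (v5.2.8 / v5.5.9): disable ListAllJobs entirely
  concourse web --disable-list-all-jobs

  # After (v6.2.0): the same effect, expressed as a concurrent request limit of 0
  concourse web --concurrent-request-limit ListAllJobs:0

  # Or allow a bounded number of concurrent ListAllJobs requests (assumed env-var form)
  CONCOURSE_CONCURRENT_REQUEST_LIMIT=ListAllJobs:5 concourse web
  ```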
breaking

- It has long been possible to configure Concourse either by passing flags to the binary or by passing their equivalent `CONCOURSE_*` environment variables. Until now, when an environment variable was passed, the flags library we use would treat it as a "default" value -- this is a bug. We issued a PR to that library adding stricter validation for flags passed via environment variables. This means operators may have been passing invalid configuration via environment variables without Concourse complaining -- after this upgrade, that invalid configuration will cause the binary to fail. Hopefully it's a good prompt to fix up your manifests! #5429
feature

- @shyamz-22, @HannesHasselbring and @tenjaa added a metric for the number of tasks that are currently waiting to be scheduled when using the `limit-active-tasks` placement strategy. #5448
fix

- Close the worker's registration connection to the TSA on application-level keepalive failure.
- Add a 5 second timeout for the keepalive operation. #5802
fix

- Improve consistency of auto-scrolling to highlighted logs. #5457

fix

- @shyamz-22 added the ability to configure the NewRelic Insights endpoint, which allows use of EU or US data centers. #5452

fix

- Fix a bug where only some DB queries were logged when `--log-db-queries` is enabled. Expect to see more log output when using the flag now. #5520

fix

- Fix a bug where a task's image or input volume(s) were redundantly streamed from another worker despite having a local copy. This would only occur if the image or input(s) were provided by a resource definition (e.g. a `get` step). #5485

fix

- Previously, aborting a build could sometimes result in an `errored` status rather than an `aborted` status. This happened when step code wrapped the `err` return value, fooling our `==` check. We now use `errors.Is` (new in Go 1.13) to check for the error indicating the build has been aborted, so the build is correctly given the `aborted` status even if the step wraps the error. #5604

fix

- @lbenedix and @shyamz-22 improved the way auth config for teams is validated. Operators can no longer start a web node with an empty `--main-team-config` file, and `fly set-team` will fail if it would result in a team with no possible members. This prevents scenarios where users accidentally get locked out of Concourse. #5596
feature

- Support path templating for secret lookups in the Vault credential manager.
  Previously, pipeline and team secrets would always be searched for under "/prefix/TEAM/PIPELINE/" or "/prefix/TEAM/", where you could customize the prefix but nothing else. Now you can supply your own templates if your secret collections are organized differently, including for use in `var_sources`. #5013
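  A sketch of custom lookup templates (the `--vault-lookup-templates` flag name and the `{{.Team}}`/`{{.Pipeline}}`/`{{.Secret}}` placeholders are assumptions based on the Vault credential manager docs):

  ```sh
  # Look up /concourse/TEAM/PIPELINE/SECRET first, then a shared path
  concourse web \
    --vault-url https://vault.example.com:8200 \
    --vault-path-prefix /concourse \
    --vault-lookup-templates '/{{.Team}}/{{.Pipeline}}/{{.Secret}}' \
    --vault-lookup-templates '/shared/{{.Secret}}'
  ```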
fix

- @evanchaoli changed the web UI and `fly teams` to list teams ordered by team name, which allows users who belong to many teams to find a specific team easily. #5622
fix

- Fix a bug that crashed the web node when renaming a job with `old_name` equal to `name`. #5639

fix

- @evanchaoli enhanced task step `vars` to support interpolation. #5620

fix

- Fixed a bug where fly would no longer tell you if the team you logged in with was invalid. #5624

fix

- @evanchaoli changed the behaviour of the web node to retry individual build steps that fail when a worker disappears. #5192

fix

- Added a new HTTP wrapper that returns HTTP 409 for the endpoints listed in concourse/rfc#33 when the requested pipeline is archived. #5549

feature

- Added tracing to the lidar component: a single trace is emitted for each run of the scanner and the consequent checking performed by the checker. These traces allow more in-depth monitoring of resource checking by describing how long each resource takes to scan and check. #5575

fix

- @ozzywalsh added the `--team` flag to the `fly unpause-pipeline` command. #5617
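  For example (target, team, and pipeline names are hypothetical):

  ```sh
  # Unpause a pipeline on another team you're authorized for,
  # without logging in to it with a separate fly target
  fly -t ci unpause-pipeline --team other-team -p release
  ```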
v6.1.0 Changes
May 12, 2020

feature, breaking

- "Have you tried logging out and logging back in?"
  - Probably every Concourse operator at some point

  In the old login flow, Concourse used to take all your upstream third-party info (think GitHub username, teams, etc.), figure out what teams you're on, and encode those into your auth token. The problem with this approach is that every time you change your team config, you need to log out and log back in. So now Concourse doesn't do this anymore. Instead we use a token directly from dex, the out-of-the-box identity provider that ships with Concourse.
  This new flow does introduce a few additional database calls on each request, but we've added some mitigations (caching and batching) to help reduce the impact. If you're interested in the details, you can check out the original issue or the follow-up with some of the optimizations.
  NOTE: And yes, you will need to log out and log back in after upgrading. Make sure you sync `fly` using `fly sync -c <concourse-url>` before logging in.

fix, breaking
- Removed the `Query` argument from the `fly curl` command.
  When passing curl options as `fly curl <url_path> -- <curl_options>`, the first curl option was incorrectly parsed as a query argument, which then caused unexpected curl behaviour. #5366
  With the fix in #5371, `<curl_options>` functions as documented, and the way to add query params to `fly curl` is more intuitive: `fly curl <url_path?query_params> -- <curl_options>`
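  For instance (paths and params are illustrative):

  ```sh
  # Query params now belong in the URL path itself...
  fly -t ci curl '/api/v1/builds?limit=10'

  # ...while everything after `--` is handed straight to curl
  fly -t ci curl /api/v1/info -- -H 'Accept: application/json' -v
  ```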
fix, breaking

- When looking up credentials, we now prefer pipeline-scoped credential managers over global ones. #5506

fix, breaking

- In a previous release, we made the switch to `zstd` for compressing artifacts before they get streamed all over the place. This proved to be unreliable for all our use cases, so we switched the default back to `gzip`. We did make this configurable, though, so you can continue to use `zstd` if you so choose. #5398
feature, breaking

- @pnsantos updated the Material Design icon library version to `5.0.45`. #5397
  Note: some icons changed names (e.g. `mdi-github-circle` was changed to `mdi-github`), so after this update you might have to update some `icon:` references.
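  For example, a pipeline using the renamed GitHub icon would change like this (the resource shown is illustrative; Concourse `icon:` values drop the `mdi-` prefix):

  ```yaml
  resources:
  - name: my-repo
    type: git
    icon: github   # was: github-circle
  ```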
fix, breaking

- @tjhiggins updated the flag for configuring the interval at which Concourse runs its internal components: `CONCOURSE_RUNNER_INTERVAL` -> `CONCOURSE_COMPONENT_RUNNER_INTERVAL`. #5432
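  In practice this is just an environment variable rename (the interval value is illustrative):

  ```sh
  # Before
  CONCOURSE_RUNNER_INTERVAL=10s concourse web

  # After (v6.1.0+)
  CONCOURSE_COMPONENT_RUNNER_INTERVAL=10s concourse web
  ```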
feature

- Implemented the core functionality for archiving pipelines (RFC #33). #5368
  Note: archived pipelines are neither visible in the web UI (#5370) nor in `fly pipelines`.
  Note: archiving a pipeline nullifies its configuration. If for some reason you downgrade Concourse, unpausing a pipeline that was previously archived will result in a broken pipeline. To fix that, set the pipeline again.
feature

- Since switching to dex tokens, we started using the client credentials grant type to fetch tokens for the TSA. This seemed like a good opportunity to start `bcrypt`ing client secrets in the db. #5459
fix

- Thanks to some outstanding debugging from @agurney, we've fixed a deadlock in the notifications bus which caused the build page not to load under certain conditions. #5519

feature

- @evanchaoli added a global configuration to override the check interval for any resources that have been configured with a webhook token. #5091

feature

- We've updated the way that hijacked containers get garbage collected.
  We no longer rely on Garden to clean up hijacked containers; instead, this functionality is implemented in Concourse itself, which makes it much more portable to different container backends. #5305

feature

- @ebilling updated the way that containers associated with failed runs get garbage collected.
  Containers associated with failed runs used to sit around until a new run was executed. They now have a maximum lifetime (default: 120 hours), configurable via the 'failed-grace-period' flag. #5431
fix

- Fix rendering of pipeline previews on the dashboard on Safari. #5375

fix

- Fix pipeline tooltips being hidden behind other cards. #5377

fix

- Fix log highlighting on the one-off-build page. Previously, highlighting any log line would cause the page to reload. #5384

fix

- Fix a regression which inhibited scrolling through the build history list. #5392

feature

- We've moved the "pin comment" field in the resource view to the top of the page (next to the currently pinned version). The comment can be edited inline.

feature

- Add a loading indicator on the dashboard while awaiting the initial API/cache response. #5458

fix

- Allow the dashboard to recover from the "show turbulence" view if an API call fails once but starts working afterward. This prevents users from needing to refresh the page after closing their laptop or in the presence of network flakiness. #5496
feature

- Updated a migration that adds a column to the pipelines table. The syntax initially used is not supported by Postgres 9.5, which we still support, so we removed the unsupported syntax to let users on Postgres 9.5 run the migration. Our CI pipeline has also been updated to ensure we run our tests against Postgres 9.5. #5479

fix

- We fixed a bug where, if you create a new build and then trigger a rerun build, both builds would be stuck in the pending state. #5452

feature

- We added a new flag (`CONCOURSE_CONTAINER_NETWORK_POOL`) to let you configure the network range used when allocating IPs for containers created by Concourse. This is primarily intended to support the experimental containerd worker backend. Despite the introduction of this new flag, `CONCOURSE_GARDEN_NETWORK_POOL` is still functional for the (stable and default) Garden worker backend. #5486
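  A sketch of a worker configured with a custom container network range (the CIDR is illustrative):

  ```sh
  # containerd (experimental) backend
  CONCOURSE_CONTAINER_NETWORK_POOL=10.254.0.0/16 concourse worker

  # Garden (stable, default) backend keeps its existing flag
  CONCOURSE_GARDEN_NETWORK_POOL=10.254.0.0/16 concourse worker
  ```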
feature

- We added support for configuring the set of DNS resolvers to be made visible (through `/etc/resolv.conf`) to containers that Concourse creates when leveraging the experimental containerd worker backend. #5465

feature

- Added support to the experimental containerd worker backend for leveraging the worker's DNS proxy, allowing name resolution even in cases where the worker's set of nameservers is not reachable from the container's network namespace (for instance, when deploying Concourse workers in Docker, where the worker nameserver points to 127.0.0.11, an address that an inner container wouldn't be able to reach without the worker proxy). #5445
v6.0.0 Changes
March 25, 2020

Concourse v6.0: it does things gooder.™

A whole new algorithm for deciding job inputs has been implemented which performs much better for Concourse instances that have a ton of version and build history. This algorithm works in a fundamentally different way, and in some situations will decide on inputs that differ from the previous algorithm. (More details follow in the actual release notes below.)

In the past we've done major version bumps as a ceremony to celebrate big shiny new features. This is the first time it's been done because there are backwards-incompatible changes to fundamental pipeline semantics.

We have tested this release against a larger scale than we ever tried to support before, and we've been using it in our own environments for a while now. Despite all that, we still recommend that anyone using Concourse for mission-critical workflows (e.g. shipping security updates) wait for the next few releases, just in case any edge cases are found.

IMPORTANT: Please expect and prepare for some downtime when upgrading to v6.0. On our large-scale deployments we have observed 10-20 minutes of downtime as the database is migrated, but this will obviously vary depending on the amount of data.

As this is a significant release with changes that may affect user workflows, we will be taking some time after this release to listen for feedback before moving on to the next big thing.

Please leave any v6.0 feedback you have, good or bad, in issue #5360!
feature, fix, breaking

- A new algorithm for determining inputs for jobs has been implemented.
  The new algorithm significantly reduces resource utilization on the `web` and `db` nodes, especially for long-lived and/or large-scale Concourse installations.
  The old algorithm would load all of the resource versions, build inputs, and build outputs into memory, then brute-force its way to the next set of inputs. This worked well enough in most cases, but on a long-lived deployment with thousands or even millions of versions or builds, it would put a lot of strain on the `web` and `db` nodes just to load the data set. In the future, we plan to collect all versions of resources, which would make this even worse.
  The new algorithm takes a very different approach which does not require the entire dataset to be held in memory and cuts out nearly all of the "brute force" aspect of the old algorithm. We even make use of fancy `jsonb` index functionality in Postgres; a successful build's set of resource versions is stored in a table which we can easily "intersect" in order to find matching candidates when evaluating `passed` constraints.
  For a more detailed explanation of how the new algorithm works, check out the section on this in the v10 blog post.
  Before we show the shiny charts of improved performance, let's cover the breaking change that the new algorithm needed:
  Breaking change: for inputs with `passed` constraints, the algorithm now chooses versions based on the build history of each job in the `passed` constraint, rather than the version history of the input's resource.
  This might make more sense with an example. Let's say we have a pipeline with a resource (`Resource`) that is used as an input to two jobs (`Job 1` and `Job 2`):
  `Resource` has three versions: `v1` (oldest), `v2`, and `v3` (newest).
  `Job 1` has `Resource` as an unconstrained input, so it will always grab the latest version available - `v3`. In the scenario above, it has done this for `Build 1`, but then a pipeline operator pinned `v1`, so `Build 2` ran with `v1`. So now we have both `v1` and `v3` having "passed" `Job 1`, but in reverse order.
  The difference between the old algorithm and the new one is which version `Job 2` will use for its next build when `v1` is un-pinned.
  With the old algorithm, `Job 2` would choose `v3` as the input version, as shown by the orange line. This is because the old algorithm would start from the latest version and then check whether that version satisfies the `passed` constraints.
  With the new algorithm, `Job 2` will instead end up with `v1`, as shown by the green line. This is because the new algorithm starts with the versions from the latest build of the jobs listed in `passed` constraints, searching through older builds if necessary.
  The resulting behavior is that pipelines now propagate versions downstream from job to job rather than working backwards from versions.
  This approach to selecting versions is much more efficient because it cuts out the "brute force" aspect: by treating the `passed` jobs as the source of versions, we inherently only attempt versions which already satisfy the constraints and passed through the same build together.
  The remaining challenge, then, is to find versions which satisfy all of the `passed` constraints, which the new algorithm does with a simple query utilizing a `jsonb` index to perform a sort of 'set intersection' at the database level. It's pretty neato!
  Improved metrics: now that the breaking change is out of the way, let's take a look at the metrics from our large-scale test environment and see whether the whole thing was worth it from an efficiency standpoint.
  The first metric shows the database CPU utilization:
  The left side shows that the CPU was completely pegged at 100% before the upgrade. This resulted in a slow web UI, slow pipeline scheduling performance, and complaints from our Concourse tenants.
  The right side shows that after upgrading to v6.0 the usage dropped to ~65%. This is still pretty high, but keep in mind that we intentionally gave this environment a pretty weak database machine so we don't just keep scaling up and pretending our users have unlimited funds for beefy hardware. Anything less than 100% usage here is a win.
  The next metric shows database data transfer:
  This shows that after upgrading to 6.0 we do a lot less data transfer from the database, because we no longer have to load the full algorithm dataset into memory.
  Not having to load the versions DB is also reflected in the amount of time it took just to do that as part of scheduling:
  This graph shows that just before the upgrade, the `web` node was spending 1 hour and 11 minutes of time per half-hour just loading the dataset. This entire step is gone, as reflected by the graph ending upon the upgrade to 6.0.
  Another optimization we've made is to simply not do work that doesn't need to be done. Previously Concourse would schedule every job every 10 seconds, but now it only schedules jobs which actually need to be scheduled.
  This heat map, combined with the graph below, shows the before-vs-after distribution of job scheduling time, and shows that after 6.0 much less time is spent scheduling jobs, freeing the `web` node to do other, more important things.
  Note that while the per-job scheduling time has increased in duration and variance, as shown in the heat map, this is not really a fair comparison: the data for the old algorithm doesn't include all the time spent loading up the dataset, which was done once per pipeline every 10 seconds.
  The fact that the new algorithm is able to schedule some jobs in the same amount of time that the older, CPU-bound, in-memory, brute-force algorithm took is actually pretty nice, considering it now involves going over the network to query the database.
  Note: this new approach means that we need to carefully keep tabs on state changes so that we know when jobs need to be scheduled. If you have a job that doesn't seem to be queueing new builds, and you think it should, try out the new `fly schedule-job` command; it's been added just in case we missed a spot. This command will mark the job as 'needs to be scheduled' and the scheduler will pick it up on the next tick.
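  Usage follows the same shape as other job-scoped fly commands (target and job names are hypothetical):

  ```sh
  # Mark the job as 'needs to be scheduled'; the scheduler picks it up next tick
  fly -t ci schedule-job -j my-pipeline/my-job
  ```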
  Migrating existing data: you may be wondering how the upgrade's data migration works with such a huge change to how the scheduler uses data.
  The answer is: very carefully.
  If we were to do an in-place migration of all of the data to the new format used by the algorithm, the upgrade would take forever. To give you an idea of how long: even just adding a column to the `builds` table in our environment took about 16 minutes. Now imagine that multiplied by all of the inputs and outputs of every build.
  So instead of doing it all at once in a migration on startup, the algorithm lazily migrates data for builds as it needs to. Overall, this should result in very little work, as most jobs will have a satisfiable set of inputs without having to go too far back in the history of upstream jobs.
  Bonus feature: along with the new algorithm, we wanted to make it easier to troubleshoot why a build is stuck in a "pending" state. So, if the algorithm fails to find a satisfactory set of inputs, the reason will now be shown for each input in the build preparation.
  Bonus fix: the new algorithm fixes an edge case described in #3832. In this case, multiple resources with corresponding versions (e.g. a v1.2.3 semver resource and a binary in S3 corresponding to that version) are correlated by virtue of being passed along the pipeline together.
  When one of the correlated versions was disabled, the old algorithm would incorrectly continue to use the other versions, matching them with an incorrect version for the resource whose version was disabled. Bad news bears!
  Because the new algorithm always works by selecting entire sets of versions at a time, they will always be correlated, and this problem goes away. Good news... uh, goats!
feature, breaking

- LIDAR is now on by default! In fact, not only is it on by default, it is now the only option. The old and busted 'Radar' resource checking component has been removed, and the `--enable-lidar` flag will no longer be recognized. #3704
  With the switch to LIDAR, the metrics pertaining to resource checking have also changed (via #5171). Please consult the now-updated metrics documentation and update your dashboards accordingly!

breaking

- We have removed support for emitting metrics to Riemann.
  In the early days, Riemann gave us hope of only having to support a single metrics sink. That world didn't really pan out.
  We're now trying to standardize on OpenTelemetry and pull-style metrics (i.e. Prometheus) rather than push, which means we'll be slowly transitioning away from our current support of many different metrics sinks, as it is a bit of a maintenance nightmare.
fix, security

- Fix an edge case of CVE-2018-15798 where the redirect URI during the login flow could be embedded with a malicious host.

feature

- #413. We finally did it.
  Build re-running has been implemented.
  Build re-running was one of the most long-running and popular feature requests. We never got around to it because most of the motivation for it came from PR flows, which were (and still are) a pretty broken usage pattern that we were trying to address directly rather than have it act as a primary motivator for design decisions like this one.
  Ultimately, the validity of re-triggering builds was never in question - but we try to look a level deeper and get to the bottom of things, sometimes to a fault. Ideas surrounding PR flows dragged on for a while and went through a few redesigns (context in the v10 blog post), so we're sorry it took so long to finally do this - but it's here now!
  Why was this feature implemented in this release after so much time, you ask? Well, the new scheduling and pinning semantics kind of made it necessary to have proper support for re-running builds. Without re-running, folks were relying on version pinning in order to re-run a job with an older version. But with the new scheduling semantics, that pinned version would propagate to downstream jobs as if it were the latest version for the pipeline to converge on, which might not be what you want. (But it probably is what you want if you're using pinning as it was originally meant to be used.)
feature

- Following a multi-pronged attack through various optimizations, the dashboard has become more responsive:
  With #4862, we optimized the `ListAllJobs` endpoint so that it no longer requires decrypting and parsing the configuration of every single job. This dramatically reduces resource utilization for deployments with a ton of pipelines.
  With #5262, we now cache the last-fetched data in local browser storage, so that navigating to the dashboard renders at least some useful data rather than blocking on all the data being fetched fresh from the backend.
  With #5118, we implemented infinite scrolling and lazy rendering, which should greatly improve performance on installations with a ton of pipelines configured. The initial page load can still be quite laggy, but interacting with the page afterwards now performs a lot better. We'll keep chipping away at this problem and may have larger changes in store for the future.
  With #5023, the dashboard will no longer "pile on" requests to a slow backend. Previously, if the `web` node was under too much load, it could take longer to respond to the `ListAllJobs` endpoint than the default polling interval, and the dashboard could start another request before the last one finished. It will now wait for the previous request to complete before making another.
  Overall, while we're still not completely happy with the dashboard performance on gigantic installations, these changes should make the experience feel a bit better.
feature

- @evanchaoli introduced another new step type in #4973: the `load_var` step! This step can be used to load a value from a file at runtime and set it in a "local var source", so that later steps in the build may pass the value to fields like `params`.
  With this primitive, resource type authors will no longer have to implement two ways to parameterize themselves (i.e. `tag` and `tag_file`). Resource types can now implement simpler interfaces which expect values to be set directly, and Concourse can handle the busywork of reading the value from a file.
  This feature, like the `set_pipeline` step, is considered experimental until its corresponding RFC (RFC #27) is resolved. The step will helpfully remind you of this fact by printing a warning on every single use.
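  A minimal sketch (resource, var, and param names are hypothetical; local vars are referenced with the `((.:var))` syntax):

  ```yaml
  plan:
  - get: version
  - load_var: image-tag      # loads the file's contents into a local var
    file: version/number
  - put: image
    params:
      tag: ((.:image-tag))   # later steps consume the value directly
  ```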
feature

- In #4614, @julia-pu implemented a way for `put` steps to automatically determine the artifacts they need, by configuring `inputs: detect`. With `detect`, the step will walk over its `params` and look for paths that correspond to artifact names in the build plan (e.g. `tag: foo/bar` or just `repository: foo`). When it comes time to run, only those named artifacts will be given to the step, which can avoid wasting a lot of time transferring artifacts the step doesn't even need.
  This feature may become the default in the future if it turns out to be useful and safe enough in practice. Try it out!
fix

- In #5149, @evanchaoli implemented an optimization which should lower the resource checking load on some instances: instead of checking all resources, only resources which are actually used as inputs will be checked.

fix

- We fixed a bug where users who upgraded from Concourse v5.6.0 to v5.8.0 with lidar enabled might see a resource never get checked because it fails to create a check step. #5014

fix

- Builds could get stuck in the pending state for jobs that are set to run serially. If a build was scheduled but not yet started and the ATC restarted, then the next time the build was picked up it would get stuck in the pending state: the ATC saw that the job was set to run in serial and that a build was already being scheduled, so it would not start that scheduled build. This is now fixed; builds will never be stuck in the scheduled state because of the job's serial configuration. #4065

fix

- If you had lidar enabled, it was possible for duplicate work to be done when creating checks for custom resource types: when multiple resources used the same custom resource type, they would all try to create a check for that custom type. In the end only one check would happen, but the work involved in creating it was duplicated. This is fixed so that only one attempt is made to create a check for a custom resource type, even if multiple resources use it. #5158

fix

- The length of time to keep around the history of a resource check used to default to 6 hours, but we discovered that this default might cause slowness for large deployments because of the number of checks kept around. The default is now 1 minute, and it is left up to the user to configure it higher if they would like to keep the history of checks for longer. #5157
feature

- We have started adding a `--team` flag to fly commands so that you can run them against any team you're authorized to perform actions against, without having to log in to the team with a separate fly target. (#4406)
  So far, the flag has been added to `intercept`, `trigger-job`, `pause-job`, `unpause-job`, and `jobs`. In the future we will likely either continue with this change or start to rethink the overall fly flow to see if there's a better alternative.
. In the future we will likely either continue with this change or start to re-think the overall Fly flow to see if there's a better alternative.๐ฑ ๐ fix
- ๐ Previously, the build tracker would unconditionally fire off a goroutine for each in-flight build (which then locks and short-circuits if the build is already tracked). We changed it so that the build tracker will only do so if we don't have a goroutine for it already. #5075
๐ฑ ๐ fix
- We fixed a bug for job that have any type of serial groups set (
serial: true
,serial_groups
ormax_in_flight
). Whenever a build for that job would be scheduled and we check for if the job has hit max in flight, it would unnecessarily recreate all the serial groups in the database. #2724
๐ฑ ๐ fix
- โฑ The scheduler will separate the scheduling of rerun and regular builds (builds created by the scheduler and manually triggered builds) so that in situations where a rerun build is failing to schedule, maybe the input versions are not found, it will not block the scheduler from scheduling regular builds. #5039
๐ฑ ๐ feature
- ๐ป You can now easily enable or disable a resource version from the comfort of your command line using the new fly commands
fly enable-resource-version
andfly disable-resource-version
, thanks to @stigtermichiel! #4876
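  For example (target, pipeline/resource, and version key are hypothetical):

  ```sh
  # Take a bad version out of consideration...
  fly -t ci disable-resource-version -r my-pipeline/my-repo -v ref:abc1234

  # ...and bring it back later
  fly -t ci enable-resource-version -r my-pipeline/my-repo -v ref:abc1234
  ```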
fix

- We fixed a bug where the existence of missing volumes with child volumes referencing them caused garbage collection of all missing volumes to fail. Missing volumes are volumes that exist in the database but not on the worker. #5038

fix

- The ResourceTypeCheckingInterval is no longer respected due to the removal of `radar` in this release, with `lidar` becoming the default resource checker. Thanks to @evanchaoli for removing the unused `--resource-type-checking-interval` flag! #5100

fix

- The link to the helm chart in the Concourse GitHub repo README was fixed, thanks to @danielhelfand! #4986

feature

- Include the job label in build duration metrics exported to Prometheus. #4976

fix

- The database now uses a version hash to look up resource caches, in order to speed up any queries that reference resource caches. This helps speed up resource cache garbage collection. #5093

fix

- If you have `lidar` enabled, we fixed a bug where pinning an old version of a mock resource would cause it to become the latest version. #5127

fix

- Explicitly whitelisted all traffic for Concourse containers in order to allow outbound connections for containers on Windows. Thanks to @aemengo! #5159
feature

- Add experimental support for exposing traces to Jaeger or Stackdriver.
  With this feature enabled (via the `--tracing-(jaeger|stackdriver)-*` flags on `concourse web`), the `web` node starts recording traces that represent the various steps a build goes through, sending them to the configured trace collector. #5043
  As this feature is being built using OpenTelemetry, expect to have support for other systems soon.
feature

- @joshzarrabi added the `--all` flag to the `fly pause-pipeline` and `fly unpause-pipeline` commands. This allows users to pause or unpause every pipeline on a team at the same time. #4092
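  For example:

  ```sh
  # Pause every pipeline on the current team...
  fly -t ci pause-pipeline --all

  # ...and resume them all later
  fly -t ci unpause-pipeline --all
  ```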
fix

- In the case that a user has multiple roles on a team, the pills on the team headers on the dashboard now accurately reflect the logged-in user's most-privileged role on each team. #5133

fix

- Set a default value of `4h` for `rebalance-interval`. Previously, this value was unset. With the new default, workers will reconnect to a randomly selected TSA (SSH gateway) every 4 hours.
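  To override the default (the value is illustrative; the env-var spelling follows Concourse's usual convention and is an assumption):

  ```sh
  # Reconnect workers to a randomly selected TSA every hour instead
  CONCOURSE_REBALANCE_INTERVAL=1h concourse worker
  ```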
fix

- With #5015, worker state metrics are emitted even for states with 0 workers, rather than not emitting the metric at all. This should make it easier to confirm that there are in fact 0 stalled workers, as opposed to having no knowledge of it.

fix

- Bump the golang.org/x/crypto module from `v0.0.0-20191119213627-4f8c1d86b1ba` to `v0.0.0-20200220183623-bac4c82f6975` to address a vulnerability in the ssh package.

feature

- Improve the initial page load time by lazy-loading JavaScript that isn't necessary for the first render. #5148

feature

- @aledeganopix4d added a `last updated` column to the output of `fly pipelines`, showing the last date when the pipeline was set or reset. #5113
fix

- Ensure the build page doesn't get reloaded when you highlight a log line, and fix auto-scrolling to a highlighted log line. #5275

fix

- With #4168, `fly sync` no longer requires a target to be registered beforehand; instead, a `--concourse-url` (or `-c`) flag may be specified.
  This should make it a bit easier to keep your CLI in sync if and when we change the login process again.

fix

- `fly validate-pipeline` will no longer blow up when given a pipeline config which uses `var_sources`.

fix

- We've tweaked the UI on the resource page; when a version is pinned, rather than cramming the pinned version into the header, the "pin bar" for the version now replaces the "checking successfully" bar, since pinning ultimately prevents a resource from checking.
v6.0.0-rc.10

February 03, 2020
v6.0.0-pre Changes

January 28, 2020

> This is a pre-release of 6.0 which brings major changes not yet captured by the following release notes. Watch this space!

feature

- Include the job label in build duration metrics exported to Prometheus. #4976

fix

- The dashboard page refreshes its data every 5 seconds. Until now, it was possible (especially for admin users) for the dashboard to initiate an ever-growing number of API calls, unnecessarily consuming browser, network, and API resources. Now the dashboard will not initiate a request for more data until the previous request finishes. #5023

feature

- Add experimental support for exposing traces to Jaeger or Stackdriver.
  With this feature enabled (via the `--tracing-(jaeger|stackdriver)-*` flags on `concourse web`), the `web` node starts recording traces that represent the various steps a build goes through, sending them to the configured trace collector. #4607
  As this feature is being built using OpenTelemetry, expect to have support for other systems soon.
v5.8.1 Changes

March 24, 2020

fix

- Bump the golang.org/x/crypto module from `v0.0.0-20191119213627-4f8c1d86b1ba` to `v0.0.0-20200220183623-bac4c82f6975` to address a vulnerability in the ssh package.

fix

- Fix an edge case of CVE-2018-15798 where the redirect URI during the login flow could be embedded with a malicious host.
v5.8.0 Changes

January 08, 2020

feature

- The first step (heh) along our road to v10 has been taken!
  @evanchaoli implemented the `set_pipeline` step described by RFC #31. The RFC is still technically in progress, so the step is 'experimental' for now.
  The `set_pipeline` step allows a build to configure a pipeline within the build's team. This is the first "core" step type added since the concept of "build plans" was introduced, joining `get`, `put`, and `task`. Exciting!
  The key goal of the v10 roadmap is to support multi-branch and PR workflows, which require something more dynamic than `fly set-pipeline`. The theory is that by making pipelines more first-class - allowing them to be configured and automated by Concourse itself - we can support these more dynamic use cases by leveraging existing concepts instead of adding complexity to them.
  As a refresher, here's where this piece fits in our roadmap for multi-branch/PR workflows:
  With RFC #33: archiving pipelines, any pipelines set by a `set_pipeline` step will be subject to automatic archival once a new build of the same job completes without setting the pipeline. This way, pipelines that are removed from the build plan automatically go away, while preserving their build history.
  With RFC #34: instanced pipelines, pipelines sharing a common template can be configured under a common name, using `((vars))` to identify the instance. For example, you could have many instances of a `branches` pipeline, with `((branch_name))` as the "instance" var. Building on the previous point, instances which are no longer set by the build will be automatically archived.
  With RFC #29: spatial resources, the `set_pipeline` step can be automated to configure a pipeline instance corresponding to each "space" of a resource - i.e. all branches or pull requests in a repo. This RFC needs a bit of TLC (it hasn't been updated to be prototype-based), but the basic idea is there.
  With all three of these RFCs delivered, we will have complete automation of pipelines for branches and pull requests! For more detail on the whole approach, check out the original v10 blog post.
  Looking further ahead on the roadmap, RFC #32: projects proposes a more explicit GitOps-style approach to configuration automation. In that context the `set_pipeline` step may feel a lot more natural. Until then, the `set_pipeline` step can be used as a simpler alternative to the `concourse-pipeline` resource, with the key difference being that the `set_pipeline` step doesn't need any auth config.
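  A minimal sketch of the step (job, resource, and file names are hypothetical):

  ```yaml
  jobs:
  - name: reconfigure
    plan:
    - get: ci                # repo containing pipeline configs
      trigger: true
    - set_pipeline: my-app   # configures the pipeline within the build's team
      file: ci/pipelines/my-app.yml
  ```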
feature

- @evanchaoli added support for `var_sources` in the pipeline config. With this feature, Concourse can fetch secrets from multiple independent credential managers per pipeline. While this feature is currently in an experimental state and not yet tested in production, it is the first step toward enabling workflows where teams sharing a Concourse instance can independently manage their own credential managers. For the moment, only Vault or the dummy credential manager can be used to back a `var_source` (the other credential manager types do not work). #4600, #4777
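  A sketch of a Vault-backed var source (names and config fields are illustrative; see the var_sources docs for the full schema):

  ```yaml
  var_sources:
  - name: team-vault
    type: vault
    config:
      url: https://vault.example.com:8200
      path_prefix: /team-secrets

  jobs:
  - name: deploy
    plan:
    - task: deploy
      file: ci/tasks/deploy.yml
      params:
        API_KEY: ((team-vault:api-key))  # vars are namespaced by source name
  ```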
feature

- @evanchaoli added the ability to tune the mapping between API actions and roles via the `--config-rbac` flag. While you can't yet create your own roles, you can customize the built-in ones by promoting and demoting the API actions assigned to each role. #4657

feature

- @AndrewCopeland and @cyberark-bizdev added support for Conjur as a credential manager. #4693

feature

- The pin menu on the pipeline page now matches the sidebar, and the dropdown toggles on clicking the pin icon. #4688

feature

- Prometheus and NewRelic can now receive lidar check-finished events. #4556

feature

- Make the Garden client HTTP timeout configurable. #4707

feature

- @pivotal-bin-ju, @taylorsilva and @xtreme-sameer-vohra added batching to the NewRelic emitter and logging info for non-2xx responses from NewRelic. #4698

feature

- @andhadley added support for Vault namespaces. #4748

feature

- @hfinucane added a `--url` flag to `fly watch`, so now you can just copy the URL of a build from your browser and paste it into your terminal to keep watching the build. #4323
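  For example (the URL is illustrative):

  ```sh
  # Keep watching a build straight from its web UI URL
  fly -t ci watch --url https://ci.example.com/teams/main/pipelines/my-app/jobs/test/builds/42
  ```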
feature

- Concourse team roles can now be assigned to different CF space roles independently. For example, you can now create role mappings like "auditors in my CF space should be viewers in my Concourse team", whereas before you could only assign Concourse roles to CF developers. #4712, #4729

feature

- Concourse now emits some useful metrics when lidar is enabled: the size of the check queue, the number of checks queued per ATC each tick, the number of checks GC'd at a time, checks started, and checks finished. #4692

feature

- The build page now shows text labels for the different step types, like `get:`, `task:` and `set_pipeline:`, instead of the icons from previous versions. Hopefully this is more accessible and easier to interpret! #4942

feature, stub

- The Concourse team is in the early stages of implementing a new backend for our container runtime based on containerd, which is more featureful than the Guardian backend we have relied on until now. We have not yet implemented all of the methods required by Garden, so the existing work (which can be enabled by passing the `--use-containerd` flag to `concourse worker`) is in a non-functional state. This work is tracked in this project. #4779, #4778, #4752, #4853, #4784
fix

- @kcmannem finally fixed the jagged edges on the progress bar indicators used by the dashboard. #4865

fix

- @evanchaoli fixed a weird behavior in secret redaction wherein a secret containing e.g. `{` on its own line (i.e. formatted JSON) would result in `{` being replaced with `((redacted))` in build logs. Single-character lines will instead be skipped. #4749
  As an aside, anyone with a truly single-character credential may want to add another character or two.

fix

- @vito bumped the `autocert` dependency so that Let's Encrypt will default to the ACME v2 API. #4804

fix

- Bumped the `registry-image` resource to v0.8.2, which should resolve `DIGEST_INVALID` errors (among others) introduced by faulty retry logic. Additionally, the resource will now retry on `429 Too Many Requests` errors from the registry, with exponential back-off up to 1 hour.

fix

- @evanchaoli fixed a race condition resulting in a crash with LIDAR enabled. #4808

fix

- @evanchaoli fixed a regression introduced with the secret redaction work which resulted in build logs being buffered. #4817

fix

- Fixed a problem where, when `fail_fast` for `in_parallel` is true, a failing step would cause the `in_parallel` to fall into `on_error`. #4746
fix

- @evanchaoli changed the behaviour of `fly set-team` so that it no longer raises an error when a role has no groups or users configured. #4858

fix

- @xtremerui changed the `concourse` CLI to output help text on `stdout` when the `-h` or `--help` flag is passed. This makes it easier to use other tools like `grep` to find relevant parts of the usage text. #4745

fix

- Concourse used to check for the existence of the legacy migration table by accessing `information_schema` and parsing out the error message `does not exist` in English; @xtremerui changed this to use `to_regclass` (Postgres 9.4+), which fixes database migration for users with a non-English (e.g. German) system language. #4701
fix

- @vito bumped the default value for the Let's Encrypt ACME URL to point to their v2 API instead of v1. This should have been in v5.7.2, but we had no automated testing for Let's Encrypt integration, so there wasn't really a mental cue to check for this sort of thing.
  We're adding Let's Encrypt to our smoke tests now to catch API deprecations more quickly, and a unit test has been added to ensure that the default value for the ACME URL flag matches the default value for the client. #4869

fix

- @pivotal-bin-ju fixed an x509 issue where a super admin could not log in without a CA cert after the first successful login. #4587

fix

- @kirillbilchenko fixed a bug where the `concourse_workers_registered` metric would never go below 1, even when workers were pruned. #4895

enhancement

- @matthewpereira enlarged the build prep list font to match the other build log output styling. #4826

fix

- @cirocosta fixed a bug where a non-specific error could lead to a null pointer exception during the container creation phase. #4932
v5.7.2 Changes

November 29, 2019

fix

- @vito bumped the `autocert` dependency so that Let's Encrypt will default to the ACME v2 API. #4805

fix

- @evanchaoli fixed a race condition resulting in a crash with LIDAR enabled. #4808

fix

- @evanchaoli fixed a regression introduced with the secret redaction work which resulted in build logs being buffered. #4817
v5.7.1 Changes

November 18, 2019

fix

- v5.7.0 changed how CloudFoundry roles map to Concourse RBAC when using the CF auth connector.
  Instead of enforcing this change, we would rather support both configurations in a future release.
  The original change is documented in the v5.7.0 release notes. #4699

feature

- Make the Garden client HTTP timeout configurable. #4707

feature

- Batch emissions and log info for non-2xx responses from NewRelic, for the NewRelic emitter. #4698
v5.7.0 Changes

October 31, 2019

feature

- We've introduced a `components` table in order to better synchronize all the internal processes that run on the web nodes.
  This should help reduce the amount of duplicated work (when running more than one ATC) and decrease the load on your database.
  There is no configuration required to take advantage of these improvements.

feature, breaking

- The CloudFoundry auth connector, when configured to authorize users based on CF space membership, will now authorize space auditors and space managers in addition to space developers. This is a breaking change, as any teams with CF space-based configuration may grant access to users that they wouldn't have before. #4661

feature, breaking

- All API payloads are now gzipped. This should help save bandwidth and make the web UI load faster. #4470

feature

- @ProvoK added support for a `?title=` query parameter on the pipeline/job badge endpoints! Now you can make it say something other than "build". #4480
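  For example (hostname, team, and pipeline are hypothetical; the badge path follows the usual pipeline API routes):

  ```sh
  # A badge that says "deploy" instead of "build"
  curl 'https://ci.example.com/api/v1/teams/main/pipelines/my-app/badge?title=deploy'
  ```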
๐ฑ ๐ feature
- @evanchaoli added a feature to stop ATC from attempting to renew Vault leases that are not renewable #4518.
๐ฑ ๐ feature
- @aledeganopix4d added a feature sort pipelines alphabetically #4334.
๐ฑ ๐ feature
- ๐ป API endpoints have been changed to use a single transaction per request, so that they become "all or nothing" instead of holding data in memory while waiting for another connection from the pool. In the past, this could lead to snowballing and increased memory usage as requests from the web UI (polling every 5 seconds) piled up. #4494
๐ฑ ๐ feature
๐ฑ ๐ fix
- ๐ @iamjarvo fixed a bug where
fly builds
would show the wrong duration for cancelled builds #4507.
๐ฑ ๐ feature
- โก๏ธ @pnsantos updated the Material Design icon library so now the
concourse-ci
icon is available for resources ๐ #4590
๐ฑ ๐ fix
- The
fly format-pipeline
now always produces a formatted pipeline, instead of declining to do so when it was already in the expected format. #4492
๐ฑ ๐ fix
- ๐ Fixed a regression when running
fly sync
it shows warning of parsing Content-Length and progress bar not showing downloading progress. #4666
๐ฑ ๐ feature
- ๐ท Concourse now garbage-collects worker containers and volumes that are not tracked in the database. In some niche cases, it is possible for containers and/or volumes to be created on the worker, but the database (via the web) assumes their creation had failed. If this occurs, these untracked containers can pile up on the worker and use resources. #3600 ensures that they get cleaned appropriately.
๐ฑ ๐ feature
- โ Add 5 minute timeout for baggageclaim destroy calls. #4516
๐ฑ ๐ feature
- โ Add 5 minute timeout for worker's garden client http calls. This is primarily to address cases such as destroy which may hang indefinitely causing GC to stop occurring. #4467
๐ฑ ๐ fix
- ๐ท Transition
failed
state containers todestroying
resulting in them being GC'ed. This ensures that if web's call to garden to create a container times out, the container is subsequently deleted from garden prior to being deleted from the db. This keeps the web's and worker's state consistent. #4562
fix

- Previously, if a worker stalled, the ATC would still count down and remove any 'missing' containers. If the worker ever came back it would still have these containers, but we would no longer be tracking them in the database. Even though we now garbage-collect these unknown containers, we'd rather that be a last resort. So we fixed it.

feature

- @wagdav updated the worker heartbeat log level from `debug` to `info` to reduce extraneous log output for operators. #4606

fix

- Fixed a bug where your dashboard search string would end up with `+`s instead of spaces when logging in. #4265

fix

- Fixed a bug where the job page would show a loading spinner forever when there were no builds (e.g. before the job had ever run). #4636

fix

- Fixed a bug where the tooltip that says 'new version' on a get step on the build page could be hidden underneath the build header. #4630

fix

- @evanchaoli fixed a bug where secret redaction would incorrectly "redact" the empty string, resulting in mangled logs. #4668

feature

- We've restyled the resource metadata displayed in a get step on the build page. It should be easier to read and follow; let us know your critiques on the issue. #4421 #4476

fix

- @CliffHoogervorst fixed an issue in the git resource where the version order was not correct when using `paths`. concourse/git-resource#273
fix

- @evanchaoli fixed an issue where `fly workers` would show the wrong age for a worker if that worker was under an hour old. #4548

fix

- @hbd fixed a bug in the `registry-image` resource where `get` steps would mysteriously give a 404 error. concourse/registry-image-resource#67

fix

- Made the `registry-image` resource more resilient: requests that get a 429 (Too Many Requests) from Docker Hub will be retried. concourse/registry-image-resource#69

fix

- @ProvoK fixed an issue that will help resource authors better understand the errors being returned by Concourse.

fix

- We fixed an issue, introduced in 5.6.0, where checking a resource would fail if the resource and a resource type shared the same name.
  This actually seemed to exacerbate another issue, which we also took the time to fix in #4626.
  You gotta spend money to make money.

feature

- @evanchaoli added `minimum_succeeded_builds` to the build log retention settings in the job config, which ensures the build reaper keeps around logs for N successful builds, even if your builds are on a killer losing streak.
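  A sketch of the job config (the numbers are illustrative):

  ```yaml
  jobs:
  - name: flaky-job
    build_log_retention:
      builds: 20                   # keep at most the last 20 builds' logs
      minimum_succeeded_builds: 5  # but always retain logs for the last 5 successes
    plan:
    - task: test
      file: ci/tasks/test.yml
  ```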
fix

- We fixed a migration from 5.4.0. It only affected a small number of users that had old unused resources left over from the ancient times. This probably isn't you, so don't worry.
  If you ran into this error, <3s for being a long-time Concourse user.

fix

- @aledeganopix4d added some lock types that weren't getting emitted as part of our metrics, so that's neat. You might actually see your lock metrics shoot up because of this; don't panic, it's expected.

fix

- @evanchaoli fixed a bug where Vault users that hadn't configured a shared path would end up searching the top-level `prefix` path for secrets.

fix

- @evanchaoli fixed yet another bug where the builds API would return the wrong builds if you gave it a date newer than the most recent build.