fetch: pass transport to post-fetch connectivity check#2123
Open
spkrka wants to merge 1 commit into
Open
Conversation
When fetching with a transport that sets `self_contained_and_connected` (as index-pack does for self-contained packs), check_connected() can use find_pack_entry_one() to skip connectivity verification for refs whose objects exist in the new pack. This avoids sending those OIDs to the rev-list child process. However, store_updated_refs() never passed the transport to check_connected(), so opt.transport was always NULL and this optimization was dead code for post-fetch connectivity checks. Thread the transport parameter through store_updated_refs() and set opt.transport so that check_connected() can take advantage of self-contained packs. On a large repository (2.4M commits, 374K files, 10.9K local refs), fetching 200 new commits: Before: rev-list connectivity check 22s, total fetch 36s After: rev-list connectivity check 5s, total fetch 14s The remaining 5s is spent verifying refs not contained in the new pack. Signed-off-by: Kristofer Karlsson <krka@spotify.com>
Author
|
/submit |
|
Submitted as pull.2123.git.1779625693328.gitgitgadget@gmail.com To fetch this version into To fetch this version to local tag |
|
Junio C Hamano wrote on the Git mailing list (how to reply to this email): "Kristofer Karlsson via GitGitGadget" <gitgitgadget@gmail.com>
writes:
> From: Kristofer Karlsson <krka@spotify.com>
>
> When fetching with a transport that sets `self_contained_and_connected`
> (as index-pack does for self-contained packs), check_connected() can
> use find_pack_entry_one() to skip connectivity verification for refs
> whose objects exist in the new pack. This avoids sending those OIDs to
> the rev-list child process.
>
> However, store_updated_refs() never passed the transport to
> check_connected(), so opt.transport was always NULL and this
> optimization was dead code for post-fetch connectivity checks.
>
> Thread the transport parameter through store_updated_refs() and set
> opt.transport so that check_connected() can take advantage of
> self-contained packs.
>
> On a large repository (2.4M commits, 374K files, 10.9K local refs),
> fetching 200 new commits:
>
> Before: rev-list connectivity check 22s, total fetch 36s
> After: rev-list connectivity check 5s, total fetch 14s
>
> The remaining 5s is spent verifying refs not contained in the new pack.
Impressive.
The check_connected() function itself is a battle tested helper
function, with the optimization that originates in c6807a40 (clone:
open a shortcut for connectivity check, 2013-05-26), and then
polished in 26b974b3 (check_connected(): delay opening new_pack,
2026-03-05), allowing available "transport" to be taken into account
does make very good sense.
The other call to check_connected() that appear in builtin/fetch.c
does not pass opt.transport, either, but this one checks before we
even fetch any packs over any transport, so a tweak similar to this
patch would not help that code path, I guess. In fact, many calls
to check_connected() elsewhere use opt that is often local to the
scope, that do not have transport at all. I wonder if there are
some of them that benefit from a similar tweak?
Thanks.
>
> Signed-off-by: Kristofer Karlsson <krka@spotify.com>
> ---
> fetch: pass transport to post-fetch connectivity check
>
> We're working on reducing git fetch times on a large monorepo (2.4M
> commits, 374K files, 10.9K local refs). Profiling showed the post-fetch
> connectivity check (rev-list --objects --stdin --not --all) dominating
> wall time when there are new objects.
>
> While investigating, I noticed that check_connected() already has a fast
> path for self-contained packs — it uses find_pack_entry_one() to skip
> refs whose objects are in the new pack. builtin/clone.c passes the
> transport to enable this, but store_updated_refs() in builtin/fetch.c
> does not, making the optimization dead code for fetches.
>
> The fix is a three-line change to thread the transport through.
>
> Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-2123%2Fspkrka%2Ffetch-transport-fix-v1
> Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-2123/spkrka/fetch-transport-fix-v1
> Pull-Request: https://github.com/gitgitgadget/git/pull/2123
>
> builtin/fetch.c | 4 +++-
> 1 file changed, 3 insertions(+), 1 deletion(-)
>
> diff --git a/builtin/fetch.c b/builtin/fetch.c
> index a22c319467..647fd1c30c 100644
> --- a/builtin/fetch.c
> +++ b/builtin/fetch.c
> @@ -1213,6 +1213,7 @@ N_("it took %.2f seconds to check forced updates; you can use\n"
> "to avoid this check\n");
>
> static int store_updated_refs(struct display_state *display_state,
> + struct transport *transport,
> int connectivity_checked,
> struct ref_transaction *transaction, struct ref *ref_map,
> struct fetch_head *fetch_head,
> @@ -1228,6 +1229,7 @@ static int store_updated_refs(struct display_state *display_state,
> if (!connectivity_checked) {
> struct check_connected_options opt = CHECK_CONNECTED_INIT;
>
> + opt.transport = transport;
> opt.exclude_hidden_refs_section = "fetch";
> rm = ref_map;
> if (check_connected(iterate_ref_map, &rm, &opt)) {
> @@ -1432,7 +1434,7 @@ static int fetch_and_consume_refs(struct display_state *display_state,
> }
>
> trace2_region_enter("fetch", "consume_refs", the_repository);
> - ret = store_updated_refs(display_state, connectivity_checked,
> + ret = store_updated_refs(display_state, transport, connectivity_checked,
> transaction, ref_map, fetch_head, config,
> display_array);
> trace2_region_leave("fetch", "consume_refs", the_repository);
>
> base-commit: 6a4418c36d6bad69a599044b3cf49dcbd049cb45 |
|
Kristofer Karlsson wrote on the Git mailing list (how to reply to this email): Good catch! After finding this case, I looked into the other related
call sites but found that they are already correct as-is:
- builtin/clone.c - already passes opt.transport (this is where I
copied it from)
- builtin/receive-pack.c (3 calls) - no transport object available to propagate
- fetch-pack.c - only used for the --deepen path, which sets
connectivity_checked when it passes,
so the store_updated_refs() check is skipped entirely and transport
is not needed
- bundle.c - no need for transport
I am not 100% sure, but I suppose it's always possible to follow up
with more reuse of this later.
- Kristofer
On Sun, 24 May 2026 at 14:53, Junio C Hamano <gitster@pobox.com> wrote:
>
> "Kristofer Karlsson via GitGitGadget" <gitgitgadget@gmail.com>
> writes:
>
> > From: Kristofer Karlsson <krka@spotify.com>
> >
> > When fetching with a transport that sets `self_contained_and_connected`
> > (as index-pack does for self-contained packs), check_connected() can
> > use find_pack_entry_one() to skip connectivity verification for refs
> > whose objects exist in the new pack. This avoids sending those OIDs to
> > the rev-list child process.
> >
> > However, store_updated_refs() never passed the transport to
> > check_connected(), so opt.transport was always NULL and this
> > optimization was dead code for post-fetch connectivity checks.
> >
> > Thread the transport parameter through store_updated_refs() and set
> > opt.transport so that check_connected() can take advantage of
> > self-contained packs.
> >
> > On a large repository (2.4M commits, 374K files, 10.9K local refs),
> > fetching 200 new commits:
> >
> > Before: rev-list connectivity check 22s, total fetch 36s
> > After: rev-list connectivity check 5s, total fetch 14s
> >
> > The remaining 5s is spent verifying refs not contained in the new pack.
>
> Impressive.
>
> The check_connected() function itself is a battle tested helper
> function, with the optimization that originates in c6807a40 (clone:
> open a shortcut for connectivity check, 2013-05-26), and then
> polished in 26b974b3 (check_connected(): delay opening new_pack,
> 2026-03-05), allowing available "transport" to be taken into account
> does make very good sense.
>
> The other call to check_connected() that appear in builtin/fetch.c
> does not pass opt.transport, either, but this one checks before we
> even fetch any packs over any transport, so a tweak similar to this
> patch would not help that code path, I guess. In fact, many calls
> to check_connected() elsewhere use opt that is often local to the
> scope, that do not have transport at all. I wonder if there are
> some of them that benefit from a similar tweak?
>
> Thanks.
>
>
> >
> > Signed-off-by: Kristofer Karlsson <krka@spotify.com>
> > ---
> > fetch: pass transport to post-fetch connectivity check
> >
> > We're working on reducing git fetch times on a large monorepo (2.4M
> > commits, 374K files, 10.9K local refs). Profiling showed the post-fetch
> > connectivity check (rev-list --objects --stdin --not --all) dominating
> > wall time when there are new objects.
> >
> > While investigating, I noticed that check_connected() already has a fast
> > path for self-contained packs — it uses find_pack_entry_one() to skip
> > refs whose objects are in the new pack. builtin/clone.c passes the
> > transport to enable this, but store_updated_refs() in builtin/fetch.c
> > does not, making the optimization dead code for fetches.
> >
> > The fix is a three-line change to thread the transport through.
> >
> > Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-2123%2Fspkrka%2Ffetch-transport-fix-v1
> > Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-2123/spkrka/fetch-transport-fix-v1
> > Pull-Request: https://github.com/gitgitgadget/git/pull/2123
> >
> > builtin/fetch.c | 4 +++-
> > 1 file changed, 3 insertions(+), 1 deletion(-)
> >
> > diff --git a/builtin/fetch.c b/builtin/fetch.c
> > index a22c319467..647fd1c30c 100644
> > --- a/builtin/fetch.c
> > +++ b/builtin/fetch.c
> > @@ -1213,6 +1213,7 @@ N_("it took %.2f seconds to check forced updates; you can use\n"
> > "to avoid this check\n");
> >
> > static int store_updated_refs(struct display_state *display_state,
> > + struct transport *transport,
> > int connectivity_checked,
> > struct ref_transaction *transaction, struct ref *ref_map,
> > struct fetch_head *fetch_head,
> > @@ -1228,6 +1229,7 @@ static int store_updated_refs(struct display_state *display_state,
> > if (!connectivity_checked) {
> > struct check_connected_options opt = CHECK_CONNECTED_INIT;
> >
> > + opt.transport = transport;
> > opt.exclude_hidden_refs_section = "fetch";
> > rm = ref_map;
> > if (check_connected(iterate_ref_map, &rm, &opt)) {
> > @@ -1432,7 +1434,7 @@ static int fetch_and_consume_refs(struct display_state *display_state,
> > }
> >
> > trace2_region_enter("fetch", "consume_refs", the_repository);
> > - ret = store_updated_refs(display_state, connectivity_checked,
> > + ret = store_updated_refs(display_state, transport, connectivity_checked,
> > transaction, ref_map, fetch_head, config,
> > display_array);
> > trace2_region_leave("fetch", "consume_refs", the_repository);
> >
> > base-commit: 6a4418c36d6bad69a599044b3cf49dcbd049cb45 |
|
User |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
We're working on reducing
git fetchtimes on a large monorepo (2.4Mcommits, 374K files, 10.9K local refs). Profiling showed the post-fetch
connectivity check (
rev-list --objects --stdin --not --all) dominatingwall time when there are new objects.
While investigating, I noticed that
check_connected()already has afast path for self-contained packs — it uses
find_pack_entry_one()toskip refs whose objects are in the new pack.
builtin/clone.cpassesthe transport to enable this, but
store_updated_refs()inbuiltin/fetch.cdoes not, making the optimization dead code forfetches.
The fix is a three-line change to thread the transport through.
cc: Kristofer Karlsson krka@spotify.com