Junio C Hamano [Fri, 25 Oct 2019 06:10:32 +0000 (15:10 +0900)]
Merge branch 'pb/pretty-email-without-domain-part' into pu
* pb/pretty-email-without-domain-part:
pretty: add "%aL"|"%al|%cL|%cl" option to output local-part of email addresses
t4203: use test-lib.sh definitions
t6006: use test-lib.sh definitions
Junio C Hamano [Fri, 25 Oct 2019 06:10:18 +0000 (15:10 +0900)]
Merge branch 'ra/rebase-i-more-options' into pu
"git rebase -i" learned a few options that are known by "git
rebase" proper.
* ra/rebase-i-more-options:
rebase: add --reset-author-date
rebase -i: support --ignore-date
sequencer: rename amend_author to author_to_rename
rebase -i: support --committer-date-is-author-date
sequencer: allow callers of read_author_script() to ignore fields
rebase -i: add --ignore-whitespace flag
Junio C Hamano [Fri, 25 Oct 2019 06:10:17 +0000 (15:10 +0900)]
Merge branch 'mt/threaded-grep-in-object-store' into pu
Traditionally, we avoided threaded grep while searching in objects
(as opposed to files in the working tree) as accesses to the object
layer is not thread-safe. This limitation is getting lifted.
* mt/threaded-grep-in-object-store:
grep: move driver pre-load out of critical section
grep: re-enable threads in non-worktree case
grep: protect packed_git [re-]initialization
grep: allow submodule functions to run in parallel
submodule-config: add skip_if_read option to repo_read_gitmodules()
grep: replace grep_read_mutex by internal obj read lock
object-store: allow threaded access to object reading
replace-object: make replace operations thread-safe
grep: fix racy calls in grep_objects()
grep: fix race conditions at grep_submodule()
grep: fix race conditions on userdiff calls
Junio C Hamano [Fri, 25 Oct 2019 06:10:17 +0000 (15:10 +0900)]
Merge branch 'js/protocol-advertise-multi' into pu
The transport layer has been updated so that the protocol version
used can be negotiated between the parties, by the initiator
listing the protocol versions it is willing to talk, and the other
side choosing from one of them.
Junio C Hamano [Fri, 25 Oct 2019 06:10:16 +0000 (15:10 +0900)]
Merge branch 'mk/use-size-t-in-zlib' into pu
The wrapper to call into zlib followed our long tradition to use
"unsigned long" for sizes of regions in memory, which have been
updated to use "size_t".
* mk/use-size-t-in-zlib:
zlib.c: use size_t for size
Junio C Hamano [Fri, 25 Oct 2019 06:10:15 +0000 (15:10 +0900)]
Merge branch 'jc/format-patch-delay-message-id' into pu
The location "git format-patch --thread" adds the Message-Id:
header in the series of header fields has been moved down, which
may help working around a suspected bug in GMail MSA, reported at
<CAHk-=whP1stFZNAaJiMi5eZ9rj0MRt20Y_yHVczZPH+O01d+sA@mail.gmail.com>
* jc/format-patch-delay-message-id:
format-patch: move message-id and related headers to the end
Junio C Hamano [Fri, 25 Oct 2019 06:08:21 +0000 (15:08 +0900)]
Merge branch 'ds/sparse-cone' into jch
Management of sparsely checked-out working tree has gained a
dedicated "sparse-checkout" command.
* ds/sparse-cone:
sparse-checkout: cone mode should not interact with .gitignore
sparse-checkout: write using lockfile
sparse-checkout: update working directory in-process
sparse-checkout: sanitize for nested folders
read-tree: show progress by default
unpack-trees: add progress to clear_ce_flags()
unpack-trees: hash less in cone mode
sparse-checkout: init and set in cone mode
sparse-checkout: use hashmaps for cone patterns
sparse-checkout: add 'cone' mode
trace2: add region in clear_ce_flags
sparse-checkout: create 'disable' subcommand
sparse-checkout: add '--stdin' option to set subcommand
sparse-checkout: 'set' subcommand
clone: add --sparse mode
sparse-checkout: create 'init' subcommand
sparse-checkout: create builtin with 'list' subcommand
Junio C Hamano [Fri, 25 Oct 2019 06:08:20 +0000 (15:08 +0900)]
Merge branch 'jk/cleanup-object-parsing-and-fsck' into jch
Crufty code and logic accumulated over time around the object
parsing and low-level object access used in "git fsck" have been
cleaned up.
* jk/cleanup-object-parsing-and-fsck: (23 commits)
fsck: accept an oid instead of a "struct tree" for fsck_tree()
fsck: accept an oid instead of a "struct commit" for fsck_commit()
fsck: accept an oid instead of a "struct tag" for fsck_tag()
fsck: rename vague "oid" local variables
fsck: don't require an object struct in verify_headers()
fsck: don't require an object struct for fsck_ident()
fsck: drop blob struct from fsck_finish()
fsck: accept an oid instead of a "struct blob" for fsck_blob()
fsck: don't require an object struct for report()
fsck: only require an oid for skiplist functions
fsck: only provide oid/type in fsck_error callback
fsck: don't require object structs for display functions
fsck: use oids rather than objects for object_name API
fsck_describe_object(): build on our get_object_name() primitive
fsck: unify object-name code
fsck: require an actual buffer for non-blobs
fsck: stop checking tag->tagged
fsck: stop checking commit->parent counts
fsck: stop checking commit->tree value
remember commit/tag parse failures
...
Prarit Bhargava [Thu, 24 Oct 2019 23:36:17 +0000 (19:36 -0400)]
pretty: add "%aL"|"%al|%cL|%cl" option to output local-part of email addresses
In many projects the number of contributors is low enough that users know
each other and the full email address doesn't need to be displayed.
Displaying only the author's username saves a lot of columns on the screen.
For example displaying "prarit" instead of "prarit@redhat.com" saves 11
columns.
Add a "%aL"|"%al|%cL|%cl" option that output the local-part of an email
address.
Also add tests for "%ae","%an", "%ce", and "%cn".
Signed-off-by: Prarit Bhargava <prarit@redhat.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Jeff King [Fri, 18 Oct 2019 05:02:08 +0000 (01:02 -0400)]
fsck: accept an oid instead of a "struct tree" for fsck_tree()
We don't actually look at the tree struct in fsck_tree() beyond its oid
and type (which is obviously OBJ_TREE). Just taking an oid gives our
callers more flexibility to avoid creating a struct, and makes it clear
that we are fscking just what is in the buffer, not any pre-parsed bits
from the struct.
Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Jeff King [Fri, 18 Oct 2019 05:01:48 +0000 (01:01 -0400)]
fsck: accept an oid instead of a "struct commit" for fsck_commit()
We don't actually look at the commit struct in fsck_commit() beyond its
oid and type (which is obviously OBJ_COMMIT). Just taking an oid gives
our callers more flexibility to avoid creating or parsing a struct, and
makes it clear that we are fscking just what is in the buffer, not any
pre-parsed bits from the struct.
Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Jeff King [Fri, 18 Oct 2019 05:01:26 +0000 (01:01 -0400)]
fsck: accept an oid instead of a "struct tag" for fsck_tag()
We don't actually look at the tag struct in fsck_tag() beyond its oid
and type (which is obviously OBJ_TAG). Just taking an oid gives our
callers more flexibility to avoid creating or parsing a struct, and
makes it clear that we are fscking just what is in the buffer, not any
pre-parsed bits from the struct.
Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Jeff King [Fri, 18 Oct 2019 05:00:59 +0000 (01:00 -0400)]
fsck: rename vague "oid" local variables
In fsck_commit() and fsck_tag(), we have local "oid" variables used for
parsing parent and tagged-object oids. Let's give these more specific
names in preparation for the functions taking an "oid" parameter for the
object we're checking.
Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Jeff King [Fri, 18 Oct 2019 05:00:04 +0000 (01:00 -0400)]
fsck: don't require an object struct for fsck_ident()
The only thing we do with the struct is pass its oid and type to
report(). We can just take those explicitly, which gives our callers
more flexibility.
Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Jeff King [Fri, 18 Oct 2019 04:59:54 +0000 (00:59 -0400)]
fsck: drop blob struct from fsck_finish()
Since fsck_blob() no longer requires us to have a "struct blob", we
don't need to create one. Which also means we don't need to worry about
handling the case that lookup_blob() returns NULL (we'll still catch
wrongly-identified blobs when we read the actual object contents and
type from disk).
Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Jeff King [Fri, 18 Oct 2019 04:59:29 +0000 (00:59 -0400)]
fsck: accept an oid instead of a "struct blob" for fsck_blob()
We don't actually need any information from the object struct except its
oid (and the type, of course, but that's implicitly OBJ_BLOB). This
gives our callers more flexibility to drop the object structs, too.
Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Jeff King [Fri, 18 Oct 2019 04:59:15 +0000 (00:59 -0400)]
fsck: don't require an object struct for report()
The report() function really only cares about the oid and type of the
object, not the full object struct. Let's convert it to take those two
items separately, which gives our callers more flexibility.
This makes some already-long lines even longer. I've mostly left them,
as our eventual goal is to shrink these down as we continue refactoring
(e.g., "&item->object" becomes "&item->object.oid, item->object.type",
but will eventually shrink down to "oid, type").
Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Jeff King [Fri, 18 Oct 2019 04:58:40 +0000 (00:58 -0400)]
fsck: only provide oid/type in fsck_error callback
None of the callbacks actually care about having a "struct object";
they're happy with just the oid and type information. So let's give
ourselves more flexibility to avoid having a "struct object" by just
passing the broken-down fields.
Note that the callback already takes a "type" field for the fsck message
type. We'll rename that to "msg_type" (and use "object_type" for the
object type) to make the distinction explicit.
Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Jeff King [Fri, 18 Oct 2019 04:58:07 +0000 (00:58 -0400)]
fsck: don't require object structs for display functions
Our printable_type() and describe_object() functions take whole object
structs, but they really only care about the oid and type. Let's take
those individually in order to give our callers more flexibility.
Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Jeff King [Fri, 18 Oct 2019 04:57:37 +0000 (00:57 -0400)]
fsck: use oids rather than objects for object_name API
We don't actually care about having object structs; we only need to look
up decorations by oid. Let's accept this more limited form, which will
give our callers more flexibility.
Note that the decoration API we rely on uses object structs itself (even
though it only looks at their oids). We can solve this by switching to
a kh_oid_map (we could also use the hashmap oidmap, but it's more
awkward for the simple case of just storing a void pointer).
Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Jeff King [Fri, 18 Oct 2019 04:56:13 +0000 (00:56 -0400)]
fsck: unify object-name code
Commit 90cf590f53 (fsck: optionally show more helpful info for broken
links, 2016-07-17) added a system for decorating objects with names. The
code is split across builtin/fsck.c (which gives the initial names) and
fsck.c (which adds to the names as it traverses the object graph). This
leads to some duplication, where both sites have near-identical
describe_object() functions (the difference being that the one in
builtin/fsck.c uses a circular array of buffers to allow multiple calls
in a single printf).
Let's provide a unified object_name API for fsck. That lets us drop the
duplication, as well as making the interface boundaries more clear
(which will let us refactor the implementation more in a future patch).
We'll leave describe_object() in builtin/fsck.c as a thin wrapper around
the new API, as it relies on a static global to make its many callers a
bit shorter.
We'll also convert the bare add_decoration() calls in builtin/fsck.c to
put_object_name(). This fixes two minor bugs:
1. We leak many small strings. add_decoration() has a last-one-wins
approach: it updates the decoration to the new string and returns
the old one. But we ignore the return value, leaking the old
string. This is quite common to trigger, since we look at reflogs:
the tip of any ref will be described both by looking at the actual
ref, as well as the latest reflog entry. So we'd always end up
leaking one of those strings.
2. The last-one-wins approach gives us lousy names. For instance, we
first look at all of the refs, and then all of the reflogs. So
rather than seeing "refs/heads/master", we're likely to overwrite
it with "HEAD@{12345678}". We're generally better off using the
first name we find.
And indeed, the test in t1450 expects this ugly HEAD@{} name. After
this patch, we've switched to using fsck_put_object_name()'s
first-one-wins semantics, and we output the more human-friendly
"refs/tags/julius" (and the test is updated accordingly).
Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Derrick Stolee [Thu, 24 Oct 2019 13:40:42 +0000 (13:40 +0000)]
commit-graph: fix writing first commit-graph during fetch
The previous commit includes a failing test for an issue around
fetch.writeCommitGraph and fetching in a repo with a submodule. Here, we
fix that bug and set the test to "test_expect_success".
The problem arises with this set of commands when the remote repo at
<url> has a submodule. Note that --recurse-submodules is not needed to
demonstrate the bug.
$ git clone <url> test
$ cd test
$ git -c fetch.writeCommitGraph=true fetch origin
Computing commit graph generation numbers: 100% (12/12), done.
BUG: commit-graph.c:886: missing parent <hash1> for commit <hash2>
Aborted (core dumped)
As an initial fix, I converted the code in builtin/fetch.c that calls
write_commit_graph_reachable() to instead launch a "git commit-graph
write --reachable --split" process. That code worked, but is not how we
want the feature to work long-term.
That test did demonstrate that the issue must be something to do with
internal state of the 'git fetch' process.
The write_commit_graph() method in commit-graph.c ensures the commits we
plan to write are "closed under reachability" using close_reachable().
This method walks from the input commits, and uses the UNINTERESTING
flag to mark which commits have already been visited. This allows the
walk to take O(N) time, where N is the number of commits, instead of
O(P) time, where P is the number of paths. (The number of paths can be
exponential in the number of commits.)
However, the UNINTERESTING flag is used in lots of places in the
codebase. This flag usually means some barrier to stop a commit walk,
such as in revision-walking to compare histories. It is not often
cleared after the walk completes because the starting points of those
walks do not have the UNINTERESTING flag, and clear_commit_marks() would
stop immediately.
This is happening during a 'git fetch' call with a remote. The fetch
negotiation is comparing the remote refs with the local refs and marking
some commits as UNINTERESTING.
I tested running clear_commit_marks_many() to clear the UNINTERESTING
flag inside close_reachable(), but the tips did not have the flag, so
that did nothing.
It turns out that the calculate_changed_submodule_paths() method is at
fault. Thanks, Peff, for pointing out this detail! More specifically,
for each submodule, the collect_changed_submodules() runs a revision
walk to essentially do file-history on the list of submodules. That
revision walk marks commits UNININTERESTING if they are simplified away
by not changing the submodule.
Instead, I finally arrived on the conclusion that I should use a flag
that is not used in any other part of the code. In commit-reach.c, a
number of flags were defined for commit walk algorithms. The REACHABLE
flag seemed like it made the most sense, and it seems it was not
actually used in the file. The REACHABLE flag was used in early versions
of commit-reach.c, but was removed by 4fbcca4 (commit-reach: make
can_all_from_reach... linear, 2018-07-20).
Add the REACHABLE flag to commit-graph.c and use it instead of
UNINTERESTING in close_reachable(). This fixes the bug in manual
testing.
Reported-by: Johannes Schindelin <johannes.schindelin@gmx.de> Helped-by: Jeff King <peff@peff.net> Helped-by: Szeder Gábor <szeder.dev@gmail.com> Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
While dogfooding, Johannes found a bug in the fetch.writeCommitGraph
config behavior. His example initially happened during a clone with
--recurse-submodules, we found that this happens with the first fetch
after cloning a repository that contains a submodule:
$ git clone <url> test
$ cd test
$ git -c fetch.writeCommitGraph=true fetch origin
Computing commit graph generation numbers: 100% (12/12), done.
BUG: commit-graph.c:886: missing parent <hash1> for commit <hash2>
Aborted (core dumped)
In the repo I had cloned, there were really 60 commits to scan, but
only 12 were in the list to write when calling
compute_generation_numbers(). A commit in the list expects to see a
parent, but that parent is not in the list.
A follow-up will fix the bug, but first we create a test that
demonstrates the problem. This test must be careful about an existing
commit-graph file, since GIT_TEST_COMMIT_GRAPH=1 will cause the repo we
are cloning to already have one. This then prevents the incremtnal
commit-graph write during the first 'git fetch'.
Helped-by: Jeff King <peff@peff.net> Helped-by: Johannes Schindelin <johannes.schindelin@gmx.de> Helped-by: Szeder Gábor <szeder.dev@gmail.com> Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Heba Waly [Thu, 24 Oct 2019 11:29:11 +0000 (11:29 +0000)]
documentation: remove empty doc files
Remove empty and redundant documentation files from the
Documentation/technical/ directory.
The empty doc files included only TODO messages with no documentation for
years. Instead an approach is being taken to keep all doc beside the code
in the relevant header files.
Having empty doc files is confusing and disappointing to anybody looking
for information, besides having the documentation in header files makes it
easier for developers to find the information they are looking for.
Some of the content which could have gone here already exists elsewhere:
- api-object-access.txt -> sha1-file.c and object.h have some details.
- api-quote.txt -> quote.h has some details.
- api-xdiff-interface.txt -> xdiff-interface.h has some details.
- api-grep.txt -> grep.h does not have enough documentation at the moment.
Signed-off-by: Heba Waly <heba.waly@gmail.com> Reviewed-by: Emily Shaffer <emilyshaffer@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Junio C Hamano [Thu, 24 Oct 2019 04:34:26 +0000 (13:34 +0900)]
Merge branch 'sg/dir-trie-fixes' into jch
Code clean-up and a bugfix in the logic used to tell worktree local
and repository global refs apart.
* sg/dir-trie-fixes:
path.c: don't call the match function without value in trie_find()
path.c: clarify two field names in 'struct common_dir'
path.c: mark 'logs/HEAD' in 'common_list' as file
path.c: clarify trie_find()'s in-code comment
Documentation: mention more worktree-specific exceptions
Junio C Hamano [Thu, 24 Oct 2019 04:34:25 +0000 (13:34 +0900)]
Merge branch 'js/advise-rebase-skip' into jch
The logic used in "git commit" to give hints and errors depending
on what operation was in progress learned to distinguish rebase and
cherry-pick better.
* js/advise-rebase-skip:
commit: give correct advice for empty commit during a rebase
sequencer: export the function to get the path of `.git/rebase-merge/`
cherry-pick: add test for `--skip` advice in `git commit`
Junio C Hamano [Thu, 24 Oct 2019 04:34:25 +0000 (13:34 +0900)]
Merge branch 'wb/midx-progress' into jch
The code to generate multi-pack index learned to show (or not to
show) progress indicators.
* wb/midx-progress:
multi-pack-index: add [--[no-]progress] option.
midx: honor the MIDX_PROGRESS flag in midx_repack
midx: honor the MIDX_PROGRESS flag in verify_midx_file
midx: add progress to expire_midx_packs
midx: add progress to write_midx_file
midx: add MIDX_PROGRESS flag
Junio C Hamano [Thu, 24 Oct 2019 04:34:25 +0000 (13:34 +0900)]
Merge branch 'jc/log-graph-simplify' into jch
The implementation of "git log --graph" got refactored and then its
output got simplified.
* jc/log-graph-simplify:
graph: fix coloring of octopus dashes
graph: flatten edges that fuse with their right neighbor
graph: smooth appearance of collapsing edges on commit lines
graph: rename `new_mapping` to `old_mapping`
graph: commit and post-merge lines for left-skewed merges
graph: tidy up display of left-skewed merges
graph: example of graph output that can be simplified
graph: extract logic for moving to GRAPH_PRE_COMMIT state
graph: remove `mapping_idx` and `graph_update_width()`
graph: reduce duplication in `graph_insert_into_new_columns()`
graph: reuse `find_new_column_by_commit()`
graph: handle line padding in `graph_next_line()`
graph: automatically track display width of graph lines
Junio C Hamano [Thu, 24 Oct 2019 04:34:25 +0000 (13:34 +0900)]
Merge branch 'en/merge-recursive-directory-rename-fixes' into jch
When all files from some subdirectory were renamed to the root
directory, the directory rename heuristics would fail to detect that
as a rename/merge of the subdirectory to the root directory, which has
been corrected.
* en/merge-recursive-directory-rename-fixes:
t604[236]: do not run setup in separate tests
merge-recursive: fix merging a subdirectory into the root directory
merge-recursive: clean up get_renamed_dir_portion()
Junio C Hamano [Thu, 24 Oct 2019 04:34:24 +0000 (13:34 +0900)]
Merge branch 'dd/notes-copy-default-dst-to-head' into jch
"git notes copy $original" ought to copy the notes attached to the
original object to HEAD, but a mistaken tightening to command line
parameter validation made earlier disabled that feature by mistake.
* dd/notes-copy-default-dst-to-head:
notes: fix minimum number of parameters to "copy" subcommand
t3301: test diagnose messages for too few/many paramters
Junio C Hamano [Thu, 24 Oct 2019 04:34:23 +0000 (13:34 +0900)]
Merge branch 'pw/post-commit-from-sequencer' into jch
"rebase -i" ceased to run post-commit hook by mistake in an earlier
update, which has been corrected.
* pw/post-commit-from-sequencer:
sequencer: run post-commit hook
move run_commit_hook() to libgit and use it there
sequencer.h fix placement of #endif
t3404: remove uneeded calls to set_fake_editor
t3404: set $EDITOR in subshell
t3404: remove unnecessary subshell
Junio C Hamano [Thu, 24 Oct 2019 04:34:23 +0000 (13:34 +0900)]
Merge branch 'dl/format-patch-cover-from-desc' into jch
The branch description ("git branch --edit-description") has been
used to fill the body of the cover letters by the format-patch
command; this has been enhanced so that the subject can also be
filled.
* dl/format-patch-cover-from-desc:
format-patch: teach --cover-from-description option
format-patch: use enum variables
format-patch: replace erroneous and condition
Junio C Hamano [Thu, 24 Oct 2019 04:34:22 +0000 (13:34 +0900)]
Merge branch 'ag/sequencer-todo-updates' into jch
Reduce unnecessary reading of state variables back from the disk
during sequener operation.
* ag/sequencer-todo-updates:
sequencer: directly call pick_commits() from complete_action()
rebase: fill `squash_onto' in get_replay_opts()
sequencer: move the code writing total_nr on the disk to a new function
sequencer: update `done_nr' when skipping commands in a todo list
sequencer: update `total_nr' when adding an item to a todo list
Ever since worktrees were introduced, the `git_path()` function _really_
needed to be called e.g. to get at the path to `logs/HEAD` (`HEAD` is
specific to the worktree, and therefore so is its reflog). However, the
wrong path is returned for `logs/HEAD.lock`.
This does not matter as long as the Git executable is doing the asking,
as the path for that `logs/HEAD.lock` file is constructed from
`git_path("logs/HEAD")` by appending the `.lock` suffix.
However, Git GUI just learned to use `--git-path` instead of appending
relative paths to what `git rev-parse --git-dir` returns (and as a
consequence not only using the correct hooks directory, but also using
the correct paths in worktrees other than the main one). While it does
not seem as if Git GUI in particular is asking for `logs/HEAD.lock`,
let's be safe rather than sorry.
Side note: Git GUI _does_ ask for `index.lock`, but that is already
resolved correctly, due to `update_common_dir()` preferring to leave
unknown paths in the (worktree-specific) git directory.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>
This will cause a "unable to load config blob object", because the
fetch_config_from_gitmodules() invocation in cmd_fetch() will attempt to
load ".gitmodules" (which Git knows to exist because the client has the
tree of HEAD) while fetch_if_missing is set to 0.
fetch_if_missing is set to 0 too early - ".gitmodules" here should be
lazily fetched. Git must set fetch_if_missing to 0 before the fetch
because as part of the fetch, packfile negotiation happens (and we do
not want to fetch any missing objects when checking existence of
objects), but we do not need to set it so early. Move the setting of
fetch_if_missing to the earliest possible point in cmd_fetch(), right
before any fetching happens.
Signed-off-by: Jonathan Tan <jonathantanmy@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Derrick Stolee [Wed, 23 Oct 2019 20:38:57 +0000 (20:38 +0000)]
repo-settings: read an int for index.version
Several config options were combined into a repo_settings struct in
ds/feature-macros, including a move of the "index.version" config
setting in 7211b9e (repo-settings: consolidate some config settings,
2019-08-13).
Unfortunately, that file looked like a lot of boilerplate and what is
clearly a factor of copy-paste overload, the config setting is parsed
with repo_config_ge_bool() instead of repo_config_get_int(). This means
that a setting "index.version=4" would not register correctly and would
revert to the default version of 3.
I caught this while incorporating v2.24.0-rc0 into the VFS for Git
codebase, where we really care that the index is in version 4.
This was not caught by the codebase because the version checks placed
in t1600-index.sh did not test the "basic" scenario enough. Here, we
modify the test to include these normal settings to not be overridden by
features.manyFiles or GIT_INDEX_VERSION. While the "default" version is
3, this is demoted to version 2 in do_write_index() when not necessary.
Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Denton Liu [Wed, 23 Oct 2019 23:32:38 +0000 (16:32 -0700)]
apply: respect merge.conflictStyle in --3way
Before, when doing a 3-way merge, the merge.conflictStyle option was not
respected and the "merge" style was always used, even if "diff3" was
specified.
Call git_xmerge_config() at the end of git_apply_config() so that the
merge.conflictStyle config is read.
Signed-off-by: Denton Liu <liu.denton@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Denton Liu [Wed, 23 Oct 2019 23:32:36 +0000 (16:32 -0700)]
t4108: demonstrate bug in apply
Currently, apply does not respect the merge.conflictStyle setting.
Demonstrate this by making the 'apply with --3way' test case generic and
extending it to show that the configuration of
merge.conflictStyle = diff3 causes a breakage.
Change print_sanitized_conflicted_diff() to also sanitize `|||||||`
conflict markers.
Signed-off-by: Denton Liu <liu.denton@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Denton Liu [Wed, 23 Oct 2019 23:32:33 +0000 (16:32 -0700)]
t4108: use `test_config` instead of `git config`
Since `git config` leaves the configurations set even after the test
case completes, use `test_config` instead so that the configurations are
reset once the test case finishes.
Signed-off-by: Denton Liu <liu.denton@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Denton Liu [Wed, 23 Oct 2019 23:32:31 +0000 (16:32 -0700)]
t4108: remove git command upstream of pipe
Before, the output of `git diff HEAD` would always be piped to
sanitize_conflicted_diff(). However, since the Git command was upstream
of the pipe, in case the Git command fails, the return code would be
lost. Rewrite into separate statements so that the return code is no
longer lost.
Since only the command `git diff HEAD` was being piped to
sanitize_conflicted_diff(), move the command into the function and rename
it to print_sanitized_conflicted_diff().
Signed-off-by: Denton Liu <liu.denton@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Denton Liu [Wed, 23 Oct 2019 23:32:28 +0000 (16:32 -0700)]
t4108: replace create_file with test_write_lines
Since the locally defined create_file() duplicates the functionality of
the test_write_lines() helper function, remove create_file() and replace
all instances with test_write_lines(). While we're at it, move
redirection operators to the end of the command which is the more
conventional place to put it.
Signed-off-by: Denton Liu <liu.denton@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
SZEDER Gábor [Thu, 24 Oct 2019 00:20:40 +0000 (02:20 +0200)]
ci: fix GCC install in the Travis CI GCC OSX job
A few days ago Travis CI updated their existing OSX images, including
the Homebrew database in the xcode10.1 OSX image that we use. Since
then installing dependencies in the 'osx-gcc' job fails when it tries
to link gcc@8:
+ brew link gcc@8
Error: No such keg: /usr/local/Cellar/gcc@8
GCC8 is still installed but not linked to '/usr/local' in the updated
image, as it was before this update, but now we have to link it by
running 'brew link gcc'. So let's do that then, and fall back to
linking gcc@8 if it doesn't, just to be sure.
Our builds on Azure Pipelines are unaffected by this issue. The OSX
image over there doesn't contain the gcc@8 package, so we have to
'brew install' it, which already takes care of linking it to
'/usr/local'. After that the 'brew link gcc' command added by this
patch fails, but the ||-chained fallback 'brew link gcc@8' command
succeeds with an "already linked" warning.
Signed-off-by: SZEDER Gábor <szeder.dev@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Heba Waly [Wed, 23 Oct 2019 05:30:52 +0000 (05:30 +0000)]
config: move documentation to config.h
Move the documentation from Documentation/technical/api-config.txt into
config.h as it's easier for the developers to find the usage information
beside the code instead of looking for it in another doc file, also
documentation/technical/api-config.txt is removed because the information
it has is now redundant and it'll be hard to keep it up to date and
syncronized with the documentation in config.h
Signed-off-by: Heba Waly <heba.waly@gmail.com> Reviewed-by: Emily Shaffer <emilyshaffer@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Junio C Hamano [Wed, 23 Oct 2019 05:43:11 +0000 (14:43 +0900)]
Merge branch 'cb/pcre2-chartables-leakfix'
Leakfix.
* cb/pcre2-chartables-leakfix:
grep: avoid leak of chartables in PCRE2
grep: make PCRE2 aware of custom allocator
grep: make PCRE1 aware of custom allocator
SZEDER Gábor [Mon, 21 Oct 2019 16:00:43 +0000 (18:00 +0200)]
path.c: don't call the match function without value in trie_find()
'logs/refs' is not a working tree-specific path, but since commit b9317d55a3 (Make sure refs/rewritten/ is per-worktree, 2019-03-07)
'git rev-parse --git-path' has been returning a bogus path if a
trailing '/' is present:
We use a trie data structure to efficiently decide whether a path
belongs to the common dir or is working tree-specific. As it happens b9317d55a3 triggered a bug that is as old as the trie implementation
itself, added in 4e09cf2acf (path: optimize common dir checking,
2015-08-31).
- According to the comment describing trie_find(), it should only
call the given match function 'fn' for a "/-or-\0-terminated
prefix of the key for which the trie contains a value". This is
not true: there are three places where trie_find() calls the match
function, but one of them is missing the check for value's
existence.
- b9317d55a3 added two new keys to the trie: 'logs/refs/rewritten'
and 'logs/refs/worktree', next to the already existing
'logs/refs/bisect'. This resulted in a trie node with the path
'logs/refs/', which didn't exist before, and which doesn't have a
value attached. A query for 'logs/refs/' finds this node and then
hits that one callsite of the match function which doesn't check
for the value's existence, and thus invokes the match function
with NULL as value.
- When the match function check_common() is invoked with a NULL
value, it returns 0, which indicates that the queried path doesn't
belong to the common directory, ultimately resulting the bogus
path shown above.
Add the missing condition to trie_find() so it will never invoke the
match function with a non-existing value. check_common() will then no
longer have to check that it got a non-NULL value, so remove that
condition.
I believe that there are no other paths that could cause similar bogus
output. AFAICT the only other key resulting in the match function
being called with a NULL value is 'co' (because of the keys 'common'
and 'config'). However, as they are not in a directory that belongs
to the common directory the resulting working tree-specific path is
expected.
Signed-off-by: SZEDER Gábor <szeder.dev@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
SZEDER Gábor [Mon, 21 Oct 2019 16:00:42 +0000 (18:00 +0200)]
path.c: clarify two field names in 'struct common_dir'
An array of 'struct common_dir' instances is used to specify whether
various paths in $GIT_DIR are specific to a worktree, or are common,
i.e. belong to main worktree. The names of two fields in this
struct are somewhat confusing or ambigious:
- The path is recorded in the struct's 'dirname' field, even though
several entries are regular files e.g. 'gc.pid', 'packed-refs',
etc.
Rename this field to 'path' to reduce confusion.
- The field 'exclude' tells whether the path is excluded... from
where? Excluded from the common dir or from the worktree? It
means the former, but it's ambigious.
Rename this field to 'is_common' to make it unambigious what it
means. This, however, means the exact opposite of what 'exclude'
meant, so we have to negate the field's value in all entries as
well.
The diff is best viewed with '--color-words'.
Signed-off-by: SZEDER Gábor <szeder.dev@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
SZEDER Gábor [Mon, 21 Oct 2019 16:00:41 +0000 (18:00 +0200)]
path.c: mark 'logs/HEAD' in 'common_list' as file
'logs/HEAD', i.e. HEAD's reflog, is a file, but its entry in
'common_list' has the 'is_dir' bit set.
Unset that bit to make it consistent with what 'logs/HEAD' is supposed
to be.
This doesn't make a difference in behavior: check_common() is the only
function that looks at the 'is_dir' bit, and that function either
returns 0, or '!exclude', which for 'logs/HEAD' results in 0 as well.
Signed-off-by: SZEDER Gábor <szeder.dev@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
SZEDER Gábor [Mon, 21 Oct 2019 16:00:40 +0000 (18:00 +0200)]
path.c: clarify trie_find()'s in-code comment
A fairly long comment describes trie_find()'s behavior and shows
examples, but it's slightly incomplete/inaccurate. Update this
comment to specify how trie_find() handles a negative return value
from the given match function.
Furthermore, update the list of examples to include not only two but
three levels of path components. This makes the examples slightly
more complicated, but it can illustrate the behavior in more corner
cases.
Finally, basically everything refers to the data stored for a key as
"value", with two confusing exceptions:
- The type definition of the match function calls its corresponding
parameter 'data'.
Rename that parameter to 'value'. (check_common(), the only
function of this type already calls it 'value').
- The table of examples above trie_find() has a "val from node"
column, which has nothing to do with the value stored in the trie:
it's a "prefix of the key for which the trie contains a value"
that led to that node.
Rename that column header to "prefix to node".
Note that neither the original nor the updated description and
examples correspond 100% to the current implementation, because the
implementation is a bit buggy, but the comment describes the desired
behavior. The bug will be fixed in the last patch of this series.
Signed-off-by: SZEDER Gábor <szeder.dev@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
SZEDER Gábor [Mon, 21 Oct 2019 16:00:39 +0000 (18:00 +0200)]
Documentation: mention more worktree-specific exceptions
If a directory in $GIT_DIR is overridden when $GIT_COMMON_DIR is set,
then usually all paths within that directory are overridden as well.
There are a couple of exceptions, though, and two of them, namely
'refs/rewritten' and 'logs/HEAD' are not mentioned in
'gitrepository-layout'. Document them as well.
Signed-off-by: SZEDER Gábor <szeder.dev@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Derrick Stolee [Mon, 21 Oct 2019 13:56:26 +0000 (13:56 +0000)]
sparse-checkout: cone mode should not interact with .gitignore
During the development of the sparse-checkout "cone mode" feature,
an incorrect placement of the initializer for "use_cone_patterns = 1"
caused warnings to show up when a .gitignore file was present with
non-cone-mode patterns. This was fixed in the original commit
introducing the cone mode, but now we should add a test to avoid
hitting this problem again in the future.
Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Derrick Stolee [Mon, 21 Oct 2019 13:56:25 +0000 (13:56 +0000)]
sparse-checkout: write using lockfile
If two 'git sparse-checkout set' subcommands are launched at the
same time, the behavior can be unexpected as they compete to write
the sparse-checkout file and update the working directory.
Take a lockfile around the writes to the sparse-checkout file. In
addition, acquire this lock around the working directory update
to avoid two commands updating the working directory in different
ways.
Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Derrick Stolee [Mon, 21 Oct 2019 13:56:24 +0000 (13:56 +0000)]
sparse-checkout: update working directory in-process
The sparse-checkout builtin used 'git read-tree -mu HEAD' to update the
skip-worktree bits in the index and to update the working directory.
This extra process is overly complex, and prone to failure. It also
requires that we write our changes to the sparse-checkout file before
trying to update the index.
Remove this extra process call by creating a direct call to
unpack_trees() in the same way 'git read-tree -mu HEAD' does. In
addition, provide an in-memory list of patterns so we can avoid
reading from the sparse-checkout file. This allows us to test a
proposed change to the file before writing to it.
An earlier version of this patch included a bug when the 'set' command
failed due to the "Sparse checkout leaves no entry on working directory"
error. It would not rollback the index.lock file, so the replay of the
old sparse-checkout specification would fail. A test in t1091 now
covers that scenario.
Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Derrick Stolee [Mon, 21 Oct 2019 13:56:23 +0000 (13:56 +0000)]
sparse-checkout: sanitize for nested folders
If a user provides folders A/ and A/B/ for inclusion in a cone-mode
sparse-checkout file, the parsing logic will notice that A/ appears
both as a "parent" type pattern and as a "recursive" type pattern.
This is unexpected and hence will complain via a warning and revert
to the old logic for checking sparse-checkout patterns.
Prevent this from happening accidentally by sanitizing the folders
for this type of inclusion in the 'git sparse-checkout' builtin.
This happens in two ways:
1. Do not include any parent patterns that also appear as recursive
patterns.
2. Do not include any recursive patterns deeper than other recursive
patterns.
In order to minimize duplicate code for scanning parents, create
hashmap_contains_parent() method. It takes a strbuf buffer to
avoid reallocating a buffer when calling in a tight loop.
Helped-by: Eric Wong <e@80x24.org> Helped-by: Johannes Schindelin <Johannes.Schindelin@gmx.de> Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Derrick Stolee [Mon, 21 Oct 2019 13:56:22 +0000 (13:56 +0000)]
read-tree: show progress by default
The read-tree builtin has a --verbose option that signals to show
progress and other data while updating the index. Update this to
be on by default when stderr is a terminal window.
This will help tools like 'git sparse-checkout' to automatically
benefit from progress indicators when a user runs these commands.
Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Derrick Stolee [Mon, 21 Oct 2019 13:56:21 +0000 (13:56 +0000)]
unpack-trees: add progress to clear_ce_flags()
When a large repository has many sparse-checkout patterns, the
process for updating the skip-worktree bits can take long enough
that a user gets confused why nothing is happening. Update the
clear_ce_flags() method to write progress.
Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Derrick Stolee [Mon, 21 Oct 2019 13:56:20 +0000 (13:56 +0000)]
unpack-trees: hash less in cone mode
The sparse-checkout feature in "cone mode" can use the fact that
the recursive patterns are "connected" to the root via parent
patterns to decide if a directory is entirely contained in the
sparse-checkout or entirely removed.
In these cases, we can skip hashing the paths within those
directories and simply set the skipworktree bit to the correct
value.
Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Derrick Stolee [Mon, 21 Oct 2019 13:56:19 +0000 (13:56 +0000)]
sparse-checkout: init and set in cone mode
To make the cone pattern set easy to use, update the behavior of
'git sparse-checkout [init|set]'.
Add '--cone' flag to 'git sparse-checkout init' to set the config
option 'core.sparseCheckoutCone=true'.
When running 'git sparse-checkout set' in cone mode, a user only
needs to supply a list of recursive folder matches. Git will
automatically add the necessary parent matches for the leading
directories.
When testing 'git sparse-checkout set' in cone mode, check the
error stream to ensure we do not see any errors. Specifically,
we want to avoid the warning that the patterns do not match
the cone-mode patterns.
Helped-by: Eric Wong <e@80x24.org> Helped-by: Johannes Schindelin <Johannes.Schindelin@gmx.de> Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Derrick Stolee [Mon, 21 Oct 2019 13:56:18 +0000 (13:56 +0000)]
sparse-checkout: use hashmaps for cone patterns
The parent and recursive patterns allowed by the "cone mode"
option in sparse-checkout are restrictive enough that we
can avoid using the regex parsing. Everything is based on
prefix matches, so we can use hashsets to store the prefixes
from the sparse-checkout file. When checking a path, we can
strip path entries from the path and check the hashset for
an exact match.
As a test, I created a cone-mode sparse-checkout file for the
Linux repository that actually includes every file. This was
constructed by taking every folder in the Linux repo and creating
the pattern pairs here:
/$folder/
!/$folder/*/
This resulted in a sparse-checkout file sith 8,296 patterns.
Running 'git read-tree -mu HEAD' on this file had the following
performance:
core.sparseCheckout=false: 0.21 s (0.00 s)
core.sparseCheckout=true: 3.75 s (3.50 s)
core.sparseCheckoutCone=true: 0.23 s (0.01 s)
The times in parentheses above correspond to the time spent
in the first clear_ce_flags() call, according to the trace2
performance traces.
While this example is contrived, it demonstrates how these
patterns can slow the sparse-checkout feature.
Helped-by: Eric Wong <e@80x24.org> Helped-by: Johannes Schindelin <Johannes.Schindelin@gmx.de> Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Derrick Stolee [Mon, 21 Oct 2019 13:56:17 +0000 (13:56 +0000)]
sparse-checkout: add 'cone' mode
The sparse-checkout feature can have quadratic performance as
the number of patterns and number of entries in the index grow.
If there are 1,000 patterns and 1,000,000 entries, this time can
be very significant.
Create a new Boolean config option, core.sparseCheckoutCone, to
indicate that we expect the sparse-checkout file to contain a
more limited set of patterns. This is a separate config setting
from core.sparseCheckout to avoid breaking older clients by
introducing a tri-state option.
The config option does nothing right now, but will be expanded
upon in a later commit.
Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Jeff Hostetler [Mon, 21 Oct 2019 13:56:16 +0000 (13:56 +0000)]
trace2: add region in clear_ce_flags
When Git updates the working directory with the sparse-checkout
feature enabled, the unpack_trees() method calls clear_ce_flags()
to update the skip-wortree bits on the cache entries. This
check can be expensive, depending on the patterns used.
Add trace2 regions around the method, including some flag
information, so we can get granular performance data during
experiments. This data will be used to measure improvements
to the pattern-matching algorithms for sparse-checkout.
Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com> Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Derrick Stolee [Mon, 21 Oct 2019 13:56:15 +0000 (13:56 +0000)]
sparse-checkout: create 'disable' subcommand
The instructions for disabling a sparse-checkout to a full
working directory are complicated and non-intuitive. Add a
subcommand, 'git sparse-checkout disable', to perform those
steps for the user.
Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Derrick Stolee [Mon, 21 Oct 2019 13:56:14 +0000 (13:56 +0000)]
sparse-checkout: add '--stdin' option to set subcommand
The 'git sparse-checkout set' subcommand takes a list of patterns
and places them in the sparse-checkout file. Then, it updates the
working directory to match those patterns. For a large list of
patterns, the command-line call can get very cumbersome.
Add a '--stdin' option to instead read patterns over standard in.
Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Derrick Stolee [Mon, 21 Oct 2019 13:56:13 +0000 (13:56 +0000)]
sparse-checkout: 'set' subcommand
The 'git sparse-checkout set' subcommand takes a list of patterns
as arguments and writes them to the sparse-checkout file. Then, it
updates the working directory using 'git read-tree -mu HEAD'.
The 'set' subcommand will replace the entire contents of the
sparse-checkout file. The write_patterns_and_update() method is
extracted from cmd_sparse_checkout() to make it easier to implement
'add' and/or 'remove' subcommands in the future.
If the core.sparseCheckout config setting is disabled, then enable
the config setting in the worktree config. If we set the config
this way and the sparse-checkout fails, then re-disable the config
setting.
Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>