The first line shows that we are reusing a large chunk of
objects, and then we further count any objects not included
in the reused portion with an actual traversal.
These are all implementation details that the user does not
need to care about. Instead, we can show the reused objects
in the normal "counting..." progress meter (which will
simply go much faster than normal), and then continue to add
to it as we traverse.
Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Jeff King [Sat, 15 Mar 2014 02:26:21 +0000 (22:26 -0400)]
pack-objects: show progress for reused packfiles
When the "--all-progress" option is in effect, pack-objects
shows a progress report for the "writing" phase. If the
repository has bitmaps and we are reusing a packfile, the
user sees no progress update until the whole packfile is
sent. Since this is typically the bulk of what is being
written, it can look like git hangs during this phase, even
though the transfer is proceeding.
This generally only happens with "git push" from a
repository with bitmaps. We do not use "--all-progress" for
fetch (since the result is going to index-pack on the
client, which takes care of progress reporting). And for
regular repacks to disk, we do not reuse packfiles.
We already have the progress meter setup during
write_reused_pack; we just need to call display_progress
whiel we are writing out the pack. The progress meter is
attached to our output descriptor, so it automatically
handles the throughput measurements.
However, we need to update the object count as we go, since
that is what feeds the percentage we show. We aren't
actually parsing the packfile as we send it, so we have no
idea how many objects we have sent; we only know that at the
end of N bytes, we will have sent M objects. So we cheat a
little and assume each object is M/N bytes (i.e., the mean
of the objects we are sending). While this isn't strictly
true, it actually produces a more pleasing progress meter
for the user, as it moves smoothly and predictably (and
nobody really cares about the object count; they care about
the percentage, and the object count is a proxy for that).
One alternative would be to actually show two progress
meters: one for the reused pack, and one for the rest of the
objects. That would more closely reflect the data we have
(the first would be measured in bytes, and the second
measured in objects). But it would also be more complex and
annoying to the user; rather than seeing one progress meter
counting up to 100%, they would finish one meter, then start
another one at zero.
Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Junio C Hamano [Fri, 14 Mar 2014 21:27:21 +0000 (14:27 -0700)]
Merge branch 'sr/add--interactive-term-readkey'
* sr/add--interactive-term-readkey:
git-add--interactive: warn if module for interactive.singlekey is missing
git-config: document interactive.singlekey requires Term::ReadKey
Junio C Hamano [Fri, 14 Mar 2014 21:27:06 +0000 (14:27 -0700)]
Merge branch 'mh/replace-refs-variable-rename'
* mh/replace-refs-variable-rename:
Document some functions defined in object.c
Add docstrings for lookup_replace_object() and do_lookup_replace_object()
rename read_replace_refs to check_replace_refs
Junio C Hamano [Fri, 14 Mar 2014 21:26:52 +0000 (14:26 -0700)]
Merge branch 'da/difftool-git-files'
"git difftool" misbehaved when the repository is bound to the
working tree with the ".git file" mechanism, where a textual
file ".git" tells us where it is.
* da/difftool-git-files:
t7800: add a difftool test for .git-files
difftool: support repositories with .git-files
Junio C Hamano [Fri, 14 Mar 2014 21:26:50 +0000 (14:26 -0700)]
Merge branch 'tg/index-v4-format'
* tg/index-v4-format:
read-cache: add index.version config variable
test-lib: allow setting the index format version
introduce GIT_INDEX_VERSION environment variable
Junio C Hamano [Fri, 14 Mar 2014 21:26:29 +0000 (14:26 -0700)]
Merge branch 'mh/object-code-cleanup'
* mh/object-code-cleanup:
sha1_file.c: document a bunch of functions defined in the file
sha1_file_name(): declare to return a const string
find_pack_entry(): document last_found_pack
replace_object: use struct members instead of an array
"git push" did not pay attention to branch.*.pushremote if it is
defined earlier than remote.pushdefault; the order of these two
variables in the configuration file should not matter, but it did by
mistake.
* jk/remote-pushremote-config-reading:
remote: handle pushremote config in any order
Junio C Hamano [Fri, 14 Mar 2014 21:25:44 +0000 (14:25 -0700)]
Merge branch 'jk/commit-dates-parsing-fix'
Tighten codepaths that parse timestamps in commit objects.
* jk/commit-dates-parsing-fix:
show_ident_date: fix tz range check
log: do not segfault on gmtime errors
log: handle integer overflow in timestamps
date: check date overflow against time_t
fsck: report integer overflow in author timestamps
t4212: test bogus timestamps with git-log
"git diff --external-diff" incorrectly fed the submodule directory
in the working tree to the external diff driver when it knew it is
the same as one of the versions being compared.
* tr/diff-submodule-no-reuse-worktree:
diff: do not reuse_worktree_file for submodules
Junio C Hamano [Fri, 14 Mar 2014 21:25:02 +0000 (14:25 -0700)]
Merge branch 'nd/reset-setup-worktree'
"git reset" needs to refresh the index when working in a working
tree (it can also be used to match the index to the HEAD in an
otherwise bare repository), but it failed to set up the working
tree properly, causing GIT_WORK_TREE to be ignored.
* nd/reset-setup-worktree:
reset: optionally setup worktree and refresh index on --mixed
Junio C Hamano [Fri, 14 Mar 2014 21:24:40 +0000 (14:24 -0700)]
Merge branch 'ks/config-file-stdin'
"git config" learned to read from the standard input when "-" is
given as the value to its "--file" parameter (attempting an
operation to update the configuration in the standard input of
course is rejected).
* ks/config-file-stdin:
config: teach "git config --file -" to read from the standard input
config: change git_config_with_options() interface
builtin/config.c: rename check_blob_write() -> check_write()
config: disallow relative include paths from blobs
Junio C Hamano [Fri, 14 Mar 2014 21:24:37 +0000 (14:24 -0700)]
Merge branch 'jk/janitorial-fixes'
* jk/janitorial-fixes:
open_istream(): do not dereference NULL in the error case
builtin/mv: don't use memory after free
utf8: use correct type for values in interval table
utf8: fix iconv error detection
notes-utils: handle boolean notes.rewritemode correctly
Junio C Hamano [Fri, 14 Mar 2014 21:24:18 +0000 (14:24 -0700)]
Merge branch 'jk/http-no-curl-easy'
Uses of curl's "multi" interface and "easy" interface do not mix
well when we attempt to reuse outgoing connections. Teach the RPC
over http code, used in the smart HTTP transport, not to use the
"easy" interface.
* jk/http-no-curl-easy:
http: never use curl_easy_perform
Junio C Hamano [Fri, 14 Mar 2014 21:23:37 +0000 (14:23 -0700)]
Merge branch 'nd/gitignore-trailing-whitespace'
Trailing whitespaces in .gitignore files, unless they are quoted for
fnmatch(3), e.g. "path\ ", are warned and ignored.
Strictly speaking, this is a backward incompatible change, but very
unlikely to bite any sane user and adjusting should be obvious and
easy.
* nd/gitignore-trailing-whitespace:
t0008: skip trailing space test on Windows
dir: ignore trailing spaces in exclude patterns
dir: warn about trailing spaces in exclude patterns
Junio C Hamano [Fri, 14 Mar 2014 21:05:59 +0000 (14:05 -0700)]
Merge branch 'jc/check-attr-honor-working-tree'
"git check-attr" when (trying to) work on a repository with a
working tree did not work well when the working tree was specified
via --work-tree (and obviously with --git-dir).
The command also works in a bare repository but it reads from the
(possibly stale, irrelevant and/or nonexistent) index, which may
need to be fixed to read from HEAD, but that is a completely
separate issue. As a related tangent to this separate issue, we
may want to also fix "check-ignore", which refuses to work in a
bare repository, to also operate in a bare one.
* jc/check-attr-honor-working-tree:
check-attr: move to the top of working tree when in non-bare repository
t0003: do not chdir the whole test process
Johannes Sixt [Tue, 11 Mar 2014 07:46:40 +0000 (08:46 +0100)]
t0008: skip trailing space test on Windows
The Windows API does not preserve file names with trailing spaces (and
dots), but rather strips them. Our tools (MSYS bash, git) base the POSIX
emulation on the Windows API. As a consequence, it is impossible for bash
on Windows to allocate a file whose name has trailing spaces, and for git
to stat such a file. Both operate on a file whose name has the spaces
stripped. Skip the test that needs such a file name.
Note that we do not use (another incarnation of) prerequisite FUNNYNAMES.
The reason is that FUNNYNAMES is intended to represent a property of the
file system. But the inability to have trailing spaces in a file name is
a property of the Windows API. The file system (NTFS) does not have this
limitation.
Signed-off-by: Johannes Sixt <j6t@kdbg.org> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Jeff King [Fri, 7 Mar 2014 17:15:01 +0000 (12:15 -0500)]
show_ident_date: fix tz range check
Commit 1dca155fe3fa (log: handle integer overflow in
timestamps, 2014-02-24) tried to catch integer overflow
coming from strtol() on the timezone field by comparing against
LONG_MIN/LONG_MAX. However, the intermediate "tz" variable
is an "int", which means it can never be LONG_MAX on LP64
systems; we would truncate the output from strtol before the
comparison.
Clang's -Wtautological-constant-out-of-range-compare notices
this and rightly complains.
Let's instead store the result of strtol in a long, and then
compare it against INT_MIN/INT_MAX. This will catch overflow
from strtol, and also overflow when we pass the result as an
int to show_date.
Reported-by: Eric Sunshine <sunshine@sunshineco.com> Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Junio C Hamano [Wed, 5 Mar 2014 23:06:26 +0000 (15:06 -0800)]
Merge branch 'ks/combine-diff'
Teach combine-diff to honour the path-output-order imposed by
diffcore-order, and optimize how matching paths are found in
the N-way diffs made with parents.
* ks/combine-diff:
tests: add checking that combine-diff emits only correct paths
combine-diff: simplify intersect_paths() further
combine-diff: combine_diff_path.len is not needed anymore
combine-diff: optimize combine_diff_path sets intersection
diff test: add tests for combine-diff with orderfile
diffcore-order: export generic ordering interface
Sandy Carter [Mon, 3 Mar 2014 14:55:53 +0000 (09:55 -0500)]
i18n: proposed command missing leading dash
Add missing leading dash to proposed commands in french output when
using the command:
git branch --set-upstream remotename/branchname
and when upstream is gone
Signed-off-by: Sandy Carter <sandy.carter@savoirfairelinux.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Tanay Abhra [Tue, 4 Mar 2014 21:06:30 +0000 (13:06 -0800)]
commit.c: use skip_prefix() instead of starts_with()
In record_author_date() & parse_gpg_output(), the callers of
starts_with() not just want to know if the string starts with the
prefix, but also can benefit from knowing the string that follows
the prefix.
By using skip_prefix(), we can do both at the same time.
Helped-by: Max Horn <max@quendi.de> Helped-by: Junio C Hamano <gitster@pobox.com> Helped-by: Michael Haggerty <mhagger@alum.mit.edu> Signed-off-by: Tanay Abhra <tanayabh@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Jens Lehmann [Fri, 28 Feb 2014 22:41:11 +0000 (22:41 +0000)]
submodule update: consistently document the '--checkout' option
Commit 322bb6e12f (add update 'none' flag to disable update of submodule
by default) added the '--checkout' option to "git submodule update" but
forgot to explicitly document it in synopsis, usage string and man page
(It is only mentioned implicitly in the man page). In 23d25e48 (submodule:
explicit local branch creation in module_clone) the synopsis of the man
page was updated, but the "OPTIONS" section of the man page and the usage
string of the git-submodule script still do not mention the '--checkout'
option.
Fix that by documenting this option in usage string and the "OPTIONS"
section of man page too. While at it group the update-mode options into
a single set in the usage string.
Reported-by: Matthijs Kooijman <matthijs@stdin.nl> Signed-off-by: Jens Lehmann <Jens.Lehmann@web.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Ralf Thielow [Fri, 28 Feb 2014 19:27:29 +0000 (20:27 +0100)]
help.c: rename function "pretty_print_string_list"
The part "string_list" of the name of function
"pretty_print_string_list" is just an implementation
detail. The function pretty-prints command names so
rename it to "pretty_print_cmdnames".
Signed-off-by: Ralf Thielow <ralf.thielow@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Scott J. Goldman [Fri, 28 Feb 2014 10:04:19 +0000 (05:04 -0500)]
add uploadarchive.allowUnreachable option
In commit ee27ca4, we started restricting remote git-archive
invocations to only accessing reachable commits. This
matches what upload-pack allows, but does restrict some
useful cases (e.g., HEAD:foo). We loosened this in 0f544ee,
which allows `foo:bar` as long as `foo` is a ref tip.
However, that still doesn't allow many useful things, like:
1. Commits accessible from a ref, like `foo^:bar`, which
are reachable
2. Arbitrary sha1s, even if they are reachable.
We can do a full object-reachability check for these cases,
but it can be quite expensive if the client has sent us the
sha1 of a tree; we have to visit every sub-tree of every
commit in the worst case.
Let's instead give site admins an escape hatch, in case they
prefer the more liberal behavior. For many sites, the full
object database is public anyway (e.g., if you allow dumb
walker access), or the site admin may simply decide the
security/convenience tradeoff is not worth it.
This patch adds a new config option to disable the
restrictions added in ee27ca4. It defaults to off, meaning
there is no change in behavior by default.
Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Jeff King [Fri, 28 Feb 2014 10:01:29 +0000 (05:01 -0500)]
docs: clarify remote restrictions for git-upload-archive
Commits ee27ca4 and 0f544ee introduced rules by which
git-upload-archive would restrict clients from accessing
unreachable objects. However, we never documented those
rules anywhere, nor their reason for being. Let's do so now.
Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Junio C Hamano [Thu, 27 Feb 2014 22:01:50 +0000 (14:01 -0800)]
Merge branch 'nd/http-fetch-shallow-fix'
Attempting to deepen a shallow repository by fetching over smart
HTTP transport failed in the protocol exchange, when no-done
extension was used. The fetching side waited for the list of
shallow boundary commits after the sending end stopped talking to
it.
* nd/http-fetch-shallow-fix:
t5537: move http tests out to t5539
fetch-pack: fix deepen shallow over smart http with no-done cap
protocol-capabilities.txt: document no-done
protocol-capabilities.txt: refer multi_ack_detailed back to pack-protocol.txt
pack-protocol.txt: clarify 'obj-id' in the last ACK after 'done'
test: rename http fetch and push test files
Junio C Hamano [Thu, 27 Feb 2014 22:01:48 +0000 (14:01 -0800)]
Merge branch 'jk/pack-bitmap'
Borrow the bitmap index into packfiles from JGit to speed up
enumeration of objects involved in a commit range without having to
fully traverse the history.
* jk/pack-bitmap: (26 commits)
ewah: unconditionally ntohll ewah data
ewah: support platforms that require aligned reads
read-cache: use get_be32 instead of hand-rolled ntoh_l
block-sha1: factor out get_be and put_be wrappers
do not discard revindex when re-preparing packfiles
pack-bitmap: implement optional name_hash cache
t/perf: add tests for pack bitmaps
t: add basic bitmap functionality tests
count-objects: recognize .bitmap in garbage-checking
repack: consider bitmaps when performing repacks
repack: handle optional files created by pack-objects
repack: turn exts array into array-of-struct
repack: stop using magic number for ARRAY_SIZE(exts)
pack-objects: implement bitmap writing
rev-list: add bitmap mode to speed up object lists
pack-objects: use bitmaps when packing objects
pack-objects: split add_object_entry
pack-bitmap: add support for bitmap indexes
documentation: add documentation for the bitmap format
ewah: compressed bitmap implementation
...
Junio C Hamano [Thu, 27 Feb 2014 22:01:46 +0000 (14:01 -0800)]
Merge branch 'dk/blame-janitorial'
Code clean-up.
* dk/blame-janitorial:
builtin/blame.c::find_copy_in_blob: no need to scan for region end
blame.c: prepare_lines should not call xrealloc for every line
builtin/blame.c::prepare_lines: fix allocation size of sb->lineno
builtin/blame.c: eliminate same_suspect()
builtin/blame.c: struct blame_entry does not need a prev link
Junio C Hamano [Thu, 27 Feb 2014 22:01:44 +0000 (14:01 -0800)]
Merge branch 'bc/gpg-sign-everywhere'
Teach "--gpg-sign" option to many commands that create commits.
* bc/gpg-sign-everywhere:
pull: add the --gpg-sign option.
rebase: add the --gpg-sign option
rebase: parse options in stuck-long mode
rebase: don't try to match -M option
rebase: remove useless arguments check
am: add the --gpg-sign option
am: parse options in stuck-long mode
git-sh-setup.sh: add variable to use the stuck-long mode
cherry-pick, revert: add the --gpg-sign option
Junio C Hamano [Thu, 27 Feb 2014 22:01:43 +0000 (14:01 -0800)]
Merge branch 'al/docs'
A handful of documentation updates, all trivially harmless.
* al/docs:
docs/git-blame: explain more clearly the example pickaxe use
docs/git-clone: clarify use of --no-hardlinks option
docs/git-remote: capitalize first word of initial blurb
docs/merge-strategies: remove hyphen from mis-merges
Junio C Hamano [Thu, 27 Feb 2014 22:01:38 +0000 (14:01 -0800)]
Merge branch 'ks/tree-diff-walk'
* ks/tree-diff-walk:
tree-walk: finally switch over tree descriptors to contain a pre-parsed entry
revision: convert to using diff_tree_sha1()
line-log: convert to using diff_tree_sha1()
tree-diff: convert diff_root_tree_sha1() to just call diff_tree_sha1 with old=NULL
tree-diff: allow diff_tree_sha1 to accept NULL sha1
Junio C Hamano [Thu, 27 Feb 2014 22:01:37 +0000 (14:01 -0800)]
Merge branch 'mw/symlinks'
All subcommands that take pathspecs mishandled an in-tree symbolic
link when given it as a full path from the root (which arguably is
a sick way to use pathspecs). "git ls-files -s $(pwd)/RelNotes" in
our tree is an easy reproduction recipe.
* mw/symlinks:
setup: don't dereference in-tree symlinks for absolute paths
setup: add abspath_part_inside_repo() function
t0060: add tests for prefix_path when path begins with work tree
t0060: add test for prefix_path when path == work tree
t0060: add test for prefix_path on symlinks via absolute paths
t3004: add test for ls-files on symlinks via absolute paths
Junio C Hamano [Thu, 27 Feb 2014 22:01:31 +0000 (14:01 -0800)]
Merge branch 'wk/submodule-on-branch'
Make sure 'submodule update' modes that do not detach HEADs can
be used more pleasantly by checking out a concrete branch when
cloning them to prime the well.
* wk/submodule-on-branch:
Documentation: describe 'submodule update --remote' use case
submodule: explicit local branch creation in module_clone
submodule: document module_clone arguments in comments
submodule: make 'checkout' update_module mode more explicit
Junio C Hamano [Thu, 27 Feb 2014 22:01:25 +0000 (14:01 -0800)]
Merge branch 'jk/config-path-include-fix'
include.path variable (or any variable that expects a path that can
use ~username expansion) in the configuration file is not a
boolean, but the code failed to check it.
* jk/config-path-include-fix:
handle_path_include: don't look at NULL value
expand_user_path: do not look at NULL path
Allow "git cmd path/", when the 'path' is where a submodule is
bound to the top-level working tree, to match 'path', despite the
extra and unnecessary trailing slash.
* nd/submodule-pathspec-ending-with-slash:
clean: use cache_name_is_other()
clean: replace match_pathspec() with dir_path_match()
pathspec: pass directory indicator to match_pathspec_item()
match_pathspec: match pathspec "foo/" against directory "foo"
dir.c: prepare match_pathspec_item for taking more flags
pathspec: rename match_pathspec_depth() to match_pathspec()
pathspec: convert some match_pathspec_depth() to dir_path_match()
pathspec: convert some match_pathspec_depth() to ce_path_match()
Allow "merge-recursive" to work in an empty (temporary) working
tree again when there are renames involved, correcting an old
regression in 1.7.7 era.
* bk/refresh-missing-ok-in-merge-recursive:
merge-recursive.c: tolerate missing files while refreshing index
read-cache.c: extend make_cache_entry refresh flag with options
read-cache.c: refactor --ignore-missing implementation
t3030-merge-recursive: test known breakage with empty work tree
Junio C Hamano [Thu, 27 Feb 2014 22:01:09 +0000 (14:01 -0800)]
Merge branch 'kb/fast-hashmap'
Improvements to our hash table to get it to meet the needs of the
msysgit fscache project, with some nice performance improvements.
* kb/fast-hashmap:
name-hash: retire unused index_name_exists()
hashmap.h: use 'unsigned int' for hash-codes everywhere
test-hashmap.c: drop unnecessary #includes
.gitignore: test-hashmap is a generated file
read-cache.c: fix memory leaks caused by removed cache entries
builtin/update-index.c: cleanup update_one
fix 'git update-index --verbose --again' output
remove old hash.[ch] implementation
name-hash.c: remove cache entries instead of marking them CE_UNHASHED
name-hash.c: use new hash map implementation for cache entries
name-hash.c: remove unreferenced directory entries
name-hash.c: use new hash map implementation for directories
diffcore-rename.c: use new hash map implementation
diffcore-rename.c: simplify finding exact renames
diffcore-rename.c: move code around to prepare for the next patch
buitin/describe.c: use new hash map implementation
add a hashtable implementation that supports O(1) removal
submodule: don't access the .gitmodules cache entry after removing it
Junio C Hamano [Thu, 27 Feb 2014 22:01:03 +0000 (14:01 -0800)]
Merge branch 'nv/commit-gpgsign-config'
Introduce commit.gpgsign configuration variable to force every
commit to be GPG signed. The variable cannot be overriden from the
command line of some of the commands that create commits except for
"git commit" and "git commit-tree", but I am not convinced that it
is a good idea to sprinkle support for --no-gpg-sign everywhere,
which in turn means that this configuration variable may not be
such a good idea.
* nv/commit-gpgsign-config:
test the commit.gpgsign config option
commit-tree: add and document --no-gpg-sign
commit-tree: add the commit.gpgsign option to sign all commits
Eric Sunshine [Thu, 2 Jan 2014 21:57:12 +0000 (16:57 -0500)]
name-hash: retire unused index_name_exists()
db5360f3f496 (name-hash: refactor polymorphic index_name_exists();
2013-09-17) split index_name_exists() into index_file_exists() and
index_dir_exists() but retained index_name_exists() as a thin wrapper
to avoid disturbing possible in-flight topics. Since this change
landed in 'master' some time ago and there are no in-flight topics
referencing index_name_exists(), retire it.
Signed-off-by: Eric Sunshine <sunshine@sunshineco.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Junio C Hamano [Fri, 13 Dec 2013 23:40:35 +0000 (15:40 -0800)]
commit-tree: add and document --no-gpg-sign
Document how to override commit.gpgsign configuration that is set to
true per "git commit" invocation (parse-options machinery lets us
say "--no-gpg-sign" to do so).
"git commit-tree" does not use parse-options, so manually add the
corresponding option for now.
Nicolas Vigier [Mon, 4 Nov 2013 23:14:41 +0000 (00:14 +0100)]
commit-tree: add the commit.gpgsign option to sign all commits
If you want to GPG sign all your commits, you have to add the -S option
all the time. The commit.gpgsign config option allows to sign all
commits automatically.
Signed-off-by: Nicolas Vigier <boklm@mars-attacks.org> Signed-off-by: Junio C Hamano <gitster@pobox.com>
When QUICK is set (i.e. with --quiet) we try to do as little work as
possible, stopping after seeing the first change. stat-dirty is
considered a "change" but it may turn out not, if no actual content is
changed. The actual content test is performed too late in the process
and the shortcut may be taken prematurely, leading to incorrect return
code.
Assume we do "git diff --quiet". If we have a stat-dirty file "a" and
a really dirty file "b". We break the loop in run_diff_files() and
stop after "a" because we have got a "change". Later in
diffcore_skip_stat_unmatch() we find out "a" is actually not
changed. But there's nothing else in the diff queue, we incorrectly
declare "no change", ignoring the fact that "b" is changed.
This also happens to "git diff --quiet HEAD" when it hits
diff_can_quit_early() in oneway_diff().
This patch does the content test earlier in order to keep going if "a"
is unchanged. The test result is cached so that when
diffcore_skip_stat_unmatch() is done in the end, we spend no cycles on
re-testing "a".
Reported-by: IWAMOTO Toshihiro <iwamoto@valinux.co.jp> Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Kirill Smelkov [Mon, 3 Feb 2014 09:08:49 +0000 (13:08 +0400)]
tests: add checking that combine-diff emits only correct paths
where "correct paths" stands for paths that are different to all
parents.
Up until now, we were testing combined diff only on one file, or on
several files which were all different (t4038-diff-combined.sh).
As recent thinko in "simplify intersect_paths() further" showed, and
also, since we are going to rework code for finding paths different to
all parents, lets write at least basic tests.
Signed-off-by: Kirill Smelkov <kirr@mns.spb.ru> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Junio C Hamano [Tue, 28 Jan 2014 21:55:59 +0000 (13:55 -0800)]
combine-diff: simplify intersect_paths() further
Linus once said:
I actually wish more people understood the really core low-level
kind of coding. Not big, complex stuff like the lockless name
lookup, but simply good use of pointers-to-pointers etc. For
example, I've seen too many people who delete a singly-linked
list entry by keeping track of the "prev" entry, and then to
delete the entry, doing something like
if (prev)
prev->next = entry->next;
else
list_head = entry->next;
and whenever I see code like that, I just go "This person
doesn't understand pointers". And it's sadly quite common.
People who understand pointers just use a "pointer to the entry
pointer", and initialize that with the address of the
list_head. And then as they traverse the list, they can remove
the entry without using any conditionals, by just doing a "*pp =
entry->next".
Applying that simplification lets us lose 7 lines from this function
even while adding 2 lines of comment.
I was tempted to squash this into the original commit, but because
the benchmarking described in the commit log is without this
simplification, I decided to keep it a separate follow-up patch.
Kirill Smelkov [Mon, 20 Jan 2014 16:20:41 +0000 (20:20 +0400)]
combine-diff: combine_diff_path.len is not needed anymore
The field was used in order to speed-up name comparison and also to
mark removed paths by setting it to 0.
Because the updated code does significantly less strcmp and also
just removes paths from the list and free right after we know a path
will not be needed, it is not needed anymore.
Signed-off-by: Kirill Smelkov <kirr@mns.spb.ru> Signed-off-by: Junio C Hamano <gitster@pobox.com>
When generating combined diff, for each commit, we intersect diff
paths from diff(parent_0,commit) to diff(parent_i,commit) comparing
all paths pairs, i.e. doing it the quadratic way. That is correct,
but could be optimized.
Paths come from trees in sorted (= tree) order, and so does diff_tree()
emits resulting paths in that order too. Now if we look at diffcore
transformations, all of them, except diffcore_order, preserve resulting
path ordering:
- skip_stat_unmatch, grep, pickaxe, filter
-- just skip elements -> order stays preserved
- break -- just breaks diff for a path, adding path
dup after the path -> order stays preserved
So only diffcore_order changes diff paths ordering.
But diffcore_order meaning affects only presentation - i.e. only how to
show the diff, so we could do all the internal computations without
paths reordering, and order only resultant paths set. This is faster,
since, if we know two paths sets are all ordered, their intersection
could be done in linear time.
This patch does just that.
Timings for `git log --raw --no-abbrev --no-renames` without `-c` ("git log")
and with `-c` ("git log -c") before and after the patch are as follows:
linux.git v3.10..v3.11
log log -c
before 1.9s 20.4s
after 1.9s 16.6s
navy.git (private repo)
log log -c
before 0.83s 15.6s
after 0.83s 2.1s
P.S.
I think linux.git case is sped up not so much as the second one, since
in navy.git, there are more exotic (subtree, etc) merges.
P.P.S.
My tracing showed that the rest of the time (16.6s vs 1.9s) is usually
spent in computing huge diffs from commit to second parent. Will try to
deal with it, if I'll have time.
P.P.P.S.
For combine_diff_path, ->len is not needed anymore - will remove it in
the next noisy cleanup path, to maintain good signal/noise ratio here.
Signed-off-by: Kirill Smelkov <kirr@mns.spb.ru> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Kirill Smelkov [Mon, 20 Jan 2014 16:20:39 +0000 (20:20 +0400)]
diff test: add tests for combine-diff with orderfile
In the next patch combine-diff will have special code-path for taking
orderfile into account. Prepare for making changes by introducing
coverage tests for that case.
Signed-off-by: Kirill Smelkov <kirr@mns.spb.ru> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Kirill Smelkov [Mon, 20 Jan 2014 16:20:38 +0000 (20:20 +0400)]
diffcore-order: export generic ordering interface
diffcore_order() interface only accepts a queue of `struct
diff_filepair`.
In the next patches, we'll want to order `struct combine_diff_path`
by path, so let's first rework diffcore-order to also provide
generic low-level interface for ordering arbitrary objects, provided
they have path accessors.
The new interface is:
- `struct obj_order` for describing objects to ordering routine, and
- order_objects() for actually doing the ordering work.
Signed-off-by: Kirill Smelkov <kirr@mns.spb.ru> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Kirill Smelkov [Thu, 6 Feb 2014 11:36:31 +0000 (15:36 +0400)]
tree-walk: finally switch over tree descriptors to contain a pre-parsed entry
This continues 4651ece8 (Switch over tree descriptors to contain a
pre-parsed entry) and moves the only rest computational part
mode = canon_mode(mode)
from tree_entry_extract() to tree entry decode phase - to
decode_tree_entry().
The reason to do it, is that canon_mode() is at least 2 conditional
jumps for regular files, and that could be noticeable should canon_mode()
be invoked several times.
That does not matter for current Git codebase, where typical tree
traversal is
i.e. we do t -> sha1,path.mode "extraction" only once per entry. In such
cases, it does not matter performance-wise, where that mode
canonicalization is done - either once in tree_entry_extract(), or once
in decode_tree_entry() called by update_tree_entry() - it is
approximately the same.
But for future code, which could need to work with several tree_desc's
in parallel, it could be handy to operate on tree_desc descriptors, and
do "extracts" only when needed, or at all, access only relevant part of
it through structure fields directly.
And for such situations, having canon_mode() be done once in decode
phase is better - we won't need to pay the performance price of 2 extra
conditional jumps on every t->mode access.
So let's move mode canonicalization to decode_tree_entry(). That was the
final bit. Now after tree entry is decoded, it is fully ready and could
be accessed either directly via field, or through tree_entry_extract()
which this time got really "totally trivial".
Signed-off-by: Kirill Smelkov <kirr@mns.spb.ru> Signed-off-by: Junio C Hamano <gitster@pobox.com>