Junio C Hamano [Thu, 18 Jul 2013 19:59:56 +0000 (12:59 -0700)]
Merge branch 'sb/mailmap-updates'
* sb/mailmap-updates:
.mailmap: combine more (email, name) to individual persons
.mailmap: Combine more (email, name) to individual persons
.mailmap: Map email addresses to names
Junio C Hamano [Thu, 18 Jul 2013 19:59:41 +0000 (12:59 -0700)]
Merge branch 'jk/in-pack-size-measurement'
"git cat-file --batch-check=<format>" is added, primarily to allow
on-disk footprint of objects in packfiles (often they are a lot
smaller than their true size, when expressed as deltas) to be
reported.
* jk/in-pack-size-measurement:
pack-revindex: radix-sort the revindex
pack-revindex: use unsigned to store number of objects
cat-file: split --batch input lines on whitespace
cat-file: add %(objectsize:disk) format atom
cat-file: add --batch-check=<format>
cat-file: refactor --batch option parsing
cat-file: teach --batch to stream blob objects
t1006: modernize output comparisons
teach sha1_object_info_extended a "disk_size" query
zero-initialize object_info structs
Junio C Hamano [Thu, 18 Jul 2013 19:59:34 +0000 (12:59 -0700)]
Merge branch 'bp/mediawiki-preview'
Add a command to allow previewing the contents locally before
pushing it out, when working with a MediaWiki remote.
I personally do not think this belongs to Git. If you are working
on a set of AsciiDoc source files, you sure do want to locally
format to preview what you will be pushing out, and if you are
working on a set of C or Java source files, you do want to test it
before pushing it out, too. That kind of thing belongs to your
build script, not to your SCM.
But I'll let it pass, as this is only a contrib/ thing.
* bp/mediawiki-preview:
git-remote-mediawiki: add preview subcommand into git mw
git-remote-mediawiki: add git-mw command
git-remote-mediawiki: factoring code between git-remote-mediawiki and Git::Mediawiki
git-remote-mediawiki: update tests to run with the new bin-wrapper
git-remote-mediawiki: add a git bin-wrapper for developement
wrap-for-bin: make bin-wrappers chainable
git-remote-mediawiki: introduction of Git::Mediawiki.pm
Junio C Hamano [Thu, 18 Jul 2013 19:58:17 +0000 (12:58 -0700)]
Merge branch 'es/overlapping-range-set'
* es/overlapping-range-set:
range_set: fix coalescing bug when range is a subset of another
t4211: fix broken test when one -L range is subset of another
"git clone -s/-l" is a filesystem level copy and does not offer any
protection against source repository being corrupt. While the
connectivity validation checks commits and trees being readable, it
made the otherwise instantaneous local modes of clone much more
expensive, without protecting blob data from bitflips.
* jk/maint-clone-shared-no-connectivity-validation:
clone: drop connectivity check for local clones
Junio C Hamano [Mon, 15 Jul 2013 17:34:36 +0000 (10:34 -0700)]
Merge branch 'mt/send-email-cc-match-fix' into maint
Logic used by git-send-email to suppress cc mishandled names like "A
U. Thor" <author@example.xz>, where the human readable part needs to
be quoted (the user input may not have the double quotes around the
name, and comparison was done between quoted and unquoted strings).
It also mishandled names that need RFC2047 quoting.
* mt/send-email-cc-match-fix:
send-email: sanitize author when writing From line
send-email: add test for duplicate utf8 name
test-send-email: test for pre-sanitized self name
t/send-email: test suppress-cc=self with non-ascii
t/send-email: add test with quoted sender
send-email: make --suppress-cc=self sanitize input
t/send-email: test suppress-cc=self on cccmd
send-email: fix suppress-cc=self on cccmd
t/send-email.sh: add test for suppress-cc=self
Pass port number as a separate argument when send-email initializes
Net::SMTP, instead of as a part of the hostname, i.e. host:port.
This allows GSSAPI codepath to match with the hostname given.
* bc/send-email-use-port-as-separate-param:
send-email: provide port separately from hostname
Junio C Hamano [Mon, 15 Jul 2013 17:28:44 +0000 (10:28 -0700)]
Merge branch 'cp/submodule-custom-update'
In addition to the choice from "rebase, merge, or checkout-detach",
allow a custom command to be used in "submodule update" to update
the working tree of submodules.
* cp/submodule-custom-update:
submodule update: allow custom command to update submodule working tree
Junio C Hamano [Mon, 15 Jul 2013 17:28:39 +0000 (10:28 -0700)]
Merge branch 'jk/format-patch-from'
"git format-patch" learned "--from[=whom]" option, which sets the
"From: " header to the specified person (or the person who runs the
command, if "=whom" part is missing) and moves the original author
information to an in-body From: header as necessary.
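The header placement described above can be sketched as follows. This is a minimal illustration, not format-patch's actual code: the function name `build_from_headers` and its tuple return value are hypothetical, and "as necessary" is read here as "only when the sender differs from the author".

```python
def build_from_headers(author, sender=None):
    """Return (header_from, in_body_from).

    author: identity of the commit author, e.g. "A U Thor <author@example.com>".
    sender: identity requested via a --from style option, or None if unused.
    in_body_from is None when no in-body "From:" line is needed.
    """
    if sender is None or sender == author:
        # The author already appears in the mail header; nothing to move.
        return author, None
    # The sender takes the mail header; the author moves into the body.
    return sender, author
```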
* jk/format-patch-from:
teach format-patch to place other authors into in-body "From"
pretty.c: drop const-ness from pretty_print_context
Junio C Hamano [Mon, 15 Jul 2013 17:28:34 +0000 (10:28 -0700)]
Merge branch 'mv/merge-ff-tristate'
The configuration variable "merge.ff" was clearly a tri-state to
choose one from "favor fast-forward when possible", "always create
a merge even when the history could fast-forward" and "do not
create any merge, only update when the history fast-forwards", but
the command line parser did not implement the usual convention of
"last one wins, and command line overrides the configuration"
correctly.
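The "last one wins, and command line overrides the configuration" convention can be sketched as below. This is an illustrative model only, not the parser in builtin/merge.c; the enum, the lookup tables and `resolve_ff` are hypothetical names, while the configuration values (true, false, only) and the flags (--ff, --no-ff, --ff-only) are the ones Git documents.

```python
from enum import Enum


class FastForward(Enum):
    ALLOW = "allow"  # favor fast-forward when possible
    NO = "no"        # always create a merge commit
    ONLY = "only"    # update only when the history fast-forwards


# Hypothetical lookup tables for this sketch.
CONFIG_VALUES = {
    "true": FastForward.ALLOW,
    "false": FastForward.NO,
    "only": FastForward.ONLY,
}
CLI_FLAGS = {
    "--ff": FastForward.ALLOW,
    "--no-ff": FastForward.NO,
    "--ff-only": FastForward.ONLY,
}


def resolve_ff(config_value, argv):
    """Start from the configuration, then let each flag override in order."""
    mode = FastForward.ALLOW
    if config_value is not None:
        mode = CONFIG_VALUES[config_value]
    for arg in argv:
        if arg in CLI_FLAGS:
            mode = CLI_FLAGS[arg]  # a later flag replaces an earlier one
    return mode
```

Keeping a single tri-state variable, rather than two independent booleans, is what makes "last one wins" fall out naturally.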
* mv/merge-ff-tristate:
merge: handle --ff/--no-ff/--ff-only as a tri-state option
Junio C Hamano [Mon, 15 Jul 2013 17:28:31 +0000 (10:28 -0700)]
Merge branch 'jk/fetch-pack-many-refs'
Fetching between repositories with many refs employed an O(n^2)
algorithm to match up the common objects, which has been corrected.
* jk/fetch-pack-many-refs:
fetch-pack: avoid quadratic behavior in rev_list_push
commit.c: make compare_commits_by_commit_date global
fetch-pack: avoid quadratic list insertion in mark_complete
"It fails reliably without corrupting the receiving repository when
it should fail" may be better than the situation before the receiving
end was hardened recently, but the fact that sometimes the push does
not go through still remains. It is better to advise the users that
they cannot push from a shallow repository as a limitation before
they decide to use (or not to use) a shallow clone.
Stefan Beller [Sun, 14 Jul 2013 10:14:59 +0000 (12:14 +0200)]
.mailmap: Combine more (email, name) to individual persons
I got more responses from people regarding the .mailmap file.
All added persons gave permission to add them to the .mailmap file.
It's mostly email mappings again. However we also have Nick Stokoe,
who contributed as Nick Woolley. He changed his name, but kept the email.
Signed-off-by: Stefan Beller <stefanbeller@googlemail.com>
Reviewed-by: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Stefan Beller [Fri, 12 Jul 2013 19:21:21 +0000 (21:21 +0200)]
.mailmap: Map email addresses to names
People change email addresses quite often and sometimes forget to
add their entry to the mailmap file. I have contacted lots of
people, whose name occurs multiple times in the short log having
different email addresses. The entries in the mailmap of this patch
are either confirmed by them or are trivial. Trivial means
different capitalisation of the domain (@MIT.EDU and @mit.edu) or
the domain was localhost, (none) or @local.
In addition to adding (name, email) mappings to the .mailmap file,
this patch also sorts it ("LC_ALL=C /usr/bin/sort", byte-value sort).
While most of the changes affect email addresses, there is also a
name change in here. Karl Hasselström is now known as Karl Wiberg
due to marriage. Congratulations!
To find out whom to contact I used the following small
script:
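#!/bin/bash
# Collect names that appear more than once in the shortlog.
git shortlog -sne | awk '{ NF--; $1=""; print }' | sort | uniq -d > mailmapdoubles
# Emit one grep command per duplicated name.
while read -r line ; do
    # remove leading and trailing whitespace
    trimmed=$(echo "$line" | sed -e 's/^ *//' -e 's/ *$//')
    echo "git shortlog -sne | grep \"$trimmed\""
done < mailmapdoubles > mailmapdoubles2
# Run the generated commands, then clean up the temporary files.
sh mailmapdoubles2
rm mailmapdoubles mailmapdoubles2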
Also interesting for similar tasks are these snippets:
# Finding out duplicates by comparing email addresses:
git shortlog -sne |awk '{ print $NF }' |sort |uniq -d
# Finding out duplicates by comparing names:
git shortlog -sne |awk '{ NF--; $1=""; print }' |sort |uniq -d
Signed-off-by: Stefan Beller <stefanbeller@googlemail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Junio C Hamano [Fri, 12 Jul 2013 19:04:09 +0000 (12:04 -0700)]
Merge branch 'tf/gitweb-extra-breadcrumbs'
A Gitweb installation that is part of a larger site can optionally
show extra links that point at levels higher than the Gitweb
pages themselves in the link hierarchy of pages.
* tf/gitweb-extra-breadcrumbs:
gitweb: allow extra breadcrumbs to prefix the trail
"log --format=" did not honor i18n.logoutputencoding configuration
and this attempts to fix it.
* as/log-output-encoding-in-user-format:
t4205 (log-pretty-formats): avoid using `sed`
t6006 (rev-list-format): add tests for "%b" and "%s" for the case i18n.commitEncoding is not set
t4205, t6006, t7102: make functions better readable
t4205 (log-pretty-formats): revert back single quotes
t4041, t4205, t6006, t7102: use iso8859-1 rather than iso-8859-1
t4205: replace .\+ with ..* in sed commands
pretty: --format output should honor logOutputEncoding
pretty: Add failing tests: --format output should honor logOutputEncoding
t4205 (log-pretty-formats): don't hardcode SHA-1 in expected outputs
t7102 (reset): don't hardcode SHA-1 in expected outputs
t6006 (rev-list-format): don't hardcode SHA-1 in expected outputs
git-clone.txt: remove the restriction on pushing from a shallow clone
The document says one cannot push from a shallow clone. But that is
not true (maybe it was at some point in the past). The client does not
stop such a push nor does it give any indication to the receiver that
this is a shallow push. If the receiver accepts it, it's in.
Since 52fed6e (receive-pack: check connectivity before concluding "git
push" - 2011-09-02), receive-pack is prepared to deal with a broken
push, so a shallow push can't cause any corruption. Update the document
to reflect that.
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Jeff King [Thu, 11 Jul 2013 12:16:00 +0000 (08:16 -0400)]
pack-revindex: radix-sort the revindex
The pack revindex stores the offsets of the objects in the
pack in sorted order, allowing us to easily find the on-disk
size of each object. To compute it, we populate an array
with the offsets from the sha1-sorted idx file, and then use
qsort to order it by offsets.
That does O(n log n) offset comparisons, and profiling shows
that we spend most of our time in cmp_offset. However, since
we are sorting on a simple off_t, we can use numeric sorts
that perform better. A radix sort can run in O(k*n), where k
is the number of "digits" in our number. For a 64-bit off_t,
using 16-bit "digits" gives us k=4.
On the linux.git repo, with about 3M objects to sort, this
yields a 400% speedup. Here are the best-of-five numbers for
running
echo HEAD | git cat-file --batch-check="%(objectsize:disk)"
on a fully packed repository, which is dominated by time
spent building the pack revindex:
        before      after
real    0m0.834s    0m0.204s
user    0m0.788s    0m0.164s
sys     0m0.040s    0m0.036s
This matches our algorithmic expectations. log(3M) is ~21.5,
so a traditional sort is ~21.5n. Our radix sort runs in k*n,
where k is the number of radix digits. In the worst case,
this is k=4 for a 64-bit off_t, but we can quit early when
the largest value to be sorted is smaller. For any
repository under 4G, k=2. Our algorithm makes two passes
over the list per radix digit, so we end up with 4n. That
should yield ~5.3x speedup. We see 4x here; the difference
is probably due to the extra bucket book-keeping the radix
sort has to do.
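The approach described above can be sketched as a least-significant-digit radix sort on 16-bit digits. This is a simplified illustration rather than the code in pack-revindex.c: it sorts plain integers instead of revindex entries, the name `radix_sort_offsets` is hypothetical, and the early-exit test is driven by the largest value in the input. Each digit takes two passes over the list (one to count, one to place), which is where the "4n" estimate for k=2 comes from.

```python
DIGIT_BITS = 16
BUCKETS = 1 << DIGIT_BITS
DIGIT_MASK = BUCKETS - 1


def radix_sort_offsets(offsets):
    """Return the non-negative integers in `offsets` in ascending order."""
    src = list(offsets)
    if not src:
        return src
    largest = max(src)
    shift = 0
    # Always do one pass; stop once no value has bits at or above `shift`.
    while shift == 0 or (largest >> shift) > 0:
        # Pass 1: count how many values fall into each bucket.
        counts = [0] * BUCKETS
        for value in src:
            counts[(value >> shift) & DIGIT_MASK] += 1
        # Turn the counts into the starting position of each bucket.
        position = 0
        for digit in range(BUCKETS):
            counts[digit], position = position, position + counts[digit]
        # Pass 2: place each value; keeping input order makes the sort stable.
        dst = [0] * len(src)
        for value in src:
            digit = (value >> shift) & DIGIT_MASK
            dst[counts[digit]] = value
            counts[digit] += 1
        src = dst
        shift += DIGIT_BITS
    return src
```

With every value below 2^32 the loop runs twice, matching the k=2 case for packs under 4G.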
On a smaller repo, the difference is less impressive, as
log(n) is smaller. For git.git, with 173K objects (but still
k=2), we see a 2.7x improvement:
        before      after
real    0m0.046s    0m0.017s
user    0m0.036s    0m0.012s
sys     0m0.008s    0m0.000s
On even tinier repos (e.g., a few hundred objects), the
speedup goes away entirely, as the small advantage of the
radix sort gets erased by the book-keeping costs (and at
those sizes, the cost to generate the rev-index gets
lost in the noise anyway).
Signed-off-by: Jeff King <peff@peff.net>
Reviewed-by: Brandon Casey <drafnel@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Jeff King [Wed, 10 Jul 2013 11:50:26 +0000 (07:50 -0400)]
pack-revindex: use unsigned to store number of objects
A packfile may have up to 2^32-1 objects in it, so the
"right" data type to use is uint32_t. We currently use a
signed int, which means that we may behave incorrectly for
packfiles with more than 2^31-1 objects on 32-bit systems.
Nobody has noticed because having 2^31 objects is pretty
insane. The linux.git repo has on the order of 2^22 objects,
which is hundreds of times smaller than necessary to trigger
the bug.
Let's bump this up to an "unsigned". On 32-bit systems, this
gives us the correct data-type, and on 64-bit systems, it is
probably more efficient to use the native "unsigned" than a
true uint32_t.
While we're at it, we can fix the binary search not to
overflow in such a case if our unsigned is 32 bits.
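The binary search overflow mentioned above is the classic midpoint problem. The message does not say which form the fix takes, so the sketch below only shows the common remedy, with 32-bit unsigned wraparound simulated by masking (Python integers do not overflow on their own). Both function names are hypothetical.

```python
U32_MASK = 0xFFFFFFFF


def naive_mid_u32(lo, hi):
    """(lo + hi) / 2 as computed in 32-bit unsigned arithmetic.

    The sum wraps around when it exceeds 2^32 - 1, so the result can land
    below `lo`.
    """
    return ((lo + hi) & U32_MASK) // 2


def safe_mid_u32(lo, hi):
    """lo + (hi - lo) / 2: no intermediate value exceeds `hi` when lo <= hi."""
    return lo + (hi - lo) // 2
```

For example, with lo = 0x80000000 and hi = 0xFFFFFFFE the wrapped sum yields a midpoint below lo, while the second form stays inside the search interval.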
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Jeff King [Thu, 11 Jul 2013 20:45:59 +0000 (16:45 -0400)]
cat-file: split --batch input lines on whitespace
If we get an input line to --batch or --batch-check that
looks like "HEAD foo bar", we will currently feed the whole
thing to get_sha1(). This means that to use --batch-check
with `rev-list --objects`, one must pre-process the input,
like:
Besides being more typing and slightly less efficient to
invoke `cut`, the result loses information: we no longer
know which path each object was found at.
This patch teaches cat-file to split input lines at the
first whitespace. Everything to the left of the whitespace
is considered an object name, and everything to the right is
made available as the %(rest) atom. So you can now do:
git rev-list --objects HEAD |
git cat-file --batch-check='%(objectsize) %(rest)'
to collect object sizes at particular paths.
Even if %(rest) is not used, we always do the whitespace
split (which means you can simply eliminate the `cut`
command from the first example above).
This whitespace split is backwards compatible for any
reasonable input. Object names cannot contain spaces, so any
input with spaces would have resulted in a "missing" line.
The only input hurt is if somebody really expected input of
the form "HEAD is a fine-looking ref!" to fail; it will now
parse HEAD, and make "is a fine-looking ref!" available as
%(rest).
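The split rule can be sketched like this. It is an illustration of the behavior described in this message, not cat-file's implementation; `split_batch_line` is a hypothetical name, and treating only space and tab as whitespace, as well as skipping the whole run of whitespace after the object name, are assumptions of the sketch.

```python
def split_batch_line(line):
    """Split an input line into (object_name, rest) at the first whitespace."""
    line = line.rstrip("\n")
    for index, char in enumerate(line):
        if char in " \t":
            # Everything to the left names the object; the remainder,
            # minus the separating whitespace, is what %(rest) would show.
            return line[:index], line[index:].lstrip(" \t")
    return line, ""
```

Applied to a line of `rev-list --objects` output, the first element is the object name and the second is the path it was found at.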
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Jeff King [Wed, 10 Jul 2013 11:46:25 +0000 (07:46 -0400)]
cat-file: add %(objectsize:disk) format atom
This atom is just like %(objectsize), except that it shows
the on-disk size of the object rather than the object's true
size. In other words, it makes the "disk_size" query of
sha1_object_info_extended available via the command-line.
This can be used for rough attribution of disk usage to
particular refs, though see the caveats in the
documentation.
This patch does not include any tests, as the exact numbers
returned are volatile and subject to zlib and packing
decisions. We cannot even reliably guarantee that the
on-disk size is smaller than the object content (though in
general this should be the case for non-trivial objects).
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Jeff King [Wed, 10 Jul 2013 11:45:47 +0000 (07:45 -0400)]
cat-file: add --batch-check=<format>
The `cat-file --batch-check` command can be used to quickly
get information about a large number of objects. However, it
provides a fixed set of information.
This patch adds an optional <format> option to --batch-check
to allow a caller to specify which items they are interested
in, and in which order to output them. This is not very
exciting for now, since we provide the same limited set that
you could already get. However, it opens the door to adding
new format items in the future without breaking backwards
compatibility (or forcing callers to pay the cost to
calculate uninteresting items).
Since the --batch option shares code with --batch-check, it
receives the same feature, though it is less likely to be of
interest there.
The format atom names are chosen to match their counterparts
in for-each-ref. Though we do not (yet) share any code with
for-each-ref's formatter, this keeps the interface as
consistent as possible, and may help later on if the
implementations are unified.
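A format string of this kind can be expanded with a small substitution routine. The sketch below is an assumption-laden illustration, not the cat-file formatter: `expand_format` and the `providers` mapping are hypothetical. Each atom maps to a zero-argument callable, so a value is computed only when the format actually asks for it, which mirrors the point about not paying for uninteresting items.

```python
import re

# Matches atoms of the form %(name), e.g. %(objectsize) or %(objectsize:disk).
ATOM_PATTERN = re.compile(r"%\(([^)]+)\)")


def expand_format(fmt, providers):
    """Replace each %(atom) in `fmt` with the value its provider returns."""
    def replace(match):
        name = match.group(1)
        if name not in providers:
            raise KeyError(f"unknown format atom: {name}")
        return str(providers[name]())  # computed on demand
    return ATOM_PATTERN.sub(replace, fmt)
```

A caller would build `providers` once per object and pass whatever format the user supplied; atoms that never appear in the format are never evaluated.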
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Junio C Hamano [Thu, 11 Jul 2013 20:05:52 +0000 (13:05 -0700)]
Merge branch 'pb/stash-refuse-to-kill'
"git stash save" is not just about "saving" the local changes, but
also is to restore the working tree state to that of HEAD. If you
changed a non-directory into a directory in the local change, you
may have untracked files in that directory, which have to be killed
while doing so, unless you run it with --include-untracked. Teach
the command to detect and error out before spreading the damage.
This needed a small fix to "ls-files --killed".
* pb/stash-refuse-to-kill:
git stash: avoid data loss when "git stash save" kills a directory
treat_directory(): do not declare submodules to be untracked
Junio C Hamano [Thu, 11 Jul 2013 20:05:34 +0000 (13:05 -0700)]
Merge branch 'jg/status-config'
"git status" learned status.branch and status.short configuration
variables to use --branch and --short options by default (override
with --no-branch and --no-short options from the command line).
* jg/status-config:
status/commit: make sure --porcelain is not affected by user-facing config
commit: make it work with status.short
status: introduce status.branch to enable --branch by default
status: introduce status.short to enable --short by default
Junio C Hamano [Thu, 11 Jul 2013 20:04:33 +0000 (13:04 -0700)]
Merge branch 'rr/rebase-checkout-reflog'
Invocations of "git checkout" used internally by "git rebase" were
counted as "checkout", and affected a later "git checkout -", which took
the user to an unexpected place.
* rr/rebase-checkout-reflog:
checkout: respect GIT_REFLOG_ACTION
status: do not depend on rebase reflog messages
t/t2021-checkout-last: "checkout -" should work after a rebase finishes
wt-status: remove unused field in grab_1st_switch_cbdata
t7512: test "detached from" as well
Junio C Hamano [Thu, 11 Jul 2013 20:03:21 +0000 (13:03 -0700)]
Merge branch 'jc/triangle-push-fixup'
Earlier remote.pushdefault (and per-branch branch.*.pushremote)
were introduced as an additional mechanism to choose what
repository to push into when "git push" did not say it from the
command line, to help people who push to a repository that is
different from where they fetch from. This attempts to finish that
topic by teaching the default mechanism to choose branch in the
remote repository to be updated by such a push.
The 'current', 'matching' and 'nothing' modes (specified by the
push.default configuration variable) extend to such a "triangular"
workflow naturally, but 'upstream' and 'simple' have to be updated.
. 'upstream' is about pushing back to update the branch in the
remote repository that the current branch fetches from and
integrates with; it errors out in a triangular workflow.
. 'simple' is meant to help new people by avoiding mistakes, and
will be the safe default in Git 2.0.
In a non-triangular workflow, it will continue to act as a cross
between 'upstream' and 'current' in that it pushes to the current
branch's @{upstream} only when it is set to the same name as the
current branch (e.g. your 'master' forks from the 'master' from
the central repository).
In a triangular workflow, this series tentatively defines it as
the same as 'current', but we may have to tighten it to avoid
surprises in some way.
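The 'simple' rules above can be written down as a small decision function. This is a sketch of the description only, not Git's code: the function and parameter names are hypothetical, a workflow is treated as triangular when the push remote differs from the fetch remote, and returning None to mean "refuse to push" is an assumption drawn from the phrase "only when".

```python
def simple_push_destination(current_branch, fetch_remote, upstream_branch,
                            push_remote):
    """Return the remote branch name to update, or None to refuse the push."""
    if push_remote != fetch_remote:
        # Triangular workflow: tentatively the same as 'current'.
        return current_branch
    if upstream_branch == current_branch:
        # Central workflow: push to @{upstream} only when the names match.
        return upstream_branch
    return None
```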
Jeff King [Wed, 10 Jul 2013 11:38:58 +0000 (07:38 -0400)]
cat-file: refactor --batch option parsing
We currently use an int to tell us whether --batch parsing
is on, and if so, whether we should print the full object
contents. Let's instead factor this into a struct, filled in
by callback, which will make further batch-related options
easy to add.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Jeff King [Wed, 10 Jul 2013 11:38:24 +0000 (07:38 -0400)]
cat-file: teach --batch to stream blob objects
The regular "git cat-file -p" and "git cat-file blob" code
paths already learned to stream large blobs. Let's do the
same here.
Note that this means we look up the type and size before
making a decision of whether to load the object into memory
or stream (just like the "-p" code path does). That can lead
to extra work, but it should be dwarfed by the cost of
actually accessing the object itself. In my measurements,
there was a 1-2% slowdown when using "--batch" on a large
number of objects.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Jeff King [Wed, 10 Jul 2013 11:36:43 +0000 (07:36 -0400)]
t1006: modernize output comparisons
In modern tests, we typically put output into a file and
compare it with test_cmp. This is nicer than just comparing
via "test", and much shorter than comparing via "test" and
printing a custom message.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Eric Sunshine [Tue, 9 Jul 2013 05:55:05 +0000 (01:55 -0400)]
range_set: fix coalescing bug when range is a subset of another
When coalescing ranges, sort_and_merge_range_set() unconditionally
assumes that the end of a range being folded into a preceding range
should become the end of the coalesced range. This assumption, however,
is invalid when one range is a subset of another. For example, given
ranges 1-5 and 2-3 added via range_set_append_unsafe(),
sort_and_merge_range_set() incorrectly coalesces them to range 1-3
rather than the correct union range 1-5. Fix this bug.
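The corrected coalescing rule can be sketched as follows. This is an illustration, not the code in line-log.c: `sort_and_merge_ranges` is a hypothetical stand-in, ranges are (start, end) tuples, and two ranges are treated as overlapping when the next start is not past the previous end. The essential point is taking the maximum of the two ends rather than unconditionally adopting the end of the range being folded in.

```python
def sort_and_merge_ranges(ranges):
    """Return sorted, non-overlapping ranges covering the same lines."""
    merged = []
    for start, end in sorted(ranges):
        if merged and start <= merged[-1][1]:
            prev_start, prev_end = merged[-1]
            # The buggy behavior corresponds to using `end` here, which
            # truncates the result when the new range is a subset.
            merged[-1] = (prev_start, max(prev_end, end))
        else:
            merged.append((start, end))
    return merged
```

With the inputs from this message, ranges 1-5 and 2-3 coalesce to 1-5 instead of 1-3.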
Signed-off-by: Eric Sunshine <sunshine@sunshineco.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Eric Sunshine [Tue, 9 Jul 2013 05:55:04 +0000 (01:55 -0400)]
t4211: fix broken test when one -L range is subset of another
t4211 attempts to test multiple git-log -L ranges where one range is a
superset of the other, and falsely succeeds because its "expected"
output is incorrect.
Overlapping -L ranges handed to git-log are coalesced by
line-log.c:sort_and_merge_range_set() into a set of non-overlapping,
disjoint ranges. When one range is a subset of another,
sort_and_merge_range_set() should coalesce both ranges to the superset
range, but instead the coalesced range often is incorrectly truncated to
the end of the subset range. For example, ranges 2-8 and 3-4 are
coalesced incorrectly to 2-4.
One can observe this incorrect behavior with git-log -L using the test
repository created by t4211. The superset/subset ranges t4211 employs
are 4-$ and 8-12 (where $ represents end-of-file). The coalesced range
should be 4-$. Manually invoking git-log with the same ranges the test
employs, we see:
This last output is incorrect. 8-12 is a subset of 4-$, hence the output
of the coalesced range should be the same as the 4-$ output shown first.
In fact, the above incorrect output is the truncated bogus range 4-12:
Fix the test to correctly fail in the presence of the
sort_and_merge_range_set() coalescing bug. Do so by changing the
"expected" output to the commits mentioned in the 4-$ output above.
Signed-off-by: Eric Sunshine <sunshine@sunshineco.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Peter Krefting [Tue, 9 Jul 2013 11:16:33 +0000 (12:16 +0100)]
commit: reject non-characters
Unicode clause D14 defines all characters U+nFFFE and U+nFFFF (where
0 <= n <= 10h) as well as the range U+FDD0..U+FDEF as non-characters,
reserved for internal use only. Disallow these characters in commit
messages as they are normally not recommended for interchange.
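The set of code points named above can be tested with a couple of comparisons. The sketch below only encodes the ranges given in this message; the function names are hypothetical and it is not the check used in Git's commit code.

```python
def is_noncharacter(code_point):
    """True for U+FDD0..U+FDEF and for U+nFFFE / U+nFFFF with 0 <= n <= 0x10."""
    if 0xFDD0 <= code_point <= 0xFDEF:
        return True
    # The low 16 bits are FFFE or FFFF exactly when masking with FFFE
    # yields FFFE.
    return 0 <= code_point <= 0x10FFFF and (code_point & 0xFFFE) == 0xFFFE


def contains_noncharacter(text):
    """True if any character in `text` is a Unicode non-character."""
    return any(is_noncharacter(ord(char)) for char in text)
```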
Signed-off-by: Peter Krefting <peter@softwolves.pp.se>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
John Keeping [Sun, 7 Jul 2013 19:02:15 +0000 (20:02 +0100)]
pull: change the description to "integrate" changes
Since git-pull learned the --rebase option it has not just been about
merging changes from a remote repository (where "merge" is in the sense
of "git merge"). Change the description to use "integrate" instead of
"merge" in order to reflect this.
Signed-off-by: John Keeping <john@keeping.me.uk>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Thomas Rast [Mon, 8 Jul 2013 15:20:31 +0000 (17:20 +0200)]
t9902: fix 'test A == B' to use = operator
The == operator as an alias to = is not POSIX. This doesn't actually
matter for the execution of the script, because it only runs when the
shell is bash. However, it trips up test-lint, so it's nicer to use
the standard form.
Signed-off-by: Thomas Rast <trast@inf.ethz.ch>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
remote.c: avoid O(m*n) behavior in match_push_refs
When pushing using a matching refspec or a pattern refspec, each ref
in the local repository must be paired with a ref advertised by the
remote server. This is accomplished by using the refspec to transform
the name of the local ref into the name it should have in the remote
repository, and then performing a linear search through the list of
remote refs to see if the remote ref was advertised by the remote
system.
Each of these lookups has O(n) complexity and makes match_push_refs()
an O(m*n) operation, where m is the number of local refs and n is
the number of remote refs. If there are many refs (100,000+), then this
ref matching can take a significant amount of time. Let's prepare an
index of the remote refs to allow searching in O(log n) time and
reduce the complexity of match_push_refs() to O(m log n).
We prepare the index lazily so that it is only created when necessary.
So, there should be no impact when _not_ using a matching or pattern
refspec, i.e. when pushing using only explicit refspecs.
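The lazy index can be sketched as below. The message says only that an index allows O(log n) searches and is built on demand; using a sorted list with binary search is an assumption of this sketch, and the class and method names are hypothetical.

```python
from bisect import bisect_left


class RemoteRefIndex:
    """Look up remote ref names in O(log n), building the index on first use."""

    def __init__(self, remote_refs):
        self._remote_refs = remote_refs
        self._sorted = None  # not built until the first lookup

    @property
    def built(self):
        return self._sorted is not None

    def find(self, name):
        """Return the matching remote ref name, or None if not advertised."""
        if self._sorted is None:
            # One-time O(n log n) cost, paid only if a lookup is needed.
            self._sorted = sorted(self._remote_refs)
        pos = bisect_left(self._sorted, name)
        if pos < len(self._sorted) and self._sorted[pos] == name:
            return self._sorted[pos]
        return None
```

A push that uses only explicit refspecs would never call `find`, so the sort would never run in that case.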
Dry-run push of a repository with 121,913 local and remote refs:
        before      after
real    1m40.582s   0m0.804s
user    1m39.914s   0m0.515s
sys     0m0.125s    0m0.106s
The creation of the index has overhead. So, if there are very few
local refs, then it could take longer to create the index than it
would have taken to just perform m linear lookups into the remote
ref space. Using the index should provide some improvement when
the number of local refs is roughly greater than the log of the
number of remote refs (i.e. m >= log n). The pathological case is
when there is a single local ref and very many remote refs.
Dry-run push of a repository with 121,913 remote refs and a single
local ref:
        before      after
real    0m0.525s    0m0.566s
user    0m0.243s    0m0.279s
sys     0m0.075s    0m0.099s
Using an index takes 41 ms longer, or roughly 7.8% longer.
Jeff King measured a no-op push of a single ref into a remote repo
with 370,000 refs:
        before      after
real    0m1.087s    0m1.156s
user    0m1.344s    0m1.412s
sys     0m0.288s    0m0.284s
Using an index takes 69 ms longer, or roughly 6.3% longer.
None of the measurements above required transferring any objects to
the remote repository. If the push required transferring objects and
updating the refs in the remote repository, the impact of preparing
the search index would be even smaller.
A similar operation is performed in the reverse direction when pruning
using a matching or pattern refspec. Let's avoid O(m*n) behavior in
the same way by lazily preparing an index on the local refs.
Signed-off-by: Brandon Casey <drafnel@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Benoit Person [Thu, 4 Jul 2013 20:39:00 +0000 (22:39 +0200)]
git-remote-mediawiki: add preview subcommand into git mw
In the current state, a user of git-remote-mediawiki can edit the markup text
locally, but has to push to the remote wiki to see how the page is rendered.
Add a new 'git mw preview' command that allows rendering the markup text on
the remote wiki without actually pushing any change on the wiki.
This uses Mediawiki's API to render the markup and inserts it in an actual
HTML page from the wiki so that CSS can be rendered properly. Most links
should work when the page exists on the remote.
Signed-off-by: Benoit Person <benoit.person@ensimag.fr>
Signed-off-by: Matthieu Moy <matthieu.moy@grenoble-inp.fr>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Benoit Person [Thu, 4 Jul 2013 20:38:59 +0000 (22:38 +0200)]
git-remote-mediawiki: add git-mw command
For now, git-remote-mediawiki is only a remote-helper. This patch adds a new
toolset script in which we will be able to build new tools for
git-remote-mediawiki.
This toolset uses a subcommand-mechanism to launch the proper action. For now
only the 'help' subcommand is implemented. It also provides some generic code
for the verbose and help command line options.
Signed-off-by: Benoit Person <benoit.person@ensimag.fr>
Signed-off-by: Matthieu Moy <matthieu.moy@grenoble-inp.fr>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Benoit Person [Thu, 4 Jul 2013 20:38:58 +0000 (22:38 +0200)]
git-remote-mediawiki: factoring code between git-remote-mediawiki and Git::Mediawiki
For now, Git::Mediawiki contains nothing.
This first patch moves some of git-remote-mediawiki.perl's factorisable code
into Git::Mediawiki. At the same time, it removes the side effects of that code
and renames the functions and constants moved to expose a better API.
Signed-off-by: Benoit Person <benoit.person@ensimag.fr>
Signed-off-by: Matthieu Moy <matthieu.moy@grenoble-inp.fr>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Benoit Person [Thu, 4 Jul 2013 20:38:57 +0000 (22:38 +0200)]
git-remote-mediawiki: update tests to run with the new bin-wrapper
Until now, if git-remote-mediawiki was not installed, the test suite
copied it to the toplevel directory. This solution pollutes the
directory with untracked files. Plus, we would need to copy the new
git-mw.perl file to test it too.
Signed-off-by: Benoit Person <benoit.person@ensimag.fr>
Signed-off-by: Matthieu Moy <matthieu.moy@grenoble-inp.fr>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Benoit Person [Thu, 4 Jul 2013 20:38:56 +0000 (22:38 +0200)]
git-remote-mediawiki: add a git bin-wrapper for developement
The introduction of the Git::Mediawiki package makes it impossible to test,
without installation, git-remote-mediawiki and git-mw.
Using a git bin-wrapper enables us to define proper $GITPERLLIB to force the
use of the development version of the Git::Mediawiki package, bypassing its
installed version if any.
An alternate solution was to 'install' all the files required at each build
but it pollutes the toplevel with untracked files.
Signed-off-by: Benoit Person <benoit.person@ensimag.fr>
Signed-off-by: Matthieu Moy <matthieu.moy@grenoble-inp.fr>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Benoit Person [Thu, 4 Jul 2013 20:38:55 +0000 (22:38 +0200)]
wrap-for-bin: make bin-wrappers chainable
For now, bin-wrappers overwrite GITPERLLIB. If we want to chain to
those scripts and define GITPERLLIB beforehand, our changes will be
discarded.
This patch makes the bin-wrappers prepend their modifications to
GITPERLLIB rather than redefining it. It also unsets GITPERLLIB in the
test-suite to prevent broken $GITPERLLIB in the user's configuration
from interfering with the testsuite.
The code using GIT_TEMPLATE_DIR and GIT_TEXTDOMAINDIR handles only one
path in each of these variables, so this new behavior would be useless
for them.
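The difference between overwriting and prepending a path-list variable can be sketched as below. The actual wrappers are shell scripts; this Python model is only an illustration, `prepend_path` is a hypothetical helper, the directory names in the usage note are made up, and a colon separator is assumed.

```python
def prepend_path(env, name, directory, separator=":"):
    """Put `directory` in front of the path list stored in env[name].

    An unset or empty variable becomes just `directory`, so no stray
    separator is introduced.
    """
    current = env.get(name)
    if current:
        env[name] = directory + separator + current
    else:
        env[name] = directory
    return env[name]
```

Chaining two wrappers then keeps both entries: calling the helper with `/outer/lib` and afterwards with `/inner/lib` leaves `/inner/lib:/outer/lib`, whereas plain assignment would have discarded the first value.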
Signed-off-by: Benoit Person <benoit.person@ensimag.fr>
Signed-off-by: Matthieu Moy <matthieu.moy@grenoble-inp.fr>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Benoit Person [Thu, 4 Jul 2013 20:38:54 +0000 (22:38 +0200)]
git-remote-mediawiki: introduction of Git::Mediawiki.pm
We would want to allow the user to preview what he has edited locally
before pushing it out (and thus creating a non-removable revision in
the mediawiki's history).
This patch introduces a new perl package in which we will be able to
share code between that new tool and the remote helper:
git-remote-mediawiki.perl.
A perl package offers the best way to handle such a case: each script
can select what should be imported in its namespace. The package
namespacing limits the use of side effects in the shared code.
An alternate solution is to concatenate a "toolset" file with each
*.perl when 'make'-ing the project. In that scheme, everything is
imported in the script's namespace. Plus, files should be renamed in
order to chain to Git's toplevel makefile. Hence, this solution is not
acceptable.
Signed-off-by: Benoit Person <benoit.person@ensimag.fr> Signed-off-by: Matthieu Moy <matthieu.moy@grenoble-inp.fr> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Jeff King [Mon, 8 Jul 2013 07:30:41 +0000 (03:30 -0400)]
clone: drop connectivity check for local clones
Commit 0433ad1 (clone: run check_everything_connected,
2013-03-25) added the same connectivity check to clone that
we use for fetching. The intent was to provide enough safety
checks that "git clone git://..." could be counted on to
detect bit errors and other repo corruption, and not
silently propagate them to the clone.
For local clones, this turns out to be a bad idea, for two
reasons:
1. Local clones use hard linking (or even shared object
stores), and so complete far more quickly. The time
spent on the connectivity check is therefore
proportionally much more painful.
2. Local clones do not actually meet our safety guarantee
anyway. The connectivity check makes sure we have all
of the objects we claim to, but it does not check for
bit errors. We will notice bit errors in commits and
trees, but we do not load blob objects at all. Whereas
over the pack transport, we actually recompute the sha1
of each object in the incoming packfile; bit errors
change the sha1 of the object, which is then caught by
the connectivity check.
This patch drops the connectivity check in the local case.
Note that we have to revert the changes from 0433ad1 to
t5710, as we no longer notice the corruption during clone.
We could go a step further and provide a "verify even local
clones" option, but it is probably not worthwhile. You can
already spell that as "cd foo.git && git fsck && git clone ."
or as "git clone --no-local foo.git".
Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
John Keeping [Sun, 7 Jul 2013 19:02:14 +0000 (20:02 +0100)]
push: avoid suggesting "merging" remote changes
With some workflows, it is more suitable to rebase on top of remote
changes when a push does not fast-forward. Change the advice messages
in git-push to suggest that a user "integrate the remote changes"
instead of "merge the remote changes" to make this slightly clearer.
Also change the suggested 'git pull' to 'git pull ...' to hint to users
that they may want to add other parameters.
Suggested-by: Philip Oakley <philipoakley@iee.org> Signed-off-by: John Keeping <john@keeping.me.uk> Signed-off-by: Junio C Hamano <gitster@pobox.com>
John Keeping [Sun, 7 Jul 2013 19:49:56 +0000 (20:49 +0100)]
git-config(1): clarify precedence of multiple values
In order to clarify which value is used when there are multiple values
defined for a key, re-order the list of file locations so that it runs
from least specific to most specific. Then add a paragraph which simply
says that the last value will be used.
Signed-off-by: John Keeping <john@keeping.me.uk> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Jeff King [Sun, 7 Jul 2013 10:04:00 +0000 (06:04 -0400)]
teach sha1_object_info_extended a "disk_size" query
Using sha1_object_info_extended, a caller can find out the
type of an object, its size, and information about where it
is stored. In addition to the object's "true" size, it can
also be useful to know the size that the object takes on
disk (e.g., to generate statistics about which refs consume
space).
This patch adds a "disk_sizep" field to "struct object_info",
and fills it in during sha1_object_info_extended if it is
non-NULL.
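
The query-by-pointer pattern described above can be sketched roughly as
follows. This is a simplified illustration, not git's actual code: the
struct and function names are hypothetical stand-ins, and the sizes are
hard-coded rather than read from an object store. The caller
zero-initializes the struct so that every query it does not ask for
stays NULL.

```c
#include <stddef.h>

/*
 * Simplified stand-in for "struct object_info" (hypothetical name).
 * Each member is an optional output pointer; NULL means "do not
 * compute this item".
 */
struct object_info_sketch {
	unsigned long *sizep;      /* inflated ("true") size */
	unsigned long *disk_sizep; /* bytes occupied on disk */
};

/* Hypothetical lookup; the values are made up for illustration. */
int object_info_query(struct object_info_sketch *oi)
{
	if (oi->sizep)
		*oi->sizep = 4096;
	if (oi->disk_sizep)
		*oi->disk_sizep = 120;
	return 0;
}

/* Ask only for the on-disk size; sizep stays NULL and is skipped. */
unsigned long query_disk_size_only(void)
{
	struct object_info_sketch oi = { NULL };
	unsigned long disk_size = 0;

	oi.disk_sizep = &disk_size;
	object_info_query(&oi);
	return disk_size;
}
```

Because the caller only sets the pointers it cares about, adding a
further query item to the struct later would not require touching
query_disk_size_only().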
Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Jeff King [Sun, 7 Jul 2013 10:03:29 +0000 (06:03 -0400)]
zero-initialize object_info structs
The sha1_object_info_extended function expects the caller to
provide a "struct object_info" which contains pointers to
"query" items that will be filled in. The purpose of
providing pointers rather than storing the response directly
in the struct is so that callers can choose not to incur the
expense in finding particular fields that they do not care
about.
Right now the only query item is "sizep", and all callers
set it explicitly to choose whether or not to query it; they
can then leave the rest of the struct uninitialized.
However, as we add new query items, each caller will have to
be updated to explicitly turn off the new ones (by setting
them to NULL). Instead, let's teach each caller to
zero-initialize the struct, so that they do not have to
learn about each new query item added.
Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
The path of the file to be locked is held in lock_file::filename,
which is a fixed-length buffer of length PATH_MAX. This buffer is
also (temporarily) used to hold the path of the lock file, which is
the path of the file being locked plus ".lock". Because of this, the
path of the file being locked must be less than (PATH_MAX - 5)
characters long (5 chars are needed for ".lock" and one character for
the NUL terminator).
On entry into lock_file(), the path length was only verified to be
less than PATH_MAX characters, not less than (PATH_MAX - 5)
characters.
When and if resolve_symlink() is called, then that function is
correctly told to treat the buffer as (PATH_MAX - 5) characters long.
This part is correct. However:
* If LOCK_NODEREF was specified, then resolve_symlink() is never
called.
* If resolve_symlink() is called but the path is not a symlink, then
the length check is never applied.
So it is possible for a path with length (PATH_MAX - 5 <= len <
PATH_MAX) to make it through the checks. When ".lock" is strcat()ted
to such a path, the lock_file::filename buffer is overflowed.
Fix the problem by adding a check when entering lock_file() that the
original path is less than (PATH_MAX - 5) characters.
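
A minimal sketch of the up-front length check, assuming a simplified
stand-in for the lock_file struct. The names lock_file_sketch and
build_lock_path are hypothetical and this is not the code from the
patch; it only illustrates why the limit is (PATH_MAX - 5).

```c
#include <limits.h>
#include <string.h>

#ifndef PATH_MAX
#define PATH_MAX 4096 /* fallback for platforms that do not define it */
#endif

#define LOCK_SUFFIX ".lock"
#define LOCK_SUFFIX_LEN 5 /* strlen(".lock") */

/* Hypothetical stand-in for the fixed-length lock_file::filename. */
struct lock_file_sketch {
	char filename[PATH_MAX];
};

/*
 * Copy "path" into the buffer and append ".lock".
 * Returns 0 on success, -1 if the result would not fit.
 */
int build_lock_path(struct lock_file_sketch *lk, const char *path)
{
	size_t len = strlen(path);

	/* The buffer must hold the path, 5 suffix bytes and the NUL. */
	if (len >= (size_t)(PATH_MAX - LOCK_SUFFIX_LEN))
		return -1;
	memcpy(lk->filename, path, len);
	/* The "+ 1" copies the suffix's terminating NUL as well. */
	memcpy(lk->filename + len, LOCK_SUFFIX, LOCK_SUFFIX_LEN + 1);
	return 0;
}
```

With this check in place, the longest accepted path has
(PATH_MAX - 6) characters, which together with ".lock" and the NUL
fills the buffer exactly.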
[jc: with independent development by Peff]
Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu> Signed-off-by: Junio C Hamano <gitster@pobox.com>
René Scharfe [Sat, 6 Jul 2013 13:53:27 +0000 (15:53 +0200)]
diffcore-pickaxe: simplify has_changes and contains
Halve the number of callsites of contains() to two using temporary
variables, simplifying the code. While at it, get rid of the
diff_options parameter, which became unused with 8fa4b09f.
Signed-off-by: René Scharfe <rene.scharfe@lsrfire.ath.cx> Acked-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
The default similarity index of 50% is documented in gitdiffcore(7)
but it is worth also mentioning it in the description of the
-M/--find-renames option.
Signed-off-by: Fraser Tweedale <frase@frase.id.au> Signed-off-by: Junio C Hamano <gitster@pobox.com>
To test truncated log messages, the 'commit_msg' function uses `sed` to
cut a message. `sed` behaves differently across platforms, and its
results depend on the locales installed. So, avoid using `sed` and
use predefined expected outputs instead of calculated ones.
Signed-off-by: Alexey Shumkin <Alex.Crezoff@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
t6006 (rev-list-format): add tests for "%b" and "%s" for the case i18n.commitEncoding is not set
In de6029a (pretty: Add failing tests: --format output should honor
logOutputEncoding, 2013-06-26) the 'complex-subject' test was changed.
Revert that change, because it actually removed tests for "%b"
and "%s" with i18n.commitEncoding set. Also, add two more tests for
"%b" and "%s" to test encoding conversions with no
i18n.commitEncoding set.
Signed-off-by: Alexey Shumkin <Alex.Crezoff@gmail.com> Suggested-by: Johannes Sixt <j.sixt@viscovery.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
t4205, t6006, t7102: make functions better readable
Function 'test_format' has become harder to read after its change in de6029a2 (pretty: Add failing tests: --format output should honor
logOutputEncoding, 2013-06-26). Simplify it by moving its "should we
expect it to fail?" parameter to the end.
Note that the current code does not use this last parameter, as there
are no tests expected to fail. We can keep it for future use.
Also, reformat comments.
Signed-off-by: Alexey Shumkin <Alex.Crezoff@gmail.com> Improved-by: Johannes Sixt <j.sixt@viscovery.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
t4205 (log-pretty-formats): revert back single quotes
In the previous commit de6029a (pretty: Add failing tests: --format output
should honor logOutputEncoding, 2013-06-26), single quotes were replaced
with double quotes to make the "$(commit_msg)" expression in a heredoc
work. The same effect can be achieved by using "EOF" as the heredoc
delimiter instead of "\EOF".
Signed-off-by: Alexey Shumkin <Alex.Crezoff@gmail.com> Suggested-by: Johannes Sixt <j.sixt@viscovery.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Junio C Hamano [Fri, 5 Jul 2013 08:15:48 +0000 (01:15 -0700)]
Merge branch 'tr/test-v-and-v-subtest-only'
Allows N instances of tests run in parallel, each running 1/N parts
of the test suite under Valgrind, to speed things up.
* tr/test-v-and-v-subtest-only:
perf-lib: fix start/stop of perf tests
test-lib: support running tests under valgrind in parallel
test-lib: allow prefixing a custom string before "ok N" etc.
test-lib: valgrind for only tests matching a pattern
test-lib: verbose mode for only tests matching a pattern
test-lib: self-test that --verbose works
test-lib: rearrange start/end of test_expect_* and test_skip
test-lib: refactor $GIT_SKIP_TESTS matching
test-lib: enable MALLOC_* for the actual tests
Mark Levedahl [Thu, 4 Jul 2013 22:04:30 +0000 (18:04 -0400)]
test-lib.sh - cygwin does not have usable FIFOs
Do not use FIFOs on Cygwin; they do not work. Cygwin includes
coreutils, and so has mkfifo, and that command does something. However,
the resultant named pipe is known (on the Cygwin mailing list at
least) not to work correctly.
This disables PIPE for Cygwin, allowing t0008.sh to complete (all other
tests in that file work correctly).
Signed-off-by: Mark Levedahl <mlevedahl@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
t4041, t4205, t6006, t7102: use iso8859-1 rather than iso-8859-1
Both "iso8859-1" and "iso-8859-1" are understood as latin-1 by
modern platforms, but the latter is not understood by older
platforms; update tests to use the former.
This is in line with 3994e8a9 (t4201: use ISO8859-1 rather than
ISO-8859-1, 2009-12-03), which did the same.
Signed-off-by: Alexey Shumkin <Alex.Crezoff@gmail.com> Reviewed-by: Johannes Sixt <j6t@kdbg.org> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Tony Finch [Thu, 4 Jul 2013 17:02:12 +0000 (18:02 +0100)]
gitweb: allow extra breadcrumbs to prefix the trail
There are often parent pages logically above the gitweb projects
list, e.g. home pages of the organization and department that host
the gitweb server. This change allows you to include links to those
pages in gitweb's breadcrumb trail.
Signed-off-by: Tony Finch <dot@dotat.at> Reviewed-by: Jonathan Nieder <jrnieder@gmail.com> Acked-by: Jakub Narebski <jnareb@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
The commit code accepts pseudo-UTF-8 sequences that encode a character with more
bytes than necessary. Reject such sequences, since they are not valid UTF-8.
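
To illustrate what "more bytes than necessary" means, here is a small
standalone decoder sketch that rejects overlong sequences. It is not
the code from the patch; the function name decode_utf8_strict is
hypothetical, and the sketch deliberately checks only well-formedness
and overlong encodings, nothing else.

```c
#include <stddef.h>

/*
 * Decode one UTF-8 sequence from s (at most len bytes).
 * Returns the number of bytes consumed, or 0 if the sequence is
 * truncated, malformed, or overlong. On success *cp receives the
 * decoded code point.
 */
size_t decode_utf8_strict(const unsigned char *s, size_t len,
			  unsigned int *cp)
{
	/* Smallest code point that legitimately needs 1..4 bytes. */
	static const unsigned int min_cp[] = { 0, 0x0, 0x80, 0x800, 0x10000 };
	size_t need, i;
	unsigned int c;

	if (!len)
		return 0;
	if (s[0] < 0x80) {
		need = 1;
		c = s[0];
	} else if ((s[0] & 0xe0) == 0xc0) {
		need = 2;
		c = s[0] & 0x1f;
	} else if ((s[0] & 0xf0) == 0xe0) {
		need = 3;
		c = s[0] & 0x0f;
	} else if ((s[0] & 0xf8) == 0xf0) {
		need = 4;
		c = s[0] & 0x07;
	} else {
		return 0; /* stray continuation byte or invalid lead byte */
	}
	if (len < need)
		return 0;
	for (i = 1; i < need; i++) {
		if ((s[i] & 0xc0) != 0x80)
			return 0; /* not a continuation byte */
		c = (c << 6) | (s[i] & 0x3f);
	}
	if (c < min_cp[need])
		return 0; /* overlong: a shorter encoding exists */
	*cp = c;
	return need;
}
```

For example, the two-byte sequence 0xC0 0x80 decodes to U+0000, which
fits in a single byte, so the sketch rejects it.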
Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
The commit code already contains code for validating UTF-8, but it does not
check for invalid values, such as guaranteed non-characters and surrogates. Fix
this by explicitly checking for and rejecting such characters.
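
A rough sketch of such a check on an already-decoded code point is
shown below. The function name is hypothetical, and the exact set of
values the patch rejects is not spelled out in this message; the
sketch assumes the Unicode definitions of surrogates and
non-characters across all planes, plus an upper bound of U+10FFFF.

```c
/* Returns 1 if cp is acceptable in commit text, 0 otherwise. */
int is_acceptable_codepoint(unsigned int cp)
{
	if (cp > 0x10ffff)
		return 0; /* beyond the Unicode range (assumed extra check) */
	if (cp >= 0xd800 && cp <= 0xdfff)
		return 0; /* UTF-16 surrogate */
	if (cp >= 0xfdd0 && cp <= 0xfdef)
		return 0; /* non-character block U+FDD0..U+FDEF */
	if ((cp & 0xfffe) == 0xfffe)
		return 0; /* U+xxFFFE and U+xxFFFF non-characters */
	return 1;
}
```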
Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>