Jeff King [Sat, 5 May 2018 00:03:35 +0000 (20:03 -0400)]
verify_path: disallow symlinks in .gitmodules
There are a few reasons it's not a good idea to make
.gitmodules a symlink, including:
1. It won't be portable to systems without symlinks.
2. It may behave inconsistently, since Git may look at
this file in the index or a tree without bothering to
resolve any symbolic links. We don't do this _yet_, but
the config infrastructure is there and it's planned for
the future.
With some clever code, we could make (2) work. And some
people may not care about (1) if they only work on one
platform. But there are a few security reasons to simply
disallow it:
a. A symlinked .gitmodules file may circumvent any fsck
checks of the content.
b. Git may read and write from the on-disk file without
sanity checking the symlink target. So for example, if
you link ".gitmodules" to "../oops" and run "git
submodule add", we'll write to the file "oops" outside
the repository.
Again, both of those are problems that _could_ be solved
with sufficient code, but given the complications in (1) and
(2), we're better off just outlawing it explicitly.
Note the slightly tricky call to verify_path() in
update-index's update_one(). There we may not have a mode if
we're not updating from the filesystem (e.g., we might just
be removing the file). Passing "0" as the mode there works
fine; since it's not a symlink, we'll just skip the extra
checks.
Jeff King [Mon, 14 May 2018 15:00:56 +0000 (11:00 -0400)]
update-index: stat updated files earlier
In the update_one(), we check verify_path() on the proposed
path before doing anything else. In preparation for having
verify_path() look at the file mode, let's stat the file
earlier, so we can check the mode accurately.
This is made a bit trickier by the fact that this function
only does an lstat in a few code paths (the ones that flow
down through process_path()). So we can speculatively do the
lstat() here and pass the results down, and just use a dummy
mode for cases where we won't actually be updating the index
from the filesystem.
Jeff King [Sun, 13 May 2018 17:00:23 +0000 (13:00 -0400)]
verify_path: drop clever fallthrough
We check ".git" and ".." in the same switch statement, and
fall through the cases to share the end-of-component check.
While this saves us a line or two, it makes modifying the
function much harder. Let's just write it out.
Jeff King [Sun, 13 May 2018 16:57:14 +0000 (12:57 -0400)]
skip_prefix: add case-insensitive variant
We have the convenient skip_prefix() helper, but if you want
to do case-insensitive matching, you're stuck doing it by
hand. We could add an extra parameter to the function to
let callers ask for this, but the function is small and
somewhat performance-critical. Let's just re-implement it
for the case-insensitive version.
When we started to catch NTFS short names that clash with .git, we only
looked for GIT~1. This is sufficient because we only ever clone into an
empty directory, so .git is guaranteed to be the first subdirectory or
file in that directory.
However, even with a fresh clone, .gitmodules is *not* necessarily the
first file to be written that would want the NTFS short name GITMOD~1: a
malicious repository can add .gitmodul0000 and friends, which sorts
before `.gitmodules` and is therefore checked out *first*. For that
reason, we have to test not only for ~1 short names, but for others,
too.
It's hard to just adapt the existing checks in is_ntfs_dotgit(): since
Windows 2000 (i.e., in all Windows versions still supported by Git),
NTFS short names are only generated in the <prefix>~<number> form up to
number 4. After that, a *different* prefix is used, calculated from the
long file name using an undocumented, but stable algorithm.
For example, the short name of .gitmodules would be GITMOD~1, but if it
is taken, and all of ~2, ~3 and ~4 are taken, too, the short name
GI7EBA~1 will be used. From there, collisions are handled by
incrementing the number, shortening the prefix as needed (until ~9999999
is reached, in which case NTFS will not allow the file to be created).
We'd also want to handle .gitignore and .gitattributes, which suffer
from a similar problem, using the fall-back short names GI250A~1 and
GI7D29~1, respectively.
To accommodate for that, we could reimplement the hashing algorithm, but
it is just safer and simpler to provide the known prefixes. This
algorithm has been reverse-engineered and described at
https://usn.pw/blog/gen/2015/06/09/filenames/, which is defunct but
still available via https://web.archive.org/.
These can be recomputed by running the following Perl script:
Jeff King [Wed, 2 May 2018 19:23:45 +0000 (15:23 -0400)]
is_hfs_dotgit: match other .git files
Both verify_path() and fsck match ".git", ".GIT", and other
variants specific to HFS+. Let's allow matching other
special files like ".gitmodules", which we'll later use to
enforce extra restrictions via verify_path() and fsck.
Jeff King [Sun, 13 May 2018 16:09:42 +0000 (12:09 -0400)]
is_ntfs_dotgit: use a size_t for traversing string
We walk through the "name" string using an int, which can
wrap to a negative value and cause us to read random memory
before our array (e.g., by creating a tree with a name >2GB,
since "int" is still 32 bits even on most 64-bit platforms).
Worse, this is easy to trigger during the fsck_tree() check,
which is supposed to be protecting us from malicious
garbage.
Note one bit of trickiness in the existing code: we
sometimes assign -1 to "len" at the end of the loop, and
then rely on the "len++" in the for-loop's increment to take
it back to 0. This is still legal with a size_t, since
assigning -1 will turn into SIZE_MAX, which then wraps
around to 0 on increment.
Jeff King [Mon, 30 Apr 2018 07:25:25 +0000 (03:25 -0400)]
submodule-config: verify submodule names as paths
Submodule "names" come from the untrusted .gitmodules file,
but we blindly append them to $GIT_DIR/modules to create our
on-disk repo paths. This means you can do bad things by
putting "../" into the name (among other things).
Let's sanity-check these names to avoid building a path that
can be exploited. There are two main decisions:
1. What should the allowed syntax be?
It's tempting to reuse verify_path(), since submodule
names typically come from in-repo paths. But there are
two reasons not to:
a. It's technically more strict than what we need, as
we really care only about breaking out of the
$GIT_DIR/modules/ hierarchy. E.g., having a
submodule named "foo/.git" isn't actually
dangerous, and it's possible that somebody has
manually given such a funny name.
b. Since we'll eventually use this checking logic in
fsck to prevent downstream repositories, it should
be consistent across platforms. Because
verify_path() relies on is_dir_sep(), it wouldn't
block "foo\..\bar" on a non-Windows machine.
2. Where should we enforce it? These days most of the
.gitmodules reads go through submodule-config.c, so
I've put it there in the reading step. That should
cover all of the C code.
We also construct the name for "git submodule add"
inside the git-submodule.sh script. This is probably
not a big deal for security since the name is coming
from the user anyway, but it would be polite to remind
them if the name they pick is invalid (and we need to
expose the name-checker to the shell anyway for our
test scripts).
This patch issues a warning when reading .gitmodules
and just ignores the related config entry completely.
This will generally end up producing a sensible error,
as it works the same as a .gitmodules file which is
missing a submodule entry (so "submodule update" will
barf, but "git clone --recurse-submodules" will print
an error but not abort the clone.
There is one minor oddity, which is that we print the
warning once per malformed config key (since that's how
the config subsystem gives us the entries). So in the
new test, for example, the user would see three
warnings. That's OK, since the intent is that this case
should never come up outside of malicious repositories
(and then it might even benefit the user to see the
message multiple times).
Credit for finding this vulnerability and the proof of
concept from which the test script was adapted goes to
Etienne Stalmans.
Jeff King [Mon, 11 Sep 2017 14:24:26 +0000 (10:24 -0400)]
cvsimport: shell-quote variable used in backticks
We run `git rev-parse` though the shell, and quote its
argument only with single-quotes. This prevents most
metacharacters from being a problem, but misses the obvious
case when $name itself has single-quotes in it. We can fix
this by applying the usual shell-quoting formula.
Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Jeff King [Mon, 11 Sep 2017 15:27:51 +0000 (11:27 -0400)]
shell: drop git-cvsserver support by default
The git-cvsserver script is old and largely unmaintained
these days. But git-shell allows untrusted users to run it
out of the box, significantly increasing its attack surface.
Let's drop it from git-shell's list of internal handlers so
that it cannot be run by default. This is not backwards
compatible. But given the age and development activity on
CVS-related parts of Git, this is likely to impact very few
users, while helping many more (i.e., anybody who runs
git-shell and had no intention of supporting CVS).
There's no configuration mechanism in git-shell for us to
add a boolean and flip it to "off". But there is a mechanism
for adding custom commands, and adding CVS support here is
fairly trivial. Let's document it to give guidance to
anybody who really is still running cvsserver.
Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Junio C Hamano [Mon, 31 Jul 2017 20:51:05 +0000 (13:51 -0700)]
Merge branch 'pw/unquote-path-in-git-pm' into maint
Code refactoring.
* pw/unquote-path-in-git-pm:
t9700: add tests for Git::unquote_path()
Git::unquote_path(): throw an exception on bad path
Git::unquote_path(): handle '\a'
add -i: move unquote_path() to Git.pm
Junio C Hamano [Mon, 31 Jul 2017 20:51:05 +0000 (13:51 -0700)]
Merge branch 'jk/gc-pre-detach-under-hook' into maint
We run an early part of "git gc" that deals with refs before
daemonising (and not under lock) even when running a background
auto-gc, which caused multiple gc processes attempting to run the
early part at the same time. This is now prevented by running the
early part also under the GC lock.
* jk/gc-pre-detach-under-hook:
gc: run pre-detach operations under lock
Junio C Hamano [Mon, 31 Jul 2017 20:51:04 +0000 (13:51 -0700)]
Merge branch 'rs/progress-overall-throughput-at-the-end' into maint
The progress meter did not give a useful output when we haven't had
0.5 seconds to measure the throughput during the interval. Instead
show the overall throughput rate at the end, which is a much more
useful number.
* rs/progress-overall-throughput-at-the-end:
progress: show overall rate in last update
Junio C Hamano [Mon, 31 Jul 2017 20:51:04 +0000 (13:51 -0700)]
Merge branch 'tb/push-to-cygwin-unc-path' into maint
On Cygwin, similar to Windows, "git push //server/share/repository"
ought to mean a repository on a network share that can be accessed
locally, but this did not work correctly due to stripping the double
slashes at the beginning.
This may need to be heavily tested before it gets unleashed to the
wild, as the change is at a fairly low-level code and would affect
not just the code to decide if the push destination is local. There
may be unexpected fallouts in the path normalization.
* tb/push-to-cygwin-unc-path:
cygwin: allow pushing to UNC paths
Jeff King [Fri, 28 Jul 2017 19:28:55 +0000 (15:28 -0400)]
connect: reject paths that look like command line options
If we get a repo path like "-repo.git", we may try to invoke
"git-upload-pack -repo.git". This is going to fail, since
upload-pack will interpret it as a set of bogus options. But
let's reject this before we even run the sub-program, since
we would not want to allow any mischief with repo names that
actually are real command-line options.
You can still ask for such a path via git-daemon, but there's no
security problem there, because git-daemon enters the repo itself
and then passes "." on the command line.
Signed-off-by: Jeff King <peff@peff.net> Reviewed-by: Jonathan Nieder <jrnieder@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Jeff King [Fri, 28 Jul 2017 19:26:50 +0000 (15:26 -0400)]
connect: reject dashed arguments for proxy commands
If you have a GIT_PROXY_COMMAND configured, we will run it
with the host/port on the command-line. If a URL contains a
mischievous host like "--foo", we don't know how the proxy
command may handle it. It's likely to break, but it may also
do something dangerous and unwanted (technically it could
even do something useful, but that seems unlikely).
We should err on the side of caution and reject this before
we even run the command.
The hostname check matches the one we do in a similar
circumstance for ssh. The port check is not present for ssh,
but there it's not necessary because the syntax is "-p
<port>", and there's no ambiguity on the parsing side.
It's not clear whether you can actually get a negative port
to the proxy here or not. Doing:
git fetch git://remote:-1234/repo.git
keeps the "-1234" as part of the hostname, with the default
port of 9418. But it's a good idea to keep this check close
to the point of running the command to make it clear that
there's no way to circumvent it (and at worst it serves as a
belt-and-suspenders check).
Signed-off-by: Jeff King <peff@peff.net> Reviewed-by: Jonathan Nieder <jrnieder@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Jeff King [Fri, 28 Jul 2017 19:25:45 +0000 (15:25 -0400)]
connect: factor out "looks like command line option" check
We reject hostnames that start with a dash because they may
be confused for command-line options. Let's factor out that
notion into a helper function, as we'll use it in more
places. And while it's simple now, it's not clear if some
systems might need more complex logic to handle all cases.
Signed-off-by: Jeff King <peff@peff.net> Reviewed-by: Jonathan Nieder <jrnieder@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Junio C Hamano [Wed, 26 Jul 2017 17:24:20 +0000 (10:24 -0700)]
connect: reject ssh hostname that begins with a dash
When commands like "git fetch" talk with ssh://$rest_of_URL/, the
code splits $rest_of_URL into components like host, port, etc., and
then spawns the underlying "ssh" program by formulating argv[] array
that has:
- the path to ssh command taken from GIT_SSH_COMMAND, etc.
- dashed options like '-batch' (for Tortoise), '-p <port>' as
needed.
- ssh_host, which is supposed to be the hostname parsed out of
$rest_of_URL.
- then the command to be run on the other side, e.g. git
upload-pack.
If the ssh_host ends up getting '-<anything>', the argv[] that is
used to spawn the command becomes something like:
which obviously is bogus, but depending on the actual value of
"<anything>", will make "ssh" parse and use it as an option.
Prevent this by forbidding ssh_host that begins with a "-".
Noticed-by: Joern Schneeweisz of Recurity Labs Reported-by: Brian at GitLab Signed-off-by: Junio C Hamano <gitster@pobox.com> Reviewed-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Jeff King [Fri, 28 Jul 2017 21:47:48 +0000 (17:47 -0400)]
t/lib-proto-disable: restore protocol.allow after config tests
The tests for protocol.allow actually set that variable in
the on-disk config, run a series of tests, and then never
clean up after themselves. This means that whatever tests we
run after have protocol.allow=never, which may influence
their results.
In most cases we either exit after running these tests, or
do another round of test_proto(). In the latter case, this happens to
work because:
1. Tests of the GIT_ALLOW_PROTOCOL environment variable
override the config.
2. Tests of the specific config "protocol.foo.allow"
override the protocol.allow config.
3. The next round of protocol.allow tests start off by
setting the config to a known value.
However, it's a land-mine waiting to trap somebody adding
new tests to one of the t581x test scripts. Let's make sure
we clean up after ourselves.
Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
doc: reformat the paragraph containing the 'cut-line'
The paragraph that describes the 'scissors' cleanup mode of
'commit' had the 'cut-line' in the middle of a sentence. This
made it possible for the line to get wrapped on smaler windows.
This shouldn't be the case as it makes it hard for the user to
understand the structure of the cut-line.
Reformat the pragraph to make the 'cut-line' stand on a line of
it's own thus distinguishing it from the rest of the paragraph.
This further prevents it from getting wrapped to some extent.
Signed-off-by: Kaartic Sivaraam <kaarticsivaraam91196@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Jeff King [Sun, 16 Jul 2017 10:45:32 +0000 (06:45 -0400)]
t: handle EOF in test_copy_bytes()
The test_copy_bytes() function claims to read up to N bytes,
or until it gets EOF. But we never handle EOF in our loop,
and a short input will cause perl to go into an infinite
loop of read() getting zero bytes.
Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
t1300: demonstrate that CamelCased aliases regressed
It is totally legitimate to add CamelCased aliases, but due to the way
config keys are compared, the case does not matter.
Except that now it does: the alias name is expected to be all
lower-case. This is a regression introduced by a9bcf6586d1 (alias: use
the early config machinery to expand aliases, 2017-06-14).
Noticed by Alejandro Pauly, diagnosed by Kevin Willford.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Junio C Hamano [Wed, 12 Jul 2017 22:23:09 +0000 (15:23 -0700)]
Merge branch 'kn/ref-filter-branch-list' into maint
The rewrite of "git branch --list" using for-each-ref's internals
that happened in v2.13 regressed its handling of color.branch.local;
this has been fixed.
* kn/ref-filter-branch-list:
ref-filter.c: drop return from void function
branch: set remote color in ref-filter branch immediately
branch: use BRANCH_COLOR_LOCAL in ref-filter format
branch: only perform HEAD check for local branches
Junio C Hamano [Wed, 12 Jul 2017 22:20:35 +0000 (15:20 -0700)]
Merge branch 'jk/reflog-walk-maint' into maint
After "git branch --move" of the currently checked out branch, the
code to walk the reflog of HEAD via "log -g" and friends
incorrectly stopped at the reflog entry that records the renaming
of the branch.
* jk/reflog-walk-maint:
reflog-walk: include all fields when freeing complete_reflogs
reflog-walk: don't free reflogs added to cache
reflog-walk: duplicate strings in complete_reflogs list
reflog-walk: skip over double-null oid due to HEAD rename
Jeff King [Tue, 11 Jul 2017 09:06:35 +0000 (05:06 -0400)]
gc: run pre-detach operations under lock
We normally try to avoid having two auto-gc operations run
at the same time, because it wastes resources. This was done
long ago in 64a99eb47 (gc: reject if another gc is running,
unless --force is given, 2013-08-08).
When we do a detached auto-gc, we run the ref-related
commands _before_ detaching, to avoid confusing lock
contention. This was done by 62aad1849 (gc --auto: do not
lock refs in the background, 2014-05-25).
These two features do not interact well. The pre-detach
operations are run before we check the gc.pid lock, meaning
that on a busy repository we may run many of them
concurrently. Ideally we'd take the lock before spawning any
operations, and hold it for the duration of the program.
This is tricky, though, with the way the pid-file interacts
with the daemonize() process. Other processes will check
that the pid recorded in the pid-file still exists. But
detaching causes us to fork and continue running under a
new pid. So if we take the lock before detaching, the
pid-file will have a bogus pid in it. We'd have to go back
and update it with the new pid after detaching. We'd also
have to play some tricks with the tempfile subsystem to
tweak the "owner" field, so that the parent process does not
clean it up on exit, but the child process does.
Instead, we can do something a bit simpler: take the lock
only for the duration of the pre-detach work, then detach,
then take it again for the post-detach work. Technically,
this means that the post-detach lock could lose to another
process doing pre-detach work. But in the long run this
works out.
That second process would then follow-up by doing
post-detach work. Unless it was in turn blocked by a third
process doing pre-detach work, and so on. This could in
theory go on indefinitely, as the pre-detach work does not
repack, and so need_to_gc() will continue to trigger. But
in each round we are racing between the pre- and post-detach
locks. Eventually, one of the post-detach locks will win the
race and complete the full gc. So in the worst case, we may
racily repeat the pre-detach work, but we would never do so
simultaneously (it would happen via a sequence of serialized
race-wins).
Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Jonathan Nieder [Mon, 10 Jul 2017 23:35:25 +0000 (16:35 -0700)]
pre-rebase hook: capture documentation in a <<here document
Without this change, the sample hook does not pass a syntax check
(sh -n):
$ sh -n hooks--pre-rebase.sample
hooks--pre-rebase.sample: line 101: syntax error near unexpected token `('
hooks--pre-rebase.sample: line 101: ` merged into it again (either directly or indirectly).'
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com> Signed-off-by: Stefan Beller <sbeller@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Junio C Hamano [Mon, 10 Jul 2017 20:59:07 +0000 (13:59 -0700)]
Merge branch 'js/t5534-rev-parse-gives-multi-line-output-fix' into maint
A few tests that tried to verify the contents of push certificates
did not use 'git rev-parse' to formulate the line to look for in
the certificate correctly.
Junio C Hamano [Mon, 10 Jul 2017 20:59:05 +0000 (13:59 -0700)]
Merge branch 'cc/shared-index-permfix' into maint
The split index code did not honor core.sharedrepository setting
correctly.
* cc/shared-index-permfix:
t1700: make sure split-index respects core.sharedrepository
t1301: move modebits() to test-lib-functions.sh
read-cache: use shared perms when writing shared index
Junio C Hamano [Mon, 10 Jul 2017 20:58:59 +0000 (13:58 -0700)]
Merge branch 'pw/rebase-i-regression-fix-tests' into maint
Fix a recent regression to "git rebase -i" and add tests that would
have caught it and others.
* pw/rebase-i-regression-fix-tests:
t3420: fix under GETTEXT_POISON build
rebase: add more regression tests for console output
rebase: add regression tests for console output
rebase -i: add test for reflog message
sequencer: print autostash messages to stderr
Junio C Hamano [Mon, 10 Jul 2017 20:58:58 +0000 (13:58 -0700)]
Merge branch 'jk/add-p-commentchar-fix' into maint
"git add -p" were updated in 2.12 timeframe to cope with custom
core.commentchar but the implementation was buggy and a
metacharacter like $ and * did not work.
Junio C Hamano [Mon, 10 Jul 2017 20:58:57 +0000 (13:58 -0700)]
Merge branch 'js/alias-early-config' into maint
The code to pick up and execute command alias definition from the
configuration used to switch to the top of the working tree and
then come back when the expanded alias was executed, which was
unnecessarilyl complex. Attempt to simplify the logic by using the
early-config mechanism that does not chdir around.
* js/alias-early-config:
alias: use the early config machinery to expand aliases
t7006: demonstrate a problem with aliases in subdirectories
t1308: relax the test verifying that empty alias values are disallowed
help: use early config when autocorrecting aliases
config: report correct line number upon error
discover_git_directory(): avoid setting invalid git_dir
Junio C Hamano [Mon, 10 Jul 2017 20:58:57 +0000 (13:58 -0700)]
Merge branch 'rs/pretty-add-again' into maint
The pretty-format specifiers like '%h', '%t', etc. had an
optimization that no longer works correctly. In preparation/hope
of getting it correctly implemented, first discard the optimization
that is broken.
* rs/pretty-add-again:
pretty: recalculate duplicate short hashes
Reported-by: Andre Hinrichs <andre.hinrichs@gmx.de> Signed-off-by: Ralf Thielow <ralf.thielow@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
The first illustration of the "RECOVERING FROM UPSTREAM REBASE"
section in the 'git-rebase' documentation meant to depict that
there are number of commits on the 'master' branch, but it is
longer than the 'master' branch in the following illustrations
by one commit, even though there is no resetting of 'master' to
lose that commit.
Correct it.
Signed-off-by: Kaartic Sivaraam <kaarticsivaraam91196@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
René Scharfe [Sat, 8 Jul 2017 16:43:42 +0000 (18:43 +0200)]
progress: show overall rate in last update
The values in struct throughput are only updated every 0.5 seconds. If
we're all done before that time span then the final update will show a
rate of 0 bytes/s, which is misleading if some bytes had been handled.
Remember the start time and show the total throughput instead.
And avoid division by zero by enforcing a minimum time span value of 1
(unit: 1/1024th of a second). That makes the resulting rate an
underestimation, but it's closer to the actual value than the currently
shown 0 bytes/s.
Reported-by: 積丹尼 Dan Jacobson <jidanni@jidanni.org> Signed-off-by: Rene Scharfe <l.s.r@web.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>