makefile: auto-detect presence of various Lua, bsd
We favor LuaJIT over Lua. We disable Lua if neither can be found. We
error out if a particular Lua is specified via LUA_IMPLEMENTATION=JIT or
LUA_IMPLEMENTATION=VANILLA, but cannot be found. We print a status
message depending on what happens.
Also, we do not link against libdl on the BSDs, since they include it as
part of libc.
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Since the email filter is called from lots of places, the script might
benefit from knowing the origin. That way it can modify its contents
and/or size depending on where it is called from.
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
So that we don't have to include the if(filter) open_filter(filter)
block everywhere, we introduce the guard in the function itself. This
should simplify quite a bit of code.
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
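A minimal sketch of the guard: the NULL check moves into the open and close functions, so call sites can pass a possibly-NULL filter directly. The struct is a cut-down, hypothetical stand-in for cgit's struct cgit_filter, and the return values are assumptions.

```c
#include <stddef.h>

/* Hypothetical, simplified filter object; the real struct has more fields. */
struct cgit_filter {
	int opened; /* stand-in for the real per-filter state */
};

/* The NULL check lives inside the function, so callers no longer need
 * "if (filter) cgit_open_filter(filter);" at every call site. */
int cgit_open_filter(struct cgit_filter *filter)
{
	if (!filter)
		return 0;
	filter->opened = 1; /* real code would start the filter here */
	return 0;
}

int cgit_close_filter(struct cgit_filter *filter)
{
	if (!filter)
		return 0;
	filter->opened = 0; /* real code would stop the filter here */
	return 0;
}
```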
Filters can now call hook_write and unhook_write if they want to
redirect writing to stdout to a different function. This saves us from
potential file descriptor pipes and other less efficient mechanisms.
We do this instead of replacing the call in html_raw because in some
places stdlib's printf functions are used (ui-patch, or within git
itself), and these have their own internal buffering, which makes it
difficult to interlace our function calls. So, we dlsym libc's write and
then override it in the link stage.
While we're at it, we move considerations of argument count into the
generic new filter handler.
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
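A rough sketch of the dispatch idea only. It does not override libc's write() or call dlsym(); a hypothetical filtered_write() stands in for the overridden symbol so the example stays self-contained. The hook signature (context pointer plus buffer) and the capture_hook example are assumptions, not cgit's API.

```c
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical hook signature. */
typedef ssize_t (*write_fn)(void *ctx, const void *buf, size_t count);

static write_fn current_hook;
static void *current_ctx;

void hook_write(write_fn fn, void *ctx)
{
	current_hook = fn;
	current_ctx = ctx;
}

void unhook_write(void)
{
	current_hook = NULL;
	current_ctx = NULL;
}

/* Stand-in for the overridden write(): writes to stdout are diverted to
 * the hook when one is installed; everything else goes to write(2). */
ssize_t filtered_write(int fd, const void *buf, size_t count)
{
	if (fd == STDOUT_FILENO && current_hook)
		return current_hook(current_ctx, buf, count);
	return write(fd, buf, count);
}

/* Example hook: appends output to a fixed-size buffer, truncating. */
struct capture {
	char data[256];
	size_t len;
};

ssize_t capture_hook(void *ctx, const void *buf, size_t count)
{
	struct capture *c = ctx;

	if (count > sizeof(c->data) - c->len)
		count = sizeof(c->data) - c->len;
	memcpy(c->data + c->len, buf, count);
	c->len += count;
	return (ssize_t)count;
}
```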
At some point, we're going to want to do lazy deallocation of filters.
For example, if we implement lua, we'll want to load the lua runtime
once for each filter, even if that filter is called many times.
Similarly, for persistent exec filters, we'll want to load it once,
despite many open_filter and close_filter calls, and only reap the child
process at the end of the cgit process. For this reason, we add here a
cleanup function that is called at the end of cgit's main().
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
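A sketch of end-of-process cleanup, assuming a simple fixed-size registry of filters that each carry an optional cleanup callback. The registry, register_filter() and the callback field are hypothetical; only the idea of a single cleanup call at the end of main() comes from the commit.

```c
#include <stddef.h>

/* Hypothetical filter with an optional cleanup callback. */
struct cgit_filter {
	void (*cleanup)(struct cgit_filter *);
	int live; /* e.g. a loaded Lua state or a persistent child process */
};

#define MAX_FILTERS 16
static struct cgit_filter *filters[MAX_FILTERS];
static size_t filter_count;

void register_filter(struct cgit_filter *f)
{
	if (filter_count < MAX_FILTERS)
		filters[filter_count++] = f;
}

/* Called once at the end of main(): releases resources that were kept
 * alive across many open/close calls. */
void cgit_cleanup_filters(void)
{
	for (size_t i = 0; i < filter_count; i++)
		if (filters[i]->cleanup)
			filters[i]->cleanup(filters[i]);
	filter_count = 0;
}

/* Example callback: marks the filter's resources as released. */
void mark_dead(struct cgit_filter *f)
{
	f->live = 0;
}
```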
John Keeping [Sun, 12 Jan 2014 17:13:53 +0000 (17:13 +0000)]
filter: introduce "filter type" prefix
This allows different filter implementations to be specified in the
configuration file. Currently only "exec" is supported, but it may now
be specified either with or without the "exec:" prefix.
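A sketch of parsing the optional "exec:" prefix. parse_filter_spec() is a hypothetical helper. Since only "exec" exists at this point, a spec without the prefix is treated as an exec command.

```c
#include <string.h>

/* Split a configured filter spec into a type and a command. */
const char *parse_filter_spec(const char *spec, const char **type)
{
	static const char exec_prefix[] = "exec:";
	size_t n = sizeof(exec_prefix) - 1;

	*type = "exec"; /* the only type supported so far */
	if (strncmp(spec, exec_prefix, n) == 0)
		return spec + n; /* "exec:/usr/bin/foo" -> "/usr/bin/foo" */
	return spec;             /* bare "/usr/bin/foo" */
}
```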
John Keeping [Sun, 12 Jan 2014 17:13:52 +0000 (17:13 +0000)]
filter: add interface layer
Change the existing cgit_{open,close,fprintf}_filter functions to
delegate to filter-specific implementations accessed via function
pointers on the cgit_filter object.
We treat the "exec" filter type slightly specially here by putting its
structure definition in the header file and providing an "init" function
to set up the function pointers. This is required so that the
ui-snapshot.c code that applies a compression filter can continue to use
the filter interface to do so.
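A simplified sketch of the function-pointer interface and of an exec type that embeds the base struct as its first member. The member names, the init signature and the stubbed bodies are assumptions. The real exec implementation forks a child and redirects stdout.

```c
#include <stdio.h>

/* Hypothetical shape of the interface. */
struct cgit_filter {
	int (*open)(struct cgit_filter *);
	int (*close)(struct cgit_filter *);
	void (*print)(struct cgit_filter *, FILE *, const char *prefix);
};

/* The generic entry points only delegate to the implementation. */
int cgit_open_filter(struct cgit_filter *f) { return f->open(f); }
int cgit_close_filter(struct cgit_filter *f) { return f->close(f); }
void cgit_fprintf_filter(struct cgit_filter *f, FILE *out, const char *prefix)
{
	f->print(f, out, prefix);
}

/* The exec type embeds the base struct first, so &ef.base can be used
 * wherever a struct cgit_filter * is expected. */
struct cgit_exec_filter {
	struct cgit_filter base;
	char *cmd;
	int opened;
};

static int open_exec(struct cgit_filter *base)
{
	((struct cgit_exec_filter *)base)->opened = 1; /* real code: fork/exec */
	return 0;
}

static int close_exec(struct cgit_filter *base)
{
	((struct cgit_exec_filter *)base)->opened = 0; /* real code: reap child */
	return 0;
}

static void print_exec(struct cgit_filter *base, FILE *out, const char *prefix)
{
	fprintf(out, "%sexec:%s\n", prefix, ((struct cgit_exec_filter *)base)->cmd);
}

void cgit_exec_filter_init(struct cgit_exec_filter *f, char *cmd)
{
	f->base.open = open_exec;
	f->base.close = close_exec;
	f->base.print = print_exec;
	f->cmd = cmd;
	f->opened = 0;
}
```

Because print_repo only calls cgit_fprintf_filter(), it no longer needs to know which fields a given filter type has.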
John Keeping [Sun, 12 Jan 2014 17:13:51 +0000 (17:13 +0000)]
filter: add fprintf_filter function
This stops the code in cgit.c::print_repo from needing to inspect the
cgit_filter structure, meaning that we can abstract out different filter
types that will have different fields that need to be printed.
Stefan Tatschner [Mon, 13 Jan 2014 21:10:45 +0000 (22:10 +0100)]
filters: Improved syntax-highlighting.py
- Switched back to python2 because of a problem in pygments with python3.
With the next release of pygments this problem should be fixed.
See the issue here:
https://bitbucket.org/birkenfeld/pygments-main/issue/901/problems-with-python3
- Just read stdin, decode it as utf-8 and ignore undecodable characters. This
ensures that even corrupted files do not cause any errors in the filter.
- Improved language guessing:
-> First use guess_lexer_for_filename for better detection of the
programming language used (even mixed cases are detected, e.g. php + html).
-> If nothing was found, check whether there is a shebang and use guess_lexer.
-> As the default/fallback, choose TextLexer.
Signed-off-by: Stefan Tatschner <stefan@sevenbyte.org>
John Keeping [Sun, 12 Jan 2014 19:45:16 +0000 (19:45 +0000)]
ui-shared: URL-escape script_name
As far as I know, there is no requirement that $SCRIPT_NAME contain only
URL-safe characters, so we need to make sure that any special characters
are escaped.
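A minimal percent-encoding sketch for a path such as $SCRIPT_NAME. It assumes the RFC 3986 unreserved set plus '/' may pass through unchanged. url_escape_path() is a hypothetical helper, not the function cgit uses.

```c
#include <string.h>

/* Percent-encode every byte outside [A-Za-z0-9-._~/]. Writes at most
 * outsz bytes including the NUL; returns 0 on success, -1 if out is too
 * small. */
int url_escape_path(const char *in, char *out, size_t outsz)
{
	static const char hex[] = "0123456789ABCDEF";
	size_t o = 0;

	for (const unsigned char *p = (const unsigned char *)in; *p; p++) {
		int keep = (*p >= 'A' && *p <= 'Z') || (*p >= 'a' && *p <= 'z') ||
			   (*p >= '0' && *p <= '9') || strchr("-._~/", *p) != NULL;

		if (keep) {
			if (o + 1 >= outsz)
				return -1;
			out[o++] = (char)*p;
		} else {
			if (o + 3 >= outsz)
				return -1;
			out[o++] = '%';
			out[o++] = hex[*p >> 4];
			out[o++] = hex[*p & 0xF];
		}
	}
	if (o >= outsz)
		return -1;
	out[o] = '\0';
	return 0;
}
```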
John Keeping [Sun, 12 Jan 2014 17:13:50 +0000 (17:13 +0000)]
filter: pass extra arguments via cgit_open_filter
This avoids poking into the filter data structure at various points in
the code. We rely on the fact that the number of arguments is fixed
based on the filter type (set in cgit_new_filter) and that the call
sites all know which filter type they're using.
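A sketch of fixed-count variadic arguments, assuming each filter records its extra_args count when it is created and that callers pass exactly that many char * arguments. MAX_EXTRA_ARGS and the argv array are hypothetical.

```c
#include <stdarg.h>

#define MAX_EXTRA_ARGS 4

struct cgit_filter {
	int extra_args;                   /* fixed per filter type */
	const char *argv[MAX_EXTRA_ARGS]; /* filled in on each open */
};

int cgit_open_filter(struct cgit_filter *filter, ...)
{
	va_list ap;

	if (filter->extra_args > MAX_EXTRA_ARGS)
		return -1;
	va_start(ap, filter);
	for (int i = 0; i < filter->extra_args; i++)
		filter->argv[i] = va_arg(ap, char *);
	va_end(ap);
	/* ...real code would now start the filter with these arguments */
	return 0;
}
```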
John Keeping [Sun, 12 Jan 2014 17:13:49 +0000 (17:13 +0000)]
ui-snapshot: set unused cgit_filter fields to zero
By switching the assignment of fields in the cgit_filter structure to
use designated initializers, the compiler will initialize all other
fields to their default value. This will be needed when we add the
extra_args field in the next patch.
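The C rule the commit relies on, shown on a hypothetical cut-down struct. Members not named in a designated initializer are zero-initialized, so a field added later (here extra_args) starts at 0 without touching the initializer.

```c
/* Hypothetical struct; only the initializer rule matters here. */
struct cgit_filter {
	const char *cmd;
	int extra_args;
	void *ctx;
};

struct cgit_filter make_snapshot_filter(const char *cmd)
{
	/* extra_args becomes 0 and ctx a null pointer automatically. */
	struct cgit_filter f = {
		.cmd = cmd,
	};
	return f;
}
```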
==18344== Conditional jump or move depends on uninitialised value(s)
==18344== at 0x406C83: open_slot (cache.c:63)
==18344== by 0x407478: cache_ls (cache.c:403)
==18344== by 0x404C9A: process_request (cgit.c:639)
==18344== by 0x406BD2: fill_slot (cache.c:190)
==18344== by 0x4071A0: cache_process (cache.c:284)
==18344== by 0x404461: main (cgit.c:952)
==18344== Uninitialised value was created by a stack allocation
==18344== at 0x40738B: cache_ls (cache.c:375)
This is caused by the keylen field being used to calculate whether or
not a slot is matched. We never then check the value of this and the
length of data read depends on the key length read from the file so this
isn't dangerous, but it's nice to avoid branching based on uninitialized
data.
It's only used in one place, and not useful to have around since
close_filter will die() if exit_status isn't what it expects, anyway. So
this is best as just a local variable instead of as part of the struct.
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
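A sketch of keeping the exit status local to the close path. close_filter_child() is hypothetical. It returns -1 where the real close_filter would die().

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

int close_filter_child(int pipe_fd, pid_t pid)
{
	int exit_status; /* local: nothing else needs it */

	if (pipe_fd >= 0)
		close(pipe_fd); /* let the child see EOF on its input */
	if (waitpid(pid, &exit_status, 0) < 0)
		return -1;
	if (WIFEXITED(exit_status) && WEXITSTATUS(exit_status) == 0)
		return 0;
	return -1;
}
```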
Lukas Fleischer [Fri, 10 Jan 2014 13:55:30 +0000 (14:55 +0100)]
cgit.c: Use "else" for mutually exclusive branches
When parsing command line arguments, no pair of command line options can
ever match simultaneously. Use "else if" blocks to reflect this. This
change improves both readability and speed.
Lukas Fleischer [Fri, 10 Jan 2014 11:44:36 +0000 (12:44 +0100)]
Disallow use of undocumented snapshot delimiters
Since the introduction of selective snapshot format configuration in
dc3c9b5 (allow selective enabling of snapshots, 2007-07-21), we allowed
seven different delimiters for snapshot formats, while the documentation
has always been clear about spaces being the only valid delimiter:
The value is a space-separated list of zero or more of the values
"tar", "tar.gz", "tar.bz2", "tar.xz" and "zip".
Supporting the undocumented delimiters makes the code unnecessarily
complex. Remove them.
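A sketch of parsing the documented space-separated list into a bitmask. The format table mirrors the documentation quoted above. The bitmask representation and parse_snapshot_formats() are assumptions. Any other delimiter produces a single unrecognized word.

```c
#include <string.h>

static const char *const formats[] = { "tar", "tar.gz", "tar.bz2", "tar.xz", "zip" };

/* Bit i set in the result means formats[i] is enabled. Unknown words
 * are ignored. */
int parse_snapshot_formats(const char *list)
{
	int mask = 0;
	const char *p = list;

	while (*p) {
		const char *end = strchr(p, ' ');
		size_t len = end ? (size_t)(end - p) : strlen(p);

		for (size_t i = 0; i < sizeof(formats) / sizeof(formats[0]); i++)
			if (strlen(formats[i]) == len && strncmp(formats[i], p, len) == 0)
				mask |= 1 << i;
		p += len;
		while (*p == ' ')
			p++;
	}
	return mask;
}
```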
John Keeping [Sun, 6 Oct 2013 11:14:41 +0000 (12:14 +0100)]
plain: don't append charset for binary MIME types
When outputting the Content-Type HTTP header we print the MIME type and
then append "; charset=<charset>" if the charset variable is non-null.
We don't want a charset when we have selected "application/octet-stream"
or when the user has specified a custom MIME type, since they may have
specified their own charset. To avoid this, make sure we set the page's
charset to NULL in ui-plain before we generate the HTTP headers.
Signed-off-by: John Keeping <john@keeping.me.uk>
Signed-off-by: Lukas Fleischer <cgit@cryptocrack.de>
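A sketch of the header logic, assuming a page struct whose charset field is NULL when no charset should be sent. struct page, set_plain_mimetype() and format_content_type() are hypothetical names.

```c
#include <stdio.h>
#include <string.h>

struct page {
	const char *mimetype;
	const char *charset; /* NULL means: do not append a charset */
};

/* Clear the charset before generating headers for a binary or
 * user-specified MIME type. */
void set_plain_mimetype(struct page *page, const char *mimetype, int user_supplied)
{
	page->mimetype = mimetype;
	if (user_supplied || strcmp(mimetype, "application/octet-stream") == 0)
		page->charset = NULL;
}

/* Append "; charset=..." only when the charset is non-NULL. */
int format_content_type(const struct page *page, char *buf, size_t size)
{
	if (page->charset)
		return snprintf(buf, size, "Content-Type: %s; charset=%s",
				page->mimetype, page->charset);
	return snprintf(buf, size, "Content-Type: %s", page->mimetype);
}
```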
Lukas Fleischer [Tue, 27 Aug 2013 08:40:51 +0000 (10:40 +0200)]
ui-shared: Drop filepair_cb_raw() and helper
Remove filepair_cb_raw() and all related functions. These are no longer
needed. We now use Git's internal functions for raw diff formatting
everywhere.
Lukas Fleischer [Tue, 20 Aug 2013 16:56:13 +0000 (18:56 +0200)]
ui-patch.c: Use log_tree_commit() to generate diffs
Instead of using our own formatting, use log_tree_commit() from Git to
create patches. This removes unnecessary duplicate code and also fixes a
bug with e-mail address formatting that existed in our own
implementation.
Lukas Fleischer [Tue, 20 Aug 2013 16:56:12 +0000 (18:56 +0200)]
ui-diff: Check the return value of get_sha1()
Sync with what we do everywhere else and check the return value of
get_sha1() instead of calling sha1_object_info() to validate the object.
Note that we later call lookup_commit_reference(), which checks that
both SHA1 values refer to commits, anyway.
Lukas Fleischer [Fri, 28 Jun 2013 08:58:14 +0000 (08:58 +0000)]
Fix section-from-path > 1
After finding the first path separator at position i, we invoked
strchr() on the same position i in subsequent iterations, resulting in
the same path separator being returned by strchr() over and
over again. Increase the position by one to skip the occurrence that has
just been found and advance to the next separator.
Reported-by: Konstantin Ryabitsev <mricon@kernel.org>
Signed-off-by: Lukas Fleischer <cgit@cryptocrack.de>
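A sketch of the corrected loop, assuming the goal is the length of the first n components of a path. The key line resumes the strchr() search one past the separator just found. first_components_len() is hypothetical.

```c
#include <string.h>

/* Length of the first n components of path, or strlen(path) if path has
 * fewer than n separators. Returns 0 for n <= 0. */
size_t first_components_len(const char *path, int n)
{
	const char *sep = NULL;
	const char *from = path;

	while (n-- > 0) {
		sep = strchr(from, '/');
		if (!sep)
			return strlen(path);
		from = sep + 1; /* the bug was searching from sep again */
	}
	return sep ? (size_t)(sep - path) : 0;
}
```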
Lukas Fleischer [Tue, 4 Jun 2013 14:47:53 +0000 (14:47 +0000)]
Use strbuf for reading configuration files
Use struct strbuf from Git instead of fixed-size buffers to remove the
limit on the length of configuration file lines and refactor
read_config_line() to improve readability.
Note that this also fixes a buffer overflow that existed with the
original fixed-size buffer implementation.
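A sketch of reading configuration lines with no length limit. To stay self-contained it swaps in POSIX getline() for Git's strbuf. The "#" comment handling and skipping lines without "=" are assumptions, and this read_config_line() is not cgit's implementation.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>

/* Read the next "name=value" line, skipping blank lines and '#'
 * comments. Returns 1 and stores malloc'd copies in *name and *value
 * (caller frees), or 0 at EOF. getline() grows its buffer as needed. */
int read_config_line(FILE *f, char **name, char **value)
{
	char *line = NULL;
	size_t cap = 0;
	ssize_t len;

	while ((len = getline(&line, &cap, f)) != -1) {
		char *eq;

		if (len > 0 && line[len - 1] == '\n')
			line[--len] = '\0';
		if (len == 0 || line[0] == '#')
			continue;
		eq = strchr(line, '=');
		if (!eq)
			continue; /* assumption: ignore malformed lines */
		*eq = '\0';
		*name = strdup(line);
		*value = strdup(eq + 1);
		free(line);
		return 1;
	}
	free(line);
	return 0;
}
```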
We've long supported negative ttls, meaning an infinite cache, except
that the documentation incorrectly showed one of our defaults as being 5
and not -1. Also, with a negative ttl, we were actually making the HTTP
Expires header go backwards. This changes it to go ahead ten years
instead.
Further, we add a cache-about-ttl option to set a different ttl for
about pages, which are now increasingly being filtered through markdown
or just sent statically anyway.
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
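A sketch of the Expires computation, assuming ttls are in minutes (as cgit's cache-*-ttl options are documented) and treating ten years as 10 * 365 days. compute_expires() is hypothetical.

```c
#include <time.h>

/* A negative ttl means "cache forever", so push Expires roughly ten
 * years ahead instead of into the past. */
time_t compute_expires(time_t now, int ttl_minutes)
{
	if (ttl_minutes < 0)
		return now + 10L * 365 * 24 * 60 * 60;
	return now + (time_t)ttl_minutes * 60;
}
```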
My dmesg is filled with the oom killer bringing down processes while the
Bingbot downloads every snapshot for every commit of the Linux kernel in
tar.xz format. Sure, I should be running with memory limits, and now I'm
using cgroups, but a more general solution is to prevent crawlers from
wasting resources like that in the first place.
Suggested-by: Natanael Copa <ncopa@alpinelinux.org> Suggested-by: Julius Plenz <plenz@cis.fu-berlin.de> Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Features:
- update to git v1.8.3.
- expanded set of default filters to include markdown, restructuredtext, and
man pages.
- better sample configuration file in man page.
- "readme" may now be specified multiple times, and cgit will choose the first
one it finds.
- "readme" no longer needs a branch name. If prefixed with simply ":" it will
use the default branch.
- "branch-sort" allowing branches to be sorted either by "age" or "name", for
kernel.org.
- "enable-index-owner" allowing the owner column to be disabled in the index
page.
- print submodule revision next to submodule link.
- integrate more closely with git apis, such as strbuf.
- rely on git test harness and git makefiles.
- more robust test suite.
- more robust makefile dependency accounting.
- pager navigation is now an unordered list.
- span tag wraps commit directions.
Behavior changes:
- HOME is no longer passed as an environment variable to any filter api
scripts.
- "about-filter" now receives the filename being filtered as argv[1]. This may
disrupt existing scripts, so adjust accordingly.
- gitconfig and gitattributes are no longer loaded from any system directories
or home directories.
Security:
- CVE-2013-2117: disallow directory traversal when readme is set to filesystem
path.
Bug fixes:
- ssdiff now correctly manages tab expansion.
- support unannotated tags in http git clone.
- lots of cleanups of global variables and memory leaks.
- do not rely on gettext/libintl.
- better C standard compliance.
- make several functions and variables static.
- improved constification.
- remove unused functions.
- fix colspan values to correct width.
- fix out-of-bounds memory accesses with virtual_root="".
- cache repo config more precisely.
- die when write fails.
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Using the url= query string, it was possible to request arbitrary files
from the filesystem if the readme for a given page was set to a
filesystem file. The following request would return my /etc/passwd file:
This fix uses realpath(3) to canonicalize all paths, and then compares
the base components.
This fix introduces a subtle timing attack, whereby a client can use
timing measurements to check whether or not strstr is called, and thereby
determine whether a given file exists on the filesystem.
This fix also does not account for filesystem race conditions (TOCTOU)
in resolving symlinks.
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
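A sketch of the canonicalize-then-compare approach. path_is_within() is hypothetical. It compares the resolved prefix and then checks for a separator rather than using strstr. Like the fix described above, it does not close the TOCTOU window between the check and the later open.

```c
#define _XOPEN_SOURCE 700
#include <limits.h>
#include <stdlib.h>
#include <string.h>

/* Returns 1 if request resolves to base or to something beneath it. */
int path_is_within(const char *base, const char *request)
{
	char rbase[PATH_MAX], rreq[PATH_MAX];
	size_t n;

	if (!realpath(base, rbase) || !realpath(request, rreq))
		return 0; /* nonexistent or unresolvable: reject */
	n = strlen(rbase);
	if (strncmp(rbase, rreq, n) != 0)
		return 0;
	/* Require a component boundary so "/srv/repo2" is not inside
	 * "/srv/repo"; a base of "/" contains everything. */
	return rreq[n] == '/' || rreq[n] == '\0' || (n == 1 && rbase[0] == '/');
}
```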
The readme variable may now contain multiple space-delimited entries,
each of which is, as usual, either a filepath or a git ref filepath. If multiple
are specified, cgit will now select the first one in the list that
exists. This is to make it easier to specify multiple default readme
types in the main cgitrc file and have them automatically get applied to
each repo based on what exists.
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
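A sketch of selecting the first existing entry, limited to plain filesystem paths. Entries of the git ref filepath form would need a lookup in the repository, which is omitted here. pick_readme() is hypothetical.

```c
#include <string.h>
#include <unistd.h>

/* Copy the first space-separated entry of list that exists into out and
 * return 1, or return 0 if none exist. Entries too long for out are
 * skipped. */
int pick_readme(const char *list, char *out, size_t outsz)
{
	const char *p = list;

	while (*p) {
		size_t len;

		while (*p == ' ')
			p++;
		len = strcspn(p, " ");
		if (len > 0 && len < outsz) {
			memcpy(out, p, len);
			out[len] = '\0';
			if (access(out, F_OK) == 0)
				return 1;
		}
		p += len;
	}
	return 0;
}
```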