granicus.if.org Git - postgresql/blob - doc/FAQ_DEV

   1
   2           Developer's Frequently Asked Questions (FAQ) for PostgreSQL
   3
   4    Last updated: Fri Jun 1 00:05:22 EDT 2007
   5
   6    Current maintainer: Bruce Momjian (bruce@momjian.us)
   7
   8    The most recent version of this document can be viewed at
   9    http://www.postgresql.org/docs/faqs.FAQ_DEV.html.
  10      _________________________________________________________________
  11
  12 General Questions
  13
  14    1.1) How do I get involved in PostgreSQL development?
  15    1.2) What development environment is required to develop code?
  16    1.3) What areas need work?
  17    1.4) What do I do after choosing an item to work on?
  18    1.5) I have developed a patch, what next?
  19    1.6) How is a patch reviewed?
  20    1.7) Where can I learn more about the code?
  21    1.8) How do I download/update the current source tree?
  22    1.9) How do I test my changes?
  23    1.10) What tools are available for developers?
  24    1.11) What books are good for developers?
  25    1.12) What is configure all about?
  26    1.13) How do I add a new port?
  27    1.14) Why don't you use threads, raw devices, async-I/O, <insert your
  28    favorite wizz-bang feature here>?
  29    1.15) How are RPM's packaged?
  30    1.16) How are CVS branches handled?
  31    1.17) Where can I get a copy of the SQL standards?
  32    1.18) Where can I get technical assistance?
  33    1.19) How do I get involved in PostgreSQL web site development?
  34    1.20) Why haven't you replaced CVS with SVN, Git, Monotone, VSS,
  35    <insert your favorite SCM system here>?
  36
  37 Technical Questions
  38
  39    2.1) How do I efficiently access information in tables from the
  40    backend code?
  41    2.2) Why are table, column, type, function, view names sometimes
  42    referenced as Name or NameData, and sometimes as char *?
  43    2.3) Why do we use Node and List to make data structures?
  44    2.4) I just added a field to a structure. What else should I do?
  45    2.5) Why do we use palloc() and pfree() to allocate memory?
  46    2.6) What is ereport()?
  47    2.7) What is CommandCounterIncrement()?
  48    2.8) What debugging features are available?
  49      _________________________________________________________________
  50
  51 General Questions
  52
  53   1.1) How do I get involved in PostgreSQL development?
  54
  55    Download the code and have a look around. See 1.8.
  56
  57    Subscribe to and read the pgsql-hackers mailing list (often termed
  58    'hackers'). This is where the major contributors and core members of
  59    the project discuss development.
  60
  61   1.2) What development environment is required to develop code?
  62
  63    PostgreSQL is developed mostly in the C programming language. It also
  64    makes use of Yacc and Lex.
  65
  66    The source code is targeted at most of the popular Unix platforms and
  67    the Windows environment (XP, Windows 2000, and up).
  68
  69    Most developers make use of the open source development tool chain. If
  70    you have contributed to open source software before, you will probably
  71    be familiar with these tools. They include: GCC (http://gcc.gnu.org,
  72    GDB (www.gnu.org/software/gdb/gdb.html), autoconf
  73    (www.gnu.org/software/autoconf/) AND GNU make
  74    (www.gnu.org/software/make/make.html.
  75
  76    Developers using this tool chain on Windows make use of MingW (see
  77    http://www.mingw.org/).
  78
  79    Some developers use compilers from other software vendors with mixed
  80    results.
  81
  82    Developers who regularly rebuild the source often pass the
  83    --enable-depend flag to configure. The result is that when you make a
  84    modification to a C header file, all files depend upon that file are
  85    also rebuilt.
  86
  87    src/Makefile.custom can be used to set environment variables, like
  88    CUSTOM_COPT, that are used for every compile.
  89
  90   1.3) What areas need work?
  91
  92    Outstanding features are detailed in the TODO list. This is located in
  93    doc/TODO in the source distribution or at
  94    http://www.postgresql.org/docs/faqs.TODO.html.
  95
  96    You can learn more about these features by consulting the archives,
  97    the SQL standards and the recommend texts (see 1.11).
  98
  99   1.4) What do I do after choosing an item to work on?
 100
 101    Send an email to pgsql-hackers with a proposal for what you want to do
 102    (assuming your contribution is not trivial). Working in isolation is
 103    not advisable because others might be working on the same TODO item,
 104    or you might have misunderstood the TODO item. In the email, discuss
 105    both the internal implementation method you plan to use, and any
 106    user-visible changes (new syntax, etc). For complex patches, it is
 107    important to get community feeback on your proposal before starting
 108    work. Failure to do so might mean your patch is rejected. If your work
 109    is being sponsored by a company, read this article for tips on being
 110    more effective.
 111
 112    A web site is maintained for patches awaiting review,
 113    http://momjian.postgresql.org/cgi-bin/pgpatches, and those that are
 114    being kept for the next release,
 115    http://momjian.postgresql.org/cgi-bin/pgpatches_hold.
 116
 117   1.5) I have developed a patch, what next?
 118
 119    You will need to submit the patch to pgsql-patches@postgresql.org. It
 120    will be reviewed by other contributors to the project and will be
 121    either accepted or sent back for further work. To help ensure your
 122    patch is reviewed and committed in a timely fashion, please try to
 123    make sure your submission conforms to the following guidelines:
 124     1. Ensure that your patch is generated against the most recent
 125        version of the code, which for developers is CVS HEAD. For more on
 126        branches in PostgreSQL, see 1.16.
 127     2. Try to make your patch as readable as possible by following the
 128        project's code-layout conventions. This makes it easier for the
 129        reviewer, and there's no point in trying to layout things
 130        differently than pgindent. Also avoid unnecessary whitespace
 131        changes because they just distract the reviewer, and formatting
 132        changes will be removed by the next run of pgindent.
 133     3. The patch should be generated in contextual diff format (diff -c
 134        and should be applicable from the root directory. If you are
 135        unfamiliar with this, you might find the script
 136        src/tools/make_diff/difforig useful. (Unified diffs are only
 137        preferable if the file changes are single-line changes and do not
 138        rely on surrounding lines.)
 139     4. PostgreSQL is licensed under a BSD license. By posting a patch to
 140        the public PostgreSQL mailling lists, you are giving the
 141        PostgreSQL Global Development Group the non-revokable right to
 142        distribute your patch under the BSD license.
 143     5. Confirm that your changes can pass the regression tests. If your
 144        changes are port specific, please list the ports you have tested
 145        it on.
 146     6. If you are adding a new feature, confirm that it has been tested
 147        thoroughly. Try to test the feature in all conceivable scenarios.
 148     7. New feature patches should also be accompanied by documentation
 149        patches. If you need help checking the SQL standard, see 1.17.
 150     8. Provide an implementation overview, preferably in code comments.
 151        Following the surrounding code commenting style is usually a good
 152        approach (also see this
 153        http://www.ibm.com/developerworks/linux/library/l-clear-code/?ca=d
 154        gr-FClnxw01linuxcodetips).
 155     9. If it is a performance patch, please provide confirming test
 156        results to show the benefit of your patch. It is OK to post
 157        patches without this information, though the patch will not be
 158        applied until somebody has tested the patch and found a
 159        significant performance improvement.
 160
 161    Even if you pass all of the above, the patch might still be rejected
 162    for other reasons. Please be prepared to listen to comments and make
 163    modifications.
 164
 165    You will be notified via email when the patch is applied, and your
 166    name will appear in the next version of the release notes.
 167
 168   1.6) How is a patch reviewed?
 169
 170    Patch committers check several things before applying a patch:
 171      * Patch follows the SQL standard or community agreed-upon behavior
 172      * Style merges seamlessly into the surrounding code
 173      * Written as simply and efficiently as possible
 174      * Uses the available PostgreSQL subsystems properly
 175      * Contains sufficient comments
 176      * Contains code that works on all supported operating systems
 177      * Has proper documentation
 178      * Passes all regression tests, and if needed, adds new ones
 179      * Behaves as expected, even under unusual cirumstances
 180      * Contains no reliability risks
 181      * Does not overly complicate the source code
 182      * If performance-related, has a measureable performance benefit
 183      * Is of sufficient usefulness to the average PostgreSQL user
 184      * Follows existing PostgreSQL coding standards
 185
 186   1.7) Where can I learn more about the code?
 187
 188    Other than documentation in the source tree itself, you can find some
 189    papers/presentations discussing the code at
 190    http://www.postgresql.org/developer. An excellent presentation is at
 191    http://neilconway.org/talks/hacking/
 192
 193   1.8) How do I download/update the current source tree?
 194
 195    There are several ways to obtain the source tree. Occasional
 196    developers can just get the most recent source tree snapshot from
 197    ftp://ftp.postgresql.org.
 198
 199    Regular developers might want to take advantage of anonymous access to
 200    our source code management system. The source tree is currently hosted
 201    in CVS. For details of how to obtain the source from CVS see
 202    http://developer.postgresql.org/docs/postgres/cvs.html.
 203
 204   1.9) How do I test my changes?
 205
 206    Basic system testing
 207
 208    The easiest way to test your code is to ensure that it builds against
 209    the latest version of the code and that it does not generate compiler
 210    warnings.
 211
 212    It is worth advised that you pass --enable-cassert to configure. This
 213    will turn on assertions with in the source which will often show us
 214    bugs because they cause data corruption of segmentation violations.
 215    This generally makes debugging much easier.
 216
 217    Then, perform run time testing via psql.
 218
 219    Regression test suite
 220
 221    The next step is to test your changes against the existing regression
 222    test suite. To do this, issue "make check" in the root directory of
 223    the source tree. If any tests failure, investigate.
 224
 225    If you've deliberately changed existing behavior, this change might
 226    cause a regression test failure but not any actual regression. If so,
 227    you should also patch the regression test suite.
 228
 229    Other run time testing
 230
 231    Some developers make use of tools such as valgrind
 232    (http://valgrind.kde.org) for memory testing, gprof (which comes with
 233    the GNU binutils suite) and oprofile
 234    (http://oprofile.sourceforge.net/) for profiling and other related
 235    tools.
 236
 237    What about unit testing, static analysis, model checking...?
 238
 239    There have been a number of discussions about other testing frameworks
 240    and some developers are exploring these ideas.
 241
 242    Keep in mind the Makefiles do not have the proper dependencies for
 243    include files. You have to do a make clean and then another make. If
 244    you are using GCC you can use the --enable-depend option of configure
 245    to have the compiler compute the dependencies automatically.
 246
 247   1.10) What tools are available for developers?
 248
 249    First, all the files in the src/tools directory are designed for
 250    developers.
 251     RELEASE_CHANGES changes we have to make for each release
 252     backend         description/flowchart of the backend directories
 253     ccsym           find standard defines made by your compiler
 254      copyright       fixes copyright notices
 255
 256     entab           converts spaces to tabs, used by pgindent
 257     find_static     finds functions that could be made static
 258     find_typedef    finds typedefs in the source code
 259     find_badmacros  finds macros that use braces incorrectly
 260     fsync           a script to provide information about the cost of cache
 261                      syncing system calls
 262     make_ctags      make vi 'tags' file in each directory
 263     make_diff       make *.orig and diffs of source
 264     make_etags      make emacs 'etags' files
 265     make_keywords   make comparison of our keywords and SQL'92
 266     make_mkid       make mkid ID files
 267     pgcvslog        used to generate a list of changes for each release
 268     pginclude       scripts for adding/removing include files
 269     pgindent        indents source files
 270     pgtest          a semi-automated build system
 271     thread          a thread testing script
 272
 273    In src/include/catalog:
 274     unused_oids     a script which generates unused OIDs for use in system
 275                      catalogs
 276     duplicate_oids  finds duplicate OIDs in system catalog definitions
 277
 278    If you point your browser at the tools/backend/index.html file, you
 279    will see few paragraphs describing the data flow, the backend
 280    components in a flow chart, and a description of the shared memory
 281    area. You can click on any flowchart box to see a description. If you
 282    then click on the directory name, you will be taken to the source
 283    directory, to browse the actual source code behind it. We also have
 284    several README files in some source directories to describe the
 285    function of the module. The browser will display these when you enter
 286    the directory also. The tools/backend directory is also contained on
 287    our web page under the title How PostgreSQL Processes a Query.
 288
 289    Second, you really should have an editor that can handle tags, so you
 290    can tag a function call to see the function definition, and then tag
 291    inside that function to see an even lower-level function, and then
 292    back out twice to return to the original function. Most editors
 293    support this via tags or etags files.
 294
 295    Third, you need to get id-utils from ftp://ftp.gnu.org/gnu/id-utils/
 296
 297    By running tools/make_mkid, an archive of source symbols can be
 298    created that can be rapidly queried.
 299
 300    Some developers make use of cscope, which can be found at
 301    http://cscope.sf.net/. Others use glimpse, which can be found at
 302    http://webglimpse.net/.
 303
 304    tools/make_diff has tools to create patch diff files that can be
 305    applied to the distribution. This produces context diffs, which is our
 306    preferred format.
 307
 308    Our standard format BSD style, with each level of code indented one
 309    tab, where each tab is four spaces. You will need to set your editor
 310    or file viewer to display tabs as four spaces:
 311     vi in ~/.exrc:
 312             set tabstop=4
 313             set sw=4
 314     more:
 315             more -x4
 316     less:
 317             less -x4
 318
 319    The tools/editors directory of the latest sources contains sample
 320    settings that can be used with the emacs, xemacs and vim editors, that
 321    assist in keeping to PostgreSQL coding standards.
 322
 323    pgindent will the format code by specifying flags to your operating
 324    system's utility indent. This article describes the value of a
 325    consistent coding style.
 326
 327    pgindent is run on all source files just before each beta test period.
 328    It auto-formats all source files to make them consistent. Comment
 329    blocks that need specific line breaks should be formatted as block
 330    comments, where the comment starts as /*------. These comments will
 331    not be reformatted in any way.
 332
 333    pginclude contains scripts used to add needed #include's to include
 334    files, and removed unneeded #include's.
 335
 336    When adding system types, you will need to assign oids to them. There
 337    is also a script called unused_oids in pgsql/src/include/catalog that
 338    shows the unused oids.
 339
 340   1.11) What books are good for developers?
 341
 342    I have four good books, An Introduction to Database Systems, by C.J.
 343    Date, Addison, Wesley, A Guide to the SQL Standard, by C.J. Date, et.
 344    al, Addison, Wesley, Fundamentals of Database Systems, by Elmasri and
 345    Navathe, and Transaction Processing, by Jim Gray, Morgan, Kaufmann
 346
 347    There is also a database performance site, with a handbook on-line
 348    written by Jim Gray at http://www.benchmarkresources.com..
 349
 350   1.12) What is configure all about?
 351
 352    The files configure and configure.in are part of the GNU autoconf
 353    package. Configure allows us to test for various capabilities of the
 354    OS, and to set variables that can then be tested in C programs and
 355    Makefiles. Autoconf is installed on the PostgreSQL main server. To add
 356    options to configure, edit configure.in, and then run autoconf to
 357    generate configure.
 358
 359    When configure is run by the user, it tests various OS capabilities,
 360    stores those in config.status and config.cache, and modifies a list of
 361    *.in files. For example, if there exists a Makefile.in, configure
 362    generates a Makefile that contains substitutions for all @var@
 363    parameters found by configure.
 364
 365    When you need to edit files, make sure you don't waste time modifying
 366    files generated by configure. Edit the *.in file, and re-run configure
 367    to recreate the needed file. If you run make distclean from the
 368    top-level source directory, all files derived by configure are
 369    removed, so you see only the file contained in the source
 370    distribution.
 371
 372   1.13) How do I add a new port?
 373
 374    There are a variety of places that need to be modified to add a new
 375    port. First, start in the src/template directory. Add an appropriate
 376    entry for your OS. Also, use src/config.guess to add your OS to
 377    src/template/.similar. You shouldn't match the OS version exactly. The
 378    configure test will look for an exact OS version number, and if not
 379    found, find a match without version number. Edit src/configure.in to
 380    add your new OS. (See configure item above.) You will need to run
 381    autoconf, or patch src/configure too.
 382
 383    Then, check src/include/port and add your new OS file, with
 384    appropriate values. Hopefully, there is already locking code in
 385    src/include/storage/s_lock.h for your CPU. There is also a
 386    src/makefiles directory for port-specific Makefile handling. There is
 387    a backend/port directory if you need special files for your OS.
 388
 389   1.14) Why don't you use threads, raw devices, async-I/O, <insert your
 390   favorite wizz-bang feature here>?
 391
 392    There is always a temptation to use the newest operating system
 393    features as soon as they arrive. We resist that temptation.
 394
 395    First, we support 15+ operating systems, so any new feature has to be
 396    well established before we will consider it. Second, most new
 397    wizz-bang features don't provide dramatic improvements. Third, they
 398    usually have some downside, such as decreased reliability or
 399    additional code required. Therefore, we don't rush to use new features
 400    but rather wait for the feature to be established, then ask for
 401    testing to show that a measurable improvement is possible.
 402
 403    As an example, threads are not currently used in the backend code
 404    because:
 405      * Historically, threads were unsupported and buggy.
 406      * An error in one backend can corrupt other backends.
 407      * Speed improvements using threads are small compared to the
 408        remaining backend startup time.
 409      * The backend code would be more complex.
 410
 411    So, we are not ignorant of new features. It is just that we are
 412    cautious about their adoption. The TODO list often contains links to
 413    discussions showing our reasoning in these areas.
 414
 415   1.15) How are RPMs packaged?
 416
 417    This was written by Lamar Owen and Devrim Gündüz:
 418
 419    2006-10-16
 420
 421    As to how the RPMs are built -- to answer that question sanely
 422    requires us to know how much experience you have with the whole RPM
 423    paradigm. 'How is the RPM built?' is a multifaceted question. The
 424    obvious simple answer is that we maintain:
 425     1. A set of patches to make certain portions of the source tree
 426        'behave' in the different environment of the RPMset;
 427     2. The initscript;
 428     3. Any other ancillary scripts and files;
 429     4. A README.rpm-dist document that tries to adequately document both
 430        the differences between the RPM build and the WHY of the
 431        differences, as well as useful RPM environment operations (like,
 432        using syslog, upgrading, getting postmaster to start at OS boot,
 433        etc);
 434     5. The spec file that throws it all together. This is not a trivial
 435        undertaking in a package of this size.
 436
 437    PGDG RPM Maintainer builds the SRPM and announces the SRPM to the
 438    pgsqlrpms-hackers list. This is a list where package builders are
 439    subscribed. Then, the builders download the SRPM and rebuild it on
 440    their machines.
 441
 442    We try to build on as many different canonical distributions as we
 443    can. Currently we are able to build on Red Hat Linux 9, RHEL 3 and
 444    above, and all Fedora Core Linux releases.
 445
 446    To test the binaries, we install them on our local machines and run
 447    regression tests. If the package builders uses postgres user to build
 448    the rpms, then it is possible to run regression tests during RPM
 449    builds.
 450
 451    Once the build passes these tests, the binary RPMs are sent back to
 452    PGDG RPM Maintainer and they are pushed to main FTP site, followed by
 453    a release announcement to pgsqlrpms-* lists, pgsql-general and
 454    pgsql-announce lists.
 455
 456    You will notice we said 'canonical' distributions above. That simply
 457    means that the machine is as stock 'out of the box' as practical --
 458    that is, everything (except select few programs) on these boxen are
 459    installed by RPM; only official Red Hat released RPMs are used (except
 460    in unusual circumstances involving software that will not alter the
 461    build -- for example, installing a newer non-RedHat version of the Dia
 462    diagramming package is OK -- installing Python 2.1 on the box that has
 463    Python 1.5.2 installed is not, as that alters the PostgreSQL build).
 464    The RPM as uploaded is built to as close to out-of-the-box pristine as
 465    is possible. Only the standard released 'official to that release'
 466    compiler is used -- and only the standard official kernel is used as
 467    well.
 468
 469    PGDG RPM Building Project does not build RPMs for Mandrake .
 470
 471    We usually have only one SRPM for all platforms. This is because of
 472    our limited resources. However, on some cases, we may distribute
 473    different SRPMs for different platforms, depending on possible
 474    compilation problems, especially on older distros.
 475
 476    Please note that this is a volunteered job -- We are doing our best to
 477    keep packages up to date. We, at least, provide SRPMs for all
 478    platforms. For example, if you do not find a RHEL 4 x86_64 RPM in our
 479    FTP site, it means that we do not have a RHEL 4 x86_64 server around.
 480    If you have one and want to help us, please do not hesitate to build
 481    rpms and send to us :-)
 482    http://pgfoundry.org/docman/view.php/1000048/98/PostgreSQL-RPM-Install
 483    ation-PGDG.pdf has some information about building binary RPMs using
 484    an SRPM.
 485
 486    PGDG RPM Building Project is a hosted on pgFoundry :
 487    http://pgfoundry.org/projects/pgsqlrpms. We are an open community,
 488    except one point : Our pgsqlrpms-hackers list is open to package
 489    builders only. Still, its archives are visible to public. We use a CVS
 490    server to save the work we have done so far. This includes spec files
 491    and patches; as well as documents.
 492
 493    As to why all these files aren't part of the source tree, well, unless
 494    there was a large cry for it to happen, we don't believe it should.
 495
 496   1.16) How are CVS branches managed?
 497
 498    This was written by Tom Lane:
 499
 500    2001-05-07
 501
 502    If you just do basic "cvs checkout", "cvs update", "cvs commit", then
 503    you'll always be dealing with the HEAD version of the files in CVS.
 504    That's what you want for development, but if you need to patch past
 505    stable releases then you have to be able to access and update the
 506    "branch" portions of our CVS repository. We normally fork off a branch
 507    for a stable release just before starting the development cycle for
 508    the next release.
 509
 510    The first thing you have to know is the branch name for the branch you
 511    are interested in getting at. To do this, look at some long-lived
 512    file, say the top-level HISTORY file, with "cvs status -v" to see what
 513    the branch names are. (Thanks to Ian Lance Taylor for pointing out
 514    that this is the easiest way to do it.) Typical branch names are:
 515     REL7_1_STABLE
 516     REL7_0_PATCHES
 517     REL6_5_PATCHES
 518
 519    OK, so how do you do work on a branch? By far the best way is to
 520    create a separate checkout tree for the branch and do your work in
 521    that. Not only is that the easiest way to deal with CVS, but you
 522    really need to have the whole past tree available anyway to test your
 523    work. (And you *better* test your work. Never forget that dot-releases
 524    tend to go out with very little beta testing --- so whenever you
 525    commit an update to a stable branch, you'd better be doubly sure that
 526    it's correct.)
 527
 528    Normally, to checkout the head branch, you just cd to the place you
 529    want to contain the toplevel "pgsql" directory and say
 530     cvs ... checkout pgsql
 531
 532    To get a past branch, you cd to wherever you want it and say
 533     cvs ... checkout -r BRANCHNAME pgsql
 534
 535    For example, just a couple days ago I did
 536     mkdir ~postgres/REL7_1
 537     cd ~postgres/REL7_1
 538     cvs ... checkout -r REL7_1_STABLE pgsql
 539
 540    and now I have a maintenance copy of 7.1.*.
 541
 542    When you've done a checkout in this way, the branch name is "sticky":
 543    CVS automatically knows that this directory tree is for the branch,
 544    and whenever you do "cvs update" or "cvs commit" in this tree, you'll
 545    fetch or store the latest version in the branch, not the head version.
 546    Easy as can be.
 547
 548    So, if you have a patch that needs to apply to both the head and a
 549    recent stable branch, you have to make the edits and do the commit
 550    twice, once in your development tree and once in your stable branch
 551    tree. This is kind of a pain, which is why we don't normally fork the
 552    tree right away after a major release --- we wait for a dot-release or
 553    two, so that we won't have to double-patch the first wave of fixes.
 554
 555   1.17) Where can I get a copy of the SQL standards?
 556
 557    There are three versions of the SQL standard: SQL-92, SQL:1999, and
 558    SQL:2003. They are endorsed by ANSI and ISO. Draft versions can be
 559    downloaded from:
 560      * SQL-92 http://www.contrib.andrew.cmu.edu/~shadow/sql/sql1992.txt
 561      * SQL:1999
 562        http://www.cse.iitb.ac.in/dbms/Data/Papers-Other/SQL1999/ansi-iso-
 563        9075-2-1999.pdf
 564      * SQL:2003 http://www.wiscorp.com/sql_2003_standard.zip
 565
 566    Some SQL standards web pages are:
 567      * http://troels.arvin.dk/db/rdbms/links/#standards
 568      * http://www.wiscorp.com/SQLStandards.html
 569      * http://www.contrib.andrew.cmu.edu/~shadow/sql.html#syntax (SQL-92)
 570      * http://dbs.uni-leipzig.de/en/lokal/standards.pdf (paper)
 571
 572   1.18) Where can I get technical assistance?
 573
 574    Many technical questions held by those new to the code have been
 575    answered on the pgsql-hackers mailing list - the archives of which can
 576    be found at http://archives.postgresql.org/pgsql-hackers/.
 577
 578    If you cannot find discussion or your particular question, feel free
 579    to put it to the list.
 580
 581    Major contributors also answer technical questions, including
 582    questions about development of new features, on IRC at
 583    irc.freenode.net in the #postgresql channel.
 584
 585   1.19) How do I get involved in PostgreSQL web site development?
 586
 587    PostgreSQL website development is discussed on the
 588    pgsql-www@postgresql.org mailing list. The is a project page where the
 589    source code is available at
 590    http://gborg.postgresql.org/project/pgweb/projdisplay.php , the code
 591    for the next version of the website is under the "portal" module.
 592
 593   1.20) Why haven't you replaced CVS with SVN, Git, Monotone, VSS, <insert your
 594   favorite SCMS here>?
 595
 596    Currently the core developers see no SCMS that will provide enough
 597    benefit to outwiegh the pain involved in moving to a new SCMS. Typical
 598    problems that must be addressed by any new SCMS include:
 599      * Run natively on all of our supported platforms.
 600      * Integrate into the Buildfarm.
 601      * Import our entire CVS Repository while preserving complete
 602        history.
 603      * Allow for anonymous checkouts.
 604
 605    Currently there is no intention for switching to a new SCMS until at
 606    least the end of the 8.4 development cycle sometime in late 2008. For
 607    more information please refer to the mailing list archives.
 608
 609 Technical Questions
 610
 611   2.1) How do I efficiently access information in tables from the backend code?
 612
 613    You first need to find the tuples(rows) you are interested in. There
 614    are two ways. First, SearchSysCache() and related functions allow you
 615    to query the system catalogs. This is the preferred way to access
 616    system tables, because the first call to the cache loads the needed
 617    rows, and future requests can return the results without accessing the
 618    base table. The caches use system table indexes to look up tuples. A
 619    list of available caches is located in
 620    src/backend/utils/cache/syscache.c.
 621    src/backend/utils/cache/lsyscache.c contains many column-specific
 622    cache lookup functions.
 623
 624    The rows returned are cache-owned versions of the heap rows.
 625    Therefore, you must not modify or delete the tuple returned by
 626    SearchSysCache(). What you should do is release it with
 627    ReleaseSysCache() when you are done using it; this informs the cache
 628    that it can discard that tuple if necessary. If you neglect to call
 629    ReleaseSysCache(), then the cache entry will remain locked in the
 630    cache until end of transaction, which is tolerable but not very
 631    desirable.
 632
 633    If you can't use the system cache, you will need to retrieve the data
 634    directly from the heap table, using the buffer cache that is shared by
 635    all backends. The backend automatically takes care of loading the rows
 636    into the buffer cache.
 637
 638    Open the table with heap_open(). You can then start a table scan with
 639    heap_beginscan(), then use heap_getnext() and continue as long as
 640    HeapTupleIsValid() returns true. Then do a heap_endscan(). Keys can be
 641    assigned to the scan. No indexes are used, so all rows are going to be
 642    compared to the keys, and only the valid rows returned.
 643
 644    You can also use heap_fetch() to fetch rows by block number/offset.
 645    While scans automatically lock/unlock rows from the buffer cache, with
 646    heap_fetch(), you must pass a Buffer pointer, and ReleaseBuffer() it
 647    when completed.
 648
 649    Once you have the row, you can get data that is common to all tuples,
 650    like t_self and t_oid, by merely accessing the HeapTuple structure
 651    entries. If you need a table-specific column, you should take the
 652    HeapTuple pointer, and use the GETSTRUCT() macro to access the
 653    table-specific start of the tuple. You then cast the pointer as a
 654    Form_pg_proc pointer if you are accessing the pg_proc table, or
 655    Form_pg_type if you are accessing pg_type. You can then access the
 656    columns by using a structure pointer:
 657 ((Form_pg_class) GETSTRUCT(tuple))->relnatts
 658
 659    You must not directly change live tuples in this way. The best way is
 660    to use heap_modifytuple() and pass it your original tuple, and the
 661    values you want changed. It returns a palloc'ed tuple, which you pass
 662    to heap_replace(). You can delete tuples by passing the tuple's t_self
 663    to heap_destroy(). You use t_self for heap_update() too. Remember,
 664    tuples can be either system cache copies, which might go away after
 665    you call ReleaseSysCache(), or read directly from disk buffers, which
 666    go away when you heap_getnext(), heap_endscan, or ReleaseBuffer(), in
 667    the heap_fetch() case. Or it may be a palloc'ed tuple, that you must
 668    pfree() when finished.
 669
 670   2.2) Why are table, column, type, function, view names sometimes referenced
 671   as Name or NameData, and sometimes as char *?
 672
 673    Table, column, type, function, and view names are stored in system
 674    tables in columns of type Name. Name is a fixed-length,
 675    null-terminated type of NAMEDATALEN bytes. (The default value for
 676    NAMEDATALEN is 64 bytes.)
 677 typedef struct nameData
 678     {
 679         char        data[NAMEDATALEN];
 680     } NameData;
 681     typedef NameData *Name;
 682
 683    Table, column, type, function, and view names that come into the
 684    backend via user queries are stored as variable-length,
 685    null-terminated character strings.
 686
 687    Many functions are called with both types of names, ie. heap_open().
 688    Because the Name type is null-terminated, it is safe to pass it to a
 689    function expecting a char *. Because there are many cases where
 690    on-disk names(Name) are compared to user-supplied names(char *), there
 691    are many cases where Name and char * are used interchangeably.
 692
 693   2.3) Why do we use Node and List to make data structures?
 694
 695    We do this because this allows a consistent way to pass data inside
 696    the backend in a flexible way. Every node has a NodeTag which
 697    specifies what type of data is inside the Node. Lists are groups of
 698    Nodes chained together as a forward-linked list.
 699
 700    Here are some of the List manipulation commands:
 701
 702    lfirst(i), lfirst_int(i), lfirst_oid(i)
 703           return the data (a pointer, integer or OID respectively) of
 704           list cell i.
 705
 706    lnext(i)
 707           return the next list cell after i.
 708
 709    foreach(i, list)
 710           loop through list, assigning each list cell to i. It is
 711           important to note that i is a ListCell *, not the data in the
 712           List element. You need to use lfirst(i) to get at the data.
 713           Here is a typical code snippet that loops through a List
 714           containing Var *'s and processes each one:
 715
 716
 717     List        *list;
 718     ListCell    *i;
 719
 720     foreach(i, list)
 721     {
 722         Var *var = lfirst(i);
 723
 724         /* process var here */
 725     }
 726
 727    lcons(node, list)
 728           add node to the front of list, or create a new list with node
 729           if list is NIL.
 730
 731    lappend(list, node)
 732           add node to the end of list.
 733
 734    list_concat(list1, list2)
 735           Concatenate list2 on to the end of list1.
 736
 737    list_length(list)
 738           return the length of the list.
 739
 740    list_nth(list, i)
 741           return the i'th element in list, counting from zero.
 742
 743    lcons_int, ...
 744           There are integer versions of these: lcons_int, lappend_int,
 745           etc. Also versions for OID lists: lcons_oid, lappend_oid, etc.
 746
 747    You can print nodes easily inside gdb. First, to disable output
 748    truncation when you use the gdb print command:
 749 (gdb) set print elements 0
 750
 751    Instead of printing values in gdb format, you can use the next two
 752    commands to print out List, Node, and structure contents in a verbose
 753    format that is easier to understand. List's are unrolled into nodes,
 754    and nodes are printed in detail. The first prints in a short format,
 755    and the second in a long format:
 756 (gdb) call print(any_pointer)
 757     (gdb) call pprint(any_pointer)
 758
 759    The output appears in the postmaster log file, or on your screen if
 760    you are running a backend directly without a postmaster.
 761
 762   2.4) I just added a field to a structure. What else should I do?
 763
 764    The structures passed around in the parser, rewriter, optimizer, and
 765    executor require quite a bit of support. Most structures have support
 766    routines in src/backend/nodes used to create, copy, read, and output
 767    those structures (in particular, the files copyfuncs.c and
 768    equalfuncs.c. Make sure you add support for your new field to these
 769    files. Find any other places the structure might need code for your
 770    new field. mkid is helpful with this (see 1.10).
 771
 772   2.5) Why do we use palloc() and pfree() to allocate memory?
 773
 774    palloc() and pfree() are used in place of malloc() and free() because
 775    we find it easier to automatically free all memory allocated when a
 776    query completes. This assures us that all memory that was allocated
 777    gets freed even if we have lost track of where we allocated it. There
 778    are special non-query contexts that memory can be allocated in. These
 779    affect when the allocated memory is freed by the backend.
 780
 781   2.6) What is ereport()?
 782
 783    ereport() is used to send messages to the front-end, and optionally
 784    terminate the current query being processed. The first parameter is an
 785    ereport level of DEBUG (levels 1-5), LOG, INFO, NOTICE, ERROR, FATAL,
 786    or PANIC. NOTICE prints on the user's terminal and the postmaster
 787    logs. INFO prints only to the user's terminal and LOG prints only to
 788    the server logs. (These can be changed from postgresql.conf.) ERROR
 789    prints in both places, and terminates the current query, never
 790    returning from the call. FATAL terminates the backend process. The
 791    remaining parameters of ereport are a printf-style set of parameters
 792    to print.
 793
 794    ereport(ERROR) frees most memory and open file descriptors so you
 795    don't need to clean these up before the call.
 796
 797   2.7) What is CommandCounterIncrement()?
 798
 799    Normally, transactions can not see the rows they modify. This allows
 800    UPDATE foo SET x = x + 1 to work correctly.
 801
 802    However, there are cases where a transactions needs to see rows
 803    affected in previous parts of the transaction. This is accomplished
 804    using a Command Counter. Incrementing the counter allows transactions
 805    to be broken into pieces so each piece can see rows modified by
 806    previous pieces. CommandCounterIncrement() increments the Command
 807    Counter, creating a new part of the transaction.
 808
 809   2.8) What debugging features are available?
 810
 811    First, try running configure with the --enable-cassert option, many
 812    assert()s monitor the progress of the backend and halt the program
 813    when something unexpected occurs.
 814
 815    The postmaster has a -d option that allows even more detailed
 816    information to be reported. The -d option takes a number that
 817    specifies the debug level. Be warned that high debug level values
 818    generate large log files.
 819
 820    If the postmaster is not running, you can actually run the postgres
 821    backend from the command line, and type your SQL statement directly.
 822    This is recommended only for debugging purposes. If you have compiled
 823    with debugging symbols, you can use a debugger to see what is
 824    happening. Because the backend was not started from postmaster, it is
 825    not running in an identical environment and locking/backend
 826    interaction problems might not be duplicated.
 827
 828    If the postmaster is running, start psql in one window, then find the
 829    PID of the postgres process used by psql using SELECT
 830    pg_backend_pid(). Use a debugger to attach to the postgres PID. You
 831    can set breakpoints in the debugger and issue queries from the other.
 832    If you are looking to find the location that is generating an error or
 833    log message, set a breakpoint at errfinish. psql. If you are debugging
 834    postgres startup, you can set PGOPTIONS="-W n", then start psql. This
 835    will cause startup to delay for n seconds so you can attach to the
 836    process with the debugger, set any breakpoints, and continue through
 837    the startup sequence.
 838
 839    You can also compile with profiling to see what functions are taking
 840    execution time. The backend profile files will be deposited in the
 841    pgsql/data directory. The client profile file will be put in the
 842    client's current directory. Linux requires a compile with
 843    -DLINUX_PROFILE for proper profiling.