Large Objects
PostgreSQL has a large object
facility, which provides stream-style access to user data that is stored
in a special large-object structure. Streaming access is useful
when working with data values that are too large to manipulate
conveniently as a whole.
This chapter describes the implementation and the programming and
query language interfaces to PostgreSQL
large object data. We use the libpq C
library for the examples in this chapter, but most programming
interfaces native to PostgreSQL support
equivalent functionality. Other interfaces may use the large
object interface internally to provide generic support for large
values. This is not described here.
Introduction
All large objects are placed in a single system table called
pg_largeobject.
PostgreSQL also supports a storage system called
TOAST
that automatically stores values
larger than a single database page into a secondary storage area per table.
This makes the large object facility partially obsolete. One
remaining advantage of the large object facility is that it allows values
up to 2 GB in size, whereas TOASTed fields can be at
most 1 GB. Also, large objects can be randomly modified using a read/write
API that is more efficient than performing such operations using
TOAST.
Implementation Features
The large object implementation breaks large
objects up into chunks
and stores the chunks in
rows in the database. A B-tree index guarantees fast
searches for the correct chunk number when doing random
access reads and writes.
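To make the chunking concrete, the following sketch maps a byte offset to a chunk number and a position inside that chunk. It assumes a chunk size of 2048 bytes, which is not stated in the text above and differs on servers built with a non-default page size. The helper names are hypothetical.

```c
/* Assumed chunk size in bytes (an assumption, not taken from the text). */
#define ASSUMED_CHUNK_SIZE 2048

/* Hypothetical helper: number of the chunk holding the byte at `offset`. */
long
chunk_number(long offset)
{
    return offset / ASSUMED_CHUNK_SIZE;
}

/* Hypothetical helper: position of that byte within its chunk. */
long
offset_in_chunk(long offset)
{
    return offset % ASSUMED_CHUNK_SIZE;
}
```

Under this assumption, byte 5000 lies in chunk 2 at position 904, so a random-access read at that offset needs an index lookup for chunk 2 only.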
Client Interfaces
This section describes the facilities that
PostgreSQL client interface libraries
provide for accessing large objects. All large object
manipulation using these functions must take
place within an SQL transaction block.
The PostgreSQL large object interface is modeled after
the Unix file-system interface, with analogues of
open, read,
write,
lseek, etc.
Client applications that use the large object interface in
libpq should include the header file
libpq/libpq-fs.h and link with the
libpq library.
Creating a Large Object
The function
Oid lo_creat(PGconn *conn, int mode);
creates a new large object.
The return value is the OID that was assigned to the new large object,
or InvalidOid (zero) on failure.
mode is unused and
ignored as of PostgreSQL 8.1; however, for
backwards compatibility with earlier releases it is best to
set it to INV_READ, INV_WRITE,
or INV_READ | INV_WRITE.
(These symbolic constants are defined
in the header file libpq/libpq-fs.h.)
An example:
inv_oid = lo_creat(conn, INV_READ|INV_WRITE);
The function
Oid lo_create(PGconn *conn, Oid lobjId);
also creates a new large object. The OID to be assigned can be
specified by lobjId;
if so, failure occurs if that OID is already in use for some large
object. If lobjId
is InvalidOid (zero) then lo_create assigns an unused
OID (this is the same behavior as lo_creat).
The return value is the OID that was assigned to the new large object,
or InvalidOid (zero) on failure.
lo_create is new as of PostgreSQL
8.1; if this function is run against an older server version, it will
fail and return InvalidOid.
An example:
inv_oid = lo_create(conn, desired_oid);
Importing a Large Object
To import an operating system file as a large object, call
Oid lo_import(PGconn *conn, const char *filename);
filename
specifies the operating system name of
the file to be imported as a large object.
The return value is the OID that was assigned to the new large object,
or InvalidOid (zero) on failure.
Note that the file is read by the client interface library, not by
the server; so it must exist in the client filesystem and be readable
by the client application.
Exporting a Large Object
To export a large object
into an operating system file, call
int lo_export(PGconn *conn, Oid lobjId, const char *filename);
The lobjId argument specifies the OID of the large
object to export and the filename argument
specifies the operating system name of the file. Note that the file is
written by the client interface library, not by the server. Returns 1
on success, -1 on failure.
Opening an Existing Large Object
To open an existing large object for reading or writing, call
int lo_open(PGconn *conn, Oid lobjId, int mode);
The lobjId argument specifies the OID of the large
object to open. The mode bits control whether the
object is opened for reading (INV_READ), writing
(INV_WRITE), or both.
(These symbolic constants are defined
in the header file libpq/libpq-fs.h.)
A large object cannot be opened before it is created.
lo_open returns a (non-negative) large object
descriptor for later use in lo_read,
lo_write, lo_lseek,
lo_tell, and lo_close.
The descriptor is only valid for
the duration of the current transaction.
On failure, -1 is returned.
The server currently does not distinguish between modes
INV_WRITE and INV_READ |
INV_WRITE: you are allowed to read from the descriptor
in either case. However, there is a significant difference between
these modes and INV_READ alone: with INV_READ
you cannot write on the descriptor, and the data read from it will
reflect the contents of the large object at the time of the transaction
snapshot that was active when lo_open was executed,
regardless of later writes by this or other transactions. Reading
from a descriptor opened with INV_WRITE returns
data that reflects all writes of other committed transactions as well
as writes of the current transaction. This is similar to the behavior
of SERIALIZABLE versus READ COMMITTED transaction
modes for ordinary SQL SELECT commands.
An example:
inv_fd = lo_open(conn, inv_oid, INV_READ|INV_WRITE);
Writing Data to a Large Object
The function
int lo_write(PGconn *conn, int fd, const char *buf, size_t len);
writes
len bytes from buf
to large object descriptor fd. The fd
argument must have been returned by a previous
lo_open. The number of bytes actually
written is returned. In the event of an error, the return value
is negative.
Reading Data from a Large Object
The function
int lo_read(PGconn *conn, int fd, char *buf, size_t len);
reads up to
len bytes from large object descriptor
fd into buf. The
fd argument must have been returned by a
previous lo_open. The number of bytes
actually read is returned. In the event of an error, the return
value is negative.
Seeking in a Large Object
To change the current read or write location associated with a
large object descriptor, call
int lo_lseek(PGconn *conn, int fd, int offset, int whence);
This function moves the
current location pointer for the large object descriptor identified by
fd to the new location specified by
offset. The valid values for whence
are SEEK_SET (seek from object start),
SEEK_CUR (seek from current position), and
SEEK_END (seek from object end). The return value is
the new location pointer, or -1 on error.
the new location pointer, or -1 on error.
Obtaining the Seek Position of a Large Object
To obtain the current read or write location of a large object descriptor,
call
int lo_tell(PGconn *conn, int fd);
If there is an error, the
return value is negative.
Closing a Large Object Descriptor
A large object descriptor may be closed by calling
int lo_close(PGconn *conn, int fd);
where fd is a
large object descriptor returned by lo_open.
On success, lo_close returns zero. On
error, the return value is negative.
Any large object descriptors that remain open at the end of a
transaction will be closed automatically.
Removing a Large Object
To remove a large object from the database, call
int lo_unlink(PGconn *conn, Oid lobjId);
The
lobjId argument specifies the OID of the
large object to remove. Returns 1 if successful, -1 on failure.
Server-Side Functions
There are server-side functions callable from SQL that correspond to
each of the client-side functions described above; indeed, for the
most part the client-side functions are simply interfaces to the
equivalent server-side functions. The ones that are actually useful
to call via SQL commands are
lo_creat,
lo_create,
lo_unlink,
lo_import, and
lo_export.
Here are examples of their use:
CREATE TABLE image (
    name            text,
    raster          oid
);

SELECT lo_creat(-1);       -- returns OID of new, empty large object

SELECT lo_create(43213);   -- attempts to create large object with OID 43213

SELECT lo_unlink(173454);  -- deletes large object with OID 173454

INSERT INTO image (name, raster)
    VALUES ('beautiful image', lo_import('/etc/motd'));

SELECT lo_export(image.raster, '/tmp/motd') FROM image
    WHERE name = 'beautiful image';
The server-side lo_import and
lo_export functions behave considerably differently
from their client-side analogs. These two functions read and write files
in the server's file system, using the permissions of the database's
owning user. Therefore, their use is restricted to superusers. In
contrast, the client-side import and export functions read and write files
in the client's file system, using the permissions of the client program.
The client-side functions can be used by any
PostgreSQL user.
Example Program
The following example is a sample program which shows how the large object
interface
in libpq can be used. Parts of the program are
commented out but are left in the source for the reader's
benefit. This program can also be found in
src/test/examples/testlo.c in the source distribution.
Large Objects with libpq Example Program
/*--------------------------------------------------------------
*
* testlo.c--
* test using large objects with libpq
*
* Copyright (c) 1994, Regents of the University of California
*
*--------------------------------------------------------------
*/
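#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <unistd.h>

#include "libpq-fe.h"
#include "libpq/libpq-fs.h"

#define BUFSIZE 1024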

/*
 * importFile
 *    import file "in_filename" into database as large object "lobjOid"
 */
Oid
importFile(PGconn *conn, char *filename)
{
    Oid         lobjId;
    int         lobj_fd;
    char        buf[BUFSIZE];
    int         nbytes,
                tmp;
    int         fd;

    /*
     * open the file to be read in
     */
    fd = open(filename, O_RDONLY, 0666);
    if (fd < 0)
    {                           /* error */
        fprintf(stderr, "can't open unix file %s\n", filename);
        return InvalidOid;
    }

    /*
     * create the large object
     */
    lobjId = lo_creat(conn, INV_READ | INV_WRITE);
    if (lobjId == InvalidOid)
    {
        fprintf(stderr, "can't create large object\n");
        (void) close(fd);
        return InvalidOid;
    }

    lobj_fd = lo_open(conn, lobjId, INV_WRITE);

    /*
     * read in from the Unix file and write to the large object
     */
    while ((nbytes = read(fd, buf, BUFSIZE)) > 0)
    {
        tmp = lo_write(conn, lobj_fd, buf, nbytes);
        if (tmp < nbytes)
        {
            fprintf(stderr, "error while writing large object\n");
            break;
        }
    }

    (void) close(fd);
    (void) lo_close(conn, lobj_fd);

    return lobjId;
}

/*
 * pickout
 *    print "len" bytes of large object "lobjId", starting at "start"
 */
void
pickout(PGconn *conn, Oid lobjId, int start, int len)
{
    int         lobj_fd;
    char       *buf;
    int         nbytes;
    int         nread;

    lobj_fd = lo_open(conn, lobjId, INV_READ);
    if (lobj_fd < 0)
    {
        fprintf(stderr, "can't open large object %u\n", lobjId);
        return;
    }

    lo_lseek(conn, lobj_fd, start, SEEK_SET);
    buf = malloc(len + 1);

    nread = 0;
    while (len - nread > 0)
    {
        nbytes = lo_read(conn, lobj_fd, buf, len - nread);
        if (nbytes <= 0)
            break;              /* error, or no more data */
        buf[nbytes] = '\0';     /* terminate so buf prints as a string */
        fprintf(stderr, ">>> %s", buf);
        nread += nbytes;
    }
    free(buf);
    fprintf(stderr, "\n");
    lo_close(conn, lobj_fd);
}

/*
 * overwrite
 *    overwrite "len" bytes of large object "lobjId" with X's,
 *    starting at "start"
 */
void
overwrite(PGconn *conn, Oid lobjId, int start, int len)
{
    int         lobj_fd;
    char       *buf;
    int         nbytes;
    int         nwritten;
    int         i;

    lobj_fd = lo_open(conn, lobjId, INV_WRITE);
    if (lobj_fd < 0)
    {
        fprintf(stderr, "can't open large object %u\n", lobjId);
        return;
    }

    lo_lseek(conn, lobj_fd, start, SEEK_SET);
    buf = malloc(len + 1);

    for (i = 0; i < len; i++)
        buf[i] = 'X';
    buf[i] = '\0';

    nwritten = 0;
    while (len - nwritten > 0)
    {
        nbytes = lo_write(conn, lobj_fd, buf + nwritten, len - nwritten);
        if (nbytes <= 0)
        {
            fprintf(stderr, "error while writing large object\n");
            break;
        }
        nwritten += nbytes;
    }
    free(buf);
    fprintf(stderr, "\n");
    lo_close(conn, lobj_fd);
}

/*
 * exportFile
 *    export large object "lobjOid" to file "out_filename"
 */
void
exportFile(PGconn *conn, Oid lobjId, char *filename)
{
    int         lobj_fd;
    char        buf[BUFSIZE];
    int         nbytes,
                tmp;
    int         fd;

    /*
     * open the large object
     */
    lobj_fd = lo_open(conn, lobjId, INV_READ);
    if (lobj_fd < 0)
    {
        fprintf(stderr, "can't open large object %u\n", lobjId);
        return;
    }

    /*
     * open the file to be written to
     */
    fd = open(filename, O_CREAT | O_WRONLY, 0666);
    if (fd < 0)
    {                           /* error */
        fprintf(stderr, "can't open unix file %s\n", filename);
        (void) lo_close(conn, lobj_fd);
        return;
    }

    /*
     * read in from the large object and write to the Unix file
     */
    while ((nbytes = lo_read(conn, lobj_fd, buf, BUFSIZE)) > 0)
    {
        tmp = write(fd, buf, nbytes);
        if (tmp < nbytes)
        {
            fprintf(stderr, "error while writing %s\n", filename);
            break;
        }
    }

    (void) lo_close(conn, lobj_fd);
    (void) close(fd);
}

void
exit_nicely(PGconn *conn)
{
    PQfinish(conn);
    exit(1);
}

int
main(int argc, char **argv)
{
    char       *in_filename,
               *out_filename;
    char       *database;
    Oid         lobjOid;
    PGconn     *conn;
    PGresult   *res;

    if (argc != 4)
    {
        fprintf(stderr, "Usage: %s database_name in_filename out_filename\n",
                argv[0]);
        exit(1);
    }

    database = argv[1];
    in_filename = argv[2];
    out_filename = argv[3];

    /*
     * set up the connection
     */
    conn = PQsetdb(NULL, NULL, NULL, NULL, database);

    /* check to see that the backend connection was successfully made */
    if (PQstatus(conn) == CONNECTION_BAD)
    {
        fprintf(stderr, "Connection to database '%s' failed.\n", database);
        fprintf(stderr, "%s", PQerrorMessage(conn));
        exit_nicely(conn);
    }

    /* large object calls must run inside a transaction block */
    res = PQexec(conn, "begin");
    PQclear(res);

    printf("importing file %s\n", in_filename);
/*  lobjOid = importFile(conn, in_filename); */
    lobjOid = lo_import(conn, in_filename);
    if (lobjOid == InvalidOid)
    {
        fprintf(stderr, "%s", PQerrorMessage(conn));
        exit_nicely(conn);
    }
/*
    printf("as large object %u.\n", lobjOid);

    printf("picking out bytes 1000-2000 of the large object\n");
    pickout(conn, lobjOid, 1000, 1000);

    printf("overwriting bytes 1000-2000 of the large object with X's\n");
    overwrite(conn, lobjOid, 1000, 1000);
*/

    printf("exporting large object to file %s\n", out_filename);
/*  exportFile(conn, lobjOid, out_filename); */
    lo_export(conn, lobjOid, out_filename);

    res = PQexec(conn, "end");
    PQclear(res);
    PQfinish(conn);
    exit(0);
}