Bad Data

Web Application Abstraction Layers

Tue Mar 15 08:16:06 2016

We're doing it wrong. Unix is powerful because of simple abstractions and interfaces. All of our web apps are far too complex. We should get back to a simple notion: in Unix, "everything is a file". We don't need that as such, but what people care about is generally bags of bytes, which can be called files. We also care about who can access our information, and how.

We should be thinking about things in terms of "bags of bytes", "people/groups/roles", and "permission capabilities". Everything else is some abstraction over these. I posit that if those three were the only "real" things, and all applications were built as abstraction layers reducible to those three, it would be easier and faster to build new applications, and they would be more secure. If "can this person see these bytes" were baked in at the lowest level, it would be a lot easier to verify the security of a person's information, and a lot harder to screw it up. Similarly for any other capability: "can modify", "can delete", etc.
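To make the "baked in at the lowest level" idea concrete, here is a minimal sketch (all names hypothetical) of a store where every read goes through a single permission check, so an application can't accidentally serve bytes a role isn't entitled to:

```python
acl = set()  # set of (role, file_id, capability) tuples

def grant(role, file_id, capability):
    acl.add((role, file_id, capability))

def can(role, file_id, capability):
    return (role, file_id, capability) in acl

def read_file(role, file_id, store):
    # The only way to get at the bytes -- access control is not optional.
    if not can(role, file_id, "read"):
        raise PermissionError(f"{role} may not read {file_id}")
    return store[file_id]

store = {"f1": b"hello"}
grant("alice", "f1", "read")
assert read_file("alice", "f1", store) == b"hello"
```

The point isn't the trivial set lookup; it's that there is no code path to the bytes that bypasses the check.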

Some notation: a comma-separated list in square brackets is a tuple. E.g. "[file, role]" would be a data item associating a file and a role. The meaning of the tuple is dependent on the particular application using it.


file: a bag of bytes with a MIME type, metadata, notes, etc. Whether something is a backup, image, music, video, etc. is all a view into different types of files.

message: text with a source and destination; the text can be viewed as a file. That is, a "message" is a "file" with mandatory source and destination metadata.

role: a user, group, etc.

capability: a label on an object allowing a role to take a particular action. Required capabilities and their semantics are particular to a given application, but a set of standard names and intended interpretations should be developed.

acl entry: [role, file, capability]

"ownership" of a file is probably an acl entry. It's entirely possible for more than one person to "own" a particular bag of bytes. different people could have different sets of metadata for a file.

file list: an ordered set of files

comment: a [file, role, file] (the comment file, the commenting role, and the file commented on)
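The base objects above might be sketched as plain records. This is just one hypothetical encoding (all names are mine, not part of the design):

```python
from collections import namedtuple

# Hypothetical record types for the base objects described above.
File = namedtuple("File", ["id", "data", "mime_type", "metadata"])
Role = namedtuple("Role", ["name"])  # a user, group, etc.
AclEntry = namedtuple("AclEntry", ["role", "file", "capability"])
Comment = namedtuple("Comment", ["file", "role", "target"])  # comment file, author, file commented on

# A message is just a file with mandatory source/destination metadata.
def make_message(file_id, data, source, dest):
    return File(file_id, data, "text/plain",
                {"source": source, "destination": dest})

msg = make_message("m1", b"hi", "alice@example.com", "bob@example.com")
assert msg.metadata["source"] == "alice@example.com"
```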


Under this ontology, for example:

email: a message whose source and destination are email addresses, and where the system attempts to hand off the message. Additionally, the default is that the associated files are readable only by the source and destination roles.

blog: a collection of files (messages?), timestamped, maybe tagged, etc., made viewable in descending timestamp order. I.e. a set of [file, timestamp], possibly [file, role, timestamp] for a group blog, and [file, role, file] for comments.

flickr: a view of image files made viewable to the public.

twitter: a blog with a very short maximum file size for blog posts.

youtube: a video blog supporting file lists and comments

calendar: a set of files, each containing a particular calendar item. Could be ics files, could be something else.

usenet: a public set of messages where the source is a role and the destination is a set of newsgroups. The message format is constrained by the relevant RFCs.
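As one worked example, the blog above reduces almost entirely to tuple bookkeeping. A hypothetical sketch of a group blog as a set of [file, role, timestamp] tuples, rendered newest-first:

```python
# Hypothetical sketch: a group blog as a set of (file_id, role, timestamp)
# tuples, rendered in descending timestamp order.
posts = set()

def publish(file_id, role, timestamp):
    posts.add((file_id, role, timestamp))

def front_page():
    # Newest first, as the blog definition requires.
    return sorted(posts, key=lambda p: p[2], reverse=True)

publish("post1", "alice", 1000)
publish("post2", "bob", 2000)
assert [p[0] for p in front_page()] == ["post2", "post1"]
```

Twitter, under this ontology, would be the same code plus a maximum-size check in publish; flickr would swap the timestamp sort for an image-file filter with a world-readable acl entry per file.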

Implementation Notes

The primary interface needed for functionality is the ability to manipulate the base objects. Thus the ability to create roles, files, acls, file lists, etc. is paramount. In theory, once that exists with a public API, other things can be built on top of it, possibly creating new tuple types to support a particular application.

"metadata" should probably just be json at this point, and an application with full permissions should probably be allowed to create new named tuples. For example, a "blog" application might connect to the backend API at install time and ask for a new tuple of [file,role, timestamp] to be created and named "posts", and a second one of [file,role,file,metadata] named "comments" the metadata on the comments intended for an approval system, but not otherwise interpreted by the system. Some method of indexing json/metadata might be needed.

Note that a system wouldn't need to use JSON for metadata; it could roll its own. But if it's JSON, the system can probably be taught to index it easily.

Thoughts on modification of data: I think it's best to think of data as immutable, like a number. You can't "modify" 37; it just is what it is. So if you are making (say) an image editing program, a particular edit doesn't modify the original as such, but rather creates a new file (i.e. bag of bytes), and possibly associates it with the same tags/name/revision chain, or whatever the image application wants.

The application could, if it wanted, do some sort of garbage collection and delete the original, though it would make more sense to disassociate it from the name instead, since the original could conceivably be in use by some other application. This is analogous to hard links in a Unix system: you can't really delete a file as such, but it will be removed after the last link is removed.

Some system to allow different applications to use the same link sets would be needed. If you have a picture gallery application and an image editing one, you want them both to work on the same set of pictures by name. It might be worth having an explicit notion of one file being a revision of another at a low level in the system. I'm not really sure yet how to solve this problem.
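One way to picture the immutable-bytes-plus-revision-chain idea is a content-addressed store, where an "edit" creates a new blob and a name points at the chain. A hypothetical sketch:

```python
import hashlib

# Hypothetical content-addressed store: bytes are immutable; an "edit"
# creates a new blob, and a name holds the revision chain.
blobs = {}   # content hash -> bytes
names = {}   # name -> list of hashes, newest last

def put(data):
    h = hashlib.sha256(data).hexdigest()
    blobs[h] = data  # storing identical bytes twice is a no-op
    return h

def save_as(name, data):
    names.setdefault(name, []).append(put(data))

def current(name):
    return blobs[names[name][-1]]

save_as("cat.jpg", b"original image bytes")
save_as("cat.jpg", b"cropped image bytes")  # an "edit": new blob, same name
assert current("cat.jpg") == b"cropped image bytes"
assert len(names["cat.jpg"]) == 2  # the original blob is still there
```

A gallery app and an editor sharing the names table would "work on the same set of pictures by name", and garbage collection becomes: delete any blob no chain references, which mirrors the hard-link analogy above.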

Posted in / software

Comment on this post.