Wednesday, March 13, 2013

Folder Layout pt. 2

Keeping your socks in your boots

It may be very beneficial to ensure that all data related to any other data are kept together. Like sequences can contain shots, a shot may contain more than simply the users working directories, such as its description, length, padding, camera information, frame rate, media such as video-references for animators or audio for lipsync.

Tools already exist for this purpose. Some examples I already mentioned in the previous post but I'll repeat them here for completeness. 

I really recommend checking those out, they've got videos showing some fundamental functionality and they are used today by hundreds of studios. I'll refer to these tools as Offline Asset Management, or OFAM for short. They provide two major concepts rolled into one - tracking assets and shots along with assigning tasks and maintaining schedules of projects. The tracking part enable users to associate data that would otherwise be difficult to associate via a simple hierarchy of files and folders, along with the ability to add links other than simple hierarchical ones, such as their relationships to users or even other data in the database. They do this by separating the binary data from its related information.

Separating binary data from related information

The caveat of this approach however is that it is external to the files they augment. Augmenting data, aka Metadata, is stored separately from their associated assets, shots and sequences in a database table such as SQL.

This means that whenever a change happens on one side, the other side needs to get updated somehow. This introduces the concept of syncing and is generally handled like this - a change in Maya, say a publish of a character, will trigger an event (referred to as a callback) which performs the corresponding action on the server. Data generally only travels in this one direction as any change in any of an objects metadata, such as shot description, has no relevance within, in this case, Maya (or does it? I'll dive into this more later!). For example, a new asset is created from Maya. Maya triggers a callback and the equivalent asset is created within the OFAM.

Psuedo-code of mirrored command

Some data will remain unique on either end, such as the binary scene file, while other data, such as camera information, will overlap and cause duplicity.

Overlapping data means duplicated data

In order to store the camera information of a shot, the shot itself and all of its related hierarchy must exist within the database, if only to host that information, so that users have a way to associate it and to search for it. However the hierarchy must also exist to host the binary files.

While the overlap is very small in terms of megabytes on disk, we aren't concerned with disk space issues. What matters is the information they represent and our ability to efficiently find it.

The overlap applies to both shots, sequences and assets and also any media associated. Some media are only accessible via the OFAM, such as a reference material, thumbnails, playblasts etc., but must yet be stored in binary form somewhere on disk. This means that you end up having an additional hierarchy of binary files, besides the original binary files themselves.

Superfluous complementing data

Tuesday, March 12, 2013

Folder Layout pt. 1

In a perfect world, the user will never have to know that there ever was a hierarchy of files on some disk. What matters is the data and it's level of discoverability.

Technically though, files and folders are still the most efficient way of storing heavy data on disk and so for a tool to successfully hide the details of that implementation, great care must be taken when considering their layout. A cg production generally consists of thousands or hundreds of thousands of files, some of which are tightly related, such as sequences of images, and others which relationship is less obvious, such as which animation is using which character setups.

That is why it is important to consider exactly what files will exist within this hierarchy and exactly how they relate to one another. Lets dive in with some examples.

Sequences may contain shots, shots may be worked on by one or more users and on those shots users may work in one or more applications.

Balance readability with duplicity

 You may find it odd to couple sequence and shot, that seem quite obviously interconnected, with that of user and application, but later I'll head into more details why they can come in handy in parts of the tool that deals with a users sandboxes and working spaces. The ideal layout is the one that produces the most appealing balance between readability and duplicity. In this layout, besides the logical benefits of picturing it in ones mind, practical considerations occur such as removing one shot will results in the users working directory to be deleted as well, which is what the user would come to expect.

Consider to following hierarchy.

In this layout, user is the host of sequences. A project is bound to have more than one user and since shots cannot exist without a sequence, and a film is nothing without shots, it is safe to say that at least one sequence will also exist and thus each sequence would get duplicated underneath each user.

As you can see, "sequence 1" is repeated once for each user which may or may not be desirable. Each shot within each sequence is also duplicated, producing an exponential multitude of duplicates, thus violating our rule of balancing readability with duplicity. A potential benefit of this layout is that finding the associated sequences to a user is as easy as looking at the users folder root. However, minor filtering requirements like these there have other, better, alternatives that doesn't depend on their physical layout on disk.

If you look at the first hierarchy, it also produces duplicates. Underneath each shot is a folder for each user working on the shot. If a user has worked on more than one shot, that folder would have a duplicate under each one. This is where one needs to decide on which approach is better suited for which task, as there will most often exist a trade-off between.

In summary, the hierarchical layout of files can exist in any order. Ideally the more general classes of types, such as shots and sequences, come first, followed by the more specific ones, such as images and pointcaches.