Friday, January 25, 2013

What does a Pipeline actually do?

During the development of my approach to a pipeline, I found myself digging deeper into what it is I'm actually trying to do. Turns out the word Pipeline is heavily overloaded and may be broken down into two parts: Workflow and Organization.

Workflow deals with simplifying tasks, such as how to get this 200-pound gorilla on screen before the deadline, and Organization deals with how data travels between artists during production, such as how the character setup artist deals with updating the gorilla model.

Simplifying Organization

Organization may be broken down further into two additional parts; Production Management and Asset Management. Managing Production is the art of keeping your artists in sync with the needs of production. It involves handing them the information they need in order to do their jobs, such as tasks, feedback and a deadline. It is in control of how social data flows between artists and production.

There are several products on the market today that deal with this aspect of the pipeline exclusively, namely Shotgun, FTrack and Tactic.


What is required of a Production Management system is rather similar across industries and has a large foothold in history as to how it should be approached. Asset Management, however, does not.

Asset Management is the art of directing how binary data flows between artists, such as how the gorilla model will get from the modeler to the rigger. Note the difference between this and what is going on in the workflow domain. Workflow governs how artists perform actions whereas Organization is what the actions actually do. Let's take an example.

A modeler has finished his first iteration of the gorilla model and wishes to share it with the rigger. The modeler hits "Publish" and writes down a comment. When finished, the rigger opens up the Library and Loads the model.

This is all fine and dandy, but what is actually going on under the hood? What does "sharing" mean in terms of 1s and 0s? What the modeler did was part of the workflow and what actually happened is organizational. As you might suspect, the line I'm drawing between workflow and organization is close to that of Interface versus Implementation. The interface is very similar across various cg studios, since artists will always require a way of sharing their work with others, but the implementation might not be. Sharing might happen using git commits via a central server in Malaysia or it may happen by exchanging floppy disks in the kitchen during lunch.

To sum up, here is the pipeline hierarchy of responsibility

                                /production management
                                /asset management
                            /best practices

I'll touch on workflow more once I've started implementing it. Now onto how I approached the interface and implementation of an Asset Management system. Pipi is the name. It "acts as a bridge between artists in order to aid collaboration"

Note that there is a subtle difference between simplifying "collaboration" and simplifying "organization". Organization helps Collaboration, not the other way around, and the duties of any pipeline should be to simplify organization, and as such, aid in collaboration. Asset management is one of the components of a pipeline that

The duties of Pipi can be broken down into two major components. It..
    - ..manages files and folders
    - ..presents relevant information

And it does so via a set of custom-built tools.

I will now go over each of those in more detail.

Managing files

To help users manage files, a separation between `working` and `published` files is introduced.

Let's use the example of The Ladybug. The Ladybug is a hypothetical 10 second animation produced by 10 people in a matter of 10 months. Once the script has been written and the crew picked out, the first order of business is to start creating material.


    "Hey Bob, can you start working on the ladybug model?"
Upon this request, the artist enters a working mode. He opens up the designated software for the Ladybug project, Maya, and starts saving files to his private user directory located under the asset. (more on what an asset is later)
As time goes on, Bob will start gathering quite a large number of files. That's fine; Bob is currently working within his own sandboxed area and as long as Bob doesn't have any problems with it, neither does anyone else.

Once Bob reaches v120, he decides to share his work with his co-workers. After all, Vicky has been expecting it for her Character Setup of the Ladybug.


Bob commits his work to the central repository of files, where anyone can access it. By using tools provided by Pipi, Pipi will ensure that the file lands in a common area with a common name syntax. It will also tag it with some extra information, such as who it is from, when it was committed and any comments that the author has provided.

This is referred to as Publishing

Any artist who wishes to share his or her work must first Publish the material they wish to share. This ensures that all material in the common area align with each other in terms of the various conditions they must fulfill.
    Error: Could not publish. Reason: Normals Locked!
Uh oh. Seems Bob is having some trouble with his publish. Before Bob may submit his work to the common area, he must first make sure that the normals on his models are unlocked. By fulfilling this criterion, Bob ensures that his work lives up to the standards of every other model in the common area.

This criterion is referred to as a Post-Condition

All models in the common area are guaranteed to live up to a set of post-conditions just like this one. The same applies to assets of any species.
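
To make the idea a bit more concrete, here is a minimal sketch of what a publish guarded by post-conditions might look like in Python. The names `publish`, `normals_unlocked` and `PublishError`, as well as the dictionary standing in for a model, are assumptions made for the sake of the example and not Pipi's actual interface.

```python
class PublishError(Exception):
    """Raised when an asset fails one of its post-conditions."""


def normals_unlocked(model):
    # Hypothetical check: the model is a plain dict in this sketch.
    return not model.get("normals_locked", False)


# Post-conditions per species; each entry is (failure reason, check function).
POST_CONDITIONS = {
    "model": [("Normals Locked!", normals_unlocked)],
}


def publish(model, species, author, comment):
    """Validate the model, then return the record that would be stored."""
    for reason, check in POST_CONDITIONS.get(species, []):
        if not check(model):
            raise PublishError("Could not publish. Reason: %s" % reason)
    return {"species": species, "author": author, "comment": comment}


ladybug = {"name": "ladybug", "normals_locked": True}
try:
    publish(ladybug, "model", "Bob", "First iteration")
except PublishError as error:
    print("Error: %s" % error)

# Once Bob unlocks the normals, the same call goes through.
ladybug["normals_locked"] = False
record = publish(ladybug, "model", "Bob", "First iteration")
print(record["author"])
```

Keeping the conditions in a table per species means a new rule can be added without touching the publish function itself.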

What is an Asset?

A film may be considered to be divided up into a set of Sequences and Shots, and each shot may contain one or more Assets.

An Asset is a building-block of a film

It represents an entity, such as a character or a piece of furniture, along with information about that entity, such as its name, description, author and so on. All of it encapsulated into a unified whole, called an Asset.
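
As a rough sketch, an asset along these lines could be modelled as a small data class. The field names and the `Shot` container below are assumptions chosen for illustration; Pipi's real data model may well look different.

```python
from dataclasses import dataclass, field


@dataclass
class Asset:
    """A building block of a film plus the information describing it."""
    name: str
    species: str            # e.g. "model" or "rig"; hypothetical field
    author: str
    description: str = ""


@dataclass
class Shot:
    """A shot contains one or more assets."""
    name: str
    assets: list = field(default_factory=list)


ladybug = Asset(name="ladybug", species="model", author="Bob",
                description="Main character")
shot = Shot(name="shot_010", assets=[ladybug])
print([asset.name for asset in shot.assets])
```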

Assets are at the very heart of Pipi. In fact: A shot can only consist of Assets

This means that before an animator, for instance, can make use of your beautifully sculpted model, it must first go through the rigorous set of tests and fulfill each of the post-conditions set for its species.

This enables artists to know what to expect when loading new material from Pipi and helps keep things neat.

Wednesday, January 9, 2013

Working in Context pt. 2

So how would you let your application know about the context? Well, there are a few ways. The most convenient way might be to try and find a common ground of communication between you and the application. A place that both you and the application could read from and write to.

Environment Variables

In most operating systems there is the concept of environment variables. They are a common ground for all applications on your computer, most of which only read from it, though some also write to it. You can think of it as a file on disk that everything accesses whenever the user stores a setting or an application requests one. In programming terms, it can be thought of as a singleton or global variable.

So how does one access the environment? Well, it depends on your point of entry.

From the command line in Windows 7, you can go
>>> rem Print all environment variables
>>> set
ProgramFiles=C:\Program Files
ProgramFiles(x86)=C:\Program Files (x86)

>>> rem Print a single variable
>>> set windir

>>> rem Create a new variable and print it
>>> set myVariable=MyValue
>>> set myVariable

And on both Mac and Linux/Ubuntu using bash, you can go
>>> env # Print all environment variables
LESSCLOSE=/usr/bin/lesspipe %s %s

>>> printenv XDG_CURRENT_DESKTOP # Print a single variable

>>> export myVariable=MyValue # Create a new variable and print it
>>> printenv myVariable

We, however, are mainly interested in accessing it via Python
>>> import os
>>> for key, value in os.environ.items():  # Print all environment variables
...     print("%s=%s" % (key, value))
...
LESSCLOSE=/usr/bin/lesspipe %s %s

>>> os.environ['MyVariable'] = 'MyValue'  # Create a new variable and print it
>>> print(os.environ['MyVariable'])

One variable you might already be familiar with is PATH, which is, according to wiki.. "..a list of directory paths. When the user types a command without providing the full path, this list is checked if it contains a path that leads to the command." It usually lists the directories holding executables that you would like to run from a command line of sorts.

Another useful and common one is PYTHONPATH. This is a variable that Python consults when determining which paths to search for modules during import.

So, cmd.exe uses PATH to determine what executables are available and python uses PYTHONPATH. How about we make our own dependency? Couldn't we specify a CurrentProject variable that our library could use to determine which project we are in, in addition to Maya, Nuke, or any other program that might like to know about it? In this sense, CurrentProject is a global variable, accessible by everything that is run as a child of the OS.

Global is usually not the answer, however, but there is one thing that helps us with compartmentalization. Any time you open an application that needs the environment in any way, the environment is copied into the running process. This means that modifying the environment from inside a child process merely modifies its own copy of the environment. Its own duplicate. You can think of this as a child process inheriting from its parent process.
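
This copying can be observed from Python. The sketch below is only an illustration: it uses a second Python interpreter as the child process and assumes the CurrentProject variable suggested above, with made-up values.

```python
import os
import subprocess
import sys

# Set a variable in this (parent) process.
os.environ["CurrentProject"] = "Ladybug"

# The child changes its own copy of the environment, then prints it.
child_code = (
    "import os;"
    "os.environ['CurrentProject'] = 'Gorilla';"
    "print(os.environ['CurrentProject'])"
)
child_output = subprocess.check_output([sys.executable, "-c", child_code])

print(child_output.decode().strip())   # value as seen by the child
print(os.environ["CurrentProject"])    # value still held by the parent
```

The child sees its own modification, while the parent keeps the value it had before the child was started.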

This can be both a blessing and a curse. In some cases, it would be great to modify a variable and have all applications know about it so that they can update accordingly. But there are better ways around that. For instance, whenever a change occurs, a signal could be emitted to dependent processes to update accordingly. That way, not only can you control the flow of updates, but it also helps in keeping things tidy.

What's next

So, we'd like to modify our environment to store information about the context and then have Maya et al. make use of it. How do we go about doing this?

The terminal

Via the terminal, you can read the file system and create and edit files and folders. You can also modify the environment. This, in addition to the fact that the terminal contains a full copy of all environment variables, means that we can edit the environment inside the terminal and then start an application from it. This would make the terminal a child of the OS, and the application a child of the terminal. The OS would maintain its own original of the environment, the terminal would contain a modified version and then this modified version would get passed along to Maya (which, in turn, would then make another duplicate of this environment). In fact, we could launch several processes via the terminal and each one would get the same modified copy of the environment from the parent, our terminal.

o Windows
    o Terminal
        o Maya
        o Nuke
        o Mari

Each level of this hierarchy is an opportunity to modify the environment, which means that we could separate modifications into groups related to the context in which it is getting modified. For example, the operating system could assign variables related to the overall operation of every application, independent of the user. The terminal then lets the user add to that via custom commands, such as setting the current project. Maya would then have a final copy with all of the information that it needs to operate.
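
The same layering can be sketched from Python, where each level adds its own variables before handing a copy to the next. In this sketch a second Python interpreter stands in for Maya, and the helper `launch` and the variable CurrentAsset are hypothetical names used only for illustration.

```python
import os
import subprocess
import sys


def launch(command, overrides):
    """Start `command` with a copy of our environment plus `overrides`."""
    environment = dict(os.environ)   # a copy; our own environment is untouched
    environment.update(overrides)
    return subprocess.check_output(command, env=environment)


# Stand-in for Maya: a child process that reports what it inherited.
report = [
    sys.executable, "-c",
    "import os; print(os.environ['CurrentProject'], os.environ['CurrentAsset'])",
]

# Each level builds on the one above it.
project_level = {"CurrentProject": "Ladybug"}
asset_level = dict(project_level, CurrentAsset="ladybug_model")

output = launch(report, asset_level).decode().strip()
print(output)
```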

But wait, there's more! In addition to these three levels of updating our environment, it can in fact happen at a few more levels.

o Boot
    o Windows
        o Login (per user)
            o Launch Terminal (per terminal)
                o Custom Commands
                    o Per Project
                        o Per Asset
                            o Maya
                            o Nuke
                            o Mari

As you can see, we can assign each of these levels a responsibility and have a very dynamic way of interacting with our applications, letting them know exactly what is going on with very little effort.

In the next part, I'll go through implementation and some of the problems I encountered along this path, such as how to work around the fact that a process cannot edit its parent's environment.

Tuesday, January 8, 2013

Working in Context pt. 1

"Hey Maya, how are you?", said the Publisher Tool
"Oh, hey. It's all right. What's up?", said Maya

"Nothing, just wanted to check some stuff with you if that's okay"

"What asset is your user working with?"

"Cool, do you also know which variant of that asset?"
"It's called animationSetup"

"Would you mind making me a burger?"
"Sorry, not in my job description"

"Oh. Okay, thanks!"

Having your application know which context it is working under can be a real treat sometimes. It allows you to compartmentalize your work and keep things clean while also giving you that feeling that you are within a larger whole, that you're in a team, not only physically and emotionally, but also digitally. The feeling that whatever you do under this "digital umbrella" is somehow recorded and monitored and thus progresses the global status of the project.

In the example conversation above, between a Publisher and Maya, the Publisher asks Maya for information regarding what is about to get published. This relieves the user from having to specify it once for each publish and enables them to instead specify it per-application or perhaps once per several applications. With the application informed, the user may simply hit "Publish", enter a comment about what has changed, and click "Ok". Incrementing versions and determining where the physical file eventually ends up on disk are left as implementation details, where they belong.
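
A minimal sketch of such a conversation, assuming the context is kept in environment variables, might look like this. The variable names CurrentAsset and CurrentVariant and the functions `current_context` and `publish` are made up for the example.

```python
import os


def current_context(environment=None):
    """Collect what the surrounding session already knows."""
    environment = os.environ if environment is None else environment
    return {
        "asset": environment.get("CurrentAsset"),
        "variant": environment.get("CurrentVariant"),
    }


def publish(comment, environment=None):
    """The user only supplies a comment; the rest comes from the context."""
    context = current_context(environment)
    if not all(context.values()):
        raise RuntimeError("No asset or variant set for this session")
    return dict(context, comment=comment)


# A plain dict stands in for the session's environment here.
session = {"CurrentAsset": "ladybug", "CurrentVariant": "animationSetup"}
print(publish("Fixed the wing weights", environment=session))
```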

Friday, January 4, 2013

Working in Context pt. 0

Why use Cygwin when Windows already has a built-in terminal?


Having an application know under which context to operate is essential to providing a truly immersive experience for the user. Whenever an action is being performed, the action should happen within the context that the user is working under. Take publishing for instance. The user may be publishing a variant multiple times under the same session. It makes little sense for him to specify which asset is being published each time; the surroundings should be able to provide such information. The only information relevant for the user to specify is what has been changed since the last publish. Saving is another example. Whenever the user saves a working file, they should only have one place to go and there should be no mistake as to where that place is. This helps the user with encapsulating parts of his work that relate to a certain building-block of the film. It can also be extended to providing sandboxed areas under each shot or variant for whenever the user feels the need to experiment.


So how can we inform the application of the context? There are two ways: you can either start the application and then tell it about the context, or you can tell the parent of the application, the operating system, about the context and boot up the application under that context. The application can then ask the os for information it does not already know. The second way also allows us to have multiple applications share the same context. Say for instance you wish to work on shot 23. You set it once, and boot up Maya, Mari and Nuke to work on various aspects of the shot simultaneously. Since the context exists outside the scope of each application, each application may ask the higher source for information and may modify that source collectively. This keeps them synchronized with each other and may even allow a level of communication between them.

I chose to support the latter of the two approaches.


How do we inform the os of the context? One word, "environment variables". The os has a common area for storing both temporary and persistent metadata. The metadata may be altered either globally to affect everything always, or per instance of say a terminal.

A terminal, such as the built-in Cmd.exe, is booted up within this context (read "set of environment variables") and stores a copy of it internally. Whenever we make any change to the context, only this instance knows about it. That means that once we boot up Maya from within this terminal, Maya will also know about it, but the next terminal you boot up will not (unless you boot it up from within this terminal). This lets us encapsulate modifications to the higher source (the os) whilst still allowing us to keep multiple applications living under the same encapsulation (namespace).


Cmd.exe has its drawbacks however, which is why I chose Cygwin. The main reasons being:

Considering the pipeline should work across any platform, it makes sense to conform Windows to an otherwise working standard, rather than the other way around. Knowledge gained from using Cygwin also translates transparently to other operating systems such as Linux and OSX, as they both have access to bash, making any future transition effortless. Additionally, many studios have already adopted Linux for their productions. Using Cygwin would thus allow us to benefit from an existing knowledge pool whilst also facilitating the transition between studios for artists.

Built-in apps
Cygwin, like Linux, allows for the use of editors within the terminal, mainly text editors. That allows you to quickly modify files that don't need the feature-richness of an external app.

In addition to:

Command line scripting
Custom commands can be made by both user and administrator to allow for easy access to common data such as project and current shot being worked on.

Per-project environment variables
Users can run a bash script that sets an environment variable PROJECT to correspond to the project currently being worked on. PROJECT can then be used to look up paths relevant to this project, such as which bash script is used to launch the correct version of applications such as Maya.

The latter ones are possible using Cmd.exe as well; preferring one over the other is a matter of style.


One of the things about Cygwin that separates it from being native is that it uses its own directory scheme. C:\ does not exist and is instead kept under a directory like /cygdrive/c/. This may seem silly at first but I'm guessing their reasoning is the same as mine. Facing the choice between either conforming all of Linux functionality to work under Windows paths, or altering Windows to work with all of Linux, I'm guessing it was the right thing to do.

Separating Data From the Information About It

In any collection that includes a binary chunk of data and its related metadata, it can be convenient to store each in separate locations, such as having files in a hierarchical format on a server and metadata in a database table.

This way, you have two independent sources of information that relate to one another. You could store information in the table that you wouldn't as easily be able to store in the file itself, especially when dealing with multiple file-types that share a common usage.

The problem is how to know which piece of the database table refers to which file. This is called referential integrity. Either each cell keeps a record of which data it represents, or each piece of data contains a reference to the cell in which its metadata is stored. "Integrity" thus refers to how strong this link is.

This link is rather important. Let's take an example.

On your server, you have a file. A Maya scene file representing a Character Rig. This rig has properties, such as who made it, when and how it was made. In addition, this rig exists in more than a vacuum. As a building block, it has been built out of other building blocks such as models and perhaps even other rigs. The rig contains links to other rigs and thus maintains its own referential integrity. More on that later.

Then, in your database table, at position A5-C5, you have stored its creator and date at which it was created and a reference to its location on disk.

|5 |  Marcus      | 2012-04   | c:\path\   |

Pretty cool huh? This way, I could look up the path to my binary data in this table and retrieve information about it not commonly stored by default file systems. Anything could be put in this table!


Now imagine what would happen if we were to move the file to c:\anotherpath\. The link would be broken and the metadata would no longer know what binary file it relates to.

We've got a couple of options in this case. We could:
1. Prevent files from being moved
2. Adjust cell when moving file
3. Automate cell-modification when moving the file

Preventing files from being moved is the easiest approach. Simply ensure that files don't need to be moved and enforce that fact by setting files and directories to be read-only. In many cases, this is sufficient.

Manual Intervention
Whenever you move a file, simply update the cell. You could have a web-based tool for managing your database, making the editing not so bad. Moving more than one file on a regular basis, however, can make this process rather tedious. That's when automation can help.

Whenever you move a file, simply signal the database to follow. This requires some work. If you are on Windows, you know how to move files via Windows Explorer. You could set up a callback from Explorer to your database, letting it know whenever a relevant file is being moved so that it can act accordingly.

A more common approach however is to only allow files to be modified by your own tools. That is, instead of teaching Windows Explorer to talk to your Database, let your Database and your Modification Tools speak the same language to begin with.

Having all of your tools under the same roof has its advantages. For instance, a callback from Windows Explorer could not send information about why the change was made, something that could be critical in a collaborative environment. On the other hand, it could help prevent files from being moved by artists who don't have write-access to certain files, something you would get for free due to the tight integration with the current user and his permissions.
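
Here is a minimal sketch of such a tool, one that moves a file and updates the database in the same step while also recording why the move was made. It assumes an SQLite table named `files`; the table layout, the function name `move_asset_file` and the temporary files are all made up for the example.

```python
import os
import shutil
import sqlite3
import tempfile


def move_asset_file(connection, source, destination, reason):
    """Move a file and update its metadata row in the same operation."""
    shutil.move(source, destination)
    connection.execute(
        "UPDATE files SET path = ?, last_change = ? WHERE path = ?",
        (destination, reason, source),
    )
    connection.commit()


# Throwaway setup so the sketch can run on its own.
root = tempfile.mkdtemp()
old_path = os.path.join(root, "rig.ma")
new_path = os.path.join(root, "rig_moved.ma")
with open(old_path, "w") as handle:
    handle.write("// placeholder scene")

connection = sqlite3.connect(":memory:")
connection.execute(
    "CREATE TABLE files (creator TEXT, created TEXT, path TEXT, last_change TEXT)"
)
connection.execute(
    "INSERT INTO files VALUES (?, ?, ?, ?)",
    ("Marcus", "2012-04", old_path, ""),
)

move_asset_file(connection, old_path, new_path, "Reorganized rig folder")
row = connection.execute("SELECT creator, path FROM files").fetchone()
print(row[1] == new_path, os.path.exists(new_path))
```

Because the tool is the only thing that moves files, the path stored in the table and the path on disk change together.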

Synchronize utilizing existing tools or roll your own?

If a user is allowed to use common tools such as Windows Explorer to move files, should they also be allowed to rename files this way? How about modifying the properties for a file from executable to non-executable? Integrating an existing tool has the drawback that you might end up muddling the interface to your toolchain. The user needs to know what is supposed to be possible and what is not, he needs to know the constraints of the system and only work within them.

By integrating a common, yet comprehensive and existing toolset into your application, you need to ensure that all bases are covered by either clearly stating what is allowed up front or informing the user whenever an unsupported action is taking place. As you might suspect, this requires you to have a comprehensive understanding of what the Explorer is able to do in the first place and ensure that whatever it is able to do is handled appropriately.

Windows Explorer Feature List
Read (incl. their properties)
Update (move, rename)

Another way of ensuring that your data and metadata are synchronized is to only allow file modification through the use of custom tools. This means that for whatever feature you wish to enable, you must explicitly build it. This makes the support for it rather fool-proof in that you give yourself the ability to only enable features which are easily supported. It does however mean that the more features you require, the more code you must write and the more code you write, the more room there is for bugs to creep in.

Is there another way?

This brings me to the climax of this post. Keeping your data synchronized with your metadata is always a headache and there will always be times where there is a mismatch, either due to errors in your program or human-error.

Let me show you the way I chose to approach it in the Asset Library.

What do we have? Our assets all have additional information about them, such as change-logs and relations to other assets in the project, but most importantly, information about how they should be treated. An Asset has no files; it is merely a container for various types of character setups, textures, shaders etc. This means we cannot rely on file format to tell us how to handle it; we must give it an identity through other means. Thus, we have binary data inside of groups along with its corresponding metadata and we would like to keep them synchronized.

In Windows, you can attach additional information to a file such as dimension, frame-rate, comments and even thumbnails. This is used by a few file formats to make browsing media a more immersive experience in apps such as Explorer. It is however, to the best of my knowledge, limited to only a few types of metadata on only a few types of file formats, unless you go more deeply, in which case things seem less stable. In addition, this would require reading and writing entire files when modifying metadata, which is a big deal in most scenarios but an especially big deal when modifying large pointcaches of several hundreds of gigabytes. Not sure how this works on folders.

What would be nice is if we could somehow store the metadata directly in the binary file. Maya files for example may be stored in an ASCII format, thus allowing us to augment them with additional information inside the file in the form of comments. Nuke also allows for this. However, some formats do not, and thus doing it this way would limit us to augmenting only certain file formats and not others. Additionally, it suffers the same fate as the previous approach in that large files need to be read and written whenever a change occurs. This also wouldn't be that easy with folders.

Another way is to keep them separate in terms of files, but store them together somehow. This would allow us to benefit from a database approach whilst not having to worry too much about synchronizing them since they will always be together. 

How can we bundle them? Well, we want them to always be together and thus separating them should involve an obvious and difficult process of either manual or tool-based intervention. How about storing the metadata in a .txt file along with the binary file in an uncompressed archive of sorts, such as .zip? This would allow assets to be passed around easily without running the risk of losing their metadata and without suffering the performance issues of reading from or writing to compressed archives. Reading and writing to them is quite easy as most operating systems allow for zip archives to be treated as regular folders, with the addition of them having a different icon. In our case that is a good thing, as it highlights the fact that you are within a bundle that should not be dissolved.

In addition, it would allow us to give the archives custom file extensions, such as .asset for archives of type Asset.
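
As a small illustration of the bundle idea, Python's `zipfile` module can write an uncompressed archive under any file extension. The file names, the `metadata.txt` layout and the helper names below are assumptions made for this sketch.

```python
import os
import tempfile
import zipfile


def write_bundle(bundle_path, binary_name, binary_data, metadata_text):
    """Store binary data and its metadata side by side, uncompressed."""
    with zipfile.ZipFile(bundle_path, "w", zipfile.ZIP_STORED) as bundle:
        bundle.writestr(binary_name, binary_data)
        bundle.writestr("metadata.txt", metadata_text)


def read_metadata(bundle_path):
    """Read only the metadata member; the binary member is left alone."""
    with zipfile.ZipFile(bundle_path) as bundle:
        return bundle.read("metadata.txt").decode("utf-8")


bundle_path = os.path.join(tempfile.mkdtemp(), "MyAsset.asset")
write_bundle(bundle_path, "character_setup.ma", b"placeholder",
             "author: Marcus\n")
print(read_metadata(bundle_path))
```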

This allows us to store any metadata independently of the type of binary data or group, in addition to the tight coupling between the two. It does however require us to read and write potentially large files whenever a change occurs. It also requires us to deal with nested archives.

o PuffAdder
    o texture
    o character_setup
    o simulation_setup

PuffAdder is a group with metadata, and character_setup is binary data with additional metadata, which means we end up with a bundle of bundles. Ultimately, we would end up with the entire data-set being a large bundle of bundles of bundles..

o MyProject.project
    o MyDatabase.database
        o MyAsset.asset
            o MyVariant1.variant
            o MyVariant2.variant
How about we turn the situation around? Instead of relying on the file telling us what type of data we are dealing with, how about letting the folder have a type? If the folder is a type, that means its content is the data just as the ones and zeros are the data of a binary file (read "container"). It would mean metadata and binary data could live as equals under the same roof, being accessed in a common fashion, without causing performance penalties when accessing either of them separately.

A folder could have dots in its name, essentially enabling them to have an extension just like files. However, as a matter of preference, adding dots to folders clutters their representation in an explorer as they would be more difficult to distinguish from files other than by looking at their icon. Of course, the library could strip the name during their display if one wanted to go that route without the visual clutter.

Thus, the asset library expects its library of assets to be formatted as follows:
- Each folder has a metadata file
- Each folder may specify a type other than the default via its metadata
- Each folder may have additional folders

o MyProject
    - metadata
    o MyDatabase
        - metadata
        o MyAsset
            - metadata
            o MyVariant1
                - metadata
                - binary data
            o MyVariant2
                - metadata
                - binary data

Having an additional file per folder clutters their interface, and thus we set the metadata to be "hidden"
o MyProject
    o MyDatabase
        o MyAsset
            o MyVariant1
                - binary data
            o MyVariant2
                - binary data

Now we have a hierarchy of files in which each folder knows what type it is without cluttering its interface.
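
A rough sketch of how a library could read such a typed folder is shown below. It assumes the metadata is a JSON file named `.metadata`; both the name and the format are made up here, and on Windows the file would additionally need its hidden attribute set, which this sketch does not do.

```python
import json
import os
import tempfile

METADATA_NAME = ".metadata"   # hypothetical name for the per-folder file
DEFAULT_TYPE = "folder"


def write_metadata(folder, **fields):
    """Create the folder if needed and store its metadata inside it."""
    os.makedirs(folder, exist_ok=True)
    with open(os.path.join(folder, METADATA_NAME), "w") as handle:
        json.dump(fields, handle)


def folder_type(folder):
    """Return the type declared in the folder's metadata, if any."""
    path = os.path.join(folder, METADATA_NAME)
    if not os.path.exists(path):
        return DEFAULT_TYPE
    with open(path) as handle:
        return json.load(handle).get("type", DEFAULT_TYPE)


def visible_content(folder):
    """List a folder the way the library would, without the metadata file."""
    return sorted(name for name in os.listdir(folder) if name != METADATA_NAME)


project = os.path.join(tempfile.mkdtemp(), "MyProject")
asset = os.path.join(project, "MyDatabase", "MyAsset")
variant = os.path.join(asset, "MyVariant1")
write_metadata(project, type="project")
write_metadata(asset, type="asset")
os.makedirs(variant)

print(folder_type(project), folder_type(asset), folder_type(variant))
print(visible_content(asset))
```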

How do I deal with keeping binary data and metadata together? The only access point to the file system is through custom tools which only allow certain features to be performed. Artists use the library and the library is only allowed to read files. Supervisors use an Editor which may edit files and may expose more information about the system.

Some topics were left out of this post, such as reading and writing metadata, the types of metadata that may be stored and how to store them.

Tuesday, January 1, 2013

Composition for Grouping Attributes

In my previous post, I briefly mentioned that I stored instance attributes together with composed instances whose attributes I could fetch via instance.composited_instance.attr

Is composition good for this sort of thing?

In books, you might find composition to be used for grouping functionality by letting smaller objects form larger objects via nested encapsulation. E.g. instead of storing name, home_number and work_number under Person, you could instead choose to store home_number and work_number under a Contact object that is then stored under Person.

o Person
    o .name
    o .contact
        o .home_number
        o .work_number
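
In Python, the structure above could be sketched like this. The phone numbers and the name are placeholders.

```python
class Contact:
    """Groups the attributes that describe how to reach someone."""

    def __init__(self, home_number, work_number):
        self.home_number = home_number
        self.work_number = work_number


class Person:
    def __init__(self, name, contact):
        self.name = name
        self.contact = contact   # composed instance


you = Person("Bob", Contact(home_number="555-0100", work_number="555-0101"))
print(you.name)
print(you.contact.work_number)
```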

So what's the benefit of doing this? Well, besides the benefits of composition in general, you get to stow away accessors that might not be directly related to the overall operation of the object into less obvious paths, thus encouraging the use of some parts of the functionality over others. The directly available accessors are intuitively more comfortable to use than going six levels deep for a simple operation.
>>> # This
>>> # vs. this
>>> # Which do you prefer?
The latter becomes especially neat once you start gathering functionality that relates to the bigger picture of the parent object but isn't really necessary for everyday use. Remember, the idea is to make the interface as simple, yet as complete, as possible.

>>> you.countries_visited()
['france', 'russia', 'usa', 'finland', 'sweden']
['france', 'russia', 'usa', 'finland', 'sweden']
Storing that level of detailed functionality directly in the parent object can easily obfuscate its interface. That's why it can be beneficial to stow it away somewhere where, when accessed, the user knows he's dealing with less common functionality.

An alternative to composing accessors is to collect external sets of functionality. Rather than storing all functionality inside an object and its composed objects, you can choose to store it in separate functions, either in the same file or in different modules or packages.

>>> you.nearby_cinemas()
['canary wharf cineon 1.5km', 'north greenwich odeon 0.4km']
>>> externalmodule.position.nearby_cinemas(you)
['canary wharf cineon 1.5km', 'north greenwich odeon 0.4km']

As you can see, `you` is no longer responsible for processing this command. Rather, an external module takes care of the necessary steps in order to produce the result. This can lead to deep and thorough functionality whilst still providing a simple interface for `you`. The user now has to think about what tool is right for the job as opposed to figuring out how to use one tool for everything.
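
A minimal sketch of such an external function follows. Everything in it is hypothetical: positions are plain (x, y) pairs in kilometres, the cinema coordinates are invented, and the function returns names only rather than the formatted strings shown earlier.

```python
import math


class Person:
    def __init__(self, name, position):
        self.name = name
        self.position = position   # (x, y) in kilometres, hypothetical


def nearby_cinemas(person, cinemas, radius_km=2.0):
    """Free function: it reads from the person but lives outside the class."""
    found = []
    for name, (x, y) in cinemas.items():
        distance = math.hypot(x - person.position[0], y - person.position[1])
        if distance <= radius_km:
            found.append((distance, name))
    # Closest first.
    return [name for distance, name in sorted(found)]


cinemas = {
    "north greenwich odeon": (0.0, 0.4),
    "canary wharf cineon": (1.5, 0.0),
    "some distant cinema": (9.0, 3.0),
}
you = Person("you", (0.0, 0.0))
print(nearby_cinemas(you, cinemas))
```

The `Person` class stays small, and anything that only needs a position can reuse the same function.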