Burst To The Public Cloud To Expand Your Rendering Power

VFX and animation rendering is used to build scenes in pretty much every movie we watch on the big screen today! From just a few VFX scenes to a full-length animated movie, major studios and VFX shops all own some render nodes in their data center or machine room. Those render farms vary in size from just a couple of render nodes on a shelf to tens of thousands of compute nodes racked in a data center. Those rendering compute resources are often pushed to their limits, so studios and VFX shops are looking at new ways to augment their compute power. In this blog, we will discuss how the public cloud can help augment your on-premises compute power, and we’ll highlight the key aspects that you should consider to make your cloud bursting project a success.

 

A HIGHLY RESOURCE-INTENSIVE COMPUTE JOB

Rendering is a very demanding process involving thousands of CPUs or GPUs to render the hundreds of thousands of frames that can constitute a modern animation feature. As a result, when limited to their on-premises compute resources, studios and VFX shops often struggle to meet very tight deadlines. These limitations present real challenges to their daily operations, and the challenges are amplified by the fact that multiple projects with differing delivery deadlines must often be managed simultaneously. Compute resources must be shared between artists working on different projects with different constraints, and this often leads to a lot of frustration.

To put things in perspective, let’s do some simple math to understand the compute power required to render a full-length animated movie. A 90-minute movie at 24 frames per second (FPS) would be a total of 90 x 60 x 24 = 129,600 frames. Some complex frames in an animated movie could take up to 5 days to render (e.g. Disney’s Frozen). Let’s take an average of 24 hours to render a single frame. Using a single render node, it would take 129,600 x 24 = 3,110,400 hours, i.e. 129,600 days, i.e. about 355 years! So how do they do it? They parallelize the frame rendering process across multiple render nodes. We’re talking tens of thousands of render nodes with multiple CPUs/GPUs. This article talks about the challenges Disney encountered while producing Big Hero 6.
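The arithmetic above can be reproduced with a short Python sketch. The 24-hour average per frame is the figure assumed in this post; the function names and the idea of a deadline in days are illustrative assumptions, and the node count assumes perfect parallelism with one frame per node at a time.

```python
def total_frames(runtime_minutes: int, fps: int) -> int:
    """Number of frames in a movie of the given length."""
    return runtime_minutes * 60 * fps


def single_node_hours(frames: int, hours_per_frame: int) -> int:
    """Total render time if one node renders every frame in sequence."""
    return frames * hours_per_frame


def nodes_needed(frames: int, hours_per_frame: int, deadline_days: int) -> int:
    """Minimum node count to finish by the deadline (idealized, no overhead)."""
    work_hours = frames * hours_per_frame
    deadline_hours = deadline_days * 24
    # Integer ceiling division avoids floating-point rounding surprises.
    return (work_hours + deadline_hours - 1) // deadline_hours


frames = total_frames(runtime_minutes=90, fps=24)
hours = single_node_hours(frames, hours_per_frame=24)

print(frames)                         # 129600
print(hours)                          # 3110400
print(hours // 24)                    # 129600 days on a single node
print(nodes_needed(frames, 24, 180))  # nodes for a hypothetical 180-day deadline
```

Real farms need more nodes than this lower bound suggests, since frames are re-rendered many times during production and scheduling is never perfectly efficient.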

The following graphics illustrate some real-world rendering project timelines with the number of cores required to render the projects.

This graph shows an ideal example of a studio timeline working on 3 feature films A, B and C, where the demand for rendering cores never exceeds the available on-premises resources.

This graph represents the reality of such a timeline where a studio works on 3 parallel feature film projects A, B and C with similar delivery dates (as opposed to being evenly distributed over time). In such a scenario, the demand for rendering cores exceeds the available on-premises resources if the studio needs to meet the planned delivery dates.
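To make the gap between demand and capacity concrete, here is a minimal Python sketch that sums per-project core demand for each period and reports how many cores would have to come from a burst location. All numbers and names are hypothetical and are not taken from the graphs.

```python
def burst_cores(demand_by_project: dict[str, list[int]], on_prem_cores: int) -> list[int]:
    """Cores needed beyond on-premises capacity, per period (0 when capacity suffices)."""
    periods = len(next(iter(demand_by_project.values())))
    totals = [
        sum(demand[i] for demand in demand_by_project.values())
        for i in range(periods)
    ]
    return [max(0, total - on_prem_cores) for total in totals]


# Hypothetical core demand per period for three films with overlapping deadlines.
demand = {
    "A": [2000, 4000, 6000, 1000],
    "B": [1000, 3000, 6000, 2000],
    "C": [0, 2000, 5000, 4000],
}

print(burst_cores(demand, on_prem_cores=10000))  # [0, 0, 7000, 0]
```

In this made-up schedule the farm is sufficient most of the time, and only the period where all three deliveries collide requires extra capacity, which is the pattern that makes temporary cloud resources attractive.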

 

When dealing with rendering projects similar to what is shown in the above graphics, on-premises compute resources quickly become a huge bottleneck. To cope, studios and VFX shops seek solutions to acquire additional, temporary resources to support these bursty workloads. They need a solution that can:

  • Allow them to run the same rendering software, but in a different location
  • Allow them to easily select the render projects/frames that need to be moved to the burst location
  • Allow them to securely and quickly send those render projects/frames to another location
  • Allow them to easily monitor the rendering jobs in the other location
  • Allow them to easily collect the rendered projects/frames back on-premises
  • Allow them to quickly delete the burst rendering environment to control costs

The public cloud is an ideal environment for such a project. Let’s take a look.

 

A SOLUTION: BURSTING THE RENDERING JOBS USING PUBLIC CLOUD

Whether you’re managing a small shop trying to efficiently handle a moderate rendering workload, or you’re running a massive on-premises render farm and you want a fast, dependable way to handle burst capacity, public clouds have become a very viable option to augment your rendering workflow. Cloud prices have come down enough to make it a feasible option for compute bursts. The public cloud offers near unlimited performance and can quickly scale from several TBs to 100s of PBs, making it an ideal solution to accelerate your rendering process and allowing you to meet your project deadlines.

Bursting compute-intensive workloads to the public cloud comes with its list of challenges, of course, so here are a few key points to consider.

 

KEY CONSIDERATIONS FOR A SUCCESSFUL CLOUD BURST

FILE ACCESS PROTOCOLS – Most modern rendering applications require a file-based access method to the render project files, typically based on the POSIX NFS or SMB protocols. Existing public cloud storage offerings are based on object storage that is accessed via object protocols (HTTP REST, S3). One must consider a cloud platform that natively supports POSIX file system protocols to avoid a lengthy project to adapt rendering applications to cloud object protocols.

SHARED ACCESS – Render farms can be made of tens of thousands of nodes that require shared access to the same render project files in order to render different versions or parts of an animation movie. One must consider a cloud platform that offers a shared global namespace with strict consistency between all the render farm nodes.

PERFORMANCE – High IOPS and random read/write storage access are the norm for rendering jobs. One must consider a cloud platform that offers high performance and can easily and transparently scale performance up and down according to constantly changing performance requirements.

SCALABILITY – Similar to performance considerations, scalability and the ability to scale up/down on-demand are equally important to enable quick adaptation to changes in rendering schedules and demands. One must consider a cloud platform that can be easily scaled up and down without interrupting ongoing rendering jobs.

REST APIs – In our API-driven world, one must consider a cloud platform that can be 100% managed via APIs, so that it can be easily integrated into existing rendering, scheduling, monitoring and queuing applications.

ON-PREMISES TO CLOUD AND BACK. SYNCING ASSETS TO AND FROM THE CLOUD – One key aspect of bursting render jobs to the public cloud is the possibility to quickly sync assets between an on-premises environment and the public cloud. Graphic artists need to be able to easily select what they need to send to the public cloud to render in a secure manner and they must also be able to easily get the rendered project back on-premises. One must consider a cloud platform that offers easy-to-use and secure sync mechanisms.
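One common way to keep such a sync fast is to transfer only the files that are new or have changed. The following Python sketch illustrates that general idea with content hashes; it is an assumption-laden illustration, not a description of how any particular product performs its sync, and the directory and file names are hypothetical.

```python
import hashlib
import tempfile
from pathlib import Path


def build_manifest(root: Path) -> dict[str, str]:
    """Map each file's path (relative to root) to the SHA-256 of its content."""
    manifest = {}
    for path in sorted(root.rglob("*")):
        if path.is_file():
            # read_bytes() is fine for a demo; hash large assets in chunks instead.
            digest = hashlib.sha256(path.read_bytes()).hexdigest()
            manifest[path.relative_to(root).as_posix()] = digest
    return manifest


def files_to_sync(source: dict[str, str], destination: dict[str, str]) -> list[str]:
    """Paths that are missing at the destination or whose content differs."""
    return sorted(p for p, digest in source.items() if destination.get(p) != digest)


# Demo with two temporary directories standing in for on-premises and cloud storage.
with tempfile.TemporaryDirectory() as tmp:
    on_prem = Path(tmp) / "on_prem"
    cloud = Path(tmp) / "cloud"
    (on_prem / "textures").mkdir(parents=True)
    (cloud / "textures").mkdir(parents=True)

    (on_prem / "scene.blend").write_bytes(b"scene v2")   # changed on-premises
    (cloud / "scene.blend").write_bytes(b"scene v1")
    (on_prem / "textures" / "wood.png").write_bytes(b"wood")  # identical on both sides
    (cloud / "textures" / "wood.png").write_bytes(b"wood")
    (on_prem / "textures" / "metal.png").write_bytes(b"metal")  # new on-premises

    pending = files_to_sync(build_manifest(on_prem), build_manifest(cloud))
    print(pending)  # ['scene.blend', 'textures/metal.png']
```

The same comparison run in the opposite direction would list the rendered outputs that still need to come back on-premises.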

 

ELASTIFILE – AN IDEAL CLOUD PLATFORM FOR RENDERING BURST

 

Elastifile augments public cloud capabilities and facilitates cloud consumption by delivering enterprise-grade, scalable file storage in the cloud.

The Elastifile platform addresses all the key considerations described above with its end-to-end solution consisting of:

  • Elastifile Cloud File System (ECFS):
    • Software-only, environment-agnostic
    • Aggregates storage resources (i.e. spanning cloud VMs) into shared namespace
    • Delivers app-compatible, POSIX, primary storage
    • Built for the cloud (distributed services model, incl. metadata management)
    • Scalable capacity and performance (hot add/remove)
    • Enterprise feature set (e.g. snapshots, data reduction, high availability)
    • REST API controls
    • Push-button deployment via GCP Marketplace
  • Elastifile CloudConnect
    • Bidirectional transfer between file and object
    • Blends “best of both worlds” for hybrid and in-cloud
    • Retains full integrity of file system (hierarchy, attributes, links, versions, etc.)
    • Data transfer efficiency (comp, dedup, diffs)
    • REST API controls
  • Elastifile ClearTier
    • Intelligent integration between file and object storage in public clouds
    • Data automatically tiered between storage types based on user policies
    • All files transparently visible to apps via the Elastifile Cloud File System within single namespace
    • Blends application compatibility and performance of file storage with low cost of object storage
    • REST API controls

Let’s take a look at a real workflow example where Elastifile is deployed in the Google Cloud Platform (GCP) along with a rendering application, Blender, to leverage cloud-based parallel compute.

 

AN EXAMPLE – BURSTING RENDER JOBS TO GOOGLE CLOUD PLATFORM (GCP) WITH BLENDER AND ELASTIFILE

The following diagram depicts an architecture enabling a rendering burst to GCP using Elastifile, Google Cloud Storage (GCS), and Blender. Elastifile is available on the GCP Marketplace and can easily be deployed in just a few clicks – https://console.cloud.google.com/marketplace/details/elastifile/elastifile-storage

The architecture is made of two main blocks:

  • An on-premises architecture, and
  • A GCP-based cloud architecture.

The on-premises architecture consists of:

  • A local NAS filer on which the assets to be rendered are stored
  • An Elastifile CloudConnect server that is used to transfer the assets to a GCS bucket
  • Some Render Nodes that won’t be used in the tutorial, since the purpose is to illustrate a GCP-based rendering workflow

The GCP cloud architecture consists of:

  • A GCS setup with a bucket used to receive the assets from the on-premises Elastifile CloudConnect server
  • An Elastifile ClearTier service that is used to transfer assets from the GCS bucket to the Elastifile Cloud File System. [Note: ClearTier will initially transfer only the file metadata, thus enabling the applications to begin working immediately. As files are accessed, the automated, policy-based tiering mechanism will efficiently manage the movement of file data between the object and file system tiers when necessary.]
  • An Elastifile Cloud File System storage cluster that will serve a global POSIX NFS namespace for the Render Nodes
  • N x Render Nodes that will all share the same NFS mount from ECFS to render multiple frames in parallel.
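The parallel rendering step can be sketched as splitting the frame range into one contiguous chunk per render node, with each node running Blender in background mode on its chunk. This is a minimal Python illustration: the mount point `/mnt/ecfs`, the project paths, and the helper names are hypothetical, and how the commands are dispatched to nodes (scheduler, SSH, etc.) is left out. The Blender flags used are its standard command-line options (`-b` background, `-o` output path, `-s`/`-e` start and end frame, `-a` render animation).

```python
def split_frames(first: int, last: int, nodes: int) -> list[tuple[int, int]]:
    """Split the inclusive range [first, last] into up to `nodes` contiguous chunks."""
    total = last - first + 1
    base, extra = divmod(total, nodes)
    ranges = []
    start = first
    for i in range(nodes):
        size = base + (1 if i < extra else 0)  # spread the remainder over the first chunks
        if size == 0:
            break  # more nodes than frames
        ranges.append((start, start + size - 1))
        start += size
    return ranges


def blender_command(blend_file: str, output_pattern: str, start: int, end: int) -> list[str]:
    """Command line that renders frames start..end without a GUI."""
    # -s and -e must come before -a, because Blender processes arguments in order.
    return ["blender", "-b", blend_file, "-o", output_pattern,
            "-s", str(start), "-e", str(end), "-a"]


# Hypothetical paths on the NFS mount shared by every render node.
BLEND_FILE = "/mnt/ecfs/project/scene.blend"
OUTPUT = "/mnt/ecfs/project/render/frame_####"

for node, (start, end) in enumerate(split_frames(1, 10, 3)):
    print(node, " ".join(blender_command(BLEND_FILE, OUTPUT, start, end)))
```

Because every node reads the same project files and writes to the same output directory over the shared mount, no per-node copy of the assets is required.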

 

DEMO WITH BLENDER IN GCP

The team at Elastifile has developed a pretty nice demo of a rendering burst to GCP, based on Blender, the widely used open source rendering software. A demo video illustrating a Linux-based Blender rendering workflow with Elastifile running on GCP is available here: https://www.youtube.com/watch?v=ehSdAtsfgcg

For more information on how Elastifile supports rendering in the public cloud, you can also check out this Media Rendering Solution Brief: https://www.elastifile.com/resources#

Topics: Media & Entertainment