VFX and animation rendering is behind pretty much every movie we watch on the big screen today! Whether they produce just a few VFX shots or a full-length animated feature, major studios and VFX shops own render farms in their data centers or machine rooms. Those farms vary in size from a couple of render nodes on a shelf to tens of thousands of compute nodes racked in a data center. These rendering resources are often pushed to their limits, so studios and VFX shops are looking for new ways to augment their compute power. In this blog, we will discuss how the public cloud can help augment your on-premises compute power, and we’ll highlight the key aspects to consider to make your cloud bursting project a success.
A HIGHLY RESOURCE-INTENSIVE COMPUTE JOB
Rendering is a very demanding process, involving thousands of CPUs or GPUs to produce the hundreds of thousands of frames that constitute a modern animated feature. As a result, when limited to constrained on-premises compute resources, studios and VFX shops often struggle to meet very tight deadlines. These limitations present real challenges to daily operations, and they are amplified by the fact that multiple projects with differing delivery deadlines must often be managed simultaneously. Compute resources must be shared between artists working on different projects with different constraints, which often leads to a lot of frustration.
To put things in perspective, let’s do some simple math to understand the compute power required to render a full-length animated movie. A 90-minute movie at 24 frames per second (FPS) comes to 90 x 60 x 24 = 129,600 frames. Some complex frames can take up to 5 days to render (e.g. in Disney’s Frozen). Let’s assume an average of 24 hours to render a single frame. On a single render node, the job would take 129,600 x 24 = 3,110,400 hours, i.e. 129,600 days, i.e. 355 years! So how do studios do it? They parallelize the frame rendering across multiple render nodes. We’re talking tens of thousands of render nodes, each with multiple CPUs/GPUs. This article talks about the challenges Disney encountered while producing Big Hero 6.
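The back-of-envelope math above can be written out as a short script, which also shows how farm size changes the timeline (the 10,000-node farm size is an illustrative assumption, in line with the farm sizes mentioned earlier):

```python
# Back-of-envelope render-farm sizing for a 90-minute animated feature.
minutes = 90
fps = 24
frames = minutes * 60 * fps              # 90 x 60 x 24 = 129,600 frames

hours_per_frame = 24                     # assumed average render time per frame
total_node_hours = frames * hours_per_frame   # 3,110,400 node-hours

years_single_node = total_node_hours / 24 / 365   # one node, running 24/7
nodes = 10_000                           # illustrative farm size
days_on_farm = total_node_hours / nodes / 24      # same work, parallelized

print(f"{frames:,} frames -> {total_node_hours:,} node-hours")
print(f"single node: ~{years_single_node:.0f} years")
print(f"{nodes:,}-node farm: ~{days_on_farm:.1f} days")
```

The same workload that would take one machine centuries finishes in under two weeks on a large farm, which is exactly why rendering is so sensitive to available core counts.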
The following graphics illustrate some real-world rendering project timelines with the number of cores required to render the projects.
This graph shows an ideal example of a studio timeline working on 3 feature films A, B and C, where the demand for rendering cores never exceeds the available on-premises resources.
This graph represents the reality of such a timeline where a studio works on 3 parallel feature film projects A, B and C with similar delivery dates (as opposed to being evenly distributed over time). In such a scenario, the demand for rendering cores exceeds the available on-premises resources if the studio needs to meet the planned delivery dates.
When dealing with rendering projects similar to what is shown in the above graphics, on-premises compute resources quickly become a huge bottleneck. To cope, studios and VFX shops seek additional, temporary resources to support these bursty workloads: capacity that can be acquired on demand, scaled to the peak, and released when the peak passes.
The public cloud is an ideal environment for such a project. Let’s take a look.
A SOLUTION: BURSTING THE RENDERING JOBS USING PUBLIC CLOUD
Whether you’re managing a small shop trying to efficiently handle a moderate rendering workload, or you’re running a massive on-premises render farm and want a fast, dependable way to handle burst capacity, public clouds have become a very viable option for augmenting your rendering workflow. Cloud prices have come down enough to make compute bursts economically feasible. The public cloud offers near-unlimited compute power and storage that can quickly scale from several TBs to hundreds of PBs, making it an ideal way to accelerate your rendering process and meet your project deadlines.
Bursting compute-intensive workloads to the public cloud comes with its list of challenges, of course, so here are a few key points to consider.
KEY CONSIDERATIONS FOR A SUCCESSFUL CLOUD BURST
FILE ACCESS PROTOCOLS – Most modern rendering applications require file-based access to the render project files, typically POSIX-style access over the NFS or SMB protocols. Native public cloud storage offerings, however, are object stores accessed via object protocols (HTTP REST, S3). One must consider a cloud platform that natively supports file system protocols, to avoid a lengthy project adapting rendering applications to cloud object protocols.
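To see why this matters, here is a minimal sketch of the kind of plain POSIX file I/O a rendering application performs. If the cloud file system is mounted at a regular path (the mount point and file naming below are assumptions for illustration), this code runs unchanged on-premises and in the cloud, with no object-protocol rewrite:

```python
# Sketch: rendering apps expect ordinary POSIX file I/O. With a cloud
# file system mounted at a normal path, open()/write() just work.
import os
import tempfile

def write_frame(project_dir, frame_number, data):
    """Write a rendered frame using plain POSIX file calls."""
    path = os.path.join(project_dir, f"frame_{frame_number:06d}.exr")
    with open(path, "wb") as f:
        f.write(data)
    return path

# Demo against a temp dir standing in for the mounted file system.
project = tempfile.mkdtemp()
path = write_frame(project, 42, b"\x00" * 16)
print(os.path.basename(path))  # frame_000042.exr
```

Porting this to an object API would mean replacing every file call with HTTP PUT/GET operations and losing semantics like partial writes, which is the lengthy adaptation project the text warns about.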
SHARED ACCESS – Render farms can be made of tens of thousands of nodes that require shared access to the same render project files in order to render different versions or parts of an animation movie. One must consider a cloud platform that offers a shared global namespace with strict consistency between all the render farm nodes.
PERFORMANCE – High IOPS and random read/write storage access are the norm for rendering jobs. One must consider a cloud platform that offers high performance and can easily and transparently scale performance up and down according to constantly changing performance requirements.
SCALABILITY – Similar to performance considerations, scalability and the ability to scale up/down on-demand are equally important to enable quick adaptation to changes in rendering schedules and demands. One must consider a cloud platform that can be easily scaled up and down without interrupting ongoing rendering jobs.
REST APIs – In our API-driven world, one must consider a cloud platform that can be 100% managed via APIs, so that it can be easily integrated into existing rendering, scheduling, monitoring and queuing applications.
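As a hedged illustration of what API-driven management looks like from a scheduler’s point of view, the sketch below builds a REST call to scale a storage cluster. The endpoint, payload, and host name are hypothetical placeholders, not an actual Elastifile API; the point is that any operation a UI offers should also be reachable via plain HTTP:

```python
# Hypothetical sketch of scripting a cloud platform's management API.
import json
from urllib.request import Request

def scale_request(base_url, node_count):
    """Build a REST call asking the cluster to scale to `node_count` nodes.

    The /api/v1/cluster/scale endpoint is a made-up placeholder."""
    body = json.dumps({"nodes": node_count}).encode()
    return Request(
        f"{base_url}/api/v1/cluster/scale",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )

req = scale_request("https://storage.example.internal", 200)
print(req.get_method(), req.full_url)
# POST https://storage.example.internal/api/v1/cluster/scale
```

A render queue manager could issue a call like this before dispatching a large job and a matching scale-down call after it completes.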
ON-PREMISES TO CLOUD AND BACK. SYNCING ASSETS TO AND FROM THE CLOUD – One key aspect of bursting render jobs to the public cloud is the ability to quickly sync assets between an on-premises environment and the public cloud. Graphic artists need to be able to easily and securely select what they send to the public cloud for rendering, and they must also be able to easily get the rendered project back on-premises. One must consider a cloud platform that offers easy-to-use and secure sync mechanisms.
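The selection step can be sketched with a simple incremental approach: only push assets modified since the last sync, so artists move a minimal set of files rather than the whole project tree. This is an illustrative sketch of the general idea, not Elastifile’s actual sync mechanism:

```python
# Illustrative sketch: select only assets changed since the last sync.
import os
import tempfile
import time

def assets_to_sync(root, last_sync_epoch):
    """Return paths under `root` modified after the last sync time."""
    changed = []
    for dirpath, _, files in os.walk(root):
        for name in files:
            path = os.path.join(dirpath, name)
            if os.path.getmtime(path) > last_sync_epoch:
                changed.append(path)
    return sorted(changed)

# Demo: one file synced long ago, one freshly modified.
root = tempfile.mkdtemp()
stale = os.path.join(root, "old.blend")
fresh = os.path.join(root, "new.blend")
for p in (stale, fresh):
    open(p, "w").close()
os.utime(stale, (0, 0))  # pretend it was last touched at epoch 0
print([os.path.basename(p) for p in assets_to_sync(root, time.time() - 60)])
# ['new.blend']
```

The list of changed paths would then be handed to whatever transfer tool moves data to the cloud, and the same logic in reverse brings finished frames back on-premises.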
ELASTIFILE – AN IDEAL CLOUD PLATFORM FOR RENDER BURSTING
Elastifile augments public cloud capabilities and facilitates cloud consumption by delivering enterprise-grade, scalable file storage in the cloud.
The Elastifile platform addresses all the key considerations described above with an end-to-end solution.
Let’s take a look at a real workflow example where Elastifile is deployed in the Google Cloud Platform (GCP) along with a rendering application, Blender, to leverage cloud-based parallel compute.
AN EXAMPLE – BURSTING RENDER JOBS TO GOOGLE CLOUD PLATFORM (GCP) WITH BLENDER AND ELASTIFILE
The following diagram depicts an architecture enabling a rendering burst to GCP using Elastifile, Google Cloud Storage (GCS), and Blender. Elastifile is available on the GCP Marketplace and can easily be deployed in just a few clicks – https://console.cloud.google.com/marketplace/details/elastifile/elastifile-storage
The architecture is made of two main blocks: the on-premises environment and the GCP cloud environment, as depicted in the diagram above.
DEMO WITH BLENDER IN GCP
The team at Elastifile has developed a compelling demo of a rendering burst to GCP, based on the widely known and used Blender open-source rendering software. A demo video illustrating a Linux Blender-based rendering workflow with Elastifile running on GCP is available here: https://www.youtube.com/watch?v=ehSdAtsfgcg
For more information on how Elastifile supports rendering in the public cloud, you can also check out this Media Rendering Solution Brief: https://www.elastifile.com/resources#Topics: