Making Dollars and Sense of Research Computing Cloud Billing

Service Units for a Hyperscale Public Cloud

Introduction

Service units are traditionally the base unit sold by research computing and high performance computing centers at universities to other academic departments. Typically, arrangements are made between the research computing centers and vendors on a selection of “nodes” or “blades” that encapsulate various arrangements of compute cores, memory, disk, and GPUs that are made available to researchers. Additionally, the research computing center will maintain a set of shareable file storage systems, login nodes, and resources to host a job scheduler. Together, these expenses are used to calculate a resell price, so that researchers can purchase service units from the research computing center, typically with three to five year priority access to the purchased nodes.

This same model of procurement is possible through public cloud providers. The main distinction with purchasing from a cloud provider is that you do not need to have funds for large up-front capital expenses to pay for compute nodes or the physical infrastructure necessary to house new hardware. Instead, you can pay per second of CPU, memory, disk, and GPU usage on hardware owned and operated by a hyperscale cloud provider. Additionally, the time to bring a new set of resources online is significantly shorter than setting up an on-premises cluster, thanks to click-to-deploy and infrastructure-as-code solutions like Fluid Numerics' Research Computing Cluster on Google Cloud.

Datacenters that host cloud-native and on-premises research computing clusters both take time to architect, engineer, procure, install, and integrate. On-premises clusters carry costs for mechanical and physical operations; with cloud-native procurement, these costs are realized indirectly through per-second pricing. Not only can this save time and money, it can potentially save a tremendous amount of energy. The carrying costs of cloud systems can be limited to the compute and storage needs of a single workload, while the price of virtual hours of processing and years of storage reflects the operating costs of a state-of-the-art datacenter. Cloud-native procurement offers the opportunity to design an operational model for compute and storage resources that can be allocated across even the largest organization.

In this article, we’ll discuss the costs of using public cloud and present a few different operational models for a research computing facility to deliver compute resources to researchers. Specifically, we’ll look at two models that use a traditional, centralized cluster with resources hosted in the cloud and one self-service model. In all cases, we consider base costs to include:

internal human resources,
static and shared compute resources,
directly attributable compute resources, and
third party licensing and support costs

Public Cloud Costs

To give you an idea of how a research computing center can adopt a cloud-native or cloud-hybrid model, we need to discuss the cost model for public cloud and relate it to traditional RC/HPC architectures and usage patterns, so that you can create a viable reselling model for a university.

First, on public cloud platforms, like Google Cloud, resource usage is billed per second for each vCPU, GPU, GB of memory, and GB of Disk. Together, these individual resources add up to a cost per unit of time for operating a highly composable virtual server on Google Cloud.

For example, suppose that you want to offer a cluster that has two types of compute nodes

96 vCPU (48 cores) Intel Broadwell, 360 GB memory, 100 GB attached disk, and 8 Nvidia® Tesla V100 GPUs
112 vCPU (56 cores) AMD Epyc Milan, 224 GB memory, and 100 GB attached disk

The cost per hour for each of these nodes is the sum of the costs of its underlying resources. On Google Cloud, vCPU and memory are packaged together as machine types that are identified by the make and model of the CPU and the ratio of vCPU to memory. In this example, the Intel Broadwell nodes are offered in the General Purpose n1 series and the AMD Epyc Milan nodes in the Compute Optimized c2d series. Specifically, the two machine types that match the specifications above are (1) n1-standard-96 and (2) c2d-highcpu-112. The n1 instances have the option of adding up to 8 GPUs.
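As a rough illustration of how a node's hourly cost is composed from per-resource charges, the sketch below sums the four components for the two example node types. The unit rates are placeholders chosen only to make the example run; they are not Google Cloud list prices, and the class and function names are hypothetical. Use the Pricing Calculator for real figures.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class UnitRates:
    """Hourly unit prices in USD; every value is supplied by the caller."""
    vcpu: float       # per vCPU-hour
    memory_gb: float  # per GB-hour of memory
    disk_gb: float    # per GB-hour of attached disk
    gpu: float        # per GPU-hour


def node_hourly_cost(vcpus, memory_gb, disk_gb, gpus, rates):
    """Sum the per-resource charges for one node running for one hour."""
    return (
        vcpus * rates.vcpu
        + memory_gb * rates.memory_gb
        + disk_gb * rates.disk_gb
        + gpus * rates.gpu
    )


# PLACEHOLDER rates for illustration only; not Google Cloud list prices.
placeholder = UnitRates(vcpu=0.03, memory_gb=0.004, disk_gb=0.0001, gpu=2.50)

# Node type 1 from the text: 96 vCPU, 360 GB memory, 100 GB disk, 8 GPUs.
gpu_node = node_hourly_cost(96, 360, 100, 8, placeholder)
# Node type 2 from the text: 112 vCPU, 224 GB memory, 100 GB disk, no GPUs.
cpu_node = node_hourly_cost(112, 224, 100, 0, placeholder)

print(f"GPU node: ${gpu_node:.3f}/hour, CPU node: ${cpu_node:.3f}/hour")
```

Dividing the hourly figure by 3600 gives the per-second rate that is actually metered.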

To obtain a cost estimate, we can use the Google Cloud Pricing Calculator. When estimating costs for compute nodes, it’s recommended to look at what the base cost might be, before any discounts are applied. Table 1 shows the estimated hourly cost per node for each machine type.

Table 1. Estimated hourly cost per node per machine type

When using the RCC solutions, you also have a set of login nodes and a controller that hosts the Slurm job scheduler. Additionally, you likely want to have a reliable shared NFS file system. To meet these needs, your system might include the following components:

Login node: n1-standard-64 (32-core Intel Broadwell, 240 GB memory) with 100 GB attached disk
Controller: n1-standard-64 (32-core Intel Broadwell, 240 GB memory) with 100 GB attached disk
5 TB Standard Tier Filestore

With auto-scaling clusters that serve numerous users, you likely want to keep these kinds of resources on 24/7 with room for scheduled downtimes for maintenance. This is typical for a centrally managed system where researchers can access resources any time to schedule jobs to be run. Table 2 shows the estimated hourly costs, before any discounts are applied.

Table 2. Estimated static resource costs per hour

In addition to these expenses, there are also human resource costs that you likely need to cover for your department. Qualified system administrators and research software engineers are critical to ensuring that researchers are able to efficiently and effectively use the compute resources you are providing to them, so it’s important to factor in these costs.

As an example, a system administrator (SA) with experience in research computing can cost about $90K/year in base salary, and a research software engineer (RSE) can cost upwards of $120K/year in base salary. Supposing that you have a team with one administrator and one RSE who each work 40 hours per week, this translates to roughly $110/hour.
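The hourly figure depends on how many working weeks you count in a year. The sketch below works through the arithmetic from the salaries above. The 48-week case is an assumption introduced here (it allows for leave and holidays) and is not stated in the article. Only base salary is included; benefits and overhead are left out.

```python
SA_SALARY = 90_000    # USD/year base salary, from the text
RSE_SALARY = 120_000  # USD/year base salary, from the text
HOURS_PER_WEEK = 40   # from the text


def staff_costs(working_weeks_per_year):
    """Return (annual, monthly, hourly) cost of the two-person team in USD."""
    annual = SA_SALARY + RSE_SALARY
    monthly = annual / 12
    # Cost per hour in which the team is on duty (both people work the same hours).
    hourly = annual / (HOURS_PER_WEEK * working_weeks_per_year)
    return annual, monthly, hourly


annual, monthly, hourly_52 = staff_costs(52)
_, _, hourly_48 = staff_costs(48)  # ASSUMPTION: 48 working weeks per year

print(f"annual ${annual:,}, monthly ${monthly:,.2f}")
print(f"hourly over 52 weeks ${hourly_52:.2f}, over 48 weeks ${hourly_48:.2f}")
```

The monthly figure is the employee cost used in the service unit models that follow.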

Models for Service Unit Costs

With these base costs in mind, the question now is how to develop a model for selling service units to other departments in your organization.

First, you know that you need to pay for the static resources, including the controller, login node, and NFS file server. Additionally, you will need to make sure that your staff’s salaries are covered. In the example we’ve been building, this means that, on average, you need to cover $5,470.3791/month for static resources and $17,500/month for employees. Compute nodes, on the other hand, are only billed when jobs are running, thanks to the auto-scaling nature of the cluster. Additionally, through a job scheduler’s database, resource labels, or even multi-project setups, direct costs associated with the compute nodes can be attributed to individual users on the cluster.

Condo & Club Model

In traditional “condo” or “club” style models, researchers purchase nodes and pay a monthly or annual fee to have priority access to that node. This model can still apply in the cloud. As an added benefit, obtaining annual committed spend from researchers in this way also opens up fairly steep Committed Use Discounts, which can help “mop up” some of the static expenses and create an attractive pricing model.

Suppose that you aim to serve M users at your organization and your annual static resource costs, including a salaried support team, are D USD/year. Additionally, a node on a specific compute partition on your cluster costs P USD/node-hour and you want to offer allocations in blocks of B node-hours per year. Then, your base cost per user per year is the total expected cost per year, divided by the number of users.

Figure 1.1. Base cost per user per year is the total expected cost per year divided by the number of users

If you have multiple partitions, you could use an extended cost model that sums over the allocation blocks and costs per partition.

Figure 1.2. Extended cost model summarizing additional partition costs
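Written out with the symbols defined above, a form of these two models that is consistent with the worked example would be the following. This is a reconstruction; the index k, the partition count N, and the name C for the resulting annual cost are notation introduced here.

```latex
% Single partition: annual cost of one allocation block for one user
C = \frac{D}{M} + B\,P

% Multiple partitions k = 1, \dots, N, with block size B_k and node-hour price P_k
C_{\text{user}} = \frac{D}{M} + \sum_{k=1}^{N} B_k P_k
```

The first term is each user's share of the static resources and staff; the second is the compute that user has reserved.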

Using the model from Figure 1.2, assume that you sell GPU node-hours in groups of 1000 per year and Epyc Milan node-hours in groups of 5000 per year, and that you plan to serve 50 researchers. Then

D = $275,644.55, so D/M ≈ $5,512.89
B1P1 = $24,405.45
B2P2 = $21,015.95

Then, “at-cost” resell prices are

1000 Node-Hours of n1-standard-96 + 8xV100 GPU is $29,918.34 / year
5000 Node-Hours of c2d-highcpu-112 is $26,528.84 / year
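The arithmetic behind these at-cost prices can be checked with a short script. This is a minimal sketch that uses only figures quoted in this article; the function and variable names are illustrative.

```python
STATIC_RESOURCES_PER_MONTH = 5470.3791  # USD, from the text
STAFF_PER_MONTH = 17500.00              # USD, from the text
USERS = 50                              # M, from the text

# Annual on-demand cost of each allocation block (B_k * P_k), from the text.
BLOCK_COSTS = {
    "1000 node-hours of n1-standard-96 + 8xV100": 24405.45,
    "5000 node-hours of c2d-highcpu-112": 21015.95,
}


def at_cost_prices(static_per_month, staff_per_month, users, block_costs):
    """Return (D, D/M, price per block) for the condo/club model."""
    annual_shared = 12 * (static_per_month + staff_per_month)  # D
    share = annual_shared / users                               # D / M
    prices = {name: share + cost for name, cost in block_costs.items()}
    return annual_shared, share, prices


D, share, prices = at_cost_prices(
    STATIC_RESOURCES_PER_MONTH, STAFF_PER_MONTH, USERS, BLOCK_COSTS
)

print(f"D = ${D:,.2f}/year, D/M = ${share:,.2f}/user/year")
for name, price in prices.items():
    print(f"{name}: ${price:,.2f} / year")
```

Changing USERS shows how sensitive the resell price is to the number of researchers sharing the static costs.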

By securing purchases on an annual basis and signing up for committed use discounts, you’re able to drop compute costs by up to 57%. In this case,

D/M ≈ $4,764.54
B1P1 = $10,494.34
B2P2 = $9,036.86

The estimated costs after accounting for committed use are shown in Table 3.

Table 3. Estimated costs after accounting for committed use
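The committed-use figures above can be reproduced by applying the 57% discount to the cloud resources while leaving salaries unchanged. That split is an inference from the quoted numbers rather than something the article states outright, and the actual discount depends on the commitment you sign, so treat the sketch below as illustrative.

```python
DISCOUNT = 0.57  # "up to 57%" from the text; the real value depends on the commitment

STATIC_RESOURCES_PER_YEAR = 12 * 5470.3791  # USD, from the text
STAFF_PER_YEAR = 12 * 17500.00              # USD, from the text
USERS = 50                                  # M, from the text

# Annual on-demand block costs (B_k * P_k) from the text; keys are shorthand.
ON_DEMAND_BLOCKS = {"gpu": 24405.45, "epyc": 21015.95}


def committed_prices(discount):
    """Return (D/M, discounted block costs, per-block resell prices)."""
    # ASSUMPTION: the discount applies to cloud resources only, not to salaries.
    shared = STAFF_PER_YEAR + (1 - discount) * STATIC_RESOURCES_PER_YEAR
    share = shared / USERS
    blocks = {k: (1 - discount) * v for k, v in ON_DEMAND_BLOCKS.items()}
    totals = {k: share + v for k, v in blocks.items()}
    return share, blocks, totals


share, blocks, totals = committed_prices(DISCOUNT)

print(f"D/M = ${share:,.2f}")
for key in ON_DEMAND_BLOCKS:
    print(f"{key}: block ${blocks[key]:,.2f}, resell price ${totals[key]:,.2f} / year")
```

Calling committed_prices(0.0) recovers the on-demand prices, which makes it easy to compare the two schedules side by side.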

Club + Direct Cost Model

An extension of the condo or club model is to expose the benefits of per-second billing to your users, while distributing the costs of centrally managed static resources amongst all of your users. In this approach, you charge an annual subscription to users which grants access to the login node, job scheduler, shared file-servers, and RSE and SA support. Compute costs are then aligned with individual users or to groups (e.g. a Principal Investigator and their team) and billed on a monthly basis.

Using the example above, where we assume 50 users are supported, the base subscription cost is approximately $4,764.54 per user per year, assuming you’re going for a committed use agreement for the static resources. From here, you can simply charge for vCPU, GPU, disk, and memory usage, metered per second as reported by the job scheduler. With this kind of model, your product schedule would look something like this, if you billed per node-hour:

Table 4. Example pricing schedule with additional benefits
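To show how per-second usage turns into a monthly compute charge, here is a minimal sketch. The node-hour rates are the on-demand rates implied by the block costs quoted earlier ($24,405.45 per 1000 node-hours and $21,015.95 per 5000 node-hours). The partition names and the job records are hypothetical; in practice the records would come from the job scheduler's accounting data.

```python
# On-demand node-hour rates implied by the block costs quoted in the text.
RATES_PER_NODE_HOUR = {
    "v100": 24405.45 / 1000,  # n1-standard-96 + 8xV100 (partition name is hypothetical)
    "milan": 21015.95 / 5000,  # c2d-highcpu-112 (partition name is hypothetical)
}


def compute_charge(jobs, rates):
    """Sum charges for jobs given as (partition, node_count, elapsed_seconds)."""
    total = 0.0
    for partition, nodes, seconds in jobs:
        node_hours = nodes * seconds / 3600  # per-second metering
        total += node_hours * rates[partition]
    return round(total, 2)


# HYPOTHETICAL usage for one research group in one month.
jobs = [
    ("v100", 2, 5400),  # 2 nodes for 1.5 hours
    ("milan", 4, 900),  # 4 nodes for 15 minutes
]

charge = compute_charge(jobs, RATES_PER_NODE_HOUR)
print(f"Compute charge this month: ${charge:,.2f}")
```

The annual subscription is invoiced separately, so a group that runs nothing in a given month pays nothing for compute.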

Because Google Cloud offers flexibility in the compute node sizes, you can easily create compute nodes for users that match requirements for their typical workloads.

Private Catalog + Direct Cost Model

Another option to consider is giving researchers direct access to easy-to-use click-to-deploy solutions that have been reviewed and vetted by your team. With this approach, you can curate a catalog of solutions for your researchers and move away from centrally managing resources 24/7.

Instead, your organization can focus on making sure your RSEs and SAs are covered and on creating a lightweight administrative team to support breaking out cloud expenses per use. With this model, the primary job function of your RSEs and SAs is to train researchers on how to use private catalog solutions, how to work with your team to create a private catalog solution if one does not exist for their use case, and what to expect for billing.

With this model, you don’t have to carry static compute resources. Instead, the static expenses are for your staff. You can still provide an annual membership that includes access to support staff and simplified billing to users. However, the base costs that need to be covered only include the cost of keeping qualified staff on-board.

For the simplified example considered above, a cost schedule could look like this:

Table 5. Example pricing schedule with scalable workload ready systems
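Because only staff costs are shared in this model, the at-cost annual membership reduces to a single division. The sketch below uses the staff cost and user count from the running example; the function name is illustrative.

```python
STAFF_PER_MONTH = 17500.00  # USD, from the text
USERS = 50                  # from the text


def annual_membership(staff_per_month, users):
    """At-cost membership fee per user per year when only staff costs are shared."""
    return 12 * staff_per_month / users


fee = annual_membership(STAFF_PER_MONTH, USERS)
print(f"Annual membership: ${fee:,.2f} per user")
```

Compute is then billed entirely as a direct cost of whichever catalog solutions each researcher deploys.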

The difference here is that the products offered are specifically tailored to deliver resources with applications that have been pre-installed, and ideally optimized, by your technical staff or a third party like Fluid Numerics.

You can build a private catalog by bringing in solutions from Google Cloud Marketplace directly or by creating solutions tailored to your researchers using Terraform or Deployment Manager. With this approach, each solution has its own per-second cost and users are empowered to truly scale to zero, since provisioning and deprovisioning resources is easy to learn.

Granular Billing

Independent of which pricing model you choose, as an administrator of centralized computing resources, it’s helpful to understand who is incurring costs on what resources. To support granular billing, you can configure Fluid Numerics’ RCC to deploy compute nodes into distinct Google Cloud Projects or to apply resource labels. Combining this with a Slurm QOS configuration lets you align access between compute partitions and research groups. On Google Cloud, billing reports separate costs by project, which lets you align costs with each user or group.
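Once costs are separated by project, rolling them up per research group is a simple aggregation. The sketch below assumes billing rows have already been exported as dictionaries with "project" and "cost" keys. Those field names and the sample rows are hypothetical and do not reflect the actual Google Cloud billing export schema.

```python
from collections import defaultdict


def cost_by_project(rows):
    """Total the cost of each project from an iterable of billing rows."""
    totals = defaultdict(float)
    for row in rows:
        totals[row["project"]] += row["cost"]
    return dict(totals)


# HYPOTHETICAL billing rows; one Google Cloud project per research group.
rows = [
    {"project": "lab-a", "cost": 12.50},
    {"project": "lab-b", "cost": 3.25},
    {"project": "lab-a", "cost": 7.50},
]

totals = cost_by_project(rows)
for project, cost in sorted(totals.items()):
    print(f"{project}: ${cost:,.2f}")
```

The same grouping works with a resource label in place of the project if you keep all compute nodes in a single project.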

Service & Support from Third Parties

Research Computing resources are traditionally requested, procured and integrated through third party equipment distributors and providers that manage the complexity of systems architecture, engineering and installation. These third parties have always played a critical role in configuring, tooling and supporting systems throughout a supported lifetime established in a services and support contract. Similar third parties have emerged to fill the administrative and management gaps which develop as resources grow into cloud environments. Fluid Numerics is one of those third parties. We engage in work beyond the deployment and utilization of cloud resources and customers see value in the extended services and support we provide.

Fluid Numerics continues to build a team of internal qualified remote system administrators and research software engineers to support system users in a variety of scientific domains. This growth supports our mission to provide high quality support for researchers and IT professionals maintaining cloud-native and cloud-hybrid clusters and self-service solutions. We provide ongoing services and support to the research community through the development of applications and workflows to help carry out high quality research. Fluid Numerics team members can assist in supporting cloud resources and provide some of the additional attention new users need while resources expand.

Hiring and engaging System Administrators and Research Software Engineers as full-time employees can be particularly challenging for organizations that are getting started with HPC and RC resources or realizing rapid adoption of these resources for their organizations. Fluid Numerics has the experience to understand the types of skills necessary to provision, maintain, and operate co-located and cloud-based HPC and RC solutions and is able to offer flexible and affordable service level agreements, with a cost model that grows with your organization.

In general, costs associated with third-party products, services, and support that are made available to an organization through service agreements, statements of work, licensing agreements, and other commitments should be considered alongside system resource expenses.

For example, the RCC solutions on Google Cloud Marketplace add a $0.006638/vCPU/hour and $0.094500/GPU/hour licensing fee, which in turn entitles you to scalable community support according to Fluid Numerics’ End User License Agreement. In this case, the support costs are dictated by how much you leverage these solutions and can be directly tied to compute expenses. For organizations planning larger usage or desiring a stricter service level agreement, Fluid Numerics can alternatively provide monthly, quarterly, or annual support at rates dictated by the number of users requiring support and the number of active support tickets. In this case, the support expense may be more akin to the human resources expense for your in-house team.
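To see what the usage-based licensing fee adds per node-hour, the sketch below applies the two quoted rates to the example node types. Whether the node-hour prices used earlier in this article already include this fee is not stated, so the result is shown on its own.

```python
VCPU_FEE = 0.006638  # USD per vCPU-hour, from the text
GPU_FEE = 0.094500   # USD per GPU-hour, from the text


def license_fee_per_node_hour(vcpus, gpus=0):
    """Licensing fee for one node running for one hour."""
    return vcpus * VCPU_FEE + gpus * GPU_FEE


# n1-standard-96 with 8 V100 GPUs, and c2d-highcpu-112 with no GPUs.
gpu_node_fee = license_fee_per_node_hour(96, 8)
cpu_node_fee = license_fee_per_node_hour(112)

print(f"GPU node: ${gpu_node_fee:.4f}/node-hour")
print(f"CPU node: ${cpu_node_fee:.4f}/node-hour")
```

Because the fee scales with vCPU and GPU counts, it can be folded into the per-node-hour rate you charge users.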

Summary

Public cloud offers a new mechanism to procure compute resources for research computing centers. This mechanism offers the benefit of pay-per-use and removes the need for large up-front capital investments in hardware and facilities. In this article, we’ve discussed how a research computing center can leverage pay-per-use pricing to offer allocation packages to researchers using both traditional and novel pricing models. When developing allocation models, taking into account both human resources and compute expenses is critical.

In all of the allocation models presented, human resources are considered a shared expense across all users. When more traditional infrastructure models are considered, where a few centrally managed resources are operated 24/7, these resources are also incorporated as an equally shared expense. With a centrally managed cluster, however, you have the option to price allocations based on commitments to resource consumption or to provide flexible “pay-for-what-you-use” compute cycles. Commitments offer steep discounts on Google Cloud and can be attractive for researchers who already have funding secured for a project and want to get the most possible vCPU-hours or GPU-hours for a fixed spend. On the other hand, researchers who have a less consistent budget or are in a short-term exploratory phase of their work may find the flexibility of pay-for-what-you-use more attractive.

Whichever route you choose, Fluid Numerics offers services to help you provision resources on Google Cloud and develop an allocation model for your university or laboratory. As a Google Cloud reseller, we can help you find the best possible discounts. As a boutique integrator, we can tailor infrastructure to your organization’s needs and assist in optimizing your resource usage on Google Cloud to help keep costs predictable and the cloud affordable.