N3437
Variable Length Array (VLA) Allocation Control

Published Proposal,

Previous Revisions:
None
Authors:
Paper Source:
GitHub
Issue Tracking:
GitHub
Project:
ISO/IEC 9899 Programming Languages — C, ISO/IEC JTC1/SC22/WG14
Proposal Category:
Change Request, Feature Request
Target:
C2y

Abstract

Variable length arrays are commonly expected to be pulled off of what most implementations call their "stack", a dedicated area of memory for what are known as "automatic storage duration" variables and objects. However, this runs afoul of many inherent limitations unique to the stack, and exacerbated by the lack of terminology in the standard to handle it. Therefore, rather than require implementations to develop increasingly sophisticated techniques to handle these cases, we instead defer such behavior specification and implementation to the users themselves.

1. Changelog

1.1. Revision 0 - December 23th, 2024

2. Introduction and Motivation

Variable Length Arrays (VLAs) are a feature for potentially accessing an implementation-detail of where memory is stored to generate more efficient programs by utilizing a variable amount of stack space. The feature was conceived in the early 1990s with inspiration from Fortran, and eventually brought to C standardization by Tom MacDonald of the Cray Supercomputing group[N317]. Despite getting multiple implementations from 1993 onwards and achieving all of the necessary details for C standardization — including a detailed rationale answering open questions during its standardization process in subsequent N-documents — the feature went into the standard in C99 and received deeply mixed response.

Many numerical communities and other processing groups found VLAs to be of great use; kernel communities and application programmers found it fraught with peril as user-controlled variables snuck into the VLAs and threatened to blow out their stacks or — when paired with logic errors — much worse. Others still found their performance characteristics hard to control and inaccessible; different individuals could not get their compilers to provide them with the kind of memory guarantees that they wanted[making-c-less-dangerous].

Implementations withdrew support for even implementing variable length arrays as a required feature, citing its several unspecified corners and its total silence on how to handle memory errors or other issues. This resulted in variable length arrays being made optional — along with many other features — in C11, and since then VLAs have been in a state of flux. The GNU Compiler Collection (GCC) implemented "stack probing" (checking the size of the stack and letting the operating system fail where possible) rather than just blindly blowing out the stack in 2017 (its -fstack-clash-protection option, a much improved version of its initial support). It took until 2020 for the Clang C compiler to support stack probing with their VLAs. Other implementations just aliased VLA usage as closely as possible to alloca() and stack pointer manipulation as they could before phoning the effort in. Others still simply decided to call malloc for every VLA declared, with a simple crash/abort if such an allocation failed.

It is clear now that there is a wide variety of implementation techniques and ways of handling VLAs, but in every attempt this has always been tackled from an implementer perspective. In this paper, we propose allowing the user to control the semantics of the the allocation and deallocation of a VLA. It is much more likely that the user knows the memory profile, binary footprint, and performance characteristics they would like to target, as opposed to the implementation. There are thousands — sometimes, hundreds of thousands — of developers as opposed to compiler developers; it makes sense for users to have control of the feature where they deem it important to their code.

Similarly, we do not infringe upon an implementation’s right to refuse implementing a VLA: __STDC_NO_VLA__ can still be defined by an implementation, and an implementation can still outright reject VLAs if and only if this new, proposed, opt-in extension mechanism is not defined by the user. This allows implementers to implement VLAs in terms of a quick code rewrite rather than needing to worry about stack probing, a source of memory, or any of the other things implementations have cited as problematic for the last two decades.

3. Design

The design of this feature is simple; we need to provide users with the ultimate level of control so that implementations need not concern themselves with the details of allocation and deallocation of variable length arrays. This eliminates the sole reason that VLAs were made optional in C11 to begin with; implementations of varying sizes struggled with the ability to compute the allocation. If the user is able to control how a VLA is allocated, then all of the most serious technical concerns for the implementation and proliferation of VLAs disappears as the user is stating they will be responsible. This means that even implementations that have __STDC_NO_VLA__ defined and set to 1 will still be able to access the feature, provided the user declares 2 functions visible at the point of the creation of the VLA:

void* stdc_vla_alloc(size_t n, size_t align, size_t* out_n);
void stdc_vla_free(void* p, size_t n, size_t align);

The way this feature works is simple. If both function declarations are present, the compiler is required to use those as the source of memory for the VLA (subject to the usual as-if rules for optimization). The initial memory comes from the return value from stdc_vla_alloc. The memory is freed at the end of the scope (or later) by a matching call to stdc_vla_free, as was done before by the implementation. Every exit of the scope, including use of goto to a label that is outside of the scope, shall be preceded by a matching call to stdc_vla_free if a stdc_vla_alloc was hit. Similarly, the behavior is undefined if the scope is left by using longjmp/setjmp or similar constructs.

If only one of the two functions is visible, then it is a constraint violation. If either function is visible but not of a compatible function signature, then it is a constraint violation.

If none of the two functions are visible, then the behavior is implementation-defined/unspecified (subject to __STDC_NO_VLA__, and the location from where the data comes from is unspecified as it is in today’s C standard). Therefore, even if __STDC_NO_VLA__ is defined, the following program will evaluate and execute:

#include <stddef.h>
#include <string.h>
#include <stdlib.h>

void* stdc_vla_alloc(size_t n, size_t align, size_t* out_n) {
  (void)align;
  void* p = malloc(n);
  if (!p) abort();
  *out_n = n;
  return p;
}
void stdc_vla_free(void* p, size_t n, size_t align) {
  free(p);
}

int main (int argc, char* argv[]) {
  // works even if __STDC_NO_VLA__ is defined and non-zero.
  int vla[argc];
}

Importantly (for the purposes of optimization), the only hard semantic requirement is that for every one call to stdc_vla_alloc, there is a matching call to stdc_vla_free. There does not have to be a call for each VLA, if the implementation wants to e.g. re-use the memory of a previous allocation. See § 3.4 Memory Reuse for more details.

3.1. Can we return a null pointer value from stdc_vla_alloc?

The pointer returned by the function is always non-null. This is why the [static 1] annotation should be used on the function call:

void stdc_vla_alloc(size_t n, size_t align, size_t* out_n)[static 1];
void stdc_vla_free(void p[static 1], size_t n, size_t align);

Unfortunately, this syntax is not allowed for void* pointers. We do not solve this problem for this paper, but would note that such a fix would be of general interest to make these annotations more useful to a wide variety of functions, especially infallible (impossible to fail) memory and allocation functions.

If someone wishes to handle an error, they must do so inside of the function and handle it by either aborting or jumping out before the function returns. Once the function returns, the program is entirely valid in assuming that the returned pointer is a valid address for the purposes of the VLA. Any error reporting or error checking must be done exclusively by the user. This is because there is simply no syntax when working with a variable length array to check for success, as e.g.:

int main (int argc, char* argv[]) {
  int vla[argc];
  if (vla) {
    return 0;
  }
  return 1;
}

will always create a program that returns 0 as the vla check can never fail if execution reaches that point. There is simply no post-facto way to handle a failed VLA allocation in the language today, and that is a oversight we will have to live with for the rest of the duration of the existence of VLAs.

3.2. Allocation Size?

The size passed to the allocation is implementation defined. This is because the implementation controls the ABI for its VLA; it may ask for more memory and different alignment than may be implied by the declaration of the variable length array. For example, given the following program:

#include <stddef.h>
#include <string.h>

void* stdc_vla_alloc(size_t n, size_t align, size_t* out_n);
void stdc_vla_free(void* p, size_t n, size_t align);

int compute_sum(size_t data_size, int* data);

int main (int argc, char* argv[]) {
  /* 0. … */
  int vla[argc];
  /* 1. use `vla` here … */
  for (int i = 0; i < (sizeof(vla) / sizeof(vla[0])); ++i) {
    vla[i] = strlen(argv[i]);
    if (vla[i] > 255) {
      return -1;
    }
  }
  int sum = compute_sum((sizeof(vla) / sizeof(vla[0])), vla);
  /* 2. … */
  return sum;
}

The equivalent de-sugared program may look as follows:

#include <stddef.h>
#include <string.h>

void* stdc_vla_alloc(size_t n, size_t align, size_t* out_n);
void stdc_vla_free(void* p, size_t n, size_t align);

int compute_sum(size_t data_size, int* data);

int main (int argc, char* argv[]) {  
  /* 0. … */
  int $__vla_size;
  int (*vla)[argc];
  vla = (typeof(vla))stdc_vla_alloc(
    sizeof(vla[0]) + sizeof(size_t) /* implementation-defined; extra storage for VLA size */,
    alignof(size_t) /* implementation-defined */,
    &$__vla_size
  );
  /* 1. use `vla` here … */
  for (int i = 0; i < (sizeof((vla[0])) / sizeof((vla[0])[0])); ++i) {
    vla[i] = strlen(argv[i]);
    if ((vla[0])[i] > 255) {
      stdc_vla_free(vla, $__vla_size, alignof(size_t));
      return -1;
    }
  }
  int sum = compute_sum((sizeof((vla[0])) / sizeof(*(vla[0]))), (vla[0]));
  /* 2. … */
  stdc_vla_free(vla, $__vla_size, alignof(size_t));
  return sum;
}

As shown in this de-sugared program, the VLA may have a size that is, practically and conceptually, different from the length of the variable length array retrieved by sizeof(). Therefore, it is important that there is an output parameter for $__vla_size that can adequately track such information if necessary.

3.3. What If A VLA Is Lowered To A C Array?

A frequent question in the face of this proposal is "what happens if an implementation is smart enough to lower the VLA to a C-style, fixed-size array?". The answer here is simple: it is already implementation-defined if an array is considered a VLA or not, due to the nature of the rules of Constant Expressions. This was clarified in C23[n3138] that such arrays may or may not be VLAs. Implementations can continue to lower VLAs into fixed-size, C-style arrays and declare them non-VLAs and thus avoid needing to invoke any of this infrastructure. This proposal does not prevent any class of optimizations, as the implementation is still within full control when it needs to be.

What this does prevent implementations from doing is denying the use of VLAs outright if the program provides the necessary allocation functions.

3.4. Memory Reuse

It is important to note that, for this feature, the only hard requirement is that for every call to stdc_vla_alloc, there is a matching call to stdc_vla_free. It does not mean there must be one call to stdc_vla_alloc for the start and end of every VLA’s object lifetime. For example, given the for loop in the below program:

const int vla_n = 30;

int main () {
  const int n = 50;
  for (int i = 0; i < n; ++i) {
    double vla[vla_n] = {};
    /* use VLA */
  }
  return 0;
}

The program may opt to only allocate the VLA once, resulting in a de-sugaring that looks something close to:

const int vla_n = 50;

int main () {
  const int n = 30;
  size_t $__vla_size;
  double (*vla)[vla_n];
  vla = (typeof(vla))stdc_vla_alloc(
    sizeof(vla[0]) /* implementation-defined */,
    alignof(size_t) /* implementation-defined */,
    &$__vla_size
  );
  for (int i = 0; i < n; ++i) {
    for (size_t __init_vla = 0; __init_vla < sizeof((vla[0])); ++__init_vla) {
      (vla[0])[__init_vla] = (double){};
    }
    /* use VLA */
  }
  stdc_vla_free(vla, $__vla_size, alignof(size_t));
  return 0;
}

This results in a single call to stdc_vla_alloc paired with a single call to stdc_vla_free, instead of 30 paired function calls. This is legal so long as the semantics remain identical. Users are subject to unspecified behavior if they attempt to rely on the number of times stdc_vla_alloc is called, as the compiler may unroll, fuse, or otherwise reuse memory allocations for whatever purposes it sees fit.

4. Interaction with Other (Potential) Language Features / Implementations

There are many other (future) language features and implementation-specific features that can allow this feature to really blossom. Below are a select sampling of such techniques and a brief discussion of each such.

4.1. alloca and stack-probing

Below is an implementation of manual stack checking that works on MSVC-based platforms as well as GCC and Clang-based platforms with the pthreads library (with *_np (non-portable) extensions present).

///////////////////////
// platform boilerplate
///////////////////////

#define _GNU_SOURCE
#define WIN32_LEAN_AND_MEAN

#if defined(_MSC_VER)
#define MY_FLATTEN __forceinline
#define MY_OUTLINE __declspec(noinline)
#include <malloc.h>
#include <windows.h>
#elif defined(__clang__) || defined(__GNUC__)
#define MY_FLATTEN [[gnu::flatten]]
#define MY_OUTLINE [[gnu::noinline]]
#else
#define MY_FLATTEN
#define MY_OUTLINE
#error "unsupported platform: do not how to inline " \
  "function call into parent function call on this vendor"
#endif

#if defined(_REENTRANT) && (_REENTRANT == 1) && \
  __has_include(<pthread.h>)
#define MY_PTHREAD_H 1
#include <pthread.h>
#else
#define MY_PTHREAD_H 0
#endif

#include <stddef.h>
#include <stdint.h>

MY_OUTLINE
bool my_is_stack_available(size_t amount, size_t alignment)
{
  // TODO: support alignment
#if defined(_MSVC_VER)
  // https://devblogs.microsoft.com/oldnewthing/20200610-00/?p=103855
  ULONG_PTR low = 0, high = 0;
  GetCurrentThreadStackLimits(&low, &high);
  ptrdiff_t remaining = (ULONG_PTR)(&low) - low;
  ptrdiff_t available = high - low;
  if (remaining > available) {
    // Ssssshhhooould not be possible?!
    // Something is horrifically wrong here...!
    __fastfail(FAST_FAIL_INCORRECT_STACK);
  }
  return remaining >= amount;
#elif MY_PTHREAD_H
  char* low_stack_addr;
  size_t stack_size;
  pthread_attr_t attr;

  int getattr_res = pthread_getattr_np(pthread_self(), &attr);
  if (getattr_res != 0) {
    return false;
  }
  int getstack_res = pthread_attr_getstack(&attr,
    (void**)&low_stack_addr,
    &stack_size);
  if (getstack_res != 0 {
    return false;
  }
  // some nerd will scream about provenance or whatever, I'm sure
  char* local_address_guess = ((char*)(void*)&low_stack_addr);
  ptrdiff_t remaining = local_address_guess - low_stack_addr;
  if (remaining > stack_size) {
    // Absolutely should NOT be possible?!
    abort();
  }
  return remaining >= amount;
#else
#  error "cannot determine current stack size: insufficient hacks"
#endif
}

///////////////////////////
// User-Defined VLA Control
///////////////////////////
#include <stddef.h>
#include <stdlib.h>

MY_FLATTEN inline void* stdc_vla_alloc(size_t size,
  size_t alignment,
  size_t* actual_size)
{
  if (!my_is_stack_available(size, alignment)) {
    abort();
    return nullptr;
  }
  *actual_size = size;
#if defined(_MSC_VER)
  return __alloca(size);
#elif defined(__clang__) || defined(__GNUC__)
  return __builtin_alloca_with_align(size, alignment);
#endif
}

MY_FLATTEN inline void stdc_vla_free(void* ptr,
  size_t size,
  size_t alignment)
{
  // nothing, it's alloca
}

///////////////
// main program
///////////////
extern int n;

int main () {
  // we are in compiler that doesn't support VLAs (e.g., MSVC)
  static_assert(__STDC_NO_VLA__ != 0,
    "this will work even if VLAs are not present");

  // because both stdc_vla_alloc and stdc_vla_free are available,
  // VLA will use that to retrieve memory
  // and ignore whatever implementation does
  int vla[n] = {};

  // use as normal...
  /* … */

  return 0;
}

4.2. With Transparent Aliases

With Transparent Aliases, a future feature that is not yet in Standard C, the VLA allocation can provide a locally-specific allocation function that is not visible to other scopes. For example:

static_assert(__STDC_NO_VLA__ != 0,
  "this will work even if VLAs are not present");

void* my_vla_alloc(size_t n, size_t align, size_t* out_n);
void my_vla_free(void* p, size_t n, size_t align);

int main (int argc, char* argv[]) {
  // aliases the 2 required function calls
  _Alias stdc_alloc_vla = my_alloc_vla;
  _Alias stdc_free_vla = my_free_vla;
  // uses my_vla_alloc
  int meow[argc];
  // calls my_vla_free
  return 0;
}

int f (int n) {
  // Constraint violation: implementation does not support VLAs.
  int meow[n];
  return 0;
}

The VLA in main will compile, link, and run (provided a definition of my_vla_alloc and my_vla_free are in the final program). Conversely, f is a constraint violation and thus the program may not compile, link, or run. This is mildly better than using a function pointer object because null function pointers are no longer a potential point of failure.

5. Wording

Wording is relative to the latest draft of the C standard.

6.✨ Variable length array storage control
Description

A program can control the storage of variable length arrays when the identifiers stdc_vla_alloc and stdc_vla_free are in scope and are function designators or function pointers compatible with a function type or function pointer type specified below.

For any variable length array object, instead of utilizing the conditionally-supported and otherwise unspecified allocation method for that object, an implementation shall instead allocate its storage using the stdc_vla_alloc function. For every invocation to stdc_vla_alloc made, an invocation of stdc_vla_free is paired with it to allow a program to free the storage of the variable length array.

Constraints

If one of stdc_vla_alloc or stdc_vla_free is visible at a given scope when a variable length array is declared, then the other shall also be visible at that same scope.

stdc_vla_alloc shall have a (pointer to) function type compatible with void *(size_t n, size_t align, size_t actual_size[static 1]). stdc_vla_free shall have a (pointer to) function type compatible with void (void *ptr, size_t n, size_t align). If any of them are function pointers, those function pointers shall not be a null function pointer.

The stdc_vla_alloc function shall return a non-null pointer to at least n bytes of storage and has the requested or compatible alignment alignment. The actual usable size of the allocation shall be written into the actual_size parameter, representing the usable size in bytes of the allocation.

NOTE A valid implementation of stdc_vla_alloc can terminate the program on failure to uphold the requirements. An implementation is allowed to use the provided allocation for any purpose or reason consistent with the usable size, alignment, and storage duration of the corresponding variable length array object.

For the duration of its lifetime (6.2.4), a variable length array object shall have its storage provided through a successful call to stdc_vla_alloc, and after the end of its lifetime will be freed with stdc_vla_free.

The implementation shall pair every successful execution of stdc_vla_alloc with a corresponding invocation of stdc_vla_free, unless a non-local jump (7.13) or program termination occurs.

Semantics

Regardless of the presence or value of __STDC_NO_VLA__, if all of the previously-specified constraints and requirements in this subclause are met in a given scope, a program can declare, initialize, and otherwise use a variable length array object as specified in this document for that scope. If the constraints and requirements are not met, than this clause does not apply and the program is subject to the conditional support for variable length arrays and the unspecified nature of its allocation.

Any storage allocated by stdc_vla_alloc may be reused by the implementation. Implementations are permitted to not call stdc_vla_alloc if there is already suitable storage.

The behavior is undefined if execution returns from a call to stdc_vla_alloc and the returned value is a null pointer, does not provide enough storage, does not appropriately set the size of the allocation in *actual_size, or is not suitably aligned.

NOTE Parameters declared with variable-length array type are equivalent to pointers, and do not need to use stdc_vla_alloc or stdc_vla_free. The stdc_vla_free function performs any necessary actions to undo memory previously allocated by stdc_vla_alloc, if there is any behavior at all.

EXAMPLE An implementation is not guaranteed to call stdc_vla_alloc along with the corresponding stdc_vla_free for every single variable length array object if it has access to suitable storage already available, including if it has access to data previously provided by stdc_vla_alloc. The following:

#include <stddef.h>

const int vla_n;

int main () {
  extern void *stdc_vla_alloc(size_t n, size_t align, size_t actual_size[static 1]);
  extern void stdc_vla_free(void *p, size_t n, size_t align);
	
  const int n = 50;
  for (int i = 0; i < n; ++i) {
    double vla[vla_n] = {};
    /* use VLA */
  }
  return 0;
}

has a valid interpretation equivalent to:

#include <stddef.h>

extern const int vla_n;

int main () {
  extern void *stdc_vla_alloc(size_t n, size_t align, size_t actual_size[static 1]);
  extern void stdc_vla_free(void *p, size_t n, size_t align);

  const int n = 50;
  size_t __vla_size;
  double (*__pvla)[vla_n];
  __pvla = (typeof(__pvla))stdc_vla_alloc(
    sizeof(pvla[0]) /* hypothetical */,
    alignof(double) /* hypothetical */,
    &__vla_size
  );
  for (int i = 0; i < n; ++i) {
    for (size_t __init_vla = 0; __init_vla < sizeof(__pvla[0]); ++__init_vla) {
      (__pvla[0])[__init_vla] = (double){};
    }
    double* vla = __pvla[0];
    /* use VLA */
  }
  stdc_vla_free(__pvla, __vla_size, alignof(double));
  return 0;
}

stdc_vla_alloc is only called once for the entire duration of this program, despite the variable length array being declared and used within a for loop 50 times during program execution.

EXAMPLE Assuming an implementation where __STDC_NO_VLA__ is defined with a non-zero value: let there be some function named alloca that provides storage that is automatically reclaimed by the implementation when provided a size and an alignment; and, let there be some attribute named acme::flatten that behaves as-if the marked function, when called, has its function body written directly in the calling location. The following is a valid program that will translate and execute without error:

#include <stddef.h>

[[acme::flatten]]
inline void* stdc_vla_alloc(size_t size, size_t alignment, size_t actual_size[static 1]) {
  *actual_size = size;
  return alloca(size, alignment);
}

[[acme::flatten]]
inline void stdc_vla_free(void* ptr, size_t size, size_t alignment) {
  // nothing; implementation automatically reclaims this memory
}

int main (int argc, char *argv[]) {
  int vla[argc + 1] = {};
  return vla[0];
}

EXAMPLE Assuming an implementation where __STDC_NO_VLA__ is defined with a non-zero value: the following will translate successfully until the body of function f.

#include <stddef.h>

void *my_vla_alloc(size_t n, size_t align, size_t actual_size[static 1]);
void my_vla_free(void *p, size_t n, size_t align);

int main (int argc, char *argv[]) {
  static_assert(__STDC_NO_VLA__ != 0, "this will still work");
  typeof(my_vla_alloc)* stdc_alloc_vla = my_alloc_vla;
  typeof(my_vla_free)* stdc_free_vla = my_free_vla;
  // uses my_vla_alloc
  int meow[argc]; // ok
  // calls my_vla_free
  return 0;
}

int f (int n) {
  static_assert(__STDC_NO_VLA__ != 0, "this will fail");
  int meow[n]; // constraint violation:
               // implementation does not support VLAs
  return 0;
}

EXAMPLE Implementations can avoid calling stdc_vla_alloc by using semantic program information, meaning that every variable length array may not include a call to stdc_vla_alloc:

extern void *stdc_vla_alloc(size_t n, size_t align, size_t actual_size[static 1]);
extern void stdc_vla_free(void *p, size_t n, size_t align);

extern void use(int* p);

void optimizable (int n) {
  if (n < 100) {
    int a[n];
    use(a);
  }
}

An implementation is permitted to not call stdc_vla_alloc in optimizable and instead perform a fixed automatic storage duration allocation (similar to what would happen if a was declared as int a[99]).

References

Informative References

[MAKING-C-LESS-DANGEROUS]
Kees Cook; Google. Making C Less Dangerous. September 1st, 2018. URL: https://www.youtube.com/watch?v=XfNt6MsLj0E&t=310s
[N3138]
Aaron Ballman. N3138: Rebuttal to N2713 Integer Constant Expressions. June 21st, 2023. URL: https://www.open-std.org/jtc1/sc22/wg14/www/docs/n3138.pdf
[N317]
Tom MacDonald. N317: Arrays of Variable Length. January 2nd, 1994. URL: https://www.open-std.org/jtc1/sc22/wg14/www/docs/n317.pdf