Vladimir Klepov as a Coder

How I made banditypes, the smallest TS validation library

I open-sourced banditypes, the smallest runtime validation library for TS / JS. It manages to fit all the basic functionality into an astounding 400 bytes. For reference, the popular zod and yup libraries are around 11KB, and superstruct measures 1.8KB for the same set of functionality.

Today, I'll tell you how I managed to pull this off. The article is divided into three parts. First, we discuss the downsides of traditional bundle size measurement techniques, and come up with a more realistic way to assess it. Then I explain the design process focused on minimizing bundle size — with this alone, the size clocked around 500 bytes. Finally, I share the extra optimizations and dirty hacks that allowed me to strip off another 20% and arrive at the current 385-byte size.

This article originally appeared in Russian on habr.com

Size measurement technique

When people say that "library X is 40 kilobytes", they usually mean that the full build (usually UMD) of the library, minified and gzipped, is 40 kilobytes. This was perfect back in 2008, when you dropped the jQuery script from an obscure CDN onto your website; it was bearable in 2015, when using require instead of copy-pasting code was enough to make us happy. But in 2023, with static imports and ES modules and stuff, this is just not a realistic measure (albeit a simple one to take).

First off, we have tree shaking, and it's pretty good if you don't stand in its way. If a library is a collection of 200 utilities, of which I only use one, I don't really care about the total size of all helpers, as long as only one makes it into my bundle. Judging libraries by full size not only punishes feature-rich libraries (you're not strong, you're FAT, ha-ha), but also fails to assess the tree-shakability of the library. Besides, the standalone library bundle includes full exported names, like export enums, while in real life the names will be mangled or even completely eliminated by inlining the functions into the call site.

Tools like bundlejs.com and size-limit bundle a small sample app importing from your library, taking care of these problems. But classical single-pass analysis still contains some artifacts that skew the measurement, and for tiny libraries this interference can make up the majority of the reported bundle size:

  • Repetitions of common JS syntax — const, function(, for (let etc. Every client app most likely contains these already, so, thanks to gzip, a library gets them almost for free.
  • Wrappers and runtime generated by the bundler — it can be as small as an IIFE wrapper, but it's still there.
  • About 20 bytes of fixed gzip overhead (the stream header and checksum trailer). Again, as long as your library is bundled together with the user code, this is not added to the bundle a second time.

To address all of these issues, I build two versions of a small sample app. The first one, the baseline, contains just a small chunk of JS relevant to the use case: an object we're validating and some try / catch blocks. The second one adds banditypes validation on top of that. The size difference between the generated bundles is the real size cost added by banditypes: it leaves out the stuff that's already there (it's 218 bytes BTW), but includes the user-defined schema. This last point is very important: otherwise, you can achieve an illusion of a smaller library by making the user type more code. Thought experiment: "I published a hot new 0-byte UI framework! Install it with npm i zerro and build UI using vanilla DOM API manually".

This approach also lets me measure the size with different subsets of functionality and assess the quality of tree shaking. 385 bytes is the size of a full build, while the common core (object, array, and primitives) sits at 207 bytes, and the "core" with no validations is 96 bytes. Different minifiers can be plugged in to cover the variety of real-world use cases: I report the 5-pass terser size as it's the most established minifier that you should use if bundle size is a concern, but the esbuild minifier is not far behind at 405 bytes.
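
Here's a rough sketch of the differential measurement, assuming Node.js and its built-in zlib module. The three code strings are made-up stand-ins for real minified bundler output, and gzipSize is a hypothetical helper; the actual banditypes benchmark may be wired up quite differently.

```typescript
import { gzipSync } from "node:zlib";

// Gzipped size, in bytes, of a chunk of (already minified) code.
const gzipSize = (code: string): number => gzipSync(code, { level: 9 }).length;

// Hypothetical stand-ins for bundler output. In a real setup these strings
// would come from bundling and minifying two entry points.
const baseline =
  "const data=JSON.parse(input);try{console.log(data.title)}catch(e){console.error(e)}";
const library =
  'const fail=()=>{throw new TypeError("bad")};' +
  'const string=(raw)=>typeof raw=="string"?raw:fail();' +
  'const number=(raw)=>typeof raw=="number"?raw:fail();';
const schemaUsage =
  "try{string(data.title);number(data.price)}catch(e){console.error(e)}";

// Naive measurement: the library and the schema gzipped on their own.
const standalone = gzipSize(library + schemaUsage);

// Differential measurement: what the same code adds on top of the baseline app.
const marginal = gzipSize(baseline + library + schemaUsage) - gzipSize(baseline);

console.log({ standalone, marginal });
```

The differential number leaves out the fixed gzip overhead and any syntax the baseline already contains, and still counts the schema the user has to write.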

Design for smolness

If you start the development by randomly slapping the keyboard until you let out all the functionality you could think of, and then try to also make it small, you're probably in for an unpleasant failure. Instead, small size should be at the core of your design decisions from the outset:

  1. Design for tree-shaking. Why is zod around 11KB no matter how few of its functions you use, while superstruct achieves a very respectable 1.8KB size for a real-world use case, with the same feature set? Because zod heavily relies on methods: z.number().gt(5), while superstruct uses functional composition: min(number(), 9000). The minifier can take a look at the bundle, see that function max is not referenced, and remove it. But finding all the places where zod's number validator is used, analyzing its methods and dropping the unused ones is nearly impossible. Besides, function names are much easier and safer to mangle. So, we should rely on functions as much as possible.
  2. Focused feature set. A major feature of decent validation libraries is returning detailed error messages describing what, exactly, did not pass the validation: { message: 'Expected string, got number', path: ['item', 'title'] }. I'm sure this is a useful feature, but I also know it's not needed in every use case, so I decided to remove this functionality. Besides, fewer features mean quicker development.
  3. More extensibility. To cut functionality without putting the users and ourselves into an unpleasant position where some critical stuff is missing, we need to let the users extend the library. I added two methods (yes, methods, more on that in a minute) to every validator:
     • type1.map(res => ...): perform extra validation, aka type refinement: string().map(str => str.length ? str : fail()), or transform the data: string().map(str => [str])
     • type1.or(val => ...): if the left validation fails, try the right one. It's obviously useful for union types: string().or(optional()), but also works for default values: string().or(() => 'default')
  4. While we're at it, it's really great if we can make one thing perform several roles. map can both refine and transform the data, and or works as a union or a default value.
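
The tree-shaking argument from point 1 can be sketched roughly like this. Both halves are hypothetical illustrations written for this comparison (NumberType, Cast, number, min and max are made-up names), not the actual code of zod or superstruct.

```typescript
// Method style: every check is a class member, so using a single one
// still keeps the whole class, unused methods included, in the bundle.
class NumberType {
  checks: Array<(n: number) => boolean>;
  constructor(checks: Array<(n: number) => boolean> = []) {
    this.checks = checks;
  }
  gt(limit: number): NumberType {
    return new NumberType([...this.checks, (n) => n > limit]);
  }
  lt(limit: number): NumberType {
    return new NumberType([...this.checks, (n) => n < limit]);
  }
  parse(raw: unknown): number {
    if (typeof raw !== "number") throw new TypeError("expected a number");
    const value = raw;
    if (!this.checks.every((check) => check(value))) {
      throw new TypeError("check failed");
    }
    return value;
  }
}

// Function style: every check is a standalone function that a bundler
// can drop as soon as nothing references it.
type Cast<T> = (raw: unknown) => T;

const number = (): Cast<number> => (raw) => {
  if (typeof raw !== "number") throw new TypeError("expected a number");
  return raw;
};
const min = (inner: Cast<number>, limit: number): Cast<number> => (raw) => {
  const value = inner(raw);
  if (value < limit) throw new TypeError("too small");
  return value;
};
// Not used by the app code below, so it is a candidate for removal.
const max = (inner: Cast<number>, limit: number): Cast<number> => (raw) => {
  const value = inner(raw);
  if (value > limit) throw new TypeError("too large");
  return value;
};

const viaMethods = new NumberType().gt(5).parse(7);
const viaFunctions = min(number(), 5)(7);
```

In the method version, lt ships even though only gt is called. In the function version, max is just another unreferenced top-level binding.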

I think the resulting API is quite beautiful — and compatible with the established libraries to facilitate migration:

import { array, enums, fail, number, object, set, string } from 'banditypes';

const userSchema = object({
  title: string(),
  // refinement
  price: number().map(num => num > 0 ? num : fail()),
  tags: set(enums(['sale', 'liked', 'sold']))
});

// string OR a string array
const strings = string().or(array(string()));

// data conversion
const arraySum = array(number())
  .map(arr => arr.reduce((acc, e) => acc + e, 0));
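
For a feel of how little machinery map and or need, here's a minimal sketch of a validator that carries both methods. It's a guess at the general shape, not the real banditypes source: Cast, Validator and makeType are hypothetical names, and only string, number and array are modeled.

```typescript
type Cast<T> = (raw: unknown) => T;

// A validator is a plain function with two extra methods attached.
interface Validator<T> extends Cast<T> {
  map<E>(extra: (value: T) => E): Validator<E>;
  or<E>(fallback: Cast<E>): Validator<T | E>;
}

// Validation failure is signalled by throwing.
const fail = (): never => {
  throw new TypeError("invalid value");
};

const makeType = <T>(cast: Cast<T>): Validator<T> => {
  const self = ((raw: unknown) => cast(raw)) as Validator<T>;
  // map: run this validator, then refine or transform its result.
  self.map = (extra) => makeType((raw) => extra(cast(raw)));
  // or: if this validator throws, give the fallback a chance.
  self.or = (fallback) =>
    makeType((raw) => {
      try {
        return cast(raw);
      } catch {
        return fallback(raw);
      }
    });
  return self;
};

const string = () => makeType((raw) => (typeof raw === "string" ? raw : fail()));
const number = () => makeType((raw) => (typeof raw === "number" ? raw : fail()));
const array = <T>(item: Cast<T>) =>
  makeType((raw) =>
    Array.isArray(raw) ? raw.map((x: unknown) => item(x)) : fail()
  );

// One method, several roles: refinement, union, default value.
const positive = number().map((n) => (n > 0 ? n : fail()));
const strings = string().or(array(string()));
const withDefault = string().or(() => "default");
```

Because a failed check throws, errors from nested validators bubble up through map and array without any extra plumbing.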

Optimizations

Smolness as a design principle already gives us a very neat 520-byte size. But I did not just want a small library, I wanted the smallest library possible. So, it's time to try like hell and push for sub-400 without sacrificing too much DX.

  1. Compile to modern JS. Replacing function array(raw) ... with const array = (raw) => ..., and using raw spreads, surprisingly, saved 23 bytes. gzip is pretty good at removing repetition (the uncompressed bundle was reduced by 430 bytes), but there still is something to be gained. ES2017 is reasonably well supported by browsers, and you can transpile the library further down yourself.
  2. Remove duplicate APIs. Why have a literal(42) type if you can model it like a single-value union, enums([42])? It's not like a single literal is a common type to validate anyway.
  3. Repetition is power. Gzip is great at removing repetition. If you have arrow functions, turn all functions into arrow functions, so that there is more repetition. If you already use the word Object, you can use it again, almost for free. If every function accepts a single argument, it makes for a nice =(e)=>{ repeating chunk (terser makes all argument names the same). Surprisingly, having a function copy-pasted with a minor change is smaller, under gzip, than reusing a "proper" parametrized base function.
  4. typeof x === 'number' can be replaced with a sample-based check: like = sample => raw => typeof raw == typeof sample, and then you define the number type as like(0). 20 bytes!
  5. throw new TypeError('Invalid Banditype') is quite a big chunk of code. We can just call a string: 'bad banditype'(), and the error will be thrown anyways. 20 bytes!
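
Tricks 4 and 5 can be sketched together in TypeScript. This is an illustration under assumptions, not the library's actual source: the like helper here returns the validated value rather than a boolean, and the double cast exists only to get the string call past the type checker (in plain JS it's just 'bad banditype'()).

```typescript
// Calling a string is a runtime TypeError ("... is not a function"),
// so this throws without spelling out `throw new TypeError(...)`.
const fail = (): never => ("bad banditype" as unknown as () => never)();

// One generic check instead of a hand-written typeof comparison per
// primitive: the expected type is taken from a sample value.
const like =
  <T>(sample: T) =>
  (raw: unknown): T =>
    typeof raw == typeof sample ? (raw as T) : fail();

const number = like(0);
const string = like("");
const boolean = like(false);
```

One caveat of the sample-based check: typeof reports "object" for null, arrays and plain objects alike, so like({}) would accept all three. That makes it a good fit for primitives only.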

Naturally, I tried out a ton of ideas that didn't work. Replacing throw with return null did slim down the size a bit, but it forced the code to rely on optional chaining, which is not well-supported. Validating container types also became much harder, because the errors don't magically bubble up anymore, not to mention the problems with validating real null values. I also expected to gain something by replacing for..in with Object.keys, because it's an expression, and expressions minify better, but it didn't work at all.

Finally, I left map / or as methods, even though I could strip 17 bytes by modeling them as pure functions. They are not likely to be tree-shaken completely (this would trigger a 50-byte reduction), because built-in object and array validations use map internally, and or is used for optional and nullable types. Besides, I much prefer the readability of the chained version: string().map(s => [s]).or(array(string())) reads like a normal sentence, while or(map(string(), s => [s]), array(string())) is some weird mashup of words.


Today we took a first look at a new validation library, banditypes. I managed to fit some useful, non-trivial functionality into a measly 400 bytes by treating small size as a core design principle:

  • Prefer functions over methods, because they minify better.
  • Cut features aggressively. Extra validations and detailed error messages are not useful in every scenario, so they had to go.
  • Make the library extensible to compensate for the limited feature set.
  • Be resourceful, and provide multi-purpose tools.

And applying some more optimizations on top of that:

  • Modern JS = less code.
  • Remove duplicate APIs. One way to do something is enough.
  • Repetitive code is good code, as far as gzip is concerned.
  • The typeof raw == typeof sample trick.
  • The 'bad banditype'() hack instead of an explicit throw.

I think banditypes really shines in some use cases, so try it out in your apps! I'd also really appreciate you starring the repo on GitHub: it means a lot!

Hello, friend! My name is Vladimir, and I love writing about web development. If you got down here, you probably enjoyed this article. My goal is to become an independent content creator, and you'll help me get there by buying me a coffee!