The starting point for rendering is the first resource that's usually provided: HTML. So, the first step is parsing the HTML with `HTMLDocumentParser`, which then provides its output to `HTMLTreeBuilder` to construct the HTML tree nodes.
The resulting HTML tree is, in fact, the DOM (Document Object Model). The DOM serves two purposes: it is Blink's internal representation of the page's contents, and it is the interface through which JavaScript queries and modifies the page.
So V8, Chrome's JavaScript engine, exposes DOM Web APIs like `createElement(...)` or `appendChild(...)`, which are thin wrappers over the C++ DOM tree, through bindings.
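A minimal sketch of what those bindings look like from script: each call below crosses from JavaScript/TypeScript into Blink's C++ DOM implementation.

```ts
// Build a tiny DOM subtree through the Web APIs exposed by the V8 bindings.
const list: HTMLUListElement = document.createElement("ul");

for (const label of ["one", "two", "three"]) {
  const item = document.createElement("li");
  item.textContent = label;        // creates/updates a Text node in the tree
  list.appendChild(item);          // attaches the node to its parent
}

document.body.appendChild(list);   // the subtree is now part of the document's DOM
```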
There can be multiple DOM Trees in a single document, through the use of Custom Elements and their shadow trees. For rendering purposes, the trees are composed into a complete view that includes all sub-trees, a technique referred to as flat tree traversal, which presents the exhaustive collection of all nodes as well as how they should be laid out within that tree.
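As a small illustration of multiple trees in one document, the sketch below attaches a shadow tree to a host element; during flat-tree traversal the engine composes the host's light-DOM children into the shadow tree's `<slot>`.

```ts
// A host element with its own shadow tree; light-DOM children are slotted in.
const host = document.createElement("div");
host.innerHTML = "<span>light DOM child</span>";

const shadow = host.attachShadow({ mode: "open" });
shadow.innerHTML = `
  <style>p { font-weight: bold; }</style>
  <p>shadow DOM content</p>
  <slot></slot>   <!-- the light DOM child is rendered here in the flat tree -->
`;

document.body.appendChild(host);
```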
The style engine parses, through `CSSParser`, the active stylesheet(s) into sub-structures. It generates `StyleSheetContents` entries for each active and applicable stylesheet, be it `<style>` tags within the markup or linked stylesheets.
The result is an Object Model that contains all parsed style rules for all active sheets, and the browser now needs to figure out how those apply to the DOM Elements. This step is called style resolution, and consists of walking the DOM Tree, consulting the parsed rules, and producing a computed style for each element. Computed styles are simply a map of property-to-value pairs that represents the styles applicable to the element, once specificity, layering, prioritization, overlapping and overwriting, etc. have been taken into consideration.
The computed-style-augmented-DOM-tree is the final output of the style engine.
This can be observed directly in the Chrome DevTools under the "computed styles" view of a given element, although the DevTools will also show computed layout property values, which aren't available yet at this stage of the pipeline. I.e. the DevTools would show a pixel value for an element's `width`, whereas the computed style of that element could at this point contain a to-be-computed value such as `auto`.
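A hedged sketch of that distinction from script: `getComputedStyle()` resolves layout-dependent properties to used pixel values (because layout has already run by the time you read it), whereas the CSS Typed OM's `computedStyleMap()` (a Chromium API at the time of writing) reports the computed value, e.g. `auto`, before layout resolves it.

```ts
const box = document.createElement("div");
box.textContent = "hello";
document.body.appendChild(box);   // no explicit width, so `width: auto`

// getComputedStyle resolves layout-dependent properties to used pixel values.
console.log(getComputedStyle(box).width);          // e.g. "784px" after layout

// The CSS Typed OM exposes the computed value before layout resolves it
// (Chromium-specific API at the time of writing).
console.log(box.computedStyleMap().get("width"));  // expected: CSSKeywordValue "auto"
```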
The DOM structure and the computed styles are the inputs to the Layout algorithm.
Layout determines the visual geometry of the elements. Each element occupies one, or more, rectangular "boxes" in the content, and the job of layout is to determine the coordinates of these boxes.
The layout algorithm determines how the boxes will be distributed:
`writing-mode: vertical-lr`, LTR/RTL, etc. Those seem to simply match the CSS layout modes 1:1. The assumption here is that this is simply the "backend" realization of each and every one of those, although the breakdown might be different internally; see the sketch below.
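A small sketch, assuming a browser context, of how switching the writing mode changes which layout "backend" distributes the boxes and therefore the resulting geometry:

```ts
const para = document.createElement("p");
para.textContent = "Some text laid out by the block and inline layout algorithms.";
document.body.appendChild(para);

// Horizontal writing mode: the paragraph is typically wide and short.
console.log(para.getBoundingClientRect());

// Switch to a vertical writing mode: the same content now flows top-to-bottom,
// so after the next layout the box typically becomes tall and narrow instead.
para.style.writingMode = "vertical-lr";
console.log(para.getBoundingClientRect());
```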
Layout also measures runs of text in the font chosen by the computed styles; shaping then selects the glyphs (letters, numbers, symbols, etc.), through HarfBuzz's `HarfBuzzShaper`, and computes the size and placement of each glyph, accounting for things like ligatures and kerning.
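A rough way to observe shaping from script is the Canvas 2D `measureText()` API: depending on the font, the advance width of a whole string differs from the sum of the individual characters' widths, because kerning (and possibly ligatures) is only applied across the full run. This is an illustrative sketch, not how Blink itself invokes HarfBuzz.

```ts
const ctx = document.createElement("canvas").getContext("2d")!;
ctx.font = "32px serif";

const text = "AVATAR";   // "AV" / "VA" are common kerning pairs in many fonts

const shapedWidth = ctx.measureText(text).width;
const naiveWidth = [...text]
  .map((ch) => ctx.measureText(ch).width)
  .reduce((a, b) => a + b, 0);

// With kerning applied, the shaped run is usually slightly narrower
// (the exact difference depends on the font in use).
console.log({ shapedWidth, naiveWidth });
```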
Layout also calculates multiple types of bounding rectangles for each element. For instance, if an element's content is bigger than its border box rectangle, we're now in an overflow scenario where layout has to keep track of both the border box rectangle and the overflow rectangle. It is also the responsibility of layout to compute scroll boundaries when applicable, reserve space for the scrollbars, determine the minimum and maximum scroll positions, and so on.
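These overflow and scroll bounds are what the scrolling APIs expose to script; a small sketch, assuming a browser context:

```ts
const scroller = document.createElement("div");
scroller.style.cssText = "width: 200px; height: 100px; overflow: auto;";
scroller.innerHTML = "<div style='width: 600px; height: 400px;'>overflowing content</div>";
document.body.appendChild(scroller);

// clientWidth/Height: the visible content area (minus any scrollbars).
// scrollWidth/Height: the overflow size of the laid-out contents.
console.log(scroller.clientWidth, scroller.clientHeight);   // ~200 x ~100, minus scrollbars
console.log(scroller.scrollWidth, scroller.scrollHeight);   // 600 x 400

// The maximum scroll offsets follow directly from those two rectangles.
const maxScrollLeft = scroller.scrollWidth - scroller.clientWidth;
const maxScrollTop = scroller.scrollHeight - scroller.clientHeight;
console.log({ maxScrollLeft, maxScrollTop });
```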
The Layout algorithm operates on a separate tree called the Layout Tree, which associates layout classes with elements. All of the layout classes inherit from the base Layout Object class, which makes this tree yet another Object Model.
The tree is built by passing over the DOM Tree at the end of the Styles stage. Then, the Layout stage consists of traversing this new tree and figuring out the geometry data, the line breaks, the scroll bars, etc. for those Layout Objects.
In simple cases, DOM Tree nodes are 1:1 with Layout Objects, but there can also be multiple Layout Objects for a single DOM Node, or Layout Objects without a corresponding DOM Node at all, in cases where it can be beneficial to simplicity or performance, or no Layout Object at all for a given DOM Node, in cases such as `display: none`.
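One observable consequence of an element having no Layout Object: with `display: none` there is no geometry to report, so the geometry APIs return empty results. A quick sketch:

```ts
const hidden = document.createElement("div");
hidden.textContent = "not rendered";
hidden.style.display = "none";
document.body.appendChild(hidden);

// No Layout Object exists for this node, so there is no geometry to report.
console.log(hidden.getClientRects().length);           // 0
console.log(hidden.getBoundingClientRect());           // all zeros
console.log(hidden.offsetWidth, hidden.offsetHeight);  // 0 0
```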
The Layout Tree is based on the flat tree traversal, in that it contains every node from the sum total of all nodes in both the light DOM and Shadow Root(s). This means that a Layout Object and its layout container can come from different DOM trees.
The Layout algorithm is in the process of being rewritten into a new modern version referred to as "Next Generation", or NG, as of 2020. The work might've been completed as of 2022 as part of the broader RenderingNG rewrite of Blink's rendering engine.
While in the "old" model a Layout Object contained every bit of data within itself - which led to performance issues and added complexity upon recalculation, because of dependencies between the objects and poor visibility over "invalidation" - in NG, Layout Objects are composed of multiple sub-objects: their Inputs, a determined and assigned Constraint Space, and then an immutable and cacheable (keyed by the Constraint Space) Layout Result.
The Layout Result is actually its own tree called the Fragments Tree. Each node within that tree describes the physical geometry of a rectangular fragment of the element.
I.e. a simple block element might produce just a single fragment, but an element that’s broken across lines or columns might produce multiple fragments.
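Fragments are partially observable from script: `getClientRects()` on an inline element that wraps across lines returns one rectangle per line box, which roughly corresponds to that element producing multiple fragments. A sketch, assuming a browser context:

```ts
const container = document.createElement("div");
container.style.width = "120px";   // narrow, to force line wrapping

const inline = document.createElement("span");
inline.textContent = "a fairly long run of inline text that will wrap across several lines";
container.appendChild(inline);
document.body.appendChild(container);

// One rect per line box the inline content was fragmented into.
console.log(inline.getClientRects().length);   // > 1
console.log(inline.getBoundingClientRect());   // the union of those rects
```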
Here is a markup snippet along with its corresponding rendered form and the associated DOM Tree. Notice how some of these tree nodes are HTML Elements, while others are simply Text nodes with character strings.
And below is the corresponding Layout Tree. This tree is pre-line-breaking, meaning that its current state doesn’t account for line breaks just yet - this will be represented once the Layout Objects are processed into fragments. Furthermore, we can see that the match is not 1:1 with the DOM Tree nodes, for instance, anonymous Layout Objects have been inserted to separate block and inline siblings.
Finally, below is the completed Fragments Tree, complete with line breaks and other minute details. In this tree, each fragment specifies its exact content, coordinates, and geometric proportions, as well as, in the case of "Boxes", the associated layout algorithm.
It seems that some people suggest that there is an additional step in-between these two called Layer, which will in turn build the Layer Tree upon which the Paint step depends. It is essentially the tree form of stacking contexts.
Could this be the same as the [[#Compositing Assignments|Compositing Assignment]] step?
Maybe not since there is also the prepaint stage which falls in-between and isn’t explicitly listed here either.
The input to the paint step is the Layout Objects with their Fragments Tree. From this geometry, we can start painting them.
The paint step outputs objects called Paint Artifacts.
To do so, it builds collections of painting operations called paint ops, which resemble high-level graphics API calls. It might be, for instance, "draw a rectangle", or "draw a path", or "draw a blob of text". The paint ops include information about the applicable coordinates, the colors, etc.
The paint ops are then packaged together in objects called Display Items which include a reference back to the associated Layout Object, and all related Display Items are then bundled together into Paint Artifacts.
This step simply builds a recording of paint operations that can be played back later; those are not yet executed. Nothing actually gets rendered as pixels to the screen as part of this step.
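An illustrative analogy (not Blink's actual data structures): recording drawing commands into a plain list and replaying them later against a Canvas 2D context mirrors the "record now, play back later" nature of Display Items and Paint Artifacts.

```ts
// A toy "display list": paint ops are recorded as data, not executed immediately.
type PaintOp =
  | { kind: "rect"; x: number; y: number; w: number; h: number; color: string }
  | { kind: "text"; x: number; y: number; text: string; color: string };

const displayList: PaintOp[] = [
  { kind: "rect", x: 10, y: 10, w: 100, h: 40, color: "#3367d6" },
  { kind: "text", x: 20, y: 35, text: "hello", color: "white" },
];

// Playback happens separately, possibly much later, against an actual surface.
function replay(ops: PaintOp[], ctx: CanvasRenderingContext2D): void {
  for (const op of ops) {
    ctx.fillStyle = op.color;
    if (op.kind === "rect") ctx.fillRect(op.x, op.y, op.w, op.h);
    else ctx.fillText(op.text, op.x, op.y);
  }
}

const canvas = document.createElement("canvas");
document.body.appendChild(canvas);
replay(displayList, canvas.getContext("2d")!);
```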
The paint step uses stacking order, not DOM order. Although, in the absence of z-index values in a stacking context, the stacking order defaults to the DOM order.
It runs in multiple consecutive phases that traverse the stacking context. Since those phases process the whole stack every time, it is possible for an element to be both behind and in front of another at the same time, through its different paint phases.
The example below details the Display Items output, composed of paint ops, for the given styles applied to a single `<div>` DOM node as well as the Document itself, and for various paint phases.
Looking in more detail at the last display item and its single paint op, which is about painting text: the op contains a text blob definition which includes the glyph identifiers and x-offsets for each glyph in the selected font, based on the text shaping result produced by HarfBuzz during the Layout step, which handles the kerning, ligatures, etc.
The Raster step turns some (or all) of the paint ops into a bitmap, which is a matrix of color values in memory. Each cell in this matrix contains bits that encode the color and transparency of a single pixel.
Raster also decodes any image resources that are embedded in the page. Images will be provided in formats such as JPEG or PNG, and the associated paint ops will simply reference the compressed data; it is left to the raster step to invoke the appropriate decoder to decompress the image into its own raw bitmap.
When we refer to raster's bitmaps as being "in-memory", it usually means GPU memory. This is because modern GPUs offer hardware-accelerated rasterization.
Raster then uses Skia, a library which provides a layer of abstraction around the hardware, to turn paint ops and bitmaps into (Open)GL commands that will finally build the textures.
For security reasons, the renderer's sandbox is not allowed to make system calls. For this reason, Raster needs to occur within the GPU process. The stream of paint ops is first sent down to that process through IPC, and that is where the Skia code gets executed.
This process-level isolation has multiple advantages.
The paint ops are IPC’d through a command buffer from the renderer process to the GPU process. Command buffers are extremely flexible constructs which were first conceived to pass over serialized OpenGL commands; as such, they’d be used to provide pre-built GL calls in a sort of RPC-like mode.
But today they serve simply as an exchange mechanism for paint ops. This in turn means that they are agnostic to the underlying graphics API. When Vulkan ends up replacing OpenGL, no change will be required to the command buffers and their paint ops.
Skia issues GL commands through function pointers which are set up from dynamic lookups into the system's shared OpenGL library.
Is this shared OpenGL library an abstraction layer much like glibc for Linux syscalls?
On Windows, there's an additional translation step to DirectX. The generic GL function pointers pass through a Chromium library called ANGLE, which will then transform them into DirectX calls.
It is ANGLE which currently does the translation from GL calls to Vulkan, when the flag is enabled, and there are plans for it to eventually do the same for Metal.
Those low-level graphics API calls are the final output of the rendering pipeline. We now have pixels in memory.
Many events in a browsing session can change the rendering dynamically, and running the full pipeline is very expensive. The goal is then to avoid unnecessary work as much as possible.
The rendering pipeline packages its outputs into Animation Frames. These are the units which get painted to the display. The aim is to produce as many of those per second as the refresh rate of the output device, with a minimum target of 60. Anything below will look janky.
As an optimisation technique to avoid having to fully rerender on each frame, the various steps apply granular invalidation to indicate, as accurately as possible, what has "changed" and needs to be recalculated, as opposed to what stayed the same and can be taken from the last frame's results.
Invalidation will only get you so far, especially for operations that cover large regions such as scrolling.
Everything on the main thread competes with JavaScript. This means that even if the rendering pipeline is extremely fast, you'll still get jank if your script needs to do compute-heavy operations before the rendering even begins.
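A sketch of that competition: a long synchronous task on the main thread delays the next animation frame, which shows up as an outsized gap between `requestAnimationFrame` timestamps.

```ts
let last = performance.now();

function frame(now: DOMHighResTimeStamp): void {
  console.log(`frame delta: ${(now - last).toFixed(1)} ms`);  // ~16.7 ms at 60 Hz
  last = now;
  requestAnimationFrame(frame);
}
requestAnimationFrame(frame);

// A compute-heavy task blocks the main thread; rendering cannot start until it
// finishes, so one of the logged deltas will jump well above the frame budget.
setTimeout(() => {
  const end = performance.now() + 200;   // busy-wait for ~200 ms
  while (performance.now() < end) { /* blocking work */ }
}, 1000);
```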
Compositing is an optimisation technique to further break down the document into independent items, which in turn allows for more targeted repaints and thus less work to be (re)done.
The Compositor's main purposes are:
The layers are built on the main thread, but are then sent off to another thread: the compositor thread (called `impl`).
A layer is a piece of the page that can be transformed and rastered independently from the rest of the page / of other layers.
A layer captures a subtree of content (if that element has children, they'll be part of that same layer) so that it can be rasterized independently, and then animated as a single unit.
A layer is essentially DOM content rendered as a bitmap, which can then be reused, hardware-accelerated, etc. It is part of the animation performance techniques and can be inferred, or in some cases imperatively toggled, by properties such as `will-change`.
It just so happens that every instance of fluid motion on a page can be expressed in terms of layers.
To offload the busy main thread, inputs are first sent to the compositor thread to see if they only affect a composited layer. If so, in a case like a scroll input for instance, the compositor thread can handle the input by itself without ever needing to tax the main thread. This can lead to significant performance gains.
In other situations though, the compositor thread might decide that it cannot handle the input, for instance when the scrolled-on element doesn't belong to a layer, or when that element has blocking JavaScript event listeners. In such cases, the input is forwarded to the main thread, which then puts it in its task queue for processing whenever possible.
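One practical consequence: a wheel (or touch) listener that might call `preventDefault()` forces the compositor to wait for the main thread before scrolling, while a passive listener lets scrolling stay on the compositor thread. A sketch, where `.scroller` is a hypothetical scroll container assumed to exist in the page:

```ts
// Hypothetical scroll container (assumed to exist in the page).
const scroller = document.querySelector<HTMLElement>(".scroller")!;

// Blocking listener: the compositor must wait for the main thread on every
// wheel event to learn whether preventDefault() cancels the scroll.
scroller.addEventListener("wheel", (e) => {
  e.preventDefault();   // takes over scrolling on the main thread
});

// Passive listener: preventDefault() is not allowed, so the compositor thread
// can keep scrolling this layer without consulting the main thread.
scroller.addEventListener("wheel", (e) => {
  console.log("scrolled by", e.deltaY);
}, { passive: true });
```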
Composited layers are represented by `cc::Layer`, where `cc` stands for Chromium Compositor. Today, the layers are created from the Layout Tree, by promoting things with certain style properties, such as animation or transform.
Are those what's referred to as Compositing Triggers? Is this where `will-change` would also qualify?
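Whatever the exact trigger list, `will-change` is the standard way to hint promotion from script. A sketch of applying it only for the duration of an animation, so the extra layer memory is not held permanently (`.card` is a hypothetical element):

```ts
// Hypothetical element to animate.
const card = document.querySelector<HTMLElement>(".card")!;

function slideIn(): void {
  // Hint the engine to promote the element before the animation starts.
  card.style.willChange = "transform";

  const animation = card.animate(
    [{ transform: "translateX(-100px)" }, { transform: "translateX(0)" }],
    { duration: 300, easing: "ease-out" },
  );

  // Drop the hint afterwards so the layer (and its memory) can be reclaimed.
  animation.finished.then(() => {
    card.style.willChange = "auto";
  });
}

slideIn();
```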
There is an intermediate step towards promotion, called the Paint Layer Tree. Paint Layers are candidates for promotion; so some Layout Objects will get Paint Layers, and some Paint Layers will then get CC Layers.
CC Layers don’t have any parent-child relationship, they live as a list and not a tree, but the collection is still referred to as the Layer Tree, since it used to be one and the name hasn’t been consistently updated yet.
Elements that are scroll containers will create a set of special layers for things such as the borders, the scrolling contents, the scrollbars, etc. These are all managed by a C++ class called `CompositedLayerMapping`.
Anti-aliasing consists of blending edge pixels against their background, in an effort to "hide" the jagged edges of drawn polygons.
Since layers are isolated from each other, and since scrolling layers (scrollers in general) have a transparent background, it is impossible to do sub-pixel anti-aliasing, since the rasterizer cannot tell which background color the text will be placed against.
The only available option in terms of anti-aliasing is then grayscale anti-aliasing, which consists in varying the levels of transparency around a shape to smooth out the edges.
This is taken into account by the engine when deciding when to composite. It is important to note that on high-DPI displays, sub-pixel AA tends not to be needed - so the trend is towards compositing more and more scrollers. In fact, on Android and ChromeOS all scrollers are now composited by default.
Does enforcing sub-pixel AA through `-webkit-font-smoothing: subpixel-antialiased` block compositing on WebKit?
This step (lifecycle stage) consists in building the layer tree. This happens after Layout, and before Paint, on the main thread.
Each layer is painted separately, so each layer will have its own DisplayItemList along with all the paint ops that were produced when that specific layer was painted.
When the compositor draws a layer, it can apply various transformation properties. This information is stored inside property trees, which qualify a layer but are not directly part of it.
This decoupling of the layer and its transformation properties means that in theory the compositor could apply these properties to any sequence of paint ops - even if it didn't have a layer. This is an optimisation that's not implemented yet as of 2020.
The property trees are built as part of the prepaint stage.
But in the future, the target is to create the layers after the painting stage, a technique referred to as Composite After Paint, or CAP. This is still ongoing as of 2022, in an effort previously known as Slimming Paint v2.
Doing so will allow the compositor to make finer-grained, more flexible, decisions since more information will be available at that stage.
Decoupling the properties from the layers was a prerequisite of CAP. That way, the paint step can still have information regarding the paint properties, without the layers having been produced yet.
The outputs generated from the previous stages on the main thread need to be sent down to the compositor thread (impl) to be assembled into a single output. After the paint stage, a commit is executed to update copies of the layer tree and property trees in the compositor thread, updating its state to be aligned with the main thread’s state.
The main thread will then block until the impl thread is done executing the commit and updating its state. This is for synchronization reasons, so the main thread’s data structure can be read safely.
Since layers can be really big, and since layers raster independently from each other, the compositor divides the layers into tiles.
Tiles are the unit of raster work. They’re created on the compositor thread by the tile manager, and then scheduled in raster tasks to be picked up eventually by worker threads from the dedicated pool.
The tasks are then prioritized based on various factors, such as:
There are also various tiles for different zoom levels. If you’re zooming in, a higher-resolution tile is shown.
Once all the tiles are generated, the compositor thread generates DrawQuads, which are instructions on where - a screen location - to draw a given tile, taking into account all the transformations and effects applied to the layer by the property trees.
Each Quad references its associated tile’s rastered output in memory.
Quads are then composed together into a `CompositorFrame`, which serves as the output of the renderer that gets sent to the GPU. These are the animation frames that are produced by the renderer process.
Since raster and drawing both happen on the compositor thread, but raster happens asynchronously on a pool of worker threads, the compositor thread might be available to start processing the next commit in the meantime.
In that case, we’d like to continue drawing tiles from the current commit, while being able to raster the next commit.
This creates some complications though, since the tiles being rastered at that moment require the current commit's state. This is why the engine holds two trees - or two copies of the given layers - at the same time: the pending tree and the active tree. Those trees bundle the Layer Trees and Property Trees together.
Whenever the current commit is fully done being rastered, the pending tree can be pushed to the active tree for drawing, in an action called activation.
Now that we’ve got complete animation frames from the renderer process, those are ready to be submitted to the GPU process.
There is only a single GPU process, but there might be multiple renderer processes; for instance in cases where the document contains iframes which are rendered separately for security, but whose combined outputs need to be stitched together.
There is also the browser process, which generates animation frames for the browser's UI.
All of these sources (or surfaces) submit their outputs to the display compositor, which lives within the GPU process and runs on the viz thread.
It is this thread that synchronizes all of these frames together and understands the dependencies between each of them when surfaces are embedded inside each other.
Viz aggregates all these frames and is ultimately the one responsible for issuing the graphics calls that finally display the Quads from the compositor frame on the screen.
The output of viz is double-buffered, so it draws the Quads into a back buffer, and the OpenGL calls are then proxied from the viz thread to the GPU’s main thread over Command Buffers.
Those buffers wrap serialized OpenGL calls over the thread boundary, for which real GL calls are then issued by the decoder on the GPU’s main thread.
There is also a newer mode, not yet released on all platforms as of 2020, in which viz issues Skia calls - through a data structure called the Deferred Display List - instead of serialized OpenGL calls.
This means that the viz code and the transport format are then API-agnostic; the Skia backend, on the GPU thread, can then replay those instructions for OpenGL just as well as Vulkan, and eventually more, such as Metal.
Finally, with the Quads drawn in that back-buffer, the last step for viz is to swap the front and the back buffer.
This means that the “back” buffer is the working buffer and the “front” buffer is the one that gets read from by the display process?
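That is the general idea of double buffering: draw the complete next frame into an off-screen back buffer, then swap (or copy) it into the front buffer that the display reads, so a half-drawn frame is never visible. A toy sketch of the concept with two canvases, purely illustrative of the idea rather than of viz's implementation:

```ts
// Front buffer: what is currently shown. Back buffer: where the next frame is drawn.
const front = document.createElement("canvas");
document.body.appendChild(front);
const back = document.createElement("canvas");
back.width = front.width;
back.height = front.height;

const backCtx = back.getContext("2d")!;
const frontCtx = front.getContext("2d")!;

function renderFrame(t: number): void {
  // Draw the complete next frame into the back buffer first...
  backCtx.fillStyle = "#222";
  backCtx.fillRect(0, 0, back.width, back.height);
  backCtx.fillStyle = "tomato";
  backCtx.fillRect(50 + 40 * Math.sin(t / 500), 50, 30, 30);

  // ...then "swap": present the finished back buffer so the viewer never sees
  // a half-drawn frame. (Here the swap is a copy; real swaps exchange buffers.)
  frontCtx.drawImage(back, 0, 0);

  requestAnimationFrame(renderFrame);
}
requestAnimationFrame(renderFrame);
```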
Finally, the pixels are on the screen for the user to see!
All credit for the source and/or referenced material goes to the named author(s).