23 Aug 2014 mbrubeck   » (Journeyer)

Let's build a browser engine! Part 4: Style

Welcome back to my series on building your own toy browser engine. If you’re just tuning in, you can find the previous episodes here:

This article will cover what the CSS standard calls assigning property values, or what I call the style module. This module takes DOM nodes and CSS rules as input, and matches them up to determine the value of each CSS property for any given node.

This part doesn’t contain a lot of code, since I’ve left out all the really complicated parts. However, I think what’s left is still quite interesting, and I’ll also explain how some of the missing pieces can be implemented.

The Style Tree

The output of the style module is something I call the style tree. Each node in this tree includes a pointer to a DOM node, plus its CSS property values:

    /// Map from CSS property names to values.
type PropertyMap = HashMap<String, Value>;

/// A node with associated style data.
struct StyledNode<'a> {
    node: &'a Node, // pointer to a DOM node
    specified_values: PropertyMap,
    children: Vec<StyledNode<'a>>,
}

  

What’s with all the 'a stuff? These are lifetime annotations, part of how Rust guarantees that pointers are memory-safe without requiring garbage collection. If you are not working in Rust you can safely ignore them; they aren’t critical to the meaning of this code.

We could add style information directly to the dom::Node struct from Part 1 instead, but I wanted to keep this code out of the earlier “lessons.” This is also a good opportunity to talk about the parallel trees that inhabit most layout engines.

A browser engine module often takes one tree as input, and produces a different but related tree as output. For example, Gecko’s layout code takes a DOM tree and produces a frame tree, which is then used to build a view tree. Blink and WebKit transform the DOM tree into a render tree. Later stages in all these engines produce still more trees, including layer trees and widget trees.

The pipeline for our toy browser engines will look something like this after we complete a few more stages:

In my implementation, each node in the DOM tree produces exactly one node in the style tree. But in a more complicated pipeline stage, several input nodes could collapse into a single output node. Or one input node might expand into several output nodes, or be skipped completely. For example, the style tree could exclude elements whose display property is set to 'none'. (Instead this will happen in the layout stage, because my code turned out a bit simpler that way.)

Selector Matching

The first step in building the style tree is selector matching. This will be very easy, since my CSS parser supports only simple selectors. You can tell whether a simple selector matches an element just by looking at the element itself. Matching compound selectors would require traversing the DOM tree to look at the element’s siblings, parents, etc.

    fn matches(elem: &ElementData, selector: &Selector) -> bool {
    match *selector {
        Simple(ref simple_selector) => matches_simple_selector(elem, simple_selector)
    }
}

  

To help, we’ll add some convenient ID and class accessors to our DOM element type. The class attribute can contain multiple class names separated by spaces, which we return in a hash table. [Note: The Rust types below look a bit hairy because we are passing around pointers rather than copying values. This code should be a lot more concise in languages that are not so concerned with this distinction.]

    impl ElementData {
    fn get_attribute<'a>(&'a self, key: &str) -> Option<&'a String> {
        self.attributes.find_equiv(&key)
    }

    fn id<'a>(&'a self) -> Option<&'a String> {
        self.get_attribute("id")
    }

    fn classes<'a>(&'a self) -> HashSet<&'a str> {
        match self.get_attribute("class") {
            Some(classlist) => classlist.as_slice().split(' ').collect(),
            None => HashSet::new()
        }
    }
}

  

To test whether a simple selector matches an element, just look at each selector component, and return false if the element doesn’t have a matching class, ID, or tag name.

    fn matches_simple_selector(elem: &ElementData, selector: &SimpleSelector) -> bool {
    // Check type selector
    if selector.tag_name.iter().any(|name| elem.tag_name != *name) {
        return false;
    }

    // Check ID selector
    if selector.id.iter().any(|id| elem.id() != Some(id)) {
        return false;
    }

    // Check class selectors
    let elem_classes = elem.classes();
    if selector.class.iter().any(|class| !elem_classes.contains(&class.as_slice())) {
        return false;
    }

    // We didn't find any non-matching selector components.
    return true;
}

  

Rust note: This function uses the any method, which returns true if an iterator contains an element that passes the provided test. This is the same as the any function in Python (or Haskell), or the some method in JavaScript.

When comparing two rules that match the same element, we need to use the highest-specificity selector from each match. Because our CSS parser stores the selectors from most- to least-specific, we can stop as soon as we find a matching one, and return its specificity along with a pointer to the rule.

    /// A single CSS rule and the specificity of its most specific matching selector.
type MatchedRule<'a> = (Specificity, &'a Rule);

/// If `rule` matches `elem`, return a `MatchedRule`. Otherwise return `None`.
fn match_rule<'a>(elem: &ElementData, rule: &'a Rule) -> Option<MatchedRule<'a>> {
    // Find the first (highest-specificity) matching selector.
    rule.selectors.iter().find(|selector| matches(elem, *selector))
        .map(|selector| (selector.specificity(), rule))
}

  

To find all the rules that match an element we call filter_map, which does a linear scan through the style sheet, checking every rule and throwing out ones that don’t match. A real browser engine would speed this up by storing the rules in multiple hash tables based on tag name, id, class, etc.

    /// Find all CSS rules that match the given element.
fn matching_rules<'a>(elem: &ElementData, stylesheet: &'a Stylesheet) -> Vec<MatchedRule<'a>> {
    stylesheet.rules.iter().filter_map(|rule| match_rule(elem, rule)).collect()
}

  

Once we have the matching rules, we can find the specified values for the element. We insert each rule’s property values into a HashMap. We sort the matches by specificity, so the higher specificity rules are processed after the lower ones and can overwrite their values in the HashMap.

    /// Apply styles to a single element, returning the specified styles.
fn specified_values(elem: &ElementData, stylesheet: &Stylesheet) -> PropertyMap {
    let mut values = HashMap::new();
    let mut rules = matching_rules(elem, stylesheet);

    // Go through the rules from lowest to highest specificity.
    rules.sort_by(|&(a, _), &(b, _)| a.cmp(&b));
    for &(_, rule) in rules.iter() {
        for declaration in rule.declarations.iter() {
            values.insert(declaration.name.clone(), declaration.value.clone());
        }
    }
    return values;
}

  

Now we have everything we need to walk through the DOM tree and build the style tree. Note that selector matching works only on elements, so the specified values for a text node are just an empty map.

    /// Apply a stylesheet to an entire DOM tree, returning a StyledNode tree.
pub fn style_tree<'a>(root: &'a Node, stylesheet: &'a Stylesheet) -> StyledNode<'a> {
    StyledNode {
        node: root,
        specified_values: match root.node_type {
            Element(ref elem) => specified_values(elem, stylesheet),
            Text(_) => HashMap::new()
        },
        children: root.children.iter().map(|child| style_tree(child, stylesheet)).collect(),
    }
}

  

That’s all of robinson’s code for building the style tree. Next I’ll talk about some glaring omissions.

The Cascade

Style sheets provided by the author of a web page are called author style sheets. In addition to these, browsers also provide default styles via user agent style sheets. And they may allow users to add custom styles through user style sheets (like Gecko’s userContent.css).

The cascade defines which of these three “origins” takes precedence over another. There are six levels to the cascade: one for each origin’s “normal” declarations, plus one for each origin’s !important declarations.

Robinson’s style code does not implement the cascade; it uses only a single style sheet. The lack of a default style sheet means that HTML elements will not have any of the default styles you might expect. For example, the <head> element’s contents will not be hidden unless you explicitly add this rule to your style sheet:

    head { display: none; }

  

Implementing the cascade should by fairly easy: Just track the origin of each rule, and sort declarations by origin and importance in addition to specificity. A simplified, two-level cascade should be enough to support the most common cases: normal user agent styles and normal author styles.

Computed Values

In addition to the “specified values” mentioned above, CSS defines initial, computed, used, and actual values.

Initial values are defaults for properties that aren’t specified in the cascade. Computed values are based on specified values, but may have some property-specific normalization rules applied.

Implementing these correctly requires separate code for each property, based on its definition in the CSS specs. This work is necessary for a real-world browser engine, but I’m hoping to avoid it in this toy project. In later stages, code that uses these values will (sort of) simulate initial values by using a default when the specified value is missing.

Used values and actual values are calculated during and after layout, which I’ll cover in future articles.

Inheritance

If text nodes can’t match selectors, how do they get colors and fonts and other styles? Through the magic of inheritance.

When a property is inherited, any node without a cascaded value will receive its parent’s value for that property. Some properties, like 'color', are inherited by default; others only if the cascade specifies the special value 'inherit'.

My code does not support inheritance. To implement it, you could pass the parent’s style data into the specified_values function, and use a hard-coded lookup table to decide which properties should be inherited.

Style Attributes

Any HTML element can include a style attribute containing a list of CSS declarations. There are no selectors, because these declarations automatically apply only to the element itself.

    <span style="color: red; background: yellow;">

  

If you want to support the style attribute, make the specified_values function check for the attribute. If the attribute is present, pass it to parse_declarations from the CSS parser. Apply the resulting declarations after the normal author declarations, since the attribute is more specific than any CSS selector.

Exercises

In addition to writing your own selector matching and value assignment code, for further exercise you can implement one or more of the missing pieces discussed above, in your own project or a fork of robinson:

  1. Cascading
  2. Initial and/or computed values
  3. Inheritance
  4. The style attribute

Also, if you extended the CSS parser from Part 3 to include compound selectors, you can now implement matching for those compound selectors.

To be continued…

Part 5 will introduce the layout module. I haven’t finished the code for this yet, so there will be another short before I can start writing the article. I plan to split layout into at least two articles (one for block layout and one for inline layout, probably).

In the meantime, I’d love to see anything you’ve created based on these articles or exercises. If your code is online somewhere, feel free to add a link to the comments below! So far I have seen Martin Tomasi’s Java implementation and Pohl Longsine’s Swift version.

Syndicated 2014-08-25 22:45:00 from Matt Brubeck

Latest blog entries     Older blog entries

New Advogato Features

New HTML Parser: The long-awaited libxml2 based HTML parser code is live. It needs further work but already handles most markup better than the original parser.

Keep up with the latest Advogato features by reading the Advogato status blog.

If you're a C programmer with some spare time, take a look at the mod_virgule project page and help us with one of the tasks on the ToDo list!