parsedown/docs/Migrating-Extensions-v2.0.md
Aidan Woods 6fdcfaa14a
Add initial documentation
This isn't yet complete, but is a decent start
2021-12-08 20:38:35 +00:00

Implementing "Extensions" in v2.0

Parsedown v1.x allowed extensibility through class extensions, where a developer could extend the core Parsedown class, and access or override any of the protected-level methods and variables.

Whilst this approach allows a huge breadth in the type of functionality that can be added by an extension, it has some downsides too:

  • Composability: extensions cannot be combined easily

    An extension must extend another extension for two extensions to work together. This limits the usefulness of small extensions, because they cannot be combined with another small or popular extension. If an extension author wishes the extension to be compatible with another extension, they can only pick one.

  • API stability

    Because extensions have access to functions and variables at the protected API layer, it is hard to determine impacts of internal changes. Yet, without being able to make a certain amount of internal change it is impractical to fix bugs or develop new features. In the 1.x branch, 1.8 was never released outside of a "beta" version for this reason: changes in the protected API layer would break extensions.

In order to address these concerns, "extensions" in Parsedown v2.0 will become more like "plugins", and with that comes a lot of flexibility.

ParsedownExtra is a popular extension for Parsedown, and this has been completely re-implemented for 2.0. In order to use ParsedownExtra with Parsedown, a user simply needs to write the following:

$Parsedown = new Parsedown(new ParsedownExtra);
$actualMarkup = $Parsedown->toHtml($markdown);

Here, ParsedownExtra is composed with Parsedown, but does not extend it.

A key feature of composability is the ability to compose multiple extensions together. For example, another extension (say, ParsedownMath) could be composed with ParsedownExtra in a user-defined order.

This time we use the ::from method, rather than the convenience constructor provided by ParsedownExtra:

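// ParsedownMath builds on the initial State, then ParsedownExtra builds on the result
$Parsedown = new Parsedown(ParsedownExtra::from(ParsedownMath::from(new State)));
// ...or compose them in the opposite order
$Parsedown = new Parsedown(ParsedownMath::from(ParsedownExtra::from(new State)));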

Introduction to the State Object

Key to Parsedown's new composability for extensions is the State object.

The name is a little opaque, but it is accurate in an important way.

A State object incorporates Blocks, Inlines, some additional render steps, and any custom configuration options that the user might want to set. This can fully control how a document is parsed and rendered.

In the above code, ParsedownExtra and ParsedownMath would both be implementing the StateBearer interface, which essentially means "this class holds onto a particular Parsedown State". A StateBearer should be constructable from an existing State via ::from(StateBearer $StateBearer), and reveals the State it is holding onto via ->state(): State.

Implementing the StateBearer interface is strongly encouraged if implementing an extension, but not necessarily required. In the end, you can modify Parsedown's behaviour by producing an appropriate State object (which itself is trivially a StateBearer).
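The contract described above can be sketched with toy stand-ins. This is only an illustration under stated assumptions: the StateBearer and State below are simplified re-declarations written for this example, and Emoji, SmartQuotes, withFeature and features are hypothetical names, not part of Parsedown.

```php
<?php
// Toy stand-ins for illustration only; the real Parsedown types may differ in detail.

interface StateBearer
{
    public function state(): State;

    /** @return self */
    public static function from(StateBearer $StateBearer);
}

final class State implements StateBearer
{
    /** @var string[] hypothetical list of enabled feature names */
    private $features;

    /** @param string[] $features */
    public function __construct(array $features = [])
    {
        $this->features = $features;
    }

    // A State trivially bears itself.
    public function state(): State
    {
        return $this;
    }

    public static function from(StateBearer $StateBearer)
    {
        return $StateBearer->state();
    }

    // Returns a new State; the receiver is left untouched.
    public function withFeature(string $name): State
    {
        return new self(array_merge($this->features, [$name]));
    }

    /** @return string[] */
    public function features(): array
    {
        return $this->features;
    }
}

// Hypothetical extension: holds a State derived from whatever it was built from.
final class Emoji implements StateBearer
{
    private $State;

    private function __construct(State $State)
    {
        $this->State = $State;
    }

    public function state(): State
    {
        return $this->State;
    }

    public static function from(StateBearer $StateBearer)
    {
        return new self($StateBearer->state()->withFeature('emoji'));
    }
}

// A second hypothetical extension, built the same way.
final class SmartQuotes implements StateBearer
{
    private $State;

    private function __construct(State $State)
    {
        $this->State = $State;
    }

    public function state(): State
    {
        return $this->State;
    }

    public static function from(StateBearer $StateBearer)
    {
        return new self($StateBearer->state()->withFeature('smart-quotes'));
    }
}

// Neither extension extends the other; they are composed through the State.
$Composed = SmartQuotes::from(Emoji::from(new State));
$features = $Composed->state()->features(); // ['emoji', 'smart-quotes']
```

Because each extension only needs some StateBearer to start from, the order of composition is left to the user.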

In general, extensions are encouraged to go further still, and split each self-contained piece of functionality out into its own StateBearer. This will allow your users to cherry-pick specific pieces of functionality and combine them with other functionality from different authors as they like. For example, a feature of ParsedownExtra is the ability to define and expand "abbreviations". This feature is self-contained, and does not depend on other features (e.g. "footnotes").

A user could import only the abbreviations feature from ParsedownExtra by using the following:

use Erusev\ParsedownExtra\Features\Abbreviations;

$State = Abbreviations::from(new State);

$Parsedown = new Parsedown($State);
$actualMarkup = $Parsedown->toHtml($markdown);

This allows a user to have fine-grained control over which features they import, and will allow them much more control over combining features from multiple sources. E.g. a user may not like the way ParsedownExtra has implemented the "footnotes" feature, and so may wish to utilise an implementation from another source. By implementing each feature as its own StateBearer, we give users the freedom to compose features in a way that works for them.

Anatomy of the State Object

The State object, generically, consists of a set of Configurables. The word "set" is important here: only one instance of each Configurable may exist in a State. If you need to store related data in a Configurable, your Configurable needs to handle this containerisation itself.

State has a special property: all Configurables "exist" in any State object when retrieving that Configurable with ->get.

This means that retrieval cannot fail when using this method, though it does mean that all Configurables need to be "default constructable" (i.e. can be constructed into a "default" state). All Configurables must therefore implement the static method initial, which must return an instance of the given Configurable. No initial data will be provided, but the Configurable must arrive at some sane default instance.

Configurables must also be immutable, unless they declare themselves otherwise by implementing the MutableConfigurable interface.
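The "set of Configurables" behaviour can be pictured with a small model. The sketch below is an assumption-laden toy, not Parsedown's implementation: Configurable and State are simplified re-declarations for this example, and HeadingOffset is a hypothetical Configurable invented for illustration.

```php
<?php
// Toy model for illustration only; not the real Parsedown classes.

interface Configurable
{
    /** @return self */
    public static function initial();
}

final class State
{
    /** @var array<string, Configurable> keyed by class name */
    private $configurables = [];

    /** @param Configurable[] $Configurables */
    public function __construct(array $Configurables = [])
    {
        foreach ($Configurables as $Configurable) {
            // Keyed by class name, so at most one instance per Configurable type.
            $this->configurables[get_class($Configurable)] = $Configurable;
        }
    }

    public function get(string $class): Configurable
    {
        // Fall back to the default instance, so retrieval cannot fail.
        return $this->configurables[$class] ?? $class::initial();
    }

    // Returns a new State; later entries replace earlier ones of the same class.
    public function setting(Configurable $Configurable): State
    {
        return new self(
            array_merge(array_values($this->configurables), [$Configurable])
        );
    }
}

// Hypothetical immutable Configurable with a sane default.
final class HeadingOffset implements Configurable
{
    /** @var int */
    private $offset;

    public function __construct(int $offset)
    {
        $this->offset = $offset;
    }

    public static function initial()
    {
        return new self(0);
    }

    public function offset(): int
    {
        return $this->offset;
    }
}

$Empty = new State;
$Shifted = $Empty->setting(new HeadingOffset(2));

$defaultOffset = $Empty->get(HeadingOffset::class)->offset();   // 0, via initial()
$shiftedOffset = $Shifted->get(HeadingOffset::class)->offset(); // 2
```

Keying by class name is one simple way to enforce the "only one instance of each Configurable" rule.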

Blocks

One of the "core" Configurables in Parsedown is BlockTypes. This contains a mapping from "markers" (a character that Parsedown looks for, before handing off to the block-specific parser) to a list of Blocks that can begin parsing from that marker. It also contains a list of "unmarked" blocks, which Parsedown will hand off to prior to trying any marked blocks. Within marked blocks there is also a precedence order, where the first block type in the list to parse successfully will be the one chosen.

The default value given by BlockTypes::initial() consists of Parsedown's default blocks. The following is a snapshot of this list:

const DEFAULT_BLOCK_TYPES = [
    '#' => [Header::class],
    '*' => [Rule::class, TList::class],
    '+' => [TList::class],
    '-' => [SetextHeader::class, Table::class, Rule::class, TList::class],
...

This means that if a - marker is found, Parsedown will first try to parse a SetextHeader, then try to parse a Table, and so on...
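The "first successful block wins" rule can be sketched as a simple loop. Everything here is hypothetical: the predicates are grossly simplified stand-ins for real block parsers, and firstMatchingBlock is a name made up for this example.

```php
<?php
// Grossly simplified predicates standing in for real block parsers (hypothetical).
// Each receives the current line and whether it directly follows a paragraph.
$candidatesForDash = [
    'SetextHeader' => function (string $line, bool $afterParagraph): bool {
        return $afterParagraph && preg_match('/^-+\s*$/', $line) === 1;
    },
    'Table' => function (string $line, bool $afterParagraph): bool {
        return $afterParagraph && preg_match('/^-+\s*\|/', $line) === 1;
    },
    'Rule' => function (string $line, bool $afterParagraph): bool {
        return preg_match('/^(-\s*){3,}$/', $line) === 1;
    },
    'TList' => function (string $line, bool $afterParagraph): bool {
        return preg_match('/^-\s+\S/', $line) === 1;
    },
];

/**
 * Tries each candidate in list order and returns the name of the first
 * one that accepts the line, or null if none do.
 *
 * @param array<string, callable> $candidates
 */
function firstMatchingBlock(array $candidates, string $line, bool $afterParagraph): ?string
{
    foreach ($candidates as $name => $accepts) {
        if ($accepts($line, $afterParagraph)) {
            return $name; // earlier entries take precedence
        }
    }

    return null;
}

$underline = firstMatchingBlock($candidatesForDash, '---', true);    // 'SetextHeader'
$rule      = firstMatchingBlock($candidatesForDash, '---', false);   // 'Rule'
$listItem  = firstMatchingBlock($candidatesForDash, '- item', false); // 'TList'
```

The same line (`---`) resolves to different blocks depending on context, which is why the order of the list matters.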

A new block can be added to this list in several ways. ParsedownExtra, for example, adds a new Abbreviation block as follows:

$BlockTypes = $State->get(BlockTypes::class)
    ->addingMarkedLowPrecedence('*', [Abbreviation::class])
;

$State = $State->setting($BlockTypes);

This first retrieves the current value of the BlockTypes configurable, adds Abbreviation with low precedence (i.e. the back of the list) to the * marker, and then updates the $State object by using the ->setting method.
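To make "low precedence" concrete, here is a toy model of a marker-to-block-list mapping. It is a sketch under assumptions: ToyBlockTypes is not Parsedown's BlockTypes, block types are plain strings rather than class names, and addingMarkedHighPrecedence and markedBy are names assumed for this example (only addingMarkedLowPrecedence appears in the text above).

```php
<?php
// Toy stand-in for illustration only; not Parsedown's BlockTypes.

final class ToyBlockTypes
{
    /** @var array<string, string[]> marker => ordered list of block names */
    private $marked;

    /** @param array<string, string[]> $marked */
    public function __construct(array $marked)
    {
        $this->marked = $marked;
    }

    /** Appends to the back of the marker's list: tried last. */
    public function addingMarkedLowPrecedence(string $marker, array $blocks): self
    {
        $marked = $this->marked;
        $marked[$marker] = array_merge($marked[$marker] ?? [], $blocks);

        return new self($marked);
    }

    /** Prepends to the front of the marker's list: tried first. */
    public function addingMarkedHighPrecedence(string $marker, array $blocks): self
    {
        $marked = $this->marked;
        $marked[$marker] = array_merge($blocks, $marked[$marker] ?? []);

        return new self($marked);
    }

    /** @return string[] */
    public function markedBy(string $marker): array
    {
        return $this->marked[$marker] ?? [];
    }
}

$Original = new ToyBlockTypes(['*' => ['Rule', 'TList']]);
$Low  = $Original->addingMarkedLowPrecedence('*', ['Abbreviation']);
$High = $Original->addingMarkedHighPrecedence('*', ['Abbreviation']);

$lowOrder  = $Low->markedBy('*');      // ['Rule', 'TList', 'Abbreviation']
$highOrder = $High->markedBy('*');     // ['Abbreviation', 'Rule', 'TList']
$unchanged = $Original->markedBy('*'); // ['Rule', 'TList']
```

Each adding* call returns a new object, which is why the result has to be stored back into the State with ->setting.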

Immutability

Note that the ->setting method must be used to create a new instance of the State object because BlockTypes is immutable; the same will be true of most configurables. This approach is preferred because mutations to State are localised by default: they only affect copies of $State which we provide to other methods, and do not affect copies of $State which were provided to our code by a parent caller.

Localised mutability allows for more sensible reasoning by default. For example (this time talking about Inlines), the Link inline can enforce that no inline URLs are parsed within its text (which would otherwise cause double links in output when parsing something like: [https://example.com](https://example.com)). This can be done by updating the copy of $State which is passed down to lower level parsers so that it simply no longer includes parsing of URLs:

$State = $State->setting(
    $State->get(InlineTypes::class)->removing([Url::class])
);

If InlineTypes were mutable, this change would not only affect descendant parsing, but would also affect all parsing which occurred after our link was parsed (i.e. would stop URL parsing from that point on in the document).

Another use case for this is implementing a recursion limiter (which is implemented as a configurable). After a user-specifiable max-depth is exceeded, further parsing will halt. The implementation for this is extremely simple, only because of immutability.
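The localising effect can be seen in a toy model. This is a sketch only: ToyInlineTypes is a hypothetical stand-in that stores inline types as plain strings, and typesAvailableInsideLink is a made-up function playing the role of the Link inline.

```php
<?php
// Toy stand-in for illustration only; not Parsedown's InlineTypes.

final class ToyInlineTypes
{
    /** @var string[] */
    private $types;

    /** @param string[] $types */
    public function __construct(array $types)
    {
        $this->types = array_values($types);
    }

    /** Returns a new instance without the given types; the receiver is unchanged. */
    public function removing(array $remove): self
    {
        return new self(array_diff($this->types, $remove));
    }

    /** @return string[] */
    public function all(): array
    {
        return $this->types;
    }
}

// Hypothetical stand-in for the Link inline handing a narrowed copy to its children.
function typesAvailableInsideLink(ToyInlineTypes $InlineTypes): array
{
    // Rebinding the local variable does not touch the caller's instance.
    $InlineTypes = $InlineTypes->removing(['Url']);

    return $InlineTypes->all();
}

$DocumentTypes = new ToyInlineTypes(['Emphasis', 'Link', 'Url']);

$insideLink = typesAvailableInsideLink($DocumentTypes); // ['Emphasis', 'Link']
$afterLink  = $DocumentTypes->all();                    // ['Emphasis', 'Link', 'Url']
```

Parsing that resumes after the link still sees Url, because the caller's instance was never modified.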
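One way such a limiter could look is sketched below. The class, its method names (maxDepth, incremented, isExhausted) and the bracket-counting parser are all assumptions made for this example, not Parsedown's actual recursion limiter.

```php
<?php
// Hypothetical immutable depth limiter; a sketch, not Parsedown's implementation.

final class ToyRecursionLimiter
{
    /** @var int */
    private $maxDepth;

    /** @var int */
    private $currentDepth;

    private function __construct(int $maxDepth, int $currentDepth)
    {
        $this->maxDepth = $maxDepth;
        $this->currentDepth = $currentDepth;
    }

    public static function maxDepth(int $maxDepth): self
    {
        return new self($maxDepth, 0);
    }

    /** Returns a copy one level deeper; the receiver is unchanged. */
    public function incremented(): self
    {
        return new self($this->maxDepth, $this->currentDepth + 1);
    }

    public function isExhausted(): bool
    {
        return $this->currentDepth >= $this->maxDepth;
    }
}

/**
 * Toy recursive "parser": counts how many levels of [ ... ] nesting it
 * descends into before the text runs out or the limiter halts it.
 */
function parsedDepth(string $text, ToyRecursionLimiter $Limiter): int
{
    if ($Limiter->isExhausted() || preg_match('/^\[(.*)\]$/', $text, $matches) !== 1) {
        return 0;
    }

    // The child gets its own incremented copy, so there is nothing to
    // decrement or reset when the recursive call returns.
    return 1 + parsedDepth($matches[1], $Limiter->incremented());
}

$Limiter = ToyRecursionLimiter::maxDepth(2);

$limited   = parsedDepth('[[[[[x]]]]]', $Limiter); // 2: halted by the limiter
$unlimited = parsedDepth('[[x]]', ToyRecursionLimiter::maxDepth(5)); // 2: text ran out
```

With a mutable counter, each level would need to undo its increment on the way back out; here the depth simply travels with the copy handed to the child.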

Mutability

The preference toward immutability by default is not an assertion that "mutability is bad", rather that "unexpected mutability is bad". By opting-in to mutability, we can treat mutability with the care it deserves.

While immutability can do a lot to simplify reasoning in the majority of cases, there are some circumstances where mutability is required to implement a specific feature. An example of this is found in ParsedownExtra's "abbreviations" feature, which implements the following:

final class AbbreviationBook implements MutableConfigurable
{
    /** @var array<string, string> */
    private $book;

    /**
     * @param array<string, string> $book
     */
    public function __construct(array $book = [])
    {
        $this->book = $book;
    }

    /** @return self */
    public static function initial()
    {
        return new self;
    }

    public function mutatingSet(string $abbreviation, string $definition): void
    {
        $this->book[$abbreviation] = $definition;
    }

    public function lookup(string $abbreviation): ?string
    {
        return $this->book[$abbreviation] ?? null;
    }

    /** @return array<string, string> */
    public function all()
    {
        return $this->book;
    }

    /** @return self */
    public function isolatedCopy(): self
    {
        return new self($this->book);
    }
}

Under the hood, AbbreviationBook is nothing more than a string-to-string mapping between an abbreviation and its definition.

The powerful feature here is that when an abbreviation is identified during parsing, that definition can be updated immediately everywhere, without needing to worry about the current parsing depth, or to arrange an alternative way of sharing this data. Footnotes also make use of this with a FootnoteBook, with slightly more complexity in what is stored (so that inline references can be individually numbered).
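The "updated immediately everywhere" behaviour follows from PHP's object handle semantics, as this toy sketch suggests. ToyBook and ToyState are hypothetical stand-ins written for this example, and deeper() is an invented method representing a State copy being handed to a nested parser.

```php
<?php
// Toy stand-ins for illustration only; not the real AbbreviationBook or State.

final class ToyBook
{
    /** @var array<string, string> */
    private $book = [];

    public function mutatingSet(string $abbreviation, string $definition): void
    {
        $this->book[$abbreviation] = $definition;
    }

    public function lookup(string $abbreviation): ?string
    {
        return $this->book[$abbreviation] ?? null;
    }
}

final class ToyState
{
    /** @var ToyBook */
    private $Book;

    /** @var int */
    private $depth;

    public function __construct(ToyBook $Book, int $depth = 0)
    {
        $this->Book = $Book;
        $this->depth = $depth;
    }

    public function book(): ToyBook
    {
        return $this->Book;
    }

    /** A new State for a nested parser; it shares the same ToyBook object. */
    public function deeper(): self
    {
        return new self($this->Book, $this->depth + 1);
    }
}

$Root = new ToyState(new ToyBook);
$Nested = $Root->deeper()->deeper();

// A definition discovered two levels down...
$Nested->book()->mutatingSet('HTML', 'Hyper Text Markup Language');

// ...is visible from the root State without being passed back up.
$seenFromRoot = $Root->book()->lookup('HTML'); // 'Hyper Text Markup Language'
$unknown      = $Root->book()->lookup('CSS');  // null
```

The State objects themselves are still distinct copies; only the mutable Configurable they point at is shared.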