PriML - Primitive Markup Language

PriML is a stab at making the simplest possible general markup language. It is partially inspired by LISP's (lack of) syntax. There are only two rules to speak of: balanced square brackets group characters together to form a list, and whitespace characters break the words within a list apart. For example:


[list with some content]


[list of things [list can be nested]]

Tokenization

PriML can be tokenized by splitting the text on a regular expression: (\[|\]|[^\s\[\]]+). If you split the first example on this page with it, you will get the following structure (expressed in a JSON-like format):


["", "[", "", "list", " ", "with", " ", "some", " ", "content", "", "]", ""]

Note that splitting on this regex separates words from whitespace in such a way that whitespace (possibly an empty string) always lands at even indices of the list, counting from zero, while brackets and words land at odd ones. You can easily ignore the whitespace, or use it to fully reconstruct the original text.
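As a minimal sketch of the point above, the snippet below splits the first example and then both drops and restores the whitespace. The names TOKEN, tokenize, words and reconstruct are made up for this illustration and are not part of PriML. Indices are zero-based, as in JavaScript arrays, so the captured tokens sit at indices 1, 3, 5 and so on.

```javascript
// The tokenizing regex from the text. The capturing group makes split()
// keep the matched tokens in its output instead of discarding them.
const TOKEN = /(\[|\]|[^\s\[\]]+)/;

// Returns tokens interleaved with the whitespace found between them.
function tokenize(text) {
	return text.split(TOKEN);
}

// Keeps only brackets and words (every second entry, starting at index 1).
function words(parts) {
	return parts.filter((part, index) => index % 2 === 1);
}

// Gluing all parts back together yields the original text unchanged.
function reconstruct(parts) {
	return parts.join('');
}

const source = '[list with some content]';
const parts = tokenize(source);
console.log(words(parts)); // [ '[', 'list', 'with', 'some', 'content', ']' ]
console.log(reconstruct(parts) === source); // true
```

Keeping the whitespace costs little at this stage and leaves the choice to later steps: a consumer that only cares about words can filter it out, while a formatter or editor can round-trip the text exactly.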

Parsing

It makes sense to represent PriML using the primitives that have the best support in your language. In most scripting languages today, that would be lists (dynamic arrays) and strings. Here is a sample parser in JavaScript:


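function parse(text){
	// Splitting on a capturing group keeps the tokens as well as the whitespace between them.
	const tokens = text.split(/(\[|\]|[^\s\[\]]+)/);
	// A single shared iterator lets nested structure() calls continue from the same position.
	const iterator = tokens[Symbol.iterator]();
	const result = structure(iterator);
	// At the root, structure() returns before the input ends only when it meets a stray ']'.
	if (!iterator.next().done) throw new Error('There were unmatched closing brackets');
	return result;
}

function structure(tokens, root = true){
	const result = [];
	for (const token of tokens) switch (token) {
		case '[': result.push(structure(tokens, false)); break; // descend into a nested list
		case ']': return result; // the current list is complete
		default: result.push(token); // a word, or the whitespace between words
	}
	// The tokens ran out: fine at the root, an unclosed '[' anywhere else.
	if (root) return result;
	throw new Error('There were unmatched opening brackets');
}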

After the tokens are arranged in arrays, you can interpret the result using a basic recursive descent algorithm.
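What "interpreting" means depends entirely on the application, so the following is only a sketch under invented assumptions: the first word of each list names an operation, every other word is a number, and only the operations add and mul exist. None of this is defined by PriML; the names operations and evaluate are hypothetical. The input is written out by hand in the shape the parser above produces for a single list, with whitespace strings between the words.

```javascript
// Hypothetical operation table for this sketch only.
const operations = new Map([
	['add', (...args) => args.reduce((sum, n) => sum + n, 0)],
	['mul', (...args) => args.reduce((product, n) => product * n, 1)],
]);

function evaluate(node) {
	// Leaves are words; this sketch treats every word as a number.
	if (typeof node === 'string') return Number(node);
	// Drop the whitespace and empty strings kept between words; nested lists stay.
	const items = node.filter(item => typeof item !== 'string' || item.trim() !== '');
	const [name, ...args] = items;
	const operation = operations.get(name);
	if (!operation) throw new Error(`Unknown operation: ${name}`);
	// Recursive descent: evaluate each argument before applying the operation.
	return operation(...args.map(evaluate));
}

// The list [add 1 [mul 2 3]] as nested arrays, whitespace entries included.
const tree = ['', 'add', ' ', '1', ' ', ['', 'mul', ' ', '2', ' ', '3', ''], ''];
console.log(evaluate(tree)); // 7
```

The walker mirrors the shape of the data: strings end the recursion, arrays trigger one more level of it. Other interpretations, such as rendering lists as HTML elements, would follow the same pattern with a different body.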