Major tools that exist today for partial structure traversal and focused manipulation:
- Optics (Lenses, Prisms, Traversals)
Elegant, composable ways to zoom into, modify, and rebuild structures.
Examples: Haskell's `lens`, Scala's Monocle, Clojure's Specter.
Think of these as programmable accessors and updaters.
- Zippers Data structures with a "focused cursor" that allow local edits without manually traversing the whole structure.
Examples: Huet’s original Zipper (1997), Haskell’s `Data.Tree.Zipper`, Clojure’s built-in zippers.
- Query Languages (for semantic traversal and deep search) When paths aren't enough and you need semantic conditionals:
- SPARQL (semantic web graph querying)
- Datalog (logic programming and query over facts)
- Cypher (graph traversal in Neo4j)
- Prolog (pure logic exploration)
These approaches let you declaratively state what you want instead of manually specifying traversal steps.
(defn walk [inner outer form]
(cond
(list? form) (outer (with-meta (apply list (map inner form)) (meta form)))
(instance? clojure.lang.IMapEntry form)
(outer (clojure.lang.MapEntry/create (inner (key form)) (inner (val form))))
(seq? form) (outer (with-meta (doall (map inner form)) (meta form)))
(instance? clojure.lang.IRecord form)
(outer (reduce (fn [r x] (conj r (inner x))) form form))
(coll? form) (outer (into (empty form) (map inner form)))
:else (outer form)))
(defn postwalk [f form]
(walk (partial postwalk f) f form))
(defn prewalk [f form]
(walk (partial prewalk f) identity (f form)))
Another reason why this perlisism holds: 9. It is better to have 100 functions operate on one data structure than 10 functions on 10 data structures.
"Let's move on."I use it in every project for data navigation and transformation, and it's more performant than standard Clojure data manipulation, while retaining types (instead of coercing back from seqs).
E.g. if you have a map and you want to increment every value in the map: (require '[com.rpl.specter :as S])
``` (->> {:a 5, :b 6, :c 7} (S/transform [S/MAP-VALS] inc)) => {:a 6, :b 7, :c 8} ```
^ try to write that in normal clojure.
Now let's say you have a map of vectors and want to increment all of those? (->> {:a 5, :b 6, :c 7} (S/transform [S/MAP-VALS S/ALL] inc)) ;; note the navigator juts got another level of nesting => {:a [2 3], :b [4 5], :c [6 7]}works for all clj data types, and of course it has navigators for recursive walking .
It took me a while to get good at Specter, but it was worth it. I hear Rama uses Specter navigators internally.
# Expects:
# Tuple of iterateOn where iterateOn can be None
def fancyIt(init, *options):
if init != None:
yield(init)
for f in options:
newVal = f(init)
yield from fancyIt(newVal, *options)
class Tree:
def __init__(self, val, left = None, right = None):
self.val = val
self.left = left
self.right = right
def left(self):
return self.left
def right(self):
return self.right
myTree = Tree(
1,
Tree(2,
Tree(3),
Tree(4, None, Tree(5))),
Tree(6, None, Tree(7)))
for node in fancyIt(myTree, Tree.left, Tree.right):
print(node.val)
which prints the numbers 1 through 7 in order.Breadth-first is slightly trickier, but only slightly trickier one time.
For example the most basic operations of a pointer are to advance and dereference.
std::map is actually implemented as a tree. To iterator its members you can do
for (cost auto &pair : map)
The only requirement for your custom data structure to work is to implement begin() and end() which return iterators - “pointer like” objects.Here's the range based for loop counterexample from the blog post, as a python generator:
import itertools
def iterate_tree():
for length in range(1, 8):
for combo in itertools.product("abc", repeat=length):
yield ''.join(combo)
for s in iterate_tree():
print(s)
The same from another angle: there are a lot of trees in the indices of SQL databases (example [1]) but we don't zoom in to that level of detail very often when defining our tables.
One great reason not to use recursive functions for traversing trees is that you can allocate your own stack data structure rather than relying on the call stack itself. In most languages/runtimes, the call stack has a maximum depth which limits the depth of trees you can process, usually on the order of thousands of stack frames.
Managing your own stack usually produces weirder looking code (personally I find "naive" recursive approaches more readable) - but having it as a first-class language feature could solve that!
But tree traversal doesn't have this universal property. There are too many methods and purposes for traversing a tree, sufficient that IMHO no single primitive embodiment could materially improve a language. Also, modern compilers efficiently break down high-level traversal code so well that expressing the idea at a high level incurs no serious penalty compared to having a primitive for that purpose, or a series of them.
Candidates: Racket, Scheme, Rust.
I explored this idea with gstd, a standard library for graphs inspired by the JS Array interface: https://github.com/cdrini/gstd/
public static IEnumerable<TNode> DepthFirstTraverse<TNode>(
TNode root,
Func<TNode, IEnumerable<TNode>> children
)
You can then just ‘foreach’ over it.This doesn’t require the whole tree to be in-memory. Optimal implementation would recurse via programmer-controlled stack (instead of language-controlled) to avoid nested iterators.
Here is a TypeScript version:
function* traverseDepthFirst<T>(
root: T,
children: (parent: T) => Iterable<T>
)
A similar approach should be possible in any language which supports iterators/generators.OTOH I read/heard that the beauty of array languages is that they have few data types, but they can be easily worked on. So maybe the answer is not easier tree traversal primitives, but better literature/training on how to transform it in a more manageable data type.
Sometimes the answer is to adapt our vision to our world, sometimes the answer is to adapt the world to our vision.
foreach (Node node in EnumerateNodes(root, x => x != null, x => [x.Left, x.Right]))
where EnumerateNodes uses `yield return` (i.e. is a generator) and calls itself recursively. Though it'd probably be easier / better performance to write an implementation specific to each node type.I guess/expect that will have to expose the implementation detail of the branching factor (yes, a binary tree is a special-case of a https://en.wikipedia.org/wiki/B-tree for B = 2, but algorithms tend to be different)
On top of that, build a library of tree traversal routines (breadth first, depth first with a few variants), allowing the user to specify code that decides, on each node visited, whether to navigate further down and whether to stop the traversal.
Now, whether the language proper needs a generic way to specify tree traversals? I doubt it, and doubt even more that it is possible too that in a way so that a) most graph traversal algorithms can use the primitive and b) the result is easier to use/looks significantly nicer than just calling those routines.
This kind of imperative iteration seems better served by the traditional visitor design pattern: more verbose (more explicit, not more complex) and more general.
for recursive (Node t = tree.root; t != NULL;) {
puts(t.value);
if (t.value == target) break;
if (t.value == dontfollow) continue;
if (t.left) continue t.left;
if (t.right) continue t.right;
}
return t;
Regular 'break' is to really break out of the structure like a regular for, as regular 'continue' is to do the next iteration. But if continue has a value to recurse on, it reenters the for loop like a subroutine.As a bonus, I think this is tail-call-optimization friendly.
For exposition, lets seperate loop initialization from the rest, make `continue` mandatory to repeat the loop with another specified value (i.e. the loop implicitly ends without a continue) and lets say `descend` to recurse into the tree.
sum = 0;
loop (node = tree.root) {
sum += node.value;
if (node.left) descend node.left;
if (node.right) descend node.right;
}
Compare with a linear loop: sum = 0;
loop (i = 0) {
if (i >= array.length) break;
sum += array[i];
continue i + 1;
}
The `descend` keyword differs from `continue` in that its more of a continuation - control flow will continue from that point once it is done.You could make this more functional and less imperative (and I'd be surprised if such things don't exist as libraries in some functional languages). Yes ultimately we can transform this to recursive functions and compile it that way... but for imperative languages (especially those without closures) something like the above might be useful.
Arguably the above loops are _more_ imperative and less declaritive than the standard ones from the C family. You could add a functional twist by breaking with a value (which is the result of the overall `loop` expression).
[1]: https://lean-lang.org/doc/reference/latest//The-Type-System/...
But I think grafting those two together is the right answer, not inventing a new loop construct.
Trees are easy to write iterators for. DAGs are a bit harder and full graphs are an advanced interview question.
Not as ergonomic as a direct tree-iterator, but I can't see of an elegant way to introduce that in an imperative language while keeping the forking/recursion aspect clear
func Search(Root : Tree; Desired_Value : Tree_Value)
-> optional Tree_Id is
var Identifier : optional Tree_Id := null;
for T => Root then Left(T) || Right(T)
while T not null concurrent loop
if Value(T) == Desired_Value then
// Found desired node,
// exit with its identifier
exit loop with Identifier => Key(T);
end if;
end loop;
return Identifier;
end func Search;
I actually don't use this feature much, because usually there is a key of some sort that determines which side of the tree is of interest.It allows you to specify traversal in a structure-shy way. Personally, I think it goes a little too far, but it's definitely a good inspiration.
[1] https://www2.ccs.neu.edu/research/demeter/
Presentation: https://www2.ccs.neu.edu/research/demeter/talks/overview/qua...
var todo = new List<T>();
todo.append(root);
while (var item = todo.pop_front()) {
todo.append(item.left); // or .prepend for depth-first
todo.append(item.right); // or .prepend()
// do stuff...
}
// Comments for the non-Rust native reader, regarding this Function declaration:
// successors is a function that accepts an `Option` container for some Value of type T, called `first`
// and a Closure called `succ`, constrained below:
pub fn successors<T, F>(first: Option<T>, succ: F) -> Successors<T, F> ⓘ
where
// `succ` must receive the iterated state, and return the next iterated state
F: FnMut(&T) -> Option<T>,
// Each time the `next()` function is called on the returned Iterator (a Successors-flavored iterator),
// the state of `first` is yielded, and then
// `succ` is called to progress
// until a `None` type is reported by `succ`
I'm not sure where the concept came from, but it's not dissimilar to the author's implementation, but instead of the ControlFlow enum, it relies simply on the Option enum. I know though, that it was initially built in the Itertools crate as unfold and then upstreamed some time later.Essentially you use `first` to contain a Queue, Stack, or Level for the different traversals, and define traversal or activities from there.
It's fairly ergonomic in practice, ergonomic enough for Leetcode.
Here's a BFS: https://leetcode.com/problems/course-schedule-iv/solutions/6...
[0] https://doc.rust-lang.org/std/iter/fn.successors.html
[1] https://docs.rs/itertools/latest/itertools/fn.unfold.html
type Node = { value: Any, left: Node, right: Node }
type Direction = Left | Right
type TreePosition = { root: Node, currentNode: Node = root, position: Direction[] = [] }
# Implementation left as an exercise but should be obvious and run in O(1), I believe. Returns Nothing when we're out of nodes.
function nextPosition(position: TreePosition): Option<TreePosition>
# The tree you want to iterate through
const myTree: Node = ...
# The loop
for(let position: TreePosition? = TreePosition(root: myTree); position != Nothing; position = nextPosition(position) {
node = position!.currentNode
# Your loop code
}
I'd argue this doesn't belong as a language-level feature, but maybe an API/stdlib-level feature.I think it's a better idea to write a function that applies a function to the elements of a tree. Ideally you'd make a function for each traversal order. This makes it obvious what is happening.
map_tree_pre(my_tree, my_func);
map_tree_in(my_tree, my_func);
map_tree_post(my_tree, my_func);
map_tree_bfs(my_tree, my_func);
I think allowing for starting this kind of search from a given node would cover most (though not all) of what you'd want to expose the trees structure for directly.
> This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Also I’m slightly confused by this example.
for_tree(string x = ""; x.size() <= 8; x : {x+"a", x+"b", x+"c"}){
print(x);
}So, our “next node” operation is to concatenate to x. Won’t we either have to have a method for modifying x to go “up” a node, or we’ll have to keep a record of what x was upon entering each node? Like in this example we’ll end up with x=“aaaaaaaa” and then go up a node, over to a “b” node, and get x=“aaaaaaaab”, right?
We don't have syntax and semantics for recursion schemes in any programming language - it's always deferred to library support at best. As far as I'm concerned, this is an open problem in programing language design where we finally replace the vestigial recursion hack with a proper structured programming solution.
Or for a possibly more useful comment, constant space tree traversal is genuinely quite difficult to implement. I don't know a general solution to it (other than walk from the start N times, quadratic style), would be interested to hear of one.
EnScript had a forall(NodeClass n in tree.GetRoot(){} construct that was very easy. It was essentially a depth-first iterator.
If this becomes a C++ feature, imagine how many data structures we would need to support?
Many other languages, specially the FP languages, allow to do that as a library. Even the languages that are only inspired by FP. Example, Ruby:
class BinTree
include Enumerable
def initialize v, l, r
@v, @l, @r = v, l, r
end
def each &block
@l.each(&block) unless @l.nil?
yield @v
@r.each(&block) unless @r.nil?
end
end
Using the Enumerable mixin includes many FP-based methods, such as map, filter and reduce by only defining each, which in this case is DFS.Then we can proceed to define a binary tree:
tree = BinTree.new(
1,
BinTree.new(
2,
BinTree.new(4, nil, nil),
BinTree.new(5, nil, nil)
),
BinTree.new(
3,
BinTree.new(6, nil, nil),
BinTree.new(7, nil, nil),
)
)
Iterate over all the elements: tree.each{|v| puts v}
Iterate over the even elements: tree.filter{|v| v.even?}.each{|v| puts v}
Stop iteration when finding a value: tree.each do |v|
break if v == 1
puts v
end
And so on. The same can be done in Python, Kotlin and many others.Const km = 1 Const velocity = 11
And do km + velocity = 12 is all kinds of silly.
A TXR Lisp macro for-tree for traversing a tree, given a variable, an expression for the root node, and expressions for how to to left and right relative to the variable.
A node structure that for-tree knows nothing about. No iterator abstraction or API, nothing.
(defstruct node ()
value
left
right)
Example tree (a balanced binary tree whose numeric values happen to be sorted): (defvar example-tree #S(node value 5
left #S(node value 3
left #S(node value 1)
right #S(node value 4))
right #S(node value 7
left #S(node value 6)
right #S(node value 9)))
Walk it: (for-tree nod example-tree nod.left nod.right
(prinl nod.value))
1
3
4
5
6
7
9
Code: (defmacro go-leftmost (var root left-expr stack)
^(for ((,var ,root)) (,var) ((set ,var ,left-expr))
(push ,var ,stack)))
(defmacro for-tree (var root left-expr right-expr . body)
(with-gensyms (stack node left right)
^(let (,stack)
(go-leftmost ,var ,root ,left-expr ,stack)
(for ((,var (pop ,stack)))
(,var)
((iflet ((,right ,right-expr))
(go-leftmost ,var ,right ,left-expr ,stack))
(set ,var (pop ,stack)))
,*body))))
The left-expr and right-expr must be expressions that involve the var variable; the construct evaluates these to find the left or right child. When the value is nil it means there is no child.This is almost exactly what the blog is asking for, transliterated to Lisp syntax.
Original concept in fantasy C syntax:
for_tree(Node* N = mytreeroot; N != NULL; N : {N->left, N->right}
{
print(N->value);
}
Lispified: (for-tree nod example-tree nod.left nod.right
(prinl nod.value))
As required, the construct specifies the node variable to be the loop dummy, the root expression giving its initial value and the two accessor expressions for left and right traversal.The termination test N != NULL is made implicit in the Lisp version, so early termination requires a break out of the loop. It could be arranged.
The fantasy C syntax specifies a variable name in the navigation declarator: N : { N->left, N->right }. Presumably, the variable here can be renamed to anything you want; it just arbitrarily happens to have the same name as the loop dummy.
I didn't replicate this feature exactly because it just adds verbosity for nothing. It's okay if the navigation declarator just refers to the loop's one and only dummy variable.
Anyway, programming languages should definitely not have a tree traversal primitive. Rather, languages should be Lisp.
Both looping and tree traversal can be done with library functions.
In general, everything that can be done with library functions, should be done with library functions.
> Doesn't for_tree(...) look a lot nicer and simpler and less error prone than needing to implement a recursive function for each operation you would want to do on a tree?
Eh, in Haskell you can derive many of these things automatically. And even Rust has some automatic derivation.
Bonus: Java's HotSpot magick will NOP (most?) methods of Null Objects, making this a zero cost abstraction.
I should probably write a concrete example for a blog or something.
TLDR: For every base class such as TreeNode, create a NullTreeNode that does nothing, then replace all uses of null with NullTreeNode. Voilá, no more null checks or NPEs.
So something similar to `break` and `continue` should be the method of branching (descending into the tree).
I think Scala might be a great language to prototype implementation of such construct.
> Doesn't for_tree(...) look a lot nicer and simpler and less error prone than needing to implement a recursive function for each operation you would want to do on a tree?
No it does not
Web browsers have this. It’s the DOM and it scares the shit out of most JavaScript developers. The fun part is confronting people about their irrational fears watching them spin their wheels with all the imagined excuses.
I am a tremendous fan of tree systems. Tree systems should completely replace inheritance in programming to enforce the composition over inheritance principle.
I will be transparent here, I do not know of a non recursive, tree traversal alogorthim that uses constant storage but doesn't modify the tree.. (Morris Traversal satisfies 2 but makes temporary changes to the tree)
Most primatives do not allocate hidden data (even C++ iterators require you to instantiate the iteration state as a variable) and operate within a known stack/execution context.
Having an actual primitive that implements arbitrary tree traversal would require a substantial complier generated, but non-transparent, framework for that execution. It would either need to be recursive, have localized dynamic data, or would require tree structures that are well defined to use something like Morris. (And Morris is natively non-concurrent even for reads)
While this means that that simple primitive would actually do a lot of potentially problematic things under the hood. How many Junior programmers would call it concurrently for reads because it's not "modifying the data". Why do you need to lock? Things like brake and continue and other control mechanisms wouldn't actually work as assumed from the flat context.... I.E what happens if somebody called a goto inside of that block? Also, in an embedded context where you may not have a dynamic allocator, how would a version written as in reiterative function with a expanding stack data structure function? From a language standard point of view, there are so many complexities to building something like this that runs in any context that the language is supported. Something like this that runs in any context that the language is supported.
I mean look at C++, they have specifically and (deliberately / arguably rightfully) failed to build copy on write strings (a simpler abstraction) because there are cases where they couldn't guarantee that they would have the expected results in all contexts.
This to me sounds more like something that should be a lambda function and a standard library, not a primitive... Languages that support dictionaries and maps as part of their standard data structures already have library support for tree traversal internally. They have solved many of the problems of making that language exposed, so that would be a much smaller ask. And programmers rightfully assume a lot more complexity within library calls then they do within primitives. Primitives are designed to be primitive. Having primitives able to create stack frames, dynamically allocate hidden memory, and/or recurse honestly worries me.
Programming languages should have less primitives like this and instead we should have better foundation libraries for the languages, i.e., containing iterator/-interfaces like Rust and Python (or Smalltalk).
1. BFS is not supported.
2. Traversal type (inorder, preorer, postorder, or even mixed order!) is also not handled.
3. The syntax doesn't make it apparent that stack overflow can occur, e.g. by doing DFS on a linked list.