I've been quiet on this forum for a while (I've mainly been active on the Numenta forum), so I thought I would post a progress update.
I've spent some time on the reinforcement learning problem and on linking it with HTM (which I still believe is the best starting technology for the seed AI). About this time last year I formalized a strategy, which I described in this thread. For reference, here is a visualization of that strategy:

I've since discovered a couple of problems with this strategy.
One issue is with the reinforcement TP layer. The basic problem is that although the pooling layer creates a representation depicting all expected future rewards and punishments, the system ends up picking the future motor command with the highest reward, skipping the motor commands required to reach it. I spent some time putting in hacks to get around this, but I think I need to tackle the problem from a different angle.
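To make the failure mode concrete, here is a toy Go sketch (all names and values are hypothetical, purely for illustration): once the expected future is pooled into an order-free representation, a greedy selector jumps straight to the highest-reward motor command:

```go
package main

import "fmt"

// Toy illustration (names and values are hypothetical): pooling the
// expected future into an order-free representation loses the
// sequencing, so a greedy selector jumps to the highest reward.

type step struct {
	motor  string
	reward float64
}

func main() {
	// Expected future sequence: "reach" must happen before "grab".
	future := []step{{"reach", 0.1}, {"grab", 1.0}}

	// Temporal pooling (modeled here as a simple union) discards order.
	pooled := map[string]float64{}
	for _, s := range future {
		pooled[s.motor] = s.reward
	}

	// Greedy selection over the pooled representation picks "grab",
	// even though "reach" is a required precursor.
	best, bestReward := "", -1.0
	for motor, reward := range pooled {
		if reward > bestReward {
			best, bestReward = motor, reward
		}
	}
	fmt.Println("greedy pick:", best) // prints "grab", skipping "reach"
}
```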
I've been reading neuroscience papers to see if I can get clues from biology. One interesting paper is
Computational Cognitive Neuroscience. An important clue is this diagram:

The dynamics columns are the key here. What I have developed so far models only an "integrator" dynamic; the reinforcement learning problem also requires a "separator" dynamic. I've started looking more closely at a thesis paper by Ali Kaan Sungur, who has modeled the basal ganglia and integrated it with HTM.
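For intuition, here is a toy contrast between the two dynamics (my own simplification in Go, not code from the paper or the thesis): an integrator blends successive inputs into one pooled state, while a separator suppresses everything but the strongest unit, pushing similar inputs toward distinct states:

```go
package main

import (
	"fmt"
	"math"
)

// Toy contrast (my own simplification, not from the paper): an
// integrator blends inputs into one pooled state; a separator
// suppresses all but the strongest unit, pushing similar inputs
// toward distinct states.

// integrate leakily accumulates the input into the state.
func integrate(state, input []float64, rate float64) {
	for i := range state {
		state[i] += rate * (input[i] - state[i])
	}
}

// separate keeps the strongest unit and suppresses the rest,
// exaggerating small differences between similar inputs.
func separate(state []float64) {
	maxVal, winner := math.Inf(-1), 0
	for i, v := range state {
		if v > maxVal {
			maxVal, winner = v, i
		}
	}
	for i := range state {
		if i != winner {
			state[i] *= 0.1
		}
	}
}

func main() {
	a := []float64{0.50, 0.48} // two nearly identical inputs
	b := []float64{0.48, 0.50}

	st := make([]float64, 2)
	integrate(st, a, 0.5)
	integrate(st, b, 0.5)
	fmt.Println("integrator blends:", st) // both inputs merge into one state

	copy(st, a)
	separate(st)
	fmt.Println("separator splits:", st) // unit 0 wins decisively
}
```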
The second issue is that the location signal is currently a magic box, with no description of where it comes from. For testing it can be hard-coded, but it needs to be replaced with something generic. The latest HTM School video introduces the concept of grid cells, which is definitely the correct starting point for solving this problem. I've also been keeping up with Numenta's work in this area. Jeff's recent talk at the Simons Institute highlights their latest thinking in the current round of research. The relevant connections:

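As a starting point for replacing the hard-coded location signal, here is a minimal 1D sketch of grid-cell-style encoding (a toy of my own, not Numenta's implementation): several modules tile space at different scales, each module activates the cell whose phase bucket contains the current location, and the union of active cells forms a location SDR:

```go
package main

import (
	"fmt"
	"math"
)

// Toy 1D grid-cell-style encoder (my own sketch, not Numenta's code):
// each module tiles space with a different period, and the active
// cell in each module is determined by the phase within that period.

type gridModule struct {
	scale  float64 // spatial period of the module
	offset float64 // phase offset
	cells  int     // cells per module (phase resolution)
}

// activeCell returns which of the module's cells fires at location x.
func (m gridModule) activeCell(x float64) int {
	phase := math.Mod(x+m.offset, m.scale) / m.scale // in [0, 1)
	return int(phase * float64(m.cells))
}

// encode unions the active cell from each module into one SDR.
func encode(modules []gridModule, x float64) []int {
	sdr := make([]int, 0, len(modules))
	base := 0
	for _, m := range modules {
		sdr = append(sdr, base+m.activeCell(x))
		base += m.cells
	}
	return sdr
}

func main() {
	modules := []gridModule{
		{scale: 3.0, offset: 0.0, cells: 6},
		{scale: 5.0, offset: 1.3, cells: 6},
		{scale: 7.0, offset: 2.1, cells: 6},
	}
	fmt.Println(encode(modules, 2.0)) // [4 9 15]
	fmt.Println(encode(modules, 2.5)) // [5 10 15], partial overlap
}
```

Nearby locations share some active cells while distant locations diverge, which is the property that makes this kind of code useful as a generic location signal.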
I've also begun working on a new implementation of HTM in Go, which allows for multi-core processing or distributed processing over a network. The basic design looks like this:

The basic idea is to coordinate a cluster of shards, each of which processes a subset of the cells or minicolumns in parallel, and to assemble their outputs into a complete SDR. This is done by leveraging a module I'm calling the Sparse Representation Controller (SRC), which takes chunks of a representation and reassembles them:

An SRC acts as a message bus with receivers and transmitters. Shards that need to know complete activation SDRs register as receivers, and related shards register as transmitters to report their output. Once an SRC has received the outputs from all transmitting shards, it constructs an activation SDR and transmits it to all registered receivers. Because only the resulting activation SDR is transmitted, traffic within the cluster stays small, and most of the processing within a shard can happen in parallel, with only a relatively small window required for synchronization.
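Here is a condensed, single-threaded sketch of the SRC idea (the types and names are mine, and real locking and network transport are omitted): transmitters report their chunk of the activation SDR, and once every registered transmitter has reported, the assembled SDR is fanned out to all registered receivers:

```go
package main

import (
	"fmt"
	"sort"
)

// Sketch of an SRC (types and names are my own; locking and network
// transport are omitted for brevity).

type chunk struct {
	shardID int
	active  []int // active indices within this shard's cell range
}

type SRC struct {
	transmitters int          // number of shards expected to report
	pending      []chunk      // chunks received this timestep
	receivers    []chan []int // registered receiver channels
}

// RegisterReceiver adds a receiver that gets each assembled SDR.
func (s *SRC) RegisterReceiver() <-chan []int {
	ch := make(chan []int, 1)
	s.receivers = append(s.receivers, ch)
	return ch
}

// Report accepts one shard's output; once all shards have reported,
// the complete SDR is assembled and broadcast to all receivers.
func (s *SRC) Report(c chunk) {
	s.pending = append(s.pending, c)
	if len(s.pending) < s.transmitters {
		return
	}
	var sdr []int
	for _, p := range s.pending {
		sdr = append(sdr, p.active...)
	}
	sort.Ints(sdr)
	for _, ch := range s.receivers {
		ch <- sdr
	}
	s.pending = s.pending[:0]
}

func main() {
	src := &SRC{transmitters: 2}
	out := src.RegisterReceiver()

	// Two shards each own half the cell space and report their output.
	src.Report(chunk{shardID: 0, active: []int{3, 17}})
	src.Report(chunk{shardID: 1, active: []int{1024, 1999}})

	fmt.Println("assembled SDR:", <-out)
}
```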
The SRC for spatial pooling is a special case, because it needs to coordinate scoring across all shards. I've called this special SRC the Spatial Pooling Controller (SPC). Traffic within a spatial pooling cluster is reduced by having each SP shard report only its top (sparsity * shard count) scores to the SPC. The SPC joins those, clips to just the top (sparsity) minicolumns, and then reports back to each shard only the winners relevant to that shard for learning.
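And a sketch of the SPC's winner selection along those lines (again a toy of my own, with made-up column ranges and scores): each shard submits its top scores, the SPC merges them, clips to the global winner count, and hands each shard back just the winners in its own column range:

```go
package main

import (
	"fmt"
	"sort"
)

// Sketch of SPC winner selection (toy code, not the actual project).

type score struct {
	column int
	value  float64
}

// selectWinners merges the per-shard top scores and keeps the
// numActive best columns overall.
func selectWinners(reports [][]score, numActive int) []score {
	var all []score
	for _, r := range reports {
		all = append(all, r...)
	}
	sort.Slice(all, func(i, j int) bool { return all[i].value > all[j].value })
	if len(all) > numActive {
		all = all[:numActive]
	}
	return all
}

// winnersForShard filters the global winners down to the columns a
// given shard owns, so each shard learns only on its own winners.
func winnersForShard(winners []score, lo, hi int) []int {
	var cols []int
	for _, w := range winners {
		if w.column >= lo && w.column < hi {
			cols = append(cols, w.column)
		}
	}
	return cols
}

func main() {
	// Two shards reporting their top scores to the SPC.
	shard0 := []score{{5, 0.9}, {12, 0.4}}      // owns columns 0..1023
	shard1 := []score{{1030, 0.8}, {1500, 0.7}} // owns columns 1024..2047

	winners := selectWinners([][]score{shard0, shard1}, 3)
	fmt.Println("shard 0 winners:", winnersForShard(winners, 0, 1024))
	fmt.Println("shard 1 winners:", winnersForShard(winners, 1024, 2048))
}
```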