I have been active on Numenta's forum lately, but I keep forgetting to post on my own forum.

Let me give a quick progress update on how things are going.
The biggest epiphany for me came from realizing that the concepts of "imagination" and "curiosity" (which were the most biologically implausible elements of my original design) can be simulated by existing functions of a spatial pooler.
Spatial poolers currently simulate inhibition by selecting a percentage of columns that best connect to the current input space, and only those columns activate. A slight modification of this function allows it to replace my earlier concept of "imagination" -- selecting a percentage of columns that best connect to the most positive reinforcement input space, and only those activate. The columns in the motor layer map to the motor commands, so the winning columns drive what actions are taken.
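To make this concrete, here is a minimal sketch (not my actual code) of the two selection rules, assuming `connections` is a column-by-input matrix of connected proximal synapses and `k` is the number of columns allowed to win inhibition:

```python
import numpy as np

def standard_inhibition(connections, active_input, k):
    """Standard spatial pooler step: score each column by how well its
    proximal synapses overlap the currently active input bits, and let
    only the top k columns activate."""
    overlap = connections @ active_input      # one overlap score per column
    return np.argsort(overlap)[-k:]           # indices of the winning columns

def imagination_step(connections, positive_reinforcement_input, k):
    """Modified step ("imagination"): score each column by how well it
    connects to the input space associated with the most positive
    reinforcement, so the winners are the columns expected to lead to
    the best outcome."""
    score = connections @ positive_reinforcement_input
    return np.argsort(score)[-k:]
```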
Spatial poolers also have a function for "boosting", which allows columns that haven't been used in a while to slowly accumulate a higher score, and eventually win out over other columns that have been used more frequently. This can be used to replace my earlier concept of "curiosity". Actions the system hasn't tried in a while, such as new actions or those which previously resulted in a negative reinforcement, will eventually be tried again, allowing the system to explore and re-attempt actions that could lead to new outcomes.
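The boosting effect can be sketched roughly like this (the exact formula in Numenta's implementations differs, and the parameter names here are made up for illustration):

```python
import numpy as np

def update_duty_cycles(duty_cycles, winning_columns, period=1000):
    """Exponential moving average of how often each column has won recently."""
    active = np.zeros_like(duty_cycles)
    active[winning_columns] = 1.0
    return (duty_cycles * (period - 1) + active) / period

def boosted_scores(raw_scores, duty_cycles, target_duty=0.02, boost_strength=2.0):
    """Columns that have won less often than the target duty cycle get their
    scores multiplied up, so actions the system hasn't tried in a while
    (including previously punished ones) eventually win inhibition again."""
    boost = np.exp(boost_strength * (target_duty - duty_cycles))
    return raw_scores * boost
```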
I drew up a diagram to help visualize what the current design looks like:

The sequence and feature/location layers are complementary -- both use the same spatial pooler (the same columns activate in both layers), i.e. both receive proximal input from the sensors. The sequence layer receives distal input from other cells in its own layer, while the feature/location layer receives distal input from an array of cells representing an allocentric location.
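Roughly, the pairing looks like this (a sketch only; `predicted_cells` stands in for whatever cells each layer's distal input put into a predictive state, represented as (column, index) pairs):

```python
def activate_layer(active_columns, predicted_cells, cells_per_column):
    """Shared activation rule for both layers: in each winning column,
    activate the cells predicted by that layer's distal input, or burst
    the whole column when none were predicted."""
    active_cells = set()
    for col in active_columns:
        predicted_here = {c for c in predicted_cells if c[0] == col}
        if predicted_here:
            active_cells |= predicted_here
        else:
            active_cells |= {(col, i) for i in range(cells_per_column)}
    return active_cells

# Both layers share the winning columns from one spatial pooler over the
# sensor input; only the source of their distal predictions differs:
#   sequence_cells = activate_layer(cols, predicted_by_own_cells_at_t_minus_1, n)
#   feat_loc_cells = activate_layer(cols, predicted_by_location_cells, n)
```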
The motor layer receives proximal input from the reinforcement layer via the modified spatial pooler, which chooses a percentage of motor columns with the highest (boosted) reinforcement score. This layer receives distal input from active cells in both the sequence layer and the feature/location layer. Columns represent motor commands, while cells in each column represent the sensory context.
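In other words, once the winners are chosen, reading out behaviour is just a lookup (the command names below are invented for illustration):

```python
# Illustrative mapping only -- the real commands depend on the agent.
COLUMN_TO_COMMAND = {0: "move_forward", 1: "turn_left",
                     2: "turn_right", 3: "move_backward"}

def motor_output(winning_motor_columns):
    """Because each motor column stands for one motor command, the columns
    that win the modified inhibition step directly determine the actions
    taken this timestep."""
    return [COLUMN_TO_COMMAND[col] for col in winning_motor_columns
            if col in COLUMN_TO_COMMAND]
```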
Columns in the reinforcement layer represent how positive or negative a reinforcement is. In my implementation, I am using columns to the left to represent more negative reinforcement, while columns to the right represent more positive reinforcement (with columns near the center being neutral). This is just to make it easier to visualize. Columns represent positivity/negativity, and cells in the columns represent sensory-motor context. Cells in this layer receive distal input from active cells in the motor layer.
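A toy encoder along these lines (just for visualization; the range, number of columns, and width are arbitrary):

```python
def reinforcement_to_columns(value, num_columns=100, active_width=5):
    """Encode a scalar reinforcement value in [-1, 1] as a small block of
    adjacent active columns: strongly negative values land near the left
    edge, strongly positive values near the right, and near-zero values
    land near the centre."""
    centre = int(round((value + 1.0) / 2.0 * (num_columns - 1)))
    start = max(0, min(centre - active_width // 2, num_columns - active_width))
    return list(range(start, start + active_width))

# e.g. reinforcement_to_columns(-1.0) -> [0, 1, 2, 3, 4]       (far left, very bad)
#      reinforcement_to_columns( 0.0) -> [48, 49, 50, 51, 52]  (near the centre)
#      reinforcement_to_columns( 1.0) -> [95, 96, 97, 98, 99]  (far right, very good)
```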
My current design uses a two-layer circuit to pool reinforcement input. This tweak eliminates the need to extend reinforcement predictions backwards through time (that is now handled by a function of the temporal pooler), allowing the implementation to align even more closely with traditional HTM concepts. Output from the reinforcement pooling layer is passed through the modified spatial pooler, which chooses a percentage of the motor columns that best map to the most positive reinforcement, with boosting.
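Putting the earlier pieces together, the motor-selection step then reads roughly as follows (again a sketch with assumed names, not the actual implementation):

```python
import numpy as np

def choose_motor_columns(motor_connections, pooled_reinforcement,
                         duty_cycles, k, target_duty=0.02, boost_strength=2.0):
    """One pass of the modified spatial pooler over the motor layer:
    1. score each motor column by how strongly it connects to the pooled
       representation of the most positive reinforcement,
    2. boost columns that haven't won in a while (curiosity),
    3. keep the top k columns, which map directly to motor commands."""
    score = motor_connections @ pooled_reinforcement
    boost = np.exp(boost_strength * (target_duty - duty_cycles))
    return np.argsort(score * boost)[-k:]
```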
There is still some more tweaking to do, but it is definitely starting to come together. The most recent changes came from watching the HTM Chat with Jeff. One is the association of the sequence and feature/location layers. Location input itself, however, is currently just an array of input cells representing an allocentric location, which the feature/location layer connects to distally. Egocentric location is still missing, as well as tighter feedback between the two regions. The other, from Jeff's slides, is the two-layer circuit, which gave me the idea of configuring reinforcement feedback with a pooling layer.