JoseRFJunior/TransNAR https://github.com/JoseRFJuniorLLMs/TransNAR https://arxiv.org/html/2406.09308v1 TransNAR hybrid architecture. Similar to Alayrac et al., we interleave existing Transformer layers with gated cross-attention layers that enable information to flow from the NAR to the Transformer. Queries are generated from tokens, while keys and values are obtained from the nodes and edges of the graph. The node and edge embeddings are obtained by running the NAR on the graph version of the reasoning task to be solved. When experimenting with pre-trained Transformers, we initially close the cross-attention gate, in order to fully preserve the language model's internal knowledge at the beginning of training.
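A minimal PyTorch sketch of the gated cross-attention block described above, for intuition only. It is not the authors' code: the layer sizes, the tanh gate, and the shape of the NAR embeddings are assumptions based on the summary (queries from token states; keys/values from NAR node/edge embeddings; gate initialized closed so a pre-trained LM is untouched at the start of training).

```python
# Hedged sketch of TransNAR-style gated cross-attention (not the paper's code).
import torch
import torch.nn as nn

class GatedCrossAttention(nn.Module):
    """Cross-attend from token hidden states (queries) into NAR node/edge
    embeddings (keys/values), scaled by a learnable gate that starts at 0."""

    def __init__(self, d_model: int, n_heads: int = 8):
        super().__init__()
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        # tanh(0) == 0: the gate is closed at initialization, so this block
        # is an identity mapping and the LM's internal knowledge is preserved.
        self.gate = nn.Parameter(torch.zeros(1))
        self.norm = nn.LayerNorm(d_model)

    def forward(self, tokens: torch.Tensor, nar_emb: torch.Tensor) -> torch.Tensor:
        # tokens:  (batch, seq_len, d_model)        -> queries
        # nar_emb: (batch, n_nodes + n_edges, d_model) -> keys and values,
        #          produced by running the NAR on the graph version of the task
        attended, _ = self.attn(query=self.norm(tokens), key=nar_emb, value=nar_emb)
        return tokens + torch.tanh(self.gate) * attended

# Usage: at init the gate is closed, so the output equals the token states.
block = GatedCrossAttention(d_model=512)
tokens = torch.randn(2, 16, 512)   # token hidden states from a Transformer layer
nar_emb = torch.randn(2, 40, 512)  # node + edge embeddings from the NAR
out = block(tokens, nar_emb)       # ~= tokens until the gate opens in training
```

In the full architecture, one of these blocks would be interleaved after each existing Transformer layer, which is how information flows from the NAR into the Transformer without disturbing the pre-trained weights early on.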
This is the closest I've seen to a scalable AI/LLM Operating System - it has all the major ingredients of a feasible AI OS architecture:
- Extends classical OS functionality with an LLM Kernel.
- A multi-agent-centric approach.
- An optimized resource-allocation system that allows LLM-based tasks and classical OS tasks to coexist.
- An Agent Scheduler that can apply classical OS scheduling policies (FIFO, RR) - see the sketch below.
- A Context Manager to improve alignment.
- A lazy Memory Manager for agents (ensures data is stored and accessible only while the agent is active).
- An enhanced security module for the AI-driven environment.
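To make the scheduler item concrete, here is a toy sketch of FIFO and round-robin scheduling over agents. The names (`Agent`, `remaining_steps`, `time_slice`) are hypothetical stand-ins, not the actual API of this system; each "step" stands in for one LLM inference call.

```python
# Toy agent scheduler illustrating FIFO and round-robin (RR) policies.
# Hypothetical names; not the real scheduler API.
from collections import deque
from dataclasses import dataclass

@dataclass
class Agent:
    name: str
    remaining_steps: int  # LLM calls still needed to finish the task

def run_fifo(agents: list[Agent]) -> None:
    """Run each agent to completion in arrival order."""
    queue = deque(agents)
    while queue:
        agent = queue.popleft()
        while agent.remaining_steps > 0:
            agent.remaining_steps -= 1  # stand-in for one LLM inference step
        print(f"{agent.name} finished")

def run_round_robin(agents: list[Agent], time_slice: int = 2) -> None:
    """Give each agent `time_slice` steps, then rotate, so long-running
    LLM tasks cannot starve the others."""
    queue = deque(agents)
    while queue:
        agent = queue.popleft()
        for _ in range(min(time_slice, agent.remaining_steps)):
            agent.remaining_steps -= 1
        if agent.remaining_steps > 0:
            queue.append(agent)  # not done: back of the queue
        else:
            print(f"{agent.name} finished")

run_round_robin([Agent("travel-planner", 3), Agent("code-review", 5)])
```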
It does hit all the checkpoints, doesn't it? A scaled-up version of @karpathy's LLM OS idea.