I recently revisited "A Thousand Brains". I read this book once last year and wrote a review at that time:
, to continuously improve this world model.
This model bears a striking resemblance to the "world model" described in A Thousand Brains, as both are attempting to understand and construct environmental cognition in smarter ways.
Define
forming a location-based cognitive system.
The transition from text-based models to 3D data-based models reflects the overall trend of AI development in recent years: from understanding and generating language (text models) to interpreting and creating static and dynamic images (2D visual models), and then to the current rapid development of modeling the three-dimensional appearance of objects (3D visual models).
" capabilities, which allow us to infer details based on similar scenes we have seen before. For machines, however, this task is extremely complex: even the most advanced AI models today struggle to complete scene filling or imagine a place from a new angle. Nevertheless, spatial intelligence will break through this limitation and become the next frontier of AI development.
Visual Positioning System (VPS)
From a single photo taken on a mobile phone, VPS can determine the device's position and orientation using a three-dimensional map built from points of interest that users have scanned in games and in Scaniverse.
, covering more than one million locations worldwide. In LGM's vision, each local model will contribute to a global large model, enabling understanding and cognition of places that have not been fully scanned.
Characteristics
Centimeter-level accurate positioning enables a seamless integration of digital content with reality.
Not only can users accurately locate themselves within the physical environment, but they can also see digital content that seamlessly blends with their surroundings. This content is persistent: it remains at the designated location even after the user leaves and can be shared with others. For example, a recent experimental feature introduced in Pokémon GO called Pokémon Playgrounds allows users to place Pokémon at specific locations, where other players can then see and interact with these Pokémon at the same spot.
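The idea of persistent, shared content can be illustrated with a small sketch. It is not Niantic's implementation: the `Anchor` and `AnchorStore` names are hypothetical, and a simple latitude/longitude radius check stands in for the centimeter-level pose that VPS actually provides. It only shows the behavior described above: content placed by one user stays at its location and is visible to anyone else who arrives at the same spot.

```python
import math
from dataclasses import dataclass, field

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius in metres


@dataclass(frozen=True)
class Anchor:
    """Digital content pinned to a real-world position (hypothetical type)."""
    content: str
    owner: str
    lat: float  # degrees
    lon: float  # degrees


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two lat/lon points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


@dataclass
class AnchorStore:
    """Keeps anchors after the placing user leaves (assumed in-memory store)."""
    anchors: list = field(default_factory=list)

    def place(self, anchor):
        self.anchors.append(anchor)

    def visible_from(self, lat, lon, radius_m=10.0):
        """Return every anchor within radius_m of the viewer's position."""
        return [a for a in self.anchors
                if haversine_m(lat, lon, a.lat, a.lon) <= radius_m]


store = AnchorStore()
# One player places content at a specific spot and then walks away.
store.place(Anchor("Pikachu", "alice", 37.7955, -122.3937))

# A second player standing a metre or two away sees the same content.
nearby = store.visible_from(37.79551, -122.39371)
# A player several hundred metres away does not.
far = store.visible_from(37.80, -122.40)
```

A real system would anchor content to a full six-degree-of-freedom pose in the scanned 3D map rather than to a coordinate pair, which is what lets the content line up with the scene instead of merely appearing nearby.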
Unique data sources contribute to building a high-precision understanding of the world.
Niantic's VPS relies on location data that users have scanned from different perspectives and at different times, spanning multiple times of day and accumulated over years. This scan data, accompanied by precise positioning information, helps create a highly refined understanding of the world. More distinctively, the data is captured from a pedestrian's perspective and includes many places inaccessible to vehicles, giving the system rich detail and a unique viewpoint.
Promising future application scenarios
As VPS technology continues to mature, users will be able to experience more realistic, persistent, and shared augmented reality content. This capability will not only shine in the entertainment sector but also drive innovative developments in navigation, content creation, and social interaction, laying the foundation for building a future world where the virtual and real are integrated.
Core advantages
The core of LGM lies in extracting a unified global cognition from geographical and visual data. By interpolating and extrapolating data globally, LGM can compensate for blind spots in local models, enhancing the coverage and accuracy of localization. This ability to "infer locally from the global" makes LGM an important foundation for future spatial intelligence.
LGM can internalize the concept of a "church," understanding not only its structural characteristics but also the forms that different churches may take. Even if the local model of a certain location captures only the main entrance of a church, LGM can infer the appearance of the rear of the building from global church data. LGM can also localize from perspectives and angles that the VPS has never observed, which a local model cannot do.
Human-like Understanding: From Machine Vision to Spatial Intelligence
The above process closely resembles how humans perceive and imagine the world. As humans, we can easily recognize things we have seen before, even when we view them from different angles. For example, when navigating the winding streets of an old European city, we can quickly identify all the key intersections, even if we have walked through only once and our current perspective is the opposite of the original one. This ability stems from our deep understanding of the physical world and cultural spaces.
This understanding involves recognizing basic natural laws: the world consists of entities with fronts and backs, and the appearance of objects changes over time and seasons. Additionally, it requires some cultural knowledge: many man-made objects follow specific symmetry rules or universal layout patterns, which often vary by geographic region.
MicKey: An Early Exploration as a Proof of Concept
MicKey shows great potential. It can accurately determine the positional relationship between two camera views even under extreme changes in perspective. Although MicKey is trained on only a small portion of the total data and supports only two-view inputs, it still provides an important proof-of-concept for the potential of LGM.
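To make "the positional relationship between two camera views" concrete, here is a minimal sketch of the quantity involved, the relative pose. It is a simplified 2D illustration computed from two already-known camera poses; MicKey's task is to estimate this relationship from the images themselves, and nothing here reflects how it does so. The `Pose2D`, `relative_pose`, and `compose` names are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Pose2D:
    """Camera position (metres) and heading (radians) on a flat plane."""
    x: float
    y: float
    heading: float


def wrap_angle(theta):
    """Normalize an angle to the range [-pi, pi]."""
    return math.atan2(math.sin(theta), math.cos(theta))


def relative_pose(a, b):
    """Pose of view b expressed in the coordinate frame of view a."""
    dx, dy = b.x - a.x, b.y - a.y
    c, s = math.cos(a.heading), math.sin(a.heading)
    # Rotate the world-frame offset by -a.heading into a's frame.
    return Pose2D(c * dx + s * dy,
                  -s * dx + c * dy,
                  wrap_angle(b.heading - a.heading))


def compose(a, rel):
    """Inverse operation: recover the world pose of b from a and rel."""
    c, s = math.cos(a.heading), math.sin(a.heading)
    return Pose2D(a.x + c * rel.x - s * rel.y,
                  a.y + s * rel.x + c * rel.y,
                  wrap_angle(a.heading + rel.heading))


# Two views of the same street from opposite directions, an extreme
# change in perspective: view_b is 5 m ahead of view_a, facing back.
view_a = Pose2D(2.0, 1.0, math.pi / 2)
view_b = Pose2D(2.0, 6.0, -math.pi / 2)

rel = relative_pose(view_a, view_b)
recovered = compose(view_a, rel)  # should match view_b
```

In three dimensions the same idea uses a rotation matrix and a translation vector instead of a single heading angle, but the relationship between the two views is expressed in the same way.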
Notably, achieving such geographic intelligence requires vast amounts of geospatial data, a resource that many organizations cannot easily access. Thanks to over one million real-world location scans contributed by users each week, Niantic is uniquely well positioned to build LGM.
High-level understanding driven by machine learning
Early computer vision research attempted to encode the rules of the world by hand, but practice has shown that the expectations set for LGM can only be met with large-scale machine learning. Niantic's LGM project is based on this concept, integrating geographical and visual data from around the globe.
From the early camera positioning capabilities demonstrated by MicKey to the future of comprehensive geographic intelligence, Niantic is gradually moving towards the goal of human-like spatial understanding. This progress not only follows the direction in which the technology is developing, but also lays the foundation for the future of global geographic intelligence and augmented reality.
The wide application of LGM
As more scalable models are developed, Niantic's goal remains to lead the development of LGM, ensuring that these models can be effective wherever they can provide users with novel, interesting, and meaningful experiences. LGM will have a wide range of applications across multiple fields, including gaming (especially AR games), spatial planning and design, logistics management, audience engagement, and remote collaboration.