very interesting direction. One thing I’m curious about with extremely long videos is how you handle temporal drift over time. Do you periodically re-anchor the reconstruction or rely purely on accumulated frame consistency?
In a traditional SLAM pipeline you do periodically correct drift by detecting when you've revisited an area you've already mapped (loop closure). This lets you align your submaps so they are globally consistent.
In an area you've visited before you have two estimates of your position: one from your frame-to-frame tracking, and another from the map you built the first time through. You can then solve an optimization problem that pulls those two estimates toward each other, spreading the accumulated drift across the trajectory.
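To make that concrete, here's a toy sketch of the idea in one dimension: poses linked by odometry steps, plus one loop-closure constraint saying the last pose should coincide with the first. The `optimize` function, the gradient-descent solver, and all the names here are my own illustration, not anything from the paper — real systems solve this as a sparse nonlinear least-squares problem over 6-DoF poses.

```python
# Toy 1D "pose graph": poses x0..xN linked by odometry constraints,
# plus loop-closure constraints between revisited poses.
# Illustrative sketch only; real SLAM backends use sparse nonlinear
# least squares (e.g. Gauss-Newton) over full 6-DoF poses.

def optimize(odometry, loop_closures, iters=2000, lr=0.05):
    # Integrate odometry to get the initial (drifted) estimate.
    poses = [0.0]
    for d in odometry:
        poses.append(poses[-1] + d)
    # Minimize the sum of squared residuals over both constraint types
    # with plain gradient descent, keeping x0 fixed as the anchor.
    for _ in range(iters):
        grad = [0.0] * len(poses)
        for i, d in enumerate(odometry):      # odometry: x[i+1] - x[i] ~ d
            r = poses[i + 1] - poses[i] - d
            grad[i + 1] += 2 * r
            grad[i] -= 2 * r
        for i, j, d in loop_closures:         # closure: x[j] - x[i] ~ d
            r = poses[j] - poses[i] - d
            grad[j] += 2 * r
            grad[i] -= 2 * r
        for k in range(1, len(poses)):
            poses[k] -= lr * grad[k]
    return poses

# Odometry says we stepped +1 four times, but a loop closure says we
# ended up back at the start: the optimizer spreads the drift evenly
# across the whole chain instead of dumping it all at the end.
poses = optimize([1.0, 1.0, 1.0, 1.0], [(0, 4, 0.0)])
```

The key property is that no single measurement is trusted absolutely; the final trajectory is the compromise that best satisfies all constraints at once.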
To find out whether you've already visited an area, you store a descriptor for each location in a database and search it for matches. The paper says they use a compressed representation of the "maps" and use test-time training to optimize the global consistency between their submaps.
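The retrieval side can be sketched roughly like this: one descriptor vector per visited place, and a query that returns the best match above a similarity threshold. The `PlaceDB` class, the cosine-similarity scheme, and the threshold are all my own assumptions for illustration — not the paper's actual representation or search method.

```python
# Toy place-recognition lookup (hypothetical names/scheme, not the
# paper's method): store one descriptor per visited location, and
# match a new frame's descriptor against the database.
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

class PlaceDB:
    def __init__(self, threshold=0.9):
        self.entries = []          # list of (place_id, descriptor)
        self.threshold = threshold

    def add(self, place_id, descriptor):
        self.entries.append((place_id, descriptor))

    def query(self, descriptor):
        # Linear scan for clarity; real systems use approximate
        # nearest-neighbour indexes to stay fast at scale.
        if not self.entries:
            return None
        best = max(self.entries, key=lambda e: cosine(e[1], descriptor))
        if cosine(best[1], descriptor) >= self.threshold:
            return best[0]         # loop closure candidate found
        return None                # no match: probably a new place

db = PlaceDB()
db.add("hallway", [1.0, 0.0, 0.2])
db.add("kitchen", [0.0, 1.0, 0.1])
match = db.query([0.9, 0.1, 0.2])  # descriptor close to "hallway"
```

A hit here is what triggers the optimization step above: the matched place gives you the second, drift-free position estimate to pull the trajectory toward.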