ELLIS Algebraic Vision & Machine Learning Workshop 2024

Organizers: ELLIS Unit Prague & AAG of CIIRC – Czech Technical University in Prague, July 2-4, 2024
Program: T Pajdla (ELLIS Fellow) Admin: Z Kadava & M Kamenska
Venue: CIIRC – CTU in Prague, Building B, Room B-670, the 6th floor (how to get there) (Online Zoom)

Program (See the speakers/titles/abstracts below)

Time (CEST)\Day	Tuesday, July 2	Wednesday, July 3	Thursday, July 4
09:00-09:45		T Votroubek
10:00-10:45	V Korotynskiy	T Birdal	K Astrom
11:00-11:45	K Kohn	L Oeding	L Magri
12:00-12:45	O Marigliano	E Shehu	F Rydel
13:00-14:30	Lunch	Lunch	Lunch
14:30-15:15	G Flood	F Arrigoni	A Porfiri Dal Cin
15:30-16:15	M Wadenback	A Fussielo	Y Ding
16:30-17:15	D Sungatullina	P Hruby
19:00		Dinner

July 2, Tuesday

(10:00-10:45) Viktor Korotynskiy (CTU in Prague) Exploiting Lie group actions to decompose minimal problems in 3D reconstruction
Abstract: A minimal problem is a parametric polynomial system with finite (and nonempty) solution set for generic choice of parameters. The monodromy group attached to such a system provides a method for detecting the existence of problem decomposition into algebraic subproblems. Beyond the question of existence, one would like to compute expressions for decomposing maps, towards the eventual goal of solving the systems more efficiently. We use numerical homotopy continuation and multivariate interpolation to approach this task. However, with increasing amount of variables in the problem formulation, multivariate interpolation becomes problematic due to large sizes of matrices involved, whose nullspaces have to be computed. We notice that minimal problems are invariant under scaling and rotational Lie groups whose action can be exploited for more efficient interpolation of decomposing maps. I will illustrate our approach on the 5-point minimal problem. Although this problem has already been investigated fully in the computer vision community and its decomposition (via the essential matrix) is known, it is good to see the results our method gives on simple problems: if we get what we expect, we can then move to more complicated minimal problems (where decompositions are not known).
(11:00-11:45) Kathlen Kohn (KTH Stockholm) Order-one rolling shutter cameras.
Abstract: A rolling shutter (RS) camera can see a world point several times, i.e., a world point can appear many times on the same RS image. We characterize the RS cameras that see each point exactly once, called order-one RS cameras. On the general side, we introduce a new RS camera model based on back-projection. With this, we completely describe the parameter space of order-one RS cameras. Explicit parameterizations open a way to identify camera parameters from image measurements. This allows us to formulate new minimal problem for RS relative pose. We provide a complete catalog of minimal problems for linear order-one RS cameras. In particular, we several practical minimal problem with a small number of solutions and a small number of image features. Order-one RS cameras map a straight world line to a higher-degree curve on the image. We compute the degree D of this curve depending on the camera’s motion and rotation, and show that the curve passes through a special point at infinity D-1 many times.
(12:00-12:45) Orlando Marigliano (IT Genova) Classifying Minimal Problems for Rolling Shutter Cameras.
Abstract. In this talk we look at a particular rolling shutter camera model. Each camera moves in a straight line in space and maintains a constant rotation. I show how to classify all the minimal problems for this type of model by using algebraic geometry, counting degrees of freedom, and solving an integer equation. Although we allow an arbitrary number of cameras, it turns out that there are only finitely many such minimal problems. I talk about their respective algebraic degrees, as they relate to actually solving the reconstruction problem using numerical algebraic geometry (in our case, homotopy continuation).
(14:30-15:15) Gabrielle Flood (Lund University) Minimal Solvers for Point Cloud Matching with Statistical Deformations and Some Multi-view Geometry for Parallel Cylinders
Abstract: This talk will be two-fold. First, we will focus on matching and merging of individual point clouds. To take advantage of methods that allow for deformations of the local map – in order to improve the global map – it is important to find feature matches that capture such deformations. We will discuss minimal solvers for finding point cloud matches that utilize statistical deformations. The solvers use either three or four point matches and solve for either rigid or similarity transformation as well as shape deformation in the direction of the most important modes of variation. In the second part of the talk, we will discuss structure from motion problems for parallel cylinders. Sometimes, point features are unavailable or unstable over time and viewing conditions. Instead, we propose a framework based on silhouettes of parallel cylinders. Such structures are found in e.g. trees, lampposts and pillars. We take advantage of both the center lines and the width of such cylinders and develop a number of minimal solvers for simultaneously estimating camera pose and scene structure from silhouette lines of cylinders.
(15:30-16:15) Marten Wadenback (Linkoping University). Geometric Deep Learning Using Spherical Neurons.
Abstract: We start from geometric first principles to construct a machine learning framework for 3D point set analysis. We argue that spherical decision surfaces are a natural choice for this type of problems, and we represent them using a non-linear embedding of 3D Euclidean space into a Minkowski space, represented by a 5D Euclidean space. Via classification experiments on a 3D Tetris dataset, we show that we can get a geometric handle on the network weights, allowing us to directly apply transformations to the network. The model is further extended into a steerable filter bank, facilitating classification in arbitrary poses. Additionally, we study equivariance and invariance properties with respect to O(3) transformations.
(16:30-17:25) Diana Sungatullina (CTU in Prague, ELLIS PhD Student) MinBackProp – Backpropagating through Minimal Solvers
Abstract: We present an approach to backpropagating through minimal problem solvers in end-to-end neural network training. Traditional methods relying on manually constructed formulas, finite differences, and autograd are laborious, approximate, and unstable for complex minimal problem solvers. We show that using the Implicit function theorem to calculate derivatives to backpropagate through the solution of a minimal problem solver is simple, fast, and stable. We compare our approach to (i) using the standard autograd on minimal problem solvers and relate it to existing backpropagation formulas through SVD-based and Eig-based solvers and (ii) implementing the backprop with an existing PyTorch Deep Declarative Networks (DDN) framework. We demonstrate our technique on a toy example of training outlier-rejection weights for 3D point registration and on a real state-of-the-art application of training an outlier-rejection and RANSAC sampling network in image matching. Our method provides 100% stability and is 10 times faster compared to autograd, which is unstable and slow, and compared to DDN, which is stable but also slow.

July 3, Wednesday

(09:00-09:45) Tomas Votroubek (CTU in Prague). Branch and Bound for Optimal Kinematic Configurations.
Abstract: Certain polynomial optimization tasks can be solved surprisingly efficiently using the spatial branch and bound method. One such example is the Inverse Kinematics of generic redundant manipulators. We will show how to use a canonical idea discussed by Shor to project polynomial optimization problems into non-convex quadratically-constrained quadratic problems within a higher-dimensional space. The resulting formulation can be solved by global optimization solvers such as SCIP or Gurobi. We apply this technique to IK to explore the complete space of manipulator configurations and pick one that maximizes a user-defined preference function or concludes that none exists. We compare our approach to the established techniques of the moment-SOS hierarchy.
(10:00-10:45) Tolga Birdal (Imperial College) Topological deep learning
Abstract: Topological deep learning is a rapidly growing field focused on developing deep learning models for data supported on topological domains such as simplicial complexes, cell complexes, and hypergraphs, which generalize graphs and many other domains encountered in scientific computations. In this talk, Tolga will present a unifying deep learning framework built upon these rich data structures of higher-order relationships. Specifically, he will begin by introducing the field of topological deep learning and its potential applications and challenges. He will then dive into a novel type of topological domain: combinatorial complexes. Similar to hypergraphs, combinatorial complexes impose no constraints on the set of relations, while permitting the construction of hierarchical higher-order relations, analogous to those found in simplicial and cell complexes. In this way, combinatorial complexes generalize and combine useful traits of both hypergraphs and cell complexes, which have emerged as two promising abstractions that facilitate the generalization of graph neural networks to topological spaces.
(11:00-11:45) Luke Oeding (Auburn University) Tensor Decompositions of Calibrated Trifocal Tensors
Abstract: In recent work (arXiv: 2402.12542) we have investigated generalizations of the Higher Order Singular Value Decomposition (HOSVD) for tensors. Applying these decompositions to calibrated trifocal tensors we noticed interesting geometric structure, which may shed light on the question of finding all the defining equations of the algebraic variety of calibrated trifocal tensors. In addition, we anticipate this may lead to new methods for approximating calibrated trifocal tensors. This is work in progress with Ian Tan.
(12:00-12:45) Elima Shehu (MPI Lepzig) Fast and stable triangulation using incident points with respect to a line.
Abstract: We investigate improving scene estimations by integrating line features into the triangulation process. Our focus is on utilizing points incident to a line considering an arbitrary number of pinhole cameras with complete visibility. Employing the mathematical model known as “anchored point multiview variety,” we triangulate points incident to a line by fitting point correspondences to the model. In the case of m = 3 images, our approach not only enhances stability but also accelerates triangulation compared to traditional methods (fitting point correspondences to the “point multiview variety”), especially when dealing with synthetic data. This research is a collaboration with Felix Rydell and Angèlica Torres.
(14:30-15:15) Federica Arrigoni (Politecnico di Milano, ELLIS Member) Viewing Graph Solvability in Structure from Motion
Abstract: Structure from Motion (SfM) is a fundamental task in Computer Vision that aims at recovering both cameras and the 3D scene starting from multiple images. The problem can be conveniently represented as a “viewing graph”: each node corresponds to a camera/image, and an edge is present between two nodes if the fundamental (or essential) matrix is available. While several research efforts on SfM have focused on devising more accurate and efficient algorithms, much less attention has been devoted to investigating theoretical aspects. In particular, a relevant question is establishing whether a viewing graph is “solvable”, i.e., it uniquely determines a configuration of cameras. This talk will overview existing formulations and different notions of viewing graph solvability.
(15:30-16:15) Andrea Fusiello (University of Udine) Putting view-graph solvability in a rigidity perspective
Abstract: The concept of rigidity in frameworks has long been a source of fascination for mathematicians and engineers alike. It revolves around the understanding of how the structure of a framework (e.g., bars and joints) determines its ability to maintain its shape under various constraints. The first part of this talk provides a brief overview of the classical results of rigidity theory. Following this introduction, an intriguing analogy will be drawn between the notion of rigidity in frameworks and the concept of solvability in view graphs. By establishing a nexus between these disparate concepts, we aim to illuminate the problem of view-graph solvability, potentially leading to the emergence of novel insights and methodologies.
(16:30-17:15) Petr Hruby (ETH Zurich) Efficient Solution of Point-Line Absolute Pose

Abstract: We revisit certain problems of pose estimation based on 3D–2D correspondences between features which may be points or lines. Specifically, we address the two previously-studied minimal problems of estimating camera extrinsics from $p \in \{ 1, 2 \}$ point–point correspondences and $l=3-p$ line–line correspondences.To the best of our knowledge, all of the previously-known practical solutions to these problems required computing the roots of degree $\ge 4$ (univariate) polynomials when $p=2$, or degree $\ge 8$ polynomials when $p=1.$ We describe and implement two elementary solutions which reduce the degrees of the needed polynomials from $4$ to $2$ and from $8$ to $4$, respectively. We show experimentally that the resulting solvers are numerically stable and fast: when compared to the previous state-of-the art, we may obtain nearly an order of magnitude speedup. The code is available at \url{https://github.com/petrhruby97/efficient\_absolute}”

July 4, Thursday

(10:00-10:45) Kalle Astrom (Lund University) Minimal (and critical) configurations for structure from motion problems from distance measurements
Abstract: In the last decade we have worked hard to classify and solve structure from motion problems involving distance measurements. In particular such problems that arise when measuring distances from a number of senders to a number of receivers. Such problems are relevant when calibrating and using ad-hoc microphone or radio arrays (UWB, WIFI, bluetooth). In this talk I will give an overview of what is known today, but also discuss some unsolved problems concerning time-difference-of-arrival problems and also discuss potential critical configurations.
(11:00-11:45) Luca Magri (Politecnico di Milano, ELLIS Member) Critical configurations as transformations between images
Abstract: In 3D vision, degenerate and critical configurations have to be considered when dealing with geometric estimates. The aim of this talk is to present a characterization of the so-called critical configuration, e.g., camera and point configurations for which the 3D reconstruction of the scene is ambiguous, as a parametric transformation between images with a focus on possible practical implications for robust fitting.
(12:00-12:45) Felix Rydel (KTH Stockholm) Metric Multiview Geometry – a Catalogue in Low Dimensions
Abstract: We systematically compile an exhaustive catalogue of multiview varieties and anchored multiview varieties arising from projections of points and lines in 1, 2, and 3-dimensional projective space. We say that two such varieties are ED-equivalent if there is a linear isomorphism between that that preserve ED-critical points. This gives rise to fourteen equivalence classes, and we determine various properties—dimension, set-theoretic equations, and multidegrees—for all varieties featured in our catalogue. In the case of points, we also present a complementary study of resectioning varieties and their singular loci. Finally, we propose conjectures for the Euclidean distance degrees of all varieties appearing in our comprehensive compilation.
(14:30-15:15) Andrea Porfiri Dal Cin (Politecnico di Milano) Minimal Perspective Autocalibration.

Abstract: Minimal problems for reconstruction from multiple views are paramount for 3D Computer Vision. This talk discusses novel reconstruction and camera autocalibration formulations from two and three views, which rely on constraints on image points, the unknown depths of 3D points, and a partially specified calibration matrix. This setup contrasts traditional approaches, e.g., Kruppa’s equations and the modulus constraint, which rely explicitly on the knowledge of multiple fundamental matrices or a projective reconstruction. We present a comprehensive organization of these minimal problems into classes according to the number of views and any assumed prior knowledge of camera intrinsics. We also discuss our strategies to derive efficient solvers for each of these classes by determining the minimal problems having the fewest solutions through a novel combinatorial framework.
(15:30-16:15) Yaqing Ding (CTU in Prague). Relative pose estimation using relative depth.
Abstract: We revisit the problem of estimating the relative pose from a sparse set of point-correspondences. For each point-correspondence we also use the relative depth, i.e., the relative distance to the scene point in the two images. Relative depths can be approximated from the scales of local features, which are commonly available or can be obtained from non-metric monocular depth estimates provided by popular deep learning-based methods. One point correspondence with relative depth yields an additional constraint, allowing us to use fewer matches in RANSAC to generate the pose candidates. We propose new minimal solvers to essential matrix, homography and fundamental matrix estimation using relative depth.

Format
20-45 min talks altered with discussion and working sessions to initiate future collaboration between participants.

The workshop is supported by the European Union’s Horizon 2020 research and innovation programme under grant agreement No 951847, known as ELISE – European Network of AI Excellence Centres.