-
Can perform semantic segmentation
-
Semantic segmentation assigns a class label to every point, so the model can distinguish different regions of the data
-
PointNet extracts both local (per-point) and global features from a point cloud in any orientation
-
Still not entirely sure what this means
-
Analyses point clouds
-
A point cloud is an unordered set of 3D points; what it represents is invariant to both the order of the points and rigid motion (rotation and translation)
-
Point cloud networks, like CNNs, are feature extractors and usually sit before a classifier (such as an SVM or MLP)
-
When we want to learn an object like an airplane, how can we get the model to understand that it has a nose, wings, a tail, etc.?
-
We can’t specify this for every class, so we need a way for the model to learn the features itself
-
PointNet consumes an entire point cloud, learns a spatial encoding of each point, aggregates learned encodings into features, and feeds them into Classification and Segmentation heads.
-
Encoding is the transformation of data into model inputs
-
Embeddings are the representations of the inputs in latent space
-
The PointNet model attempts to approximate the following function:
-
f({x_1, x_2, ..., x_n}) ≈ g(h(x_1), h(x_2), ..., h(x_n))
-
f is symmetric: “a symmetric function takes n vectors as input and outputs a new vector that is invariant to the input order”
-
This is important because a point cloud is an unordered set, so the output must not depend on the order in which the points are listed
-
x_1, x_2, ..., x_n are the input points
-
g is the max pooling function
-
Max pooling is applied per feature dimension, across all input points
-
h is an MLP with weights shared across all input points
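The roles of h and g described above can be sketched in plain Python. This is a toy illustration rather than PointNet's actual architecture: a single linear layer with ReLU stands in for the shared MLP, the weights `W` and `B` are made-up constants, and the function names `h`, `g`, and `f` simply mirror the symbols in these notes.

```python
# Hypothetical fixed weights: one linear layer (3 -> 4) standing in for the shared MLP.
W = [
    [0.5, -1.0, 0.25],
    [1.0, 0.5, -0.5],
    [-0.75, 0.25, 1.0],
    [0.1, 0.2, 0.3],
]
B = [0.0, 0.1, -0.1, 0.05]


def h(point):
    """Shared per-point function: the same weights are applied to every point."""
    return [
        max(0.0, sum(w * x for w, x in zip(row, point)) + b)  # linear layer + ReLU
        for row, b in zip(W, B)
    ]


def g(features):
    """Max pool per feature dimension, across all points."""
    return [max(column) for column in zip(*features)]


def f(points):
    """Symmetric function built as g(h(x_1), ..., h(x_n))."""
    return g([h(p) for p in points])


cloud = [(0.0, 0.0, 1.0), (1.0, 0.5, -0.5), (-1.0, 2.0, 0.0), (0.3, -0.7, 0.9)]
reordered = list(reversed(cloud))  # same points, different order

global_feature = f(cloud)
order_invariant = f(reordered) == global_feature
print(order_invariant)
```

Because `h` treats each point independently and `max` ignores ordering, feeding the points in a different order yields the same output vector.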
-
A T-net is a special Neural Network that learns a transformation matrix that will rotate the input point cloud to a consistent orientation.
-
“Rotate” here isn’t literal: the learned matrix is a general linear transformation and is not restricted to a pure rotation
-
PointNet begins with a T-net, which applies its learned transformation matrix to the input points
-
The T-net is a mini PointNet that performs its own feature extraction with a shared MLP and a max pooling function. It then uses fully connected layers to scale the features down into a transformation matrix
-
The transformation matrix begins as the identity: the output of the T-net is added to an identity matrix
-
The local (per-point) features lie right before the max pool
-
After the max pool, there are global features
-
The max pooling layer keeps, for each feature dimension, the point with the largest activation; these are the points that define the shape
-
If the transformation matrix were initialized to zero, it would map every point to zero. If it were initialized randomly, it could disrupt the structure of the point cloud.
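A small sketch of why the transformation starts from the identity. It assumes, for illustration only, that the T-net's raw output is all zeros at the start of training; the helper names `identity`, `add`, and `transform` are hypothetical and not part of any library.

```python
def identity(k):
    """k x k identity matrix as nested lists."""
    return [[1.0 if i == j else 0.0 for j in range(k)] for i in range(k)]


def add(a, b):
    """Element-wise sum of two matrices of the same shape."""
    return [[x + y for x, y in zip(row_a, row_b)] for row_a, row_b in zip(a, b)]


def transform(points, matrix):
    """Multiply every point (as a row vector) by the matrix."""
    return [
        [sum(p[i] * matrix[i][j] for i in range(len(p))) for j in range(len(matrix[0]))]
        for p in points
    ]


cloud = [[0.0, 0.0, 1.0], [1.0, 0.5, -0.5], [-1.0, 2.0, 0.0]]

# Assumption: the T-net's raw output is all zeros at the start of training.
tnet_output = [[0.0] * 3 for _ in range(3)]

# Identity + output: the points pass through unchanged at the start.
unchanged = transform(cloud, add(identity(3), tnet_output))

# Using the raw zero output directly would collapse every point to the origin.
collapsed = transform(cloud, tnet_output)

print(unchanged == cloud)
print(all(v == 0 for p in collapsed for v in p))
```

Starting at the identity means the rest of the network initially sees the original point cloud, and the T-net only has to learn a correction to it.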
-
The critical points are the points that survive the max pool, so they alone determine the global features
-
The classification head uses only the global features
-
The segmentation head uses both global and local features
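To make the link between max pooling and critical points concrete, here is a minimal sketch. The per-point feature values are invented for illustration, and `critical_point_indices` is a hypothetical helper, not a function from any PointNet implementation.

```python
def critical_point_indices(features):
    """Indices of points that supply the maximum in at least one feature dimension."""
    indices = set()
    for column in zip(*features):
        # Index of the point with the largest value in this feature dimension.
        indices.add(max(range(len(column)), key=column.__getitem__))
    return sorted(indices)


# Hypothetical per-point features: 4 points, 3 feature dimensions.
point_features = [
    [0.9, 0.1, 0.2],
    [0.3, 0.8, 0.1],
    [0.2, 0.3, 0.1],
    [0.5, 0.7, 0.6],
]

global_feature = [max(column) for column in zip(*point_features)]
critical = critical_point_indices(point_features)

# Dropping a non-critical point leaves the global feature unchanged.
without_point_2 = point_features[:2] + point_features[3:]
global_without_point_2 = [max(column) for column in zip(*without_point_2)]

print(critical)
print(global_without_point_2 == global_feature)
```

In this toy example the third point never wins a max in any dimension, so removing it does not change the global feature.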
-
The 1024 global features are concatenated onto each point's 64 local features, so the n x 64 local features become an n x 1088 input to the segmentation head (64 + 1024 = 1088).
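The shape arithmetic can be checked with a short sketch. The feature values are placeholders and `n = 5` is an arbitrary choice; only the dimensions 64 and 1024 come from the notes.

```python
n = 5  # arbitrary number of points for the example

# Placeholder n x 64 local (per-point) features.
local_features = [[float(i)] * 64 for i in range(n)]

# Placeholder 1024-dimensional global feature (the output of the max pool).
global_feature = [1.0] * 1024

# Append the same global feature to every point's local features.
segmentation_input = [local + global_feature for local in local_features]

shape = (len(segmentation_input), len(segmentation_input[0]))
print(shape)
```

Every point carries its own local features plus a copy of the same global feature, which is what gives the segmentation head both per-point detail and whole-shape context.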