How to get initial point cloud estimate (init_pt_cld.npz) #17
I'm looking at this right now as well and raised a similar question here. The response indicated that the point cloud is very specific to the CMU Panoptic data. However, I'm looking at the same methods and the N x 7 numpy data array and trying to reverse engineer them. I have been able to reconstruct the train_meta.json file for my own videos. To your observation: in init_pt_cld.npz, it looks like column 6 is segmentation data (all values are 0 or 1).

```python
def initialize_params(seq, md):
    init_pt_cld = np.load(f"./data/{seq}/init_pt_cld.npz")["data"]
    seg = init_pt_cld[:, 6]
    max_cams = 50
    sq_dist, _ = o3d_knn(init_pt_cld[:, :3], 3)
    mean3_sq_dist = sq_dist.mean(-1).clip(min=0.0000001)
    params = {
        'means3D': init_pt_cld[:, :3],
        'rgb_colors': init_pt_cld[:, 3:6],
        ...
# Open3D K-nearest neighbors
def o3d_knn(pts, num_knn):
    indices = []
    sq_dists = []
    pcd = o3d.geometry.PointCloud()
    pcd.points = o3d.utility.Vector3dVector(np.ascontiguousarray(pts, np.float64))
    pcd_tree = o3d.geometry.KDTreeFlann(pcd)
    for p in pcd.points:
        # num_knn + 1 because the query point itself comes back as the first neighbor
        [_, i, d] = pcd_tree.search_knn_vector_3d(p, num_knn + 1)
        indices.append(i[1:])
        sq_dists.append(d[1:])
    return np.array(sq_dists), np.array(indices)
```
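For reference, here is a minimal sketch of writing a file that the loader above would accept. The column layout (x, y, z, r, g, b, seg) and the key `"data"` are taken directly from the code; the value ranges (e.g. colors in [0, 1]) are my assumption:

```python
import numpy as np

# Hypothetical example: N random points in a unit cube, gray color, seg = 1.
N = 10_000
pts = np.random.rand(N, 3)                      # x, y, z
rgb = np.full((N, 3), 0.5)                      # r, g, b -- assumed to be in [0, 1]
seg = np.ones((N, 1))                           # seg column (binary)

data = np.concatenate([pts, rgb, seg], axis=1)  # shape (N, 7)
np.savez("init_pt_cld.npz", data=data)          # loader reads the "data" key
```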
Hi, I've encountered the same issue. Could you please let me know how you were able to reconstruct the train_meta.json file for your own videos? Thanks.
Usually I would use Colmap, but I am working with only two videos, and Colmap hasn't been able to solve this. So instead I placed the cameras manually in the Unity editor and exported the camera transforms. Then I ran this C# script:

```csharp
using System.CommandLine;
using System.CommandLine.NamingConventionBinder;
using System.IO.Compression;
using Newtonsoft.Json;
using NumSharp;

static class Program
{
    class Args
    {
        public string InputPath { get; set; }
        public string CameraPositions { get; set; }
    }
    static void Main(string[] args)
    {
        RootCommand rootCommand = new()
        {
            new Argument<string>(
                "InputPath",
                "Path to the folder containing the images and where train_meta.json and init_pt_cld.npz will be written. In the ims folder, each subfolder is a camera."),
            new Argument<string>(
                "CameraPositions",
                "Path to the camera positions JSON, e.g. as generated by Colmap.")
        };
        rootCommand.Description = "Initialize the training data for dynamic Gaussian splatting";
        // Note that the parameters of the handler method are matched according to the names of the options
        rootCommand.Handler = CommandHandler.Create<Args>(Parse);
        rootCommand.Invoke(args);
        Environment.Exit(0);
    }
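    // Assumption: the two classes below mirror the NeRF-style transforms.json layout
    // (aabb_scale, frames[].transform_matrix, fl_x, ...) as produced by e.g.
    // instant-ngp's Colmap scripts; whether your exporter fills every field may vary.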
    [Serializable]
    public class CameraTransform
    {
        public int aabb_scale;
        public List<Frame> frames;
    }

    [Serializable]
    public class Frame
    {
        public string file_path;
        public float sharpness;
        public float[][] transform_matrix;
        public float camera_angle_x;
        public float camera_angle_y;
        public float fl_x;
        public float fl_y;
        public float k1;
        public float k2;
        public float k3;
        public float k4;
        public float p1;
        public float p2;
        public bool is_fisheye;
        public float cx;
        public float cy;
        public float w;
        public float h;
    }
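    // train_meta.json schema, as inferred from the code below that fills it:
    //   w, h   : image width/height in pixels
    //   k      : [timestep][camera] 3x3 intrinsics matrix
    //   w2c    : [timestep][camera] 4x4 world-to-camera matrix
    //   fn     : [timestep][camera] relative image path under ims/
    //   cam_id : [timestep][camera] integer camera id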
    [Serializable]
    public class train_meta
    {
        public float w;
        public float h;
        public List<List<List<float[]>>> k;
        public List<List<float[][]>> w2c;
        public List<List<string>> fn;
        public List<List<int>> cam_id;
    }
    static void Parse(Args args)
    {
        CameraTransform cameraTransforms = JsonConvert
            .DeserializeObject<CameraTransform>(File.ReadAllText(args.CameraPositions))!;
        string imsPath = Path.Combine(args.InputPath, "ims");
        int camCount = Directory.EnumerateDirectories(imsPath).Count();
        int fileCount = Directory.EnumerateFiles(Directory.EnumerateDirectories(imsPath).ToList()[0]).Count();
        train_meta trainMeta = new()
        {
            // NOTE: image resolution is hardcoded here; adjust to your own footage
            w = 640,
            h = 360,
            fn = new(),
            cam_id = new(),
            k = new(),
            w2c = new()
        };
        for (int i = 0; i < fileCount; i++)
        {
            List<string> toInsert = new();
            List<int> camToInsert = new();
            List<List<float[]>> kToInsert = new();
            List<float[][]> wToInsert = new();
            for (int j = 0; j < camCount; j++)
            {
                toInsert.Add($"{j}/{i:D3}.jpg");
                camToInsert.Add(j);
                Frame cameraFrame = cameraTransforms.frames[j];
                // 3x3 intrinsics matrix from focal lengths and principal point
                List<float[]> kToInsertInner = new()
                {
                    new[] { cameraFrame.fl_x, 0f, cameraFrame.cx },
                    new[] { 0f, cameraFrame.fl_y, cameraFrame.cy },
                    new[] { 0f, 0f, 1f }
                };
                kToInsert.Add(kToInsertInner);
                // CAUTION: transform_matrix in NeRF-style JSON is usually camera-to-world,
                // while train_meta.json expects world-to-camera (w2c); depending on how
                // your transforms were exported, an inversion may be needed here.
                float[][] w = cameraFrame.transform_matrix;
                wToInsert.Add(w);
            }
            trainMeta.fn.Add(toInsert);
            trainMeta.cam_id.Add(camToInsert);
            trainMeta.k.Add(kToInsert);
            trainMeta.w2c.Add(wToInsert);
        }
        File.WriteAllText(Path.Combine(args.InputPath, "train_meta.json"), JsonConvert.SerializeObject(trainMeta, Formatting.Indented));
        // TODO create point cloud
        Dictionary<string, Array> npz = new();
        int pointCount = 0; // TODO number of points from Colmap
        double[,] data = new double[pointCount, 7];
        for (int i = 0; i < pointCount; i++)
        {
            // point position
            data[i, 0] = 0;
            data[i, 1] = 0;
            data[i, 2] = 0;
            // color
            data[i, 3] = 0;
            data[i, 4] = 0;
            data[i, 5] = 0;
            // seg
            data[i, 6] = 1;
        }
        // The Python loader reads init_pt_cld.npz["data"], so the array should be
        // stored under the key "data" (the original key "data.npz" looks like a typo).
        npz.Add("data", data);
        np.Save_Npz(npz, Path.Combine(args.InputPath, "init_pt_cld.npz"), CompressionLevel.NoCompression);
    }
}
```
A point cloud from Colmap should be fine... I was getting it from the available depth cameras. I would recommend setting the seg value on the point cloud to all 1. Unless you know some points are 100% static, in which case you can specifically set them to 0 to fix them, but this is not necessary.
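For anyone going this route, here is a minimal sketch of converting a Colmap sparse model (the text-format points3D.txt) into init_pt_cld.npz with seg set to all 1, as suggested above. The points3D.txt column layout (POINT3D_ID, X, Y, Z, R, G, B, ERROR, TRACK...) follows Colmap's documented text format; normalizing colors to [0, 1] is my assumption about what the training code expects:

```python
import numpy as np

def colmap_points3d_to_init_pt_cld(points3d_txt, out_npz):
    rows = []
    with open(points3d_txt) as f:
        for line in f:
            if line.startswith("#"):
                continue  # skip Colmap's header comments
            vals = line.split()
            xyz = [float(v) for v in vals[1:4]]        # X, Y, Z
            rgb = [int(v) / 255.0 for v in vals[4:7]]  # R, G, B -> [0, 1] (assumed)
            rows.append(xyz + rgb + [1.0])             # seg = 1: treat all points as dynamic
    data = np.array(rows, dtype=np.float64)            # shape (N, 7)
    np.savez(out_npz, data=data)                       # loader reads the "data" key

colmap_points3d_to_init_pt_cld("sparse/0/points3D.txt", "init_pt_cld.npz")
```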
Hi, congrats on the great work. I have a query regarding the initial point cloud estimate that the code expects; it's being read from the file init_pt_cld.npz and has the shape (N, 7).
@JonathonLuiten, I have 2 questions regarding this:
1.) Could you provide any insights/suggestions on how you are constructing this from the posed images? Would Colmap suffice?
2.) In particular, the last column, which holds the binary 'seg' label: does this indicate foreground/background?
Looking forward to your response.