
How to get initial point cloud estimate (init_pt_cld.npz) #17

Open · jayaramreddy10 opened this issue Oct 20, 2023 · 4 comments

@jayaramreddy10

Hi, congrats on the great work. I have a query regarding the initial point cloud estimate that the code expects: it is read from init_pt_cld.npz and has shape (N, 7).

@JonathonLuiten, I have two questions regarding this:
1.) Could you provide any insights/suggestions on how you construct this from the posed images? Would Colmap suffice?
2.) Does the last column, which holds a binary 'seg' label, indicate foreground/background?
Looking forward to your response.

@atonalfreerider commented Oct 23, 2023

I'm looking at this right now as well and raised a similar question in #13; the response indicated that the point cloud is very specific to the CMU Panoptic data.

However, I'm looking at the same methods and the N x 7 numpy data array and trying to reverse engineer them. I have been able to reconstruct the train_meta.json file for my own videos.

To your observation: in init_pt_cld.npz, columns 0-2 are the mean 3D points (x, y, z), columns 3-5 are RGB colors, and column 6 is segmentation data (all values are 1, as far as I can tell):

```python
def initialize_params(seq, md):
    # (N, 7) array: columns 0-2 xyz, 3-5 rgb, 6 seg
    init_pt_cld = np.load(f"./data/{seq}/init_pt_cld.npz")["data"]
    seg = init_pt_cld[:, 6]
    max_cams = 50
    # squared distances to the 3 nearest neighbours, used to initialize scales
    sq_dist, _ = o3d_knn(init_pt_cld[:, :3], 3)
    mean3_sq_dist = sq_dist.mean(-1).clip(min=0.0000001)
    params = {
        'means3D': init_pt_cld[:, :3],
        'rgb_colors': init_pt_cld[:, 3:6],
    ...

# Open3D k-nearest neighbors
def o3d_knn(pts, num_knn):
    indices = []
    sq_dists = []
    pcd = o3d.geometry.PointCloud()
    pcd.points = o3d.utility.Vector3dVector(np.ascontiguousarray(pts, np.float64))
    pcd_tree = o3d.geometry.KDTreeFlann(pcd)
    for p in pcd.points:
        # query num_knn + 1 neighbours and drop the first hit (the point itself)
        [_, i, d] = pcd_tree.search_knn_vector_3d(p, num_knn + 1)
        indices.append(i[1:])
        sq_dists.append(d[1:])
    return np.array(sq_dists), np.array(indices)
```
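
Based on that layout, here is a minimal sketch (inferred from the loader above, not the authors' pipeline) of writing an init_pt_cld.npz that the code will accept, assuming you already have per-point positions and colors:

```python
import numpy as np

# Hypothetical inputs: xyz positions (N, 3) and RGB colors (N, 3) in [0, 1]
xyz = np.random.rand(100, 3)      # replace with your reconstructed points
rgb = np.random.rand(100, 3)      # replace with your per-point colors
seg = np.ones((xyz.shape[0], 1))  # seg = 1 for all points

# Columns: 0-2 xyz, 3-5 rgb, 6 seg -> shape (N, 7), stored under the "data" key
data = np.concatenate([xyz, rgb, seg], axis=1)
np.savez("init_pt_cld.npz", data=data)
```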

@PKUVDIG commented Oct 24, 2023

> [quotes @atonalfreerider's comment above]

Hi, I've encountered the same issue. Could you please share how you reconstructed the train_meta.json file for your own videos? Thanks.

@atonalfreerider

Usually I would use Colmap, but I am working with only two videos, and Colmap hasn't been able to solve this. So instead I placed the cameras manually in the Unity editor and exported the camera transforms. Then I ran this C# script:

```csharp
using System.CommandLine;
using System.CommandLine.NamingConventionBinder;
using System.IO.Compression;
using Newtonsoft.Json;
using NumSharp;

static class Program
{
    class Args
    {
        public string InputPath { get; set; }
        public string CameraPositions { get; set; }
    }

    static void Main(string[] args)
    {
        RootCommand rootCommand = new()
        {
            new Argument<string>(
                "InputPath",
                "Path to the folder containing the images, and where train_meta.json and init_pt_cld.npz will be written. In the ims folder, each subfolder is a camera"),

            new Argument<string>(
                "CameraPositions",
                "Path to the camera transforms JSON (exported from Unity here, or generated by Colmap)")
        };

        rootCommand.Description = "Initialize the training data for dynamic Gaussian splatting";

        // The parameters of the handler method are matched according to the names of the options
        rootCommand.Handler = CommandHandler.Create<Args>(Parse);

        rootCommand.Invoke(args);

        Environment.Exit(0);
    }

    [Serializable]
    public class CameraTransform
    {
        public int aabb_scale;
        public List<Frame> frames;
    }

    [Serializable]
    public class Frame
    {
        public string file_path;
        public float sharpness;
        public float[][] transform_matrix;
        public float camera_angle_x;
        public float camera_angle_y;
        public float fl_x;
        public float fl_y;
        public float k1;
        public float k2;
        public float k3;
        public float k4;
        public float p1;
        public float p2;
        public bool is_fisheye;
        public float cx;
        public float cy;
        public float w;
        public float h;
    }

    [Serializable]
    public class train_meta
    {
        public float w;
        public float h;
        public List<List<List<float[]>>> k;
        public List<List<float[][]>> w2c;
        public List<List<string>> fn;
        public List<List<int>> cam_id;
    }

    static void Parse(Args args)
    {
        CameraTransform cameraTransforms = JsonConvert
            .DeserializeObject<CameraTransform>(File.ReadAllText(args.CameraPositions))!;

        string imsPath = Path.Combine(args.InputPath, "ims");
        int camCount = Directory.EnumerateDirectories(imsPath).Count();
        int fileCount = Directory.EnumerateFiles(Directory.EnumerateDirectories(imsPath).ToList()[0]).Count();

        train_meta trainMeta = new()
        {
            w = 640, // resolution of my videos
            h = 360,
            fn = new(),
            cam_id = new(),
            k = new(),
            w2c = new()
        };

        for (int i = 0; i < fileCount; i++)
        {
            List<string> toInsert = new();
            List<int> camToInsert = new();
            List<List<float[]>> kToInsert = new();
            List<float[][]> wToInsert = new();
            for (int j = 0; j < camCount; j++)
            {
                toInsert.Add($"{j}/{i:D3}.jpg");
                camToInsert.Add(j);
                Frame cameraFrame = cameraTransforms.frames[j];
                // 3x3 intrinsic matrix K from focal lengths and principal point
                List<float[]> kToInsertInner = new()
                {
                    new[] { cameraFrame.fl_x, 0f, cameraFrame.cx },
                    new[] { 0f, cameraFrame.fl_y, cameraFrame.cy },
                    new[] { 0f, 0f, 1f }
                };
                kToInsert.Add(kToInsertInner);

                // NOTE: NeRF-style transforms JSON stores camera-to-world matrices,
                // while train_meta.json expects world-to-camera, so an inversion
                // may be needed here depending on how the transforms were exported.
                float[][] w = cameraFrame.transform_matrix;
                wToInsert.Add(w);
            }
            trainMeta.fn.Add(toInsert);
            trainMeta.cam_id.Add(camToInsert);
            trainMeta.k.Add(kToInsert);
            trainMeta.w2c.Add(wToInsert);
        }

        File.WriteAllText(Path.Combine(args.InputPath, "train_meta.json"), JsonConvert.SerializeObject(trainMeta, Formatting.Indented));

        // TODO create point cloud
        Dictionary<string, Array> npz = new();
        int pointCount = 0; // TODO number of points from Colmap
        double[,] data = new double[pointCount, 7];
        for (int i = 0; i < pointCount; i++)
        {
            // point position
            data[i, 0] = 0;
            data[i, 1] = 0;
            data[i, 2] = 0;

            // color
            data[i, 3] = 0;
            data[i, 4] = 0;
            data[i, 5] = 0;

            // seg
            data[i, 6] = 1;
        }
        // Entry name "data.npy" so that numpy's np.load exposes it as ["data"]
        npz.Add("data.npy", data);
        np.Save_Npz(npz, Path.Combine(args.InputPath, "init_pt_cld.npz"), CompressionLevel.NoCompression);
    }
}
```
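
A quick way to sanity-check the output (a hypothetical helper, not part of the repo; file names assumed, shapes inferred from the repo's loader):

```python
import json
import numpy as np

# Load the two generated files (names assumed to match the script above)
with open("train_meta.json") as f:
    meta = json.load(f)
pt_cld = np.load("init_pt_cld.npz")["data"]

# Point cloud must be (N, 7): xyz, rgb, seg
assert pt_cld.ndim == 2 and pt_cld.shape[1] == 7

for t in range(len(meta["fn"])):           # one entry per timestep
    cams = len(meta["fn"][t])
    assert len(meta["cam_id"][t]) == cams  # one camera id per filename
    for k, w2c in zip(meta["k"][t], meta["w2c"][t]):
        assert np.array(k).shape == (3, 3)    # intrinsics
        assert np.array(w2c).shape == (4, 4)  # world-to-camera extrinsics

print(f"{len(meta['fn'])} timesteps, {pt_cld.shape[0]} points: OK")
```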

@JonathonLuiten (Owner)

A point cloud from Colmap should be fine... I was getting it from the available depth cameras.

I would recommend setting the seg value on the point cloud to all 1.

If you know some points are 100% static, you can explicitly set those to 0 to fix them, but this is not necessary.
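
For the Colmap route, the conversion could look like this sketch (column layout per Colmap's documented points3D.txt format; rescaling colors to [0, 1] is an assumption to verify against your data; seg set to 1 as recommended above):

```python
import numpy as np

def colmap_points3d_to_npz(points3d_txt, out_npz):
    """Convert Colmap's points3D.txt to the (N, 7) init_pt_cld.npz layout."""
    rows = []
    with open(points3d_txt) as f:
        for line in f:
            if line.startswith("#"):  # skip header comments
                continue
            # Each line: POINT3D_ID X Y Z R G B ERROR TRACK[]...
            vals = line.split()
            x, y, z = map(float, vals[1:4])
            r, g, b = (int(c) / 255.0 for c in vals[4:7])  # assumed [0, 1] color range
            rows.append([x, y, z, r, g, b, 1.0])           # seg = 1 for all points
    np.savez(out_npz, data=np.array(rows))

colmap_points3d_to_npz("sparse/0/points3D.txt", "init_pt_cld.npz")
```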
