Skip to content
This repository has been archived by the owner on Dec 16, 2021. It is now read-only.

New Feature: Metastore Database Prefix #45

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

zshao
Copy link
Contributor

@zshao zshao commented Nov 4, 2016

For issue #44

This diff adds a Decorator called DbPrefixHiveMetastoreClient which automatically adds/removes prefix to database names. This is a self-contained clean approach to ensure that we don't miss any transformation of database names.

Note that although this pull request contains 2 commits, only the second one "Feature: Add option for destination database name prefix" is needed. The first commit is another ongoing pull request that happens to have a conflict.

@zshao zshao force-pushed the issue_44_db_prefix branch 4 times, most recently from f4c3187 to 7a5af64 Compare November 5, 2016 00:58
@plypaul
Copy link
Collaborator

plypaul commented Nov 5, 2016

Have you tested this out? I don't quite recall the details of how paths are generated, but I thought it would copy data to the location without the prefix.

@zshao
Copy link
Contributor Author

zshao commented Nov 5, 2016

@plypaul You are right. I didn't want to add prefix to the directory name at first (I plan to use a different dest hdfs root so that the sub directory structure is still exactly the same), but now I feel it would be a good idea to add that as an option and default to true (change directory name just like the database name). Otherwise the directory names may overlap with existing databases, and that can be disastrous.

@zshao
Copy link
Contributor Author

zshao commented Nov 8, 2016

I had a second thought on this. Changing directory name is dangerous because it will involve pattern matching and can cause incorrect directory name changes (there will be external tables as well).

Instead, users should run this with a different destination fsRoot so that everything that is copied for testing is stored in a different place.

My current plan is that if the dest dbPrefix is not null or empty, I will set:
destFsRoot = destFsRoot + "/dbPrefix=${dbPrefix}/";

What do you think @plypaul ?

@plypaul
Copy link
Collaborator

plypaul commented Nov 8, 2016

My initial thought is that we should probably make it more generic than having to specify an entirely different FS root. The downside to doing that is warehouse management is more complicated on the destination - tables won't be in standard locations.

Going back to my earlier comment, if we have all the name transformation / path generation logic go through DestinationObjectFactory, we can have more complex and custom rules for how objects are copied. If the class was extendable, then it would be a nice plug-in interface. Using extended classes, users could do things like exclude external tables.

I agree though that path generation could be tricky and there's potential for issues there. Not sure what can be done to mitigate those issues yet.

@plypaul
Copy link
Collaborator

plypaul commented Nov 8, 2016

It's a little complicated for the user to have to know that they have change the FS root when they add a prefix.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants