Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Why must actualDataNodes be specified? Some requirements are to add dynamic special characters to the basic table name. #32734

Closed
huyu-tom opened this issue Aug 30, 2024 · 19 comments

Comments

@huyu-tom
Copy link

For example, xxx_orgCode, orgCode is an unpredictable or indeterminate number.

@linghengqian
Copy link
Member

  • In the current design, there is an SPI to dynamically generate actualDataNodes when initializing ShardingSphere JDBC DataSource. It currently has 4 built-in implementations, refer to https://shardingsphere.apache.org/document/current/cn/user-manual/common-config/builtin-algorithm/expr/ . This SPI was originally designed to solve the dynamic configuration problem of Groovy expressions under GraalVM Native Image.

  • Once ShardingSphere JDBC DataSource is successfully created, modifying actualDataNodes has nothing to do with the initialization configuration file. If it is ShardingSphere JDBC, you need to get the ShardingSphere JDBC Connection of the current ShardingSphere JDBC DataSource, and then get the ContextManager to change the internal metadata of ShardingSphere JDBC DataSource. If it is ShardingSphere Proxy, there are 2 DistSQL: REFRESH DATABASE METADATA; and REFRESH TABLE METADATA;, refer to https://shardingsphere.apache.org/document/current/cn/user-manual/shardingsphere-proxy/distsql/syntax/ral/ .

  • So you want to propose a new design?

@huyu-tom
Copy link
Author

huyu-tom commented Aug 30, 2024

Expressions cannot achieve dynamic tenant splicing, Can I not specify actual-data-nodes and directly return the table I specify through the doSharding method of the sharding algorithm, regardless of whether actual-data-nodes exist? Because some sharding algorithms are one tenant id and one table, actual-data-nodes cannot be specified because the number is very large
Snipaste_2024-08-30_22-47-09
image
image

@linghengqian
Copy link
Member

Expressions cannot achieve dynamic tenant splicing

  • What do you expect from dynamic tenant splicing? I'm curious about the initial documentation references for this type of concept.

Can I not specify actual-data-nodes and directly return the table I specify through the doSharding method of the sharding algorithm, regardless of whether actual-data-nodes exist?

sharding algorithms are one tenant id and one table, actual-data-nodes cannot be specified because the number is very large

  • So your idea is actually to have an SPI implementation that can load all the tables of a certain database? Like this?
rules:
- !SHARDING
  tables:
    t_order_mysql: 
      actualDataNodes: <MYSQL>'localhost:3306', 'test_database', 'my_user', 'user_password'
    t_order_postgresql:
      actualDataNodes: <POSTGRESQL>'postgres1:5432', 'test_database', 'postgres', 'mysecretpassword', 'schema_name', 1
    t_order_sqlite:
      actualDataNodes: <SQLITE>'sqlite.db'
    t_order_hdfs:
      actualDataNodes: <HDFS>'hdfs://hdfs1:9000/data/path/test', 'name String'

@huyu-tom
Copy link
Author

image final Collection availableTargetNames , he generated table name may not be in this collection. Is this collection generated by acutal-data-nodes? I see that your implementation is to check whether it exists in the availableTargetNames list again after splicing. If it exists, it returns it, otherwise it returns null.

@linghengqian
Copy link
Member

This is what I call a rabbit hole. A more humorous external unit test is in #32591.

ShardingSphere relies on the sharding algorithm to resolve the physical table name from the logical table name, even if the actual-data-nodes do not explicitly list all tables. This is a humorous rabbit hole, and I don't see what needs to be done to plug the rabbit hole.

@huyu-tom
Copy link
Author

可是我就是要分片算法从逻辑表名直接分析物理表名,即使没有在actual-data-nodes指定, 但是我这里这样的话,会显示表名不存在。 但是如果我在actual-data-nodes指定的话(但是我无法全部指定,因为我是动态的,数据量大的,不可能加载到这里面,容易oom),才可以继续执行。

@linghengqian
Copy link
Member

可是我就是要分片算法从逻辑表名直接分析物理表名,即使没有在actual-data-nodes指定, 但是我这里这样的话,会显示表名不存在。

  • It sounds like you need to re-implement all SPIs of the 3 Maven modules under org.apache.shardingsphere:shardingsphere-sharding:5.5.1-SNAPSHOT in your own project. Because the existing logic of shardingsphere is to have a process for metadata introspection, and the rabbit hole is deeper. This will most likely involve dynamically changing the shardingsphere metadata.

因为我是动态的,数据量大的,不可能加载到这里面,容易oom),才可以继续执行。

@huyu-tom
Copy link
Author

huyu-tom commented Aug 31, 2024

Main方法

`
public class Main {

public static void main(String[] args) throws SQLException {
//配置对应不同库的数据源
Map<String, DataSource> dataSourceMap = new HashMap<>();
HikariDataSource dataSource1 = new HikariDataSource();
dataSource1.setDriverClassName("com.mysql.jdbc.Driver");
dataSource1.setJdbcUrl("jdbc:mysql://192.168.0.199:13307/apl_sharding_test_db1");
dataSource1.setUsername("root");
dataSource1.setPassword("xxxx");
dataSourceMap.put("apl_sharding_test_db1", dataSource1);

// HikariDataSource dataSource2 = new HikariDataSource();
// dataSource2.setDriverClassName("com.mysql.jdbc.Driver");
// dataSource2.setJdbcUrl("jdbc:mysql://192.168.0.199:13307/apl_sharding_test_db2");
// dataSource2.setUsername("root");
// dataSource2.setPassword("xxxxx");
// dataSourceMap.put("apl_sharding_test_db2", dataSource2);
//属性配置
Properties props = new Properties();
props.setProperty("sql-show", "true");
//分片规则配置
ShardingRuleConfiguration ruleConfiguration = new ShardingRuleConfiguration();
//表配置
ruleConfiguration.getTables().add(chargeReceivableRuleConfig());

//分库策略(不分库,直接定位库)
Properties databaseProperties = new Properties();
databaseProperties.setProperty("databaseName", "apl_sharding_test_db1");
databaseProperties.setProperty("strategy", "STANDARD");
databaseProperties.setProperty("algorithmClassName",
    "com.apl.shardingjdbc.sharding.AplDefaultDatabaseShardingAlgorithm");
ruleConfiguration.getShardingAlgorithms().put("aplDataBaseShardingStrategy",
    new AlgorithmConfiguration("CLASS_BASED", databaseProperties));
Properties chargeReceivableOrgproperties = new Properties();
chargeReceivableOrgproperties.setProperty("shardingColumn", "inner_org_id");
chargeReceivableOrgproperties.setProperty("strategy", "STANDARD");
chargeReceivableOrgproperties.setProperty("algorithmClassName",
    "com.apl.shardingjdbc.sharding.OrgParamStandardShardingAlgorithm");
ruleConfiguration.getShardingAlgorithms().put("receivableTableOrgStandardShardingStrategy",
    new AlgorithmConfiguration("CLASS_BASED", chargeReceivableOrgproperties));
//创建sharding-dataSource
DataSource shardingDataSource = ShardingSphereDataSourceFactory.createDataSource(dataSourceMap,
    Arrays.asList(ruleConfiguration), props);
Connection connection = shardingDataSource.getConnection();
Statement statement = connection.createStatement();
statement.executeQuery("select * from charge_receivable where inner_org_id = 100");

}

private static ShardingTableRuleConfiguration chargeReceivableRuleConfig() {
ShardingTableRuleConfiguration tableRuleConfiguration = new ShardingTableRuleConfiguration(
"charge_receivable", "apl_sharding_test_db1.charge_receivable");
StandardShardingStrategyConfiguration strategyConfig = new StandardShardingStrategyConfiguration(
"inner_org_id", "receivableTableOrgStandardShardingStrategy");
tableRuleConfiguration.setTableShardingStrategy(strategyConfig);
//由于我们采用租户code,所以 actualDataNodes肯定无法满足,所以必须采用加上分库表
StandardShardingStrategyConfiguration databaseStrategyConfig = new StandardShardingStrategyConfiguration(
"inner_org_id", "aplDataBaseShardingStrategy");
tableRuleConfiguration.setDatabaseShardingStrategy(databaseStrategyConfig);
return tableRuleConfiguration;
}
}
`

分库算法(其实只有库,直接外部传入,相当于直接写死)

`public class AplDefaultDatabaseShardingAlgorithm implements
StandardShardingAlgorithm<Comparable>, ComplexKeysShardingAlgorithm> {

private String databaseName;
private Collection databaseNames;

public void init(Properties properties) {
final var databaseName = properties.getProperty("databaseName");
if (Strings.isNullOrEmpty(databaseName)) {
throw new IllegalArgumentException("databaseName 不能不指定");
}
this.databaseName = databaseName;
this.databaseNames = Arrays.asList(databaseName);
}

@OverRide
public String doSharding(Collection collection,
PreciseShardingValue<Comparable<?>> preciseShardingValue) {
return databaseName;
}

@OverRide
public Collection doSharding(Collection collection,
RangeShardingValue<Comparable<?>> rangeShardingValue) {
return databaseNames;
}

@OverRide
public Collection doSharding(Collection collection,
ComplexKeysShardingValue<Comparable<?>> complexKeysShardingValue) {
return databaseNames;
}
}
`

分表算法 (基础表名+租户ID对应的租户Code)

`public class OrgParamStandardShardingAlgorithm implements StandardShardingAlgorithm<Comparable<?>> {

public void init(Properties properties) {
}

@OverRide
public String doSharding(Collection collection,
PreciseShardingValue<Comparable> preciseShardingValue) { final String logicTableName = preciseShardingValue.getLogicTableName(); final Comparable value = preciseShardingValue.getValue();
if (Number.class.isAssignableFrom(value.getClass())) {
return STR."{logicTableName}_{OrgUtils.getOrgCode(Long.valueOf(value.toString()))}";
} else {
throw new RuntimeException("租户不支持非数字类型");
}
}

@OverRide
public Collection doSharding(Collection collection,
RangeShardingValue<Comparable<?>> rangeShardingValue) {
throw new UnsupportedOperationException("租户分表操作不支持该操作");
}
}`

OrgUtils

`public class OrgUtils {

public static String getOrgCode(Long orgId) {
return "sharding" + orgId;
}
}`

由于Main方法没有指定 actualDataNodes(租户ID无法精确指定,需要从sql分片键中动态获取,并且每个租户id都有一个相对应的表),出现如下错误

Exception in thread "main" java.sql.SQLException: Routed target 'charge_receivable_sharding100' does not exist, available targets are '[charge_receivable]'. at org.apache.shardingsphere.infra.exception.core.external.sql.ShardingSphereSQLException.toSQLException(ShardingSphereSQLException.java:76) at org.apache.shardingsphere.infra.exception.dialect.SQLExceptionTransformEngine.toSQLException(SQLExceptionTransformEngine.java:54) at org.apache.shardingsphere.driver.jdbc.core.statement.ShardingSphereStatement.executeQuery(ShardingSphereStatement.java:188) at com.apl.shardingjdbc.sharding.Main.main(Main.java:72)

sharding-jdbc的版本

<dependency> <groupId>org.apache.shardingsphere</groupId> <artifactId>shardingsphere-jdbc</artifactId> <version>5.5.0</version> </dependency>

这种情况我该如何做到不指定actual-data-nodes,能够正常运行

@linghengqian
Copy link
Member

  • You actually didn't understand what I meant. You are completely calling the public API of Java JDBC, and you don't really touch upon the essence of metadata changes in ShardingSphere JDBC DataSource.
  • If you want to implement the ambiguous concept of dynamic tenant splicing, you must change the processing of metadata by ShardingSphere, which is the so-called downstream ShardingSphere private fork, because it conflicts with the design of ShardingSphere. You can propose a draft PR to explain why this feature will not conflict with the design of ShardingSphere, but dynamic tenant splicing are actually not covered by the current design of ShardingSphere.
  • To obtain ContextManager from ShardingSphere JDBC DataSource, please refer to the unit test of Refactor the acquisition of ContextManager in JDBC metrics #31825 . ShardingSphere Agent operates ShardingSphere metadata with a completely independent design.

@huyu-tom
Copy link
Author

huyu-tom commented Aug 31, 2024

image 将动态生成表,直接add到doSharding方法传入的第一个参数里面(LinkedhashSet类型),他是可以执行成功的 `10:32:14.776 [main] INFO ShardingSphere-SQL -- Logic SQL: select * from charge_receivable where inner_org_id = 100 10:32:14.776 [main] INFO ShardingSphere-SQL -- Actual SQL: apl_sharding_test_db1 ::: select * from charge_receivable_sharding100 where inner_org_id = 100`

我个人感觉dosharding返回的结果你们会在传入的参数里面再次判断是否存在,如果不存在的话,就抛出异常,我感觉可以不用加这个判断!!!

You actually didn't understand what I meant. You are completely calling the public API of Java JDBC, and you don't really touch upon the essence of metadata changes in ShardingSphere JDBC DataSource.
其实我不想去触发配置的更改, 因为租户ID无法确定数据个数,如果只是将所有的租户的表名加载到actual-data-nodes里面,这不是我所预期的,如果我有上万个租户id的话,例如我的业务里面有100张表的话,我都加载到acutal-data-nodes里面,那么我得消耗大量的内存(OOM)

@linghengqian
Copy link
Member

  • I believe we are in a rut. You obviously know what you want, but I don't know what you need. This results in no PR being confirmed.

@huyu-tom
Copy link
Author

就最开始我想要不指定acutal-data-nodes,随着发现,可以最开始乱指定一个,然后通过自定义分片算法 doSharding方法,动态生成的表名add到入参1集合里面(该集合是一个LinkedhashSet),然后就可以绕过你们的判断(判断dosharding返回的表名是否在你们的入参1集合里面),如果这个集合设定可修改的,那你们为什么要进行判定(我没有去看你们的源码,只是猜测)?

@linghengqian
Copy link
Member

  • Disclaimer, I'm just an Apache ShardingSphere Committer, and I'm not the original author of this logic. I don't even know which java class of determination you are talking about.

@huyu-tom
Copy link
Author

image 这个doSharding返回结果和doSharding这个方法的入参1

@linghengqian
Copy link
Member

这个doSharding返回结果和doSharding这个方法的入参1

  • This is just an interface. I noticed that you said it was a logic method, and I assumed that there is an error stack.

我个人感觉dosharding返回的结果你们会在传入的参数里面再次判断是否存在,如果不存在的话,就抛出异常,我感觉可以不用加这个判断!!!

@linghengqian
Copy link
Member

由于Main方法没有指定 actualDataNodes(租户ID无法精确指定,需要从sql分片键中动态获取,并且每个租户id都有一个相对应的表),出现如下错误
Exception in thread "main" java.sql.SQLException: Routed target 'charge_receivable_sharding100' does not exist, available targets are '[charge_receivable]'. at org.apache.shardingsphere.infra.exception.core.external.sql.ShardingSphereSQLException.toSQLException(ShardingSphereSQLException.java:76) at org.apache.shardingsphere.infra.exception.dialect.SQLExceptionTransformEngine.toSQLException(SQLExceptionTransformEngine.java:54) at org.apache.shardingsphere.driver.jdbc.core.statement.ShardingSphereStatement.executeQuery(ShardingSphereStatement.java:188) at com.apl.shardingjdbc.sharding.Main.main(Main.java:72)

  • By the way, org.apache.shardingsphere.driver.jdbc.core.statement.ShardingSphereStatement may have been refactored. The executeQuery() method is not currently at line 188 of org.apache.shardingsphere.driver.jdbc.core.statement.ShardingSphereStatement, and I really don't see where the logic of the assertion is.

  • image

@terrymanu
Copy link
Member

For English only, other languages will not accept.

@huyu-tom
Copy link
Author

image This is the judgment. The target returned must be in the availableTargetNames, otherwise an error will be reported. However, I bypassed this judgment by adding it in. However, availableTargetNames is a shared (different connections) and thread-unsafe collection, so the solution of dynamically adding the availableTargetNames collection cannot be adopted! The final idea is: Can this judgment be omitted? It is highly likely that it will not affect the overall logic. Although I did not actually specify actual-data-nodes, the database and table can be specified through the sharding algorithm, and the table in the database must have been created))

@linghengqian
Copy link
Member

  • I would first recommend you to run the unit tests as a series of bash commands in the Github Actions file in the master branch. Secondly, I would recommend you to open a new issue, because improving the logic here does not seem to be the original topic of this issue.

For example, xxx_orgCode, orgCode is an unpredictable or indeterminate number.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants