Why must actualDataNodes be specified? Some requirements are to add dynamic special characters to the basic table name. #32734

huyu-tom · 2024-08-30T09:50:29Z

For example, xxx_orgCode, orgCode is an unpredictable or indeterminate number.

linghengqian · 2024-08-30T13:45:43Z

In the current design, there is an SPI to dynamically generate actualDataNodes when initializing ShardingSphere JDBC DataSource. It currently has 4 built-in implementations, refer to https://shardingsphere.apache.org/document/current/cn/user-manual/common-config/builtin-algorithm/expr/ . This SPI was originally designed to solve the dynamic configuration problem of Groovy expressions under GraalVM Native Image.
Once ShardingSphere JDBC DataSource is successfully created, modifying actualDataNodes has nothing to do with the initialization configuration file. If it is ShardingSphere JDBC, you need to get the ShardingSphere JDBC Connection of the current ShardingSphere JDBC DataSource, and then get the ContextManager to change the internal metadata of ShardingSphere JDBC DataSource. If it is ShardingSphere Proxy, there are 2 DistSQL: REFRESH DATABASE METADATA; and REFRESH TABLE METADATA;, refer to https://shardingsphere.apache.org/document/current/cn/user-manual/shardingsphere-proxy/distsql/syntax/ral/ .
So you want to propose a new design?

huyu-tom · 2024-08-30T14:50:38Z

Expressions cannot achieve dynamic tenant splicing, Can I not specify actual-data-nodes and directly return the table I specify through the doSharding method of the sharding algorithm, regardless of whether actual-data-nodes exist? Because some sharding algorithms are one tenant id and one table, actual-data-nodes cannot be specified because the number is very large

linghengqian · 2024-08-30T15:13:11Z

Expressions cannot achieve dynamic tenant splicing

What do you expect from dynamic tenant splicing? I'm curious about the initial documentation references for this type of concept.

Can I not specify actual-data-nodes and directly return the table I specify through the doSharding method of the sharding algorithm, regardless of whether actual-data-nodes exist?

This is a weird rabbit hole from shardingsphere 5.0.x, and there are unit tests that depend on it. Even if actual-data-nodes is empty, the custom algorithm can return a table where actual-data-nodes is not defined. See https://github.com/apache/shardingsphere/tree/master/test/native/src/test/resources/test-native/yaml . I would say this doesn't seem like best practice.

sharding algorithms are one tenant id and one table, actual-data-nodes cannot be specified because the number is very large

So your idea is actually to have an SPI implementation that can load all the tables of a certain database? Like this?

rules:
- !SHARDING
  tables:
    t_order_mysql: 
      actualDataNodes: <MYSQL>'localhost:3306', 'test_database', 'my_user', 'user_password'
    t_order_postgresql:
      actualDataNodes: <POSTGRESQL>'postgres1:5432', 'test_database', 'postgres', 'mysecretpassword', 'schema_name', 1
    t_order_sqlite:
      actualDataNodes: <SQLITE>'sqlite.db'
    t_order_hdfs:
      actualDataNodes: <HDFS>'hdfs://hdfs1:9000/data/path/test', 'name String'

huyu-tom · 2024-08-30T15:27:23Z

final Collection availableTargetNames , he generated table name may not be in this collection. Is this collection generated by acutal-data-nodes? I see that your implementation is to check whether it exists in the availableTargetNames list again after splicing. If it exists, it returns it, otherwise it returns null.

linghengqian · 2024-08-30T15:32:02Z

This is what I call a rabbit hole. A more humorous external unit test is in #32591.

ShardingSphere relies on the sharding algorithm to resolve the physical table name from the logical table name, even if the actual-data-nodes do not explicitly list all tables. This is a humorous rabbit hole, and I don't see what needs to be done to plug the rabbit hole.

huyu-tom · 2024-08-30T15:41:32Z

可是我就是要分片算法从逻辑表名直接分析物理表名，即使没有在actual-data-nodes指定，但是我这里这样的话，会显示表名不存在。但是如果我在actual-data-nodes指定的话(但是我无法全部指定，因为我是动态的，数据量大的，不可能加载到这里面,容易oom)，才可以继续执行。

linghengqian · 2024-08-30T15:52:30Z

可是我就是要分片算法从逻辑表名直接分析物理表名，即使没有在actual-data-nodes指定，但是我这里这样的话，会显示表名不存在。

It sounds like you need to re-implement all SPIs of the 3 Maven modules under org.apache.shardingsphere:shardingsphere-sharding:5.5.1-SNAPSHOT in your own project. Because the existing logic of shardingsphere is to have a process for metadata introspection, and the rabbit hole is deeper. This will most likely involve dynamically changing the shardingsphere metadata.

因为我是动态的，数据量大的，不可能加载到这里面,容易oom)，才可以继续执行。

I have a hard time understanding why parsing an expression would cause OOM. Although ShardingSphere is limited to using JDK 8 API, I think most custom line expression SPI implementations actually use parallel streams with a shared thread pool like <INTERVAL>, see https://github.com/apache/shardingsphere/blob/master/infra/expr/type/interval/src/main/java/org/apache/shardingsphere/infra/expr/interval/IntervalInlineExpressionParser.java . If the CPU on the device running the JVM has only 8 cores, the shared thread pool has a default thread limit of 8.
In downstream projects, the use of parallel streams can be changed to virtual threads under JDK 21 - JDK 24.

huyu-tom · 2024-08-31T02:04:24Z

Main方法

`
public class Main {

public static void main(String[] args) throws SQLException {
//配置对应不同库的数据源
Map<String, DataSource> dataSourceMap = new HashMap<>();
HikariDataSource dataSource1 = new HikariDataSource();
dataSource1.setDriverClassName("com.mysql.jdbc.Driver");
dataSource1.setJdbcUrl("jdbc:mysql://192.168.0.199:13307/apl_sharding_test_db1");
dataSource1.setUsername("root");
dataSource1.setPassword("xxxx");
dataSourceMap.put("apl_sharding_test_db1", dataSource1);

// HikariDataSource dataSource2 = new HikariDataSource();
// dataSource2.setDriverClassName("com.mysql.jdbc.Driver");
// dataSource2.setJdbcUrl("jdbc:mysql://192.168.0.199:13307/apl_sharding_test_db2");
// dataSource2.setUsername("root");
// dataSource2.setPassword("xxxxx");
// dataSourceMap.put("apl_sharding_test_db2", dataSource2);
//属性配置
Properties props = new Properties();
props.setProperty("sql-show", "true");
//分片规则配置
ShardingRuleConfiguration ruleConfiguration = new ShardingRuleConfiguration();
//表配置
ruleConfiguration.getTables().add(chargeReceivableRuleConfig());

//分库策略(不分库，直接定位库)
Properties databaseProperties = new Properties();
databaseProperties.setProperty("databaseName", "apl_sharding_test_db1");
databaseProperties.setProperty("strategy", "STANDARD");
databaseProperties.setProperty("algorithmClassName",
    "com.apl.shardingjdbc.sharding.AplDefaultDatabaseShardingAlgorithm");
ruleConfiguration.getShardingAlgorithms().put("aplDataBaseShardingStrategy",
    new AlgorithmConfiguration("CLASS_BASED", databaseProperties));
Properties chargeReceivableOrgproperties = new Properties();
chargeReceivableOrgproperties.setProperty("shardingColumn", "inner_org_id");
chargeReceivableOrgproperties.setProperty("strategy", "STANDARD");
chargeReceivableOrgproperties.setProperty("algorithmClassName",
    "com.apl.shardingjdbc.sharding.OrgParamStandardShardingAlgorithm");
ruleConfiguration.getShardingAlgorithms().put("receivableTableOrgStandardShardingStrategy",
    new AlgorithmConfiguration("CLASS_BASED", chargeReceivableOrgproperties));
//创建sharding-dataSource
DataSource shardingDataSource = ShardingSphereDataSourceFactory.createDataSource(dataSourceMap,
    Arrays.asList(ruleConfiguration), props);
Connection connection = shardingDataSource.getConnection();
Statement statement = connection.createStatement();
statement.executeQuery("select * from charge_receivable where inner_org_id = 100");

}

private static ShardingTableRuleConfiguration chargeReceivableRuleConfig() {
ShardingTableRuleConfiguration tableRuleConfiguration = new ShardingTableRuleConfiguration(
"charge_receivable", "apl_sharding_test_db1.charge_receivable");
StandardShardingStrategyConfiguration strategyConfig = new StandardShardingStrategyConfiguration(
"inner_org_id", "receivableTableOrgStandardShardingStrategy");
tableRuleConfiguration.setTableShardingStrategy(strategyConfig);
//由于我们采用租户code,所以 actualDataNodes肯定无法满足,所以必须采用加上分库表
StandardShardingStrategyConfiguration databaseStrategyConfig = new StandardShardingStrategyConfiguration(
"inner_org_id", "aplDataBaseShardingStrategy");
tableRuleConfiguration.setDatabaseShardingStrategy(databaseStrategyConfig);
return tableRuleConfiguration;
}
}
`

分库算法(其实只有库,直接外部传入，相当于直接写死)

`public class AplDefaultDatabaseShardingAlgorithm implements
StandardShardingAlgorithm<Comparable>, ComplexKeysShardingAlgorithm> {

private String databaseName;
private Collection databaseNames;

public void init(Properties properties) {
final var databaseName = properties.getProperty("databaseName");
if (Strings.isNullOrEmpty(databaseName)) {
throw new IllegalArgumentException("databaseName 不能不指定");
}
this.databaseName = databaseName;
this.databaseNames = Arrays.asList(databaseName);
}

@OverRide
public String doSharding(Collection collection,
PreciseShardingValue<Comparable<?>> preciseShardingValue) {
return databaseName;
}

@OverRide
public Collection doSharding(Collection collection,
RangeShardingValue<Comparable<?>> rangeShardingValue) {
return databaseNames;
}

@OverRide
public Collection doSharding(Collection collection,
ComplexKeysShardingValue<Comparable<?>> complexKeysShardingValue) {
return databaseNames;
}
}
`

分表算法 (基础表名+租户ID对应的租户Code)

`public class OrgParamStandardShardingAlgorithm implements StandardShardingAlgorithm<Comparable<?>> {

public void init(Properties properties) {
}

@OverRide
public String doSharding(Collection collection,
PreciseShardingValue<Comparable> preciseShardingValue) { final String logicTableName = preciseShardingValue.getLogicTableName(); final Comparable value = preciseShardingValue.getValue();
if (Number.class.isAssignableFrom(value.getClass())) {
return STR."{logicTableName}_{OrgUtils.getOrgCode(Long.valueOf(value.toString()))}";
} else {
throw new RuntimeException("租户不支持非数字类型");
}
}

@OverRide
public Collection doSharding(Collection collection,
RangeShardingValue<Comparable<?>> rangeShardingValue) {
throw new UnsupportedOperationException("租户分表操作不支持该操作");
}
}`

OrgUtils

`public class OrgUtils {

public static String getOrgCode(Long orgId) {
return "sharding" + orgId;
}
}`

由于Main方法没有指定 actualDataNodes(租户ID无法精确指定，需要从sql分片键中动态获取，并且每个租户id都有一个相对应的表),出现如下错误

Exception in thread "main" java.sql.SQLException: Routed target 'charge_receivable_sharding100' does not exist, available targets are '[charge_receivable]'. at org.apache.shardingsphere.infra.exception.core.external.sql.ShardingSphereSQLException.toSQLException(ShardingSphereSQLException.java:76) at org.apache.shardingsphere.infra.exception.dialect.SQLExceptionTransformEngine.toSQLException(SQLExceptionTransformEngine.java:54) at org.apache.shardingsphere.driver.jdbc.core.statement.ShardingSphereStatement.executeQuery(ShardingSphereStatement.java:188) at com.apl.shardingjdbc.sharding.Main.main(Main.java:72)

sharding-jdbc的版本

<dependency> <groupId>org.apache.shardingsphere</groupId> <artifactId>shardingsphere-jdbc</artifactId> <version>5.5.0</version> </dependency>

这种情况我该如何做到不指定actual-data-nodes，能够正常运行

linghengqian · 2024-08-31T02:16:27Z

You actually didn't understand what I meant. You are completely calling the public API of Java JDBC, and you don't really touch upon the essence of metadata changes in ShardingSphere JDBC DataSource.
If you want to implement the ambiguous concept of dynamic tenant splicing, you must change the processing of metadata by ShardingSphere, which is the so-called downstream ShardingSphere private fork, because it conflicts with the design of ShardingSphere. You can propose a draft PR to explain why this feature will not conflict with the design of ShardingSphere, but dynamic tenant splicing are actually not covered by the current design of ShardingSphere.
To obtain ContextManager from ShardingSphere JDBC DataSource, please refer to the unit test of Refactor the acquisition of ContextManager in JDBC metrics #31825 . ShardingSphere Agent operates ShardingSphere metadata with a completely independent design.

huyu-tom · 2024-08-31T02:42:08Z

将动态生成表，直接add到doSharding方法传入的第一个参数里面(LinkedhashSet类型),他是可以执行成功的 `10:32:14.776 [main] INFO ShardingSphere-SQL -- Logic SQL: select * from charge_receivable where inner_org_id = 100 10:32:14.776 [main] INFO ShardingSphere-SQL -- Actual SQL: apl_sharding_test_db1 ::: select * from charge_receivable_sharding100 where inner_org_id = 100`

我个人感觉dosharding返回的结果你们会在传入的参数里面再次判断是否存在，如果不存在的话，就抛出异常,我感觉可以不用加这个判断！！！

You actually didn't understand what I meant. You are completely calling the public API of Java JDBC, and you don't really touch upon the essence of metadata changes in ShardingSphere JDBC DataSource.
其实我不想去触发配置的更改，因为租户ID无法确定数据个数,如果只是将所有的租户的表名加载到actual-data-nodes里面，这不是我所预期的，如果我有上万个租户id的话，例如我的业务里面有100张表的话，我都加载到acutal-data-nodes里面，那么我得消耗大量的内存(OOM)

linghengqian · 2024-08-31T02:44:18Z

I believe we are in a rut. You obviously know what you want, but I don't know what you need. This results in no PR being confirmed.

huyu-tom · 2024-08-31T02:58:05Z

就最开始我想要不指定acutal-data-nodes,随着发现,可以最开始乱指定一个,然后通过自定义分片算法 doSharding方法,动态生成的表名add到入参1集合里面(该集合是一个LinkedhashSet)，然后就可以绕过你们的判断(判断dosharding返回的表名是否在你们的入参1集合里面)，如果这个集合设定可修改的，那你们为什么要进行判定(我没有去看你们的源码，只是猜测)?

linghengqian · 2024-08-31T03:00:15Z

Disclaimer, I'm just an Apache ShardingSphere Committer, and I'm not the original author of this logic. I don't even know which java class of determination you are talking about.

huyu-tom · 2024-08-31T03:15:27Z

这个doSharding返回结果和doSharding这个方法的入参1

linghengqian · 2024-08-31T03:20:25Z

这个doSharding返回结果和doSharding这个方法的入参1

This is just an interface. I noticed that you said it was a logic method, and I assumed that there is an error stack.

我个人感觉dosharding返回的结果你们会在传入的参数里面再次判断是否存在，如果不存在的话，就抛出异常,我感觉可以不用加这个判断！！！

linghengqian · 2024-08-31T03:27:15Z

由于Main方法没有指定 actualDataNodes(租户ID无法精确指定，需要从sql分片键中动态获取，并且每个租户id都有一个相对应的表),出现如下错误
Exception in thread "main" java.sql.SQLException: Routed target 'charge_receivable_sharding100' does not exist, available targets are '[charge_receivable]'. at org.apache.shardingsphere.infra.exception.core.external.sql.ShardingSphereSQLException.toSQLException(ShardingSphereSQLException.java:76) at org.apache.shardingsphere.infra.exception.dialect.SQLExceptionTransformEngine.toSQLException(SQLExceptionTransformEngine.java:54) at org.apache.shardingsphere.driver.jdbc.core.statement.ShardingSphereStatement.executeQuery(ShardingSphereStatement.java:188) at com.apl.shardingjdbc.sharding.Main.main(Main.java:72)

By the way, org.apache.shardingsphere.driver.jdbc.core.statement.ShardingSphereStatement may have been refactored. The executeQuery() method is not currently at line 188 of org.apache.shardingsphere.driver.jdbc.core.statement.ShardingSphereStatement, and I really don't see where the logic of the assertion is.

terrymanu · 2024-08-31T03:53:41Z

For English only, other languages will not accept.

huyu-tom · 2024-08-31T03:56:35Z

This is the judgment. The target returned must be in the availableTargetNames, otherwise an error will be reported. However, I bypassed this judgment by adding it in. However, availableTargetNames is a shared (different connections) and thread-unsafe collection, so the solution of dynamically adding the availableTargetNames collection cannot be adopted! The final idea is: Can this judgment be omitted? It is highly likely that it will not affect the overall logic. Although I did not actually specify actual-data-nodes, the database and table can be specified through the sharding algorithm, and the table in the database must have been created))

linghengqian · 2024-08-31T04:00:36Z

I would first recommend you to run the unit tests as a series of bash commands in the Github Actions file in the master branch. Secondly, I would recommend you to open a new issue, because improving the logic here does not seem to be the original topic of this issue.

For example, xxx_orgCode, orgCode is an unpredictable or indeterminate number.

linghengqian added the status: need more info label Aug 30, 2024

terrymanu closed this as completed Aug 31, 2024

terrymanu added status: invalid and removed status: need more info labels Aug 31, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Why must actualDataNodes be specified? Some requirements are to add dynamic special characters to the basic table name. #32734

Why must actualDataNodes be specified? Some requirements are to add dynamic special characters to the basic table name. #32734

huyu-tom commented Aug 30, 2024

linghengqian commented Aug 30, 2024

huyu-tom commented Aug 30, 2024 •

edited

Loading

linghengqian commented Aug 30, 2024

huyu-tom commented Aug 30, 2024

linghengqian commented Aug 30, 2024

huyu-tom commented Aug 30, 2024

linghengqian commented Aug 30, 2024

huyu-tom commented Aug 31, 2024 •

edited

Loading

linghengqian commented Aug 31, 2024

huyu-tom commented Aug 31, 2024 •

edited

Loading

linghengqian commented Aug 31, 2024

huyu-tom commented Aug 31, 2024

linghengqian commented Aug 31, 2024

huyu-tom commented Aug 31, 2024

linghengqian commented Aug 31, 2024

linghengqian commented Aug 31, 2024

terrymanu commented Aug 31, 2024

huyu-tom commented Aug 31, 2024

linghengqian commented Aug 31, 2024

Why must actualDataNodes be specified? Some requirements are to add dynamic special characters to the basic table name. #32734

Why must actualDataNodes be specified? Some requirements are to add dynamic special characters to the basic table name. #32734

Comments

huyu-tom commented Aug 30, 2024

linghengqian commented Aug 30, 2024

huyu-tom commented Aug 30, 2024 • edited Loading

linghengqian commented Aug 30, 2024

huyu-tom commented Aug 30, 2024

linghengqian commented Aug 30, 2024

huyu-tom commented Aug 30, 2024

linghengqian commented Aug 30, 2024

huyu-tom commented Aug 31, 2024 • edited Loading

Main方法

分库算法(其实只有库,直接外部传入，相当于直接写死)

分表算法 (基础表名+租户ID对应的租户Code)

OrgUtils

由于Main方法没有指定 actualDataNodes(租户ID无法精确指定，需要从sql分片键中动态获取，并且每个租户id都有一个相对应的表),出现如下错误

sharding-jdbc的版本

linghengqian commented Aug 31, 2024

huyu-tom commented Aug 31, 2024 • edited Loading

linghengqian commented Aug 31, 2024

huyu-tom commented Aug 31, 2024

linghengqian commented Aug 31, 2024

huyu-tom commented Aug 31, 2024

linghengqian commented Aug 31, 2024

linghengqian commented Aug 31, 2024

terrymanu commented Aug 31, 2024

huyu-tom commented Aug 31, 2024

linghengqian commented Aug 31, 2024

huyu-tom commented Aug 30, 2024 •

edited

Loading

huyu-tom commented Aug 31, 2024 •

edited

Loading

huyu-tom commented Aug 31, 2024 •

edited

Loading