Improve doc #690

xuzhenbao · 2023-11-23T11:56:28Z

Improve the documentation for RSA and DFI.

It fixes #510.

PengZheng

LGTM.

Nice work! I learned a lot from reviewing it. The dfi documentation is especially valuable.
There are ambiguities that need clarification, though.
While reading it, I also found a serious design flaw in discovery_zeroconf that needs to be addressed in another PR, which shows from another angle how useful this PR is.

PengZheng · 2023-11-24T03:15:20Z

bundles/remote_services/discovery_zeroconf/README.md

+
+Because We will perform the mDNS query only using link-local multicast, so we set domain name default value "local".
+
+To reduce the operation of conversion between host name and address info. we set the address info to txt record, and set the host name and port to a dummy value("celix_rpc_dumb_host.local." and "50009").


The content of TXT is set by the service provider. This design will force them to watch for address change, which violates the rationale of zeroconf and suggests a severe design flaw. I'll have a good read of RFC 6762 and 6763 this weekend before making further comments on this.

PengZheng · 2023-11-24T06:17:40Z

bundles/remote_services/rsa_rpc_json/README.md

+
+![remote_service_proxy_use_seq.png](diagrams/remote_service_proxy_use_seq.png)
+
+In the above process, each consumer of the remote service will have a different service proxy, because the service proxy needs to use the interface description file in the consumer (which may be a bundle) to serialize the service call information.


This does not feel right. Currently, the service descriptor is bundled by each of its consumers.
This is far from ideal. We'd better let service discovery and RSA fetch service descriptors from service providers at runtime. Alternatively, we can generate the descriptor from the service header, i.e. solve #590.

In the same process, there should be one unique copy of the service descriptor for each remote service instance.

PengZheng · 2023-11-24T08:02:38Z

libs/dfi/README.md

+
+  |**Identifier**|B  |D     |F    |I      |J      |S      |V   |Z             |b    | i      | j      | s      |P     |t     |N  | 
+  |---------|---|------|-----|-------|-------|-------|----|--------------|-----|--------|--------|--------|------|------|---|
+  |**Types**|char|double|float|int32_t|int64_t|int16_t|void|boolean(uint8)|uchar|uint32_t|uint64_t|uint16_t|void *|char *|int|


Now that we have `*`, why do we need `t` for `char *`?

Maybe we still need both: `*B` for a plain `char *` and `t` for a null-terminated C string.
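To make the overlap between `t` and `*B` concrete, here is a minimal sketch built only from the identifier table quoted above. The helper names `dfi_simple_type_name` and `dfi_render` are hypothetical and are not part of libdfi; the sketch only handles a single identifier with an optional leading `*`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper (not libdfi API): maps a simple type identifier from
 * the README table to the C type it denotes, or NULL if unknown. */
static const char* dfi_simple_type_name(char id) {
    switch (id) {
        case 'B': return "char";
        case 'D': return "double";
        case 'F': return "float";
        case 'I': return "int32_t";
        case 'J': return "int64_t";
        case 'S': return "int16_t";
        case 'V': return "void";
        case 'Z': return "boolean(uint8)";
        case 'b': return "uchar";
        case 'i': return "uint32_t";
        case 'j': return "uint64_t";
        case 's': return "uint16_t";
        case 'P': return "void *";
        case 't': return "char *";
        case 'N': return "int";
        default:  return NULL;
    }
}

/* Hypothetical helper: renders either a simple identifier ("t") or a pointer
 * to one ("*B") as a C type name. Returns 0 on success, -1 otherwise. */
static int dfi_render(const char* descriptor, char* out, size_t cap) {
    assert(descriptor != NULL && out != NULL);
    int is_pointer = (descriptor[0] == '*');
    const char* base = dfi_simple_type_name(descriptor[is_pointer ? 1 : 0]);
    if (base == NULL || descriptor[is_pointer ? 2 : 1] != '\0') {
        return -1; /* unknown identifier or trailing characters */
    }
    int n;
    if (is_pointer) {
        n = snprintf(out, cap, "%s *", base);
    } else {
        n = snprintf(out, cap, "%s", base);
    }
    return (n >= 0 && (size_t)n < cap) ? 0 : -1;
}
```

Under this reading, `dfi_render("*B", ...)` and `dfi_render("t", ...)` produce the same C type, which is the redundancy under discussion. What the C type cannot express is the "null-terminated string" meaning that `t` carries.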

PengZheng · 2023-11-24T08:03:04Z

libs/dfi/README.md

+
+  *Type schema*:
+  ~~~
+  L(name);//shortcut for *l(name);


Now that we have `*`, why do we need `*l`?

pnoltes · 2023-11-27T20:21:17Z

It's great to see some sorely missed documentation being added. I will try to review this PR this week.

pnoltes

Nice work :). I do have some remarks, but overall this looks good.

pnoltes · 2023-11-28T15:15:40Z

bundles/remote_services/discovery_zeroconf/README.md

+
+To reduce the operation of conversion between host name and address info. we set the address info to txt record, and set the host name and port to a dummy value("celix_rpc_dumb_host.local." and "50009").
+
+We set the instance name of the mDNS service as `service_name + hash(endpoint uuid)`. If there is a conflict in the instance name, mDNS_daemon will resolve it. Since the maximum size of the mDNS service instance name is 64 bytes, we take the hash of the endpoint uuid here, which also reduces the probability of instance name conflicts.


Why is the service name still used? A UUID should be unique enough. Is this for readability/debugging?

The use of a UUID in the instance name is explicitly discouraged by RFC 6763:

> The default name should be short and descriptive, and SHOULD NOT include the device's Media Access Control (MAC) address, serial number, or any similar incomprehensible hexadecimal string in an attempt to make the name globally unique.

The rationale is explained in Appendix D, "Choice of Factory-Default Names".

Therefore, the UUID should be removed from the instance name altogether.

pnoltes · 2023-11-28T15:19:04Z

bundles/remote_services/discovery_zeroconf/diagrams/service_announce_seq.puml

Nice to see the usage of PlantUML for some diagrams 😄

pnoltes · 2023-11-28T15:31:34Z

...les/remote_services/remote_service_admin_shm_v2/diagrams/rsa_shm_remote_service_call_seq.png

Again nice to see some sequence diagrams. Especially for a complex sequence like a remote call over the remote service admin 👍

pnoltes · 2023-11-28T16:34:05Z

bundles/remote_services/rsa_rpc_json/README.md

+
+![remote_service_proxy_use_seq.png](diagrams/remote_service_proxy_use_seq.png)
+
+In the above process, each consumer of the remote service will have a different service proxy, because the service proxy needs to use the interface description file in the consumer (which may be a bundle) to serialize the service call information.


> In the same process, there should be one unique copy of service descriptor for each remote service instance.

I think this is the key issue. Only 1 unique descriptor for a service per process.

Ideally the consumer and provider are also aligned, but this is not always necessary. It is possible for the provider to provide a newer version of the service, as long as it is backwards compatible.

For example, a consumer using the following service:

    #define FOO_SERVICE_NAME "foo"
    #define FOO_SERVICE_VERSION "1.0.0"
    struct {
        void* handle;
        int (*foo)(const char* string);
    } my_service;

can use a remotely (!) provided service with the following definition:

    #define FOO_SERVICE_NAME "foo"
    #define FOO_SERVICE_VERSION "1.1.0"
    struct {
        void* handle;
        int (*bar)(const char* string);
        int (*foo)(const char* string);
    } my_service;

Note that because `bar` is added before `foo`, the updated service is not binary backwards compatible inside the same process. But for remote calls, as long as we can identify which method signature is called, different service versions are possible.
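The layout argument can be checked with `offsetof`. The sketch below restates the two struct definitions above as named types; the type names and the `provider_has_method` lookup are illustrative assumptions, not Celix code. It shows that the `foo` slot moves between the two versions, while a lookup by method name is unaffected by member order.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Version 1.0.0 of the interface, as compiled into the consumer. */
typedef struct foo_service_v1_0 {
    void* handle;
    int (*foo)(const char* string);
} foo_service_v1_0_t;

/* Version 1.1.0, as compiled into the provider: bar is inserted before foo. */
typedef struct foo_service_v1_1 {
    void* handle;
    int (*bar)(const char* string);
    int (*foo)(const char* string);
} foo_service_v1_1_t;

/* In-process case: a consumer built against 1.0.0 reads the slot at
 * offsetof(foo_service_v1_0_t, foo). Returns 1 only if foo is at the
 * same offset in both layouts. */
static int foo_slot_is_stable(void) {
    return offsetof(foo_service_v1_0_t, foo) == offsetof(foo_service_v1_1_t, foo);
}

/* Remote case (illustrative): the call identifies the method by name, so the
 * provider can resolve it regardless of its position in the struct. */
static int provider_has_method(const char* name) {
    static const char* const methods[] = {"bar", "foo"};
    assert(name != NULL);
    for (size_t i = 0; i < sizeof(methods) / sizeof(methods[0]); ++i) {
        if (strcmp(methods[i], name) == 0) {
            return 1;
        }
    }
    return 0;
}
```

Here `foo_slot_is_stable()` returns 0: in the 1.1.0 layout the slot that a 1.0.0 consumer would call holds `bar`. A name-based lookup such as `provider_has_method("foo")` still succeeds.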

For the pubsub serializer handler, different versions are actually handled:
https://github.com/apache/celix/blob/rel/celix-2.4.0/bundles/pubsub/pubsub_utils/src/pubsub_serializer_handler.c#L237-L242

I also thought there was a check on the literal descriptor content, but apparently not.
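One possible shape for such a version check is sketched below. It assumes a plain semantic-versioning rule (same major version, provider minor version at least the consumer's); the linked pubsub code may apply a different rule, and `version_parse` and `provider_is_compatible` are hypothetical names.

```c
#include <assert.h>
#include <stdio.h>

/* Parsed "major.minor.micro" version; missing parts default to 0. */
typedef struct {
    int major_v;
    int minor_v;
    int micro_v;
} version_t;

/* Parses a version string such as "1.1.0". Returns 0 on success, -1 if not
 * even a major number could be read. */
static int version_parse(const char* s, version_t* out) {
    assert(s != NULL && out != NULL);
    out->major_v = 0;
    out->minor_v = 0;
    out->micro_v = 0;
    int n = sscanf(s, "%d.%d.%d", &out->major_v, &out->minor_v, &out->micro_v);
    return n >= 1 ? 0 : -1;
}

/* Assumed rule: the provider is usable when the major versions match and
 * the provider's minor version is not older than the consumer's. */
static int provider_is_compatible(const char* consumer, const char* provider) {
    version_t c;
    version_t p;
    if (version_parse(consumer, &c) != 0 || version_parse(provider, &p) != 0) {
        return 0;
    }
    return p.major_v == c.major_v && p.minor_v >= c.minor_v;
}
```

With the example above, a consumer built against "1.0.0" would accept a provider at "1.1.0", but not the other way around, and not a provider at "2.0.0".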

pnoltes · 2023-11-28T17:29:14Z

libs/dfi/README.md

+
+  |**Identifier**|B  |D     |F    |I      |J      |S      |V   |Z             |b    | i      | j      | s      |P     |t     |N  | 
+  |---------|---|------|-----|-------|-------|-------|----|--------------|-----|--------|--------|--------|------|------|---|
+  |**Types**|char|double|float|int32_t|int64_t|int16_t|void|boolean(uint8)|uchar|uint32_t|uint64_t|uint16_t|void *|char *|int|


I think the `t` is not really necessary anymore. IIRC I added the `t` for two reasons:

1. Although C uses a `char*` as a string, in other languages a string type is often a special built-in type.
2. In the beginning of libdfi there was no support for specifying whether something was a pointer, so `t` was needed to specify a `char*`. Now `*B` can be used.

pnoltes · 2023-11-28T17:31:10Z

libs/dfi/README.md

+
+  *Type schema*:
+  ~~~
+  L(name);//shortcut for *l(name);


Given the support for `*`, I think we can drop `L` and `t` so that we do not have alternative ways to "say" the same thing.

PengZheng · 2023-12-01T09:58:23Z

I've finished reading RFC 6762 and 6763 (word by word), and have some further remarks on discovery_zeroconf:

Large TXT Records

As previously pointed out by @pnoltes in an email:

> One of the issues we had with mDNS is that the payload size is very limited, and as a result the remote service discovery was updated to a 2-step approach:

The current design adopts the approach sketched in "Service Instances with Multiple TXT Records" (RFC 6763, Section 6.8).
For this to work reliably, care must be taken to make sure that IP fragmentation does not happen, because

> Even on hosts that normally handle Ethernet "Jumbo" packets and IP fragment reassembly, it is becoming more common for these hosts to implement power-saving modes where the main CPU goes to sleep and hands off packet reception tasks to a more limited processor in the network interface hardware, which may not support Ethernet "Jumbo" packets or IP fragment reassembly.

A snapshot of a Wireshark capture of the mDNS traffic would be very helpful to illustrate the current approach.

From the current documentation (I have not checked the implementation), it is not clear how we deal with packet loss.
For services whose advertisement fits in a single message, this is never a problem. But for services generating large TXT records, how do we guarantee that all associated TXT records are collected?

Do we have a test case for packet loss? I fully understand it is challenging to test this. But if we do have one, it rocks.
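To give a feel for the size constraint, here is a minimal sketch of the TXT record wire format from RFC 6763: each `key=value` string is preceded by one length byte, so a single string is limited to 255 bytes. The 1300-byte budget is the RFC's guidance for fitting into a single 1500-byte Ethernet packet. The function names and the keys in the usage note are illustrative assumptions, not what discovery_zeroconf does.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define TXT_ENTRY_MAX 255   /* one length byte limits each "key=value" string */
#define TXT_BUDGET    1300  /* RFC 6763 guidance for a single Ethernet packet */

/* Appends "key=value" to buf as one length-prefixed string.
 * 'used' is the number of bytes already in buf. Returns the new number of
 * used bytes, or 0 if the entry is invalid or does not fit. */
static size_t txt_append(unsigned char* buf, size_t cap, size_t used,
                         const char* key, const char* value) {
    assert(buf != NULL && key != NULL && value != NULL);
    size_t klen = strlen(key);
    size_t vlen = strlen(value);
    size_t entry = klen + 1 + vlen; /* key, '=', value */
    if (klen == 0 || entry > TXT_ENTRY_MAX || used + 1 + entry > cap) {
        return 0;
    }
    buf[used++] = (unsigned char)entry;
    memcpy(buf + used, key, klen);
    used += klen;
    buf[used++] = '=';
    memcpy(buf + used, value, vlen);
    used += vlen;
    return used;
}

/* Returns 1 if the encoded TXT data stays within the single-packet budget. */
static int txt_fits_budget(size_t used) {
    return used <= TXT_BUDGET;
}
```

For example, appending a hypothetical `svc=foo` entry costs 8 bytes (1 length byte plus 7 characters). Once `txt_fits_budget` fails for an endpoint's properties, the data has to be split over multiple TXT records, which is where the packet-loss question above becomes relevant.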

Fixed Hostname and Port in SRV

As mentioned above, this basically defeats the purpose of zeroconf. Don't do this!
Let's just use the hostname of the OS.
For RSA over HTTP, the port should be the real port the HTTP server listens on.
For RSA over shared memory, the port could be 0.

It is OK to store a URL in TXT records, but note that

> The target host name and TCP (or UDP) port number of the service are given in the SRV record. This information -- target host name and port number -- MUST NOT be duplicated using key/value attributes in the TXT record.

A URL/URI does not necessarily contain a host and port. For more, see "URI Syntax Components".

Service Instance Name and Service Subtypes

If the application uses rsa_shm alone, then only service instances exported by rsa_shm should be discovered.
This is not the case (and less than ideal) in the current implementation: all service instances are enumerated by discovery_zeroconf. With service subtypes, an application that only uses rsa_shm would have discovery_zeroconf search for something like `_shm._sub._celix-rpc._udp` rather than `_celix-rpc._udp`. A user who wants to list all exported services for debugging purposes can search for `_celix-rpc._udp`. See "Selective Instance Enumeration" for more on service subtypes.
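As a small illustration of the two spellings of a subtype, the sketch below builds both the DNS name form used above (`_shm._sub._celix-rpc._udp`) and the comma-separated form (`_celix-rpc._udp,_shm`). To my knowledge the latter is the form the `regtype` parameter of the `dns_sd.h` browse/register calls and the `dns-sd` command-line tool expect, but treat that as an assumption to verify. Both helper names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Builds "<type>,<subtype>", or just "<type>" when no subtype is given.
 * Returns 0 on success, -1 if the result does not fit in 'out'. */
static int build_regtype(char* out, size_t cap, const char* type, const char* subtype) {
    assert(out != NULL && type != NULL);
    int n;
    if (subtype == NULL || subtype[0] == '\0') {
        n = snprintf(out, cap, "%s", type);
    } else {
        n = snprintf(out, cap, "%s,%s", type, subtype);
    }
    return (n >= 0 && (size_t)n < cap) ? 0 : -1;
}

/* Builds the DNS name form "<subtype>._sub.<type>" described in RFC 6763.
 * Returns 0 on success, -1 if the result does not fit in 'out'. */
static int build_subtype_name(char* out, size_t cap, const char* type, const char* subtype) {
    assert(out != NULL && type != NULL && subtype != NULL);
    int n = snprintf(out, cap, "%s._sub.%s", subtype, type);
    return (n >= 0 && (size_t)n < cap) ? 0 : -1;
}
```

An application that only loads rsa_shm would browse with the subtype form, while a debugging session would pass no subtype and see every exported endpoint.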

We rely only on the uniqueness of the service instance name; otherwise it is mostly used for debugging.
Fortunately, we only need to provide an informative name; mDNSResponder will help guarantee its uniqueness:

> The default name should be short and descriptive, and SHOULD NOT include the device's Media Access Control (MAC) address, serial number, or any similar incomprehensible hexadecimal string in an attempt to make the name globally unique. For discussion of why names don't need to be (and SHOULD NOT be) made unique at the factory, see Appendix D, "Choice of Factory-Default Names".

The service name alone is not informative enough. I suggest we include the PID of the service provider, which is very important for debugging a misbehaving remote service.
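A minimal sketch of such a name, assuming a `<service name>-<pid>` layout (the layout and the function name are illustrative, not an agreed format). It keeps the result within the 63-byte DNS label limit by truncating the service name rather than the PID.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

#define INSTANCE_NAME_MAX 63 /* DNS label limit in bytes */

/* Builds "<service_name>-<pid>" into 'out'. If the result would exceed the
 * label limit (or the buffer), the service name is truncated and the PID
 * suffix is kept. Truncation is byte-wise and may split a multi-byte UTF-8
 * character. Returns 0 on success, -1 on failure. */
static int build_instance_name(char* out, size_t cap, const char* service_name, long pid) {
    assert(out != NULL && service_name != NULL && cap > 0);
    char suffix[32];
    int slen = snprintf(suffix, sizeof suffix, "-%ld", pid);
    if (slen < 0 || (size_t)slen >= sizeof suffix) {
        return -1;
    }
    size_t limit = (cap - 1 < INSTANCE_NAME_MAX) ? cap - 1 : INSTANCE_NAME_MAX;
    if ((size_t)slen >= limit) {
        return -1; /* no room left for any part of the service name */
    }
    size_t keep = strlen(service_name);
    if (keep > limit - (size_t)slen) {
        keep = limit - (size_t)slen;
    }
    memcpy(out, service_name, keep);
    memcpy(out + keep, suffix, (size_t)slen + 1); /* copies the terminator too */
    return 0;
}
```

A provider would typically pass `(long)getpid()` as the last argument. If two providers in different hosts end up with the same name, conflict resolution is left to the mDNS daemon, as described above.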

A shell command or the existing mDNSResponder command-line tool would be a useful debugging aid.

PengZheng · 2024-02-22T06:09:18Z

Given that #710 has been merged, is this PR ready now? @xuzhenbao @pnoltes
Note that parts of #699 may need to be updated into this.

xuzhenbao · 2024-02-22T07:34:44Z

> Given that #710 has been merged, is this PR ready now? @xuzhenbao @pnoltes Note that parts of #699 may need to be updated into this.

I will update this PR; please wait a few days.

codecov-commenter · 2024-02-27T07:16:28Z

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 89.45%. Comparing base (423abb8) to head (7b3ea01).
Report is 13 commits behind head on master.

Additional details and impacted files

@@            Coverage Diff             @@
##           master     #690      +/-   ##
==========================================
+ Coverage   89.32%   89.45%   +0.13%     
==========================================
  Files         216      216              
  Lines       25153    25294     +141     
==========================================
+ Hits        22468    22627     +159     
+ Misses       2685     2667      -18

xuzhenbao · 2024-02-27T08:27:59Z

Hi @pnoltes
I have updated this PR. It includes the following changes:

1. Add documentation for the dynamic IP mechanism.
2. Add documentation for PR Feature/dfi cleanup #699 to the libdfi README.md.

pnoltes · 2024-02-27T21:41:45Z

> Hi @pnoltes I have updated this PR. It includes the following changes:
>
> 1. Add documentation for the dynamic IP mechanism.
> 2. Add documentation for PR [Feature/dfi cleanup #699](https://github.com/apache/celix/pull/699) to the libdfi README.md.

Thanks, LGTM.

xuzhenbao and others added 4 commits November 20, 2023 16:30

Improve doc for RSA

5c8883e

Improve doc for RSA

3bad2b3

Merge branch 'apache:master' into improve_doc

14ea913

Merge branch 'apache:master' into improve_doc

891db5f

PengZheng requested review from pnoltes and PengZheng November 23, 2023 12:03

PengZheng approved these changes Nov 24, 2023

PengZheng linked an issue Nov 24, 2023 that may be closed by this pull request

Outdated top level README.md of remote_services #669

Closed

pnoltes requested changes Nov 28, 2023

PengZheng reviewed Dec 19, 2023

Update doc of remote service

bac6049

PengZheng linked an issue Jan 1, 2024 that may be closed by this pull request

Document libdfi with a top level README.md #637

Closed

xuzhenbao added 2 commits January 1, 2024 14:13

Merge branch 'master' of https://github.com/xuzhenbao/celix into impr…

d161681

…ove_doc

Merge branch 'improve_doc' of https://github.com/xuzhenbao/celix into…

90f02c1

… improve_doc

xuzhenbao mentioned this pull request Jan 1, 2024

Improve zeroconf discovery #710

Merged

PengZheng requested a review from pnoltes January 2, 2024 02:11

PengZheng mentioned this pull request Jan 28, 2024

Feature/dfi cleanup #699

Merged

xuzhenbao added 2 commits February 26, 2024 09:14

Merge branch 'master' of https://github.com/xuzhenbao/celix into impr…

2f9a93a

…ove_doc

Add document for libdfi and dynamic ip mechanism

7b3ea01

pnoltes approved these changes Feb 27, 2024

xuzhenbao merged commit bc09982 into apache:master Feb 28, 2024
16 checks passed
