Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve doc #690

Merged
merged 9 commits into from
Feb 28, 2024
Merged

Improve doc #690

merged 9 commits into from
Feb 28, 2024

Conversation

xuzhenbao
Copy link
Contributor

@xuzhenbao xuzhenbao commented Nov 23, 2023

Improve document for RSA and DFI

It fixes #510.

Copy link
Contributor

@PengZheng PengZheng left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM.

Nice work! I learned a lot from reviewing it. The dfi documentation is especially valuable.
There are ambiguities which need clarification, though.
And by reading it, I found a serious design flaw of discovery_zeroconf that needs to be addressed by another PR, which illustrates from another aspect the usefulness of this PR.


Because We will perform the mDNS query only using link-local multicast, so we set domain name default value "local".

To reduce the operation of conversion between host name and address info. we set the address info to txt record, and set the host name and port to a dummy value("celix_rpc_dumb_host.local." and "50009").
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The content of TXT is set by the service provider. This design will force them to watch for address change, which violates the rationale of zeroconf and suggests a severe design flaw. I'll have a good read of RFC 6762 and 6763 this weekend before making further comments on this.


![remote_service_proxy_use_seq.png](diagrams/remote_service_proxy_use_seq.png)

In the above process, each consumer of the remote service will have a different service proxy, because the service proxy needs to use the interface description file in the consumer (which may be a bundle) to serialize the service call information.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This does not feel right. Currently, service descriptor is bundled by its various consumers.
THis is far less than ideal. We'd better let service discovery and RSA fetch service descriptors from service providers at runtime. Alternatively, we can generate the descriptor from the service header, i.e. solve #590.

In the same process, there should be one unique copy of service descriptor for each remote service instance.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the same process, there should be one unique copy of service descriptor for each remote service instance.

I think this is the key issue. Only 1 unique descriptor for a service per process.

Ideally the consumer and provider are also aligned, but this is not always necessary. It is possible for the provider to provide newer version of the service, as long as it is backwards compatible.

For example a consumer using the following service:

#define FOO_SERVICE_NAME "foo"
#define FOO_SERVICE_VERSION "1.0.0"

struct {
  void* handle;
  int foo(const char* string); 
} my_service

Can use a remote! provided service with the following service:

#define FOO_SERVICE_NAME "foo"
#define FOO_SERVICE_VERSION "1.1.0"

struct {
  void* handle;
  int bar(const char* string);
  int foo(const char* string); 
} my_service

Note that because bar is added before foo the updated service is not binary backwards compatible inside the same process. But for remote calls, as long as we can identity which method signature is called, different service versions is possible.

For the pubsub serializer handler different version is actual handled:
https://github.com/apache/celix/blob/rel/celix-2.4.0/bundles/pubsub/pubsub_utils/src/pubsub_serializer_handler.c#L237-L242

I also though there was a check on the literal descriptor content, but apparently not.


|**Identifier**|B |D |F |I |J |S |V |Z |b | i | j | s |P |t |N |
|---------|---|------|-----|-------|-------|-------|----|--------------|-----|--------|--------|--------|------|------|---|
|**Types**|char|double|float|int32_t|int64_t|int16_t|void|boolean(uint8)|uchar|uint32_t|uint64_t|uint16_t|void *|char *|int|
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Now that we have *, why we need t for char *?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the t is not really necessary anymore. IIRC I added the t for 2 reasons:

  1. Although C uses a char* as string, in other languages a string type is often a special built-in type.
  2. In the beginning of libdfi there was not support to specific if something was a pointer or not, so t was needed to specify a char*, now *B can be used.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe we still need char * for plain char * and t for null-terminate c string.


*Type schema*:
~~~
L(name);//shortcut for *l(name);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Now that we have *, why we need *l?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Given the support for *, I think we can drop L and t so that we do not have alternatives so "say" the same thing.

@PengZheng PengZheng linked an issue Nov 24, 2023 that may be closed by this pull request
@pnoltes
Copy link
Contributor

pnoltes commented Nov 27, 2023

It's great to see some sorely missed documentation being added. I will try to review this PR this week.

Copy link
Contributor

@pnoltes pnoltes left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice work :). I do have some remarks, but overall this looks good.

bundles/remote_services/README.md Outdated Show resolved Hide resolved
bundles/remote_services/discovery_zeroconf/README.md Outdated Show resolved Hide resolved

To reduce the operation of conversion between host name and address info. we set the address info to txt record, and set the host name and port to a dummy value("celix_rpc_dumb_host.local." and "50009").

We set the instance name of the mDNS service as `service_name + hash(endpoint uuid)`. If there is a conflict in the instance name, mDNS_daemon will resolve it. Since the maximum size of the mDNS service instance name is 64 bytes, we take the hash of the endpoint uuid here, which also reduces the probability of instance name conflicts.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is the service name still used, a UUID should be unique enough. Is this for readability/debugging?

Copy link
Contributor

@PengZheng PengZheng Dec 1, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Usage of UUID in instance name is explicitly discourage by RFC 6763:

The default name should be short and
descriptive, and SHOULD NOT include the device's Media Access Control
(MAC) address, serial number, or any similar incomprehensible
hexadecimal string in an attempt to make the name globally unique.

The rational is explained here.

Therefore, UUID should be removed from instance name altogether.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice to see the usage of plant UML for some diagrams 😄

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Again nice to see some sequence diagrams. Especially for a complex sequence like a remote call over the remote service admin 👍

bundles/remote_services/rsa_rpc_json/README.md Outdated Show resolved Hide resolved

![remote_service_proxy_use_seq.png](diagrams/remote_service_proxy_use_seq.png)

In the above process, each consumer of the remote service will have a different service proxy, because the service proxy needs to use the interface description file in the consumer (which may be a bundle) to serialize the service call information.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the same process, there should be one unique copy of service descriptor for each remote service instance.

I think this is the key issue. Only 1 unique descriptor for a service per process.

Ideally the consumer and provider are also aligned, but this is not always necessary. It is possible for the provider to provide newer version of the service, as long as it is backwards compatible.

For example a consumer using the following service:

#define FOO_SERVICE_NAME "foo"
#define FOO_SERVICE_VERSION "1.0.0"

struct {
  void* handle;
  int foo(const char* string); 
} my_service

Can use a remote! provided service with the following service:

#define FOO_SERVICE_NAME "foo"
#define FOO_SERVICE_VERSION "1.1.0"

struct {
  void* handle;
  int bar(const char* string);
  int foo(const char* string); 
} my_service

Note that because bar is added before foo the updated service is not binary backwards compatible inside the same process. But for remote calls, as long as we can identity which method signature is called, different service versions is possible.

For the pubsub serializer handler different version is actual handled:
https://github.com/apache/celix/blob/rel/celix-2.4.0/bundles/pubsub/pubsub_utils/src/pubsub_serializer_handler.c#L237-L242

I also though there was a check on the literal descriptor content, but apparently not.


|**Identifier**|B |D |F |I |J |S |V |Z |b | i | j | s |P |t |N |
|---------|---|------|-----|-------|-------|-------|----|--------------|-----|--------|--------|--------|------|------|---|
|**Types**|char|double|float|int32_t|int64_t|int16_t|void|boolean(uint8)|uchar|uint32_t|uint64_t|uint16_t|void *|char *|int|
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the t is not really necessary anymore. IIRC I added the t for 2 reasons:

  1. Although C uses a char* as string, in other languages a string type is often a special built-in type.
  2. In the beginning of libdfi there was not support to specific if something was a pointer or not, so t was needed to specify a char*, now *B can be used.


*Type schema*:
~~~
L(name);//shortcut for *l(name);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Given the support for *, I think we can drop L and t so that we do not have alternatives so "say" the same thing.

@PengZheng
Copy link
Contributor

PengZheng commented Dec 1, 2023

I've finished reading RFC 6762 and 6763 (word by word), and have some further remarks on discovery_zeroconf:

Large TXT Records

As previously pointed out by @pnoltes in an email:

One of the issue we had with mDNS, is that payload size is very
limited and as result the remote service discovery was updated to a 2
step approach:

The current design adopt the approach sketched by Service Instances with Multiple TXT Records
.
For this to work reliably, care must be taken to make sure that IP fragmentation does not happen, because

Even on hosts that normally
handle Ethernet "Jumbo" packets and IP fragment reassembly, it is
becoming more common for these hosts to implement power-saving modes
where the main CPU goes to sleep and hands off packet reception tasks
to a more limited processor in the network interface hardware, which
may not support Ethernet "Jumbo" packets or IP fragment reassembly.

A snapshot of mDNS wireshark capture will be very helpful to illustrate the current approach.

From the current documentation (I have not checked the implementation), it is not clear how we deal with packet loss.
For service whose service advertisement fit in a single message, it is never a problem. But for services generating large TXT records, how do we guarantee to collect all associated TXT records.

Do we have a test case for packet loss? I fully understand it is challenging to test this. But if we do have one, it rocks.

Fixed Hostname and Port in SRV

As mentioned above, this basically defeats the purpose of zeroconf. Don't do this!
Let just use the hostname of the OS.
For RSA over HTTP, port should be the real port listened by the HTTP server.
For RSA over shared memory, port could be 0.

It is OK to store URL in TXT records, but note that

The target host name and TCP (or UDP) port number of the service are
given in the SRV record. This information -- target host name and
port number -- MUST NOT be duplicated using key/value attributes in
the TXT record.

URL/URI does not necessarily contain host and port. For more, check URI Syntax Components.

Service Instance Name and Service Subtypes

If the application uses rsa_shm alone, then only service instances exported by rsa_shm should be discovered.
This is not the case (and less than ideal) for the current implementation: all service instances are enumerated by discovery_zeroconf. By using service subtypes, if the application only uses rsa_shm, discovery_zeroconf should search for something like _shm._sub._celix-rpc._udp rather than _celix-rpc._udp. For a user who want to list all exported services for debugging purpose, he/she can search for _celix-rpc._udp. Check Selective Instance Enumeration for more on service subtypes.

We only relies on the uniqueness of service instance name, otherwise It's mostly used for debugging.
Fortunately, we only need to provide an informative name, mDNSResponder will help to guarantee its uniqueness:

The default name should be short and
descriptive, and SHOULD NOT include the device's Media Access Control
(MAC) address, serial number, or any similar incomprehensible
hexadecimal string in an attempt to make the name globally unique.
For discussion of why names don't need to be (and SHOULD
NOT be) made unique at the factory, see Appendix D, "Choice of
Factory-Default Names".

Service name alone is not informative enough. I suggest we include pid of the service provider, which is very important for debugging a misbehaved remote service.

A shell command or the existing mDNSResponder command tool should be a useful debugging tool.

libs/dfi/README.md Outdated Show resolved Hide resolved
@PengZheng PengZheng linked an issue Jan 1, 2024 that may be closed by this pull request
@PengZheng
Copy link
Contributor

PengZheng commented Feb 22, 2024

Given that #710 has been merged, is this PR ready now? @xuzhenbao @pnoltes
Note that parts of #699 may need to be updated into this.

@xuzhenbao
Copy link
Contributor Author

Given that #710 has been merged, is this PR ready now? @xuzhenbao @pnoltes Note that parts of #699 may need to be updated into this.

I will update this PR, please wait a few day.

@codecov-commenter
Copy link

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 89.45%. Comparing base (423abb8) to head (7b3ea01).
Report is 13 commits behind head on master.

Additional details and impacted files
@@            Coverage Diff             @@
##           master     #690      +/-   ##
==========================================
+ Coverage   89.32%   89.45%   +0.13%     
==========================================
  Files         216      216              
  Lines       25153    25294     +141     
==========================================
+ Hits        22468    22627     +159     
+ Misses       2685     2667      -18     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@xuzhenbao
Copy link
Contributor Author

Hi @pnoltes
I have update this PR, it includes the following changes:

  1. Add document for dynamic IP mechanism.
  2. Add document for PR Feature/dfi cleanup #699 in libdfi README.md

@pnoltes
Copy link
Contributor

pnoltes commented Feb 27, 2024

Hi @pnoltes I have update this PR, it includes the following changes:

1. Add document for dynamic IP mechanism.

2. Add document for PR [Feature/dfi cleanup #699](https://github.com/apache/celix/pull/699) in libdfi README.md

Thanks, LGTM.

@xuzhenbao xuzhenbao merged commit bc09982 into apache:master Feb 28, 2024
16 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
4 participants