(This document is based on curl's CONTRIBUTE.md - thank you!)
This document is intended to offer guidelines on how to best contribute to the librdkafka project. This concerns new features as well as bug fixes and general improvements.
When contributing code, you agree to put your changes and new code under the same license librdkafka is already using, unless stated and agreed otherwise.
When changing existing source code, you do not alter the copyright of the original file(s). The copyright will still be owned by the original creator(s) or those who have been assigned copyright by the original author(s).
By submitting a patch to librdkafka, you are assumed to have the right to the code and to be allowed by your employer (or anyone else concerned) to hand over that patch/code to us. We will credit you for your changes as far as possible, both to give credit and to keep a trace of who made which changes. Please always provide us with your full real name when contributing!
Official librdkafka project maintainer(s) assume ownership and copyright ownership of all accepted submissions.
librdkafka maintains a strict API and ABI compatibility guarantee: we guarantee not to break existing applications, and we honour the SONAME version.
Note: ABI compatibility is guaranteed only for the C library, not C++.
Note to librdkafka maintainers:
We don't think we can or should bump the SONAME version: it would break all
existing applications relying on librdkafka, and no change is important
enough to warrant that.
Instead, deprecate (but keep) old APIs and add new, better APIs as required.
Deprecate APIs through documentation (@deprecated ..) rather than
compiler hints (RD_DEPRECATED), since the latter will cause compilation
warnings/errors for users.
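For illustration, the following is a minimal sketch of that deprecation approach, assuming a hypothetical old API rd_kafka_parse_port() that is superseded by a hypothetical rd_kafka_parse_port2() (neither function exists in librdkafka). The old function keeps its exact signature and behaviour, is marked deprecated only in its doxygen comment, and is reimplemented as a thin wrapper around the new function. No RD_DEPRECATED attribute is used, so existing applications keep building without new warnings.

```c
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>

/**
 * @brief Parse \p str as a TCP port number. (Hypothetical new API.)
 *
 * @param errstr Optional (may be NULL) buffer for a human-readable
 *               error string.
 * @param errstr_size Size of \p errstr.
 *
 * @returns the port number (1..65535), or -1 on error.
 */
int rd_kafka_parse_port2 (const char *str, char *errstr,
                          size_t errstr_size) {
        char *end;
        long port;

        errno = 0;
        port  = strtol(str, &end, 10);
        if (end == str || *end != '\0' || errno != 0 || port < 1 ||
            port > 65535) {
                if (errstr)
                        snprintf(errstr, errstr_size,
                                 "Invalid port \"%s\"", str);
                return -1;
        }

        return (int)port;
}

/**
 * @brief Parse \p str as a TCP port number. (Hypothetical old API.)
 *
 * @returns the port number (1..65535), or -1 on error.
 *
 * @deprecated Use rd_kafka_parse_port2(), which also provides an
 *             error string. This function is kept unchanged to
 *             preserve API and ABI compatibility.
 */
int rd_kafka_parse_port (const char *str) {
        /* Deprecated through documentation only: no RD_DEPRECATED
         * compiler hint, so users get no new warnings/errors.
         * The old API simply wraps the new one. */
        return rd_kafka_parse_port2(str, NULL, 0);
}
```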
Existing public APIs MUST NEVER be changed, as this would be a breaking API and ABI change. This line must never be crossed.
This means that no changes are allowed to:
- public function or method signatures - arguments, types, return values.
- public structs - existing fields may not be modified and new fields must not be added.
As for semantic changes (i.e., a function changes its behaviour), these are allowed under the following conditions:
- the existing behaviour that is changed is not documented and not widely relied upon. Typically this revolves around what error codes a function returns.
- the existing behaviour is well known but is clearly wrong and consistently trips people up.
All such changes must be clearly stated in the "Upgrade considerations" section of the release in CHANGELOG.md.
Since changes to existing APIs are strictly limited by the above rules, new APIs must be carefully designed to be complete and future-proof, since once they have been introduced they can never be changed.
- Never add public structs: the few public structs in librdkafka were all mistakes and have all been headaches. Instead, add private types and provide accessor methods to set/get values. This allows future extension without breaking existing applications.
- Avoid adding synchronous APIs; if possible, make them asynchronous by using rd_kafka_queue_t result queues. This may complicate the APIs a bit, but they are usually abstracted away in higher-level language clients, and this approach allows both synchronous and asynchronous usage.
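A minimal sketch of the "private type plus accessor methods" pattern follows, assuming a hypothetical rd_kafka_foo_t object (not a real librdkafka type). The public header only forward-declares the struct, in the same spirit as rd_kafka_conf_t being an opaque type in rdkafka.h, so applications can only hold a pointer and must go through the accessor functions. Both halves are shown in one block for brevity; in practice the struct definition would live in a private source file.

```c
#include <stdlib.h>

/* ---- Public header: only an incomplete type is exposed ---- */
typedef struct rd_kafka_foo_s rd_kafka_foo_t; /* Hypothetical type */

rd_kafka_foo_t *rd_kafka_foo_new (void);
void rd_kafka_foo_set_timeout_ms (rd_kafka_foo_t *foo, int timeout_ms);
int rd_kafka_foo_timeout_ms (const rd_kafka_foo_t *foo);
void rd_kafka_foo_destroy (rd_kafka_foo_t *foo);

/* ---- Private implementation (would live in a .c file) ----
 * Applications never see this layout, so fields can be added in
 * later releases without breaking the ABI. */
struct rd_kafka_foo_s {
        int foo_timeout_ms;
        /* New fields can be added here in a future release. */
};

rd_kafka_foo_t *rd_kafka_foo_new (void) {
        rd_kafka_foo_t *foo = calloc(1, sizeof(*foo));

        if (foo)
                foo->foo_timeout_ms = 1000; /* Default value */
        return foo;
}

void rd_kafka_foo_set_timeout_ms (rd_kafka_foo_t *foo, int timeout_ms) {
        foo->foo_timeout_ms = timeout_ms;
}

int rd_kafka_foo_timeout_ms (const rd_kafka_foo_t *foo) {
        return foo->foo_timeout_ms;
}

void rd_kafka_foo_destroy (rd_kafka_foo_t *foo) {
        free(foo);
}
```

Because applications never depend on sizeof(struct rd_kafka_foo_s), adding a field later does not change the ABI, whereas adding a field to a public struct would.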
librdkafka is highly portable and needs to stay that way; this means we're limited to almost-but-not-quite C99, and to standard library (libc, et al.) functions that are generally available across platforms.
Also avoid adding new dependencies, since dependency availability across platforms and package managers is a common problem.
If an external dependency is required, make sure that it is available as a vcpkg package, and also add it as a source build dependency to mklove (see mklove/modules/configure.libcurl for an example) so that it can be built and linked statically into librdkafka as part of the packaging process.
Less is more. Don't try to be fancy, be boring.
When writing C code, follow the code style already established in the project. Consistent style makes code easier to read and mistakes less likely to happen.
clang-format is used to check, and fix, the style for C/C++ files, while flake8 and autopep8 are used for the Python scripts.
You must check the style before committing by running make style-check-changed from the top-level directory, and if any style errors are reported you can automatically fix them using make style-fix-changed (or just run that command directly).
The Python code may need some manual fixing since autopep8 is unable to fix all warnings reported by flake8; in particular, it will not split long lines, in which case a # noqa: E501 comment may be needed to turn off the warning.
See the end of this document for the C style guide to use in librdkafka.
It is annoying when you get a huge patch from someone that claims to fix 511 odd problems, but discussions and opinions don't agree with 510 of them, or 509 of them were already fixed in a different way. Then the person merging this change needs to extract the single interesting patch from somewhere within the huge pile of source, and that creates a lot of extra work.
Preferably, each fix that corrects a problem should be in its own patch/commit, with its own description/commit message stating exactly what it corrects, so that all changes can be selectively applied by the maintainer or other interested parties.
Also, separate changes make bisecting much easier when we track down problems and regressions in the future.
Please try to make your patches against the latest master branch.
Bugfixes should also include a new test case in the regression test suite that verifies the bug is fixed. Create a new tests/00-<short_bug_description>.c file and try to reproduce the issue in its simplest form. Verify that the test case fails for earlier versions and passes with your bugfix in place.
New features and APIs should also result in an added test case.
Submitted patches must pass all existing tests. For more information on the test suite, see tests/README.md.
File a pull request on GitHub.
Your change will be reviewed and discussed there, and you will be expected to correct the flaws pointed out and update accordingly, or the change risks stalling and eventually being dropped without action. As the submitter of a change, you are the owner of that change until it has been merged.
Make sure to monitor your PR on GitHub and answer questions and/or fix nits/flaws. This is very important. We will take a lack of replies as a sign that you're not very anxious to get your patch accepted, and we tend to simply drop such changes.
When you adjust your pull requests after review, please squash the commits so that we can review the full updated version more easily and keep the history cleaner.
For example:
# Interactive rebase to let you squash/fixup commits
$ git rebase -i master
# Mark fixes-on-fixes commits as 'fixup' (or just 'f') in the
# first column. These will be silently integrated into the
# previous commit, so make sure to move the fixup-commit to
# the line beneath the parent commit.
# Since this probably rewrote the history of previously pushed
# commits you will need to make a force push, which is usually
# a bad idea but works well for pull requests.
$ git push --force origin your_feature_branch
A short guide to how to write good commit messages.
---- start ----
[area]: [short line describing the main effect] [(#issuenumber)]
-- empty line --
[full description, no wider than 72 columns, that describes as much as
possible why this change is made, and possibly what things
it fixes and everything else that is related]
---- stop ----
Example:
cgrp: Restart query timer on all heartbeat failures (#10023)

If unhandled errors were received in HeartbeatResponse
the cgrp could get stuck in a state where it would not
refresh its coordinator.
Important: Rebase your PR branch on top of master (git rebase -i master) and squash interim commits (to make a clean and readable git history) before pushing. Use force push to keep your history clean even after the initial PR push.
Note: Good PRs with bad commit messages or a messy commit history (such as "fixed review comment") will be squashed into a single commit with a proper commit message.
If the changes in the PR affect the end user in any way, such as a user-visible bug fix, new feature, API or doc change, etc., a release changelog item needs to be added to CHANGELOG.md for the next release.
Add a single line to the appropriate section (Enhancements, Fixes, ..) outlining the change, an issue number (if any), and your name or GitHub user id for attribution.
E.g.:
## Enhancements
* Improve commit() async parameter documentation (Paul Nit, #123)
Note: The code format style is enforced by our clang-format and pep8 rules, so that is not covered here.
The C standard used is a mix of C89 and C99, to be compatible with old MSVC versions.
Notably, it is C99 with the following limitations:
- No variable declarations after statements.
- No in-line variable declarations.
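The two limitations above can be illustrated with a small helper; rd_kafka_example_sum_positive() is hypothetical and made up for this sketch. Declarations are allowed at the head of any scope, including a nested block, but not after statements or inside a for() header.

```c
/**
 * @brief Sum the positive values of the \p cnt elements in \p arr.
 *        (Hypothetical helper, for illustration only.)
 */
int rd_kafka_example_sum_positive (const int *arr, int cnt) {
        int i;       /* Declared at the head of the function scope */
        int sum = 0;

        /* Not allowed: for (int i = 0 ; i < cnt ; i++)
         * Not allowed: a declaration after a statement, such as
         *              sum = 0; int n; */
        for (i = 0 ; i < cnt ; i++) {
                int v = arr[i]; /* OK: head of a new block scope */

                if (v > 0)
                        sum += v;
        }

        return sum;
}
```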
Use self-explanatory hierarchical snake-case naming.
Pretty much all symbols should start with rd_kafka_, followed by their subsystem (e.g., cgrp, broker, buf, etc.), followed by an action (e.g., find, get, clear, ..).
The exceptions are:
- Protocol requests and fields, which use their Apache Kafka CamelCase names, e.g., rd_kafka_ProduceRequest() and int16_t ErrorCode.
- Public APIs that closely mimic their Apache Kafka Java counterparts, e.g., the Admin API: rd_kafka_DescribeConsumerGroups().
For existing types, use the type prefix as the variable name. The type prefix is typically the first part of the struct member field names. Example: rd_kafka_broker_t has field names starting with rkb_.., thus broker variables should be named rkb.
Be consistent in using the same variable name for the same type throughout the code; this makes reading the code much easier, as the type can be easily inferred from the variable name.
For other types use reasonably concise but descriptive names.
i and j are typical int iterators.
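As a sketch of these naming rules, assume a hypothetical internal type rd_kafka_counter_t whose fields share the made-up prefix rkcnt_. Following the convention, every variable of that type is then named rkcnt, and functions are named rd_kafka_<subsystem>_<action>. None of these names exist in librdkafka.

```c
#include <string.h>

/* Hypothetical internal type: every field shares the rkcnt_ prefix,
 * so variables of this type are named rkcnt. */
typedef struct rd_kafka_counter_s {
        const char *rkcnt_name;
        int rkcnt_value;
} rd_kafka_counter_t;

/**
 * @brief Find the counter named \p name among the \p cnt \p counters.
 *
 * @returns the matching counter, or NULL if not found.
 */
rd_kafka_counter_t *rd_kafka_counter_find (rd_kafka_counter_t *counters,
                                           int cnt, const char *name) {
        int i;

        for (i = 0 ; i < cnt ; i++) {
                rd_kafka_counter_t *rkcnt = &counters[i];

                if (!strcmp(rkcnt->rkcnt_name, name))
                        return rkcnt;
        }

        return NULL;
}

/**
 * @brief Reset the value of \p rkcnt to zero.
 */
void rd_kafka_counter_clear (rd_kafka_counter_t *rkcnt) {
        rkcnt->rkcnt_value = 0;
}
```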
Variables must be declared at the head of a scope, no in-line variable declarations after statements are allowed.
For internal functions, assume that all function parameters are properly specified; there is no need to check arguments for non-NULL, etc. Any misuse internally is a bug, and not something we need to preemptively protect against. The test suites should cover most of the code anyway, so put your efforts there instead.
For arguments that may be NULL, i.e., optional arguments, we explicitly document in the function docstring that the argument is optional (NULL), but there is no need to do this for non-optional arguments.
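The following hypothetical internal helper (rd_kafka_value_format() is not a real librdkafka function) sketches both rules: the mandatory buf argument is used without a NULL check, while the optional prefix argument is documented as possibly NULL and handled accordingly.

```c
#include <stdio.h>

/**
 * @brief Write \p val, optionally preceded by \p prefix, to \p buf.
 *        (Hypothetical helper, for illustration only.)
 *
 * @param prefix Optional prefix string, may be NULL.
 *
 * @returns the value returned by snprintf(), i.e., the number of
 *          characters that would have been written.
 */
int rd_kafka_value_format (char *buf, size_t size, int val,
                           const char *prefix) {
        /* No NULL check on buf: passing NULL would be an internal bug.
         * prefix is documented as optional, so NULL must be handled. */
        return snprintf(buf, size, "%s%d", prefix ? prefix : "", val);
}
```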
Use 8-space indentation with spaces, no tabs (the same indentation width as the Linux kernel).
In emacs, use c-set-style "linux".
For C++, use Google's C++ style.
Fix formatting issues by running make style-fix-changed prior to committing.
Use /* .. */ comments, not // ..
For functions, use doxygen syntax, e.g.:
/**
 * @brief <short description>
 * ..
 * @returns <something..>
 */
Make sure to comment non-obvious code and situations where the full context of an operation is not easily graspable.
Also make sure to update existing comments when the code changes.
Try hard to keep line length below 80 characters; when this is not possible, exceed it only with good reason.
Braces go on the same line as their enveloping statement:
int some_func (..) {
        while (1) {
                if (1) {
                        do something;
                        ..
                } else {
                        do something else;
                        ..
                }
        }

        /* Single line scopes should not have braces */
        if (1)
                hi();
        else if (2)
                /* Say hello */
                hello();
        else
                bye();
}
All expression parentheses should be prefixed and suffixed with a single space:
int some_func (int a) {
        if (1)
                ....;

        for (i = 0 ; i < 19 ; i++) {
        }
}
Use spaces around operators:
int a = 2;
if (b >= 3)
        c += 2;
Except for these:
d++;
--e;
New blocks should be on a new line:
if (1)
        new();
else
        old();
Don't assume the reader knows C operator precedence by heart for complex statements, add parentheses to ease readability and make the intent clear.
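Two hypothetical helpers (made up for this sketch, including the EX_STATE_* bits) show why explicit parentheses help. In the first, the parentheses are required for correctness, since == binds tighter than &; in the second they are not strictly needed, since && binds tighter than ||, but they make the intent obvious to the reader.

```c
/* Hypothetical state bits: the low two bits hold a state value. */
#define EX_STATE_MASK 0x3
#define EX_STATE_UP   0x2

/**
 * @returns 1 if the state bits in \p flags equal EX_STATE_UP, else 0.
 */
int rd_kafka_example_state_is_up (int flags) {
        /* Wrong: `flags & EX_STATE_MASK == EX_STATE_UP` parses as
         * `flags & (EX_STATE_MASK == EX_STATE_UP)`, i.e. `flags & 0`,
         * because == binds tighter than &. */
        return (flags & EX_STATE_MASK) == EX_STATE_UP;
}

/**
 * @returns 1 if an operation should be retried, else 0.
 */
int rd_kafka_example_should_retry (int is_retriable, int attempts,
                                   int max_attempts, int force) {
        /* Same meaning without the parentheses, but clearer with them. */
        return force || (is_retriable && attempts < max_attempts);
}
```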
Avoid #ifdefs as much as possible. Platform support checking should be performed in configure.librdkafka.
Follow Google's C++ style guide.