Server-sent events (HTTP) #86
Replies: 3 comments
-
Hi @derrickoswald, I agree, SSE makes a lot of sense. It's just not so common in public APIs, but if you are in control of the API implementation, it can definitely be a nice fit.

Regarding how you would go about implementing a Kafka Connect source connector for it: it's not that different from any other source connector requiring a subscription to an external system, and you already described the gist of it. The source task lifecycle is the way it is: pull-based, continuously polling for new records. I'm afraid that if you want to use Kafka Connect, you can't fight this. One way to go about it is to create a buffer where you accumulate all messages pushed by the server, and serve them in batches whenever Kafka Connect polls for them. The main challenges with this approach come from the fact that this buffer is a finite piece of state shared between two threads, Kafka Connect's and the one receiving server events. So:

- access to the buffer must be thread-safe, and
- the buffer must be bounded, which means you need a policy for when it fills up (block the SSE thread, or drop events).
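A minimal sketch of such a buffer using only the JDK (class and method names here are illustrative, not part of kafka-connect-http): a bounded `BlockingQueue` that the SSE thread offers into, and that the Kafka Connect thread drains in batches from inside `poll()`.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

// Illustrative buffer between an SSE listener thread and Kafka Connect's polling thread.
public class EventBuffer {
    private final BlockingQueue<String> queue;

    public EventBuffer(int capacity) {
        // Bounded on purpose: an unbounded buffer would grow without limit
        // if the server pushes faster than Kafka Connect polls.
        this.queue = new ArrayBlockingQueue<>(capacity);
    }

    // Called from the SSE listener thread. Returns false when the buffer is
    // full, so the caller can decide whether to block, retry, or drop.
    public boolean offer(String event) {
        return queue.offer(event);
    }

    // Called from the Kafka Connect thread (inside SourceTask.poll()):
    // atomically moves everything accumulated so far into a batch.
    public List<String> drain() {
        List<String> batch = new ArrayList<>();
        queue.drainTo(batch);
        return batch;
    }

    public static void main(String[] args) {
        EventBuffer buffer = new EventBuffer(1024);
        buffer.offer("event-1");
        buffer.offer("event-2");
        System.out.println(buffer.drain()); // [event-1, event-2]
    }
}
```

`drainTo` keeps the handoff atomic with respect to the queue, and what to do in `offer` when the buffer is full is exactly the kind of policy decision this shared, finite state forces on you.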
I hope this helps.
-
I forgot to mention: Kafka Connect is the way it is probably for a good reason, as this generalized source-connector abstraction enables pretty much all sorts of integration patterns while still offering all the benefits of Kafka Connect, such as managing the lifecycle of your running connectors, generalizing their configuration, deployment, etc. I'm sorry if I'm saying something too obvious, but you might be after something more specific, in which case you can just use the Kafka Producer client to push messages to Kafka straight from your own code as a result of every server-sent event. Either way, I'm happy to help any way I can with your implementation. Best regards.
-
It looks like the way to go is a simple HTTP client feeding the Kafka Producer.
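For that route, the HTTP client also has to decode the `text/event-stream` framing itself. A rough JDK-only sketch of that parsing step (blank-line-delimited events, multiple `data:` lines joined with a newline; `event:`, `id:`, and `retry:` fields are ignored here for brevity), where the parsed payloads are what you would hand to the Kafka Producer:

```java
import java.util.ArrayList;
import java.util.List;

// Minimal parser for the text/event-stream framing an SSE endpoint emits:
// each event is a block of "field: value" lines terminated by a blank line.
public class SseParser {
    public static List<String> parse(String stream) {
        List<String> events = new ArrayList<>();
        StringBuilder data = new StringBuilder();
        for (String line : stream.split("\n", -1)) {
            if (line.isEmpty()) {                  // blank line: dispatch accumulated event
                if (data.length() > 0) {
                    events.add(data.toString());
                    data.setLength(0);
                }
            } else if (line.startsWith("data:")) { // multiple data lines join with '\n'
                if (data.length() > 0) data.append('\n');
                data.append(line.substring(5).trim());
            }
            // event:, id:, retry: fields and ":" comments are ignored in this sketch
        }
        return events;
    }

    public static void main(String[] args) {
        String sample = "event: update\ndata: {\"id\": 1}\n\ndata: hello\ndata: world\n\n";
        System.out.println(parse(sample));
    }
}
```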
-
An alternative to Change Data Capture is to use server-sent events (SSE) where the endpoint has implemented support for it.
Just as an example:
curl --no-buffer --header "Accept: text/event-stream" https://cybre.space/api/v1/streaming/public
The naive approach of adding `http.request.params=Accept=text/event-stream` to the plugin properties for kafka-connect-http leads to `failed to poll records from SourceTask`, as obviously the `org.apache.kafka.connect.source.SourceTask` is based on polling to request updates.

I suppose one could collect the streamed events and return the collection when asked by `poll()`, but this looks like the `SourceTask` is just getting in the way, and a better solution would be to use a `SourceConnector` where the `taskClass()` is just a `Task`, i.e. a hypothetical `HttpTask` extending `Task`.

Would Kafka still try to call `SourceTask` methods even though there is no override of the `taskClass` method in `SourceConnector`?
If not, how does that work, where the `poll()` method is called on a `Task` instance?
If so, is there a work-around that would force Kafka to only call `start()` and `stop()` on a hypothetical `HttpTask`?