From 3d99b2f9d6196bb9ad4d59b9eb6a3547d4356b2f Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?=C5=81ukasz=20Wala?= Date: Tue, 23 Jul 2024 14:49:20 +0200 Subject: [PATCH 1/4] WIP: improvements to the introductory tutorials --- .../{introduction => advanced}/modifying.md | 4 +- guides/introduction/forwarding.md | 6 +- guides/introduction/intro.md | 2 +- guides/introduction/negotiation.md | 170 ++++++++++-------- mix.exs | 5 +- 5 files changed, 106 insertions(+), 81 deletions(-) rename guides/{introduction => advanced}/modifying.md (97%) diff --git a/guides/introduction/modifying.md b/guides/advanced/modifying.md similarity index 97% rename from guides/introduction/modifying.md rename to guides/advanced/modifying.md index a75cb93..da64d77 100644 --- a/guides/introduction/modifying.md +++ b/guides/advanced/modifying.md @@ -1,6 +1,6 @@ # Modifying the session -So far, we focused on forwarding the data back to the same peer. Usually, you want to connect with multiple peers, which means adding +In the introductory tutorials we focused on forwarding the data back to the same peer. Usually, you want to connect with multiple peers, which means adding more PeerConnection to the Elixir app, like in the diagram below. ```mermaid @@ -31,7 +31,7 @@ new negotiation has to take place! > > But what does that even mean? > Each transceiver is responsible for sending and/or receiving a single track. When you call `PeerConnection.add_track`, we actually look for a free transceiver -> (that is, one that is not sending a track already) and use it, or create a new transceiver if we don' find anything suitable. If you are very sure +> (that is, one that is not sending a track already) and use it, or create a new transceiver if we don't find anything suitable. If you are very sure > that the remote peer added _N_ new video tracks, you can add _N_ video transceivers (using `PeerConnection.add_transceiver`) and begin the negotiation as > the offerer. 
If you didn't add the transceivers, the tracks added by the remote peer (the answerer) would be ignored. diff --git a/guides/introduction/forwarding.md b/guides/introduction/forwarding.md index cf91675..f771acb 100644 --- a/guides/introduction/forwarding.md +++ b/guides/introduction/forwarding.md @@ -21,7 +21,7 @@ The `packet` is an RTP packet. It contains the media data alongside some other u > RTP is a network protocol created for carrying real-time data (like media) and is used by WebRTC. > It provides some useful features like: > -> * sequence numbers: UDP (which is usually used by WebRTC) does not provide ordering, thus we need this to catch missing or out-of-order packets +> * sequence numbers: UDP (which is usually used by WebRTC) does not provide packet ordering, thus we need this to catch missing or out-of-order packets > * timestamp: these can be used to correctly play the media back to the user (e.g. using the right framerate for the video) > * payload type: thanks to this combined with information in the SDP offer/answer, we can tell which codec is carried by this packet > @@ -63,8 +63,8 @@ defmodule Forwarder do {:ok, _sender} = PeerConnection.add_track(pc, audio_track) {:ok, _sender} = PeerConnection.add_track(pc, video_track) - # in_tracks (tracks we will receive media from) = %{id => kind} - # out_tracks (tracks we will send media to) = %{kind => id} + # in_tracks (tracks we will receive from the browser) = %{id => kind} + # out_tracks (tracks we will send to the browser) = %{kind => id} out_tracks = %{audio: audio_track.id, video: video_track.id} {:ok, %{pc: pc, out_tracks: out_tracks, in_tracks: %{}}} end diff --git a/guides/introduction/intro.md b/guides/introduction/intro.md index c28326b..9618da7 100644 --- a/guides/introduction/intro.md +++ b/guides/introduction/intro.md @@ -30,4 +30,4 @@ your web application. Here are some example use cases: In general, all of the use cases come down to getting media from one peer to another. 
In the case of Elixir WebRTC, one of the peers is usually a server, like your Phoenix app (although it doesn't have to - there's no concept of server/client in WebRTC, so you might as well connect two browsers or two Elixir peers). -This is what the next section of this tutorial series will focus on - we will try to get media from a web browser to a simple Elixir app. +This is what the next tutorials will focus on - we will try to get media from a web browser to a simple Elixir app. diff --git a/guides/introduction/negotiation.md b/guides/introduction/negotiation.md index 14156e6..df06f32 100644 --- a/guides/introduction/negotiation.md +++ b/guides/introduction/negotiation.md @@ -2,31 +2,29 @@ Before starting to send or receive media, you need to negotiate the WebRTC connection first, which comes down to: -* specifying to your WebRTC peer what you want to send and/or receive (like video or audio tracks) -* exchanging information necessary to establish a connection with the other WebRTC peer -* starting the data transmission. +1. Specifying to your WebRTC peer what you want to send and/or receive (like video or audio tracks). +2. Exchanging information necessary to establish a connection with the other WebRTC peer. +3. Starting the data transmission. We'll go through this process step-by-step. -## Web browser +> #### Code snippets {: .warning} +> These tutorials include code snippetes showing how your implementation _might_ look like. +> For comprehensive, working examples take a look at the [examples](https://github.com/elixir-webrtc/ex_webrtc/tree/master/examples) +> in the `ex_webrtc` repository. -Let's start from the web browse side of things. Let's say we want to send the video from your webcam and audio from your microphone to the Elixir app. +## Offer and answer exchange -Firstly, we'll create the `RTCPeerConnection` - this object represents a WebRTC connection with a remote peer. 
Further on, it will be our interface to all
 of the WebRTC-related stuff.
 
 ```js
-const opts = { iceServers: [{ urls: "stun:stun.l.google.com:19302" }] }
-const pc = new RTCPeerConnection(opts)
+// the `iceServers` option will be explained at the end of this tutorial
+const pc = new RTCPeerConnection({ iceServers: [{ urls: "stun:stun.l.google.com:19302" }] })
 ```
 
-> #### ICE servers {: .info}
-> Arguably, the most important configuration option of the `RTCPeerConnection` is the `iceServers`.
-> It is a list of STUN/TURN servers that the PeerConnection will try to use. You can learn more about
-> it in the [MDN docs](https://developer.mozilla.org/en-US/docs/Web/API/RTCPeerConnection/RTCPeerConnection) but
-> it boils down to the fact that lack of any STUN servers might cause you trouble connecting with other peers, so make sure
-> something is there.
-
 Next, we will obtain the media tracks from the webcam and microphone using `mediaDevices` JavaScript API.
 
 ```js
@@ -51,38 +49,40 @@ const offer = await pc.createOffer();
 await pc.setLocalDescription(offer);
 ```
 
-> #### Offers, answers, and SDP {: .info}
-> Offers and answers contain information about your local `RTCPeerConnection`, like tracks, codecs, IP addresses, encryption fingerprints, and more.
-> All of that is encoded in a text format called SDP. You, as the user, generally can very successfully use WebRTC without ever looking into what's in the SDP,
+> #### Offers, answers and SDP {: .info}
+> Offers and answers contain information about your local `RTCPeerConnection`, like added tracks, codecs, IP addresses, encryption fingerprints, and more.
+> All of that is carried in a text format called SDP. 
One of the WebRTC peers has to create an offer, to which the other responds with an answer in order
+> to negotiate the conditions of various aspects of media transmission.
+>
+> You, as the user, can very successfully use WebRTC without ever looking into what's in the SDP,
 > but if you wish to learn more, check out the [SDP Anatomy](https://webrtchacks.com/sdp-anatomy/) tutorial from _webrtcHacks_.
 
 Next, we need to pass the offer to the other peer - in our case, the Elixir app. The WebRTC standard does not specify how to
 do this. Here, we will just assume that the offer was sent to the Elixir app using some kind of WebSocket relay service that
 we previously connected to, but generally it
-doesn't matter how you get the offer from the other peer.
+doesn't matter how you get the offer from one peer to the other.
 
 ```js
 const json = JSON.stringify(offer);
-webSocket.send(json);
+webSocket.send_offer(json);
 ```
 
 Let's handle the offer in the Elixir app next.
 
-## Elixir app
+> #### PeerConnection configuration {: .info}
+> There are quite a lot of configuration options for the `ExWebRTC.PeerConnection`.
+> You can find all of them in the `ExWebRTC.PeerConnection.Configuration` module docs. For instance, all of the JavaScript `RTCPeerConnection` events
+> like `track` or `icecandidate` in Elixir WebRTC are simply messages sent by the `ExWebRTC.PeerConnection` process to the process that
+> called `ExWebRTC.PeerConnection.start_link/2` by default. This can be changed by using the `start_link(controlling_process: pid)` option!
 
 Before we do anything else, we need to set up the `PeerConnection`, similar to what we have done in the web browser. The main difference
-between Elixir and JavaScript WebRTC API is that, in Elixir, `PeerConnection` is a process. Also, remember to set up the `ice_servers` option!
+between Elixir and JavaScript WebRTC API is that, in Elixir, `PeerConnection` is a process.
 
 ```elixir
 # PeerConnection in Elixir WebRTC is a process! 
+# take a look at the very end of the tutorial to learn what the `ice_servers` option is
 {:ok, pc} = ExWebRTC.PeerConnection.start_link(ice_servers: [%{urls: "stun:stun.l.google.com:19302"}])
 ```
 
-> #### PeerConnection configuration {: .info}
-> There is quite a lot of configuration options for the `ExWebRTC.PeerConnection`.
-> You can find all of them in `ExWebRTC.PeerConnection.Configuration`. For instance, all of the JavaScript `RTCPeerConnection` events
-> like `track` or `icecandidate` in Elixir WebRTC are simply messages sent by the `ExWebRTC.PeerConnection` process sent to the process that
-> called `ExWebRTC.PeerConnection.start_link/2` by default. This can be changed by using the `start_link(controlling_process: pid)` option!
-
 Then we can handle the SDP offer that was sent from the web browser.
 
 ```elixir
@@ -115,75 +115,99 @@ Now we create the answer, set it, and send it back to the web browser.
 answer
 |> ExWebRTC.SessionDescription.to_json()
 |> Jason.encode!()
-|> web_socket_send()
+|> web_socket_send_answer()
 ```
 
-> #### PeerConnection can be bidirectional {: .tip}
-> Here we have only shown you how to receive data from a browser in the Elixir app, but, of course, you
-> can also send data from Elixir's `PeerConnection` to the browser.
->
-> Just be aware of this for now, you will learn more about sending data using Elixir WebRTC in the next tutorial.
-
 Now the `PeerConnection` process should send messages to its parent process announcing remote tracks - each of the messages
 maps to one of the tracks added on the JavaScript side.
 
 ```elixir
 receive do
   {:ex_webrtc, ^pc, {:track, %ExWebRTC.MediaStreamTrack{}}} ->
-    # we will learn what you can do with the track later
+    # we will learn what you can do with the track in the next tutorial
 end
 ```
 
-> #### ICE candidates {: .info}
-> ICE candidates are, simplifying a bit, the IP addresses that PeerConnection will try to use to establish a connection with the other peer. 
-> A PeerConnection will produce a new ICE candidate every now and then, that candidate has to be sent to the other WebRTC peer -> (using any medium, i.e. the same WebSocket relay used for the offer/answer exchange, or some other way). -> -> In JavaScript: -> -> ```js -> pc.onicecandidate = event => webSocket.send(JSON.stringify(event.candidate)); -> webSocket.onmessage = candidate => pc.addIceCandidate(JSON.parse(candidate)); -> ``` -> -> And in Elixir: -> -> ```elixir -> receive do -> {:ex_webrtc, ^pc, {:ice_candidate, candidate}} -> -> candidate -> |> ExWebRTC.ICECandidate.to_json() -> |> Jason.encode!() -> |> web_socket_send() -> -> {:web_socket, {:ice_candidate, json}} -> -> candidate = -> json -> |> Jason.decode!() -> |> ExWebRTC.ICECandidate.from_json() +> #### PeerConnection can be bidirectional {: .tip} +> Here we have only shown you how to receive data from a browser in the Elixir app, but, of course, you +> can also send data from Elixir's `PeerConnection` to the browser. > -> ExWebRTC.PeerConnection.add_ice_candidate(pc, candidate) -> end -> ``` +> Just be aware of this for now, you will learn more about sending data using Elixir WebRTC in the next tutorial. + -Lastly, we need to set the answer on the JavaScript side. +Lastly, we need to set the answer in the web browser. ```js answer = JSON.parse(receive_answer()); await pc.setRemoteDescription(answer); ``` -The process of the offer/answer exchange is called _negotiation_. After negotiation has been completed, the connection between the peers can be established, and media -flow can start. - -You can determine that the connection was established by listening for `{:ex_webrtc, _from, {:connection_state_change, :connected}}` message -or by handling the `onconnectionstatechange` event on the JavaScript `RTCPeerConnection`. +The process of the offer/answer exchange is called _negotiation_. 
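The negotiated SDP text itself is easy to poke at if you're curious. As a toy, framework-free illustration (the helper below is made up for this example — it is not part of the WebRTC or Elixir WebRTC APIs), you could list the media sections an offer contains:

```javascript
// Hypothetical helper (illustration only, not a WebRTC API):
// lists the media sections ("m=" lines) found in an SDP string.
function mediaSections(sdp) {
  return sdp
    .split(/\r?\n/)
    .filter((line) => line.startsWith("m="))
    .map((line) => line.slice(2).split(" ")[0]); // keep only the media kind
}

// A heavily trimmed SDP fragment, just enough to exercise the helper.
const exampleSdp = [
  "v=0",
  "o=- 0 0 IN IP4 127.0.0.1",
  "s=-",
  "m=audio 9 UDP/TLS/RTP/SAVPF 111",
  "m=video 9 UDP/TLS/RTP/SAVPF 96",
].join("\r\n");

console.log(mediaSections(exampleSdp)); // → [ 'audio', 'video' ]
```

A real offer carries much more (codecs, fingerprints, ICE parameters), but the `m=` lines are a quick way to see which audio/video sections were negotiated.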
 > #### Renegotiations {: .info}
 > We've just gone through the first negotiation, but you'll need to repeat the same steps after you added/removed tracks
 > to your `PeerConnection`. The need for renegotiation is signaled by the `negotiationneeded` event in JavaScript or by the
 > `{:ex_webrtc, _from, :negotiation_needed}` message in Elixir WebRTC. You will learn more about how to properly conduct
-> a renegotiation with multiple PeerConnectins present in section TODO.
+> a renegotiation with multiple PeerConnections present in the [Modifying the session](./../advanced/modifying.md) tutorial.
+
+## ICE and candidate exchange
+
+ICE is a protocol used by WebRTC to establish a peer-to-peer connection. It works by exchanging something called _ICE candidates_
+between the peers using some kind of separate medium (similar to the offer/answer exchange). These candidates, simplifying a bit, contain IP addresses that the other
+peer will try to use to connect to your machine. ICE will try to find a pair of these addresses (one for each peer) and establish a connection.
+
+> #### Why aren't the candidates in the offer/answer? {: .info}
+> ICE candidates can be included in the offer or the answer, but generally they are not - you send them separately.
+> Gathering a candidate can take anywhere from nearly no time at all to a few seconds, depending on the type of the candidate.
+> The PeerConnection will asynchronously produce the "quicker" candidates so you can send them to the other peer and try to establish
+> a connection as quickly as possible. If any of the later candidates happens to be more suitable (or the previous ones did not succeed), the PeerConnection will use it instead.
+
+The PeerConnection will gather these candidates, but it is your responsibility (similarly to the offer/answer exchange, again) to send them to the other peer.
+
+In JavaScript:
+
+```js
+// the end of candidates will be signalled by event.candidate === null
+pc.onicecandidate = event => webSocket.send_ice_candidate(JSON.stringify(event.candidate));
+webSocket.onIceCandidate = candidate => pc.addIceCandidate(JSON.parse(candidate));
+```
+
+And in Elixir:
+
+```elixir
+receive do
+  {:ex_webrtc, ^pc, {:ice_candidate, candidate}} ->
+    candidate
+    |> ExWebRTC.ICECandidate.to_json()
+    |> Jason.encode!()
+    |> web_socket_send_ice_candidate()

+  {:web_socket, {:ice_candidate, json}} ->
+    candidate =
+      json
+      |> Jason.decode!()
+      |> ExWebRTC.ICECandidate.from_json()
+
+    ExWebRTC.PeerConnection.add_ice_candidate(pc, candidate)
+end
+```
+
+After the candidate exchange, the connection should eventually be established and media will start to flow!
+You can tell by listening for the `{:ex_webrtc, _from, {:connection_state_change, :connected}}` message
+or by handling the `onconnectionstatechange` event on the JavaScript `RTCPeerConnection`.
+
+> #### ICE servers {: .info}
+> Remember when we created the `RTCPeerConnection` object at the beginning of this tutorial? It was configured
+> with the `iceServers` option:
+>
+> ```js
+> const pc = new RTCPeerConnection({ iceServers: [{ urls: "stun:stun.l.google.com:19302" }] })
+> ```
+>
+> It is a list of STUN/TURN servers that the PeerConnection will try to use. You can learn more about
+> it in the [MDN docs](https://developer.mozilla.org/en-US/docs/Web/API/RTCPeerConnection/RTCPeerConnection) but
+> it boils down to the fact that the lack of any STUN servers might cause you trouble connecting with other peers, so make sure
+> there's at least one STUN server there. You can find a list of publicly available STUN servers online.
 
 You might be wondering how you can do something with the media data in the Elixir app. While in the JavaScript API you are limited to e.g. 
attaching tracks to video elements on a web page, diff --git a/mix.exs b/mix.exs index 14e7e8e..b0a01d3 100644 --- a/mix.exs +++ b/mix.exs @@ -73,7 +73,8 @@ defmodule ExWebRTC.MixProject do end defp docs() do - intro_guides = ["intro", "negotiation", "forwarding", "consuming", "modifying"] + intro_guides = ["intro", "negotiation", "forwarding", "consuming"] + advanced_guides = ["modifying", "mastering_transceivers"] [ main: "readme", @@ -81,7 +82,7 @@ defmodule ExWebRTC.MixProject do extras: ["README.md"] ++ Enum.map(intro_guides, &"guides/introduction/#{&1}.md") ++ - Path.wildcard("guides/advanced/*.md"), + Enum.map(advanced_guides, &"guides/advanced/#{&1}.md"), source_ref: "v#{@version}", formatters: ["html"], before_closing_body_tag: &before_closing_body_tag/1, From f7f19d58f442e2e67da24348a35e866038e114a4 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?=C5=81ukasz=20Wala?= Date: Tue, 23 Jul 2024 15:36:00 +0200 Subject: [PATCH 2/4] Remove redundant information from forwarding chapter --- guides/introduction/forwarding.md | 104 ++++++++---------------------- 1 file changed, 28 insertions(+), 76 deletions(-) diff --git a/guides/introduction/forwarding.md b/guides/introduction/forwarding.md index f771acb..916e613 100644 --- a/guides/introduction/forwarding.md +++ b/guides/introduction/forwarding.md @@ -39,37 +39,28 @@ flowchart LR WB((Web Browser)) <-.-> PC ``` -The only thing we have to implement is the `Forwarder` GenServer. Let's combine the ideas from the previous section to write it. +The only thing we have to implement is the `Forwarder` process. In practice, making it a `GenServer` would be probably the +easiest and that's what we are going to do here. Let's combine the ideas from the previous section to write it. 
```elixir -defmodule Forwarder do - use GenServer - - alias ExWebRTC.{PeerConnection, ICEAgent, MediaStreamTrack, SessionDescription} - - @ice_servers [%{urls: "stun:stun.l.google.com:19302"}] - - @impl true - def init(_) do - {:ok, pc} = PeerConnection.start_link(ice_servers: @ice_servers) - - # we expect to receive two tracks from the web browser - one for audio, one for video - # so we also need to add two tracks here, we will use these to forward media - # from each of the web browser tracks - stream_id = MediaStreamTrack.generate_stream_id() - audio_track = MediaStreamTrack.new(:audio, [stream_id]) - video_track = MediaStreamTrack.new(:video, [stream_id]) - - {:ok, _sender} = PeerConnection.add_track(pc, audio_track) - {:ok, _sender} = PeerConnection.add_track(pc, video_track) - - # in_tracks (tracks we will receive from the browser) = %{id => kind} - # out_tracks (tracks we will send to the browser) = %{kind => id} - out_tracks = %{audio: audio_track.id, video: video_track.id} - {:ok, %{pc: pc, out_tracks: out_tracks, in_tracks: %{}}} - end - - # ... 
+def init(_) do + {:ok, pc} = PeerConnection.start_link(ice_servers: [%{urls: "stun:stun.l.google.com:19302"}]) + + # we expect to receive two tracks from the web browser - one for audio, one for video + # so we also need to add two tracks here, we will use these to forward media + # from each of the web browser tracks + stream_id = MediaStreamTrack.generate_stream_id() + audio_track = MediaStreamTrack.new(:audio, [stream_id]) + video_track = MediaStreamTrack.new(:video, [stream_id]) + + {:ok, _sender} = PeerConnection.add_track(pc, audio_track) + {:ok, _sender} = PeerConnection.add_track(pc, video_track) + + # in_tracks (tracks we will receive from the browser) = %{id => kind} + # out_tracks (tracks we will send to the browser) = %{kind => id} + in_tracks = %{} + out_tracks = %{audio: audio_track.id, video: video_track.id} + {:ok, %{pc: pc, out_tracks: out_tracks, in_tracks: in_tracks}} end ``` @@ -77,7 +68,7 @@ We started by creating the PeerConnection and adding two tracks (one for audio a Remember that these tracks will be used to *send* data to the web browser peer. Remote tracks (the ones we will set up on the JavaScript side, like in the previous tutorial) will arrive as messages after the negotiation is completed. -> #### Where are the tracks? {: .tip} +> #### What are the tracks? {: .tip} > In the context of Elixir WebRTC, a track is simply a _track id_, _ids_ of streams this track belongs to, and a _kind_ (audio/video). > We can either add tracks to the PeerConnection (these tracks will be used to *send* data when calling `PeerConnection.send_rtp/4` and > for each one of the tracks, the remote peer should fire the `track` event) @@ -96,34 +87,10 @@ will arrive as messages after the negotiation is completed. > > If you want to know more about transceivers, read the [Mastering Transceivers](https://hexdocs.pm/ex_webrtc/mastering_transceivers.html) guide. -Next, we need to take care of the offer/answer and ICE candidate exchange. 
As in the previous tutorial, we assume that there's some kind -of WebSocket relay service available that will forward our offer/answer/candidate messages to the web browser and back to us. - -```elixir -@impl true -def handle_info({:web_socket, {:offer, offer}}, state) do - :ok = PeerConnection.set_remote_description(state.pc, offer) - {:ok, answer} = PeerConnection.create_answer(state.pc) - :ok = PeerConnection.set_local_description(state.pc, answer) - - web_socket_send(answer) - {:noreply, state} -end +Next, we need to take care of the offer/answer and ICE candidate exchange. This can be done the exact same way as in the previous +tutorial, so we won't get into here. -@impl true -def handle_info({:web_socket, {:ice_candidate, cand}}, state) do - :ok = PeerConnection.add_ice_candidate(state.pc, cand) - {:noreply, state} -end - -@impl true -def handle_info({:ex_webrtc, _from, {:ice_candidate, cand}}, state) do - web_socket_send(cand) - {:noreply, state} -end -``` - -Now we can expect to receive messages with notifications about new remote tracks. +After the negotiation, we can expect to receive messages with notifications about new remote tracks. Let's handle these and match them with the tracks that we are going to send to. We need to be careful not to send packets from the audio track on a video track by mistake! @@ -154,28 +121,13 @@ end > change between two tracks, the payload types are dynamically assigned and may differ between RTP sessions), and some RTP header extensions. All of that is > done by Elixir WebRTC behind the scenes, but be aware - it is not as simple as forwarding the same piece of data! -Lastly, let's take care of the client-side code. It's nearly identical to what we have written in the previous tutorial. +Lastly, let's take care of the client-side code. It's nearly identical to what we have written in the previous tutorial, +except for the fact that we need to handle tracks added by the Elixir's PeerConnection. 
 ```js
-const localStream = await navigator.mediaDevices.getUserMedia({audio: true, video: true});
-const pc = new RTCPeerConnection({iceServers: [{urls: "stun:stun.l.google.com:19302"}]});
-localStream.getTracks().forEach(track => pc.addTrack(track, localStream));
-
-// these will be the tracks that we added using `PeerConnection.add_track`
+// these will be the tracks that we added using `PeerConnection.add_track` in Elixir
+// but be careful! even for the same track, the ids might be different for each of the peers
 pc.ontrack = event => videoPlayer.srcObject = event.streams[0];
-
-// sending/receiving the offer/answer/candidates to the other peer is your responsibility
-pc.onicecandidate = event => send_to_other_peer(event.candidate);
-on_cand_received(cand => pc.addIceCandidate(cand));
-
-// remember that we set up the Elixir app to just handle the incoming offer
-// so we need to generate and send it (and thus, start the negotiation) here
-const offer = await pc.createOffer();
-await pc.setLocalDescription(offer)
-send_offer_to_other_peer(offer);
-
-const answer = await receive_answer_from_other_peer();
-await pc.setRemoteDescription(answer);
 ```
 
 And that's it! The other peer should be able to see and hear the echoed video and audio. 
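As a side note on the RTP sequence numbers mentioned earlier in this guide: they are 16-bit and wrap around, so a naive `next > previous` check breaks at the boundary. A small, hypothetical sketch (plain functions, not part of any WebRTC API) of counting lost packets might look like this:

```javascript
// RTP sequence numbers are 16-bit (0..65535) and wrap around,
// so distances have to be computed modulo 2^16.
function seqDistance(prev, next) {
  return (next - prev + 65536) % 65536;
}

// How many packets went missing between two consecutively received ones
// (0 means the stream is contiguous). Distances of 32768 or more are
// treated as reordering rather than loss.
function lostBetween(prev, next) {
  const d = seqDistance(prev, next);
  return d > 0 && d < 32768 ? d - 1 : 0;
}

console.log(lostBetween(10, 11));   // → 0
console.log(lostBetween(10, 14));   // → 3
console.log(lostBetween(65535, 2)); // → 2 (loss across the wraparound)
```

This is the kind of bookkeeping the PeerConnection does for you internally; you only need it if you start inspecting raw RTP packets yourself.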
From 5aafdf6f6f1e7b5002f6622ec7dd4fe10ed42df8 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?=C5=81ukasz=20Wala?= Date: Wed, 24 Jul 2024 15:33:20 +0200 Subject: [PATCH 3/4] Add tutorial about consuming the media --- guides/introduction/consuming.md | 122 ++++++++++++++++++++++++++---- guides/introduction/forwarding.md | 2 - 2 files changed, 109 insertions(+), 15 deletions(-) diff --git a/guides/introduction/consuming.md b/guides/introduction/consuming.md index ee6ef03..7228f8d 100644 --- a/guides/introduction/consuming.md +++ b/guides/introduction/consuming.md @@ -1,27 +1,123 @@ # Consuming media data -Other than just forwarding, we probably would like to be able to use the media right in the Elixir app to -e..g feed it to a machine learning model or create a recording of a meeting. +Other than just forwarding, we would like to be able to use the media right in the Elixir app to e.g. +use it as a machine learning model input, or create a recording of a meeting. -In this tutorial, we are going to build on top of the simple app from the previous tutorial by, instead of just sending the packets back, depayloading and decoding -the media, using a machine learning model to somehow augment the video, encode and payload it back into RTP packets and only then send it to the web browser. +In this tutorial, we are going to depayload and decode received video data to use it for ML inference. -## Deplayloading RTP +## Depayloading RTP -We refer to the process of taking the media payload out of RTP packets as _depayloading_. +We refer to the process of getting the media payload out of RTP packets as _depayloading_. It may seem straightforward at first, +we just take the payload of the packets and we get a stream of media data. Sometimes it is that simple, like in the +case of Opus-encoded audio, where each of the RTP packets is, more or less, 20 milliseconds of audio, and that's it. 
 > #### Codecs {: .info}
-> A media codec is a program used to encode/decode digital video and audio streams. Codecs also compress the media data,
+> A media codec is a program/technique used to encode/decode digital video and audio streams. Codecs also compress the media data,
 > otherwise, it would be too big to send over the network (bitrate of raw 24-bit color depth, FullHD, 60 fps video is about 3 Gbit/s!).
 >
-> In WebRTC, most likely you will encounter VP8, H264 or AV1 video codecs and Opus audio codec. Codecs that will be used during the session are negotiated in
-> the SDP offer/answer exchange. You can tell what codec is carried in an RTP packet by inspecting its payload type (`packet.payload_type`,
-> a non-negative integer field) and match it with one of the codecs listed in this track's transceiver's `codecs` field (you have to find
-> the `transceiver` by iterating over `PeerConnection.get_transceivers` as shown previously in this tutorial series).
+> In WebRTC, most likely you will encounter VP8, H264 or AV1 video codecs and the Opus audio codec. Codecs used during the session are negotiated in
+> the SDP offer/answer exchange. You can tell what codec is carried in an RTP packet by inspecting its payload type (the `payload_type` field in the case of Elixir WebRTC).
+> This value should correspond to one of the codecs included in the SDP offer/answer.
 
-_TBD_
+Unfortunately, in other cases, we need to do more work. In video, things are more complex: each video frame is usually split into multiple packets (and
+we need complete frames, not some pieces of encoded video out of context), the video codec does not keep track of timestamps, and there are many other quirks.
+
+Elixir WebRTC provides depayloading utilities for some codecs (see the `ExWebRTC.RTP.` submodules). For instance, when receiving VP8 RTP packets, we could depayload
+the video by doing:
+
+```elixir
+def init(_) do
+  # ...
+  state = %{depayloader: ExWebRTC.RTP.VP8.Depayloader.new()}
+  {:ok, state}
+end
+
+def handle_info({:ex_webrtc, _from, {:rtp, _track_id, nil, packet}}, state) do
+  depayloader =
+    case ExWebRTC.RTP.VP8.Depayloader.write(state.depayloader, packet) do
+      {:ok, depayloader} -> depayloader
+      {:ok, frame, depayloader} ->
+        # we collected a whole frame (it is just a binary)!
+        # we will learn what to do with it in a moment
+        depayloader
+    end
+
+  {:noreply, %{state | depayloader: depayloader}}
+end
+```
+
+Every time we collect a whole video frame consisting of a bunch of RTP packets, `VP8.Depayloader.write` returns it for further processing.
+
+> #### Codec configuration {: .warning}
+> By default, `ExWebRTC.PeerConnection` will use a set of default codecs when negotiating the connection. In such a case, you have to either:
+>
+> * support depayloading/decoding for all of the negotiated codecs
+> * force some specific set of codecs (or even a single codec) in the `PeerConnection` configuration.
+>
+> Of course, the second option is much simpler, but it increases the risk of failing the negotiation, as the other peer might not support your codec of choice.
+> If you still want to do it the simple way, set the codecs in `PeerConnection.start_link`:
+> ```elixir
+> codec = %ExWebRTC.RTPCodecParameters{
+>   payload_type: 96,
+>   mime_type: "video/VP8",
+>   clock_rate: 90_000
+> }
+> {:ok, pc} = ExWebRTC.PeerConnection.start_link(video_codecs: [codec])
+> ```
+> This way, you either will always send/receive VP8 video, or you won't be able to negotiate a video stream at all. At least you won't encounter
+> unpleasant bugs in video decoding!
 
 ## Decoding the media to raw format
 
-_TBD_
+Before we use the video as an input to the machine learning model, we need to decode it into raw format. Video decoding or encoding is a very
+complex and resource-heavy process, so we don't provide anything for that in Elixir WebRTC, but you can use the `xav` library, a simple wrapper over `ffmpeg`,
+to decode the VP8 video. Let's modify the snippet from the previous section to do so.
+
+```elixir
+def init(_) do
+  # ...
+  serving = ... # set up your machine learning model (e.g. using Bumblebee)
+  state = %{
+    depayloader: ExWebRTC.RTP.VP8.Depayloader.new(),
+    decoder: Xav.Decoder.new(:vp8),
+    serving: serving
+  }
+  {:ok, state}
+end
+
+def handle_info({:ex_webrtc, _from, {:rtp, _track_id, nil, packet}}, state) do
+  depayloader =
+    with {:ok, frame, depayloader} <- ExWebRTC.RTP.VP8.Depayloader.write(state.depayloader, packet),
+         {:ok, raw_frame} <- Xav.Decoder.decode(state.decoder, frame) do
+      # raw frame is just a 3D matrix with the shape of resolution x colors (e.g. 1920 x 1080 x 3 for a FullHD, RGB frame)
+      # we can cast it to an Elixir Nx tensor and use it as the machine learning model input
+      # machine learning stuff is out of scope of this tutorial, but you probably want to check out Elixir Nx and friends
+      tensor = Xav.Frame.to_nx(raw_frame)
+      prediction = Nx.Serving.run(state.serving, tensor)
+      # do something with the prediction
+
+      depayloader
+    else
+      {:ok, depayloader} -> depayloader
+      {:error, _err} -> state.depayloader # handle the error, keeping the previous depayloader
+    end
+
+  {:noreply, %{state | depayloader: depayloader}}
+end
+```
+
+We decoded the video, used it as an input to the machine learning model, and got some kind of prediction - do whatever you want with it.
+
+> #### Jitter buffer {: .warning}
+> Do you recall that WebRTC uses UDP under the hood, and UDP does not ensure packet ordering? We could ignore this fact when forwarding the packets (as
+> it was not our job to decode/play/save the media), but now packets out of order can seriously mess up the process of decoding.
+> To remedy this issue, something called a _jitter buffer_ can be used. 
Its basic function +> is to delay/buffer incoming packets by some time, let's say 100 milliseconds, waiting for the packets that might be late. Only if the packets do not arrive after the +> additional 100 milliseconds, we count them as lost. To learn more about jitter buffer, read [this](https://bloggeek.me/webrtcglossary/jitter-buffer/). +> +> As of now, Elixir WebRTC does not provide a jitter buffer, so you either have to build something yourself or wish that such issues won't occur, but if anything +> is wrong with the decoded video, this might be the problem. + +This tutorial shows, more or less, what the [Recognizer](https://github.com/elixir-webrtc/apps/tree/master/recognizer) app does. Check it out, along with other +example apps in the [apps](https://github.com/elixir-webrtc/apps) repository, it's a great reference on how to implement fully-fledged apps based on Elixir WebRTC. diff --git a/guides/introduction/forwarding.md b/guides/introduction/forwarding.md index 916e613..8de12ad 100644 --- a/guides/introduction/forwarding.md +++ b/guides/introduction/forwarding.md @@ -95,7 +95,6 @@ Let's handle these and match them with the tracks that we are going to send to. We need to be careful not to send packets from the audio track on a video track by mistake! ```elixir -@impl true def handle_info({:ex_webrtc, _from, {:track, track}}, state) do state = put_in(state.in_tracks[track.id], track.kind) {:noreply, state} @@ -105,7 +104,6 @@ end We are ready to handle the incoming RTP packets! 
```elixir -@impl true def handle_info({:ex_webrtc, _from, {:rtp, track_id, nil, packet}}, state) do kind = Map.fetch!(state.in_tracks, track_id) id = Map.fetch!(state.out_tracks, kind) From 35a81efee62669c262bffa47a811f6b3727ff5a9 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?=C5=81ukasz=20Wala?= Date: Thu, 25 Jul 2024 12:04:46 +0200 Subject: [PATCH 4/4] Apply requested changes --- guides/introduction/consuming.md | 29 +++++++++++++++++++++-------- guides/introduction/forwarding.md | 4 ++-- guides/introduction/negotiation.md | 29 +++++++++++------------------ 3 files changed, 34 insertions(+), 28 deletions(-) diff --git a/guides/introduction/consuming.md b/guides/introduction/consuming.md index 7228f8d..aecf08e 100644 --- a/guides/introduction/consuming.md +++ b/guides/introduction/consuming.md @@ -3,13 +3,25 @@ Other than just forwarding, we would like to be able to use the media right in the Elixir app to e.g. use it as a machine learning model input, or create a recording of a meeting. -In this tutorial, we are going to depayload and decode received video data to use it for ML inference. +In this tutorial, we are going to learn how to use received media as input for ML inference. -## Depayloading RTP +## From raw media to RTP + +When the browser sends audio or video, it does the following things: + +1. Capturing the media from your peripheral devices, like a webcam or microphone. +2. Encoding the media, so it takes less space and uses less network bandwidth. +3. Packing it into a single or multiple RTP packets, depending on the media chunk (e.g., video frame) size. +4. Sending it to the other peer using WebRTC. -We refer to the process of getting the media payload out of RTP packets as _depayloading_. It may seem straightforward at first, -we just take the payload of the packets and we get a stream of media data. 
Sometimes it is that simple, like in the
-case of Opus-encoded audio, where each of the RTP packets is, more or less, 20 milliseconds of audio, and that's it.
+We have to reverse these steps in order to be able to use the media:
+
+1. We receive the media from WebRTC.
+2. We unpack the encoded media from RTP packets.
+3. We decode the media to a raw format.
+4. We use the media however we like.
+
+We already know how to do step 1 from previous tutorials, and step 4 is completely up to the user, so let's go through steps 2 and 3 in the next sections.

> #### Codecs {: .info}
> A media codec is a program/technique used to encode/decode digital video and audio streams. Codecs also compress the media data,
@@ -19,10 +31,11 @@ case of Opus-encoded audio, where each of the RTP packets is, more or less, 20 m
> the SDP offer/answer exchange. You can tell what codec is carried in an RTP packet by inspecting its payload type (`payload_type` field in the case of Elixir WebRTC).
> This value should correspond to one of the codecs included in the SDP offer/answer.

-Unfortunately, in other cases, we need to do more work. In video, things are more complex: each video frame is usually split into multiple packets (and
-we need complete frames, not some pieces of encoded video out of context), the video codec does not keep track of timestamps, and many other quirks.
+## Depayloading RTP

-Elixir WebRTC provides depayloading utilities for some codecs (see the `ExWebRTC.RTP.` submodules). For instance, when receiving VP8 RTP packets, we could depayload
+We refer to the process of getting the media payload out of RTP packets as _depayloading_. Usually a single video frame is split into
+multiple RTP packets, and in the case of audio, each packet carries, more or less, 20 milliseconds of sound. Fortunately, you don't have to worry about this:
+just use one of the depayloaders provided by Elixir WebRTC (see the `ExWebRTC.RTP.` submodules). 
For instance, when receiving VP8 RTP packets, we could depayload
the video by doing:

```elixir
diff --git a/guides/introduction/forwarding.md b/guides/introduction/forwarding.md
index 8de12ad..00f76a4 100644
--- a/guides/introduction/forwarding.md
+++ b/guides/introduction/forwarding.md
@@ -46,8 +46,8 @@ easiest and that's what we are going to do here. Let's combine the ideas from th
def init(_) do
  {:ok, pc} = PeerConnection.start_link(ice_servers: [%{urls: "stun:stun.l.google.com:19302"}])

-    # we expect to receive two tracks from the web browser - one for audio, one for video
-    # so we also need to add two tracks here, we will use these to forward media
+    # we expect to receive two tracks from the web browser - audio and video
+    # so we also need to add two tracks here, we will use them to loop media back to the browser
    # from each of the web browser tracks
    stream_id = MediaStreamTrack.generate_stream_id()
    audio_track = MediaStreamTrack.new(:audio, [stream_id])
diff --git a/guides/introduction/negotiation.md b/guides/introduction/negotiation.md
index df06f32..f6ddbbd 100644
--- a/guides/introduction/negotiation.md
+++ b/guides/introduction/negotiation.md
@@ -49,13 +49,10 @@ const offer = await pc.createOffer();
await pc.setLocalDescription(offer);
```

-> #### Offers, answers and SDP {: .info}
-> Offers and answers contain information about your local `RTCPeerConnection`, like added tracks, codecs, IP addresses, encryption fingerprints, and more.
-> All of that is carried in a text format called SDP. One of the WebRTC peers has to create an offer, to which the other responds with an answer in order
-> to negotiate the conditions of various aspects of media transmision.
->
-> You, as the user, can very successfully use WebRTC without ever looking into what's in the SDP,
-> but if you wish to learn more, check out the [SDP Anatomy](https://webrtchacks.com/sdp-anatomy/) tutorial from _webrtcHacks_.
+Finally, we have to create and set an offer. 
+It will contain information on how many tracks we want to send, which codecs we want to use, whether we are also willing to receive media, and so on.
+The other side responds with an answer, which can either accept, reject, or partially accept our offer (e.g. accept only the audio tracks).
+Both offer and answer are carried in a text format called SDP. You can read more about it in the [SDP Anatomy](https://webrtchacks.com/sdp-anatomy/) tutorial from _webrtcHacks_.

Next, we need to pass the offer to the other peer - in our case, the Elixir app. The WebRTC standard does not specify how to do this. Here, we will
just assume that the offer was sent to the Elixir app using some kind of WebSocket relay service that we previously connected to, but generally it
@@ -142,13 +139,9 @@ answer = JSON.parse(receive_answer());
await pc.setRemoteDescription(answer);
```

-The process of the offer/answer exchange is called _negotiation_.
-
-> #### Renegotiations {: .info}
-> We've just gone through the first negotiation, but you'll need to repeat the same steps after you added/removed tracks
-> to your `PeerConnection`. The need for renegotiation is signaled by the `negotiationneeded` event in JavaScript or by the
-> `{:ex_webrtc, _from, :negotiation_needed}` message in Elixir WebRTC. You will learn more about how to properly conduct
-> a renegotiation with multiple PeerConnectins present in [Modifying the session](./../advanced/modifying.md) tutorial.
+The process of the offer/answer exchange is called _negotiation_. Here, we've just presented the very first negotiation, but
+the process has to be repeated every time tracks are added or removed (signaled by the `{:ex_webrtc, _from, :negotiation_needed}` message in Elixir WebRTC). You can learn
+about negotiation in more complex scenarios in [Modifying the session](./../advanced/modifying.md). 
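For reference, the Elixir side of this first offer/answer exchange might look roughly like the sketch below. Note that `receive_offer/0` and `send_answer/1` are hypothetical helpers standing in for whatever signaling transport (e.g. the WebSocket relay mentioned above) you use:

```elixir
# A sketch only: receive_offer/0 and send_answer/1 are hypothetical helpers
# wrapping your signaling transport (e.g. a WebSocket relay).
alias ExWebRTC.{PeerConnection, SessionDescription}

{:ok, pc} = PeerConnection.start_link()

# apply the offer created by the browser...
offer = SessionDescription.from_json(receive_offer())
:ok = PeerConnection.set_remote_description(pc, offer)

# ...and respond with an answer
{:ok, answer} = PeerConnection.create_answer(pc)
:ok = PeerConnection.set_local_description(pc, answer)
send_answer(SessionDescription.to_json(answer))
```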
## ICE and candidate exchange

@@ -204,10 +197,10 @@ with `iceServers` options:
>
> ```
> const pc = new RTCPeerConnection({ iceServers: "stun:stun.l.google.com:19302" })
> ```
>
-> It is a list of STUN/TURN servers that the PeerConnection will try to use. You can learn more about
-> it in the [MDN docs](https://developer.mozilla.org/en-US/docs/Web/API/RTCPeerConnection/RTCPeerConnection) but
-> it boils down to the fact that lack of any STUN servers might cause you trouble connecting with other peers, so make sure
-> there's at least one STUN server there. You can find a list of publicly available STUN servers online.
+> It is a list of STUN/TURN servers that the PeerConnection will try to use.
+> These are used by the PeerConnection to generate more ICE candidates of different types, which vastly
+> increases the chances of establishing a connection between peers in some specific cases.
+> You can read more about it in the [MDN docs](https://developer.mozilla.org/en-US/docs/Web/API/RTCPeerConnection/RTCPeerConnection).

You might be wondering how can you do something with the media data in the Elixir app. While in JavaScript API you are limited to e.g. attaching tracks
to video elements on a web page,