Michael Thomson edited this page May 21, 2020 · 57 revisions

Here I'll probably put info before I format it properly into their own pages so you guys have something to work with way before "it looks nice" :) ofc feel free to add stuff here too, formatting doesn't matter as long as it's clear what you mean!


Protocol overview

(C = client, S = server/Sonoff camera)

C => broadcast to local LAN on port 32108
S => replies to the client’s originating port from the (new?) port it’s listening on with a cookie
C => returns the cookie to the server’s listening port directly
S => replies with a slightly modified version of the cookie indicating success
C => sends a ‘client initialising’ packet - client command index reset to 0
S => sends acknowledgement
S => sends a ‘server initialising’ packet - server command index reset to 0
C => sends acknowledgment

From this point on if there’s a gap of > 1250 msec a “Keepalive” exchange is triggered:

S => sends “Marco”
C => replies with “Polo”
C => sends “Marco”
S => replies with “Polo”

This will keep the channel open until the client sends a command which the server will acknowledge as above, with the command index incrementing by one each time.

This works even if the camera is isolated from the WAN (it still makes DNS requests of the local DHCP-supplied resolver), and the RTSP streams can be opened while the camera motion is triggered as above.

I have not yet figured out whether it’s possible to enable the RTSP stream or set the password without using the app, but hopefully others are interested and can help take this reverse engineering further.

Channel discovery:

The camera will respond to a UDP broadcast on the local LAN with a payload of 4 bytes (f1 30 00 00) sent to port 32108. The response will be sent as a unicast to the source port of the machine that sent the broadcast.

The source port in the initial response is to be used for further UDP commands. It looks like the initial response is a cookie that the client sends back, and the source port of that message is used as the reply port by the server.

E.g. if the camera were on 192.168.0.129 and your test machine on 192.168.0.50, then:

Source IP       Source Port    Dest. IP        Dest. Port     Payload          Means
192.168.0.50  : (aaaaa)     -> 192.168.0.255 : 32108 (fixed)  (f1 30 00 00)    What's your port?
192.168.0.129 : server_port -> 192.168.0.50  : (aaaaa)        (f1 41 00 ..)    Talk to me on "server_port", here's a cookie
...
192.168.0.50  : client_port -> 192.168.0.129 : server_port    (f1 41 00 ..)    Talk to me on "client_port", here's your cookie back
192.168.0.129 : server_port -> 192.168.0.50  : client_port    (f1 42 00 ..)    Channel established?

  • aaaaa: Random high port >1024 chosen at runtime
  • 32108: Fixed port for the broadcast to the network, on which the camera is always listening
  • server_port: Random high port >1024 chosen at runtime by the camera for ongoing communication
  • client_port: Random high port >1024 chosen at runtime by the client; can be the same as aaaaa above
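The discovery step described above could be sketched roughly as follows. This is only an illustration: the function names, the default broadcast address `255.255.255.255` and the timeout are assumptions, not something taken from the app or the camera firmware.

```python
import socket

DISCOVERY_PORT = 32108                          # fixed port the camera listens on
DISCOVERY_PAYLOAD = bytes.fromhex("f1300000")   # 4-byte probe from the notes above


def looks_like_cookie(data: bytes) -> bool:
    """True if a reply starts with the F1 41 marker seen in the cookie."""
    return len(data) >= 4 and data[:2] == b"\xf1\x41"


def discover(broadcast_addr: str = "255.255.255.255", timeout: float = 2.0):
    """Broadcast the probe; return (cookie, (camera_ip, server_port)) or None.

    broadcast_addr and timeout are assumed defaults; a subnet broadcast
    address such as 192.168.0.255 may be needed on some networks.
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        sock.setsockopt(socket.SOL_SOCKET, socket.SO_BROADCAST, 1)
        sock.settimeout(timeout)
        sock.sendto(DISCOVERY_PAYLOAD, (broadcast_addr, DISCOVERY_PORT))
        try:
            data, addr = sock.recvfrom(2048)
        except socket.timeout:
            return None
        return (data, addr) if looks_like_cookie(data) else None
    finally:
        sock.close()
```

The address returned alongside the cookie carries the camera's `server_port`, which is where the cookie would then be sent back.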

Cookie format:

The payload of the cookie contains a 4-byte header then the encoded serial number of the camera as follows:

| Byte offset | Value | Purpose(?) |
|---|---|---|
| 0 - 3 | F1 41 00 14 | Constant - present in all cookies |
| 4 - 7 | 45 57 4c 4b | "EWLK" - first third of the serial number |
| 8 - 15 | xx xx xx xx xx xx xx xx | Hexadecimal representation of the numeric part of the serial number |
| 16 - 20 | xx xx xx xx xx | ASCII representation of the alphanumeric suffix in the serial number |
| 21 - 23 | 00 00 00 | Constant |

So my camera with serial number "EWLK-057746-HRWYT" is represented by this cookie: f1 41 00 14 45 57 4c 4b 00 00 00 00 00 00 e1 92 48 52 57 59 54 00 00 00
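As a rough sketch, the cookie layout above can be reproduced from a serial number like this. The function names are made up for illustration, and the zero-padding of the numeric part to six digits when parsing is an assumption based on the single example serial.

```python
COOKIE_HEADER = bytes.fromhex("f1410014")   # constant seen in all cookies


def build_cookie(serial: str) -> bytes:
    """Encode a serial such as 'EWLK-057746-HRWYT' into the 24-byte cookie."""
    prefix, number, suffix = serial.split("-")
    return (
        COOKIE_HEADER
        + prefix.encode("ascii")            # bytes 4-7
        + int(number).to_bytes(8, "big")    # bytes 8-15, numeric part
        + suffix.encode("ascii")            # bytes 16-20
        + b"\x00\x00\x00"                   # bytes 21-23, constant
    )


def parse_cookie(cookie: bytes) -> str:
    """Recover the serial number from a cookie (assumes a 6-digit number)."""
    prefix = cookie[4:8].decode("ascii")
    number = int.from_bytes(cookie[8:16], "big")
    suffix = cookie[16:21].decode("ascii")
    return f"{prefix}-{number:06d}-{suffix}"
```

Running `build_cookie("EWLK-057746-HRWYT")` gives the same bytes as the example cookie above.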

Command protocol:

Seems to follow this pattern, but some messages are repeated (and acknowledged as repeats in the acknowledgement).

  1. Command sent
  2. Acknowledgement of command received
  3. Response sent
  4. Acknowledgement of response received

Payload structure:

Messages are variable length, both individually and when concatenated into larger payloads.

Message header

| Byte offset | Value | Purpose(?) |
|---|---|---|
| 0 | F1 | Constant - present in all commands / acks / responses |
| 1 | D0 | Message sent |
| 2 | xx | Message length MSB |
| 3 | xx | Message length LSB |

Channel identification and message sequence number

| Byte offset | Value | Purpose(?) |
|---|---|---|
| 4 | D1 | Constant |
| 5 | 00 / 02 | Constant; 00 = Commands, 02 = Video frames |
| 6 | xx | Message index (MSB). N.B. Server and client both maintain separate counters |
| 7 | xx | Message index (LSB) |
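The first eight bytes described in the two tables above could be unpacked along these lines. The `MessageHeader` and `parse_header` names are hypothetical; the field layout is taken directly from the tables.

```python
import struct
from typing import NamedTuple


class MessageHeader(NamedTuple):
    msg_type: int   # byte 1, e.g. 0xD0 for a message
    length: int     # bytes 2-3, big endian
    channel: int    # byte 5, 0x00 = commands, 0x02 = video frames
    index: int      # bytes 6-7, big endian message index


def parse_header(data: bytes) -> MessageHeader:
    """Split the first 8 bytes of a message into its header fields."""
    if len(data) < 8:
        raise ValueError("need at least 8 bytes")
    magic, msg_type, length, chan_magic, channel, index = struct.unpack(
        ">BBHBBH", data[:8]
    )
    if magic != 0xF1 or chan_magic != 0xD1:
        raise ValueError("not an F1 .. .. .. D1 message")
    return MessageHeader(msg_type, length, channel, index)
```

For example, the header `f1 d0 00 34 d1 00 00 15` parses as a D0 message of length 52 on the command channel with index 21.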

Command header

| Byte offset | Value | Purpose(?) |
|---|---|---|
| 8 | 88 | Constant |
| 9 | 88 | Constant |
| 10 | 76 | Constant |
| 11 | 76 | Constant |

Command payload

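| Byte offset | Value | Purpose(?) |
|---|---|---|
| 12 | xx | Varies by command (04, 08, 0E, 10 and 14 appear in the commands below) |
| 13 | 00 | Constant so far |
| 14 | 00 | Constant so far |
| 15 | 00 | Constant so far |
| 16 | xx | Varies by command |
| 17 | xx | Varies by command |
| 18-31 | 00 | Constant so far |
| 32 | xx | Varies by command |
| 33 | xx | Varies by command |
| 34 | 00 | Constant so far |
| 35 | 00 | Constant so far |
| 36 | xx | Varies by command |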

Individual commands can be longer than this one. Subsequent commands can be appended in a single message, simply concatenating another command header 88 88 76 76 and then the next variable-length command payload. One, two and four commands have been seen in a single message.

Commands understood so far

(Byte offsets assume a single command per message)

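| Pan/Tilt | Byte 12 | 16 | 17 | 32 | 33 |
|---|---|---|---|---|---|
| Tilt up | 08 | 01 | 10 | 02 | 08 |
| Tilt down | 08 | 01 | 10 | 01 | 08 |
| Pan left | 08 | 01 | 10 | 06 | 08 |
| Pan right | 08 | 01 | 10 | 03 | 08 |

| Video stream | Byte 12 | 16 | 32 |
|---|---|---|---|
| Start SD 640x360 | 14 | 01 | 02 |
| Start HD 1920x1080 | 14 | 01 | 01 |
| Stop stream | 14 | 03 | 00 |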

(The video resolution is also passed back in the message from the camera to announce the video stream, after the start request is received from the client)

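| Sound Monitor | Byte 12 | 16 | 17 |
|---|---|---|---|
| On | 04 | 04 | 00 |
| Off | 04 | 05 | 00 |

| Microphone Gain | Byte 12 | 16 | 17 | 36 |
|---|---|---|---|---|
| Low | 0E | 1A | 81 | 28 |
| Medium | 0E | 1A | 81 | 4B |
| High | 0E | 1A | 81 | 55 |

| Speaker Volume | Byte 12 | 16 | 17 | 36 |
|---|---|---|---|---|
| Low | 08 | 1A | 81 | 28 |
| Medium | 08 | 1A | 81 | 4B |
| High | 08 | 1A | 81 | 55 |

| Motion Detection | Byte 12 | 16 | 17 | 32 | 33 |
|---|---|---|---|---|---|
| Low | 08 | 24 | 03 | xx | 19 |
| Medium | 08 | 24 | 03 | xx | 32 |
| High | 08 | 24 | 03 | xx | 4B |
| Alarm on | 08 | 24 | 03 | 00 | xx |
| Alarm off | 08 | 24 | 03 | 01 | xx |

| Microphone Sensitivity | Byte 12 | 16 | 17 | 36 |
|---|---|---|---|---|
| Indoor (50Hz) | 08 | 60 | 03 | 01 |
| Outdoor (60Hz) | 08 | 60 | 03 | 00 |

| Picture Orientation | Byte 12 | 16 | 17 | 36 |
|---|---|---|---|---|
| Normal | 08 | 70 | 03 | 00 |
| Rotated | 08 | 70 | 03 | 03 |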
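A single-command message from the tables above might be assembled like this. This is a sketch built on one important assumption: the total message size is taken to be 32 bytes plus the value at offset 12. That matches the four complete dumps on this page (client setup, camera setup, video stream announcement, video stream stopped) but has not been confirmed for the commands listed here. `build_command` and `TILT_UP` are illustrative names.

```python
import struct

COMMAND_MAGIC = bytes.fromhex("88887676")   # command header, bytes 8-11


def build_command(index: int, fields: dict) -> bytes:
    """Build one command message from {byte offset: value} pairs.

    ASSUMPTION: total size = 32 + value of byte 12 (inferred from dumps).
    """
    total = 32 + fields[12]
    buf = bytearray(total)                      # everything else stays 00
    buf[0:2] = b"\xf1\xd0"                      # message marker
    struct.pack_into(">H", buf, 2, total - 4)   # length after byte 3
    buf[4:6] = b"\xd1\x00"                      # command channel
    struct.pack_into(">H", buf, 6, index)       # sender's message index
    buf[8:12] = COMMAND_MAGIC
    for offset, value in fields.items():
        if not 12 <= offset < total:
            raise ValueError(f"offset {offset} outside command payload")
        buf[offset] = value
    return bytes(buf)


# Row "Tilt up" from the Pan/Tilt table.
TILT_UP = {12: 0x08, 16: 0x01, 17: 0x10, 32: 0x02, 33: 0x08}
```

With these inputs, `build_command(5, TILT_UP)` yields a 40-byte message starting `f1 d0 00 24 d1 00 00 05 88 88 76 76 08`.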

Less well understood messages

Client setup

(Sent immediately after channel setup and before video feed begins.) Bytes 6-7 (the message index) are set to 00 00, possibly resetting the command counter?

0000  F1 D0 00 64 D1 00 00 00 88 88 76 76 48 00 00 00  ...d......vvH...
0010  10 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
0020  31 32 33 34 35 36 37 38 00 00 00 00 00 00 00 00  12345678........
0030  00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
0040  00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
0050  00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
0060  00 00 00 00 00 00 00 00                          ........

Camera setup(?) sent by camera after client init above is acknowledged

0000  F1 D0 00 28 D1 00 00 00 88 88 76 76 0C 00 00 00  ...(......vv....
0010  10 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
0020  00 00 00 00 00 00 00 00 07 00 00 01              ............

Audio stream (from camera to client)

| Audio "get" control | Byte 12 | 16 |
|---|---|---|
| Start | 04 | 04 |
| Stop | 04 | 05 |

Header f1 d0 01 24 d1 02 xx xx followed by a (fixed?) 288 byte payload. Payload encoding to be determined....

Audio stream (from client to camera)

| Audio "send" control | Byte 12 | 16 | 36 |
|---|---|---|---|
| Start | 10 | 06 | 01 |
| Stop | 04 | 07 | 00 |

Audio frames (Client to camera)

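Header f1 d0 01 54 d1 01 xx xx f2 0f 86 19 40 01 00 00 00 00 00 00 00 00 00 00 (24 bytes). The length field 01 54 gives a (fixed?) 340 bytes after the first four bytes of the message, which leaves 320 bytes of audio after this header.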

Audio data is encoded as 8kHz, mono, A-law

e.g.

00000000:  f1 d0 01 54 d1 01 00 6f  f2 0f 86 19 40 01 00 00  |...T...o....@...|
00000010:  00 00 00 00 00 00 00 00  a4 82 34 27 25 03 b4 a4  |..........4'%...|
00000020:  a5 86 37 27 3a 1a b1 a7  bb 90 3d 26 3b 65 be a1  |..7':.....=&;e..|
00000030:  b9 7a 25 20 3f 9e a7 a3  b2 0c 21 20 37 b7 a0 a1  |.z% ?.....! 7...|
00000040:  82 3d 23 27 16 b8 a3 ba  73 25 20 3f 9d a4 a1 b2  |.=#'....s% ?....|
00000050:  04 24 26 31 87 a5 a7 b7  04 3b 25 37 9b b9 a5 b7  |.$&1.....;%7....|
00000060:  1f 3e 3a 37 9c bf ba b4  1f 3e 3a 34 98 b9 ba b5  |.>:7.....>:4....|
00000070:  07 38 3a 08 83 bb bb 8d  09 3a 38 05 b7 a5 be e9  |.8:......:8.....|
00000080:  33 24 3d f5 bf a4 b3 17  38 27 30 9e bb a7 b1 1a  |3$=.....8'0.....|
00000090:  3a 27 36 85 bb a4 b6 1a  3a 27 31 9b ba a7 b0 1b  |:'6.....:'1.....|
000000a0:  25 26 33 9a a4 a1 b2 04  24 20 33 81 a7 a0 b0 0c  |%&3.....$ 3.....|
000000b0:  26 20 37 b5 a1 a1 89 30  20 27 04 bf a3 a4 e3 3b  |& 7....0 '.....;|
000000c0:  23 3b ec a4 a3 bf 04 27  20 30 8f a6 a6 b5 35 26  |#;.....' 0....5&|
000000d0:  27 02 b7 a7 ba 9b 36 24  3e 17 b1 ba bd e3 36 38  |'.....6$>.....68|
000000e0:  30 61 b4 bf b1 e9 09 32  31 12 83 b3 b0 84 07 33  |0a.....21......3|
000000f0:  32 03 84 bd bf 8c 01 3e  38 02 8e a5 bb 84 36 27  |2......>8.....6'|
00000100:  3b 6b bf a1 b9 67 25 20  3d 85 a7 a1 b1 09 26 21  |;k...g% =.....&!|
00000110:  35 b4 a1 a6 8d 31 21 24  01 b0 a6 a5 84 31 27 3a  |5....1!$.....1':|
00000120:  04 b4 a5 b8 86 08 3b 39  00 8d be be 82 01 3f 3e  |......;9......?>|
00000130:  0d 86 bf be 8d 03 39 39  00 8e bb b9 85 34 25 3e  |......99.....4%>|
00000140:  16 b1 a5 bd cd 33 3a 30  e6 b2 b8 b4 14 32 3f 08  |.....3:0.....2?.|
00000150:  9c b3 bc 8d 05 32 3d 00                           |.....2=.|

...etc.
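Given the 8kHz mono A-law encoding noted above, the audio bytes of a client-to-camera frame could be turned into 16-bit PCM roughly as follows. The decoder follows the standard G.711 A-law expansion; treating everything after the 24-byte header as audio is an assumption drawn from the header listing above, and the function names are illustrative.

```python
import struct


def alaw_to_linear(value: int) -> int:
    """Expand one G.711 A-law byte to a signed 16-bit PCM sample."""
    value ^= 0x55                         # undo the even-bit inversion
    magnitude = (value & 0x0F) << 4       # 4-bit mantissa
    segment = (value & 0x70) >> 4         # 3-bit segment number
    if segment == 0:
        magnitude += 8
    else:
        magnitude += 0x108
        if segment > 1:
            magnitude <<= segment - 1
    return magnitude if value & 0x80 else -magnitude


def decode_audio_frame(message: bytes, header_len: int = 24) -> bytes:
    """Return little-endian 16-bit PCM for the audio part of one frame.

    header_len=24 is an assumption based on the header shown above.
    """
    samples = [alaw_to_linear(b) for b in message[header_len:]]
    return struct.pack("<%dh" % len(samples), *samples)
```

The resulting bytes could be written out with Python's `wave` module (1 channel, 2 bytes per sample, 8000 Hz) to check by ear whether the decoding is plausible.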

Video stream announcement (from camera to client)

00000000:  f1 d0 00 34 d1 00 00 15  88 88 76 76 18 00 00 00  |...4......vv....|
00000010:  02 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
00000020:  03 00 00 00 80 02 00 00  68 01 00 00 52 00 00 00  |........h...R...|
00000030:  00 00 00 00 00 00 00 00                           |........|

Bytes 36 & 37 are the horizontal resolution (little endian)

  • 0x80 0x02 = (16 * 8 + 256 * 2) = 640 (SD stream)
  • 0x80 0x07 = (16 * 8 + 256 * 7) = 1920 (HD Stream)

Bytes 40 & 41 are the vertical resolution (little endian)

  • 0x68 0x01 = (16 * 6 + 8 + 256 * 1) = 360 (SD stream)
  • 0x38 0x04 = (16 * 3 + 8 + 256 * 4) = 1080 (HD stream)
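The same arithmetic can be done with a little-endian unpack. The sketch below reads the resolution out of the announcement dump shown above; `stream_resolution` is a made-up helper name.

```python
import struct

# Video stream announcement, copied from the hex dump above.
ANNOUNCEMENT = bytes.fromhex(
    "f1 d0 00 34 d1 00 00 15 88 88 76 76 18 00 00 00"
    "02 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00"
    "03 00 00 00 80 02 00 00 68 01 00 00 52 00 00 00"
    "00 00 00 00 00 00 00 00"
)


def stream_resolution(message: bytes):
    """Return (width, height) from bytes 36-37 and 40-41, little endian."""
    (width,) = struct.unpack_from("<H", message, 36)
    (height,) = struct.unpack_from("<H", message, 40)
    return width, height


print(stream_resolution(ANNOUNCEMENT))   # (640, 360)
```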

Video stream stopped (from camera to client)

00000000:  f1 d0 00 20 d1 00 00 19  88 88 76 76 04 00 00 00  |... ......vv....|
00000010:  13 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
00000020:  00 00 00 00                                       |....|

Acknowledgments

Sent back to confirm receipt of commands, video/audio data or responses to commands. Minimum payload of 18 bytes, padded with zeroes if there are fewer than five acks in the message, but can be longer. Maximum payload seen for multiple video frames (84 bytes) included 38 message acknowledgements.

| Byte offset | Value | Purpose(?) |
|---|---|---|
| 0 | F1 | Constant - present in all commands / acks / responses |
| 1 | D1 | Message acknowledgment? |
| 2 | xx | Message length MSB |
| 3 | xx | Message length LSB, counting from after this byte |
| 4 | D1 | Constant |
| 5 | xx | 00 for command acknowledgement, 02 for video frames |
| 6 | xx | Number of acks in this message (MSB) |
| 7 | xx | Number of acks in this message (LSB) |
| 8 | xx | Ack #1 MSB |
| 9 | xx | Ack #1 LSB |
| 10 | xx | Ack #2 MSB (or 00) |
| 11 | xx | Ack #2 LSB (or 00) |
| 12 | xx | Ack #3 MSB (or 00) |
| 13 | xx | Ack #3 LSB (or 00) |
| 14 | xx | Ack #4 MSB (or 00) |
| 15 | xx | Ack #4 LSB (or 00) |
| 16 | xx | Ack #5 MSB (or 00) |
| 17 | xx | Ack #5 LSB (or 00) |
| 18 | xx | Ack #6 MSB (optional) |
| 19 | xx | Ack #6 LSB (optional) |
| 20 | xx | Ack #7 MSB (optional) |
| 21 | xx | Ack #7 LSB (optional) |
| .. | xx | etc. to a maximum offset of 83? |
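An acknowledgement following the layout above could be built like this. It is a sketch only: `build_ack` is a hypothetical name, and the choice to leave the padding out of the length field is inferred from the two-ack dump on this page (length 00 08 in an 18-byte packet).

```python
import struct


def build_ack(channel: int, indexes: list) -> bytes:
    """Build an ack for the given message indexes.

    channel: 0x00 for commands, 0x02 for video frames.
    The length field covers D1, channel, count and the acks themselves;
    zero padding up to 18 bytes is NOT counted (inferred from a dump).
    """
    body = struct.pack(">BBH", 0xD1, channel, len(indexes))
    body += b"".join(struct.pack(">H", i) for i in indexes)
    header = struct.pack(">BBH", 0xF1, 0xD1, len(body))
    return (header + body).ljust(18, b"\x00")   # pad short acks with zeroes
```

For instance, `build_ack(0x02, [0x12, 0x13])` produces `f1 d1 00 08 d1 02 00 02 00 12 00 13` followed by six zero bytes.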

Keepalives

Sent back and forth while no other traffic is flowing e.g. video stream. Timeout seems to be around 1-2 seconds of silence before this is initiated.

Either side (client or server) starts the keepalive exchange with a message containing a 4-byte payload: F1 E0 00 00. The other side responds with F1 E1 00 00.

Teardown

If the server doesn't want to talk to the client any more, it sends a 4-byte payload: F1 F0 00 00
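The keepalive and teardown packets described above are simple enough to handle with plain comparisons. A minimal sketch, with illustrative names, of how a client loop might classify them:

```python
from typing import Optional

KEEPALIVE_REQUEST = bytes.fromhex("f1e00000")   # "Marco"
KEEPALIVE_REPLY = bytes.fromhex("f1e10000")     # "Polo"
TEARDOWN = bytes.fromhex("f1f00000")            # server is closing the channel


def reply_for(packet: bytes) -> Optional[bytes]:
    """Return the packet to send back, or None if no reply is needed."""
    if packet == KEEPALIVE_REQUEST:
        return KEEPALIVE_REPLY
    return None


def is_teardown(packet: bytes) -> bool:
    """True if the server has signalled that it no longer wants to talk."""
    return packet == TEARDOWN
```

A client would also send `KEEPALIVE_REQUEST` itself after a quiet period; the notes above put that at roughly 1-2 seconds.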

Video frames

Video is raw h.264 data, encoded in variable lengths from byte offset 8 onwards in the message. Concatenating these payloads results in a file that's playable with ffmpeg or vlc using the command line -demux h264 option. Video frames are distinguished as channel D1 02 and have a distinct message sequence counter from the other D1 00 commands sent from the camera.
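The concatenation described above could look something like this. It is a sketch under simple assumptions: messages arrive in order, and anything on channel D1 02 is video. Camera-to-client audio is listed on this page with the same D1 02 header, so a real client would need some way to tell the two apart, which this does not attempt. `extract_video` is a made-up name.

```python
def extract_video(messages) -> bytes:
    """Concatenate the payload (offset 8 onwards) of D1 02 messages."""
    out = bytearray()
    for msg in messages:
        is_data = msg[0:2] == b"\xf1\xd0"       # F1 D0: message sent
        is_video = msg[4:6] == b"\xd1\x02"      # channel 02: video frames
        if len(msg) > 8 and is_data and is_video:
            out += msg[8:]
    return bytes(out)
```

Writing the result to a file gives the raw stream that the paragraph above describes playing back.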

First video frames received:

0000  F1 D0 04 04 D1 02 00 01 F1 0F 86 19 63 4B 00 00  ............cK..
0010  F4 E3 81 00 01 00 00 00 61 66 0A 00 01 00 00 00  ........af......
0020  03 00 00 00 00 00 00 00 00 00 00 01 47 4D 40 1E  ............GM@.
0030  99 A0 28 0B FE 58 40 00 00 FA 00 00 13 88 21 00  ..(..X@.......!.
0040  00 00 01 48 EE 3C 80 00 00 00 01 45 B8 02 08 FF  ...H.<.....E....
<snipped>
0000  F1 D0 04 04 D1 02 00 02 D9 A0 7A 96 74 7D 5C CB  ..........z.t}\.
0010  D5 B5 40 90 54 B1 29 21 73 C2 AC 48 96 10 5A 91  ..@.T.)!s..H..Z.
0020  5D E0 51 AC 61 F6 2E 91 E7 25 39 A4 A1 2A 66 74  ].Q.a....%9..*ft
0030  40 1A 77 1C 47 27 4C 19 76 AA 14 A7 49 F8 24 CB  @.w.G'L.v...I.$.
0040  93 F5 D6 4C B5 FF 92 21 48 FE C5 C1 65 CD 08 AB  ...L...!H...e...
<snipped>
0000  F1 D0 04 04 D1 02 00 03 C7 6B FB 9B D9 77 C3 0F  .........k...w..
0010  E4 E7 54 EB FA 0B 31 C8 3C 29 20 8A D1 29 01 08  ..T...1.<) ..)..
0020  FB 50 0E 04 95 F0 2E F0 8B 81 1E 26 81 5D EF 36  .P.........&.].6
0030  D2 91 5E 0B B9 95 BF 9A AD 71 B5 52 F4 C8 B1 23  ..^......q.R...#
0040  47 37 E6 49 F6 74 59 AA F0 7A 35 6B 0F DF B6 B8  G7.I.tY..z5k....
<snipped>

Video frames acknowledgement

Includes frame count (00 11) at bytes 6-7, and individual message numbers in byte pairs from bytes 8-9 onwards

0000  F1 D1 00 26 D1 02 00 11 00 01 00 02 00 03 00 04  ...&............
0010  00 05 00 06 00 07 00 08 00 09 00 0A 00 0B 00 0C  ................
0020  00 0D 00 0E 00 0F 00 10 00 11                    ..........

Includes frame count (00 02) at bytes 6-7, and individual message numbers in byte pairs from bytes 8-9 onwards

0000  F1 D1 00 08 D1 02 00 02 00 12 00 13 00 00 00 00  ................
0010  00 00                                            ..
