Infodump
Here I'll probably put info before I format it properly into their own pages so you guys have something to work with way before "it looks nice" :) ofc feel free to add stuff here too, formatting doesn't matter as long as it's clear what you mean!
(C = client, S = server/Sonoff camera)
C => broadcast to local LAN on port 32108
S => replies to the client’s originating port from the (new?) port it’s listening on with a cookie
C => returns the cookie to the server’s listening port directly
S => replies with a slightly modified version of the cookie indicating success
C => sends a ‘client initialising’ packet - client command index reset to 0
S => sends acknowledgement
S => sends a ‘server initialising’ packet - server command index reset to 0
C => sends acknowledgment
From this point on if there’s a gap of > 1250 msec a “Keepalive” exchange is triggered:
S => sends “Marco”
C => replies with “Polo”
C => sends “Marco”
S => replies with “Polo”
This will keep the channel open until the client sends a command which the server will acknowledge as above, with the command index incrementing by one each time.
This works even if the camera is isolated from the WAN (it still makes DNS requests of the local DHCP-supplied resolver), and the RTSP streams can be opened while the camera motion is triggered as above.
I have not yet figured out whether it’s possible to enable the RTSP stream or set the password without using the app, but hopefully others are interested and can help take this reverse engineering further.
The camera will respond to a UDP broadcast on the local LAN with a payload of 4 bytes (`f1 30 00 00`) sent to port `32108`. The response will be sent as a unicast to the source port of the machine that sent the broadcast.
The source port in the initial response is to be used for further UDP commands. It looks like the initial response is a cookie that the client sends back, and the source port of that message is used as the reply port by the server.
E.g. If the camera IP were on 192.168.0.129 and your test machine on 192.168.0.50 then:
Source IP | Source port | Dest. IP | Dest. port | Payload | Means |
---|---|---|---|---|---|
192.168.0.50 | (aaaaa) | 192.168.0.255 | 32108 (fixed) | f1 30 00 00 | What's your port? |
192.168.0.129 | server_port | 192.168.0.50 | (aaaaa) | f1 41 00 .. | Talk to me on "server_port", here's a cookie |
... | | | | | |
192.168.0.50 | client_port | 192.168.0.255 | server_port | f1 41 00 .. | Talk to me on "client_port", here's your cookie back |
192.168.0.129 | server_port | 192.168.0.50 | client_port | f1 42 00 .. | Channel established? |
- `aaaaa`: Random high port >1024 chosen at runtime
- `32108`: Fixed port for broadcast to the network on which the camera is always listening
- `server_port`: Random high port >1024 chosen at runtime by the camera for ongoing communication
- `client_port`: Random high port >1024 chosen at runtime by the client, can be the same as `aaaaa` above.
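A minimal Python sketch of the discovery step, assuming only what is described above (a 4-byte probe to port 32108 and a unicast reply starting `f1 41`). The function name `discover`, the default broadcast address (taken from the example network) and the timeout are illustrative choices, not anything confirmed about the camera.

```python
import socket

DISCOVERY_PORT = 32108                           # fixed port the camera listens on
DISCOVERY_PAYLOAD = bytes.fromhex("f1300000")    # "What's your port?"


def discover(broadcast_addr="192.168.0.255", timeout=2.0):
    """Broadcast the probe; return (cookie, (camera_ip, server_port)) or None."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        sock.setsockopt(socket.SOL_SOCKET, socket.SO_BROADCAST, 1)
        sock.settimeout(timeout)
        sock.bind(("", 0))                       # OS picks the random high port ("aaaaa")
        sock.sendto(DISCOVERY_PAYLOAD, (broadcast_addr, DISCOVERY_PORT))
        try:
            cookie, server = sock.recvfrom(1024)
        except socket.timeout:
            return None
        if cookie[:2] != b"\xf1\x41":            # not a cookie reply
            return None
        return cookie, server
    finally:
        sock.close()
```

This only covers the first two rows of the table. A real client would keep the socket open and send the cookie back to the returned address to continue the handshake.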
The payload of the cookie contains a 4-byte header then the encoded serial number of the camera as follows:
Byte offset | Value | Purpose(?) |
---|---|---|
0 - 3 | F1 41 00 14 | Constant - present in all cookies |
4 - 7 | 45 57 4c 4b | "EWLK" - first third of the serial number |
8 - 15 | xx xx xx xx xx xx xx xx | Numeric part of the serial number as an 8-byte big-endian integer (e.g. 057746 = 0xE192) |
16 - 20 | xx xx xx xx xx | ASCII representation of the alphanumeric suffix in the serial number |
21 - 23 | 00 00 00 | Constant |
So my camera with serial number "EWLK-057746-HRWYT" is represented by this cookie:
f1 41 00 14 45 57 4c 4b 00 00 00 00 00 00 e1 92 48 52 57 59 54 00 00 00
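As a sketch of the layout in the table above, the cookie can be built from (and turned back into) a serial number like this. The function names are made up for illustration, and the six-digit zero padding on decode is an assumption based on the one serial number shown here.

```python
COOKIE_HEADER = bytes.fromhex("f1410014")   # constant seen in all cookies


def encode_cookie(serial: str) -> bytes:
    """Build the 24-byte cookie from a serial such as 'EWLK-057746-HRWYT'."""
    prefix, number, suffix = serial.split("-")
    return (
        COOKIE_HEADER
        + prefix.encode("ascii")             # bytes 4-7
        + int(number).to_bytes(8, "big")     # bytes 8-15
        + suffix.encode("ascii")             # bytes 16-20
        + b"\x00\x00\x00"                    # bytes 21-23
    )


def decode_cookie(cookie: bytes) -> str:
    """Recover the serial number from a cookie (assumes a 6-digit numeric part)."""
    if len(cookie) != 24 or cookie[:4] != COOKIE_HEADER:
        raise ValueError("not a recognised cookie")
    prefix = cookie[4:8].decode("ascii")
    number = int.from_bytes(cookie[8:16], "big")
    suffix = cookie[16:21].decode("ascii")
    return f"{prefix}-{number:06d}-{suffix}"
```

Running `encode_cookie("EWLK-057746-HRWYT").hex(" ")` reproduces the cookie bytes listed above.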
Command exchanges seem to follow this pattern, but some messages are repeated (and acknowledged as repeats in the acknowledgement).
- Command sent
- Acknowledgement of command received
- Response sent
- Acknowledgement of response received
Messages are variable length, both individually and when concatenated into larger payloads.
Byte offset | Value | Purpose(?) |
---|---|---|
0 | F1 | Constant - present in all commands / acks / responses |
1 | D0 | Message sent |
2 | xx | Message length MSB |
3 | xx | Message length LSB |
Byte offset | Value | Purpose(?) |
---|---|---|
4 | D1 | Constant |
5 | 00 / 02 | Constant; 00 = Commands, 02 = Video frames |
6 | xx | Message index (MSB) N.B. Server and client both maintain separate counters |
7 | xx | Message index (LSB) |
Byte offset | Value | Purpose(?) |
---|---|---|
8 | 88 | Constant |
9 | 88 | Constant |
10 | 76 | Constant |
11 | 76 | Constant |
Byte offset | Value | Purpose(?) |
---|---|---|
12 | xx | 04 or 08, varies by command |
13 | 00 | Constant so far |
14 | 00 | Constant so far |
15 | 00 | Constant so far |
16 | xx | Varies by command |
17 | xx | Varies by command |
18-31 | 00 | Constant so far |
32 | xx | Varies by command |
33 | xx | Varies by command |
34 | 00 | Constant so far |
35 | 00 | Constant so far |
36 | xx | Varies by command |
Individual commands can be longer than this one.
Subsequent commands can be appended in a single message, simply concatenating another command header `88 88 76 76` and then the next variable-length command payload. One, two and four commands have been seen in a single message.
(Byte offsets assume a single command per message)
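A rough sketch of reading the first 8 bytes of a message, following the offsets in the tables above. The `Header` type and function names are illustrative only. It checks the `F1` and `D1` constants and reads the length and message index as big-endian values (MSB first, as listed).

```python
import struct
from typing import NamedTuple

COMMAND_MARKER = bytes.fromhex("88887676")


class Header(NamedTuple):
    msg_type: int   # byte 1, e.g. 0xD0 for "message sent"
    length: int     # bytes 2-3
    channel: int    # byte 5, 0x00 = commands, 0x02 = video frames
    index: int      # bytes 6-7, per-side message counter


def parse_header(packet: bytes) -> Header:
    """Parse bytes 0-7 of a message; raises ValueError on unexpected constants."""
    if len(packet) < 8 or packet[0] != 0xF1 or packet[4] != 0xD1:
        raise ValueError("not a recognised message")
    msg_type, length = struct.unpack_from(">BH", packet, 1)
    channel, index = struct.unpack_from(">BH", packet, 5)
    return Header(msg_type, length, channel, index)


def has_command_marker(packet: bytes) -> bool:
    """True if bytes 8-11 carry the 88 88 76 76 command header."""
    return packet[8:12] == COMMAND_MARKER
```

For example, a packet starting `f1 d0 00 20 d1 00 00 19 88 88 76 76` parses as type `0xD0`, length 32, channel 0, index 25.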
Pan/Tilt | Byte 12 | 16 | 17 | 32 | 33 |
---|---|---|---|---|---|
Tilt up | 08 | 01 | 10 | 02 | 08 |
Tilt down | 08 | 01 | 10 | 01 | 08 |
Pan left | 08 | 01 | 10 | 06 | 08 |
Pan right | 08 | 01 | 10 | 03 | 08 |
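Since the full command layout isn't known yet, one cautious way to experiment is to take a captured command as a template and overwrite only the bytes that the table says vary. The sketch below does that for the pan/tilt values above. `PAN_TILT` and `patch_command` are hypothetical names, and the template has to come from your own capture.

```python
# Byte offset -> value, copied from the pan/tilt table above.
PAN_TILT = {
    "tilt_up":   {12: 0x08, 16: 0x01, 17: 0x10, 32: 0x02, 33: 0x08},
    "tilt_down": {12: 0x08, 16: 0x01, 17: 0x10, 32: 0x01, 33: 0x08},
    "pan_left":  {12: 0x08, 16: 0x01, 17: 0x10, 32: 0x06, 33: 0x08},
    "pan_right": {12: 0x08, 16: 0x01, 17: 0x10, 32: 0x03, 33: 0x08},
}


def patch_command(template: bytes, overrides: dict) -> bytes:
    """Return a copy of a captured command with the given offsets overwritten."""
    buf = bytearray(template)
    for offset, value in overrides.items():
        if offset >= len(buf):
            raise ValueError(f"template too short for offset {offset}")
        buf[offset] = value
    return bytes(buf)
```

The message index in bytes 6-7 would still need to be set to the client's next counter value before sending.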
Video stream | Byte 12 | 16 | 32 |
---|---|---|---|
Start SD 640x360 | 14 | 01 | 02 |
Start HD 1920x1080 | 14 | 01 | 01 |
Stop stream | 14 | 03 | 00 |
(The video resolution is also passed back in the message from the camera to announce the video stream, after the start request is received from the client)
Sound Monitor | Byte 12 | 16 | 17 |
---|---|---|---|
On | 04 | 04 | 00 |
Off | 04 | 05 | 00 |
Microphone Gain | Byte 12 | 16 | 17 | 36 |
---|---|---|---|---|
Low | 0E | 1A | 81 | 28 |
Medium | 0E | 1A | 81 | 4B |
High | 0E | 1A | 81 | 55 |
Speaker Volume | Byte 12 | 16 | 17 | 36 |
---|---|---|---|---|
Low | 08 | 1A | 81 | 28 |
Medium | 08 | 1A | 81 | 4B |
High | 08 | 1A | 81 | 55 |
Motion Detection | Byte 12 | 16 | 17 | 32 | 33 |
---|---|---|---|---|---|
Low | 08 | 24 | 03 | xx | 19 |
Medium | 08 | 24 | 03 | xx | 32 |
High | 08 | 24 | 03 | xx | 4B |
Alarm on | 08 | 24 | 03 | 00 | xx |
Alarm off | 08 | 24 | 03 | 01 | xx |
Microphone Sensitivity | Byte 12 | 16 | 17 | 36 |
---|---|---|---|---|
Indoor (50Hz) | 08 | 60 | 03 | 01 |
Outdoor (60Hz) | 08 | 60 | 03 | 00 |
Picture Orientation | Byte 12 | 16 | 17 | 36 |
---|---|---|---|---|
Normal | 08 | 70 | 03 | 00 |
Rotated | 08 | 70 | 03 | 03 |
(Sent immediately after channel setup and before the video feed begins.) Bytes 6-7 (the message index) are set to 00 00 - resetting the command counter?
0000 F1 D0 00 64 D1 00 00 00 88 88 76 76 48 00 00 00 ...d......vvH...
0010 10 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
0020 31 32 33 34 35 36 37 38 00 00 00 00 00 00 00 00 12345678........
0030 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
0040 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
0050 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
0060 00 00 00 00 00 00 00 00 ........
Camera setup(?) sent by camera after client init above is acknowledged
0000 F1 D0 00 28 D1 00 00 00 88 88 76 76 0C 00 00 00 ...(......vv....
0010 10 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
0020 00 00 00 00 00 00 00 00 07 00 00 01 ............
Get Audio | Byte 12 | 16 |
---|---|---|
Start | 04 | 04 |
Stop | 04 | 05 |
Format is 8kHz, mono, A-law encoded, with some mystery data on the front...
0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50...
|-----|-----|-----|-----|-----------|-----------------------------------------------------------------------------------|-----------------------------------
|-Hdr-|-Len-|-Ch.-|-Num-|-AudioData-|-???-???-???-???-???-???-???-???-???-???-???-???-???-???-???-???-???-???-???-???---|----- 8kHz, mono, A-Law audio! ----
------------------------------------------------------------------------------------------------------------------------|-----------------------------------
f1 d0 04 04 d1 02 00 ef f2 0f 86 19 00 05 00 00 1f e0 4f 00 00 00 00 00 00 00 00 00 00 00 00 00 52 00 00 00 00 00 00 00 52 5d 5c 5f 5e 59 58 5b 5a 5a 45 44
f1 d0 04 04 d1 02 00 f2 f2 0f 86 19 00 05 00 00 bf e0 4f 00 00 00 00 00 00 00 00 00 00 00 00 00 52 00 00 00 00 00 00 00 5f 5f 59 58 5b 5a 5a 44 44 47 46 41
f1 d0 04 04 d1 02 00 f6 f2 0f 86 19 00 05 00 00 5f e1 4f 00 00 00 00 00 00 00 00 00 00 00 00 00 52 00 00 00 00 00 00 00 5f 5e 59 58 5b 5a 45 44 47 47 46 46
f1 d0 04 04 d1 02 00 fc f2 0f 86 19 00 05 00 00 9f e2 4f 00 00 00 00 00 00 00 00 00 00 00 00 00 52 00 00 00 00 00 00 00 dd c6 cf f5 f7 f6 f1 f1 f7 f4 f4 f4
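To listen to captured audio, the A-law bytes can be expanded to 16-bit PCM and written out as a WAV file. This is a sketch using the standard G.711 A-law expansion and Python's `wave` module. It assumes you have already stripped the header and the "mystery data" yourself, since where the audio starts isn't pinned down here.

```python
import struct
import wave


def alaw_to_pcm16(sample: int) -> int:
    """Expand one G.711 A-law byte to a signed 16-bit PCM value."""
    sample ^= 0x55                        # undo the even-bit inversion
    magnitude = (sample & 0x0F) << 4
    segment = (sample & 0x70) >> 4
    if segment == 0:
        magnitude += 8
    else:
        magnitude += 0x108
        if segment > 1:
            magnitude <<= segment - 1
    return magnitude if sample & 0x80 else -magnitude


def write_wav(alaw: bytes, path: str) -> None:
    """Write A-law bytes as an 8 kHz, mono, 16-bit WAV file."""
    pcm = struct.pack(f"<{len(alaw)}h", *(alaw_to_pcm16(b) for b in alaw))
    with wave.open(path, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)
        wav.setframerate(8000)
        wav.writeframes(pcm)
```

Usage would be something like `write_wav(b"".join(audio_chunks), "camera.wav")`, where `audio_chunks` is your own list of extracted A-law payloads.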
Send Audio | Byte 12 | 16 | 36 |
---|---|---|---|
Start | 10 | 06 | 01 |
Stop | 04 | 07 | 00 |
Audio data is encoded as 8kHz, mono, A-law
e.g.
0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50...
|-----|-----|-----|-----|-----------|-----------------------------------------------------------------------------------------------------------------------
|-Hdr-|-Len-|-Ch.-|-Num-|-AudioData-|--???-???-???-???-???-???-???-???--|----- 8kHz, mono, A-Law audio! -----
------------------------------------------------------------------------------------------------------------------------------------------------------------
f1 d0 01 54 d1 01 00 12 f2 0f 86 19 40 01 00 00 00 00 00 00 00 00 00 00 db c5 c6 c0 c3 c3 c0 c7 c4 c4 c5 c7 c6 c1 c0 c0 c0 c2 cd c3 c0 c1 c7 c5 d8 d3 d4 56
f1 d0 01 54 d1 01 00 13 f2 0f 86 19 40 01 00 00 00 00 00 00 00 00 00 00 d1 d0 d0 d3 d3 d3 d0 d2 d2 d3 d3 d0 d1 d6 d6 d7 d7 d4 d5 55 54 54 54 54 57 57 57 55
f1 d0 01 54 d1 01 00 14 f2 0f 86 19 40 01 00 00 00 00 00 00 00 00 00 00 54 d5 d5 55 54 54 54 57 d5 54 d5 54 d4 54 55 d5 d4 54 55 d7 54 d4 d4 d5 d4 d5 55 d5
f1 d0 01 54 d1 01 00 15 f2 0f 86 19 40 01 00 00 00 00 00 00 00 00 00 00 7c 17 62 60 6f c6 c4 4e f5 ef e7 e3 95 e7 e6 e1 f4 64 40 65 60 7e 7f 60 44 d8 48 da
00000000: f1 d0 00 34 d1 00 00 15 88 88 76 76 18 00 00 00 |...4......vv....|
00000010: 02 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
00000020: 03 00 00 00 80 02 00 00 68 01 00 00 52 00 00 00 |........h...R...|
00000030: 00 00 00 00 00 00 00 00 |........|
Bytes 36 & 37 are the horizontal resolution (little endian)
- 0x80 0x02 = (16 * 8 + 256 * 2) = 640 (SD stream)
- 0x80 0x07 = (16 * 8 + 256 * 7) = 1920 (HD Stream)
Bytes 40 & 41 are the vertical resolution (little endian)
- 0x68 0x01 = (16 * 6 + 8 + 256 * 1) = 360 (SD stream)
- 0x38 0x04 = (16 * 3 + 8 + 256 * 4) = 1080 (HD stream)
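The same arithmetic as a quick Python sketch, reading both values as little-endian 16-bit integers at offsets 36 and 40. The function name is illustrative; the sample bytes are the stream announcement message shown above.

```python
import struct


def stream_resolution(message: bytes) -> tuple:
    """Return (width, height) from a stream announcement message."""
    (width,) = struct.unpack_from("<H", message, 36)    # bytes 36-37
    (height,) = struct.unpack_from("<H", message, 40)   # bytes 40-41
    return width, height


announcement = bytes.fromhex(
    "f1d00034d10000158888767618000000"
    "02000000000000000000000000000000"
    "03000000800200006801000052000000"
    "0000000000000000"
)
print(stream_resolution(announcement))   # (640, 360)
```

With `80 07` and `38 04` in the same positions, the function returns `(1920, 1080)`.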
Video is raw h.264 data, encoded in variable lengths from byte offset 8 onwards in the message. Concatenating these payloads results in a file that's playable with ffmpeg, or vlc using the command line `--demux h264` option. Video frames are distinguished as channel `D1 02` and have a distinct message sequence counter from the other `D1 00` commands sent from the camera.
0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50...
|-----|-----|-----|-----|-----------|-----------------------------------------------------------------------------------------------------------------------
|-Hdr-|-Len-|-Ch.-|-Num-|-VideoData-|----- h.264 encoded (AVC, Main@L3 format profile, 10.0 fps, YUV, 4:2:0, 8bit Progressive) -----
|-----|-----|-----|-----|-----------|-----------------------------------------------------------------------------------------------------------------------
f1 d0 04 04 d1 02 02 4c f1 0f 86 19 68 5a 00 00 a8 a3 0e 00 01 00 00 00 61 e3 0c 00 01 00 00 00 03 00 00 00 00 00 00 00 00 00 00 01 47 4d 40 1e 99 a0 28 0b
f1 d0 04 04 d1 02 02 63 f1 0f 86 19 b4 09 00 00 0c a4 0e 00 02 00 00 00 62 e3 0c 00 01 00 00 00 03 00 00 00 00 00 00 00 00 00 00 01 41 e0 1d 5c 77 04 c0 85
f1 d0 04 04 d1 02 02 6a f1 0f 86 19 2f 5c 00 00 38 a5 0e 00 01 00 00 00 65 e3 0c 00 01 00 00 00 03 00 00 00 00 00 00 00 00 00 00 01 47 4d 40 1e 99 a0 28 0b
f1 d0 04 04 d1 02 02 82 f1 0f 86 19 f8 06 00 00 9c a5 0e 00 02 00 00 00 66 e3 0c 00 01 00 00 00 03 00 00 00 00 00 00 00 00 00 00 01 41 e0 1d 5c 77 05 44 50
00000000: f1 d0 00 20 d1 00 00 19 88 88 76 76 04 00 00 00 |... ......vv....|
00000010: 13 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
00000020: 00 00 00 00 |....|
Sent back to confirm receipt of commands, video/audio data or responses to commands. The minimum payload is 18 bytes, padded with zeroes if there are fewer than five acks in the message, but it can be longer. The maximum payload seen, for multiple video frames (84 bytes), included 38 message acknowledgements.
Byte offset | Value | Purpose(?) |
---|---|---|
0 | F1 | Constant - present in all commands / acks / responses |
1 | D1 | Message acknowledgment? |
2 | xx | Message Length MSB |
3 | xx | Message Length LSB, counting from after this byte |
4 | D1 | Constant |
5 | xx | 00 for command acknowledgement, 02 for video frames |
6 | xx | Number of acks in this message (MSB) |
7 | xx | Number of acks in this message (LSB) |
8 | xx | Ack #1 MSB |
9 | xx | Ack #1 LSB |
10 | xx | Ack #2 MSB (or 00) |
11 | xx | Ack #2 LSB (or 00) |
12 | xx | Ack #3 MSB (or 00) |
13 | xx | Ack #3 LSB (or 00) |
14 | xx | Ack #4 MSB (or 00) |
15 | xx | Ack #4 LSB (or 00) |
16 | xx | Ack #5 MSB (or 00) |
17 | xx | Ack #5 LSB (or 00) |
18 | xx | Ack #6 MSB (optional) |
19 | xx | Ack #6 LSB (optional) |
20 | xx | Ack #7 MSB (optional) |
21 | xx | Ack #7 LSB (optional) |
.. | xx | etc to a maximum offset of 83? |
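A sketch of building and parsing an ack message from the table above. It assumes the length field counts every byte after offset 3, and that fewer than five acks are zero-padded to five slots (which gives the 18-byte minimum). `build_ack` and `parse_ack` are illustrative names.

```python
import struct


def build_ack(channel: int, indices) -> bytes:
    """Build an F1 D1 ack for the given message indices (zero-padded to five slots)."""
    indices = list(indices)
    padded = indices + [0] * max(0, 5 - len(indices))
    body = struct.pack(">BBH", 0xD1, channel, len(indices))
    body += b"".join(struct.pack(">H", i) for i in padded)
    return struct.pack(">BBH", 0xF1, 0xD1, len(body)) + body


def parse_ack(packet: bytes):
    """Return (channel, [acked indices]) from an ack message."""
    if len(packet) < 8 or packet[:2] != b"\xf1\xd1" or packet[4] != 0xD1:
        raise ValueError("not an ack message")
    channel = packet[5]
    (count,) = struct.unpack_from(">H", packet, 6)
    return channel, list(struct.unpack_from(f">{count}H", packet, 8))
```

For example, `build_ack(0, [0x19])` gives an 18-byte message, and acking 38 video frames with `build_ack(2, range(38))` gives 84 bytes, matching the sizes noted above.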
Sent back and forth while no other traffic (e.g. a video stream) is flowing. The exchange seems to start after around 1-2 seconds of silence.
Either side (client or server) starts the keepalive exchange with a message containing a 4-byte payload: `F1 E0 00 00`. The other side responds with `F1 E1 00 00`.
If the server doesn't want to talk to the client any more, it sends a 4-byte payload: F1 F0 00 00
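A small sketch of how a client might react to these three control packets. The constant and function names are made up for illustration, and anything that isn't one of these packets is simply passed over.

```python
KEEPALIVE_REQUEST = bytes.fromhex("f1e00000")   # "Marco"
KEEPALIVE_REPLY = bytes.fromhex("f1e10000")     # "Polo"
CLOSE = bytes.fromhex("f1f00000")               # server is hanging up


def handle_control(packet: bytes):
    """Return (reply_to_send_or_None, keep_channel_open)."""
    if packet == KEEPALIVE_REQUEST:
        return KEEPALIVE_REPLY, True
    if packet == CLOSE:
        return None, False
    return None, True    # keepalive replies and other traffic need no answer here
```

A receive loop would call `handle_control` on each incoming packet, send the reply if there is one, and stop when the second value is `False`.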