Consider implementing the VNC protocol for transporting keystrokes and mouse movements #29

Open
RoganDawes opened this issue Apr 10, 2019 · 24 comments

Comments

@RoganDawes

Is your feature request related to a problem? Please describe.
VNC (Virtual Network Computing) is a lightweight protocol designed for implementing remote Keyboard Video Mouse systems. It is an ideal protocol for this use case, as there is an extensive ecosystem of clients and servers that already support this protocol. VNC supports transfer of video data, but does not require any responses to queries for it, making it actually quite simple to just ignore that part of the spec.

Describe the solution you'd like
Consider replacing the RPC server and firmware interface with an implementation of the VNC protocol. Substituting the line by line interface in the Arduino firmware with an implementation of the VNC protocol should be fairly straightforward. A sample implementation of VNC for the ESP8266 already exists as part of the ESP-VNC project (https://github.com/sensepost/esp-vnc/tree/7e770585c405288ff93fd42684a246627b5761e0/vnc), which could fairly easily be repurposed. While the current implementation expects a new TCP connection to initiate the protocol and reset the state machine, this implementation could begin when the USB Serial interface is opened.

Once that is done, the RPC Server implementation would collapse to "Listen on a TCP port, when a connection is made, open and lock the USB-Serial port, and simply copy bytes backwards and forwards", providing an adapter from a network port (expected by common VNC clients) to the USB-Serial port.
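
To make the adapter idea concrete, here is a minimal Python sketch of the byte-copying loop. It is an illustration only, not the project's code: `relay` and `bridge` are hypothetical names, and the serial side is assumed to be any object whose `read(n)` returns whatever bytes are available (empty on close) and whose `write(data)` sends them. A real serial library may need a thin adapter to match that behaviour.

```python
import socket
import threading


def relay(read_chunk, write_all):
    """Copy bytes from one side to the other until the reader reports EOF."""
    total = 0
    while True:
        data = read_chunk(4096)
        if not data:          # an empty read means the peer closed its side
            return total
        write_all(data)
        total += len(data)


def bridge(conn, port):
    """Relay between a TCP connection and a serial-like object, both ways.

    `conn` is a connected socket; `port` is the assumed serial-like object.
    """
    upstream = threading.Thread(
        target=relay, args=(conn.recv, port.write), daemon=True)
    upstream.start()                 # VNC client -> dongle
    relay(port.read, conn.sendall)   # dongle -> VNC client, in this thread
    upstream.join(timeout=1)
```

Because nothing in the loop interprets the bytes, the same adapter would work unchanged whatever the firmware does with the stream.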

A user would then be able to make use of any suitable VNC client to emit keystrokes and mouse movements via the Bluetooth interface.

Additionally, given that you are already familiar with Python, there is a command line VNC client implemented in Python (actually a simple frontend to a Python library) called VNCdo (https://github.com/sibson/vncdotool), which could provide an adapter between any specific functionality required by an individual, and the VNC server implementation. For example, this could be used to provide "macro" functions, where a single operation by the user results in a series of scripted keystrokes and mouse movements on the target.
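
As a rough sketch of what such a macro could look like, the snippet below replays a list of steps against a client object. `run_macro` and `OPEN_AND_GREET` are made-up names, and the action names (`mouseMove`, `mousePress`, `keyPress`) are assumed to match methods on the client object returned by vncdotool's `api.connect(...)`; please check the library's documentation before relying on them.

```python
def run_macro(client, steps):
    """Replay (action, *args) tuples against a VNC client object."""
    for action, *args in steps:
        getattr(client, action)(*args)


# Hypothetical macro: click a text field, type a short word, press Enter.
OPEN_AND_GREET = [
    ("mouseMove", 200, 150),
    ("mousePress", 1),
    ("keyPress", "h"),
    ("keyPress", "i"),
    ("keyPress", "enter"),
]
```

A single user action (a switch press, a button in a UI) would then call `run_macro(client, OPEN_AND_GREET)` and the target would see the whole sequence.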

VNC even includes support for copy and paste of client buffers to the server, which I have implemented as "Type out this block of text using the keyboard". Obviously, this is inappropriate for pasting of binary data, but can be a nice feature otherwise.
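
For illustration, this is roughly how "paste as typing" can work at the protocol level. The sketch parses one RFB `ClientCutText` message (type 6, three padding bytes, a 32-bit length, then Latin-1 text) and emits a `KeyEvent` press and release (type 4) per printable character. The function name is hypothetical and the choice to silently skip control bytes is my assumption, not necessarily what the ESP-VNC code does.

```python
import struct

XK_RETURN = 0xFF0D  # X11 keysym for the Return key


def cut_text_to_key_events(message):
    """Turn one RFB ClientCutText message into a list of KeyEvent messages."""
    msg_type, length = struct.unpack(">B3xI", message[:8])
    if msg_type != 6:
        raise ValueError("not a ClientCutText message")
    text = message[8:8 + length].decode("latin-1")
    events = []
    for ch in text:
        if ch == "\n":
            keysym = XK_RETURN
        elif " " <= ch <= "~" or "\xa0" <= ch <= "\xff":
            keysym = ord(ch)          # Latin-1 keysyms equal the code point
        else:
            continue                  # skip control and binary bytes
        events.append(struct.pack(">BBxxI", 4, 1, keysym))  # key down
        events.append(struct.pack(">BBxxI", 4, 0, keysym))  # key up
    return events
```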

I would consider implementing this perhaps as a subfunction of the existing firmware, triggered by a custom AT command: for example, the host sends "AT+VNC\n" and the firmware replies "OK VNC" before handing the serial stream to the VNC decoder. This would still allow for things like AT commands to configure Bluetooth pairing, or an AT command to update a built-in keyboard map. And it would mean that any existing implementations using RPC-Service would continue to work unaltered.

Describe alternatives you've considered
I have implemented a similar sort of device using a "WiFi to USB keyboard" hardware, so I have some experience in this area.

All the other solutions I considered seemed to be more complex than simply using VNC. If using a custom protocol, one has to consider creating a client for it and dealing with things like special keys (e.g. Alt-Tab) which are interpreted locally to switch applications, when the desire was for that special key to be transmitted to the target.

An alternative to VNC might be Remote Desktop Protocol (RDP). I did not investigate that further as VNC was very simple to implement.

Additional context
Keyboard mapping (#20) could potentially be implemented by updating the mapKey() function with a different table.
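
A small Python stand-in for that idea (the real `mapKey()` lives in the C firmware; `map_key` and the table names here are hypothetical). It maps a VNC keysym to the HID keyboard usage ID that must be sent so the target produces that character. The usage IDs for a..z starting at 0x04 come from the HID usage tables; only letters are covered, to keep the sketch short.

```python
# HID keyboard usage IDs for the letters a..z start at 0x04.
US_LAYOUT = {chr(ord("a") + i): 0x04 + i for i in range(26)}

# On a QWERTZ target the Y and Z key positions are swapped.
QWERTZ_LAYOUT = dict(US_LAYOUT, y=US_LAYOUT["z"], z=US_LAYOUT["y"])


def map_key(keysym, table=US_LAYOUT):
    """Return the HID usage ID to send so the target sees `keysym`."""
    ch = chr(keysym)
    if ch not in table:
        raise KeyError(f"no mapping for keysym {keysym:#x}")
    return table[ch]
```

Supporting another layout then means supplying another table, with no change to the lookup itself.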

Drag and drop (#21) is also then a solved problem, as it simply devolves to sending a mouse button press via VNC (with no immediate release event), moving the mouse, and eventually sending a button release event.
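
Sketched in Python, a drag is just a sequence of RFB `PointerEvent` messages (type 5, a button mask byte, then 16-bit x and y, big-endian) in which the button bit stays set until the final message. The helper names and the number of intermediate steps are arbitrary choices for this example.

```python
import struct

LEFT_BUTTON = 0x01  # bit 0 of the RFB button mask


def pointer_event(x, y, buttons=0):
    """Encode an RFB PointerEvent: type 5, button mask, x, y (big-endian)."""
    return struct.pack(">BBHH", 5, buttons, x, y)


def drag(start, end, steps=4):
    """Yield the messages for a left-button drag from `start` to `end`."""
    (x0, y0), (x1, y1) = start, end
    yield pointer_event(x0, y0, LEFT_BUTTON)      # press, no release yet
    for i in range(1, steps + 1):
        x = x0 + (x1 - x0) * i // steps
        y = y0 + (y1 - y0) * i // steps
        yield pointer_event(x, y, LEFT_BUTTON)    # move with the button held
    yield pointer_event(x1, y1, 0)                # release at the end point
```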

@willwade
Contributor

Interesting. I've been wondering about this project potentially replacing the synergyKM solution (http://synergykm.com), which has used VNC. The problem, though (and correct me if I'm wrong here): this takes the solution from Bluetooth to TCP/IP, right? Synergy and others like it have been super laggy due to the connection over TCP. Maybe there is a combination we can use, though. Hmmm. Thanks for the thoughts; it's really great to have your comments.

@hosseinzoda

hosseinzoda commented Apr 11, 2019

Hi @RoganDawes, I implemented the RPC daemon for RelayKeys so that other applications can use the daemon concurrently. We have been considering changing it. The same purpose can be achieved with IPC protocols such as D-Bus, or COM on Windows.

We have implemented a few clients that use RPC, such as relaykeys-qt.py and relaykeys-cli.py. You could also implement another client that would enable use of the VNC protocol, as a server or as a client.

@RoganDawes
Author

> Interesting. I've been wondering about this project potentially replacing the synergyKM solution (http://synergykm.com) which has used VNC. The problem though - (and correct me if I'm wrong here) This is taking the solution from Bluetooth - to TCP/IP right? Synergy and others like it have been super laggy due to the connection over TCP.. Maybe there is a combination we can use though. Hmmm. Thanks for the thoughts - really great to have your comments.

This does not move from Bluetooth to TCP/IP at all, other than as a local transport. The VNC daemon would replace the RPC-Daemon as the "thing sending keystrokes and mouse movements to the dongle". All of that happens on the same PC, so there is no external network traffic at all. All network communication stays on localhost.

And yes, this could be seen as an alternative to Synergy that also avoids the "separate network" problem. There would however be the problem of lack of feedback regarding the actual relative mouse position on the remote machine, and eventual mismatch should the mouse be moved through some other means e.g. a physical mouse attached to the other computer. One solution to that is to have the mouse emit absolute mouse events, similar to a touch screen, as opposed to relative mouse events as most common mice do. i.e. "I am at position (423, 500)" as opposed to "I moved (-5, +2)".

The only thing to be sure of then, is to set up the resolution of the mouse to match the resolution of the computer that it is plugged into. That could possibly be configured on a "per paired bluetooth device" basis, such that it emits absolute coordinates only if the remote bluetooth MAC corresponds to a configured resolution. And in fact, while I stand to be corrected on this, I suspect that the host PC would read back the configured resolution (min/max X, min/max Y) from the Bluetooth descriptor, and simply scale the coordinates to match the current display resolution.
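
If the host does scale as I suspect, the arithmetic would look something like the following. This is a sketch under an assumed logical range of 0..32767 for both axes (a common choice for absolute pointers, but not something I have verified for this dongle); the function names are made up.

```python
LOGICAL_MAX = 32767  # assumed logical maximum declared by the pointer device


def to_logical(x, y, width, height):
    """Scale pixel coordinates on a width x height screen to 0..LOGICAL_MAX."""
    return (x * LOGICAL_MAX // (width - 1),
            y * LOGICAL_MAX // (height - 1))


def to_pixels(lx, ly, width, height):
    """What the host would do: scale logical coordinates to its own display."""
    return (round(lx * (width - 1) / LOGICAL_MAX),
            round(ly * (height - 1) / LOGICAL_MAX))
```

With this scheme the sender only needs to know the resolution of the VNC client's own canvas, since the receiving host maps the logical range onto whatever display it has.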

@RoganDawes
Author

> Hi @RoganDawes , I have implemented RPC-daemon for RelayKeys in order for other applications to concurrently use the daemon. We have been considering to change it. With use of IPC protocols same purpose can be achieved. Protocols like dbus and or COM for windows.
>
> We have implemented few clients that use RPC like relaykeys-qt.py and relaykeys-cli.py. You can also implement another client that would enable use of VNC protocol as server or client.

In that case, I would make use of libvncserver as a VNC proxy to allow concurrent connections to localhost, and simply multiplex/interleave the VNC events that get sent down to the dongle, making sure to handle the concurrent access to the serial port to ensure that each client does not stomp on the other.

It would also be fairly simple to reimplement RPC-Server as a wrapper around vncdo, to allow for backwards compatibility with any existing RPC-Server clients.

@RoganDawes
Author

One last comment: using VNC as the primary protocol to the dongle could also help to improve responsiveness, as it would not require the (send "AT+..=event", wait for "OK", repeat) cycle, but would instead stream multiple keystrokes and mouse movements without waiting for any confirmation. In my testing with http://github.com/sensepost/usabuse (admittedly over WiFi and not Bluetooth), I was able to get over 200 characters per second (i.e. 400 key events per second) using VNC, largely because I was not subject to the round trip penalties.
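
A back-of-the-envelope way to see the difference, in Python. The two functions are simple upper bounds, and the figures in the note below (a 10 ms round trip, a link carrying 11520 bytes per second) are made-up illustrative numbers, not measurements from this project.

```python
def stop_and_wait_rate(round_trip_s):
    """Maximum events per second when each event waits for its "OK"."""
    return 1.0 / round_trip_s


def streamed_rate(bytes_per_event, link_bytes_per_s):
    """Maximum events per second when events are sent back to back."""
    return link_bytes_per_s / bytes_per_event
```

With a hypothetical 10 ms round trip, the stop-and-wait bound is about 100 events per second regardless of link speed, whereas streaming 8-byte key events over a link that carries 11520 bytes per second is bounded at 1440 events per second.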

@hosseinzoda

@RoganDawes I don't think using VNC inside relaykeysd would result in a significant improvement in the protocol. This daemon is intended for sharing an HID device between multiple programs, without each of them needing to initiate connectivity at every start. Concurrency is not a big concern for us.

I'm more interested in using IPC protocols to simplify the authentication process.

> One last comment, is that use of VNC as the primary protocol to the dongle could also help to improve responsiveness, as it would not require the ("send AT+..=event", wait for "OK", repeat) cycle, but rather streams multiple keystrokes and mouse movements without waiting for any confirmation. In my testing with http://github.com/sensepost/usabuse, (admittedly WiFi and not Bluetooth), I was able to get over 200 characters per second (i.e. 400 key events per second) using VNC, largely because I was not subject to the round trip penalties.

The performance of relaykeysd was good enough for our purposes. I'm sure it's not the most efficient method. I would be happy to see how VNC could be integrated into this project. I imagine that leaving the RPC protocol as is and implementing VNC as a client would be an easier approach for maintaining the project.

@RoganDawes
Author

Absolutely agree that one should maintain the existing interfaces that others may already be using, while improving the internals of the project where appropriate.

I'm curious as to your reference to an "authentication process". What exactly do you need to authenticate?

@hosseinzoda

Well, as RPC is running over TCP/IP, I considered that it would be insecure to accept commands from any accepted connection.

I've read about VNC a bit. I'm not sure about implementing it in relaykeysd. It may be the correct thing to do, since it has support for multiple concurrent connections. At the moment relaykeysd is only connected to these BLE devices. How important would it be to implement VNC within this daemon?

@hosseinzoda

We may be able to benefit from VNC clients if we consider adding another peripheral for receiving other interfaces, such as the screen.

@RoganDawes
Author

RoganDawes commented Apr 11, 2019

Well, there are two approaches to security in this regard.

  1. We can assume that any connections originating from the same PC are trusted. In that case, listen on the localhost interface only, and it will be impossible to connect to any network services from outside the PC.

  2. VNC actually supports authentication as part of the protocol. Combine that with only listening on localhost to avoid things like DNS rebinding attacks, etc.
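
For the first approach, the important detail is the bind address. A minimal Python sketch (the function name is hypothetical; port 0 just asks the OS for any free port so the example can run anywhere):

```python
import socket


def listen_on_loopback(port=0):
    """Open a TCP listener that is reachable from this machine only."""
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    server.bind(("127.0.0.1", port))   # loopback only, never "0.0.0.0"
    server.listen()
    return server
```

Binding to "0.0.0.0" or an empty host string instead would expose the service on every interface, which is the situation the authentication concern is about.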

As far as implementing VNC within relaykeysd, I'm not 100% familiar with your architecture, so can't really comment.

If I was starting from scratch, I would proceed as follows:

  1. Implement VNC within the dongle, potentially behind an AT command like AT+VNC=<target BLE MAC address>. This allows you to specify to which central device to direct the keystrokes and mouse movements, while also benefiting from the lack of round trips per message/event. Especially when you get to sending mouse movements, those round trips will kill the performance of the solution.
  2. Implement a VNC extension to allow breaking back out to the command line interface when needed, other than by closing the serial port. Obviously, closing the serial port would have the same effect.
  3. Implement a simple VNC proxy server listening on localhost only. Each listening port would correspond to the BLE MAC address of a paired central. (It would be necessary to have an AT command to query which central devices are paired, obviously, and ideally, which are currently connected)
    This VNC proxy server would:
    1. Accept a connection on a port corresponding to one of the paired central devices. These should be configured in a config file/registry entry/whatever, so that the "VNC port to central mappings" do not change unexpectedly. Alternatively, the VNC client could pass the MAC address as a username when connecting to the server to select which central it wishes to send its events to.
    2. When the first connection is made, the VNC proxy would open the dongle, verify that the provided MAC address is paired and available, then issue "AT+VNC=", wait for "OK", then start relaying VNC events to the dongle. If the VNC client disconnects, the proxy server would execute the VNC escape sequence to return to the command line.
    3. If a concurrent VNC connection is made, the new VNC client could request control of the dongle from any other client that is busy. If that other client agrees to yield control of the dongle (e.g. it considers its connection to be idle), it would execute the VNC escape sequence to drop back to the command line interface, then hand the serial port over to the next VNC client thread. That client would then issue its own "AT+VNC=" command to initiate the VNC protocol decoder on the dongle, and commence relaying VNC events.
    4. If a concurrent connection to the same BLE MAC is requested, there would be no need to request control of the dongle, as it is already connected to the desired central. VNC events from each VNC client could be interleaved without issue. Obviously, having both clients emitting keystroke events at the same time would have exactly the same effects as typing on two keyboards at the same time, namely that those events would be intermingled. If this is not desired, making sure that each client requests exclusive control of the dongle would solve this problem.
    5. Rinse and repeat as necessary.
  4. As an alternate interface to VNC, the VNC proxy server could also implement the RPC-Server protocol, and translate that internally to VNC events, requesting control over the dongle as described previously. Alternatively the RPC-Server could be implemented as a separate process, connecting to one or more of the VNC proxy server ports as needed.
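
The bookkeeping in the proxy steps above can be sketched as a small state holder. Everything here is hypothetical: the class and method names are invented, the `ESCAPE` bytes are a placeholder because no escape sequence has been defined yet, and the sketch only returns the bytes to write rather than waiting for "OK" from the dongle. The `AT+VNC=<MAC>` form follows the command proposed in step 1.

```python
class DongleBusy(Exception):
    """Raised when the dongle is serving a different central."""


class DongleArbiter:
    ESCAPE = b"<escape>"  # placeholder; the real sequence is not defined yet

    def __init__(self):
        self.active_mac = None
        self.clients = set()

    def attach(self, client_id, mac):
        """Return the bytes to write to the dongle before relaying events."""
        if self.clients and mac != self.active_mac:
            raise DongleBusy(self.active_mac)   # caller must ask for a yield
        first = not self.clients
        self.active_mac = mac
        self.clients.add(client_id)
        return f"AT+VNC={mac}\n".encode() if first else b""

    def detach(self, client_id):
        """Return the bytes to write when a client disconnects or yields."""
        self.clients.discard(client_id)
        if self.clients or self.active_mac is None:
            return b""
        self.active_mac = None
        return self.ESCAPE                       # back to the command line
```

A second client for the same MAC attaches without any extra command, which matches the "interleave without issue" case; a client for a different MAC gets `DongleBusy` and has to negotiate a handover first.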

Whatever the current user interface for selecting which device to target could be adapted to emit VNC events to an appropriate VNC Proxy Server port/username. Or, existing VNC clients could be used as-is.
And Macros can be composed by using tools such as vncdo as previously mentioned.

I'll probably have a go at implementing some of the above myself, as I have one of the Nordic nRF52840 dongles I mentioned elsewhere, and a couple of Bluetooth-capable devices that I could connect to.

@willwade
Contributor

willwade commented Jun 1, 2020

Hey @RoganDawes - I still ponder this from time to time. It's a super neat idea, but something I haven't had the time to investigate. Over a year on, do you have any fresh thoughts?

@RoganDawes
Author

RoganDawes commented Jun 1, 2020

Hi Will. Yes, I am still pondering this myself. My current thought is to not use the nRF52840 dongle, with the associated complexity of tunneling the VNC protocol over a serial port. I think it would make a lot more sense to use an ESP32, which has both Bluetooth and WiFi capabilities. Then, the ESP32 can present a web interface for configuration purposes (WiFi network configuration, Bluetooth pairing/unpairing, default and/or per-device keyboard mappings, etc), as well as a VNC service per Bluetooth connection.

I envision the process going something like:

  1. Apply power to newly acquired/flashed device.
  2. Connect to a unique AP offered by the device to perform initial configuration.
  3. Enter WiFi SSID and passphrase, and optionally choose a VNC authentication password
  4. Reconnect to preferred SSID, and find the IP address of the ESP32, possibly using Multicast DNS/DNS-SD. e.g. http://devicename.lan, whatever the default devicename might be. Espressif normally names them something like ESP_xxxx where xxxx are the last 4 digits of the MAC address. This is also used in the WiFi SSID, so it should be fairly clear what the hostname should be.
  5. Initiate Bluetooth pairing via the ESP32 web interface, pair device using OS-specific process. Each paired device could get a "slot", which is essentially a reference number that provides stable numbering of the VNC service that appears and disappears depending on whether the Bluetooth peer is connected or not. VNC services normally start at TCP/5900, and increment from there, so the slot would indicate the port number to connect to.
  6. On the assistive computer, create VNC profiles for each connected device using a preferred VNC viewer. e.g. iPad in slot 0 = esp_xxxx:5900, etc.
  7. To interact with each device, execute or switch to the VNC profile. Any keystrokes or mouse movements will appear on the connected device for that profile.
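
The slot-to-port rule from step 5 is simple enough to write down directly. A tiny Python sketch, with a hypothetical helper name and the placeholder hostname from the example above:

```python
VNC_BASE_PORT = 5900  # conventional first VNC port


def vnc_endpoint(hostname, slot):
    """Return the host:port a viewer profile would use for a pairing slot."""
    if slot < 0:
        raise ValueError("slot must be non-negative")
    return f"{hostname}:{VNC_BASE_PORT + slot}"
```

Because the slot number is stable, a saved viewer profile keeps pointing at the same paired device even as devices connect and disconnect.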

https://github.com/asterics/esp32_mouse_keyboard (and other linked repositories) appear to be good starting points.

@RoganDawes
Author

I actually haven't played all that much with Bluetooth keyboards/mice, so I may be a bit off in terms of who initiates the pairing process, the keyboard or the device (e.g. iPad). It may be preferable to have the VNC services listening for each paired slot, regardless of whether the associated device is currently connected. Then a connection on that port would trigger the "keyboard" to connect to the iPad, if it is in range. If not able to connect, the VNC connection could just be closed.

@RoganDawes
Author

I did make some changes to the esp32_mouse_keyboard repo to allow for key instructions to be received on the default serial port. I just pushed those to a fork at https://github.com/RoganDawes/esp32_mouse_keyboard

@willwade
Contributor

willwade commented Jun 1, 2020

Ha! It's a small world - I follow the asterics team closely ;)

I'm trying to figure out the pros and cons of 1) moving to the ESP32 and 2) using VNC. The one thing that immediately comes to mind regarding VNC is that there's a lot more of a software stack for the client to deal with. With it being just an API to talk directly to the board over serial, it's pretty simple. But I do see that it immediately solves all the keycode/scancode and language problems there may be.

If we get some resources/funding I think we should definitely do some R&D into this.

@RoganDawes
Author

I don't know that there is any real software stack on the client. Certainly nothing custom that becomes the responsibility of this project. The client can choose any VNC client that works the way they want to, whether open source or commercial. Configuration is done using a browser, so no stack required there either. This makes the project cross platform (although I suppose the major accessibility/assistive platforms are all Windows based?) with no additional effort from you.

Efforts are focused on firmware for a single device, which can even be pre-flashed with the SSID/passphrase of the recipient's network, if necessary. Then once the recipient receives the device, their carer can assist them to pair the connected device (i.e. invoking the Bluetooth pairing functionality), and after that, the recipient can do all of the configuration for themselves. Assuming some technical capacity, that is.

And, what's more, this should run on a multitude of super-cheap ESP32 platforms without (much?) modification, because it uses no peripherals other than those that are on-chip. i.e. there is no pin mapping variation to worry about, unless you want indicator LEDs.

@willwade
Contributor

willwade commented Jun 1, 2020

Yes - of course - because the VNC server is effectively on the ESP32 over TCP, I get your point. Yes - there is an integration niggle with the assistive technology (AT) companies' stuff - but then there is anyway.

I feel we need to do a fork of our project, as it turns everything on its head a little, but I totally get that it's a cool idea.

There is one query (maybe I'm worrying too much). Reliance on any networking method that depends on TCP is a worry, as we always struggle with kit working in places like schools and hospitals. WiFi isn't always the most reliable, and policies also block the use of some protocols in some environments.

Thanks @RoganDawes for your input. It's really helpful.

@RoganDawes
Author

Ah, I see. Well, there are two possible modes of operation, either as a WiFi Station (i.e. client, getting an address via DHCP), or as a WiFi Access Point (handing out DHCP addresses to clients).
Depending on the user's requirements, if they need to leave home and take the device off their home wifi, we can program a fallback mode. If it can't connect to its pre-configured AP, start up as its own AP.
Then the user's assistive PC can connect to the ESP32's AP, and continue. The one thing to make sure of would be that DNS-SD continues to work when operating in AP mode, so that the preconfigured VNC profiles continue to function.
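
The fallback logic itself is tiny. Here it is as a platform-neutral Python sketch, with the actual WiFi calls passed in as functions, since they depend on the firmware framework. The names and the retry count are assumptions for illustration only.

```python
def choose_wifi_mode(join_station, start_access_point, attempts=3):
    """Try the configured network first, then fall back to hosting an AP.

    `join_station` should return True once connected to the configured AP;
    `start_access_point` brings up the device's own AP.
    """
    for _ in range(attempts):
        if join_station():
            return "station"
    start_access_point()
    return "access_point"
```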

If the user needs to connect their assistive PC to e.g. the hospital network, using their WiFi interface, that could pose a bit of a problem. One doesn't want to have to reprogram the ESP32 for each network that the user may want to connect to. One workaround might be to have a second Wireless interface (e.g. USB dongle) plugged into the assistive PC that is dedicated to connecting to the ESP32, and allow the user to configure their other interfaces as desired.

If the argument is that hospitals are hostile RF environments, and WiFi generally doesn't work, or is disrupted by big electromagnets, etc, I don't imagine that Bluetooth would work too well either?

One final fallback might be to make use of the USB-UART interface of the ESP32, and proceed as described in my initial post. This would be suboptimal, however, because the USB link is not under the control of the ESP32, and as a result, some features such as detecting disconnection of the host are not available.

@willwade
Contributor

willwade commented Jun 1, 2020

Yeah, to be honest, it's more about the practicalities of dealing with the device WiFi and the network. E.g. the issue in hospitals is more about WiFi signals being crappy and networks being locked down than about a restriction on protocols. BLE solves that, as it's straight device to device. Schools generally have very strict rules in place about anything that joins their network, or creates a network. But yes, not insurmountable. It just makes the project a bit trickier.

@RoganDawes
Author

RoganDawes commented Jun 1, 2020

Right. The idea would definitely not be to have the ESP32 joining random networks all over the place. That is why I suggested adding a second WiFi interface to the assistive PC, so that it could manage its own connection to the ESP32.
And for schools that have strict rules about things creating their own WiFi APs, apparently the ESP32 now supports 802.11w (Protected Management Frames), so technical means of disrupting the AP will fail (although I'm not sure if that applies when the ESP32 is operating as its own AP, rather than as a client of a managed AP). From a policy perspective, I would imagine that this would be a sufficiently good reason for granting an exemption from the rule. So we should be ok in all aspects, I think.
Edit: "Currently, PMF is supported only in Station mode." - that's a pity!

@RoganDawes
Author

Another alternative to the ESP32 is to go full Linux, using something like a Raspberry Pi Zero W, and a stack like: https://github.com/quangthanh010290/BL_keyboard_RPI
The advantage to that is that the Pi Zero W can be connected to the host PC using USB networking, and any VNC connections can be done via that USB network interface, rather than WiFi (although WiFi would also be an option).

willwade added a commit that referenced this issue Jun 1, 2020
@joedevsys
Contributor

VNC or similar remote control protocol would not work on Android as it appears that Android doesn't allow this kind of keyboard input in normal operating modes. HID does work on Android.

@RoganDawes
Author

I think you are misunderstanding how the parts fit together in my proposal. The idea is that this project provides a "dongle" that appears to be a Bluetooth keyboard and mouse, and can emit keystrokes and mouse movements. Any device that is capable of connecting a Bluetooth keyboard (this includes most Android devices) would then be able to pair with it, and receive those events.
Most of this discussion about VNC has been with regards to the way of communicating the required keystroke and mouse events to the "dongle", before they are emitted as Bluetooth events. My contention is that VNC is a far more efficient way of sending keystroke and mouse events than AT+MOUSE=-1,+1\n, OK\n sequences, if only because there is no delay waiting for the OK response before sending the next mouse movement. And of course, the encoding of the mouse movements in a binary protocol would be more compact as well, but it is the round trip latency that would make the biggest difference.
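
To put rough numbers on the "more compact" part, here is a quick Python comparison of bytes on the wire for one mouse update. It is only a size comparison: the RFB `PointerEvent` carries an absolute position while the AT command shown is a relative move, and the coordinates are arbitrary example values.

```python
import struct

at_command = b"AT+MOUSE=-1,+1\n"
at_reply = b"OK\n"
# RFB PointerEvent: message type 5, button mask, x, y (big-endian)
rfb_pointer_event = struct.pack(">BBHH", 5, 0, 423, 500)

at_bytes = len(at_command) + len(at_reply)   # command plus its acknowledgement
rfb_bytes = len(rfb_pointer_event)           # no acknowledgement is sent

print(at_bytes, rfb_bytes)
```

That is 18 bytes against 6 per update, though as noted the saved round trip matters more than the saved bytes.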

@joedevsys
Contributor

OK, in that context I agree. The AT command set is very inefficient, although not using it would prevent use of the Bluefruit Friend.

gitbook-com bot pushed a commit that referenced this issue Jun 28, 2022