01-rewrite #279

Merged
merged 39 commits into from
Jun 21, 2024
39 commits
7a09fd6
Merge branch 'main' of https://github.com/benxu3/01 into react-native…
benxu3 May 3, 2024
763026d
add text response interface
benxu3 May 3, 2024
1112ac5
Merge branch 'main' of https://github.com/benxu3/01 into react-native…
benxu3 May 3, 2024
2f594dd
add monospace font
benxu3 May 3, 2024
4994132
Merge branch 'main' of https://github.com/OpenInterpreter/01 into rea…
benxu3 May 6, 2024
926045e
add js docs
benxu3 May 6, 2024
c35d4c0
Merge branch 'main' of https://github.com/OpenInterpreter/01 into rea…
benxu3 May 20, 2024
10681b5
add async-interpreter
benxu3 May 31, 2024
0fbe497
merge upstream
benxu3 May 31, 2024
bf7c81b
Revert "merge upstream"
benxu3 Jun 1, 2024
9e04e2c
remove excess print statements
benxu3 Jun 4, 2024
72f7d14
add realtime tts streaming
benxu3 Jun 12, 2024
5e9f940
fix api keys
benxu3 Jun 14, 2024
2627fba
remove print api key
benxu3 Jun 14, 2024
4b25239
stash server changes
benxu3 Jun 17, 2024
eee00ac
add async interpreter with coqui, openai, elevenlabs tts
benxu3 Jun 18, 2024
5e6dae2
Merge branch 'temp-branch' into async-interpreter
benxu3 Jun 18, 2024
d8d57f3
add plyer pywinctl
benxu3 Jun 18, 2024
3011e55
resolve dateparser dependencies
benxu3 Jun 18, 2024
d59bce5
stash local debug statements
benxu3 Jun 18, 2024
a495b9d
add latency logs
benxu3 Jun 18, 2024
2809835
merge temp branch
benxu3 Jun 18, 2024
8f62be8
add profiles
benxu3 Jun 18, 2024
f1ed90e
Merge branch 'main' of https://github.com/OpenInterpreter/01 into tem…
benxu3 Jun 18, 2024
674cccd
Merge branch 'async-interpreter' into temp-branch
benxu3 Jun 18, 2024
2d6d7f9
Merge pull request #1 from benxu3/temp-branch
benxu3 Jun 18, 2024
0f5c75c
add base device
benxu3 Jun 18, 2024
d9270ef
merge async_interpreter from temp-branch
benxu3 Jun 18, 2024
375ed1f
merge profiles from temp-branch
benxu3 Jun 18, 2024
456ac51
merge server from temp-branch
benxu3 Jun 18, 2024
34bd6ea
remove unused cmd-line files
benxu3 Jun 18, 2024
4850b4a
move llm config to profiles directory
benxu3 Jun 18, 2024
d162ee6
remove unused files
benxu3 Jun 19, 2024
564255a
update docs and remove comments
benxu3 Jun 19, 2024
3642905
Merge branch 'async-interpreter' of https://github.com/benxu3/01 into…
benxu3 Jun 19, 2024
2d15bae
add different sample rates for mic and speakers on 01
benxu3 Jun 21, 2024
2814e1f
add mic buf count and len settings
benxu3 Jun 21, 2024
5b60ec2
set template server and wifi
benxu3 Jun 21, 2024
ef48e9c
update readme for 01 Light speaker sample rate
benxu3 Jun 21, 2024
4 changes: 3 additions & 1 deletion README.md
@@ -127,7 +127,9 @@ If you want to run local speech-to-text using Whisper, you must install Rust. Fo

 ## Customizations

-To customize the behavior of the system, edit the [system message, model, skills library path,](https://docs.openinterpreter.com/settings/all-settings) etc. in `i.py`. This file sets up an interpreter, and is powered by Open Interpreter.
+To customize the behavior of the system, edit the [system message, model, skills library path,](https://docs.openinterpreter.com/settings/all-settings) etc. in the `profiles` directory under the `server` directory. This file sets up an interpreter, and is powered by Open Interpreter.
+
+To specify the text-to-speech service for the 01 `base_device.py`, set `interpreter.tts` to either "openai" for OpenAI, "elevenlabs" for ElevenLabs, or "coqui" for Coqui (local) in a profile. For the 01 Light, set `SPEAKER_SAMPLE_RATE` to 24000 for Coqui (local) or 22050 for OpenAI TTS. We currently don't support ElevenLabs TTS on the 01 Light.

 ## Ubuntu Dependencies

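The README addition pairs each TTS service with a speaker sample rate on the 01 Light. A minimal sketch of that pairing follows; the helper name `speaker_sample_rate` and the lookup table are hypothetical, and only the service names and rates come from the README text.

```python
# Hypothetical helper: only the service names and rates come from the README change.
SPEAKER_SAMPLE_RATES = {
    "coqui": 24000,   # Coqui (local)
    "openai": 22050,  # OpenAI TTS
}


def speaker_sample_rate(tts_service: str) -> int:
    """Return the SPEAKER_SAMPLE_RATE to use on the 01 Light for a TTS service."""
    if tts_service == "elevenlabs":
        # The README states ElevenLabs TTS is not currently supported on the 01 Light.
        raise ValueError("ElevenLabs TTS is not supported on the 01 Light")
    try:
        return SPEAKER_SAMPLE_RATES[tts_service]
    except KeyError:
        raise ValueError(f"Unknown TTS service: {tts_service!r}") from None


print(speaker_sample_rate("coqui"))   # 24000
print(speaker_sample_rate("openai"))  # 22050
```

Raising on `"elevenlabs"` rather than guessing a rate keeps the sketch in line with the README's statement that the combination is unsupported.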
5,844 changes: 4,011 additions & 1,833 deletions software/poetry.lock

Large diffs are not rendered by default.

18 changes: 16 additions & 2 deletions software/pyproject.toml
@@ -28,13 +28,27 @@ psutil = "^5.9.8"
 typer = "^0.9.0"
 platformdirs = "^4.2.0"
 rich = "^13.7.1"
-open-interpreter = {extras = ["os"], version = "^0.2.5"}
 dateparser = "^1.2.0"
 pytimeparse = "^1.1.8"
 python-crontab = "^3.0.0"
 inquirer = "^3.2.4"
 pyqrcode = "^1.2.1"
+realtimestt = "^0.1.12"
+realtimetts = "^0.4.1"
+keyboard = "^0.13.5"
+pyautogui = "^0.9.54"
+ctranslate2 = "4.1.0"
+py3-tts = "^3.5"
+elevenlabs = "1.2.2"
+groq = "^0.5.0"
+open-interpreter = {extras = ["os"], version = "^0.2.6"}
+litellm = "1.35.35"
+openai = "1.30.5"
+pywebview = "*"
+pyobjc = "*"

+sentry-sdk = "^2.4.0"
+plyer = "^2.1.0"
+pywinctl = "^0.3"
 [build-system]
 requires = ["poetry-core"]
 build-backend = "poetry.core.masonry.api"
1 change: 1 addition & 0 deletions software/pytest.ini
@@ -1,5 +1,6 @@
; Config for Pytest Runner.
; suppress Deprecation Warning and User Warning to not spam the interface, but check periodically

[pytest]
python_files = tests.py test_*.py
filterwarnings =
101 changes: 68 additions & 33 deletions software/source/clients/base_device.py
@@ -2,6 +2,7 @@

 load_dotenv() # take environment variables from .env.

+import subprocess
 import os
 import sys
 import asyncio
@@ -46,7 +47,7 @@
 CHUNK = 1024 # Record in chunks of 1024 samples
 FORMAT = pyaudio.paInt16 # 16 bits per sample
 CHANNELS = 1 # Mono
-RATE = 44100 # Sample rate
+RATE = 16000 # Sample rate
 RECORDING = False # Flag to control recording state
 SPACEBAR_PRESSED = False # Flag to track spacebar press state
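
For a sense of what the `RATE` change from 44100 to 16000 means per read, the arithmetic below uses the constants from this hunk. The helpers `chunk_duration_ms` and `bytes_per_second` are illustrative only and are not part of the PR.

```python
CHUNK = 1024  # samples per read, as in base_device.py
SAMPLE_WIDTH = 2  # bytes per sample for 16-bit audio (paInt16)
CHANNELS = 1  # mono


def chunk_duration_ms(rate: int, chunk: int = CHUNK) -> float:
    """Milliseconds of audio covered by one chunk at the given sample rate."""
    return 1000.0 * chunk / rate


def bytes_per_second(rate: int) -> int:
    """Raw PCM bytes produced per second of recording."""
    return rate * SAMPLE_WIDTH * CHANNELS


print(chunk_duration_ms(44100))  # about 23.2 ms per chunk at the old rate
print(chunk_duration_ms(16000))  # 64.0 ms per chunk at the new rate
print(bytes_per_second(44100))   # 88200
print(bytes_per_second(16000))   # 32000
```

At the lower rate each 1024-sample chunk spans more time and the microphone stream carries fewer bytes per second.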

@@ -60,12 +61,18 @@
 # Specify OS
 current_platform = get_system_info()

+
 def is_win11():
     return sys.getwindowsversion().build >= 22000

+
 def is_win10():
     try:
-        return platform.system() == "Windows" and "10" in platform.version() and not is_win11()
+        return (
+            platform.system() == "Windows"
+            and "10" in platform.version()
+            and not is_win11()
+        )
     except:
         return False

@@ -80,9 +87,10 @@ class Device:
     def __init__(self):
         self.pressed_keys = set()
         self.captured_images = []
-        self.audiosegments = []
+        self.audiosegments = asyncio.Queue()
         self.server_url = ""
         self.ctrl_pressed = False
+        self.tts_service = ""

     def fetch_image_from_camera(self, camera_index=CAMERA_DEVICE_INDEX):
         """Captures an image from the specified camera device and saves it to a temporary file. Adds the image to the captured_images list."""
@@ -144,11 +152,25 @@ def queue_all_captured_images(self):

     async def play_audiosegments(self):
         """Plays them sequentially."""
+
+        mpv_command = ["mpv", "--no-cache", "--no-terminal", "--", "fd://0"]
+        mpv_process = subprocess.Popen(
+            mpv_command,
+            stdin=subprocess.PIPE,
+            stdout=subprocess.DEVNULL,
+            stderr=subprocess.DEVNULL,
+        )
+
         while True:
             try:
-                for audio in self.audiosegments:
+                audio = await self.audiosegments.get()
+
+                if self.tts_service == "elevenlabs":
+                    mpv_process.stdin.write(audio) # type: ignore
+                    mpv_process.stdin.flush() # type: ignore
+                else:
                     play(audio)
-                    self.audiosegments.remove(audio)
+
                 await asyncio.sleep(0.1)
             except asyncio.exceptions.CancelledError:
                 # This happens once at the start?
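
The switch from a list to `asyncio.Queue` lets the player wait for the next segment instead of iterating over a list while removing items from it. A minimal sketch of that consumer pattern follows. The names `producer`, `player`, and the `None` sentinel that ends the demo are stand-ins and do not appear in the PR.

```python
import asyncio


async def producer(queue: asyncio.Queue, segments: list) -> None:
    """Enqueue segments in order, then a None sentinel so the demo terminates."""
    for segment in segments:
        await queue.put(segment)
    await queue.put(None)


async def player(queue: asyncio.Queue, played: list) -> None:
    """Consume segments in arrival order; get() suspends until one is available."""
    while True:
        segment = await queue.get()
        if segment is None:
            break
        played.append(segment)  # stand-in for play(audio) or a write to mpv's stdin


async def main() -> list:
    queue: asyncio.Queue = asyncio.Queue()
    played: list = []
    await asyncio.gather(producer(queue, ["a", "b", "c"]), player(queue, played))
    return played


result = asyncio.run(main())
print(result)  # ['a', 'b', 'c']
```

Because `get()` yields to the event loop while the queue is empty, the consumer does not need to poll, and segments are played in the order they were received.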
@@ -267,19 +289,18 @@ def toggle_recording(self, state):
     def on_press(self, key):
         """Detect spacebar press and Ctrl+C combination."""
         self.pressed_keys.add(key) # Add the pressed key to the set
-

         if keyboard.Key.space in self.pressed_keys:
             self.toggle_recording(True)
-        elif {keyboard.Key.ctrl, keyboard.KeyCode.from_char('c')} <= self.pressed_keys:
+        elif {keyboard.Key.ctrl, keyboard.KeyCode.from_char("c")} <= self.pressed_keys:
             logger.info("Ctrl+C pressed. Exiting...")
             kill_process_tree()
             os._exit(0)

         # Windows alternative to the above
         if key == keyboard.Key.ctrl_l:
             self.ctrl_pressed = True

         try:
             if key.vk == 67 and self.ctrl_pressed:
                 logger.info("Ctrl+C pressed. Exiting...")
@@ -289,17 +310,17 @@ def on_press(self, key):
         except:
             pass

-
-
     def on_release(self, key):
         """Detect spacebar release and 'c' key press for camera, and handle key release."""
-        self.pressed_keys.discard(key) # Remove the released key from the key press tracking set
+        self.pressed_keys.discard(
+            key
+        ) # Remove the released key from the key press tracking set

         if key == keyboard.Key.ctrl_l:
             self.ctrl_pressed = False
         if key == keyboard.Key.space:
             self.toggle_recording(False)
-        elif CAMERA_ENABLED and key == keyboard.KeyCode.from_char('c'):
+        elif CAMERA_ENABLED and key == keyboard.KeyCode.from_char("c"):
             self.fetch_image_from_camera()

     async def message_sender(self, websocket):
@@ -332,35 +353,48 @@ async def exec_ws_communication(websocket):
                 chunk = await websocket.recv()

                 logger.debug(f"Got this message from the server: {type(chunk)} {chunk}")
+                # print("received chunk from server")

                 if type(chunk) == str:
                     chunk = json.loads(chunk)

-                message = accumulator.accumulate(chunk)
+                if chunk.get("type") == "config":
+                    self.tts_service = chunk.get("tts_service")
+                    continue
+
+                if self.tts_service == "elevenlabs":
+                    message = chunk
+                else:
+                    message = accumulator.accumulate(chunk)

                 if message == None:
                     # Will be None until we have a full message ready
                     continue

                 # At this point, we have our message

-                if message["type"] == "audio" and message["format"].startswith("bytes"):
+                if isinstance(message, bytes) or (
+                    message["type"] == "audio" and message["format"].startswith("bytes")
+                ):
                     # Convert bytes to audio file

-                    audio_bytes = message["content"]
-
-                    # Create an AudioSegment instance with the raw data
-                    audio = AudioSegment(
-                        # raw audio data (bytes)
-                        data=audio_bytes,
-                        # signed 16-bit little-endian format
-                        sample_width=2,
-                        # 16,000 Hz frame rate
-                        frame_rate=16000,
-                        # mono sound
-                        channels=1,
-                    )
-
-                    self.audiosegments.append(audio)
+                    if self.tts_service == "elevenlabs":
+                        audio_bytes = message
+                        audio = audio_bytes
+                    else:
+                        audio_bytes = message["content"]
+
+                        # Create an AudioSegment instance with the raw data
+                        audio = AudioSegment(
+                            # raw audio data (bytes)
+                            data=audio_bytes,
+                            # signed 16-bit little-endian format
+                            sample_width=2,
+                            # 22,050 Hz frame rate
+                            frame_rate=22050,
+                            # mono sound
+                            channels=1,
+                        )
+
+                    await self.audiosegments.put(audio)

                 # Run the code if that's the client's job
                 if os.getenv("CODE_RUNNER") == "client":
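
The receive loop in this hunk handles three cases: a `config` message that sets the TTS service, raw bytes passed straight through for ElevenLabs, and everything else going through the accumulator. A simplified sketch of that routing follows. `Router`, `route`, and the pass-through accumulator are hypothetical stand-ins rather than the PR's code, and the `isinstance(chunk, dict)` guard is an addition of this sketch so that a bytes chunk is never asked for `.get`.

```python
import json
from typing import Any, Callable, Optional


class Router:
    """Hypothetical stand-in for the routing inside exec_ws_communication."""

    def __init__(self, accumulate: Callable[[Any], Optional[Any]]):
        self.tts_service = ""
        self.accumulate = accumulate

    def route(self, chunk: Any) -> Optional[Any]:
        """Return a complete message, or None if nothing is ready yet."""
        if isinstance(chunk, str):
            chunk = json.loads(chunk)

        # A config message only updates state; it is never played or executed.
        if isinstance(chunk, dict) and chunk.get("type") == "config":
            self.tts_service = chunk.get("tts_service", "")
            return None

        # ElevenLabs audio arrives as raw bytes and bypasses the accumulator.
        if self.tts_service == "elevenlabs":
            return chunk

        return self.accumulate(chunk)


router = Router(accumulate=lambda chunk: chunk)  # trivial accumulator for the demo
router.route('{"type": "config", "tts_service": "elevenlabs"}')
print(router.tts_service)         # elevenlabs
print(router.route(b"\x00\x01"))  # b'\x00\x01'
```

Keeping the `config` check ahead of the accumulator means the service selection takes effect before any audio chunk is interpreted.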
@@ -369,7 +403,7 @@ async def exec_ws_communication(websocket):
                         code = message["content"]
                         result = interpreter.computer.run(language, code)
                         send_queue.put(result)

         if is_win10():
             logger.info("Windows 10 detected")
             # Workaround for Windows 10 not latching to the websocket server.
@@ -399,6 +433,7 @@ async def start_async(self):

         # Start watching the kernel if it's your job to do that
         if os.getenv("CODE_RUNNER") == "client":
+            # client is not running code!
             asyncio.create_task(put_kernel_messages_into_queue(send_queue))

         asyncio.create_task(self.play_audiosegments())