Agent apparently rewrites entire code file and loses a bunch of work. #44
Comments
Yes, I'm also working on improving AIlice's ability to modify and edit code in complex software projects. It often overwrites previous work, and this problem is largely because the AI lacks a suitable text editor. If a convenient text editor were available, the LLM would likely prefer modifying existing code to generating a new code file. Currently, there are a few candidate solutions:
The first method is natively supported but has the drawback that the LLM might need to use many escape characters in the text replacement expressions, which it is not very good at, leading to potential errors. I have already implemented the second method but haven't yet submitted the corresponding changes to the prompt (so you can't use it yet). This method works in tests but is not ideal yet. The third method is more complex. Leveraging the model's existing knowledge when providing tools to the LLM is an effective technique: teaching it to do complex things through prompts incurs significant cost and is less effective than using tools it already learned during pre-training. This is still an ongoing experiment. I can upload the complete code to dev soon, and if you're interested, you can give it a try. I'm also interested in solutions for directly editing functions or classes in code. Do we have a command-line tool for that kind of code editing?
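To make the escaping problem concrete, here is a minimal sketch of a text-replacement tool that sidesteps it by matching an exact substring instead of a regex. The name `REPLACE` and its behavior are my own illustration, not AIlice's actual interface:

```python
# Hypothetical sketch: an exact-match, single-occurrence replacement tool.
# Exact matching avoids the escape-character problem that regex-style
# replacement expressions cause for the LLM.
def REPLACE(path: str, old: str, new: str) -> str:
    """Replace exactly one occurrence of `old` with `new` in the file at `path`."""
    with open(path, "r", encoding="utf-8") as f:
        text = f.read()
    count = text.count(old)
    if count != 1:
        # Refuse ambiguous or missing matches so the model gets explicit feedback.
        return f"ERROR: expected exactly one match of the given text, found {count}."
    with open(path, "w", encoding="utf-8") as f:
        f.write(text.replace(old, new, 1))
    return "OK: replacement applied."
```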
I am working on ingesting C++ methods, classes, and functions, and I also have untested code for ingesting Python, Rust, and JavaScript. Check out the R2R framework; it may be just what is needed for ingestion for autocoding: a doc store, a graph DB (Neo4j), a vector DB, and Postgres, with agentic RAG pipelines, etc. I might add my code ingestion work there. I think it's possible to give the agent a "map" of a project and let it edit individual classes or methods without having to rewrite everything else (a rough sketch of such a map follows below). As long as it is grounded by unit tests along the way, I think the flow from AlphaCodium will be successful for autocoding. This could be the perfect long-term memory for AIlice.
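A rough sketch of the "project map" idea, using Python's ast module for Python files only (illustrative; a multi-language version would need tree-sitter or libclang):

```python
import ast
import pathlib

def project_map(root: str) -> dict[str, list[str]]:
    """Return {relative_path: [top-level classes/functions with line ranges]}."""
    result = {}
    for path in pathlib.Path(root).rglob("*.py"):
        tree = ast.parse(path.read_text(encoding="utf-8"))
        entries = []
        for node in tree.body:
            if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
                entries.append(
                    f"{type(node).__name__} {node.name} "
                    f"(lines {node.lineno}-{node.end_lineno})"
                )
        result[str(path.relative_to(root))] = entries
    return result

# Example: print(project_map("./ailice"))
```

The agent would see this map plus the one entity it is editing, rather than the whole file.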
I am very much looking forward to seeing the progress of this work! I have recently realized the importance of extracting entities based on classes or functions, which is exactly what you are working towards. Once we have a better long-term memory, we can enable AIlice to accumulate experience automatically into long-term memory, just as you mentioned with procedures. This will be a controllable and interpretable learning mechanism, which is very promising!
I have basic ingestion working, but I got diverted temporarily. I would rather use an existing solution for this sort of thing than write it myself. But if you want to see a working (basic) ingestion of C++ classes/methods/functions, see my ngest repo. There's Python, Rust, and JavaScript ingestion in there as well, but that part is untested: https://github.com/FellowTraveler/ngest

Preferred: I found a great project that runs in Docker and has a clean API that I think you will really like for RAG and memory purposes. It supports GraphRAG, HyDE, hybrid search, all kinds of ingestion, etc., and works great in my testing so far. Probably I will just add my code ingestion to it instead of reinventing the wheel:
https://r2r-docs.sciphi.ai/api-reference/introduction
https://github.com/SciPhi-AI/R2R

Also, aider has a repo map and uses tree-sitter to build an abstract syntax tree. Worth checking out when I have the time.
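For the tree-sitter angle, here is a hedged sketch of extracting C++ classes and function definitions; it assumes the `tree_sitter_languages` helper package (bundled grammars) and is not taken from ngest or aider:

```python
from tree_sitter_languages import get_parser  # prebuilt grammars, including C++

def extract_cpp_entities(source: bytes) -> list[tuple[str, int, int]]:
    """Return (node_type, start_line, end_line) for classes and function definitions."""
    parser = get_parser("cpp")
    tree = parser.parse(source)
    entities, stack = [], [tree.root_node]
    while stack:
        node = stack.pop()
        if node.type in ("class_specifier", "function_definition"):
            entities.append((node.type, node.start_point[0] + 1, node.end_point[0] + 1))
        stack.extend(node.children)
    return entities

print(extract_cpp_entities(b"class Foo { public: void bar() {} };"))
```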
Last time you mentioned R2R, I did some research on it and wrote a long-term memory module to test its performance, but due to recent personal matters I haven't completed this work yet. I will test your code and R2R and consider addressing the code editing issues. Thank you for the clues and information you provided; they have been very helpful. Additionally, I found that the current storage module implicitly requires the queried text to be the original text that was stored. This makes it difficult to apply more advanced memory mechanisms, such as knowledge graphs. Therefore, I will consider doing some refactoring and cleanup to pave the way for the long-term memory module.
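One possible shape for that refactoring, sketched under my own assumptions (this is not AIlice's current storage interface): decouple the text that gets indexed from the payload that is returned, so a graph node or summary can be indexed while the original chunk comes back.

```python
import math
from dataclasses import dataclass

@dataclass
class MemoryRecord:
    index_text: str   # what gets embedded / matched during retrieval
    payload: dict     # what the caller gets back (code chunk, graph node id, ...)

class DecoupledStore:
    def __init__(self, embed):
        self.embed = embed  # callable: str -> list[float]
        self.records: list[tuple[list[float], MemoryRecord]] = []

    def add(self, index_text: str, payload: dict) -> None:
        self.records.append((self.embed(index_text), MemoryRecord(index_text, payload)))

    def query(self, text: str, k: int = 3) -> list[dict]:
        q = self.embed(text)

        def cosine(a, b):
            denom = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
            return sum(x * y for x, y in zip(a, b)) / denom if denom else 0.0

        ranked = sorted(self.records, key=lambda r: cosine(q, r[0]), reverse=True)
        return [rec.payload for _, rec in ranked[:k]]
```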
So far all my code does is basic ingestion of C++ code into a knowledge graph, with relationships created between the classes and methods. I haven't tested the ingestion for other languages yet (though code is there for Python, Rust, and JavaScript). I also added PDF ingestion, but that's where I start looking at projects like R2R, because why reinvent the wheel? R2R isn't the only one; there are many others, and I haven't had time to investigate them as much as I'd like. At least with R2R, I have it up and running and it appears to be good stuff. I like how clean the API is, and I like the functionality it provides. I will probably end up putting my C++ ingestion code into R2R, but we'll see; I still have more research to do. Others similar to it that need more investigation are:

- https://github.com/infiniflow/ragflow
- https://github.com/mem0ai/mem0
- https://github.com/D-Star-AI/dsRAG
- https://github.com/neuml/txtai
- https://github.com/EdwardDali/erag
- https://github.com/iliab1/Neo4j-RAG-Chatbot
- https://github.com/jayavibhavnk/GraphRetrieval
- https://github.com/kingjulio8238/memary
- https://github.com/bradAGI/GraphMemory
- https://github.com/AI-Commandos/RAGMeUp
- https://github.com/GoodAI/charlie-mnemonic
- https://github.com/fiatrete/OpenDAN-Personal-AI-OS

NOTE: some of these are agent projects, but I put them here if they seemed to incorporate memory features that may prove useful. The rest are specific to agentic memory and/or RAG.
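For reference, storing class/method relationships in a knowledge graph can be as simple as the following sketch using the official neo4j Python driver; the labels and relationship names here (Class, Method, HAS_METHOD) are illustrative, not ngest's actual schema, and the connection details are placeholders.

```python
from neo4j import GraphDatabase

driver = GraphDatabase.driver("bolt://localhost:7687", auth=("neo4j", "password"))

def ingest_class(tx, class_name: str, file_path: str, methods: list[str]) -> None:
    """Create a Class node, its Method nodes, and HAS_METHOD relationships."""
    tx.run("MERGE (c:Class {name: $name, file: $file})", name=class_name, file=file_path)
    for method in methods:
        tx.run(
            "MATCH (c:Class {name: $name}) "
            "MERGE (m:Method {qualified_name: $qn}) "
            "MERGE (c)-[:HAS_METHOD]->(m)",
            name=class_name,
            qn=f"{class_name}::{method}",
        )

with driver.session() as session:
    session.execute_write(ingest_class, "Widget", "src/widget.hpp", ["draw", "resize"])
driver.close()
```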
I noticed the agent was working on the code and the file got bigger and bigger: say 3k, then 6k, then 9k.
...Then at some point the file was 2 or 3k again. Basically, the agent had just deleted everything it had done and rewritten the file.
I think that code files should not be thought of as "files" to the agent.
Instead, whether the agent writes the code or ingests it, the agent should determine how to split the file up (into a series of functions, for example), should have corresponding test(s) for each function, and should only make changes to one specific function at a time.
The entire file (or maybe just the header, in the case of C++) should still be in the context if possible, but the only part actually being edited should be that one specific function, which should be stored in its own place. Then, when the agent writes the file to disk, it "compiles" the final text file from those chunks. The agent should not have the power to rewrite the entire file at once and re-save it; that power prevents it from working through the problem set systematically, and instead causes it to loop endlessly, starting over at the beginning with a rewrite and losing work it had already done and even debugged.
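As a rough illustration of that idea (my own sketch, not AIlice's current code): split a file into named chunks, only ever replace one chunk, and recompose the file on disk from the chunks.

```python
import ast

def split_into_chunks(source: str) -> list[tuple[str, str]]:
    """Split a module into an ordered list of (name, text) chunks.
    Top-level defs/classes become named chunks; everything else stays as filler."""
    tree = ast.parse(source)
    lines = source.splitlines(keepends=True)
    chunks, cursor = [], 0
    for node in tree.body:
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
            start, end = node.lineno - 1, node.end_lineno
            if start > cursor:  # imports, comments, code between definitions
                chunks.append(("__other__", "".join(lines[cursor:start])))
            chunks.append((node.name, "".join(lines[start:end])))
            cursor = end
    if cursor < len(lines):
        chunks.append(("__other__", "".join(lines[cursor:])))
    return chunks

def replace_chunk(chunks, name, new_text):
    """The agent edits exactly one named chunk; everything else is untouched."""
    return [(n, new_text if n == name else t) for n, t in chunks]

def compile_file(chunks) -> str:
    """Reassemble the final file that gets written to disk."""
    return "".join(t for _, t in chunks)
```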
This happened tonight with Claude-Sonnet-3.5 BTW. But I think I've also seen it with GPT-4o, deepseek-coder and/or qwen2 72b instruct.
BTW, you may already be doing something like this; I saw hints of it in the log. But I still saw the problem, so maybe I just need to understand the code better so that I can help prevent this sort of issue from happening.