Project Report
Throughout the seminar "New Digital Literacy - From GOFAI to ML" we discussed the emergence of machine learning (ML) from a theoretical standpoint, covering everything from its history to its applications. The second focus of the seminar was the practical application of this theory: developing our own neural networks (NNs) using ML5 (JavaScript). Both sides of the seminar were directed towards the development of an artistic project that makes use of machine learning. This report covers the development process of the joint ML project by Şerban-Aurelian Gorga and Kadir Daniel Arslan.
During the seminar, we independently began conceptualizing our projects while learning about the possibilities ML offers as we practiced in the p5 web editor. After learning about each other's initial ideas and discussing them in class, we requested to collaborate on a shared project, which was approved. At this point, it is worth noting the working conditions for the majority of this project's development. We were located in different places and had to take the state of the COVID-19 pandemic into account. Under these circumstances, we could only meet virtually to discuss our project and progress. In the beginning, we had to negotiate how best to collaborate from afar and with conflicting schedules. The solution presented itself in asynchronous work via GitHub and an easily comparable working environment set up with Visual Studio Code.
Next came the obvious question: how do we combine our thoughts and ideas into a coherent ML project? Originally, we had an idea for a p5 project using artificial intelligence (AI) to create an interactive user interface (UI), dubbed "p5aiui". Bringing our ideas together with the knowledge from the seminar, we soon focused our attention on the potential of image classification. Interactivity was to remain an integral feature, but we realized early on that this would require a well-trained ML algorithm to inform the possible interactions. This naturally led to the idea of using simple, clearly defined shapes as the basis of user input: the algorithm had to be trained on those specific shapes to reliably recognize them, and the user should be allowed to create their own shapes. These custom shapes (from the defined list the ML algorithm recognizes) were to be given characteristics defining the possible interactions with other shapes upon creation. This evolved into a sandbox concept in which all user-created shapes receive characteristics from the AI and can then interact with each other in a simulated physical space.
Our initial outline for the project therefore involved:
- a drawing feature for the user to input a custom shape;
- a trained ML neural network to recognize the shape;
- an algorithm for deriving shape characteristics based on the NN's output;
- a simulation of physics which would animate the shapes to interact with each other in novel ways every time.
Even though the third and fourth parts never came to fruition, the work on the drawing feature and machine learning would constitute the fundamental building blocks of the final project.
3. Sandbox project: First prototype of drawing feature and shape classifier (circles, squares, triangles)
We began working asynchronously on our outlined sandbox concept with regular digital meetings to discuss the next steps and progress so far. The following section will cover the relevant steps during this stage of development.
The implementation of a canvas for drawing seemed like a good place to start. It was and remains a central part of the project as well as being a relatively simple feature to code. The process of programming the canvas functions served the additional purpose of introducing us to the logic of JavaScript. By experimenting with how we could make drawing work elegantly we got used to working with JavaScript and our shared workflow. Even so, a canvas with the ability to draw proved more complex than we at first anticipated.
The initial conception of the project required a straight-line drawing feature. It would allow users to seamlessly "switch" between drawing straight lines and free lines by holding shift while clicking and dragging, so that drawing triangles, squares, and circles would feel intuitive and easy. Commits d25b31b and c2c67ce illustrate how this was thought through. The main challenge posed by this feature was understanding the roles of the various arrays that constructed the different lines on the canvas; ending one line and beginning another in relation to mouse clicks, and making sure to redraw everything at every iteration of draw(), requires a lot of 'juggling' with storing and reading information. Going through this process led up to the optimizations in 8af1a3e, where the feature was subsequently removed and the structure of the code significantly simplified.
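The array 'juggling' described above can be illustrated with a minimal model of the stroke-storage logic. This is a hedged sketch: the function and variable names (startStroke, extendStroke, endStroke) are illustrative, not taken from the project's actual code, and the p5.js mouse events that would call them are omitted.

```javascript
// Minimal model of the line-storage logic behind the drawing canvas.
// Each stroke is an array of {x, y} points; `lines` holds finished strokes.
// In straight-line mode (shift held), the stroke keeps only its start point
// and the current mouse position, so redrawing yields a straight segment.
const state = { lines: [], current: null };

function startStroke(x, y) {
  state.current = [{ x, y }];
}

function extendStroke(x, y, straight) {
  if (!state.current) return;
  if (straight) {
    // keep only the anchor point plus the latest position
    state.current = [state.current[0], { x, y }];
  } else {
    state.current.push({ x, y });
  }
}

function endStroke() {
  if (state.current) state.lines.push(state.current);
  state.current = null;
}

// Free-hand stroke: every dragged position is stored.
startStroke(0, 0);
extendStroke(5, 5, false);
extendStroke(10, 3, false);
endStroke();

// Straight stroke: only two points survive, however far the mouse moves.
startStroke(0, 0);
extendStroke(5, 9, true);
extendStroke(20, 20, true);
endStroke();
```

In the real sketch, draw() would then redraw every stored stroke each frame, which is exactly where the bookkeeping gets delicate.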
3.2 Development of ML classification - including data set creation (Processing), training, testing, issues, and fixes, trigonometry
Image classification through an ML algorithm was the next major feature we started implementing early on. This was paramount to testing the efficacy of the code related to creating a training data set, the interaction with the drawing feature, and experimenting with our options for using the results. Here, our knowledge from the seminar sessions and further consultation of Daniel Shiffman's videos on ML were a significant part of successfully implementing ML in our project. The necessary code was soon in place, but training it became the challenging focus of our attention. We had drawn up a long list of possible shapes for the sandbox, most of which would never be implemented. Thinking of easily recognizable and distinct shapes, we landed on circles, squares, and triangles as the initial three. A data set for training the ML algorithm could not contain hundreds of identical triangles, though, so we took to randomizing the shapes' parameters to automatically create varied shapes that still fell into clear categories. This functionality in particular was at once critically important and surprisingly difficult. Apart from a refresher in trigonometry, it also required a solid understanding of JavaScript to run smoothly in full automation. Our goal was the creation and saving of a complete, ordered data set for training.
At the very beginning, we tried generating the data set directly in p5.js (b31f02d), but this was replaced with a Processing script because browsers could not download the images as quickly as they were being produced. One could have avoided this problem by either a) not saving the images as files but loading them straight into the NN, or b) reducing the speed at which they were being generated. However, both of these would have turned out more disadvantageous: a) could raise memory problems depending on the browser, machine, and image properties, and generating data sets entirely in memory would also be far more volatile, that is, easier to lose and harder to reproduce. Solution b), on the other hand, would simply slow down the entire process of training. At 30 FPS, 300 images are generated in 10 seconds, while at 5 FPS, the same amount would take a whole minute. At this scale, the difference is not drastic, but at larger scales, it would always make training at least six times slower, memory issues notwithstanding. Even if a viable solution for p5.js could have been found, separating this functionality into a Processing script helped structure the project better and was a learning experience regarding the different 'flavors' that JavaScript-like sketching environments come in.
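The throughput arithmetic above can be spelled out in a couple of lines, assuming one image is generated and saved per frame:

```javascript
// Images produced in `seconds` seconds at a given frame rate,
// assuming exactly one image per frame.
function imagesIn(seconds, fps) {
  return seconds * fps;
}

const fast = imagesIn(10, 30); // 300 images in 10 s at 30 FPS
const slow = imagesIn(60, 5);  // the same 300 images take a full minute at 5 FPS
const slowdown = 30 / 5;       // generation is 6x slower at 5 FPS
```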
Earlier versions of the shape generators (abfe23e) were a 'fork' of Daniel Shiffman's shape classifier dataset generator. As our project changed scope and theme, the basic geometric shapes were no longer sufficient for accurate classification. Nevertheless, we did try 'recycling' the datasets and models we already had (i.e. the ones trained on triangles, squares, and circles) and applying them to drawings of leaves. This turned out to be less accurate than random guessing. Possible causes were either that a) not all leaves look like geometric primitives, or b) the recognition of primitives was inaccurate to begin with. We therefore reworked Shiffman's code, keeping the outermost structure constant: after setup(), iterate through draw() n times, where n is a set number of frames; at every iteration of draw(), a set of instructions produces a geometric shape which is drawn on the canvas and saved to file; the next iteration erases the previous drawing and begins a new one; when the condition frameCount === n is met, exit.
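That outer structure can be sketched as a plain loop. This is an illustrative model only: generateShape and the file-name scheme are invented stand-ins, and the real sketch draws to a canvas and saves each frame instead of collecting names in an array.

```javascript
// Sketch of the generator's outer loop as described above. In the real
// Processing/p5 sketch, draw() runs once per frame; here we simulate n
// iterations of that loop.
const n = 5; // number of training images to produce
const saved = [];

function generateShape(frameCount) {
  // stand-in for the randomized drawing instructions; returns a label
  const kinds = ["circle", "square", "triangle"];
  return kinds[frameCount % kinds.length];
}

for (let frameCount = 1; frameCount <= n; frameCount++) {
  // each iteration erases the previous drawing, draws one shape, saves it
  const shape = generateShape(frameCount);
  saved.push(shape + "_" + frameCount + ".png");
  if (frameCount === n) break; // exit condition, as in the original sketch
}
```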
As the project quickly grew in size and complexity, we had to reconsider our organizational structure. Splitting larger files apart and renaming them to more accurately reflect their role and the functions they contained was inevitable. These early stages saw multiple restructurings during the iterative process that eventually resulted in the file structure that remained largely consistent for the rest of development, barring the additional files and folders later added to this fundamental structure.
The first "radical" change in the structure of the project was the separation of functionalities by category into different .js files. The first part to be split off was the dataset generator, as already mentioned, followed by the training functionality and the input controls. In the case of the training functionality, we separated it completely from sketch.js, loading it into its own .html file. This is an optimization that stops the dataset from being loaded when we don't want to train.
Other structural problems are exemplified or detailed in:
This segment reviews the reworking of the initial concept outlined above into what would become the final project.
After a substantial amount of time working on this project asynchronously, we had made good progress but were still a long way from the program we envisioned. More details and necessary or interesting adjustments and additions kept piling up until we had to face the fact that the scope had grown too big. Able to meet in person for the first time since the beginning of the project, we reconvened on how to proceed. The work that had gone into it so far should not be in vain, but we had to reorient ourselves towards a more achievable goal that could make use of our existing code. Taking inspiration from the changing seasons, we came up with what would be called the "October" idea. The intricate system of characteristics through which shapes interact with each other was scrapped in favor of a more direct translation of the image classification's result. Drawing would remain the form of user input, but instead of abstract shapes, users were to draw a leaf. A newly trained ML image classification algorithm would then recognize the drawing as a specific type of leaf. Depending on the type, a tree would be generated, uniquely adorned by the user's leaves. This new concept would create more relatable pieces of algorithmic art than the previous one while also limiting the scope of work so that a finished program could be produced. Additional features were discussed, but lessons learned made us focus on the core of this reimagined concept. Nonetheless, some of these ideas, like randomized names for each tree and the ability to save the resulting image, made it into the final version after all.
4.1 Salvaging the existing code - where did changes need to be made and what could be kept unaltered
It has already been mentioned that repurposing the existing code was one of our priorities. To this end, we reviewed our work so far and found a substantial amount that could be salvaged. The drawing feature in particular was kept, undergoing only minor alterations to better fit the purpose of drawing leaves. The core of the ML implementation was kept as well, but the creation of its training data set in particular had to be reworked almost entirely. With the new end goal of the project, the overall structure underwent a few changes too, but did not depart from the established ordering logic.
At first, our set of leaf categories included six different types: entire leaves and leaves with 3, 4, 5, 6, and 7 lobes respectively. All of these had to be procedurally generated in order to allow for the creation of large datasets. Entire leaves were imitated with ovals of different lengths, widths, and rotations, as in e760ab3. Lobed leaves were less straightforward, being constructed of 3-7 curves starting at an angle of 0 and going around a circle in steps of 2π divided by the number of lobes. This required a basic understanding of trigonometry to find the x and y coordinates of each point around the circle, and a mathematical analysis of parabolic functions in a Cartesian plane. In an attempt to increase classification accuracy, we ultimately reduced the categories of leaves to two, namely entire and lobed (of all kinds). Additionally, we narrowed the range of lobe counts, implemented lobed leaf rotation and randomized scaling, and adjusted the configuration of entire leaves so as not to produce any circles (i.e. to never have very similar widths and heights for the ovals). A great challenge of increasing classification accuracy was understanding which features of a drawing determine its categorization: all of the above measures were taken to reduce the possible overlap between the categories, in the hope of improving the discrimination capacities of the NN.
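The trigonometry involved can be sketched as follows, assuming a hypothetical lobePoints function (the name and the fixed radius are illustrative, not from the project): starting at angle 0 and stepping in increments of 2π divided by the number of lobes, each lobe's anchor point is found with cosine and sine.

```javascript
// Anchor points for the lobes of a procedurally generated leaf:
// walk around a circle of the given radius in equal angular steps.
function lobePoints(numberOfLobes, radius) {
  const step = (2 * Math.PI) / numberOfLobes;
  const points = [];
  for (let i = 0; i < numberOfLobes; i++) {
    const angle = i * step; // 0, step, 2*step, ...
    points.push({
      x: radius * Math.cos(angle),
      y: radius * Math.sin(angle),
    });
  }
  return points;
}

// A 4-lobed leaf of radius 1 anchors its lobes at the four compass points.
const pts = lobePoints(4, 1);
```

In the actual generator, a curve is then drawn between consecutive anchor points to form each lobe; rotation and scaling can be added by rotating and scaling these coordinates.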
To start us off, we forked Daniel Shiffman's p5.js fractal tree. We integrated the function branch() with tree(), which fetches the results from handleResult() and passes them down to branch(). tree() manipulates the length and angle of the branches in relation to the result of the NN's classification, that is, based on category and confidence: entire-leaved trees are "slimmer" (i.e. have a narrower angle of rotation), while lobe-leaved trees are "broader" (i.e. have a greater angle of rotation). For both categories, the height of the tree is determined by the confidence of the classification. The leaves are "collected" from the upper-left-hand side of the canvas through the canvas.get() method and cloned in pairs at the end of each branch. The image of the leaf is blended in LIGHTEST mode in order to remove its black background. The start of the leaf is "matched" with the branch only if the drawing starts from the upper-left corner of the designated area (this also allows floating leaves). Lastly, the trees are assigned a fictional botanical name composed of a genus and a species. Both types of trees have a list of potential genera which approximately correspond to reality, while there is only one list of species. Therefore, the possible options based on the NN's output are "random entire-leaved genus + random species" or "random lobe-leaved genus + random species".
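The mapping described above can be sketched in a few lines. This is a hedged illustration: all concrete numbers and list contents are invented placeholders, and only the directions of the mapping (entire leaves yield a narrower angle, lobed leaves a broader one, higher confidence a taller tree, and per-category genus lists with a shared species list) follow the text.

```javascript
// Illustrative mapping from the classifier's output to tree geometry
// and a fictional botanical name. Numbers and lists are placeholders.
const entireGenera = ["Fagus", "Ulmus"];   // placeholder entire-leaved genera
const lobedGenera = ["Quercus", "Acer"];   // placeholder lobe-leaved genera
const species = ["fictilis", "octobris"];  // placeholder shared species list

function treeParams(label, confidence) {
  // narrower branching angle for entire leaves, broader for lobed leaves
  const angle = label === "entire" ? Math.PI / 8 : Math.PI / 4;
  // higher classification confidence yields a taller tree
  const height = 100 + 200 * confidence;
  return { angle, height };
}

function pick(list) {
  return list[Math.floor(Math.random() * list.length)];
}

function treeName(label) {
  const genera = label === "entire" ? entireGenera : lobedGenera;
  return pick(genera) + " " + pick(species);
}

const slim = treeParams("entire", 0.9);
const broad = treeParams("lobed", 0.9);
const name = treeName("lobed");
```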
As evidenced by the development process detailed above, it was unavoidable to run into limitations. The fundamental factors of time and prior knowledge always limit the scope of the end result. More interesting, though, is a review of the limits of the final product itself and potential points of further development. As the october-project stands today, it allows the user to draw a leaf inside a designated box in the top left of the screen. The user-drawn leaf is then classified by the ML algorithm as either an entire or a lobed leaf. A semi-random tree is then generated, the leaves are placed on it, and a botanical name is given to the tree. Both the tree shape and the name are in part dependent on the image classification result. Finally, the unique tree can be saved, or a new one created by starting the process over.
The drawing feature is where the limits first become apparent, as the user is instructed to start the drawing in the upper-left-hand corner as well as finish it in a single stroke. The program does not crop the image to use only the leaf shape itself, but instead takes the top left corner as the connecting point with the tree. Once the user releases the left mouse button, the image is immediately classified and all following steps are executed to create the end result. A possible change here would be to allow the user as many strokes as they wish and only trigger the rest of the code once a designated button has been pressed. This change would also open up the possibility of a more varied drawing feature, with different stroke widths for example.
The next step has the strictest limitations and the greatest potential for improvement. The image classification only recognizes two distinct leaf shapes and could be expanded to include more categories, as had been conceptualized at first. Critically, the classification accuracy needs to be drastically improved. Despite various attempts at creating effective training data sets, the ML algorithm always skewed towards recognizing shapes as either lobed or entire leaves. Further development of an effective training data set that lets the model accurately recognize different leaf shapes is the most important remaining task.
Beyond this, there are other potential improvements, such as an alternative way of generating trees in less rigid shapes. This was foregone because the resulting inconsistency would have caused a myriad of new issues to fix. The tree naming could also be expanded even further, for greater variety or to more accurately reflect the types of trees certain leaf shapes are associated with. As implemented, it already provides great variety, takes leaf type into consideration, and delivers realistic names. A final limitation only revealed itself near the end of development: the elements shown on the screen do not scale with different window settings. This can cause texts to overlap, but could be fixed by implementing scaling of all screen elements.
The october-project in its current implementation contains all the critical parts we set out to develop and fulfills its purpose of applying ML in a creative project that generates appealing trees with an element of user interaction. Throughout the development process, we had to readjust our plans and make difficult decisions about what to include and what to leave out. Some decisions were made naturally in realizing our idea; others were the result of problems and limitations we ran into in our attempts to include certain features. While further development would allow for a more polished result, the only critical issue that remains unresolved is the accuracy of the image classification. Nonetheless, a number of creative solutions were tried out and resulted in two trained models, each representing a preference for one of the two leaf types. All the critical elements of our idea found expression in the current state of the october-project: user interaction is realized in the drawing of the leaves, ML has been implemented as image classification and serves to inform the further steps, and appealing, unique trees are generated as a result.