Gemini API Developer Competition
So, I decided to compete in Google's AI developer competition. The idea is to write an innovative application that uses Gemini AI to do something cool, then submit the source code along with a video clip explaining the innovation for a chance to win some cool prizes. It's also an exciting way to test your innovation and coding skills.
I decided to write a kind of real-time, AI-based game creator: the user talks to the AI agent (Gemini in this case) and explains what the scene should look like and how it should behave, and based on that chat a game is created and evolves in real time.
I have already built my own game engine, SceneMax3D, which uses the open source Java project jMonkeyEngine for scene rendering and other open source libraries, such as Minie for physics simulation.
SceneMax3D has its own dedicated programming language and IDE, which makes the competition project much simpler to implement: I have control over all the components used for creating a 3D game, so all that's left is to integrate the Gemini AI API for creating a game…
So what is the initial plan?
Plugin system - Google requires the application's source code, and there's no way to submit the entire SceneMax3D project sources, so I decided to create a plugin mechanism for SceneMax3D and submit the source code of the Gemini integration plugins.
Language grammar - the SceneMax3D programming language should support using any of the installed plugins, so a new plugins command was added to the language. In this case, I need to activate an in-process web server while the engine is running. This allows direct communication from the outside world to the engine while the game is running, so once Gemini AI generates its output, it can be streamed to the engine through a REST API exposed by this in-proc web server.
Speech to text - the end user should be able to express their thoughts by talking to the agent. The user's speech will be converted to text, which will be embedded in some clever prompt and sent to Gemini AI to do its magic.
IDE plugin - in addition to the plugin for the run-time engine, we need one more plugin for the IDE. Its role is to accept the Gemini output, convert it to SceneMax3D code and send that code to the engine for execution. In rare cases we will ask Gemini to create the final SceneMax3D code directly. That's possible, but the results might need manual corrections to fit the game.
Export to Android - the generated game should be able to run on Android devices, so all kinds of design aspects should be taken into consideration while generating the game's code.
So this is the high-level design of the SceneMax3D-Gemini AI integration system. I believe this design will allow people with no coding skills at all to create some fun games.
Plug-ins system for SceneMax3D - the run-time engine as well as the IDE can now load pre-installed plug-ins
SceneMax3D language modifications - the programming language now supports a new "plugins" command which can start and stop any installed plugin. For example, we can open our game to accept commands from the outside world by writing the command: plugins.ws.start 8080
This means that an internal web server will start, listening on port 8080 and accepting commands from the outside world via the /run REST API.
In-process web server plugin for the run-time engine - allows starting an in-process web server while the game is running. For the web server support I chose the NanoHTTPD open source project. It's very simple, small and runs perfectly on any platform, including mobile devices. I have had a very good experience using it in my Android off-road navigation application, Adv Rally, where it is used for rendering offline maps.
I tested this configuration and it works perfectly. I was able to run a scene, start the in-proc web server using the "plugins.ws.start" command and send commands to the scene from the outside world using the Postman client. The engine accepted the commands and executed them, affecting the game's scene in real time.
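To make the round trip concrete, here is a minimal sketch of the same idea in plain Java. It is not the actual plugin: it uses the JDK's built-in com.sun.net.httpserver.HttpServer as a stand-in for NanoHTTPD so that it runs without extra dependencies, and it assumes that /run accepts the command text as the body of a POST request, which may differ from the real endpoint. The class name, the "ok" reply and the placeholder command are all made up for illustration.

```java
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.io.OutputStream;
import java.net.InetSocketAddress;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

class RunEndpointSketch {
    // Commands received over HTTP. A real engine would pass them to its
    // script interpreter instead of just collecting them (assumption).
    static final List<String> received = new CopyOnWriteArrayList<>();

    // Starts an in-process server on localhost; port 0 lets the OS pick a free port.
    static HttpServer start(int port) throws IOException {
        HttpServer server = HttpServer.create(new InetSocketAddress("127.0.0.1", port), 0);
        server.createContext("/run", exchange -> {
            // Assumption: the command arrives as the raw request body.
            byte[] body = exchange.getRequestBody().readAllBytes();
            received.add(new String(body, StandardCharsets.UTF_8));
            byte[] reply = "ok".getBytes(StandardCharsets.UTF_8);
            exchange.sendResponseHeaders(200, reply.length);
            try (OutputStream out = exchange.getResponseBody()) {
                out.write(reply);
            }
        });
        server.start();
        return server;
    }

    // Plays the role of the Postman client: posts one command to /run
    // and returns the HTTP status code.
    static int sendCommand(int port, String code) throws IOException, InterruptedException {
        HttpRequest request = HttpRequest
                .newBuilder(URI.create("http://127.0.0.1:" + port + "/run"))
                .POST(HttpRequest.BodyPublishers.ofString(code))
                .build();
        HttpResponse<String> response = HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofString());
        return response.statusCode();
    }

    public static void main(String[] args) throws Exception {
        HttpServer server = start(0);
        int port = server.getAddress().getPort();
        // Placeholder text, not real SceneMax3D syntax.
        int status = sendCommand(port, "<engine command goes here>");
        System.out.println(status + " " + received);
        server.stop(0);
    }
}
```

The point of the sketch is only the shape of the flow: the game process owns the server, and anything that can issue an HTTP request can push commands into the running scene.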
IDE plugin for converting configuration (which will be generated in the future by Gemini) to SceneMax3D code
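One possible shape for such a converter is sketched below. It is an assumption, not the plugin's design: the configuration is modelled as a list of flat key/value entities, each with a "type" key, and every type maps to a text template with {placeholders}. The template strings in the example are deliberately fake and are not SceneMax3D syntax; the real templates would hold actual SceneMax3D commands.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

class ConfigToCodeSketch {
    // Replaces each {key} placeholder in the template with the entity's value for that key.
    static String render(String template, Map<String, String> entity) {
        String line = template;
        for (Map.Entry<String, String> e : entity.entrySet()) {
            line = line.replace("{" + e.getKey() + "}", e.getValue());
        }
        return line;
    }

    // Turns every entity into one line of code, using the template registered for its type.
    static List<String> convert(List<Map<String, String>> entities,
                                Map<String, String> templates) {
        List<String> lines = new ArrayList<>();
        for (Map<String, String> entity : entities) {
            String template = templates.get(entity.get("type"));
            if (template == null) {
                throw new IllegalArgumentException("No template for type: " + entity.get("type"));
            }
            lines.add(render(template, entity));
        }
        return lines;
    }

    public static void main(String[] args) {
        // Hypothetical template: the text is a stand-in, NOT real SceneMax3D code.
        Map<String, String> templates = Map.of("model", "<<place {name} at {x},{y},{z}>>");
        List<Map<String, String>> config = List.of(
                Map.of("type", "model", "name", "car", "x", "0", "y", "1", "z", "2"));
        convert(config, templates).forEach(System.out::println);
    }
}
```

Keeping the language syntax in templates rather than in the converter's code would let the mapping change without touching the plugin logic, and unknown entity types fail loudly instead of producing broken code.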
IDE plugin for sending prompts to Gemini and accepting the results as a JSON configuration
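As a rough sketch of what the prompt side could look like, the code below wraps the user's transcribed speech in a prompt and builds a request for the Gemini REST generateContent endpoint. Several things here are assumptions to check against the current API reference: the model name, the x-goog-api-key header, and the generationConfig.responseMimeType field used to ask for JSON. The prompt wording and all class and method names are invented for illustration. The request is only built, not sent, so no API key is needed to run it.

```java
import java.net.URI;
import java.net.http.HttpRequest;

class GeminiPromptSketch {
    // Assumed endpoint and model name; verify against the current Gemini API docs.
    static final String ENDPOINT =
            "https://generativelanguage.googleapis.com/v1beta/models/gemini-1.5-flash:generateContent";

    // Escapes a string so it can be embedded inside a JSON string literal.
    static String jsonEscape(String s) {
        StringBuilder sb = new StringBuilder();
        for (char c : s.toCharArray()) {
            switch (c) {
                case '"': sb.append("\\\""); break;
                case '\\': sb.append("\\\\"); break;
                case '\n': sb.append("\\n"); break;
                case '\r': sb.append("\\r"); break;
                case '\t': sb.append("\\t"); break;
                default:
                    if (c < 0x20) sb.append(String.format("\\u%04x", (int) c));
                    else sb.append(c);
            }
        }
        return sb.toString();
    }

    // Wraps the transcribed speech in instructions (hypothetical wording).
    static String buildPrompt(String userSpeech) {
        return "You are a 3D game scene designer. Reply with a JSON configuration only.\n"
                + "The user said: " + userSpeech;
    }

    // Builds the JSON body: a single text part, plus a request for JSON output.
    static String buildRequestBody(String prompt) {
        return "{\"contents\":[{\"parts\":[{\"text\":\"" + jsonEscape(prompt) + "\"}]}],"
                + "\"generationConfig\":{\"responseMimeType\":\"application/json\"}}";
    }

    // Builds (but does not send) the HTTP request.
    static HttpRequest buildRequest(String apiKey, String body) {
        return HttpRequest.newBuilder(URI.create(ENDPOINT))
                .header("Content-Type", "application/json")
                .header("x-goog-api-key", apiKey)
                .POST(HttpRequest.BodyPublishers.ofString(body))
                .build();
    }

    public static void main(String[] args) {
        String body = buildRequestBody(buildPrompt("add a red car that drives in circles"));
        System.out.println(body);
        System.out.println(buildRequest("YOUR_API_KEY", body).uri());
    }
}
```

Escaping matters here because the prompt comes from free speech: quotes or line breaks in the transcript would otherwise produce an invalid request body.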
Speech to text component - can be a Java Swing form (implemented as an IDE plugin), an Android application or a web application. The IDE form and a local web application can run on the same machine and send their commands to the engine's in-proc server. An Android app or a remote website will need an additional mechanism to push commands to the engine. Good candidates are a Socket.IO client or a public SQS queue.