Ever wanted to modify some value or execute some statement while your C++ program is running just to test it out? Something that can’t be done through the debugger or isn’t trivial? Scripting languages have a REPL (read-eval-print-loop) - frontend web developers use the “javascript console” of the browser to test things which is available after pressing F12.
cling is an interactive C++ interpreter (developed at CERN) but since it is built on top of LLVM it isn’t easy at all to integrate in your application in an elegant way so everything is callable and works on any platform and toolchain.
RCRL is an interactive C++ compiler in a demo application with GUI which demonstrates the technique. Without further ado here is a showcasing video:
In the video some basic usage is shown and the scene is interacted with. The build system used is Ninja
for the fastest possible build times. Also the compilation happens in a background process - submitting code is not blocking - our program continues to run and loads the result when ready.
How to use it
There are 3 types of sections of code that the user may submit for compilation:
global
- code that should be compiled in global scope goes there (like class/function definitions, includes, etc.)once
- executable statements go in there - this is compiled in function scope and executed only oncevars
- definitions of variables that will be used later in other sections should go there
Sections are changed with single-line comments containing one of the 3 words global
/once
/vars
. For example:
// global
int foo() { return 42; }
// vars
int a = foo();
int& b = a;
// once
a++;
// global
#include <iostream>
void print() { std::cout << a << b << std::endl; }
// once
print(); // ======> will result in "4343" being printed
In the scripting language world REPLs automatically try to print the “value” of the last statement. In RCRL we need to do the printing ourselves.
It should be noted that in the above example the variables may have been written also in global
sections, but then if we submit more code for compilation as a separate step, those globals would be initialized again and there would be multiple instances of them - and that is what vars
sections are for - persistent globals which are initialized only once. Also variable definitions in once
sections aren’t visible outside of them since they are local variables in function scope. That will make more sense after the following section.
When we are done we can press the Cleanup
button which will:
- call the destructors of all globals defined in
vars
sections in reverse order - unload all the plugins (as explained later) in reverse order
- delete all the plugins from the filesystem
How it works
Here is what happens each time you submit code:
- A
.cpp
file is reconstructed with all previousglobal
andvars
sections (in the appropriate order) and then the newly submitted sections are also appended (includingonce
sections) in their submission order. - The
.cpp
file is compiled as a shared object (.dll
) and links against the executable (more on that later) - The plugin is then copied with a different name depending on which compilation this is (so we would end up with
plugin_5.dll
if we have previously submitted code for compilation 4 times) - The plugin is loaded by the host application and all globals defined in the
.cpp
file are initialized from top to bottom
The .cpp
file always includes a header called rcrl_for_plugin.h
located in <repo>/src/rcrl/
which has a few macros and forward declarations to make everything work:
RCRL_SYMBOL_IMPORT void*& rcrl_get_persistence(const char* var_name);
RCRL_SYMBOL_IMPORT void rcrl_add_deleter(void* address, void (*deleter)(void*));
// macros...
Here is how code from the different sections ends up in the source file:
global
sections just go straight to the.cpp
file being compiled-
once
sections go in a lambda that is called while globals are initialized:RCRL_ONCE_BEGIN a++; RCRL_ONCE_END
And after the preprocessor we get something like this:
int rcrl_anon_12 = []() { a++; return 0; }();
-
vars
sections - they are parsed so that for each variable definition the type/name/initializer are extracted. The source code forint a = 5;
is this:RCRL_VAR((int), (int), RCRL_EMPTY(), a, (5));
which expands to the following after the preprocessor:
int& a = *[]() { auto& address = rcrl_get_persistence("a"); if(address == nullptr) { address = (void*)new int(5); rcrl_add_deleter(address, [](void* ptr) { delete static_cast<int*>(ptr); }); } return static_cast<int*>(address); }();
- First we get a pointer (by ref) to the persistent variable with name
a
. - If that pointer is null (
a
is defined in a newly submittedvars
section) we allocate a new integer using the appropriate initializer and then add a deleter fora
by passing a lambda as a function pointer. - Then we return the pointer from the lambda and immediately dereference it to initialize a global reference with the name
a
.
That way we ensure that the global persistent variable will be initialized only once and its state will be preserved through the following compilations. In code below the
vars
section we can continue using the referencea
as if it is a global variablea
. - First we get a pointer (by ref) to the persistent variable with name
The parser for vars
sections is nothing special (it’s not a recursive descent parser and isn’t written like usual parsers) but is just a few hundred lines of code and is surprisingly adequate - works with very complex types with lots of templates, decltype()
, auto
, references and complex initializers. I could have used something proper like LibClang but decided to go with something custom and tiny instead.
Restrictions
There are 2 sources of limitations for RCRL:
- the parser for the
vars
sections is imperfect - the method itself (shared objects, dynamic allocation of variables, etc.)
Let’s see what C++ constructs/usages are unsupported. In vars
sections:
- nothing else should go here except for variable definitions
- [*]
alignas()
cannot be used - [*] C arrays can’t be used - use
std::array<>
instead - [*] multiple variables cannot be defined at once - like
int a, b;
- [*] don’t use
auto*
- useauto
directly and let it deduce pointer types - [*] raw string literals shouldn’t be used
- cannot assign lambdas to auto - should use
std::function<>
instead - no deleted operator
new
/delete
for types - they should be allocatable - temporaries cannot bind to const references (and have their lifetime extended) because pointers are used under the hood - will get a compiler error when trying to get the address of a temporary
- rvalue references as variables are disallowed by the parser itself - related to the const reference restriction
The [*]
entries can be removed by improving the parser or by using LibClang.
And here is a list of general restrictions to keep in mind:
- don’t rely on the address of functions - it will be different after each recompilation and reloading
- don’t pass pointers to functions or globals (and persistent globals from a
vars
section) to the host app without a way to remove them before doing a cleanup (or you would end up with dangling pointers) - don’t use the
static
keyword - it won’t work as expected for local variables in functions and it doesn’t make sense for functions and globals - don’t use
goto
inonce
sections… decltype()
of names fromvars
sections will return a reference to the type- constexpr variables should be in
global
sections and not invars
because it wouldn’t make any sense - preprocessor use is allowed only in the
once
andglobal
sections - and should be kept to a minimum - global non-constexpr variables in the
global
section will be initialized (and have their state reset) each time code is submitted and a new plugin is compiled and loaded (also there will be multiple instances of them and the initializing code will be executed many times so if it has side effects they will accumulate) - usevars
sections for proper persistence. - class static variables should go into
global
sections outside of the class definition, but that means they will always be initialized - currently there isn’t a way to make them persistent like globals from avars
section - C++14 is required by the RCRL engine itself only for support of
auto
variables invars
sections (otherwise C++11 is enough)
Perhaps there are other issues I haven’t found yet but all these seem minor.
How to integrate
The RCRL repository is mainly a demo (tested on Windows/Linux/MacOS and uses OpenGL 2 - to build it follow the instructions in the repository). The important parts are the RCRL “engine” itself which is located in <repo>/src/rcrl/
and it depends only on the tiny-process-library third party which is a submodule of the repo and is located in <repo>/src/rcrl/third_party/
. Everything else is for the demo project and the GUI. The plugin which is compiled by the engine also has a precompiled header (<repo>/src/precompiled_for_plugin.h
) for speed of compilation.
The purpose of the whole repository is not to take the sources from <repo>/src/rcrl/
as they are but to adapt them to your needs - the goal wasn’t to create a one-size-fits-all solution because that is hardly possible.
The RCRL integration in the demo project is by no means optimal:
- it invokes CMake instead of the compiler directly
- the plugin is part of the whole CMake setup instead of being separated - and slow build systems like
make
andMSBuild
take a lot of time to scan the dependencies (can be more than half a second) - unlikeNinja
Integrating the RCRL engine properly and optimally requires knowledge of build systems, compilers, static/dynamic libraries and more!
Currently there are a few preprocessor identifiers setup from CMake for the RCRL engine to use for convenience (used in <repo>/src/rcrl/rcrl.cpp
):
RCRL_PLUGIN_FILE
- full path to the.cpp
plugin source file for use by RCRLRCRL_PLUGIN_NAME
- the name of the plugin target in CMakeRCRL_BUILD_FOLDER
- the root build directory of the whole CMake projectRCRL_BIN_FOLDER
- the folder where the plugin will be after compilationRCRL_EXTENSION
- the platform-specific shared object extension (.dll
for Windows,.so
for Linux,.dylib
for MacOS)RCRL_CONFIG
- only for multi-config IDEs like Visual Studio and XCode - the identifier represents the current configuration (Debug, Release, etc.)
The plugin needs to link to the executable so it can interact with the host application through the API with exported symbols. It also needs to link for the 2 functions exported by the RCRL engine for vars
sections (rcrl_get_persistence()
and rcrl_add_deleter()
). In CMake executable targets cannot be linked against by default but this can be enabled by setting the ENABLE_EXPORTS
target property to ON
of the executable.
Actually the RCRL engine can reside also in some shared object and it is possible that the important parts of the host application API are implemented in other shared objects and not in the executable - in which case linking to it would be unnecessary (but still linking to the appropriate modules will be).
The entire “API” of RCRL is just a few functions in <repo>/src/rcrl/rcrl.h
which have a lot of comments for them and it is used in <repo>/src/main.cpp
.
std::string cleanup_plugins(bool redirect_stdout = false);
bool submit_code(std::string code, Mode default_mode, bool* used_default_mode = 0);
bool is_compiling();
bool try_get_exit_status_from_compile(int& exitcode);
std::string get_new_compiler_output();
std::string copy_and_load_new_plugin(bool redirect_stdout = false);
Some ideas about the integration of the RCRL engine:
- the code editor may be separate from the host application - perhaps vim or whatever you’d like (so you can also have auto completion and whatever) - it doesn’t have to be integrated into the host application
- the entire RCRL engine can be also outside of the host application - there should only be a way for the host application to be notified that a new plugin needs to be loaded (can be even done with a filesystem watcher)
- for performance: try to avoid linking the plugin to static libraries as build times will increase and global state in them might lead to problems
- for easy interaction with the host application most symbols should be exported - on Unix platforms all are exported by default from shared objects and linkable executables (unless built with
-fvisibility=hidden
) but on Windows the opposite is default - so unless everything is annotated with__declspec(dllexport)
properly this target property can be used in CMake to have everything exported by default (or hacked for not CMake)
Room for improvement
global
andvars
sections can be merged if the parser is improved to be able to handle not just variable definitions (or perhaps it should be ditched entirely in favor of LibClang - in which case we might even infer which pieces of the code are statements intended for function scope - to be executed only once - and ditch also theonce
sections)- auto complete - but this is a big topic and is hard to make a universal solution that would fit everyone’s needs
- crash handling - perhaps with structured exceptions under Windows when loading the plugin - in case anything happens while the code is executed
- compiler error messages - a mapping of lines between the plugin
.cpp
file and the submitted code can be made so errors can be highlighted in the original submitted source directly - debugging support - with breakpoints and etc. - no idea about this…
- build system integration - as mentioned in the previous section
Random further thoughts
-
This technique can be used for other compiled languages as well.
-
It would be really cool to see something like this integrated into some big project like the Unreal Engine :)
- Wouldn’t it be cool if applications had an optional module which enables interacting with them with C++? The module would be comprised of:
- a custom version of the RCRL engine
- a C++ compiler (perhaps the same version used for building them)
- application API headers with exported symbols
- application export lib so RCRL can link to it
There can be different code “snippets” for applications that modify them.
- I’m betting that an awesome C++ game engine that enables an incredibly fast and flexible workflow can be developed without the need for scripting if 3 things are in place:
- something like dynamix is used for the object model which would allow for very flexible and elegant implementation of the business logic - one of the 2 main reasons people turn to scripting languages
- hot-reloading of any module (including whole subsystems like rendering and physics) and the ability to change the layout of types (like adding a new field in a class) at runtime is possible - perhaps with the help of a practical generic reflection system - this is the second main reason people turn to scripting languages for parts of the business logic - reloadability
- something like a REPL for convenience (RCRL fills this gap)
What we lose by eliminating scripting is:
- the ability to update clients remotely without modifying executables
- game designers can no longer tinker in something easier than C++ - but in many studios this is the case anyway (or perhaps they can - with something like Unreal’s Blueprints - later compiled to C++)
What we gain from eliminating scripting is:
- the (imperfect) binding layer between C++ and the scripting is gone
- no need for a virtual machine
- optimal performance
- programmers work in only 1 language
- no code duplication (which is atleast in part inevitable otherwise)
Anyway - I’m eager to see what the C++ community thinks of this project/technique and what comes out of it!
Leave a Comment