Craft Development Diary: Daniel Mccarthy


CRAFT DEVELOPMENT DIARY

Daniel McCarthy

Introduction
This is a development diary of the Craft compiler, previously known by the name
Goblin.
The Craft compiler is designed to compile Craft code, a language which I have
designed.
Unfortunately, I have only documented the project since 25th July 2016, so there
are months of undocumented work. The diary is titled day 1 and onwards even though
that was not day 1 of the project; it was day 1 of the diary.
Throughout this document you will see all the struggles and design choices I have
made to get the Craft compiler completed.

Information
Before continuing it is important that you understand the names used in this
diary. I will explain the ones that even an experienced reader may have trouble with.
For anything else, such as what a lexer, parser, virtual method, or pure virtual
method is, please research them in order to understand this diary.

Code generator
The code generator is the base class for all code generators. It walks the abstract
syntax tree from the root and invokes methods in its inheritor/child depending on
the branch it finds in the tree. For example, if it finds an assignment such as
x = 50 then it invokes a method on its inheritor for handling assignments.

Goblin Bytecode generator


The Goblin bytecode generator extends the CodeGenerator class. Its methods get
invoked by the code generator depending on the branch the code generator has
found. Depending on the method that was invoked, the Goblin bytecode generator
writes data to the stream or iterates further through the AST (abstract syntax
tree) until it finds what it is looking for.
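
The relationship between the base code generator and an inheritor can be sketched roughly as follows. This is only an illustration and not the actual Craft source: the Branch struct, the "ASSIGN" type tag, the handleAssignment method, the RecordingGenerator class and the makeBranch helper are all hypothetical names chosen for the example.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical AST node: a type tag, an optional value and child branches.
struct Branch {
    std::string type;
    std::string value;
    std::vector<std::shared_ptr<Branch>> children;
};

// Hypothetical base class: walks the tree and calls into its inheritor.
class CodeGenerator {
public:
    virtual ~CodeGenerator() = default;

    // Visit the children of `root`, dispatching on the branch type.
    void generate(const Branch& root) {
        for (const auto& child : root.children) {
            if (child->type == "ASSIGN") {
                handleAssignment(*child);  // implemented by the inheritor
            } else {
                generate(*child);          // keep descending
            }
        }
    }

protected:
    virtual void handleAssignment(const Branch& assignment) = 0;
};

// Example inheritor: records text instead of writing bytecode to a stream.
class RecordingGenerator : public CodeGenerator {
public:
    std::vector<std::string> output;

protected:
    // Assumes an assignment branch holds [target, value] as its children.
    void handleAssignment(const Branch& a) override {
        output.push_back(a.children.at(0)->value + "=" + a.children.at(1)->value);
    }
};

// Small helper for building branches in the example.
std::shared_ptr<Branch> makeBranch(std::string type, std::string value = "") {
    auto b = std::make_shared<Branch>();
    b->type = std::move(type);
    b->value = std::move(value);
    return b;
}
```

A real bytecode generator would write instructions to its output stream inside handleAssignment rather than recording strings.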

Craft bytecode generator


The Goblin bytecode generator was renamed to the Craft bytecode generator, so
throughout this diary the two names refer to the same thing.

Day 1 - 25th July 2016


This is the first day of documenting the Craft compiler. Unfortunately, I left it a bit
late, so I am actually writing this on the 26th July 2016; however, I will write about
the 25th as if it is the 25th.
Today I was working more on implementing arrays. At this point arrays were
coming along well. I could only support 1 dimensional arrays, although up to 3
dimensions can be parsed. I managed to make array access in expressions
possible; although not perfect, it worked. I was in the process of fixing a bug where
array access would only work in expressions if it was on the left hand side. This was
because the A register was used: if something else was on the left hand side of
the expression, let's say a number, it would occupy register A, so the array access
on the right hand side of the expression would generate bytecode that would
overwrite the number at run time. I was in the process of fixing this, or I had fixed
it, I don't remember, and then all of a sudden my computer crashed. When I booted
back up I found my entire Goblin bytecode generator NULLED. I searched my git
repository for the most recent commit I could find, and unfortunately the commit I
found was from the 7th of July. The code that was present at that time would be
completely obsolete, as a lot of development has been done since then, so I chose to
write the Goblin bytecode generator again. I wrote a basic template for the Goblin
bytecode generator and I worked a little bit on the code generator, improving the
design.

Day 2 - 26th July 2016


In the early hours of this morning I was focusing on improving the design of the
code generator, as well as implementing more code in the Goblin bytecode
generator, since the file was NULLED yesterday due to a system crash.
This work was continued from yesterday and ran into the 26th, as I usually work
late.
I decided that I should modify the code generator to use a Scope class instead of
having all the functionality defined in the code generator. This made sense as it is
certainly possible to have child scopes, which would have caused problems in my
current design. So I created a Scope class, and this class contains variables as well
as methods for working with these variables, such as creating new variables or getting
variables by name. A lot of the related content was moved from the code generator
into the Scope class and modified. The Scope class has the ability to have one
parent and one child, although this will be changed later to allow for multiple
children.
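
A Scope class along these lines might look something like the sketch below. It is an assumption-laden illustration rather than the real class: the Variable struct, the method names createVariable, getVariable and createChild, and the choice to search enclosing scopes when a name is not found locally are all hypothetical. It also already uses a list of children, which is where the design is said to be heading, rather than a single child.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <string>
#include <vector>

// Hypothetical variable record; a real compiler would store more than this.
struct Variable {
    std::string name;
    std::string type;
};

class Scope {
public:
    explicit Scope(Scope* parent = nullptr) : parent_(parent) {}

    // Create a variable in this scope.
    // Returns nullptr if the name is already taken in this scope.
    Variable* createVariable(const std::string& name, const std::string& type) {
        auto result = variables_.emplace(name, Variable{name, type});
        return result.second ? &result.first->second : nullptr;
    }

    // Look a variable up by name, first here and then in enclosing scopes
    // (searching the parents is an assumed behaviour).
    Variable* getVariable(const std::string& name) {
        auto it = variables_.find(name);
        if (it != variables_.end()) return &it->second;
        return parent_ ? parent_->getVariable(name) : nullptr;
    }

    // Create a nested scope owned by this one.
    Scope* createChild() {
        children_.push_back(std::unique_ptr<Scope>(new Scope(this)));
        return children_.back().get();
    }

    Scope* parent() const { return parent_; }

private:
    Scope* parent_;
    std::map<std::string, Variable> variables_;
    std::vector<std::unique_ptr<Scope>> children_;
};
```

With this shape, a variable declared in a function scope is visible to that scope and anything nested inside it, but not to sibling scopes or to the parent.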
I also removed a lot of obsolete methods from the code generator. The old design
approach was to have a pure virtual method for every possible scenario. For
example, I would have a method for handling a number, and let the code generator
invoke a pure virtual method every time a number was found. The child class
would then be able to write the machine/byte code to the stream.
This design was not appropriate, and my new design approach is to have a method
only for more structured elements such as functions and assignments.
At the moment only the assignment method is implemented, and it was implemented
a few days ago. The function method does not exist yet; in its place is the
scope_start method. This, however, also needs to be removed as every scope can
act differently. It will be replaced with a method such as function_start, which
would be for function declarations including the function scope.
I also fixed the cleanBranch method in the Parser class. Originally it was not
working, or was not working as expected. I made it clean the tree of branches that
have only one child, starting at the branch passed as the first argument. This means
that a branch with only one child is replaced with its only child, leading to a
cleaner tree with information that is no longer needed removed.
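
The cleaning step can be illustrated with a short sketch. This is not the Parser's real cleanBranch: the Branch struct, the free-function form, the makeBranch helper and the choice to return the replacement branch are assumptions made for the example.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <vector>

// Hypothetical AST node with a type tag and child branches.
struct Branch {
    std::string type;
    std::vector<std::shared_ptr<Branch>> children;
};

// Replace every branch that has exactly one child with that child,
// starting at `branch` and working down the tree.
// Returns the branch that should take the place of `branch`.
std::shared_ptr<Branch> cleanBranch(std::shared_ptr<Branch> branch) {
    // Collapse chains such as STMT -> E -> number down to the innermost branch.
    while (branch->children.size() == 1) {
        branch = branch->children.front();
    }
    // Clean whatever children remain.
    for (auto& child : branch->children) {
        child = cleanBranch(child);
    }
    return branch;
}

// Small helper for building branches in the example.
std::shared_ptr<Branch> makeBranch(const std::string& type) {
    auto b = std::make_shared<Branch>();
    b->type = type;
    return b;
}
```

Note that a sketch like this collapses every single-child branch without exception, including ones a later stage might want to keep as a grouping branch.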
I thought about the parser rules today and how I perhaps should change them. For example, my
expressions have three branches, the first being the left operand, the second being the
operator and the third being the right operand. This could be changed so that the parent branch
is the operator and its two children are the left operand and the right operand. This is how
most designs look when I search for abstract syntax tree on Google Images. There will be no
definite change yet as I need to outline the pros of this change, and the cons if there are any.
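
To make the two shapes concrete, here is a small sketch that converts the three-branch form into the operator-rooted form. The Branch struct, the "E" and "operator" type tags, and the toOperatorRooted and makeBranch functions are hypothetical names used only for illustration; nothing here is taken from the Craft parser itself.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical AST node: a type tag, an optional value and child branches.
struct Branch {
    std::string type;
    std::string value;
    std::vector<std::shared_ptr<Branch>> children;
};

// Three-branch form:     E with children [left, operator, right]
// Operator-rooted form:  operator with children [left, right]
std::shared_ptr<Branch> toOperatorRooted(const std::shared_ptr<Branch>& node) {
    if (node->type == "E" && node->children.size() == 3) {
        auto op = std::make_shared<Branch>();
        op->type = "operator";
        op->value = node->children[1]->value;  // e.g. "+"
        op->children.push_back(toOperatorRooted(node->children[0]));
        op->children.push_back(toOperatorRooted(node->children[2]));
        return op;
    }
    // Anything else (the leaves, in this sketch) is returned unchanged.
    return node;
}

// Small helper for building branches in the example.
std::shared_ptr<Branch> makeBranch(std::string type, std::string value = "") {
    auto b = std::make_shared<Branch>();
    b->type = std::move(type);
    b->value = std::move(value);
    return b;
}
```

One practical difference is that a code generator walking the operator-rooted form can evaluate both children and then apply the parent, without having to pick the operator out of the middle of a child list.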

Day 3 - 28th July 2016


Today I made the Exception class extend one of C++'s standard exceptions,
logic_error. My reason for this is that should an exception be thrown and not
caught, I or others will be able to see the output in the console window.
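
A minimal sketch of such a class is shown below. Only the class name Exception and the base class std::logic_error come from the diary; the constructor signature is an assumption.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>

// Deriving from std::logic_error gives the class a what() message and lets
// it be caught as a std::exception.
class Exception : public std::logic_error {
public:
    explicit Exception(const std::string& message) : std::logic_error(message) {}
};
```

Whether an uncaught exception's message actually appears in the console depends on the C++ runtime in use; some runtimes print the what() text when the program terminates, but the standard does not require it.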
I also fixed the design problems caused by the rule changes and the parser update
where I fixed the cleanBranch method. I also added the ability to have multiple
children in the Scope class, which is required because scopes can have multiple
scopes inside of them.
I have considered changing certain parser rules to output different types of
branches than they do at the moment. Google Images searches have given me many
results with expressions that look like this:

Fig 1.
My system does things a little differently and I need to consider whether or not the
above image would be the best way to go about things. My tree is workable but I do
believe the above image represents a better structure. The parser rules will need to
be thought through carefully, and not just the ones for expressions. I suspect that
not using a structure similar to the one shown in Fig 1 may be partly the reason
certain expressions cause parsing errors, but I cannot be sure as of yet; I am still
in a bit of a grey area with this situation.
Another bug was discovered in function calls: a function call cannot have
arguments that are also function calls. This is a parsing error.
In other news, I attempted to update the parser to do a look ahead before
choosing a rule, as the fact that it does not do a look ahead has already caused me
countless problems and requires rules to be placed in a specific position to work
correctly. Today, however, I failed to implement the look ahead feature, as the
previous parser design would loop through all the rules and apply each one to every
branch in the root. In the first phase, for example, 3 + 3 + a would become E +
E + E and then, as it goes on to other rules, would be broken down further. A look
ahead would not be possible with this design. I changed the design to apply the
rules on a left to right basis without applying them to every branch at once. For
example, 3 + 3 + a would now become E + 3 + a, and because there is no rule for E +
3 + a, only E + E, the parser now fails to generate an abstract syntax tree. More
thinking will need to be done, but I plan to make the parser better before continuing
the project, as if I do not it may cause me even worse problems later on.
One possible solution, which I would rather avoid because it might come across as
bad design, is the ability to state that a particular rule can be any of the
following branches, e.g. the E branch can be the identifier token or the
number token.

Day 4 - 29th July 2016


Today I made the parser a look ahead parser, which will help prevent rule
conflicts. For example, two rules may exist, E:operator:E and
E:operator:E:operator. The second rule is one that you probably would never
implement, but I will use it as an example as it is the perfect scenario. Now take the
expression E+E+. You would expect this expression to match the second rule, but
in the first parser this did not happen: it would pick the first rule that matched,
so it would be E:operator:E. Now that the parser is changed, it looks ahead
and finds the most appropriate rule, so it picks E:operator:E:operator, which
is the correct rule.
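
The difference between the two behaviours can be sketched as follows. The diary does not say how "most appropriate" is decided, so this example assumes it means the longest rule that matches; the Rule struct and the matchesAt, firstMatch and longestMatch functions are hypothetical names for illustration only.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

using Symbols = std::vector<std::string>;

// Hypothetical rule: a name plus the sequence of symbols it matches.
struct Rule {
    std::string name;
    Symbols pattern;
};

// True when `pattern` matches the input symbols starting at `pos`.
bool matchesAt(const Symbols& input, std::size_t pos, const Symbols& pattern) {
    if (pos > input.size() || pattern.size() > input.size() - pos) return false;
    return std::equal(pattern.begin(), pattern.end(),
                      input.begin() + static_cast<std::ptrdiff_t>(pos));
}

// Old behaviour as described: take the first rule that matches.
const Rule* firstMatch(const std::vector<Rule>& rules, const Symbols& input,
                       std::size_t pos) {
    for (const Rule& rule : rules) {
        if (matchesAt(input, pos, rule.pattern)) return &rule;
    }
    return nullptr;
}

// Look ahead behaviour (assumed): check every rule and keep the longest match.
const Rule* longestMatch(const std::vector<Rule>& rules, const Symbols& input,
                         std::size_t pos) {
    const Rule* best = nullptr;
    for (const Rule& rule : rules) {
        if (matchesAt(input, pos, rule.pattern) &&
            (best == nullptr || rule.pattern.size() > best->pattern.size())) {
            best = &rule;
        }
    }
    return best;
}
```

With the two rules above and the input E operator E operator, firstMatch returns E:operator:E while longestMatch returns E:operator:E:operator, and the order in which the rules are listed no longer matters.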
Upon implementing the new parser, I removed some code that should be there. This
caused an issue and prevented branches from being excluded from the tree. It
was a simple bug to fix and I fixed it quickly.
Since the parser is now more efficient, I no longer need the hash tag to represent
functions and function calls. I will soon be considering whether to remove it or
keep it just to make the language a little different.
In other news, I am considering rewriting the parser rules altogether to try to do
a better job, or at least rewriting some of them. Another problem was also found,
this time with the cleaning system for the parser. This system does exactly what it
is supposed to do, but in certain situations this causes problems.
For example, the fourth branch in the FUNC branch holds a bunch of statement
branches. Because of the cleaning system, if only one statement exists in the
function body then the cleaning system will replace that STMT branch with
its child branch. I do not like this at all, as obviously the function body should have
some sort of root branch for itself. Anyway, more thought will need to be given
before any change is made.
If I do make this change, then I would set it up in a similar way to how you exclude
branches in the parser rules. In the parser rules you exclude branches from the
tree by using the quote character: in the current rule.
