Anti-patterns in code generation – Part I

So it finally hits Ted, The Enterprise Developer: all his enterprise applications consist of the same architectural style applied ad nauseam to each of the entities they deal with. And Ted asks himself: “why am I wasting so much of my life doing the same stuff again and again, for each new application, module or entity in the system? The implementation is always the same, only the data model and business rules change from entity to entity!”

The Epiphany

So Ted figures: “just like I write code to test my code, I will write code to write my code!”.

Ted decides that, for his next project, he will take the approach of code generation. Ted is going to model all domain entities as UML classes, and have the code generator produce not only the Java (or C#, or whatever) classes, properties, relationships and methods, but all the boilerplate that goes along with them (constructors, getters, setters, lazy initialization, etc.). “This is going to be awesome.”

The Compromise

One of the first things Ted realizes is that since his UML models are pretty dumb and contain no behavior (“UML models can have no behavior, right?”), there is no way to fully generate the code. Bummer.

“Wait a minute, that is not totally true.” Ted’s models contain operation names, parameter lists and return types, so Ted can at least generate empty methods (stubs), complete with Javadoc carrying the operation description. “This *is* awesome!”
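As a rough illustration of what Ted's stub generation might look like, here is a minimal Java sketch. The `Operation` record and the `generateStub` method are hypothetical names made up for this example, not part of any real UML tool, and a real generator would read the operations from the model instead of hard-coding one.

```java
import java.util.List;

public class StubGenerator {
    // Hypothetical, simplified stand-in for an operation read from a UML model.
    public record Operation(String name, String returnType,
                            List<String> parameters, String description) {}

    // Renders an empty method (a stub) whose Javadoc is the operation description.
    public static String generateStub(Operation op) {
        StringBuilder out = new StringBuilder();
        out.append("/** ").append(op.description()).append(" */\n");
        out.append("public ").append(op.returnType()).append(' ').append(op.name())
           .append('(').append(String.join(", ", op.parameters())).append(") {\n");
        // The body is left for the developer to fill in by hand.
        out.append("    throw new UnsupportedOperationException(\"")
           .append(op.name()).append("\");\n");
        out.append("}\n");
        return out.toString();
    }

    public static void main(String[] args) {
        Operation op = new Operation("calculateTotal", "double",
                List.of("Order order"), "Computes the order total.");
        System.out.print(generateStub(op));
    }
}
```

The stub body is exactly the part the generator knows nothing about, which is where Ted's handwritten code ends up.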

Ted still has all these empty methods that need to be filled in for the application to be fully functional. So he starts filling them in with handwritten code.

Reality Kicks In

Things are looking great. Ted is already filling in the stubbed methods for the tenth entity in the system. But then he realizes there is a problem in the generated code. It would be an easy fix in his generator, and rerunning it will fix the problem everywhere (isn’t that beautiful?). However, Ted would end up losing all changes he had made so far. Argh.

Any way out?

Ted thinks: “shoot, this was going so well, look at how much code I produced in so little time. There must be a solution for this.”

He almost feels like backing up his current code somewhere, regenerating the code (losing his changes) with the new generator, and then adding his handwritten code back (“Just this once!”). But he knows better. At some point he will need to regenerate the code again (and then again, and again…), and his team won’t buy the approach if it is that complicated to fix problems or to react to changes. It will look pretty bad.

He opens a new browser tab, and starts thinking about the best search terms he should use to search for a solution to this problem…


In the next episode, Ted, The Enterprise Developer, continues his saga in search of a fix to his (currently) broken approach to code generation. If you have any ideas about what he should try next, let me know in the comments.



21 thoughts on “Anti-patterns in code generation – Part I”

  1. Zef

    April 5, 2011 at 1:21am

    Wonder in what episode Ted decides to step away from UML altogether ;)

  2. rafael.chaves

    April 5, 2011 at 1:39am

    @Zef

    Haha, that was fast!

    I am using UML to appeal to a more mainstream audience, but the intention is that the fundamentals should apply to any metamodel.

  3. Aurelien Pupier

    April 5, 2011 at 1:51am

    he has a lot of solutions :)

    - jump to EMF
    - subclass generated code
    - modify templates for generation
    - use aspects on the templates for generation
    - use aspects on the generated code (AspectJ, ObjectTeams)
    - use annotations to mark which parts of the code to keep, and write a reconciler for them (oh no, I don’t like this method)

    And surely there are others.
    Let’s see what Ted will use :)

  4. rafael.chaves

    April 5, 2011 at 2:53am

    Thanks, Mikaël, I will make sure to cover that approach. It certainly solves Ted’s current issues.

  5. Rasmus Toftdahl Olesen

    April 5, 2011 at 6:24am

    Like Aurelien I also lean towards sub-classing as a way to deal with this.

    In C# there is also the possibility of using partial classes to separate the generated code from the user-implemented code.
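For readers who want to see the sub-classing idea spelled out, here is a minimal Java sketch under stated assumptions. The class names `CustomerBase` and `Customer` are hypothetical, and in a real project the two classes would live in separate files so that the generator only ever overwrites the base class.

```java
// Generated file: overwritten on every generator run, never edited by hand.
abstract class CustomerBase {
    private String name;

    public String getName() { return name; }

    public void setName(String name) { this.name = name; }

    /** Returns the greeting shown to this customer. */
    public abstract String greeting();
}

// Handwritten file: created once, never touched by the generator.
class Customer extends CustomerBase {
    @Override
    public String greeting() {
        return "Hello, " + getName() + "!";
    }
}

public class GenerationGapDemo {
    public static void main(String[] args) {
        Customer customer = new Customer();
        customer.setName("Ted");
        System.out.println(customer.greeting()); // prints "Hello, Ted!"
    }
}
```

Regenerating `CustomerBase` leaves `Customer` untouched, so a fix in the generator no longer wipes out handwritten behavior.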

  6. rafael.chaves

    April 5, 2011 at 8:08am

    Being a Java-head, I was completely ignoring partial classes. Thanks for bringing that up, Rasmus.

  7. Rui Curado

    April 5, 2011 at 10:33am

    The ABSE (http://www.abse.info) modeling approach treats user code as first-class citizens: “user code” is a type, just like string or boolean.

    So, using ABSE Ted has no problems with his hand-written code.

    Regarding regeneration, any true model-based generator can re-read the model and generate again, so that aspect is a non-issue IMHO.

  8. Aurelien Pupier

    April 5, 2011 at 1:41pm

    @Rasmus: I was also completely ignoring partial classes. Sounds cool. But it might require a nice IDE to merge the files to make things clearer; is one available?

    In Java, for subclassing, from my POV the best approach is to have several source folders, one for the generated code, one for custom code.

    I know Eclipse EMF/GMF better. The approach we use in my team is to have one plugin for the generated code and another one for the custom code. Why? Because plugin.xml and MANIFEST.MF are also generated and we can’t have two of each per bundle.

    @Rui I only tried AtomWeaver briefly, so I surely missed a lot of things.
    “no problems with his hand-written code.”
    The hand-written code is set in your own editor. You can’t use your favorite IDE easily.
    “true model-based generator can re-read the model and generate again”
    It implies that you respect the conventions in the generated code, that is, the markers that delimit generated from non-generated code.
    A lot of generators have a reconciler that relies on such markers and conventions, I think (GMF with @generated NOT, Acceleo with user code blocks, …).
    And on regeneration, if someone modifies the custom part of the generated code, do you take that change or keep the custom code stored in the model (which might also have been modified)?

    @Raphael
    Yes, let’s see the walkthrough of Ted and to which solution he will go.
    :)

  9. Jan

    April 6, 2011 at 2:39am

    He could also have a look at the Domainmodel example shipped with Xtext 2.0: Operation bodies can be specified in the model itself referring to the model’s abstractions or other Java concepts using Xbase expressions. As everything is compiled to Java, there is no more generation gap. And he’s free to choose his own concepts and syntax. http://www.eclipsecon.org/2011/sessions/?page=sessions&id=2053 (slides coming soon)

  10. Filipe Correia

    April 6, 2011 at 4:44am

    I wonder if Ted will consider interpreting the model at runtime, rather than generate code :)

  11. Gonçalo Borrêga

    April 6, 2011 at 3:33pm

    I guess what Ted really wants is a way to keep modeling and coding with a visual language (facilitating knowledge transfer in his team), supported by c# or java code extensions that are maintained within the model, with the proper source control and visual merge of the models… I definitely think he will end up in OutSystems site trying out the Agile Platform in the cloud. Good luck Ted! :-)

  12. Vincent Hanniet

    April 6, 2011 at 11:14pm

    Ted’s best next meta-move is to hire Aurélien!

    Any good and scalable code generation tool will for sure keep track of human-added code from one generation to the next (my favorite is http://www.mia-software.com/en/products/mia-studio/ ;D).

    Ted will certainly soon discover one of the 10 productive MDD best practices: don’t model/generate things that are much simpler to code/maintain! (keeping generation for his DRY goal)

    But Ted’s very next move may be to subclass NON generated code…

  13. rafael.chaves

    April 7, 2011 at 1:13am

    @Moritz: Yup, generation gap will sure be on his path.

    @Jan: Pointer was appreciated.

    @Filipe: I think he should at least be aware of that option.

    @Gonçalo: If you had included a link at least I could claim a referral fee…

  14. rafael.chaves

    April 7, 2011 at 11:56pm

    Thanks, Vincent, missed your comment in the approval queue.

    Totally agree. Don’t model what you should be coding. But the converse is also true.

    Hope that Ted, by the end of his journey, will have a reasonable understanding of what is best to model and what is best to code.

  15. Rui Curado

    April 8, 2011 at 2:21am

    @Aurelien Because the generator (AtomWeaver) knows which lines are custom (it keeps a map for each file), changes made to the custom lines can be read back.

    So Ted can make changes to custom code blocks without re-generating the entire model.

  16. Aaron Digulla

    April 8, 2011 at 5:59am

    What he needs in his code generator is a merge tool that works two ways. He needs to be able to say “take the generated class and make it extend class X” or he needs to be able to say “extend generated class Y as Y2 and make sure everyone now uses ‘new Y2’ where we had ‘new Y’”.

    But that leads to a new problem: For many test cases, you’ll need to specify different type hierarchies. So you’ll end up with several similar copies of all generated classes.

    The problem is that many OO languages don’t perform well when you start to mess with the inheritance. Languages like Python allow you to modify the behavior of an instance after it has been created (by adding or removing methods and fields from it).

    In Java, you need workarounds. It’s a limitation of the language, really (or rather of the slow CPUs of the past). Today, it might be possible to rebuild type inheritance using proxies and interfaces and large proxy caches at runtime. The idea here is that you wrap an instance with a proxy at runtime (for example with byte code instrumentation) and then save it in a cache, so when you need the same proxy type again, you can ask for the existing proxy instance.

    Confused? Say I need to add the method foo() to an existing instance. The solution is to create an interface with foo() and then a wrapper which wraps an existing instance with an implementation of the interface. To be able to have state in foo(), we need to cache the wrapper.
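The wrapper-plus-cache idea described above can be sketched in Java roughly as follows. This is only an illustration under simplifying assumptions: the names `HasFoo`, `FooWrapper` and `withFoo` are hypothetical, the wrapper is a plain hand-written class instead of a proxy built with byte code instrumentation, and the cache is neither thread-safe nor does it ever release the wrapped instances.

```java
import java.util.IdentityHashMap;
import java.util.Map;

public class WrapperCacheDemo {
    // The method we want to "add" to already existing instances.
    public interface HasFoo {
        int foo();
    }

    // Wraps an existing instance and keeps the state that foo() needs.
    static final class FooWrapper implements HasFoo {
        private final Object target;
        private int calls;

        FooWrapper(Object target) { this.target = target; }

        Object wrapped() { return target; }

        @Override
        public int foo() { return ++calls; } // state lives in the wrapper
    }

    // One wrapper per wrapped instance, keyed by object identity.
    private static final Map<Object, HasFoo> CACHE = new IdentityHashMap<>();

    // Returns the cached wrapper for the instance, creating it on first use.
    public static HasFoo withFoo(Object instance) {
        return CACHE.computeIfAbsent(instance, FooWrapper::new);
    }

    public static void main(String[] args) {
        Object existing = new StringBuilder("some existing instance");
        withFoo(existing).foo();
        // The second lookup returns the same wrapper, so the call count is kept.
        System.out.println(withFoo(existing).foo());
    }
}
```

Without the cache, every call to `withFoo` would create a fresh wrapper and the state held for `foo()` would be lost between calls.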

    I’m not really a fan of tools that “know” which lines a user has changed because that can break. And if it breaks, you’re in deep trouble. Something explicit is much better. It also allows you to undo changes – with a tool, you not only need to restore the original code but you also need to tell the tool to forget about the change you just made. And you better not make a mistake or ugly things will happen.

    In the end, it all boils down to being unable to build software from smaller blocks than classes. If we could say “take this instance and replace the method foo() with that one”, such things wouldn’t be an issue at all.

  17. [...] recently stumbled upon abstratt’s blog post on Anti-patterns in code generation and could not resist the call for comments. My response got a little bit longer so I am posting it [...]
