Doctrine 1 – Three ways to get a record’s attributes

Working with Doctrine 1 can seriously wreck your nerves if your objects form a complex graph and come in gargantuan quantities. I recently had to fight my way through the meanders of D1 to bring performance back to acceptable levels. After spending about an hour trying to optimize a rather simple routine working with about 10 000 objects, I realised, thanks to Xdebug’s profiler, that one of the bottlenecks was the magic getter (__get) on my Doctrine records. This seemed strange to me, so I investigated further.

One of my first stops was the Doctrine_Record class. The source is on GitHub: the culprit is around line 1336 as of 1ccee085f49ab0be17ee. A lot of shady, mostly unnecessary (for my use case) stuff happens in the background each time a property is accessed. Multiply this by a few thousand accesses and you get a mess.

[Image: screaming lady, just like me]

Analyzing the code and the call graph of the get() function, I came to the conclusion that my first performance killer was unexpected lazy loading. I tried as hard as I could to explicitly join all the dependencies in my initial DQL query, but my objects had many, many associations, so I always ended up losing track and forgetting a handful. Since lazy loading is, by definition, very stealthy, it was hard to track down all the occurrences. All I knew was that each user request generated about one hundred database queries. My redemption came when I discovered that explicitly using $record->get('association', false) instead of $record->association makes Doctrine squeal if the data is missing, since you are trying to access a property that was not fetched and that you explicitly forbade it from lazy loading (hence the false parameter). Changing every direct access and getter call on my records to use the get method explicitly took time, but was definitely worth it: it gave me absolute control over what was loaded and when. I could now go and fix the missing explicit joins.
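To make the cost of stealthy lazy loading concrete, here is a toy model in plain PHP. It is not Doctrine code: FakeDb and ToyOrder are hypothetical classes that only count queries, and the pre-hydrated customer stands in for what an explicit join in the initial query would give you.

```php
<?php
// Toy model only: FakeDb and ToyOrder are hypothetical, not Doctrine classes.

final class FakeDb
{
    public int $queries = 0;

    // Pretend to run a query; we only count it.
    public function fetchCustomer(int $id): array
    {
        $this->queries++;
        return ['id' => $id, 'name' => "customer $id"];
    }
}

final class ToyOrder
{
    private FakeDb $db;
    private int $customerId;
    private ?array $customer;

    public function __construct(FakeDb $db, int $customerId, ?array $customer = null)
    {
        $this->db = $db;
        $this->customerId = $customerId;
        $this->customer = $customer; // non-null when the "join" already fetched it
    }

    // Lazy loading: hits the database only if the customer is missing.
    public function getCustomer(): array
    {
        if ($this->customer === null) {
            $this->customer = $this->db->fetchCustomer($this->customerId);
        }
        return $this->customer;
    }
}

function countQueries(bool $joined, int $n = 100): int
{
    $db = new FakeDb();
    $db->queries++; // the initial query that fetches the orders themselves

    for ($i = 1; $i <= $n; $i++) {
        $customer = $joined ? ['id' => $i, 'name' => "customer $i"] : null;
        $order = new ToyOrder($db, $i, $customer);
        $order->getCustomer();
    }

    return $db->queries;
}

echo countQueries(false), "\n"; // 101: one query plus one lazy load per order
echo countQueries(true), "\n";  // 1: everything came with the initial query
```

A single forgotten join is enough to land in the first case, which is why an access mode that refuses to lazy load is so useful for finding the gaps.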

[Image: I still was not satisfied (my hair actually looks like that, sometimes)]

Although the gain in execution speed was humongous, I still was not satisfied. Whilst researching this issue, it became evident that the get() method was doing way too much, at least compared to what I expected or needed it to do. Its overhead made it one of the bottlenecks of my code. I figured out that most of it is avoidable when dealing with properties that you do not want lazy-loaded anyway; one simply has to use the rawGet() method instead. NULL will be returned if the property has not been fetched: no questions asked, no relationships verified, no accessors/mutators poking around and, most importantly, no lazy loading. As rawGet() cannot be used to access related objects, its usage is limited to properties. Scrolling through my code for hours replacing implicit gets with explicit rawGets eventually paid off; the speed increase was about 20%.

Oh well, I just spared you some trouble, didn’t I? Here is a succinct summary of the methods available to access properties and associations:

  • $record->property (through PHP’s magic __get method), $record->get('property'): will lazy load, works for properties and relations whatever the state of the object.
  • $record->get('property', false): will not lazy load, works for properties and relations. If your object’s data was only partially fetched and/or your related records were not explicitly joined, this will fail.
  • $record->rawGet('property'): works only for properties. Will not lazy load; returns NULL if the property was not fetched.
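The three access styles listed above can be imitated with a small toy class in plain PHP. This is only a sketch of the behaviour as described here: ToyRecord is a hypothetical class, not Doctrine_Record's implementation, and the exception merely stands in for the failure mentioned above (how Doctrine 1 actually reports it may differ between versions).

```php
<?php
// Toy model only: ToyRecord is hypothetical and imitates the behaviour
// described in the list above. It is not Doctrine's code.

final class ToyRecord
{
    private array $data;      // values that were fetched; a missing key means "not fetched"
    private array $stored;    // values the pretend database would return
    public int $lazyLoads = 0;

    public function __construct(array $fetched, array $stored)
    {
        $this->data = $fetched;
        $this->stored = $stored;
    }

    // $record->property goes through __get, which lazy loads like get().
    public function __get(string $name)
    {
        return $this->get($name);
    }

    public function get(string $name, bool $load = true)
    {
        if (!array_key_exists($name, $this->data)) {
            if (!$load) {
                // Assumption: fail loudly instead of silently loading.
                throw new RuntimeException("'$name' was not fetched and lazy loading is off");
            }
            $this->lazyLoads++; // stands in for an extra database query
            $this->data[$name] = $this->stored[$name] ?? null;
        }
        return $this->data[$name];
    }

    // No loading, no checks: whatever is in memory, or null.
    public function rawGet(string $name)
    {
        return $this->data[$name] ?? null;
    }
}

// Only "id" was fetched; "name" exists in the pretend database.
$record = new ToyRecord(['id' => 1], ['id' => 1, 'name' => 'Ada']);

var_dump($record->rawGet('name')); // NULL: not fetched, nothing loaded

try {
    $record->get('name', false);
} catch (RuntimeException $e) {
    echo "refused: ", $e->getMessage(), "\n";
}

echo $record->name, "\n";      // Ada, after one lazy load
echo $record->lazyLoads, "\n"; // 1
```

The point of the sketch is the trade-off: the magic getter is the most convenient and the most expensive, while the two explicit forms give up convenience in exchange for never touching the database behind your back.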

This highlights some of the weak points of the Active Record pattern. First, if your domain is relatively complex, chances are you’ll end up crying in foetal position under your desk, contemplating the big ball of mud you’ve just created. I eventually ended up with DQL queries joining 20 different records and had to further abstract the data fetching in a repository to keep my code sane; so much for simplicity. Second, the overhead can sometimes be unbearable. Just try to var_dump a Doctrine_Record derivative and you’ll see what I mean. Pages and pages of intertwined classes interacting with your record directly in ways you can’t imagine. Again, this ends up making your life miserable if you have a relatively complex domain. As Fowler says so well in P of EAA, “Active Record is a good choice for domain logic that isn’t too complex, such as creates, reads, updates and deletes [CRUD]”.

Hence my suggestion, if you have a complex domain and still want to use PHP: try Doctrine 2 and you’ll never go back to Active Record. It implements the Data Mapper, Repository and Entity Manager patterns and lets you manage your objects whilst it concentrates on what it does best: persistence.

