Doctrine Sluggable and Transliteration

I’m sure absolutely everybody has this problem! Well… maybe not, but some of us do! Standard good practice these days is to make your objects accessible through a URL-friendly, so-called “slug”. This means that this very post will be accessible at http://sf.khepin.com/some/date/doctrine-sluggable-and-transliteration instead of http://sf.khepin.com/some/date/67. There are a few good reasons for that. Here, 67 would be the primary ID of my post. For a blog post, exposing the internal detail that your IDs are simple increments is not very risky, but it can sometimes be a problem, because people can guess the URLs at which future content will appear. Say there’s an iPad to win for being the first to comment on your next blog post: since I already know its URL (the last post’s ID + 1), I can write a bot that starts posting as soon as that URL becomes available. Or, if you have private user data that is not all that well protected, having the URLs of all your users’ profiles and status updates (or whatever else) be guessable is also pretty bad.
If you’re using MongoDB and expose its primary IDs, all your URLs instantly become ugly, since ObjectIds are long 24-character hexadecimal strings.
And search-friendly URLs can only help your SEO.
The Doctrine Extensions provide a great way to create slugs, and especially unique slugs, automatically without ever worrying about anything. So you’re done here. EXCEPT! Except if you have users writing in languages that don’t use the Latin alphabet: Chinese, Japanese, Cyrillic-script languages, Thai, Tibetan, Arabic, and most probably a myriad of others. In that case you first need to transliterate your article title. I don’t know about other languages, but for Chinese this means that my article titled:
我喜欢吃面 (= I like to eat noodles) has to be “slugged” as wo-xihuan-chi-mian. I’d still be happy with wo-xi-huan-chi-mian though!
It gets worse when you have Chinese AND Japanese users at the same time, because some characters are shared by both languages but are transliterated differently: 日本 (Japan), for instance, is riben in Mandarin pinyin but nihon in Japanese.
A transliterator is already included in the Doctrine Extensions, so you might be fine, but for Chinese it’s clearly not sufficient. Drupal has a more complete one that handles part of the problem of transliterating the same characters differently depending on the language. WordPress also seems to have a pretty good one.

The Solution

The Doctrine Extensions sluggable behavior lets you define your own transliterator through $sluggable_listener->setTransliterator($callable). However, if you’re using it through the Symfony2 bundle (StofDoctrineExtensionsBundle), all those services are private and their instantiation happens behind the scenes.
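For reference, this is roughly what wiring the listener by hand looks like when you use Doctrine without Symfony2. It is a minimal sketch assuming the Gedmo Doctrine Extensions and Doctrine Common are installed with Composer; the closure is only a placeholder that urlizes the text, standing in for whatever real transliterator you would plug in.

```php
<?php
use Doctrine\Common\EventManager;
use Gedmo\Sluggable\SluggableListener;
use Gedmo\Sluggable\Util\Urlizer;

// Assumption: dependencies were installed with Composer.
require_once __DIR__ . '/vendor/autoload.php';

// Build the listener yourself (the Symfony2 bundle normally does this for you).
$sluggableListener = new SluggableListener();

// Any PHP callable is accepted. This placeholder only delegates to the
// default urlizer; a real transliterator would convert the script first.
$sluggableListener->setTransliterator(function ($text, $separator = '-') {
    return Urlizer::urlize($text, $separator);
});

// Register the listener on the event manager you pass to your EntityManager.
$eventManager = new EventManager();
$eventManager->addEventSubscriber($sluggableListener);
```

With the bundle, none of these objects are within reach, which is why the workaround below goes through the listener class instead.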

The bundle does not let you set the callable in its configuration. It does, however, let you specify the class name of the sluggable listener, so you can create your own listener that extends the default one, hook your own transliterator into it in the constructor, then relax and enjoy. Here’s an example listener:
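A minimal sketch of such a listener, assuming the Gedmo Doctrine Extensions are installed. The Acme namespace, the file path and the Acme\DemoBundle\Util\Transliterator class are hypothetical names; the transliterator is a class you write yourself, and it must be autoloadable when the listener is built.

```php
<?php
// src/Acme/DemoBundle/Listener/SluggableListener.php (hypothetical path)
namespace Acme\DemoBundle\Listener;

use Gedmo\Sluggable\SluggableListener as BaseSluggableListener;

/**
 * Same behavior as the stock Gedmo sluggable listener, except that it
 * registers a custom transliterator as soon as it is constructed.
 */
class SluggableListener extends BaseSluggableListener
{
    public function __construct()
    {
        // Run the parent constructor if the installed version defines one,
        // so no setup done by the base class is skipped.
        if (method_exists(get_parent_class($this), '__construct')) {
            parent::__construct();
        }

        // Any PHP callable works here; this one points at a hypothetical
        // static method on a class of your own.
        $this->setTransliterator(
            array('Acme\DemoBundle\Util\Transliterator', 'transliterate')
        );
    }
}
```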

And the config that goes with it:
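Below is a sketch of that configuration in Symfony2’s PHP config format, assuming StofDoctrineExtensionsBundle is installed and registered in your AppKernel. In config.yml the same keys live under stof_doctrine_extensions (orm.default.sluggable and class.sluggable). The Acme class name is hypothetical.

```php
<?php
// app/config/config.php
$container->loadFromExtension('stof_doctrine_extensions', array(
    // Enable the sluggable behavior for the default entity manager.
    'orm' => array(
        'default' => array(
            'sluggable' => true,
        ),
    ),
    // Make the bundle instantiate our listener instead of the stock one.
    'class' => array(
        'sluggable' => 'Acme\DemoBundle\Listener\SluggableListener',
    ),
));
```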

Now there’s one last thing you need to be careful about: the default transliterator currently does more than just transliteration; it also turns the given string into a URL-friendly one. By the time your transliterator is called, the slug has already been built from all the required fields, but it still contains spaces instead of dashes, capital letters, and so on. To get around this, call the urlize method of the default transliterator (Gedmo\Sluggable\Util\Urlizer) from your own transliterator:
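A minimal sketch of such a transliterator, assuming the PHP intl extension and the Gedmo Doctrine Extensions are installed. It swaps in ICU’s Any-Latin transform rather than the Drupal or WordPress tables mentioned above, and since ICU reads Han characters as Mandarin, it does not solve the Chinese/Japanese ambiguity. The namespace and class name are hypothetical.

```php
<?php
// src/Acme/DemoBundle/Util/Transliterator.php (hypothetical path)
namespace Acme\DemoBundle\Util;

use Gedmo\Sluggable\Util\Urlizer;

class Transliterator
{
    /**
     * Called by the sluggable listener with the raw slug text and the
     * configured separator; any extra arguments it passes are ignored here.
     */
    public static function transliterate($text, $separator = '-')
    {
        // Assumption: intl is available. ICU's Any-Latin maps Han characters
        // to Mandarin pinyin, then Latin-ASCII strips accents and tone marks.
        $latin = transliterator_transliterate('Any-Latin; Latin-ASCII', $text);

        if ($latin === false) {
            // Transliteration failed: hand the original text to the urlizer.
            $latin = $text;
        }

        // Let the default urlizer lowercase the text, replace spaces and
        // punctuation with the separator and trim the ends.
        return Urlizer::urlize($latin, $separator);
    }
}
```

With this in place, a title like 我喜欢吃面 should come out as something close to wo-xi-huan-chi-mian, the second form I said I’d be happy with.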

Now you’re good to use your custom transliterator!

 

4 thoughts on “Doctrine Sluggable and Transliteration”

    • There might not be. However, the sluggable behavior of the Doctrine Extensions uses a transliterator by default, so to keep the original characters you’d still have to provide your own transliterator that “does nothing”. It’s worth a shot, and it wouldn’t take long to try.

  1. I think you should mention that StofDoctrineExtensionsBundle must be installed to override the SluggableListener class in config.yml.
    Anyway, I did this: I implemented a custom SluggableListener and a custom transliterator that lets non-Latin characters show in the slug, but they are always converted to Latin characters :(
    My transliterator is doing its job well, but SluggableListener::generateSlug makes it Latin!
    Do you have any idea or advice?
