2011S

On Symfony Php Python Openoffice Excel Business Intelligence and Simple Reporting Tools With a Bit of Pdf Too

Wow! That is one long title! But this is actually all the things that I’m gonna try to put together in this article. On a technical point of view, it is a follow up to my previous article on simple excel exports from within Symfony.

The other part is about simple reporting system to be used in PHP and how to do it. The article is quite long, if you don’t give a $#@&%! about my opinions on Business Intelligence and reporting tools, you can skip to the technical part which is all about creating exports in many different formats (as listed in the title) from a Symfony project.

BI and reporting

There are a lot of tools that do reporting and business analytic. There’s even a lot of them that are open source although usually the company’s creating those tools make it pretty hard for you to find where the hell you can find the community edition of their software. Most importantly, those software are complex, very complex, or it’s that I am stupid, very stupid. But first you’ll have to setup a data store which can be your current database or another one that would be optimized for reporting purposes. Optimizing a data store for reporting purposes usually means having a star shaped schema. Meaning there will never be more than one level of join. Depending on the complexity of the reporting you need, the complexity of your business application etc…. This might or might not be needed.

Usually that’s the reason why BI / Reporting tools (the limit is not always very clear to me) come in many pieces: ETL (Extract Transform Load) is a tool which will put your data in a format that is easy and efficient to use for later reporting. Report designers, which are pieces of software in which you design a report based on the post ETL data. And the actual reporting server that connects all those pieces together and output reports.

Doing a “real” BI / Reporting project usually costs time and money. Going it alone just based on a free community edition of a BI vendor is not something that I would recommend, at the very least, you should be trained by that vendor on how to conduct your project and all the technical skills.

That being said, your users still want some reports on their business and rightly so! They need to be aware of what is going on. Are sales going up or down? By how much? Josh Holmes who was talking at Symfony live recently about “The lost art of simplicity” said something about this too: when your users ask you “how many shoes did we sell in Germany last month?” you should not jump forward to “Wow, I need to setup a reporting system for them” but instead just write your SQL query and give them the number. But after a while, when they start coming everyday asking for different numbers, think about how to push things one step further.

This is what I know about my users: they don’t have any IT system (yet), they currently manage everything through excel sheets. They make their reports on excel sheets.

This is what Gartner says about Excel for BI:

Gartner advises BI advocates to give up on trying to wean business users off Excel, and instead accept that Excel is here to stay.

Indeed, I’d say that the users realize it’s a bit messy to manage all their business through excel sheets, because they need to have many of them which all need to be updated regularly by different people and it’s hard to keep track of that. But no one ever said that they wanted the reporting abilities they have through Excel to go somewhere else and they’re actually quite happy with those.

Unless their business has really big databases that excel cannot handle, I don’t see one single good reason to make them move away from Excel. And I think Excel 2007 is in the million of rows area so I really do have time before I tell them to go knock at the door of Business Objects.

Even better, they ALREADY KNOW how to use excel and make reports based on excel. I don’t need anyone from my team with technical knowledge to sit with them and define how to make their report. All I need is that they tell me clearly where they put raw data in their file and which parts are making calculations and formatting based on this raw data. Then if I can re-generate this excel file automatically only with changing the data part, it’s a win! They don’t need to wait months for me to either program their report by hand or someone with BI / Reporting tools knowledge to do it for them. Think about it, they do all their charts and pies and tables ONE TIME, probably it takes them a bit longer than usual in order to make that file clearly separate what is data and what is not, but then… They have an always up to date excel file at no cost (well you can give them that!).

Plus, they love to have it under excel format rather than PDF or web because they can make a quick fix very easily.

Now I hope I sold you on the idea, the rest of the article will explore how under Symfony, combining some Python and OpenOffice, we can make it happen.

Let’s get technical

If you read my previous article on how to create excel files in Symfony, I strongly advise you to forget about it! The ideas behind it are the good ones, but the chosen format does not actually work! At that time I thought I could use Excel XML, but you can’t easily put images in it, let alone store formulas, graphs etc… It still works for simple excel exports though.

The Central Format

After this article, you’ll be able to produce your reports in either ods (open office spreadsheets), xls (microsoft excel) or pdf and even web format. You will also realize that it’s easy to apply the same methods to MS Word or oooWriter documents. But we never want to have to deal with all those formats ourselves. The better option is to only have one format to deal with, and have some tools do the conversion from ods to pdf or anything else for us. The chosen central format is OpenOffice Spreadsheet: ods. The reason is that it is clean, easy to understand and implement and finally it’s been standardized by the ISO. One small piece of advice though: OpenOffice creates documents for which the XML is MUCH MORE readable and MUCH MORE simple than the one produced by MS Excel. And I mean this for the SAME format. Export an ods file that has been created under excel or directly under OpenOffice, the OpenOffice one is much cleaner.

What is an ods file?

An ods file is a zipped folder. Truly just a simple zipped folder. Go ahead with your zip tool and try to unzip an ods file, it just works (the same is true of .xlsx files). Inside it you will find a few files and folders. The most interesting file is “content.xml”, it is a simple to understand file. It’s even more simple if you used OpenOffice to generate it instead of MS Excel. Excel adds a lot of info and xml tags that are not mandatory but that you need to keep up to date when you modify the initial file. OpenOffice doesn’t use them but keeps them if the file was first generated otherwise. “content.xml” defines a table with rows and cells very much like you would define it in HTML except the tags have different names and contain different attributes.

What else is there? A folder called picture. You will put all the images here and then output an image tag inside your content.xml to have the image appear. Then each graph is going to have a folder of its own to define it. The functions appear in xml tags too. For all of these, the best way to see how it works is to save ods files containing what you want, open the generated xml and see what’s going on.

So XML! XML is good because it is VERY easy to generate through symfony. An xml template is something easy to create and mix with php (or twig or any templating engine) to output your own data. Then in ods format, all the sheets and all your info is contained in only one xml file which makes it really easier (compared to the xlsx format).

Case study

I’m gonna run through a whole case here, but an extremely simple one. Also I only generate one report here, there are for sure a lot of optimizations that can happen when you start having many different files. As I only have one file for example, here I won’t need to add pictures etc…

So my model is going to be very simple, I have students, I have their exam marks, and I want to have the average, and maybe try to make some kind of chart out of that. I only do this so that there is some code accessing the database to make it seem “real”.

This is the kind of report that I want in the end:

This is insanely basic for an excel report but think of how long it would take you to program the same report and show it on a webpage (yes including the graphic because everything else is, here again quite basic). Now replace the “average” function by a business one, that finds monthly evolution of the absorption of budget 2 and ROI of budget 3 etc… If the people you work for already have excel files to calculate that, use those (until they actually do need more)!

A simple model

And some data to fill our report (actually exactly the same as the one in the report). Since the file becomes a bit long and is not that interesting I won’t embed, but check it.

The action

As there is not much logic for now in our application, the action is straight forward: we get the student table and get every student then on to the view.

I also add one method to the Student class called getTest($month) that will get me back the test results for that student and the given month.

The view

Wow! This is the hot topic of the day! First we’ll create an OpenDocument Spreadsheet. This is the most easy and does not require third party software for conversion. If you can get your users to use OpenOffice or Excel 2007 (which can read OpenDocuments too) then you’re fine with just that. In my case, I need excel or pdf so this will add one step. So what are the steps?

  1. Have a route that includes the format
  2. Intercept the format definition and store it for later use, specify our formats mime-type (that will be sent in the response headers to tell the client what exactly it is that we are sending).
  3. Create the content.xml file with our data
  4. Catch symfony’s response before it sends it to the user
  5. Inject it in the “ods/zip” thus replacing any previous one that could have been here. Now let me insist on this here: if you’re going to have many users accessing that functionality more or less at the same time, you need to copy that base file with a unique name before using it otherwise your users might be overriding each others modifications, or you might even give access to some people to reports they should not be seeing. Isolate!
    • If you need a format different than ods, convert your file
  6. Send the response.

That’s the job of the day!

Routing

That’s easy!

Format

First thing to do is define the mime-type. Symfony let’s us do that easily in a config file called factories.yml (in the application config folder). We’re going to add our own formats here (ods, xls and pdf):

This is fairly easy too. Now comes something more tricky. After generating our “content.xml” file, we will need to catch it and insert it into a zip file that will be our actual response. At that time we will have to know what format is expected. For those two actions, we will use Symfony events and a custom class surprisingly named MyExcel.php or MyFormatter.php or whatever. To hook up our events we need to declare this in ProjectConfiguration.php like that:

Then the first event will trigger a method that will set a static variable on MyExcel for the execution of the script:

Yeah! this part is done, we now know our format for when the response will come back to us!

Content and data!

After you’ve done all I’m talking about here, this is the only part you’ll need to ever worry about for the next reports you’ll have to make like this. Piece of advice, as we create for pdf, excel and ods formats, you should also keep one reference for xml format. The reason for that is that more often than not you have the risk of mis-typing something therefore breaking your valid xml. Office and OpenOffice don’t forgive this like browser forgive bad HTML. But browsers that receive bad XML will tell you so and will even tell you where the issue is.

To create different view for different formats, Symfony has a handy mechanism where templates are named xxxSuccess..php but all our formats are going to use the SAME xml content. I first thought about using partials but partials follow the same format naming rule. For now all I have is an include. You define your xml in a template named “_myreport.php” and for reportSuccess.xls.php, reportSuccess.pdf.php and reportSuccess.ods.php, you just include it to avoid duplicating your template. If there is a way to tell symfony that the “pdf” format should actually be based on “xml”, and if you know about it, I’d be happy to hear word from you!

XML content looks like just any other template. You can see the full file, but the interesting part is here:

The few lines after that are worth a look too as they define the average function and the graph, you’ll also realize that extending the range of the “average” function dynamically based on how many students / months we have would not be very difficult. OK, done with the view, as you can see, this xml format is pretty easy to understand and if you want to push further, it’s an open standard, the whole specification can be found online.

Catch and transform

The content has been created, now is time for our second event listener to come in play, catch the response and make a pretty file out of it. For that we make use of the ZipArchive extension of PHP:

As an ods file is a zip file, we can directly use this. I stored one such file in my data directory. Be careful about read / write rights here, I had some problems at the beginning that I can’t reproduce anymore, and don’t really remember. addFromString and close return a boolean to tell if the operation was successful or not in case you have doubts.

If all you need is ods, you’re done! Congratulations.

Other formats

I have looked around for a long, very long time on good and simple ways to change from one format to another. The best job is done by open office. But open office is not that simple to hack into. Unless you’re also a java or python ninja, PHP does not have much convenience for communicating with open office.

But on the web, you’ll find about JODConverter and his little brother PyODConverter. The first one is a Java library for communicating with OpenOffice via webservices and therefore in a language independent way. Also as the name suggests, the main purpose here is to convert documents from one format to another (all formats supported by open office and all conversions supported by openoffice are available). I did not setup JODConverter but I am using the little brother.

PyODConverter is a Python script (remember there was Python in the title!) a very short one, that will give you the same possibilities as JODConverter but from the command line. For it to run properly, you need to call it with the python that came bundled in your OpenOffice installation or to download the python libs that make the bridge between OpenOffice and python. In order to use it, you need to have OpenOffice running as a service on your server (see their website for that). Then the command looks like this:

python DocumentConverter.py mydoc.ods mydoc.xls

This is exactly what we’re going to do through the exec function of php, so this is how the setContent method of MyExcel.php actually looks like:

You can now try it, export your file in many different formats, with graphics based on functions that your user can define in minutes only. One note: OpenOffice does not update all the graphics on opening the file. Excel does. Converting to pdf, though OpenOffice which is the base format has no updated graph, do contain the updated graph. Export to HTML is available too if needed but the quality of the exported graphs as pictures is awful.

Hope that some of you did read the article but considering the length… Hell anyway it was a good exercise for me!