Insights on Software Development and Architecture

ErionPC's weblog on software development

A SOA approach to dynamic DOCX-PDF report generation – Part 2

Introduction

My previous article, “A SOA approach to dynamic DOCX-PDF report generation – part 1”, showed how to generate Docx reports automatically in a client-server architecture without depending on MS Office. This part looks at printing those docx files to PDF from managed code and transmitting the PDF bytes over HTTP.

The PDF conversion relies on the free Bullzip PDF Printer, a full-featured, programmable and very well documented PDF printer that can print any printable file to PDF, including Docx files.

PDF is probably the most widely used format for exchanging documents between platforms, so most data-centric applications sooner or later need to produce PDF reports.

1. Installing the PDF Printer

The first thing to do is to download and install the Bullzip PDF Printer. The installer creates a PDF printer in the system and places a help file in the installation directory. Read through the help file to learn how to use the Bullzip.PdfWriter namespace.

2. Adding the PDF Conversion to an Existing Visual Studio Solution

First of all, we need to reference the Bullzip assembly from the solution. Conveniently, the assembly is available in the GAC, so just go to Add Reference -> .NET and pick BullZip Pdf Writer. This adds the Bullzip.PdfWriter assembly to the project, which exposes its classes and methods under the Bullzip.PdfWriter namespace.

The next step is configuring the PDF printer. This is done through an .ini file; I won't go into the details here, since the Bullzip documentation covers them at length. The printer settings are managed by a class called PdfSettings, while the PDF creation methods are in a class called PdfUtil. With that in place, we can start converting to PDF!

3. Converting to PDF

Here’s what the test application does:

  1. Includes some docx templates with sample data in a templates directory
  2. Generates customized docx reports based on the docx templates and some XML-serialized Business-Logic data whose structure corresponds to the custom XML parts in the docx templates
  3. Saves the docx reports into a temporary directory
  4. Prints the docx reports into PDF
  5. Sends the PDF bytes through HTTP
  6. Destroys the docx and PDF files

The PrintToPdf method below loads the printer settings from an .ini file, prints a docx file from the temporary directory to PDF, reads the resulting PDF bytes, and then deletes both the docx and the PDF file.

using System;
using System.Configuration;
using System.IO;
using System.ServiceModel;
using Bullzip.PdfWriter;

namespace DocxGenerator.SL.WCF
{
    public class PdfMaker
    {
        internal static byte[] PrintToPdf(string appFolder, string tempDocxFileName)
        {
            try
            {
                string tempFolder = Path.Combine(appFolder, "temp");
                string tempDocxFilePath = Path.Combine(tempFolder, tempDocxFileName);

                // Select the PDF printer named in the application's configuration file
                PdfSettings pdfSettings = new PdfSettings();
                pdfSettings.PrinterName = ConfigurationManager.AppSettings["PdfPrinter"];

                // Load the base settings, redirect the output to the temp folder
                // and write them to the printer's settings file.
                // "<docname>" is a printer macro, so the path is built by concatenation.
                string settingsFile = pdfSettings.GetSettingsFilePath(PdfSettingsFileType.Settings);
                pdfSettings.LoadSettings(appFolder + @"\App_Data\printerSettings.ini");
                pdfSettings.SetValue("Output", tempFolder + @"\<docname>.pdf");
                pdfSettings.WriteSettings(settingsFile);

                // Send the docx to the PDF printer; Word prefixes the document name
                PdfUtil.PrintFile(tempDocxFilePath, pdfSettings.PrinterName);
                string tempPdfFilePath = tempFolder + @"\Microsoft Word - " + tempDocxFileName + ".pdf";

                // Printing is asynchronous: poll until the PDF shows up.
                // The cap of 60 attempts is an arbitrary safeguard against waiting forever.
                bool fileCreated = false;
                for (int attempt = 0; attempt < 60 && !fileCreated; attempt++)
                {
                    fileCreated = PdfUtil.WaitForFile(tempPdfFilePath, 1000);
                }
                if (!fileCreated)
                {
                    throw new TimeoutException("The PDF file was not created in time.");
                }

                byte[] pdfBytes = File.ReadAllBytes(tempPdfFilePath);

                // Clean up the temporary files
                File.Delete(tempDocxFilePath);
                File.Delete(tempPdfFilePath);

                return pdfBytes;
            }
            catch (Exception ex)
            {
                throw new FaultException("WCF ERROR!\r\n" + ex.Message);
            }
        }
    }
}

Points of Interest

The scope of this article is limited to illustrating what can be achieved with this architecture. With a little bit of head-scratching, you can extend it into a PDF conversion server (a free alternative to Adobe Distiller, anyone?), a scheduled batch printer, an archiving system, etc.
If integrated into the SOA report generation solution mentioned above, it lets you get rid of the docx files and use PDF as the document exchange format.

Have fun!

History

This article builds on the previous one, “A SOA approach to dynamic DOCX-PDF report generation – part 1”, which is a must-read to understand the SOA integration concepts.

Click here to view this article on CodeProject.

Click here to download the test application’s source code.

A SOA approach to automatized DOCX-PDF report generation – part 1

Introduction

With the advent of the MS Office 2007 Open XML formats, Office report generation changed profoundly: it became detached from Office itself and open to any programming language capable of reading compressed archives and manipulating XML. For further reading visit

In this article I’m going to illustrate a SOA approach to generating Docx reports in a distributed environment where MS Office 2007 needs to be installed only on the developer machine (not the production server). The application is composed of the following parts:

  1. An ASP.NET web application
  2. An IIS-hosted WCF service
  3. A business tier
  4. A data access tier
  5. A database

The scope of this article is limited to the top 2 tiers. With the Open XML SDK (now 2.0) it’s possible to programmatically read and write inside Office Open XML packages, which means reading and writing Office files without using Office COM objects. This approach is very fast, easy, light on resources and STABLE. The WCF service in this application must be able to create Docx reports on the basis of an existing docx template and some db data serialized as XML.

Docx files are constructed in a modular way. To appreciate this, just rename a docx file, changing its extension to .zip, and open the archive. To learn more about how this archive is organized visit http://msdn.microsoft.com/en-us/library/bb266220%28office.12%29.aspx. The part that we’re interested in is called Custom XML (read http://msdn.microsoft.com/en-us/library/bb608618.aspx). The best approach for manipulating data within a Docx file is binding content controls to custom XML parts.
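
The modular structure of a docx package can also be checked from code instead of renaming the file. The following is a minimal sketch and not part of the original application: it assumes .NET 4.5 or later (for System.IO.Compression.ZipArchive), and DocxInspector and ListParts are hypothetical names chosen for illustration.

```csharp
using System.Collections.Generic;
using System.IO;
using System.IO.Compression;
using System.Linq;

// Hypothetical helper, for illustration only.
public static class DocxInspector
{
    // Returns the full names of all entries stored in the docx (zip) package.
    public static List<string> ListParts(Stream docxStream)
    {
        // leaveOpen: true keeps the caller's stream usable after the archive is disposed
        using (var archive = new ZipArchive(docxStream, ZipArchiveMode.Read, leaveOpen: true))
        {
            return archive.Entries.Select(entry => entry.FullName).ToList();
        }
    }
}
```

Passing a stream obtained from File.OpenRead on a Word document should return entries such as word/document.xml, along with whatever other parts the package contains.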

1. Generating a docx template document

The first thing to do is to build, using Word 2007 or above, a docx document which defines the layout of the reports. This document will contain static parts (text blocks, images and so on) and dynamic parts, which depend on the data. First, we build and format the docx file as we expect it to look with dynamic data in it. Then, when we’re happy enough with the way it looks, it’s time to add the content controls. On the Word ribbon we need to go to the Developer tab (if you don’t see it, click here to learn how to activate it). In this tab we can find a few content controls, such as rich text, plain text, image, etc. We now need to replace the fake data that we’ve put into the document with the appropriate content controls.

2. Creating Custom XML parts

Using Word 2007 we’re able to put content controls into a docx document, but we’re not able to bind those controls to custom data. To do this we need another tool called Word 2007 Content Control Toolkit (WCCT). At this point our docx document doesn’t contain any custom XML parts, so we create one with WCCT. Open the docx document inside WCCT. On the right panel click on “Create a new Custom XML part”. The custom XML part will be created and we’ll be able to see it from the “Bind view” tab. On the left part of the window we will see references to the content controls that we’ve inserted in the file. Clicking on “Edit view” in the right panel lets us edit the XML. The XML structure that we create has to be valid and needs to correspond to the content controls in the page. For example:

<documentData>
  <title alias="Title">document title</title>
  <body alias="Body">document body</body>
</documentData>


When we’ve finished creating the XML, it’s always good to have its syntax checked by WCCT by clicking on the “Check Syntax” button. We’re now ready to go back to the “Bind View”. There we can see the XML nodes we’ve just inserted in a tree-like structure, and the fun part is about to begin. We’ll now bind the XML nodes to the content controls, and this is as easy as drag-and-drop. Select one of the nodes on the right panel and drag it onto the reference to one of the content controls of the document. Repeat this operation for all of the XML nodes until all the content controls have been bound to data. When you’re done, save the file and click on the preview button to open the document in Word. Notice how the custom XML data has replaced the text inside the content controls.

3. Building the WCF service

The WCF service will replace the custom XML inside the docx template with business logic XML data. Using the Open XML SDK this is actually very easy. Here’s the replaceCustomXML method:

// Requires: using System; using System.IO; using System.ServiceModel;
//           using DocumentFormat.OpenXml.Packaging;
private void replaceCustomXML(string docxTemplate, string customXML)
{
    try
    {
        using (WordprocessingDocument wordDoc = WordprocessingDocument.Open(docxTemplate, true))
        {
            MainDocumentPart mainPart = wordDoc.MainDocumentPart;

            // Remove the custom XML parts that came with the template
            mainPart.DeleteParts<CustomXmlPart>(mainPart.CustomXmlParts);

            // Add a new custom XML part and then add content
            CustomXmlPart customXmlPart = mainPart.AddCustomXmlPart(CustomXmlPartType.CustomXml);

            // Copy the XML into the new part
            using (StreamWriter ts = new StreamWriter(customXmlPart.GetStream()))
                ts.Write(customXML);
        }
    }
    catch (Exception ex)
    {
        throw new FaultException("WCF ERROR!\r\n" + ex.Message);
    }
}

Once the docx document is created it will be sent to the client as an array of bytes.

// Requires: using System; using System.IO; using System.ServiceModel; using System.Web;
public byte[] GenerateDynamicDocx(string customXML)
{
    try
    {
        HttpServerUtility webServer = HttpContext.Current.Server;

        // Copy template.docx into the temp folder to preserve the original copy
        string tempFolder = webServer.MapPath("temp");
        string tempDocxFileName = Guid.NewGuid() + ".docx";
        string tempDocxFilePath = tempFolder + @"\" + tempDocxFileName;
        File.Copy(webServer.MapPath(@"App_Data/template.docx"), tempDocxFilePath);

        // Swap the template's custom XML for the business data
        replaceCustomXML(tempDocxFilePath, customXML);

        byte[] docxContents = File.ReadAllBytes(tempDocxFilePath);

        // Delete the temporary file
        File.Delete(tempDocxFilePath);

        return docxContents;
    }
    catch (Exception ex)
    {
        throw new FaultException("WCF ERROR!\r\n" + ex.Message);
    }
}


4. Building the ASP.NET client

The ASP.NET client will have a template.xml file which replicates the structure of the custom XML part in the server’s docx template. Ideally, there would be a web page which automatically generates web controls for data entry, mirroring the structure of the XML template file. After the data has been entered, the web client must compose an XML document which follows the structure of the existing template.xml but replaces its data with the values entered by the user. The XML string is then sent to the WCF service, which returns the bytes of the docx file. These bytes can then either be saved as a docx file on the server or sent directly to the client through HTTP.
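
As a rough sketch of how the client might compose that XML string, the helper below parses the template XML and overwrites the text of each top-level element with the user's input. It assumes the flat documentData structure shown in section 2. CustomXmlBuilder and Fill are hypothetical names, and LINQ to XML is just one convenient option here, not necessarily what the downloadable source code uses.

```csharp
using System.Collections.Generic;
using System.Xml.Linq;

// Hypothetical helper, for illustration only.
public static class CustomXmlBuilder
{
    // Replaces the text of each child of the root element whose name
    // appears as a key in 'values'; attributes such as alias are kept.
    public static string Fill(string templateXml, IDictionary<string, string> values)
    {
        XDocument doc = XDocument.Parse(templateXml);

        foreach (XElement element in doc.Root.Elements())
        {
            string value;
            if (values.TryGetValue(element.Name.LocalName, out value))
            {
                // Special characters are escaped when the document is serialized
                element.Value = value;
            }
        }

        return doc.ToString(SaveOptions.DisableFormatting);
    }
}
```

The string returned by Fill is what would be passed as the customXML argument of GenerateDynamicDocx. Elements without a matching key keep the sample data from the template.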

Click here to download source code.

Click here to view article on codeproject.com.