Friday, April 4, 2014

C# : HTML to PDF conversion

Today is a relatively free day, so I thought I would use it to write a bit :)

Some time back I needed to create a PDF from an HTML page, and I ended up using the iTextSharp library, which is a very decent library for this purpose. It doesn't provide much support for complex CSS, but it is still very good for simple pages, as was the case for me.

The issue came when my HTML used the special symbol for the less-than-or-equal-to operator (≤). iTextSharp simply ignored it, so I went with the solution below, which I found in an SO answer.

Solution: Use a StyleSheet with the Arial font while parsing the HTML. Below is my method to convert HTML to PDF; the part that matters is the StyleSheet setup (the LoadTagStyle calls).
I used the Arial font, but you might need a different one depending on the characters you need to support, so do a bit of testing.

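// Needs: System, System.IO, System.Collections.Generic, iTextSharp.text, iTextSharp.text.pdf, iTextSharp.text.html.simpleparser
private MemoryStream CreatePdfStream(string html)
{
    using (TextReader htmlReader = new StringReader(html))
    {
        string fontPath = Path.Combine(Environment.GetFolderPath(Environment.SpecialFolder.Fonts), "Arial.TTF");
        StyleSheet style = new StyleSheet();
        if (File.Exists(fontPath))
        {
            // Assumption: Arial is not registered yet, so register it for the "face" lookup below.
            FontFactory.Register(fontPath);
            style.LoadTagStyle("body", "face", "Arial");
            // IDENTITY_H selects Unicode encoding, so symbols such as ≤ are not dropped.
            style.LoadTagStyle("body", "encoding", BaseFont.IDENTITY_H);
        }

        MemoryStream pdfStream = new MemoryStream();
        using (Document document = new Document())
        {
            PdfWriter pdfWriter = PdfWriter.GetInstance(document, pdfStream);
            pdfWriter.CloseStream = false; // keep pdfStream open after the document is closed
            document.Open();               // elements can only be added to an open document

            List<IElement> elements = HTMLWorker.ParseToList(htmlReader, style);
            elements.ForEach(e => document.Add(e));
        }

        // The document is closed at this point, so the PDF is complete; rewind for the caller.
        pdfStream.Position = 0;
        return pdfStream;
    }
}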
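
For completeness, here is a rough sketch of how the method above could be called to save the result to disk. SaveHtmlAsPdf and its outputPath parameter are hypothetical names that I made up for illustration; the sketch assumes the helper lives in the same class as CreatePdfStream and that you are on .NET 4 or later (for Stream.CopyTo).

```csharp
// Hypothetical helper, assumed to sit next to CreatePdfStream in the same class.
// Requires: using System.IO;
private void SaveHtmlAsPdf(string html, string outputPath)
{
    using (MemoryStream pdfStream = CreatePdfStream(html))
    using (FileStream file = File.Create(outputPath))
    {
        // CreatePdfStream returns the stream rewound to position 0,
        // so the whole PDF is copied into the file.
        pdfStream.CopyTo(file);
    }
}
```

A quick way to check the font fix is to pass in a small page such as "<html><body><p>a ≤ b</p></body></html>" and confirm that the symbol shows up in the generated file.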


