How to Save HTML as PDF in Python

Saving HTML content as a PDF can be incredibly useful for a variety of applications, from generating reports to archiving web pages.
In this tutorial, we’ll explore how to accomplish this task using Python. Whether you’re a beginner or an experienced developer, you’ll find that the methods we discuss are straightforward and effective. We’ll cover three libraries: WeasyPrint and pdfkit, which convert HTML to PDF directly, and ReportLab, which builds PDFs programmatically. By the end of this article, you’ll not only understand how to save HTML as a PDF but also be equipped with practical code examples to implement in your projects. Let’s dive in!
Method 1: Using WeasyPrint
WeasyPrint is a powerful library that allows you to convert HTML and CSS documents into PDF files. It’s particularly favored for its ability to render complex layouts accurately. To get started, you first need to install the WeasyPrint library. You can do this via pip:
```bash
pip install WeasyPrint
```
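Depending on your operating system, WeasyPrint may also need system libraries such as Pango; see the WeasyPrint installation documentation if the import fails. Once you have WeasyPrint installed, you can use the following code snippet to convert an HTML file to PDF: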
```python
from weasyprint import HTML

# Parse yourfile.html (relative to the current directory) and render it to output.pdf
HTML('yourfile.html').write_pdf('output.pdf')
```
In this example, replace ‘yourfile.html’ with the path to your HTML file, and ‘output.pdf’ will be the name of the generated PDF file.
Output:
The snippet prints no confirmation message; on success, output.pdf appears in the current working directory.
This simple code does the trick: WeasyPrint reads the HTML file, applies its CSS, and renders the content into a formatted PDF. One of WeasyPrint’s standout features is its CSS support, including print-oriented rules such as @page for page size and margins, so the styling of your HTML carries over to the PDF. This makes it an excellent choice for generating reports, invoices, or any other documents where layout matters.
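Because WeasyPrint also accepts HTML as a string, a common pattern for reports is to build the markup in Python from your data and attach a stylesheet. The sketch below uses WeasyPrint’s HTML(string=..., base_url=...) and CSS(string=...) classes; the helpers build_report_html() and save_report_pdf() and the REPORT_CSS rules are hypothetical examples, not part of WeasyPrint. Values are passed through html.escape so that data such as "A & B" is not interpreted as markup, and the WeasyPrint import sits inside the function so the HTML-building part works on its own.

```python
import html
from typing import Iterable, Tuple

# Hypothetical stylesheet: @page sets paper size and margins, the rest styles the table.
REPORT_CSS = """
@page { size: A4; margin: 2cm; }
body { font-family: sans-serif; }
table { border-collapse: collapse; width: 100%; }
td, th { border: 1px solid #999; padding: 4px; }
"""


def build_report_html(title: str, rows: Iterable[Tuple[str, float]]) -> str:
    """Build a small HTML report (hypothetical helper) from (label, amount) pairs."""
    body_rows = "\n".join(
        f"<tr><td>{html.escape(label)}</td><td>{amount:.2f}</td></tr>"
        for label, amount in rows
    )
    return (
        "<!DOCTYPE html><html><head><meta charset='utf-8'>"
        f"<title>{html.escape(title)}</title></head><body>"
        f"<h1>{html.escape(title)}</h1>"
        "<table><tr><th>Item</th><th>Amount</th></tr>"
        f"{body_rows}</table></body></html>"
    )


def save_report_pdf(markup: str, path: str, base_url: str = ".") -> None:
    """Render the markup to a PDF file with WeasyPrint."""
    # Imported here so build_report_html() works without WeasyPrint installed.
    from weasyprint import CSS, HTML

    # base_url lets relative links (images, stylesheets) in the markup resolve.
    HTML(string=markup, base_url=base_url).write_pdf(
        path, stylesheets=[CSS(string=REPORT_CSS)]
    )
```

For example, save_report_pdf(build_report_html("Q1 Sales", [("Widgets", 1200.0)]), "report.pdf") would write a one-row report to report.pdf.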
Method 2: Using pdfkit
Another popular choice for converting HTML to PDF in Python is pdfkit, which is a wrapper for the wkhtmltopdf command line tool. This method is particularly useful if you are looking for a solution that handles web pages as well as local HTML files. First, you need to install pdfkit and wkhtmltopdf. You can install pdfkit via pip:
```bash
pip install pdfkit
```
Next, make sure wkhtmltopdf is installed on your system. You can download it from the official website and follow the installation instructions for your operating system; by default, pdfkit looks for the wkhtmltopdf executable on your PATH. Once you have everything set up, you can use the following code:
```python
import pdfkit

# Requires the wkhtmltopdf executable, which is installed separately from pdfkit
pdfkit.from_file('yourfile.html', 'output.pdf')
```
Just like before, replace ‘yourfile.html’ with the path to your HTML file. The output will be saved as ‘output.pdf’.
Output:
The snippet prints no confirmation message; on success, output.pdf is written to the current working directory.
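If pdfkit cannot find wkhtmltopdf on your PATH, the conversion fails with an error. A minimal sketch, assuming hypothetical helpers locate_wkhtmltopdf() and convert_with_pdfkit(): it looks up the binary with shutil.which, lets you supply an explicit path instead (for example, a Windows install location), and hands it to pdfkit through pdfkit.configuration(wkhtmltopdf=...).

```python
import shutil
from typing import Optional


def locate_wkhtmltopdf(explicit_path: Optional[str] = None) -> Optional[str]:
    """Return the wkhtmltopdf executable to use, or None if it cannot be found."""
    if explicit_path:
        # An explicit path takes precedence over the PATH lookup.
        return explicit_path
    # shutil.which searches the PATH the same way a shell would.
    return shutil.which("wkhtmltopdf")


def convert_with_pdfkit(src: str, dst: str, wkhtmltopdf_path: Optional[str] = None) -> None:
    """Convert a local HTML file to PDF, pointing pdfkit at wkhtmltopdf explicitly."""
    binary = locate_wkhtmltopdf(wkhtmltopdf_path)
    if binary is None:
        raise FileNotFoundError("wkhtmltopdf not found; install it or pass its full path")
    import pdfkit  # imported lazily so locate_wkhtmltopdf() works without pdfkit

    config = pdfkit.configuration(wkhtmltopdf=binary)
    pdfkit.from_file(src, dst, configuration=config)
```

Failing early with a clear message makes a missing binary easy to diagnose, since wkhtmltopdf is not installed by pip.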
The pdfkit library is very versatile: because wkhtmltopdf renders pages with a WebKit engine, it can handle complex HTML and CSS, similar to WeasyPrint. It also accepts an options dictionary that is passed through to wkhtmltopdf, for settings such as page size, orientation, and margins, making it a flexible choice for various PDF generation needs. If you need to convert web pages directly, pdfkit can do that as well with pdfkit.from_url('http://example.com', 'output.pdf').
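pdfkit forwards its options dictionary to wkhtmltopdf, where each key is a wkhtmltopdf command-line flag without the leading dashes. The sketch below uses a hypothetical pdf_options() helper to build page size, orientation, and margin settings; the flag names (page-size, orientation, margin-top and friends, encoding) are standard wkhtmltopdf options, but check the wkhtmltopdf help output for the version you have installed.

```python
from typing import Dict


def pdf_options(page_size: str = "A4", landscape: bool = False, margin: str = "15mm") -> Dict[str, str]:
    """Build a pdfkit options dict; keys are wkhtmltopdf flags without the leading '--'."""
    options = {
        "page-size": page_size,
        "orientation": "Landscape" if landscape else "Portrait",
        "encoding": "UTF-8",
    }
    # wkhtmltopdf takes each margin as a separate flag.
    for side in ("top", "right", "bottom", "left"):
        options[f"margin-{side}"] = margin
    return options


def url_to_pdf(url: str, dst: str, **kwargs) -> None:
    """Render a web page to PDF with these options (requires pdfkit and wkhtmltopdf)."""
    import pdfkit

    pdfkit.from_url(url, dst, options=pdf_options(**kwargs))
```

For example, url_to_pdf('http://example.com', 'output.pdf', page_size='Letter', landscape=True) would request a landscape letter page; keeping the options in one function makes it easy to reuse the same layout across pdfkit.from_file, from_url, and from_string calls.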
Method 3: Using ReportLab
If you’re looking for more control over PDF generation, ReportLab is a fantastic library to consider. Unlike WeasyPrint and pdfkit, ReportLab allows you to create PDFs from scratch, which means you can customize every aspect of the PDF document. To get started, install ReportLab:
```bash
pip install reportlab
```
Here’s a simple example of how to create a PDF file using ReportLab:
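```python
from reportlab.lib.pagesizes import letter
from reportlab.pdfgen import canvas

# Create a canvas that will be written to output.pdf on a US letter page
c = canvas.Canvas("output.pdf", pagesize=letter)
# Draw the text 100 points from the left edge and 750 points up from the bottom
c.drawString(100, 750, "Hello World")
# Finish the page and write the file to disk
c.save()
```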
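In this example, we create a simple PDF that contains the text “Hello World”. The Canvas class allows for precise placement of text and graphics on the page, giving you full control over the document layout. Coordinates are measured in points (1/72 inch) from the bottom-left corner, so (100, 750) sits near the top of a 612 × 792-point letter page.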
Output:
Nothing is printed; calling c.save() writes output.pdf to the current working directory.
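Because the canvas origin is the bottom-left corner, writing several lines from top to bottom means decreasing y for each line and starting a new page with showPage() once you reach the bottom margin. A minimal sketch, assuming hypothetical helpers layout_lines() (pure layout arithmetic) and write_lines_pdf(); the 72-point margins and 14-point line spacing are arbitrary choices.

```python
from typing import List, Tuple

PAGE_HEIGHT = 792  # height of a US letter page in points (ReportLab's letter is 612 x 792)


def layout_lines(lines: List[str], top: float = 72, bottom: float = 72,
                 leading: float = 14, page_height: float = PAGE_HEIGHT) -> List[List[Tuple[float, str]]]:
    """Split lines into pages of (y, text) pairs, moving down `leading` points per line."""
    pages: List[List[Tuple[float, str]]] = [[]]
    y = page_height - top
    for text in lines:
        if y < bottom:
            # Out of room: start a new page at the top margin.
            pages.append([])
            y = page_height - top
        pages[-1].append((y, text))
        y -= leading
    return pages


def write_lines_pdf(path: str, lines: List[str]) -> None:
    """Draw the laid-out lines with ReportLab, one showPage() per page."""
    from reportlab.lib.pagesizes import letter
    from reportlab.pdfgen import canvas

    c = canvas.Canvas(path, pagesize=letter)
    for page in layout_lines(lines, page_height=letter[1]):
        for y, text in page:
            c.drawString(72, y, text)
        c.showPage()  # finish the current page and start a fresh one
    c.save()
```

Keeping the layout math separate from the drawing calls makes it easy to check page breaks without generating a PDF, for example write_lines_pdf("output.pdf", [f"Line {i}" for i in range(100)]) should span two pages.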
While ReportLab does not convert HTML to PDF directly, it offers a robust solution for generating PDFs programmatically. This can be particularly useful in scenarios where you need to create dynamic content or reports based on data. If you want to incorporate HTML content, you may need to parse the HTML and manually add elements using ReportLab’s API.
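One way to do that parsing is Python’s built-in html.parser module. The sketch below, using a hypothetical SimpleBlockParser class and html_to_blocks()/blocks_to_pdf() helpers, collects the text of h1, h2, and p elements and maps them onto ReportLab’s Platypus layer (SimpleDocTemplate, Paragraph, and styles from getSampleStyleSheet()), which takes care of line wrapping and page breaks. It understands only those three tags, drops inline formatting, and ignores CSS entirely, so treat it as a starting point rather than an HTML renderer.

```python
from html.parser import HTMLParser
from typing import List, Optional, Tuple
from xml.sax.saxutils import escape

BLOCK_TAGS = {"h1", "h2", "p"}


class SimpleBlockParser(HTMLParser):
    """Collect (tag, text) pairs for h1/h2/p elements; other tags are ignored."""

    def __init__(self) -> None:
        super().__init__()  # convert_charrefs=True: entities like &amp; arrive decoded
        self.blocks: List[Tuple[str, str]] = []
        self._current: Optional[str] = None
        self._buffer: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag in BLOCK_TAGS:
            self._current, self._buffer = tag, []

    def handle_data(self, data):
        if self._current:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == self._current:
            text = " ".join("".join(self._buffer).split())  # collapse whitespace
            if text:
                self.blocks.append((tag, text))
            self._current = None


def html_to_blocks(markup: str) -> List[Tuple[str, str]]:
    parser = SimpleBlockParser()
    parser.feed(markup)
    parser.close()
    return parser.blocks


def blocks_to_pdf(blocks: List[Tuple[str, str]], path: str) -> None:
    """Lay out the blocks with ReportLab Platypus (requires reportlab)."""
    from reportlab.lib.pagesizes import letter
    from reportlab.lib.styles import getSampleStyleSheet
    from reportlab.platypus import Paragraph, SimpleDocTemplate

    styles = getSampleStyleSheet()
    style_for = {"h1": styles["Heading1"], "h2": styles["Heading2"], "p": styles["BodyText"]}
    # Paragraph parses its own mini-markup, so escape &, < and > in the plain text.
    story = [Paragraph(escape(text), style_for[tag]) for tag, text in blocks]
    SimpleDocTemplate(path, pagesize=letter).build(story)
```

For example, blocks_to_pdf(html_to_blocks(Path("yourfile.html").read_text(encoding="utf-8")), "output.pdf") would turn the headings and paragraphs of an HTML file into a paginated PDF (with Path imported from pathlib).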
Conclusion
In this tutorial, we’ve explored three powerful methods for saving HTML as a PDF in Python: WeasyPrint, pdfkit, and ReportLab. Each method has its unique strengths, making them suitable for different use cases. Whether you need a quick conversion or want to create a highly customized PDF document, you now have the tools to achieve your goals. With the examples provided, you can easily implement these solutions in your projects. Happy coding!
FAQ
- How do I install WeasyPrint?
  You can install WeasyPrint using pip with the command pip install WeasyPrint.
- Can pdfkit handle web pages?
  Yes, pdfkit can convert both local HTML files and web pages (with pdfkit.from_url) to PDF.
- What is ReportLab best used for?
  ReportLab is best used for creating PDFs from scratch, allowing for detailed customization.
- Are there any alternatives to these libraries?
  Yes. xhtml2pdf converts HTML and CSS to PDF (it is built on ReportLab), and FPDF generates PDFs programmatically. PyPDF2 is often mentioned alongside them, but it works with existing PDFs (merging, splitting, extracting text) rather than converting HTML.
- Do I need to know HTML to use these libraries?
  WeasyPrint and pdfkit take HTML (and CSS) as input, so basic HTML knowledge is enough to get started. ReportLab builds PDFs through Python code and does not require HTML at all.
Manav is an IT professional with extensive experience as a core developer on many live projects. He is an avid learner who enjoys learning new things and sharing his findings whenever possible.