Export an HTLM table to Excel, or export an Excel table to HTML.
<table>
<td rowspan="2" colspan="3">
<td style="background-color:red; text-align:center;">
<td class="skip">
<li>
,<ul>
and<ol>
<b>
and<strong>
<i>
,<em>
,<blockquote>
and<code>
<u>
<s>
and<strike>
<mark color="red">
<span style="color:red" class="text-big">
<img src="some_path">
<br>
We rely on Python's excellent XLSXWriter to generate the XLSX file.
#!/usr/bin/env python3
import xlsxwriter
with open('html2excel.py') as infile:
exec(infile.read())
wb = xlsxwriter.Workbook('test.xlsx')
ws = wb.add_worksheet()
p = HTML2Excel(wb, ws, default_format={
'font_name': 'Arial',
'font_size': 10,
'text_wrap': 1,
'valign': 'top',
'border': 1,
'border_color': '#0000ff',
})
with open('test.html') as input:
html = input.read()
p.feed(html)
wb.close()
If you text-wrap cells or if you merge cells, your images may be squeezed. In that case, you may use the following workaround:
# ... (same as above)
with open('test.html') as input:
html = input.read()
image_paths = p.feed(html)
wb.close()
# Now switch to a different library to add images with no squeeze
import openpyxl
from xlsxwriter.utility import xl_rowcol_to_cell
wb = openpyxl.load_workbook('test.xlsx')
ws = wb.active
for row, col, path in image_paths:
image = openpyxl.drawing.image.Image(path)
ws.add_image(image, xl_rowcol_to_cell(row, col))
wb.save('test.xlsx')
There is no support for rich strings in openpyxl, so we use rubyXL. They are both excellent libraries.
#!/usr/bin/env ruby
require 'rubyXL'
require 'rubyXL/convenience_methods'
require './excel2html'
wb = RubyXL::Parser.parse('some excel file.xlsx')
wb.worksheets.each { |ws|
worksheet_to_html(ws)
}
There is a little bit of a conendrum:
- openpyxl does not support rich text, so we use RubyXL; however
- rubyxl does not support images, so we also need to use openpyxl
Fortunately, pycall
comes to the rescue and allows us to use Python code inside Ruby.
#!/usr/bin/env ruby
require 'rubyXL'
require 'rubyXL/convenience_methods'
require 'pycall/import'
include PyCall::Import
pyimport :openpyxl
pyfrom 'openpyxl.drawing.spreadsheet_drawing', import: 'TwoCellAnchor'
require './excel2html'
wb = RubyXL::Parser.parse('test.xlsx')
wb2 = openpyxl.load_workbook('test.xlsx')
wb.worksheets.each_with_index { |ws, i|
# Index images by cell row/col for easier later retrieval
ws.images = wb2.worksheets[i]._images
puts ws.sheet_name
html = worksheet_to_html(ws)
File.open("test_#{ws.sheet_name}.html", 'wb') { |f|
f.write(html)
}
}
<table>
<thead>
<tr>
<th>C1</th><th>C2</th><th>C3</th>
</tr>
</thead>
<tbody>
<tr>
<td>Basic string</td>
<td><mark class="red"><u>A string</u></mark></td>
<td><u><mark class="red"><i>Another</i></mark></u> bizarre <u>string</u></td>
<td>A <mark class="red">third</mark> <mark class="blue">enourmous</mark> string</td>
</tr>
</tbody>
</table>
<table>
<tbody>
<tr><td>a</td><td>b</td><td>c</td><td>d</td><td>e</td></tr>
<tr><td>a</td><td rowspan="3" colspan="3">A <mark class="red">third</mark> <mark class="blue">enourmous</mark> string</td><td>e</td></tr>
<tr><td>a</td><td>e</td></tr>
<tr><td>a</td><td>e</td></tr>
<tr><td>a</td><td>b</td><td>c</td><td>d</td><td>e</td></tr>
</tbody>
</table>
<table class='table table-bordered table-hover table-striped'>
<tr>
<th ><p>Col1</p></td>
<th ><p>Col2</p></td>
<th ><p>Col3</p></td>
</tr>
<tr>
<td colspan='1' rowspan='4'><p>Merged</p></td>
<td colspan='1' rowspan='4'><p>1</p></td>
<td ><p>Some text</p></td>
</tr>
<tr>
<td ><p><b>3.2.1 </b><b><u>Section</u></b></p><p>number <mark color='#FF0000'>two</mark> <mark color='#00B0F0'>three</mark></p></td>
</tr>
<tr>
<td ><p>*nil*</p></td>
</tr>
<tr>
<td ><p>*nil*</p></td>
</tr>
<tr>
<td ><p>11</p></td>
<td ><p>13</p></td>
</tr>
</table>