
fix and improve detection and import of ods, xlsx and csv documents #1446

Merged: @blizzz merged 11 commits into main from fix/1440/import-types on Nov 27, 2024


@blizzz (Member) commented Nov 8, 2024

addresses #1440

This PR fixes a few corner cases when importing different document types, especially (but not only) regarding the recognition of date formats. For instance, it is now possible again to re-import an exported CSV.

The integration tests are extended with column formats and no longer test only CSV import, but also ODS, XLSX (from 365) and XLSX (from LibreOffice). These formats have subtle differences and failed in different ways along the way.


Historic notes

With the first commit, the added test fails as expected.


The second commit brings in two tests for a similar table in XLSX files (created by SharePoint/365 and by LibreOffice). The first test succeeds with a small datetime adjustment: the imported value contains a time, although only the date was entered. Maybe it is possible to read out the cell format and adjust automatically. The second test, against the LibreOffice-created file, fails with import errors. This needs further investigation.

P.S.: Changed back to date-only type.


The third commit adds handling of date-only datetime types imported from native XLSX files.

The eighth commit also fixes the rendering of values in the preview dialog. For instance, in this screenshot the date used to be shown as an Excel timestamp.

[screenshot]
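The "Excel timestamp" above is a serial day count with the time of day stored as a fraction. As a hedged, standalone illustration (not the code from this PR), the PHP sketch below converts such a serial into a string and drops the time part for date-only values. It assumes the 1900 date system and serials on or after 1900-03-01, so Excel's phantom 1900-02-29 needs no special handling; the function name `excelSerialToDateString` is hypothetical.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper: convert an Excel serial date (1900 date system) into a
 * string. Assumption: serial >= 61, i.e. on or after 1900-03-01.
 */
function excelSerialToDateString(float $serial): string
{
    if ($serial < 61) {
        throw new InvalidArgumentException('Serials before 1900-03-01 are not handled in this sketch');
    }

    // Whole days and the fractional time of day, rounded to whole seconds.
    $days = (int) floor($serial);
    $seconds = (int) round(($serial - $days) * 86400);

    // Counting from 1899-12-30 absorbs Excel's non-existent 1900-02-29.
    $date = (new DateTimeImmutable('1899-12-30 00:00:00', new DateTimeZone('UTC')))
        ->modify(sprintf('+%d days', $days))
        ->modify(sprintf('+%d seconds', $seconds));

    // A serial without a fractional part is treated as a date-only value.
    return $seconds === 0 ? $date->format('Y-m-d') : $date->format('Y-m-d H:i');
}

echo excelSerialToDateString(45611.0), PHP_EOL;  // 2024-11-15
echo excelSerialToDateString(45611.75), PHP_EOL; // 2024-11-15 18:00
```

Treating a zero fraction as "date only" is a heuristic; reading the cell's number format, as suggested above, would be the more reliable signal.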

@blizzz added the bug (Something isn't working) and 2. developing (Work in progress) labels Nov 8, 2024
@blizzz requested a review from luka-nextcloud November 8, 2024 19:05
@blizzz force-pushed the fix/1440/import-types branch from d599204 to a67ba6e November 14, 2024 12:55
@blizzz changed the title from "test(Integration): extends csv import test data" to "improve and fix detection and import of xlsx and csv documents" Nov 14, 2024
@blizzz changed the title from "improve and fix detection and import of xlsx and csv documents" to "fix and improve detection and import of xlsx and csv documents" Nov 14, 2024
@blizzz force-pushed the fix/1440/import-types branch 3 times, most recently from b154e9e to e8320cc November 15, 2024 16:40
@blizzz added the 3. to review (Waiting for reviews) label and removed the 2. developing (Work in progress) label Nov 15, 2024
@blizzz marked this pull request as ready for review November 15, 2024 19:21
@blizzz force-pushed the fix/1440/import-types branch from 845ef0c to 095314a November 15, 2024 19:25
@blizzz requested review from juliusknorr and enjeck November 15, 2024 19:25
@blizzz changed the title from "fix and improve detection and import of xlsx and csv documents" to "fix and improve detection and import of ods, xlsx and csv documents" Nov 15, 2024
@blizzz force-pushed the fix/1440/import-types branch from 095314a to 3beaa58 November 15, 2024 19:28

Signed-off-by: Arthur Schiwon <blizzz@arthur-schiwon.de>
Signed-off-by: Arthur Schiwon <blizzz@arthur-schiwon.de>
Signed-off-by: Arthur Schiwon <blizzz@arthur-schiwon.de>
improved handling of LibreOffice-generated XLSX documents

Signed-off-by: Arthur Schiwon <blizzz@arthur-schiwon.de>
Signed-off-by: Arthur Schiwon <blizzz@arthur-schiwon.de>
Signed-off-by: Arthur Schiwon <blizzz@arthur-schiwon.de>
Signed-off-by: Arthur Schiwon <blizzz@arthur-schiwon.de>
… and get date value with fault tolerance

Signed-off-by: Arthur Schiwon <blizzz@arthur-schiwon.de>
Signed-off-by: Arthur Schiwon <blizzz@arthur-schiwon.de>
@blizzz force-pushed the fix/1440/import-types branch from 3beaa58 to 044d62d November 18, 2024 20:58
@blizzz (Member, Author) commented Nov 25, 2024

/backport to stable0.8

@backportbot added the backport-request (Pending backport by the backport-bot) label Nov 25, 2024

@enjeck (Contributor) left a comment

Tested locally and works.
One question:

```php
if ($cell->getDataType() === 'null') {
	// LibreOffice generated XLSX doc may have more empty columns in the first row.
	// Continue without increasing error count.
	// Question: What about tables where a column does not have a heading?
	// …
}
```
Contributor:

Is this question left as a TODO? or addressed below?

@blizzz (Member, Author):

Ha. Good finding. Forgot about this one. Let me check one thing to ensure that this does not change the previous behavior. It might indeed need a touch.

There is also a documentation aspect, in that headers were expected on all columns.

@blizzz (Member, Author):

Okay, import works as before. But there is one difference:

[screenshot]

Previously it was one.

I tested with a three-column table where the middle column had no caption, but its values were set (always strings, for simplicity). Both before the change and now, only the columns with a caption were added; the middle one was ignored.

On import, however, the values of the first two columns were taken over and those of the third column ignored.

To visualize, the original

Caption 1 |     | Caption 3
Aaa       | A   | Zzz
Bbb       | B   | Yyy
Ccc       | C   | Xxx

turns into the following (only these elements are taken over):

Caption 1 | Caption 3
Aaa       | A
Bbb       | B
Ccc       | C

Our columns require a caption. That is something to document, but I will also look into increasing the error count again if the empty header cell is not part of a trailing column.
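The header rule discussed here (every column needs a caption, trailing empty header cells as written by LibreOffice are tolerated, an empty caption between captioned columns counts as an error) can be sketched in standalone PHP. This is a hypothetical illustration, not the code of 50815ae: `parseHeaderRow` and `mapRow` are invented names, and keying values by their original column index is an assumed way to keep captions and values aligned, so that the shift shown in the table above cannot occur.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical sketch: validate a header row.
 *
 * @param array<int, ?string> $headerRow first-row values, null for empty cells
 * @return array{columns: array<int, string>, errors: int} captions keyed by column index
 */
function parseHeaderRow(array $headerRow): array
{
    // Index of the last non-empty caption; empty cells after it are trailing.
    $lastCaptioned = -1;
    foreach ($headerRow as $index => $value) {
        if ($value !== null && trim($value) !== '') {
            $lastCaptioned = $index;
        }
    }

    $columns = [];
    $errors = 0;
    foreach ($headerRow as $index => $value) {
        if ($value === null || trim($value) === '') {
            if ($index < $lastCaptioned) {
                $errors++; // empty caption between captioned columns
            }
            continue; // the uncaptioned column itself is not imported
        }
        $columns[$index] = trim($value);
    }

    return ['columns' => $columns, 'errors' => $errors];
}

/**
 * Hypothetical sketch: map one data row onto the captioned columns.
 *
 * @param array<int, string> $columns captions keyed by original column index
 * @param array<int, mixed> $row raw cell values of one data row
 * @return array<string, mixed>
 */
function mapRow(array $columns, array $row): array
{
    $mapped = [];
    foreach ($columns as $index => $caption) {
        // Look values up by the original index, so skipping an uncaptioned
        // column does not shift later values one column to the left.
        $mapped[$caption] = $row[$index] ?? null;
    }
    return $mapped;
}

$header = parseHeaderRow(['Caption 1', null, 'Caption 3', null, null]);
// $header['errors'] is 1: the middle cell is empty, the two trailing cells are ignored.
print_r(mapRow($header['columns'], ['Aaa', 'A', 'Zzz']));
```

With this approach, "Caption 3" receives "Zzz" rather than "A"; duplicate captions would overwrite each other in `mapRow`, which a real implementation would need to handle.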

@blizzz (Member, Author):

Solved in 50815ae

Member:

We could also add that as a note to the import dialog, most users will probably never get to the wiki

@blizzz (Member, Author):

> We could also add that as a note to the import dialog, most users will probably never get to the wiki

👍 Added a brief supplement in 36179dc

Member:

Thanks 👍

…ders

Signed-off-by: Arthur Schiwon <blizzz@arthur-schiwon.de>
Signed-off-by: Arthur Schiwon <blizzz@arthur-schiwon.de>
@blizzz merged commit 3a06109 into main Nov 27, 2024
63 checks passed
@blizzz deleted the fix/1440/import-types branch November 27, 2024 08:07
@backportbot removed the backport-request (Pending backport by the backport-bot) label Nov 27, 2024
Labels: 3. to review (Waiting for reviews), bug (Something isn't working)
Projects: None yet

3 participants