Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: implement importing excel (.xls .xlsx) files #5638

Merged
merged 13 commits into from
Sep 12, 2024

Conversation

shun2wang
Copy link
Contributor

@shun2wang shun2wang commented Aug 15, 2024

Close: jasp-stats/jasp-issues#2134
Close: https://github.com/jasp-stats/INTERNAL-jasp/issues/1390
Close: https://github.com/jasp-stats/INTERNAL-jasp/issues/412

JASP can now import excel files from first workbook and the first sheet by default.

I introduced a third-party library freexl to parse the excel files. but a known limitation is that you cannot import files with more than 705 columns (this is a bug of freexl), I made a patch to lifte this limitation temporarily.

  • init implement import excel files (on Linux)
  • work for Windows
  • work for macOS (someone could help?)
  • update development document

My C++ skills are not yet proficient. I would be grateful for your guidance and review.

@shun2wang shun2wang changed the title [WIP] implement importing excel (.xls .xlsx) files feat: implement importing excel (.xls .xlsx) files Aug 21, 2024
@boutinb boutinb self-requested a review August 29, 2024 15:44
@boutinb
Copy link
Contributor

boutinb commented Aug 29, 2024

What nice!!! I'll try it tomorrow on Mac.

@boutinb
Copy link
Contributor

boutinb commented Aug 30, 2024

I wonder why you did not use conan to load the freexl library.
I have done this for Mac (see my last commit), and it works perfect. we don't need to copy the freexl.h file in this way.
Can you try this on Windows and Linux?

@@ -18,6 +18,7 @@ brotli/1.0.9
sqlite3/3.46.0
gmp/6.3.0
mpfr/4.2.1
freexl/2.0.0

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

official FreeXL has a bug, so we using a patch version from my fork.

@shun2wang
Copy link
Contributor Author

shun2wang commented Aug 30, 2024

I wonder why you did not use conan to load the freexl library. I have done this for Mac (see my last commit), and it works perfect. we don't need to copy the freexl.h file in this way. Can you try this on Windows and Linux?

because it cannot read more than 700 cols Excel files so I made a patch and build it manually. see my first comment.

see: https://github.com/shun2wang/freexl

@shun2wang
Copy link
Contributor Author

shun2wang commented Sep 3, 2024

see: https://github.com/shun2wang/freexl

Do we maintain a repo of JASP, I am just using my fork for now. @boutinb

@boutinb
Copy link
Contributor

boutinb commented Sep 4, 2024

But if I see your PR, you are using a freeXL version set in rtools.
We could use your freeXL version on GitHub and compile it, but for maintenance reason (JASP depends on many libraries, and its build is becoming quite complex) I would prefer to lie on conan which takes care on portability and compiling issues.
That the current freeXL cannot load more than 700 columns is not a big issue I find. I would ask freeXL to merge your fix (maybe you did it already), so that the next version will include your fix.

@shun2wang
Copy link
Contributor Author

shun2wang commented Sep 4, 2024

But if I see your PR, you are using a freeXL version set in rtools.

Yes, we using rtools msys2 build toolchain to build it on Windows and the ReadStat also build manually by this way. maybe @JorisGoosen know more details here.

That the current freeXL cannot load more than 700 columns is not a big issue I find. I would ask freeXL to merge your fix (maybe you did it already), so that the next version will include your fix.

We have received complaints from users about multi-column importing, and Rens has applied a fix temporarily for that. Yes I had filed bugs to that project but I dont think it will be merged for quite some time, because freeXL is updated very infrequently to ensure stability, and the last release was a year ago.

Your concerns are valid, I would try to find a way to apply patches manually from conan instead of waiting for the official master branch release.

EDIT: no, I used a dev branch of FreeXL which including some bugfix important. I think maybe build it manually is best choice for now.

@JorisGoosen JorisGoosen self-requested a review September 5, 2024 10:53
@JorisGoosen
Copy link
Contributor

I agree with Bruno that adding more manual stuff to the build is not very maintainable...
And Im already unhappy with the fact that for some systems we need to manually build readstat on some systems.

But if I see your PR, you are using a freeXL version set in rtools.

Yes, we using rtools msys2 build toolchain to build it on Windows and the ReadStat also build manually by this way. maybe @JorisGoosen know more details here.

That the current freeXL cannot load more than 700 columns is not a big issue I find. I would ask freeXL to merge your fix (maybe you did it already), so that the next version will include your fix.

I agree that staying under 700 columns until they merge your fix is not at all a problem.
Just making it possible to import from xlsx files etc is going to be a highly appluaded feature in any case.
So very nice you made it.

We have received complaints from users about multi-column importing, and Rens has applied a fix temporarily for that. Yes I had filed bugs to that project but I dont think it will be merged for quite some time, because freeXL is updated very infrequently to ensure stability, and the last release was a year ago.

Your concerns are valid, I would try to find a way to apply patches manually from conan instead of waiting for the official master branch release.

EDIT: no, I used a dev branch of FreeXL which including some bugfix important. I think maybe build it manually is best choice for now.

What bug was fixed?
Maybe it is ok to just use the version with bug that is in conan?

I will now do an indepth review of your code ^^

@JorisGoosen
Copy link
Contributor

Also, if we are going to use our modified version I would like to move that code to jasp-stats and use that one.

@JorisGoosen
Copy link
Contributor

JorisGoosen commented Sep 5, 2024

Already a thing: in qml and c++ files we use tabs not spaces. So perhaps you could clean up your indentation to be tabs of 4 instead of the spaces?

Edit: I see that you do use tabs in a lot of places, just not everywhere?

@shun2wang
Copy link
Contributor Author

Thanks for your reviews! OK let's use conan version to make build easier. and ignore 700 cols limitation. then I will following the official release. 🍻

Copy link
Contributor

@boutinb boutinb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It works really well on Mac.
I think we can merge this for 0.19.2

@JorisGoosen JorisGoosen merged commit 90a9752 into jasp-stats:development Sep 12, 2024
1 check passed
@JorisGoosen
Copy link
Contributor

I agree!

Very nice @shun2wang 😃

@shun2wang shun2wang deleted the readExcelFile branch September 12, 2024 15:27
JorisGoosen added a commit that referenced this pull request Sep 30, 2024
* implement importing excel files

* update ubuntu build bot

* work on Windows

* code style formatting and cleanup

* add to installer and file type support

* showing file type filter as a list

* Use conan to load freexl

* add some missing supported filetypes to macos manifest thing

* use jasp-stats fork of shuns fixes for freexl

* address reviews suggestions

* fix build on Linux

* some cleanup and comments

* replace newlines in excel cell value

---------

Co-authored-by: boutinb <b.boutin@uva.nl>
Co-authored-by: Joris Goosen <joris@jorisgoosen.nl>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[Feature Request]: Open Excel (.xls/.xlsx) data files
3 participants