Releases: microsoft/markitdown
v0.0.1a3
New Features and Formats
- Add zip handling by @Josh-XT in #22
- Add PPTX chart support by @nyosegawa in #33
Breaking Changes
Renamed the mlm_client and mlm_model arguments to llm_client and llm_model, and added appropriate deprecation warnings.
See:
- Fix LLM terminology in code by @CharlesCNorton in #73
- Fix LLM terms by @CharlesCNorton in #72
- Added deprecation warnings for mlm_* arguments. by @afourney in #101
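The rename-with-deprecation pattern described above can be sketched roughly as follows. This is a minimal illustration, not MarkItDown's actual implementation: the Converter class and the _resolve_renamed helper are hypothetical names introduced here, and only the argument names (mlm_client, mlm_model, llm_client, llm_model) come from the release notes.

```python
import warnings


def _resolve_renamed(new_value, old_value, old_name, new_name):
    """Return the value to use, warning if the deprecated name was passed.

    Hypothetical helper for illustration only.
    """
    if old_value is None:
        return new_value
    warnings.warn(
        f"'{old_name}' is deprecated; use '{new_name}' instead.",
        DeprecationWarning,
        stacklevel=3,  # attribute the warning to the caller of __init__
    )
    # If both names were supplied, the new name takes precedence.
    return new_value if new_value is not None else old_value


class Converter:
    """Hypothetical stand-in for a class that accepts the renamed arguments."""

    def __init__(self, llm_client=None, llm_model=None,
                 mlm_client=None, mlm_model=None):
        self.llm_client = _resolve_renamed(
            llm_client, mlm_client, "mlm_client", "llm_client"
        )
        self.llm_model = _resolve_renamed(
            llm_model, mlm_model, "mlm_model", "llm_model"
        )
```

With a shim of this shape, callers that still pass the old names keep working but see a DeprecationWarning, while callers using the new names see no warning at all.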
Bug fixes and enhancements
- Remove invalid classifiers by @simonw in #10
- Add installation instructions from haesleinhuepf:patch-1 by @gagb in #27
- Update README.md by @gagb in #28
- Improve the readme with contributing guidelines by @gagb in #7
- Add installation instructions by @haesleinhuepf in #24
- Update README.md by @pawarbi in #26
- Update README.md by @gagb in #29
- CLI usage instructions by @simonw in #11
- Fix character decoding issues with text-like files by @brc-dd in #19
- Catching pydub's warning of ffmpeg or avconv missing by @SH4DOW4RE in #39
- Exclude test files from language statistics using linguist-vendored by @Y-Kim-64 in #44
- Support specifying YouTube transcript language by @narumiruna in #50
- Add passing style_map kwarg to Mammoth when converting docx to allow keeping comments by @VillePuuska in #38
- Fix: pass the kwargs to _convert method when converting an url file by @Soulter in #48
- Added Dockerfile by @madduci in #60
- fix issue #65 by @DIMAX99 in #67
- Cybernobie/main by @gagb in #75
- Ensure hatch is installed before running tests by @cybernobie in #63
- Kevinclb/main by @gagb in #77
- feature: add argument parsing for cli tool capability by @kevinclb in #46
- Added llm tests to the local test set. by @afourney in #100
New Contributors
- @simonw made their first contribution in #10
- @gagb made their first contribution in #27
- @haesleinhuepf made their first contribution in #24
- @pawarbi made their first contribution in #26
- @brc-dd made their first contribution in #19
- @Josh-XT made their first contribution in #22
- @nyosegawa made their first contribution in #33
- @VillePuuska made their first contribution in #38
- @SH4DOW4RE made their first contribution in #39
- @Y-Kim-64 made their first contribution in #44
- @Soulter made their first contribution in #48
- @narumiruna made their first contribution in #50
- @madduci made their first contribution in #60
- @CharlesCNorton made their first contribution in #73
- @DIMAX99 made their first contribution in #67
- @cybernobie made their first contribution in #63
- @kevinclb made their first contribution in #46
Full Changelog: v0.0.1a2...v0.0.1a3
v0.0.1a2
Initial Release of markitdown
The MarkItDown library is a utility for converting various files to Markdown (e.g., for indexing or text analysis).
It presently supports:
- PDF (.pdf)
- PowerPoint (.pptx)
- Word (.docx)
- Excel (.xlsx)
- Images (EXIF metadata, and OCR)
- Audio (EXIF metadata, and speech transcription)
- HTML (special handling of Wikipedia, etc.)
- Various other text-based formats (csv, json, xml, etc.)
The API is simple:
from markitdown import MarkItDown

# Create a converter with default settings.
markitdown = MarkItDown()
# Convert a spreadsheet to Markdown.
result = markitdown.convert("test.xlsx")
print(result.text_content)