Releases: microsoft/markitdown

v0.0.1a3

17 Dec 22:31
3ce21a4
v0.0.1a3 Pre-release

New Features and Formats

Breaking Changes

Renamed mlm_client and mlm_model arguments to llm_client and llm_model, and added appropriate deprecation warnings.
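A rename like this is often handled by accepting both spellings for a while and warning on the old one. The sketch below is only an illustration of that general pattern: the `Converter` class is a hypothetical stand-in, not MarkItDown's actual constructor, and the warning text is made up.

```python
import warnings


class Converter:
    """Hypothetical stand-in class; NOT MarkItDown's real implementation."""

    def __init__(self, llm_client=None, llm_model=None,
                 mlm_client=None, mlm_model=None):
        # Accept the old argument names, but warn and map them to the new ones.
        if mlm_client is not None:
            warnings.warn(
                "'mlm_client' is deprecated; use 'llm_client' instead.",
                DeprecationWarning,
                stacklevel=2,
            )
            if llm_client is None:
                llm_client = mlm_client
        if mlm_model is not None:
            warnings.warn(
                "'mlm_model' is deprecated; use 'llm_model' instead.",
                DeprecationWarning,
                stacklevel=2,
            )
            if llm_model is None:
                llm_model = mlm_model

        self.llm_client = llm_client
        self.llm_model = llm_model
```

In this sketch, a value passed under the new name takes precedence if both spellings are supplied. Callers moving to this release should only need to switch the keyword names.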

Bug fixes and enhancements

  • Remove invalid classifiers by @simonw in #10
  • Add installation instructions from haesleinhuepf:patch-1 by @gagb in #27
  • Update README.md by @gagb in #28
  • Improve the readme with contributing guidelines by @gagb in #7
  • Add installation instructions by @haesleinhuepf in #24
  • Update README.md by @pawarbi in #26
  • Update README.md by @gagb in #29
  • CLI usage instructions by @simonw in #11
  • Fix character decoding issues with text-like files by @brc-dd in #19
  • Catching pydub's warning of ffmpeg or avconv missing by @SH4DOW4RE in #39
  • Exclude test files from language statistics using linguist-vendored by @Y-Kim-64 in #44
  • Support specifying YouTube transcript language by @narumiruna in #50
  • Add passing style_map kwarg to Mammoth when converting docx to allow keeping comments by @VillePuuska in #38
  • Fix: pass the kwargs to _convert method when converting an url file by @Soulter in #48
  • Added Dockerfile by @madduci in #60
  • fix issue #65 by @DIMAX99 in #67
  • Cybernobie/main by @gagb in #75
  • Ensure hatch is installed before running tests by @cybernobie in #63
  • Kevinclb/main by @gagb in #77
  • feature: add argument parsing for cli tool capability by @kevinclb in #46
  • Added llm tests to the local test set. by @afourney in #100

New Contributors

Full Changelog: v0.0.1a2...v0.0.1a3

v0.0.1a2

17 Dec 22:17
b401396
v0.0.1a2 Pre-release

Initial Release of markitdown

MarkItDown is a utility library for converting various file formats to Markdown, for uses such as indexing and text analysis.

It presently supports:

  • PDF (.pdf)
  • PowerPoint (.pptx)
  • Word (.docx)
  • Excel (.xlsx)
  • Images (EXIF metadata, and OCR)
  • Audio (EXIF metadata, and speech transcription)
  • HTML (special handling of Wikipedia, etc.)
  • Various other text-based formats (csv, json, xml, etc.)

The API is simple:

from markitdown import MarkItDown

markitdown = MarkItDown()
result = markitdown.convert("test.xlsx")  # path of the file to convert
print(result.text_content)  # the converted Markdown text
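Building on the snippet above, here is a minimal sketch of saving the converted Markdown to a file instead of printing it. The helper name `convert_to_markdown_file` is hypothetical and not part of the library. It relies only on the `convert(...)` call and the `text_content` attribute shown above, and assumes `text_content` is a string.

```python
from pathlib import Path


def convert_to_markdown_file(converter, source, dest):
    """Convert `source` with `converter` and write the Markdown to `dest`.

    `converter` is any object with a convert(path) method whose result
    exposes a text_content string (hypothetical helper, for illustration).
    """
    result = converter.convert(str(source))
    dest = Path(dest)
    # Write as UTF-8 so non-ASCII text in the source document survives.
    dest.write_text(result.text_content, encoding="utf-8")
    return dest
```

With the package installed, calling `convert_to_markdown_file(MarkItDown(), "test.xlsx", "test.md")` would be expected to write the converted text to `test.md`.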