ExcelReaders is a package that provides functionality to read Excel files.
Use Pkg.add("ExcelReaders")
in Julia to install ExcelReaders and its dependencies.
The package uses the Python xlrd library. If either Python or the xlrd package are not installed on your system, the package will use the conda.jl package to install all necessary dependencies automatically.
The Taro package also provides Excel file reading functionality. The main difference between the two packages (in terms of Excel functionality) is that ExcelReaders uses the Python package xlrd for its processing, whereas Taro uses the Java packages Apache Tika and Apache POI.
The most basic usage is this:
using ExcelReaders
data = readxl("Filename.xlsx", "Sheet1!A1:C4")
This will return a DataMatrix{Any}
with all the data in the cell range A1 to C4 on Sheet1 in the Excel file Filename.xlsx.
If you expect to read multiple ranges from the same Excel file you can get much better performance by opening the Excel file only once:
using ExcelReaders
f = openxl("Filename.xlsx")
data1 = readxl(f, "Sheet1!A1:C4")
data2 = readxl(f, "Sheet2!B4:F10")
The readxlsheet
function reads complete Excel sheets, without a need to specify precise range information. The most basic usage is
using ExcelReaders
data = readxlsheet("Filename.xlsx", "Sheet1")
This will read all content on Sheet1 in the file Filename.xlsx. Eventual blank rows and columns at the top and left are skipped. readxlsheet
takes a number of optional keyword arguments:
skipstartrows
accepts either:blanks
(default) or a positive integer. With:blank
any empty initial rows are skipped. An integer skips as many rows as specified.skipstartcols
accepts either:blanks
(default) or a positive integer. With:blank
any empty initial columns are skipped. An integer skips as many columns as specified.nrows
accepts either:all
(default) or a positive integer. With:all
, all rows (except skipped ones) are read. An integer specifies the exact number of rows to be read.ncols
accepts either:all
(default) or a postiive integer. With:all
, all columns (except skipped ones) are read. An integer specifies the exact number of columns to be read.
readxlsheet
also accepts an ExcelFile (as obtained from openxl
) as its first argument.
To read into a DataFrame:
using ExcelReaders
using DataFrames
df = readxl(DataFrame, "Filename.xlsx", "Sheet1!A1:C4")
This code will use the first row in the range A1:C4 as the column names in the DataFrame.
To read in data without a header row use
df = readxl(DataFrame, "Filename.xlsx", "Sheet1!A1:C4", header=false)
This will auto-generate column names. Alternatively you can specify your own names:
df = readxl(DataFrame, "Filename.xlsx", "Sheet1!A1:C4",
header=false, colnames=[:name1, :name2, :name3])
You can also combine header=true
and a custom colnames
list, in that case the first row in the specified range will just be skipped.
To read the whole sheet into a DataFrame (respective keyword arguments (header
, skipstartrows
etc.) should work as expected):
df = readxlsheet(DataFrame, "Filename.xlsx", "Sheet1")