API Reference

XLSX.XLSXFile — Type

XLSXFile represents a reference to an Excel file.

It is created by using XLSX.readxlsx or XLSX.openxlsx.

From an XLSXFile you can navigate to an XLSX.Worksheet reference, as shown in the example below.

Example

xf = XLSX.readxlsx("myfile.xlsx")
sh = xf["mysheet"] # get a reference to a Worksheet

XLSX.readxlsx — Function

readxlsx(source::Union{AbstractString, IO}) :: XLSXFile

Main function for reading an Excel file. This function will read the whole Excel file into memory and return a closed XLSXFile.

Consider using XLSX.openxlsx for lazy loading of Excel file contents.
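The following is a minimal round-trip sketch, assuming the XLSX.jl package is installed. It first creates a throwaway workbook with XLSX.openxlsx in write mode, so the example has something to read. The temporary path and the sheet name "mysheet" are illustrative only. It then loads the file with XLSX.readxlsx, once from the path and once from an in-memory IOBuffer (the signature accepts either), and reads cell values by reference.

```julia
import XLSX

# Illustrative setup: write a small workbook to a temporary file.
path = tempname() * ".xlsx"
XLSX.openxlsx(path, mode="w") do xf
    sheet = xf[1]
    XLSX.rename!(sheet, "mysheet")
    sheet["A1"] = "hello"
    sheet["B1"] = 42
end

# readxlsx loads the whole workbook into memory and returns a closed XLSXFile.
xf = XLSX.readxlsx(path)
sheet = xf["mysheet"]
value_a1 = sheet["A1"]   # indexing a Worksheet by reference yields the cell value
value_b1 = sheet["B1"]

# The same file read from an IO object instead of a path.
xf_io = XLSX.readxlsx(IOBuffer(read(path)))
value_io = xf_io["mysheet"]["A1"]
```

Because the returned XLSXFile is already closed, it can be kept around and queried without an enclosing do block.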

XLSX.openxlsx — Function

openxlsx(f::F, source::Union{AbstractString, IO}; mode::AbstractString="r", enable_cache::Bool=true) where {F<:Function}

Opens an XLSX file for reading and/or writing, passes the opened XLSXFile to f, and closes it automatically after f has been applied.

Do syntax

This function should be used with do syntax, as in:

XLSX.openxlsx("myfile.xlsx") do xf
    # read data from `xf`
end

Filemodes

The mode argument controls how the file is opened. The following modes are allowed:

r : read mode. The existing data in source will be accessible for reading. This is the default mode.
w : write mode. Opens an empty file that will be written to source.
rw : edit mode. Opens source for editing. The file will be saved to disk when the function ends.

Warning

The rw mode is known to produce some data loss. See #159.

Simple data should work fine. Users are advised to use this feature with caution when working with formulas and charts.

Arguments

source is IO or the complete path to the file.
mode is the file mode, as explained in the last section.
enable_cache:

If enable_cache=true, every worksheet cell that is read will be cached. If you read the same cell twice, the second read uses the cached value instead of reading from disk again.

If enable_cache=false, worksheet cells will always be read from disk. This is useful when you want to read a spreadsheet that doesn't fit into memory.

The default value is enable_cache=true.

Examples

Read from file

The following example shows how you would read worksheet cells, one row at a time, where myfile.xlsx is a spreadsheet that doesn't fit into memory.

julia> XLSX.openxlsx("myfile.xlsx", enable_cache=false) do xf
          for r in XLSX.eachrow(xf["mysheet"])
              # read something from row `r`
          end
       end

Write a new file

using Dates  # provides Date

XLSX.openxlsx("new.xlsx", mode="w") do xf
    sheet = xf[1]
    sheet[1, :] = [1, Date(2018, 1, 1), "test"]  # fill row 1 starting at column A
end

Edit an existing file

using Dates  # provides Date

XLSX.openxlsx("edit.xlsx", mode="rw") do xf
    sheet = xf[1]
    sheet[2, :] = [2, Date(2019, 1, 1), "add new line"]  # overwrite row 2
end
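The three file modes can be chained into one self-contained sketch, assuming XLSX.jl is installed. It creates a file in "w" mode, appends a cell in "rw" mode, and reads both cells back in the default "r" mode. The temporary path is illustrative, and the last step assumes that openxlsx returns the value produced by the do block.

```julia
import XLSX
using Dates

path = tempname() * ".xlsx"

# "w": start from an empty workbook; it is saved to `path` when the block ends.
XLSX.openxlsx(path, mode="w") do xf
    sheet = xf[1]
    sheet["A1"] = "created"
end

# "rw": open the existing file, change it, and save it when the block ends.
XLSX.openxlsx(path, mode="rw") do xf
    sheet = xf[1]
    sheet["A2"] = Date(2019, 1, 1)
end

# "r" (default): read both cells; the tuple is the do block's value.
a1, a2 = XLSX.openxlsx(path) do xf
    sheet = xf[1]
    (sheet["A1"], sheet["A2"])
end
```

Only plain values are touched here, which keeps the sketch clear of the rw data-loss caveat that applies to formulas and charts.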

See also XLSX.readdata.

XLSX.getdata — Function

getdata(ws::Worksheet, cell::Cell) :: CellValue

Returns a Julia representation of a given cell value. The result data type is chosen based on the value of the cell as well as its style.

For example, dates are stored as integers inside the spreadsheet, and the cell style is the information used to choose Date as the result type.

For numbers, if the style implies that the number is visualized with decimals, the method will return a float, even if the underlying number is stored as an integer inside the spreadsheet XML.

If the cell has an empty value or an empty String, this function returns missing.
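A short sketch of the getcell/getdata pair, assuming XLSX.jl is installed. The throwaway workbook exists only to make the example self-contained. getcell returns the internal cell object, and getdata converts it to a Julia value. Looking up a cell that was never written yields missing.

```julia
import XLSX

# Illustrative setup: a workbook with one numeric and one text cell.
path = tempname() * ".xlsx"
XLSX.openxlsx(path, mode="w") do xf
    sheet = xf[1]
    sheet["A1"] = 3.5
    sheet["A2"] = "text"
end

xf = XLSX.readxlsx(path)
sheet = xf[1]

# Internal cell -> Julia value.
cell = XLSX.getcell(sheet, "A1")
v1 = XLSX.getdata(sheet, cell)
v2 = XLSX.getdata(sheet, XLSX.getcell(sheet, "A2"))

# C5 holds no data, so the converted value is missing.
v_empty = XLSX.getdata(sheet, XLSX.getcell(sheet, "C5"))
```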

XLSX.getcell — Function

getcell(xlsxfile, cell_reference_name) :: AbstractCell
getcell(worksheet, cell_reference_name) :: AbstractCell
getcell(sheetrow, column_name) :: AbstractCell
getcell(sheetrow, column_number) :: AbstractCell

Returns the internal representation of a worksheet cell.

Returns XLSX.EmptyCell if the cell has no data.
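The following sketch, assuming XLSX.jl is installed, contrasts a workbook-level lookup using a "sheet!cell" reference with a worksheet-level lookup of a blank cell. It checks for XLSX.EmptyCell to detect the absence of data. The sheet name and cell addresses are illustrative.

```julia
import XLSX

# Illustrative setup: one filled cell on a sheet named "mysheet".
path = tempname() * ".xlsx"
XLSX.openxlsx(path, mode="w") do xf
    sheet = xf[1]
    XLSX.rename!(sheet, "mysheet")
    sheet["A1"] = "filled"
end

xf = XLSX.readxlsx(path)

# Workbook-level lookup: the reference names both the sheet and the cell.
filled = XLSX.getcell(xf, "mysheet!A1")

# Worksheet-level lookup of a cell with no data.
blank = XLSX.getcell(xf["mysheet"], "D4")
is_blank = blank isa XLSX.EmptyCell
```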

getcell(sheet, ref)

Returns an AbstractCell that represents a cell in the spreadsheet.

Example:

julia> xf = XLSX.readxlsx("myfile.xlsx")

julia> sheet = xf["mysheet"]

julia> cell = XLSX.getcell(sheet, "A1")

XLSX.getcellrange — Function

getcellrange(sheet, rng)

Returns the cells in the range as a matrix of type Array{AbstractCell, 2}. rng must be a valid cell range, such as "A1:B2".
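A minimal sketch, assuming XLSX.jl is installed, that fetches a 2x2 block of cells and maps getdata over it to obtain values with the same shape. B2 is deliberately left unwritten, so it shows up as missing after conversion.

```julia
import XLSX

# Illustrative setup: fill three of the four cells in A1:B2.
path = tempname() * ".xlsx"
XLSX.openxlsx(path, mode="w") do xf
    sheet = xf[1]
    sheet["A1"] = 1
    sheet["B1"] = 2
    sheet["A2"] = 3
end

xf = XLSX.readxlsx(path)
sheet = xf[1]

cells = XLSX.getcellrange(sheet, "A1:B2")        # 2x2 matrix of AbstractCell
values = [XLSX.getdata(sheet, c) for c in cells]  # same shape, Julia values
```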

XLSX.row_number — Function

row_number(c::CellRef) :: Int

Returns the row number of a given cell reference.

XLSX.column_number — Function

column_number(c::CellRef) :: Int

Returns the column number of a given cell reference.
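A CellRef can be built from an A1-style string, after which row_number and column_number split it into numeric coordinates. This sketch assumes XLSX.jl is installed. Column letters count from A = 1, so "AA" is column 27.

```julia
import XLSX

ref = XLSX.CellRef("AA10")
row = XLSX.row_number(ref)     # 10
col = XLSX.column_number(ref)  # 27 (Z is 26, AA follows it)
```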

XLSX.eachrow — Function

eachrow(sheet)

Creates a row iterator for a worksheet.

Example: Query all cells from columns 1 to 4.

left = 1  # 1st column
right = 4 # 4th column
for sheetrow in XLSX.eachrow(sheet)
    for column in left:right
        cell = XLSX.getcell(sheetrow, column)

        # do something with cell
    end
end
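The row iterator pairs naturally with enable_cache=false for streaming. The sketch below, assuming XLSX.jl is installed, collects the value in column 1 of every stored row. The workbook it reads is a throwaway file created only for the example.

```julia
import XLSX

# Illustrative setup: three values in column A.
path = tempname() * ".xlsx"
XLSX.openxlsx(path, mode="w") do xf
    sheet = xf[1]
    XLSX.rename!(sheet, "mysheet")
    sheet["A1"] = 10
    sheet["A2"] = 20
    sheet["A3"] = 30
end

# Stream rows from disk and convert the first cell of each to a Julia value.
first_column = XLSX.openxlsx(path, enable_cache=false) do xf
    sheet = xf["mysheet"]
    collected = Any[]
    for sheetrow in XLSX.eachrow(sheet)
        cell = XLSX.getcell(sheetrow, 1)
        push!(collected, XLSX.getdata(sheet, cell))
    end
    collected
end
```

The vector is built inside the do block, so no cell data needs to outlive the open file.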

XLSX.readtable — Function

readtable(
    source,
    sheet,
    [columns];
    [first_row],
    [column_labels],
    [header],
    [infer_eltypes],
    [stop_in_empty_row],
    [stop_in_row_function],
    [keep_empty_rows]
) -> DataTable

Returns tabular data from a spreadsheet as a struct XLSX.DataTable. Use this function to create a DataFrame from the DataFrames.jl package.

Use columns argument to specify which columns to get. For example, "B:D" will select columns B, C and D. If columns is not given, the algorithm will find the first sequence of consecutive non-empty cells.

Use first_row to indicate the first row of the table. first_row=5 will look for a table starting at sheet row 5. If first_row is not given, the algorithm will look for the first non-empty row in the spreadsheet.

header is a Bool indicating if the first row is a header. If header=true and column_labels is not specified, the column labels for the table will be read from the first row of the table. If header=false and column_labels is not specified, the algorithm will generate column labels. The default value is header=true.

Use column_labels to specify names for the header of the table.

Use infer_eltypes=true to get data as a Vector{Any} of typed vectors. The default value is infer_eltypes=false.

stop_in_empty_row is a Bool indicating whether an empty row marks the end of the table. If stop_in_empty_row=false, the TableRowIterator will continue to fetch rows until there are no more rows in the Worksheet. The default value is stop_in_empty_row=true.
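The sketch below combines several of these options and assumes XLSX.jl is installed. The table sits in columns B:C with its header on sheet row 2. It is read once using that header, and once with header=false and explicit column_labels, starting at the first data row. The sheet name, labels, and values are illustrative. The sketch also assumes the DataTable fields data (one vector per column) and column_labels (Symbols).

```julia
import XLSX

# Illustrative setup: header in row 2, two data rows, blank row 5 ends the table.
path = tempname() * ".xlsx"
XLSX.openxlsx(path, mode="w") do xf
    sheet = xf[1]
    XLSX.rename!(sheet, "mysheet")
    sheet["B2"] = "name"
    sheet["C2"] = "score"
    sheet["B3"] = "a"
    sheet["C3"] = 1
    sheet["B4"] = "b"
    sheet["C4"] = 2
end

# Labels come from the header row (header=true is the default).
dt = XLSX.readtable(path, "mysheet", "B:C"; first_row=2)

# No header row: start at the data and supply labels explicitly.
dt2 = XLSX.readtable(path, "mysheet", "B:C";
                     first_row=3, header=false, column_labels=[:n, :s])
```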

stop_in_row_function is a Function that receives a TableRow and returns a Bool indicating if the end of the table was reached.

Example for stop_in_row_function:

function stop_function(r)
    v = r[:col_label]
    return !ismissing(v) && v == "unwanted value"
end
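For context, the stop function above can be passed to readtable as follows. This sketch assumes XLSX.jl is installed, and the column label :item and the marker text are illustrative. It also assumes that the row for which the function returns true is itself excluded from the table, along with every row after it.

```julia
import XLSX

# Illustrative setup: a one-column table with a marker row in the middle.
path = tempname() * ".xlsx"
XLSX.openxlsx(path, mode="w") do xf
    sheet = xf[1]
    XLSX.rename!(sheet, "mysheet")
    sheet["A1"] = "item"
    sheet["A2"] = "apple"
    sheet["A3"] = "pear"
    sheet["A4"] = "unwanted value"
    sheet["A5"] = "plum"
end

# End the table at the first row whose :item column holds the marker.
function stop_function(r)
    v = r[:item]
    return !ismissing(v) && v == "unwanted value"
end

dt = XLSX.readtable(path, "mysheet"; stop_in_row_function=stop_function)
```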

keep_empty_rows determines whether rows in which all column values are missing are kept (true) or dropped (false) from the resulting table. keep_empty_rows never affects the bounds of the table: the number of rows read from a sheet is determined only by first_row, stop_in_empty_row and stop_in_row_function (if specified). keep_empty_rows is checked only after the first and last rows of the table have been determined, to decide whether to keep or drop empty rows between them.

Example

julia> using DataFrames, XLSX

julia> df = DataFrame(XLSX.readtable("myfile.xlsx", "mysheet"))