API Reference
XLSX.XLSXFile
— TypeXLSXFile
represents a reference to an Excel file.
It is created by using XLSX.readxlsx
or XLSX.openxlsx
.
From a XLSXFile
you can navigate to a XLSX.Worksheet
reference as shown in the example below.
Example
xf = XLSX.readxlsx("myfile.xlsx")
sh = xf["mysheet"] # get a reference to a Worksheet
XLSX.readxlsx
— Functionreadxlsx(source::Union{AbstractString, IO}) :: XLSXFile
Main function for reading an Excel file. This function will read the whole Excel file into memory and return a closed XLSXFile.
Consider using XLSX.openxlsx
for lazy loading of Excel file contents.
XLSX.openxlsx
— Functionopenxlsx(f::F, source::Union{AbstractString, IO}; mode::AbstractString="r", enable_cache::Bool=true) where {F<:Function}
Open XLSX file for reading and/or writing. It returns an opened XLSXFile that will be automatically closed after applying f
to the file.
Do
syntax
This function should be used with do
syntax, like in:
XLSX.openxlsx("myfile.xlsx") do xf
# read data from `xf`
end
Filemodes
The mode
argument controls how the file is opened. The following modes are allowed:
r
: read mode. The existing data insource
will be accessible for reading. This is the default mode.w
: write mode. Opens an empty file that will be written tosource
.rw
: edit mode. Openssource
for editing. The file will be saved to disk when the function ends.
The rw
mode is known to produce some data loss. See #159.
Simple data should work fine. Users are advised to use this feature with caution when working with formulas and charts.
Arguments
source
is IO or the complete path to the file.mode
is the file mode, as explained in the last section.enable_cache
:
If enable_cache=true
, all read worksheet cells will be cached. If you read a worksheet cell twice it will use the cached value instead of reading from disk in the second time.
If enable_cache=false
, worksheet cells will always be read from disk. This is useful when you want to read a spreadsheet that doesn't fit into memory.
The default value is enable_cache=true
.
Examples
Read from file
The following example shows how you would read worksheet cells, one row at a time, where myfile.xlsx
is a spreadsheet that doesn't fit into memory.
julia> XLSX.openxlsx("myfile.xlsx", enable_cache=false) do xf
for r in XLSX.eachrow(xf["mysheet"])
# read something from row `r`
end
end
Write a new file
XLSX.openxlsx("new.xlsx", mode="w") do xf
sheet = xf[1]
sheet[1, :] = [1, Date(2018, 1, 1), "test"]
end
Edit an existing file
XLSX.openxlsx("edit.xlsx", mode="rw") do xf
sheet = xf[1]
sheet[2, :] = [2, Date(2019, 1, 1), "add new line"]
end
See also XLSX.readxlsx
.
openxlsx(source::Union{AbstractString, IO}; mode="r", enable_cache=true) :: XLSXFile
Supports opening a XLSX file without using do-syntax. In this case, the user is responsible for closing the XLSXFile
using close
or writing it to file using XLSX.writexlsx
.
See also XLSX.writexlsx
.
XLSX.writexlsx
— Functionwritexlsx(output_source, xlsx_file; [overwrite=false])
Writes an Excel file given by xlsx_file::XLSXFile
to IO or filepath output_source
.
If overwrite=true
, output_source
(when a filepath) will be overwritten if it exists.
XLSX.sheetnames
— Functionsheetnames(xl::XLSXFile)
sheetnames(wb::Workbook)
Returns a vector with Worksheet names for this Workbook.
XLSX.sheetcount
— Functionsheetcount(xlsfile) :: Int
Counts the number of sheets in the Workbook.
XLSX.hassheet
— Functionhassheet(wb::Workbook, sheetname::AbstractString)
hassheet(xl::XLSXFile, sheetname::AbstractString)
Returns true
if wb
contains a sheet named sheetname
.
XLSX.Worksheet
— TypeA Worksheet
represents a reference to an Excel Worksheet.
From a Worksheet
you can query for Cells, cell values and ranges.
Example
xf = XLSX.readxlsx("myfile.xlsx")
sh = xf["mysheet"] # get a reference to a Worksheet
println( sh[2, 2] ) # access element "B2" (2nd row, 2nd column)
println( sh["B2"] ) # you can also use the cell name
println( sh["A2:B4"] ) # or a cell range
println( sh[:] ) # all data inside worksheet's dimension
XLSX.readdata
— Functionreaddata(source, sheet, ref)
readdata(source, sheetref)
Returns a scalar or matrix with values from a spreadsheet.
See also XLSX.getdata
.
Examples
These function calls are equivalent.
julia> XLSX.readdata("myfile.xlsx", "mysheet", "A2:B4")
3×2 Array{Any,2}:
1 "first"
2 "second"
3 "third"
julia> XLSX.readdata("myfile.xlsx", 1, "A2:B4")
3×2 Array{Any,2}:
1 "first"
2 "second"
3 "third"
julia> XLSX.readdata("myfile.xlsx", "mysheet!A2:B4")
3×2 Array{Any,2}:
1 "first"
2 "second"
3 "third"
XLSX.getdata
— Functiongetdata(sheet, ref)
getdata(sheet, row, column)
Returns a scalar or a matrix with values from a spreadsheet. ref
can be a cell reference or a range.
Indexing in a Worksheet
will dispatch to getdata
method.
Example
julia> f = XLSX.readxlsx("myfile.xlsx")
julia> sheet = f["mysheet"]
julia> matrix = sheet["A1:B4"]
julia> single_value = sheet[2, 2] # B2
See also XLSX.readdata
.
getdata(ws::Worksheet, cell::Cell) :: CellValue
Returns a Julia representation of a given cell value. The result data type is chosen based on the value of the cell as well as its style.
For example, date is stored as integers inside the spreadsheet, and the style is the information that is taken into account to chose Date
as the result type.
For numbers, if the style implies that the number is visualized with decimals, the method will return a float, even if the underlying number is stored as an integer inside the spreadsheet XML.
If cell
has empty value or empty String
, this function will return missing
.
XLSX.getcell
— Functiongetcell(xlsxfile, cell_reference_name) :: AbstractCell
getcell(worksheet, cell_reference_name) :: AbstractCell
getcell(sheetrow, column_name) :: AbstractCell
getcell(sheetrow, column_number) :: AbstractCell
Returns the internal representation of a worksheet cell.
Returns XLSX.EmptyCell
if the cell has no data.
getcell(sheet, ref)
Returns an AbstractCell
that represents a cell in the spreadsheet.
Example:
julia> xf = XLSX.readxlsx("myfile.xlsx")
julia> sheet = xf["mysheet"]
julia> cell = XLSX.getcell(sheet, "A1")
XLSX.getcellrange
— Functiongetcellrange(sheet, rng)
Returns a matrix with cells as Array{AbstractCell, 2}
. rng
must be a valid cell range, as in "A1:B2"
.
XLSX.row_number
— Functionrow_number(c::CellRef) :: Int
Returns the row number of a given cell reference.
XLSX.column_number
— Functioncolumn_number(c::CellRef) :: Int
Returns the column number of a given cell reference.
XLSX.eachrow
— Functioneachrow(sheet)
Creates a row iterator for a worksheet.
Example: Query all cells from columns 1 to 4.
left = 1 # 1st column
right = 4 # 4th column
for sheetrow in XLSX.eachrow(sheet)
for column in left:right
cell = XLSX.getcell(sheetrow, column)
# do something with cell
end
end
XLSX.readtable
— Functionreadtable(
source,
sheet,
[columns];
[first_row],
[column_labels],
[header],
[infer_eltypes],
[stop_in_empty_row],
[stop_in_row_function],
[keep_empty_rows]
) -> DataTable
Returns tabular data from a spreadsheet as a struct XLSX.DataTable
. Use this function to create a DataFrame
from package DataFrames.jl
.
Use columns
argument to specify which columns to get. For example, "B:D"
will select columns B
, C
and D
. If columns
is not given, the algorithm will find the first sequence of consecutive non-empty cells.
Use first_row
to indicate the first row from the table. first_row=5
will look for a table starting at sheet row 5
. If first_row
is not given, the algorithm will look for the first non-empty row in the spreadsheet.
header
is a Bool
indicating if the first row is a header. If header=true
and column_labels
is not specified, the column labels for the table will be read from the first row of the table. If header=false
and column_labels
is not specified, the algorithm will generate column labels. The default value is header=true
.
Use column_labels
to specify names for the header of the table.
Use infer_eltypes=true
to get data
as a Vector{Any}
of typed vectors. The default value is infer_eltypes=false
.
stop_in_empty_row
is a boolean indicating whether an empty row marks the end of the table. If stop_in_empty_row=false
, the TableRowIterator
will continue to fetch rows until there's no more rows in the Worksheet. The default behavior is stop_in_empty_row=true
.
stop_in_row_function
is a Function that receives a TableRow
and returns a Bool
indicating if the end of the table was reached.
Example for stop_in_row_function
:
function stop_function(r)
v = r[:col_label]
return !ismissing(v) && v == "unwanted value"
end
keep_empty_rows
determines whether rows where all column values are equal to missing
are kept (true
) or dropped (false
) from the resulting table. keep_empty_rows
never affects the bounds of the table; the number of rows read from a sheet is only affected by, first_row
, stop_in_empty_row
and stop_in_row_function
(if specified). keep_empty_rows
is only checked once the first and last row of the table have been determined, to see whether to keep or drop empty rows between the first and the last row.
Example
julia> using DataFrames, XLSX
julia> df = DataFrame(XLSX.readtable("myfile.xlsx", "mysheet"))
See also: XLSX.gettable
.
XLSX.gettable
— Functiongettable(
sheet,
[columns];
[first_row],
[column_labels],
[header],
[infer_eltypes],
[stop_in_empty_row],
[stop_in_row_function],
[keep_empty_rows]
) -> DataTable
Returns tabular data from a spreadsheet as a struct XLSX.DataTable
. Use this function to create a DataFrame
from package DataFrames.jl
.
Use columns
argument to specify which columns to get. For example, "B:D"
will select columns B
, C
and D
. If columns
is not given, the algorithm will find the first sequence of consecutive non-empty cells.
Use first_row
to indicate the first row from the table. first_row=5
will look for a table starting at sheet row 5
. If first_row
is not given, the algorithm will look for the first non-empty row in the spreadsheet.
header
is a Bool
indicating if the first row is a header. If header=true
and column_labels
is not specified, the column labels for the table will be read from the first row of the table. If header=false
and column_labels
is not specified, the algorithm will generate column labels. The default value is header=true
.
Use column_labels
as a vector of symbols to specify names for the header of the table.
Use infer_eltypes=true
to get data
as a Vector{Any}
of typed vectors. The default value is infer_eltypes=false
.
stop_in_empty_row
is a boolean indicating whether an empty row marks the end of the table. If stop_in_empty_row=false
, the TableRowIterator
will continue to fetch rows until there's no more rows in the Worksheet. The default behavior is stop_in_empty_row=true
.
stop_in_row_function
is a Function that receives a TableRow
and returns a Bool
indicating if the end of the table was reached.
Example for stop_in_row_function
function stop_function(r)
v = r[:col_label]
return !ismissing(v) && v == "unwanted value"
end
keep_empty_rows
determines whether rows where all column values are equal to missing
are kept (true
) or dropped (false
) from the resulting table. keep_empty_rows
never affects the bounds of the table; the number of rows read from a sheet is only affected by first_row
, stop_in_empty_row
and stop_in_row_function
(if specified). keep_empty_rows
is only checked once the first and last row of the table have been determined, to see whether to keep or drop empty rows between the first and the last row.
Example
julia> using DataFrames, XLSX
julia> df = XLSX.openxlsx("myfile.xlsx") do xf
DataFrame(XLSX.gettable(xf["mysheet"]))
end
See also: XLSX.readtable
.
XLSX.eachtablerow
— Functioneachtablerow(sheet, [columns]; [first_row], [column_labels], [header], [stop_in_empty_row], [stop_in_row_function], [keep_empty_rows])
Constructs an iterator of table rows. Each element of the iterator is of type TableRow
.
header
is a boolean indicating whether the first row of the table is a table header.
If header == false
and no column_labels
were supplied, column names will be generated following the column names found in the Excel file.
The columns
argument is a column range, as in "B:E"
. If columns
is not supplied, the column range will be inferred by the non-empty contiguous cells in the first row of the table.
The user can replace column names by assigning the optional column_labels
input variable with a Vector{Symbol}
.
stop_in_empty_row
is a boolean indicating whether an empty row marks the end of the table. If stop_in_empty_row=false
, the iterator will continue to fetch rows until there's no more rows in the Worksheet. The default behavior is stop_in_empty_row=true
. Empty rows may be returned by the iterator when stop_in_empty_row=false
.
stop_in_row_function
is a Function that receives a TableRow
and returns a Bool
indicating if the end of the table was reached.
Example for stop_in_row_function
:
function stop_function(r)
v = r[:col_label]
return !ismissing(v) && v == "unwanted value"
end
keep_empty_rows
determines whether rows where all column values are equal to missing
are kept (true
) or skipped (false
) by the row iterator. keep_empty_rows
never affects the bounds of the iterator; the number of rows read from a sheet is only affected by first_row
, stop_in_empty_row
and stop_in_row_function
(if specified). keep_empty_rows
is only checked once the first and last row of the table have been determined, to see whether to keep or drop empty rows between the first and the last row.
Example code:
for r in XLSX.eachtablerow(sheet)
# r is a `TableRow`. Values are read using column labels or numbers.
rn = XLSX.row_number(r) # `TableRow` row number.
v1 = r[1] # will read value at table column 1.
v2 = r[:COL_LABEL2] # will read value at column labeled `:COL_LABEL2`.
end
See also XLSX.gettable
.
XLSX.writetable
— Functionwritetable(filename, table; [overwrite], [sheetname])
Write Tables.jl table
to the specified filename.
writetable(filename::Union{AbstractString, IO}, tables::Vector{Pair{String, T}}; overwrite::Bool=false)
writetable(filename::Union{AbstractString, IO}, tables::Pair{String, Any}...; overwrite::Bool=false)
writetable(filename, data, columnnames; [overwrite], [sheetname])
data
is a vector of columns.columnames
is a vector of column labels.overwrite
is aBool
to control iffilename
should be overwritten if already exists.sheetname
is the name for the worksheet.
Example
import XLSX
columns = [ [1, 2, 3, 4], ["Hey", "You", "Out", "There"], [10.2, 20.3, 30.4, 40.5] ]
colnames = [ "integers", "strings", "floats" ]
XLSX.writetable("table.xlsx", columns, colnames)
See also: XLSX.writetable!
.
writetable(filename::Union{AbstractString, IO}; overwrite::Bool=false, kw...)
writetable(filename::Union{AbstractString, IO}, tables::Vector{Tuple{String, Vector{Any}, Vector{String}}}; overwrite::Bool=false)
Write multiple tables.
kw
is a variable keyword argument list. Each element should be in this format: sheetname=( data, column_names )
, where data
is a vector of columns and column_names
is a vector of column labels.
Example:
julia> import DataFrames, XLSX
julia> df1 = DataFrames.DataFrame(COL1=[10,20,30], COL2=["Fist", "Sec", "Third"])
julia> df2 = DataFrames.DataFrame(AA=["aa", "bb"], AB=[10.1, 10.2])
julia> XLSX.writetable("report.xlsx", "REPORT_A" => df1, "REPORT_B" => df2)
XLSX.writetable!
— Functionwritetable!(sheet::Worksheet, table; anchor_cell::CellRef=CellRef("A1")))
Write Tables.jl table
to the specified sheet.
writetable!(
sheet::Worksheet,
data,
columnnames;
anchor_cell::CellRef=CellRef("A1"),
write_columnnames::Bool=true,
)
Writes tabular data data
with labels given by columnnames
to sheet
, starting at anchor_cell
.
data
must be a vector of columns. columnnames
must be a vector of column labels.
See also: XLSX.writetable
.
XLSX.rename!
— Functionrename!(ws::Worksheet, name::AbstractString)
Renames a Worksheet
.
XLSX.addsheet!
— Functionaddsheet!(workbook, [name]) :: Worksheet
Create a new worksheet with named name
. If name
is not provided, a unique name is created.