Cells and data
Cell referencing
XLSX.CellRef
— TypeCellRef(n::AbstractString)
CellRef(row::Int, col::Int)
A CellRef
represents a cell location given by row and column identifiers.
CellRef("B6")
indicates a cell located at column 2
and row 6
.
These row and column integers can also be passed directly to the CellRef
constructor: CellRef(6,2) == CellRef("B6")
.
Finally, a convenience macro @ref_str
is provided: ref"B6" == CellRef("B6")
.
Examples
cn = XLSX.CellRef("AB1")
println( XLSX.row_number(cn) ) # will print 1
println( XLSX.column_number(cn) ) # will print 28
println( string(cn) ) # will print out AB1
cn = XLSX.CellRef(1, 28)
println( XLSX.row_number(cn) ) # will print 1
println( XLSX.column_number(cn) ) # will print 28
println( string(cn) ) # will print out AB1
cn = XLSX.ref"AB1"
println( XLSX.row_number(cn) ) # will print 1
println( XLSX.column_number(cn) ) # will print 28
println( string(cn) ) # will print out AB1
XLSX.row_number
— Functionrow_number(c::CellRef) :: Int
Returns the row number of a given cell reference.
XLSX.column_number
— Functioncolumn_number(c::CellRef) :: Int
Returns the column number of a given cell reference.
Base.eachrow
— Functioneachrow(sheet)
Creates a row iterator for a worksheet.
Example: Query all cells from columns 1 to 4.
left = 1 # 1st column
right = 4 # 4th column
for sheetrow in eachrow(sheet)
for column in left:right
cell = XLSX.getcell(sheetrow, column)
# do something with cell
end
end
Note: The eachrow
row iterator will not return any row that consists entirely of EmptyCell
s. These are simply not seen by the iterator. The length(eachrow(sheet))
function therefore defines the number of rows that are not entirely empty and will, in any case, only succeed if the worksheet cache is in use.
XLSX.eachtablerow
— Functioneachtablerow(sheet, [columns]; [first_row], [column_labels], [header], [stop_in_empty_row], [stop_in_row_function], [keep_empty_rows], [normalizenames]) -> TableRowIterator
Constructs an iterator of table rows. Each element of the iterator is of type TableRow
.
header
is a boolean indicating whether the first row of the table is a table header.
If header == false
and no column_labels
were supplied, column names will be generated following the column names found in the Excel file.
The columns
argument is a column range, as in "B:E"
. If columns
is not supplied, the column range will be inferred by the non-empty contiguous cells in the first row of the table.
The user can replace column names by assigning the optional column_labels
input variable with a Vector{Symbol}
.
stop_in_empty_row
is a boolean indicating whether an empty row marks the end of the table. If stop_in_empty_row=false
, the iterator will continue to fetch rows until there's no more rows in the Worksheet. The default behavior is stop_in_empty_row=true
. Empty rows may be returned by the iterator when stop_in_empty_row=false
.
stop_in_row_function
is a Function that receives a TableRow
and returns a Bool
indicating if the end of the table was reached.
Example for stop_in_row_function
:
function stop_function(r)
v = r[:col_label]
return !ismissing(v) && v == "unwanted value"
end
keep_empty_rows
determines whether rows where all column values are equal to missing
are kept (true
) or skipped (false
) by the row iterator. keep_empty_rows
never affects the bounds of the iterator; the number of rows read from a sheet is only affected by first_row
, stop_in_empty_row
and stop_in_row_function
(if specified). keep_empty_rows
is only checked once the first and last row of the table have been determined, to see whether to keep or drop empty rows between the first and the last row.
normalizenames
controls whether column names will be "normalized" to valid Julia identifiers. By default, this is false. If normalizenames=true, then column names with spaces, or that start with numbers, will be adjusted with underscores to become valid Julia identifiers. This is useful when you want to access columns via dot-access or getproperty, like file.col1. The identifier that comes after the . must be valid, so spaces or identifiers starting with numbers aren't allowed. (Based ib CSV.jl's CSV.normalizename
.)
Example code:
for r in XLSX.eachtablerow(sheet)
# r is a `TableRow`. Values are read using column labels or numbers.
rn = XLSX.row_number(r) # `TableRow` row number.
v1 = r[1] # will read value at table column 1.
v2 = r[:COL_LABEL2] # will read value at column labeled `:COL_LABEL2`.
end
See also XLSX.gettable
.
Cell data
XLSX.readdata
— Functionreaddata(source, sheet, ref)
readdata(source, sheetref)
Return a scalar, vector or matrix with values from a spreadsheet file. 'ref' can be a defined name, a cell reference or a cell, column, row or non-contiguous range.
See also XLSX.getdata
.
Examples
These function calls are equivalent.
julia> XLSX.readdata("myfile.xlsx", "mysheet", "A2:B4")
3×2 Array{Any,2}:
1 "first"
2 "second"
3 "third"
julia> XLSX.readdata("myfile.xlsx", 1, "A2:B4")
3×2 Array{Any,2}:
1 "first"
2 "second"
3 "third"
julia> XLSX.readdata("myfile.xlsx", "mysheet!A2:B4")
3×2 Array{Any,2}:
1 "first"
2 "second"
3 "third"
Non-contiguous ranges return vectors of Array{Any, 2} with an entry for every non-contiguous (comma-separated) element in the range.
julia> XLSX.readdata("customXml.xlsx", "Mock-up", "Location") # `Location` is a `definedName` for a non-contiguous range
4-element Vector{Matrix{Any}}:
["Here";;]
[missing;;]
[missing;;]
[missing;;]
XLSX.getdata
— Functiongetdata(sheet, ref)
getdata(sheet, row, column)
Returns a scalar, matrix or a vector of matrices with values from a spreadsheet.
ref
can be a cell reference or a range or a valid defined name.
If ref
is a single cell, a scalar is returned.
Most ranges are rectangular and will return a 2-D matrix (Array{AbstractCell, 2}
). For row and column ranges, the extent of the range in the other dimension is determined by the worksheet's dimension.
A non-contiguous range (which may not be rectangular) will return a vector of Array{AbstractCell, 2}
matrices with one element for each non-contiguous (comma separated) element in the range.
Indexing in a Worksheet
will dispatch to getdata
method.
Example
julia> f = XLSX.readxlsx("myfile.xlsx")
julia> sheet = f["mysheet"] # Worksheet
julia> matrix = sheet["A1:B4"] # CellRange
julia> matrix = sheet["A:B"] # Column range
julia> matrix = sheet["1:4"] # Row range
julia> matrix = sheet["Contiguous"] # Named range
julia> matrix = sheet[1:30, 1] # use unit ranges to define rows and/or columns
julia> matrix = sheet[[1, 2, 3], 1] # vectors of integers to define rows and/or columns
julia> vector = sheet["A1:A4,C1:C4,G5"] # Non-contiguous range
julia> vector = sheet["Location"] # Non-contiguous named range
julia> scalar = sheet[2, 2] # Cell "B2"
See also XLSX.readdata
.
getdata(ws::Worksheet, cell::Cell) :: CellValue
Returns a Julia representation of a given cell value. The result data type is chosen based on the value of the cell as well as its style.
For example, date is stored as integers inside the spreadsheet, and the style is the information that is taken into account to chose Date
as the result type.
For numbers, if the style implies that the number is visualized with decimals, the method will return a float, even if the underlying number is stored as an integer inside the spreadsheet XML.
If cell
has empty value or empty String
, this function will return missing
.
XLSX.getcell
— Functiongetcell(xlsxfile, cell_reference_name) :: AbstractCell
getcell(worksheet, cell_reference_name) :: AbstractCell
getcell(sheetrow, column_name) :: AbstractCell
getcell(sheetrow, column_number) :: AbstractCell
Returns the internal representation of a worksheet cell.
Returns XLSX.EmptyCell
if the cell has no data.
getcell(sheet, ref)
getcell(sheet, row, col)
Return an AbstractCell
that represents a cell in the spreadsheet. Return a 2-D matrix as Array{AbstractCell, 2}
if ref
is a rectangular range. For row and column ranges, the extent of the range in the other dimension is determined by the worksheet's dimension. A non-contiguous range (which may not be rectangular) will return a vector of Array{AbstractCell, 2}
with one element for each non-contiguous (comma separated) element in the range.
If ref
is a range, getcell
dispatches to getcellrange
.
Example:
julia> xf = XLSX.readxlsx("myfile.xlsx")
julia> sheet = xf["mysheet"]
julia> cell = XLSX.getcell(sheet, "A1")
julia> cell = XLSX.getcell(sheet, 1:3, [2,4,6])
Other examples are as getdata()
.
XLSX.getcellrange
— Functiongetcellrange(sheet, rng)
Return a matrix with cells as Array{AbstractCell, 2}
. rng
must be a valid cell range, column range or row range, as in "A1:B2"
, "A:B"
or "1:2"
, or a non-contiguous range. For row and column ranges, the extent of the range in the other dimension is determined by the worksheet's dimension. A non-contiguous range (which may not be rectangular) will return a vector of Array{AbstractCell, 2}
with one element for each non-contiguous (comma separated) element in the range.
Example:
julia> ncr = "B3,A1,C2" # non-contiguous range, "out of order".
"B3,A1,C2"
julia> XLSX.getcellrange(f[1], ncr)
3-element Vector{Matrix{XLSX.AbstractCell}}:
[XLSX.Cell(B3, "", "", "5", XLSX.Formula("", nothing));;]
[XLSX.Cell(A1, "", "", "2", XLSX.Formula("", nothing));;]
[XLSX.Cell(C2, "", "", "5", XLSX.Formula("", nothing));;]
XLSX.gettable
— Functiongettable(
sheet,
[columns];
[first_row],
[column_labels],
[header],
[infer_eltypes],
[stop_in_empty_row],
[stop_in_row_function],
[keep_empty_rows],
[normalizenames]
) -> DataTable
Returns tabular data from a spreadsheet as a struct XLSX.DataTable
. Use this function to create a DataFrame
from package DataFrames.jl
.
Use columns
argument to specify which columns to get. For example, "B:D"
will select columns B
, C
and D
. If columns
is not given, the algorithm will find the first sequence of consecutive non-empty cells.
Use first_row
to indicate the first row from the table. first_row=5
will look for a table starting at sheet row 5
. If first_row
is not given, the algorithm will look for the first non-empty row in the spreadsheet.
header
is a Bool
indicating if the first row is a header. If header=true
and column_labels
is not specified, the column labels for the table will be read from the first row of the table. If header=false
and column_labels
is not specified, the algorithm will generate column labels. The default value is header=true
.
Use column_labels
as a vector of symbols to specify names for the header of the table.
Use normalizenames=true
to normalize column names to valid Julia identifiers.
Use infer_eltypes=true
to get data
as a Vector{Any}
of typed vectors. The default value is infer_eltypes=true
.
stop_in_empty_row
is a boolean indicating whether an empty row marks the end of the table. If stop_in_empty_row=false
, the TableRowIterator
will continue to fetch rows until there's no more rows in the Worksheet. The default behavior is stop_in_empty_row=true
.
stop_in_row_function
is a Function that receives a TableRow
and returns a Bool
indicating if the end of the table was reached.
Example for stop_in_row_function
function stop_function(r)
v = r[:col_label]
return !ismissing(v) && v == "unwanted value"
end
keep_empty_rows
determines whether rows where all column values are equal to missing
are kept (true
) or dropped (false
) from the resulting table. keep_empty_rows
never affects the bounds of the table; the number of rows read from a sheet is only affected by first_row
, stop_in_empty_row
and stop_in_row_function
(if specified). keep_empty_rows
is only checked once the first and last row of the table have been determined, to see whether to keep or drop empty rows between the first and the last row.
Example
julia> using DataFrames, XLSX
julia> df = XLSX.openxlsx("myfile.xlsx") do xf
DataFrame(XLSX.gettable(xf["mysheet"]))
end
See also: XLSX.readtable
.
XLSX.readtable
— Functionreadtable(
source,
[sheet,
[columns]];
[first_row],
[column_labels],
[header],
[infer_eltypes],
[stop_in_empty_row],
[stop_in_row_function],
[enable_cache],
[keep_empty_rows],
[normalizenames]
) -> DataTable
Returns tabular data from a spreadsheet as a struct XLSX.DataTable
. Use this function to create a DataFrame
from package DataFrames.jl
(or other Tables.jl
` compatible object).
If sheet
is not given, the first sheet in the XLSXFile
will be used.
Use columns
argument to specify which columns to get. For example, "B:D"
will select columns B
, C
and D
. If columns
is not given, the algorithm will find the first sequence of consecutive non-empty cells. A valid sheet
must be specified when specifying columns
.
Use first_row
to indicate the first row of the table. first_row=5
will look for a table starting at sheet row 5
. If first_row
is not given, the algorithm will look for the first non-empty row in the spreadsheet.
header
is a Bool
indicating if the first row is a header. If header=true
and column_labels
is not specified, the column labels for the table will be read from the first row of the table. If header=false
and column_labels
is not specified, the algorithm will generate column labels. The default value is header=true
.
Use column_labels
to specify names for the header of the table.
Use normalizenames=true
to normalize column names to valid Julia identifiers.
Use infer_eltypes=true
to get data
as a Vector{Any}
of typed vectors. The default value is infer_eltypes=true
.
stop_in_empty_row
is a boolean indicating whether an empty row marks the end of the table. If stop_in_empty_row=false
, the TableRowIterator
will continue to fetch rows until there's no more rows in the Worksheet or range. The default behavior is stop_in_empty_row=true
.
stop_in_row_function
is a Function that receives a TableRow
and returns a Bool
indicating if the end of the table was reached.
Example for stop_in_row_function
:
function stop_function(r)
v = r[:col_label]
return !ismissing(v) && v == "unwanted value"
end
enable_cache
is a boolean that determines whether cell data are loaded into the worksheet cache on reading. The default behavior is enable_cache=true
.
keep_empty_rows
determines whether rows where all column values are equal to missing
are kept (true
) or dropped (false
) from the resulting table. keep_empty_rows
never affects the bounds of the table; the number of rows read from a sheet is only affected by first_row
, stop_in_empty_row
and stop_in_row_function
(if specified). keep_empty_rows
is only checked once the first and last row of the table have been determined, to see whether to keep or drop empty rows between the first and the last row. The default behavior is `keep_empty_rows=false
.
Example
julia> using DataFrames, XLSX
julia> df = DataFrame(XLSX.readtable("myfile.xlsx", "mysheet"))
See also: XLSX.gettable
.
XLSX.readto
— Functionreadto(
source,
[sheet,
[columns]],
sink;
[first_row],
[column_labels],
[header],
[infer_eltypes],
[stop_in_empty_row],
[stop_in_row_function],
[enable_cache],
[keep_empty_rows],
[normalizenames]
) -> sink
Read and parse an Excel worksheet, materializing directly using the sink
function (e.g. DataFrame
or StructArray
).
Takes the same keyword arguments as XLSX.readtable
Example
julia> using DataFrames, StructArrays, XLSX
julia> df = XLSX.readto("myfile.xlsx", DataFrame)
julia> df = XLSX.readto("myfile.xlsx", StructArray)
julia> df = XLSX.readto("myfile.xlsx", "mysheet", DataFrame)
julia> df = XLSX.readto("myfile.xlsx", "mysheet", "A:C", DataFrame)
See also: XLSX.gettable
.
XLSX.writetable
— Functionwritetable(filename, table; [overwrite], [sheetname])
Write a Tables.jl compatible table
as an Excel file with the specified file name (and sheet name, if specified).
If a file with the given name already exists, writing will fail unless overwrite=true
is specified, in which case the existing file will be overwritten.
writetable(filename::Union{AbstractString, IO}, tables::Vector{Pair{String, T}}; overwrite::Bool=false)
writetable(filename::Union{AbstractString, IO}, tables::Pair{String, Any}...; overwrite::Bool=false)
writetable(filename, data, columnnames; [overwrite], [sheetname])
data
is a vector of columns.columnames
is a vector of column labels.overwrite
is aBool
to control iffilename
should be overwritten if already exists.sheetname
is the name for the worksheet.
Returns the filepath of the written file if a filename is supplied, or nothing
if writing to an IO
.
Example
import XLSX
columns = [ [1, 2, 3, 4], ["Hey", "You", "Out", "There"], [10.2, 20.3, 30.4, 40.5] ]
colnames = [ "integers", "strings", "floats" ]
XLSX.writetable("table.xlsx", columns, colnames)
See also: XLSX.writetable!
.
writetable(filename::Union{AbstractString, IO}; overwrite::Bool=false, kw...)
writetable(filename::Union{AbstractString, IO}, tables::Vector{Tuple{String, Vector{Any}, Vector{String}}}; overwrite::Bool=false)
Write multiple tables.
kw
is a variable keyword argument list. Each element should be in this format: sheetname=( data, column_names )
, where data
is a vector of columns and column_names
is a vector of column labels.
Returns the filepath of the written file if a filename is supplied, or nothing
if writing to an IO
.
Example:
julia> import DataFrames, XLSX
julia> df1 = DataFrames.DataFrame(COL1=[10,20,30], COL2=["Fist", "Sec", "Third"])
julia> df2 = DataFrames.DataFrame(AA=["aa", "bb"], AB=[10.1, 10.2])
julia> XLSX.writetable("report.xlsx", "REPORT_A" => df1, "REPORT_B" => df2)
XLSX.writetable!
— Functionwritetable!(sheet::Worksheet, table; anchor_cell::CellRef=CellRef("A1")))
Write a Tables.jl compatible table
to the specified sheet starting with the anchor cell (if given) in the top left.
writetable!(
sheet::Worksheet,
data,
columnnames;
anchor_cell::CellRef=CellRef("A1"),
write_columnnames::Bool=true,
)
Write tabular data data
with labels given by columnnames
to sheet
, starting at anchor_cell
.
data
must be a vector of columns. columnnames
must be a vector of column labels.
Column labels that are not of type String
will be converted to strings before writing. Any data columns that are not of type String
, Float64
, Int64
, Bool
, Date
, Time
, DateTime
, Missing
, or Nothing
will be converted to strings before writing.
See also: XLSX.writetable
.
Defined names
XLSX.addDefinedName
— FunctionaddDefinedName(xf::XLSXFile, name::AbstractString, value::Union{Int, Float64, String}; absolute=true)
addDefinedName(xf::XLSXFile, name::AbstractString, value::AbstractString; absolute=true)
addDefinedName(sh::Worksheet, name::AbstractString, value::Union{Int, Float64, String}; absolute=true)
addDefinedName(sh::Worksheet, name::AbstractString, value::AbstractString; absolute=true)
Add a defined name to the Workbook or Worksheet. If an XLSXFile
is passed, the defined name is added to the Workbook. If a Worksheet
is passed, the defined name is added to the Worksheet.
When adding defined name referring to a cell or range to a workbook, value
must include the sheet name (e.g. Sheet1!A1:B2
).
If the new definedName
is a cell reference or range, by default, it will be an absolute reference (e.g. $A$1:$C$6
). If absolute=false
is specified, the new definedName
will be a relative reference (e.g. A1:C6
). Any absolute
argument specified is ignored if the definedName
is not a cell reference or range.
In the context of XLSX.jl
there is no difference between an absolute reference and a relative reference. However, Excel treats them differently. When definedNames
are read in as part of an XLSXFile, we keep track of whether they are absolute or not. If the XLSXFile is subsequently written out again, the status of the definedNames
is preserved.
Examples
julia> XLSX.addDefinedName(sh, "ID", "C21")
julia> XLSX.addDefinedName(sh, "NEW", "A1:B2")
julia> XLSX.addDefinedName(sh, "my_name", "A1,B2,C3")
julia> XLSX.addDefinedName(xf, "New", "'Mock-up'!A1:B2")
julia> XLSX.addDefinedName(xf, "Life_the_universe_and_everything", 42)
julia> XLSX.addDefinedName(xf, "first_name", "Hello World")