Cells and data

Cell referencing

XLSX.CellRef — Type

CellRef(n::AbstractString)
CellRef(row::Int, col::Int)

A CellRef represents a cell location given by row and column identifiers.

CellRef("B6") indicates a cell located at column 2 and row 6.

These row and column integers can also be passed directly to the CellRef constructor: CellRef(6,2) == CellRef("B6").

Finally, a convenience macro @ref_str is provided: ref"B6" == CellRef("B6").

Examples

cn = XLSX.CellRef("AB1")
println( XLSX.row_number(cn) ) # will print 1
println( XLSX.column_number(cn) ) # will print 28
println( string(cn) ) # will print out AB1

cn = XLSX.CellRef(1, 28)
println( XLSX.row_number(cn) ) # will print 1
println( XLSX.column_number(cn) ) # will print 28
println( string(cn) ) # will print out AB1

cn = XLSX.ref"AB1"
println( XLSX.row_number(cn) ) # will print 1
println( XLSX.column_number(cn) ) # will print 28
println( string(cn) ) # will print out AB1
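The mapping between a column label such as AB and column number 28 is a bijective base-26 encoding (A = 1, ..., Z = 26, AA = 27). The sketch below shows that arithmetic in plain Julia. The helper names column_label_to_number and column_number_to_label are hypothetical and are not part of XLSX.jl; they only illustrate why CellRef("AB1") reports column 28.

```julia
# Illustrative helpers with hypothetical names; not part of the XLSX.jl API.

# Convert a column label such as "AB" to its 1-based column number.
function column_label_to_number(label::AbstractString)
    n = 0
    for ch in uppercase(label)
        'A' <= ch <= 'Z' || throw(ArgumentError("invalid column label: $label"))
        n = 26n + (ch - 'A' + 1)
    end
    return n
end

# Convert a 1-based column number back to its label.
function column_number_to_label(n::Integer)
    n >= 1 || throw(ArgumentError("column number must be positive"))
    chars = Char[]
    while n > 0
        n, r = divrem(n - 1, 26)   # shift by one because there is no zero digit
        pushfirst!(chars, 'A' + r)
    end
    return String(chars)
end

column_label_to_number("B")    # 2
column_label_to_number("AB")   # 28
column_number_to_label(28)     # "AB"
```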


XLSX.row_number — Function

row_number(c::CellRef) :: Int

Returns the row number of a given cell reference.


XLSX.column_number — Function

column_number(c::CellRef) :: Int

Returns the column number of a given cell reference.


XLSX.eachrow — Function

eachrow(sheet)

Creates a row iterator for a worksheet.

Example: Query all cells from columns 1 to 4.

left = 1  # 1st column
right = 4 # 4th column
for sheetrow in XLSX.eachrow(sheet)
    for column in left:right
        cell = XLSX.getcell(sheetrow, column)

        # do something with cell
    end
end

Note: The eachrow iterator skips any row that consists entirely of EmptyCells; such rows are never seen by the iterator. Consequently, length(eachrow(sheet)) gives the number of rows that are not entirely empty, and it only succeeds if the worksheet cache is in use.


XLSX.eachtablerow — Function

eachtablerow(sheet, [columns]; [first_row], [column_labels], [header], [stop_in_empty_row], [stop_in_row_function], [keep_empty_rows], [normalizenames]) -> TableRowIterator

Constructs an iterator of table rows. Each element of the iterator is of type TableRow.

header is a boolean indicating whether the first row of the table is a table header.

If header == false and no column_labels are supplied, column names will be generated from the column names found in the Excel file (the column letters, such as A, B, C).

The columns argument is a column range, as in "B:E". If columns is not supplied, the column range will be inferred by the non-empty contiguous cells in the first row of the table.

The user can replace column names by assigning the optional column_labels input variable with a Vector{Symbol}.

stop_in_empty_row is a boolean indicating whether an empty row marks the end of the table. If stop_in_empty_row=false, the iterator will continue to fetch rows until there are no more rows in the Worksheet. The default behavior is stop_in_empty_row=true. Empty rows may be returned by the iterator when stop_in_empty_row=false.

stop_in_row_function is a Function that receives a TableRow and returns a Bool indicating if the end of the table was reached.

Example for stop_in_row_function:

function stop_function(r)
    v = r[:col_label]
    return !ismissing(v) && v == "unwanted value"
end
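The ismissing guard in this predicate matters because comparing missing with == yields missing rather than a Bool, and the iterator expects a Bool. The sketch below exercises the same predicate on stand-in rows. It assumes only that a row supports r[:col_label]; NamedTuples are used here in place of real TableRow values, which is an assumption made to keep the example self-contained.

```julia
# Same predicate as above; NamedTuples stand in for TableRow values here.
function stop_function(r)
    v = r[:col_label]
    return !ismissing(v) && v == "unwanted value"
end

rows = [
    (col_label = "keep me",),
    (col_label = missing,),
    (col_label = "unwanted value",),
]

results = [stop_function(r) for r in rows]   # [false, false, true]

# Without the guard, the comparison propagates `missing` instead of a Bool.
unguarded = missing == "unwanted value"
ismissing(unguarded)                         # true
```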

keep_empty_rows determines whether rows where all column values are equal to missing are kept (true) or skipped (false) by the row iterator. keep_empty_rows never affects the bounds of the iterator; the number of rows read from a sheet is only affected by first_row, stop_in_empty_row and stop_in_row_function (if specified). keep_empty_rows is only checked once the first and last row of the table have been determined, to see whether to keep or drop empty rows between the first and the last row.
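The two-step rule described above (first fix the table bounds, then decide what to do with empty rows inside them) can be modelled in a few lines of plain Julia. This is a toy model of the documented behavior, not XLSX.jl's implementation; select_rows and is_empty_row are hypothetical names, and each row is represented as a plain vector.

```julia
# Toy model: a row is a vector of values and is "empty" when all are missing.
is_empty_row(row) = all(ismissing, row)

function select_rows(rows; first_row=1, stop_in_empty_row=true, keep_empty_rows=false)
    # Step 1: determine the bounds. keep_empty_rows plays no part here.
    last_row = length(rows)
    if stop_in_empty_row
        idx = findfirst(is_empty_row, rows[first_row:end])
        if idx !== nothing
            last_row = first_row + idx - 2   # row just before the first empty one
        end
    end
    window = rows[first_row:last_row]

    # Step 2: keep or drop empty rows that fall inside those bounds.
    return keep_empty_rows ? window : filter(!is_empty_row, window)
end

rows = [[1, "a"], [missing, missing], [2, "b"]]

select_rows(rows)                                                 # 1 row
select_rows(rows; stop_in_empty_row=false)                        # 2 rows
select_rows(rows; stop_in_empty_row=false, keep_empty_rows=true)  # 3 rows
```

In this model, keep_empty_rows=true has no visible effect while stop_in_empty_row=true, because the bounds already end before the first empty row.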

normalizenames controls whether column names will be "normalized" to valid Julia identifiers. By default, this is false. If normalizenames=true, then column names with spaces, or that start with numbers, will be adjusted with underscores to become valid Julia identifiers. This is useful when you want to access columns via dot-access or getproperty, like file.col1. The identifier that comes after the . must be valid, so spaces or identifiers starting with numbers aren't allowed. (Based on CSV.jl's CSV.normalizename.)

Example code:

for r in XLSX.eachtablerow(sheet)
    # r is a `TableRow`. Values are read using column labels or numbers.
    rn = XLSX.row_number(r) # `TableRow` row number.
    v1 = r[1] # will read value at table column 1.
    v2 = r[:COL_LABEL2] # will read value at column labeled `:COL_LABEL2`.
end
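For a runnable variant of the loop above, the following sketch first writes a small table to a temporary file and then iterates over it with a stop_in_row_function. The file path, the column labels id and tag, and the predicate stop_at_c are all made up for this illustration, and the sketch assumes the XLSX.writetable(filename, columns, labels) method is available in the installed version.

```julia
import XLSX

# Write a small two-column table to a temporary file (arbitrary path).
path = tempname() * ".xlsx"
columns = Any[[1, 2, 3], ["a", "b", "c"]]
labels = ["id", "tag"]
XLSX.writetable(path, columns, labels)

# Hypothetical predicate: signal the end of the table when `tag` is "c".
stop_at_c(r) = !ismissing(r[:tag]) && r[:tag] == "c"

ids = Int[]
XLSX.openxlsx(path) do xf
    sheet = xf[1]
    for r in XLSX.eachtablerow(sheet; stop_in_row_function=stop_at_c)
        push!(ids, r[:id])   # values are read by column label
    end
end
```

After the loop, ids holds the id values of the rows that were returned before the predicate signalled the end of the table.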

Cell data

XLSX.readdata — Function

readdata(source, sheet, ref)
readdata(source, sheetref)

Return a scalar, vector or matrix with values from a spreadsheet file. ref can be a defined name, a cell reference, or a cell, column, row or non-contiguous range.

See also XLSX.getdata.


XLSX.getdata — Function

getdata(ws::Worksheet, cell::Cell) :: CellValue

Returns a Julia representation of a given cell value. The result data type is chosen based on the value of the cell as well as its style.

For example, dates are stored as integers inside the spreadsheet, and the cell style is the information used to choose Date as the result type.

For numbers, if the style implies that the number is visualized with decimals, the method will return a float, even if the underlying number is stored as an integer inside the spreadsheet XML.

If the cell has an empty value or an empty String, this function will return missing.
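To make the role of the style concrete, the sketch below shows how the same stored integer could be interpreted differently depending on a style tag. It is not XLSX.jl's code: the function interpret and the style symbols :date, :decimal and :general are hypothetical, and the date conversion assumes Excel's 1900 date system with serial numbers later than February 1900.

```julia
using Dates

# Day zero of Excel's 1900 date system (valid for serials after Feb 1900).
excel_epoch = Date(1899, 12, 30)

serial_to_date(serial::Integer) = excel_epoch + Day(serial)

# Hypothetical dispatcher: the style, not the raw value, picks the result type.
function interpret(raw::Integer, style::Symbol)
    if style === :date
        return serial_to_date(raw)
    elseif style === :decimal
        return float(raw)
    else
        return raw
    end
end

interpret(43101, :date)      # Date(2018, 1, 1)
interpret(3, :decimal)       # 3.0
interpret(3, :general)       # 3
```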


XLSX.getcell — Function

getcell(xlsxfile, cell_reference_name) :: AbstractCell
getcell(worksheet, cell_reference_name) :: AbstractCell
getcell(sheetrow, column_name) :: AbstractCell
getcell(sheetrow, column_number) :: AbstractCell

Returns the internal representation of a worksheet cell.

Returns XLSX.EmptyCell if the cell has no data.


getcell(sheet, ref)
getcell(sheet, row, col)

Return an AbstractCell that represents a cell in the spreadsheet. Return a 2-D matrix as Array{AbstractCell, 2} if ref is a rectangular range. For row and column ranges, the extent of the range in the other dimension is determined by the worksheet's dimension. A non-contiguous range (which may not be rectangular) will return a vector of Array{AbstractCell, 2} with one element for each non-contiguous (comma separated) element in the range.

If ref is a range, getcell dispatches to getcellrange.

Example:

julia> xf = XLSX.readxlsx("myfile.xlsx")

julia> sheet = xf["mysheet"]

julia> cell = XLSX.getcell(sheet, "A1")

julia> cell = XLSX.getcell(sheet, 1:3, [2,4,6])

For other examples, see getdata().


XLSX.getcellrange — Function

getcellrange(sheet, rng)

Return a matrix with cells as Array{AbstractCell, 2}. rng must be a valid cell range, column range or row range, as in "A1:B2", "A:B" or "1:2", or a non-contiguous range. For row and column ranges, the extent of the range in the other dimension is determined by the worksheet's dimension. A non-contiguous range (which may not be rectangular) will return a vector of Array{AbstractCell, 2} with one element for each non-contiguous (comma separated) element in the range.

Example:

julia> ncr = "B3,A1,C2" # non-contiguous range, "out of order".
"B3,A1,C2"

julia> XLSX.getcellrange(f[1], ncr)
3-element Vector{Matrix{XLSX.AbstractCell}}:
 [XLSX.Cell(B3, "", "", "5", XLSX.Formula("", nothing));;]
 [XLSX.Cell(A1, "", "", "2", XLSX.Formula("", nothing));;]
 [XLSX.Cell(C2, "", "", "5", XLSX.Formula("", nothing));;]

For other examples, see getcell() and getdata().


XLSX.gettable — Function

gettable(
    sheet,
    [columns];
    [first_row],
    [column_labels],
    [header],
    [infer_eltypes],
    [stop_in_empty_row],
    [stop_in_row_function],
    [keep_empty_rows],
    [normalizenames]
) -> DataTable

Returns tabular data from a spreadsheet as a struct XLSX.DataTable. Use this function to create a DataFrame from package DataFrames.jl.

Use columns argument to specify which columns to get. For example, "B:D" will select columns B, C and D. If columns is not given, the algorithm will find the first sequence of consecutive non-empty cells.

Use first_row to indicate the first row from the table. first_row=5 will look for a table starting at sheet row 5. If first_row is not given, the algorithm will look for the first non-empty row in the spreadsheet.

header is a Bool indicating if the first row is a header. If header=true and column_labels is not specified, the column labels for the table will be read from the first row of the table. If header=false and column_labels is not specified, the algorithm will generate column labels. The default value is header=true.

Use column_labels as a vector of symbols to specify names for the header of the table.

Use normalizenames=true to normalize column names to valid Julia identifiers.

Use infer_eltypes=true to get data as a Vector{Any} of typed vectors. The default value is infer_eltypes=true.

stop_in_empty_row is a boolean indicating whether an empty row marks the end of the table. If stop_in_empty_row=false, the TableRowIterator will continue to fetch rows until there are no more rows in the Worksheet. The default behavior is stop_in_empty_row=true.

stop_in_row_function is a Function that receives a TableRow and returns a Bool indicating if the end of the table was reached.

Example for stop_in_row_function:

function stop_function(r)
    v = r[:col_label]
    return !ismissing(v) && v == "unwanted value"
end

keep_empty_rows determines whether rows where all column values are equal to missing are kept (true) or dropped (false) from the resulting table. keep_empty_rows never affects the bounds of the table; the number of rows read from a sheet is only affected by first_row, stop_in_empty_row and stop_in_row_function (if specified). keep_empty_rows is only checked once the first and last row of the table have been determined, to see whether to keep or drop empty rows between the first and the last row.

Example

julia> using DataFrames, XLSX

julia> df = XLSX.openxlsx("myfile.xlsx") do xf
        DataFrame(XLSX.gettable(xf["mysheet"]))
    end

Defined names

XLSX.addDefinedName — Function

addDefinedName(xf::XLSXFile,  name::AbstractString, value::Union{Int, Float64, String}; absolute=true)
addDefinedName(xf::XLSXFile,  name::AbstractString, value::AbstractString; absolute=true)
addDefinedName(sh::Worksheet, name::AbstractString, value::Union{Int, Float64, String}; absolute=true)
addDefinedName(sh::Worksheet, name::AbstractString, value::AbstractString; absolute=true)

Add a defined name to the Workbook or Worksheet. If an XLSXFile is passed, the defined name is added to the Workbook. If a Worksheet is passed, the defined name is added to the Worksheet.

When adding a defined name that refers to a cell or range to a workbook, value must include the sheet name (e.g. Sheet1!A1:B2).

If the new definedName is a cell reference or range, by default, it will be an absolute reference (e.g. $A$1:$C$6). If absolute=false is specified, the new definedName will be a relative reference (e.g. A1:C6). Any absolute argument specified is ignored if the definedName is not a cell reference or range.

In the context of XLSX.jl there is no difference between an absolute reference and a relative reference. However, Excel treats them differently. When definedNames are read in as part of an XLSXFile, we keep track of whether they are absolute or not. If the XLSXFile is subsequently written out again, the status of the definedNames is preserved.
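The difference between the two forms is purely textual: an absolute reference prefixes each column label and row number with $. The sketch below shows that transformation for references without a sheet prefix. The helpers absolute_cell and absolute_range are hypothetical and only illustrate the notation; they do not show how XLSX.jl performs the conversion internally.

```julia
# Hypothetical helpers for illustration; they handle plain references only
# (no sheet-name prefix, no references that are already absolute).
function absolute_cell(cell::AbstractString)
    m = match(r"^([A-Z]+)([0-9]+)$", cell)
    m === nothing && throw(ArgumentError("not a cell reference: $cell"))
    return string('$', m[1], '$', m[2])
end

function absolute_range(ref::AbstractString)
    parts = split(ref, ',')   # non-contiguous ranges are comma separated
    return join((join(absolute_cell.(split(part, ':')), ':') for part in parts), ',')
end

absolute_range("C21")        # "$C$21"
absolute_range("A1:C6")      # "$A$1:$C$6"
absolute_range("A1,B2,C3")   # "$A$1,$B$2,$C$3"
```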

Examples

julia> XLSX.addDefinedName(sh, "ID", "C21")

julia> XLSX.addDefinedName(sh, "NEW", "A1:B2")

julia> XLSX.addDefinedName(sh, "my_name", "A1,B2,C3")

julia> XLSX.addDefinedName(xf, "New", "'Mock-up'!A1:B2")

julia> XLSX.addDefinedName(xf, "Life_the_universe_and_everything", 42)

julia> XLSX.addDefinedName(xf, "first_name", "Hello World")
