API

API

Keeps track of formatting information.

source
XLSX.CellRangeType.

A CellRange represents a rectangular range of cells in a spreadsheet.

CellRange("A1:C4") denotes cells ranging from A1 (upper left corner) to C4 (bottom right corner).

As a convenience, @range_str macro is provided.

cr = XLSX.range"A1:C4"
source
XLSX.CellRefType.

A CellRef represents a cell location given by row and column identifiers.

CellRef("A6") indicates a cell located at column 1 and row 6.

Example:

cn = XLSX.CellRef("AB1")
println( XLSX.row_number(cn) ) # will print 1
println( XLSX.column_number(cn) ) # will print 28
println( string(cn) ) # will print out AB1

As a convenience, @ref_str macro is provided.

cn = XLSX.ref"AB1"
println( XLSX.row_number(cn) ) # will print 1
println( XLSX.column_number(cn) ) # will print 28
println( string(cn) ) # will print out AB1
source
XLSX.CellValueType.

CellValue is a Julia type of a value read from a Spreadsheet.

source

Relationships are defined in ECMA-376-1 Section 9.2. This struct matches the Relationship tag attribute names.

A Relashipship defines relations between the files inside a MSOffice package. Regarding Spreadsheets, there are two kinds of relationships:

* package level: defined in `_rels/.rels`.
* workbook level: defined in `xl/_rels/workbook.xml.rels`.

The function parse_relationships!(xf::XLSXFile) is used to parse package and workbook level relationships.

source

Shared String Table

source

Iterates over Worksheet cells. See eachrow method docs. Each element is a SheetRow.

Implementations: SheetRowStreamIterator, WorksheetCache.

source
XLSX.WorkbookType.

Workbook is the result of parsing file xl/workbook.xml.

source
XLSX.XLSXFileType.

XLSXFile stores all XML data from an Excel file.

filepath is the filepath of the source file for this XLSXFile. data stored the raw XML data. It maps internal XLSX filenames to XMLDocuments. workbook is the result of parsing xl/workbook.xml.

source
Base.inMethod.
Base.in(ref::CellRef, rng::CellRange) :: Bool

Checks wether ref is a cell reference inside a range given by rng.

source
Base.issubsetMethod.
Base.issubset(subrng::CellRange, rng::CellRange)

Checks wether subrng is a cell range contained in rng.

source
Base.iterateFunction.
SheetRowStreamIterator(ws::Worksheet)

Creates a reader for row elements in the Worksheet's XML. Will return a stream reader positioned in the first row element if it exists.

If there's no row element inside sheetData XML tag, it will close all streams and return nothing.

source

Adds new relationship. Returns new generated rId.

source
add_shared_string!(sheet, str_unformatted, [str_formatted]) :: Int

Add string to shared string table. Returns the 0-based index of the shared string in the shared string table.

source
XLSX.addsheet!Function.
addsheet!(workbook, [name]) :: Worksheet

Create a new worksheet with named name. If name is not provided, a unique name is created.

source
XLSX.column_boundsMethod.

column_bounds(sr::SheetRow)

Returns a tuple with the first and last index of the columns for a SheetRow.

source
XLSX.column_numberMethod.
column_number(c::CellRef) :: Int

Returns the column number of a given cell reference.

source
decode_column_number(column_name::AbstractString) :: Int

Converts column name to a column number.

julia> XLSX.decode_column_number("D")
4

See also: encode_column_number.

source

Returns the default CellDataFormat for a type

source
XLSX.eachrowMethod.
eachrow(sheet)

Creates a row iterator for a worksheet.

Example: Query all cells from columns 1 to 4.

left = 1  # 1st column
right = 4 # 4th column
for sheetrow in XLSX.eachrow(sheet)
    for column in left:right
        cell = XLSX.getcell(sheetrow, column)

        # do something with cell
    end
end
source
XLSX.eachtablerowMethod.
eachtablerow(sheet, [columns]; [first_row], [column_labels], [header], [stop_in_empty_row], [stop_in_row_function])

Constructs an iterator of table rows. Each element of the iterator is of type TableRow.

header is a boolean indicating wether the first row of the table is a table header.

If header == false and no names were supplied, column names will be generated following the column names found in the Excel file. Also, the column range will be inferred by the non-empty contiguous cells in the first row of the table.

The user can replace column names by assigning the optional names input variable with a Vector{Symbol}.

stop_in_empty_row is a boolean indicating wether an empty row marks the end of the table. If stop_in_empty_row=false, the iterator will continue to fetch rows until there's no more rows in the Worksheet. The default behavior is stop_in_empty_row=true. Empty rows may be returned by the iterator when stop_in_empty_row=false.

stop_in_row_function is a Function that receives a TableRow and returns a Bool indicating if the end of the table was reached.

Example for stop_in_row_function:

function stop_function(r)
    v = r[:col_label]
    return !ismissing(v) && v == "unwanted value"
end

Example code:

for r in XLSX.eachtablerow(sheet)
    # r is a `TableRow`. Values are read using column labels or numbers.
    rn = XLSX.row_number(r) # `TableRow` row number.
    v1 = r[1] # will read value at table column 1.
    v2 = r[:COL_LABEL2] # will read value at column labeled `:COL_LABEL2`.
end

See also gettable.

source
encode_column_number(column_number::Int) :: String

Converts column number to a column name.

Example

julia> XLSX.encode_column_number(4)
"D"

See also: decode_column_number.

source

Converts Excel number to Date.

See also: isdate1904 function.

source

Converts Excel number to DateTime.

The decimal part represents the Time (see _time function). The integer part represents the Date.

See also: isdate1904 function.

source

Converts Excel number to Time. x must be between 0 and 1.

To represent Time, Excel uses the decimal part of a floating point number. 1 equals one day.

source
XLSX.filenamesMethod.

Lists internal files from the XLSX package.

source
XLSX.get_dimensionMethod.

Retuns the dimension of this worksheet as a CellRange.

source

Checks if string is inside shared string table. Returns nothing if it's not in the shared string table. Returns the index of the string in the shared string table. The index is 0-based.

source
XLSX.getcellMethod.
getcell(sheet, ref)

Returns an AbstractCell that represents a cell in the spreadsheet.

Example:

julia> xf = XLSX.readxlsx("myfile.xlsx")

julia> sheet = xf["mysheet"]

julia> cell = XLSX.getcell(sheet, "A1")
source
XLSX.getcellrangeMethod.
getcellrange(sheet, rng)

Returns a matrix with cells as Array{AbstractCell, 2}. rng must be a valid cell range, as in "A1:B2".

source
XLSX.getdataMethod.
getdata(sheet, ref)

Returns a escalar or a matrix with values from a spreadsheet. ref can be a cell reference or a range.

Indexing in a Worksheet will dispatch to getdata method.

Example

julia> f = XLSX.readxlsx("myfile.xlsx")

julia> sheet = f["mysheet"]

julia> v = sheet["A1:B4"]

See also readdata.

source
XLSX.getdataMethod.
getdata(ws::Worksheet, cell::Cell) :: CellValue

Returns a Julia representation of a given cell value. The result data type is chosen based on the value of the cell as well as its style.

For example, date is stored as integers inside the spreadsheet, and the style is the information that is taken into account to chose Date as the result type.

For numbers, if the style implies that the number is visualized with decimals, the method will return a float, even if the underlying number is stored as an integer inside the spreadsheet XML.

If cell has empty value or empty String, this function will return missing.

source
XLSX.gettableMethod.
gettable(sheet, [columns]; [first_row], [column_labels], [header], [infer_eltypes], [stop_in_empty_row], [stop_in_row_function]) -> data, column_labels

Returns tabular data from a spreadsheet as a tuple (data, column_labels). data is a vector of columns. column_labels is a vector of symbols. Use this function to create a DataFrame from package DataFrames.jl.

Use columns argument to specify which columns to get. For example, columns="B:D" will select columns B, C and D. If columns is not given, the algorithm will find the first sequence of consecutive non-empty cells.

Use first_row to indicate the first row from the table. first_row=5 will look for a table starting at sheet row 5. If first_row is not given, the algorithm will look for the first non-empty row in the spreadsheet.

header is a Bool indicating if the first row is a header. If header=true and column_labels is not specified, the column labels for the table will be read from the first row of the table. If header=false and column_labels is not specified, the algorithm will generate column labels. The default value is header=true.

Use column_labels as a vector of symbols to specify names for the header of the table.

Use infer_eltypes=true to get data as a Vector{Any} of typed vectors. The default value is infer_eltypes=false.

stop_in_empty_row is a boolean indicating wether an empty row marks the end of the table. If stop_in_empty_row=false, the TableRowIterator will continue to fetch rows until there's no more rows in the Worksheet. The default behavior is stop_in_empty_row=true.

stop_in_row_function is a Function that receives a TableRow and returns a Bool indicating if the end of the table was reached.

Example for stop_in_row_function:

function stop_function(r)
    v = r[:col_label]
    return !ismissing(v) && v == "unwanted value"
end

Rows where all column values are equal to missing are dropped.

Example code for gettable:

julia> using DataFrames, XLSX

julia> df = XLSX.openxlsx("myfile.xlsx") do xf
                DataFrame(XLSX.gettable(xf["mysheet"])...)
            end

See also: readtable.

source
XLSX.has_sstMethod.
has_sst(workbook::Workbook)

Checks wether this workbook has a Shared String Table.

source

Returns true if the file data was read into xl.data.

source

Indicates wether worksheet cache will be fed while reading worksheet cells.

source

Detects a closing sheetData element

source
is_valid_cellname(n::AbstractString) :: Bool

Checks wether n is a valid name for a cell.

Cell names are bounded by A1 : XFD1048576.

source
XLSX.is_writableMethod.
is_writable(xl::XLSXFile)

Indicates wether this XLSX file can be edited. This controls if assignment to worksheet cells is allowed. Writable XLSXFile instances are opened with XLSX.open_xlsx_template method.

source
XLSX.isdate1904Method.
isdate1904(wb) :: Bool

Returns true if workbook follows date1904 convention.

source
open_empty_template(sheetname::AbstractString="") :: XLSXFile

Returns an empty, writable XLSXFile with 1 worksheet.

sheetname is the name of the worksheet, defaults to Sheet1.

source

Open a file for streaming.

source
open_xlsx_template(filepath::AbstractString) :: XLSXFile

Open an Excel file as template for editing and saving to another file with XLSX.writexlsx.

The returned XLSXFile instance is in closed state.

source
XLSX.openxlsxMethod.
openxlsx(f::Function, filepath::AbstractString; mode::AbstractString="r", enable_cache::Bool=true)

Open XLSX file for reading and/or writing. It returns an opened XLSXFile that will be automatically closed after applying f to the file.

Do syntax

This function should be used with do syntax, like in:

XLSX.openxlsx("myfile.xlsx") do xf
    # read data from `xf`
end

Filemodes

The mode argument controls how the file is opened. The following modes are allowed:

  • r : read mode. The existing data in filepath will be accessible for reading. This is the default mode.

  • w : write mode. Opens an empty file that will be written to filepath.

  • rw : edit mode. Opens filepath for editing. The file will be saved to disk when the function ends.

Arguments

  • filepath is the complete path to the file.

  • mode is the file mode, as explained in the last section.

  • enable_cache:

If enable_cache=true, all read worksheet cells will be cached. If you read a worksheet cell twice it will use the cached value instead of reading from disk in the second time.

If enable_cache=false, worksheet cells will always be read from disk. This is useful when you want to read a spreadsheet that doesn't fit into memory.

The default value is enable_cache=true.

Examples

Read from file

The following example shows how you would read worksheet cells, one row at a time, where myfile.xlsx is a spreadsheet that doesn't fit into memory.

julia> XLSX.openxlsx("myfile.xlsx", enable_cache=false) do xf
          for r in XLSX.eachrow(xf["mysheet"])
              # read something from row `r`
          end
       end

Write a new file

XLSX.openxlsx("new.xlsx", mode="w") do xf
    sheet = xf[1]
    sheet[1, :] = [1, Date(2018, 1, 1), "test"]
end

Edit an existing file

XLSX.openxlsx("edit.xlsx", mode="rw") do xf
    sheet = xf[1]
    sheet[2, :] = [2, Date(2019, 1, 1), "add new line"]
end

See also readxlsx method.

source

Parses filemode string to the tuple (read, write). See openxlsx.

source

Parses package level relationships defined in _rels/.rels. Prases workbook level relationships defined in xl/_rels/workbook.xml.rels.

source

parse_workbook!(xf::XLSXFile)

Updates xf.workbook from xf.data["xl/workbook.xml"]

source
XLSX.readtableMethod.
readtable(filepath, sheet, [columns]; [first_row], [column_labels], [header], [infer_eltypes], [stop_in_empty_row], [stop_in_row_function]) -> data, column_labels

Returns tabular data from a spreadsheet as a tuple (data, column_labels). data is a vector of columns. column_labels is a vector of symbols. Use this function to create a DataFrame from package DataFrames.jl.

Use columns argument to specify which columns to get. For example, columns="B:D" will select columns B, C and D. If columns is not given, the algorithm will find the first sequence of consecutive non-empty cells.

Use first_row to indicate the first row from the table. first_row=5 will look for a table starting at sheet row 5. If first_row is not given, the algorithm will look for the first non-empty row in the spreadsheet.

header is a Bool indicating if the first row is a header. If header=true and column_labels is not specified, the column labels for the table will be read from the first row of the table. If header=false and column_labels is not specified, the algorithm will generate column labels. The default value is header=true.

Use column_labels as a vector of symbols to specify names for the header of the table.

Use infer_eltypes=true to get data as a Vector{Any} of typed vectors. The default value is infer_eltypes=false.

stop_in_empty_row is a boolean indicating wether an empty row marks the end of the table. If stop_in_empty_row=false, the TableRowIterator will continue to fetch rows until there's no more rows in the Worksheet. The default behavior is stop_in_empty_row=true.

stop_in_row_function is a Function that receives a TableRow and returns a Bool indicating if the end of the table was reached.

Example for stop_in_row_function:

function stop_function(r)
    v = r[:col_label]
    return !ismissing(v) && v == "unwanted value"
end

Rows where all column values are equal to missing are dropped.

Example code for readtable:

julia> using DataFrames, XLSX

julia> df = DataFrame(XLSX.readtable("myfile.xlsx", "mysheet")...)

See also: `gettable`.
source
XLSX.readxlsxMethod.
readxlsx(filepath) :: XLSXFile

Main function for reading an Excel file. This function will read the whole Excel file into memory and return a closed XLSXFile.

Consider using openxlsx for lazy loading of Excel file contents.

source

Returns (row, column) representing a ref position relative to rng.

For example, for a range "B2:D4", we have:

  • "C3" relative position is (2, 2)

  • "B2" relative position is (1, 1)

  • "C4" relative position is (3, 2)

  • "D4" relative position is (3, 3)

source
XLSX.row_numberMethod.
row_number(c::CellRef) :: Int

Returns the row number of a given cell reference.

source

Returns real sheet column numbers (based on cellref)

source
XLSX.sheetcountMethod.

Counts the number of sheets in the Workbook.

source
XLSX.sheetnamesMethod.

Lists Worksheet names for this Workbook.

source
split_cellname(n::AbstractString) -> column_name, row_number

Splits a string representing a cell name to its column name and row number.

Example

julia> XLSX.split_cellname("AB:12")
("AB:", 12)
source
split_cellrange(n::AbstractString) -> start_name, stop_name

Splits a string representing a cell range into its cell names.

Example

julia> XLSX.split_cellrange("AB12:CD24")
("AB12", "CD24")
source

Returns tuple (columnnamestart, columnnamestop).

source
sst_formatted_string(wb, index) :: String

Looks for a formatted string inside the Shared Strings Table (sst). index starts at 0.

source
sst_unformatted_string(wb, index) :: String

Looks for a string inside the Shared Strings Table (sst). index starts at 0.

source

Defines a custom font. Returns the index to be used as the fontId in a cellXf definition.

source

Defines a custom number format to render numbers, dates or text. Returns the index to be used as the numFmtId in a cellXf definition.

source

Returns the xf XML node element for style index. index is 0-based.

source

Queries numFmtId from cellXfs -> xf nodes.

source

Cell Xf element follows the XML format below. This function queries the 0-based index of the first xf element that has the provided numFmtId. Returns -1 if not found.

<styleSheet ...
    <cellXfs count="5">
            <xf borderId="0" fillId="0" fontId="0" numFmtId="0" xfId="0"/>
            <xf applyNumberFormat="1" borderId="0" fillId="0" fontId="0" numFmtId="14" xfId="0"/>
            <xf applyNumberFormat="1" borderId="0" fillId="0" fontId="0" numFmtId="20" xfId="0"/>
            <xf applyNumberFormat="1" borderId="0" fillId="0" fontId="0" numFmtId="22" xfId="0"/>
source

Queries numFmt formatCode field by numFmtId.

source

Returns an iterator for table column numbers.

source

Maps table column index (1-based) -> sheet column index (cellref based)

source
unformatted_text(el::EzXML.Node) :: String

Helper function to gather unformatted text from Excel data files. It looks at all childs of el for tag name t and returns a join of all the strings found.

source
XLSX.writetableMethod.
writetable(filename, data, columnnames; [overwrite], [sheetname])

data is a vector of columns. columnames is a vector of column labels. overwrite is a Bool to control if filename should be overwritten if already exists. sheetname is the name for the worksheet.

Example using DataFrames.jl:

import DataFrames, XLSX
df = DataFrames.DataFrame(integers=[1, 2, 3, 4], strings=["Hey", "You", "Out", "There"], floats=[10.2, 20.3, 30.4, 40.5])
XLSX.writetable("df.xlsx", DataFrames.columns(df), DataFrames.names(df))
source
XLSX.writetableMethod.
writetable(filename::AbstractString; overwrite::Bool=false, kw...)
writetable(filename::AbstractString, tables::Vector{Tuple{String, Vector{Any}, Vector{String}}}; overwrite::Bool=false)

Write multiple tables.

kw is a variable keyword argument list. Each element should be in this format: sheetname=( data, column_names ), where data is a vector of columns and column_names is a vector of column labels.

Example:

import DataFrames, XLSX

df1 = DataFrames.DataFrame(COL1=[10,20,30], COL2=["Fist", "Sec", "Third"])
df2 = DataFrames.DataFrame(AA=["aa", "bb"], AB=[10.1, 10.2])

XLSX.writetable("report.xlsx", REPORT_A=( DataFrames.columns(df1), DataFrames.names(df1) ), REPORT_B=( DataFrames.columns(df2), DataFrames.names(df2) ))
source
XLSX.writexlsxMethod.
writexlsx(output_filepath, xlsx_file; [overwrite=false])

Writes an Excel file given by xlsx_file::XLSXFile to file at path output_filepath.

If overwrite=true, output_filepath will be overwritten if it exists.

source
XLSX.xlsx_encodeMethod.

Returns the datatype and value for val to be inserted into ws.

source
XLSX.xmldocumentMethod.
xmldocument(xl::XLSXFile, filename::String) :: EzXML.Document

Utility method to find the XMLDocument associated with a given package filename. Returns xl.data[filename] if it exists. Throws an error if it doesn't.

source
XLSX.xmlrootMethod.
xmlroot(xl::XLSXFile, filename::String) :: EzXML.Node

Utility method to return the root element of a given XMLDocument from the package. Returns EzXML.root(xl.data[filename]) if it exists.

source