API
XLSX.CellDataFormat
— Type.Keeps track of formatting information.
XLSX.CellRange
— Type.A CellRange
represents a rectangular range of cells in a spreadsheet.
CellRange("A1:C4")
denotes cells ranging from A1
(upper left corner) to C4
(bottom right corner).
As a convenience, @range_str
macro is provided.
cr = XLSX.range"A1:C4"
XLSX.CellRef
— Type.A CellRef
represents a cell location given by row and column identifiers.
CellRef("A6")
indicates a cell located at column 1
and row 6
.
Example:
cn = XLSX.CellRef("AB1")
println( XLSX.row_number(cn) ) # will print 1
println( XLSX.column_number(cn) ) # will print 28
println( string(cn) ) # will print out AB1
As a convenience, @ref_str
macro is provided.
cn = XLSX.ref"AB1"
println( XLSX.row_number(cn) ) # will print 1
println( XLSX.column_number(cn) ) # will print 28
println( string(cn) ) # will print out AB1
XLSX.CellValue
— Type.CellValue is a Julia type of a value read from a Spreadsheet.
XLSX.Relationship
— Type.Relationships are defined in ECMA-376-1 Section 9.2. This struct matches the Relationship
tag attribute names.
A Relashipship
defines relations between the files inside a MSOffice package. Regarding Spreadsheets, there are two kinds of relationships:
* package level: defined in `_rels/.rels`.
* workbook level: defined in `xl/_rels/workbook.xml.rels`.
The function parse_relationships!(xf::XLSXFile)
is used to parse package and workbook level relationships.
XLSX.SharedStringTable
— Type.Shared String Table
XLSX.SheetRowIterator
— Type.Iterates over Worksheet cells. See eachrow
method docs. Each element is a SheetRow
.
Implementations: SheetRowStreamIterator, WorksheetCache.
XLSX.Workbook
— Type.Workbook is the result of parsing file xl/workbook.xml
.
XLSX.XLSXFile
— Type.XLSXFile
stores all XML data from an Excel file.
filepath
is the filepath of the source file for this XLSXFile. data
stored the raw XML data. It maps internal XLSX filenames to XMLDocuments. workbook
is the result of parsing xl/workbook.xml
.
Base.in
— Method.Base.in(ref::CellRef, rng::CellRange) :: Bool
Checks wether ref
is a cell reference inside a range given by rng
.
Base.issubset
— Method.Base.issubset(subrng::CellRange, rng::CellRange)
Checks wether subrng
is a cell range contained in rng
.
Base.iterate
— Function.SheetRowStreamIterator(ws::Worksheet)
Creates a reader for row elements in the Worksheet's XML. Will return a stream reader positioned in the first row element if it exists.
If there's no row element inside sheetData XML tag, it will close all streams and return nothing
.
XLSX.add_relationship!
— Method.Adds new relationship. Returns new generated rId.
XLSX.add_shared_string!
— Method.add_shared_string!(sheet, str_unformatted, [str_formatted]) :: Int
Add string to shared string table. Returns the 0-based index of the shared string in the shared string table.
XLSX.addsheet!
— Function.addsheet!(workbook, [name]) :: Worksheet
Create a new worksheet with named name
. If name
is not provided, a unique name is created.
XLSX.column_bounds
— Method.column_bounds(sr::SheetRow)
Returns a tuple with the first and last index of the columns for a SheetRow
.
XLSX.column_number
— Method.column_number(c::CellRef) :: Int
Returns the column number of a given cell reference.
XLSX.decode_column_number
— Method.decode_column_number(column_name::AbstractString) :: Int
Converts column name to a column number.
julia> XLSX.decode_column_number("D")
4
See also: encode_column_number
.
XLSX.default_cell_format
— Method.Returns the default CellDataFormat
for a type
XLSX.eachrow
— Method.eachrow(sheet)
Creates a row iterator for a worksheet.
Example: Query all cells from columns 1 to 4.
left = 1 # 1st column
right = 4 # 4th column
for sheetrow in XLSX.eachrow(sheet)
for column in left:right
cell = XLSX.getcell(sheetrow, column)
# do something with cell
end
end
XLSX.eachtablerow
— Method.eachtablerow(sheet, [columns]; [first_row], [column_labels], [header], [stop_in_empty_row], [stop_in_row_function])
Constructs an iterator of table rows. Each element of the iterator is of type TableRow
.
header
is a boolean indicating wether the first row of the table is a table header.
If header == false
and no names
were supplied, column names will be generated following the column names found in the Excel file. Also, the column range will be inferred by the non-empty contiguous cells in the first row of the table.
The user can replace column names by assigning the optional names
input variable with a Vector{Symbol}
.
stop_in_empty_row
is a boolean indicating wether an empty row marks the end of the table. If stop_in_empty_row=false
, the iterator will continue to fetch rows until there's no more rows in the Worksheet. The default behavior is stop_in_empty_row=true
. Empty rows may be returned by the iterator when stop_in_empty_row=false
.
stop_in_row_function
is a Function that receives a TableRow
and returns a Bool
indicating if the end of the table was reached.
Example for stop_in_row_function
:
function stop_function(r)
v = r[:col_label]
return !ismissing(v) && v == "unwanted value"
end
Example code:
for r in XLSX.eachtablerow(sheet)
# r is a `TableRow`. Values are read using column labels or numbers.
rn = XLSX.row_number(r) # `TableRow` row number.
v1 = r[1] # will read value at table column 1.
v2 = r[:COL_LABEL2] # will read value at column labeled `:COL_LABEL2`.
end
See also gettable
.
XLSX.encode_column_number
— Method.encode_column_number(column_number::Int) :: String
Converts column number to a column name.
Example
julia> XLSX.encode_column_number(4)
"D"
See also: decode_column_number
.
XLSX.excel_value_to_date
— Method.Converts Excel number to Date.
See also: isdate1904
function.
XLSX.excel_value_to_datetime
— Method.Converts Excel number to DateTime.
The decimal part represents the Time (see _time
function). The integer part represents the Date.
See also: isdate1904
function.
XLSX.excel_value_to_time
— Method.Converts Excel number to Time. x
must be between 0 and 1.
To represent Time, Excel uses the decimal part of a floating point number. 1
equals one day.
XLSX.filenames
— Method.Lists internal files from the XLSX package.
XLSX.get_dimension
— Method.Retuns the dimension of this worksheet as a CellRange.
XLSX.get_shared_string_index
— Method.Checks if string is inside shared string table. Returns nothing
if it's not in the shared string table. Returns the index of the string in the shared string table. The index is 0-based.
XLSX.getcell
— Method.getcell(sheet, ref)
Returns an AbstractCell
that represents a cell in the spreadsheet.
Example:
julia> xf = XLSX.readxlsx("myfile.xlsx")
julia> sheet = xf["mysheet"]
julia> cell = XLSX.getcell(sheet, "A1")
XLSX.getcellrange
— Method.getcellrange(sheet, rng)
Returns a matrix with cells as Array{AbstractCell, 2}
. rng
must be a valid cell range, as in "A1:B2"
.
XLSX.getdata
— Method.getdata(sheet, ref)
Returns a escalar or a matrix with values from a spreadsheet. ref
can be a cell reference or a range.
Indexing in a Worksheet
will dispatch to getdata
method.
Example
julia> f = XLSX.readxlsx("myfile.xlsx")
julia> sheet = f["mysheet"]
julia> v = sheet["A1:B4"]
See also readdata
.
XLSX.getdata
— Method.getdata(ws::Worksheet, cell::Cell) :: CellValue
Returns a Julia representation of a given cell value. The result data type is chosen based on the value of the cell as well as its style.
For example, date is stored as integers inside the spreadsheet, and the style is the information that is taken into account to chose Date
as the result type.
For numbers, if the style implies that the number is visualized with decimals, the method will return a float, even if the underlying number is stored as an integer inside the spreadsheet XML.
If cell
has empty value or empty String
, this function will return missing
.
XLSX.gettable
— Method.gettable(sheet, [columns]; [first_row], [column_labels], [header], [infer_eltypes], [stop_in_empty_row], [stop_in_row_function]) -> data, column_labels
Returns tabular data from a spreadsheet as a tuple (data, column_labels)
. data
is a vector of columns. column_labels
is a vector of symbols. Use this function to create a DataFrame
from package DataFrames.jl
.
Use columns
argument to specify which columns to get. For example, columns="B:D"
will select columns B
, C
and D
. If columns
is not given, the algorithm will find the first sequence of consecutive non-empty cells.
Use first_row
to indicate the first row from the table. first_row=5
will look for a table starting at sheet row 5
. If first_row
is not given, the algorithm will look for the first non-empty row in the spreadsheet.
header
is a Bool
indicating if the first row is a header. If header=true
and column_labels
is not specified, the column labels for the table will be read from the first row of the table. If header=false
and column_labels
is not specified, the algorithm will generate column labels. The default value is header=true
.
Use column_labels
as a vector of symbols to specify names for the header of the table.
Use infer_eltypes=true
to get data
as a Vector{Any}
of typed vectors. The default value is infer_eltypes=false
.
stop_in_empty_row
is a boolean indicating wether an empty row marks the end of the table. If stop_in_empty_row=false
, the TableRowIterator
will continue to fetch rows until there's no more rows in the Worksheet. The default behavior is stop_in_empty_row=true
.
stop_in_row_function
is a Function that receives a TableRow
and returns a Bool
indicating if the end of the table was reached.
Example for stop_in_row_function
:
function stop_function(r)
v = r[:col_label]
return !ismissing(v) && v == "unwanted value"
end
Rows where all column values are equal to missing
are dropped.
Example code for gettable
:
julia> using DataFrames, XLSX
julia> df = XLSX.openxlsx("myfile.xlsx") do xf
DataFrame(XLSX.gettable(xf["mysheet"])...)
end
See also: readtable
.
XLSX.has_sst
— Method.has_sst(workbook::Workbook)
Checks wether this workbook has a Shared String Table.
XLSX.internal_xml_file_isread
— Method.Returns true if the file data was read into xl.data.
XLSX.is_cache_enabled
— Method.Indicates wether worksheet cache will be fed while reading worksheet cells.
XLSX.is_end_of_sheet_data
— Method.Detects a closing sheetData element
XLSX.is_valid_cellname
— Method.is_valid_cellname(n::AbstractString) :: Bool
Checks wether n
is a valid name for a cell.
Cell names are bounded by A1 : XFD1048576
.
XLSX.is_writable
— Method.is_writable(xl::XLSXFile)
Indicates wether this XLSX file can be edited. This controls if assignment to worksheet cells is allowed. Writable XLSXFile instances are opened with XLSX.open_xlsx_template
method.
XLSX.isdate1904
— Method.isdate1904(wb) :: Bool
Returns true if workbook follows date1904 convention.
XLSX.open_empty_template
— Function.open_empty_template(sheetname::AbstractString="") :: XLSXFile
Returns an empty, writable XLSXFile
with 1 worksheet.
sheetname
is the name of the worksheet, defaults to Sheet1
.
XLSX.open_internal_file_stream
— Method.Open a file for streaming.
XLSX.open_xlsx_template
— Method.open_xlsx_template(filepath::AbstractString) :: XLSXFile
Open an Excel file as template for editing and saving to another file with XLSX.writexlsx
.
The returned XLSXFile
instance is in closed state.
XLSX.openxlsx
— Method.openxlsx(f::Function, filepath::AbstractString; mode::AbstractString="r", enable_cache::Bool=true)
Open XLSX file for reading and/or writing. It returns an opened XLSXFile that will be automatically closed after applying f
to the file.
Do
syntax
This function should be used with do
syntax, like in:
XLSX.openxlsx("myfile.xlsx") do xf
# read data from `xf`
end
Filemodes
The mode
argument controls how the file is opened. The following modes are allowed:
r
: read mode. The existing data infilepath
will be accessible for reading. This is the default mode.w
: write mode. Opens an empty file that will be written tofilepath
.rw
: edit mode. Opensfilepath
for editing. The file will be saved to disk when the function ends.
Arguments
filepath
is the complete path to the file.mode
is the file mode, as explained in the last section.enable_cache
:
If enable_cache=true
, all read worksheet cells will be cached. If you read a worksheet cell twice it will use the cached value instead of reading from disk in the second time.
If enable_cache=false
, worksheet cells will always be read from disk. This is useful when you want to read a spreadsheet that doesn't fit into memory.
The default value is enable_cache=true
.
Examples
Read from file
The following example shows how you would read worksheet cells, one row at a time, where myfile.xlsx
is a spreadsheet that doesn't fit into memory.
julia> XLSX.openxlsx("myfile.xlsx", enable_cache=false) do xf
for r in XLSX.eachrow(xf["mysheet"])
# read something from row `r`
end
end
Write a new file
XLSX.openxlsx("new.xlsx", mode="w") do xf
sheet = xf[1]
sheet[1, :] = [1, Date(2018, 1, 1), "test"]
end
Edit an existing file
XLSX.openxlsx("edit.xlsx", mode="rw") do xf
sheet = xf[1]
sheet[2, :] = [2, Date(2019, 1, 1), "add new line"]
end
See also readxlsx
method.
XLSX.parse_file_mode
— Method.Parses filemode string to the tuple (read, write). See openxlsx
.
XLSX.parse_relationships!
— Method.Parses package level relationships defined in _rels/.rels
. Prases workbook level relationships defined in xl/_rels/workbook.xml.rels
.
XLSX.parse_workbook!
— Method.parse_workbook!(xf::XLSXFile)
Updates xf.workbook from xf.data["xl/workbook.xml"]
XLSX.readtable
— Method.readtable(filepath, sheet, [columns]; [first_row], [column_labels], [header], [infer_eltypes], [stop_in_empty_row], [stop_in_row_function]) -> data, column_labels
Returns tabular data from a spreadsheet as a tuple (data, column_labels)
. data
is a vector of columns. column_labels
is a vector of symbols. Use this function to create a DataFrame
from package DataFrames.jl
.
Use columns
argument to specify which columns to get. For example, columns="B:D"
will select columns B
, C
and D
. If columns
is not given, the algorithm will find the first sequence of consecutive non-empty cells.
Use first_row
to indicate the first row from the table. first_row=5
will look for a table starting at sheet row 5
. If first_row
is not given, the algorithm will look for the first non-empty row in the spreadsheet.
header
is a Bool
indicating if the first row is a header. If header=true
and column_labels
is not specified, the column labels for the table will be read from the first row of the table. If header=false
and column_labels
is not specified, the algorithm will generate column labels. The default value is header=true
.
Use column_labels
as a vector of symbols to specify names for the header of the table.
Use infer_eltypes=true
to get data
as a Vector{Any}
of typed vectors. The default value is infer_eltypes=false
.
stop_in_empty_row
is a boolean indicating wether an empty row marks the end of the table. If stop_in_empty_row=false
, the TableRowIterator
will continue to fetch rows until there's no more rows in the Worksheet. The default behavior is stop_in_empty_row=true
.
stop_in_row_function
is a Function that receives a TableRow
and returns a Bool
indicating if the end of the table was reached.
Example for stop_in_row_function
:
function stop_function(r)
v = r[:col_label]
return !ismissing(v) && v == "unwanted value"
end
Rows where all column values are equal to missing
are dropped.
Example code for readtable
:
julia> using DataFrames, XLSX
julia> df = DataFrame(XLSX.readtable("myfile.xlsx", "mysheet")...)
See also: `gettable`.
XLSX.readxlsx
— Method.readxlsx(filepath) :: XLSXFile
Main function for reading an Excel file. This function will read the whole Excel file into memory and return a closed XLSXFile.
Consider using openxlsx
for lazy loading of Excel file contents.
XLSX.relative_cell_position
— Method.Returns (row, column) representing a ref
position relative to rng
.
For example, for a range "B2:D4", we have:
"C3" relative position is (2, 2)
"B2" relative position is (1, 1)
"C4" relative position is (3, 2)
"D4" relative position is (3, 3)
XLSX.row_number
— Method.row_number(c::CellRef) :: Int
Returns the row number of a given cell reference.
XLSX.sheet_column_numbers
— Method.Returns real sheet column numbers (based on cellref)
XLSX.sheetcount
— Method.Counts the number of sheets in the Workbook.
XLSX.sheetnames
— Method.Lists Worksheet names for this Workbook.
XLSX.split_cellname
— Method.split_cellname(n::AbstractString) -> column_name, row_number
Splits a string representing a cell name to its column name and row number.
Example
julia> XLSX.split_cellname("AB:12")
("AB:", 12)
XLSX.split_cellrange
— Method.split_cellrange(n::AbstractString) -> start_name, stop_name
Splits a string representing a cell range into its cell names.
Example
julia> XLSX.split_cellrange("AB12:CD24")
("AB12", "CD24")
XLSX.split_column_range
— Method.Returns tuple (columnnamestart, columnnamestop).
XLSX.sst_formatted_string
— Method.sst_formatted_string(wb, index) :: String
Looks for a formatted string inside the Shared Strings Table (sst). index
starts at 0.
XLSX.sst_unformatted_string
— Method.sst_unformatted_string(wb, index) :: String
Looks for a string inside the Shared Strings Table (sst). index
starts at 0.
XLSX.styles_add_font
— Method.Defines a custom font. Returns the index to be used as the fontId
in a cellXf definition.
XLSX.styles_add_numFmt
— Method.Defines a custom number format to render numbers, dates or text. Returns the index to be used as the numFmtId
in a cellXf definition.
XLSX.styles_cell_xf
— Method.Returns the xf XML node element for style index
. index
is 0-based.
XLSX.styles_cell_xf_numFmtId
— Method.Queries numFmtId from cellXfs -> xf nodes.
XLSX.styles_get_cellXf_with_numFmtId
— Method.Cell Xf element follows the XML format below. This function queries the 0-based index of the first xf element that has the provided numFmtId. Returns -1 if not found.
<styleSheet ...
<cellXfs count="5">
<xf borderId="0" fillId="0" fontId="0" numFmtId="0" xfId="0"/>
<xf applyNumberFormat="1" borderId="0" fillId="0" fontId="0" numFmtId="14" xfId="0"/>
<xf applyNumberFormat="1" borderId="0" fillId="0" fontId="0" numFmtId="20" xfId="0"/>
<xf applyNumberFormat="1" borderId="0" fillId="0" fontId="0" numFmtId="22" xfId="0"/>
XLSX.styles_numFmt_formatCode
— Method.Queries numFmt formatCode field by numFmtId.
XLSX.table_column_numbers
— Method.Returns an iterator for table column numbers.
XLSX.table_column_to_sheet_column_number
— Method.Maps table column index (1-based) -> sheet column index (cellref based)
XLSX.unformatted_text
— Method.unformatted_text(el::EzXML.Node) :: String
Helper function to gather unformatted text from Excel data files. It looks at all childs of el
for tag name t
and returns a join of all the strings found.
XLSX.writetable
— Method.writetable(filename, data, columnnames; [overwrite], [sheetname])
data
is a vector of columns. columnames
is a vector of column labels. overwrite
is a Bool
to control if filename
should be overwritten if already exists. sheetname
is the name for the worksheet.
Example using DataFrames.jl
:
import DataFrames, XLSX
df = DataFrames.DataFrame(integers=[1, 2, 3, 4], strings=["Hey", "You", "Out", "There"], floats=[10.2, 20.3, 30.4, 40.5])
XLSX.writetable("df.xlsx", DataFrames.columns(df), DataFrames.names(df))
XLSX.writetable
— Method.writetable(filename::AbstractString; overwrite::Bool=false, kw...)
writetable(filename::AbstractString, tables::Vector{Tuple{String, Vector{Any}, Vector{String}}}; overwrite::Bool=false)
Write multiple tables.
kw
is a variable keyword argument list. Each element should be in this format: sheetname=( data, column_names )
, where data
is a vector of columns and column_names
is a vector of column labels.
Example:
import DataFrames, XLSX
df1 = DataFrames.DataFrame(COL1=[10,20,30], COL2=["Fist", "Sec", "Third"])
df2 = DataFrames.DataFrame(AA=["aa", "bb"], AB=[10.1, 10.2])
XLSX.writetable("report.xlsx", REPORT_A=( DataFrames.columns(df1), DataFrames.names(df1) ), REPORT_B=( DataFrames.columns(df2), DataFrames.names(df2) ))
XLSX.writexlsx
— Method.writexlsx(output_filepath, xlsx_file; [overwrite=false])
Writes an Excel file given by xlsx_file::XLSXFile
to file at path output_filepath
.
If overwrite=true
, output_filepath
will be overwritten if it exists.
XLSX.xlsx_encode
— Method.Returns the datatype and value for val
to be inserted into ws
.
XLSX.xmldocument
— Method.xmldocument(xl::XLSXFile, filename::String) :: EzXML.Document
Utility method to find the XMLDocument associated with a given package filename. Returns xl.data[filename] if it exists. Throws an error if it doesn't.
XLSX.xmlroot
— Method.xmlroot(xl::XLSXFile, filename::String) :: EzXML.Node
Utility method to return the root element of a given XMLDocument from the package. Returns EzXML.root(xl.data[filename]) if it exists.