LuaRock "htmlparser"

Parse HTML text into a tree of elements with selectors

Install

Htmlparser is a listed LuaRock. Install using LuaRocks: luarocks install htmlparser

Dependencies

Htmlparser depends on Lua 5.2 and on the "set" LuaRock, which LuaRocks installs automatically. The lunitx LuaRock, needed to run the tests, is installed as well.

Usage

Start off with

require("luarocks.loader")
local htmlparser = require("htmlparser")

Then, parse some html:

local root = htmlparser.parse(htmlstring)

The input to parse may be the contents of a complete HTML document or any valid HTML snippet, as long as all tags are correctly opened and closed. Now, find specific contained elements by selecting:

local elements = root:select(selectorstring)

Or in shorthand:

local elements = root(selectorstring)

This will return a list of elements. Each one is of the same type as the root element and therefore supports selecting as well, if ever needed:

for _,e in ipairs(elements) do
    print(e.name)
    local subs = e(subselectorstring)
    for _,sub in ipairs(subs) do
        print("", sub.name)
    end
end
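
Putting these steps together, here is a minimal sketch that runs end to end. The HTML snippet and the `li` selector are made up for illustration; only `parse`, `select`, the call shorthand and `.name` come from the text above.

```lua
require("luarocks.loader")
local htmlparser = require("htmlparser")

-- Hypothetical input: every tag is explicitly opened and closed.
local html = [[
<div class="menu">
  <ul>
    <li class="item">One</li>
    <li class="item">Two</li>
  </ul>
</div>
]]

local root = htmlparser.parse(html)

-- Long form and shorthand both run the same selector against the root.
local items = root:select("li")
local same = root("li")

-- Both results are plain lists, so # and ipairs work on them.
print(#items, #same)
for _, e in ipairs(items) do
    print(e.name)
end
```

The shorthand is convenient for one-off queries, while `:select` may read more clearly when calls are chained.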

The root element is a container for the top-level elements in the parsed text. For example, the <html> element of a parsed HTML document is a child of the returned root element, not the root itself.
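
To illustrate the container role of the root, the sketch below parses a small, made-up document and selects the `html` element from the root rather than treating the root as that element. The document string and variable names are assumptions for the example.

```lua
require("luarocks.loader")
local htmlparser = require("htmlparser")

-- Hypothetical complete document, with every tag closed.
local doc = "<html><head><title>Demo</title></head><body><p>Hello</p></body></html>"
local root = htmlparser.parse(doc)

-- The root is a container, so the <html> element is found by selecting from it.
local htmls = root("html")
for _, e in ipairs(htmls) do
    print(e.name)
    -- Selected elements are the same type as the root and can be queried further.
    for _, p in ipairs(e("p")) do
        print("", p.name)
    end
end
```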

Selectors

Supported selectors are a subset of jQuery's selectors:

Selectors can be combined; e.g. ".class:not([attribute]) element.class"
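
As a rough illustration of a combined selector, the sketch below applies the pattern quoted above to a made-up snippet. The class names `box` and `hit` and the `hidden` attribute are hypothetical. The selector is meant to pick `span.hit` elements that sit inside a `.box` element without a `hidden` attribute.

```lua
require("luarocks.loader")
local htmlparser = require("htmlparser")

-- Hypothetical input with three containers; only the first is a
-- .box element that carries no hidden attribute.
local html = [[
<div class="box"><span class="hit">a</span></div>
<div class="box" hidden="hidden"><span class="hit">b</span></div>
<div class="other"><span class="hit">c</span></div>
]]

local root = htmlparser.parse(html)

-- Same shape as ".class:not([attribute]) element.class" from the text.
local found = root(".box:not([hidden]) span.hit")
for _, e in ipairs(found) do
    print(e.name)
end
```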

Element type

All tree elements provide, apart from :select and (), the following accessors:

Basic

Other

Limitations

Examples

See ./doc/sample.lua

Tests

See ./tst/init.lua

License

LGPL+; see ./doc/LICENSE