Parsing Content

In addition to parsing content with DOM or cheerio methods, jiant provides a set of jiant.parse.* helper methods for convenience.

js
// DOM
const dom = await jiant.get.dom()
let text = dom.querySelector('div.title').textContent
let link = dom.querySelector('a').href

// cheerio (alternative to the DOM approach; reassigns the variables instead of redeclaring them)
const $ = await jiant.get.$()
text = $('div.example').find('div.title').text()
link = $('div.example').find('a').attr('href')

Reference

jiant.parse.readability(doc: DOM, options: object) -> object

@param options: all Readability options are allowed except serializer. Refer to the Readability documentation.

Returns the same object as new Readability(doc, options).parse().

js
let dom = await jiant.get.dom()
try {
    let res = jiant.parse.readability(dom)
    console.log(res)
    // output: {
    //     title: '...',
    //     content: '<p>....</p>',
    //     textContent: '....',
    //     publishedTime: '....',
    //     ...
    // }
} catch (error) {
    // Parsing failed; log the error instead of silently ignoring it
    console.error(error)
}

jiant.parse.toURL(url: string) -> string

Resolves a relative URL against the target page URL and returns an absolute URL.

js
// When target pageURL is 'https://jiant.ing'

jiant.parse.toURL('/faq')
// output: https://jiant.ing/faq

jiant.parse.toURL('google.com')
// output: https://google.com

jiant.parse.toURL('//jiant.ing/faq')
// output: https://jiant.ing/faq

// If the value cannot be parsed, the original value is returned
jiant.parse.toURL('agsidugauidiausgda')
// output: agsidugauidiausgda
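
The general idea of resolving a relative URL can be approximated outside of jiant with the standard URL constructor. The following is a minimal sketch, not jiant's implementation: the function name resolveURL and the explicit pageURL argument are assumptions made for illustration. With a valid base, this sketch resolves a bare host such as 'google.com' or an arbitrary string as a relative path, so it does not reproduce those two examples exactly.

```javascript
// Hypothetical helper (not part of jiant): resolve `url` against `pageURL`.
function resolveURL(url, pageURL) {
    try {
        // The URL constructor handles absolute paths ('/faq') and
        // protocol-relative URLs ('//jiant.ing/faq') against the base.
        return new URL(url, pageURL).href
    } catch (error) {
        // Mirror the documented fallback: return the original value.
        return url
    }
}

console.log(resolveURL('/faq', 'https://jiant.ing'))
// prints "https://jiant.ing/faq"
console.log(resolveURL('//jiant.ing/faq', 'https://jiant.ing'))
// prints "https://jiant.ing/faq"
```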

jiant.parse.date(date, ...options) -> Date

@param date, options: refer to the day.js documentation.

Equivalent to dayjs(date, ...options).toDate()

js
let d = jiant.parse.date('2024-08-22 12:34:56')
// OR, with an explicit format string
d = jiant.parse.date('2024-08-22 12:34:56', 'YYYY-MM-DD HH:mm:ss')

jiant.parse.timezoneOffset(date: Date, timezoneOffset) -> Date

Some websites may not convert the time zone according to the visitor's location, resulting in a date that doesn't accurately reflect the user's local time. To avoid this issue, you can manually specify the time zone.

js
let d = jiant.parse.date('2024-08-22 12:34:56')
let dz = jiant.parse.timezoneOffset(d, -6)
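
The exact semantics of timezoneOffset are not spelled out here, so the following is only a sketch of one plausible interpretation: the wall-clock fields of the parsed date are treated as a time at a fixed UTC offset given in hours. The function name applyUTCOffset and the hours unit are assumptions, and jiant's actual behavior may differ.

```javascript
// Hypothetical helper (not jiant's implementation).
// Reinterprets the local wall-clock fields of `date` as a time
// at a fixed UTC offset, given in hours (assumed unit).
function applyUTCOffset(date, offsetHours) {
    // Build a UTC timestamp from the wall-clock fields as if they were UTC
    const asUTC = Date.UTC(
        date.getFullYear(), date.getMonth(), date.getDate(),
        date.getHours(), date.getMinutes(), date.getSeconds()
    )
    // Wall clock = UTC + offset, so UTC = wall clock - offset
    return new Date(asUTC - offsetHours * 60 * 60 * 1000)
}

// 12:34:56 on the wall clock at UTC-6 is 18:34:56 UTC
const wallClock = new Date(2024, 7, 22, 12, 34, 56)
console.log(applyUTCOffset(wallClock, -6).toISOString())
// prints "2024-08-22T18:34:56.000Z"
```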

jiant.parse.markdownToHTML(md: string) -> string

  • Line breaks (\n) are rendered as <br>.
  • Raw HTML in the markdown text is not rendered as HTML; it is escaped and output as plain text.

js
let d = jiant.parse.markdownToHTML('## example title \n **bold text** \n normal \n <b>ignore html</b>')
// output:
// <h2>example title</h2>
// <p><strong>bold text</strong><br>
// normal<br>
// &lt;b&gt;ignore html&lt;/b&gt;</p>
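
The last output line shows that raw HTML is escaped rather than dropped. A minimal sketch of that kind of escaping is shown here, using a hypothetical escapeHTML helper; the renderer jiant uses internally is not stated, so this illustrates the effect only.

```javascript
// Hypothetical helper: escape characters that have special meaning in HTML.
function escapeHTML(text) {
    const entities = { '&': '&amp;', '<': '&lt;', '>': '&gt;', '"': '&quot;' }
    return text.replace(/[&<>"]/g, (char) => entities[char])
}

console.log(escapeHTML('<b>ignore html</b>'))
// prints "&lt;b&gt;ignore html&lt;/b&gt;"
```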

jiant.parse.pathRegExp(url: string, pathRegExps: string|string[])

Extracts named parameters from a URL using one or more path patterns. The pattern syntax follows path-to-regexp; refer to its documentation.

js
let d = jiant.parse.pathRegExp('http://earth.example.com/usa/ca', '/:nation/:state')
// output: { nation: 'usa', state: 'ca' }

// Several patterns can be passed as an array
d = jiant.parse.pathRegExp('http://earth.example.com/usa/ca', [
    ':plant.example.com/:nation/:state',
    '/:nation/:state'
])
// output: { plant: 'earth', nation: 'usa', state: 'ca' }

// Here the host does not fit the first pattern, so only the path params are returned
d = jiant.parse.pathRegExp('http://example.com/usa/ca', [
    ':plant.example.com/:nation/:state',
    '/:nation/:state'
])
// output: { nation: 'usa', state: 'ca' }
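
To make the matching behavior in the examples above more concrete, here is a minimal sketch that turns :name segments into named capture groups and tries each pattern in order. It is not jiant's implementation and does not use path-to-regexp. The names compilePattern and matchPath, the rule that patterns starting with "/" match only the path, and the null return value when nothing matches are all assumptions made for illustration.

```javascript
// Hypothetical helper: compile a pattern such as '/:nation/:state' to a RegExp.
function compilePattern(pattern) {
    const source = pattern
        // Escape regex metacharacters so '.' and friends match literally
        .replace(/[.*+?^${}()|[\]\\]/g, '\\$&')
        // Turn each :name into a named group matching one segment
        .replace(/:(\w+)/g, '(?<$1>[^/.]+)')
    return new RegExp(`^${source}/?$`)
}

// Hypothetical helper: return the params of the first matching pattern.
function matchPath(url, patterns) {
    const { hostname, pathname } = new URL(url)
    for (const pattern of [].concat(patterns)) {
        // Assumption: patterns starting with '/' match the path only,
        // all other patterns match host + path
        const target = pattern.startsWith('/') ? pathname : hostname + pathname
        const match = compilePattern(pattern).exec(target)
        if (match) return { ...match.groups }
    }
    return null // assumed result when no pattern matches
}

console.log(matchPath('http://earth.example.com/usa/ca', [
    ':plant.example.com/:nation/:state',
    '/:nation/:state'
]))
// prints { plant: 'earth', nation: 'usa', state: 'ca' }
```

Trying the patterns in order means the more specific pattern should come first, with the more general one as a fallback.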

jiant.parse.cheerioLoad(html)

Equivalent to cheerio.load. Refer to the cheerio documentation.

js
const $ = jiant.parse.cheerioLoad('<h2 class="title">Hello world</h2>');

$('h2.title').text('Hello there!');
$('h2').addClass('welcome');

$.html();
// output:
// <html><head></head><body><h2 class="title welcome">Hello there!</h2></body></html>

jiant.parse.rss({ url }) -> Promise

Fetches the RSS feed at the given URL and parses it into a formatted object.

js
let d = await jiant.parse.rss({
    url: 'https://feed.jiant.ing/r/example-XXXXX'
})
// output:
// {
//     title: 'XXXXX',
//     pageTitle: 'XXXXXX',
//     pageUrl: 'https://example.com/XXXXX',
//     items: [{
//         title,
//         description,
//         link,
//         pubDate,
//         author,
//     },
//     ...
//     ]
// }