Parsing Content
Besides parsing content with DOM or cheerio methods, we provide some jiant.parse.* methods for convenience.
// DOM
const dom = await jiant.get.dom()
let text = dom.querySelector('div.title').textContent
let link = dom.querySelector('a').href
// cheerio
const $ = await jiant.get.$()
let text = $('div.example').find('div.title').text()
let link = $('div.example').find('a').attr('href')
Reference
jiant.parse.readability(doc: DOM, options: object) -> object
@param options: all Readability options are allowed except serializer; refer to the Readability documentation.
Returns the same object as new Readability(doc, options).parse().
let dom = await jiant.get.dom()
try {
  let res = jiant.parse.readability(dom)
  console.log(res)
  // output: {
  //   title: '...',
  //   content: '<p>....</p>',
  //   textContent: '....',
  //   publishedTime: '....',
  //   ...
  // }
} catch (error) {
  console.error(error) // log the failure instead of silently swallowing it
}
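Readability's own parse() can return null when it finds no article content. Assuming jiant.parse.readability passes that result through (the reference above only says the return value is the same), a small guard before reading fields may be useful. In this sketch, summarizeArticle is a hypothetical helper, not part of jiant, and the sample object is made up for illustration.

```javascript
// Hypothetical helper, not part of jiant: turn a Readability-style result
// into a short summary, tolerating a null result.
function summarizeArticle(article, maxLength = 80) {
  if (!article) return null // assumption: null means nothing could be extracted
  // Collapse whitespace so the excerpt stays on one line
  const text = (article.textContent || '').replace(/\s+/g, ' ').trim()
  return {
    title: article.title || '',
    excerpt: text.length > maxLength ? text.slice(0, maxLength) + '...' : text
  }
}

// Made-up sample in the shape shown in the output above
const sample = { title: 'Hello', textContent: '  First line.\n\nSecond line.  ' }
const summary = summarizeArticle(sample)
console.log(summary)
```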
jiant.parse.toURL(url: string) -> string
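Resolves a relative URL against the target page URL and returns an absolute URL. If the value cannot be parsed, the original value is returned.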
// When target pageURL is 'https://jiant.ing'
jiant.parse.toURL('/faq')
// output: https://jiant.ing/faq
jiant.parse.toURL('google.com')
// output: https://google.com
jiant.parse.toURL('//jiant.ing/faq')
// output: https://jiant.ing/faq
// returns the original value when it cannot be parsed
jiant.parse.toURL('agsidugauidiausgda')
// output: agsidugauidiausgda
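For readers curious how this kind of resolution can work, here is a minimal sketch built on the standard URL constructor. It is not jiant's implementation: resolveUrl is a hypothetical name, and the "bare host name gets https" rule is an assumption inferred from the examples above. Note that the standard URL serializer adds a trailing slash to a bare origin, so this sketch's output for 'google.com' differs slightly from the one shown above.

```javascript
// Hypothetical helper, NOT jiant's implementation.
function resolveUrl(input, pageUrl) {
  // Absolute ("https://..."), protocol-relative ("//host/path")
  // and root-relative ("/faq") inputs: let the URL constructor resolve them
  if (/^([a-z][a-z0-9+.-]*:)?\/\//i.test(input) || input.startsWith('/')) {
    try {
      return new URL(input, pageUrl).href
    } catch {
      return input // could not be parsed: fall back to the original value
    }
  }
  // Assumption: something shaped like "google.com" is a bare host name
  if (/^[a-z0-9-]+(\.[a-z0-9-]+)+(\/.*)?$/i.test(input)) {
    return new URL('https://' + input).href
  }
  return input // anything else is returned unchanged
}

const base = 'https://jiant.ing'
console.log(resolveUrl('/faq', base))
console.log(resolveUrl('//jiant.ing/faq', base))
console.log(resolveUrl('google.com', base))
console.log(resolveUrl('agsidugauidiausgda', base))
```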
jiant.parse.date(date, ...options) -> Date
@param date, options: refer to the day.js documentation.
Equivalent to dayjs(date, ...options).toDate().
let d = jiant.parse.date('2024-08-22 12:34:56')
// OR
let d = jiant.parse.date('2024-08-22 12:34:56', 'YYYY-MM-DD HH:mm:ss')
jiant.parse.timezoneOffset(date: Date, timezoneOffset) -> Date
Some websites may not convert the time zone according to the visitor's location, resulting in a date that doesn't accurately reflect the user's local time. To avoid this issue, you can manually specify the time zone.
let d = jiant.parse.date('2024-08-22 12:34:56')
let dz = jiant.parse.timezoneOffset(d, -6)
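The reference does not spell out the exact arithmetic, so the following is only one plausible reading: the wall-clock time that was parsed in the local time zone is reinterpreted as a time at the given UTC offset. reinterpretAtOffset is a hypothetical helper written for illustration, not jiant's implementation, and jiant.parse.timezoneOffset may behave differently.

```javascript
// Hypothetical helper, not jiant's implementation.
// Reads the wall-clock fields of `date` (as seen in the local time zone)
// and reinterprets them as a time at UTC+offsetHours.
function reinterpretAtOffset(date, offsetHours) {
  const wallClockAsUtc = Date.UTC(
    date.getFullYear(), date.getMonth(), date.getDate(),
    date.getHours(), date.getMinutes(), date.getSeconds(), date.getMilliseconds()
  )
  // A wall clock at UTC-6 runs 6 hours behind UTC, so the instant is 6 hours later
  return new Date(wallClockAsUtc - offsetHours * 60 * 60 * 1000)
}

const parsed = new Date(2024, 7, 22, 12, 34, 56) // 2024-08-22 12:34:56, local wall clock
const shifted = reinterpretAtOffset(parsed, -6)
console.log(shifted.toISOString()) // 2024-08-22T18:34:56.000Z
```

Under this reading the result no longer depends on the time zone of the machine that parsed the string, which is the problem the paragraph above describes.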
jiant.parse.markdownToHTML(md: string) -> string
- Line breaks (\n) will be rendered as <br>.
- Raw HTML in markdown text will be ignored.
let d = jiant.parse.markdownToHTML('## example title \n **bold text** \n normal \n <b>ignore html</b>')
// output:
// <h2>example title</h2>
// <p><strong>bold text</strong><br>
// <b>ignore html</b></p>
jiant.parse.pathRegExp(url: string, pathRegExps: string|string[])
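Extracts parameters from a URL using path patterns. The pattern syntax follows path-to-regexp; refer to its documentation. When an array of patterns is passed, the examples below show the first matching pattern being used.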
let d = jiant.parse.pathRegExp('http://earth.example.com/usa/ca', '/:nation/:state')
// output: { nation: 'usa', state: 'ca'}
let d = jiant.parse.pathRegExp('http://earth.example.com/usa/ca', [
  ':plant.example.com/:nation/:state',
  '/:nation/:state'
])
// output: { plant: 'earth', nation: 'usa', state: 'ca' }
let d = jiant.parse.pathRegExp('http://example.com/usa/ca', [
  ':plant.example.com/:nation/:state',
  '/:nation/:state'
])
// output: { nation: 'usa', state: 'ca' }
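To make the matching behavior in these examples concrete, here is a rough sketch using plain regular expressions rather than path-to-regexp. matchPath and compilePattern are hypothetical names, the rule that patterns starting with '/' match only the path while others match host plus path is inferred from the examples, and returning null when nothing matches is an assumption; jiant's actual behavior may differ.

```javascript
// Hypothetical helpers, not jiant's implementation.
// Turn a pattern such as '/:nation/:state' into a RegExp plus parameter names.
function compilePattern(pattern) {
  const names = []
  const source = pattern
    .replace(/[.*+?^${}()|[\]\\]/g, '\\$&') // escape regex metacharacters
    .replace(/:([A-Za-z_]\w*)/g, (_, name) => {
      names.push(name)
      return '([^/.]+)' // a parameter matches one host label or path segment
    })
  return { regex: new RegExp('^' + source + '/?$'), names }
}

function matchPath(url, patterns) {
  const { host, pathname } = new URL(url)
  // Accept a single pattern or an array; try them in order
  for (const pattern of [].concat(patterns)) {
    const { regex, names } = compilePattern(pattern)
    // Assumption: patterns starting with '/' match the path only
    const target = pattern.startsWith('/') ? pathname : host + pathname
    const match = regex.exec(target)
    if (match) {
      return Object.fromEntries(names.map((name, i) => [name, match[i + 1]]))
    }
  }
  return null // assumption: no pattern matched
}

const both = [':plant.example.com/:nation/:state', '/:nation/:state']
console.log(matchPath('http://earth.example.com/usa/ca', both))
console.log(matchPath('http://example.com/usa/ca', both))
```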
jiant.parse.cheerioLoad(html)
Equivalent to cheerio.load. Refer to the cheerio documentation.
const $ = jiant.parse.cheerioLoad('<h2 class="title">Hello world</h2>');
$('h2.title').text('Hello there!');
$('h2').addClass('welcome');
$.html();
// output:
// <html><head></head><body><h2 class="title welcome">Hello there!</h2></body></html>
jiant.parse.rss({ url }) -> Promise
Fetches an RSS feed URL and parses it into a formatted object.
let d = await jiant.parse.rss({
  url: 'https://feed.jiant.ing/r/example-XXXXX'
})
// output:
// {
//   title: 'XXXXX',
//   pageTitle: 'XXXXXX',
//   pageUrl: 'https://example.com/XXXXX',
//   items: [{
//       title,
//       description,
//       link,
//       pubDate,
//       author,
//     },
//     ...
//   ]
// }
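Once the feed object is available, a common next step is to pick the newest items. The sketch below works on a made-up object in the shape shown above. latestItems is a hypothetical helper, not part of jiant, and it assumes pubDate is a value that the built-in Date can parse, which the reference does not state.

```javascript
// Hypothetical helper, not part of jiant: newest-first list of feed items
function latestItems(feed, limit = 5) {
  return [...(feed.items || [])]
    // Skip items whose pubDate cannot be parsed
    .filter(item => !Number.isNaN(new Date(item.pubDate).getTime()))
    // Sort by publication time, newest first
    .sort((a, b) => new Date(b.pubDate) - new Date(a.pubDate))
    .slice(0, limit)
}

// Made-up feed in the shape shown in the output above
const feed = {
  title: 'Example',
  items: [
    { title: 'Old', link: 'https://example.com/old', pubDate: '2024-08-01T00:00:00Z' },
    { title: 'New', link: 'https://example.com/new', pubDate: '2024-08-22T12:34:56Z' },
    { title: 'Undated', link: 'https://example.com/undated', pubDate: '' }
  ]
}
console.log(latestItems(feed, 2).map(item => item.title)) // [ 'New', 'Old' ]
```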