Interacting with Page

Page lifecycle and timing

jiant.get.html() -> Promise: returns a copy of the page HTML.
jiant.get.dom() -> Promise: returns a copy of the page DOM.
jiant.get.$() -> Promise: returns a copy of the page HTML wrapped by cheerio.load.
jiant.action.pushURL() -> Promise
jiant.action.goBack() -> Promise

The Promises returned by the methods above wait until the target page's document.readyState becomes interactive or complete.

When a check starts, the checking script also runs only after the target page's document.readyState has reached that state, unless the DISABLE_AUTO_OPEN_PAGE_URL config is set.

If a method such as jiant.action.click causes the target window to reload or navigate, get the new content again via html|dom|$ so that the script waits until the new page is ready.

// At the start of the script,
// the document is already ready (document.readyState is interactive or complete).

await jiant.action.click('ul.example-list li span.expand')
// This works.

let html = await jiant.get.html()
// Resolves with the HTML immediately.

await jiant.action.click('ul.example-list li a')
// Clicks a link, so the target page starts navigating.

await jiant.action.click('ul.other-list-on-new-page li span.expand')
// This does not work, because the new page is not ready yet.

let dom = await jiant.get.dom()
// Waits for document.readyState of the new page.

await jiant.action.click('ul.other-list-on-new-page li span.expand')
// Now it works.
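
The click-then-wait pattern above can be wrapped in a small helper. This is only a minimal sketch: clickAndWait is a hypothetical name that is not part of jiant, and jiant is passed in as an argument purely to keep the example self-contained (a real script would use the global jiant directly).

```javascript
// Hypothetical helper (not part of jiant): click something that triggers a
// navigation, then block until the new page is ready by requesting its DOM.
async function clickAndWait(jiant, selector) {
  await jiant.action.click(selector)
  // Per the text above, jiant.get.dom() resolves only once document.readyState
  // of the target page is interactive or complete.
  return jiant.get.dom()
}
```

After `await clickAndWait(jiant, 'ul.example-list li a')`, further clicks on the new page should be safe, provided the clicked elements are not loaded asynchronously.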

On many sites, some parts of the page load asynchronously after the DOM is ready.

Other parts may only start loading after the page is scrolled down.

You can use jiant.action.sleep to wait for some time before performing further actions.

// At the start of the script,
// the document is already ready (document.readyState is interactive or complete).

await jiant.action.click('ul.async-list li span.expand')
// This does not work if that part only starts loading after the document is ready.

await jiant.action.scroll(100, 621)
// Scrolls the target page 100px to the right (x) and 621px down (y).

await jiant.action.sleep(1000)
// Waits 1000 milliseconds for the async data to load. The right value depends on the site.

await jiant.action.click('ul.async-list li span.expand')
// This works once that part has loaded.
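
A fixed sleep can be too short on a slow connection and wastefully long on a fast one. As an alternative, the sketch below polls the page until a selector matches. It assumes that every call to jiant.get.$() returns a fresh snapshot of the page; waitForSelector, timeout and interval are hypothetical names and values, not part of jiant, and jiant is passed in only to keep the example self-contained.

```javascript
// Hypothetical helper (not part of jiant): poll until `selector` matches
// something on the page, or give up after roughly `timeout` milliseconds.
async function waitForSelector(jiant, selector, { timeout = 10000, interval = 500 } = {}) {
  let waited = 0
  while (true) {
    // Assumption: each call returns a fresh copy of the current page content.
    const $ = await jiant.get.$()
    if ($(selector).length > 0) return true
    if (waited >= timeout) return false
    await jiant.action.sleep(interval)
    waited += interval
  }
}
```

A script could then call `await waitForSelector(jiant, 'ul.async-list li span.expand')` and only click when the result is true.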

Parse inner pages

We usually track an index (list) page and check it for new updates, but some information, such as the description, is only available on the inner pages.

We can collect the links to the inner pages, then navigate to and parse them one by one.

<!-- HTML of target page in left window -->
<html>
  <body>
    <ul class="example-list">
      <li>
        <a href="https://example.com/story001">
          <div class="title">兼听则明</div>
        </a>
      </li>
      <li>
        <a href="https://example.com/story002">
          <div class="title">偏信则暗</div>
        </a>
      </li>
    </ul>
  </body>
</html>

// script in side panel
const $ = await jiant.get.$()
let items = []

// get all links from index list page
let links = $('ul.example-list li')
  .toArray()
  .map(e => $(e).find('a').attr('href'))

for (const link of links) {
  // Look for a previously saved item with the same link.
  let prevItem = await jiant.get.prevItem({ link })
  // If found, reuse it and skip this link to avoid parsing it again.
  if (prevItem) {
    items.push(prevItem)
    continue
  }

  // Otherwise, navigate to the link.
  await jiant.action.pushURL(link)

  // Wait for the page to load, then get its content.
  let n$ = await jiant.get.$()

  // Parse more info from the detail page.
  let title = n$('h2.title').first().text()
  let description = n$('div.main-text').first().text()
  items.push({
    title, description, link
  })
  // Do not request pages too frequently.
  await jiant.action.sleep(5000)
  // Continue with the next link.
}

return { items }
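
On real sites the href values are often relative, repeated, or missing. The sketch below shows one way to clean the links array before the loop, using the standard URL class. normalizeLinks is a hypothetical helper, not part of jiant.

```javascript
// Hypothetical helper (not part of jiant): resolve hrefs against the page
// URL, drop empty values, and remove duplicates while keeping the order.
function normalizeLinks(hrefs, baseURL) {
  const seen = new Set()
  const links = []
  for (const href of hrefs) {
    if (!href) continue // e.g. an <li> without an <a href>
    const absolute = new URL(href, baseURL).href
    if (seen.has(absolute)) continue
    seen.add(absolute)
    links.push(absolute)
  }
  return links
}
```

In the script above it could be applied as `links = normalizeLinks(links, await jiant.get.pageURL())` before iterating.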

Reference

jiant.get.html(noWait) -> Promise

jiant.get.dom(noWait) -> Promise

jiant.get.$(noWait) -> Promise

Returns a copy of the target page's content.

The Promise waits until the target page's document.readyState becomes interactive or complete.

@param: noWait (boolean) if true, the Promise resolves without waiting for document.readyState.

let htmlString = await jiant.get.html()
// return raw html string: 
// <html><body>....</body></html>

let domInstance = await jiant.get.dom()
// domInstance is equal to: 
// (new DOMParser()).parseFromString(htmlString, 'text/html')

let $ = await jiant.get.$()
// $ is equal to: cheerio.load(htmlString)

TIP

Cheerio is an HTML parser whose API is similar to jQuery's.

For more information about cheerio.load, please refer to the Cheerio documentation.

jiant.get.pageURL() -> Promise

Returns location.href of the current page in the left window.

jiant.get.targetPageURL() -> string

Returns the target page URL of the task, as set in the task editing panel. To get the current page URL, use jiant.get.pageURL.

jiant.get.prevItem({ contentHash, link, title }) -> Promise

Returns the task's previously saved local item that matches the given params, or null if none is found. This is useful for deduplication.

let link = 'https://example.com/story002'
let prevItem = await jiant.get.prevItem({ link })
if (prevItem) console.log(prevItem)
// output: {
//   link: 'https://example.com/story002',
//   id: '...',
//   title: '...',
//   ...
// }

jiant.get.configs() -> object

Returns the task configuration (JSON) as an object.

jiant.get.customParams() -> object

Returns customParams from the task configuration (JSON) as an object.
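
Since customParams comes from user-edited JSON, a script may want fallback values for missing keys. The following is a minimal sketch: resolveOptions, maxItems and delayMs are hypothetical names chosen for illustration, and the assumption that customParams may be undefined or empty is not confirmed by this page.

```javascript
// Hypothetical defaults; these keys are illustrative, not defined by jiant.
const DEFAULTS = { maxItems: 10, delayMs: 5000 }

// Merge the task's customParams over the defaults.
// Assumption: customParams may be undefined when the task defines none.
function resolveOptions(customParams) {
  return { ...DEFAULTS, ...(customParams || {}) }
}
```

A script could then write `const { maxItems, delayMs } = resolveOptions(jiant.get.customParams())`.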

jiant.get.rsshub(config: object) -> Promise

Returns a set of helpers compatible with those described in the RSSHub documentation.

const {
  ctx, // partially compatible
  load, // == cheerio.load
  md5,
  ofetch,
  cache, // cache.tryGet
  parseDate,
  parseRelativeDate,
  timezone,
} = await jiant.get.rsshub()

@param config.path: a path pattern used to parse the page URL into params and queries.

To be compatible with RSSHub's ctx.req.param and ctx.req.query, path must be set in config.

Parsing the page URL with path works the same way as the method jiant.parse.pathRegExp.

// if current page URL is
// 'https://example.com/page/animal/cat?name=gaf'
const { ctx } = await jiant.get.rsshub({
  path: '/page/:category/:tp', 
})

const name = ctx.req.query('name')
// name is 'gaf'

const { category, tp } = ctx.req.param()
console.log(category, tp)
// output: 'animal' 'cat'

// ctx.req.param() is equivalent to:
const pageUrl = await jiant.get.pageURL()
const params = jiant.parse.pathRegExp(
  pageUrl,
  '/page/:category/:tp'
)
// params.category and params.tp hold the same values as above.
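
To illustrate what matching a URL against a pattern such as '/page/:category/:tp' involves, here is a small standalone matcher. It is only a sketch for understanding the idea: matchPath is a hypothetical function, it is not how jiant.parse.pathRegExp or RSSHub are implemented, and it ignores optional and wildcard segments.

```javascript
// Illustrative only (not jiant's implementation): match a URL against a
// pattern whose ':name' segments capture one path segment each.
function matchPath(pageURL, pattern) {
  const url = new URL(pageURL)
  const patternParts = pattern.split('/').filter(Boolean)
  const pathParts = url.pathname.split('/').filter(Boolean)
  if (patternParts.length !== pathParts.length) return null

  const params = {}
  for (let i = 0; i < patternParts.length; i++) {
    const part = patternParts[i]
    if (part.startsWith(':')) {
      // A named segment captures whatever is at this position.
      params[part.slice(1)] = decodeURIComponent(pathParts[i])
    } else if (part !== pathParts[i]) {
      // A literal segment must match exactly.
      return null
    }
  }
  // The query string is parsed separately from the path.
  const query = Object.fromEntries(url.searchParams)
  return { params, query }
}
```

For the URL in the example above, `matchPath('https://example.com/page/animal/cat?name=gaf', '/page/:category/:tp')` yields category 'animal', tp 'cat' and query name 'gaf'.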

jiant.action.click(cssSelector: string | string[], delay: int) -> Promise

In the target page, gets the elements matching the query selectors, then clicks them one by one.

@param: cssSelector can be a selector string or an array of selector strings.
@param: delay is the number of milliseconds to wait after every click. Default is 0.

await jiant.action.click('span.expand', 100)
// In the target page, clicks every span element with class expand, one by one,
// waiting 100ms after each click.

await jiant.action.click(['span.expand', 'span.expand > button'], 10)

jiant.action.pushURL(url: string, noWait: false) -> Promise

jiant.action.goBack(noWait: false) -> Promise

@param: noWait: if true, do not wait for document.readyState.

Makes the target page navigate to a new URL, or go back to the previous one.

By default, the Promise waits until the target page has loaded, based on document.readyState.
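
The two methods are often used as a pair: open a detail page, read something, then return to the list. The sketch below shows that flow. visitAndReturn and its parse callback are hypothetical names, not part of jiant, and jiant is passed in only to keep the example self-contained.

```javascript
// Hypothetical helper (not part of jiant): visit `url`, run `parse` on the
// cheerio-wrapped page, then go back even if parsing throws.
async function visitAndReturn(jiant, url, parse) {
  await jiant.action.pushURL(url) // resolves once the new page is ready
  try {
    const $ = await jiant.get.$()
    return parse($)
  } finally {
    await jiant.action.goBack() // resolves once the previous page is ready
  }
}
```

For example, `await visitAndReturn(jiant, link, ($) => $('h2.title').first().text())` would return the detail page's title and leave the list page open again.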

jiant.action.sleep(duration: int) -> Promise

@param: duration is in milliseconds.

// Equivalent to:
function sleep(ms) {
  return new Promise((res) => setTimeout(res, ms))
}

jiant.action.scroll(x: int, y: int) -> Promise

Scroll target page to (x, y) in px.

jiant.action.scrollDownSmooth(y: int, duration: int, steps: int) -> Promise

Scrolls down by a total of y px over a total of duration milliseconds.

The scroll is split into steps smaller actions.

The distance and duration of each small action are slightly randomized.

@param: y total distance to scroll down, in px. Default is 0.
@param: duration total scrolling time, in milliseconds. Default is 0.
@param: steps number of small actions. Default is 5.

await jiant.action.scrollDownSmooth(1000, 2000, 5)
// This is roughly equivalent to the following actions:

let totalDistance = 1000
let totalDuration = 2000
let steps = 5
// 1. Split the 1000px distance into 5 slightly randomized steps.
let distances = someFunctionSplitValueRandom(totalDistance, steps)
// -> [101, 202, 303, 222, 172]

// 2. Split the 2000ms duration into 5 slightly randomized steps.
let durations = someFunctionSplitValueRandom(totalDuration, steps)
// -> [202, 404, 606, 444, 344]

// 3. Run the small actions.
for (let i = 0; i < steps; i++) {
  await jiant.action.scroll(0, distances[i])
  await jiant.action.sleep(durations[i])
}
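
The placeholder someFunctionSplitValueRandom is left undefined above. One possible way to write such a function is sketched here: it gives every step a random weight close to 1 and then scales the weights so the parts add up exactly to the total. The name splitValueRandom and the jitter parameter are hypothetical, and jiant's actual randomization may differ.

```javascript
// Hypothetical stand-in for someFunctionSplitValueRandom (not jiant's code):
// split an integer `total` into `steps` integers that sum exactly to `total`.
function splitValueRandom(total, steps, jitter = 0.3) {
  if (steps < 1) return []
  // Each step gets a weight in [1 - jitter, 1 + jitter).
  const weights = Array.from({ length: steps }, () => 1 + (Math.random() * 2 - 1) * jitter)
  const weightSum = weights.reduce((a, b) => a + b, 0)
  const parts = weights.map((w) => Math.floor((total * w) / weightSum))
  // Flooring loses a few units; give the remainder to the last step.
  const used = parts.reduce((a, b) => a + b, 0)
  parts[parts.length - 1] += total - used
  return parts
}
```

Calling `splitValueRandom(1000, 5)` returns five integers that differ from run to run but always sum to 1000.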

jiant.action.ofetch(...params) -> Promise

@param: params: see the ofetch documentation.

Calls ofetch in the target page.

The request carries the target page's default headers, such as cookies and origin.
The response contains only the result data, without headers or status.

TIP

You can also call the fetch API directly in the script. However, such a request does not carry any headers of the target page, and it is restricted by the server's CORS rules.
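
As a usage illustration, the sketch below requests a JSON list through the target page. It assumes that jiant.action.ofetch forwards its arguments to ofetch unchanged; the '/api/list' endpoint, the page query and the items field are all hypothetical, as is the helper name fetchListPage. jiant is passed in only to keep the example self-contained.

```javascript
// Hypothetical helper (not part of jiant): fetch one page of a JSON list
// through the target page, so the site's cookies are sent with the request.
async function fetchListPage(jiant, page) {
  // '/api/list' and `page` are placeholders for the site's real API.
  const data = await jiant.action.ofetch('/api/list', {
    method: 'GET',
    query: { page }, // `query` is a standard ofetch option
  })
  // Only the response data is returned, so there is no status code to check.
  // Assumption: the API wraps its results in an `items` array.
  return Array.isArray(data && data.items) ? data.items : []
}
```

Because no status is available, checking the shape of the returned data is the main way to detect a failed or unexpected response.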

jiant.action.ofetchCors(...params) -> Promise

@param: params: see the ofetch documentation.

Calls ofetch while bypassing CORS.

The request is sent with empty headers, without origin and the like.
The response contains only the result data, without headers or status.
