
Deduplication

Built-in deduplication

Because a monitoring task runs at a fixed interval once continuous checking starts, it will inevitably fetch items it has already seen.

Before fetched items are saved, each one is checked for duplication by its contentHash value.

Unless you set it manually, contentHash is generated automatically from the item's title, link, and description.

If the same monitoring task has already saved an item with the same contentHash, the new item is skipped instead of being saved.
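
The check described above can be pictured with a small sketch. This is only an illustration of the idea, not Jiant's actual implementation: the helper names `contentHashOf` and `saveNewItems`, the use of SHA-256, and the in-memory `Set` standing in for a task's saved hashes are all assumptions.

```js
const { createHash } = require('node:crypto')

// Hypothetical helper: use a manually set contentHash if present,
// otherwise derive one from title, link, and description.
// (SHA-256 is an assumption; the real algorithm is not documented here.)
function contentHashOf(item) {
  if (item.contentHash) return item.contentHash
  const parts = [item.title, item.link, item.description].map(v => v ?? '')
  return createHash('sha256').update(JSON.stringify(parts)).digest('hex')
}

// Hypothetical save step: `seen` stands in for the hashes already saved
// by one monitoring task. Items whose hash is already known are skipped.
function saveNewItems(seen, fetched) {
  const saved = []
  for (const item of fetched) {
    const hash = contentHashOf(item)
    if (seen.has(hash)) continue // duplicate: skipped when saving
    seen.add(hash)
    saved.push({ ...item, contentHash: hash })
  }
  return saved
}
```

Because the hash depends only on those three fields, two fetches of an unchanged item map to the same value, while setting contentHash yourself lets you decide what counts as "the same item".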

Manually skip

Even with built-in deduplication, parsing the same pages repeatedly wastes time and resources, and can get you banned by a website's rate limits.

We strongly recommend skipping duplicate items manually in your script with jiant.get.prevItem.

If a duplicate item is found, reuse the previously saved one and skip parsing.

WARNING

Remember to comment this check out while debugging; otherwise the script will always load previously saved items.

js
// script in side panel
const $ = await jiant.get.$()
let items = []

// get all links from index list page
let links = $('ul.example-list li')
  .toArray()
  .map(e => $(e).find('a').attr('href'))

for (const link of links) {
  // look for a previously saved item with the same link
  let prevItem = await jiant.get.prevItem({ link })
  // if found, reuse the previous item and skip to avoid duplicate parsing
  if (prevItem) {
    items.push(prevItem)
    continue
  }

  // if not found, navigate to the detail page
  await jiant.action.pushURL(link)

  // wait for the page to load, then get its content
  let n$ = await jiant.get.$()

  // parse more info on the detail page
  let title = n$('h2.title').first().text()
  let description = n$('div.main-text').first().text()
  items.push({
    title, description, link
  })
  // try next link
}

return { items }
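
As an alternative to commenting the check out while debugging, the lookup can sit behind a switch. The sketch below is a suggestion under assumptions: `SKIP_DUPLICATES` and `findPrevItem` are hypothetical names, and `lookup` is any function you pass in, for example `q => jiant.get.prevItem(q)`.

```js
// Hypothetical toggle: set to false while debugging so every link is parsed again.
const SKIP_DUPLICATES = true

// Hypothetical wrapper around the lookup.
// `lookup` is assumed to behave like jiant.get.prevItem: it takes { link }
// and resolves to the saved item, or to nothing if there is none.
async function findPrevItem(lookup, link, enabled = SKIP_DUPLICATES) {
  if (!enabled) return null // debugging: pretend nothing was saved
  return (await lookup({ link })) ?? null
}
```

Inside the loop, `let prevItem = await findPrevItem(q => jiant.get.prevItem(q), link)` would then replace the direct call, and flipping the constant turns manual skipping off without touching the rest of the script.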

Disable built-in deduplication

If you do want to save duplicate items, set DUPLICATE_CHECK_IN_MS in the configuration (JSON) of the specific task.