Deduplication
Built-in deduplication
Because a monitoring task runs at a fixed interval once you start continuous checking, it will inevitably fetch items it has already fetched.
Before fetched items are saved, they are checked for duplicates by the value of contentHash.
If contentHash is not set manually, it is generated automatically from title, link, and description.
If the same monitoring task has already saved an item with the same contentHash, the new item is skipped when saving.
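The check described above can be pictured with a minimal sketch in plain Node.js. Everything here is an assumption made for illustration: the helper names contentHashOf and saveNewItems, the in-memory store, and the choice of SHA-256 are hypothetical, and the tool's actual hashing and storage may differ.

```javascript
const crypto = require('crypto')

// Hypothetical helper: keep a manually set contentHash, otherwise derive
// one from title, link and description (SHA-256 is an assumption).
function contentHashOf(item) {
  if (item.contentHash) return item.contentHash
  const raw = JSON.stringify([item.title || '', item.link || '', item.description || ''])
  return crypto.createHash('sha256').update(raw).digest('hex')
}

// Hypothetical in-memory store: one set of seen hashes per monitoring task.
const seenByTask = new Map()

// Return only the items this task has not saved before; skip the rest.
function saveNewItems(taskId, items) {
  if (!seenByTask.has(taskId)) seenByTask.set(taskId, new Set())
  const seen = seenByTask.get(taskId)
  const saved = []
  for (const item of items) {
    const hash = contentHashOf(item)
    if (seen.has(hash)) continue // same contentHash, same task: skip
    seen.add(hash)
    saved.push({ ...item, contentHash: hash })
  }
  return saved
}
```

Because the set of seen hashes is kept per task in this sketch, the same item fetched by two different tasks is saved once for each task, which matches the "by same monitoring task" rule above.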
Manually skip
Even with built-in deduplication, parsing the same items repeatedly wastes time and resources, and may even get you banned by a website's rate limiting.
We strongly recommend skipping duplicate items manually in your script with jiant.get.prevItem.
If a duplicate item is found, reuse the previous one and skip parsing.
WARNING
Do not forget to comment it out when debugging; otherwise the script will always load previously saved items.
// script in side panel
const $ = await jiant.get.$()
let items = []
// get all links from index list page
let links = $('ul.example-list li')
  .toArray()
  .map(e => $(e).find('a').attr('href'))
for (const link of links) {
  // find previous item with the same link
  let prevItem = await jiant.get.prevItem({ link })
  // if found, return the previous one and skip to avoid duplicate parsing.
  if (prevItem) {
    items.push(prevItem)
    continue
  }
  // if not found, push to the detail page
  await jiant.action.pushURL(link)
  // waiting for page loaded then get the page content
  let n$ = await jiant.get.$()
  // parsing more info on detail page
  let title = n$('h2.title').first().text()
  let description = n$('div.main-text').first().text()
  items.push({
    title, description, link
  })
  // try next link
}
return { items }
Disable built-in deduplication
If you do want to save duplicate items, you can set DUPLICATE_CHECK_IN_MS in the configuration (JSON) of the specific task.
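This section does not say how the value of DUPLICATE_CHECK_IN_MS is interpreted. Going by the name alone, one plausible reading is a time window in milliseconds within which an item with the same contentHash counts as a duplicate. The sketch below models only that reading; the function isDuplicate and the lastSavedAt map are hypothetical, and the real behaviour should be confirmed before relying on it.

```javascript
// Hypothetical check: an item counts as a duplicate only if the same
// contentHash was saved less than windowMs milliseconds before `now`.
function isDuplicate(lastSavedAt, contentHash, now, windowMs) {
  const savedAt = lastSavedAt.get(contentHash)
  return savedAt !== undefined && now - savedAt < windowMs
}

// Hypothetical record of when each contentHash was last saved (ms timestamps).
const lastSavedAt = new Map([['abc', 1000]])
```

Under this assumed model, a window of 0 means no item is ever treated as a duplicate, and a longer window widens the period during which repeats are skipped.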
