Atomic Lead Extractor: Data Mining
Number of processes. This parameter depends on your computer's performance. The default value is 5. You can set a higher number of processes, but keep in mind that this affects how fast the program works, in particular the search speed.
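As a rough illustration of what this setting controls, the sketch below caps how many pages are scanned at the same time with a fixed-size worker pool. It is a hypothetical example, not the program's implementation: threads stand in for the program's processes, and scan_page and scan_all are invented names.

```python
from concurrent.futures import ThreadPoolExecutor

DEFAULT_PROCESSES = 5  # default value of the "Number of processes" setting


def scan_page(url: str) -> tuple[str, int]:
    # Hypothetical placeholder for downloading and analysing one page;
    # here it only returns the URL and its length.
    return url, len(url)


def scan_all(urls, workers: int = DEFAULT_PROCESSES):
    """Scan `urls` with at most `workers` pages handled at the same time."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() returns results in input order even though work runs in parallel
        return list(pool.map(scan_page, urls))


print(scan_all(["https://example.com/a", "https://example.com/b"]))
```

A larger pool lets more pages be handled at once, but each worker also consumes CPU, memory and bandwidth, which is why a sensible value depends on the machine.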
Domain detection. Atomic Lead Extractor searches for similar webpages within a single domain only. You can set how many domain levels to search through, up to a maximum of 5.
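One possible reading of "domain levels" is the number of dot-separated labels of the host name, counted from the top-level domain (level 1 is "com", level 2 is "example.com", and so on). The minimal sketch below assumes that reading; domain_at_level, same_domain and the default of 2 levels are hypothetical and do not come from Atomic Lead Extractor.

```python
from urllib.parse import urlparse

MAX_LEVELS = 5  # upper limit for the number of domain levels


def domain_at_level(url: str, levels: int) -> str:
    """Keep the last `levels` labels of the URL's host name."""
    if not 1 <= levels <= MAX_LEVELS:
        raise ValueError(f"levels must be between 1 and {MAX_LEVELS}")
    host = urlparse(url).hostname or ""  # hostname is returned in lowercase
    return ".".join(host.split(".")[-levels:])


def same_domain(url_a: str, url_b: str, levels: int = 2) -> bool:
    """True if both URLs share the same domain at the given level.

    The default of 2 levels is an assumption made for this sketch.
    """
    return domain_at_level(url_a, levels) == domain_at_level(url_b, levels)


print(same_domain("https://shop.example.com/a", "https://blog.example.com/b"))     # True
print(same_domain("https://shop.example.com/a", "https://blog.example.com/b", 3))  # False
```

With 2 levels, shop.example.com and blog.example.com count as one domain; with 3 levels they are treated as different domains.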
Allowed proportions. These values determine which pages are treated as typical, that is, pages whose element structure matches the selected elements either fully or partially. The following criteria are taken into account:
- Total number of elements in a page: the default value is 100%.
- Number of found elements in a page: the default value is 70%. This means that a webpage can be classified as typical even if it only partially matches the structure of the selected elements.
- HTML class name difference: the default value is 0%, which means that no change to the HTML class of the selected elements is allowed. For example, suppose you select an element with class="top_title" and name it "Header". When searching for similar pages, the program will then detect "Header" elements only if they have class="top_title". If you increase the percentage, the program will also treat a webpage as typical when its "Header" element has class="top_title_red".
- Element offset in the HTML tree structure: the default value is 0%, which means that no displacement of elements within the structure of a typical page is allowed. If you need to extract URLs from pages where the elements are slightly offset, increase the percentage value.
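To make these thresholds more concrete, here is a minimal sketch of a typical-page check. It is an assumption-laden model, not Atomic Lead Extractor's algorithm: page elements are assumed to be already paired with the selected ones by name, the class name difference is measured with Python's difflib.SequenceMatcher, the offset is the share of differing positions in a path of child indices, and the total-elements criterion is left out because its exact comparison is not described above. Element, class_difference, path_offset and is_typical are hypothetical names.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher


@dataclass(frozen=True)
class Element:
    name: str       # label given to the selected element, e.g. "Header"
    css_class: str  # value of its HTML class attribute, e.g. "top_title"
    path: tuple     # assumed model: child indices from the root to the element


def class_difference(a: str, b: str) -> float:
    """0.0 for identical class names, growing toward 1.0 as they diverge."""
    return 1.0 - SequenceMatcher(None, a, b).ratio()


def path_offset(a: tuple, b: tuple) -> float:
    """Share of tree positions that differ between two element paths."""
    length = max(len(a), len(b))
    if length == 0:
        return 0.0
    same = sum(1 for x, y in zip(a, b) if x == y)
    return 1.0 - same / length


def is_typical(template, page, found_min=0.70, class_max=0.0, offset_max=0.0):
    """Return True if enough selected elements are found on `page`.

    Defaults mirror the documented values: 70% found elements,
    0% class name difference, 0% offset.
    """
    if not template:
        return False
    by_name = {e.name: e for e in page}  # assumes elements are paired by name
    matched = 0
    for t in template:
        cand = by_name.get(t.name)
        if cand is None:
            continue  # element not found on this page
        if class_difference(t.css_class, cand.css_class) > class_max:
            continue  # class name changed more than allowed
        if path_offset(t.path, cand.path) > offset_max:
            continue  # element moved more than allowed
        matched += 1
    return matched / len(template) >= found_min


template = [
    Element("Header", "top_title", (0, 1, 0)),
    Element("Price", "price", (0, 1, 1)),
    Element("Phone", "phone", (0, 1, 2)),
    Element("Email", "email", (0, 1, 3)),
]
# Same page layout, but "Header" uses top_title_red and "Email" is missing.
red_header = [Element("Header", "top_title_red", (0, 1, 0))] + template[1:3]
print(is_typical(template, red_header))                 # False: 2 of 4 match
print(is_typical(template, red_header, class_max=0.2))  # True: 3 of 4 match
```

In this model, top_title and top_title_red differ by about 18%, so the default 0% rejects the renamed "Header" and only 2 of 4 elements (50%) match; allowing a 20% class name difference raises that to 3 of 4 (75%), which clears the 70% threshold.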