Clarity.ngram
Description
Task that aggregates n-grams across the selected document set, using textacy. There’s no need to specify final on this task: any n-gram that occurs at least min_freq times will show up in the final result.
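
The aggregation described above can be pictured with a minimal, standard-library-only sketch. This is not ClarityNLP's or textacy's implementation: the function name `aggregate_ngrams`, the whitespace tokenizer, and the sample documents are all assumptions made for illustration. It counts word n-grams across every document in a set and keeps those that reach the minimum frequency, returning rows shaped like the task's `text` and `count` result fields.

```python
from collections import Counter


def aggregate_ngrams(documents, n=2, min_freq=1):
    """Count word n-grams across all documents and keep the frequent ones.

    Hypothetical helper for illustration only; tokenization is a naive
    lowercase whitespace split, unlike a real NLP pipeline.
    """
    counts = Counter()
    for text in documents:
        tokens = text.lower().split()
        # Slide a window of size n over the token list.
        for i in range(len(tokens) - n + 1):
            counts[" ".join(tokens[i:i + n])] += 1
    # Keep only n-grams seen at least min_freq times across the whole set.
    return [
        {"text": ngram, "count": count}
        for ngram, count in counts.most_common()
        if count >= min_freq
    ]


# Invented sample documents.
docs = [
    "patient is a white female",
    "patient is a white male",
]
result = aggregate_ngrams(docs, n=3, min_freq=2)
for row in result:
    print(row["text"], row["count"])
```

With `min_freq=2`, only the trigrams shared by both sample documents survive; the trigrams that occur once are dropped.
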
Example
define demographicsNgram:
  Clarity.ngram({
    termset: [DemographicTerms],
    "n": "3",
    "filter_nums": false,
    "filter_stops": false,
    "filter_punct": true,
    "min_freq": 2,
    "lemmas": true,
    "limit_to_termset": true
  });
Arguments
| Name | Type | Required | Notes |
|---|---|---|---|
| termset | termset | No | |
| documentset | documentset | No | |
| cohort | cohort | No | |
| n | int | No | Default = 2 |
| filter_nums | bool | No | Default = false; Exclude numbers from n-grams |
| filter_stops | bool | No | Default = true; Exclude stop words |
| filter_punct | bool | No | Default = true; Exclude punctuation |
| lemmas | bool | No | Default = true; Converts word tokens to lemmas |
| limit_to_termset | bool | No | Default = false; Only include n-grams that contain at least one term from termset |
| min_freq | int | No | Default = 1; Minimum frequency for n-gram to return in final result |
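
To make the filter arguments concrete, here is a small sketch of how a single candidate n-gram might be accepted or rejected. It is an assumption-laden illustration, not the task's code: `keep_ngram` and `STOP_WORDS` are hypothetical names, the stop-word list is a toy one, and the exact rules (reject when the n-gram starts or ends with a stop word, or contains a punctuation-only or numeric token) reflect one reading of textacy's n-gram filters that should be checked against the installed textacy version. Term matching for `limit_to_termset` is assumed here to be a case-insensitive, whole-token comparison.

```python
import string

# Toy stop-word list for illustration; a real pipeline uses a much larger one.
STOP_WORDS = {"a", "an", "the", "is", "of"}


def keep_ngram(tokens, terms=(), filter_nums=False, filter_stops=True,
               filter_punct=True, limit_to_termset=False):
    """Return True if the n-gram (a list of tokens) passes every enabled filter."""
    lowered = [t.lower() for t in tokens]
    # Assumed rule: reject n-grams that begin or end with a stop word.
    if filter_stops and (lowered[0] in STOP_WORDS or lowered[-1] in STOP_WORDS):
        return False
    # Assumed rule: reject n-grams containing a punctuation-only token.
    if filter_punct and any(
        all(ch in string.punctuation for ch in t) for t in tokens
    ):
        return False
    # Assumed rule: reject n-grams containing a plain integer or decimal token.
    if filter_nums and any(t.replace(".", "", 1).isdigit() for t in tokens):
        return False
    # Require at least one token to match a term from the termset.
    if limit_to_termset:
        wanted = {term.lower() for term in terms}
        if not any(t in wanted for t in lowered):
            return False
    return True


# Defaults mirror the table above: filter_stops and filter_punct are on.
print(keep_ngram(["is", "white"]))                      # starts with a stop word
print(keep_ngram(["is", "white"], filter_stops=False))  # stop-word filter disabled
print(keep_ngram(["white", "female"], terms=["female"], limit_to_termset=True))
```

Each flag is independent, so an n-gram must pass every enabled check before it is counted toward `min_freq`.
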
Results
| Name | Type | Notes |
|---|---|---|
| text | str | The n-gram detected |
| count | int | The number of occurrences of the n-gram |