From Newsgroup: comp.programming
On 31.10.2023 12:03, Jivanmukta wrote:
I wrote a PHP obfuscator in C++. I want it to check whether the
project being obfuscated contains pornography, satanism, drugs, violence, prostitution, etc. (I don't want to obfuscate such projects). How can I do
this? Where can I get a database of such keywords (English would be best,
but the more languages the better)?
On GitHub I found List-of-Dirty-Naughty-Obscene-and-Otherwise-Bad-Words-master, which
contains files of bad words, one file per language.
But I am not sure how to design my algorithm. For example, a website with
a single occurrence of the word 'sex' is acceptable, but a website in which
20% of the words are bad words is not acceptable.
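To make the "20% of words" measure concrete, here is a minimal sketch of how the share of bad words in a text could be computed. The name bad_word_fraction is hypothetical, and the sketch assumes the word list is already loaded in lower case, that a word is a run of ASCII letters, and that only single words are matched (no phrases, no non-ASCII languages).

```cpp
#include <cctype>
#include <cstddef>
#include <string>
#include <unordered_set>

// Hypothetical helper: returns the fraction (0.0 to 1.0) of words in `text`
// that appear in `bad_words`.
// Assumptions: a word is a maximal run of ASCII letters, matching is
// case-insensitive, and `bad_words` holds lower-case single words.
double bad_word_fraction(const std::string& text,
                         const std::unordered_set<std::string>& bad_words) {
    std::size_t total = 0;
    std::size_t bad = 0;
    std::string word;

    // Count the word collected so far, then reset the buffer.
    auto flush = [&]() {
        if (word.empty()) return;
        ++total;
        if (bad_words.count(word) != 0) ++bad;
        word.clear();
    };

    for (unsigned char c : text) {
        if (std::isalpha(c)) {
            word.push_back(static_cast<char>(std::tolower(c)));
        } else {
            flush();  // any non-letter ends the current word
        }
    }
    flush();  // the text may end in the middle of a word

    // An empty text contains no bad words.
    return total == 0 ? 0.0
                      : static_cast<double>(bad) / static_cast<double>(total);
}
```

With a list containing "sex", the text "Sex and the city" has one bad word out of four. A real checker would probably also need to strip PHP and HTML markup before counting, which this sketch does not do.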
Do you have an idea of an algorithm for my problem?
I have an idea, but I am not sure whether it is OK:
threshold_percentage = 2/3 * avg_percentage_of_bad_words_for_set_of_sample_bad_websites