Language Detection

Detecting the language of a feed and its articles (some feeds have articles in multiple languages).
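Because a single feed can mix languages, one way to describe a feed is to count the languages detected for its individual articles. The sketch below assumes per-article language codes have already been produced (for example by calling detect_language on each article). The helper name `feed_languages` and its `min_share` threshold are hypothetical and not part of this module.

```python
from collections import Counter


def feed_languages(article_langs: list[str], min_share: float = 0.2) -> list[str]:
    """Return the languages that make up at least `min_share` of a feed's articles.

    Hypothetical helper: aggregates per-article language codes into feed-level
    languages, most frequent first. A monolingual feed yields one code; a
    multilingual feed can yield several.
    """
    if not article_langs:
        return []
    counts = Counter(article_langs)
    total = len(article_langs)
    # most_common() orders codes by descending article count
    return [lang for lang, n in counts.most_common() if n / total >= min_share]
```

For example, a feed with three English articles and one French article is reported as `["en", "fr"]`, while a feed where only 1 article in 10 is German is reported as English only.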

Imports

Get Model and Tokenizer Files for the Language Detection Model

Each model_name must be downloaded to the specified model_path. For the given model_name, the function downloads all the required model and tokenizer files to that path. If the specified path does not exist, the function creates it.


download_lang_model

 download_lang_model (model_path:str, model_name:str)

Download a Hugging Face language detection model and tokenizer to the specified directory
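A minimal sketch of what download_lang_model could look like, assuming the checkpoint is a sequence-classification model on the Hugging Face Hub and that the `transformers` package is installed. The choice of `AutoModelForSequenceClassification` is an assumption; the module's actual implementation may use a different model class.

```python
import os


def download_lang_model(model_path: str, model_name: str) -> None:
    """Download a Hugging Face language detection model and tokenizer to model_path.

    Sketch: assumes `model_name` is a sequence-classification checkpoint on the
    Hugging Face Hub and that `transformers` is installed.
    """
    # Imported lazily so this module can be imported without transformers installed
    from transformers import AutoModelForSequenceClassification, AutoTokenizer

    # Create the target directory (and any missing parents) if it does not exist
    os.makedirs(model_path, exist_ok=True)

    # Fetch the tokenizer and model weights from the Hub...
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForSequenceClassification.from_pretrained(model_name)

    # ...and write all of their files into model_path for later offline loading
    tokenizer.save_pretrained(model_path)
    model.save_pretrained(model_path)
```

Saving both objects with `save_pretrained` means the directory can later be passed straight to `from_pretrained`, with no network access needed at load time.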

Detect Language

Supported Languages

The languages currently supported are the ones supported by the langdetect module. Supported language codes are:

Load Model & Tokenizer

We load the model and tokenizer that we downloaded previously. We then pass references to the model and tokenizer to the detect_language function so that they do not have to be reloaded on every call.


load_model

 load_model (model_path:str)

Load a Hugging Face model and tokenizer from the specified directory
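A minimal sketch of load_model, assuming the directory was populated by download_lang_model, that the model is a sequence-classification checkpoint, and that the function returns a `(model, tokenizer)` pair (the return order is an assumption chosen to match the argument order of detect_language).

```python
def load_model(model_path: str):
    """Load a Hugging Face model and tokenizer from a local directory.

    Sketch: assumes model_path was populated by download_lang_model and holds a
    sequence-classification checkpoint. Returns (model, tokenizer).
    """
    # Imported lazily so this module can be imported without transformers installed
    from transformers import AutoModelForSequenceClassification, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_path)
    model = AutoModelForSequenceClassification.from_pretrained(model_path)
    model.eval()  # inference mode: disables dropout
    return model, tokenizer
```

The intended usage is to call load_model once, for example at startup, and then reuse the returned objects for every article, e.g. `for text in texts: detect_language(text, model, tokenizer)`.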

Detect Language


detect_language

 detect_language (text:str, model, tokenizer)

Detect the language of a given text
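A minimal sketch of detect_language, assuming a sequence-classification model whose `config.id2label` maps class indices to language codes, that PyTorch is installed, and that the function returns the top language code as a string (the real return type is not documented here). The softmax step is split into a separate, hypothetical helper `top_language` so it can be checked without a model.

```python
import math


def top_language(logits: list[float], id2label: dict[int, str]) -> tuple[str, float]:
    """Turn one row of classifier logits into (language_code, probability) via softmax."""
    m = max(logits)  # subtract the max logit for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    best = max(range(len(logits)), key=lambda i: logits[i])
    return id2label[best], exps[best] / total


def detect_language(text: str, model, tokenizer) -> str:
    """Detect the language of `text` with an already-loaded model and tokenizer.

    Sketch: assumes model.config.id2label maps class indices to language codes.
    """
    import torch  # imported lazily; only needed for inference

    inputs = tokenizer(text, return_tensors="pt", truncation=True)
    with torch.no_grad():  # no gradients are needed for inference
        logits = model(**inputs).logits[0].tolist()  # first (only) row of the batch
    label, _prob = top_language(logits, model.config.id2label)
    return label
```

Keeping the probability available from `top_language` makes it easy to flag low-confidence predictions, which is useful for short or mixed-language articles.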