Language Detection
Imports
Get Model and Tokenizer Files for the Language Detection Model
We have to download each model_name to the specified model_path. For the given model_name, the function downloads all the appropriate model and tokenizer files to that path. If the specified path does not exist, the function creates it.
download_lang_model
download_lang_model (model_path:str, model_name:str)
Download a Hugging Face language detection model and tokenizer to the specified directory
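The behaviour described above could look roughly like the following. This is a minimal sketch, not the library's actual implementation: the names download_lang_model_sketch and prepare_model_dir are hypothetical, and it assumes the model is a sequence-classification checkpoint that can be loaded with the transformers Auto classes.

```python
from pathlib import Path


def prepare_model_dir(model_path: str) -> Path:
    """Create model_path (including missing parents) if it does not exist."""
    path = Path(model_path)
    path.mkdir(parents=True, exist_ok=True)
    return path


def download_lang_model_sketch(model_path: str, model_name: str) -> None:
    """Hypothetical sketch: fetch a model and tokenizer, then save them locally."""
    # Imported lazily so this module can be imported without transformers installed.
    from transformers import AutoModelForSequenceClassification, AutoTokenizer

    target = prepare_model_dir(model_path)

    # Assumption: model_name is a Hugging Face Hub identifier for a
    # text-classification (language detection) checkpoint.
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForSequenceClassification.from_pretrained(model_name)

    # save_pretrained writes the config, weights and tokenizer files to the directory.
    tokenizer.save_pretrained(target)
    model.save_pretrained(target)
```

Creating the directory with exist_ok=True keeps the call idempotent, so running the download twice against the same path does not raise an error.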
Detect Language
Supported Languages
The languages currently supported are the ones supported by the langdetect module. Supported language codes are:
Load Model & Tokenizer
We load the model and tokenizer that we previously downloaded. We then pass references to both to the detect_language function so that they don't have to be reloaded on every call.
load_model
load_model (model_path:str)
Load a Hugging Face model and tokenizer from the specified directory
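As an illustration, loading from a local directory might be sketched as below. The name load_model_sketch, the (model, tokenizer) return order, and the explicit directory check are assumptions made for this example; the source only documents the load_model (model_path:str) signature.

```python
from pathlib import Path


def load_model_sketch(model_path: str):
    """Hypothetical sketch: load a saved model and tokenizer from model_path."""
    path = Path(model_path)
    if not path.is_dir():
        # Fail early with a clear message instead of a deep library error.
        raise FileNotFoundError(f"Model directory not found: {model_path}")

    # Imported lazily so this module can be imported without transformers installed.
    from transformers import AutoModelForSequenceClassification, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(path)
    model = AutoModelForSequenceClassification.from_pretrained(path)
    model.eval()  # inference mode: disables dropout
    return model, tokenizer
```

Loading once and reusing the returned objects avoids reading the weights from disk for every piece of text that needs to be classified.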
Detect Language
detect_language
detect_language (text:str, model, tokenizer)
Detect the language of a given text
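One plausible way to implement such a function is sketched here. It assumes a sequence-classification model whose config.id2label maps class indices to language codes, and that the function returns the single most likely code; neither detail is confirmed by the source. The names detect_language_sketch and top_label are hypothetical.

```python
import math


def top_label(logits, id2label):
    """Return (label, probability) for the highest-scoring class."""
    peak = max(logits)
    # Subtract the maximum before exponentiating for numerical stability.
    exps = [math.exp(x - peak) for x in logits]
    best = max(range(len(logits)), key=lambda i: logits[i])
    return id2label[best], exps[best] / sum(exps)


def detect_language_sketch(text: str, model, tokenizer) -> str:
    """Hypothetical sketch: classify text and return the predicted language code."""
    # Imported lazily so top_label stays usable without torch installed.
    import torch

    inputs = tokenizer(text, return_tensors="pt", truncation=True)
    with torch.no_grad():  # no gradients needed for inference
        logits = model(**inputs).logits[0].tolist()

    label, _probability = top_label(logits, model.config.id2label)
    return label
```

Because the model and tokenizer are passed in as arguments, the same loaded objects can be reused across many calls, for example when mapping the function over a column of texts.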