import arxiv
import chromadb
import os
import typer
from dotenv import load_dotenv
from readnext import __version__
from readnext.arxiv_categories import exists, main, sub
from readnext.arxiv_sync import sync_arxiv
from readnext.embedding import embed_category_papers, download_embedding_model, embedding_system
from readnext.personalize import get_personalized_papers, save_personalized_papers_in_zotero
from rich import print
from typing_extensions import Annotated
main
Imports
The command line interface is built with typer, a library for building command line interfaces. We also use arxiv to query the arXiv search service and display the titles of the articles from the list of IDs proposed by the system.
Otherwise, we import all the internal modules of the project used to implement the different commands of the CLI.
Command line interface
version
The version command displays the currently installed version of ReadNext.
version
version ()
Get the current installed version of ReadNext
You can get the version number of the ReadNext instance installed on your machine by running:
readnext version
Configuration
Display the current configuration of ReadNext.
config
config ()
Get the current configuration of ReadNext
You can display the current configuration options picked up by ReadNext by running:
readnext config
arxiv-top-categories
The arxiv-top-categories command displays the complete list of ArXiv top categories. Note that the categories’ keys are case sensitive.
arxiv_top_categories
arxiv_top_categories ()
Display ArXiv main categories. Keys are case sensitive.
You can get the list of all the top categories by using this command line:
readnext arxiv-top-categories
arxiv-sub-categories
The arxiv-sub-categories command displays the complete list of ArXiv sub categories. Note that the categories’ keys are case sensitive.
The arxiv sub categories are:
arxiv_sub_categories
arxiv_sub_categories ()
Display ArXiv sub categories. Keys are case sensitive.
You can get the list of all the sub categories by using this command line:
readnext arxiv-sub-categories
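Because the keys are case sensitive, a category such as cs.AI is only recognized when it is typed exactly that way. The snippet below is a minimal sketch of such a lookup. The two tables are hypothetical and heavily truncated, and category_exists is an illustrative name, not the function exported by readnext.arxiv_categories.

```python
# Hypothetical, heavily truncated tables; the real lists are much longer.
MAIN_CATEGORIES = {
    "cs": "Computer Science",
    "math": "Mathematics",
    "stat": "Statistics",
}
SUB_CATEGORIES = {
    "cs.AI": "Artificial Intelligence",
    "cs.CL": "Computation and Language",
    "stat.ML": "Machine Learning",
}


def category_exists(category: str) -> bool:
    """Return True if `category` is a known top or sub category.

    The comparison is an exact, case-sensitive match on the key.
    """
    return category in MAIN_CATEGORIES or category in SUB_CATEGORIES


print(category_exists("cs.AI"))  # exact key
print(category_exists("cs.ai"))  # same letters, different case
```

With this kind of lookup, cs.AI matches while cs.ai does not, which is why the keys have to be copied exactly as the two listing commands print them.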
personalized-papers
The personalized-papers command gives a list of personalized papers based on the user’s current research focus. The command has two required parameters and three optional ones:
- category [required] : the ArXiv category to use to query the ArXiv search service. It can be a top or sub category, case sensitive.
- focus_collection [required] : the name of the Zotero collection where all the user’s papers of interest are available for ReadNext.
- proposals_collection [default: “”] : the name of the Zotero collection where the papers proposed by ReadNext will be added.
- with_artifacts [default: False] : if set to True, the artifacts related to the proposed papers (PDF & summary files) will be added to Zotero.
- nb_proposals [default: 10] : the number of papers that will be proposed by ReadNext.
To get new paper proposals, run the personalized-papers command. It requires two arguments:
- category [required] : the arXiv top or sub category from which you want to get new paper proposals.
- focus_collection [required] : the name of the Zotero collection where your papers of interest are stored. This is what we refer to as the “Focus” collection above. The name of the collection is case sensitive and should be exactly as written in Zotero.
Three options are also available:
- --proposals-collection [default: “”] : tells ReadNext to save the proposed papers in Zotero, in the collection named by the argument. Without this option, ReadNext only prints the proposed papers in the terminal and does not save them in Zotero, which is the default behaviour.
- --with-artifacts / -a [default: False] : tells ReadNext to save the artifacts (the PDF file of each paper and its summary) in Zotero. This is the recommended workflow, but it requires a lot more space in your Zotero account, so you will most likely need to subscribe to one of their paid options.
- --nb-proposals [default: 10] : tells ReadNext how many papers to propose.
The following command will propose 3 papers from the cs.AI category, based on the Readnext-Focus-LLM collection in my Zotero library, and save them in Zotero in the Readnext-Propositions-LLM collection with all related artifacts:
readnext personalized-papers cs.AI Readnext-Focus-LLM --proposals-collection=Readnext-Propositions-LLM --with-artifacts --nb-proposals=3
As you can see, you can easily create a series of topics you want paper proposals for, where each topic is defined by a set of specific papers that you read and found important for your research.
personalized_papers
personalized_papers (category:str, focus_collection:str, proposals_collection:typing.Annotated[str,<typer.models.OptionInfo object at 0x7f14ec87f910>]='', with_artifacts:typing.Annotated[bool,<typer.models.OptionInfo object at 0x7f14ec87f6d0>]=False, nb_proposals=10)
Get personalized papers of a focus-collection from an ArXiv category. If the category is all, then all categories that have been locally synced will be used. If --proposals-collection is set, the papers are uploaded to that Zotero collection; otherwise they are only displayed on the command line.
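The signature above maps onto the command line through typer: parameters without a default become positional arguments, and parameters with a default become options. The stub below is a sketch of that mapping only. It echoes the parsed values instead of querying anything, the help strings are made up for illustration, and the second command is a placeholder that is only there so the app expects a sub-command name, as readnext does.

```python
import typer
from typer.testing import CliRunner
from typing_extensions import Annotated

app = typer.Typer()


# Placeholder second command: with more than one command registered,
# typer expects the sub-command name on the command line.
@app.command()
def version():
    """Print a placeholder version string."""
    typer.echo("0.0.0")


@app.command()
def personalized_papers(
    category: str,  # positional argument
    focus_collection: str,  # positional argument
    proposals_collection: Annotated[
        str, typer.Option(help="Zotero collection receiving the proposals")
    ] = "",
    with_artifacts: Annotated[
        bool, typer.Option("--with-artifacts", "-a", help="Also save PDF and summary")
    ] = False,
    nb_proposals: int = 10,  # becomes the --nb-proposals option
):
    """Illustrative stub: echo the parsed arguments."""
    typer.echo(f"category={category} focus={focus_collection} n={nb_proposals}")
    if proposals_collection:
        typer.echo(f"save to {proposals_collection} (artifacts={with_artifacts})")
    else:
        typer.echo("print only")


# Invoke the stub in-process with the same arguments as the example command.
result = CliRunner().invoke(
    app,
    [
        "personalized-papers",
        "cs.AI",
        "Readnext-Focus-LLM",
        "--proposals-collection=Readnext-Propositions-LLM",
        "-a",
        "--nb-proposals=3",
    ],
)
print(result.output)
```

Note that typer derives the command name personalized-papers and the option names --proposals-collection and --nb-proposals from the Python identifiers by replacing underscores with dashes.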
Initialize
Before running the command line application, we have to make sure that the tool is properly initialized. The current initialization steps that are required are:
- Load environment variables
- Make sure that all the configuration options are properly set as environment variables.
- Check that all the required local model artifacts are available on the local file system. If not, download them from their source.
config_check_one_exists
config_check_one_exists (env_vars:list)
Check if at least one of the env_vars environment variables exists
config_exists
config_exists (env_var:str)
Check if the env_var environment variable exists
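The two checks above can be sketched with the standard library alone. This is a minimal illustration, not necessarily how ReadNext implements them. In particular, treating a variable that is set to an empty string as missing is an assumption made here.

```python
import os


def config_exists(env_var: str) -> bool:
    """Return True if `env_var` is set to a non-empty value.

    Assumption: an empty string counts as "not configured".
    """
    return bool(os.environ.get(env_var))


def config_check_one_exists(env_vars: list) -> bool:
    """Return True if at least one of `env_vars` is set."""
    return any(config_exists(name) for name in env_vars)
```

The second helper is useful when several alternative settings can satisfy the same requirement and only one of them has to be present.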
One thing that needs to be validated at initialization time is the shape of the embeddings in ChromaDB. If the user changed the EMBEDDING_SYSTEM setting from one system to another, the number of dimensions will most likely be different. In that case, Chroma won’t be able to load the existing embeddings alongside embeddings of a different dimension. This is why we have to warn the user.
get_embeddings_dimensions
get_embeddings_dimensions (chroma_client, category:str)
Get the embedding dimensions of the given category
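One way to perform this check is to read a single stored embedding and compare its length with the dimension of the currently configured system. The sketch below assumes that the Chroma collection is named after the category, which may not match ReadNext’s actual naming, and dimensions_mismatch is a hypothetical helper added for illustration.

```python
def get_embeddings_dimensions(chroma_client, category: str) -> int:
    """Return the length of one stored embedding, or 0 if none can be read."""
    try:
        # Assumption: the collection is named after the category.
        collection = chroma_client.get_collection(name=category)
        sample = collection.get(limit=1, include=["embeddings"])
        embeddings = sample["embeddings"]
        # Use len() rather than truthiness: the value may be a list or an array.
        if embeddings is None or len(embeddings) == 0:
            return 0
        return len(embeddings[0])
    except Exception:
        # Missing collection or unreadable data: report "unknown" as 0.
        return 0


def dimensions_mismatch(stored: int, expected: int) -> bool:
    """Hypothetical helper: True when stored embeddings exist with another size."""
    return stored != 0 and stored != expected
```

Returning 0 for an empty or missing collection lets the caller distinguish "nothing stored yet", where no warning is needed, from a real mismatch.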
init
init ()
Initialize the application