ReadNext: A Personal Papers Recommender
Every day, approximately 500 new papers are published in the cs category on arXiv, with tens of them in cs.AI alone. Amid the recent craze around Generative AI, I found it increasingly challenging to keep up with the rapid influx of papers. Distilling the ones that were most relevant to my work and my employer’s interests became a daunting task.
ReadNext was born out of these pressing needs:
- The necessity for a command-line tool, one that could be executed directly or scheduled as a cron job.
- The requirement to access the latest papers from arXiv.
- The integration with Zotero, an excellent tool for managing academic papers.
- The ability to propose a selection of x potentially relevant papers based on my current research focus.
- The proposed papers should be accessible from the command line.
- The proposed papers should also be easily uploaded to Zotero.
The key focus is to recommend papers that align with my evolving interests and research objectives, which may change on a daily basis and need to be continuously accounted for.
Install
You can easily install the ReadNext command line tool using pip:
pip install readnext
Requirements
ReadNext relies on one fundamental external service to support the proposed user workflow:
Zotero: Zotero serves as the primary paper management tool, playing a pivotal role in ReadNext’s workflow. To configure ReadNext on your local computer, you need a Zotero account. If you do not already have one, refer to the Zotero Account section below to create one.
By integrating Zotero, ReadNext helps in discovering papers that align with your research interests and focus.
The second piece of the command line tool is the embedding system. It creates the embeddings used to recommend papers according to your current research focus. By default, ReadNext uses the BAAI/bge-base-en model from Hugging Face. Optionally, you can use the Cohere embedding service instead. Processing with Cohere can be a little faster, depending on your local machine, but it requires an additional dependency.
Also note that the quality of the two systems is comparable. In my experience, about 80% of the propositions are the same, and the remaining 20% that differ yield no major difference in accuracy. However, I like the BAAI/bge-base-en propositions a little better.
Zotero Account
If you do not have a Zotero account yet, you can create one here for free. This will give you a basic account with 200MB of space, sufficient to get started. Afterwards, you can install the desktop application, mobile apps, and necessary browser plugins to fully integrate Zotero into your digital environment.
Take the time to refer to their extensive online documentation to get to know its full potential.
Cohere Account (Optional)
For Cohere, you will have to create an account and log in to their Dashboard. The service is completely free for the volume and size of requests required by ReadNext.
Configure
ReadNext currently needs to be configured using the environment variables of your terminal session. The following configuration options need to be set:
Option | Description |
---|---|
COHERE_API_KEY | Cohere API Key as created in their Dashboard here |
ZOTERO_LIBRARY_ID | Your personal library ID as defined in Zotero’s backend. This ID appears here as Your userID for use in API calls is 750 |
ZOTERO_LIBRARY_TYPE | Type of library: user or group |
ZOTERO_API_KEY | Your Zotero API Key. It needs to be created and managed here. |
CHROMA_DB_PATH | This is the local path where you want the embedding database management system to save its indexes (ex: /Users/me/.readnext/chroma_db/ ) |
EMBEDDING_SYSTEM | This is the embedding system you want to use. One of: BAAI/bge-base-en (local) or cohere . |
MODELS_PATH | This is the local path where you want the model files to be saved on your local file system (ex: /Users/me/.readnext/models/ ) |
DOCS_PATH | This is the local path where you want the PDF files of the papers from arXiv to be saved locally (ex: /Users/me/.readnext/docs/ ) |
RECOMMENDATIONS_PATH | This is the local path where you want the recommended papers to be saved locally (ex: /Users/me/.readnext/recommendations/ ) |
Setup Environment Variables
For Windows
- Open the Command Prompt or PowerShell.
- Use the setx command to create a new environment variable permanently. For example, to set the ZOTERO_API_KEY variable, you can use the following command:
setx ZOTERO_API_KEY "your_zotero_api_key_here"
The changes will take effect after you open a new command prompt or restart your computer.
For macOS and Linux
- Open the Terminal.
- Use the export command to set the environment variable for the current session. For example, to set the ZOTERO_API_KEY variable, use the following command:
export ZOTERO_API_KEY="your_zotero_api_key_here"
This will set the ZOTERO_API_KEY variable for the current session only.
To make the variable permanent, add the export command to your shell’s configuration file. For example, if you’re using Bash, add the line to the ~/.bashrc or ~/.bash_profile file. If you’re using Zsh, add it to the ~/.zshrc file. You can do this with a text editor or by using the following command:
echo 'export ZOTERO_API_KEY="your_zotero_api_key_here"' >> ~/.bashrc
Replace ~/.bashrc with the appropriate file name if you’re using a different shell configuration.
After saving the environment variables, they will be available in your command line sessions, and any application that relies on them, such as ReadNext, will be able to access the configured values. Remember to restart your command line or open a new session after making changes to ensure the environment variables take effect.
You can verify that the environment variables are set by running the env command in your terminal session.
Here is what the full export looks like:
export COHERE_API_KEY=""
export ZOTERO_LIBRARY_ID=""
export ZOTERO_API_KEY=""
export ZOTERO_LIBRARY_TYPE="user"
export EMBEDDING_SYSTEM="BAAI/bge-base-en"
export CHROMA_DB_PATH="/Users/[MY-USER]/.readnext/chroma_db/"
export MODELS_PATH="/Users/[MY-USER]/.readnext/models/"
export DOCS_PATH="/Users/[MY-USER]/.readnext/docs/"
export RECOMMENDATIONS_PATH="/Users/[MY-USER]/.readnext/recommendations/"
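Before running the tool, it can help to confirm that nothing in the export list above was left empty. The following is a minimal sketch, not part of ReadNext itself: the function name missing_settings is hypothetical, and treating COHERE_API_KEY as required only when EMBEDDING_SYSTEM is set to cohere is an assumption based on Cohere being described as optional.

```python
import os

# Variable names are taken from the configuration table above.
REQUIRED_VARS = [
    "ZOTERO_LIBRARY_ID",
    "ZOTERO_LIBRARY_TYPE",
    "ZOTERO_API_KEY",
    "CHROMA_DB_PATH",
    "EMBEDDING_SYSTEM",
    "MODELS_PATH",
    "DOCS_PATH",
    "RECOMMENDATIONS_PATH",
]


def missing_settings(env=None):
    """Return the names of the variables that are unset or empty.

    Hypothetical helper: ReadNext does not necessarily ship such a check.
    """
    env = os.environ if env is None else env
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    # Assumption: the Cohere key only matters when Cohere is selected.
    if env.get("EMBEDDING_SYSTEM") == "cohere" and not env.get("COHERE_API_KEY"):
        missing.append("COHERE_API_KEY")
    return missing


if __name__ == "__main__":
    for name in missing_settings():
        print(f"Missing or empty: {name}")
```

Run as a script, it prints one line per missing variable and prints nothing when the configuration is complete.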
How it works
ReadNext is a command line tool that can be used to generate personalized paper recommendations based on your research interests. It is designed to be used as a daily routine to help you discover new papers that are relevant to your research focus.
The tool is designed to be used in conjunction with Zotero, a free and open-source reference management software to manage your research library. ReadNext will use your Zotero library to identify your research interests and focus, and will propose papers that are relevant to your research focus.
ReadNext is designed to be used as a daily routine. It will propose a list of papers that are relevant to your research focus, and will save them in a dedicated collection in your Zotero library. You can then review the proposed papers and decide which ones you want to read. Once you have read a paper, you can move it to another collection in your Zotero library, and ReadNext will learn from your feedback to improve the quality of the proposed papers.
ReadNext is designed to be used with an embedding system. It uses the system to generate embeddings for the papers in your Zotero library, and uses these embeddings to identify papers that are similar to your research focus. Two embedding systems are currently supported: a local one using the BAAI/bge-base-en model from Hugging Face, and a remote one using the Cohere embedding service.
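To make the idea of "similar to your research focus" concrete, here is a toy sketch of embedding-based ranking using cosine similarity. It is an illustration under stated assumptions, not ReadNext's actual implementation: the function names, the scoring rule (best match against any focus paper), and the two-dimensional vectors are all hypothetical, and the real tool relies on its embedding model and embedding database for this step.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_candidates(focus_vectors, candidates, top_k=10):
    """Rank candidate papers by their best similarity to any focus paper.

    focus_vectors: list of embeddings for the papers in a "Focus" collection.
    candidates: dict mapping a paper title to its embedding.
    """
    if not focus_vectors:
        return []
    scored = []
    for title, vector in candidates.items():
        best = max(cosine_similarity(vector, f) for f in focus_vectors)
        scored.append((best, title))
    scored.sort(reverse=True)  # highest similarity first
    return [title for _, title in scored[:top_k]]


# Toy 2-dimensional embeddings; real embeddings have hundreds of dimensions.
focus = [[1.0, 0.0], [0.9, 0.1]]
new_papers = {
    "paper A": [1.0, 0.1],
    "paper B": [0.0, 1.0],
    "paper C": [0.5, 0.5],
}
print(rank_candidates(focus, new_papers, top_k=2))
```

Because the ranking depends only on the contents of the focus collection at the time of the run, adding or removing papers from that collection changes which new papers score highest.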
ReadNext is designed to be used with arXiv, a free service that provides access to scientific papers in the fields of mathematics, physics, astronomy, computer science, quantitative biology, statistics, and quantitative finance. ReadNext will use arXiv to identify the latest papers in your research focus, and will propose them to you as part of your daily routine.
The intended user flow is the following:
- As a Zotero user, I will create one or multiple “Focus” collections in my Zotero library. Those are the collections where I will add the papers that are the most interesting to my current research. It is expected that the content of those collections will change over time as my research focus and interests evolve.
- On a daily basis, I will run readnext in my terminal, or I will create a cron job to run it automatically for me.
  - ReadNext will fetch the latest papers from arXiv
  - ReadNext will identify the papers that are relevant to my research focus, as defined in Zotero
  - ReadNext will propose the relevant papers to me and add them to Zotero in a dedicated collection where proposed papers are saved
- I will go in Zotero, start to read the proposed papers, and if any are of a particular interest I will add them to one of the “Focus” collections
- ReadNext will learn from my feedback to improve the quality of the proposed papers
Now, let’s see how to actually do this.
Usage
Help
At any time, you can get contextual help for any command like this:
readnext --help
readnext personalized-papers --help
Those commands will tell you which arguments and options are available for each command.
arXiv categories and subcategories
You can get the full list of arXiv categories and subcategories by running the following commands:
readnext arxiv-top-categories
readnext arxiv-sub-categories
Those are the categories where you can get specific new papers from arXiv.
Getting new paper proposals
To get new paper proposals, run the personalized-papers command. That command requires two arguments:
- category: the arXiv top-level category or subcategory from which you want to get new paper proposals
- zotero_collection: the name of the Zotero collection where your papers of interest are stored. This is what we refer to as the “Focus” collection above. The name of the collection is case sensitive and must be written exactly as it appears in Zotero.
You also have three options available:
- --proposals-collection: tells ReadNext to save the proposed papers in Zotero, in the collection given as the option value. If you don’t use this option, ReadNext only prints the proposed papers in the terminal and does not save them in Zotero, which is the default behaviour.
- --with-artifacts / -a: tells ReadNext to save the artifacts (the PDF files of the papers and their summaries) into Zotero. This is the recommended workflow, but it requires a lot more space in your Zotero account, so you will most likely need to subscribe to one of their paid plans.
- --nb-proposals: tells ReadNext how many papers to propose. The default value is 10.
The following command will propose 3 papers from the cs.AI category, based on the Readnext-Focus-LLM collection in my Zotero library, and save them in Zotero in the Readnext-Propositions-LLM collection with all related artifacts:
readnext personalized-papers cs.AI Readnext-Focus-LLM --proposals-collection=Readnext-Propositions-LLM --with-artifacts --nb-proposals=3
As you can see, you can easily create a series of topics you want paper proposals around, where each topic is defined by a set of specific papers that you read and found important for your research.
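If you follow several topics, one way to run them all from a single scheduled job is a small wrapper script. The sketch below only assembles the personalized-papers command line from the arguments and options described above and hands it to subprocess. The names build_command, run_all, and TOPICS are hypothetical, and it assumes the readnext executable is on your PATH with the environment variables already set.

```python
import subprocess


def build_command(category, focus_collection, proposals_collection=None,
                  with_artifacts=False, nb_proposals=None):
    """Assemble the argument list for one `readnext personalized-papers` run."""
    cmd = ["readnext", "personalized-papers", category, focus_collection]
    if proposals_collection:
        cmd.append(f"--proposals-collection={proposals_collection}")
    if with_artifacts:
        cmd.append("--with-artifacts")
    if nb_proposals is not None:
        cmd.append(f"--nb-proposals={nb_proposals}")
    return cmd


# One entry per topic: (arXiv category, "Focus" collection, proposals collection).
TOPICS = [
    ("cs.AI", "Readnext-Focus-LLM", "Readnext-Propositions-LLM"),
]


def run_all(topics, runner=subprocess.run):
    """Run ReadNext once per topic; `runner` can be swapped out for testing."""
    for category, focus, proposals in topics:
        runner(build_command(category, focus, proposals, nb_proposals=3),
               check=True)


if __name__ == "__main__":
    run_all(TOPICS)
```

Passing the arguments as a list avoids shell quoting problems with collection names, and check=True makes the script stop with an error if one of the runs fails.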
Here is what the output of the personalized-papers command looks like in the terminal:
Here is what the proposed papers look like in Zotero:
Future Work
Here is a list of future work that could be done to improve ReadNext after the initial release:
- Adding an Abstraction Layer for Multiple Embedding Services: Currently, ReadNext supports only two embedding systems, the local BAAI/bge-base-en model and Cohere. LangChain could serve as a potential abstraction layer to support multiple different embedding services.
- Expanding Paper Sources with an Abstraction Layer: ReadNext aims to integrate additional paper sources beyond just arXiv. This will be facilitated by implementing an abstraction layer for seamless integration.
- Enhancing Test Coverage: To improve testing, we will go beyond testing utility functions and incorporate mocks for external services, ensuring comprehensive test coverage.
- Interactive Configuration via Command Line Tool: We plan to augment the command line tool’s functionality, allowing users to configure it directly from the command line using appropriate prompts and interactions.
- Refining Paper Selection Process: Currently, every time readnext is executed, it retrieves today’s latest papers from arXiv, identifies relevant papers based on current interests, and matches them against the personal research focus. To further enhance this functionality, we may introduce additional capabilities, such as restricting proposed papers to today’s papers only.
Contributions
We welcome contributions to ReadNext! If you’d like to contribute, please follow these steps:
- Fork the repository on GitHub.
- Create a new branch with a descriptive name:
git checkout -b feature/your-feature-name
- Make your changes and commit them:
git commit -m "Add feature: your feature name"
- Push your changes to your fork:
git push origin feature/your-feature-name
- Submit a pull request to the main branch of the original repository.