
Add Python tutorial: Build Your Own Search Engine (inverted index + TF-IDF)#882

Open
Osamaali313 wants to merge 1 commit into practical-tutorials:master from Osamaali313:add-search-engine-tutorial


@Osamaali313


Adds a new project-based tutorial under Python → Miscellaneous:

[Build Your Own Search Engine (inverted index + TF-IDF)](https://github.com/Osamaali313/build-your-own-search-engine)

It's a step-by-step tutorial that builds a working full-text search engine
from scratch using only the Python standard library (no NLTK, Whoosh,
NumPy, or any other third-party package). The guided README walks through:

  1. a tokenizer (text → normalized terms),
  2. an inverted index (term → postings with term frequencies),
  3. boolean queries (AND / OR / NOT), and
  4. TF-IDF ranking (idf = log(N/df), where N is the number of documents and df is the number of documents containing the term),

wired behind a small CLI that indexes a folder of .txt files. The code is
split into small, focused modules and covered by a 20-test unittest suite.
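The first three steps (tokenizer, inverted index, boolean queries) can be sketched in a few lines of standard-library Python. This is a minimal illustration, not the code from the linked repo: the names `tokenize`, `build_index`, `postings`, `boolean_and`, `boolean_or`, and `boolean_not` are hypothetical, and the tokenizer assumes plain lowercase alphanumeric splitting with no stemming or stop-word removal.

```python
import re
from collections import Counter, defaultdict

# Assumed normalization: lowercase, keep runs of ASCII letters/digits.
# No stemming or stop-word removal.
TOKEN_RE = re.compile(r"[a-z0-9]+")


def tokenize(text: str) -> list[str]:
    """Turn raw text into a list of normalized terms."""
    return TOKEN_RE.findall(text.lower())


def build_index(docs: dict[str, str]) -> dict[str, dict[str, int]]:
    """Build an inverted index: term -> {doc_id: term frequency}."""
    index: dict[str, dict[str, int]] = defaultdict(dict)
    for doc_id, text in docs.items():
        for term, tf in Counter(tokenize(text)).items():
            index[term][doc_id] = tf
    return dict(index)


def postings(index: dict[str, dict[str, int]], term: str) -> set[str]:
    """Ids of documents whose postings contain `term` (lowercased to match the index)."""
    return set(index.get(term.lower(), {}))


def boolean_and(index, terms: list[str]) -> set[str]:
    """Documents containing every term (intersection of postings)."""
    sets = [postings(index, t) for t in terms]
    return set.intersection(*sets) if sets else set()


def boolean_or(index, terms: list[str]) -> set[str]:
    """Documents containing at least one term (union of postings)."""
    return set().union(*(postings(index, t) for t in terms))


def boolean_not(index, all_ids, term: str) -> set[str]:
    """Documents that do NOT contain the term (complement against all ids)."""
    return set(all_ids) - postings(index, term)


if __name__ == "__main__":
    docs = {
        "a.txt": "The quick brown fox.",
        "b.txt": "The lazy dog sleeps.",
        "c.txt": "A quick dog runs; the fox watches.",
    }
    index = build_index(docs)
    print(index["quick"])                                # {'a.txt': 1, 'c.txt': 1}
    print(boolean_and(index, ["quick", "dog"]))          # {'c.txt'}
    print(sorted(boolean_or(index, ["brown", "lazy"])))  # ['a.txt', 'b.txt']
    print(boolean_not(index, docs, "fox"))               # {'b.txt'}
```

AND, OR, and NOT map directly onto set intersection, union, and difference over document ids, which is why an inverted index makes boolean queries cheap: only the posting lists of the query terms (plus the full id set, for NOT) are touched.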

Checklist (per CONTRIBUTING.md):

  • New tutorial, distinct from the existing "Implementing a Search Engine" blog
    series: this is a fresh, from-scratch, dependency-free repo with full code and
    tests.
  • Placed under the appropriate language/section (Python → Miscellaneous).
  • Format [Title](link); direct link, no URL shortener.
  • Single tutorial in this PR.

Adds a from-scratch, dependency-free (Python stdlib only) project tutorial to
Python → Miscellaneous. It builds a tokenizer, inverted index, boolean queries
(AND/OR/NOT), and TF-IDF ranking step by step, with a guided README, focused
modules, runnable sample documents, and a 20-test unittest suite.
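The TF-IDF ranking step can be sketched as follows, using the idf = log(N/df) formula from the description, where N is the number of indexed documents and df is the number of documents containing the term. This is a minimal sketch under stated assumptions, not the linked repo's code: it uses raw term frequency as the tf weight and simple lowercase alphanumeric tokenization (the tutorial may weight or normalize differently), and all names (`tokenize`, `build_index`, `idf`, `rank`) are hypothetical. A small tokenizer and index builder are included so the block runs on its own.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text: str) -> list[str]:
    """Lowercase and split into alphanumeric terms (assumed normalization)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(docs: dict[str, str]) -> dict[str, dict[str, int]]:
    """Inverted index: term -> {doc_id: term frequency}."""
    index: dict[str, dict[str, int]] = defaultdict(dict)
    for doc_id, text in docs.items():
        for term, tf in Counter(tokenize(text)).items():
            index[term][doc_id] = tf
    return dict(index)


def idf(index: dict[str, dict[str, int]], term: str, n_docs: int) -> float:
    """idf = log(N / df); 0.0 for terms that appear in no document."""
    df = len(index.get(term, {}))
    return math.log(n_docs / df) if df else 0.0


def rank(index, n_docs: int, query: str, top_k: int = 10) -> list[tuple[str, float]]:
    """Score each document as the sum over query terms of tf(t, d) * idf(t)."""
    scores: dict[str, float] = defaultdict(float)
    for term in tokenize(query):
        weight = idf(index, term, n_docs)
        for doc_id, tf in index.get(term, {}).items():
            scores[doc_id] += tf * weight
    # Drop zero scores (e.g. a term present in every document has idf = log(1) = 0),
    # then sort by descending score, breaking ties by doc id.
    ranked = [(d, s) for d, s in scores.items() if s > 0]
    ranked.sort(key=lambda item: (-item[1], item[0]))
    return ranked[:top_k]


if __name__ == "__main__":
    docs = {
        "a.txt": "The quick brown fox.",
        "b.txt": "The lazy dog sleeps.",
        "c.txt": "A quick dog runs; the fox watches.",
    }
    index = build_index(docs)
    for doc_id, score in rank(index, len(docs), "brown fox"):
        print(f"{doc_id}\t{score:.3f}")  # a.txt 1.504, then c.txt 0.405
```

A term that occurs in every document gets idf = log(1) = 0 and contributes nothing to any score, which is the intended effect of the formula: very common words carry little ranking signal, while rare terms like "brown" above dominate the ordering.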

Copilot AI review requested due to automatic review settings June 9, 2026 19:42

Copilot AI left a comment


Pull request overview

Note

Copilot was unable to run its full agentic suite in this review.

Adds another learning resource link to the README’s project/resources list for building a search engine.

Changes:

  • Added a “Build Your Own Search Engine (inverted index + TF-IDF)” link to the README resource list.


Comment thread README.md
Comment on lines 518 to 522
- [Part 1](http://www.ardendertat.com/2011/05/30/how-to-implement-a-search-engine-part-1-create-index/)
- [Part 2](http://www.ardendertat.com/2011/05/31/how-to-implement-a-search-engine-part-2-query-index/)
- [Part 3](http://www.ardendertat.com/2011/07/17/how-to-implement-a-search-engine-part-3-ranking-tf-idf/)
- [Build Your Own Search Engine (inverted index + TF-IDF)](https://github.com/Osamaali313/build-your-own-search-engine)
- [Build the Game of Life](https://robertheaton.com/2018/07/20/project-2-game-of-life/)

Labels: None yet
Projects: None yet


2 participants