Entries tagged with search
I've been working on some new features for Scout and thought they might be worth a short blog post. The super-short version is that Scout now supports complex filtering on metadata, adding another layer of filtering on top of the full-text search. Additionally, I've added support for SQLite FTS5, which Scout uses by default when it's available, falling back to FTS4 otherwise.
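To illustrate the FTS5-or-FTS4 fallback, here is a minimal sketch of how the choice could be made with Python's standard sqlite3 module. It is not Scout's actual code: the function name `best_fts_module` and the probe table name are made up for the example, and the approach (try to create a throwaway virtual table and see whether SQLite complains) is only one way to detect the extension.

```python
import sqlite3


def best_fts_module():
    """Return "fts5" if the linked SQLite supports it, else "fts4", else None."""
    # Hypothetical helper: probe an in-memory database so nothing touches disk.
    probe = sqlite3.connect(":memory:")
    try:
        for module in ("fts5", "fts4"):
            try:
                probe.execute(
                    "CREATE VIRTUAL TABLE fts_probe USING %s(content)" % module
                )
            except sqlite3.OperationalError:
                # Raised with "no such module" when the extension is not compiled in.
                continue
            return module
        return None
    finally:
        probe.close()


print(best_fts_module())
```

The result depends entirely on how the SQLite library behind your Python build was compiled.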
Back in September, word started getting around trendy programming circles about a new file that had appeared in the SQLite fossil repo named json1.c. I originally wrote up a post that contained some gross hacks in order to get pysqlite to compile and work with the new json1 extension. With the release of SQLite 3.9.0, those hacks are no longer necessary.
SQLite 3.9.0 is a fantastic release. In addition to the much anticipated json1 extension, there is a new version of the full-text search extension called fts5. fts5 improves performance of complex search queries and provides an out-of-the-box BM25 ranking implementation. You can also boost the significance of particular fields in the ranking. I suggest you check out the release notes for the full list of enhancements.
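As a rough sketch of the BM25 ranking and field boosting, the snippet below uses Python's standard sqlite3 module and assumes the underlying SQLite was built with fts5. The table, the sample rows and the helper name `ranked_titles` are invented for the example. The extra arguments to `bm25()` are per-column weights, in the order the columns were declared, and because `bm25()` returns smaller (more negative) values for better matches, a plain ascending `ORDER BY` puts the best match first.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE VIRTUAL TABLE entry USING fts5(title, body)")
conn.executemany(
    "INSERT INTO entry (title, body) VALUES (?, ?)",
    [
        ("weekend notes", "sqlite is a small database engine with search"),
        ("sqlite search", "notes about databases"),
        ("redis tricks", "caching pages in memory"),
        ("nginx setup", "basic auth and ssl"),
        ("django snippets", "faceting with solr"),
    ],
)


def ranked_titles(query, title_weight=1.0, body_weight=1.0):
    """Return matching titles, best match first, using weighted BM25."""
    sql = (
        "SELECT title FROM entry WHERE entry MATCH ? "
        "ORDER BY bm25(entry, ?, ?)"
    )
    return [row[0] for row in conn.execute(sql, (query, title_weight, body_weight))]


# Boosting the title column favours the entry with "sqlite" in its title.
print(ranked_titles("sqlite", title_weight=10.0))
# Boosting the body column favours the entry that mentions it in the body.
print(ranked_titles("sqlite", body_weight=10.0))
```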
This post will describe how to compile SQLite with support for json1 and fts5. We'll use the new SQLite library to compile a Python driver so we can use the new features from Python. Because I really like pysqlite and apsw, I've included instructions for building both of them. Finally, we'll use peewee ORM to run queries using the new extensions.
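For a quick taste of json1 without any ORM, here is a small sketch that uses Python's standard sqlite3 module rather than peewee, and assumes the SQLite library it links against has the JSON functions available. The table and sample documents are made up for the example. `json_extract()` pulls a value out of a JSON document, and `json_each()` expands an array so its items can be filtered like rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE entry (id INTEGER PRIMARY KEY, metadata TEXT)")
conn.executemany(
    "INSERT INTO entry (metadata) VALUES (?)",
    [
        ('{"title": "FTS5 notes", "tags": ["sqlite", "search"]}',),
        ('{"title": "Redis tricks", "tags": ["redis"]}',),
    ],
)

# Extract a single key from every stored document.
titles = [
    row[0]
    for row in conn.execute(
        "SELECT json_extract(metadata, '$.title') FROM entry ORDER BY id"
    )
]

# Expand the "tags" array and keep the entries carrying a given tag.
tagged = [
    row[0]
    for row in conn.execute(
        "SELECT json_extract(e.metadata, '$.title') "
        "FROM entry AS e, json_each(e.metadata, '$.tags') AS t "
        "WHERE t.value = ? ORDER BY e.id",
        ("sqlite",),
    )
]

print(titles)
print(tagged)
```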
This post is going to be a greatest hits of my open-source libraries and blog posts concerning the use of SQLite with Python. I'll also share a list of some other neat SQLite projects that you may not have heard of before.
SQLite 22.214.171.124 contains a new, experimental version of the full-text search extension named FTS5. Reviewing the documentation for FTS5, I saw that it includes a couple cool enhancements, namely a more sophisticated query language, and built-in BM25 result ranking.
I decided to give it a try and thought I'd share my notes on compiling the extension in case anyone else is curious.
In my continuing adventures with SQLite, I had the idea of writing a RESTful search server utilizing SQLite's full-text search extension. You might think of it as a poor man's ElasticSearch – a very, very poor man.
So what is this project? Well, the idea I had was that instead of building out separate search implementations for my various projects, I would build a single lightweight search service I could use everywhere. I really like SQLite (and have previously blogged about using SQLite's full-text search with Python), and the full-text search extension is quite good, so it didn't require much imagination to take the next leap and expose it as a web-service.
Read on for more details.
I'm interested in learning to use ElasticSearch, so I thought I'd document how I set it up on my EC2 instance. Because I wanted to write code on my laptop, I needed to expose ElasticSearch over the public internet, which added a bit of extra complexity. Here is a rough outline of the process:
- Install ElasticSearch on my EC2 instance.
- Use supervisor to manage the ElasticSearch process.
- Use Nginx to create a publicly-visible HTTP endpoint with:
  - Basic auth
  - Self-signed SSL cert
- Install python locally and connect to my EC2 ElasticSearch server.
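The last step of the outline, talking to the server from a laptop, might look roughly like the sketch below. It uses only the Python standard library instead of a dedicated ElasticSearch client, and everything specific in it is an assumption: the host name, the credentials, the certificate path and the helper names `build_request` and `fetch_json` are placeholders.

```python
import base64
import json
import ssl
import urllib.request


def build_request(base_url, username, password, path="/"):
    """Build a request carrying an HTTP Basic auth header."""
    credentials = "%s:%s" % (username, password)
    token = base64.b64encode(credentials.encode("utf-8")).decode("ascii")
    return urllib.request.Request(
        base_url.rstrip("/") + path,
        headers={"Authorization": "Basic " + token},
    )


def fetch_json(request, cert_path):
    """Send the request, trusting the self-signed cert stored at cert_path."""
    # Hostname checking stays on, so the certificate must name the host used.
    context = ssl.create_default_context(cafile=cert_path)
    with urllib.request.urlopen(request, context=context) as response:
        return json.loads(response.read().decode("utf-8"))


# Placeholder host and credentials; nothing is sent until fetch_json is called.
request = build_request("https://search.example.com", "user", "secret")
print(request.full_url)
```

Calling `fetch_json(request, "/path/to/selfsigned.crt")` would then perform the actual request against the Nginx endpoint.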
In this post I will show how to use SQLite full-text search with Python (and a lot of help from peewee ORM). We will see how to index content for searching, and how to order search results using two ranking algorithms.
Last week I migrated my site from Postgresql to SQLite. I had been using Redis to power my site's search, but since SQLite has an awesome full-text search extension, I decided to give it a try. I am really pleased with the results, and being able to specify boolean search queries is an added plus. Here is a brief overview of the types of search queries SQLite supports:
- Simple phrase: peewee would return all docs containing the word peewee.
- Prefix queries: py* would return docs containing Python, pypi, etc.
- Quoted phrases: "sqlite extension"
- NEAR: peewee NEAR sqlite would return docs containing the words peewee and sqlite with no more than 10 intervening words. You can also specify the max number of intervening words, e.g. peewee NEAR/3 sqlite.
- NOT: sqlite OR postgresql AND NOT mysql would return docs about high-quality databases (just trollin).
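To make the query types above concrete, here is a minimal sketch that runs a few of them against an FTS4 table using Python's standard sqlite3 module. It assumes the SQLite library was built with FTS3/FTS4 support; the table, the sample documents and the `search` helper are invented for the example. The boolean operators are left out on purpose, because their syntax depends on whether SQLite was compiled with the enhanced query syntax.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE VIRTUAL TABLE docs USING fts4(content)")
conn.executemany(
    "INSERT INTO docs (content) VALUES (?)",
    [
        ("peewee is a small ORM that works well with sqlite",),
        ("python packages are published on pypi",),
        ("the sqlite extension for full-text search",),
        ("peewee supports sqlite and postgresql",),
    ],
)


def search(query):
    """Return the docids of matching rows, in insertion order."""
    sql = "SELECT docid FROM docs WHERE docs MATCH ? ORDER BY docid"
    return [row[0] for row in conn.execute(sql, (query,))]


for query in (
    "peewee",                # simple term
    "py*",                   # prefix query
    '"sqlite extension"',    # quoted phrase
    "peewee NEAR sqlite",    # at most 10 intervening words
    "peewee NEAR/3 sqlite",  # at most 3 intervening words
):
    print(query, search(query))
```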
Check out the full post for details on adding full-text search to your project.
Users of djangosnippets.org may have noticed the addition of a few search-related features over the past several months. I'd like to highlight some of the additions that have been made and show how you can implement similar functionality on your sites. All of the search on djangosnippets.org leans on Apache Solr, a powerful search engine built on top of the Apache Lucene full-text search library. Haystack is the search solution for Django apps: it provides a querying interface similar to Django's ORM, handles indexing your models for you, and supports advanced features like "more-like-this" and faceting.