JUST IN: Things - What You Need To Know
This was originally published on the SFO Museum Mills Field weblog, in February 2026.
This is a blog post about tools for machine-learning related tasks that you can run on MacOS devices like your laptop or a Mac mini: devices which may not be "cheap" but are certainly "affordable" when compared to the current crop of leading (or bleeding) edge alternatives. If you're not interested in the technical details, the best way to think about these tools is that they enable machine-learning related features and functionality using open or low-cost alternatives to the big commercial vendors. These are not "silver bullet" tools. Rather, they endeavour to be part of a set of building blocks for creating an infrastructure that preserves and guarantees the cultural heritage sector some agency in our work.
The first tool, called embeddingsdb, is a follow-up to our last blog post: Similar object images derived using the MobileCLIP computer-vision models. That blog post discussed using the MobileCLIP models to generate "embeddings", or mathematical representations, for images of objects in the Aviation Collection. The embeddingsdb tool, as the name suggests, is a database for storing, indexing and querying those embeddings. It's actually not so much a "real" database as it is an interface or protocol for a specific set of actions SFO Museum wants to perform with embeddings, an implementation of that protocol using the DuckDB database and its Vector Similarity Search (VSS) extension, and a service layer for interacting with the database (protocol) using the gRPC networking framework. This is what SFO Museum used to generate the lists of "similar" objects which are now displayed on every object page on the Aviation Collection website.
The second tool is actually a series of updates and improvements to an existing tool: the previously-named WallPaper Swift library, which has been renamed Docent. We first wrote about this library in the Registrar – Experiments with Apple's on-device machine-learning frameworks and the WallLabel – Experiments with Apple's open source machine-learning frameworks blog posts. The new Docent library includes updates to the underlying mlx-swift packages, which perform the heavy lifting of machine-learning number-crunching on Apple's "silicon" processors and enable the use of more "open-weight" models; a new "Summarizer" sub-package for generating fixed-length text summaries; and a new docent tool for exposing the library's functionality from the command-line or a gRPC client/server session.
embeddingsdb
"Vector" databases are a specialized kind of database tailored for indexing and querying machine-learning embeddings; the long lists of numbers that are used to "depict" a text or an image as a mathematical representation. There are a number of different vector databases to choose from but for the initial release of embeddingsdb we chose to use the DuckDB database. DuckDB is not a traditional vector database but supports storing, querying and indexing vector embeddings through the use of its Vector Similarity Search (VSS) extension. Additionally, it is possible to embed DuckDB itself as a library in a stand-alone application (like embeddingsdb) which means we can bundle all the dependencies and requirements for an embeddings-focused service into a single application.
Although the VSS extension loads, and operates on, all vector data in memory the embeddingsdb tool makes a point of exporting those data to disk every time they are updated. This allows the raw vector data to be used again not only if or when the embeddingsdb tool restarts but also in any other instance of DuckDB (which enjoys broad platform support).
embeddingsdb calls itself an "opinionated package for storing, indexing and querying vector embeddings" which means that many of the concepts are informed by the needs and goals of SFO Museum. Specifically, vector embeddings are assigned and grouped by the following properties:
- provider – The source, or context, of the data for which embeddings are generated. For example, the sfomuseum-data-media-collection data repository.
- depiction_id – The primary identifier of the data for which embeddings are generated. For example, image ID 1779489551.
- subject_id – The primary identifier for the subject that the depiction ID depicts (and for which embeddings are generated). For example, object ID 1511924829.
- model – A unique identifier of the machine-learning model used to generate embeddings for the depiction. For example, apple/mobileclip_s0.
These properties, in addition to the vector embeddings themselves, make up the core of what constitutes a stored "record" in the database. They allow us to store multiple embeddings (generated using different models) for the same depiction, to store data from multiple sources (providers), to query them jointly or separately, and to scope queries at both subject (object) and depiction (image) levels of granularity. Despite being SFO Museum-focused we think that embeddingsdb is broadly applicable to many other cultural heritage collections. As of this writing embeddingsdb does not support embeddings with different dimensionalities but I expect that, out of necessity, it will in the near future.
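A record grouped by those properties can be sketched, in Go, as something like the struct below. The type and field names here are illustrative, not the package's actual API; the example IDs are the ones mentioned above.

```go
package main

import "fmt"

// Record mirrors the properties embeddingsdb uses to group embeddings.
// This is a sketch for illustration, not the package's actual type.
type Record struct {
	Provider    string    // e.g. "sfomuseum-data-media-collection"
	DepictionID int64     // the thing embeddings are derived from, e.g. an image
	SubjectID   int64     // the thing the depiction depicts, e.g. an object
	Model       string    // e.g. "apple/mobileclip_s0"
	Embeddings  []float32 // the vector itself
}

// Key returns a unique identifier for a stored embedding. Because the
// model is part of the key, the same depiction can have one row per model.
func (r Record) Key() string {
	return fmt.Sprintf("%s:%d:%d:%s", r.Provider, r.SubjectID, r.DepictionID, r.Model)
}

func main() {
	r := Record{
		Provider:    "sfomuseum-data-media-collection",
		DepictionID: 1779489551,
		SubjectID:   1511924829,
		Model:       "apple/mobileclip_s0",
	}
	fmt.Println(r.Key())
	// → sfomuseum-data-media-collection:1511924829:1779489551:apple/mobileclip_s0
}
```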
The first version of embeddingsdb was written in Swift and used the duckdb-swift package. It works, and is useful to know about for DuckDB-backed applications which need to run in an iOS environment, but query times remain, as of this writing, very slow. After confirming the query-time issues were specific to the Swift library itself, embeddingsdb was rewritten in Go, which means that it can build and run on all major platforms (Windows, MacOS, Linux) and requires no external dependencies since, as previously mentioned, DuckDB is included as an embedded resource.
The embeddingsdb tool can be accessed from the command-line, from library code in Go or from a gRPC client (assuming the corresponding server endpoint is running). For example:
$> ./bin/embeddingsdb-server \
-server-uri 'grpc://localhost:8081?database-uri={database}' \
-database-uri 'duckdb:///usr/local/data/embeddings' \
	-verbose
2026/01/17 06:24:58 DEBUG Verbose logging enabled
2026/01/17 06:24:58 DEBUG Set up database
2026/01/17 06:24:58 DEBUG Statically linked VSS extension installed and loaded
2026/01/17 06:24:58 DEBUG Load database from path path=/usr/local/data/embeddings
2026/01/17 06:24:58 DEBUG IMPORT DATABASE '/usr/local/data/embeddings'
2026/01/17 06:25:40 DEBUG Finished setting up database time=41.931554166s
2026/01/17 06:25:40 DEBUG Set up database export timer path=/usr/local/data/embeddings
2026/01/17 06:25:40 DEBUG Set up listener
2026/01/17 06:25:40 DEBUG Set up server
2026/01/17 06:25:40 DEBUG Allow insecure connections
2026/01/17 06:25:40 INFO Server listening address=localhost:8081
Note the longer-than-desired start time as the embeddingsdb server re-imports data that had previously been exported to disk. And then:
$> ./bin/embeddingsdb-client similar-by-id \
	-provider sfomuseum-data-media-collection \
	-depiction-id 1527858087 \
	-client-uri 'grpc://localhost:8081' \
	| jq -r '.[]["depiction_id"]'
1527858091
1527858093
1880320457
1880320459
1880320639
1914676715
1914058931
1880273579
1880319239
1964039457
There might eventually be a simpler HTTP client/server framework as well. It wouldn't be very hard to implement but it hasn't been a priority since we have chosen to standardize, internally, on the gRPC framework for these services. More examples and comprehensive documentation are available in the sfomuseum/go-embeddingsdb package on our GitHub account:
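A simpler HTTP wrapper of the kind alluded to above could look something like the sketch below: a plain net/http handler mirroring the flags the gRPC client accepts. Everything here is hypothetical; the handler path, query parameters and the stubbed similarByID function are assumptions, and a real handler would call into the database layer rather than return hard-coded IDs.

```go
package main

import (
	"encoding/json"
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
	"strconv"
)

// similarByID is a stand-in for an embeddingsdb "similar-by-id" query.
// A real implementation would query the database layer.
func similarByID(provider string, depictionID int64) []int64 {
	return []int64{1527858091, 1527858093}
}

// similarHandler exposes similar-by-id style lookups over plain HTTP,
// using query parameters named after the gRPC client's flags.
func similarHandler(w http.ResponseWriter, r *http.Request) {
	provider := r.URL.Query().Get("provider")
	id, err := strconv.ParseInt(r.URL.Query().Get("depiction-id"), 10, 64)
	if err != nil {
		http.Error(w, "invalid depiction-id", http.StatusBadRequest)
		return
	}
	json.NewEncoder(w).Encode(similarByID(provider, id))
}

func main() {
	// Exercise the handler against a throwaway test server.
	srv := httptest.NewServer(http.HandlerFunc(similarHandler))
	defer srv.Close()

	rsp, err := http.Get(srv.URL + "/similar-by-id?provider=sfomuseum-data-media-collection&depiction-id=1527858087")
	if err != nil {
		panic(err)
	}
	defer rsp.Body.Close()

	body, _ := io.ReadAll(rsp.Body)
	fmt.Print(string(body)) // → [1527858091,1527858093]
}
```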
Docent
Docent describes itself as a Swift package for "museum-related" tasks using on-device machine learning (large language) models. The definition of "museum-related" is vague and debatable but SFO Museum is a museum and these tools target things we do, so that's where things stand today. There is also no "rocket science" in anything the Docent tools do, so the best way to think about them is as reference implementations and demonstrations for how to do LLM tasks locally on consumer-grade hardware, or how to compile LLM-based tools down to stand-alone applications that don't require managing complicated dependency trees.
Docent supports using the built-in "Foundation" models that ship with AppleOS 26 devices as well as models available from HuggingFace, which are manipulated using Apple's MLX framework. Being able to use models with the MLX framework is important because it increases the pool of models to choose from (and the ability to deploy specific models for specific tasks), does not require either AppleOS 26 or enabling "Apple Intelligence" system-wide, and facilitates long-running background tasks (discussed below).
It bears repeating: There is no "magic" in the Docent tools. In many ways they are little more than boilerplate code and scaffolding around purpose-specific prompts (instructions) for large language models: derive structured data from museum wall label text, generate fixed-length summaries from arbitrary texts and make both services available from the command-line, library code and over gRPC.
Neither of these services is part of our day-to-day operations yet, either. Deriving structured data from wall labels is part of a larger, still-kind-of-fuzzy project I wrote about in the Registrar – Experiments with Apple's on-device machine-learning frameworks blog post, and the need to generate fixed-length summaries was prompted by the hard limit on inputs (77 tokens) imposed by the MobileCLIP models when generating text embeddings. Did we really need to write a whole suite of tools just to accommodate the limits of one particular set of models, particularly when there are many other models that will accommodate longer texts? Maybe not, and maybe even probably not.
That being said, given that we are already using the MobileCLIP models for image embeddings it does feel worth investigating and understanding their text-based counterpart embeddings, and it forced us to upgrade the mlx-swift dependencies which in turn opened up the availability of newer and improved models (like Olmo3) that have demonstrated much better results parsing wall label texts into structured data. A fixed-length summary generator might not have any immediate user-facing applications but it does feel useful as a tool for testing models and, with a little human feedback, might prove even more useful as a tool for training and refining models.
Here is a silly example using the gRPC client and server to summarize each paragraph in this blog post. This is what that looks like from the server-side as the model attempts to reduce everything it sees down to 77 characters or less:
$> docent grpc-server --verbose=true
2026-02-10T09:43:10-0800 debug org.sfomuseum.docent.grpcd: [DocentModels] Load model mlx-community/Olmo-3-7B-Instruct-8bit
2026-02-10T09:43:19-0800 debug org.sfomuseum.docent.grpcd: [DocentModels] Loading mlx-community/Olmo-3-7B-Instruct-8bit 100.0% complete
2026-02-10T09:43:24-0800 info org.sfomuseum.docent.grpcd: [docent] listening for requests on 127.0.0.1:8080
2026-02-10T09:44:49-0800 debug org.sfomuseum.docent.grpcd: [docent] Summarize text with 20 retries
2026-02-10T09:44:49-0800 debug org.sfomuseum.docent.grpcd: [Summarizer] Summarize text 1/20 text length is 710
2026-02-10T09:44:52-0800 debug org.sfomuseum.docent.grpcd: [Summarizer] Time to summarize text 2.3380210399627686 seconds
2026-02-10T09:44:52-0800 debug org.sfomuseum.docent.grpcd: [Summarizer] Summarize text 2/20 text length is 162
2026-02-10T09:44:53-0800 debug org.sfomuseum.docent.grpcd: [Summarizer] Time to summarize text 1.5390290021896362 seconds
2026-02-10T09:44:53-0800 debug org.sfomuseum.docent.grpcd: [Summarizer] Summarize text 3/20 text length is 102
2026-02-10T09:44:54-0800 debug org.sfomuseum.docent.grpcd: [Summarizer] Time to summarize text 1.1588139533996582 seconds
2026-02-10T09:44:54-0800 debug org.sfomuseum.docent.grpcd: [Summarizer] Summarize text 4/20 text length is 84
2026-02-10T09:44:55-0800 debug org.sfomuseum.docent.grpcd: [Summarizer] Time to summarize text 1.0774099826812744 seconds
2026-02-10T09:44:55-0800 debug org.sfomuseum.docent.grpcd: [Summarizer] Time to summary text with 4 attempts, 6.113394021987915 seconds
2026-02-10T09:44:55-0800 info org.sfomuseum.docent.grpcd: [docent] Time to summary text 6.11341392993927 seconds
... and so on
And again on the client-side:
$> foreach line ( "`cat post.txt`" )
foreach? docent grpc-summarize --max_retries=20 $line | jq -r .body
foreach? end
Affordable ML tools on Mac help old orgs manage projects cheaply.
SFO Museum uses image + DuckDB & gRPC for fast aviation search.
Docent: updated on-device ML with summarization & CLI tools.
Database for storing and retrieving embeddings.
DuckDB VSS stores ML embeddings in the embeddingsdb app.
VSS loads data in memory; embeddingsdb saves updates for shared DuckDB use.
embeddingsdb organizes SFO Museum vector embeddings by relevant properties.
Data provider: museum dataset ID for data whose embeddings are generated.
Subject ID identifies the main subject depicted.
ML model identifier for text embedding
DB stores multi-model embeddings with flexible, variable dimensions.
Embeddingsdb rewritten in Go for fast, cross-platform use with DuckDB.
Access embeddingsdb via CLI, Go code, or gRPC (if server is running).
Server delay occurs when embeddingsdb re-imports earlier exported data.
gRPC prioritized over new HTTP framework; see GitHub for details.
GitHub repo for storing embeddings in a database.
Guides tours or explains exhibits
Docent: Swift LLM toolkit for local museum app demos with on-device AI.
Docent uses MLX for flexible ML, avoiding OS 26 & Apple Intelligence.
Docent tools use LLM prompts for museum data extraction and summaries.
Upgraded MLX: better text embeddings and models for improved AI training.
GitHub repository for Docent project by sfomuseum
Preparing, assembling, and operating software systems.
Tools need signed Apple installers; packaging issues occurred.
DuckDB needs a special macOS build for embeddingsdb due to security settings.
Docent needs Metal library; docs needed until automation.
DuckDB builds with extensions, set up as per README.
Resolved Apple app issues to save time for others.
Grpc-server fails on Apple MLX due to background service issues.
Careful readers will note that a) I just finished saying that embeddingsdb does NOT support embeddings with "variable dimensions" and b) The docent grpc-server command DOES work running as a background service using the MLX framework. Summarizing the text above yields the following:
$> docent grpc-summarize --max_retries=20 ~/Desktop/summary.txt | jq
{
  "body": "SFO Museum uses affordable Mac ML tools for fast aviation analysis and demos.",
  "model": "mlx-community/Olmo-3-7B-Instruct-8bit",
  "attempts": 8
}
Careful readers will note that SFO Museum does NOT do "aviation analysis", whatever that is, fast or otherwise. More examples and comprehensive documentation are available in the sfomuseum/Docent package on our GitHub account:
Building, packaging and running
Both of these tools can be deployed (built and run) from source. For reasons specific to our circumstances we deploy them from signed and notarized Apple installer packages (which also happens to be a good way to demonstrate confidence that other people can run these tools with a minimum of hassle). In both cases this requirement introduced packaging challenges that needed to be overcome:
- The embeddingsdb tool required that DuckDB be compiled from source with custom build instructions to preload the VSS extension. By default DuckDB loads extensions at runtime, downloading (and caching) them from the DuckDB servers. The problem is that while those extensions are signed they are signed by a different "team" than the embeddingsdb tool itself, which triggers all kinds of alarm bells in MacOS.
- The docent tool required that it be distributed with an embedded copy of the Apple Metal library used by the MLX framework. I only have a surface-level understanding of Apple's Metal libraries so until it's demonstrated otherwise I will assume there are reasons why all of the hoop-jumping around Metal and Metal-derived applications is the way it is. I can live with that, but I wish it were all better documented until the whole thing can be automated away.
In order to build our copy of embeddingsdb we compile DuckDB as a library, from scratch, with the VSS and JSON extensions preloaded and then make sure the Go code uses our custom build. In order to build our copy of docent we track down the required dependencies and manually include them with our packaging instructions making the whole process repeatable but not generally applicable. Detailed instructions for both processes are included at the end of the README documents in the sfomuseum/go-embeddingsdb and sfomuseum/Docent packages respectively.
I mention those things not because they are specifically interesting in the context of machine-learning tools but because they are both things which might manifest themselves in any number of circumstances when building and distributing applications targeting Apple devices. We have figured out how to deal with these things (at least until they change again) and wanted to bring them to the attention of others to save them the time and burden of having to decipher it all anew.
There is still one unresolved issue, as of this writing: It is not possible to run the docent grpc-server tool as a long-running, system-level background service with the built-in AppleOS Foundation models (it works fine using MLX-backed models). It is also not possible to run as a background service, assuming the role of a user-level but logged-out user, when started using Apple's launchctl application controller. It is unclear why exactly, as there are few if any diagnostic errors to interpret. It appears that:
- Access to the Foundation models are unavailable to "system" users?
- System-level tools are unable to access the folder, in another user's "home" directory, where those models are stored without disabling system-wide access controls?
- The Foundation models themselves are encrypted, or otherwise locked, using a human user's credentials which are only present at login?
Or some combination of all three? Hopefully these restrictions will be lifted soon or, if nothing else, the ambiguity around what's happening will be cleared up. Until then this issue is considered to be a "known known" without any immediate remedy.
This blog post is full of links.
#docent