Introduction
spoti-dl, a Python-based CLI song-downloading tool, was the first “proper” application that I developed. It served as a proof of concept of my programming skills as a self-taught developer and helped me land my first job. However, it lacked some basic features, most notably parallel downloads for albums and playlists.
I recently added a few new features and rewrote its core functionality in Rust, as I have been enjoying working with Rust’s robust type system, compiler-level error handling and syntax.
Development
Development was relatively smooth for the most part, as the app logic is straightforward: accept and parse the input Spotify link and the CLI flag parameters, then process the downloads. I figured out most things by searching online and experimenting, such as the trait implementations needed to parse CLI flags from `String`s into `enum`s and vice versa. The `lazy_static` macro let me initialize, at runtime, a static `HashSet` containing the characters that are disallowed in file and folder names. I also became more comfortable with trait bounds and experienced the power of generics. I was able to use the following function across all of my download flows, as it accepts any input `P` that can be referenced as a `Path` and any input `S` that can be converted into a `String`:
```rust
pub fn add_metadata<P, S>(
    file_path: P,
    album_art_path: P,
    simple_song: spotify::SimpleSong,
    album_name: S,
) where
    P: AsRef<Path> + Debug,
    S: Into<String>,
{...}
```
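To make the paragraph above more concrete, here is a minimal, std-only sketch of the three patterns it mentions: `FromStr`/`Display` for converting between CLI strings and an enum, a lazily initialized static set of forbidden characters, and a function generic over `AsRef<Path>` and `Into<String>`. The `Codec` enum, its variants, the character set and the `output_path` helper are all hypothetical and not taken from spoti-dl, and `std::sync::OnceLock` (Rust 1.70+) stands in for the `lazy_static` macro.

```rust
use std::collections::HashSet;
use std::fmt;
use std::path::{Path, PathBuf};
use std::str::FromStr;
use std::sync::OnceLock;

/// Hypothetical output-format flag (illustrative variants only).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Codec {
    Mp3,
    Flac,
}

// String -> enum: lets a CLI flag value such as "MP3" be parsed with `.parse()`.
impl FromStr for Codec {
    type Err = String;

    fn from_str(s: &str) -> Result<Self, Self::Err> {
        match s.to_ascii_lowercase().as_str() {
            "mp3" => Ok(Codec::Mp3),
            "flac" => Ok(Codec::Flac),
            other => Err(format!("unsupported codec: {}", other)),
        }
    }
}

// enum -> String: implementing Display also provides `.to_string()`.
impl fmt::Display for Codec {
    fn fmt(&self, f: &mut fmt::Formatter) -> fmt::Result {
        f.write_str(match *self {
            Codec::Mp3 => "mp3",
            Codec::Flac => "flac",
        })
    }
}

/// Characters rejected in file and folder names (hypothetical set),
/// built once on first access, which is the role lazy_static plays.
fn disallowed_chars() -> &'static HashSet<char> {
    static SET: OnceLock<HashSet<char>> = OnceLock::new();
    SET.get_or_init(|| {
        ['<', '>', ':', '"', '/', '\\', '|', '?', '*']
            .iter()
            .copied()
            .collect()
    })
}

/// Hypothetical helper with the same kind of bounds as add_metadata:
/// `dir` may be a &str, String, &Path or PathBuf; `title` a &str or String.
pub fn output_path<P, S>(dir: P, title: S, codec: Codec) -> PathBuf
where
    P: AsRef<Path>,
    S: Into<String>,
{
    // Drop every disallowed character from the title.
    let clean: String = title
        .into()
        .chars()
        .filter(|c| !disallowed_chars().contains(c))
        .collect();
    dir.as_ref().join(format!("{}.{}", clean, codec))
}
```

With these in place, `"mp3".parse::<Codec>()` handles the string-to-enum direction and `Codec::Flac.to_string()` the reverse, while `output_path` accepts both borrowed and owned arguments without any conversions at the call site.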
I mainly struggled with implementing the async logic for downloading songs in parallel, due to my inexperience with writing async code in Rust. I spent a lot of time working through the compiler’s restrictions and Tokio’s `'static + Send` requirements for spawning tasks. Tokio’s multi-threaded runtime uses a work-stealing scheduler, so a task started on one worker thread can be picked up by another; everything the task holds must therefore be `Send`, and it must be `'static` because the spawned task can outlive the function that spawned it. I used `tokio::task::block_in_place` to wrap the `add_metadata` function call, as the lofty crate does not support async.
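`tokio::spawn` places the same kind of bounds on its future as `std::thread::spawn` places on its closure (both must be `Send + 'static`, as must their outputs), so the std-only sketch below illustrates the pattern without pulling in Tokio. The function names and the per-chunk work are hypothetical stand-ins for the real download code.

```rust
use std::sync::Arc;
use std::thread;

/// Hypothetical stand-in for downloading one chunk of songs:
/// it only counts the titles it was handed.
fn process_chunk(_album_name: Arc<String>, songs: Vec<String>) -> usize {
    songs.len()
}

/// Spawns one worker per chunk and returns the total number of songs handled.
/// Everything moved into each closure is owned (an Arc clone and a Vec), which
/// satisfies the `Send + 'static` bounds; capturing a `&str` borrowed from this
/// function's stack instead would be rejected by the compiler.
pub fn spawn_workers(album_name: String, chunks: Vec<Vec<String>>) -> usize {
    let album_name = Arc::new(album_name);
    let handles: Vec<thread::JoinHandle<usize>> = chunks
        .into_iter()
        .map(|chunk| {
            // Cheap clone: only the reference count is incremented.
            let name = Arc::clone(&album_name);
            thread::spawn(move || process_chunk(name, chunk))
        })
        .collect();
    // Wait for every worker, much like awaiting each Tokio JoinHandle.
    handles
        .into_iter()
        .map(|h| h.join().expect("worker thread panicked"))
        .sum()
}
```

Handing each worker an `Arc` clone plus an owned `Vec` is the same approach the parallel-download code later in this post takes with `tokio::spawn`.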
I added a CLI flag that lets users set the number of tasks used for parallel downloads, and made playlists download in batches of 100 songs, since a playlist can contain several thousand songs.
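A minimal sketch of that batching, assuming the batches are processed one after another; `download_in_batches` and its `download_batch` callback are hypothetical names rather than spoti-dl’s actual API, and the callback stands in for the parallel per-batch download logic.

```rust
/// Batch size for playlists, as described above.
const PLAYLIST_BATCH_SIZE: usize = 100;

/// Hands the playlist to `download_batch` in consecutive slices of at most
/// PLAYLIST_BATCH_SIZE songs (the last slice holds whatever is left over),
/// so only one batch is in flight at a time. Returns the number of batches.
pub fn download_in_batches<T, F>(songs: &[T], mut download_batch: F) -> usize
where
    F: FnMut(&[T]),
{
    let mut batches = 0;
    for batch in songs.chunks(PLAYLIST_BATCH_SIZE) {
        download_batch(batch);
        batches += 1;
    }
    batches
}
```

For a 250-song playlist this produces three batches of 100, 100 and 50 songs.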
The following is the core async logic for parallel downloads: calculate the number of songs each task should download, wrap certain values in `Arc`s so that cheap, shareable clones can be passed to each task, split the list of songs into chunks, then spawn the tasks and wait for them to finish their downloads:
```rust
// Never spawn more tasks than there are songs, and always at least one,
// so the division below can never be by zero.
let parallel_tasks: usize = (cli_args.parallel_downloads as usize)
    .min(album.songs.len())
    .max(1);
let songs_per_task = album.songs.len() / parallel_tasks;
let remaining_songs = album.songs.len() % parallel_tasks;

// Wrap shared values in Arcs so each task gets a cheap, reference-counted clone.
let cli_args = Arc::new(cli_args);
let album_art_dir = Arc::new(album_art_dir);
let album_name = Arc::new(album.name);

let mut handles = Vec::with_capacity(parallel_tasks);
let mut start = 0;
for i in 0..parallel_tasks {
    // The first `remaining_songs` tasks each take one extra song.
    let mut end = start + songs_per_task;
    if i < remaining_songs {
        end += 1;
    }
    let songs_chunk = &album.songs[start..end];
    let handle = tokio::spawn(download_songs(
        file_path.clone(),
        cli_args.clone(),
        album_art_dir.clone(),
        album_name.clone(),
        songs_chunk.to_vec(),
    ));
    start = end;
    handles.push(handle);
}

// Wait for every task to finish.
for handle in handles {
    handle.await?;
}
```
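The split arithmetic is easiest to check in isolation. The std-only sketch below pulls it into a hypothetical `chunk_ranges` helper that returns the slice range each task receives, clamping the task count to at least one so that an empty album or a zero-valued flag cannot trigger a division by zero.

```rust
use std::ops::Range;

/// Splits `len` songs across at most `requested_tasks` tasks (hypothetical helper).
/// Each task gets `len / tasks` songs, and the first `len % tasks` tasks
/// take one extra, so the ranges are contiguous and cover 0..len exactly.
pub fn chunk_ranges(len: usize, requested_tasks: usize) -> Vec<Range<usize>> {
    let tasks = requested_tasks.min(len).max(1);
    let per_task = len / tasks;
    let remaining = len % tasks;

    let mut ranges = Vec::with_capacity(tasks);
    let mut start = 0;
    for i in 0..tasks {
        let end = start + per_task + if i < remaining { 1 } else { 0 };
        ranges.push(start..end);
        start = end;
    }
    ranges
}
```

For example, 10 songs over 3 tasks yields `0..4`, `4..7` and `7..10`: the single leftover song goes to the first task.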
Tooling
I dropped Poetry, as it would not be compatible with the Rust bindings, and switched to plain virtual environments for dependency management and Twine for distributing the built wheels.
PyO3 acts as the bridge between the two languages: the parent Python code calls a single exposed Rust function, and PyO3 enables all of the interop between the two systems. Maturin compiles the Rust code into a Python library and also packages both codebases into a distributable Python wheel.
The following is a list of changes I had to make in my `Cargo.toml` and `pyproject.toml` files to ensure that the build process and the pip-installed package worked as intended:

- Maturin did not recognize the project as a mixed Python-Rust project, and hence did not include the Rust code in the distributable Python wheel. Setting `lib.name` in `Cargo.toml` to match the Python source directory (`spotidl`) fixed this error.
- `pyproject.toml` required several modifications. I needed to set the `project.scripts` value to `spoti-dl = "spotidl.main:main"`, partly because the project name (`spoti-dl`) and the Python source directory name were different. I also added `python-packages = ["spotidl"]` under `tool.maturin` to ensure the package was included during the build process. After dropping Poetry, I also had to add my dependencies and the relevant project metadata to their appropriate sections.
- Maturin compiles the Rust code as a library inside the Python source directory. By default it adds an underscore (`_`) to the library’s name, which is quite confusing. I rectified this by configuring the `module-name` value under `tool.maturin`.
I faced several problems when attempting to build Linux wheels using Docker on my M1 MacBook. I easily spent 15-20 hours trying to get the `openssl-sys` crate, the single point of failure, to compile, using both the Python `manylinux` and `maturin` Docker images. I also tried to set up CI/CD using GitHub Actions, but to no avail, as the crate kept failing to compile. You can check the graveyard of my CI’s failed runs here. Eventually, I had to settle for manually compiling wheels on Linux, Mac and Windows and copying them to a folder before publishing them with Twine.
Conclusion
This was a rewarding experience for me, as I learned to process large amounts of data efficiently and sharpened my skills with Rust and Tokio.
When downloading a single song, I saw a 20-25% speed increase and 50% lower memory consumption with the Rust code compared to the original Python version. The development process was smooth, as PyO3 and Maturin are very well documented and provide convenient APIs, which make it incredibly easy to get started with writing FFIs for Python.