I’ve been working on a bit of a hodgepodge of features as I near completion of the backend. After implementing the endpoints for the users, all that was left to do for full (albeit still buggy) functionality was to implement the ability to parse recipe lines, and add spaCy back into the equation.
The recipe parser was actually quite easy: I mostly reused the code I’d written for the first version of this app, with some rewriting to make sure it functioned properly in its new setting. I also added some tweaks to catch more of the AllRecipes recipes, since I’d been having issues with them before.
There’s too much code to post all of it here, but to recap: I have two dictionaries that store information related to scraping. The first is for simple cases, where the recipe lines and the title are just in simple container tags (such as `<div>` or `<h1>`). For these, I store the specifics of the required tags and pull them out when necessary. The dictionary keys are matched to the website name:
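In sketch form, that dictionary looks something like this (the site names, tags, and class names here are only illustrative, not the real markup of any of these sites):

```python
# Hypothetical layout of the simple-case dictionary: for each site, the tag
# name and attributes that hold the recipe title and the recipe lines.
SCRAPER_INFO = {
    "foodnetwork": {
        "title": ("h1", {"class": "recipe-title"}),
        "lines": ("li", {"class": "ingredient"}),
    },
    "epicurious": {
        "title": ("h1", {"itemprop": "name"}),
        "lines": ("li", {"class": "ingredient-item"}),
    },
}
```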
Some websites require more complicated ways to access the data, either because the recipe lines don’t have an associated attribute or because there is more than one way the pages are laid out. For these, I have a second dictionary that stores functions that know how to get the recipe from that specific website. For example, here’s the function for allrecipes.com:
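The shape of such a function is roughly this, assuming a BeautifulSoup-style `soup` object for the fetched page; the class names are stand-ins, not AllRecipes’ real markup:

```python
# Sketch of a site-specific scraper for allrecipes.com. AllRecipes has shipped
# more than one page layout, so we try one set of selectors and fall back to
# another if the first finds nothing.
def scrape_allrecipes(soup, url):
    title_tag = soup.find("h1")                                   # recipe title
    lines = soup.find_all("span", class_="ingredients-item-name") # newer layout
    if not lines:
        lines = soup.find_all("span", class_="recipe-ingred_txt") # older layout
    return {
        "name": title_tag.get_text().strip(),
        "recipe_lines": [tag.get_text().strip() for tag in lines],
        "url": url,
    }
```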
Then all I have to do is return the result of the evaluated function:
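So the dispatch amounts to a dictionary lookup plus a call, along these lines (function and variable names are my guesses; the site-specific scraper is stubbed out here so the sketch runs on its own):

```python
# The second dictionary maps a site name to its custom scraper function; the
# generic entry point just looks up the right one and returns its result.
def scrape_allrecipes(soup, url):
    # stubbed here; the real version digs through the parsed page
    return {"name": "", "recipe_lines": [], "url": url}

FUNCTION_SCRAPERS = {"allrecipes": scrape_allrecipes}

def scrape_site(site, soup, url):
    return FUNCTION_SCRAPERS[site](soup, url)
```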
Either way, I return a dictionary with the name of the recipe, the lines in the recipe, and the url I got it from.
This produces a format that works for my schema. Accordingly, I felt that the best place to insert this functionality into the main program was in the `pre_load` of the `RecipeSchema`. That way, the user could pass in a url and get the complete recipe without me having to write a bunch of special cases.
But before that, I wanted to implement spaCy functionality. After all, loading a recipe into the program doesn’t do much when the program can’t tell what the ingredients are.
I created a new `nlp` object in my `__init__.py` file, and wrote a simple function that takes a provided recipe dictionary and determines the ingredients in each line. It also formats the ingredients properly, so that the `IngredientSchema` can recognize them. This way, new ingredients that aren’t in the database are automatically added.
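The exact matching logic isn’t shown here, but as a rough sketch: suppose the ingredients are each line’s noun chunks, minus numbers and measurement words, emitted in a `{"name": ...}` shape for the `IngredientSchema`. The `nlp` object is passed in explicitly so the sketch runs without spaCy installed; in the app it would be the module-level `spacy.load(...)` object:

```python
# Words to strip out of a noun chunk before treating it as an ingredient name.
# This word list and the chunk-based heuristic are assumptions, not the app's
# actual logic.
MEASUREMENTS = {"cup", "cups", "tablespoon", "tablespoons",
                "teaspoon", "teaspoons", "pound", "pounds",
                "ounce", "ounces"}

def determine_ingredients(recipe_dict, nlp):
    ingredients = []
    for line in recipe_dict["recipe_lines"]:
        doc = nlp(line)
        # spaCy exposes noun phrases directly via doc.noun_chunks
        for chunk in doc.noun_chunks:
            words = [t.text for t in chunk
                     if t.text.lower() not in MEASUREMENTS and not t.like_num]
            if words:
                ingredients.append({"name": " ".join(words)})
    return ingredients
```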
This was honestly much easier than last time, because spaCy has a built-in way to determine the size of various tokens. Previously, I had to split the tokens up just to recombine them again later. It’s just another example of how rough my first version was. So much unnecessary work.
But anyway, I put both the scraper and the parser into the `pre_load` section of the `RecipeSchema`, and had it check for a specific tag, `create_from_url`, in order to determine if the recipe needed to be created this way. If not, it simply passed on the provided data.
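In marshmallow terms that hook is a method decorated with `@pre_load`; stripped of the schema machinery (so the sketch runs standalone), the branch looks like this, with `get_recipe_from_url` standing in for the scraper-plus-parser pipeline:

```python
# Shape of the pre_load hook on RecipeSchema, shown as a plain function.
# If the incoming data carries the create_from_url tag, build the full recipe
# dict from the pipeline; otherwise pass the data through untouched.
def pre_load_recipe(data, get_recipe_from_url):
    url = data.get("create_from_url")
    if url:
        return get_recipe_from_url(url)   # scraped dict: name, lines, url
    return data
```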
And with that, all of the basic functionality for my backend is done! It’s still a bit of a mess though, and so I next turned my attention to refactoring.
I want to follow the “DRY” principle as best I can, and looking through my endpoints, I noticed a lot of repeated code. Plus, the endpoint logic in my various `routes` files was complicated and hard to follow. I decided to kill two birds with one stone and move all of the logic for my endpoints into centralized functions that could be reused for all resources. After all, the same basic functions work for every resource, right?
I started with the simplest version of this I could: the humble `GET` request. It’s fairly short, as far as the endpoint functions go, but it still exposes a lot of unnecessary logic.
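The pre-refactor shape was presumably something like the following, with the Flask and SQLAlchemy plumbing replaced by passed-in callables so the sketch runs standalone (names are guesses):

```python
# Roughly the logic a pre-refactor GET endpoint exposes inline: look the
# resource up by id, return a 404 if it doesn't exist, otherwise dump it
# through the schema.
def recipe_get(id_, query_get, schema_dump):
    recipe = query_get(id_)              # e.g. Recipe.query.get(id_)
    if recipe is None:
        return {"message": "Recipe not found"}, 404
    return schema_dump(recipe), 200
```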
In order to generalize the code, I first created a new `utils.py` file for the whole application (though I might rename it to something more apt). I then wrote a function that takes in the model type and the identifier (either the `id_` or, in the case of an ingredient, the `name`) and either returns the resource or raises an error if it couldn’t be found.
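The helper presumably looks something like this, assuming a SQLAlchemy-style `model.query` interface; the error class is a stand-in for however the app actually reports a missing resource:

```python
# Generalized lookup for utils.py: fetch any resource by numeric id_ or, for
# ingredients, by name, raising if nothing matches.
class ResourceNotFound(Exception):
    """Stand-in for the app's 'could not be found' error."""

def get_resource(model, identifier):
    if isinstance(identifier, int):
        resource = model.query.get(identifier)                     # by id_
    else:
        resource = model.query.filter_by(name=identifier).first()  # by name
    if resource is None:
        raise ResourceNotFound(f"{model.__name__} '{identifier}' not found")
    return resource
```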
From there, my `Recipe` “GET” endpoint became just:
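With the lookup centralized, the endpoint body collapses to a fetch-and-dump, roughly like this (dependencies passed in so the sketch runs without Flask; in the real route they’re simply imported):

```python
# The whole refactored GET: look up the resource, dump it through the schema.
def recipe_get(id_, get_resource, model, schema):
    recipe = get_resource(model, id_)
    return schema.dump(recipe), 200
```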
Absolutely beautiful. At this point, I was inspired. I turned my attention to a slightly more challenging problem: the “POST” request.
For reference, here’s my `GroceryList` “POST” request before refactoring:
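Reconstructed from the description, it was along these lines: a marshmallow-style validate/load and a SQLAlchemy-style session, both passed in here so the sketch runs standalone:

```python
# Roughly the pre-refactor POST: validate the payload, bail with a 400 on
# errors, otherwise build the object, add it to the session, commit, and
# return the dumped resource with a 201.
def grocerylist_post(json_data, schema, db_session):
    errors = schema.validate(json_data)
    if errors:
        return {"errors": errors}, 400
    new_list = schema.load(json_data)
    db_session.add(new_list)
    db_session.commit()
    return schema.dump(new_list), 201
```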
Again, it’s not terrible, but there’s still too much low-level logic exposed for my taste. I want the endpoint files to operate almost as a table of contents, directing a reader to other areas of the program where the actual logic takes place.
A “POST” request was a bit trickier, though: different things need to be validated for different resources. But my use of schemas makes this easy; by putting the validation logic there, instead of in the endpoint functions, I can generalize it with a simple dictionary.
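The dictionary itself is presumably just a model-to-schema lookup, along these lines (stand-in classes; the real ones live in the models and schemas modules):

```python
# Stand-ins for the app's real models and schemas
class Recipe: pass
class RecipeSchema: pass
class GroceryList: pass
class GroceryListSchema: pass

# One lookup table lets a single generic POST handler find the right schema
# (and therefore the right validation rules) for any resource.
MODEL_SCHEMAS = {
    Recipe: RecipeSchema,
    GroceryList: GroceryListSchema,
}
```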
From there, I split the actual POST function into two functions. The first loads and validates the new resource, and the second commits it to the database.
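Those two halves might look roughly like this, again assuming a marshmallow-style schema and a SQLAlchemy-style session:

```python
# First half: schema.load both validates the payload and constructs the new
# resource (marshmallow raises a ValidationError on bad input).
def load_resource(schema, json_data):
    return schema.load(json_data)

# Second half: persist the freshly built resource.
def commit_resource(db_session, resource):
    db_session.add(resource)
    db_session.commit()
    return resource
```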
Thus, my new `GroceryList` POST request becomes:
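Roughly this, with the two helpers passed in so the sketch runs standalone; in the app they’d be imported from `utils.py`:

```python
# The refactored POST body: load/validate, commit, dump. All the low-level
# logic now lives behind the two helpers.
def grocerylist_post(json_data, schema, db_session,
                     load_resource, commit_resource):
    new_list = load_resource(schema, json_data)
    commit_resource(db_session, new_list)
    return schema.dump(new_list), 201
```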
Simple and extremely readable.
As of this writing, I am still working on generalizing the “PUT” request. There are a few details about it (namely the fact that not every attribute for the resource should be able to be changed) that are tripping me up, but I will have a writeup on solutions as soon as I finish. I was hoping to have all four in a single blog post, but time gets away from us all. Still, I’m quite pleased with my progress here. The endpoint code is much, much more readable and I feel like I have a better handle on the how and why of my program, rather than just the what.