Solving problems is fun, especially when it’s done in a concise and elegant way. But, what do you do when the problem is the procedure?
I want to discuss a specific technological issue I personally ran into not too long ago. When I was asked to interact with a directory of .xlsx spreadsheets and retrieve data from them, I (perhaps, naively) assumed that it would be relatively straightforward.
However, the closer I got to the concise elegant solution mentioned above, the further away from solving the original problem I was. This is a common issue that people come across; the choice between doing it now or doing it properly. Doing things the correct way can take time, effort and diligence. It's easy for one of those requirements to be regarded as unnecessary, most commonly, it being time.
So, here's the problem:
Retrieving cell level data from a .xlsx spreadsheet including text content, formatting, notes and any overlay.
So, the problem seems straightforward enough - my weapon of choice will be “Ruby.” I begin by googling for: 'ruby xlsx' where I find the gem "Spreadsheet", aptly named, looks supported and fairly feature-filled. This gem is great. It does everything I need.
Here are some examples of the solutions to the requirements as I found them:
Text content of a cell
Reasonably concise and quite rubyesque. Nice! Background colour
Okay, this one is a little denser. But quite nice nonetheless? We can retrieve the value of any cell's instance variables by passing in a row, column and attribute name. Since the program will rely on the values of these variables, I have a list on hand to make life easier.
Now, my interest has peaked. This looks good! Suddenly spreadsheets make a little more sense, and solving this problem is going to be a breeze. All we need to do is match cell variable values to a key of what we care about and the job is done. Right?
The Spreadsheet Gem does not support .xlsx. After charging in, filled with creative zeal, I failed to properly discount this Gem at the outset. Having researched a little deeper to begin with, I would've found this fantastic image:
Well damn, either I can convert to .xls, which could will get messy, or I can use one of the other gems listed in that helpful table.
Enter Roo. Unlike Spreadsheet, rows and columns start at 1, not 0. Spreadsheet treats them like a typical array, which makes sense in the context of programming, but I should say, this little addition to Roo really is great. Having to constantly add 1 (either in your head or in your code) is an unnecessary irritant. Roo even has some additional methods that Spreadsheet doesn't have, including:
Cell text content is even easier to access than in Spreadsheet.
This time though, I've learned my lesson. I won't be duped by a good initial impression. I search for one of the above items, to ensure that this gem can do all the things I need it to, before properly getting some work done. Here's the search term I used: ‘Roo xls background color’. Enter stack overflow: http://stackoverflow.com/questions/29957262/roo-gem-cell-background-color. To save you a click, there's one answer I found that had been approved with no comments or further answers.
Well damn. Seems like the user ‘diegoesp’ has had some of the same headaches I've had, unfortunately for me though, Spreadsheet is already off my list.
So, what's the problem here, exactly? One solution would not provide access to several of the data fields we need. The other would require engineering a conversion solution - this warrants further investigation. And, if found to work, it could solve the problem at a small scale. But it would be far from concise or elegant.
Think for a second that we're not dealing with one spreadsheet, but potentially hundreds and we want to read in new data every time it's published. This conversion step could be costly, error prone and could cause far more headaches than it’s worth.
So, what's left? Procedure. The problem can be solved by modifying the way in which the spreadsheet is used.
Consider the following two options: Convert all existing spreadsheets to .xls and use that as the defacto format going forward. Or you could avoid using cell level variable values to provide information, this includes cell formatting and overlays.
Both solutions would require a serious overhaul of the current usage, they'd require user training and enforcement to ensure there's no deviation.
Problems are fun to solve, but often the solution turns out to be neither concise nor elegant.
If you have any questions about ECS Digital, don’t hesitate to get in touch!
ECS Digital are leaders in Automation and Digital Transformation. We’ve been helping enterprises deliver software and software-related services faster and at lower cost through the adoption of DevOps and Continuous Delivery practices, since 2003.