It’s a brand new year, and I just have a few short resolutions.
1. I will stop building dashboards.
I'm not sure if I'll ever stop building dashboards.
However, I've found that only about one third of the dashboards I've created this year have been independently explored by my stakeholders. More often, I either get no response or have to walk someone through the results in a Zoom call.
Even when I do present the results, they may not be revisited, whether because priorities changed or because it was a one-off strategic analysis rather than a recurring one. It's frustrating to spend so much time on formatting, alignment, and coloring, even though polish is just a small part of the process.
Next year, I plan to make it clear that dashboard building is part of the project, but only after I've already presented the results and there is clear stakeholder enthusiasm. This way, they can understand what I'm delivering before it's presented in a standalone dashboard format.
2. I will take an 80/20 approach to model development, and make everyone else believe this is the right thing to do.
Unfortunately, there is a tendency for everyone involved in the model building process to over-invest in creating a perfect first result rather than just a "beta" version. I know I tend to spend a lot of time fine-tuning and trying to improve the accuracy of my models, even if it's just by a small percentage. On the stakeholder side, there is often an attitude of "everything is important" and "we want it to be as good as possible," even though strategic priorities can change quickly.
However, with a large backlog of potential use cases, I want to get models in front of users efficiently so I can reserve time for iterating on results as needed. And honestly, the twentieth model is typically not much better than the third.
Next year, I plan to tell my stakeholders that good is good enough. I will deliver a validated model that I can defend (with some safeguards), but as quickly as possible, rather than spending too much time trying to perfect it.
3. I will not automate things too early, but I will go back and automate them when I should.
Once again, I am drawn to how much more efficiently I could have used my time this year. When a project is shiny and new, I want nothing more than to build the best possible pipeline. That time is often poorly spent (see: too-good models and too-pretty dashboards), and I kick myself for having spent that time if the automation and finishing touches were never needed in the first place.
But when a project is a month old, I find myself loath to invest the time to fully automate what is “just 5 minutes of work a week.” However, the mental overhead saved by automatic pipelines (not to mention removing the dependency on me) is probably worth more than the simple time savings. Once I have proof that a certain data pipeline has value, I should revisit it and turn it into a semi-robust deployment.
Next year, I will find (buy or build) a lightweight way to automate my pipelines so that it becomes rather easy to build a “good enough” automation for everything. I will dedicate a day a month to cleaning up old pipelines (deprecate, upgrade, or revisit).
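A “good enough” automation doesn't have to be fancy: a scheduled script with logging and a safety net covers most of it. As a minimal sketch (the pipeline name and `refresh_sales_report` function here are hypothetical placeholders, not from any particular tool):

```python
import logging
import traceback
from datetime import datetime

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger("pipelines")


def run_with_safety_net(name, pipeline_fn):
    """Run one pipeline step, log the outcome, and never crash the scheduler."""
    started = datetime.now()
    try:
        pipeline_fn()
        logger.info("%s succeeded in %s", name, datetime.now() - started)
        return True
    except Exception:
        logger.error("%s failed:\n%s", name, traceback.format_exc())
        return False


# Hypothetical weekly refresh; schedule the script via cron or any runner.
def refresh_sales_report():
    pass  # extract, transform, load


ok = run_with_safety_net("sales_report", refresh_sales_report)
```

Wrapping each step this way means a failed pipeline sends me a log entry instead of a silent gap in the data, which is most of what the “5 minutes a week” was really buying.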
4. I will learn about graph databases.
I am making a bet, with zero vested interest and not much evidence, that 2023 is going to be the year graph database applications make major inroads. Beneath all of the “hot” topics in data science and AI/ML are rumblings of less sexy but intuitively useful tools and datasets. Maybe I am talking to the wrong people, but very few of the organizations I have worked with, talked to, or been a part of have deployed significant insights based on graph data.
If you’ve heard of a cool topic, let me know.
Next year, I will use graph-based source data in at least one machine learning project.
5. I will spend more time giving back.
I think one of the most beautiful parts of data science, and indeed any programming domain, is the immense collaboration that exists. I don’t want to romanticize, but between open source repositories and packages that can do almost anything and StackOverflow, it does feel like data science is one of the fields where the most help is offered and traded. Nothing feels quite as good as finding a page that tells me how to do exactly what I was searching for (with a nice copy-paste snippet to follow), or a tool that saves me hours of time.
However, I have come to feel that I am consuming more from the guides, answers, packages, software, etc. than I am currently contributing. And as Einblick develops as a 14-person company, and we begin to ask our data science users and partners for feedback, the response rates are not always encouraging. So for karmic purposes, I will try to spend more time putting into the universe what I hope to receive.
Next year, I will answer more forum questions, add my experiences to answers I have found, star more repositories, and leave more feedback when requested.