Coffee Sessions #43

Maturing Machine Learning in Enterprise

The definition of Data Science in production has evolved dramatically in recent years. Despite increasing investments in MLOps, many organizations still struggle to deliver ML quickly and effectively. They often fail to recognize an ML project as a massively cross-functional initiative and confuse deployment with production. Kyle talks about both the functional and non-functional requirements of production ML, and the organizational challenges that can inhibit companies from delivering value with ML.

Take-aways

- Data science is still poorly defined, and there is a large variance in organizational maturity.
- Basically, everything we need for mature ML in modern organizations exists technically, except for the strategy, mentality, organization, and governance.
- Organizations that poorly define data science often overburden their data scientists, but there are expectations that data scientists know some engineering.
- Operationalizing data science is not that different from software engineering, and software engineering can be one of the most valuable skill sets for a data scientist.

Transcript

Demetrios: Today, we've got an excellent conversation coming up. Kyle is working on the machine learning platform team at Etsy. He's a software engineer there. We were just talking before we hit record about how he's been burying himself in coding so much that he can't even play his guitar. He has not found any time to play the guitar in the last couple of years, but that is good for us because we don't want to talk about guitar today. We want to talk about MLOps and everything under the sun when it comes to MLOps. And I'm excited to get into how you see the current state of what is happening in this space. Kyle, I think it would be interesting for us to start with your path from being a data analyst, to a data scientist, to an ML engineer, and then a platform software engineer. What did that look like? And what kind of hurdles did you have to overcome as you jumped from one to the next?

Kyle: Yeah, definitely. I would say it's a somewhat non-traditional path. I've said a bunch of times that I tripped and fell into the data science space. I was actually finishing up my master's in molecular and cellular biology when I got this random job as a data analyst. And my job there was to teach myself R and Python to analyze the data that we had at this biotechnology company. After generating some interest in the data space there, I ended up kind of fostering a love for data and data science. I was doing some unsupervised machine learning, got super interested in that, and went to a bootcamp here in New York City. So I did the whole bootcamp route. Then, after some odd jobs, I ended up as a data scientist and then a machine learning engineer at Pfizer. So I went from more analytical and statistical stuff slowly towards the machine learning-driven path. And now, you know, I'm basically purely a software engineer on the infrastructure side. So, very backend.
Demetrios: There's some funny stuff happening right now in the MLOps Community Slack, where a group of people decided they want to share horror stories of what they've had with data scientists when it comes to the code that a data scientist writes. I want to get your take: how hard was it, or what were some things that helped you along the way, when you were learning to go from that data analyst to an actual full-fledged engineer?

Kyle: Yeah. That's a great question, because there is a huge difference in the required skill set and the expected quality of those skill sets. Obviously, I didn't come out of school with a software engineering degree, and I feel like I really lacked some of those fundamental CS basics, you know, data structures and algorithms, which maybe aren't even as important in a real software engineering job, but really build the base for all of the code that you learn beyond there. And so it's kind of been funny learning what you would call, I guess, more advanced data science code, but then having to go back to CS 101 on my own, in my own time, to also build that basis on the side.

Vishnu: I've actually had a very similar experience. I studied bioengineering in my undergrad and master's and, you know, I think sometimes I'm like, man, if only I had an undergrad CS degree. Not that I would change what I studied, but if I could just tack that on really easily, it'd make my life so much easier.

Kyle: Yeah. No, tell me about it. I'm very happy with the path I took, but I just always wish, I'm like, wow, I wish I'd just taken more classes, had more of that knowledge, or at least, you know, paid attention the one time I took a CS class.

Vishnu: Yeah, exactly. I just want one of the freshman or sophomore classes. That's all I need.

Kyle: Yeah, exactly. Exactly. That's what I'm missing.
Vishnu: Got it.

Demetrios: Well, in this production code channel that we have, there was a pretty awesome question that came up, and I want to hear your take on this, Kyle. It's asking: if you were to build a SaaS app right now with some heavy ML services from scratch, what languages and frameworks and libraries would you use for the backend, if you weren't under the gun for time, you had the abilities that you're currently using, and you could really engineer this stuff the way that you wanted? How would you go about doing that?

Kyle: That's a great question. I'm sure I'd give different answers at different points in my career. I mean, for just a language, I might just have to go with Python, just because it's so ubiquitous. It's easy to use. I can write it quickly. I can test it quickly. I'm also writing a lot of Scala lately, and it's awesome, you know, high-performance, but it's much more difficult, comparatively, to do the things that I want to do, just because Python has so much support for so many different things, especially within the machine learning space. But, yeah, I mean, at the end of the day, I feel like I end up writing more YAML than anything else. So, you know, it'd all be templated.

Vishnu: Yeah. The YAML thing is interesting because, you know, I work in ML infrastructure as well, and sometimes it's just so simple. It's so templated. It feels like I'm just doing almost stenography work, where I'm filling out a form. But there's so much that is based on that. It kind of blew my mind recently, when I was really diving into some CI concepts and trying to figure out, okay, how do I make this whole CI pipeline process scalable for machine learning repos, etcetera. I was like, man, this YAML stuff.
It's just something I can do all day, but it's so, I don't know, replicable and crucial to the workflows nowadays.

Kyle: Yeah, it is also everywhere. I mean, I think it's the Python of markup languages, right? You know, it's simple, it's easy to read. Just colons and nothing else, no brackets or anything like that. But kind of every tool is reliant on it. And it's cool. I like it. I think it's easy to read and use. And it's a really nice introduction to a ton of different concepts where templating is key.

Vishnu: So I would like to ask you, what are maybe your top two or three tips for working with YAML in general, YAML files, and the explosion that can happen, that you would offer to beginners, especially data scientists who are saying, okay, I want to get into this MLOps jiu-jitsu. This seems important. How should I do it?

Kyle: Yeah, that's a good question. I mean, find a technology, maybe, that you're comfortable with. I think I started with Docker Compose, you know, just one step above Dockerfiles, pretty simple. But you can do a lot with them, and there's a really easy transition from there to full-scale Kubernetes. I think the only real second big tip that I have would be, you know, no need to go light on the plugins for your IDE. I have a rainbow indent plugin for VS Code that's perfect for YAML. You can just see where things are indented out to, and just let the IDE do the work. You don't want to be trying to apply something and constantly getting errors because it's not in the right format or not the right spacing. That's super frustrating.

Vishnu: Yeah. I actually think tips like that are underrated. That's a great tip, because, I used to say this, taking hard classes in college is way overrated.
No employer is ever gonna be like, hey, Vishnu, you took the hardest math class in your sophomore year, we're going to give you a raise. Nobody ever does that, right? It's the same way with coding, right? Nobody is ever going to say, wow, you did it without any plugins in your VS Code environment, you were really good at this. And I think for beginners, in particular, that's a hard bridge to cross, right? Because people in this data science field are all bright, ambitious, talented people who want to learn how to do things right. And I think Kyle and I are here to tell you that doesn't mean you can't do them faster and use tools to do it.

Demetrios: Ain't no shame in it.

Kyle: No, not at all. Please, take the easy way. Don't try and do everything in Vim or something like that. You have all these tools available to you, and there's definitely no extra points for doing it that way, even if it might save you once or twice in some random scenarios.

Demetrios: Well, speaking of random tools, do you have any others that you really like, or plugins or anything that you can share with us?

Kyle: Let me see, I mean, what's up in my Visual Studio Code right now. I'm a big Visual Studio Code fan. I definitely use that for most everything, and kind of just install plugins as I go. I'm a big fan of the Remote SSH plugin; I do a lot of remote editing. And a lot of the basic ones, you know, the ones that pop up and get suggested as you're editing, and I'm like, I'm going to try this. This is great. I really love those.
Demetrios: Well, let's talk for a minute about this idea that you were mentioning to me before, of where you think the future of machine learning and data science and even machine learning platforms is going, and how you're looking at the bigger picture right now in the ecosystem of machine learning, data science, MLOps, and that whole climate, the current climate that we're in. Where do you see it moving forward?

Kyle: Yeah. It's kind of funny, I guess, because I've seen it through a couple of different phases. We started off and data science was this super immature space. There was just, like, a data science team, but there was no concept, at least formally, of MLOps. And beyond that, on the software engineering side of things, data scientists, I don't know, felt kind of isolated, siloed. It was more so a lot of proof-of-concept projects and things that were really difficult to get off the ground. And then we kind of came to this, you know, MLOps space, which was like, okay, lots of these aren't making it into production. Why aren't they? It's because we don't have the operational tools and processes to get them there. We can't deploy this into production easily, or the data science skill set just isn't conducive to getting them there. Software engineers obviously know how to do this, but there's a disconnect between the two. That handoff was really, really tough, because data scientists and software engineers just weren't speaking the same language. And now I think we're kind of there for a lot of teams. MLOps is, again, ubiquitous as a buzzword. And it's awesome to see it exploding so much, from open-source support to the enterprise platform offerings. But even then, I think there's kind of a need for the step above that, which I would say is something like governance, right?
Where you actually manage your MLOps and you focus on some of the more important non-functional requirements, something like observability, right, which is a software term. It's not something that, I think, has inherently been talked about a lot in MLOps, but observability and visibility, I think, are some of the most important things that you can have. And that is kind of that next phase of governance: how you manage and control all of those operations and processes that you have, and how much visibility you have over them.

Vishnu: Yeah. I agree, first and foremost, with your overall description of the maturity and the process in the entire space. What you landed on that I think is really interesting is, I realize now that a lot of machine learning platforms and a lot of this MLOps and overall platform engineering work has kind of come out of frustration. You know, it's really frustration-driven, in terms of that handoff not necessarily working, groups speaking different languages, having different objectives, and enough people kind of saying, man, this sucks. And I think that's the iteration of this whole space that we're in right now, where a lot of people are saying, okay, this doesn't have to suck. And going forward, the way we're going to see it is it's going to move kind of from an efficiency to a capability, right? I think that is where those things come in, like observability, visibility, being able to scale your whole stack around your model easily, the same way that you do infrastructure as code in another regard. You're going to have your model as code, and I think that's already happening, right? It was with OpenAI basically just exposing GPT-3 as an API and saying, yeah, you know, there are models behind this fence, come use it. I think more companies are going to be able to do things like that.
So, you know, the interesting part here for me, and I'm curious to get your thoughts, is that part of visibility and really understanding the way things work with respect to the model, and, you know, data drift and the question of robustness. That's where I find that maybe the research isn't quite there yet. It's like we're kind of waiting to see how to do that the right way. What has been your experience with that question? Do you think there are maybe some heuristics we can use, or do you think we still have a little bit further to go?

Kyle: I think I agree that we have a little bit further to go. It's kind of funny because, I mean, I'm sure it's been said on this podcast a bunch of times, but, you know, the technology exists somewhere. It's kind of how and when we apply it. I feel like a lot of the decisions that I've had to make day to day aren't like, oh, I need to build an entirely new technology. It's like, okay, what is the best way to apply this for a machine learning-specific operation? A lot of things in the machine learning space do map directly from software, one-to-one, I think, and there are a lot of parallels and things where we can just say, hey, this is working in software, we should definitely be doing this in machine learning, which is also software. But there are also some considerations: maybe it's data drift, model drift. Maybe it's the inherent nature of trying to serve, you know, a stateless model at scale or something like that. I think there are some considerations there that aren't really fully fleshed out, and no one has had the time, or maybe even the inclination, to go through, like you said, and research, okay, these are the most efficient ways to monitor for data drift or model drift or something like that. And we don't fully know what the best way to go about that is. Once you get to that level, it is kind of unknown territory.
Even if you could feasibly implement whatever you want, you don't know what the best thing might be to implement for your use case.

Vishnu: Yeah. I see what you're saying there. And I think this is where I'm hopeful about all of our awesome vendors that are in the MLOps Community Slack and that are helping us figure this out together. I do think that there is some interesting work that they're doing. You know, it's kind of like you have your deep companies, and then you have your broad companies, right? When you have your deep companies, like an Etsy or a Pfizer, or a Tesseract, which is my company, or any other company, we're trying to figure it out in a particular vertical. But then when you have these broad companies, like Fiddler, who came on our podcast recently, they're figuring it out across a whole bunch of different verticals, and they're kind of saying, okay, this is what works everywhere, this is which aspect of monitoring and observability. And I think, hopefully, that'll trickle out. I guess one of the questions that... go ahead...

Demetrios: Do you think it can be that? Because it feels to me like there are so many different use cases and there is no standardization right now. And it's really hard to say, this is what works everywhere, right? Do you think that will ever be a thing, or is that something that's going to have to be specialized and customized or bespoke every time you want to implement it?

Vishnu: I would kick it to you here, Kyle, and say, you've been in consumer. You've been in healthcare. You've been in bio. What has been your experience with that question of what's abstractable and what's not?

Kyle: Yeah. It's definitely a hard question to answer.
I mean, it is dramatically different, and I don't think that we're at a place, at least, where one size would be able to fit all. I think the key there is what a lot of the really good vendors out there are doing, which is making themselves integration-first as platforms. They're like, all right, we're going to see what the most common use cases are. We're going to solve 80% of your requirements, but we're going to allow you to build that remaining 20%, because our platform is so flexible and integratable with all of the systems that you might happen to have running. So I think there are a lot of commonalities, but with the subtle differences, if you try and build for everything, then you're going to fail.

Vishnu: Yeah, I see what you're saying there. It's like taking on too much, in a sense, and it's hard to balance all of that. And if you have that sort of flexibility inherent to integrations, rather than trying to custom engineer everything, it becomes a lot easier. You know, I kind of have this question around this notion of machine learning infrastructure and platform. You have had this transition from, let's say, user of the platform, loosely speaking, to architect of the platform, right? And I think there are a lot of people in our community who feel similarly, right? You start off because companies are hiring machine learning engineers, or they're hiring data scientists, and then the machine learning engineers, you know, usually one or two, realize, wait, we should probably do this in a repeatable, scalable, maintainable way.

Kyle: Yeah.

Vishnu: How did that realization come about for you? I would really love to dive into this.

Kyle: Yeah. I guess it was probably the first time I had to take something to production, or maybe it was more so the second, right?
Maybe the first time it was kind of, all right, maybe this is just how it is. And then the second time I was like, all right, why did we just do that all over again? And personally, because at the time I guess I didn't have insight into what was going on behind the infrastructure veil, I probably didn't know the right way to interface with software engineers. And at the time there was no common practice for that delivery of a data science model to software engineering infrastructure. So yeah, like, I think, a lot of people in the Slack, I gained an interest in that side of things and applied myself there. And now I try to come back and say, okay, when you build a platform team, your customers are data scientists. Where do you meet them? How do you meet them? What is your expectation of a data scientist? What do you want them to come to you with? What do you need them to know? And then what is the expectation on what you're supposed to provide them, so that they can have the best user experience in your platform and deliver models as quickly as possible to production? I think a common KPI for a lot of teams is time to production for a new data scientist or something like that. But it's kind of thinking about it in the context of your company, your org, team, whatever: where do you meet the data scientists to best foster that relationship and best expedite that trip to production?

Demetrios: Those... typical Zoom. Yeah. So I was just thinking about where the data scientist fits in on this, and how you've gone from being the data scientist, or the customer, to actually creating it. And we talked a little bit about how the ability to code and knowing software engineering gives you so many advantages when you're a data scientist.
And so I'm just wondering, and this is something that comes up quite a bit: when you're a data scientist, do you just stay in your lane and focus on getting the most out of the model, and really, that's it? Or are you going into the platform and trying to make it better, and maybe working with the platform team, or, if you don't have a platform team, are you trying to create that platform? How do you look at that?

Kyle: Yeah. And if I'm not giving direct answers, it's only because, again, one size doesn't fit all. It really depends on the data scientist. Because I have met data scientists who very much stay in their lane, and they want to give you a serialized model or something like that and be like, I'm done. That's what you get. And everything from there is on you. And then there are people who are data scientists like myself, who were like, this isn't enough. I want to understand the platform. I want to actually learn the things in that platform so that I can iterate and improve them with platform engineers and learn that side of things. And I actually like software engineering better than modeling, personally. It's more definitive and less frustrating, in my mind. But yeah, I think it's so highly variable. And I guess that's part of being a platform engineer, just being open to both. I've definitely worked with data scientists who'd really like to learn Kubernetes or stuff like that. I'll absolutely help you with that. I think it's cool that you want to learn that, and it's helpful overall for both of us collaborating there. And there are data scientists who want to be hands-off, and we have to meet them there as well.

Vishnu: Yeah. I think one thing that I realized, working at a startup, is that we have to hire a particular kind of person, right? Because working in a startup is risky and often very frustrating.
And I think it's given me a sensitivity to that exact point that you have, which is that how you build some of these tools really depends on the kind of people that you hire and the kind of people that come in the door and what they're interested in, especially at a smaller company like ours, or even some of the larger companies that you've worked at. I mean, I'm sure there is no ML team anywhere in the world, maybe outside of FAANG, where it's, you know, a hundred people, right? Most ML teams are very small. And so it's like, who do you have? And what are you going to do with them? That ends up being the day-to-day.

Kyle: Yeah. Maybe they're PhD mathematicians and stuff like that. And that's completely different from people who came from a bootcamp, from a variety of different backgrounds, or otherwise. And yeah, it's super, super variable. But that comes down to what we were talking about earlier, about data science still being so poorly defined and so in flux, in terms of its definition as a domain and as a role and everything.

Vishnu: Yeah, for sure. That level of variability is definitely a feature of the field that, you know, hopefully we'll get rid of over time. In this platform engineering realm, what would you say are maybe the most interesting challenges that you experience? Some things that come to mind are, you know, CI/CD and continuous training, or data management, distributed training, orchestration. What are the engineering challenges that really get you interested in this platform question?

Kyle: Yeah. I've always really, really loved model serving, for some reason. I don't know why. Maybe it's just because it's more of an instantaneous thing, you know: you make an API call, it comes back, and it works.
And you see the result. Training, for me, while it has a lot of very, very interesting engineering problems, has been frustrating to implement because of some of the time constraints. You know, maybe you kick off the job, and eight hours later you're like, oh man, that failed, I'm gonna have to run it again. So I've always been drawn to the enhancements and performance side, especially now with so many large transformer-based language models and stuff like that: how do we continuously optimize model serving to the point where it's feasible and actually generates value to serve these models at scale? So I think all of the challenges in that space are super, super interesting. And there are a lot of really cool open-source and enterprise serving frameworks now that are fun to play with.

Vishnu: And so let's say a community member were to come up to you and say, Kyle, you've worked at some great companies. You're an expert on model serving. My company wants to figure out a strategy for this. They told me I need to figure out what our model serving platform looks like. What are some of the things I should watch out for? And what are some of the tools you'd suggest? Anything that you would offer as advice?

Kyle: Yeah, that's a great question. Free consulting. Yeah. It would definitely depend a lot on the company. My first inclination would be, depending on the resources you have, just evaluate a lot of the vendors out there. There are a lot of really, really great tools coming out of a lot of different companies. I feel like so many different enterprise offerings don't just have model serving; it's also built as part of a larger experimentation platform. Or there are even things that are more hands-off. For instance, there's Seldon, I believe.
It's an open-source framework for deploying models on Kubernetes, and they also offer Seldon Deploy, which is, I think, their enterprise tool, which is more UI-driven, but a nice way to deploy and manage your models. I would say that the technical aspect of serving can be easy, right? You have something like TensorFlow Serving, and you just train a TensorFlow model, get the Docker image, and you have an endpoint. But managing that, and actually getting to that governance level: how do you manage a few, or 50, models? How do you keep track of those, and how do you manage those endpoints? Making sure you're not wasting resources, being cost-effective, maybe role-based access controls, all of that stuff is the real stuff that you have to watch out for in that space, because otherwise you just get this explosion of tech debt that is totally unmanageable.

Demetrios: Yeah. The nitty-gritty, for sure. So when you're working on the platform and you're looking at the platform as a whole, do you have problems when it comes to plugging in different pieces of the puzzle? Because one thing that I've heard many complaints about, or just mentions, maybe not complaints, I'm projecting the complaints onto them, but many have mentioned how there's not really standardization yet, and you don't have an easy way to do the Lego blocks and switch out one piece for another piece. Have you found that to be true? And how have you dealt with it?

Kyle: Yeah. I've definitely found it to be true to a degree. I guess it depends on where you're at. Are you working with a lot of monolithic in-house tools? Or is your company very microservices-driven? I know, again, a buzzword, but I think there are benefits to working with microservices. But yeah, it's never as easy as plug and play, right? You're like, oh, wow.
Even if you're told they're going to be Lego blocks, they're never Lego blocks. You spend, like, the first two hours trying to cram two together that shouldn't go together, and it just doesn't work for you. So yeah, at the high level, I've definitely had issues putting those pieces of the puzzle together. Even starting with the first piece of the puzzle, you know, you always underestimate how long a task is going to take in the machine learning space. Like, all right, this will take me an hour. And then two days later, you're like, what is happening? Why is nothing working?

Demetrios: Why do you think that is? Because I've heard so many times that it's really hard to judge how long something is going to take in this space. And that's another reason why some people are really against doing sprints in the machine learning field, because it is so hard to judge.

Kyle: Yeah. I mean, look, I think it might come down to how you do your estimates. It's kind of throw stuff at a wall and see what sticks. If I had to estimate building an entire ML platform, you know, offhand, I would have no idea, but I would probably double or triple whatever my initial estimate was. But I feel like it just comes down to agile, right? You just want to take that and start by saying, okay, what are the small pieces where I actually know how long they take, or can at least guess? It's no big deal if you estimate something as a two-point ticket and it turns out to be a two-week thing. It happens; it's always going to happen. But the better you can break down that work into smaller pieces, the more accurate you're going to be. I think it's especially tricky in the ML space because you don't know how a model is going to do before you train it. I remember, as a data scientist, people being like, how long until you get to this level of accuracy, or this level of acceptance?
And I would be like, I'm sorry, I have no idea. You haven't even given me data. This is all, you know, purely hypothetical project planning. I have no idea what's going to happen two weeks from now. I haven't even been able to train one model yet.

Demetrios: So I want to kind of switch gears real fast, because we've been trying to create a new podcast with one of the community members, Fabiana, and we're talking to a lot of people about data access and how they go about that. And I find that a really interesting problem to look at, especially once you start to get into larger organizations and companies that are dealing with private information, or if you're working in the EU, for example, where there are all these regulations around the data. And I've heard horror stories about people waiting around for six months to get access to the data. They're just kind of twiddling their thumbs until they get that access. And I'm wondering how you look at that problem and how you make sure that these horror stories don't happen, but you're also not so lax on data access that anybody can have access to it.

Kyle: Yeah. I guess that's tough, because there are also ethical considerations there as well, right? I mean, from a data scientist's perspective, you want to say, oh, just give me the data. It's not that big of a deal. I'm just a person. I should be able to access this regulated data with personally identifying information and work with it. But that shouldn't be the case. We should have very strict controls around that, and, you know, consumers should understand and know what's happening with their data. It definitely comes down to that kind of governance problem again. The more oversight and control and management you have of your data, the easier it is to safely provision access without, you know, potentially damaging or lasting effects.
It's not a space that I've had to work in a lot. I've definitely seen that a lot in the healthcare space, but fortunately, it hasn't been my problem to solve.

Vishnu: Yeah. I think the data access question is also interesting technically. And I think this gets back to your thought about platform in a sort of corollary way. These things tend to be built incrementally, right? Nobody ever goes out and says, hey, okay, design our data access system end to end. It's going to scale with us from one to a thousand engineers and one to a hundred data scientists. Boom, got to do it. Right? And I think a challenge I have a lot is figuring out what to implement now versus what to implement later, particularly for a system like data access, where you're thinking a lot about interaction. What's your interaction model with, let's say, an API? Say some data is just pulled in from an API by making some kind of request. How do you go about designing that API model, or that interaction model? From your standpoint, how do you think about designing interfaces and speccing out, from a technical standpoint, what the different use cases look like?

Kyle: Yeah. I guess for me, it was all about working with users as much as possible and understanding the use case. I think it's as straightforward as: get as many real use cases as you can, as quickly as possible, and try to work through them, because more than likely your first design and assumptions are going to be wrong in some way, shape, or form. Say you're building a platform for data scientists.
Maybe you assume that they have some skill that they don't, or want some feature that they don't, or maybe it's not important at all. But you need to get those use cases and understand them to correctly map out those requirements and design something that suits your needs.

Vishnu: Yeah, I think that makes total sense. You really have to have that empathy, that understanding, and that intuition around the use case in order to be able to engineer the best way. Now I'm going to ask you a tough question. Does anything come to mind where the use case wasn't very clearly defined on the part of the user, where maybe there was a little bit of incoherence or lack of process? And if so, how did you deal with that?

Kyle: Yeah, absolutely. I think I've had that more as a data scientist than anything else, because software engineering is well defined if you're building an application. For one, project managers and software engineers are very familiar with that. But so are consumers, probably, right? It's something they interact with every day. Whereas when ML and AI were blowing up as terms, people started asking for those things without actually having a use case. Right? I've definitely had cases of people coming to me being like, oh, I want ML, and I'd be like, for what? Why? Is this even a use case that ML would be remotely good for? Actually being able to spot those use cases is a skill unto itself, right? Seeing who is just asking for ML and who actually has a problem that's currently tedious, manual, et cetera, that could be automated by some kind of machine learning.

Vishnu: You're totally right. That's the perfect example. Yeah.
Demetrios: I was going to try and steer this ship towards the idea of what you feel the next big thing is when it comes to machine learning, maybe beyond MLOps or within the MLOps ecosystem. Where do you feel the next big thing is?

Kyle: I don't want to default to governance again, but I think it's going to be governance. I think it's going to be machine learning in the enterprise, with people actually seeing that it's driving value for certain companies, and it getting up there. You know, you have data, you have software, and now you're going to have machine learning up there at the same time. I think it's that level of enterprise maturity that we're still not currently at, but that some companies are starting to reach and really set a standard for. Obviously, machine learning research in itself has been exploding for a really, really long time. But the distribution of companies using, or getting value from, machine learning is huge. You have some companies doing real-time training online and getting all of these really great results from real-time predictions. And you have some companies that are still probably questioning the value of ML itself, and whether or not they would actually get some kind of return on their investment for investing in machine learning so heavily. A lot of companies stick a toe in the water, hire a data scientist, watch them burn out, and then nix everything. But yeah, I think it's going to be a select few companies setting that standard of what machine learning looks like when it's actually driving value, and the rest of the market moving up to meet it.

Vishnu: Yeah, I think that's spot on. That's something I really agree with, and I think it's something we see a lot in the community.
Etsy is one of those companies, DoorDash, Uber certainly, Facebook, Google, Apple, et cetera. I actually think it's kind of interesting. With the Facebooks and the Googles and the Apples of the world, probably justifiably so because they're so big, there's a little bit less true MLOps stuff that I've really learned from them. I think it's more the Spotify guys, those mid-stage tech companies, that have really given a lot of great lessons in terms of how this stuff scales in a modern way.

Kyle: Yeah, and in a manageable way. Sometimes I feel like I read a paper from a FAANG company or something like that on their architecture, and I'm like, there's no way we're going to implement this. It's just totally infeasible for wherever I'm at right now. But yeah.

Demetrios: There's also something to be said: I think all of these companies are tech companies. And what it sounds like you're saying, Kyle, is that you're expecting more of the companies that aren't necessarily tech companies, like your Coca-Colas or Delta Air Lines, to also get up there.

Kyle: Yeah, there's a huge difference, I guess, between the companies where data scientists are embedded in engineering organizations and companies where you have these siloed data science teams that are unsupported from a lot of different standpoints. And maybe those data science teams are more the insights kind of data science. Maybe they're working with marketing teams or otherwise.
But if there are data scientists that need to create and deploy things to production that actually serve as the backends for software, then I think a lot of those companies also have really cool things that they can and want to build to automate internal processes, but don't yet have the organizational structure or support for those data scientists to get to that level, or aren't organized that way.

Vishnu: Yeah. Some companies that come to mind that you may not think of as awesome ML companies, but that I've actually been pretty impressed by, are Chick-fil-A and John Deere. Those are the two that come to mind.

Demetrios: Shout out to Corey.

Vishnu: Right. Yeah, no, I mean, those are great companies that have really started to embrace ML and bake it into their entire business model and their business perspective.

Kyle: It's definitely out there, and that's not to say that a non-tech company is not going to succeed at ML. I think some of those companies are definitely doing it right. Your John Deeres, it sounds like they're doing it right in terms of how they embed their data scientists in the organization and the support they give them. And there are a lot of companies that people don't think of as tech companies that are producing cool papers or interesting things within the machine learning space. But from the inside, I guess what I've seen in some of those companies is that it can be difficult to get the support you need to get something into production, or even just to have a senior engineer's eyes on something.
Vishnu: Yeah. I guess, just to bring this discussion to a close and get into our concluding period here: this idea of platform engineering is new. Even this idea of machine learning infrastructure, like, what really is machine learning infrastructure? I've felt that a little bit lately. I think I understand the importance of MLOps. I mean, I'm here hosting a podcast about it and hanging out in the community. And I see how this field has to evolve in a way that's probably a lot more like software engineering than some of the more numerical fields that are out there, like signals and systems, et cetera. But sometimes one of the things I struggle with is really defining what something like machine learning infrastructure looks like on a practical level at a company, right? These are our problems. Is it machine learning infrastructure? Is it DevOps? Is it a software engineering issue? And I guess my question to you is, have you dealt with a similar challenge in terms of defining what the problems are that you need to work on? Or have you felt that, maybe because of the environments you've been in, it's been a little bit more clear? How do you think about that?

Kyle: Yeah, and it has depended on the use cases, for sure. I'm trying to think of the best way to answer that, because there have been times where I'm like, what are we doing? What is this here? Sometimes I've felt like I've just been following the flow. I'm deploying a model to a Kubernetes cluster because that's what people do. And other times I've felt like I've been doing it because I need to do that, and I need this thousand-node cluster to scale up that high to actually serve the traffic that I'm trying to meet.
And I don't know if I'm answering your question at all, but yeah, machine learning infrastructure is like software infrastructure. It's all of the support that you need to actually support your software tooling at scale.

Vishnu: Yeah, I think you did answer my question, right? I think that Kubernetes example is a perfect example of something that, you know, we all feel like, oh man, we should be using Kubeflow, we should be using MLflow, all these things that are talked about. But it's like, what are the problems I'm trying to solve here? What are we really trying to do here? I mean, if you're asking yourself that question, I think maybe you are doing the job of a machine learning infrastructure or platform engineering contributor.

Kyle: Yeah, there are definitely times where I've started using Kubeflow, and we were leveraging it way back at some other places. But we started using it and realized that we actually didn't need it. For our use case, it was too much. It was more difficult to manage than the value we were actually getting out of it. And we were like, for our small data science team, we could literally have a standalone deployment of MLflow and get everything we need out of the ML pipeline, without any kind of scale or management of infrastructure resources whatsoever. And then there are other cases where that's just not even going to come close, and you have all these workflows running daily and you need that kind of complex workflow orchestrator.

Demetrios: I know there's some kind of adage, but I can't think of it right now, where it's like, you get the tool and then you look for the problem, as opposed to looking for the problem and then finding the right tool. But someone in the comments will help me out and remind me what that adage is and how it goes.
But this has been awesome, man. I appreciate you coming on here and talking with us. I want to mention to everyone that has stuck with us: Kyle, your team is hiring, right? And you've got a referral code. Are you still here?

Kyle: Well, I mean, it's not directly my team, but the company is hiring a lot. Yeah, I'm happy to speak to people in the MLOps community about whatever.

Demetrios: There we go. So if you're looking to work at Etsy, not making handmade baskets or any of that other cool stuff (I tried to sell stuff on Etsy once, but that's a whole other story), but if you're looking to work on the software team, then hit up Kyle in the Slack. And yeah, this has been awesome, man. I love hearing about your path. I love hearing about the vision that you have and really diving into what some of the core concerns are, and what some ways are that you can be a better machine learning platform engineer. And thank you. Yeah, absolutely.

Kyle: Thank you so much for having me. I really appreciate the time. Awesome questions, and the ones in the chat made me think, too. It's cool.

Vishnu: Yeah. Thanks for coming on, Kyle. Really enjoyed it. I think our community and our listeners will get a lot from the technical challenges that you highlighted in all the different aspects. I feel like we really covered just how broad platform is when we talked about, okay, who are your consumers? What are the different interfaces? What are the different technical challenges? It's a pretty broad field. And I'm very glad that you were able to come on. So thank you.

Kyle: Yeah, of course. It's definitely massive. There are so many different spaces you could get into and talk about. You could talk about Kubernetes itself for hours and hours, get into networking, and get further and further away from the core. But yeah, it's been awesome. Thank you so much.

In this episode

Kyle Gallatin

Software Engineer for Machine Learning Infrastructure, Etsy

Kyle Gallatin is currently a Software Engineer for Machine Learning Infrastructure at Etsy. He primarily focuses on operationalizing the training, deployment, and management of machine learning models at scale. Prior to Etsy, he delivered ML microservices and led the development of MLOps workflows at the pharmaceutical company Pfizer. In his spare time, Kyle mentors data scientists and writes ML blog posts for Towards Data Science.

https://twitter.com/kylegallatin

LinkedIn

Demetrios Brinkmann

Host

Demetrios is one of the main organizers of the MLOps community and currently resides in a small town outside Frankfurt, Germany. He is an avid traveller who taught English as a second language to see the world and learn about new cultures. Demetrios fell into the Machine Learning Operations world and has since interviewed the leading names around MLOps, Data Science, and ML. Since diving into the nitty-gritty of Machine Learning Operations, he has felt a strong calling to explore the ethical issues surrounding ML. When he is not conducting interviews, you can find him stacking stones with his daughter in the woods or playing the ukulele by the campfire.

Vishnu Rachakonda

Host

Vishnu Rachakonda is the operations lead for the MLOps Community and co-hosts the MLOps Coffee Sessions podcast. He is a machine learning engineer at Tesseract Health, a 4Catalyzer company focused on retinal imaging. In this role, he builds machine learning models for clinical workflow augmentation and diagnostics in on-device and cloud use cases. Since studying bioengineering at Penn, Vishnu has been actively working in the fields of computational biomedicine and MLOps. In his spare time, Vishnu enjoys suspending all logic to watch Indian action movies, playing chess, and writing.