Deal pipeline software for Animal science
See airSlate SignNow eSignatures in action
Our user reviews speak for themselves
Why choose airSlate SignNow
- Free 7-day trial. Choose the plan you need and try it risk-free.
- Honest pricing for full-featured plans. airSlate SignNow offers subscription plans with no overages or hidden fees at renewal.
- Enterprise-grade security. airSlate SignNow helps you comply with global security standards.
Deal Pipeline Software for Animal Science
Experience the benefits of using airSlate SignNow for your Animal Science deal pipeline. Simplify your signing process, increase productivity, and ensure secure document handling. Sign up for airSlate SignNow today and take your workflow to the next level!
airSlate SignNow - The ultimate solution for efficient document management in the Animal Science field.
airSlate SignNow features that users love
Get legally-binding signatures now!
FAQs: online signature
- What is a sales pipeline management tool?
A sales pipeline management tool lets you track deals and prospects through the various stages of the pipeline, providing insights to help you make data-driven decisions. For instance, you'll see how quickly leads are being processed and can identify potential bottlenecks in the sales process.
- What is CRM software?
A Customer Relationship Management (CRM) system helps manage customer data. It supports sales management, delivers actionable insights, integrates with social media, and facilitates team communication. Cloud-based CRM systems offer complete mobility and access to an ecosystem of bespoke apps.
- How do you generate pipeline in sales?
How to build a sales pipeline:
1. Identify prospective buyers.
2. List the stages of your pipeline.
3. Identify and assign tasks for each stage.
4. Determine the sales cycle length.
5. Define sales pipeline metrics.
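The steps above can be sketched as a minimal data model. This is an illustrative sketch only; the stage names, tasks, and deal fields are hypothetical, not taken from any particular CRM:

```python
from dataclasses import dataclass

# Hypothetical stage names and per-stage tasks -- adapt to your own process.
STAGES = ["prospecting", "qualification", "demo", "proposal", "closed"]
TASKS = {
    "prospecting": "research the account",
    "qualification": "confirm budget and authority",
    "demo": "schedule a product walkthrough",
    "proposal": "send pricing and terms",
    "closed": "hand off to onboarding",
}

@dataclass
class Deal:
    name: str
    stage: str = STAGES[0]

    def advance(self) -> str:
        """Move the deal to the next stage and return the task owed there."""
        i = STAGES.index(self.stage)
        if i < len(STAGES) - 1:
            self.stage = STAGES[i + 1]
        return TASKS[self.stage]

deal = Deal("Acme Corp")
next_task = deal.advance()          # prospecting -> qualification
print(deal.stage, "->", next_task)  # qualification -> confirm budget and authority
```

A real pipeline tool layers reporting and automation on top of exactly this kind of stage/task bookkeeping.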
- What is pipeline CRM used for?
Pipeline CRM describes a system for keeping track of everyone within your sales pipeline. CRM is an abbreviation of Customer Relationship Management, and although the leads in your pipeline may not yet be customers, they need to be tracked in just the same way.
- What is a deal pipeline?
In HubSpot, a "sales pipeline" or "deals pipeline" refers to the visual representation of a sales process that tracks the stages of a deal as it progresses from a lead to a closed-won or closed-lost outcome. (HubSpot Sales Blog, May 3, 2023.)
- What are the pipeline tools for sales?
Popular sales pipeline management tools include Zixflow, EngageBay, HubSpot, Lusha, Freshsales, Pipedrive, Insightly, ActiveCampaign, Keap, Zapier, SharpSpring, Nutshell, and Streak.
- What is an example of a sales pipeline analysis?
Sales velocity measures the amount of money passing through the pipeline daily, based on how quickly your prospects move through it. For example, in the last quarter you closed 150 deals (conversion rate × number of opportunities), and the average deal value was $8,000.
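The example above can be worked through in a few lines. The deal count and average value come from the FAQ's example; the 90-day measurement window is an assumption made here for illustration:

```python
# Worked example of the sales-velocity calculation described above.
closed_deals = 150        # conversion rate x number of opportunities
avg_deal_value = 8_000    # dollars
period_days = 90          # assumed: one quarter measured in days

# Money moving through the pipeline per day over the period.
sales_velocity = closed_deals * avg_deal_value / period_days
print(f"${sales_velocity:,.2f} per day")  # $13,333.33 per day
```

Shortening the sales cycle (the denominator) raises velocity even when deal count and value stay fixed, which is why cycle length is tracked alongside conversion rate.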
- What should a sales pipeline include?
The seven key sales pipeline stages:
1. Prospecting. Through ads, public relations, and other promotional activities, potential customers discover that your business exists.
2. Lead qualification.
3. Demo or meeting.
4. Proposal.
5. Negotiation and commitment.
6. Opportunity won.
7. Post-purchase.
Trusted e-signature solution — what our customers are saying
Good evening everyone, and welcome to the first Apache Liminal meetup. It's great to see so many people joining us. We're really excited to have the opportunity to lead a community project in the MLOps domain, and we hope more companies and individuals will join us. This evening I'll start with a high-level introduction to Liminal: why we built it and shared it with the open source community. Assaf will follow up with an overview and comparison of ML orchestration solutions, highlighting the uniqueness of the Liminal approach. Last but not least, Aviem will give a demo showcasing the strengths of the Liminal approach.

Okay, so again, good evening, very happy to see you. I'm the chief architect of Natural Intelligence. I've been building software since the 90s: web platforms for the last 20 years, and big-data-related systems for the last decade. I'm part of the founding group of the project we're going to hear about today.

Hi guys. I've spent the last 20 years or so somewhere at the intersection of data, machine learning, and engineering. In previous roles I was director of machine learning and data platform at PayPal, and most recently vice president of R&D at Zebra Medical Vision, which does computer vision for medical imaging. These days I'm a consultant to data companies, and I also joined the Liminal team as a founding member.

I'm Aviem. My specialties are data and open source; in my free time I like the card game Magic: The Gathering. I used to work under Assaf at PayPal, and now I'm a data tech lead at Natural Intelligence.

Cool. So we actually changed the flow a little bit. I'm going to start with an intro about the road to Liminal and why we decided to build it; in the middle of my talk Assaf will dive into the complexities of the existing ML and MLOps space; then I'll pick it back up and describe what we actually built and the base concepts behind Liminal; and Aviem will finish with a deeper dive and a live demo of Liminal.

Natural Intelligence is a global leader in multi-vertical online comparison. We're almost 11 years old, a bootstrapped company, and we work with tons of verticals: mortgage, insurance, meal delivery, you name it. What we do is build comparison websites and make the best matches between consumers and brands, making sure people can make confident purchase decisions while helping the brands grow. So, naturally (pun intended), we rely on data to create those experiences as best we can, both for the visitors of those comparison sites and for the brands we work with.

We started by building a state-of-the-art data platform on the Amazon cloud, with Spark, S3, Airflow, and Kubernetes. This system allowed us to do scalable heavy lifting of data, creating the base for our reporting platform and many automation capabilities, and to build the data-driven applications we needed for our world of product comparison.

Then we wanted to go up a notch and start creating smart things: machine-learning-based products solving two of our core problems. One is the comparison tables on those websites: how do we personalize them and create the best experiences for users? That's website personalization and ranking. The other: since we're among the 50 largest AdWords accounts in the world, we decided to create our own bidder. Both projects use machine learning, and we thought it was going to be pretty easy. We'd take the state-of-the-art data platform we already had, add a bunch of capabilities using standard tools like scikit-learn and Python Flask for real-time inference, extend our platform, and easy peasy, it would work.

We believe many parallel organizations out there have this experience: they build their data platform, then try to do smarter stuff, and realize it's pretty difficult. There's a diversified skill set involved in creating a machine learning application and taking it from the data scientist's desktop to production, making sure the various workflows and technologies involved are all working and in place. We realized our data scientists were wasting their time on engineering problems, and we had a very long time to market for features. We call it the orchestration barrier: there's a bunch of potential infra you can use, numerous platforms and libraries, a lot of different skill sets involved, as well as complex workloads. So we started looking around, and this is where Assaf will show you the complexity in this world. Assaf, all yours.

Thank you. Let me just share my screen. Can you see it? Click on the rectangle at the bottom so it will be full screen. Thank you. All right.

I lifted this slide from Martin Fowler, guru of all software architects, to describe the process of building a machine learning product, and the first thing I did was add the stuff I thought was missing, starting with data. It all starts with data: you don't have machine learning without data, and the data deserves its own life cycle. Let me walk you through this quickly; most of you probably already know it to some degree. You start off by collecting data and wrangling it (torturing it until it confesses, as some people like to say), developing data pipelines for computing features, testing those pipelines, and deploying them. At the very least, for iteration number one, you have some data, which you then feed into the machine learning process.

The next step is typically data scientists doing what we call model building; essentially, it's research. They try a number of state-of-the-art off-the-shelf models to see which performs reasonably well; it could be trying to reproduce papers if their problem is unique; they try different features and different approaches, and for that they need to write training code and feed in the training data. Ideally, at some point a model starts to look like something is sticking: this random forest I built with these 70 features seems to be in the right direction, it's learning something and not overfitting. Now they want to move to really evaluating the model and iterating to milk the performance, to optimize it as much as possible. The slide says test data; actually I should have written validation data. They evaluate the model they've built and iterate on it: hyperparameter tuning, variants of the architecture, different sampling techniques, or different datasets. They do that repeatedly until, hopefully, they've reached the performance goal they were expecting or hoping for.

At that point they need to move to productionizing the model. What does that mean? It can mean a lot of different things to different people, but at the very least it means cleaning up the training or inference code from dead experiments and prints and crap. It can mean evaluating more advanced metrics, or optimizing the model in terms of performance: distilling it, quantizing it, or compiling it into something that's really a candidate for deployment, not something that just runs on your machine or your cluster. Once you have a release candidate, you want to do testing. Testing in machine learning is a complex, still-forming discipline, but at the very least best practice is to hold out a test set that you didn't overfit on during experimentation, and report your results on that set, so you know it's something the model hasn't seen before. It can also include things like unit testing different parts of the transformation.

Then, hopefully, you take this model and deploy it. In most cases, models don't go to production the way they were in training. You need an inference pipeline adapted to reading input from whatever the production system is: if you're running inside a RESTful API, you need to receive your payload of features from that rather than, say, the file system. You write your inference pipeline, and in many cases wrap it in some business-meaningful RESTful API, so the consumers of your API can actually make sense of the results and you don't just return a JSON of scores. Ideally, that's it: you deploy to production, start monitoring, log the results, everybody's happy, and you get a promotion.

But in many cases, especially in companies that build models on their own data and try to predict things that happen in the real world, these models degrade over time. It can be bugs, in which case the data scientist is off the hook, or a data shift: the real-world data has changed, and you need to retrain. Now, all this process you went through, all the way to production; are you going to repeat all of it by hand every month? That's intractable, and the answer is no, it's not a good idea. You see the arrow in the diagram: once you've hit production, you need to go back and be able to iterate on the model in the future. The question is how.

In our opinion, this diagram is missing something called orchestration code: the code artifacts that control or automate the process you went through. Once you've built your model and you know what you're trying to do, you should probably automate this entire process. We call that orchestration code, and this is really what we're here to talk about.

So what is a machine learning orchestration system? It's in charge of taking machine learning tasks, the building blocks of the process, and turning them into an automated system. In our case, the automated system's main role is to train models, evaluate them, promote them, package them, and deploy them; it's like a back-office production system for your machine learning environment. It controls the sequence of, and integration between, tasks: when you move from one stage to another, the orchestration helps you glue the tasks together, bind their outputs to one another, and control their sequence. For those of you who know directed acyclic graphs, it's sometimes exactly that. And it manages the operations of the entire process: because it's a system, it needs to run somewhere, you need visibility into the training process, and you need to be able to debug when something happens. That's the responsibility of the orchestration system.
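The glue role described here — sequencing tasks and binding one task's outputs to the next task's inputs along a directed acyclic graph — can be sketched in a few lines. The task names and payloads below are illustrative, not part of any real orchestrator:

```python
# Minimal sketch of DAG orchestration: each task is a function, and
# edges say whose outputs feed whose inputs. Names are illustrative.
from graphlib import TopologicalSorter  # stdlib, Python 3.9+

def prepare_data():       return {"rows": 1000}
def train(data):          return {"model": "v1", "trained_on": data["rows"]}
def evaluate(model):      return {"model": model["model"], "accuracy": 0.93}
def deploy(evaluation):   return f"deployed {evaluation['model']}"

# DAG: prepare_data -> train -> evaluate -> deploy
graph = {"train": {"prepare_data"}, "evaluate": {"train"}, "deploy": {"evaluate"}}
funcs = {"prepare_data": prepare_data, "train": train,
         "evaluate": evaluate, "deploy": deploy}

outputs = {}
for name in TopologicalSorter(graph).static_order():
    deps = graph.get(name, set())
    args = [outputs[d] for d in deps]   # bind upstream outputs to this task's inputs
    outputs[name] = funcs[name](*args)

print(outputs["deploy"])  # deployed v1
```

Real orchestrators such as Airflow add what this sketch omits: scheduling, retries, distributed execution, and visibility into each run.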
Now let's look at one of the challenges of doing this: the diversity of the stack. You start off wanting to do some data engineering, and you have a bunch of stacks you're already using or need to adopt. Most companies use Spark, Hive, various SQL databases, Kafka, key-value stores, object storage, etc. That's kind of familiar. Then you need to pass the data to the model training process. How? We used to pass it in CSV files, which is not so good for automation. A newer approach is a system whose sole purpose is enriching datasets with features stored in a feature store, or some variant of that. Again, different tools: dedicated open source feature stores, and obviously build-your-own.

Now you need to write code to train your model. There's an endless number of frameworks; many scientists like Jupyter notebooks, but some have opted for an IDE, and you try to express your training pipeline. Then you come to experimentation. Some people used to send experiments one after the other, each time changing something small and waiting for the results; that's really not a scalable approach, and this whole process is now on a strong trend toward automation. It starts with hyperparameter tuning, with tools like Ray Tune, Hydra, or Katib from Kubeflow, which essentially generate experiments for you and search the space. Then you want to track the output of those experiments; you're not writing CSVs into your home folder anymore. So you have tools that track your data and your training code, and experiment tracking systems such as MLflow, or Allegro and the open source project they've built. More and more tools.

When you come to productionize your model, there are at least 50 different ways to optimize its performance, especially for deep learning models; ONNX is just one example of taking a model you wrote in PyTorch and distilling or compiling it into something much more efficient. And where does this model go? What is the building block that moves between phases? We need a model packaging system able to wrap a model in something upstream systems can invoke without knowing anything about how it was implemented. There are different model packaging approaches, and you typically store the result in a model store. For testing, there are tools that help you debug or understand model biases. You package the model, and finally, for deployment, you could write your own Flask or FastAPI servers, or use a framework that does inference for you, taking a packaged model and deploying it as a REST service on Kubernetes, for example Cortex or Seldon Core, or TensorFlow Serving. Finally, you need to connect to the data you have in production: if you're computing features that need pre-computing and storing, you might have Spark pipelines and key-value stores there you need to integrate with. And monitoring models is not exactly the same as monitoring latency for applications, so you also need to think about how you'll detect drift as early as possible.

Underneath all this sit the usual suspects of modern infrastructure: Kubernetes, Docker, Helm, and a million different monitoring tools for the infra itself. I didn't mention Kubeflow; I'll talk about it in a minute as one example of a machine learning platform, and I'll cover others too. And of course, don't forget the orchestration code you want to write, which is what Liminal is all about.

So, as you can see, there are some unique challenges for machine learning orchestration. In my opinion, the first is the very heterogeneous personas and mindsets involved in developing models. Researchers and data scientists are typically not interested in engineering; they don't have the skills, and they just want the infrastructure to get out of the way and be transparent. But when you try to operationalize something, it's very difficult to do without an engineering mindset: going through the different steps and tests, monitoring, orchestrating, and so on. For a lot of folks in the industry, the holy grail is to make data scientists not need engineers at all, completely independent. It's still very much a work in progress; as far as I can tell, it hasn't really happened at scale.

Beyond the personas, you have very different functionality. With data pipelines, for many years engineers were very proud of their complexity, but essentially you read data, process it, and store it. Here, the types of tasks are each very different from one another: feature engineering is very different from model training, which is very different from evaluation, which is very different from production deployment; and all of these need to be orchestrated despite each step being so different. And lastly, as I hope I exemplified, the inference stacks are very diverse. There's no clear winner yet in this space, and many different tools keep coming up for different tasks as machine learning practices improve. We're still in a very much evolving best-practices scenario here.

Obviously this is not something we came up with, and we're not the first to understand it. There are several platforms that
attempted to address this challenge, and many of them have taken the approach of raising the level of abstraction around each of the tasks involved in the machine learning pipeline process. For example: Kubernetes is best practice, everything is going to be a Kubernetes pod, and we'll create a higher level of abstraction for how to run an experiment. It's not something you glue together yourself with YAML and Helm charts; we'll give you something that raises the level of abstraction so that you talk in the domain of training. If you're willing to accept their abstractions, the infrastructure will take care of the things underneath. Once you've raised the level of abstraction on tasks, the tasks become less brittle, less disjointed, and you design and craft the interfaces so they fit well together: the output of one task fits nicely into the input of the next. Essentially it makes orchestrating simpler, or should, but it does come with a rather opinionated approach to how these tasks are actually implemented under the hood.

These are a bunch of examples of platforms that took this approach, and I'll briefly compare a number of them.

Start with MLflow. MLflow comes out of Databricks. It strongly favors working on Spark, and it leverages the fact that the data software development life cycle already exists in Spark in your organization. For building models they provide Spark ML packages. For experiments they have a very popular experiment tracking system that comes as part of the stack. For productionization they offer a packaging format that's actually quite open and interchangeable and can be deployed into a number of different inference infrastructures. They've also ventured into deploying models, though I think really designed to run on a Spark cluster rather than real-time inference, although they're dabbling in that to some extent. They don't really have a native orchestration solution. So that's MLflow, most famous for its experiment tracking system as far as I can tell. If you're heavily invested in Spark already, all your data is in Spark, and your models aren't too big and don't require too much GPU, you can leverage existing infrastructure; it's an interesting approach to take.

Next is the all-time favorite, the new kid on the block: Kubeflow. Kubeflow came out of Google as an open source initiative. It's very tied to Kubernetes; it basically says everything is Kubernetes, forget anything that's not in Kubernetes. And they really have very impressive coverage of the pipeline. For data engineering the open source project doesn't offer very much, but when you set up Kubeflow you get a managed notebook server you can just start writing on, so you've got your IDE. For experiments there are many out-of-the-box Docker images that run different frameworks and do training the Kubeflow way. They also have hyperparameter tuning and experiment tracking, both in varying degrees of maturity compared to the other components. For productionization I'm not sure they have something specific in terms of model packaging (I might be missing something), but for serving they have their own KFServing initiative and also integrate with a bunch of more high-end serving infrastructures like Seldon Core or TensorFlow Serving. On our subject of orchestration, they have Kubeflow Pipelines, which is essentially a way to express your pipeline in Python code and transfer input and output between tasks. It doesn't necessarily involve a scheduler, but it does create a directed acyclic graph of your training steps, at least starting from the model training step. An interesting newcomer to Kubeflow is something called Kale, which takes a "from notebook to hero" approach: you don't even need to learn Kubeflow Pipelines; just write your notebook, annotate it, and it's turned into a Kubeflow pipeline for you. A very data-science-centric approach to creating these orchestrated pipelines.

AWS SageMaker probably doesn't need much introduction. It's by AWS and very specific to AWS, although they are open in that you can bring your own Docker images to different stages of the flow. The whole AWS data stack works well with SageMaker. They have their own Studio with managed notebooks, VMs, and containers ready out of the box. Training in SageMaker is the whole fun: you don't need to worry about the infrastructure; you just send something out and you have a distributed training process running. They have an experiment tracking system, and they also handle distributed training, which adds even more complexity to everything. They have their own model packaging flavor that fits well with their inference service, a managed endpoint you can query (not over REST, but gRPC). They have their own monitoring system. Interestingly, they don't have an orchestration solution: they advocate using either Step Functions (AWS, but not machine-learning-specific) or Airflow, or writing your own Kubernetes operator.

Last but not least, Google Cloud, which is essentially a managed Kubeflow with a lot of shortcuts and tight integration with the GCP stack for everything. It's a very complete stack, from the data to the managed notebooks to training and distributed training provisioned for you. They also have an interesting investment in explainability and fairness debugging tools, which the other platforms don't really have. They have a recommended packaging format for TensorFlow and managed TensorFlow serving, an interesting monitoring solution that does continuous evaluation of your model in production, and for orchestration they offer a managed Kubeflow Pipelines solution.

So really, all of these tools took the approach of raising the level of abstraction on the tasks themselves, being opinionated about how to actually achieve model training and model packaging, and offering a critical mass of these services to draw data scientists into the platform. By doing that, they hope the whole process can also be orchestrated easily, because the tasks are less brittle and there's less variety. Liminal took a slightly different approach, a different mindset, here. I'm going to pass over to Amihai to continue.

You're muted. Yeah, I realized that. This brings us back to Liminal. Just before I start talking about Liminal: there's a Q&A button in your Zoom session; you can click it and start writing questions. We'll answer them at the end of Aviem's part in a short Q&A session, so feel free to start writing your questions.

After going through what Assaf just showed us, we looked at those platforms (some were even less mature a year and a half ago than they are today), and we realized we didn't want to lose what we had. We have an existing infrastructure that we're heavily involved in and invested in, and we just want to utilize it. The other thing: we want to really allow data scientists to do the work, and hide the complexities that we see as much as possible. You can't hide everything, but as much as possible. So we
started writing lumino for our our own internal uh use cases and just you know a couple of words about the name limino it comes from latin uh lemon lemonade threshold and we look at the dictionary you see it means two things something that is barely perceptible and something that is uh in the intermediate state it transitions and for us it represented the liminal uh the project as you know as best as it can because we want something that hides complexities from us and we want something that handles the transition those workflows and all of that stuff and um this is what we want to achieve we want to let data scientists focus on data science and get from research to production as easy as possible so if you look at uh maybe a little less convoluted uh look at the data scientists work um this is like a partial uh look at uh the world that the stuff just showed a data scientist would need to fetching and cleaning and preparing and all of that and and probably there's you know there's a research phase preceding this build and train phase so um we were trying to build a liminal to handle the orchestration between the different tasks rather than building the tasks themselves we wanted to be able to get common orchestration scenarios available out of the box enable the data scientists to focus on the machine learning and we created a psl for that the domain specific language for that and leave engineers to worry about the infrastructure while providing a clear separation and the interface between the data science the application machine learning application and the infrastructure and this is what liminal is all about it's about it's a lean domain specific language it is a pluggable architecture so you can plug any uh infrastructure that you you want to use so basically basically you can use your own underlying infrastructure or whatever of the system that assaf mentioned uh prior to uh um in the in his session and it's minimalistic it's not very opinionated it just allows 
you to uh build the orchestration uh define the different tasks in a very simple yamo approach and it as scalable as your underlying infrastructure because what it does in build time is actually build generates the artifacts for you so it would generate a docker image or an airflow that if you're using those kind of infrastructure or utilize a spark cluster and then deploy to it and all of that and um from the perspective of the data scientist it's just a very very simple uh yaml file that uh vm is going to show in a minute and i think very importantly it's open source so it's really open for extension for yourself and from the industry on any kind of uh infrastructure you would like and this brings us to um how limitless actually works a vm it's all yours thank you mikhail let me share my screen okay hi it's lvm again thank you i'm kind the client and now what i would like to do is take uh you for a deep dive into how liminal works and how we write our first liminal application so as was stressed earlier in this session we want data scientists to have as minimal work as needed to get from research to production um that sounds great on paper but how do we intend to deliver on it so what we want as a stuff mentioned is to have high abstractions that are simple for um [Music] a data scientist to describe their machine learning system from training to real-time inference so what does that interface look like for them so all we ask them sorry all we ask them to do is add one file to their repository called liminoliamo and describe um in a liminal dsl the machine learning system that they want to orchestrate so as you can see here there's a name there's an owner and let's see how we can write our first machine learning system so the purpose of this app is to train a logistic regression model on top of the iris dataset from sklearn um to create a model that can predict given input data about a flower if that flower is of genus iris virginica um so we'll take a look at 
the whole application code and how the YAML file describes the orchestration that we want.

The YAML file has two key sections. The first section is services. Services are applications that are constantly running and can serve responses triggered by requests to that service. In our case we're talking about a Python server which responds to HTTP requests, to web calls. The clients for this service are other services within the organization that want access to data science insights from their models, and the easiest protocol to put between the different services of a company is HTTP REST. With this definition of a service, we enable data scientists to have running services in production: monitored, alerted on, and serving the other clients within the organization.

All we ask them to configure here is where in their source code they want us to look for code. In this case we write a dot, which means the entire repository; this is a very simple project with just a few files in it. We ask them for an image name, and again, since we don't want them to know the underlying complexities of all the different platforms, we create Docker images for them out of their code. Then they define the endpoints they want: the path to the endpoint within the server, in this case /predict, and which of their modules and functions in their code they want to invoke when a client accesses that endpoint. In this case we access our serving module, which is serving.py, and in there we have a simple function called predict that we will see in a minute.

The next section of the file is what we call pipelines. Pipelines are scheduled applications; they are not constantly running. Usually they run on a schedule or by manual request. The goal of the pipeline we see right here is to train said model on the iris dataset. Again, we want it to be as simple as possible, so we ask them for a schedule in crontab syntax, and for a list of tasks, which they
want to perform sequentially, one after the other. Again, we create images for them. Here, what we do is actually reuse the image from the service: only when we define source do we create an image; otherwise we assume it already exists. In this case we already created an image, so we don't need to create another one, although we do allow multiple images from one repository, if that's the way your org wants to handle the code base.

So here we have two steps in the pipeline: train, which trains our model, calls training.py, and passes it the argument train; and validate, which validates the model and, if the validation passes, deploys it to production, so that the serving portion of the system, the service that is constantly running, becomes aware of these newly trained models and starts serving them over older models. Here, again, we're using the same file, training.py, only we pass it an argument called validate. And that's it. That's all we ask of a data scientist: to define, in very simple abstractions, what they're trying to accomplish. From there, we have Liminal assisting engineers or ops in the organization to integrate this ML system within their existing stack.

By default we use Apache Airflow for scheduling and Flask for serving, but as mentioned before, we want Liminal to be extensible. We want every part pluggable, able to fit into different stacks for different organizations, which don't necessarily use specific platforms in each part of the problem. Right now, out of the box, we have Apache Airflow and Flask for serving, with the goal of the open source community adding more implementations.

So let's look at the rest of the code. We have here a rudimentary model store, which I wrote in Python using Amazon S3 as a back end. We don't have to get into all of this code, but what I want you to take from this slide is that we have two functions: we can save a model, and we can load the latest model in a model store. And we have different model stores.
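As an aside before we get to the code: assembled into one file, the two sections just described might look roughly like the sketch below. This is a reconstruction from the talk's description, not the exact Liminal schema; the field names (service, endpoints, module, function, tasks, and the argument-passing key) are approximations.

```yaml
# Hypothetical liminal.yml, reconstructed from the talk -- field names are illustrative.
name: IrisClassification
owner: data_science_team

services:
  - service: iris_server
    source: .                   # "." = build a Docker image from the whole repository
    image: iris-classification
    endpoints:
      - endpoint: /predict      # HTTP path exposed by the generated server
        module: serving.py
        function: predict       # user function invoked for each request

pipelines:
  - pipeline: iris_training
    schedule: "0 3 * * *"       # crontab syntax
    image: iris-classification  # reuse the service image (no `source` here)
    tasks:
      - task: train             # trains the model, saves it as a candidate
        module: training.py
        arg: train
      - task: validate          # promotes the candidate to production if valid
        module: training.py
        arg: validate
```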
We have one for production and one for candidate models. Obviously this is not a perfect solution, but it's just for the purposes of our demonstration. What happens here is that we cache the model we fetched for one hour; we cache the model in memory, and once that cache is invalidated, we go and check for the latest model and read it. That's how the service, while constantly running, becomes aware of new models: it checks once an hour to see if new models have been written.

Then we have a requirements file, as in every Python application, and all we ask of the data scientists is to add the liminal requirement if they want access to the Liminal command line interface, so they can run Liminal locally on their machine.

Now let's look at the rest of the code. This is the serving code. As we see, there's a function predict, the one we referenced in the YAML earlier. In this whole file there is no Liminal code; we don't ask them to import anything. One of our guiding principles is that we want minimal to no intrusion on the user's existing project. All we ask them to do is define a simple function that takes an input in JSON and returns a result. That's it; that's all they need to do. Here you can see we read an input JSON, we run it through our model after fetching the latest model, and then we return the result to the client. As simple as that.

As for the training code, this corresponds to the pipeline portion of the YAML that we saw earlier. We have two different functions here, train and validate, which match the different parts of the pipeline that we saw. In train, this is boilerplate sklearn code; again, no Liminal code here. We want the user to have their own project as is and just add the YAML. So here we train a model and save it to a candidate model store, and then, when we run the validate part of the pipeline, we load the latest candidate and perform validations on it. Here I have just a basic validation to
see that the model can be used without crashing, but you can imagine that in different organizations you can have different business validations on a model to decide whether it's worthy of production deployment. Once we decide that the model is valid, we save it to the production model store, and as I said, the service, within the span of an hour, will become aware of this new model and start serving it over earlier models.

So that's it: just one YAML file, with simple abstractions, that can orchestrate a full ML system end to end, including monitoring and alerting, and that can enable data scientists within your organization to get from research to production simply and in a production-grade way that aligns itself with the stack in your organization.

So let's take a look at Liminal in action. This is the Liminal command line interface, which we can use to run Liminal locally on our machine. What we do here is perform three separate actions. The first one is liminal build; this builds Docker images from user code. Again, we don't want them to know anything about Docker, so they just state that they want an image and we build it for them, and depending on the type of service or task, we have different Dockerfiles to build it from. Of course, as mentioned, they can use their own images if they're already using Docker; the platform is flexible enough to allow that. Then we have liminal deploy, which takes the YAML files in our repository and deploys them to the Liminal server, in a local installation. Again, the default for that is Apache Airflow, so we deploy the YAMLs to Apache Airflow, where Liminal code will pick them up and create Airflow DAGs from them. And we have liminal start, to start the Liminal server, which in the default implementation will start an Apache Airflow server.

Here is what a pipeline looks like when it's running on Airflow. We have an Airflow DAG that corresponds to the
steps in the pipeline that we saw in the YAML. We have two additional tasks, start and end, which allow for reporting out-of-the-box metrics that we want to monitor about the models that we train, or about any pipeline that we run, not necessarily models.

Let's take a look at what the serving looks like. Here we send a request to a service running the image that we built; we send it info about a flower, a petal width, and we receive back the result, 0.87, which means an 87 percent probability of this flower being of the iris virginica genus, according to the model. And we have the corresponding logs from the server; these are the logs from the code we saw earlier. We get a request, we print its input (the same input we just saw), we load the latest model from the cache, and we return a result. And that's it; that's all we ask the data scientists to do.

So what's next for Liminal? We intend to go deeply into CI integrations, to allow organizations to simply integrate Liminal into their existing CI pipelines. We plan on venturing into a user interface, because while this is all simple, we want it to be simpler: we don't even want data scientists to have to define that YAML file we saw and know its schema. What we want is for them to have a friendly wizard and user interface in which they define their ML systems; maybe it will also display to them what data is available in the organization for them to use. And from there on, the sky is the limit, as big as we can dream. We intend to have more ML integrations, with more ML platforms, across the different points of the ML system that Assaf showed earlier: Kubeflow, MLflow, feature stores, and the list goes on. We want to pay special attention to experiment tracking, which relates to all of the previous bubbles I covered here. We want to support all clouds natively, so AWS, GCP, Azure; we want to have all of them and be able to run Liminal with no problem. And we want to pay special attention to model stores.
Again, we want to integrate with existing solutions, such as MLflow's model store and other model stores. And at the center of all this is the open source community. We can achieve some of this on our own, but we can't achieve all of it alone. We believe that using the strength of the open source community, we can achieve great things together. We are great believers in open source, and great believers in Apache and the Apache way, so it was natural for us to submit the project to the Apache Incubator. We are now incubating there, and we are open for contributions, which we want to get from people from different companies and different countries; that will allow us to achieve great things.

So come join us, come join the effort; we can't do it without you. You can have a look at our website and subscribe to the mailing lists listed there; you can go on GitHub to see the code and contribute with pull requests; and you can go on Apache's Jira to see open tickets for Liminal, tackle them yourselves, and send us pull requests. We'll be there on the mailing lists, waiting for your contributions and your questions, and we are very open; we want a low barrier here as well for contributions, because that's the way we believe open source should be. And that's pretty much it for me. Amihay, do you have any closing words?

I think we have a couple of minutes for some questions. Unfortunately, guys, if you can, please write your questions again in the Q&A section; I just crashed in the middle of the presentation again, so I don't have the questions that were written earlier. Assaf, can you take a look? Or Aviem?

Sure, I have the questions open right here. The first question is: after all this work, was it worth it? How efficient would you consider your bidding to be, compared to Google's automated bidding? I think, Lior, you want to take this one? Are you still there?

Yeah, sure. It's not only Liminal, so I'll just say that we have a positive
ROI, and of course we won't get into the details.

The next question is: does Liminal support a dependency-based pipeline, meaning, can you create a pipeline as a DAG of tasks, in which a task is triggered only once all the tasks it depends on have finished successfully? Yes, that's exactly what Liminal is intended to do. It intends to run on one of the various platforms for running DAGs, directed acyclic graphs, of tasks, and as you can see, our default implementation uses Airflow to do exactly that.

Regarding the first question, was it worth it: at least part of the answer is that we had already gone down this route in our data platform, creating abstractions to allow our data engineers and data analysts to create data pipelines in production, in a production-grade way, without having to know all the engineering practices and the different languages and tools in use. We wanted to enable data analysts to create data pipelines, so extending that to data science seemed natural for us. We also surveyed the existing solutions, and we didn't find one that covers all of our needs. I hope that helps a little with that question.

I'll add to that. Because we started by utilizing this for data engineers, one of the main tasks we wrote was an SQL task. You just write your SQL pipeline, one SQL after the other, without even Python and Kubernetes and all that, and we were able to create very effective ETLs, running on Spark and doing all the heavy lifting, monitored, with a really quick time to market. And it's the same kind of infrastructure and capabilities that you need for your data science application.

What is the next question? It's: how will the endpoint avoid loading the model from the store for each prediction? More generally, how can we save state between one
serving service call to another?

So, the models are cached in memory in the solution that we showed. This is not part of Liminal's functionality right now; integrations with model stores will address this sort of problem. We showed a simple way, using a model store, to just cache the model in memory. I hope that helps.

In fact, when we started our journey to productize the ML, we had a look at Kubeflow, and back then it was still pretty immature. For me, the decision to adopt a new infrastructure and make it work in production was, at the time, something that I didn't think could bring the results, since, as I said, we already had a great data platform, and what we wanted was to extend it, to evolve it to the next level, and to adapt it to the machine learning flow. Adopting Kubeflow would have introduced a big risk into this project, so we took another approach: keep your existing tech stack and plug it into a very minimalistic orchestration platform, which creates the runtime artifacts and allows you to keep using the existing infrastructure.

Kubeflow, as Assaf described, is a very powerful platform, which also contains orchestration. I think what's unique about the Liminal approach is that we only try to solve one concern, the orchestration concern, and not other, more infra-level, components, which create the need to bring them to production, to mature them, and to learn them. That, as I said, was not reasonable for us to do at the time.

If you're looking at Kubeflow (Assaf and I discussed it just yesterday), when you try to set it up and work with it, it's a huge system. For us, Liminal was about abstracting away, if you will, the data scientist's work from the infrastructure. It could be Kubeflow; we intend to do that as part of the
integrations in Liminal, but we shouldn't have to use Kubeflow. In our case it didn't make sense: we had Spark, we had a lot of strong infrastructure in place, and for our use case it was just a big overhead. And we realized that this might be the case for many companies, not just us. Now, if you're using Kubeflow and you want to use its orchestration, then by all means, go ahead; we don't mind. But every other company that has some kind of mixed infrastructure would enjoy an open source project that can be extended and still allows the power that you get from Kubeflow. I think that summarizes the difference: a very opinionated open source platform versus an open, extensible one.

I think that's it, guys, right? Thank you very much; I've enjoyed it. We really hope to get your inputs and your contributions. Let's make Liminal great. It's already great; let's make it greater. Let's make it great again... I didn't want to go there. Thank you, guys.