Streamline Your Finance Operations with Pipeline Integrity Data for Finance
See airSlate SignNow eSignatures in action
Our user reviews speak for themselves
Why choose airSlate SignNow
-
Free 7-day trial. Choose the plan you need and try it risk-free.
-
Honest pricing for full-featured plans. airSlate SignNow offers subscription plans with no overages or hidden fees at renewal.
-
Enterprise-grade security. airSlate SignNow helps you comply with global security standards.
Pipeline integrity data for Finance
Experience the benefits of using airSlate SignNow to efficiently manage your pipeline integrity data for Finance. airSlate SignNow offers a user-friendly and cost-effective solution that empowers businesses to send and eSign documents with ease.
Take control of your document workflow today with airSlate SignNow and streamline your financial processes!
airSlate SignNow features that users love
Get legally-binding signatures now!
FAQs online signature
-
What is the pipeline integrity management process?
PIM programs are systems managed by pipeline owner-operators that consider all stages of the pipeline life cycle, from conception, to engineering and design, construction, operation, inspection, and finally to repair/replacement when necessary.
-
What is the integrity of the pipelines?
Pipeline integrity (PI) is the degree to which pipelines and related components are free from defect or damage.
-
What are the roles of well integrity engineer?
Job activities include designing and supervising the construction of boreholes to ensure their long-term integrity; developing maintenance and monitoring plans to ensure that wells remain functional and safe; and carrying out regular well inspections to detect any signs of degradation or leaks.
-
What are the roles and responsibilities of pipeline engineer?
Key responsibilities include providing engineering expertise and technical support for pipelines and facilities, designing and sizing new pipelines, developing engineering drawings and diagrams, preparing project scopes, budgets and schedules, performing risk analysis and integrity assessments, and identifying and ...
-
What does a pipeline integrity engineer do?
Evaluating pipeline casings, road crossings, and water crossings; developing and optimizing inspection plans; identifying pipeline preventative and mitigative measures, reassessment intervals, and reassessment methods; and monitoring and surveilling integrity parameters to ensure reliable operations.
Trusted e-signature solution — what our customers are saying
How to create outlook signature
Hey y'all, today I'm going to take you through a really cool functional use case that my colleague Tamara built: how to run an integrated ELT and ML pipeline on Stripe data within Airflow. If you want to follow along with this video, head over to the Astronomer Learn page, where you can see all the steps I'm going to go through. We'll create an ELT pipeline to transform some data modeled after the Stripe API, and then, based on customer satisfaction scores and product type, we'll try to predict the total amount spent per customer. The idea is to figure out which areas of customer satisfaction we should focus on, and the end result will be plots of predicted versus true values, the predicament of every company. So without further ado, let's head over to VS Code and start building.

Before we get started, I want to call attention to a couple of things about this repo that might be new to you and that are pretty useful for local development. First, we're using an Astro CLI-generated Airflow runtime, so you can just clone this down from GitHub and run astro dev start; you don't need to run astro dev init, since the project is already initialized. Second, here's a trick if you want to interact with an S3 bucket in a development environment: we use MinIO, a package included in our requirements, which is basically a way to mimic an S3 bucket on your local machine. The AWS connection in this repo actually points at the MinIO admin credentials, so we can use a MinIO bucket rather than connecting to a real AWS account. Third, we're doing something you should never do in production: using the local Postgres backend database for our data processing. This definitely isn't best practice, and you won't want to store large amounts of data in your local Postgres or your backend database, but for the purposes of this explainer it's fine. If you want to connect this to any other database, you're more than welcome to; just update the connection details for the Postgres connection and everything else will follow. I wanted to show you these because they're pretty useful tools when you want to test things locally without a bunch of external setup. Finally, the Docker Compose override file is how we provision the MinIO service on our local Docker instance; we're running Airflow in Docker because we're using the Astro CLI, which is the best place to run Airflow locally.
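If you want to sanity-check that local setup outside of Airflow, here's a minimal sketch of talking to the MinIO endpoint as if it were S3. It assumes MinIO is listening on http://localhost:9000 with the default minioadmin credentials; adjust these to whatever the Docker Compose override actually provisions.

```python
# Minimal sketch: verify a local MinIO instance is reachable as an S3 endpoint.
# Endpoint and credentials are assumptions, not the repo's exact values.
import boto3

s3 = boto3.client(
    "s3",
    endpoint_url="http://localhost:9000",   # point the S3 client at MinIO, not AWS
    aws_access_key_id="minioadmin",
    aws_secret_access_key="minioadmin",
)

# list whatever buckets exist so far
print([b["Name"] for b in s3.list_buckets()["Buckets"]])
```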
The first DAG we're going to write is one we'll use to mimic pulling data from an external service. We're not actually going to connect to Stripe, since this is just a demo; instead, we'll run an included script that creates mock data. In the include directory we have the MinIO files, the plots folder, and all the extra pieces we need, and you can see some mock Stripe charge data in CSV files. In our finance data ingestion DAG, we need to take that information and load it into our MinIO bucket, because when this environment is spun up for the first time it's completely fresh and doesn't contain any data.

Before we do anything fun, we import all of our packages: the task decorators, pendulum's datetime, the LocalFilesystemToS3Operator so we can bring files from the local system into that MinIO instance, and the S3CreateBucketOperator. You'll notice that when you interact with MinIO you just use the AWS operators with the MinIO credentials. We also import os so we can read from the file system, and we include the mock data generation script so we can call it. After that, we set our AWS connection ID and data bucket name, the ones we already defined in the connection that points to MinIO, so the bucket name stays consistent across every use within this DAG.

Then we initialize the DAG and create our first task. It's real simple: a Python-decorated task that calls the mock data script in the include directory. Next, we create a bucket within our local MinIO instance, again using that data bucket name. Then we use dynamic task mapping to upload all of those Stripe data files in parallel. This mimics how you'd want to do it in production, because you never want to load tons of Stripe transaction data files sequentially; it just jacks up your processing time. I always recommend dynamic tasks anytime you need to scale one task across many different inputs. Here we have a task that builds a list of kwargs: for every file in our mock data set, the destination key within MinIO and the source file name it's pulling from. It keeps appending them and returns the complete list of upload kwargs, with all of their source and destination paths filled in. Then we create a downstream task with the LocalFilesystemToS3Operator, pass the output of that kwargs task as the expand field, and boom: roughly 20 task instances uploading all those files in parallel. Maximum efficiency, maximum power. Then we set the dependencies so we generate the mock data, create the bucket, and upload the mock data, and finally we declare the DAG.
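Put together, a hypothetical sketch of that ingestion DAG could look like the following. The connection ID, bucket name, paths, and the generate helper are placeholders rather than the repo's exact code.

```python
# Hypothetical sketch of the ingestion DAG described above.
import os

from pendulum import datetime

from airflow.decorators import dag, task
from airflow.providers.amazon.aws.operators.s3 import S3CreateBucketOperator
from airflow.providers.amazon.aws.transfers.local_to_s3 import LocalFilesystemToS3Operator

AWS_CONN_ID = "aws_default"          # points at MinIO, not real AWS
DATA_BUCKET_NAME = "finance-data"
MOCK_DATA_DIR = "include/mock_data"  # assumed location of the generated CSVs


@dag(start_date=datetime(2023, 1, 1), schedule=None, catchup=False)
def finance_data_ingestion():
    @task
    def generate_mock_data():
        # call the included script that writes mock Stripe CSVs to MOCK_DATA_DIR
        from include.generate_mock_data import generate  # hypothetical helper
        generate(MOCK_DATA_DIR)

    @task
    def get_upload_kwargs():
        # one dict of arguments per file, so uploads can be dynamically mapped
        return [
            {"filename": os.path.join(MOCK_DATA_DIR, f), "dest_key": f"raw/{f}"}
            for f in os.listdir(MOCK_DATA_DIR)
        ]

    create_bucket = S3CreateBucketOperator(
        task_id="create_bucket",
        bucket_name=DATA_BUCKET_NAME,
        aws_conn_id=AWS_CONN_ID,
    )

    mock_data = generate_mock_data()
    upload_kwargs = get_upload_kwargs()
    mock_data >> upload_kwargs
    mock_data >> create_bucket

    # dynamic task mapping: one upload task instance per file, running in parallel
    upload = LocalFilesystemToS3Operator.partial(
        task_id="upload_mock_data",
        dest_bucket=DATA_BUCKET_NAME,
        aws_conn_id=AWS_CONN_ID,
        replace=True,
    ).expand_kwargs(upload_kwargs)

    create_bucket >> upload


finance_data_ingestion()
```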
Now it's time to start building our next DAG, which takes the raw data we now have in our file system and transforms it so we can use it for ML. So now that we've got all of our raw data, we're going to ELT it into usable data for ML processing. What's happening here is that we use deferrable operators at the start of the DAG to wait for data to land in that S3 bucket: we point them at that location and say, once a new file arrives, trigger and start processing. Then we ELT it, meaning we bring the data into our database and transform it there, which can be the better approach when your database can run the transformations more efficiently.

After the docstring is all set up (always super fun), we have our decorators: here we use the task group, task, and dag decorators, plus the S3KeySensorAsync, which is a deferrable sensor. We also import the Astro SDK, which lets us execute SQL against pretty much any database and write these functions with a lot less boilerplate code, along with its File and Table objects so we can point at external files and tables without constantly spelling out their full names. We import the load file operator, which again makes it easier to load files into a database, and FileType, a constant we can pass as a parameter to define the type of each file as we load it into the database and transform it.

Now that the imports are set up, we set the global variables for this DAG: the AWS connection ID, the Postgres connection ID (again, our local database), the data bucket name, and the poke interval, which controls how often the deferrable operator checks that location for a new file. Once those are set, we stay in this file and use the @aql.transform decorator. What @aql.transform does is take a SQL statement; here we select details about each charge from the in_charge table, one of those data sets we loaded earlier, and keep only the successful charges, the ones that were approved by the network and paid. We don't want any unpaid charges in our ML data set, because why would we train our model on people who don't pay? Next we write another @aql.transform; since we're building an ELT process, there are going to be a lot of SQL transformations when you work inside a database. This one reads from the successful charges, so it's downstream of the previous task, and groups them by customer ID to figure out the average amount each customer spends with our business. After that, we join the average successful charge amount per customer with the average customer satisfaction, which comes from the satisfaction table. The join is on customer ID: satisfaction per customer versus the amount of money they spend, so we can see whether our biggest spenders are also our happiest customers.
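As a rough sketch, those transform steps might look something like this with the Astro Python SDK's @aql.transform decorator; the table and column names (status, paid, amount_captured, customer_satisfaction) are placeholders inferred from the walkthrough rather than the repo's exact schema.

```python
# Hypothetical sketch of the SQL transform steps, using the Astro Python SDK.
from astro import sql as aql
from astro.sql.table import Table


@aql.transform
def select_successful_charges(in_charge: Table):
    # keep only charges that were approved by the network and actually paid
    return """
        SELECT customer_id, amount_captured
        FROM {{ in_charge }}
        WHERE status = 'succeeded' AND paid = true
    """


@aql.transform
def avg_amount_per_customer(successful_charges: Table):
    # average amount spent per customer
    return """
        SELECT customer_id, AVG(amount_captured) AS avg_amount_captured
        FROM {{ successful_charges }}
        GROUP BY customer_id
    """


@aql.transform
def join_with_satisfaction(avg_charges: Table, in_satisfaction: Table):
    # correlate spend with satisfaction by joining on customer_id
    return """
        SELECT c.customer_id, c.avg_amount_captured, s.customer_satisfaction
        FROM {{ avg_charges }} AS c
        JOIN {{ in_satisfaction }} AS s ON c.customer_id = s.customer_id
    """
```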
Now that we've got all of our transformation tasks (and there are a lot, because this is an ELT job), we define our DAG as finance_elt. Within it, our first task group uses our MinIO instance to wait for the ingestion data: we read the connection, authenticate against the MinIO instance, and watch for the charge data and the satisfaction data from those raw files to be loaded. Using that poke interval, it checks every 60 seconds whether the files have been deposited in that location, and if they haven't, it sits and waits for the next ping. The advantage of deferrable operators is that instead of taking up a task slot on a worker, they go onto the triggerer, which is essentially a worker dedicated to deferred and async tasks that does all their polling for them. It's a really helpful feature and one of the newer ones in Airflow, so you'll need a reasonably recent version. After that, we declare the task group as ingestion_done, again using the TaskFlow API with decorators.

Once we've confirmed that those files are actually there, we define a task that retrieves them. Here we create a Python task that simply returns the input file and output table for each one: the raw data on one side, and where it will be deposited in our backend Postgres database on the other. We're only returning the details for these files, not uploading anything yet. Similarly to the last DAG, we collect all of those kwargs, with their input and output fields, initialize them as a task so we can use the output, and then create a separate downstream task using the Astro SDK's load file operator to dynamically generate tasks that load all of those files into our backend database in parallel.

So now our files have been brought in from S3 and loaded into our database, and we need to run the transformation tasks we created earlier. First we set the logic for how the ingestion is done: check for ingestion, then move the files from S3 into our backend database. Then we define the successful-charges task, which selects our successful charges with its input set to the table those charges were ingested into. Instead of roughly 20 separate charge files, they now all live in the same table, called in_charge. Once the files have been loaded and only the successful charges have been selected into a temporary table, we take that temporary table and average it to find the average per customer, and finally join that with our customer satisfaction data. We don't need to average the satisfaction data; we just want the raw customer satisfaction alongside the amount each customer actually spends.
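Here's a heavily simplified sketch of that wait-then-load pattern. It assumes the astronomer-providers package for S3KeySensorAsync and the Astro Python SDK for LoadFileOperator, and passing File and Table objects between tasks assumes the SDK's custom XCom serialization is in place; exact imports and arguments may differ by version, and all names are placeholders.

```python
# Hypothetical sketch of "wait for ingest, then load files into the database".
from pendulum import datetime

from airflow.decorators import dag, task
from astro.files import File
from astro.sql.operators.load_file import LoadFileOperator
from astro.sql.table import Table
from astronomer.providers.amazon.aws.sensors.s3 import S3KeySensorAsync

AWS_CONN_ID = "aws_default"           # the MinIO-backed connection
POSTGRES_CONN_ID = "postgres_default"
DATA_BUCKET_NAME = "finance-data"
POKE_INTERVAL = 60


@dag(start_date=datetime(2023, 1, 1), schedule="@daily", catchup=False)
def finance_elt():
    # deferrable sensors: polling happens on the triggerer, not a worker slot
    sensors = [
        S3KeySensorAsync(
            task_id=f"wait_for_{name}",
            bucket_name=DATA_BUCKET_NAME,
            bucket_key=f"raw/{name}_1.csv",
            aws_conn_id=AWS_CONN_ID,
            poke_interval=POKE_INTERVAL,
        )
        for name in ["charge", "satisfaction"]
    ]

    @task
    def get_load_kwargs():
        # one input_file/output_table pair per raw file, so loading can be
        # dynamically mapped; only two files are listed here for brevity
        return [
            {
                "input_file": File(
                    path=f"s3://{DATA_BUCKET_NAME}/raw/{name}_1.csv",
                    conn_id=AWS_CONN_ID,
                ),
                "output_table": Table(name=f"in_{name}", conn_id=POSTGRES_CONN_ID),
            }
            for name in ["charge", "satisfaction"]
        ]

    load_kwargs = get_load_kwargs()
    for sensor in sensors:
        sensor >> load_kwargs

    # dynamic task mapping: one load task instance per file
    LoadFileOperator.partial(task_id="load_files").expand_kwargs(load_kwargs)


finance_elt()
```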
The only things left to add to this DAG are declaring it and adding the aql cleanup task. This task runs in tandem with all of those transformations and deletes any temp tables that were created; it's a nice little helper so you don't have to define your own task logic to clean up temp tables. When you're doing these kinds of transformations you don't want to keep the raw data around after you've transformed and loaded it, and even if you did want it down the road, you'd probably store it in a data lake because that's much cheaper. Then we declare the DAG as finance_elt. Now we've got our clean data ready for use in our model, so let's finally build that ML DAG.

Here is our final DAG, where we take the clean data, run some models, do some tests, and generate the plots I showed you at the beginning. Before we do that, we have a whole list of packages to import so we can actually do some machine learning: the dag, task, and task_group decorators, pendulum's datetime, the Astro SDK (aql plus Table and Metadata, just like last time), and the Airflow configuration module so we can pull some variables from how Airflow is configured. We also have pandas, good old pandas; if you work in data science or data analysis, you need pandas. Then there's scikit-learn, which provides the different regression models we'll be using, numpy, another core data science tool, and os so we can read from the local file system. After all that, we do the other standard thing, which is listing all of the constants for this DAG: the connection IDs for our MinIO instance and our Postgres database, the database schema, the data bucket name (finance, you know what it is), and an environment variable set to local; it will become apparent later what that's for, so I'll keep a little mystery for you.

Next we create one mammoth of a task. I don't think it sets a record, I've seen longer, but it's a cool 60 lines, so let's break it down step by step. A lot of it is just transforming data, which is what you have to do for ML, surprise surprise. Here we're doing feature engineering, which means finding the features of this data set that we want to train our model on. We select our models from scikit-learn along with some preprocessors, a scaler and an encoder, to process our data. Then we create a DataFrame from the average amount captured data, the table we transformed in the previous DAG, and drop certain columns we don't need; when you're training a model, you typically don't want the target variable to be part of the training features. Then we do a train/test split, creating separate training and testing data sets so we can build models and predictions on one portion of the data and test them on another.
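Stripped of the Airflow plumbing, the feature-engineering step boils down to something like this sketch, assuming a pandas DataFrame with hypothetical avg_amount_captured, customer_satisfaction, and product_type columns.

```python
# Minimal feature-engineering sketch: split, scale numerics, one-hot encode
# the categorical product_type. Column names are assumptions.
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import OneHotEncoder, StandardScaler


def feature_engineering(df: pd.DataFrame):
    X = df.drop(columns=["avg_amount_captured", "customer_id"])  # features only
    y = df["avg_amount_captured"]                                # target variable

    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, random_state=42
    )
    X_train, X_test = X_train.copy(), X_test.copy()

    # scale numeric features; fit on the training set only
    numeric_cols = ["customer_satisfaction"]
    scaler = StandardScaler()
    X_train[numeric_cols] = scaler.fit_transform(X_train[numeric_cols])
    X_test[numeric_cols] = scaler.transform(X_test[numeric_cols])

    # one-hot encode the categorical feature (sparse_output needs sklearn >= 1.2;
    # use sparse=False on older versions)
    encoder = OneHotEncoder(sparse_output=False, handle_unknown="ignore")
    cols = ["product_type"]
    train_cats = pd.DataFrame(
        encoder.fit_transform(X_train[cols]),
        columns=encoder.get_feature_names_out(cols),
        index=X_train.index,
    )
    test_cats = pd.DataFrame(
        encoder.transform(X_test[cols]),
        columns=encoder.get_feature_names_out(cols),
        index=X_test.index,
    )
    X_train = pd.concat([X_train.drop(columns=cols), train_cats], axis=1)
    X_test = pd.concat([X_test.drop(columns=cols), test_cats], axis=1)

    return X_train, X_test, y_train, y_test
```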
Next we create pandas DataFrames from those splits, index them, and define the numeric columns so we know exactly which columns this model uses. We call the StandardScaler, fit it to the numeric columns of our X training set, and then use it to transform the X test set as well; this scales our numeric features onto a standardized scale. Then we one-hot encode the categorical feature product_type. You can see we do that pattern again: run the encoders and transformations, fit them, index the results, concatenate them, drop the columns we no longer need, reset the index, and drop the original product_type column. What we end up with is our X train data set and our X test data set. At this point the feature engineering is done, which basically means we've settled on the features, the measured inputs or independent variables we brought in (all that information about customer spend and customer satisfaction), and the target variable, the amount a customer will actually spend with us given their level of satisfaction. That's the root of this ML task: we want to know that when a happy customer walks in the door, they're going to spend a hundred dollars, so we can either focus on happy customers or make the unhappy customers happy, and one of those is easier than the other.

Now we take those engineered features and targets, with all the details our ML model needs, and train the model. You can see we define train_model, which takes the feature engineering table, the model class, and the hyperparameters, everything we need to actually run a model. We read in the feature engineering table, which contains our X train and X test DataFrames, index them, and fit the model to the training data. Then we use the fitted model on X test to predict our y values, which is how much money each customer is going to spend, the average amount captured. We use R2 scoring to ask how accurate we expect this model to be: is it actually going to predict our customers' behavior, or is this all correlation rather than causation? We compute and save the R2 score for both the training and the test sets. We also compute feature importances: we go through the DataFrame and build a new DataFrame that says, for each feature, each column of data points, how important it is to the eventual model output.
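A minimal sketch of that training step, independent of Airflow, could look like this; the function name and arguments are placeholders.

```python
# Hypothetical train_model sketch: fit a model class, score it with R2 on train
# and test data, and report per-feature importances or coefficients.
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import r2_score


def train_model(model_class, hyper_parameters, X_train, X_test, y_train, y_test):
    model = model_class(**hyper_parameters)
    model.fit(X_train, y_train)

    # how well does the model explain spend on data it has / hasn't seen?
    r2_train = r2_score(y_train, model.predict(X_train))
    r2_test = r2_score(y_test, model.predict(X_test))

    # tree models expose feature_importances_, linear models expose coef_
    raw_importance = getattr(model, "feature_importances_", None)
    if raw_importance is None:
        raw_importance = model.coef_

    importances = pd.DataFrame(
        {"feature": X_train.columns, "importance": raw_importance}
    ).sort_values("importance", ascending=False)

    print(f"R2 train: {r2_train:.3f}, R2 test: {r2_test:.3f}")
    print(importances)

    return model, model.predict(X_test)


# e.g. train_model(RandomForestRegressor, {"n_estimators": 100},
#                  X_train, X_test, y_train, y_test)
```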
Then we sort those values by importance and print them out, along with the raw coefficients, so you can see how important each feature is and what its corresponding coefficient is. We also print the R2 train and R2 test scores for convenience, and we return the y train and prediction pandas DataFrames, collecting all the values we created so these data sets can be used downstream to generate those nice pretty graphs we saw at the beginning.

Now that we're done setting up our model training, our ingestion, and our feature engineering, we declare the DAG. We set the schedule so that whenever the previous ELT job completes, this ML pipeline triggers, consumes that clean data, and uses it for training and predictions. After that, similarly to our last DAG, we declare the feature engineering table using the model_satisfaction table we created earlier, and then we set some environment-specific configuration. If the environment were prod (which isn't how we're running here), this branch would pull all the Kubernetes details and run the training as a Kubernetes task; sometimes with ML you want to run on a specialized Kubernetes cluster, and this is how you could do that. Since we're running locally, there's also an elif for environment equals local, using that global variable you can flip when you deploy to production, so this is an example of how you might implement that switch. Because we're local, the train model task just runs on our local machine using local compute; a MacBook should be able to handle the job. Finally, there's an else branch that raises an error if we're in neither of those two environments.

Then (not finally) we get the model results. We take the train model task definition we created earlier, initialize it with partial, and expand it over three different models, so we actually train our data on RandomForestRegressor, RidgeCV (ridge linear regression), and LassoCV (lasso linear regression). Instead of just one ML model, we get three for the price of one.
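A hypothetical sketch of mapping one training task over several scikit-learn model classes is below; passing the model name as a string keeps the mapped arguments easy to serialize, and moving DataFrames between tasks assumes an XCom backend that can handle them (for example, the one the Astro SDK provides).

```python
# Hypothetical sketch: one mapped training task instance per model class.
from airflow.decorators import task
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LassoCV, RidgeCV

# map string names to model classes so the mapped arguments stay serializable
MODEL_CLASSES = {
    "random_forest": RandomForestRegressor,
    "ridge_cv": RidgeCV,
    "lasso_cv": LassoCV,
}


@task
def train_model_task(model_name: str, feature_data: dict):
    # feature_data is expected to hold X_train, X_test, y_train as DataFrames
    model = MODEL_CLASSES[model_name]()
    model.fit(feature_data["X_train"], feature_data["y_train"])
    return {
        "model_name": model_name,
        "y_pred": model.predict(feature_data["X_test"]).tolist(),
    }


# inside the DAG body: one mapped task instance per model
# model_results = train_model_task.partial(feature_data=feature_data).expand(
#     model_name=list(MODEL_CLASSES)
# )
```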
Our next step, as promised, is to plot all of these results on some pretty matplotlib plots. Matplotlib and seaborn take a lot of configuration to get right, and you'll see that illustrated here in our giant plot_results task: we use matplotlib and seaborn, the include/plots directory, the y train values, and all those good data points, and plot them as scatter plots. A lot of the details here are just configuration defining the look of the plots, which, shockingly, you do a lot of when you're defining plots purely via code (a minimal sketch of one of these plots in code is included below). We then map this task dynamically, so for each of the model results we create a set of plots, and once it's done they all appear in your plots folder. As you can see, I already took the liberty of running this ahead of time, and I have my true versus predicted values plotted for lasso regression, ridge regression, and random forest. These charts say: based on how happy my customers are, here is how much I predict they're going to spend.

This is a really cool, functional little use case, and I recommend you try it out at home; it's a very nice way to get into the beginning of ML processing. The way you'll want to trigger all these DAGs is to first trigger your helper ingestion DAG, then turn on your finance ELT and ML DAGs. Because they're scheduled on the previous one completing, the finance ML DAG is linked to the finance ELT DAG, and because the finance ELT DAG starts with that wait-for-ingest asynchronous task, it just sits on the triggerer and waits, so you don't actually need to trigger or schedule it manually; it will keep checking on its own. And that's all I had for you today. I hope you enjoyed it; a big thank you to my colleague Tamara for creating this, and have a great rest of your day.
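As mentioned above, here's a minimal sketch of the kind of true-versus-predicted scatter plot generated for each model; the labels and output path are placeholders.

```python
# Minimal sketch of a true-vs-predicted scatter plot for one model's results.
import matplotlib.pyplot as plt
import seaborn as sns


def plot_true_vs_predicted(y_true, y_pred, model_name, out_dir="include/plots"):
    fig, ax = plt.subplots(figsize=(6, 6))
    sns.scatterplot(x=y_true, y=y_pred, ax=ax)

    # reference line: a perfect model would put every point on the diagonal
    lims = [min(min(y_true), min(y_pred)), max(max(y_true), max(y_pred))]
    ax.plot(lims, lims, linestyle="--", color="grey")

    ax.set_xlabel("True average amount captured")
    ax.set_ylabel("Predicted average amount captured")
    ax.set_title(f"True vs. predicted values ({model_name})")
    fig.savefig(f"{out_dir}/true_vs_predicted_{model_name}.png")
    plt.close(fig)
```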