Today, I will explain how to create a simple job in Talend Open Studio to perform a task similar to the one shown in my previous post, which is exporting data from a database to a CSV file and send it in an e-mail. The main idea is to show how simple it is to use Talend Open Studio to create a simple job.
TOS (Talend Open Studio) is a type of application that is easier to explain in a video tutorial, rather than in a blog post due to drag and drop workflow. As I like detailed explanations, you will see a long list of steps and lots of screenshot animations, I just hope you don’t get lost. Once you get used to Talend, you will find it simple and you will be able to create simple jobs in minutes.
Talend is an open source data integration platform, that allows you ingest data from any source and helps you build data pipelines very fast. To simplify, it is a drag and drop ETL tool that is easy to learn and use. Talend provides various software and services for data integration, data preparation, data quality, Big Data and more.
In computing, extract, transform, load (ETL) is the general procedure of copying data from one or more sources into a destination system which represents the data differently from the source(s) or in a different context than the source(s).
Source: Wikipedia
Talend will create a Java code for your job without requiring you to write a single line of code. You don’t need to have Java knowledge to use it, but it helps especially when encountering errors. Talend is based on the Eclipse IDE and you can run on multiple operating systems (e.g. Windows, macOS and Linux). One of the downsides is the high memory consumption, but this is common in larger Java applications.
Here a comparisson between Talend Open Studio and the user-based paid subscription versions. Depending on your needs, Talend Open Studio can be sufficient.
Below a list of useful links for you to get Talend Open Studio for Data Integration up and running:
Since Talend IDE requires Java, you need to install Java first.
One of the first steps when opening Talend Open Studio is to ensure you have the Java settings right. On the menu, go to Windows > Preferences
As mentioned earlier, this tutorial will explain how to use Talend Open Studio to create a job to execute the following steps:
First, we have to create a job:
Because I will be extracting data from database, we need to create a Db Connection.
Now that we have a connection, we will create a query for our job.
I was playing around with different queries in TOS and I missed to use the correct query before taking the screenshots. The simplified query should be:
SELECT CONVERT(DATE,p.payment_date) AS DT, sum(p.amount) AS AMOUNT
FROM payment p
WHERE p.payment_date >= '20050727'
AND p.payment_date < '20050803'
GROUP BY CONVERT(DATE,p.payment_date)
ORDER BY DT
Notes:
WHERE mydatecolumn > DATEADD(DAY, -7, CONVERT(DATE, GETDATE())) and not a fixed date.-- hides all of the following SQL code, including the code on all the following lines. You should use block comments /*comment*/ instead. You may be able to run the query in SQL Builder using a line comment, but you could encounter an error when trying to run the job.With our database connection and query ready, from now on it is mainly drag and drop. First, I will start with the export to a CSV file:
Note: It is important to understand the difference about the triggers OnSubjobOK and OnComponentOK. Check the short description at Talend Help website.
Instead to simply drop the tSendEmail component to send the email with the file attached, I decided to add the following steps:
Now, let me add the components and show the small changes done:
The result:
To see the result when no data is exported, I changed the query so that there is no data in the result. After running we see a different flow (as expected).
And the second email:
In case you are familiar with Java, you can look at the source code via the code tab. But note that it is not possible to edit the generated code directly, you can include your own Java code in a job using one of these methods:
Source: Talend Community
You can export, edit the code and compile without using Talend. However, if you open the job code in a Java editor and make changes, as soon as you run the job again through TOS, it will overwrite the changes done manually.
To build the job so that you can run it without the IDE, right click on the job you created, adjust any settings it the Build Job window (if necessary) and click Finish. Talend will create a zip file containing all the needed files to run the job. You just need to have Java installed.
Here is the same job running on macOS, Ubuntu and Windows:
I hope you enjoyed this simple explanation about Talend Open Studio. Keep in mind that I barely scratched the tip of the iceberg, Talend is very powerful. For example, you can debug jobs, add logs, execute external code, execute data manipulations, and so on.
I was introduced to Talend for the first time in 2012, but I regret that I have not used Talend more often in the past. I could have saved many hours if I hand’t taken the hard way of coding ETL tools from scratch.
In case you need an ETL tool and don’t want to get your hands dirty, don’t hesitate to contact me.