Abinitio
Ab Initio takes data from various sources, transforms it as per the business requirement, and loads it into the data warehouse (DWH).
Ab Initio supports critical applications such as data warehousing, batch processing, continuous processing, data movement, and data transformation.
Benefits: transforms huge volumes of data efficiently and quickly with high performance; works with many databases (Oracle, SQL, PL/SQL, PostgreSQL…); provides checkpoint, debugging, and restartability features, and availability.
16. Recovery file (rec file): When a graph is run, a recovery file is created along with it, so that if any failure occurs we can restart from that point only.
4TH HIGHEST SALARY: select min(salary) from (select distinct top 4 salary from employee order by salary desc) as fourth
2ND HIGHEST SALARY: select max(salary) from employee where salary not in (select max(salary) from employee)
Nth HIGHEST SALARY: select * from employee e1 where (n-1) = (select count(distinct(e2.salary)) from employee e2 where e2.salary > e1.salary)
NEW TABLE CREATE: select * into newtable from oldtable (to copy only the structure without data, add: where 1=0)
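The Nth-highest query above can be tried on a small in-memory table. This is a minimal sketch using Python's sqlite3 module rather than SQL Server; the sample rows and the helper name nth_highest_salary are made up for illustration.

```python
import sqlite3


def nth_highest_salary(conn, n):
    """Return the n-th highest distinct salary, or None if no such rank exists."""
    row = conn.execute(
        """
        SELECT DISTINCT e1.salary
        FROM employee e1
        WHERE ? - 1 = (
            -- count how many distinct salaries are strictly higher
            SELECT COUNT(DISTINCT e2.salary)
            FROM employee e2
            WHERE e2.salary > e1.salary
        )
        """,
        (n,),
    ).fetchone()
    return row[0] if row else None


# Hypothetical sample data: two employees share the salary 400.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employee (empid INTEGER, salary INTEGER)")
conn.executemany(
    "INSERT INTO employee VALUES (?, ?)",
    [(1, 500), (2, 400), (3, 400), (4, 300), (5, 200)],
)

second = nth_highest_salary(conn, 2)
fourth = nth_highest_salary(conn, 4)
```

Because the subquery counts distinct salaries, tied salaries share one rank, so the two rows with 400 together form the 2nd highest.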
Architecture: 2-tier architecture. Components: GDE, Co>Operating System, EME (Enterprise Meta>Environment).
GDE: Graphical Development Environment. GDE is a graphical application for developers, used for designing and running Ab Initio graphs. It communicates with the Co>Operating System.
CO-OS: The Co>Operating System is a program provided by Ab Initio that operates on top of the native OS and is the base for all Ab Initio processes.
EME: The EME is a repository that holds a Unix-like directory structure and is used for metadata management in Ab Initio.
DUPLICATE FIND: select empid, count(*) from employee group by empid having count(*) > 1
DELETE DUPLICATE: with d as (select *, row_number() over (partition by empid order by empid) as rn from employee) delete from d where rn > 1
SUBSTRING: select substring(fullname, 1, charindex('_', fullname) - 1) as firstname, substring(fullname, charindex('_', fullname) + 1, len(fullname)) as lastname from employee
PRIMARY KEY: A PRIMARY KEY constraint uniquely identifies each record in a database table. All columns participating in a primary key constraint must not contain NULL values.
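The duplicate-find and duplicate-delete queries above can be exercised with a small sketch. It uses Python's sqlite3 module, so the delete is written against SQLite's rowid rather than the SQL Server CTE form; the table contents and function names are assumptions for illustration only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employee (empid INTEGER, name TEXT)")
conn.executemany(
    "INSERT INTO employee VALUES (?, ?)",
    [(1, "asha"), (2, "ravi"), (2, "ravi"), (3, "mina"), (3, "mina"), (3, "mina")],
)


def find_duplicates(conn):
    """Return (empid, occurrences) for every empid that appears more than once."""
    return conn.execute(
        "SELECT empid, COUNT(*) FROM employee "
        "GROUP BY empid HAVING COUNT(*) > 1 ORDER BY empid"
    ).fetchall()


def delete_duplicates(conn):
    """Keep the first row (lowest rowid) per empid and delete the rest."""
    cur = conn.execute(
        "DELETE FROM employee WHERE rowid NOT IN "
        "(SELECT MIN(rowid) FROM employee GROUP BY empid)"
    )
    return cur.rowcount


duplicates_before = find_duplicates(conn)
deleted = delete_duplicates(conn)
duplicates_after = find_duplicates(conn)
```

Running the find query again after the delete is a quick way to confirm that only one row per empid remains.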
The GDE has 2 sections: 1. Component Organizer, 2. Sandbox.
Component Organizer: lists the many components available in Ab Initio, such as Input File, Output File, Sort, Filter by Expression, Normalize, and Denormalize.
Serial file: In a serial file, data is processed one record at a time; the second record is processed only after the first is completed, and so on.
SURROGATE KEY: The key is generated when a new record is inserted into a table. When a primary key is generated at runtime, it is called a surrogate key.
FACT TABLE: A fact table represents the metrics (measurements and facts) of a business process. Facts are linked with dimensions in the table. Types: additive facts, semi-additive facts, non-additive facts.
Parallel file: A multifile. Data is divided into partitions and all partitions flow in parallel in Ab Initio, which increases performance.
Parallelism: Component parallelism is used by a graph with multiple processes executing simultaneously on separate data. Data parallelism is used by a graph that works with data divided into segments and operates on each segment simultaneously. Pipeline parallelism is used by a graph with multiple components running simultaneously on the same data.
LAYOUT: A layout determines the location of a resource. A layout is either serial or parallel. A serial layout specifies a single computer and a single directory on that computer. A parallel layout specifies multiple computers with multiple directories across the computers.
Control Center: A tool for scheduling jobs; jobs are tracked in the Control Center.
Plan and PSET testing: A plan is a combination of PSETs that run in a sequential or parallel manner. Typically a single plan has the end-to-end process embedded in it.
DIMENSION TABLE: Dimensions are descriptive data identified by keys. Dimensions are organized in tables called dimension tables. Types: conformed dimension, junk dimension.
SCD (Slowly Changing Dimension): An SCD is a dimension that stores and manages both current and historical data over time in a data warehouse. It is considered and implemented as one of the most critical ETL tasks in tracking the history of dimension records.
SCD1: The new data overwrites the existing data. The existing data is lost, as it is not stored anywhere else.
SCD2: Another dimension record is created. A new record is created with the changed data values, and this new record becomes the current record.
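The difference between SCD1 and SCD2 described above can be sketched in a few lines of Python. This is only an illustration on in-memory dictionaries; the column names start_date, end_date, and is_current are assumptions, since real SCD2 tables choose their own versioning columns.

```python
from datetime import date


def apply_scd1(dimension, key, new_attrs):
    """Type 1: overwrite the attributes in place, so the old values are lost."""
    for row in dimension:
        if row["key"] == key:
            row.update(new_attrs)
    return dimension


def apply_scd2(dimension, key, new_attrs, effective):
    """Type 2: expire the current row and append a new current row."""
    current = next(r for r in dimension if r["key"] == key and r["is_current"])
    current["is_current"] = False
    current["end_date"] = effective
    new_row = {
        **current,
        **new_attrs,
        "start_date": effective,
        "end_date": None,
        "is_current": True,
    }
    dimension.append(new_row)
    return dimension


# Hypothetical customer 101 moves from Pune to Mumbai.
scd1_dim = [{"key": 101, "city": "Pune"}]
apply_scd1(scd1_dim, 101, {"city": "Mumbai"})

scd2_dim = [
    {"key": 101, "city": "Pune", "start_date": date(2020, 1, 1),
     "end_date": None, "is_current": True}
]
apply_scd2(scd2_dim, 101, {"city": "Mumbai"}, date(2024, 6, 1))
```

After the SCD1 update only the Mumbai row exists, while the SCD2 table holds both the expired Pune row and the current Mumbai row.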
PSET: A PSET is a parameter set defined on a generic graph with different parameters. We use the same graph to load different tables by passing different parameters and creating different PSETs.
JOINS: A JOIN clause is used to combine rows from 2 or more tables based on related columns between them. Types: inner join, left join, right join, full outer join.
UNION & UNION ALL: The UNION operator is used to combine the result sets of 2 or more SELECT statements. Every SELECT statement within a UNION must have the same number of columns and the same data types. UNION does not return duplicate values. UNION ALL returns all records, including duplicates.
CONSTRAINTS: UNIQUE, NOT NULL, CHECK, DEFAULT, INDEX, PRIMARY KEY, FOREIGN KEY
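To see the difference between UNION and UNION ALL on real rows, here is a minimal sketch using Python's sqlite3 module. The two tables a and b and their contents are invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE a (id INTEGER)")
conn.execute("CREATE TABLE b (id INTEGER)")
conn.executemany("INSERT INTO a VALUES (?)", [(1,), (2,), (3,)])
conn.executemany("INSERT INTO b VALUES (?)", [(3,), (4,)])

# UNION removes the duplicate value 3 that exists in both tables.
union_rows = [
    r[0] for r in conn.execute("SELECT id FROM a UNION SELECT id FROM b ORDER BY id")
]

# UNION ALL keeps every row from both SELECT statements.
union_all_rows = [
    r[0] for r in conn.execute("SELECT id FROM a UNION ALL SELECT id FROM b ORDER BY id")
]
```

UNION ALL is usually the cheaper of the two because it skips the duplicate-elimination step, so it is the better choice when duplicates are impossible or acceptable.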
1. Reformat: Changes the record format of data records by dropping fields, or by using DML expressions to add fields, combine fields, or transform the data in the records. By default Reformat has one output port, but more can be added by incrementing the value of the count parameter. Parameters: count, select, transform, output-index, output-indexes.
OLTP: Online transaction processing captures, stores, and processes data from transactions in real time.
OLAP: Online analytical processing uses complex queries to analyze aggregated historical data from OLTP systems.
2. output-index: always returns an integer. output-indexes: returns a vector, for example (0,1).
Smoke testing: Smoke testing is performed to ascertain that the critical functionalities of the program are working fine. Smoke testing exercises the entire system from end to end.
3. Rollup transform functions: Rollup allows users to group records on certain field values. It is a multi-stage function and contains 1. initialize, 2. rollup, 3. finalize functions.
Sanity testing: Sanity testing is done at random to verify that each functionality is working as expected.
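The three-stage shape of Rollup (initialize, rollup, finalize) can be mimicked in plain Python. This is a sketch of the idea only, not Ab Initio DML; the field name amount, the output field names, and the driver function run_rollup are assumptions made for the example.

```python
from itertools import groupby
from operator import itemgetter


def initialize(record):
    """Create the temporary accumulator for a new group."""
    return {"total": 0, "count": 0}


def rollup(temp, record):
    """Fold one input record into the group's accumulator."""
    temp["total"] += record["amount"]
    temp["count"] += 1
    return temp


def finalize(temp, key):
    """Produce the single output record for the group."""
    return {"key": key, "total": temp["total"], "count": temp["count"]}


def run_rollup(records, key_field):
    """Group records by key_field and run the three stages on each group."""
    output = []
    ordered = sorted(records, key=itemgetter(key_field))
    for key, group in groupby(ordered, key=itemgetter(key_field)):
        group = list(group)
        temp = initialize(group[0])
        for record in group:
            temp = rollup(temp, record)
        output.append(finalize(temp, key))
    return output


# Hypothetical input records.
sales = [
    {"region": "east", "amount": 10},
    {"region": "west", "amount": 5},
    {"region": "east", "amount": 7},
]
summary = run_rollup(sales, "region")
```

Each group yields exactly one output record, which is why the result has one entry per distinct region.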
4. Filter by Expression: Filter by Expression is used to filter records based on DML expressions. Parameters: select expression, reject threshold, logging.
Normalization: Normalization is a database design technique used to reduce redundant, unwanted, or repeated data. Forms: 1NF, 2NF, 3NF.
5. m_dump syntax: m_dump <dml_path> <inputfile.path>
STAR SCHEMA: A star schema contains both dimension tables and fact tables. In a star schema, the fact table is at the center and is surrounded by dimension tables.
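A tiny star schema makes the fact and dimension relationship concrete. The sketch below uses Python's sqlite3 module; the table names fact_sales, dim_product, and dim_store and all of the rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE dim_product (product_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE dim_store   (store_id   INTEGER PRIMARY KEY, city TEXT);
    -- The fact table holds the measures plus a foreign key to each dimension.
    CREATE TABLE fact_sales (
        product_id INTEGER REFERENCES dim_product(product_id),
        store_id   INTEGER REFERENCES dim_store(store_id),
        amount     INTEGER
    );
    INSERT INTO dim_product VALUES (1, 'pen'), (2, 'book');
    INSERT INTO dim_store   VALUES (1, 'Pune'), (2, 'Delhi');
    INSERT INTO fact_sales  VALUES (1, 1, 10), (1, 2, 15), (2, 1, 40);
    """
)

# Aggregate the additive fact "amount" by a product attribute.
sales_by_product = conn.execute(
    "SELECT p.name, SUM(f.amount) FROM fact_sales f "
    "JOIN dim_product p ON p.product_id = f.product_id "
    "GROUP BY p.name ORDER BY p.name"
).fetchall()

# The same fact table can be sliced by any other dimension.
sales_by_city = conn.execute(
    "SELECT s.city, SUM(f.amount) FROM fact_sales f "
    "JOIN dim_store s ON s.store_id = f.store_id "
    "GROUP BY s.city ORDER BY s.city"
).fetchall()
```

Every query joins the central fact table directly to a dimension, which is the one-hop pattern that distinguishes a star schema from a snowflake schema.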
6. M_Queue: Multi queues are multi-way queues (8-way or 4-way). They work on the FIFO concept and are used as load-ready files.
7. In an MFS, if we remove one of the data partition files, will it throw an error? Yes. If the data in a partition is deleted (or truncated), Ab Initio does not throw an error. If a partition itself is deleted, Ab Initio throws an error.
SNOWFLAKE SCHEMA: A snowflake schema contains all three: dimension tables, fact tables, and sub-dimension tables. Each dimension is normalized into sub-dimensions.
8. du command: The du command is used to estimate file space usage, that is, the space used under a particular directory or by files on a file system.
df command: The df command is used to display the amount of available disk space for file systems.
9. Teradata utilities: Large amounts of data can be transferred using the various Teradata utilities, i.e. BTEQ (Basic Teradata Query), FastLoad, MultiLoad, and TPump, which transfer data from the host to Teradata.
CHECK-OUT PROCESS: While working on any project, we need to check out the code from the EME project path. We check out the latest version of the code into our sandbox (to make a local copy), and then we can start working on the project in our local copy. Once all the changes are done and the code is modified, we can check in our copy of the project (sandbox) into the EME as a different version.
CASE: The CASE statement goes through conditions and returns a value when the first condition is met (like if-then-else).
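10. Primary index: used to specify where the data resides in Teradata.
RANK, DENSE_RANK, ROW_NUMBER: These assign a rank or a number to each record in a result set. RANK gives tied values the same rank and skips the following ranks. DENSE_RANK gives tied values the same rank without leaving gaps. ROW_NUMBER numbers the rows sequentially regardless of ties.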
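The three ranking functions are easiest to compare side by side on data with a tie. This is a minimal sketch using Python's sqlite3 module, assuming an SQLite build with window-function support (3.25 or later); the employee rows are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employee (name TEXT, salary INTEGER)")
conn.executemany(
    "INSERT INTO employee VALUES (?, ?)",
    [("a", 500), ("b", 400), ("c", 400), ("d", 300)],
)

# b and c tie on salary, which is where the three functions differ.
ranked = conn.execute(
    """
    SELECT name,
           RANK()       OVER (ORDER BY salary DESC)       AS rnk,
           DENSE_RANK() OVER (ORDER BY salary DESC)       AS dense_rnk,
           ROW_NUMBER() OVER (ORDER BY salary DESC, name) AS row_num
    FROM employee
    ORDER BY salary DESC, name
    """
).fetchall()
```

For the last row, RANK jumps to 4 because two rows share rank 2, DENSE_RANK continues with 3, and ROW_NUMBER simply counts to 4.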
11. How to delete empty lines from a Unix file: grep -v '^$' file.txt > newfile.txt
VIEW: A view is a virtual table that acts like an actual table. The rows of a view are not stored in the database, so no storage is used for its data.
12. Ab Initio RC: The Ab Initio RC file holds the EME connection details and the connection information for connecting one server to another.
13. SFTP (secure file transfer using SSH) and SCP (Secure Copy): These are general commands for sending files from one environment to another. SCP is a protocol that allows transferring files securely from a local host to a remote host. SFTP is a protocol that allows file access, transfer, and management over a reliable data stream. SCP has traditionally been the faster of the two for plain copies, while SFTP offers more file-management operations.
MATERIALIZED VIEW: The results of the view expression are stored in the database system, so a materialized view occupies storage.
SUBQUERY: A subquery is a SQL query within another query. It is a subset of a SELECT statement whose return values are used in filtering the conditions of the main query.
Correlated subquery: A correlated subquery is a subquery that uses values from the outer query in order to complete. Because a correlated subquery requires the outer query to be executed first, the correlated subquery must run once for every row in the outer query. It is also known as a synchronized subquery.
14. Ab Initio Queue: A queue is a data structure where we store data. We read and write the data using the Subscribe and Publish components, and the queue helps us store the records in a sequence of files. An Ab Initio queue is FIFO (first in, first out). Queues provide record-based persistence. Publishers write data to the queue.
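The FIFO behaviour of a queue with a publisher and a subscriber can be sketched in Python. This is an in-memory illustration only: it does not persist records to files as an Ab Initio queue does, and the class name FifoQueue and its methods are hypothetical.

```python
from collections import deque


class FifoQueue:
    """In-memory first-in, first-out queue of records."""

    def __init__(self):
        self._records = deque()

    def publish(self, record):
        """Publisher side: append a record to the tail of the queue."""
        self._records.append(record)

    def subscribe(self):
        """Subscriber side: remove and return the oldest record, or None if empty."""
        return self._records.popleft() if self._records else None


queue = FifoQueue()
for record in ["r1", "r2", "r3"]:
    queue.publish(record)

# Records come out in exactly the order they were published.
consumed = [queue.subscribe() for _ in range(3)]
leftover = queue.subscribe()
```

A deque is used because removing from the head of a Python list is linear in its length, while popleft on a deque is constant time.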
15. Sandbox: A sandbox is the local copy of an Ab Initio EME project, confined to a specific user. Multiple users can have copies of the EME project in their sandboxes, and users can work in their sandboxes on the same project at the same time. There are 2 types: a public sandbox is visible to other projects; a private sandbox cannot be accessed by other projects.