Figure C5g. Data Lake (DL), Data Pool (DP), and Project Data Flow (PDF)


PDF (Project Data Flow)

Project data flow (PDF) is often defined using a model or diagram to map the entire project data flow process.

The data flow passes from one data pool (DP) to the next within a program or a system, considering how data changes its form (content, targets, functions) during the process.

PDF allows data to move through software and hardware, typically through a combination of both. In this case, the data flow describes any data path through the filters of the SPC Concept and its data structure in each specific data pool (DP).

Project Data Flow (PDF) covers all life-cycle data of a project (the preparation and implementation stages), including production, services, and other working steps until the project's moral (obsolescence) or physical end.

The Added Value (AV) of the whole life-cycle process is not directly measurable, nor is measuring it the aim.

The PDF's Added Value builds on distributed principles: each Data Pool (DP) lives as an independent unit within the project life cycle.

The aim is to apply an in/out data direction to any Data Pool (DP) task via in/out regulation. This opens the space for Blockchain and Smart Contract technologies.
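
As a rough illustration of the in/out regulation idea, the following minimal Python sketch (not part of the SPC Concept itself) models a Data Pool as an independent unit that admits records only through a validated input gate and hands them on only through an output gate. The class and field names (DataPool, accept, release, required_fields) are assumptions made for the example.

```python
# Minimal sketch of in/out regulation for a Data Pool (DP).
# All names (DataPool, accept, release, required_fields) are illustrative only.

class DataPool:
    def __init__(self, name, required_fields):
        self.name = name
        self.required_fields = set(required_fields)  # the in/out rule of this DP
        self.records = []

    def accept(self, record):
        """Input gate: admit a record only if it satisfies the DP's in-rule."""
        missing = self.required_fields - record.keys()
        if missing:
            raise ValueError(f"{self.name}: record rejected, missing {sorted(missing)}")
        self.records.append(record)

    def release(self):
        """Output gate: hand all records on to the next DP in the project data flow."""
        released, self.records = self.records, []
        return released


# Example flow DP1 -> DP2: the output of one pool becomes the input of the next,
# and the data changes its form between the pools.
dp1 = DataPool("DP1", required_fields=["project_id", "stage"])
dp2 = DataPool("DP2", required_fields=["project_id", "stage", "budget"])

dp1.accept({"project_id": "P-001", "stage": "preparation"})
for record in dp1.release():
    record["budget"] = 120_000
    dp2.accept(record)

print(dp2.records)
```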

Project results depend on the Added Value generated in each Data Pool (DP). The emphasis on control mechanisms, and the implementation of control itself, shifts from humans to machines.

So, the evaluation of results in each DP should respect two levels:

  1. The first level shows how successful the projects are.

  2. The second level indicates how well the evaluation methodology works for the relevant DP. 

Both levels should generate data for a broader range of project evaluations (benchmarking) in the province, state, or region.
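
A minimal sketch of how the two evaluation levels could be recorded per Data Pool so that they can later be aggregated for broader benchmarking. The field names (project_success, methodology_fit) and the 0-to-1 scoring scale are assumptions made for the example, not values prescribed by the text.

```python
# Illustrative two-level evaluation record per Data Pool (DP).
# Field names and the 0..1 scoring scale are assumptions for this sketch.

from statistics import mean

evaluations = [
    # (data pool, project success score, quality of the evaluation methodology)
    {"dp": "DP1", "project_success": 0.82, "methodology_fit": 0.90},
    {"dp": "DP2", "project_success": 0.74, "methodology_fit": 0.65},
    {"dp": "DP3", "project_success": 0.91, "methodology_fit": 0.88},
]

# Level 1: how successful the project is across its Data Pools.
print("project success:", round(mean(e["project_success"] for e in evaluations), 2))

# Level 2: how well the evaluation methodology works in each Data Pool.
for e in evaluations:
    print(e["dp"], "methodology fit:", e["methodology_fit"])
```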

DL (Data Lake)

A data lake is a system or repository of data stored in its natural/raw format, usually object blobs or files. An object blob is a large binary object: a collection of binary data stored as a single entity.

A file is the standard storage unit in a computer, and all programs and data are "written" into a file and "read" from a file.

A folder holds one or more files and remains empty until it is filled. It is practical to remember this core IT terminology for communication between IT experts and other stakeholders.

A Data Lake (DL) can operate as a single data store, including raw copies of source data from a system, sensor data, social data, etc. DL data is then transformed for reporting, visualization, advanced analytics, and machine learning tasks.

A Data Lake (DL) can include structured data from relational databases (rows and columns), semi-structured and unstructured data (emails, documents), and binary data (images, audio, video).

DL can be established "on-premises" (within an organization's data centers) or "in the cloud" (using cloud services from vendors such as Amazon, Microsoft, or Google).
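
The following sketch illustrates the raw-first idea of a Data Lake on a local ("on-premises") filesystem: source records are landed untouched in a raw zone and only later copied, in transformed form, into a curated zone for reporting or analytics. The folder names (raw/, curated/) and the JSON format are assumptions made for the example; a cloud deployment would replace the local folders with object storage.

```python
# Minimal on-premises Data Lake sketch: keep raw copies, derive curated data later.
# The raw/ and curated/ zone names and the JSON format are assumptions.

import json
from pathlib import Path

lake = Path("data_lake")
(lake / "raw" / "sensors").mkdir(parents=True, exist_ok=True)
(lake / "curated" / "reporting").mkdir(parents=True, exist_ok=True)

# 1. Land the source data in its natural/raw format (no transformation).
raw_file = lake / "raw" / "sensors" / "2024-05-01.json"
raw_file.write_text(json.dumps([{"sensor": "s1", "temp_c": 21.4},
                                {"sensor": "s2", "temp_c": None}]))

# 2. Transform a copy for reporting; the raw file stays untouched.
readings = json.loads(raw_file.read_text())
cleaned = [r for r in readings if r["temp_c"] is not None]
(lake / "curated" / "reporting" / "2024-05-01.json").write_text(json.dumps(cleaned))

print("raw records:", len(readings), "| curated records:", len(cleaned))
```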

In this context, the data funnel (system) is an internal data structure of the Data Lake (DL) and fulfills two tasks.

  • The first concerns the life cycle of projects, with an accent on the preparation and implementation phases of any project.

  • The second is simply the marketing strategy of a new organization (just created by a project) or of a running organization (with links to projects through organizational development and safety needs).

Therefore we distinguish between the project pipeline (the funnel) and the sales funnel.

1 Project pipeline management.

It is an essential component of project or project portfolio management. It involves steps to ensure that an adequate number of projects are being evaluated and screened at various stages of the intake process to meet the strategic objectives of the project and the organization.

Other factors, such as the project budget, the organization's budget, and resource capacity, also come into play in such a complex task.

Any organization under a project burden resists work overload, which can be a risk factor for completing organizational and strategic goals (the whole process of project preparation and implementation).

2 Sales funnel management.

It is a pictorial representation of how a lead travels through the sales process (in organizations' and projects' supply chains). The method of overseeing this customer journey is called sales funnel management.

Even the best salesperson may fail to meet their targets if they are not good at managing their sales funnel in the specific supply chain.

While the exact steps of a sales funnel can vary dramatically, it typically follows the same general rules (processes): awareness, interest and evaluation, desire, action, and re-engagement.
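
As a simple illustration of those general stages, the sketch below counts leads per funnel stage. The stage names follow the list above, while the lead records are invented sample data.

```python
# Illustrative sales funnel: count leads in each of the general stages.
# The lead records are invented sample data.

from collections import Counter

STAGES = ["awareness", "interest and evaluation", "desire", "action", "re-engagement"]

leads = [
    {"lead": "A", "stage": "awareness"},
    {"lead": "B", "stage": "interest and evaluation"},
    {"lead": "C", "stage": "desire"},
    {"lead": "D", "stage": "action"},
    {"lead": "E", "stage": "awareness"},
]

counts = Counter(lead["stage"] for lead in leads)
for stage in STAGES:
    print(f"{stage:25s} {counts.get(stage, 0)}")
```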

In any case, it is vital to note that there is as yet minimal practical experience (not yet developed enough, mainly for new technologies such as Blockchain and Smart Contracts) with a data funnel for tasks 1 and 2 in a Data Lake environment, given the global digital transformation needs of market growth and development.

DP (Data Pool)

A data pool is a centralized repository of data where trading partners (retailers, distributors, or suppliers) can obtain, maintain, and exchange product information in a standard format.

Suppliers can upload data to a data pool, which retailers receive through their data pool. The data can be anything from supply chain information to employee records.

The data can be generated automatically or manually for analysis using the entire data set or a subset of values. Database software is designed to handle the various functions associated with data pools, including synchronization and verification of information.
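
A minimal sketch of the synchronization and verification idea: a supplier uploads a product record in an agreed standard format, the pool verifies the required fields, and a retailer retrieves the synchronized copy. The field list (gtin, description, net_weight_kg) is only an assumed example of such a standard format, not a reference to any particular data pool standard.

```python
# Illustrative data pool: suppliers upload, the pool verifies, retailers retrieve.
# The "standard format" fields (gtin, description, net_weight_kg) are assumptions.

STANDARD_FIELDS = {"gtin", "description", "net_weight_kg"}

data_pool = {}  # keyed by GTIN, shared by the trading partners


def upload(product):
    """Supplier side: verify the standard format, then synchronize the record."""
    missing = STANDARD_FIELDS - product.keys()
    if missing:
        raise ValueError(f"record rejected, missing fields: {sorted(missing)}")
    data_pool[product["gtin"]] = product


def retrieve(gtin):
    """Retailer side: obtain the synchronized product information."""
    return data_pool[gtin]


upload({"gtin": "01234567890123", "description": "Mineral water 1 l", "net_weight_kg": 1.0})
print(retrieve("01234567890123"))
```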

In a fundamental sense, any set of data collected for analysis is a data pool (DP). The data collection method can affect the accuracy of all data pools (DPs), and thus the accuracy of the analysis outcome is also at risk.

Manual data collection can be reasonably reliable if the data set is part of a simple quantitative experiment involving a small data set. If, on the contrary, the data set is large, an automatic data collection process will be the most accurate and precise.

The accuracy and precision of the values in a data set are always important. This fully applies to projects in all their phases. Within the project data flow (PDF) processes, the construction, purpose, and use of the data pool are described in more detail for the individual stages in the pop-ups for DP1, DP2, DP3, DP4, and DP5.

Other data (examples of different data operations)

Loosely, without a direct link to Figure C5g, three examples of other data concepts are mentioned below (e.g., finding a connection between general-level knowledge and professional expertise).

DATA MESH

It is a relatively new concept that became one of the fastest-growing trends during 2020. It extends the paradigm shift introduced by the microservices architectures and applies it to data architectures, enabling agile and scalable analytics and machine learning or artificial intelligence.

Data mesh provides an alternative to the "centralized" organizational and architectural pattern of the data lake, with a distributed and decentralized architecture designed to help enterprises to (a minimal sketch follows the list below):

  1. Enable agility and business scalability

  2. Reduce the time-to-market of the business initiatives 

  3. Lower maintenance costs

  4. Allow a fair and transparent internal cost allocation
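
In the sketch below, each domain owns and serves its own data product, while a thin catalog only records where the products can be found. All names (DomainDataProduct, publish, read, the domain names) are assumptions made for the example.

```python
# Illustrative decentralized (data mesh style) pattern: each domain owns its data
# product; a thin catalog only supports discovery. All names are assumptions.

class DomainDataProduct:
    def __init__(self, domain, owner):
        self.domain = domain      # e.g. "sales" or "logistics"
        self.owner = owner        # the team accountable for this data product
        self._records = []

    def publish(self, record):
        self._records.append(record)

    def read(self):
        return list(self._records)


catalog = {}  # central discovery only; the data itself stays with the domains

sales = DomainDataProduct("sales", owner="sales-team")
logistics = DomainDataProduct("logistics", owner="logistics-team")
catalog["sales"] = sales
catalog["logistics"] = logistics

sales.publish({"order": 1, "amount": 250})
logistics.publish({"shipment": "S-9", "status": "delivered"})

# A consumer discovers a product through the catalog and reads it from its domain.
print(catalog["sales"].read())
```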

DATA SILO

A data silo is a collection of data held by one group that is not quickly or thoroughly accessible by other groups in the same organization. Finance, administration, HR, marketing teams, and other departments need different information to do their work.

Removing data silos can help you get the correct information at the right time to make good decisions. In simple terms, working in silos means operating in a kind of bubble, on your own or as part of an insular team or department.

It is a metaphor for groups of people (e.g., a team is a 'container' of colleagues) who work independently from other groups.

DATA MINING

Data Mining is a process of finding potentially valuable patterns from massive data sets. It is a multi-disciplinary skill that uses machine learning, statistics, and AI to extract information to evaluate future events' probability.

In statistical operations, Data Mining assists marketing, fraud detection, scientific discovery, etc.

Data Mining is all about discovering hidden, unsuspected, and previously unknown yet valid relationships amongst the data. Data mining is also called Knowledge Discovery in Data, Knowledge extraction, data/pattern analysis, information harvesting, etc.
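
To give a concrete flavour of "finding potentially valuable patterns," the sketch below counts which item pairs occur together most often in a small set of transactions, a very reduced form of association analysis. The transaction data is invented for the example.

```python
# Very reduced data mining example: frequent item-pair patterns in transactions.
# The transactions are invented sample data.

from collections import Counter
from itertools import combinations

transactions = [
    {"cement", "sand", "gravel"},
    {"cement", "sand"},
    {"sand", "gravel"},
    {"cement", "sand", "steel"},
]

pair_counts = Counter()
for basket in transactions:
    for pair in combinations(sorted(basket), 2):
        pair_counts[pair] += 1

# The most frequent pairs are candidates for previously unknown relationships.
for pair, count in pair_counts.most_common(3):
    print(pair, count)
```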

EPS (Existing products and services)

During project preparation and implementation, the relevant supply chain (products, services, and works) is composed of available products, usually from the project's immediate vicinity.

According to public procurement rules, any selected project supplier can buy the necessary material, rent the essential services and ensure the delivery of the required work.

The contract regime is standard for a broader project area (by territory, professional specialization, or other rules). The main problem is turbulence in inflation and other influences on current prices, which disrupt the stability of supplies and customer relations.

NPS (New products and services)

When the project is closed, the newly acquired work will fulfill its function through the newly established organization (it supplies new products, services, and jobs to the market).

The project was in the role of a consumer of the supply chain (in all stages of the cycle: renewal, expansion, or modernization) and drew resources (material, financial, logistical, etc.).

After the project's transformation into a new work (object for production, services, and jobs), a new "object" came into being, which stopped taking goods from the supply chain and, on the contrary, started expanding it.

It is a well-known, recurring cycle supported by project management technologies, e.g. Blockchain and Smart Contracts.

Inputs from new innovative projects improve the environment of the global economy. This is mentioned in this chapter and discussed more deeply in chapters D and F. The reason is to underline the potential and subsequent standardization of selected supply chains (especially infrastructure investments).

For more details, see the WEMAF and HEDEC project drivers described below, especially in Chapter E.

The DLF in Figure C5g defines the environment for the acquisition and protection of values in a wide range of social and economic development (SED), disaster risk reduction (DRR), and humanitarian aid (HA) projects.

These projects have passed through the Funnel's filters by entering and exiting five Data Pools (DPs) in two trajectories. One model (labeled x, y) suggests that project values are open to more comprehensive applications.

The other one (marked s, t) indicates that these are project values tightly bound to the preset drivers for each project (according to the characteristics of the content, time, and cost of individual projects).

Figure C5g complements a series of pop-ups describing the Data Pools (DP 1, 2, 3, 4, and 5), as well as Figures C5h and C5i.